LLM orchestration using LangChain — Part 2 (Application)

Anirban Sen
6 min read · Sep 16, 2023

In the previous blog, we learnt about the background and basics of LangChain. In this one, we will walk through some use cases and see how to evaluate these LLM applications.

4. Question and Answer over documents

LLMs can only inspect a few thousand words at a time, which is a problem for long documents; vector stores address this. An embedding vector captures the content/meaning of a piece of text, so texts with similar content have similar vectors. A vector database is a way to store these vector representations. We break the text into chunks, embed each chunk, and store the embedding vectors in the database.

At runtime, when a query is provided, we create its embedding with the same model and look up the N closest embeddings in the database. The corresponding chunks are then passed to the LLM to get the final answer.
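To make the retrieval step concrete, here is a minimal, self-contained sketch of the idea with toy numbers (the texts and vectors below are made up for illustration; in a real system the embeddings come from a model such as OpenAIEmbeddings and have far more dimensions, e.g. 1536):

import numpy as np

#toy "embeddings" - real ones come from an embedding model
chunk_vectors = {
    "shirt with SPF 50+ sun protection": np.array([0.9, 0.1, 0.0]),
    "waterproof hiking boots": np.array([0.1, 0.8, 0.3]),
}
query_vector = np.array([0.85, 0.15, 0.05]) #embedding of the user's query

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

#rank the chunks by similarity to the query and keep the closest N
ranked = sorted(chunk_vectors.items(),
                key = lambda kv: cosine_similarity(query_vector, kv[1]),
                reverse = True)
print(ranked[0][0]) #the closest chunk, which gets passed to the LLM as context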

Now this can be done using various tools. Let's see how to do it using LangChain and ChatGPT. We have a file with product names and descriptions, which we will use for this use case —

from langchain.document_loaders import CSVLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import DocArrayInMemorySearch
from langchain.embeddings import OpenAIEmbeddings
from IPython.display import display, Markdown

query = "Please list all shirts with sun protection in a tale in markdown and summarize each one"
file = 'OutdoorClothingCatalog_1000.csv' #contains name and desc of products

#the short, high-level version
loader = CSVLoader(file_path = file)
index = VectorstoreIndexCreator(
    vectorstore_cls = DocArrayInMemorySearch).from_loaders([loader])
response = index.query(query)
display(Markdown(response))
This will respond with all the products that have sun protection. Easy, right?

Now, this can also be done in a step-by-step manner, as shown below —

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

#Step 1 : Load files
loader = CSVLoader(file_path = file)
docs = loader.load() #creates a list of documents which can be inspected like docs[0]

#Step 2 : Create embeddings and store them in an in-memory vector store
embeddings = OpenAIEmbeddings()
db = DocArrayInMemorySearch.from_documents(docs, embeddings)

#Step 3 : Retrieve similar docs and answer the query over them
llm = ChatOpenAI(temperature = 0)
retriever = db.as_retriever()
qa = RetrievalQA.from_chain_type(llm = llm, chain_type = "stuff",
                                 retriever = retriever, verbose = True)
response = qa.run(query)
display(Markdown(response))
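Under the hood, the retriever is simply running a similarity search against the vector store. To peek at the chunks it would hand to the LLM, you can call the search directly (a quick sketch; k = 4 is our arbitrary choice of how many chunks to fetch):

#inspect the raw chunks the retriever would stuff into the prompt
similar_docs = db.similarity_search(query, k = 4)
print(len(similar_docs))
print(similar_docs[0].page_content) #the closest matching product description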

The method/chain_type we used to pass the context (OutdoorClothingCatalog_1000.csv) along with the question to the LLM is "stuff" (as seen in the code). Stuffing is the simplest method: it simply stuffs all the retrieved data into the prompt as context for the LLM.
Pros — It makes a single call to the LLM, and the LLM has access to all the data at once.
Cons — LLMs have a limited context length, so for large documents or many documents this will not work, as it would result in a prompt larger than the context length.
The three alternative methods are as follows (a usage sketch follows the list) —

  1. Map_reduce — Passes each chunk along with the question to the LLM in a separate call, then passes all of the individual answers to a final LLM call that summarizes them into the final answer.
  2. Refine — Builds up the answer iteratively: each chunk is passed to the LLM together with the answer produced from the previous chunk.
  3. Map_rerank — Passes each chunk along with the question to the LLM and also asks it to score how relevant its answer is; the highest-scored answer is taken as the final answer.
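Switching between these strategies only requires changing the chain_type argument; everything else stays the same. Here is a small sketch reusing the llm and retriever defined above (the variable name qa_map_reduce is just ours for illustration):

#same QnA chain as before, but combining chunks via map_reduce instead of stuff
qa_map_reduce = RetrievalQA.from_chain_type(llm = llm, chain_type = "map_reduce",
                                            retriever = retriever)
response = qa_map_reduce.run(query)
display(Markdown(response))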

5. Agents

In chains, a sequence of actions is hardcoded (in code). In agents, a language model is used as a reasoning engine to determine which actions to take and in which order. In the code below, the agent automatically chooses the tool it needs, without us explicitly specifying it as we would in a chain.

from langchain.agents import load_tools, initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature = 0)
tools = load_tools(["llm-math", "wikipedia"])
agent = initialize_agent(tools, llm,
    agent = AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors = True, #recover gracefully when the LLM output cannot be parsed
    verbose = True #will show the full path the agent has taken
)

#uses the calculator tool automatically
agent("What is 25% of 300?")

#uses the wikipedia tool automatically
question = "Tom M. Mitchell is an American computer scientist and the \
Founders University Professor at Carnegie Mellon University (CMU). \
What book did he write?"
result = agent(question)
[Screenshot: results of the above queries]

LangChain also has a Python REPL tool, which can be used to get things done in Python by asking in natural language.

from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool

agent = create_python_agent(llm, tool = PythonREPLTool(), verbose = True)
customer_list = [["Harrison", "Chase"],
                 ["Dolly", "Too"],
                 ["Geoff", "Fusion"]]

#uses the Python REPL to write and execute the sorting code
agent.run(f"""Sort these customers by last name and then first name and \
print the output: {customer_list}""")
[Screenshot: result of the above query]

6. Evaluation

When building a complex application with an LLM, one of the important but sometimes tricky steps is evaluating how well your application is doing. Is it meeting some accuracy criteria? And if you decide to change your implementation (swap in a different LLM, change the strategy for how you use the vector database to retrieve chunks, or change some other parameter of your system), how do you know whether you are making it better or worse?

The first step in evaluating a model is to set up validation data, which can be created both manually and by using an LLM 🙈. We will use the same OutdoorClothingCatalog_1000 dataset, with the QnA chain built above as the model to be evaluated.

import langchain
from langchain.evaluation.qa import QAGenerateChain

#manually written QnA example, in the format [{"query": "", "answer": ""}]
eg = [{"query": "Does the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"}]

#generate example questions and answers using an LLM
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())
data = loader.load() #from the CSVLoader in the QnA code above
new_eg = example_gen_chain.apply_and_parse([{"doc": t} for t in data[:5]])
eg += new_eg

#sample QnA pair generated by the LLM
new_eg[0]
{'query': "What is the weight of one pair of Women's Campside Oxfords?",
 'answer': "The weight of one pair of Women's Campside Oxfords is approximately 1 lb. 1 oz."}

Now, just like in the validation-setup step, we can either evaluate manually or use an LLM to do so. With manual evaluation, we get to see the whole process of passing the question and the context (the OutdoorClothingCatalog_1000 dataset) through the chains to ChatGPT, using the stuff method or one of the others.

#follows the path: RetrievalQA <-> StuffDocumentsChain <-> LLMChain <-> ChatOpenAI
langchain.debug = True #prints every intermediate prompt, context and response
qa.run(eg[0]["query"])
langchain.debug = False

[Screenshots: start and end of the debug trace]

Instead of going through all of this for every example, we can use LLM-assisted evaluation.

from langchain.evaluation.qa import QAEvalChain

#LLM-assisted evaluation
preds = qa.apply(eg) #run the QnA chain on every example
llm = ChatOpenAI(temperature = 0)
eval_chain = QAEvalChain.from_llm(llm)
graded_outputs = eval_chain.evaluate(eg, preds) #grade each prediction against the real answer

#looking at the results
for i in range(len(eg)):
    print("Question: " + preds[i]['query'])
    print("Real Answer: " + preds[i]['answer'])
    print("Predicted Answer: " + preds[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['text'])
We will get a summary like this printed for each example.

Please do provide your feedback in the form of responses and claps :)
