Retrieval Augmented Generation (RAG)
Learning outcomes
- Understand the key components of RAG applications by looking at what popular open-source RAG libraries provide
- Perform an Agentic RAG task

- LLMs are not trained on your personal data or on fairly recent data.
- RAG can help provide richer and more accurate responses based on external knowledge.
- It incurs significantly lower computation cost compared to long-context LLMs.
- We will learn RAG through the lens of popular open-source RAG libraries, viz. LangChain and LlamaIndex.
Basics
RAG figures
Naive RAG

Naive Retrieval System

Stages in RAG 🔄
Atomic unit
- LangChain's atomic unit is a Document.
- LlamaIndex's atomic unit is a Node. A collection of Nodes constitutes a Document.
Loading 📥
Loading and parsing data from the source and creating well-formatted Documents with metadata. (1)
This step includes splitting the text into chunks so that each chunk can be embedded into a lower-dimensional vector.
flowchart LR
A[Text/image + metadata] --> B[Chunking/Splitting] --> C[Document]
Document parsing sequence
- LangChain creates Documents first and then performs chunking.
- LlamaIndex performs chunking first; each chunk becomes a Node, and Documents are then created from multiple Nodes.
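The loading step described above can be sketched in plain Python. This is a minimal illustration, not the LangChain or LlamaIndex API: the `Document` dataclass, `split_text`, `load`, the character-based window sizes and the file name `notes.txt` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a library Document/Node: text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def load(text: str, source: str) -> list[Document]:
    """Create one Document per chunk, carrying source and chunk index as metadata."""
    return [
        Document(text=chunk, metadata={"source": source, "chunk": i})
        for i, chunk in enumerate(split_text(text))
    ]


# "notes.txt" is an illustrative source name, not a real file.
docs = load("RAG retrieves external knowledge and feeds it to an LLM as added context.", "notes.txt")
for doc in docs:
    print(doc.metadata, repr(doc.text))
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks. Real text splitters usually split on separators such as paragraphs or sentences rather than on raw character offsets.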
Indexing 📊
Creating a data structure and/or reducing the dimensionality of the data so that it is easy to query. (2)
flowchart LR
A[Document] --> B[Embeddings]
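To make the Document-to-embedding step concrete, here is a toy sketch. It assumes a hashed bag-of-words vector in place of a learned embedding model, so it only shows the data flow; the function names `embed` and `cosine` and the dimension of 64 are choices made for this example, not part of either library.

```python
import hashlib
import math


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash each word into one of `dim` buckets, then L2-normalise.

    A stand-in for a learned embedding model, used only to show the data flow.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


chunks = ["vector stores hold embeddings", "llamas live in the Andes"]
index = [(chunk, embed(chunk)) for chunk in chunks]  # the "index": text + vector pairs

query = embed("where are embeddings stored")
for chunk, vec in index:
    print(f"{cosine(query, vec):.2f}  {chunk}")
```

A learned model places texts with similar meaning close together, whereas this hash only rewards shared words, so the scores here should not be read as semantic similarity.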
Storing 💾
Storing Documents, metadata and embeddings in a persistent manner (e.g. vector stores). (3)
flowchart LR
A[Document] --> C[Vector Store/Storage Context]
B[Embeddings] --> C
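The idea of persisting Documents, metadata and embeddings together can be sketched as follows. `TinyVectorStore` is a hypothetical class written for this example and a JSON file stands in for a real vector database; neither reflects how LangChain's or LlamaIndex's stores are implemented.

```python
import json
import tempfile
from pathlib import Path


class TinyVectorStore:
    """Hypothetical minimal store: keeps text, metadata and embedding together."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, text: str, embedding: list[float], metadata: dict) -> None:
        self.records.append({"text": text, "embedding": embedding, "metadata": metadata})

    def persist(self, path: Path) -> None:
        """Write all records to disk so they survive a restart."""
        path.write_text(json.dumps(self.records), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "TinyVectorStore":
        """Rebuild a store from a file written by `persist`."""
        store = cls()
        store.records = json.loads(path.read_text(encoding="utf-8"))
        return store


store = TinyVectorStore()
# The two-element embedding is made-up illustrative data.
store.add("RAG adds external context.", [0.1, 0.9], {"source": "notes.txt"})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "store.json"
    store.persist(path)
    restored = TinyVectorStore.load(path)

print(restored.records[0]["metadata"])
```

Persisting the embeddings means the (often expensive) embedding step does not have to be repeated every time the application starts.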
Querying ❓
Retrieving relevant Documents for a user query and feeding them to the LLM as added context. (4)
flowchart LR
A[Vector Store/Storage Context] --> D[LLM + tools]
B[Query] --> D
C[Prompt] --> D
D --> E[Response]
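The flow in the diagram (stored documents, query and prompt all feeding the LLM) can be sketched without any library. In this sketch a simple word-overlap score is an assumption standing in for embedding similarity, and the prompt wording and function names are illustrative only.

```python
def score(query: str, text: str) -> float:
    """Jaccard word overlap; a stand-in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents that score highest against the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place retrieved text ahead of the question so the LLM can ground its answer."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"


documents = [
    "langchain's atomic unit is a document",
    "llamaindex's atomic unit is a node",
    "vector stores persist embeddings",
]
query = "what is llamaindex's atomic unit"
prompt = build_prompt(query, retrieve(query, documents, k=1))
print(prompt)  # this string is what would be sent to the LLM
```

The value of `k` is a trade-off: a larger `k` gives the LLM more context but also more irrelevant text and a longer prompt.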
Evaluation 📈
Trace inspection, metrics and comparisons to test whether the full pipeline gives the desired results. (5)
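As one example of a metric, the retrieval part of a pipeline is often scored with hit rate and mean reciprocal rank (MRR). The sketch below assumes you have, for each test query, a ranked list of retrieved document IDs and one ID marked as correct; the IDs and function names are made up for illustration and this is not the evaluator API of either library.

```python
def hit_rate(results: list[list[str]], expected: list[str]) -> float:
    """Fraction of queries whose expected document appears anywhere in the results."""
    hits = sum(exp in res for res, exp in zip(results, expected))
    return hits / len(expected)


def mean_reciprocal_rank(results: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the expected document (0 when it is missing)."""
    total = 0.0
    for res, exp in zip(results, expected):
        if exp in res:
            total += 1.0 / (res.index(exp) + 1)
    return total / len(expected)


# Ranked document IDs retrieved per query, and the ID marked as correct for each.
retrieved = [["d1", "d7"], ["d4", "d2"], ["d9", "d3"]]
gold = ["d1", "d2", "d5"]

print(hit_rate(retrieved, gold))              # 2 of the 3 queries contain their gold ID
print(mean_reciprocal_rank(retrieved, gold))  # (1 + 1/2 + 0) / 3 = 0.5
```

These metrics only cover retrieval. Judging the generated answer itself usually needs human review or an LLM-based evaluator.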
- (1) LlamaIndex ex.: SimpleDirectoryReader class
  LangChain ex.: document_loaders module, langchain_text_splitters module
- (2) LlamaIndex ex.: VectorStoreIndex class
  LangChain ex.: Embeddings class
- (3) LlamaIndex ex.: StorageContext
  LangChain ex.: VectorStore
- (4) LlamaIndex ex.: RetrieverQueryEngine class
  LangChain ex.: Retriever class
- (5) LlamaIndex ex.: LLM-Evaluator
  LangChain ex.: LangSmith, QAEvalChain
Retrieval techniques
- QA/chat
- Misc: Reranker model, GraphRAG, RAPTOR, EraRAG, multimodal
Exercise
DIY: vLLM with LangChain
When and when not to use RAG ⚖️
- It was found [1] that RAG lags behind long-context LLMs in the following scenarios: (1)
  - Queries requiring multi-step reasoning.
  - General queries on which the embedding model does not perform well.
  - Long and complex queries.
  - Implicit queries requiring the reader to connect the dots.
- RAG is far easier than fine-tuning on personal data.
- It allows smaller models with shorter context lengths to be on par with larger models, thereby saving compute and memory cost on GPUs.
Note on popular Chat UI frameworks
If you only need basic RAG that can read a few of your documents and search the web, check out popular LLM chat frameworks with integrated RAG functionality:
- LM Studio
- Open WebUI
- WebUI by llama.cpp
- Chainlit
Agentic RAG 🤖
- An LLM-powered agent decides when and how to retrieve during reasoning. This gives the system more flexibility in its decision-making but leaves the engineer with less control over it.
- Router
- Tool calling
- Multistep reasoning with tools
- llm.txt
- Documentation sites have already started building their own RAG-powered chatbots: vLLM, LangChain, Anthropic, etc.
(More about Agents will be covered in Day 3.)
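The router idea, where the agent decides whether retrieval is needed at all, can be sketched as below. In a real agent the decision in `route` would come from an LLM via tool calling; here a keyword rule stands in for that decision, and the tool names and their return strings are hypothetical placeholders.

```python
from typing import Callable

# Hypothetical tools; a real "retrieve" tool would query a vector store.
TOOLS: dict[str, Callable[[str], str]] = {
    "retrieve": lambda q: f"[passages retrieved for: {q}]",
    "answer_directly": lambda q: "[no retrieval needed]",
}


def route(query: str) -> str:
    """Stand-in for the LLM's decision: a keyword rule picks the tool name."""
    needs_lookup = any(w in query.lower() for w in ("docs", "documentation", "according to"))
    return "retrieve" if needs_lookup else "answer_directly"


def run_agent(query: str) -> tuple[str, str]:
    """One agent step: choose a tool, call it, return (tool name, observation)."""
    tool = route(query)
    return tool, TOOLS[tool](query)


for q in ("What do the docs say about chunking?", "What is 2 + 2?"):
    print(run_agent(q))
```

Multistep reasoning with tools repeats this step in a loop, feeding each observation back to the model until it decides it can answer.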
Exercise
- Create the ~/portal/jupyter dir if you don't have it already.
- Copy llm-workshop/containers/rag/rag_env.sh to your ~/portal/jupyter/, i.e. cp /mimer/NOBACKUP/groups/llm-workshop/containers/rag/rag_env.sh ~/portal/jupyter/
- Start a Jupyter server on a 1x A40 node using the rag_env.sh runtime, with your project folder as the working directory.
- Run rag.ipynb
RAG on single node

Have you tried out the chatbot for the UPPMAX docs yet? It's a RAG system under the hood! A C3SE chatbot is coming soon too.
Resources 📚
- Recommended papers on RAG:
- Popular libraries and software suites: