Retrieval Augmented Generation (RAG)¶
Learning outcomes
- Understand the key components of RAG applications by looking at what popular open-source RAG libraries provide
- Perform a simple RAG task

- LLMs are not trained on your personal data or on fairly recent data.
- RAG can help provide richer, more accurate responses grounded in external knowledge.
- It incurs significantly lower computation cost compared to long-context LLMs.
- We will learn RAG through the lens of popular open-source RAG libraries, viz. LangChain and LlamaIndex.
Basics¶
RAG figures

*Figure: Naive RAG.*

*Figure: Naive retrieval system.*
Stages in RAG 🔄¶
Atomic unit
- LangChain's atomic unit is a Document.
- LlamaIndex's atomic unit is a Node. A collection of Nodes constitutes a Document.
Loading 📥¶
Loading/parsing data from a source and creating well-formatted Documents with metadata. (1)
This step includes splitting text into chunks small enough to be embedded into lower-dimensional vectors.
```mermaid
flowchart LR
A[Text/image + metadata] --> B[Chunking/Splitting] --> C[Document]
```
Document parsing sequence
- LangChain creates Documents first and then performs chunking.
- LlamaIndex performs chunking first, that becomes a Node and then creates Documents of multiple Nodes.
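As a concrete illustration of chunking, here is a minimal, dependency-free sketch; the chunk size, overlap, and function name are illustrative stand-ins for what real splitters (e.g. LangChain's `langchain_text_splitters`) provide:

```python
# Fixed-size chunking with overlap -- a toy stand-in for real text splitters.
def split_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    step = chunk_size - overlap  # advance by chunk_size minus the overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = split_text("RAG pairs a retriever with a generator. " * 5)
```

Each chunk shares its first `overlap` characters with the tail of the previous chunk, so sentences cut at a boundary are not lost entirely.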
Indexing 📊¶
Creating data structures and/or embedding the data into lower-dimensional vectors so it can be queried efficiently. (2)
```mermaid
flowchart LR
A[Document] --> B[Embeddings]
```
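To make the Document → Embeddings step concrete, here is a dependency-free sketch. The hashed bag-of-words is a deliberately crude stand-in for a real embedding model (such as those used behind LlamaIndex's VectorStoreIndex or LangChain's Embeddings class):

```python
import hashlib

# Hashed bag-of-words "embedding" -- a toy stand-in for a real embedding
# model; it maps text to a fixed-length, L2-normalised vector.
def embed(text: str, dim: int = 16) -> list[float]:
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

v = embed("Nodes constitute a Document")
```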
Storing 💾¶
Storing Documents, metadata, and embeddings in a persistent manner (e.g. vector stores). (3)
```mermaid
flowchart LR
A[Document] --> C[Vector Store/Storage Context]
B[Embeddings] --> C
```
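A minimal in-memory sketch of what a vector store holds; real stores (e.g. Chroma, FAISS, Qdrant) add persistence and fast approximate-nearest-neighbour search, and the class and method names here are illustrative:

```python
# In-memory "vector store": maps an id to (text, embedding, metadata).
class ToyVectorStore:
    def __init__(self):
        self._rows = {}

    def add(self, doc_id, text, embedding, metadata=None):
        self._rows[doc_id] = (text, embedding, metadata or {})

    def get(self, doc_id):
        return self._rows[doc_id]

store = ToyVectorStore()
store.add("doc-1", "LLMs are not trained on your personal data.",
          [0.1, 0.9], {"source": "notes"})
```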
Querying ❓¶
Retrieving relevant Documents for a user query and feeding them to the LLM as added context. (4)
```mermaid
flowchart LR
A[Vector Store/Storage Context] --> D[LLM + tools]
B[Query] --> D
C[Prompt] --> D
D --> E[Response]
```
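A sketch of the retrieve-then-prompt step. Token-overlap (Jaccard) similarity stands in for cosine similarity over real embeddings, and the prompt template is purely illustrative:

```python
# Rank chunks by a toy similarity, take the top-k, and build the
# augmented prompt that would be sent to the LLM.
def similarity(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: similarity(query, c), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    return "Answer using only this context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

chunks = [
    "LangChain's atomic unit is a Document.",
    "LlamaIndex's atomic unit is a Node.",
    "Vector stores hold embeddings persistently.",
]
top = retrieve("What is LangChain's atomic unit?", chunks, k=1)
prompt = build_prompt("What is LangChain's atomic unit?", top)
```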
Evaluation 📈¶
Trace inspection, metrics, and comparisons to test whether the full pipeline gives the desired results. (5)
1. LlamaIndex ex.: SimpleDirectoryReader class. LangChain ex.: document_loaders module, langchain_text_splitters module.
2. LlamaIndex ex.: VectorStoreIndex class. LangChain ex.: Embeddings class.
3. LlamaIndex ex.: StorageContext. LangChain ex.: VectorStore.
4. LlamaIndex ex.: RetrieverQueryEngine class. LangChain ex.: Retriever class.
5. LlamaIndex ex.: LLM-Evaluator. LangChain ex.: LangSmith, QAEvalChain.
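As a toy illustration of pipeline evaluation, here is a keyword-recall metric; real evaluators (LangSmith, QAEvalChain, LlamaIndex's evaluators) use LLM judges and richer metrics, and the names below are illustrative:

```python
# Fraction of expected keywords that appear in the pipeline's answer.
def keyword_recall(answer: str, expected_keywords: list[str]) -> float:
    hits = sum(1 for kw in expected_keywords if kw.lower() in answer.lower())
    return hits / len(expected_keywords)

score = keyword_recall(
    "RAG retrieves documents and grounds the LLM's answer in them.",
    ["retrieves", "grounds", "citations"],
)
```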
Retrieval techniques
- QA/chat
- Misc: Reranker model, GraphRAG, RAPTOR, EraRAG, multimodal
https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/snupv/introduction
Resources 📚
- Recommended papers on RAG:
- Popular libraries and software suites:
When and when not to use RAG ⚖️
- It was found1 that RAG lags behind long-context LLMs in the following scenarios: (1)
    - Queries requiring multi-step reasoning.
    - General queries on which the embedding model does not perform well.
    - Long and complex queries.
    - Implicit queries that require the reader to connect the dots.
- RAG is much easier than fine-tuning on personal data.
- It allows smaller models with shorter context windows to be on par with larger models, saving compute and memory cost on GPUs.
Agentic RAG 🤖¶
- An LLM-powered agent decides when and how to retrieve during reasoning. This gives the system more flexibility in decision making, but gives the engineer less control over it.
- Router
- Tool calling
- Multistep reasoning with tools
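A rule-based sketch of the router idea: in a real agentic RAG system an LLM makes this decision, so the keyword heuristic below is purely illustrative:

```python
# Decide whether a query should go to the retriever or straight to the LLM.
def route(query: str) -> str:
    retrieval_cues = ("my", "our", "latest", "internal")
    needs_docs = any(cue in query.lower().split() for cue in retrieval_cues)
    return "retrieve" if needs_docs else "answer_directly"
```

A tool-calling agent generalises this: retrieval becomes one of several tools the LLM may invoke, possibly multiple times across reasoning steps.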
(More about Agents will be covered in Day 3.)
https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/yd6nd/introduction
Some more useful techniques in retrieval pipelines 🛠️
- Rerankers
- article on rerankers
- GraphRAG (knowledge graphs)
- RAPTOR
- EraRAG
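To illustrate where a reranker fits: it re-scores a fast first-stage candidate list with a stronger model (typically a cross-encoder) and reorders it. The scorer below is a toy token-overlap stand-in for such a model:

```python
# Re-score first-stage retrieval candidates and reorder them.
def rerank(query: str, candidates: list[str]) -> list[str]:
    def score(text: str) -> int:
        # toy stand-in for a cross-encoder relevance score
        return len(set(query.lower().split()) & set(text.lower().split()))
    return sorted(candidates, key=score, reverse=True)

ranked = rerank("reranker model for retrieval", [
    "GraphRAG builds a knowledge graph.",
    "A reranker model re-scores retrieval candidates.",
])
```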