
Retrieval Augmented Generation (RAG)

Learning outcomes
  • Understand the key components of RAG applications by looking at what popular open-source RAG libraries provide
  • Perform a simple RAG task

RAG meme

  • LLMs are not trained on your personal data or fairly recent data.

  • RAG can help provide richer and more accurate responses grounded in external knowledge.

  • It incurs significantly lower computational cost than feeding everything to a long-context LLM.

  • We will learn RAG through the lens of popular open-source RAG libraries, viz. LangChain and LlamaIndex.

Basics

Figures: Naive RAG pipeline; Naive retrieval system.

Stages in RAG 🔄

Atomic unit

  • LangChain's atomic unit is a Document.
  • LlamaIndex's atomic unit is a Node; a collection of Nodes constitutes a Document.

Loading 📥

Loading/parsing data from a source and creating well-formatted Documents with metadata. (1)
This step includes splitting text into chunks that can be embedded into lower-dimensional vectors.

flowchart LR
  A[Text/image + metadata] --> B[Chunking/Splitting] --> C[Document]

Document parsing sequence

  • LangChain creates Documents first and then performs chunking.
  • LlamaIndex performs chunking first; each chunk becomes a Node, and multiple Nodes are grouped into a Document.
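As a rough sketch of this stage in plain Python (not LangChain's or LlamaIndex's actual API — `split_text` and `load_documents` are hypothetical helpers), loading turns raw text into overlapping chunks, each carrying metadata about its origin:

```python
def split_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping character chunks (toy stand-in for a
    recursive-character-splitter-style chunker)."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece:
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

def load_documents(text, source):
    """Create Document-like dicts: each chunk keeps metadata about its origin."""
    return [
        {"page_content": chunk, "metadata": {"source": source, "chunk": i}}
        for i, chunk in enumerate(split_text(text))
    ]

docs = load_documents("RAG retrieves external knowledge and feeds it to an LLM "
                      "so responses can cite data the model was never trained on.",
                      source="notes.txt")
```

The overlap means neighbouring chunks share a little text, so a sentence cut at a chunk boundary is still recoverable from at least one chunk.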

Indexing 📊

Creating a data structure over the data and/or reducing it to lower-dimensional embeddings so it can be queried efficiently. (2)

flowchart LR
  A[Document] --> B[Embeddings]
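A toy stand-in for the embedding step (the hashed bag-of-words trick below is purely illustrative; a real pipeline calls an embedding model such as the ones behind LangChain's Embeddings class or LlamaIndex's VectorStoreIndex):

```python
import math

def embed(text, dim=16):
    """Toy hashed bag-of-words embedding producing a fixed-size vector.
    A real pipeline would call a learned embedding model here instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0          # hash each token into a bucket
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]             # unit length: dot product == cosine

doc_vector = embed("RAG grounds an LLM in retrieved documents")
```

Whatever the model, the output is always a fixed-dimension vector, which is what makes fast similarity search possible in the next stage.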

Storing 💾

Storing Documents, metadata, and embeddings in a persistent manner (e.g., vector stores). (3)

flowchart LR
  A[Document] --> C[Vector Store/Storage Context]
  B[Embeddings] --> C
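A minimal sketch of what a vector store holds (`InMemoryVectorStore` below is a hypothetical class, not a real LangChain or LlamaIndex one): documents, their metadata, and their embeddings, kept together so retrieval can return the original text.

```python
class InMemoryVectorStore:
    """Toy stand-in for a vector store (the role played by FAISS/Chroma behind
    LangChain's VectorStore, or by LlamaIndex's StorageContext)."""
    def __init__(self):
        self.entries = []  # list of (embedding, document) pairs

    def add(self, embedding, document):
        """Persist an embedding together with its source document."""
        self.entries.append((embedding, document))

    def __len__(self):
        return len(self.entries)

store = InMemoryVectorStore()
store.add([0.1, 0.9], {"page_content": "LLMs hallucinate without grounding.",
                       "metadata": {"source": "notes.txt"}})
```

Real vector stores add persistence to disk and approximate-nearest-neighbour indexes on top of this basic pairing.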

Querying ❓

Retrieving relevant Documents for a user query and feeding them to the LLM as added context. (4)

flowchart LR
  A[Vector Store/Storage Context] --> D[LLM + tools]
  B[Query] --> D
  C[Prompt] --> D
  D --> E[Response]
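Putting the querying stage together in plain Python (all names here are hypothetical; a real pipeline swaps in a learned embedding model and an actual LLM call to turn the prompt into a response):

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def embed(text):
    """Toy keyword-presence embedding over a tiny fixed vocabulary."""
    vocab = ["rag", "retrieval", "llm", "agent", "index"]
    words = text.lower().split()
    return [float(any(term in w for w in words)) for term in vocab]

documents = [
    "RAG grounds an LLM in retrieved external documents.",
    "Agents call tools in a loop to solve multi-step tasks.",
]
vector_store = [(embed(d), d) for d in documents]   # the 'storing' stage

def retrieve(query, store, k=1):
    """Return the k documents whose embeddings are closest to the query's."""
    q = embed(query)
    ranked = sorted(store, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]

# The retrieved context is spliced into the prompt sent to the LLM.
question = "How does retrieval help an LLM?"
context = retrieve(question, vector_store)
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

This is exactly the shape of a LangChain Retriever or a LlamaIndex RetrieverQueryEngine: embed the query, rank stored documents by similarity, and hand the top hits to the LLM as context.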

Evaluation 📈

Trace inspection, metrics, and comparisons to test whether the full pipeline gives the desired results. (5)
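A minimal evaluation sketch, assuming you have a small labelled set of (query, expected source) pairs that you build yourself; `hit_rate` and `keyword_retrieve` are hypothetical helpers computing one common retrieval metric:

```python
def hit_rate(eval_set, retrieve, k=1):
    """Fraction of queries whose expected source shows up in the top-k results."""
    hits = sum(expected in retrieve(query, k) for query, expected in eval_set)
    return hits / len(eval_set)

# A stand-in retriever representing the pipeline under test.
def keyword_retrieve(query, k):
    corpus = {"doc_rag": "rag retrieval context", "doc_agent": "agent tools loop"}
    ranked = sorted(corpus,
                    key=lambda d: -sum(w in corpus[d] for w in query.lower().split()))
    return ranked[:k]

eval_set = [
    ("what is rag retrieval", "doc_rag"),
    ("how do agent tools work", "doc_agent"),
]
score = hit_rate(eval_set, keyword_retrieve)   # → 1.0 on this tiny set
```

Tools like LangSmith or LlamaIndex's evaluators report metrics of this flavour (plus LLM-judged answer quality) over full traces rather than a two-item toy set.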

  1. LlamaIndex ex.: SimpleDirectoryReader class
    LangChain ex.: document_loaders module, langchain_text_splitters module

  2. LlamaIndex ex.: VectorStoreIndex class
    LangChain ex.: Embeddings class

  3. LlamaIndex ex.: StorageContext
    LangChain ex.: VectorStore

  4. LlamaIndex ex.: RetrieverQueryEngine class
    LangChain ex.: Retriever class

  5. LlamaIndex ex.: LLM-Evaluator
    LangChain ex.: LangSmith, QAEvalChain


Resources 📚

https://learn.deeplearning.ai/courses/langchain-chat-with-your-data/lesson/snupv/introduction

When and when not to use RAG ⚖️

  • It was found[1] that RAG lags behind long-context LLMs in the following scenarios: (1)

    • Queries requiring multi-step reasoning.
    • General queries on which the embedding model does not perform well.
    • Long and complex queries.
    • Implicit queries that require the reader to connect the dots.
  • Much easier than fine-tuning a model on your personal data.

  • Allows smaller models with shorter context windows to be on par with larger models, thereby saving compute and memory cost on GPUs.

  1. RAG failure reasons

Agentic RAG 🤖

  • An LLM-powered agent decides when and how to retrieve during reasoning. This gives the system more flexibility in decision making, but gives the engineer less control over it.
  • Router
  • Tool calling
  • Multistep reasoning with tools
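A toy illustration of routing (the hard-coded keyword rules below are only a sketch; in real agentic RAG the LLM itself picks the tool via tool calling):

```python
def route(query):
    """Toy router: pick which tool to invoke for a query. In real agentic
    RAG, the LLM makes this choice itself through tool calling."""
    q = query.lower()
    if any(cue in q for cue in ("according to", "in the docs", "document")):
        return "vector_search"     # needs grounding in stored documents
    if any(cue in q for cue in ("how many", "sum", "average")):
        return "calculator"        # numeric question: call a compute tool
    return "direct_answer"         # answerable from the model's own knowledge
```

Replacing these rules with an LLM decision is what turns a fixed pipeline into an agent: the model can also loop, calling several tools across multiple reasoning steps before answering.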

(More about Agents will be covered in Day 3.)

https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/yd6nd/introduction

Some more useful techniques in retrieval pipelines 🛠️

  • Rerankers
  • GraphRAG (knowledge graphs)
  • RAPTOR
  • EraRAG
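To give a flavour of reranking, here is a toy two-stage sketch (the token-overlap scorer is a hypothetical stand-in for a real cross-encoder reranker model): first-stage retrieval over-fetches candidates, then a finer-grained scorer reorders them before the top results go to the LLM.

```python
def overlap_score(query, doc):
    """Toy 'cross-encoder': score the (query, document) pair jointly by token
    overlap. Real rerankers compute this with a neural model."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

def rerank(query, candidates, top_n=2):
    """Second stage: reorder first-stage candidates by the finer score."""
    return sorted(candidates, key=lambda d: overlap_score(query, d), reverse=True)[:top_n]

candidates = [  # imagine these came from a fast first-stage vector search
    "chunk overlap controls how much neighbouring chunks share",
    "rag feeds retrieved chunks to the llm",
    "agents use tools",
]
best = rerank("how does rag feed chunks to the llm", candidates)
```

The design point is the cost split: the first stage is cheap and approximate over the whole store, while the reranker is expensive but only runs on a handful of candidates.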

  1. Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach (arXiv)