RAG-Agent Pattern

An agent that decides when to retrieve additional context from a knowledge base, synthesizes the retrieved information, and produces grounded responses.

When to Use

Question answering over a knowledge base where answers must be grounded in specific documents
Codebase-aware AI assistants that need to read source files before answering
Document Q&A with selective retrieval (not all queries need retrieval)
Real-time information lookup during reasoning, when the model's built-in knowledge may be stale
Multi-hop reasoning where answering requires synthesizing information across multiple sources

When NOT to Use

General knowledge queries — if the LLM already knows the answer, retrieval adds unnecessary latency
Fixed retrieval patterns — if you always retrieve the same sources, use MapReduce instead
Simple keyword search — a traditional search engine is faster and more appropriate
When you need ALL documents — RAG-Agent selectively retrieves; if you need exhaustive coverage, use MapReduce

Architecture

graph TD
    START --> decide{Decide: Retrieve or Answer?}
    decide -->|RETRIEVE| retrieve[Retrieve Documents]
    decide -->|ANSWER| END
    retrieve --> synthesize[Synthesize Context]
    synthesize --> decide
    decide -->|max_retrievals| END
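
The two exits from the decide node can be read as a single routing function. The following is a minimal sketch, not the pattern's actual implementation: the state keys `should_retrieve` and `retrieval_count` and the function name `route_after_decide` are assumptions made for illustration.

```python
from typing import Literal


def route_after_decide(state: dict, max_retrievals: int = 3) -> Literal["retrieve", "end"]:
    """Pick the next node after the decide step.

    Assumed (hypothetical) state keys:
      should_retrieve  - bool set by the decide step
      retrieval_count  - number of retrievals performed so far
    """
    # Hard cap: stop even if the agent still wants more context.
    if state.get("retrieval_count", 0) >= max_retrievals:
        return "end"
    # Otherwise follow the agent's own decision.
    return "retrieve" if state.get("should_retrieve", False) else "end"
```

Checking the cap before the agent's decision is one possible ordering; it guarantees the loop terminates regardless of what the LLM returns.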

Key Concepts

The RAG-Agent Pattern combines the reasoning capabilities of an LLM with selective retrieval from an external knowledge base. Unlike MapReduce, which retrieves all sources regardless of relevance, RAG-Agent decides whether retrieval is needed based on the query.

The key distinction from MapReduce:
- MapReduce: Fixed retrieval pipeline; all sources are analyzed regardless of relevance
- RAG-Agent: Conditional retrieval; the agent decides what is needed, reducing unnecessary retrieval

The agent loop:
1. Decide: Given the query and any previously retrieved context, should the agent retrieve more or answer?
2. Retrieve: If more context is needed, retrieve specific documents from the knowledge base
3. Synthesize: Update the context with the newly retrieved documents
4. Loop: Go back to Decide until the agent is satisfied or the maximum number of retrievals is reached
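
The Synthesize step amounts to merging newly retrieved documents into the context the agent already holds. Here is a minimal sketch, assuming documents are keyed by an ID (like `doc1` in the example output). The `synthesize` function and the dictionary shape are illustrative and not taken from the implementation.

```python
def synthesize(context: dict[str, str], new_docs: dict[str, str]) -> dict[str, str]:
    """Return a new context with newly retrieved documents merged in.

    Documents already present are kept as-is, so retrieving the same
    document twice does not duplicate it in the agent's context.
    """
    merged = dict(context)  # copy so the caller's context is not mutated
    for doc_id, text in new_docs.items():
        merged.setdefault(doc_id, text)
    return merged


context = synthesize({}, {"doc1": "Python was created by Guido van Rossum."})
context = synthesize(context, {"doc1": "duplicate", "doc2": "Python 2.0 shipped in 2000."})
print(sorted(context))  # ['doc1', 'doc2']
```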

Quick Start

cd patterns/rag_agent
python example.py

Core Code

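def _decide(self, state: RAGAgentState) -> dict:
    """Agent decides whether to retrieve more or answer."""
    # `messages` is assembled from the query and any previously retrieved context (elided here)
    response = self.llm.invoke(messages)
    # Parse the "## Decision: RETRIEVE" or "## Decision: ANSWER" line from the response
    # Return a dict with should_retrieve, docs_to_retrieve, response, reasoning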
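
The parsing step hinted at in the comments could look like the sketch below. It assumes the LLM is prompted to emit a line of the form `## Decision: RETRIEVE` or `## Decision: ANSWER`. The function name `parse_decision` and the fallback to ANSWER on unparseable output are assumptions, not confirmed behavior.

```python
import re

# Matches a line such as "## Decision: RETRIEVE" (case-insensitive).
_DECISION_RE = re.compile(
    r"^##\s*Decision:\s*(RETRIEVE|ANSWER)\b", re.IGNORECASE | re.MULTILINE
)


def parse_decision(text: str) -> bool:
    """Return True if the LLM output asks for retrieval, False otherwise.

    Assumption: output without a recognizable decision line is treated
    as ANSWER so that a malformed reply cannot keep the loop running.
    """
    match = _DECISION_RE.search(text)
    return bool(match) and match.group(1).upper() == "RETRIEVE"
```

Defaulting to ANSWER is a design choice made for this sketch; defaulting to RETRIEVE would also be defensible as long as `max_retrievals` bounds the loop.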

How It Works

Decide: Agent evaluates the query and determines if additional context is needed
Retrieve (if needed): Specific documents are retrieved from the knowledge base
Synthesize: Retrieved documents are integrated into the agent's context
Loop: The agent decides again with the updated context
Answer: Once satisfied (or max retrievals reached), the agent produces a final answer
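
Put together, the steps above form a bounded loop. The sketch below uses plain Python with caller-supplied `decide` and `retrieve` callables standing in for the LLM call and the knowledge-base lookup. `run_agent`, its signature, and the tuple returned by `decide` are hypothetical and only illustrate the control flow.

```python
from typing import Callable

# decide(query, context) -> (should_retrieve, doc_ids_to_fetch, answer)
Decide = Callable[[str, dict], tuple[bool, list[str], str]]
Retrieve = Callable[[list[str]], dict]


def run_agent(query: str, decide: Decide, retrieve: Retrieve, max_retrievals: int = 3) -> dict:
    """Loop decide -> retrieve -> synthesize until the agent answers or hits the cap."""
    context: dict = {}
    retrievals = 0
    while True:
        should_retrieve, doc_ids, answer = decide(query, context)
        # Stop when the agent is satisfied or the retrieval budget is spent.
        # In this sketch, whatever answer the last decide call produced is returned.
        if not should_retrieve or retrievals >= max_retrievals:
            return {"response": answer, "retrievals": retrievals, "documents": list(context)}
        context.update(retrieve(doc_ids))  # synthesize: fold new docs into context
        retrievals += 1


# Toy stand-ins: retrieve once, then answer from the retrieved text.
kb = {"doc1": "Python was created by Guido van Rossum."}


def toy_decide(query, context):
    if "doc1" not in context:
        return True, ["doc1"], ""
    return False, [], context["doc1"]


result = run_agent("Who created Python?", toy_decide, lambda ids: {i: kb[i] for i in ids})
print(result["retrievals"], result["documents"])  # 1 ['doc1']
```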

Configuration

| Parameter | Default | Description |
|---|---|---|
| `model` | `gpt-4o-mini` | LLM model name |
| `llm` | `None` | Pre-configured LLM instance |
| `max_retrievals` | `3` | Maximum number of retrieval iterations |
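
The parameters in the table can be mirrored in a small configuration object. This is a sketch only: `RAGAgentConfig` is a hypothetical name, and how the real agent class accepts these parameters is not shown in this document.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RAGAgentConfig:
    """Hypothetical container mirroring the documented parameters and defaults."""

    model: str = "gpt-4o-mini"    # LLM model name
    llm: Optional[Any] = None     # pre-configured LLM instance, if any
    max_retrievals: int = 3       # maximum number of retrieval iterations

    def __post_init__(self) -> None:
        # Assumption: a negative cap is never meaningful, so reject it early.
        if self.max_retrievals < 0:
            raise ValueError("max_retrievals must be >= 0")


config = RAGAgentConfig(max_retrievals=5)
print(config.model, config.max_retrievals)  # gpt-4o-mini 5
```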

Comparison with Other Patterns

| Aspect | RAG-Agent | MapReduce | Reflection |
|---|---|---|---|
| Retrieval trigger | Agent decision | Always all sources | Never (internal knowledge) |
| Retrieval scope | Selective | Exhaustive | N/A |
| Agent autonomy | High (decides what to retrieve) | Low (fixed pipeline) | Medium (decides when to stop) |
| Best for | Multi-hop Q&A | Comprehensive analysis | Quality improvement |
| Latency | Query-dependent | Always high | Low |

Example Output

QUERY: What is Python and when was it created?
Retrievals: 1
Documents Retrieved: 1
  - [doc1]
Response: Python is a high-level programming language known for its readability.
Created by Guido van Rossum in 1991...