RAG-Agent Pattern
An agent that decides when to retrieve additional context from a knowledge base, synthesizes the retrieved information, and produces grounded responses.
When to Use
- Question answering over a knowledge base where answers must be grounded in specific documents
- Codebase-aware AI assistants that need to read source files before answering
- Document Q&A with selective retrieval (not all queries need retrieval)
- Real-time information lookup during reasoning where stale knowledge is insufficient
- Multi-hop reasoning where answering requires synthesizing information across multiple sources
When NOT to Use
- General knowledge queries — if the LLM already knows the answer, retrieval adds unnecessary latency
- Fixed retrieval patterns — if you always retrieve the same sources, use MapReduce instead
- Simple keyword search — a traditional search engine is faster and more appropriate
- When you need ALL documents — RAG-Agent selectively retrieves; if you need exhaustive coverage, use MapReduce
Architecture
graph TD
START --> decide{Decide: Retrieve or Answer?}
decide -->|RETRIEVE| retrieve[Retrieve Documents]
decide -->|ANSWER| END
retrieve --> synthesize[Synthesize Context]
synthesize --> decide
decide -->|max_retrievals| END
Key Concepts
The RAG-Agent Pattern combines the reasoning capabilities of an LLM with selective retrieval from an external knowledge base. Unlike MapReduce, which retrieves all sources regardless of relevance, RAG-Agent decides whether retrieval is needed based on the query.
The key distinction from MapReduce: - MapReduce: Fixed retrieval pipeline — all sources are analyzed regardless of relevance - RAG-Agent: Conditional retrieval — the agent decides what's needed, reducing unnecessary retrieval
The agent loop: 1. Decide: Given the query and any previously retrieved context, should I retrieve more or answer? 2. Retrieve: If yes, retrieve specific documents from the knowledge base 3. Synthesize: Update context with newly retrieved documents 4. Loop: Go back to Decide until satisfied or max retrievals reached
Quick Start
cd patterns/rag_agent
python example.py
Core Code
def _decide(self, state: RAGAgentState) -> dict:
"""Agent decides whether to retrieve more or answer."""
response = self.llm.invoke(messages)
# Parse ## Decision: RETRIEVE or ANSWER
# Return should_retrieve, docs_to_retrieve, response, reasoning
How It Works
- Decide: Agent evaluates the query and determines if additional context is needed
- Retrieve (if needed): Specific documents are retrieved from the knowledge base
- Synthesize: Retrieved documents are integrated into the agent's context
- Loop: The agent decides again with the updated context
- Answer: Once satisfied (or max retrievals reached), the agent produces a final answer
Configuration
| Parameter | Default | Description |
|---|---|---|
model |
gpt-4o-mini |
LLM model name |
llm |
None |
Pre-configured LLM instance |
max_retrievals |
3 |
Maximum number of retrieval iterations |
Comparison with Other Patterns
| Aspect | RAG-Agent | MapReduce | Reflection |
|---|---|---|---|
| Retrieval trigger | Agent decision | Always all sources | Never (internal knowledge) |
| Retrieval scope | Selective | Exhaustive | N/A |
| Agent autonomy | High (decides what to retrieve) | Low (fixed pipeline) | Medium (decides when to stop) |
| Best for | Multi-hop Q&A | Comprehensive analysis | Quality improvement |
| Latency | Query-dependent | Always high | Low |
Example Output
QUERY: What is Python and when was it created?
Retrievals: 1
Documents Retrieved: 1
- [doc1]
Response: Python is a high-level programming language known for its readability.
Created by Guido van Rossum in 1991...