GuardRail Pattern

A primary agent executes a task, then a safety guard reviews the output and approves, blocks, or redirects it. If blocked or redirected, the primary retries with specific feedback.

When to Use

Content moderation where harmful or inappropriate output must be caught before delivery
Code generation with security review (prevent SQL injection, XSS, etc.)
Financial or legal output requiring compliance verification
High-stakes decisions where a second opinion is mandatory
User-facing applications where quality and safety cannot be compromised

When NOT to Use

Iterative refinement — if you need ongoing quality improvement, use Reflection instead
Low-stakes content — the guard overhead is not worth it for casual outputs
Simple transformations — no need for a guard on straightforward, predictable tasks
Real-time streaming — the guard checkpoint introduces latency

Architecture

graph TD
    START --> primary[Primary Agent: Execute]
    primary --> guard[Guard Agent: Check]
    guard --> decision{Verdict?}
    decision -->|APPROVE| finalize[Finalize Output]
    decision -->|BLOCK| primary
    decision -->|REDIRECT| primary
    finalize --> END

Key Concepts

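The GuardRail Pattern is preventive, not corrective. Unlike Reflection, which iteratively improves output through review-rewrite cycles, GuardRail operates as a checkpoint: each attempt is executed once and reviewed once, and the guard issues a single verdict. A retry happens only when that verdict is BLOCK or REDIRECT, and only until `max_attempts` is reached.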

The guard agent makes a ternary decision:
- APPROVE: Output passes review, proceed to finalization
- BLOCK: Serious violation detected, primary must retry with corrections
- REDIRECT: Minor issues, primary should retry with guidance
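A guard model usually returns free-form text, so the three verdicts need to be extracted from it somehow. The following is a minimal sketch of one way to do that. The `Verdict` enum and `parse_verdict` helper are hypothetical names that do not come from this repository, and treating unrecognized responses as BLOCK is an assumed fail-closed policy rather than documented behavior.

```python
from enum import Enum


class Verdict(Enum):
    """The three outcomes a guard can return (illustrative names)."""
    APPROVE = "approve"
    BLOCK = "block"
    REDIRECT = "redirect"


def parse_verdict(raw: str) -> Verdict:
    """Map free-form guard text to a Verdict.

    Only the first word is inspected. Anything unrecognized is treated
    as BLOCK (an assumption), so a malformed guard response can never be
    mistaken for an approval.
    """
    words = raw.strip().split()
    token = words[0].strip(".:,").lower() if words else ""
    for verdict in Verdict:
        if token == verdict.value:
            return verdict
    return Verdict.BLOCK
```

With this helper, `parse_verdict("Redirect: soften the claim")` yields `Verdict.REDIRECT`, while an empty or off-format response falls back to `Verdict.BLOCK`.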

The key distinction from Reflection:
- Reflection: write → review → rewrite → review → rewrite → ... (iterative, quality-focused)
- GuardRail: execute → guard → APPROVE/BLOCK/REDIRECT → END or retry (checkpoint, safety-focused)

Quick Start

cd patterns/guardrail
python example.py

Core Code

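def _should_continue(self, state: GuardRailState) -> str:
    """Route based on guard verdict and attempt count."""
    # An approved output always goes straight to finalization.
    if state["guard_verdict"] == "approve":
        return "approve"
    # Out of attempts: stop retrying even though the guard did not approve.
    if state["attempts"] >= state.get("max_attempts", self.max_attempts):
        return "max_attempts"
    # Otherwise return "block" or "redirect" so the primary retries.
    return state["guard_verdict"]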
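To try the routing logic in isolation, here is a standalone sketch of the same function without `self`. The `GuardRailState` definition below is an assumption that contains only the keys read by `_should_continue`; the actual state type in the repository may have more fields. `DEFAULT_MAX_ATTEMPTS` stands in for `self.max_attempts`.

```python
from typing import TypedDict


class GuardRailState(TypedDict, total=False):
    # Assumed shape: only the keys the routing function reads.
    guard_verdict: str
    attempts: int
    max_attempts: int


DEFAULT_MAX_ATTEMPTS = 3  # mirrors the documented default


def should_continue(state: GuardRailState) -> str:
    """Standalone copy of the routing logic shown above."""
    if state["guard_verdict"] == "approve":
        return "approve"
    if state["attempts"] >= state.get("max_attempts", DEFAULT_MAX_ATTEMPTS):
        return "max_attempts"
    return state["guard_verdict"]


print(should_continue({"guard_verdict": "approve", "attempts": 1}))   # approve
print(should_continue({"guard_verdict": "block", "attempts": 1}))     # block
print(should_continue({"guard_verdict": "redirect", "attempts": 3}))  # max_attempts
```

Note that approval is checked first, so an output approved on the last allowed attempt is still routed to "approve" rather than "max_attempts".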

How It Works

Primary Execute: Primary agent generates output for the given task
Guard Check: Guard agent reviews output and returns verdict + feedback
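Routing: On approve, finalize; on block or redirect, retry with the guard's feedback unless `max_attempts` has been reached
Finalize: Output is accepted as final, either because the guard approved it or because the attempt limit forced acceptance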
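The four steps above can be sketched as a plain loop with no framework. This is an illustrative approximation, not the code in `example.py`: `run_guardrail`, `toy_primary`, and `toy_guard` are hypothetical names, and the toy guard uses a simple substring check in place of an LLM call.

```python
from typing import Callable, Optional, Tuple

# Assumed signatures: a primary takes (task, feedback) and returns output;
# a guard takes output and returns (verdict, feedback).
Primary = Callable[[str, Optional[str]], str]
Guard = Callable[[str], Tuple[str, str]]


def run_guardrail(task: str, primary: Primary, guard: Guard,
                  max_attempts: int = 3) -> dict:
    """Execute, guard, and retry until approved or out of attempts."""
    feedback: Optional[str] = None
    output, verdict, attempts = "", "block", 0
    while attempts < max_attempts:
        attempts += 1
        output = primary(task, feedback)   # 1. Primary Execute
        verdict, feedback = guard(output)  # 2. Guard Check
        if verdict == "approve":           # 3. Routing
            break
    # 4. Finalize: the last output is returned even if never approved.
    return {"output": output, "verdict": verdict, "attempts": attempts}


def toy_primary(task: str, feedback: Optional[str]) -> str:
    # First attempt builds SQL by concatenation; the retry parameterizes it.
    if feedback is None:
        return 'query = "SELECT * FROM users WHERE id = " + user_id'
    return 'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))'


def toy_guard(output: str) -> Tuple[str, str]:
    # Stand-in for an LLM security review.
    if "+ user_id" in output:
        return "block", "Use a parameterized query, not string concatenation."
    return "approve", ""


result = run_guardrail("Fetch a user by id", toy_primary, toy_guard)
print(result["verdict"], result["attempts"])  # approve 2
```

Returning the verdict alongside the output lets the caller distinguish a genuinely approved result from one that was accepted only because the attempt limit ran out.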

Configuration

Parameter	Default	Description
`model`	`gpt-4o-mini`	LLM model name
`llm`	`None`	Pre-configured LLM instance
`max_attempts`	`3`	Maximum execution attempts before forcing acceptance

Comparison with Other Patterns

Aspect	GuardRail	Reflection	Debate
Purpose	Safety checkpoint	Quality improvement	Conflict resolution
Iteration	Checkpoint (retries capped by `max_attempts`)	Iterative loop	Multi-round
Review type	Ternary verdict (approve/block/redirect)	Quality score	Arguments
Trigger	Every execution	Low quality score	Probing
Best for	High-stakes output	Writing refinement	Adversarial exploration