Pattern: external governance gate via on_tool_start (independent /review before irreversible tool calls) #3697

Description

@babyblueviper1

The SDK already ships on_tool_start / on_tool_end and a set of guardrail examples — input_guardrails.py, output_guardrails.py, llm_as_a_judge.py. One pattern that falls between these is an external governance gate: a second opinion that isn't the same model, running before high-stakes tool execution, with a signed verifiable proof attached.

llm_as_a_judge.py uses the model to grade its own output. The complement is an independent reviewer that is not the model that produced the work, so the grade does not come from the same optimist.

Pattern using on_tool_start:

import httpx
from agents import Agent, RunContextWrapper, RunHooks, Runner, Tool

REVIEW_URL = "https://api.babyblueviper.com/review"
HIGH_STAKES_TOOLS = {"shell", "apply_patch", "file_write", "deploy", "transfer", "swap"}

class GovernanceHooks(RunHooks):
    async def on_tool_start(
        self, context: RunContextWrapper, agent: Agent, tool: Tool
    ) -> None:
        if tool.name not in HIGH_STAKES_TOOLS:
            return

        async with httpx.AsyncClient() as client:
            try:
                r = await client.post(
                    REVIEW_URL,
                    json={
                        "artifact_type": "tool_call",
                        "artifact": {
                            "tool": tool.name,
                            # arguments are read from the context if it exposes tool_arguments (ToolContext); otherwise {}
                            "args": getattr(context, "tool_arguments", {}),
                        },
                        "sign": True,  # returns a signed, recomputable proof
                    },
                    timeout=5.0,
                )
                if r.status_code == 200:
                    data = r.json()
                    if data.get("verdict") == "reject":
                        raise ValueError(
                            f"[Governance] {tool.name} blocked: {data.get('summary')}\n"
                            f"proof: {data.get('proof', {}).get('id')}"
                        )
                    # proof_id can be logged or attached to context for audit trail
            except httpx.HTTPError:
                pass  # advisory: fail open on timeouts and connection errors, never block on gate unavailability

# wire it (call from inside an async function; agent is your existing Agent instance)
result = await Runner.run(agent, input="...", hooks=GovernanceHooks())
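
The verdict handling in the hook can also be pulled into a pure function, so the reject and fail-open paths can be tested without a network call. This is a minimal sketch, not part of the SDK or the gate's API: `GateDecision` and `decide` are hypothetical names, and the response shape (`verdict`, `summary`, `proof.id`) is assumed from the snippet above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Assumed from the snippet above; adjust to the tools your agent actually exposes.
HIGH_STAKES_TOOLS = frozenset(
    {"shell", "apply_patch", "file_write", "deploy", "transfer", "swap"}
)


@dataclass(frozen=True)
class GateDecision:
    """Hypothetical result type: whether to proceed, plus audit details."""

    allow: bool
    reason: str
    proof_id: Optional[str] = None


def decide(
    tool_name: str,
    status_code: Optional[int],
    payload: Optional[Mapping[str, Any]],
) -> GateDecision:
    """Map a gate response to an allow/block decision.

    status_code is None when the gate could not be reached
    (timeout or connection error).
    """
    if tool_name not in HIGH_STAKES_TOOLS:
        return GateDecision(True, "not a high-stakes tool")
    if status_code is None:
        return GateDecision(True, "gate unreachable, failing open")
    if status_code != 200 or payload is None:
        return GateDecision(True, f"gate returned status {status_code}, failing open")
    proof_id = (payload.get("proof") or {}).get("id")
    if payload.get("verdict") == "reject":
        return GateDecision(False, str(payload.get("summary")), proof_id)
    return GateDecision(True, "approved", proof_id)
```

Inside on_tool_start, the hook would call `decide(...)` with the HTTP result and raise only when `allow` is false. That keeps the blocking policy in one place and makes the fail-open choice explicit.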

Why external matters for high-stakes tools:

The existing guardrails run the same model on its own output — useful for formatting or safety checks, but the model that decided to call shell is also grading whether shell should be called. For high-stakes irreversible actions (deploy, pay, delete, trade), the grade coming from the same engine is the problem.

An external gate:

  • Isn't the same model, so it has no incentive to approve what it just decided
  • Returns a signed proof (sign: true) — any downstream system can verify the verdict without trusting the agent or the gate
  • Degrades gracefully (fail-open on timeout) — the agent stays fully autonomous
  • Appends to the audit trail without any changes to agent behavior
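
One way to picture the audit-trail point: each gate outcome becomes one append-only JSON line, written whatever the verdict. The sketch below is illustrative only. `audit_record` and its field names are hypothetical, and it does not verify the signature, since the proof format is not described here.

```python
import json
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def audit_record(
    tool_name: str,
    response: Optional[Mapping[str, Any]],
    now: Optional[datetime] = None,
) -> str:
    """Serialize one gate outcome as a JSON line.

    response is the parsed gate reply, or None if the gate was unreachable.
    """
    ts = (now or datetime.now(timezone.utc)).isoformat()
    entry = {
        "ts": ts,
        "tool": tool_name,
        # "unavailable" marks a fail-open call where no verdict was received.
        "verdict": response.get("verdict") if response else "unavailable",
        "proof_id": (response.get("proof") or {}).get("id") if response else None,
    }
    return json.dumps(entry, sort_keys=True)
```

Because the record is built from the gate's reply alone, it can be written from the hook without touching the agent's instructions or tool definitions.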

The gate is live at https://api.babyblueviper.com/mcp (MCP, tool name: review) or POST /review (REST). The SDK's llm_as_a_judge.py and this pattern together cover self-grading (fast, free) and external verification (authoritative, signed) for different action classes.

Happy to contribute this as an agent_patterns/external_governance.py example if the direction fits.
