Pattern: external governance gate via on_tool_start (independent /review before irreversible tool calls)

The SDK already ships `on_tool_start` / `on_tool_end` and a set of guardrail examples — `input_guardrails.py`, `output_guardrails.py`, `llm_as_a_judge.py`. One pattern that falls between these is an **external governance gate**: a second opinion that isn't the same model, running before high-stakes tool execution, with a signed verifiable proof attached.

`llm_as_a_judge.py` uses the model to grade its own output. The complement is an independent reviewer outside the agent's own model, so the grade does not come from the same optimist that produced the work.

**Pattern using `on_tool_start`:**

```python
import httpx
from agents import Agent, RunContextWrapper, RunHooks, Runner, Tool

REVIEW_URL = "https://api.babyblueviper.com/review"
HIGH_STAKES_TOOLS = {"shell", "apply_patch", "file_write", "deploy", "transfer", "swap"}

class GovernanceHooks(RunHooks):
    async def on_tool_start(
        self, context: RunContextWrapper, agent: Agent, tool: Tool
    ) -> None:
        if tool.name not in HIGH_STAKES_TOOLS:
            return

        async with httpx.AsyncClient() as client:
            try:
                r = await client.post(
                    REVIEW_URL,
                    json={
                        "artifact_type": "tool_call",
                        "artifact": {
                            "tool": tool.name,
                            # Assumes the hook is handed a ToolContext exposing
                            # `tool_arguments`; falls back to {} otherwise.
                            "args": getattr(context, "tool_arguments", {}),
                        },
                        "sign": True,  # returns a signed, recomputable proof
                    },
                    timeout=5.0,
                )
                if r.status_code == 200:
                    data = r.json()
                    if data.get("verdict") == "reject":
                        raise ValueError(
                            f"[Governance] {tool.name} blocked: {data.get('summary')}\n"
                            f"proof: {data.get('proof', {}).get('id')}"
                        )
                    # proof_id can be logged or attached to context for audit trail
            except httpx.HTTPError:
                # advisory: fail open on timeouts, connection errors, and other
                # transport failures, so gate unavailability never blocks the agent
                pass

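# wire it (`agent` is built elsewhere; `await` needs an async function)
async def run_with_gate(agent: Agent, user_input: str):
    return await Runner.run(agent, input=user_input, hooks=GovernanceHooks())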
```
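The allow/block decision in the hook above can be pulled out into a pure function, which makes the fail-open behavior testable without network access. This is a minimal sketch, not part of the SDK or of the gate's client: `GateDecision` and `interpret_review` are hypothetical names, and the response fields (`verdict`, `summary`, `proof.id`) are assumed to look the way the hook above reads them.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class GateDecision:
    """Outcome of one gate call (hypothetical helper type)."""
    allowed: bool
    reason: str
    proof_id: Optional[str] = None


def interpret_review(
    status_code: Optional[int], body: Optional[Mapping[str, Any]]
) -> GateDecision:
    """Map a /review response to allow/block, failing open when unusable.

    `status_code` is None when the request never completed (timeout, DNS, ...).
    """
    if status_code != 200 or not isinstance(body, Mapping):
        # Gate unreachable or unhealthy: advisory mode, so allow the call.
        return GateDecision(allowed=True, reason="gate unavailable")

    proof = body.get("proof")
    proof_id = proof.get("id") if isinstance(proof, Mapping) else None

    if body.get("verdict") == "reject":
        # Only an explicit "reject" blocks the tool call.
        return GateDecision(False, str(body.get("summary") or "rejected"), proof_id)
    return GateDecision(True, "not rejected", proof_id)
```

In the hook, the network call would then reduce to producing `(status_code, body)`, with `None, None` passed on any transport error, and raising only when `allowed` is false.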

**Why external matters for high-stakes tools:**

The existing guardrails run the same model on its own output — useful for formatting or safety checks, but the model that decided to call `shell` is also grading whether `shell` should be called. For high-stakes irreversible actions (deploy, pay, delete, trade), the grade coming from the same engine is the problem.

An external gate:
- Isn't the same model, so it has no incentive to approve what it just decided
- Returns a signed proof (`sign: true`) — any downstream system can verify the verdict without trusting the agent or the gate
- Degrades gracefully (fail-open on timeout) — the agent stays fully autonomous
- Appends to the audit trail without any changes to agent behavior
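The audit-trail point can be sketched as a small per-run object that collects one record per gate verdict. Everything here is illustrative: `AuditTrail` and its methods are hypothetical names, and the record layout is an assumption built from the response fields used in the hook (`verdict`, `proof.id`), not a format defined by the gate.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AuditTrail:
    """Hypothetical per-run context object that collects gate verdicts."""
    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, tool_name: str, review: dict[str, Any]) -> dict[str, Any]:
        """Append one audit entry for a reviewed tool call and return it."""
        proof = review.get("proof") or {}
        entry = {
            "ts": time.time(),            # wall-clock time of the review
            "tool": tool_name,
            "verdict": review.get("verdict"),
            "proof_id": proof.get("id"),  # None if the gate sent no proof
        }
        self.entries.append(entry)
        return entry

    def to_jsonl(self) -> str:
        """Serialize the trail as JSON Lines, one entry per line."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)
```

One way to wire this in, assuming the SDK's usual context passing, is to hand the object to the run via `Runner.run(..., context=AuditTrail(), hooks=GovernanceHooks())` and call `context.context.record(tool.name, data)` inside `on_tool_start`; the agent's prompts and tools stay unchanged.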

The gate is live at `https://api.babyblueviper.com/mcp` (MCP, tool name: `review`) or `POST /review` (REST). Together, the SDK's `llm_as_a_judge.py` and this pattern cover self-grading (fast, free) and external verification (authoritative, signed) for different action classes.

Happy to contribute this as an `agent_patterns/external_governance.py` example if the direction fits.
