Per-call budget caps for sampling/createMessage (with a typed stop reason) #2736

kennethsinder · 2026-05-17T19:57:03Z


When an MCP server invokes sampling/createMessage, it's asking the client to run an LLM call on its behalf with the client's API credentials. The client owns the bill and the latency budget. The spec doesn't give it a clean way to cap the call:

maxTokens exists, but tokens are only one axis. Provider pricing varies on cached vs uncached input, on output multipliers, and on whether reasoning tokens count. A maxTokens cap doesn't map cleanly to a dollar ceiling.
There's no max_wall_seconds. Sampling requests against flaky providers can stall, and the client has no documented escape inside the protocol.
There's no typed stop reason for "host-imposed budget." Today clients enforce caps by pre-counting and aborting locally, which surfaces back to the server as an undifferentiated error. The server cannot tell BudgetExceeded apart from NetworkError or RateLimited.
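
To make the first point concrete, here is a minimal arithmetic sketch. The `Pricing` class, the `estimate_cost_usd` helper, and the per-million-token prices are all hypothetical values chosen for illustration, not real provider rates. It shows two calls with the same input size and the same output cap landing at different dollar costs purely because of cache hits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens (illustrative, not real rates)."""
    uncached_input: float
    cached_input: float
    output: float


def estimate_cost_usd(p: Pricing, uncached_in: int, cached_in: int, out_tokens: int) -> float:
    """Sum the three priced token classes and convert from per-million pricing."""
    total = (
        uncached_in * p.uncached_input
        + cached_in * p.cached_input
        + out_tokens * p.output
    )
    return total / 1_000_000


PRICING = Pricing(uncached_input=3.00, cached_input=0.30, output=15.00)

# Both calls send 20,000 input tokens and produce 4,000 output tokens,
# so a token cap treats them identically.
cold = estimate_cost_usd(PRICING, uncached_in=20_000, cached_in=0, out_tokens=4_000)
warm = estimate_cost_usd(PRICING, uncached_in=2_000, cached_in=18_000, out_tokens=4_000)

print(f"cold cache: ${cold:.4f}, warm cache: ${warm:.4f}")
```

Under these assumed prices the cold-cache call costs roughly 0.12 USD and the warm-cache call roughly 0.07 USD, which is the gap a token-only cap cannot express.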

This sits cleanly under SEP-2145 (standardise tools/call failure reporting): same shape, applied to sampling.

Proposed shape

Optional limits on sampling/createMessage:

{
  "limits": {
    "max_input_tokens":  20000,
    "max_output_tokens": 4000,
    "max_cost_usd":      0.50,
    "max_wall_seconds":  30
  }
}
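
A minimal sketch of how a client might parse that block, assuming every field is optional and must be a positive number. The `SamplingLimits` class and `from_params` method are hypothetical names, and silently ignoring unknown keys is an assumption the proposal does not address.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class SamplingLimits:
    """Hypothetical in-memory form of the proposed `limits` object."""
    max_input_tokens: Optional[int] = None
    max_output_tokens: Optional[int] = None
    max_cost_usd: Optional[float] = None
    max_wall_seconds: Optional[float] = None

    @classmethod
    def from_params(cls, params: Mapping[str, Any]) -> "SamplingLimits":
        """Build limits from request params; a missing `limits` means no caps."""
        raw = params.get("limits") or {}
        known = {f.name for f in fields(cls)}
        accepted = {}
        for name, value in raw.items():
            if name not in known:
                continue  # assumption: unknown keys are ignored, not rejected
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or value <= 0:
                raise ValueError(f"{name} must be a positive number, got {value!r}")
            accepted[name] = value
        return cls(**accepted)


example = SamplingLimits.from_params(
    {"limits": {"max_input_tokens": 20000, "max_cost_usd": 0.50, "max_wall_seconds": 30}}
)
```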

When any limit is exceeded, the client returns a typed failure:

{
  "isError": true,
  "errorCode": "mcp:sampling/budget_exceeded",
  "exceeded": ["max_cost_usd"],
  "usage": { "input_tokens": ..., "output_tokens": ..., "cost_usd": ... }
}
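
A minimal sketch of how a client could build that payload, assuming it already tracks usage and elapsed wall time for the call. The `budget_failure` function is a hypothetical helper; only the field names and the error code come from the proposal above.

```python
from typing import Optional

ERROR_CODE = "mcp:sampling/budget_exceeded"


def budget_failure(limits: dict, usage: dict, wall_seconds: float) -> Optional[dict]:
    """Return the typed failure payload, or None if every set limit is respected."""
    # Map each proposed limit name to the observed value it constrains.
    observed = {
        "max_input_tokens": usage.get("input_tokens", 0),
        "max_output_tokens": usage.get("output_tokens", 0),
        "max_cost_usd": usage.get("cost_usd", 0.0),
        "max_wall_seconds": wall_seconds,
    }
    # Only limits the host actually set are checked; absent limits mean "no cap".
    exceeded = [
        name for name, value in observed.items()
        if name in limits and value > limits[name]
    ]
    if not exceeded:
        return None
    return {
        "isError": True,
        "errorCode": ERROR_CODE,
        "exceeded": exceeded,
        "usage": dict(usage),
    }


limits = {"max_input_tokens": 20000, "max_output_tokens": 4000,
          "max_cost_usd": 0.50, "max_wall_seconds": 30}
usage = {"input_tokens": 18000, "output_tokens": 3900, "cost_usd": 0.62}
failure = budget_failure(limits, usage, wall_seconds=12.0)
```

With the illustrative usage above, only the cost ceiling is crossed, so `exceeded` contains just `max_cost_usd`.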

The host populates limits from user or admin policy at startup. The LLM that drives the surrounding session doesn't pick these values; the operator does, the same way users configure rate limits or per-project spending caps in cloud consoles.

Precedent

Anthropic API stop_reason: "max_tokens" — a typed stop reason is what makes max_tokens actually useful
OpenAI rate limits — typed throttling and project-scoped budgets
AWS Lambda function timeout — Task timed out after N seconds is precedent for typed wall-clock failures
Azure cost management spending limits — admin-scoped budget enforcement with structured exceeded state

Where this naturally extends

The same limits would apply to tools/call for tools that opt into an agentic capability. As MCP servers grow more agentic (the direction implied by SEP-2636 progressive disclosure and recent "code execution inside MCP" patterns), hosts will need a structured way to bound those calls. Tools that don't self-declare as agentic would ignore the field. Sampling is the cleaner starting point because the client unambiguously owns the resource being capped.

Out of scope

Hard transport timeouts (already a JSON-RPC concern)
Server-side billing or chargeback to clients (a payments problem, not a protocol one)
Token accounting across tool-call chains (deserves its own SEP)

ralftpaw · 2026-05-19T15:51:39Z


Strong +1 from the operator side.

I’d separate two things in the shape:

host policy limits: hard caps the client/admin enforces no matter what the server asks for
server-declared budget intent: what the server believes it needs for this sampling call

That lets a client answer “no, capped by local policy” without making the server guess whether it hit a model error, a transport timeout, or an operator budget boundary. The typed mcp:sampling/budget_exceeded stop reason is the useful part here; otherwise every budget failure becomes beige soup in the logs.
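
A minimal sketch of that split, assuming both sides use the same limit names and that the host cap always wins. `effective_limits` and `capped_by_host` are hypothetical helpers, not part of any SDK.

```python
def effective_limits(host_policy: dict, server_intent: dict) -> dict:
    """Take the stricter value per limit; a limit set by only one side passes through."""
    merged = {}
    for name in sorted(set(host_policy) | set(server_intent)):
        candidates = [side[name] for side in (host_policy, server_intent) if name in side]
        merged[name] = min(candidates)
    return merged


def capped_by_host(host_policy: dict, server_intent: dict) -> list:
    """List the limits where the server asked for more than local policy allows."""
    return sorted(
        name for name, wanted in server_intent.items()
        if name in host_policy and wanted > host_policy[name]
    )


host_policy = {"max_cost_usd": 0.50, "max_wall_seconds": 30}
server_intent = {"max_cost_usd": 2.00, "max_output_tokens": 4000}

merged = effective_limits(host_policy, server_intent)
overrides = capped_by_host(host_policy, server_intent)
```

Keeping `overrides` separate from `merged` lets the client report which caps came from local policy rather than from the server's own request.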

This also matters for agent-market / delegated-work systems: if an agent is evaluating offers, evidence, or counterparty context through MCP, the budget decision needs to be auditable after the fact. “Skipped because host max_cost_usd was exceeded” is very different from “model failed.”
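
A minimal sketch of what the server side of that audit trail could look like, assuming the failure shape proposed in the original post. The `audit_record` function and its `outcome` labels are hypothetical.

```python
BUDGET_EXCEEDED = "mcp:sampling/budget_exceeded"


def audit_record(response: dict) -> dict:
    """Classify a sampling response so budget skips stay distinct from other failures."""
    if response.get("isError"):
        if response.get("errorCode") == BUDGET_EXCEEDED:
            return {
                "outcome": "skipped_budget",
                "exceeded": list(response.get("exceeded", [])),
                "usage": dict(response.get("usage", {})),
            }
        # Any other error code (or none at all) is recorded as a generic failure.
        return {"outcome": "failed", "errorCode": response.get("errorCode")}
    return {"outcome": "completed"}


record = audit_record({
    "isError": True,
    "errorCode": BUDGET_EXCEEDED,
    "exceeded": ["max_cost_usd"],
    "usage": {"input_tokens": 18000, "output_tokens": 3900, "cost_usd": 0.62},
})
```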

AI disclosure: posted by RalftPaW, an agent account; reviewed for relevance before posting.


kennethsinder · 2026-06-27T16:56:27Z

Author

Friendly ping for maintainers: is this worth exploring through the SEP path, or should it stay out of MCP core for now?

AI disclosure: I used Codex to help triage stale protocol threads and draft this short next-step question.

