Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 8 additions & 7 deletions docs/adr/0011-token-economics-from-session-shutdown.md
Original file line number Diff line number Diff line change
Expand Up @@ -72,20 +72,21 @@ On 2026-06-21, `copilot help monitoring` and local file-exporter probes with Cop
`1.0.64-0` showed an additional optional source: OpenTelemetry `chat <model>` spans carry
per-LLM-call `gen_ai.usage.input_tokens`, `gen_ai.usage.cache_creation_input_tokens`,
`gen_ai.usage.output_tokens`, `github.copilot.nano_aiu`, `github.copilot.server_duration`, and
`github.copilot.turn_id`. In the probes, summing chat spans matched the native
`github.copilot.turn_id`. Newer/local exporter variants may also include per-call cache-read fields
such as `gen_ai.usage.cache_read_input_tokens`. In the probes, summing chat spans matched the native
`session.shutdown` totals for input, cache-write, output, nano-AIU, and API duration.

We integrated this as a complement, not a replacement. The harness now enables local OTel file
export to `copilot-otel.jsonl` for Copilot agent runs when no explicit OTLP destination is
configured, and `SessionAnalysis` ingests that file when present to populate a per-call LLM table
and annotate turns with OTel input/cache-write/output/AIU/API-duration fields. Native
and annotate turns with OTel input/cache-read/cache-write/output/AIU/API-duration fields. Native
`events.jsonl` remains the authoritative source for session-level totals and richer forensic
details; OTel still does not split each call's input into non-cached input versus cache-read tokens.
details; OTel cache-read fields are captured when present, otherwise that split remains session-level.

The Pier ATIF trajectory also carries this complement now. Assistant steps matched by
`github.copilot.turn_id` get ATIF prompt/completion token metrics plus
`metrics.extra.copilot_otel` for the OTel-only fields (`cache_creation_input_tokens`, AIU, server
duration, context size, token limit, and normalized per-call records). `final_metrics.extra` stores
the aggregate OTel totals. We deliberately do not map OTel cache creation into ATIF
`cached_tokens`/`total_cached_tokens`, because those fields represent cache reads in our native
Copilot economics.
duration, context size, token limit, optional cache-read tokens, and normalized per-call records).
`final_metrics.extra` stores the aggregate OTel totals. We deliberately do not map OTel cache
creation into ATIF `cached_tokens`/`total_cached_tokens`, because those fields represent cache reads
in our native Copilot economics.
11 changes: 6 additions & 5 deletions docs/analysis.md
Original file line number Diff line number Diff line change
Expand Up @@ -56,8 +56,8 @@ The rendering (built with [Rich](https://rich.readthedocs.io/)) has these parts:
truncations, files modified, lines ±, AIU per line). Multi-model sessions also get a per-model
table. See [ADR-0011](adr/0011-token-economics-from-session-shutdown.md).
5. **LLM calls (OTel)** — *only when `copilot-otel.jsonl` is available.* One row per `chat <model>`
span with turn id, model, input tokens, cache-write tokens, output tokens, AIU, API time, and
current context size.
span with turn id, model, input tokens, cache-read/cache-write tokens when exported, output and
total tokens, AIU, wall/API time, and current/limit context size.
6. **Phases (temporal)** — *only when the session has at least five turns.* The turns are split
into five contiguous, near-equal groups (`early` → `later`) and each phase shows its turn span,
tool calls, output tokens, share of total output, and duration. This mirrors the phase-level
Expand All @@ -73,8 +73,9 @@ Warnings, if any, are shown in a panel at the bottom.
> `events.jsonl` logs input, cache, reasoning, and AIU **only as session totals**
> (`session.shutdown`), never per turn; the per-turn `assistant.message` carries `outputTokens` but
> not `inputTokens`. Copilot's OTel `chat <model>` spans expose per-LLM-call `input_tokens`,
> `cache_creation_input_tokens`, `output_tokens`, `nano_aiu`, and server duration when OTel export is
> enabled; `analyze` auto-loads the harness-captured `copilot-otel.jsonl` when present, or accepts
> optional cache-read details, `cache_creation_input_tokens`, `output_tokens`, `nano_aiu`, and
> server duration when OTel export is enabled; `analyze` auto-loads the harness-captured
> `copilot-otel.jsonl` when present, or accepts
> `--otel-file` for direct log analysis. The current phase table still distributes only the native
> per-turn signals — output tokens, tool activity, and duration — and deliberately does **not**
> fabricate per-phase input or cost. The separate **LLM calls (OTel)** table carries the per-call
Expand Down Expand Up @@ -113,7 +114,7 @@ legacy `analysis.json`, and any future consumer.
| `economics` | `TokenEconomics`: token-type split, AIU cost, context composition, productivity (see below). |
| `tools` | `ToolStat(name, calls, failures, total_duration_ms, total_result_chars)`, sorted by calls desc. |
| `llm_calls` | `LlmCallSummary` rows parsed from OTel `chat <model>` spans when `copilot-otel.jsonl` is available. |
| `turns` | `TurnSummary(turn_no, duration_s, tools, output_tokens, text_preview, …)` plus OTel `input_tokens`, `cache_creation_input_tokens`, `aiu`, and `api_duration_ms` when available. |
| `turns` | `TurnSummary(turn_no, duration_s, tools, output_tokens, text_preview, …)` plus OTel `input_tokens`, optional `cache_read_input_tokens`, `cache_creation_input_tokens`, `aiu`, and `api_duration_ms` when available. |
| `phases` | `PhaseStat(name, turn_from, turn_to, n_turns, n_tool_calls, output_tokens, duration_s, output_share)` — five temporal phases (empty for sessions under five turns). |
| `warnings` | Warning messages. |
| `event_type_counts` | Histogram of raw event `type`s. |
Expand Down
16 changes: 9 additions & 7 deletions docs/collecting-run-data.md
Original file line number Diff line number Diff line change
Expand Up @@ -319,9 +319,10 @@ When `agent/copilot-otel.jsonl` is present, each assistant step that can be matc
`metrics.extra.copilot_otel`. That extra payload preserves the per-call OTel facts the standard
ATIF token fields cannot represent directly: `cache_creation_input_tokens`, `aiu`,
`server_duration_ms`, current context size, token limit, and the raw normalized `llm_calls` list.
`final_metrics.extra.copilot_otel` stores the run-level OTel aggregate. ATIF
`total_cached_tokens` remains the native cache-read total; OTel cache creation is not written there
because Copilot OTel does not expose a per-call cache-read split.
When the exporter includes per-call cache reads, the payload also includes
`cache_read_input_tokens`. `final_metrics.extra.copilot_otel` stores the run-level OTel aggregate.
ATIF `total_cached_tokens` uses the native cache-read total when available and falls back to OTel
cache reads when OTel exposes them.

Use ATIF for cross-agent comparisons and fallback analysis. Use native `events.jsonl` for
Copilot-specific economics, compaction/truncation detail, and the most faithful timeline.
Expand Down Expand Up @@ -491,7 +492,7 @@ Observed span records from local file-exporter probes:
| Span | Useful fields | What it adds beyond `events.jsonl` |
| --- | --- | --- |
| `invoke_agent` | `gen_ai.conversation.id`, `gen_ai.agent.id`, `gen_ai.agent.version`, `gen_ai.request.model`, `github.copilot.agent.type`, `github.copilot.turn_count`, aggregate `gen_ai.usage.*`, aggregate `github.copilot.nano_aiu`, `github.copilot.context.skills`, `github.copilot.context.mcp_server_names`, `github.copilot.context.custom_agent_names` | Agent-level trace root and aggregate usage/cost for the invocation. MCP server names are hashed. |
| `chat <model>` | `gen_ai.conversation.id`, `github.copilot.turn_id`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.response.id`, `gen_ai.response.finish_reasons`, `gen_ai.usage.input_tokens`, `gen_ai.usage.cache_creation_input_tokens`, `gen_ai.usage.output_tokens`, `github.copilot.nano_aiu`, `github.copilot.server_duration`, `github.copilot.interaction_id`, `github.copilot.service_request_id` | Per-LLM-call input/output/cache-write tokens, nano-AIU, and server duration. This is the main data source for turn-level economics. |
| `chat <model>` | `gen_ai.conversation.id`, `github.copilot.turn_id`, `gen_ai.request.model`, `gen_ai.response.model`, `gen_ai.response.id`, `gen_ai.response.finish_reasons`, `gen_ai.usage.input_tokens`, optional cache-read usage fields, `gen_ai.usage.cache_creation_input_tokens`, `gen_ai.usage.output_tokens`, `github.copilot.nano_aiu`, `github.copilot.server_duration`, `github.copilot.interaction_id`, `github.copilot.service_request_id` | Per-LLM-call input/output/cache-read/cache-write tokens when exported, nano-AIU, and server duration. This is the main data source for turn-level economics. |
| `execute_tool <tool>` | `gen_ai.operation.name`, `gen_ai.tool.name`, `gen_ai.tool.call.id`, `gen_ai.tool.type`, `gen_ai.provider.name`, span status | Standard tool spans that can be joined to chat spans by trace and to native events by tool-call id. Native `tool.execution_complete.toolTelemetry` remains richer for result sizes and file/code metadata. |

`chat <model>` spans also emitted a `github.copilot.session.usage_info` event with
Expand All @@ -505,15 +506,16 @@ Per-call token/cost interpretation:
| Field | Interpretation |
| --- | --- |
| `gen_ai.usage.input_tokens` | Total prompt/input tokens for that LLM request. In observed runs, summing chat spans matched `session.shutdown.modelMetrics.<model>.usage.inputTokens`. |
| `gen_ai.usage.cache_read_input_tokens` | Cache-read tokens for that request when exported. The parser also accepts common aliases such as `gen_ai.usage.cached_tokens`. |
| `gen_ai.usage.cache_creation_input_tokens` | Cache-write tokens for that request. In observed runs, summing chat spans matched `session.shutdown.tokenDetails.cache_write.tokenCount`. |
| `gen_ai.usage.output_tokens` | Output tokens for that request. |
| `github.copilot.nano_aiu` | Per-request cost in nano-AIU. Prefer this over `github.copilot.cost` for economics; summing observed chat spans matched `session.shutdown.totalNanoAiu`. |
| `github.copilot.server_duration` | Per-request server/API duration in milliseconds; summing observed chat spans matched `session.shutdown.totalApiDurationMs`. |
| `github.copilot.turn_id` | Join key to native `assistant.turn_*` / `assistant.message` events. |

The observed OTel spans did **not** split per-call `input_tokens` into non-cached input versus
cache-read tokens. That split remains session-level in `session.shutdown.tokenDetails`; per call, we
can capture total input, cache creation, output, server duration, and AIU.
Some observed OTel spans did **not** split per-call `input_tokens` into non-cached input versus
cache-read tokens. When the exporter provides cache-read fields, the parser captures them; otherwise
that split remains session-level in `session.shutdown.tokenDetails`.

Metrics currently documented by the CLI help:

Expand Down
41 changes: 41 additions & 0 deletions src/copilot_experiments/analysis.py
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,7 @@ def _apply_otel_records(analysis: SessionAnalysis, records: list[dict[str, Any]]
if turn is None:
continue
_add_turn_int(turn, "input_tokens", call.input_tokens)
_add_turn_int(turn, "cache_read_input_tokens", call.cache_read_input_tokens)
_add_turn_int(turn, "cache_creation_input_tokens", call.cache_creation_input_tokens)
if turn.output_tokens is None:
_add_turn_int(turn, "output_tokens", call.output_tokens)
Expand All @@ -283,6 +284,7 @@ def _apply_otel_records(analysis: SessionAnalysis, records: list[dict[str, Any]]
return

input_tokens = _sum_int(call.input_tokens for call in calls)
cache_read_tokens = _sum_int(call.cache_read_input_tokens for call in calls)
cache_creation_tokens = _sum_int(call.cache_creation_input_tokens for call in calls)
output_tokens = _sum_int(call.output_tokens for call in calls)
api_duration_ms = _sum_int(call.server_duration_ms for call in calls)
Expand All @@ -291,6 +293,8 @@ def _apply_otel_records(analysis: SessionAnalysis, records: list[dict[str, Any]]
if input_tokens is not None:
analysis.input_tokens = input_tokens
analysis.economics.input_tokens_total = input_tokens
if cache_read_tokens is not None:
analysis.economics.cache_read_tokens = cache_read_tokens
if cache_creation_tokens is not None:
analysis.economics.cache_write_tokens = cache_creation_tokens
if output_tokens is not None:
Expand Down Expand Up @@ -339,6 +343,17 @@ def llm_calls_from_otel(records: list[dict[str, Any]]) -> list[LlmCallSummary]:
response_id=_otel_string(attrs.get("gen_ai.response.id")),
finish_reasons=_strings(attrs.get("gen_ai.response.finish_reasons")),
input_tokens=input_tokens,
cache_read_input_tokens=_first_otel_int(
attrs,
"gen_ai.usage.cache_read_input_tokens",
"gen_ai.usage.cache_read_tokens",
"gen_ai.usage.cached_input_tokens",
"gen_ai.usage.cached_tokens",
"gen_ai.usage.prompt_tokens_details.cached_tokens",
"gen_ai.usage.input_token_details.cached_tokens",
"gen_ai.usage.input_token_details.cache_read_tokens",
"gen_ai.usage.input_tokens_details.cached_tokens",
),
cache_creation_input_tokens=_otel_int(
attrs.get("gen_ai.usage.cache_creation_input_tokens")
),
Expand Down Expand Up @@ -372,6 +387,32 @@ def _chat_usage_event(events: Any) -> dict[str, Any]:
return {}


def _first_otel_int(attrs: dict[str, Any], *keys: str) -> int | None:
for key in keys:
value = _otel_attr_value(attrs, key)
if value is None:
continue
parsed = _otel_int(value)
if parsed is not None:
return parsed
return None


def _otel_attr_value(attrs: dict[str, Any], key: str) -> Any:
if key in attrs:
return attrs[key]
parts = key.split(".")
for index in range(len(parts) - 1, 0, -1):
prefix = ".".join(parts[:index])
value = attrs.get(prefix)
if not isinstance(value, dict):
continue
for part in parts[index:]:
value = value.get(part) if isinstance(value, dict) else None
return value
return None


def _otel_attrs(raw: Any) -> dict[str, Any]:
if isinstance(raw, dict):
return {str(k): _otel_value(v) for k, v in raw.items()}
Expand Down
2 changes: 2 additions & 0 deletions src/copilot_experiments/invoker.py
Original file line number Diff line number Diff line change
Expand Up @@ -363,6 +363,7 @@ def _write_synthetic_otel(path: Path, events: list[dict], model: str) -> None:
data = ev.get("data") or {}
output_tokens = int(data.get("outputTokens") or 0)
input_tokens = 1000 + i * 100
cache_read_tokens = 900 + i * 90
cache_creation_tokens = 100 + i * 10
nano_aiu = 100_000_000 + i * 10_000_000
record = {
Expand All @@ -373,6 +374,7 @@ def _write_synthetic_otel(path: Path, events: list[dict], model: str) -> None:
"gen_ai.request.model": model,
"gen_ai.response.model": model,
"gen_ai.usage.input_tokens": input_tokens,
"gen_ai.usage.cache_read_input_tokens": cache_read_tokens,
"gen_ai.usage.cache_creation_input_tokens": cache_creation_tokens,
"gen_ai.usage.output_tokens": output_tokens,
"github.copilot.nano_aiu": nano_aiu,
Expand Down
2 changes: 2 additions & 0 deletions src/copilot_experiments/models.py
Original file line number Diff line number Diff line change
Expand Up @@ -344,6 +344,7 @@ class LlmCallSummary(BaseModel):
response_id: str | None = None
finish_reasons: list[str] = Field(default_factory=list)
input_tokens: int | None = None
cache_read_input_tokens: int | None = None
cache_creation_input_tokens: int | None = None
output_tokens: int | None = None
total_tokens: int | None = None
Expand All @@ -368,6 +369,7 @@ class TurnSummary(BaseModel):
tools: list[str] = Field(default_factory=list)
output_tokens: int | None = None
input_tokens: int | None = None
cache_read_input_tokens: int | None = None
cache_creation_input_tokens: int | None = None
aiu: float | None = None
api_duration_ms: int | None = None
Expand Down
7 changes: 6 additions & 1 deletion src/copilot_experiments/pier_agents/copilot_cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -490,7 +490,7 @@ def append_step(step: Step) -> None:
total_cached_tokens=(
parsed_metrics.cache_read_tokens
if parsed_metrics is not None and parsed_metrics.cache_read_tokens is not None
else None
else otel_totals.get("cache_read_input_tokens")
),
total_steps=len(steps),
extra=extra or None,
Expand Down Expand Up @@ -572,6 +572,7 @@ def _apply_otel_metrics(step: Step, calls: list[LlmCallSummary]) -> None:

metrics = step.metrics or Metrics()
input_tokens = _sum_optional_int(call.input_tokens for call in calls)
cache_read_input_tokens = _sum_optional_int(call.cache_read_input_tokens for call in calls)
output_tokens = _sum_optional_int(call.output_tokens for call in calls)
if input_tokens is not None:
metrics.prompt_tokens = input_tokens
Expand All @@ -584,6 +585,7 @@ def _apply_otel_metrics(step: Step, calls: list[LlmCallSummary]) -> None:
{
"llm_call_count": len(calls),
"input_tokens": input_tokens,
"cache_read_input_tokens": cache_read_input_tokens,
"cache_creation_input_tokens": _sum_optional_int(
call.cache_creation_input_tokens for call in calls
),
Expand All @@ -608,6 +610,9 @@ def _otel_final_totals(calls: list[LlmCallSummary]) -> dict[str, Any]:
{
"llm_call_count": len(calls),
"input_tokens": _sum_optional_int(call.input_tokens for call in calls),
"cache_read_input_tokens": _sum_optional_int(
call.cache_read_input_tokens for call in calls
),
"cache_creation_input_tokens": _sum_optional_int(
call.cache_creation_input_tokens for call in calls
),
Expand Down
15 changes: 13 additions & 2 deletions src/copilot_experiments/render.py
Original file line number Diff line number Diff line change
Expand Up @@ -192,25 +192,36 @@ def _llm_calls_table(a: SessionAnalysis) -> Table | None:
table.add_column("turn", justify="right")
table.add_column("model")
table.add_column("in", justify="right")
table.add_column("cache read", justify="right")
table.add_column("cache write", justify="right")
table.add_column("out", justify="right")
table.add_column("total", justify="right")
table.add_column("AIU", justify="right")
table.add_column("wall", justify="right")
table.add_column("API", justify="right")
table.add_column("ctx", justify="right")
table.add_column("ctx/limit", justify="right")
for call in a.llm_calls:
model = call.response_model or call.request_model or "-"
api_duration = (
_dur(call.server_duration_ms / 1000) if call.server_duration_ms is not None else "-"
)
ctx = (
f"{_int(call.current_tokens)}/{_int(call.token_limit)}"
if call.token_limit is not None
else _int(call.current_tokens)
)
table.add_row(
call.turn_id or "-",
model,
_int(call.input_tokens),
_int(call.cache_read_input_tokens),
_int(call.cache_creation_input_tokens),
_int(call.output_tokens),
_int(call.total_tokens),
_aiu(call.aiu),
_dur(call.duration_s),
api_duration,
_int(call.current_tokens),
ctx,
)
return table

Expand Down
Loading
Loading