Expose evaluationReferenceInputs (ground truth) on EvaluatorInput for code-based evaluators

## Summary

`EvaluatorInput` (used by `@custom_code_based_evaluator()`) does not expose the `evaluationReferenceInputs` field from the Lambda event. The code-based evaluator contract now delivers ground-truth reference inputs, but evaluator functions cannot access them through the typed input model.

## Background

Per the [code-based evaluators docs](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-based-evaluators.html#code-based-ground-truth), the Lambda event includes a top-level `evaluationReferenceInputs` list when ground truth is configured:

```json
{
    "schemaVersion": "1.0",
    "evaluationLevel": "TRACE",
    "evaluationInput": { "sessionSpans": [...] },
    "evaluationReferenceInputs": [
        {
            "context": { "spanContext": { "sessionId": "...", "traceId": "..." } },
            "expectedResponse": { "text": "..." }
        }
    ],
    "evaluationTarget": { "traceIds": ["trace123"], "spanIds": ["span123"] }
}
```

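The service filters these by evaluation level: SESSION receives all reference inputs; TRACE receives session-level inputs plus those whose `traceId` matches the target; TOOL_CALL receives session-level inputs plus those whose `spanId` matches the target.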
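To make the per-level filtering rule concrete, here is a minimal illustrative restatement in Python. It is not service code and evaluators should not need it, since the service applies the filter before invoking the Lambda. `filter_reference_inputs` is a hypothetical name, and two points are assumptions: that a "session-level" reference is one whose `spanContext` carries neither `traceId` nor `spanId`, and that at TRACE level any reference with a matching `traceId` (including span-scoped ones) is kept.

```python
from typing import Any, Dict, List, Optional


def filter_reference_inputs(
    references: List[Dict[str, Any]],
    evaluation_level: str,
    target_trace_id: Optional[str] = None,
    target_span_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical sketch of the documented per-level filtering rule."""
    if evaluation_level == "SESSION":
        # SESSION: every reference input is relevant.
        return list(references)

    selected = []
    for ref in references:
        span_ctx = (ref.get("context") or {}).get("spanContext") or {}
        trace_id = span_ctx.get("traceId")
        span_id = span_ctx.get("spanId")
        if trace_id is None and span_id is None:
            # Assumption: no trace/span id means a session-level reference.
            selected.append(ref)
        elif (
            evaluation_level == "TRACE"
            and target_trace_id is not None
            and trace_id == target_trace_id
        ):
            selected.append(ref)
        elif (
            evaluation_level == "TOOL_CALL"
            and target_span_id is not None
            and span_id == target_span_id
        ):
            selected.append(ref)
    return selected
```

The `is not None` guards keep a missing target id from accidentally matching references that also lack that id.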

## Current behavior

`EvaluatorInput` only carries `evaluation_level`, `session_spans`, `target_trace_id`, `target_span_id`, and `schema_version`:

https://github.com/aws/bedrock-agentcore-sdk-python/blob/main/src/bedrock_agentcore/evaluation/custom_code_based_evaluators/models.py

The decorator parses the raw event but drops `evaluationReferenceInputs`:

https://github.com/aws/bedrock-agentcore-sdk-python/blob/main/src/bedrock_agentcore/evaluation/custom_code_based_evaluators/decorator.py

As a result, evaluator functions that need ground truth (e.g. expected-response comparisons or exact-match scoring) cannot access it at all: `EvaluatorInput` has no field for it, and the decorator does not pass the raw event through either.

## Proposed change

1. Add an optional `reference_inputs: List[Dict] = []` field to `EvaluatorInput`.
2. Populate it in the decorator from `event.get("evaluationReferenceInputs") or []`.

This is backward compatible: existing evaluators ignore the new field, and events without ground truth yield an empty list.
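A minimal sketch of the two proposed steps, using a simplified stand-in rather than the SDK's real classes. `EvaluatorInputSketch` and `build_evaluator_input` are hypothetical names; the field names come from the current `EvaluatorInput` listed above, but the mapping of the other fields (for example, taking the first entry of `evaluationTarget.traceIds` as `target_trace_id`) is an assumption about how the decorator currently parses the event. The sketch uses a dataclass, where `= []` would raise an error, so it uses `default_factory=list`; if the real `EvaluatorInput` is a Pydantic model, the `= []` default in item 1 should be safe because Pydantic copies defaults per instance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class EvaluatorInputSketch:
    """Simplified stand-in for EvaluatorInput (hypothetical, not the SDK class)."""

    evaluation_level: str
    session_spans: List[Dict[str, Any]] = field(default_factory=list)
    target_trace_id: Optional[str] = None
    target_span_id: Optional[str] = None
    schema_version: Optional[str] = None
    # Proposed field: ground-truth reference inputs, empty when none are configured.
    reference_inputs: List[Dict[str, Any]] = field(default_factory=list)


def build_evaluator_input(event: Dict[str, Any]) -> EvaluatorInputSketch:
    """Hypothetical decorator-side parsing of the Lambda event."""
    evaluation_input = event.get("evaluationInput") or {}
    target = event.get("evaluationTarget") or {}
    trace_ids = target.get("traceIds") or []
    span_ids = target.get("spanIds") or []
    return EvaluatorInputSketch(
        evaluation_level=event["evaluationLevel"],
        session_spans=evaluation_input.get("sessionSpans") or [],
        target_trace_id=trace_ids[0] if trace_ids else None,
        target_span_id=span_ids[0] if span_ids else None,
        schema_version=event.get("schemaVersion"),
        # Step 2 of the proposal: carry ground truth through, defaulting to [].
        reference_inputs=event.get("evaluationReferenceInputs") or [],
    )
```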

## Motivation

This also unblocks consolidating third-party evaluator integrations (e.g. PR #528's `DeepEvalHandler`) onto the standard `@custom_code_based_evaluator()` contract. That handler currently reinvents event parsing and output serialization partly because `EvaluatorInput` cannot surface `evaluationReferenceInputs` (needed to build `expected_output` for metrics like ContextualPrecision/Recall).
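As an illustration of what the proposed field would enable, here is a sketch of an evaluator-side helper that extracts expected responses from the reference inputs and an exact-match scorer built on it. Both function names are hypothetical. The sketch only handles the `expectedResponse.text` shape shown in the docs excerpt above; any other ground-truth shapes the service may send are not handled. The extracted text is the kind of value that could populate an `expected_output` (for example on a DeepEval test case), but that wiring is not shown here.

```python
from typing import Any, Dict, List, Optional


def expected_responses(reference_inputs: List[Dict[str, Any]]) -> List[str]:
    """Collect expectedResponse.text values from reference inputs (hypothetical helper)."""
    texts = []
    for ref in reference_inputs:
        text = (ref.get("expectedResponse") or {}).get("text")
        if text is not None:
            texts.append(text)
    return texts


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case before comparing.
    return " ".join(text.split()).lower()


def exact_match_score(actual: str, reference_inputs: List[Dict[str, Any]]) -> Optional[float]:
    """Return 1.0 if `actual` matches any expected response, 0.0 if none match,
    or None when no ground truth was supplied."""
    expected = expected_responses(reference_inputs)
    if not expected:
        return None
    return 1.0 if any(_normalize(actual) == _normalize(e) for e in expected) else 0.0
```

Returning `None` when there is no ground truth lets the caller decide how to report "not applicable"; how that maps onto the SDK's evaluator output is left open here.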
