Inconsistency between code and documentation for custom code-based evaluators #541

@rseeto

Describe the bug
The documentation for Amazon Bedrock AgentCore code-based evaluators says a Lambda may return one of two response shapes:

  1. Success response:

    • label
    • optional value
    • optional explanation
  2. Error response:

    • errorCode
    • errorMessage

However, when using the Python SDK/helper model bedrock_agentcore.evaluation.EvaluatorOutput, constructing an error-only response without label raises a Pydantic validation error because label is still required by the SDK model.

This creates a mismatch between the documented Lambda response contract and the actual SDK behavior. In practice, valid documented error responses can fail at runtime unless a placeholder label is included.

To Reproduce
Steps to reproduce the behavior:

  1. Install bedrock-agentcore in Python.
  2. Create a minimal code-based evaluator or simple repro script.
  3. Construct an error response using the documented error-only shape, for example:
    from bedrock_agentcore.evaluation import EvaluatorOutput
    
    EvaluatorOutput(
        errorCode="InvalidJudgeResponse",
        errorMessage="Example error"
    )
  4. Observe that the SDK raises a validation error because label is required.

Expected behavior
A code-based evaluator error response documented as:

{
  "errorCode": "VALIDATION_FAILED",
  "errorMessage": "Input spans missing required tool call attributes."
}

should be accepted by the SDK/helper model and should not require label.

Either:

  • the SDK should allow error-only responses without label, or
  • the documentation should explicitly state that label is still required by the SDK model even for error responses.
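To illustrate the first option, here is a minimal sketch of validation that accepts either documented shape. It is a standard-library dataclass stand-in, not the actual Pydantic model in bedrock_agentcore. The class name EvaluatorOutputSketch, the float type for value, and the exact validation rules are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluatorOutputSketch:
    """Hypothetical stand-in; NOT the real bedrock_agentcore EvaluatorOutput."""

    # Success-shape fields (the type of `value` is an assumption).
    label: Optional[str] = None
    value: Optional[float] = None
    explanation: Optional[str] = None
    # Error-shape fields.
    errorCode: Optional[str] = None
    errorMessage: Optional[str] = None

    def __post_init__(self) -> None:
        is_error = self.errorCode is not None or self.errorMessage is not None
        if is_error:
            # An error response needs both error fields, but no label.
            if self.errorCode is None or self.errorMessage is None:
                raise ValueError("error responses need errorCode and errorMessage")
        elif self.label is None:
            # A success response still requires label.
            raise ValueError("success responses need label")


# Error-only construction is accepted under this sketch.
error_only = EvaluatorOutputSketch(
    errorCode="VALIDATION_FAILED",
    errorMessage="Input spans missing required tool call attributes.",
)
```

With rules like these, label stays mandatory for success responses, so existing callers would not be affected.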

Additional context

  • Documentation referenced:
    https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-based-evaluators.html
  • Observed package version: bedrock-agentcore 1.14.1
  • The validation failure appears to come from the Python SDK model layer, not from the documented wire contract itself.
  • This mismatch can surface as test failures. It may also cause evaluator runtime failures or generic internal server errors if the Lambda tries to return a documented error-only response through EvaluatorOutput.
  • As a workaround, we had to populate label="" on error responses even though that is not consistent with the documented error response schema.
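An alternative workaround that avoids the placeholder label is to bypass EvaluatorOutput and build the documented shapes as plain dictionaries. The sketch below assumes the service accepts a JSON-serializable dict returned from the Lambda handler, as the documented wire contract suggests. We have not verified this against the service. The helper names success_response and error_response are hypothetical.

```python
from typing import Any, Dict, Optional


def success_response(
    label: str,
    value: Optional[float] = None,
    explanation: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the documented success shape, omitting unset optional fields."""
    response: Dict[str, Any] = {"label": label}
    if value is not None:
        response["value"] = value
    if explanation is not None:
        response["explanation"] = explanation
    return response


def error_response(error_code: str, error_message: str) -> Dict[str, str]:
    """Build the documented error-only shape, with no label key."""
    return {"errorCode": error_code, "errorMessage": error_message}


# Example: the error payload from the "Expected behavior" section.
payload = error_response(
    "VALIDATION_FAILED",
    "Input spans missing required tool call attributes.",
)
```

This keeps the returned payload identical to the documented schema, at the cost of losing the SDK model's validation.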
