
evals: declarative requires_expected on Evaluator, checked before the task runs #127

Description

@renaudcepre

Context

contains_expected and word_overlap now raise when expected_output is None (#118, PR #123), but the check lives inside the evaluator, which runs after the task. The wrapper already knows expected from the kwargs before it executes func (via _extract_expected), so the failure is knowable up front; yet a ForEach of 50 miswired cases still burns 50 LLM calls before each one errors.
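A minimal sketch of the current ordering problem, assuming simplified stand-ins: `Ctx`, `run_case`, `fake_llm_task` and the `expected` kwarg name are hypothetical, not the library's real API. The None check fires only inside the evaluator, after the task has already been paid for:

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-in for the evaluator context (not the real class).
@dataclass
class Ctx:
    output: Any
    expected_output: Any


def contains_expected(ctx: Ctx) -> bool:
    # Current behaviour: the None check lives inside the evaluator...
    if ctx.expected_output is None:
        raise ValueError("contains_expected requires expected_output")
    return str(ctx.expected_output) in str(ctx.output)


calls = 0


def fake_llm_task(prompt: str) -> str:
    global calls
    calls += 1  # stands in for a paid LLM call
    return f"answer to {prompt}"


def run_case(task, evaluator, **kwargs):
    expected = kwargs.pop("expected", None)  # the wrapper knows this up front...
    output = task(**kwargs)                  # ...but still runs the task first
    return evaluator(Ctx(output, expected))


errors = 0
for i in range(50):  # a ForEach of 50 miswired cases: no `expected` passed
    try:
        run_case(fake_llm_task, contains_expected, prompt=f"q{i}")
    except ValueError:
        errors += 1

print(calls, errors)  # 50 50
```

Every case fails, but only after its task call has already run.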

The wrapper already has a precedent for early guards: since PR #122, the zero-evaluators and duplicate-name guards both run before the task call, precisely to avoid spending tokens on a doomed case.

Suggestion

A declarative flag on Evaluator (e.g. requires_expected, set by the built-ins and available to @evaluator users), checked in the wrapper's pre-task guard block alongside the existing ones. The in-evaluator raise stays as the backstop for direct ev.run(ctx) calls.
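A sketch of how the suggestion could look, under the same hypothetical stand-ins as above (`Ctx`, `Evaluator`, `evaluator`, `run_case` and the `expected` kwarg are illustrative, not the library's actual signatures). The flag is declared on the evaluator, the wrapper checks it in the pre-task guard block next to the zero-evaluators and duplicate-name guards, and `Evaluator.run` keeps its own raise as the backstop:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Ctx:
    output: Any
    expected_output: Any


@dataclass
class Evaluator:
    name: str
    fn: Callable[[Ctx], Any]
    requires_expected: bool = False  # proposed declarative flag (hypothetical)

    def run(self, ctx: Ctx) -> Any:
        # Backstop: still raise for direct ev.run(ctx) calls that skip the wrapper.
        if self.requires_expected and ctx.expected_output is None:
            raise ValueError(f"{self.name} requires expected_output")
        return self.fn(ctx)


def evaluator(fn: Optional[Callable] = None, *, requires_expected: bool = False):
    # Works both as @evaluator and as @evaluator(requires_expected=True).
    def wrap(f: Callable[[Ctx], Any]) -> Evaluator:
        return Evaluator(f.__name__, f, requires_expected)
    return wrap(fn) if fn is not None else wrap


@evaluator(requires_expected=True)  # a built-in would set the flag itself
def contains_expected(ctx: Ctx) -> bool:
    return str(ctx.expected_output) in str(ctx.output)


@evaluator
def non_empty(ctx: Ctx) -> bool:
    return bool(ctx.output)


def run_case(task: Callable[..., Any], evaluators: list, **kwargs) -> dict:
    expected = kwargs.pop("expected", None)
    # Pre-task guard block: every statically knowable failure fires here.
    if not evaluators:
        raise ValueError("no evaluators")
    names = [ev.name for ev in evaluators]
    if len(names) != len(set(names)):
        raise ValueError("duplicate evaluator names")
    if expected is None:
        needing = [ev.name for ev in evaluators if ev.requires_expected]
        if needing:
            raise ValueError("expected_output is None but required by: " + ", ".join(needing))
    output = task(**kwargs)  # the expensive call, reached only if all guards pass
    ctx = Ctx(output, expected)
    return {ev.name: ev.run(ctx) for ev in evaluators}


calls = 0


def fake_llm_task(prompt: str) -> str:
    global calls
    calls += 1  # stands in for a paid LLM call
    return f"answer to {prompt}"


errors = 0
for i in range(50):  # the same 50 miswired cases, now rejected before the task
    try:
        run_case(fake_llm_task, [contains_expected, non_empty], prompt=f"q{i}")
    except ValueError:
        errors += 1

print(calls, errors)  # 0 50
print(run_case(fake_llm_task, [contains_expected, non_empty], prompt="q", expected="answer"))
```

Reporting every flagged evaluator in one error (rather than the first one found) is a design choice of this sketch; it lets a miswired suite be fixed in a single pass.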

This generalizes the pattern the two PR reviews converged on: every statically-knowable failure should fire before the expensive task call, not after.

Found during the adversarial reviews of PRs #122/#123.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
