evals: declarative requires_expected on Evaluator, checked before the task runs #127

### Context

`contains_expected` and `word_overlap` now raise when `expected_output is None` (#118, PR #123), but the check lives inside the evaluator, which runs *after* the task. As a result, a ForEach of 50 miswired cases burns 50 LLM calls before each one errors, even though the wrapper already knows `expected` from the kwargs (`_extract_expected`) before it executes `func`.

The wrapper already has the early-guard precedent: since PR #122, the zero-evaluators and duplicate-name guards both run before the task call, precisely to avoid spending tokens on a doomed case.

### Suggestion

A declarative flag on `Evaluator` (e.g. `requires_expected`, set by the built-ins and available to `@evaluator` users), checked in the wrapper's pre-task guard block alongside the existing ones. The in-evaluator raise stays as the backstop for direct `ev.run(ctx)` calls.

This generalizes the pattern the two PR reviews converged on: every statically-knowable failure should fire before the expensive task call, not after.
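
A rough sketch of what the pre-task check could look like. Everything here is a simplified stand-in, not the real code: the `Evaluator` dataclass, the `run_case` wrapper, and the error messages are hypothetical, and only the `requires_expected` flag name comes from the suggestion above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Sequence


@dataclass
class Evaluator:
    """Hypothetical stand-in for the real Evaluator class."""
    name: str
    fn: Callable[[Any, Optional[str]], bool]
    requires_expected: bool = False  # the proposed declarative flag

    def run(self, output: Any, expected: Optional[str]) -> bool:
        # Backstop: direct calls that bypass the wrapper still raise.
        if self.requires_expected and expected is None:
            raise ValueError(f"{self.name} requires expected_output")
        return self.fn(output, expected)


def run_case(
    func: Callable[[], Any],
    evaluators: Sequence[Evaluator],
    expected: Optional[str] = None,
) -> Dict[str, bool]:
    """Hypothetical wrapper: all static guards fire before func() is called."""
    if not evaluators:
        raise ValueError("no evaluators configured")
    names = [ev.name for ev in evaluators]
    if len(set(names)) != len(names):
        raise ValueError("duplicate evaluator names")
    # New guard: fail fast when an evaluator declares it needs `expected`.
    needy = [ev.name for ev in evaluators if ev.requires_expected]
    if expected is None and needy:
        raise ValueError(f"expected_output is required by: {', '.join(needy)}")

    output = func()  # the expensive task call happens only after the guards
    return {ev.name: ev.run(output, expected) for ev in evaluators}


# A built-in would simply set the flag at construction time.
contains_expected = Evaluator(
    name="contains_expected",
    fn=lambda output, expected: expected in output,
    requires_expected=True,
)
```

With this shape, a miswired case raises from `run_case` before `func` is ever invoked, while `ev.run(...)` keeps its own check for callers that skip the wrapper.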

Found during the adversarial reviews of PRs #122/#123.
