Context
`contains_expected` and `word_overlap` now raise when `expected_output is None` (#118, PR #123), but the check lives inside the evaluator, which runs only after the task. The wrapper already knows `expected` from the kwargs before executing `func` (via `_extract_expected`), so a ForEach of 50 miswired cases burns 50 LLM calls before each one errors.
The wrapper already has the early-guard precedent: since PR #122, the zero-evaluators and duplicate-name guards both run before the task call, precisely to avoid spending tokens on a doomed case.
Suggestion
Add a declarative flag on `Evaluator` (e.g. `requires_expected`, set by the built-ins and available to `@evaluator` users) and check it in the wrapper's pre-task guard block alongside the existing guards. The in-evaluator raise stays as the backstop for direct `ev.run(ctx)` calls.
This generalizes the pattern the two PR reviews converged on: every statically-knowable failure should fire before the expensive task call, not after.
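A minimal sketch of the idea follows. It is self-contained and does not use the project's real classes: `EvalContext`, `Evaluator`, `evaluator`, `run_case`, and `exact_match` here are hypothetical stand-ins, and popping `expected_output` from the kwargs only approximates what `_extract_expected` does. The point it shows is that the flag lets the wrapper reject a case with zero task calls, while `Evaluator.run` keeps the late check as a backstop.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class EvalContext:
    # Hypothetical context object passed to evaluators after the task runs.
    output: Any
    expected_output: Optional[Any]


@dataclass
class Evaluator:
    name: str
    fn: Callable[[EvalContext], Any]
    # Hypothetical declarative flag: "this evaluator cannot run without expected_output".
    requires_expected: bool = False

    def run(self, ctx: EvalContext) -> Any:
        # Backstop for direct ev.run(ctx) calls that bypass the wrapper's guards.
        if self.requires_expected and ctx.expected_output is None:
            raise ValueError(f"{self.name} requires expected_output, got None")
        return self.fn(ctx)


def evaluator(*, requires_expected: bool = False):
    # Hypothetical decorator so user-defined evaluators can opt into the flag.
    def decorate(fn: Callable[[EvalContext], Any]) -> Evaluator:
        return Evaluator(name=fn.__name__, fn=fn, requires_expected=requires_expected)
    return decorate


# A built-in evaluator would set the flag itself.
contains_expected = Evaluator(
    "contains_expected",
    lambda ctx: str(ctx.expected_output) in str(ctx.output),
    requires_expected=True,
)


@evaluator(requires_expected=True)
def exact_match(ctx: EvalContext) -> bool:
    return ctx.output == ctx.expected_output


def run_case(func: Callable[..., Any], evaluators: list, **kwargs: Any) -> dict:
    # Stand-in for _extract_expected: read the expected value before running func.
    expected = kwargs.pop("expected_output", None)

    # Pre-task guards: every statically knowable failure fires before func is called.
    if not evaluators:
        raise ValueError("no evaluators configured")
    names = [ev.name for ev in evaluators]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate evaluator names: {names}")
    needs_expected = [ev.name for ev in evaluators if ev.requires_expected]
    if expected is None and needs_expected:
        raise ValueError(f"expected_output is None but required by {needs_expected}")

    output = func(**kwargs)  # the expensive call (e.g. an LLM request)
    ctx = EvalContext(output=output, expected_output=expected)
    return {ev.name: ev.run(ctx) for ev in evaluators}
```

With this shape, a miswired case (no `expected_output`) raises in `run_case` before `func` is ever invoked, so a ForEach over 50 such cases makes no task calls at all.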
Found during the adversarial reviews of PRs #122/#123.