[Feature]: Support automatic custom metrics registration in LocalEvalSampler and 'adk optimize' CLI #6177

@chaotingxuan

🔴 Required Information

Is your feature request related to a specific problem?

Yes. When custom metrics are defined in `eval_config.custom_metrics`, the `adk eval` CLI command registers them in `DEFAULT_METRIC_EVALUATOR_REGISTRY` via inline logic in `cli_eval`.
This registration logic is absent from `cli_optimize` and the underlying `LocalEvalSampler`. As a result, developers who use custom metrics in prompt optimization workflows hit two problems:

  1. Running `adk optimize agent/ agent/sampler_config.json` via the CLI fails with a `KeyError` for the unregistered metric.
  2. Initializing `LocalEvalSampler` programmatically in custom Python scripts or Jupyter notebooks forces developers to write boilerplate that manually registers their custom evaluators in `DEFAULT_METRIC_EVALUATOR_REGISTRY` before running the optimizer.
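
The failure mode can be illustrated with a toy model. The sketch below does not use ADK: `ToyRegistry`, `run_eval_path`, and `run_optimize_path` are hypothetical stand-ins, and the only behavior taken from this report is that looking up an unregistered metric raises `KeyError`.

```python
# Toy model of the gap described above. Nothing here is ADK code:
# ToyRegistry, run_eval_path and run_optimize_path are hypothetical stand-ins.


class ToyRegistry:
  """Maps metric names to evaluator objects, like a minimal registry."""

  def __init__(self):
    self._evaluators = {}

  def register(self, metric_name, evaluator):
    self._evaluators[metric_name] = evaluator

  def get(self, metric_name):
    # A plain dict lookup raises KeyError for names never registered.
    return self._evaluators[metric_name]


def run_eval_path(registry, custom_metrics):
  """Mimics a path that registers custom metrics before using them."""
  for name, evaluator in custom_metrics.items():
    registry.register(name, evaluator)
  return [registry.get(name) for name in custom_metrics]


def run_optimize_path(registry, custom_metrics):
  """Mimics a path that skips registration and looks metrics up directly."""
  return [registry.get(name) for name in custom_metrics]


custom_metrics = {"omission_rate": object()}

eval_result = run_eval_path(ToyRegistry(), custom_metrics)

try:
  run_optimize_path(ToyRegistry(), custom_metrics)
  optimize_error = None
except KeyError as err:
  optimize_error = err
```

In this model the only difference between the two paths is the registration loop, which is the piece the request asks to share.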

Describe the Solution You'd Like

We propose centralizing the custom metric registration logic and moving it directly into LocalEvalSampler.
Requested Changes:

  1. Shared Helper (`eval_config.py`): Create a helper function `register_custom_metrics_from_config(eval_config: EvalConfig)` so the registration logic lives in one place instead of being duplicated across the codebase.
  2. Sampler Integration (`local_eval_sampler.py`): Invoke the shared helper inside `LocalEvalSampler.__init__`. Any sampler instantiated with an `EvalConfig` containing custom metrics then registers them before `LocalEvalService` is invoked.
  3. CLI Refactor (`cli_tools_click.py`): Replace the inline registration loop in `cli_eval` with a call to the new helper.

Impact on your work

This feature directly affects our automated prompt engineering workflows for healthcare and ambient clinical note generation (e.g., a healthcare customer's Ambient Scribe project). We use custom clinical evaluation metrics (measuring hallucination and omission rates) to optimize scribe agent prompts. Currently, we have to maintain custom runner scripts and local monkey-patches to work around the CLI limitation. Native custom metric support in `adk optimize` would let us run automated prompt tuning directly in our CI/CD pipelines.

Willingness to contribute

Are you interested in implementing this feature yourself or submitting a PR?
(Yes/No) Yes


🟡 Recommended Information

Describe Alternatives You've Considered

  • Patching cli_optimize only: We considered adding the registration loop directly to cli_optimize in cli_tools_click.py (matching cli_eval). However, this is architecturally suboptimal because it leaves LocalEvalSampler dependent on the CLI layer. If a developer initializes LocalEvalSampler programmatically in Python/Jupyter, cli_optimize is bypassed, and custom metrics would still fail without manual boilerplate registration.

Proposed API / Implementation

**1. Shared Helper (`google/adk/evaluation/eval_config.py`):**
```python
from typing import Optional

# Assumes DEFAULT_METRIC_EVALUATOR_REGISTRY and _CustomMetricEvaluator are
# importable in this module; the proposal does not show their import paths.


def register_custom_metrics_from_config(
    eval_config: Optional[EvalConfig],
) -> None:
  """Registers custom metrics defined in EvalConfig into the default registry."""
  if not eval_config or not eval_config.custom_metrics:
    return
  metric_evaluator_registry = DEFAULT_METRIC_EVALUATOR_REGISTRY
  for metric_name, config in eval_config.custom_metrics.items():
    if config.metric_info:
      # Copy so the config's own metric_info is not mutated.
      metric_info = config.metric_info.model_copy()
      metric_info.metric_name = metric_name
    else:
      # Function-level import, kept from the proposal.
      from ..cli.cli_eval import get_default_metric_info
      metric_info = get_default_metric_info(
          metric_name=metric_name, description=config.description
      )
    metric_evaluator_registry.register_evaluator(
        metric_info, _CustomMetricEvaluator
    )
```
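
To make the helper's two branches easier to reason about, here is a self-contained sketch that mirrors its control flow with stand-in types. `MetricInfo`, `CustomMetricConfig`, `ToyRegistry`, and `ToyEvaluator` are hypothetical simplifications (plain dataclasses rather than ADK's models), and `dataclasses.replace` stands in for the copy-then-rename step.

```python
# Stand-in sketch of the helper's branching. All types here are hypothetical
# simplifications; the real code uses ADK's own models and registry.
import dataclasses
from typing import Optional


@dataclasses.dataclass
class MetricInfo:
  metric_name: str
  description: str = ""


@dataclasses.dataclass
class CustomMetricConfig:
  description: str = ""
  metric_info: Optional[MetricInfo] = None


class ToyRegistry:

  def __init__(self):
    self.registered = {}

  def register_evaluator(self, metric_info, evaluator_cls):
    # Later registrations under the same name overwrite earlier ones here;
    # whether the real registry behaves this way is an assumption.
    self.registered[metric_info.metric_name] = (metric_info, evaluator_cls)


class ToyEvaluator:
  pass


def register_custom_metrics(custom_metrics, registry):
  """Registers each custom metric, keyed by its name in the config."""
  if not custom_metrics:
    return
  for metric_name, config in custom_metrics.items():
    if config.metric_info:
      # Copy so the caller's object is untouched, then force the name to
      # match the config key.
      metric_info = dataclasses.replace(
          config.metric_info, metric_name=metric_name
      )
    else:
      metric_info = MetricInfo(
          metric_name=metric_name, description=config.description
      )
    registry.register_evaluator(metric_info, ToyEvaluator)


registry = ToyRegistry()
original = MetricInfo(metric_name="placeholder", description="custom info")
metrics = {
    "hallucination_rate": CustomMetricConfig(metric_info=original),
    "omission_rate": CustomMetricConfig(description="missed facts"),
}
register_custom_metrics(metrics, registry)
register_custom_metrics(metrics, registry)  # Second call is harmless here.
```

Because a sampler constructor may run several times in one process, it may be worth confirming that the real registry tolerates repeated registration of the same metric name, as the toy registry does.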

**2. Sampler Integration (`google/adk/optimization/local_eval_sampler.py`):**
```python
from ..evaluation.eval_config import register_custom_metrics_from_config


class LocalEvalSampler(Sampler[UnstructuredSamplingResult]):
  def __init__(
      self,
      config: LocalEvalSamplerConfig,
      eval_sets_manager: EvalSetsManager,
  ):
    self._config = config
    self._eval_sets_manager = eval_sets_manager

    # Automatically register custom metrics if present
    if self._config.eval_config:
      register_custom_metrics_from_config(self._config.eval_config)

    # ... existing init logic ...
```
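
The effect of constructor-time registration on programmatic use can be sketched with a toy sampler. Everything below is hypothetical: `REGISTRY` plays the role of a process-wide default registry, `ToySampler` stands in for the sampler, and the eval config is a plain dict rather than an `EvalConfig`.

```python
# Hypothetical stand-ins illustrating constructor-time registration.
REGISTRY = {}  # Plays the role of a process-wide default registry.


def register_custom_metrics_from_config(eval_config):
  """Toy helper: copies custom metric evaluators into the shared registry."""
  if not eval_config or not eval_config.get("custom_metrics"):
    return
  for metric_name, evaluator in eval_config["custom_metrics"].items():
    REGISTRY[metric_name] = evaluator


class ToySampler:

  def __init__(self, eval_config=None):
    self._eval_config = eval_config
    # Registration happens here, so no CLI entry point is required.
    register_custom_metrics_from_config(eval_config)

  def score(self, metric_name, candidate):
    return REGISTRY[metric_name](candidate)


# Programmatic use, e.g. from a script or notebook, with no CLI involved.
sampler = ToySampler(
    eval_config={"custom_metrics": {"length": lambda text: len(text)}}
)
score = sampler.score("length", "draft note")
plain_sampler = ToySampler()  # No eval config: registration is skipped.
```

Placing the call in the constructor means the CLI and notebook paths share one registration point, at the cost of the constructor mutating global state.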

Additional Context

We have created and verified a local patch using this logic in our development environment. With it, our custom clinical metrics run via `adk optimize` without errors. We have an implementation and a unit testing plan ready, and we will submit a Pull Request (PR) as soon as this issue is approved.

Metadata

Labels

eval: [Component] This issue is related to evaluation