Scope 4.3: Wire _compute_fold_metrics() into Regression, KNN, and MICE diagnostic paths #227

Description

@DEVunderdog

Parent

#93

What to build

Extend _compute_regression_diagnostic, _compute_knn_diagnostics, and _compute_mice_diagnostics in _numeric_imputer.py to accumulate rmse and mae alongside r2 inside their existing k-fold CV loops. In each fold, replace the current direct _r2_score(y_true, y_pred) call with _compute_fold_metrics(y_true, y_pred) (from #226), which returns (r2, rmse, mae) for that fold. Average the per-fold rmse and mae values across folds exactly as fold_r2s is averaged today, and populate diagnostic.rmse and diagnostic.mae on the returned ImputationFitDiagnostic.
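The per-fold accumulation described above could look roughly like the sketch below. The real _compute_fold_metrics comes from #226 and is not shown in this issue, so compute_fold_metrics_sketch and average_fold_metrics are hypothetical stand-ins; the zero-variance fallback for r2 is also an assumption.

```python
import math
from statistics import fmean


def compute_fold_metrics_sketch(y_true, y_pred):
    """Hypothetical stand-in for _compute_fold_metrics: returns (r2, rmse, mae)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # Assumption: report r2 = 0.0 when the held-out target has zero variance.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


def average_fold_metrics(folds):
    """Accumulate one value per fold for each metric, then average each list."""
    fold_r2s, fold_rmses, fold_maes = [], [], []
    for y_true, y_pred in folds:
        r2, rmse, mae = compute_fold_metrics_sketch(y_true, y_pred)
        fold_r2s.append(r2)
        fold_rmses.append(rmse)
        fold_maes.append(mae)
    return fmean(fold_r2s), fmean(fold_rmses), fmean(fold_maes)
```

The three lists are kept parallel so that rmse and mae follow the same averaging path as fold_r2s.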

Regression path: the existing k-fold CV on complete rows of the joint array [col] + feat_cols gains two accumulated lists: fold_rmses and fold_maes. When the minimum-rows guard fires (n_complete < refit_r2_min_complete_rows), all three metric fields (r2_train, rmse, mae) stay None.
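A minimal sketch of the guard and the two extra lists in the regression path, under several assumptions: a single feature, a hand-rolled least-squares line instead of the project's estimator, and interleaved folds. regression_cv_metrics, _fit_line, and _fold_metrics are hypothetical names, not functions in _numeric_imputer.py.

```python
import math
from statistics import fmean


def _fold_metrics(y_true, y_pred):
    """Hypothetical stand-in for _compute_fold_metrics: (r2, rmse, mae)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0  # assumed fallback
    n = len(y_true)
    return r2, math.sqrt(ss_res / n), sum(abs(r) for r in residuals) / n


def _fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def regression_cv_metrics(target, feature, n_splits=3, min_complete_rows=6):
    # Keep only rows where both the target and the feature are observed.
    rows = [(f, t) for f, t in zip(feature, target) if f is not None and t is not None]
    if len(rows) < min_complete_rows:
        return None, None, None  # guard fires: all three metrics stay None
    fold_r2s, fold_rmses, fold_maes = [], [], []
    for k in range(n_splits):
        test = rows[k::n_splits]
        train = [r for i, r in enumerate(rows) if i % n_splits != k]
        slope, intercept = _fit_line([f for f, _ in train], [t for _, t in train])
        y_true = [t for _, t in test]
        y_pred = [slope * f + intercept for f, _ in test]
        r2, rmse, mae = _fold_metrics(y_true, y_pred)
        fold_r2s.append(r2)
        fold_rmses.append(rmse)
        fold_maes.append(mae)
    return fmean(fold_r2s), fmean(fold_rmses), fmean(fold_maes)
```

Returning the three values together keeps the None / non-None status of r2, rmse, and mae in lockstep.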

KNN path: the existing per-column k-fold CV on the scaled KNN matrix gains the same two lists per column. Inverse scaling is already applied to y_pred before _r2_score, so the same inverse-scaled values feed _compute_fold_metrics.
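The sketch below illustrates why the inverse-scaled values matter for rmse and mae: r2 is the same in either space, but the error metrics are only in the column's original units after inverse scaling. It assumes z-score scaling with a per-column mean and standard deviation, which the issue does not specify; knn_fold_metrics, inverse_scale, and fold_metrics are hypothetical names.

```python
import math
from statistics import fmean


def fold_metrics(y_true, y_pred):
    """Hypothetical stand-in for _compute_fold_metrics: (r2, rmse, mae)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0  # assumed fallback
    n = len(y_true)
    return r2, math.sqrt(ss_res / n), sum(abs(r) for r in residuals) / n


def inverse_scale(values, mean, std):
    """Undo an assumed z-score transform: x = z * std + mean."""
    return [v * std + mean for v in values]


def knn_fold_metrics(scaled_true, scaled_pred, mean, std):
    """Score one fold on inverse-scaled values so rmse/mae are in original units."""
    y_true = inverse_scale(scaled_true, mean, std)
    y_pred = inverse_scale(scaled_pred, mean, std)
    return fold_metrics(y_true, y_pred)
```

With a column standard deviation of 2, for example, the rmse computed in original units is twice the rmse computed on the scaled values, while r2 is unchanged.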

MICE path: the existing k-fold CV on the intersection of complete rows across all MICE columns gains fold_rmses and fold_maes per column. All MICE columns share the same fold splits; if the intersection is below refit_r2_min_complete_rows, all three metric fields (r2_train, rmse, mae) are None for every column in the block simultaneously.
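A rough sketch of the shared-split behaviour for a MICE block. The predictor here is a placeholder (the training-fold mean) standing in for whatever estimator the MICE path actually uses, and mice_block_metrics and _fold_metrics are hypothetical names. The point is only that the complete-row intersection and the fold splits are computed once per block, so the guard applies to every column at once.

```python
import math
from statistics import fmean


def _fold_metrics(y_true, y_pred):
    """Hypothetical stand-in for _compute_fold_metrics: (r2, rmse, mae)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0  # assumed fallback
    n = len(y_true)
    return r2, math.sqrt(ss_res / n), sum(abs(r) for r in residuals) / n


def mice_block_metrics(columns, n_splits=2, min_complete_rows=4):
    """columns maps column name -> list of values, with None for missing."""
    names = list(columns)
    n_rows = len(columns[names[0]])
    # Intersection of complete rows across every column in the block.
    complete = [i for i in range(n_rows) if all(columns[c][i] is not None for c in names)]
    if len(complete) < min_complete_rows:
        # Guard fires once for the whole block: every column gets None metrics.
        return {c: (None, None, None) for c in names}
    folds = [complete[k::n_splits] for k in range(n_splits)]  # shared splits
    results = {}
    for c in names:
        per_fold = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in complete if i not in held_out]
            # Placeholder predictor: training-fold mean, not the real estimator.
            baseline = fmean(columns[c][i] for i in train_idx)
            y_true = [columns[c][i] for i in test_idx]
            per_fold.append(_fold_metrics(y_true, [baseline] * len(test_idx)))
        r2s, rmses, maes = zip(*per_fold)
        results[c] = (fmean(r2s), fmean(rmses), fmean(maes))
    return results
```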

No changes to scalar strategy paths (Mean, Median, Mode, Constant, Dropped, Passthrough, MNAR) — diagnostic remains None for those.

Acceptance criteria

  • _compute_regression_diagnostic accumulates fold_rmses and fold_maes via _compute_fold_metrics() and sets diagnostic.rmse and diagnostic.mae as the mean across folds
  • _compute_knn_diagnostics does the same per KNN column, using inverse-scaled predictions
  • _compute_mice_diagnostics does the same per MICE column from the shared fold splits; all MICE columns have the same None / non-None status on r2_train, rmse, and mae simultaneously
  • diagnostic.rmse and diagnostic.mae are non-negative floats for model-based columns with sufficient complete rows
  • diagnostic.rmse and diagnostic.mae are None when n_complete < refit_r2_min_complete_rows
  • diagnostic.rmse and diagnostic.mae are None for Mean, Median, Mode, Constant, Dropped, Passthrough, and MNAR columns (no change to scalar paths)
  • Integration test: after NumericImputer.fit(), diagnostic.rmse and diagnostic.mae are non-negative floats for KNN, Regression, and MICE columns
  • Integration test: diagnostic.rmse and diagnostic.mae are None for all scalar strategy columns
  • Integration test: diagnostic.rmse and diagnostic.mae are None when complete rows < refit_r2_min_complete_rows
  • Integration test: for a MICE block, all columns share the same None / non-None status across all three metric fields
  • FittedImputer.to_dict() / from_dict() round-trip preserves rmse and mae on a fitted imputer with model-based columns
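The round-trip criterion could be exercised along the lines of the sketch below. The real ImputationFitDiagnostic and FittedImputer classes are not shown in this issue, so DiagnosticSketch is a hypothetical stand-in with only the three metric fields, and serialising through JSON is an assumption about how the dict is persisted.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DiagnosticSketch:
    """Hypothetical stand-in for ImputationFitDiagnostic (metric fields only)."""
    r2_train: Optional[float] = None
    rmse: Optional[float] = None
    mae: Optional[float] = None

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, payload):
        # .get() keeps older payloads without rmse/mae loadable as None.
        return cls(
            r2_train=payload.get("r2_train"),
            rmse=payload.get("rmse"),
            mae=payload.get("mae"),
        )


def round_trip(diagnostic):
    """Serialise to JSON text and back, as a persisted imputer might be."""
    return DiagnosticSketch.from_dict(json.loads(json.dumps(diagnostic.to_dict())))
```

Reading the new keys with a None default is one way to stay compatible with dicts written before rmse and mae existed; whether the project needs that is not stated here.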

Blocked by
