Extend _compute_regression_diagnostic, _compute_knn_diagnostics, and _compute_mice_diagnostics in _numeric_imputer.py to accumulate rmse and mae alongside r2 inside their existing k-fold CV loops. Replace the current direct _r2_score(y_true, y_pred) call in each fold with _compute_fold_metrics(y_true, y_pred) (from #226), which returns (r2, rmse, mae) for that fold. Average rmse_fold and mae_fold across folds exactly as fold_r2s is averaged today. Populate diagnostic.rmse and diagnostic.mae on the returned ImputationFitDiagnostic.
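The accumulation described above can be sketched roughly as follows. This is a minimal illustration, not the code from #226: `compute_fold_metrics` is a stand-in that only mirrors the `(r2, rmse, mae)` return shape stated in this issue, and `average_fold_metrics` is a hypothetical helper name. Returning NaN for a constant-target fold is an assumed convention.

```python
import numpy as np

def compute_fold_metrics(y_true, y_pred):
    """Stand-in for _compute_fold_metrics: (r2, rmse, mae) for one fold."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    # R^2 is undefined for a constant target; NaN is an assumed convention here.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    return r2, rmse, mae

def average_fold_metrics(folds):
    """folds: iterable of (y_true, y_pred) pairs, one pair per CV fold."""
    fold_r2s, fold_rmses, fold_maes = [], [], []
    for y_true, y_pred in folds:
        r2, rmse_fold, mae_fold = compute_fold_metrics(y_true, y_pred)
        fold_r2s.append(r2)
        fold_rmses.append(rmse_fold)
        fold_maes.append(mae_fold)
    # All three lists are reduced the same way: a plain mean across folds.
    return (float(np.mean(fold_r2s)),
            float(np.mean(fold_rmses)),
            float(np.mean(fold_maes)))
```

Note that averaging per-fold RMSE values is not the same as the RMSE over pooled residuals; the sketch follows the issue's wording and averages per fold.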
Regression path: the existing k-fold CV on complete rows of the joint array [col] + feat_cols gains two accumulated lists, fold_rmses and fold_maes. When the minimum-rows guard fires (n_complete < refit_r2_min_complete_rows), all three metric fields (r2_train, rmse, and mae) stay None.
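A rough sketch of the guard and the per-fold lists for the regression path is shown below. It assumes an ordinary least-squares fit via NumPy as a placeholder for whatever estimator the imputer actually uses. The function name, the `min_complete_rows` parameter (standing in for refit_r2_min_complete_rows), the fold count, and the seed are all hypothetical.

```python
import numpy as np

def regression_cv_metrics(X, y, min_complete_rows=10, n_splits=5, seed=0):
    """Fold-averaged (r2, rmse, mae), or (None, None, None) if too few rows.

    Assumes min_complete_rows >= n_splits so that no fold is empty.
    """
    # Complete rows of the joint array: target and all features observed.
    complete = ~(np.isnan(X).any(axis=1) | np.isnan(y))
    Xc, yc = X[complete], y[complete]
    n_complete = len(yc)
    if n_complete < min_complete_rows:
        return None, None, None  # guard fires: every metric stays None

    idx = np.random.default_rng(seed).permutation(n_complete)
    fold_r2s, fold_rmses, fold_maes = [], [], []
    for test_idx in np.array_split(idx, n_splits):
        train_idx = np.setdiff1d(idx, test_idx)
        # Placeholder model: least squares with an intercept column.
        A_train = np.column_stack([np.ones(len(train_idx)), Xc[train_idx]])
        coef, *_ = np.linalg.lstsq(A_train, yc[train_idx], rcond=None)
        A_test = np.column_stack([np.ones(len(test_idx)), Xc[test_idx]])
        resid = yc[test_idx] - A_test @ coef
        ss_tot = np.sum((yc[test_idx] - yc[test_idx].mean()) ** 2)
        fold_r2s.append(1.0 - np.sum(resid ** 2) / ss_tot if ss_tot > 0 else np.nan)
        fold_rmses.append(np.sqrt(np.mean(resid ** 2)))
        fold_maes.append(np.mean(np.abs(resid)))
    return (float(np.mean(fold_r2s)),
            float(np.mean(fold_rmses)),
            float(np.mean(fold_maes)))
```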
KNN path: the existing per-column k-fold CV on the scaled KNN matrix gains the same two lists per column. Inverse-scaling is already applied to y_pred before _r2_score — the same inverse-scaled values feed _compute_fold_metrics.
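To illustrate why the inverse-scaled values matter for the new fields, here is a small worked example. It assumes a standardising scaler (subtract the mean, divide by the standard deviation), which this issue does not specify; the `mean` and `std` values and the helper name `rmse_mae` are made up for illustration. Under that assumption, RMSE and MAE computed in scaled space are smaller by a factor of `std` than the same errors in the column's original units.

```python
import numpy as np

def rmse_mae(y_true, y_pred):
    """Return (rmse, mae) for one fold."""
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(resid ** 2))), float(np.mean(np.abs(resid)))

# Hypothetical scaler parameters for one KNN column (assumed standardisation).
mean, std = 50.0, 10.0

y_true = np.array([40.0, 50.0, 65.0])        # held-out values, original units
y_true_scaled = (y_true - mean) / std
y_pred_scaled = np.array([-0.8, 0.1, 1.2])   # predictions in scaled space

# Inverse-scale the predictions before computing any metric.
y_pred = y_pred_scaled * std + mean

rmse_scaled, mae_scaled = rmse_mae(y_true_scaled, y_pred_scaled)
rmse_orig, mae_orig = rmse_mae(y_true, y_pred)
```

Here `mae_orig` is 2.0 in the column's units while `mae_scaled` is 0.2, so only the inverse-scaled values give errors that are comparable across paths.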
MICE path: the existing k-fold CV on the intersection of complete rows across all MICE columns gains fold_rmses and fold_maes per column. All MICE columns share the same fold splits; if the intersection has fewer than refit_r2_min_complete_rows rows, all three fields are None for every column in the block simultaneously.
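The shared-split and all-or-nothing behaviour might look like the sketch below. It is not the MICE implementation: the per-column predictor is a deliberately trivial placeholder (the training-fold mean), and `mice_block_metrics`, `min_complete_rows`, the fold count, and the seed are hypothetical names and defaults chosen for illustration.

```python
import numpy as np

def mice_block_metrics(block, min_complete_rows=10, n_splits=5, seed=0):
    """block: dict mapping column name -> 1-D float array (NaN = missing)."""
    cols = list(block)
    data = np.column_stack([np.asarray(block[c], dtype=float) for c in cols])
    # Intersection of complete rows across every column in the block.
    data = data[~np.isnan(data).any(axis=1)]
    n_complete = data.shape[0]
    if n_complete < min_complete_rows:
        # One guard for the whole block: every column gets None for all fields.
        return {c: {"r2_train": None, "rmse": None, "mae": None} for c in cols}

    idx = np.random.default_rng(seed).permutation(n_complete)
    splits = np.array_split(idx, n_splits)  # computed once, reused per column
    result = {}
    for j, c in enumerate(cols):
        fold_r2s, fold_rmses, fold_maes = [], [], []
        for test_idx in splits:
            train_idx = np.setdiff1d(idx, test_idx)
            y_true = data[test_idx, j]
            # Placeholder predictor: training-fold mean of this column.
            y_pred = np.full(len(test_idx), data[train_idx, j].mean())
            resid = y_true - y_pred
            ss_tot = np.sum((y_true - y_true.mean()) ** 2)
            fold_r2s.append(1.0 - np.sum(resid ** 2) / ss_tot if ss_tot > 0 else np.nan)
            fold_rmses.append(np.sqrt(np.mean(resid ** 2)))
            fold_maes.append(np.mean(np.abs(resid)))
        result[c] = {"r2_train": float(np.mean(fold_r2s)),
                     "rmse": float(np.mean(fold_rmses)),
                     "mae": float(np.mean(fold_maes))}
    return result
```

Because the guard is evaluated once on the shared intersection, no column in the block can end up with a different None / non-None status from its neighbours.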
No changes to scalar strategy paths (Mean, Median, Mode, Constant, Dropped, Passthrough, MNAR) — diagnostic remains None for those.
Acceptance criteria
_compute_regression_diagnostic accumulates fold_rmses and fold_maes via _compute_fold_metrics() and sets diagnostic.rmse and diagnostic.mae as the mean across folds
_compute_knn_diagnostics does the same per KNN column, using inverse-scaled predictions
_compute_mice_diagnostics does the same per MICE column from the shared fold splits; all MICE columns have the same None / non-None status on r2_train, rmse, and mae simultaneously
diagnostic.rmse and diagnostic.mae are non-negative floats for model-based columns with sufficient complete rows
diagnostic.rmse and diagnostic.mae are None when n_complete < refit_r2_min_complete_rows
diagnostic.rmse and diagnostic.mae are None for Mean, Median, Mode, Constant, Dropped, and Passthrough columns (no change to scalar paths)
Integration test: after NumericImputer.fit(), diagnostic.rmse and diagnostic.mae are non-negative floats for KNN, Regression, and MICE columns
Integration test: diagnostic.rmse and diagnostic.mae are None for all scalar strategy columns
Integration test: diagnostic.rmse and diagnostic.mae are None when complete rows < refit_r2_min_complete_rows
Integration test: for a MICE block, all columns share the same None / non-None status across all three metric fields
FittedImputer.to_dict() / from_dict() round-trip preserves rmse and mae on a fitted imputer with model-based columns
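The round-trip criterion above could be checked along the following lines. `DiagnosticSketch` is a hypothetical stand-in for ImputationFitDiagnostic that carries only the three metric fields named in this issue; the real class and the real FittedImputer serialisation format may differ, and passing the payload through JSON is an assumption made only to exercise serialisation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class DiagnosticSketch:
    """Hypothetical stand-in holding only the three metric fields."""
    r2_train: Optional[float] = None
    rmse: Optional[float] = None
    mae: Optional[float] = None

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, payload):
        # .get() tolerates payloads written before rmse and mae existed.
        return cls(
            r2_train=payload.get("r2_train"),
            rmse=payload.get("rmse"),
            mae=payload.get("mae"),
        )

def round_trip(diag):
    """Serialise through JSON and rebuild, as a persistence layer might."""
    return DiagnosticSketch.from_dict(json.loads(json.dumps(diag.to_dict())))
```

A test would then assert that `round_trip(d) == d` both for a diagnostic with float metrics and for one where all three fields are None.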
Parent
#93