Parent
#91
What to build
For large MICE blocks, compute n_nearest_features from value-level Pearson correlations already available in the CorrelationProfiler output, rather than leaving it unset (which forces IterativeImputer to use all predictors). For small blocks, leave n_nearest_features=None.
Algorithm:
- If the MICE block has ≤ mice_n_nearest_features_min_cols columns: set n_nearest_features=None (use all predictors)
- Otherwise: for each MICE column, count how many other MICE columns have |Pearson r| > mice_correlation_threshold (from CorrelationProfiler). Take the median count across all MICE columns, cap at mice_max_nearest_features, and pass the result as n_nearest_features
- Record the decision (and the computed value, or "all predictors") in every MICE column's signals
All three threshold fields (mice_n_nearest_features_min_cols, mice_max_nearest_features, mice_correlation_threshold) come from NumericImputationConfig (added in #157).
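A minimal sketch of the selection logic described above, to make the branches concrete. It is not the actual implementation: `MiceThresholds`, `choose_n_nearest_features`, `signal_text`, and the pair-keyed `pearson` mapping are hypothetical stand-ins for NumericImputationConfig and the CorrelationProfiler output, whose real shapes are not shown in this issue. Two behaviours are assumptions as well: a missing pair is treated as uncorrelated, and the median is truncated to an integer and floored at 1, since the issue does not say how a fractional or zero median should be handled.

```python
from dataclasses import dataclass
from statistics import median
from typing import Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class MiceThresholds:
    """Hypothetical stand-in for the three NumericImputationConfig fields."""
    mice_n_nearest_features_min_cols: int
    mice_max_nearest_features: int
    mice_correlation_threshold: float


def _r(pearson: Mapping[Tuple[str, str], float], a: str, b: str) -> float:
    # Look the pair up in either order; assumption: a missing pair means r = 0.
    return pearson.get((a, b), pearson.get((b, a), 0.0))


def choose_n_nearest_features(
    mice_columns: Sequence[str],
    pearson: Mapping[Tuple[str, str], float],
    cfg: MiceThresholds,
) -> Optional[int]:
    """Return the value to pass as n_nearest_features (None = all predictors)."""
    # Small block: keep every predictor.
    if len(mice_columns) <= cfg.mice_n_nearest_features_min_cols:
        return None

    # Per MICE column, count the other MICE columns above the threshold.
    counts = [
        sum(
            1
            for other in mice_columns
            if other != col
            and abs(_r(pearson, col, other)) > cfg.mice_correlation_threshold
        )
        for col in mice_columns
    ]

    # Assumption: truncate the median to an int and never go below 1.
    n = max(1, int(median(counts)))
    return min(n, cfg.mice_max_nearest_features)


def signal_text(n_nearest_features: Optional[int]) -> str:
    """Text recorded in each MICE column's signals (wording is illustrative)."""
    if n_nearest_features is None:
        return "all predictors"
    return f"n_nearest_features={n_nearest_features}"


# Toy example: a, b, c are mutually correlated; d is not.
toy_pearson = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("a", "d"): 0.1,
    ("b", "c"): 0.7, ("b", "d"): 0.2, ("c", "d"): 0.05,
}
toy_cfg = MiceThresholds(
    mice_n_nearest_features_min_cols=3,
    mice_max_nearest_features=3,
    mice_correlation_threshold=0.5,
)
# Counts are [2, 2, 2, 0], median 2, below the cap of 3, so the result is 2.
toy_result = choose_n_nearest_features(["a", "b", "c", "d"], toy_pearson, toy_cfg)
```

Keeping the decision in a pure function like this would let the acceptance criteria be checked without fitting an imputer. The returned value (or None) is what would then be passed to IterativeImputer and recorded on every MICE column.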
Acceptance criteria
- MICE blocks with > mice_n_nearest_features_min_cols columns have n_nearest_features computed from CorrelationProfiler value-level Pearson correlations
- MICE blocks with ≤ mice_n_nearest_features_min_cols columns have n_nearest_features=None
- n_nearest_features is capped at mice_max_nearest_features
- Only columns with |Pearson r| > mice_correlation_threshold count as informative predictors
- signals records the n_nearest_features decision (value used, or "all predictors used — block below min_cols threshold")
- A block with > mice_n_nearest_features_min_cols columns produces an n_nearest_features signal entry on all MICE column records
- For a block with ≤ mice_n_nearest_features_min_cols columns, n_nearest_features=None is passed to IterativeImputer
Blocked by