Parent
#105
What to build
Add bimodal_correlation_threshold: float = 0.2 to NumericImputationConfig in src/dataforge_ml/imputation/_config.py. This is the minimum absolute Pearson correlation |r| that a feature must have with a bimodal column to count toward the branch 2/3 feature tally in the Bimodal Imputation Framework.
The field is intentionally separate from mcar_feature_predictability_threshold because the two thresholds answer different questions: bimodal feature-counting asks "does this feature know which cluster a row belongs to", while MCAR predictability asks "can a model predict this column's value". They deserve independent knobs.
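A minimal sketch of what the new field could look like, assuming NumericImputationConfig is a dataclass with a to_dict() method and a from_dict() classmethod. The class below is a stand-in: every other field is omitted, and the real serialisation code in _config.py may be structured differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NumericImputationConfig:
    """Stand-in for the real config class; all other fields are omitted.

    Parameters
    ----------
    bimodal_correlation_threshold : float, default 0.2
        Minimum absolute Pearson correlation |r| a feature must have with a
        bimodal column to count toward the branch 2/3 feature tally.
    """

    bimodal_correlation_threshold: float = 0.2

    def to_dict(self) -> dict[str, Any]:
        # Serialise the field under its own name.
        return {
            "bimodal_correlation_threshold": self.bimodal_correlation_threshold,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "NumericImputationConfig":
        # Fall back to the class default when the key is absent, so that
        # dicts written before this field existed still load.
        default = cls().bimodal_correlation_threshold
        return cls(
            bimodal_correlation_threshold=data.get(
                "bimodal_correlation_threshold", default
            ),
        )
```

Under these assumptions, NumericImputationConfig.from_dict(cfg.to_dict()) returns a config equal to cfg, and NumericImputationConfig.from_dict({}) carries the default of 0.2.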
The hardcoded 0.2 at _numeric_imputer.py:652 (used when computing branch 3 feature centroids during fit) must be replaced with config.bimodal_correlation_threshold in the same PR.
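To illustrate the kind of check the threshold controls, here is a hedged sketch of filtering features by absolute Pearson correlation against a bimodal column. The function name correlated_feature_indices and its signature are hypothetical, and the inclusive comparison (>=) is an assumption based on the word "minimum"; the actual code at _numeric_imputer.py:652 is not shown in this issue and may differ.

```python
import numpy as np


def correlated_feature_indices(
    features: np.ndarray, bimodal_column: np.ndarray, threshold: float
) -> list[int]:
    """Return indices of feature columns with |Pearson r| >= threshold.

    Hypothetical helper: `threshold` stands in for
    config.bimodal_correlation_threshold instead of a hardcoded 0.2.
    """
    keep = []
    for j in range(features.shape[1]):
        # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r.
        r = np.corrcoef(features[:, j], bimodal_column)[0, 1]
        # Skip NaN correlations (e.g. from a constant column).
        if np.isfinite(r) and abs(r) >= threshold:
            keep.append(j)
    return keep


# Toy data: column 0 tracks the two clusters, column 1 is uncorrelated.
bimodal = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
X = np.array(
    [
        [1.0, 1.0],
        [2.0, -1.0],
        [1.0, 0.0],
        [9.0, 1.0],
        [8.0, -1.0],
        [9.0, 0.0],
    ]
)
selected = correlated_feature_indices(X, bimodal, threshold=0.2)
```

With the toy data above, only column 0 clears the 0.2 threshold. Passing the threshold as an argument keeps the comparison in one place, so reading it from the config is a one-line change at the call site.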
Files
src/dataforge_ml/imputation/_config.py — add field, docstring entry in NumericImputationConfig, to_dict(), from_dict()
src/dataforge_ml/imputation/_numeric_imputer.py — replace hardcoded 0.2 at line 652 with config.bimodal_correlation_threshold
tests/unit/imputation/test_imputation_config.py — round-trip test for the new field
Acceptance criteria
NumericImputationConfig has bimodal_correlation_threshold: float = 0.2 with a numpy-style docstring entry (ADR-0034)
to_dict() serialises the field; from_dict() deserialises it with the correct default
The hardcoded 0.2 at _numeric_imputer.py:652 is replaced with config.bimodal_correlation_threshold
A unit test verifies that the field round-trips through to_dict()/from_dict() correctly
Blocked by
None; can start immediately.