Add bimodal_correlation_threshold to NumericImputationConfig #237

Description

@DEVunderdog

Parent

#105

What to build

Add bimodal_correlation_threshold: float = 0.2 to NumericImputationConfig in src/dataforge_ml/imputation/_config.py. This is the minimum absolute Pearson correlation |r| that a feature must have with a bimodal column to count toward the branch 2/3 feature tally in the Bimodal Imputation Framework.
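
A minimal sketch of how the field and its numpy-style docstring entry might look. It assumes NumericImputationConfig is a dataclass, which this issue does not confirm. The class below is a hypothetical stand-in named NumericImputationConfigSketch, with every other field omitted.

```python
from dataclasses import dataclass


@dataclass
class NumericImputationConfigSketch:
    """Hypothetical stand-in for NumericImputationConfig (other fields omitted).

    Parameters
    ----------
    bimodal_correlation_threshold : float, default=0.2
        Minimum absolute Pearson correlation ``|r|`` a feature must have
        with a bimodal column to count toward the branch 2/3 feature tally.
    """

    # Default mirrors the value that is currently hardcoded in the imputer.
    bimodal_correlation_threshold: float = 0.2
```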

The field is intentionally separate from mcar_feature_predictability_threshold because the two thresholds answer different questions. Bimodal feature-counting asks "does this feature know which cluster a row belongs to"; MCAR predictability asks "can a model predict this column's value". They deserve independent knobs.

The hardcoded 0.2 at _numeric_imputer.py:652 (used when computing branch 3 feature centroids during fit) must be replaced with config.bimodal_correlation_threshold in the same PR.
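
To illustrate the replacement, here is a rough sketch of threshold-based feature counting with the cutoff passed in rather than hardcoded. It is not the code at _numeric_imputer.py:652. The helper names pearson_r and correlated_features, the dict-of-lists input, and the inclusive >= comparison are all assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_features(features, bimodal_column, threshold):
    """Names of features whose |r| against the bimodal column meets the threshold.

    `threshold` stands in for config.bimodal_correlation_threshold; the
    inclusive comparison is an assumption, not taken from the real imputer.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson_r(values, bimodal_column)) >= threshold
    ]


# Toy data: two features track the cluster split (one inversely), one does not.
bimodal = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
features = {
    "cluster_hint": [0, 0, 0, 1, 1, 1],
    "inverse_hint": [1, 1, 1, 0, 0, 0],
    "noise": [1, -1, 0, 0, 1, -1],
}
selected = correlated_features(features, bimodal, threshold=0.2)
```

Because the comparison uses the absolute value, a strongly negative correlation counts the same as a strongly positive one.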

Files

  • src/dataforge_ml/imputation/_config.py — add field, docstring entry in NumericImputationConfig, to_dict(), from_dict()
  • src/dataforge_ml/imputation/_numeric_imputer.py — replace hardcoded 0.2 at line 652 with config.bimodal_correlation_threshold
  • tests/unit/imputation/test_imputation_config.py — round-trip test for the new field

Acceptance criteria

  • NumericImputationConfig has bimodal_correlation_threshold: float = 0.2 with a numpy-style docstring entry (ADR-0034)
  • to_dict() serialises the field; from_dict() deserialises it with the correct default
  • The hardcoded 0.2 at _numeric_imputer.py:652 is replaced with config.bimodal_correlation_threshold
  • Unit test confirms the field round-trips through to_dict() / from_dict() correctly
  • All existing tests pass
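
The round-trip criteria above could be exercised with tests along these lines. This is a sketch only: ConfigStandIn is a hypothetical replacement for NumericImputationConfig, and its to_dict() / from_dict() bodies are assumed shapes, not the project's actual serialisation code.

```python
from dataclasses import dataclass


@dataclass
class ConfigStandIn:
    """Hypothetical stand-in; the real class is NumericImputationConfig."""

    bimodal_correlation_threshold: float = 0.2

    def to_dict(self):
        return {"bimodal_correlation_threshold": self.bimodal_correlation_threshold}

    @classmethod
    def from_dict(cls, data):
        # A missing key falls back to the documented default of 0.2.
        return cls(
            bimodal_correlation_threshold=data.get("bimodal_correlation_threshold", 0.2)
        )


def test_bimodal_correlation_threshold_round_trip():
    original = ConfigStandIn(bimodal_correlation_threshold=0.35)
    restored = ConfigStandIn.from_dict(original.to_dict())
    assert restored.bimodal_correlation_threshold == 0.35


def test_bimodal_correlation_threshold_default_when_missing():
    # Older serialised configs will not contain the new key.
    restored = ConfigStandIn.from_dict({})
    assert restored.bimodal_correlation_threshold == 0.2
```

Using a non-default value (0.35) in the round-trip test guards against a from_dict() that silently ignores the key and returns the default.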

Blocked by

None — can start immediately.
