feat: dynamic max_iter and tol for MICE via seven-signal framework #159

@DEVunderdog

Parent

#91

What to build

Replace the hardcoded max_iter=10 and default tol in the MICE IterativeImputer with values computed from a seven-signal framework, aggregated conservatively across all MICE columns. Add post-fit convergence monitoring: if n_iter_ == max_iter, append a convergence warning to every MICE column's ColumnImputationRecord.signals.
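The post-fit check described above could look roughly like the following. This is a minimal sketch, not the project's implementation: the `ColumnImputationRecord` dataclass here is a hypothetical stand-in that models only a `signals` list of strings, and the function name `append_convergence_warning` and the message text are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnImputationRecord:
    """Hypothetical stand-in; the real record likely has more fields."""
    column: str
    signals: list = field(default_factory=list)


def append_convergence_warning(records, n_iter, max_iter):
    """Append a warning to every MICE record when the iteration cap was hit.

    Returns True if a warning was appended, False otherwise.
    """
    if n_iter != max_iter:
        return False
    message = (
        f"MICE convergence warning: n_iter_ ({n_iter}) "
        f"reached max_iter ({max_iter})"
    )
    # Every MICE column gets the warning, not only the first or last one.
    for record in records:
        record.signals.append(message)
    return True


records = [ColumnImputationRecord("age"), ColumnImputationRecord("income")]
append_convergence_warning(records, n_iter=25, max_iter=25)
for record in records:
    print(record.column, record.signals)
```

With scikit-learn, the call would presumably come right after fitting, passing the fitted imputer's `n_iter_` attribute and the `max_iter` it was constructed with.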

The seven signals and their MICE-specific aggregation rules:

  1. NonlinearityTag — most complex tag across MICE columns; ComplexNonlinear adds to base max_iter
  2. Feature matrix missingness fraction — fraction of NaN cells across the full MICE matrix
  3. R² strength — minimum R²_linear across MICE columns (worst-case convergence speed); high minimum R² reduces max_iter
  4. Inter-feature correlation — maximum pairwise Pearson |r| among MICE columns (from CorrelationProfiler); high correlation increases max_iter
  5. Complete row fraction — fraction of rows with no NaN across all MICE columns; low fraction increases max_iter
  6. Scale-relative tol: tol = min_iqr * scaling_factor, where min_iqr is the minimum IQR across MICE columns
  7. ComplexNonlinear tol tightening — tighter tol when most-complex tag is ComplexNonlinear
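Signals 2 and 5 are easy to confuse, so here is a small illustration of how they differ. It is a sketch under assumptions: the MICE matrix is represented as a plain list of rows with `float("nan")` for missing cells, and the function names are hypothetical.

```python
import math


def missingness_fraction(rows):
    """Signal 2: fraction of NaN cells across the full MICE matrix."""
    cells = [value for row in rows for value in row]
    return sum(math.isnan(value) for value in cells) / len(cells)


def complete_row_fraction(rows):
    """Signal 5: fraction of rows with no NaN in any MICE column."""
    complete = sum(not any(math.isnan(value) for value in row) for row in rows)
    return complete / len(rows)


nan = float("nan")
matrix = [
    [1.0, 2.0, 3.0],
    [nan, 2.5, 3.5],
    [1.5, nan, 4.0],
    [2.0, 3.0, nan],
]
print(missingness_fraction(matrix))   # 3 of 12 cells are NaN -> 0.25
print(complete_row_fraction(matrix))  # 1 of 4 rows is complete -> 0.25
```

Here only a quarter of the cells are missing, yet only a quarter of the rows are fully observed, which is why the two fractions are tracked as separate signals.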

Implement as _compute_mice_max_iter and _compute_mice_tol helpers, parallel to the existing _compute_max_iter / _compute_tol used by the Regression strategy.

Acceptance criteria

  • IterativeImputer for the MICE block is constructed with dynamically computed max_iter and tol (not hardcoded 10 / sklearn default)
  • _compute_mice_max_iter applies the max_iter signals (1 to 5) with MICE-specific aggregation (min R², max inter-column |r|, full-matrix missingness); signals 6 and 7 are covered by _compute_mice_tol
  • _compute_mice_tol uses min_iqr across MICE columns, with tighter scaling for ComplexNonlinear
  • Post-fit: if n_iter_ == max_iter, a convergence warning is appended to every MICE column's signals (not only the first or last)
  • Unit test: a MICE block engineered to hit max_iter before convergence produces a convergence warning signal on all MICE column records
  • Unit test: a well-conditioned MICE block that converges early does not produce a convergence warning
  • Numpy-style docstrings on _compute_mice_max_iter and _compute_mice_tol

Blocked by
