
Scope 16 / Phase 1: Data model — BimodalStats, TailAsymmetryTag, new NumericFlag members, NumericStats fields, NumericProfileConfig thresholds #229

Description

@DEVunderdog

Parent

#105

What to build

Add all new schema elements introduced by Scope 16 to the Phase 1 data model. No computation is implemented here — this slice is the pure type / field / config foundation that every other Scope 16 slice builds on.

New types (_numeric_config.py)

  • BimodalStats dataclass with fields: dip_statistic: float, dip_p_value: float, center1: float, center2: float
  • TailAsymmetryTag StrEnum with members: Symmetric = "symmetric", RightHeavy = "right_heavy", LeftHeavy = "left_heavy"
  • NumericFlag.Bimodal = "bimodal" and NumericFlag.HighOutlierDensity = "high_outlier_density" added to the existing StrEnum
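A minimal sketch of how these additions might look in `_numeric_config.py`. It assumes Python 3.11+ `enum.StrEnum` and includes a small fallback so it also runs on older interpreters. The docstring wording is illustrative only. This issue does not list the existing `NumericFlag` members, so they appear only as a placeholder comment.

```python
from dataclasses import dataclass
from enum import Enum

try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:  # fallback for older interpreters (assumption: the project may already require 3.11+)
    class StrEnum(str, Enum):
        """Minimal stand-in: members are str instances whose str() is their value."""

        def __str__(self) -> str:
            return str(self.value)


@dataclass
class BimodalStats:
    """Result of a dip-based bimodality check on a numeric column.

    Parameters
    ----------
    dip_statistic : float
        Dip test statistic.
    dip_p_value : float
        p-value associated with ``dip_statistic``.
    center1 : float
        Centre of the first mode.
    center2 : float
        Centre of the second mode.
    """

    dip_statistic: float
    dip_p_value: float
    center1: float
    center2: float


class TailAsymmetryTag(StrEnum):
    """Label for which tail of the distribution is heavier."""

    Symmetric = "symmetric"
    RightHeavy = "right_heavy"
    LeftHeavy = "left_heavy"


class NumericFlag(StrEnum):
    """Flags raised on a numeric column."""

    # ... existing NumericFlag members (not listed in this issue) stay here ...
    Bimodal = "bimodal"
    HighOutlierDensity = "high_outlier_density"
```

Because both enums are string-valued, a flag or tag compares equal to its plain string (for example `NumericFlag.Bimodal == "bimodal"`), which keeps serialised output readable.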

NumericStats new fields

  • bimodal_stats: Optional[BimodalStats] = None
  • tail_asymmetry_tag: Optional[TailAsymmetryTag] = None
  • tail_asymmetry_ratio: Optional[float] = None
  • outlier_density: Optional[float] = None

NumericStats.to_dict() and from_dict() are extended to round-trip all four new fields. BimodalStats serialises as a nested dict (None when absent) and deserialises back into a BimodalStats instance.
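The round-trip described above can be sketched as follows. This is a trimmed sketch, not the project's implementation. `NumericStats` shows only the four new fields because the existing ones are not listed in this issue. `BimodalStats` and `TailAsymmetryTag` are restated so the snippet runs on its own, and the tag uses a `str`-mixin `Enum` in place of `StrEnum` for portability. Two further choices are assumptions: storing the tag as its string value, and using `.get()` so that dicts written before Scope 16 still load.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Optional


@dataclass
class BimodalStats:
    dip_statistic: float
    dip_p_value: float
    center1: float
    center2: float


class TailAsymmetryTag(str, Enum):
    # str-mixin Enum so the sketch runs on any Python 3; the issue specifies StrEnum.
    Symmetric = "symmetric"
    RightHeavy = "right_heavy"
    LeftHeavy = "left_heavy"


@dataclass
class NumericStats:
    # Existing NumericStats fields omitted; only the Scope 16 additions are shown.
    bimodal_stats: Optional[BimodalStats] = None
    tail_asymmetry_tag: Optional[TailAsymmetryTag] = None
    tail_asymmetry_ratio: Optional[float] = None
    outlier_density: Optional[float] = None

    def to_dict(self) -> dict[str, Any]:
        return {
            # Nested dict when present, None otherwise.
            "bimodal_stats": asdict(self.bimodal_stats) if self.bimodal_stats is not None else None,
            # Assumption: the tag is stored as its string value.
            "tail_asymmetry_tag": self.tail_asymmetry_tag.value if self.tail_asymmetry_tag is not None else None,
            "tail_asymmetry_ratio": self.tail_asymmetry_ratio,
            "outlier_density": self.outlier_density,
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> NumericStats:
        # .get() tolerates dicts that predate the new keys.
        raw_bimodal = data.get("bimodal_stats")
        raw_tag = data.get("tail_asymmetry_tag")
        return cls(
            bimodal_stats=BimodalStats(**raw_bimodal) if raw_bimodal is not None else None,
            tail_asymmetry_tag=TailAsymmetryTag(raw_tag) if raw_tag is not None else None,
            tail_asymmetry_ratio=data.get("tail_asymmetry_ratio"),
            outlier_density=data.get("outlier_density"),
        )
```

Reading the new keys with `.get()` rather than indexing is one way to satisfy the "no regressions" criterion. Previously serialised stats simply come back with the new fields set to `None`.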

NumericProfileConfig: five new threshold fields

  • bimodal_dip_p_value_threshold: float = 0.05
  • tail_asymmetry_right_threshold: float = 2.0
  • tail_asymmetry_left_threshold: float = 0.5
  • outlier_sigma_threshold: float = 3.0
  • high_outlier_density_threshold: float = 0.05

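NumericProfileConfig.to_dict() and from_dict() are extended to round-trip all five fields; keys missing from the input dict fall back to the defaults above.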
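A minimal sketch of the config side, assuming `NumericProfileConfig` is a dataclass. The existing config fields are omitted because this issue does not list them. Absent keys fall back to the dataclass defaults. Silently ignoring unknown keys is an assumption made for this sketch; the real `from_dict()` may treat them differently.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, fields
from typing import Any


@dataclass
class NumericProfileConfig:
    """Scope 16 thresholds (existing config fields omitted in this sketch)."""

    bimodal_dip_p_value_threshold: float = 0.05
    tail_asymmetry_right_threshold: float = 2.0
    tail_asymmetry_left_threshold: float = 0.5
    outlier_sigma_threshold: float = 3.0
    high_outlier_density_threshold: float = 0.05

    def to_dict(self) -> dict[str, Any]:
        # Flat dict of field name -> value.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> NumericProfileConfig:
        # Pass through only known keys; anything absent keeps its dataclass default.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})
```

For example, `NumericProfileConfig.from_dict({"tail_asymmetry_right_threshold": 3.0})` overrides one threshold and leaves the other four at their defaults.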

Acceptance criteria

  • BimodalStats dataclass exists with the four specified fields and is importable from dataforge_ml.profiling._numeric_config
  • TailAsymmetryTag StrEnum exists with Symmetric, RightHeavy, LeftHeavy
  • NumericFlag.Bimodal and NumericFlag.HighOutlierDensity are present in the NumericFlag StrEnum
  • NumericStats carries the four new fields, all defaulting to None
  • NumericStats.to_dict() serialises bimodal_stats as a nested dict when present and None otherwise; from_dict() reconstructs a BimodalStats from the nested dict
  • NumericStats.to_dict() / from_dict() round-trips tail_asymmetry_tag, tail_asymmetry_ratio, and outlier_density correctly
  • NumericProfileConfig exposes all five new threshold fields with the specified defaults
  • NumericProfileConfig.to_dict() / from_dict() round-trips all five new fields; missing keys fall back to defaults
  • Existing NumericStats and NumericProfileConfig round-trip tests pass without modification (no regressions)
  • All in-scope symbols carry numpy-style docstrings per ADR-0034

Blocked by

None — can start immediately.

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
