Scope 16 / Phase 1: Compute tail asymmetry signal in NumericProfiler #231

Description

@DEVunderdog

Parent

#105

What to build

Compute tail_asymmetry_ratio and classify TailAsymmetryTag in NumericProfiler from the already-available PercentileSnapshot. No additional data pass over the column is required.

Computation (_numeric_profiler.py)

For each numeric column, after percentiles are computed:

tail_asymmetry_ratio = (p99 − p95) / (p5 − p1)
  • When p5 == p1 (flat left tail — denominator is zero), store tail_asymmetry_ratio = None and leave tail_asymmetry_tag = None. Do not raise.
  • Otherwise, classify using NumericProfileConfig thresholds:
    • ratio > tail_asymmetry_right_threshold → TailAsymmetryTag.RightHeavy
    • ratio < tail_asymmetry_left_threshold → TailAsymmetryTag.LeftHeavy
    • otherwise → TailAsymmetryTag.Symmetric
  • Store both values on NumericStats.tail_asymmetry_ratio and NumericStats.tail_asymmetry_tag
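
A minimal sketch of the computation above, for discussion only. The field names on PercentileSnapshot (p1, p5, p95, p99), the enum values, the default thresholds (2.0 and 0.5), and the helper name compute_tail_asymmetry are all assumptions made for illustration; the real classes in the codebase may look different.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class TailAsymmetryTag(Enum):
    # Member names follow the issue; the string values are assumed.
    RightHeavy = "right_heavy"
    LeftHeavy = "left_heavy"
    Symmetric = "symmetric"


@dataclass(frozen=True)
class PercentileSnapshot:
    # Assumed field names; only the percentiles used by the ratio are shown.
    p1: float
    p5: float
    p95: float
    p99: float


@dataclass(frozen=True)
class NumericProfileConfig:
    # Placeholder defaults, not taken from the issue.
    tail_asymmetry_right_threshold: float = 2.0
    tail_asymmetry_left_threshold: float = 0.5


def compute_tail_asymmetry(
    snapshot: PercentileSnapshot,
    config: NumericProfileConfig,
) -> Tuple[Optional[float], Optional[TailAsymmetryTag]]:
    """Return (tail_asymmetry_ratio, tail_asymmetry_tag) from percentiles only."""
    left_spread = snapshot.p5 - snapshot.p1
    if left_spread == 0:
        # Flat left tail: the ratio is undefined, so store None for both.
        return None, None

    ratio = (snapshot.p99 - snapshot.p95) / left_spread
    if ratio > config.tail_asymmetry_right_threshold:
        tag = TailAsymmetryTag.RightHeavy
    elif ratio < config.tail_asymmetry_left_threshold:
        tag = TailAsymmetryTag.LeftHeavy
    else:
        tag = TailAsymmetryTag.Symmetric
    return ratio, tag


config = NumericProfileConfig()
ratio, tag = compute_tail_asymmetry(PercentileSnapshot(1.0, 2.0, 10.0, 18.0), config)
print(ratio, tag)
```

Returning a pair keeps the None case in one place: the caller can assign both NumericStats fields from the result without a separate zero-denominator check.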

Acceptance criteria

  • A column with a disproportionately heavy right extreme tail (p99 − p95 >> p5 − p1) stores TailAsymmetryTag.RightHeavy
  • A column with a disproportionately heavy left extreme tail stores TailAsymmetryTag.LeftHeavy
  • A column with balanced extreme tails stores TailAsymmetryTag.Symmetric
  • A column where p5 == p1 stores tail_asymmetry_ratio = None and tail_asymmetry_tag = None without raising
  • tail_asymmetry_ratio is computed from PercentileSnapshot values already present — no second pass over the raw column data
  • tail_asymmetry_right_threshold and tail_asymmetry_left_threshold are read from NumericProfileConfig (configurable, not hard-coded)
  • All existing NumericProfiler tests pass (no regressions)
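
The four column shapes in the criteria can be illustrated with synthetic data. The sketch below is not the project's test suite: it uses the standard library's statistics.quantiles as a stand-in for whatever produces PercentileSnapshot, plain strings in place of the TailAsymmetryTag members, and hypothetical thresholds of 2.0 and 0.5. Percentiles are computed once per column and the ratio is derived from them alone.

```python
import statistics
from typing import Dict, List, Optional, Tuple

RIGHT_THRESHOLD = 2.0  # hypothetical value; the real one comes from NumericProfileConfig
LEFT_THRESHOLD = 0.5   # hypothetical value; the real one comes from NumericProfileConfig


def percentiles_once(values: List[float]) -> Dict[str, float]:
    """Single pass stand-in for the snapshot: 99 cut points, pick four of them."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {"p1": cuts[0], "p5": cuts[4], "p95": cuts[94], "p99": cuts[98]}


def tail_tag(snap: Dict[str, float]) -> Tuple[Optional[float], Optional[str]]:
    """Derive ratio and tag from the snapshot only; the raw column is not needed."""
    left_spread = snap["p5"] - snap["p1"]
    if left_spread == 0:
        return None, None
    ratio = (snap["p99"] - snap["p95"]) / left_spread
    if ratio > RIGHT_THRESHOLD:
        return ratio, "RightHeavy"
    if ratio < LEFT_THRESHOLD:
        return ratio, "LeftHeavy"
    return ratio, "Symmetric"


# Synthetic columns, 100 values each.
right_heavy = list(range(1, 96)) + [200, 400, 600, 800, 1000]    # stretched top 5%
left_heavy = [-1000, -800, -600, -400, -200] + list(range(1, 96))  # stretched bottom 5%
balanced = list(range(1, 101))                                    # evenly spaced
flat_left = [0] * 10 + list(range(1, 91))                         # repeated minimum, p5 == p1

for name, column in [
    ("right_heavy", right_heavy),
    ("left_heavy", left_heavy),
    ("balanced", balanced),
    ("flat_left", flat_left),
]:
    print(name, tail_tag(percentiles_once(column)))
```

The flat_left column shows one way the zero denominator arises in practice: when at least the lowest 5% of values are identical, p1 and p5 coincide and both fields are left as None.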

Blocked by
