Parent
#105
What to build
Compute outlier_density and set NumericFlag.HighOutlierDensity in NumericProfiler. Outlier density is the fraction of non-null values that lie beyond outlier_sigma_threshold standard deviations from the column mean.
Computation (_numeric_profiler.py)
After mean and std are computed for a column:
outlier_density = count(|value − mean| > outlier_sigma_threshold × std) / n_non_null
- Store as NumericStats.outlier_density
- When outlier_density > high_outlier_density_threshold, append NumericFlag.HighOutlierDensity to NumericStats.flags
- Both thresholds are read from NumericProfileConfig (outlier_sigma_threshold, high_outlier_density_threshold)
- Skip (leave outlier_density = None, no flag) when std is None or zero (constant column)
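The computation rules above can be sketched as follows. This is a minimal illustration, not the actual NumericProfiler code: the class shapes, the default threshold values (3.0 and 0.05), the helper names apply_outlier_density and profile_column, and the use of population standard deviation are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import fmean, pstdev
from typing import List, Optional, Sequence


class NumericFlag(Enum):
    HighOutlierDensity = "high_outlier_density"


@dataclass
class NumericProfileConfig:
    # Default values are illustrative assumptions, not taken from the issue.
    outlier_sigma_threshold: float = 3.0
    high_outlier_density_threshold: float = 0.05


@dataclass
class NumericStats:
    mean: Optional[float] = None
    std: Optional[float] = None
    outlier_density: Optional[float] = None
    flags: List[NumericFlag] = field(default_factory=list)


def apply_outlier_density(
    non_null: Sequence[float], stats: NumericStats, config: NumericProfileConfig
) -> NumericStats:
    """Fill stats.outlier_density and append the flag when the density is high."""
    # Skip: no data, no mean/std, or a constant column (std == 0).
    if not non_null or stats.mean is None or stats.std is None or stats.std == 0:
        return stats
    cutoff = config.outlier_sigma_threshold * stats.std
    n_outliers = sum(1 for v in non_null if abs(v - stats.mean) > cutoff)
    stats.outlier_density = n_outliers / len(non_null)
    # Strict comparison, matching "outlier_density > high_outlier_density_threshold".
    if stats.outlier_density > config.high_outlier_density_threshold:
        stats.flags.append(NumericFlag.HighOutlierDensity)
    return stats


def profile_column(
    values: Sequence[Optional[float]], config: NumericProfileConfig
) -> NumericStats:
    """Hypothetical stand-in for the profiler step that computes mean and std first."""
    non_null = [v for v in values if v is not None]
    stats = NumericStats()
    if non_null:
        stats.mean = fmean(non_null)
        stats.std = pstdev(non_null)  # population std; the real profiler may differ
    return apply_outlier_density(non_null, stats, config)


# 20 non-null values, one of which lies beyond 3 standard deviations of the mean.
column = [10.0] * 19 + [1000.0, None]
stats = profile_column(column, NumericProfileConfig())
print(stats.outlier_density, stats.flags)
```

With the assumed defaults, the density here equals the threshold exactly, so the strict comparison leaves the flag unset while outlier_density is still populated. Lowering high_outlier_density_threshold in the config is enough to make the flag fire, which is the behaviour the config-driven thresholds are meant to allow.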
Acceptance criteria
- A column whose outlier_density exceeds high_outlier_density_threshold receives NumericFlag.HighOutlierDensity and stores the correct fraction in NumericStats.outlier_density
- A column at or below the threshold receives no flag, but outlier_density is still populated
- outlier_sigma_threshold and high_outlier_density_threshold are read from NumericProfileConfig, not hard-coded
- A constant column (std is None or zero) yields outlier_density = None, no flag
- HighOutlierDensity fires independently of KurtosisTag: a Mesokurtic column with high outlier fraction correctly receives the flag
- Existing NumericProfiler tests pass (no regressions)
Blocked by
- NumericFlag.HighOutlierDensity and NumericStats.outlier_density must exist