reader: ScanIterator.compareValues swallows ClassCastException, silently disabling zone-map pruning on type-mismatched filters #159

Description

@dfa1

Summary

ScanIterator.compareValues(Object a, Object b) catches ClassCastException and returns 0. The canPruneChunk arms treat compareValues(...) < 0 / > 0 as the prune condition, so a 0 means "cannot prune". As a result, a filter whose value type does not match the stored zone-map stat type silently prunes nothing, and the whole column is decoded. There is no error and no log: results stay correct, but the performance loss is invisible.

Where

reader/src/main/java/io/github/dfa1/vortex/reader/ScanIterator.java

private static int compareValues(Object a, Object b) {
    try {
        return ((Comparable<Object>) a).compareTo(b);
    } catch (ClassCastException _) {
        return 0;   // <-- mismatch => 0 => canPruneChunk yields false => no pruning
    }
}

Integer zone stats decode as Long (ArrayStats.decodeScalar returns int64_value()), and float stats decode as Double. So a caller that builds a RowFilter with an Integer (or Float) value against an integer (or f32) column gets zero pruning even though the predicate is perfectly valid: compareTo throws ClassCastException internally, and the catch block swallows it.
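
The mismatch can be reproduced outside the reader. The following is a minimal standalone sketch, not the project's code: the class name SwallowDemo is hypothetical, and compareValues is copied in shape from the snippet above (with a named catch parameter so it compiles on older JDKs).

```java
public class SwallowDemo {

    // Same shape as ScanIterator.compareValues quoted above:
    // a ClassCastException from compareTo is turned into 0.
    @SuppressWarnings("unchecked")
    public static int compareValues(Object a, Object b) {
        try {
            return ((Comparable<Object>) a).compareTo(b);
        } catch (ClassCastException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        Object stat = 100L;          // zone-map stat as decoded: Long
        Object intLiteral = 200;     // filter literal boxed at "natural" width: Integer
        Object longLiteral = 200L;   // same literal coerced to Long

        // Long.compareTo(Integer) throws ClassCastException, which is swallowed.
        System.out.println(compareValues(stat, intLiteral));   // prints 0
        // With matching boxed types the comparison is meaningful (negative here).
        System.out.println(compareValues(stat, longLiteral) < 0);   // prints true
    }
}
```

The two calls compare the same numeric values; only the boxed type of the literal differs, which is exactly the condition that turns pruning off.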

How it surfaced

This surfaced while building the Calcite adapter's filter push-down (feat/vortex-calcite-demo): a WHERE date BETWEEN ... on an I32 column pruned 0 of 100 chunks until the literal was coerced from Integer to Long. The filter, expansion, and canPruneChunk path were all correct; only the boxed type differed, and the swallowed exception hid it.

Impact

  • Silent performance cliff: a valid, selective filter degrades to a full scan with no signal.
  • Easy to hit from any caller that boxes filter values at the column's "natural" width (Integer for I32, Float for F32) rather than the stat's storage width (Long / Double).

Suggested fix (pick one)

  1. Normalise numeric comparison in compareValues — if both args are Number but different boxed types, compare via Long/Double (or BigDecimal) instead of returning 0. Most robust; makes pruning width-agnostic.
  2. Coerce on RowFilter construction — normalise integer values to Long and float values to Double when a RowFilter is built, matching the stat storage types.
  3. At minimum, do not return 0 on mismatch — a genuine type mismatch (e.g. comparing a string filter to a numeric column) should be a clear error or a logged "pruning skipped", not a silent no-op.

Option 1 is preferred: it fixes every caller, not just the careful ones.
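
A rough sketch of what option 1 could look like, under stated assumptions: the class name NumericCompare and the helper isIntegral are hypothetical, the code compares via long or double only (not BigDecimal), and throwing IllegalArgumentException on a genuine non-numeric mismatch is just one of the behaviours option 3 allows. It is not the reader's actual implementation.

```java
public final class NumericCompare {

    private NumericCompare() {}

    // Boxed types that fit losslessly in a long.
    private static boolean isIntegral(Number n) {
        return n instanceof Long || n instanceof Integer
            || n instanceof Short || n instanceof Byte;
    }

    // Width-agnostic comparison for mixed boxed numerics.
    // Nulls are not handled here; a null argument ends in a NullPointerException.
    @SuppressWarnings("unchecked")
    public static int compareValues(Object a, Object b) {
        if (a instanceof Number && b instanceof Number && a.getClass() != b.getClass()) {
            Number x = (Number) a;
            Number y = (Number) b;
            if (isIntegral(x) && isIntegral(y)) {
                return Long.compare(x.longValue(), y.longValue());
            }
            // Mixed or floating-point case: compare as double.
            // Assumption: precision loss for |long| > 2^53 (and for
            // BigInteger/BigDecimal inputs) is acceptable in this sketch.
            return Double.compare(x.doubleValue(), y.doubleValue());
        }
        try {
            return ((Comparable<Object>) a).compareTo(b);
        } catch (ClassCastException e) {
            // Genuine mismatch (e.g. String vs Long): fail loudly instead of returning 0.
            throw new IllegalArgumentException(
                "cannot compare " + a.getClass().getName()
                    + " with " + b.getClass().getName(), e);
        }
    }
}
```

With this shape, compareValues(100L, 200) is negative and compareValues(5, 5L) is 0, so an Integer literal against a Long stat prunes the same way a Long literal would. Whether a real mismatch should throw or only log "pruning skipped" is a separate decision; throwing is shown only because it is the simplest to illustrate.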

Notes

The Calcite adapter worked around this by coercing integer literals to Long (commit on feat/vortex-calcite-demo), so it is unblocked — but the underlying reader trap should be fixed so the next caller does not rediscover it.
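
For callers that need the same workaround before the reader is fixed, literal coercion can be as small as the sketch below. The class FilterLiterals and the method toStatType are hypothetical names for illustration, not code from the Calcite adapter or the reader; the mapping (integers to Long, Float to Double) follows the stat storage types described above.

```java
public final class FilterLiterals {

    private FilterLiterals() {}

    // Widen a boxed filter literal to the type the zone-map stats decode to:
    // integer widths -> Long, Float -> Double. Everything else is returned unchanged.
    public static Object toStatType(Object value) {
        if (value instanceof Integer || value instanceof Short || value instanceof Byte) {
            return Long.valueOf(((Number) value).longValue());
        }
        if (value instanceof Float) {
            return Double.valueOf(((Number) value).doubleValue());
        }
        return value;
    }
}
```

A caller would pass each literal through toStatType before building the filter, so that compareTo sees Long against Long or Double against Double.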
