API: Implement startsWith bounds check in StrictMetricsEvaluator #15902

Open
bharos wants to merge 1 commit into apache:main from
bharos:perf/strict-metrics-starts-with-bounds

@bharos bharos commented Apr 6, 2026

What

Implements bounds-based evaluation for startsWith in
StrictMetricsEvaluator, replacing the unconditional
ROWS_MIGHT_NOT_MATCH return with actual logic.

Previously, startsWith always returned ROWS_MIGHT_NOT_MATCH,
which prevented the engine from eliminating the residual predicate even
when file-level column bounds made it provable that every value started
with the given prefix.

Changes

  • StrictMetricsEvaluator.startsWith: Added checks for nested
    columns, null-containing columns, and lower/upper bound comparisons
    against the prefix. Returns ROWS_MUST_MATCH when both bounds start
    with the prefix.
  • TestStrictMetricsEvaluator: Added 9 test methods covering:
    both bounds match prefix, single-char prefix match, only lower bound
    matches, bounds outside prefix range, wider range, missing stats,
    all-nulls, some-nulls, and prefix longer than bounds.

How it works

For STARTS WITH <prefix>:

  • If the column can contain nulls → ROWS_MIGHT_NOT_MATCH (conservative)
  • If the lower bound is shorter than the prefix → ROWS_MIGHT_NOT_MATCH
  • If the lower bound (truncated to prefix length) equals the prefix and
    the upper bound (truncated to prefix length) equals the prefix →
    ROWS_MUST_MATCH (all values in the file start with the prefix)
  • Otherwise → ROWS_MIGHT_NOT_MATCH (conservative)
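The rules above can be sketched as a small standalone method. This is only an illustration under simplifying assumptions, not the code in this PR: the class name StartsWithBoundsSketch, the method signature, and the use of String bounds with a nullable Long null count are all hypothetical, and the nested-column check is left out. The real evaluator reads bounds and null counts from file metrics.

```java
// Hypothetical, simplified model of the decision rules listed above.
// Bounds are plain Strings here; a null bound or null count means "stat missing".
class StartsWithBoundsSketch {
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  static boolean startsWith(String lower, String upper, Long nullCount, String prefix) {
    // Unknown null count or any nulls: a null value never starts with the prefix.
    if (nullCount == null || nullCount > 0) {
      return ROWS_MIGHT_NOT_MATCH;
    }

    // Missing bounds: nothing can be proven.
    if (lower == null || upper == null) {
      return ROWS_MIGHT_NOT_MATCH;
    }

    // A bound shorter than the prefix cannot be compared at prefix length.
    if (lower.length() < prefix.length() || upper.length() < prefix.length()) {
      return ROWS_MIGHT_NOT_MATCH;
    }

    // Truncate both bounds to the prefix length and compare with the prefix.
    String lowerHead = lower.substring(0, prefix.length());
    String upperHead = upper.substring(0, prefix.length());
    if (lowerHead.equals(prefix) && upperHead.equals(prefix)) {
      return ROWS_MUST_MATCH;
    }

    return ROWS_MIGHT_NOT_MATCH;
  }
}
```

For example, with prefix "ab", bounds ("abc", "abz") and a null count of 0 give ROWS_MUST_MATCH, while bounds ("abc", "acd") give ROWS_MIGHT_NOT_MATCH because only the lower bound starts with the prefix.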

This follows the same pattern used by eq in this class and by the PR
that adds a bounds check for notStartsWith (#15883).

Closes #15901

StrictMetricsEvaluator.startsWith() previously returned ROWS_MIGHT_NOT_MATCH
unconditionally, without using column bounds. This misses the opportunity to
determine that all rows in a file must match when both lower and upper bounds
start with the given prefix.

When both bounds start with the prefix, all values between them must also
start with it, so the evaluator can return ROWS_MUST_MATCH. This enables
whole-file matching optimizations for prefix-based operations.
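
The ordering argument can be checked on a small example. The helper below is a hypothetical illustration, not part of the PR. It uses Java's String.compareTo as a stand-in for the column's ordering, which is an assumption, and confirms that every value lying between the two bounds starts with the prefix when both bounds do.

```java
import java.util.List;

// Hypothetical demo: checks that every value inside [lower, upper] starts with prefix.
class PrefixRangeDemo {
  static boolean allBetweenStartWith(
      List<String> values, String lower, String upper, String prefix) {
    for (String value : values) {
      // Only values that fall inside the inclusive range [lower, upper] are checked.
      boolean inRange = value.compareTo(lower) >= 0 && value.compareTo(upper) <= 0;
      if (inRange && !value.startsWith(prefix)) {
        return false;
      }
    }
    return true;
  }
}
```

With candidate values "aa", "ab", "aba", "abm", "abzz", "ac" and prefix "ab", the bounds ("aba", "abzz") admit only values that start with "ab". Widening the upper bound to "ac" lets in a value that does not, which is why the check requires both bounds to start with the prefix.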
@github-actions github-actions bot added the API label Apr 6, 2026