Support core-owned OFFSET + LIMIT scan pruning for DataFusion #220

@QuakeWang

Search before asking

  • I searched in the issues and found nothing similar.

Description

PaimonTableProvider::scan(..., limit) already allows plain LIMIT queries to pass a row-count hint into paimon-core scan planning, so simple limit queries can already benefit from conservative split pruning.

However, support for OFFSET + LIMIT is still incomplete. During optimization DataFusion can derive the number of rows the scan actually has to produce (skip + fetch), but we do not yet have a clean, core-owned way to refine scan planning with that information.
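To make the skip + fetch requirement concrete, here is a minimal sketch of how the hint could be computed. `scan_row_hint` is a hypothetical helper name used only for illustration; it is not an existing paimon-core or DataFusion API.

```rust
/// Hypothetical helper (not an existing API): combine OFFSET (`skip`) and
/// LIMIT (`fetch`) into the minimum number of rows the scan must be able
/// to produce before DataFusion applies the final OFFSET / LIMIT.
pub fn scan_row_hint(skip: usize, fetch: Option<usize>) -> Option<usize> {
    // Without a LIMIT there is no upper bound, so there is nothing to pass down.
    let fetch = fetch?;
    // `checked_add` returns None on overflow; treat that as "no hint"
    // so the scan falls back to reading everything.
    skip.checked_add(fetch)
}
```

For `OFFSET 10 LIMIT 5` this yields a hint of 15 rows: the scan has to supply the 10 skipped rows as well as the 5 returned ones.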

The key design constraint is that scan-side row-count pruning semantics should stay owned by paimon-core scan planning, rather than being reimplemented inside the DataFusion integration layer. In particular, we should avoid making PaimonTableScan own a DataFusion-specific fetch contract (fetch() / with_fetch()) together with duplicated planning state.

Expected direction

  • paimon-core owns fetch/limit-aware split pruning semantics
  • DataFusion only decides whether it is safe to pass a row-count hint into core scan planning
  • future support for OFFSET + LIMIT should reuse core planning state rather than rebuilding planning logic inside the integration layer
  • final LIMIT / OFFSET semantics remain enforced by DataFusion
  • split pruning stays conservative and fail-open for unsafe cases such as residual or inexact filters
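A rough sketch of what "conservative and fail-open" could mean on the core side, under stated assumptions. `Split`, `known_rows`, `prune_splits`, and `filters_exact` are all hypothetical names for illustration and do not reflect the actual paimon-core types.

```rust
/// Hypothetical stand-in for a planned split; not the real paimon-core type.
#[derive(Debug, Clone, PartialEq)]
pub struct Split {
    pub id: u32,
    /// Row count that can be trusted from metadata, or None when unknown.
    pub known_rows: Option<u64>,
}

/// Keep only as many splits as are needed to guarantee `row_hint` rows.
/// Returns every split unchanged (fail-open) when there is no hint or when
/// the pushed-down filters are not exact, since surviving row counts are
/// then unknown.
pub fn prune_splits(splits: Vec<Split>, row_hint: Option<u64>, filters_exact: bool) -> Vec<Split> {
    let hint = match row_hint {
        Some(h) if filters_exact => h,
        _ => return splits, // unsafe case: do not prune
    };

    let mut kept = Vec::new();
    let mut guaranteed: u64 = 0;
    for split in splits {
        if guaranteed >= hint {
            break; // enough rows are already guaranteed
        }
        // A split with an unknown row count is kept but contributes nothing
        // to the guaranteed total, which keeps the pruning conservative.
        guaranteed = guaranteed.saturating_add(split.known_rows.unwrap_or(0));
        kept.push(split);
    }
    kept
}
```

In this sketch the integration layer would only decide whether to pass `row_hint` at all; the pruning rule itself lives in one place, and DataFusion still applies the final LIMIT / OFFSET on top of whatever the scan returns.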

Willingness to contribute

  • I'm willing to submit a PR!
