Search before asking
Description
PaimonTableProvider::scan(..., limit) already lets plain LIMIT queries pass a row-count hint into paimon-core scan planning, so simple limit queries benefit from conservative split pruning today.
However, support for OFFSET + LIMIT is still incomplete. DataFusion can derive a tighter row requirement (skip + fetch) during optimization, but we do not yet have a clean, core-owned way to refine scan planning with that information.
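To make the "skip + fetch" requirement concrete, here is a minimal sketch of how a tighter row requirement could be derived. The function name `required_rows` is hypothetical and is not part of paimon-core or DataFusion. It assumes that a missing fetch means no upper bound, and that an overflowing sum should produce no hint at all.

```rust
/// Hypothetical helper: the number of rows a scan must be able to produce
/// so that `OFFSET skip LIMIT fetch` can still be answered downstream.
///
/// Returns `None` when no safe hint exists: either there is no LIMIT
/// (`fetch` is `None`), or `skip + fetch` overflows `usize`.
pub fn required_rows(skip: usize, fetch: Option<usize>) -> Option<usize> {
    // checked_add yields None on overflow, so we fall back to "no hint"
    // instead of passing a wrapped (and dangerously small) row count.
    fetch.and_then(|f| skip.checked_add(f))
}
```

For example, `OFFSET 10 LIMIT 5` needs the scan to produce at least 15 rows, while a bare `OFFSET 10` yields no hint and the scan should read everything.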
The key design constraint is that scan-side row-count pruning semantics should stay owned by paimon-core scan planning, rather than being reimplemented inside the DataFusion integration layer. In particular, we should avoid making PaimonTableScan own a DataFusion-specific fetch contract (fetch() / with_fetch()) together with duplicated planning state.
Expected direction
- paimon-core owns fetch/limit-aware split pruning semantics
- DataFusion only decides whether it is safe to pass a row-count hint into core scan planning
- future support for OFFSET + LIMIT should reuse core planning state rather than rebuilding planning logic inside the integration layer
- final LIMIT / OFFSET semantics remain enforced by DataFusion
- split pruning stays conservative and fail-open for unsafe cases such as residual or inexact filters
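The following is a rough sketch of what "conservative and fail-open" split pruning could look like, not the actual paimon-core implementation. The `Split` struct, its `row_count` field, the `filters_exact` flag, and `prune_splits` are all hypothetical names introduced for illustration. It assumes that a known row count is a reliable lower bound on the rows the split will produce.

```rust
/// Hypothetical split descriptor. `row_count` is `None` when the number
/// of rows the split will produce is not known in advance.
#[derive(Debug, Clone, PartialEq)]
pub struct Split {
    pub id: u32,
    pub row_count: Option<u64>,
}

/// Keep only as many leading splits as are needed to guarantee `row_hint`
/// rows. Fails open (returns every split) when pruning is not provably safe.
pub fn prune_splits(
    splits: Vec<Split>,
    row_hint: Option<u64>,
    filters_exact: bool,
) -> Vec<Split> {
    // No hint: nothing to prune against.
    let Some(needed) = row_hint else {
        return splits;
    };
    // Residual or inexact filters may drop rows after the scan, so
    // per-split row counts are no longer a safe lower bound.
    if !filters_exact {
        return splits;
    }

    let mut kept = Vec::new();
    let mut guaranteed: u64 = 0;
    for split in splits {
        if guaranteed >= needed {
            break;
        }
        // A split with an unknown row count is kept but contributes
        // nothing to the guaranteed total.
        guaranteed = guaranteed.saturating_add(split.row_count.unwrap_or(0));
        kept.push(split);
    }
    kept
}
```

In this sketch the pruned scan may still return more rows than requested. That is intentional: the exact LIMIT / OFFSET is applied by DataFusion afterwards, so the scan only has to avoid returning too few.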
Willingness to contribute