[spark] Support partition statistics in SHOW TABLE EXTENDED PARTITION command #7612
kerwin-zk wants to merge 3 commits into apache:master from
Conversation
Force-pushed from 739ceae to 9c62ca7
Pull request overview
Adds partition-level statistics to Spark’s SHOW TABLE EXTENDED ... PARTITION(...) output for Paimon tables by wiring SupportsPartitionManagement.loadPartitionMetadata to real partition stats and surfacing them in the Spark 3 command implementation.
Changes:
- Implement loadPartitionMetadata in PaimonPartitionManagement to return partition stats (record count, file size, file count, last file creation time) from snapshot partition entries.
- Update Spark 3 PaimonShowTablePartitionCommand to display "Partition Parameters" and a human-readable "Partition Statistics" line.
- Extend unit tests to assert that partition parameters (and recordCount values) appear in SHOW TABLE EXTENDED ... PARTITION(...) output.
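As a rough sketch of the first change: loadPartitionMetadata could assemble the per-partition stats into a string-keyed map along these lines. The field names mirror the PR description's example output but are illustrative here, not Paimon's actual `PartitionStatistics` constants.

```scala
// Hypothetical sketch: building the metadata map a
// loadPartitionMetadata-style method might return for one partition.
object PartitionMetadataSketch {
  def buildMetadata(
      recordCount: Long,
      fileSizeInBytes: Long,
      fileCount: Long,
      lastFileCreationTime: Long): Map[String, String] =
    Map(
      "recordCount" -> recordCount.toString,
      "fileSizeInBytes" -> fileSizeInBytes.toString,
      "fileCount" -> fileCount.toString,
      "lastFileCreationTime" -> lastFileCreationTime.toString)

  def main(args: Array[String]): Unit = {
    val m = buildMetadata(2L, 741L, 1L, 1744105200000L)
    println(m("recordCount"))
  }
}
```

In the real implementation the values would come from the snapshot reader's partition entries rather than method parameters.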
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| paimon-spark/paimon-spark3-common/src/main/scala/org/apache/paimon/spark/commands/PaimonShowTablePartitionCommand.scala | Formats and prints partition metadata and derived partition statistics in SHOW TABLE EXTENDED (Spark 3 path). |
| paimon-spark/paimon-spark-common/src/main/scala/org/apache/paimon/spark/PaimonPartitionManagement.scala | Implements loadPartitionMetadata by reading partition entries from the snapshot reader and returning stats as metadata. |
| paimon-spark/paimon-spark-ut/src/test/scala/org/apache/paimon/spark/sql/DescribeTableTestBase.scala | Adds assertions that SHOW TABLE EXTENDED PARTITION output includes partition parameter keys and expected recordCount values. |
```scala
// Partition Parameters and Partition Statistics
val metadata = partitionTable.loadPartitionMetadata(row)
if (!metadata.isEmpty) {
  val metadataMap = metadata.asScala
  results.put(
    "Partition Parameters",
    s"{${metadataMap.map { case (k, v) => s"$k=$v" }.mkString(", ")}}")

  val fileSizeInBytes =
    metadataMap.getOrElse(PartitionStatistics.FIELD_FILE_SIZE_IN_BYTES, "0").toLong
  val recordCount =
    metadataMap.getOrElse(PartitionStatistics.FIELD_RECORD_COUNT, "0").toLong
  results.put("Partition Statistics", s"$recordCount rows, $fileSizeInBytes bytes")
}
```
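The defensive `getOrElse(..., "0").toLong` pattern in the hunk above can be exercised in isolation. This standalone sketch substitutes plain string keys for the `PartitionStatistics` field constants:

```scala
// Sketch of deriving the "Partition Statistics" line from a metadata map,
// mirroring the PR's defensive parsing: missing keys default to "0".
object PartitionStatsFormat {
  def statsLine(metadata: Map[String, String]): String = {
    val fileSizeInBytes = metadata.getOrElse("fileSizeInBytes", "0").toLong
    val recordCount = metadata.getOrElse("recordCount", "0").toLong
    s"$recordCount rows, $fileSizeInBytes bytes"
  }

  def main(args: Array[String]): Unit = {
    println(statsLine(Map("recordCount" -> "2", "fileSizeInBytes" -> "741")))
    println(statsLine(Map.empty)) // missing stats fall back to zeros
  }
}
```

The fallback to "0" means a partition with no stats still renders a well-formed line instead of throwing on a missing key.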
The command now adds a new "Partition Statistics" line (derived from loadPartitionMetadata), but the updated unit test only asserts that partition parameters exist and that recordCount has expected values. Add an assertion that the output includes the "Partition Statistics" section (and ideally that it reflects the expected row count / byte size) so this new user-facing behavior is covered and less likely to regress.
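A minimal sketch of the assertion this review comment is asking for, run here against a hypothetical `information` string (a real test would read it from the SHOW TABLE EXTENDED result set):

```scala
// Sketch: assert the new "Partition Statistics" section is present and
// reflects the expected row count and byte size. The string below is a
// stand-in for the command's actual `information` column output.
object StatisticsAssertionSketch {
  def main(args: Array[String]): Unit = {
    val information =
      "Partition Values: [dt=2025-01-01]\n" +
        "Partition Statistics: 2 rows, 741 bytes"
    assert(information.contains("Partition Statistics"))
    assert(information.contains("2 rows, 741 bytes"))
    println("ok")
  }
}
```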
Force-pushed from 9c62ca7 to e9792f3
Force-pushed from e9792f3 to 25fade7
```scala
    res2.select("information").collect().head.getString(0).contains("Partition Values"))

val info2 = res2.select("information").collect().head.getString(0)
Assertions.assertTrue(
```
A tiny comment: we should only call assertTrue(boolean condition); there is no need to manually throw an exception when the assertion fails, which adds nothing in unit tests.
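Illustrating the reviewer's point with plain Scala assertions (`assert` stands in for JUnit's `Assertions.assertTrue` here):

```scala
// Sketch of the suggested assertion style.
object AssertStyleSketch {
  def main(args: Array[String]): Unit = {
    val info = "Partition Values: [dt=2025-01-01]"
    // Preferred: pass only the boolean condition; the framework
    // reports the failure for you.
    assert(info.contains("Partition Values"))
    // Discouraged: manually throwing on failure duplicates the framework,
    // e.g. if (!info.contains("Partition Values")) throw new RuntimeException("fail")
    println("ok")
  }
}
```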
Purpose
Examples
Before
Only partition values were displayed. The TODO ("Partition Parameters", "Created Time", "Last Access", "Partition Statistics") was left unimplemented, and loadPartitionMetadata always returned an empty map.
After
Partition Values: [dt=2025-01-01]
Partition Parameters: {recordCount=2, fileSizeInBytes=741, fileCount=1, lastFileCreationTime=1744105200000}
Partition Statistics: 2 rows, 741 bytes
Tests
CI