[WIP] Replace transactions rebase onto refreshed metadata #15092
smaheshwar-pltr wants to merge 4 commits into apache:main
| @Test | ||
| public void testConcurrentReplaceTransactions() { | ||
| @ParameterizedTest | ||
| @ValueSource(ints = {2, 3}) |
#15091 shows the failure of this V3 test, prior to this PR
| // All three successfully committed snapshots should be present | ||
| assertThat(afterSecondReplace.snapshots()).hasSize(3); |
#15090 shows the failure of this added line, prior to this PR
can you just add a new test please to show where exactly stuff fails with V3?
Makes sense - I've updated the PR description in the meantime to cover this.
(Any concurrent change to a table's snapshots causes the replace transaction to fail entirely for the REST catalog, due to server-side row-lineage validation. I'll put up an issue to track, actually)
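To make the failure mode concrete, here is a minimal sketch of the row-lineage check and why a retry without a rebase can never pass it. The `Server` class and both commit helpers are hypothetical stand-ins invented for illustration; they are not Iceberg or REST catalog APIs, and the real validation covers more than this.

```java
/** Hypothetical, simplified model of the V3 row-lineage check; not Iceberg's actual classes. */
final class RowLineageRetrySketch {

  /** Stand-in for the server-side table state; it only tracks next-row-id. */
  static final class Server {
    private long nextRowId;

    Server(long nextRowId) {
      this.nextRowId = nextRowId;
    }

    long nextRowId() {
      return nextRowId;
    }

    /** Accepts a snapshot only if its first-row-id is not behind the table's next-row-id. */
    boolean tryAddSnapshot(long firstRowId, long addedRows) {
      if (firstRowId < nextRowId) {
        return false; // assumed to surface to the client as a retryable commit failure
      }
      nextRowId = firstRowId + addedRows;
      return true;
    }
  }

  /** Retries with the first-row-id captured from stale metadata, so every attempt is identical. */
  static boolean commitWithoutRebase(Server server, long staleFirstRowId, long rows, int attempts) {
    for (int i = 0; i < attempts; i++) {
      if (server.tryAddSnapshot(staleFirstRowId, rows)) {
        return true;
      }
    }
    return false;
  }

  /** Re-reads next-row-id before each attempt, which is what a rebase would achieve. */
  static boolean commitWithRebase(Server server, long rows, int attempts) {
    for (int i = 0; i < attempts; i++) {
      long firstRowId = server.nextRowId();
      if (server.tryAddSnapshot(firstRowId, rows)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Server server = new Server(100);
    long stale = server.nextRowId();   // the replace transaction starts here
    server.tryAddSnapshot(100, 50);    // a concurrent append advances next-row-id
    System.out.println(commitWithoutRebase(server, stale, 10, 4));
    System.out.println(commitWithRebase(server, 10, 4));
  }
}
```

In this model the stale attempt fails however many times it is retried, while the attempt that re-reads `next-row-id` succeeds on the first try.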
| private boolean hasLastOpCommitted; | ||
| private final MetricsReporter reporter; | ||
|
|
||
| private Schema replaceSchema; |
TODO: This change turned out to be more breaking than I expected. If we want to proceed, see if this can be cleaned up
(Realise this is the sort of change that'd require a dev list discussion - I wanted to experiment with this approach first)
| if (base != underlyingOps.refresh()) { | ||
| // use refreshed the metadata | ||
| try { | ||
| underlyingOps.refresh(); |
TODO: TestHiveCreateReplaceTable#testCreateOrReplaceTableTxnTableDeletedConcurrently shows an NPE where this refresh actually returns null instead of NoSuchTableException being thrown. Consider handling that here, fixing it if it's a bug, or leaving it for now (as that's maybe how concurrent appends on dropped tables failed prior to this PR)
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
not stale |
|
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
Motivation
There are a few issues related to table replaces.
`BaseTransaction.commitReplaceTransaction()` does not re-apply replacement and transaction updates onto refreshed metadata. When concurrent changes occur, the transaction therefore commits stale metadata.
When a `REPLACE` transaction commits after concurrent changes (appends, snapshot expiration, other replaces), it overwrites those changes with stale metadata. This can lead to snapshot history loss, and concurrent snapshot expiration can even cause table corruption. (#15090)
V3 tables require that `snapshot.first-row-id` >= `table.next-row-id` when adding a snapshot. The snapshot's `first-row-id` is set from `base.nextRowId()` when the snapshot is produced.
With REST catalogs, updates are sent to the server and applied to the server's current metadata. If a concurrent commit advanced the server's `next-row-id`, the snapshot's `first-row-id` (based on stale metadata) will be behind, and the server rejects the commit.
The rejection is returned as `CommitFailedException` so the client can retry, but `commitReplaceTransaction` retries the same stale `current`: the snapshot still has the old `first-row-id`, so it fails every time. Therefore, in V3, any concurrent snapshot change (append, compaction, other replace) causes the replace to fail entirely. (#15905)
Less severe, but there are currently behaviour differences in concurrent replaces for REST vs non-REST catalogs due to this. E.g. for REST catalogs, properties are sent as a `SetProperties` delta and the server generally merges them via `putAll`, so concurrent property additions that have succeeded survive a concurrent table replace. For non-REST catalogs they don't, as the full `TableMetadata` object is committed directly, so the stale `current` overwrites all concurrent property changes.
This PR
This PR makes replace (and createOrReplace) transactions rebase their changes onto refreshed table metadata, using the same `applyUpdates` mechanism that `commitSimpleTransaction` already uses.
The `start` metadata (the initial `buildReplacement` result) is stored on `BaseTransaction` to allow the replacement to be rebuilt.
Also: in `RESTTableOperations`, the `replaceBase` field used before to generate requirements is removed - requirements are now generated from `base` and kept in sync via `applyUpdates`.
Note also that with the current PR, schema field IDs may be re-derived on rebase as the metadata is rebuilt. That could then lead to old files referencing old IDs added during the transaction (I think, need to think about this...).
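As a rough illustration of the rebase idea (not the actual change in this PR), the sketch below replays a transaction's pending updates onto freshly loaded metadata before every commit attempt. `Metadata`, `Catalog` and `commitReplace` are hypothetical, heavily simplified stand-ins and do not mirror Iceberg's `TableMetadata`, `TableOperations` or `applyUpdates` signatures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical, simplified model; none of these types are Iceberg classes. */
final class ReplaceRebaseSketch {

  /** Immutable stand-in for table metadata: a version counter plus snapshot names. */
  static final class Metadata {
    final int version;
    final List<String> snapshots;

    Metadata(int version, List<String> snapshots) {
      this.version = version;
      this.snapshots = List.copyOf(snapshots);
    }

    Metadata withSnapshot(String name) {
      List<String> next = new ArrayList<>(snapshots);
      next.add(name);
      return new Metadata(version + 1, next);
    }
  }

  /** Stand-in for a catalog with compare-and-swap commits. */
  static final class Catalog {
    private Metadata current;

    Catalog(Metadata initial) {
      this.current = initial;
    }

    Metadata refresh() {
      return current;
    }

    /** Commits only if the caller built its metadata on the current version. */
    boolean commit(Metadata base, Metadata updated) {
      if (base.version != current.version) {
        return false;
      }
      current = updated;
      return true;
    }
  }

  /** Replays the pending updates on freshly loaded metadata before every attempt. */
  static Metadata commitReplace(
      Catalog catalog, List<UnaryOperator<Metadata>> updates, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      Metadata base = catalog.refresh();
      Metadata rebased = base;
      for (UnaryOperator<Metadata> update : updates) {
        rebased = update.apply(rebased);
      }
      if (catalog.commit(base, rebased)) {
        return rebased;
      }
    }
    throw new IllegalStateException("commit failed after " + maxAttempts + " attempts");
  }

  public static void main(String[] args) {
    Catalog catalog = new Catalog(new Metadata(0, List.of("s1")));
    Metadata stale = catalog.refresh();               // the transaction starts here
    catalog.commit(stale, stale.withSnapshot("s2"));  // a concurrent append lands first
    List<UnaryOperator<Metadata>> updates = List.of(m -> m.withSnapshot("replace"));
    System.out.println(commitReplace(catalog, updates, 3).snapshots);
  }
}
```

In this model, committing metadata built on the stale base is rejected, whereas the rebased commit keeps the concurrently added snapshot alongside the replace.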