[WIP] Replace transactions rebase onto refreshed metadata #15092

Closed
smaheshwar-pltr wants to merge 4 commits into apache:main from smaheshwar-pltr:sm/replace-applies-to-current-metadata

Conversation

Contributor

@smaheshwar-pltr smaheshwar-pltr commented Jan 20, 2026

Motivation

There are a few issues related to table replaces. BaseTransaction.commitReplaceTransaction() does not re-apply replacement and transaction updates onto refreshed metadata. When concurrent changes occur, the transaction therefore commits stale metadata.

When a REPLACE transaction commits after concurrent changes (appends, snapshot expiration, other replaces), it overwrites those changes with stale metadata. This can lead to snapshot history loss, and concurrent snapshot expiration can even cause table corruption. (#15090)

V3 tables require that snapshot.first-row-id >= table.next-row-id when adding a snapshot. The snapshot's first-row-id is set from base.nextRowId() when the snapshot is produced.

With REST catalogs, updates are sent to the server and applied to the server's current metadata. If a concurrent commit advanced the server's next-row-id, the snapshot's first-row-id (based on stale metadata) will be behind:

Cannot add a snapshot, first-row-id is behind table next-row-id: 100 < 150

This is returned as CommitFailedException so the client can retry, but commitReplaceTransaction retries with the same stale current: the snapshot still has the old first-row-id, so it fails every time. Therefore, in V3, any concurrent snapshot change (append, compaction, another replace) causes the replace to fail entirely. (#15905)
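
To make the retry problem concrete, here is a minimal, self-contained sketch. It does not use Iceberg classes: `ServerTable` and `commitWithRetries` are hypothetical names, and the check only imitates the validation message quoted above. It assumes the retry loop either reuses the first-row-id captured from stale metadata or re-reads it on each attempt.

```java
public class StaleFirstRowIdSketch {

  /** Hypothetical stand-in for the catalog's current table state; not an Iceberg class. */
  static final class ServerTable {
    private long nextRowId;

    ServerTable(long nextRowId) {
      this.nextRowId = nextRowId;
    }

    long nextRowId() {
      return nextRowId;
    }

    /** Imitates the V3 rule: reject a snapshot whose first-row-id is behind next-row-id. */
    void addSnapshot(long firstRowId, long addedRows) {
      if (firstRowId < nextRowId) {
        throw new IllegalStateException(
            String.format(
                "Cannot add a snapshot, first-row-id is behind table next-row-id: %d < %d",
                firstRowId, nextRowId));
      }
      nextRowId = firstRowId + addedRows;
    }
  }

  /**
   * Tries to commit a snapshot up to maxAttempts times. When rebase is false, every attempt
   * reuses the first-row-id captured from stale metadata; when true, each attempt re-reads it.
   */
  static boolean commitWithRetries(
      ServerTable table, long staleFirstRowId, long addedRows, int maxAttempts, boolean rebase) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      long firstRowId = rebase ? table.nextRowId() : staleFirstRowId;
      try {
        table.addSnapshot(firstRowId, addedRows);
        return true;
      } catch (IllegalStateException e) {
        // Treated like a retryable commit failure in this sketch.
      }
    }
    return false;
  }
}
```

With a table whose next-row-id has already moved from 100 to 150, the non-rebasing call exhausts its attempts, while the rebasing call succeeds on the first attempt.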

Less severe, but there are currently behaviour differences in concurrent replaces for REST vs non-REST catalogs due to this. E.g. for REST catalogs, properties are sent as a SetProperties delta and the server generally merges them via putAll, so concurrent property additions that have succeeded survive a concurrent table replace. For non-REST catalogs they do not: the full TableMetadata object is committed directly, so the stale current overwrites all concurrent property changes.
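
The difference can be illustrated with plain maps. This is only a sketch under simplifying assumptions: `applyDelta` is a hypothetical helper and table properties are modelled as a `Map<String, String>`. The same merge is used in both cases; what changes is the base it is applied to.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplacePropertiesSketch {

  /** Applies a property delta onto a base using putAll and returns a new map. */
  static Map<String, String> applyDelta(Map<String, String> base, Map<String, String> delta) {
    Map<String, String> result = new HashMap<>(base);
    result.putAll(delta);
    return result;
  }

  /** REST-like path: the delta is applied to the server's current properties. */
  static Map<String, String> restStyle(
      Map<String, String> serverCurrent, Map<String, String> replaceProps) {
    return applyDelta(serverCurrent, replaceProps);
  }

  /** Non-REST-like path: metadata built from the stale base is committed wholesale. */
  static Map<String, String> fullMetadataStyle(
      Map<String, String> staleBase, Map<String, String> replaceProps) {
    return applyDelta(staleBase, replaceProps);
  }
}
```

If a concurrent commit added a property after the transaction started, it is present in `serverCurrent` but not in `staleBase`, so it survives only in the REST-like path.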

This PR

This PR makes replace (and createOrReplace) transactions rebase their changes onto refreshed table metadata, using the same applyUpdates mechanism that commitSimpleTransaction already uses.

The start metadata (the initial buildReplacement result) is stored on BaseTransaction so that the replacement can be rebuilt.

Also: in RESTTableOperations, the replaceBase field previously used to generate requirements is removed. Requirements are now generated from base and kept in sync via applyUpdates.
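
As a rough illustration of the rebase idea (not the actual applyUpdates code), pending changes can be kept as functions and replayed onto whatever metadata a refresh returns. Everything here is hypothetical: metadata is reduced to a list of snapshot names, and `rebase` and `addSnapshot` are illustrative helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class RebaseSketch {

  /** Replays each pending update, in order, on top of the refreshed metadata. */
  static List<String> rebase(
      List<String> refreshed, List<UnaryOperator<List<String>>> pendingUpdates) {
    List<String> current = new ArrayList<>(refreshed);
    for (UnaryOperator<List<String>> update : pendingUpdates) {
      current = update.apply(current);
    }
    return current;
  }

  /** A pending update that appends one snapshot to whatever base it is given. */
  static UnaryOperator<List<String>> addSnapshot(String name) {
    return base -> {
      List<String> next = new ArrayList<>(base);
      next.add(name);
      return next;
    };
  }
}
```

Because the updates are re-applied to the refreshed state rather than to the state captured at transaction start, a snapshot added concurrently stays in the result alongside the transaction's own snapshot.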

Note also that with the current PR, schema field IDs may be re-derived on rebase as the metadata is rebuilt. That could then lead to old files referencing old IDs added during the transaction (I think, need to think about this...).

@github-actions github-actions bot added the core label Jan 20, 2026
@Test
public void testConcurrentReplaceTransactions() {
@ParameterizedTest
@ValueSource(ints = {2, 3})
Contributor Author

#15091 shows the failure of this V3 test, prior to this PR

Comment on lines +2699 to +2700
// All three successfully committed snapshots should be present
assertThat(afterSecondReplace.snapshots()).hasSize(3);
Contributor Author

#15090 shows the failure of this added line, prior to this PR

Contributor

can you just add a new test please to show where exactly stuff fails with V3?

Contributor Author

@smaheshwar-pltr smaheshwar-pltr Jan 20, 2026

Makes sense - I've updated the PR description in the meantime to cover this.
(Any concurrent change to a table's snapshots causes the replace transaction to fail entirely for the REST catalog, due to server-side row-lineage validation. I'll put up an issue to track, actually)

@smaheshwar-pltr smaheshwar-pltr changed the title from "Fix: Replace transactions rebase onto refreshed metadata" to "[WIP] Fix: Replace transactions rebase onto refreshed metadata" Jan 20, 2026
private boolean hasLastOpCommitted;
private final MetricsReporter reporter;

private Schema replaceSchema;
Contributor Author

TODO: This change turned out to be more breaking than I expected. If we want to proceed, see if this can be cleaned up

Contributor Author

(Realise this is the sort of change that'd require a dev list discussion - I wanted to experiment with this approach first)

if (base != underlyingOps.refresh()) {
// use refreshed the metadata
try {
underlyingOps.refresh();
Contributor Author

@smaheshwar-pltr smaheshwar-pltr Jan 20, 2026

TODO: TestHiveCreateReplaceTable#testCreateOrReplaceTableTxnTableDeletedConcurrently shows an NPE where this refresh actually returns null instead of throwing NoSuchTableException. Consider handling that here, fixing it if it's a bug, or leaving it for now (as that's maybe how concurrent appends with dropped tables failed prior to this PR)

@smaheshwar-pltr smaheshwar-pltr changed the title from "[WIP] Fix: Replace transactions rebase onto refreshed metadata" to "[WIP] Replace transactions rebase onto refreshed metadata" Jan 20, 2026
@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 20, 2026
@smaheshwar-pltr
Contributor Author

smaheshwar-pltr commented Feb 20, 2026

not stale

@github-actions github-actions bot removed the stale label Feb 21, 2026
@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 23, 2026
@github-actions

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
