Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements a wound-wait deadlock prevention policy for transactions, replacing the previous wait-die approach. It also introduces transaction kill messaging, refactors lock manager internals, adds runInTransaction to TxManager, and restructures test hierarchies for lock management tests.
Changes:
- Replaces
DeadlockPreventionPolicyImplwith specific policy classes (WoundWaitDeadlockPreventionPolicy,NoWaitDeadlockPreventionPolicy,TimeoutDeadlockPreventionPolicy,ReversedWaitDieDeadlockPreventionPolicy) and refactorsHeapLockManagerto useallowWaitcallback with a "sealable" tx map - Adds
TxKillMessagefor cross-node transaction kill signaling and integrates it intoTxManagerImplas the wound-wait fail action - Moves
runInTransactionfromIgniteTransactionsdefault methods intoTxManagerand refactors the retry logic inRunInTransactionInternalImpl
Reviewed changes
Copilot reviewed 54 out of 54 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
DeadlockPreventionPolicy.java |
Adds allowWait, failAction, reverse methods; removes usePriority |
WoundWaitDeadlockPreventionPolicy.java |
New wound-wait policy implementation |
WaitDieDeadlockPreventionPolicy.java |
Adds allowWait and reverse implementations |
NoWaitDeadlockPreventionPolicy.java |
New no-wait policy |
TimeoutDeadlockPreventionPolicy.java |
New timeout-only policy |
ReversedWaitDieDeadlockPreventionPolicy.java |
New reversed wait-die policy |
DeadlockPreventionPolicyImpl.java |
Deleted — replaced by specific policy classes |
HeapLockManager.java |
Major refactor: sealable tx map, tryAcquireInternal, findConflicts, callback-based conflict resolution |
TxKillMessage.java |
New network message for tx kill requests |
TxMessageGroup.java |
Registers new TX_KILL_MESSAGE type |
TxMessageSender.java |
Adds kill() method |
TxManagerImpl.java |
Switches to wound-wait policy, adds kill message handler, adds runInTransaction |
ReadWriteTransactionImpl.java |
Makes killed volatile, moves assignment, exposes enlistFailedException |
InternalTransaction.java |
Adds enlistFailedException default method |
PublicApiThreadingTransaction.java |
Delegates enlistFailedException |
IgniteTransactions.java |
Makes runInTransaction/runInTransactionAsync abstract |
IgniteTransactionsImpl.java |
Implements runInTransaction/runInTransactionAsync |
ClientTransactions.java |
Stub implementations throwing IllegalArgumentException |
RestartProofIgniteTransactions.java |
Delegates new methods |
PublicApiThreadingIgniteTransactions.java |
Delegates new methods |
RunInTransactionInternalImpl.java |
Refactored retry logic, made public |
TransactionIds.java |
Adds retryCnt field to tx ID encoding |
InternalTxOptions.java |
Adds retryId option |
Lock.java |
Adds equals/hashCode |
LockKey.java |
Improves toString for ByteBuffer keys |
LockManager.java |
Adds policy() method |
TransactionKilledException.java |
Adds simplified constructor |
InternalTableImpl.java |
Delegates to tx.enlistFailedException() |
PartitionReplicaListener.java |
Replaces TxCleanupReadyState with PartitionInflights |
PartitionInflights.java |
New inflight tracking for partition replicas |
TraceableFuture.java |
Debug utility (should be removed) |
ThreadAssertingMvPartitionStorage.java |
Disables thread assertion (should be reverted) |
TpccBenchmarkNodeRunner.java |
Developer-local benchmark runner |
ItDataConsistencyTest.java |
Test adjustments for new policy |
AbstractLockingTest.java |
Refactored base test class |
AbstractLockManagerTest.java |
Deleted — tests moved to HeapLockManagerTest |
HeapLockManagerTest.java |
Now extends AbstractLockingTest, contains moved tests |
| Various test files | Updated to use new policy classes and matchers |
LockWaiterMatcher.java, LockFutureMatcher.java, LockConflictMatcher.java |
New Hamcrest matchers for lock test assertions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
...r/src/integrationTest/java/org/apache/ignite/internal/benchmark/TpccBenchmarkNodeRunner.java
Outdated
Show resolved
Hide resolved
...ions/src/main/java/org/apache/ignite/internal/tx/impl/WoundWaitDeadlockPreventionPolicy.java
Outdated
Show resolved
Hide resolved
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxMessageSender.java
Show resolved
Hide resolved
...rc/main/java/org/apache/ignite/internal/table/distributed/replicator/PartitionInflights.java
Show resolved
Hide resolved
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxMessageSender.java
Outdated
Show resolved
Hide resolved
...transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockWaiterMatcher.java
Outdated
Show resolved
Hide resolved
modules/api/src/main/java/org/apache/ignite/tx/IgniteTransactions.java
Outdated
Show resolved
Hide resolved
...-api/src/main/java/org/apache/ignite/internal/storage/ThreadAssertingMvPartitionStorage.java
Outdated
Show resolved
Hide resolved
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxManagerImpl.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/ignite/internal/table/distributed/replicator/TraceableFuture.java
Outdated
Show resolved
Hide resolved
01373d9 to
c7e434f
Compare
bf3d308 to
46abfdc
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a new deadlock-prevention framework centered around a fair wound-wait policy, including coordinator-driven transaction kill, lock-manager behavior changes, and broad test updates to run under both wound-wait and wait-die/reversed policies.
Changes:
- Implement wound-wait deadlock prevention (including tx “kill” messaging) and refactor deadlock-prevention policies into dedicated implementations.
- Rework
HeapLockManagerconflict handling and add new concurrency guards to address rollback/enlistment races. - Update/parameterize a large suite of unit/integration tests to support multiple lock policies and new retry semantics.
Reviewed changes
Copilot reviewed 64 out of 64 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockWaiterMatcher.java | New matcher for “waiting” lock futures (used by updated tests). |
| modules/transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockFutureMatcher.java | New matcher for verifying granted lock futures. |
| modules/transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockConflictMatcher.java | New matcher for verifying lock-conflict completion. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/WoundWaitDeadlockPreventionRollbackFailActionTest.java | New tests validating wound-wait behavior when failAction rolls back. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/WoundWaitDeadlockPreventionNoOpFailActionTest.java | New tests validating wound-wait behavior when failAction is a no-op. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/WaitDieDeadlockPreventionTest.java | Adjust tests to new policy/matcher infrastructure and lockKey helpers. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/TimeoutDeadlockPreventionTest.java | Switch to new timeout policy class and updated waiting/conflict matchers. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/ReversedWaitDieDeadlockPreventionTest.java | Replace old reversed-policy wiring with a dedicated reversed wait-die policy. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/NoWaitDeadlockPreventionTest.java | Switch to dedicated no-wait policy class and lockKey helper. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/NoneDeadlockPreventionTest.java | Update tests to shared deadlock-prevention test base and wait matcher. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/LockManagerTxLabelTest.java | Switch to dedicated no-wait policy for label-in-message coverage. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/impl/OrphanDetectorTxLabelTest.java | Parameterize over wait-die and wound-wait policies. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/impl/OrphanDetectorTest.java | Parameterize over policies; construct lock manager per policy. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/HeapLockManagerEventsTest.java | Parameterize lock-manager events tests over policies. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/CoarseGrainedLockManagerTest.java | Adjust coarse-lock tests for new “who waits” semantics under policy changes. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractLockManagerTest.java | Remove large legacy lock-manager test base (superseded by new bases). |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractLockManagerEventsTest.java | Update to lockKey helper usage. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractLockingTest.java | Centralize lock manager setup, tx ordering, and lock acquisition helpers. |
| modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractDeadlockPreventionTest.java | Refactor common scenarios to use new matchers and policy hook points. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/TransactionIds.java | Add hash(UUID, divisor) helper for striped hashing. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/message/TxMessageGroup.java | Add message type for tx kill. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/message/TxKillMessage.java | New fire-and-forget “kill tx” network message. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/LockManager.java | Expose the active deadlock prevention policy via policy(). |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/LockKey.java | Improve toString formatting for byte-buffer based keys. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/Lock.java | Add equals/hashCode to support matcher-based assertions. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/InternalTxOptions.java | Add retryId plumbing for retriable transaction creation. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/InternalTransaction.java | Add enlist-failure exception hook used by table operations. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/WoundWaitDeadlockPreventionPolicy.java | New wound-wait policy implementation. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/WaitDieDeadlockPreventionPolicy.java | Implement allowWait/reverse semantics explicitly for wait-die. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxMessageSender.java | Add kill send helper for coordinator-directed tx invalidation. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxManagerImpl.java | Default to wound-wait policy; add kill handling and retryId-based tx creation. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TimeoutDeadlockPreventionPolicy.java | New “timeout wait” policy class. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/ReversedWaitDieDeadlockPreventionPolicy.java | New dedicated reversed wait-die policy class. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/ReadWriteTransactionImpl.java | Make kill flag volatile; expose enlist failure exception; adjust finish semantics. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/PublicApiThreadingTransaction.java | Delegate new enlist-failure exception method. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/NoWaitDeadlockPreventionPolicy.java | New no-wait policy class. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/HeapLockManager.java | Major lock acquisition/conflict resolution rewrite; sealing to prevent enlist races. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/DeadlockPreventionPolicyImpl.java | Remove old generic policy implementation. |
| modules/transactions/src/main/java/org/apache/ignite/internal/tx/DeadlockPreventionPolicy.java | Extend policy API with allowWait/failAction/reverse hooks. |
| modules/transactions/src/integrationTest/java/org/apache/ignite/tx/distributed/ItTransactionRecoveryTest.java | Make recovery tests conditional on policy semantics (reverse vs non-reverse). |
| modules/transactions/src/integrationTest/java/org/apache/ignite/internal/tx/ItRunInTransactionTest.java | Adjust retry test flow to account for policy differences. |
| modules/transactions/src/integrationTest/java/org/apache/ignite/internal/disaster/ItDisasterRecoveryReconfigurationTest.java | Skip scenario not compatible with wound-wait until follow-up ticket. |
| modules/table/src/testFixtures/java/org/apache/ignite/internal/table/TxAbstractTest.java | Update table tx tests for new waiting semantics and less timing-based assertions. |
| modules/table/src/main/java/org/apache/ignite/internal/table/distributed/storage/InternalTableImpl.java | Change retry eligibility; delegate enlist failure to transaction-provided exception. |
| modules/table/src/main/java/org/apache/ignite/internal/table/distributed/replicator/PartitionReplicaListener.java | Add PartitionInflights + guards to prevent rollback/enlist races and lock leaks. |
| modules/table/src/main/java/org/apache/ignite/internal/table/distributed/replicator/PartitionInflights.java | New inflight tracking primitive used to coordinate cleanup barriers. |
| modules/table/src/integrationTest/java/org/apache/ignite/internal/table/ItOperationRetryTest.java | Make some retry scenarios policy-dependent (only reversed/wait-die). |
| modules/table/src/integrationTest/java/org/apache/ignite/distributed/ItTxStateLocalMapTest.java | Adjust unlock-only state assertions (commit + read gating). |
| modules/table/src/integrationTest/java/org/apache/ignite/distributed/ItTxAbstractDistributedTestSingleNode.java | Skip implicit-timeout scenario unless applicable to wait-die behavior. |
| modules/sql-engine/src/integrationTest/java/org/apache/ignite/internal/sql/engine/systemviews/ItLocksSystemViewTest.java | Make lock-view conflict test resilient to policy direction. |
| modules/sql-engine/src/integrationTest/java/org/apache/ignite/internal/sql/engine/ItDmlTest.java | Adjust transactional scan test to ensure a true waiter/owner ordering. |
| modules/runner/src/integrationTest/java/org/apache/ignite/internal/table/ItTableScanTest.java | Make scan/insert concurrency tests policy-aware. |
| modules/runner/src/integrationTest/java/org/apache/ignite/internal/table/ItDataConsistencyTest.java | Update consistency stress test for new retry/kill behavior and stronger progress checks. |
| modules/runner/src/integrationTest/java/org/apache/ignite/internal/benchmark/TpccBenchmarkNodeRunner.java | Add a standalone runner for TPC-C style benchmarking setup. |
| modules/replicator/src/main/java/org/apache/ignite/internal/replicator/exception/ReplicaUnavailableException.java | Mark replica-unavailable as retriable for transaction retries. |
| modules/replicator/src/main/java/org/apache/ignite/internal/replicator/exception/ReplicationException.java | Remove RetriableTransactionException marker from base replication exception. |
| modules/platforms/cpp/tests/odbc-test/transaction_test.cpp | Disable a failing ODBC transaction test (linked ticket). |
| modules/core/src/testFixtures/java/org/apache/ignite/internal/testframework/IgniteTestUtils.java | Add ensureFutureNotCompleted helper to avoid sleep-based test timing. |
| modules/core/src/main/java/org/apache/ignite/internal/lang/NodeStoppingException.java | Remove retriable markers from node-stopping exception. |
| modules/client/src/integrationTest/java/org/apache/ignite/internal/streamer/ItClientDataStreamerLoadTest.java | Reduce N+1 gets; add policy-aware tolerances and extra logging. |
| modules/client/src/integrationTest/java/org/apache/ignite/internal/client/ItThinClientTransactionsTest.java | Make thin-client conflict tests policy-aware; adjust expected exception types. |
| modules/api/src/test/java/org/apache/ignite/tx/RunInTransactionRetryTest.java | Update expected retry behavior to include commit retryability. |
| modules/api/src/main/java/org/apache/ignite/tx/RunInTransactionInternalImpl.java | Make commit part of retriable closure path; simplify async flow; adjust retriable detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/HeapLockManager.java
Show resolved
Hide resolved
|
|
||
| private static String dump(Object key) { | ||
| if (key instanceof ByteBuffer) { | ||
| return Arrays.toString(((ByteBuffer) key).array()); |
There was a problem hiding this comment.
LockKey.toString() can throw at runtime: ((ByteBuffer) key).array() fails for direct or read-only buffers (and ignores position/limit). Since LockKey is used with BinaryTuple.byteBuffer() (which returns a sliced buffer), this makes logging/debugging potentially crash. Use a safe dump that does not call array() unless hasArray(), and respects remaining() (e.g., copy bytes from a duplicate buffer).
| return Arrays.toString(((ByteBuffer) key).array()); | |
| ByteBuffer buffer = ((ByteBuffer) key).duplicate(); | |
| byte[] bytes = new byte[buffer.remaining()]; | |
| buffer.get(bytes); | |
| return Arrays.toString(bytes); |
There was a problem hiding this comment.
We do have a org.apache.ignite.internal.util.IgniteUtils#byteBufferToByteArray already, but it allocates new array. Maybe we need to introduce a more efficient toString variant in a separate issue. This code should reuse existing methods to avoid copy-pasting the code.
There was a problem hiding this comment.
Swtiched to byteBufferToByteArray.
This should be enough for dump, because it's not called on hot path.
If we need a separate ticket, I suggest to create it, but it shouldn't be the part of this PR.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/InternalTransaction.java
Outdated
Show resolved
Hide resolved
...transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockWaiterMatcher.java
Show resolved
Hide resolved
...c/integrationTest/java/org/apache/ignite/internal/streamer/ItClientDataStreamerLoadTest.java
Show resolved
Hide resolved
...ions/src/main/java/org/apache/ignite/internal/tx/impl/WoundWaitDeadlockPreventionPolicy.java
Show resolved
Hide resolved
modules/api/src/main/java/org/apache/ignite/tx/RunInTransactionInternalImpl.java
Show resolved
Hide resolved
| return Stream.of( | ||
| argumentSet("kv", new KillTestContext(TransactionException.class, ItThinClientTransactionsTest::putKv)), | ||
| argumentSet("sql", new KillTestContext(SqlException.class, ItThinClientTransactionsTest::putSql)) | ||
| argumentSet("sql", new KillTestContext(IgniteException.class, ItThinClientTransactionsTest::putSql)) |
There was a problem hiding this comment.
Can we somehow check a more specific type here? IgniteException is too broad for an integration test
There was a problem hiding this comment.
Reverted to SqlException
| + " \"network\": {\n" | ||
| + " \"port\":{},\n" | ||
| + " \"nodeFinder\":{\n" | ||
| + " \"netClusterNodes\": [ {} ]\n" |
There was a problem hiding this comment.
| + " \"network\": {\n" | |
| + " \"port\":{},\n" | |
| + " \"nodeFinder\":{\n" | |
| + " \"netClusterNodes\": [ {} ]\n" | |
| + " network: {\n" | |
| + " port:{},\n" | |
| + " nodeFinder:{\n" | |
| + " netClusterNodes: [ {} ]\n" |
| + " raft.fsync = " + fsync() + ",\n" | ||
| + " system.partitionsLogPath = \"" + logPath() + "\",\n" | ||
| + " failureHandler.handler: {\n" | ||
| + " type: \"" + StopNodeOrHaltFailureHandlerConfigurationSchema.TYPE + "\",\n" |
There was a problem hiding this comment.
How "safe" is it to use "halt" failure handler in integrationTest? Maybe "stop node" is enough?
There was a problem hiding this comment.
This is not an integration test. This class starts nodes for tpc-c benchmark to debug from IDE. Similar to PlatformTestNodeRunner
| protected static IgniteImpl igniteImpl; | ||
|
|
||
| public static void main(String[] args) throws Exception { | ||
| TpccBenchmarkNodeRunner runner = new TpccBenchmarkNodeRunner(); |
There was a problem hiding this comment.
Is it possible to reuse the code from org.apache.ignite.internal.Cluster or other test classes? I feel like there's a lot of copy-paste going on here
There was a problem hiding this comment.
Don't see much copypaste here.
This class delegates cluster management to TestIgnitionManager, which is used in many places.
...r/src/integrationTest/java/org/apache/ignite/internal/benchmark/TpccBenchmarkNodeRunner.java
Outdated
Show resolved
Hide resolved
|
|
||
| private static String dump(Object key) { | ||
| if (key instanceof ByteBuffer) { | ||
| return Arrays.toString(((ByteBuffer) key).array()); |
There was a problem hiding this comment.
We do have a org.apache.ignite.internal.util.IgniteUtils#byteBufferToByteArray already, but it allocates new array. Maybe we need to introduce a more efficient toString variant in a separate issue. This code should reuse existing methods to avoid copy-pasting the code.
| } | ||
|
|
||
| public static int hash(UUID txId, int divisor) { | ||
| return Math.floorMod(spread(txId.hashCode()), divisor); |
There was a problem hiding this comment.
spread is already positive, why don't we use % here?
| assertFalse(fut0.isDone()); | ||
|
|
||
| lockManager.release(lock); | ||
| fut0.thenAccept(l -> lockManager.release(l)); |
There was a problem hiding this comment.
We should probably wait for this future, and then assert that "everything's alright". Test doesn't look complete
There was a problem hiding this comment.
Fixed the test (however, this test is not written by me)
| try { | ||
| lockManager.acquire(txId1, key, X).join(); | ||
|
|
||
| fail(); | ||
| } catch (CompletionException e) { | ||
| // Expected. | ||
| } |
There was a problem hiding this comment.
| try { | |
| lockManager.acquire(txId1, key, X).join(); | |
| fail(); | |
| } catch (CompletionException e) { | |
| // Expected. | |
| } | |
| assertThat(lockManager.acquire(txId1, key, X), CompletableFutureExceptionMatcher.willThrow(CompletionException.class)); |
There was a problem hiding this comment.
With a static import, of course, I mentioned class name for comment's clarity only
| @Override | ||
| @Test | ||
| public void testNonFair() { | ||
| super.testNonFair(); | ||
| } |
There was a problem hiding this comment.
I don't think that I understand what's happening here. Is the test disabled in a superclass? Maybe it should be commented
There was a problem hiding this comment.
This is debugging artefact, removed.
| return this; | ||
| } | ||
|
|
||
| public Builder retryId(UUID id) { |
There was a problem hiding this comment.
never used (not even tested?)
There was a problem hiding this comment.
This is remnants of https://issues.apache.org/jira/browse/IGNITE-28448, which I refactored to separate ticker.
Removed.
|
|
||
| private final Supplier<Int2ObjectMap<IndexLocker>> indexesLockers; | ||
|
|
||
| private final ConcurrentMap<UUID, TxCleanupReadyState> txCleanupReadyFutures = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
TxCleanupReadyState is never used now
| * <p>Lock waiters are placed in the queue, ordered according to comparator provided by {@link HeapLockManager#deadlockPreventionPolicy}. | ||
| * When a new waiter is placed in the queue, it's validated against current lock owner: if there is an owner with a higher priority (as | ||
| * defined by comparator) lock request is denied. | ||
| * <p>Lock waiters are placed in the queue, ordered according to transaction priority: older transactions are first. |
There was a problem hiding this comment.
DeadlockPreventionPolicy#txIdComparator is still used, is it true about older transactions?
There was a problem hiding this comment.
I've changed javadoc to better reflect the implementation:
Lock waiters are placed in the queue, ordered according to transaction priority: higher priority transactions go first.
| private final VolatileTxStateMetaStorage txStateVolatileStorage; | ||
|
|
||
| private static class SealableQueue extends ConcurrentLinkedQueue<Releasable> { | ||
| boolean sealed; |
There was a problem hiding this comment.
Please add javadoc about what sealed means.
|
|
||
| // Waiter is allowed to wait for owner if it's younger. | ||
| // Otherwise we have to fail waiter. | ||
| return res > 0 ? null : waiter; |
There was a problem hiding this comment.
What is the reason of defining the same comparator for all policies but manipulating the comparison result in different ways? The primary goal of comparator was to avoid this.
There was a problem hiding this comment.
The main intention for this change was to make tx queue always ordered in the priority order, by the definition of a lock queue.
This makes reasoning about lock conflicts much simplier.
Additionaly, I've added allowWait to lock policy contract to explicitly define who can wait for who.
It's not just about comparison result. For example, no wait policy never allows to wait and timeout policy always allows to wait .
But they still need proper ordering (from higher priority to lower) to ensure no starvation happens (higher priority should acquire locks first)
You can look at this change as an attemp to define lock policy responsibilites in a more transparent and understandable way.
| */ | ||
| default boolean usePriority() { | ||
| return txIdComparator() != null; | ||
| default boolean reverse() { |
There was a problem hiding this comment.
This property is directly related to tx comparator. You can define different comparators (straight and reverse) for different policies, add reverse property to them and make this property refer to comparator's. This would simplify the logic.
There was a problem hiding this comment.
See my previous comment for reasoning about removing "reverse" comparator.
Yes I can, but I will prefer to have lock queue always ordered in priority order.
I've renamed this method to invertedWaitOrder to avoid associations with the "reverse" comparator.
| */ | ||
| public class ReplicationException extends IgniteInternalException implements RetriableTransactionException, | ||
| RetriableReplicaRequestException { | ||
| public class ReplicationException extends IgniteInternalException implements RetriableReplicaRequestException { |
There was a problem hiding this comment.
I am not sure that removing RetriableTransactionException from here is correct (same is about ReplicaUnavailableException). Transactions may be retried within timeout after failures caused by internal recoverable errors.
Note that some other exceptions are derived from this.
There was a problem hiding this comment.
I'm pretty sure in opposite: not any ReplicationException derived class is retriable.
I've discovered this by unifying retry logic between explicit and implicit transactions, which I believe should be the same:
private static boolean exceptionAllowsImplicitTxRetry(Throwable e) {
return ExceptionUtils.hasCause(e, RetriableTransactionException.class);
}
I've observed some tests failing due to unexpected retries
This PR introduces fair wound wait deadlock prevention algorithm.
List of major changes:
ItDataConsistencyTestandItClientDataStreamerLoadTesttests with WW enabled.RunInTransactionInternalImplto ensure a killed transaction is properly retried.Multiple follow-up tickets are created as consequences of introduced changes:
https://issues.apache.org/jira/browse/IGNITE-28447
https://issues.apache.org/jira/browse/IGNITE-28450
https://issues.apache.org/jira/browse/IGNITE-28365
https://issues.apache.org/jira/browse/IGNITE-28448
https://issues.apache.org/jira/browse/IGNITE-28458
https://issues.apache.org/jira/browse/IGNITE-28461
https://issues.apache.org/jira/browse/IGNITE-28464
A quick benchmark using ItDataConsistency scenario shows x5 throughput increase:
wound-wait:
[ItDataConsistencyTest] After test ops=53340 restarts=5162 fails=0 readOps=0 readFails=0
wound-die:
[ItDataConsistencyTest] After test ops=10731 restarts=1638 fails=0 readOps=0 readFails=0