Skip to content

IGNITE-24963#7799

Open
ascherbakoff wants to merge 56 commits intoapache:mainfrom
gridgain:ignite-24963-3
Open

IGNITE-24963#7799
ascherbakoff wants to merge 56 commits intoapache:mainfrom
gridgain:ignite-24963-3

Conversation

@ascherbakoff
Copy link
Copy Markdown
Contributor

@ascherbakoff ascherbakoff commented Mar 17, 2026

This PR introduces fair wound wait deadlock prevention algorithm.
List of major changes:

  1. Implemented wound-wait deadlock prevention by invalidating (rolling back) a conflicting younger transaction, if an older transaction attempts to acquire a lock. This approach ensures older transaction is never failed on lock conflict and newer transaction is allowed to wait for older.
  2. A tx invalidation relies on tx kill operation: org.apache.ignite.internal.tx.impl.ReadWriteTransactionImpl#kill
  3. Fixed multiple data races, which can arise on concurrent tx tollback and enlistment operation. Most of such races are reproduceable by ItDataConsistencyTest and ItClientDataStreamerLoadTest tests with WW enabled.
  4. Commit operation is made retriable in RunInTransactionInternalImpl to ensure a killed transaction is properly retried.
  5. Lock manager is changed to support both wound-wait and wait-die policies.
  6. Fixed multiple tests to work both in wound-wait and wait-die modes.

Multiple follow-up tickets are created as consequences of introduced changes:
https://issues.apache.org/jira/browse/IGNITE-28447
https://issues.apache.org/jira/browse/IGNITE-28450
https://issues.apache.org/jira/browse/IGNITE-28365
https://issues.apache.org/jira/browse/IGNITE-28448
https://issues.apache.org/jira/browse/IGNITE-28458
https://issues.apache.org/jira/browse/IGNITE-28461
https://issues.apache.org/jira/browse/IGNITE-28464

A quick benchmark using ItDataConsistency scenario shows x5 throughput increase:
wound-wait:
[ItDataConsistencyTest] After test ops=53340 restarts=5162 fails=0 readOps=0 readFails=0
wound-die:
[ItDataConsistencyTest] After test ops=10731 restarts=1638 fails=0 readOps=0 readFails=0

Copilot AI review requested due to automatic review settings March 17, 2026 07:03
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a wound-wait deadlock prevention policy for transactions, replacing the previous wait-die approach. It also introduces transaction kill messaging, refactors lock manager internals, adds runInTransaction to TxManager, and restructures test hierarchies for lock management tests.

Changes:

  • Replaces DeadlockPreventionPolicyImpl with specific policy classes (WoundWaitDeadlockPreventionPolicy, NoWaitDeadlockPreventionPolicy, TimeoutDeadlockPreventionPolicy, ReversedWaitDieDeadlockPreventionPolicy) and refactors HeapLockManager to use allowWait callback with a "sealable" tx map
  • Adds TxKillMessage for cross-node transaction kill signaling and integrates it into TxManagerImpl as the wound-wait fail action
  • Moves runInTransaction from IgniteTransactions default methods into TxManager and refactors the retry logic in RunInTransactionInternalImpl

Reviewed changes

Copilot reviewed 54 out of 54 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
DeadlockPreventionPolicy.java Adds allowWait, failAction, reverse methods; removes usePriority
WoundWaitDeadlockPreventionPolicy.java New wound-wait policy implementation
WaitDieDeadlockPreventionPolicy.java Adds allowWait and reverse implementations
NoWaitDeadlockPreventionPolicy.java New no-wait policy
TimeoutDeadlockPreventionPolicy.java New timeout-only policy
ReversedWaitDieDeadlockPreventionPolicy.java New reversed wait-die policy
DeadlockPreventionPolicyImpl.java Deleted — replaced by specific policy classes
HeapLockManager.java Major refactor: sealable tx map, tryAcquireInternal, findConflicts, callback-based conflict resolution
TxKillMessage.java New network message for tx kill requests
TxMessageGroup.java Registers new TX_KILL_MESSAGE type
TxMessageSender.java Adds kill() method
TxManagerImpl.java Switches to wound-wait policy, adds kill message handler, adds runInTransaction
ReadWriteTransactionImpl.java Makes killed volatile, moves assignment, exposes enlistFailedException
InternalTransaction.java Adds enlistFailedException default method
PublicApiThreadingTransaction.java Delegates enlistFailedException
IgniteTransactions.java Makes runInTransaction/runInTransactionAsync abstract
IgniteTransactionsImpl.java Implements runInTransaction/runInTransactionAsync
ClientTransactions.java Stub implementations throwing IllegalArgumentException
RestartProofIgniteTransactions.java Delegates new methods
PublicApiThreadingIgniteTransactions.java Delegates new methods
RunInTransactionInternalImpl.java Refactored retry logic, made public
TransactionIds.java Adds retryCnt field to tx ID encoding
InternalTxOptions.java Adds retryId option
Lock.java Adds equals/hashCode
LockKey.java Improves toString for ByteBuffer keys
LockManager.java Adds policy() method
TransactionKilledException.java Adds simplified constructor
InternalTableImpl.java Delegates to tx.enlistFailedException()
PartitionReplicaListener.java Replaces TxCleanupReadyState with PartitionInflights
PartitionInflights.java New inflight tracking for partition replicas
TraceableFuture.java Debug utility (should be removed)
ThreadAssertingMvPartitionStorage.java Disables thread assertion (should be reverted)
TpccBenchmarkNodeRunner.java Developer-local benchmark runner
ItDataConsistencyTest.java Test adjustments for new policy
AbstractLockingTest.java Refactored base test class
AbstractLockManagerTest.java Deleted — tests moved to HeapLockManagerTest
HeapLockManagerTest.java Now extends AbstractLockingTest, contains moved tests
Various test files Updated to use new policy classes and matchers
LockWaiterMatcher.java, LockFutureMatcher.java, LockConflictMatcher.java New Hamcrest matchers for lock test assertions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new deadlock-prevention framework centered around a fair wound-wait policy, including coordinator-driven transaction kill, lock-manager behavior changes, and broad test updates to run under both wound-wait and wait-die/reversed policies.

Changes:

  • Implement wound-wait deadlock prevention (including tx “kill” messaging) and refactor deadlock-prevention policies into dedicated implementations.
  • Rework HeapLockManager conflict handling and add new concurrency guards to address rollback/enlistment races.
  • Update/parameterize a large suite of unit/integration tests to support multiple lock policies and new retry semantics.

Reviewed changes

Copilot reviewed 64 out of 64 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
modules/transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockWaiterMatcher.java New matcher for “waiting” lock futures (used by updated tests).
modules/transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockFutureMatcher.java New matcher for verifying granted lock futures.
modules/transactions/src/testFixtures/java/org/apache/ignite/internal/tx/test/LockConflictMatcher.java New matcher for verifying lock-conflict completion.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/WoundWaitDeadlockPreventionRollbackFailActionTest.java New tests validating wound-wait behavior when failAction rolls back.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/WoundWaitDeadlockPreventionNoOpFailActionTest.java New tests validating wound-wait behavior when failAction is a no-op.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/WaitDieDeadlockPreventionTest.java Adjust tests to new policy/matcher infrastructure and lockKey helpers.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/TimeoutDeadlockPreventionTest.java Switch to new timeout policy class and updated waiting/conflict matchers.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/ReversedWaitDieDeadlockPreventionTest.java Replace old reversed-policy wiring with a dedicated reversed wait-die policy.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/NoWaitDeadlockPreventionTest.java Switch to dedicated no-wait policy class and lockKey helper.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/NoneDeadlockPreventionTest.java Update tests to shared deadlock-prevention test base and wait matcher.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/LockManagerTxLabelTest.java Switch to dedicated no-wait policy for label-in-message coverage.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/impl/OrphanDetectorTxLabelTest.java Parameterize over wait-die and wound-wait policies.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/impl/OrphanDetectorTest.java Parameterize over policies; construct lock manager per policy.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/HeapLockManagerEventsTest.java Parameterize lock-manager events tests over policies.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/CoarseGrainedLockManagerTest.java Adjust coarse-lock tests for new “who waits” semantics under policy changes.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractLockManagerTest.java Remove large legacy lock-manager test base (superseded by new bases).
modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractLockManagerEventsTest.java Update to lockKey helper usage.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractLockingTest.java Centralize lock manager setup, tx ordering, and lock acquisition helpers.
modules/transactions/src/test/java/org/apache/ignite/internal/tx/AbstractDeadlockPreventionTest.java Refactor common scenarios to use new matchers and policy hook points.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/TransactionIds.java Add hash(UUID, divisor) helper for striped hashing.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/message/TxMessageGroup.java Add message type for tx kill.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/message/TxKillMessage.java New fire-and-forget “kill tx” network message.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/LockManager.java Expose the active deadlock prevention policy via policy().
modules/transactions/src/main/java/org/apache/ignite/internal/tx/LockKey.java Improve toString formatting for byte-buffer based keys.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/Lock.java Add equals/hashCode to support matcher-based assertions.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/InternalTxOptions.java Add retryId plumbing for retriable transaction creation.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/InternalTransaction.java Add enlist-failure exception hook used by table operations.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/WoundWaitDeadlockPreventionPolicy.java New wound-wait policy implementation.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/WaitDieDeadlockPreventionPolicy.java Implement allowWait/reverse semantics explicitly for wait-die.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxMessageSender.java Add kill send helper for coordinator-directed tx invalidation.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TxManagerImpl.java Default to wound-wait policy; add kill handling and retryId-based tx creation.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/TimeoutDeadlockPreventionPolicy.java New “timeout wait” policy class.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/ReversedWaitDieDeadlockPreventionPolicy.java New dedicated reversed wait-die policy class.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/ReadWriteTransactionImpl.java Make kill flag volatile; expose enlist failure exception; adjust finish semantics.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/PublicApiThreadingTransaction.java Delegate new enlist-failure exception method.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/NoWaitDeadlockPreventionPolicy.java New no-wait policy class.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/HeapLockManager.java Major lock acquisition/conflict resolution rewrite; sealing to prevent enlist races.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/impl/DeadlockPreventionPolicyImpl.java Remove old generic policy implementation.
modules/transactions/src/main/java/org/apache/ignite/internal/tx/DeadlockPreventionPolicy.java Extend policy API with allowWait/failAction/reverse hooks.
modules/transactions/src/integrationTest/java/org/apache/ignite/tx/distributed/ItTransactionRecoveryTest.java Make recovery tests conditional on policy semantics (reverse vs non-reverse).
modules/transactions/src/integrationTest/java/org/apache/ignite/internal/tx/ItRunInTransactionTest.java Adjust retry test flow to account for policy differences.
modules/transactions/src/integrationTest/java/org/apache/ignite/internal/disaster/ItDisasterRecoveryReconfigurationTest.java Skip scenario not compatible with wound-wait until follow-up ticket.
modules/table/src/testFixtures/java/org/apache/ignite/internal/table/TxAbstractTest.java Update table tx tests for new waiting semantics and less timing-based assertions.
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/storage/InternalTableImpl.java Change retry eligibility; delegate enlist failure to transaction-provided exception.
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/replicator/PartitionReplicaListener.java Add PartitionInflights + guards to prevent rollback/enlist races and lock leaks.
modules/table/src/main/java/org/apache/ignite/internal/table/distributed/replicator/PartitionInflights.java New inflight tracking primitive used to coordinate cleanup barriers.
modules/table/src/integrationTest/java/org/apache/ignite/internal/table/ItOperationRetryTest.java Make some retry scenarios policy-dependent (only reversed/wait-die).
modules/table/src/integrationTest/java/org/apache/ignite/distributed/ItTxStateLocalMapTest.java Adjust unlock-only state assertions (commit + read gating).
modules/table/src/integrationTest/java/org/apache/ignite/distributed/ItTxAbstractDistributedTestSingleNode.java Skip implicit-timeout scenario unless applicable to wait-die behavior.
modules/sql-engine/src/integrationTest/java/org/apache/ignite/internal/sql/engine/systemviews/ItLocksSystemViewTest.java Make lock-view conflict test resilient to policy direction.
modules/sql-engine/src/integrationTest/java/org/apache/ignite/internal/sql/engine/ItDmlTest.java Adjust transactional scan test to ensure a true waiter/owner ordering.
modules/runner/src/integrationTest/java/org/apache/ignite/internal/table/ItTableScanTest.java Make scan/insert concurrency tests policy-aware.
modules/runner/src/integrationTest/java/org/apache/ignite/internal/table/ItDataConsistencyTest.java Update consistency stress test for new retry/kill behavior and stronger progress checks.
modules/runner/src/integrationTest/java/org/apache/ignite/internal/benchmark/TpccBenchmarkNodeRunner.java Add a standalone runner for TPC-C style benchmarking setup.
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/exception/ReplicaUnavailableException.java Mark replica-unavailable as retriable for transaction retries.
modules/replicator/src/main/java/org/apache/ignite/internal/replicator/exception/ReplicationException.java Remove RetriableTransactionException marker from base replication exception.
modules/platforms/cpp/tests/odbc-test/transaction_test.cpp Disable a failing ODBC transaction test (linked ticket).
modules/core/src/testFixtures/java/org/apache/ignite/internal/testframework/IgniteTestUtils.java Add ensureFutureNotCompleted helper to avoid sleep-based test timing.
modules/core/src/main/java/org/apache/ignite/internal/lang/NodeStoppingException.java Remove retriable markers from node-stopping exception.
modules/client/src/integrationTest/java/org/apache/ignite/internal/streamer/ItClientDataStreamerLoadTest.java Reduce N+1 gets; add policy-aware tolerances and extra logging.
modules/client/src/integrationTest/java/org/apache/ignite/internal/client/ItThinClientTransactionsTest.java Make thin-client conflict tests policy-aware; adjust expected exception types.
modules/api/src/test/java/org/apache/ignite/tx/RunInTransactionRetryTest.java Update expected retry behavior to include commit retryability.
modules/api/src/main/java/org/apache/ignite/tx/RunInTransactionInternalImpl.java Make commit part of retriable closure path; simplify async flow; adjust retriable detection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


private static String dump(Object key) {
if (key instanceof ByteBuffer) {
return Arrays.toString(((ByteBuffer) key).array());
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LockKey.toString() can throw at runtime: ((ByteBuffer) key).array() fails for direct or read-only buffers (and ignores position/limit). Since LockKey is used with BinaryTuple.byteBuffer() (which returns a sliced buffer), this makes logging/debugging potentially crash. Use a safe dump that does not call array() unless hasArray(), and respects remaining() (e.g., copy bytes from a duplicate buffer).

Suggested change
return Arrays.toString(((ByteBuffer) key).array());
ByteBuffer buffer = ((ByteBuffer) key).duplicate();
byte[] bytes = new byte[buffer.remaining()];
buffer.get(bytes);
return Arrays.toString(bytes);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a org.apache.ignite.internal.util.IgniteUtils#byteBufferToByteArray already, but it allocates new array. Maybe we need to introduce a more efficient toString variant in a separate issue. This code should reuse existing methods to avoid copy-pasting the code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swtiched to byteBufferToByteArray.
This should be enough for dump, because it's not called on hot path.
If we need a separate ticket, I suggest to create it, but it shouldn't be the part of this PR.

return Stream.of(
argumentSet("kv", new KillTestContext(TransactionException.class, ItThinClientTransactionsTest::putKv)),
argumentSet("sql", new KillTestContext(SqlException.class, ItThinClientTransactionsTest::putSql))
argumentSet("sql", new KillTestContext(IgniteException.class, ItThinClientTransactionsTest::putSql))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we somehow check a more specific type here? IgniteException is too broad for an integration test

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted to SqlException

Comment on lines +76 to +79
+ " \"network\": {\n"
+ " \"port\":{},\n"
+ " \"nodeFinder\":{\n"
+ " \"netClusterNodes\": [ {} ]\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
+ " \"network\": {\n"
+ " \"port\":{},\n"
+ " \"nodeFinder\":{\n"
+ " \"netClusterNodes\": [ {} ]\n"
+ " network: {\n"
+ " port:{},\n"
+ " nodeFinder:{\n"
+ " netClusterNodes: [ {} ]\n"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆗

+ " raft.fsync = " + fsync() + ",\n"
+ " system.partitionsLogPath = \"" + logPath() + "\",\n"
+ " failureHandler.handler: {\n"
+ " type: \"" + StopNodeOrHaltFailureHandlerConfigurationSchema.TYPE + "\",\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How "safe" is it to use "halt" failure handler in integrationTest? Maybe "stop node" is enough?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an integration test. This class starts nodes for tpc-c benchmark to debug from IDE. Similar to PlatformTestNodeRunner

protected static IgniteImpl igniteImpl;

public static void main(String[] args) throws Exception {
TpccBenchmarkNodeRunner runner = new TpccBenchmarkNodeRunner();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to reuse the code from org.apache.ignite.internal.Cluster or other test classes? I feel like there's a lot of copy-paste going on here

Copy link
Copy Markdown
Contributor Author

@ascherbakoff ascherbakoff Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't see much copypaste here.
This class delegates cluster management to TestIgnitionManager, which is used in many places.


private static String dump(Object key) {
if (key instanceof ByteBuffer) {
return Arrays.toString(((ByteBuffer) key).array());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do have a org.apache.ignite.internal.util.IgniteUtils#byteBufferToByteArray already, but it allocates new array. Maybe we need to introduce a more efficient toString variant in a separate issue. This code should reuse existing methods to avoid copy-pasting the code.

}

public static int hash(UUID txId, int divisor) {
return Math.floorMod(spread(txId.hashCode()), divisor);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spread is already positive, why don't we use % here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆗

assertFalse(fut0.isDone());

lockManager.release(lock);
fut0.thenAccept(l -> lockManager.release(l));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably wait for this future, and then assert that "everything's alright". Test doesn't look complete

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the test (however, this test is not written by me)

Comment on lines +350 to +356
try {
lockManager.acquire(txId1, key, X).join();

fail();
} catch (CompletionException e) {
// Expected.
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
try {
lockManager.acquire(txId1, key, X).join();
fail();
} catch (CompletionException e) {
// Expected.
}
assertThat(lockManager.acquire(txId1, key, X), CompletableFutureExceptionMatcher.willThrow(CompletionException.class));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a static import, of course, I mentioned class name for comment's clarity only

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +74 to +78
@Override
@Test
public void testNonFair() {
super.testNonFair();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that I understand what's happening here. Is the test disabled in a superclass? Maybe it should be commented

Copy link
Copy Markdown
Contributor Author

@ascherbakoff ascherbakoff Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is debugging artefact, removed.

return this;
}

public Builder retryId(UUID id) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used (not even tested?)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is remnants of https://issues.apache.org/jira/browse/IGNITE-28448, which I refactored to separate ticker.
Removed.


private final Supplier<Int2ObjectMap<IndexLocker>> indexesLockers;

private final ConcurrentMap<UUID, TxCleanupReadyState> txCleanupReadyFutures = new ConcurrentHashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TxCleanupReadyState is never used now

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

* <p>Lock waiters are placed in the queue, ordered according to comparator provided by {@link HeapLockManager#deadlockPreventionPolicy}.
* When a new waiter is placed in the queue, it's validated against current lock owner: if there is an owner with a higher priority (as
* defined by comparator) lock request is denied.
* <p>Lock waiters are placed in the queue, ordered according to transaction priority: older transactions are first.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeadlockPreventionPolicy#txIdComparator is still used, is it true about older transactions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed javadoc to better reflect the implementation:
Lock waiters are placed in the queue, ordered according to transaction priority: higher priority transactions go first.

private final VolatileTxStateMetaStorage txStateVolatileStorage;

private static class SealableQueue extends ConcurrentLinkedQueue<Releasable> {
boolean sealed;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add javadoc about what sealed means.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🆗


// Waiter is allowed to wait for owner if it's younger.
// Otherwise we have to fail waiter.
return res > 0 ? null : waiter;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of defining the same comparator for all policies but manipulating the comparison result in different ways? The primary goal of comparator was to avoid this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main intention for this change was to make tx queue always ordered in the priority order, by the definition of a lock queue.
This makes reasoning about lock conflicts much simplier.
Additionaly, I've added allowWait to lock policy contract to explicitly define who can wait for who.
It's not just about comparison result. For example, no wait policy never allows to wait and timeout policy always allows to wait .
But they still need proper ordering (from higher priority to lower) to ensure no starvation happens (higher priority should acquire locks first)
You can look at this change as an attemp to define lock policy responsibilites in a more transparent and understandable way.

*/
default boolean usePriority() {
return txIdComparator() != null;
default boolean reverse() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property is directly related to tx comparator. You can define different comparators (straight and reverse) for different policies, add reverse property to them and make this property refer to comparator's. This would simplify the logic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment for reasoning about removing "reverse" comparator.
Yes I can, but I will prefer to have lock queue always ordered in priority order.
I've renamed this method to invertedWaitOrder to avoid associations with the "reverse" comparator.

*/
public class ReplicationException extends IgniteInternalException implements RetriableTransactionException,
RetriableReplicaRequestException {
public class ReplicationException extends IgniteInternalException implements RetriableReplicaRequestException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that removing RetriableTransactionException from here is correct (same is about ReplicaUnavailableException). Transactions may be retried within timeout after failures caused by internal recoverable errors.
Note that some other exceptions are derived from this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure in opposite: not any ReplicationException derived class is retriable.
I've discovered this by unifying retry logic between explicit and implicit transactions, which I believe should be the same:

private static boolean exceptionAllowsImplicitTxRetry(Throwable e) {
        return ExceptionUtils.hasCause(e, RetriableTransactionException.class);
}

I've observed some tests failing due to unexpected retries

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants