Skip to content

KAFKA-17999: Fix flaky DynamicConnectionQuotaTest testDynamicConnectionQuota#21354

Open
chickenchickenlove wants to merge 2 commits intoapache:trunkfrom
chickenchickenlove:KAFKA-17999
Open

KAFKA-17999: Fix flaky DynamicConnectionQuotaTest testDynamicConnectionQuota#21354
chickenchickenlove wants to merge 2 commits intoapache:trunkfrom
chickenchickenlove:KAFKA-17999

Conversation

@chickenchickenlove
Copy link
Copy Markdown
Contributor

@chickenchickenlove chickenchickenlove commented Jan 25, 2026

Summary

  • Fix flakiness in
    DynamicConnectionQuotaTest#testDynamicConnectionQuota by waiting until
    the broker’s max.connections.per.ip.overrides is actually applied in
    ConnectionQuotas before running the second verification.
  • Fixes https://issues.apache.org/jira/browse/KAFKA-17999

Background

testDynamicConnectionQuota dynamically updates
max.connections.per.ip.overrides from 5 to 7 and immediately starts
verifyMaxConnections(7, ...).

In practice, the config update is propagated asynchronously: the broker
SocketServer reconfigure is triggered, but there is a short window
where the data-plane acceptor still evaluates new connections using the
old override state(i.e, 5). This can cause one of the “prefill” sockets
in the second verification to be rejected under max=5, breaking the
test’s assumptions and leading to intermittent failures.

Changes

  • Add a test-only accessor in ConnectionQuotas to check whether an
    override for a given IP is present.
  • In DynamicConnectionQuotaTest, after altering
    max.connections.per.ip.overrides, wait until the broker’s
    ConnectionQuotas observes 127.0.0.1 -> 7 before executing the second
    verifyMaxConnections(...).

Reproduction / Verification

  • Before: intermittent failures observed in testDynamicConnectionQuota
    when the override was not yet applied during the second verification.
  • After: repeated runs show the second verification consistently sees
    max=7 and the test becomes deterministic.

Test Fail Ratio in my local.

  • Before : 7/50 ~= 13%
  • After : 0/500 ~= 0%

Flaky / Success diagram

fail-1

Reviewers: Mickael Maison mimaison@users.noreply.github.com

@github-actions github-actions bot added triage PRs from the community core Kafka Broker small Small PRs labels Jan 25, 2026
@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@mumrah Hi!
Sorry to ping you suddenly.
I made a PR to fix the issue that you reported!
When you have bandwidth, please take a look 🙇‍♂️

@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 2, 2026

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@mumrah
Gently ping and sorry to bother you again!
When you get a chance, please take a look 🙇‍♂️

@github-actions
Copy link
Copy Markdown

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@chia7712 Hi! sorry to mention suddenly.
If you have bandwidth, could you take a look...? 🙇‍♂️

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 2, 2026

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@mimaison
Copy link
Copy Markdown
Member

Thanks for the PR. Can you rebase on trunk to trigger the CI?

@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@mimaison
Thanks for taking the time to look at this.
I’ve rebased it!
When you get a chance, please take a look. 🙇‍♂️

@github-actions
Copy link
Copy Markdown

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

Copy link
Copy Markdown
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also remove the Flaky annotation on the test?

val maxConnectionsPerIPOverride = 7
props.put(SocketServerConfigs.MAX_CONNECTIONS_PER_IP_OVERRIDES_CONFIG, s"localhost:$maxConnectionsPerIPOverride")
reconfigureServers(props, perBrokerConfig = false, (SocketServerConfigs.MAX_CONNECTIONS_PER_IP_OVERRIDES_CONFIG, s"localhost:$maxConnectionsPerIPOverride"))
waitForMaxConnectionsOverrideApplied("127.0.0.1", maxConnectionsPerIPOverride)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to use localhost like above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I will address it!

@github-actions github-actions bot removed needs-attention triage PRs from the community labels Apr 8, 2026
@chickenchickenlove
Copy link
Copy Markdown
Contributor Author

@mimaison thanks for taking the time to look at this!
I've addressed it.
When you get a chance, please take a look 🙇‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-approved core Kafka Broker small Small PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants