Skip to content

fix(o11y): use multi_thread for grpc test#5308

Merged
haphungw merged 2 commits intogoogleapis:mainfrom
haphungw:fix-grpc-test-flavor
Apr 7, 2026
Merged

fix(o11y): use multi_thread for grpc test#5308
haphungw merged 2 commits intogoogleapis:mainfrom
haphungw:fix-grpc-test-flavor

Conversation

@haphungw
Copy link
Copy Markdown
Contributor

@haphungw haphungw commented Apr 7, 2026

Use current_thread with delayed tokio::time::pause() for grpc_client_request_success to avoid flakiness and deadlocks.

I made a mistake previously when applying #[tokio::test(flavor = "current_thread", start_paused = true)] to this test. That caused a deadlock because the background gRPC server was started in a paused runtime and could not make progress before the client blocked.

The test originally used flavor = "multi_thread", which avoided the deadlock but left the test vulnerable to timing-sensitive flakiness (as noted by @coryan).

To fix both issues, I have:

  1. Kept the default current_thread runtime.
  2. Started the background gRPC server before pausing time.
  3. Called tokio::time::pause(); after the server is ready and before the client request.

This ensures the test executes deterministically and fast without flakiness or deadlocks. We verified this by running 1,000 iterations without failure.

This surfaces in #5292 when the unstable_tracing guards are removed.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.78%. Comparing base (7cbbd7d) to head (910df22).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5308   +/-   ##
=======================================
  Coverage   97.77%   97.78%           
=======================================
  Files         220      220           
  Lines       45765    45766    +1     
=======================================
+ Hits        44749    44750    +1     
  Misses       1016     1016           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@haphungw haphungw marked this pull request as ready for review April 7, 2026 15:32
@haphungw haphungw requested a review from a team as a code owner April 7, 2026 15:32
Copy link
Copy Markdown
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this will flake as-is. I may be wrong, I am easy to convince (ha ha).

Copy link
Copy Markdown
Contributor

@coryan coryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will work, but maybe you can run the test a few times (say 1,000) to see if it flakes? You can use cargo nextest with some flags to run those 1,000 iterations with multiple parallel jobs so they finish quickly.

@haphungw
Copy link
Copy Markdown
Contributor Author

haphungw commented Apr 7, 2026

I ran both flavor = "multi_thread" and tokio::time::pause(); with nextest, and multi_thread really just took to long that I did not bother finishing it.
The tokio::time::pause(); approach passed!

RUSTFLAGS="--cfg google_cloud_unstable_tracing" \
cargo nextest run \
  -p google-cloud-gax-internal \
  -E "test(observability::client_signals::tests::grpc_client_request_success)" \
  --count 1000 \
  --no-fail-fast \
  --success-output immediate-final

@haphungw haphungw merged commit 9faf396 into googleapis:main Apr 7, 2026
36 checks passed
haphungw added a commit that referenced this pull request Apr 7, 2026
Fix a `gcb-pr-minimal-versions` build failure in
`google-cloud-gax-internal` tests caused by `h2` version `0.4.2`, which
surfaces after `google_cloud_unstable_tracing` guards are removed in
#5292.

The Problem
During minimal-versions checks, `h2` resolves to `0.4.2`, which somehow
leads to deadlocks or missing tracing spans in our gRPC tests.

Previous Attempts
I tried switching tests to `multi_thread` (PR #5308), which masked the
issue but was not a good solution.
I tried forcing the version in the root `[workspace.dependencies]`, but
`Cargo` ignored it during the isolated package build for `gax-internal`.

The Fix
Add `h2 = "0.4.13"` as a direct dependency in
`src/gax-internal/Cargo.toml`. `0.4.13` is the version resolved in our
normal Cargo.lock.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants