fix(o11y): use multi_thread for grpc test #5308
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #5308 +/- ##
=======================================
Coverage 97.77% 97.78%
=======================================
Files 220 220
Lines 45765 45766 +1
=======================================
+ Hits 44749 44750 +1
Misses 1016 1016
coryan
left a comment
I fear this will flake as-is. I may be wrong, I am easy to convince (ha ha).
coryan
left a comment
I think this will work, but maybe you can run the test a few times (say 1,000) to see if it flakes? You can use cargo nextest with some flags to run those 1,000 iterations with multiple parallel jobs so they finish quickly.
I ran both
RUSTFLAGS="--cfg google_cloud_unstable_tracing" \
  cargo nextest run \
  -p google-cloud-gax-internal \
  -E "test(observability::client_signals::tests::grpc_client_request_success)" \
  --count 1000 \
  --no-fail-fast \
  --success-output immediate-final
Fix a `gcb-pr-minimal-versions` build failure in `google-cloud-gax-internal` tests caused by `h2` version `0.4.2`, which surfaces after `google_cloud_unstable_tracing` guards are removed in #5292.

The Problem
During minimal-versions checks, `h2` resolves to `0.4.2`, which somehow leads to deadlocks or missing tracing spans in our gRPC tests.

Previous Attempts
- I tried switching tests to `multi_thread` (PR #5308), which masked the issue but was not a good solution.
- I tried forcing the version in the root `[workspace.dependencies]`, but Cargo ignored it during the isolated package build for `gax-internal`.

The Fix
Add `h2 = "0.4.13"` as a direct dependency in `src/gax-internal/Cargo.toml`. `0.4.13` is the version resolved in our normal `Cargo.lock`.
Use `current_thread` with delayed `tokio::time::pause()` for `grpc_client_request_success` to avoid flakiness and deadlocks.

I made a mistake previously when applying `#[tokio::test(flavor = "current_thread", start_paused = true)]` to this test. That caused a deadlock because the background gRPC server was started in a paused runtime and could not make progress before the client blocked.

The test originally used `flavor = "multi_thread"`, which avoided the deadlock but left the test vulnerable to timing-sensitive flakiness (as noted by @coryan).

To fix both issues, I have:
- Switched to the `current_thread` runtime.
- Added `tokio::time::pause();` after the server is ready and before the client request.

This ensures the test executes deterministically and quickly, without flakiness or deadlocks. We verified this by running 1,000 iterations without failure.

This surfaces in #5292 when the `unstable_tracing` guards are removed.