Add E2E acceptance test for backend routing after proxy restart #4574
Open
Conversation
MCPServer supports horizontal scaling with Redis session storage, but there was no E2E test verifying that a session established on one pod is accessible from a different pod. This test deploys an MCPServer with replicas=2 and Redis session storage, initializes an MCP session, then sends raw JSON-RPC requests directly to each pod IP using the same Mcp-Session-Id header to prove sessions are shared via Redis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pod IPs are not reachable from the CI runner host in Kind clusters. Replace direct pod IP HTTP calls with kubectl port-forward to each pod, which tunnels through the Kind node's network. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The MCPServer CRD's sessionStorage config was populated by the operator into RunConfig but the proxy runner never read it — sessions always used in-memory LocalStorage, making cross-replica routing non-functional. Add WithSessionStorage transport option and wire ScalingConfig.SessionRedis from RunConfig into the transport layer so both StdioTransport and HTTPTransport (transparent proxy) use Redis-backed session storage when configured. Rewrite the E2E test to use mcp-go clients throughout, including transport.WithSession to create a client on pod B that reuses the session established on pod A. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pass SessionStorage through types.Config instead of a factory option with interface assertion. The factory now sets the field directly on each transport type during construction. Add clientA to codespell ignore list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When backendReplicas > 1 and the proxy runner restarts, the recovered session's backend_url points to the ClusterIP service. Kube-proxy may route to a backend pod that never handled the session's initialize, causing HTTP 404 / JSON-RPC -32001 "session not found" errors. This test creates an MCPServer with backendReplicas=2, establishes a session, deletes the proxy runner pod, and sends 5 requests with the same session ID. With 2 backends and random routing, P(all 5 hit the correct pod) ≈ 3%, making the failure reliably detectable. The test is expected to fail until pod-specific headless DNS routing is implemented in the proxy runner. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #4574      +/-   ##
==========================================
- Coverage   69.06%   68.84%    -0.23%
==========================================
  Files         502      505        +3
  Lines       51997    52408      +411
==========================================
+ Hits        35913    36078      +165
- Misses      13300    13542      +242
- Partials     2784     2788        +4
The namespace parameter is always defaultNamespace in current tests but is kept for reusability across future test contexts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 6, 2026
yrobla pushed a commit that referenced this pull request on Apr 9, 2026
Extract re-initialization logic from tracingTransport into a dedicated backendRecovery type backed by a narrow recoverySessionStore interface (Get + UpsertSession) and a forward func. tracingTransport now owns only the request lifecycle (session guard, initialize detection, httptrace, session creation) and delegates all forwarding and recovery to backendRecovery.

This makes reinitializeAndReplay and podBackendURL testable without standing up a full TransparentProxy: backend_recovery_test.go covers all recovery paths (no session, no init body, happy path, forward error, missing new session ID) using a stubSessionStore and inline httptest servers. tracingTransport.forward() and the base field are removed; all network I/O goes through recovery.forward, a single source of truth for the underlying transport.

Also integrates the E2E acceptance test from #4574 that exercises backendReplicas=2 + proxy runner restart, verifying that sessions are routed to the correct backend pod after re-initialization.
jerm-dro pushed a commit that referenced this pull request on Apr 9, 2026
* Route MCP sessions to the originating backend pod using httptrace

When a proxy runner pod restarts it recovers sessions from Redis, but backend_url stored the ClusterIP, so kube-proxy could send follow-up requests to a different backend pod that never handled initialize — causing JSON-RPC -32001 "session not found" errors on the first request.

Use net/http/httptrace.GotConn to capture the actual backend pod IP after kube-proxy DNAT on every initialize request, and store that as backend_url instead of the ClusterIP URL. The existing Rewrite closure already reads backend_url and pins routing to the correct pod; no changes to that path are needed.

When the backend pod is later replaced (rescheduled to a new IP, or restarted in place and lost in-memory session state), the proxy now re-initializes the backend session transparently rather than returning 404 to the client:

- Dial error (pod IP unreachable): re-init triggers on TCP failure
- Backend 404 (session lost, same IP): re-init triggers on response

In both cases the proxy replays the stored initialize body against the ClusterIP, captures the new pod IP via GotConn, stores the new backend session ID, rewrites outbound Mcp-Session-Id headers, and replays the original client request — the client sees no error. DELETE responses are excluded from the 404 re-init path since the session is intentionally torn down in that case.

Closes #4575

* Reimplement tracingTransport logic + add e2e tests

Extract re-initialization logic from tracingTransport into a dedicated backendRecovery type backed by a narrow recoverySessionStore interface (Get + UpsertSession) and a forward func. tracingTransport now owns only the request lifecycle (session guard, initialize detection, httptrace, session creation) and delegates all forwarding and recovery to backendRecovery.

This makes reinitializeAndReplay and podBackendURL testable without standing up a full TransparentProxy: backend_recovery_test.go covers all recovery paths (no session, no init body, happy path, forward error, missing new session ID) using a stubSessionStore and inline httptest servers. tracingTransport.forward() and the base field are removed; all network I/O goes through recovery.forward — a single source of truth for the underlying transport. Also integrates the E2E acceptance test from #4574 that exercises backendReplicas=2 + proxy runner restart, verifying that sessions are routed to the correct backend pod after re-initialization.

---------

Co-authored-by: taskbot <taskbot@users.noreply.github.com>
Summary

When backendReplicas > 1 and the proxy runner restarts, the recovered session's backend_url points to the ClusterIP service. Kube-proxy may route to a backend pod that never handled the session's initialize, causing HTTP 404 / JSON-RPC -32001 "session not found" errors.

The test creates an MCPServer with backendReplicas=2, establishes a session, deletes the proxy runner pod, and sends 5 requests with the same session ID. With 2 backends and random routing, P(all 5 hit the correct pod) ≈ 3%, making the failure reliably detectable.

Type of change

Test plan

- (task test-e2e) — this PR adds the test; it is expected to fail on CI, proving the bug exists
- (task test) — no production code changed; existing tests unaffected

Does this introduce a user-facing change?

No

Special notes for reviewers

This test is intentionally expected to fail on CI. It demonstrates the backend routing bug that will be fixed in a follow-up PR implementing pod-specific headless DNS routing. The test context "backendReplicas=2 and proxy runner restarts" will pass once the fix lands.

Generated with Claude Code