core: Reduce per-stream idle memory by 20%#12751

Open
ejona86 wants to merge 1 commit into grpc:master from ejona86:reduce-stream-memory

ejona86 (Member) commented Apr 10, 2026

Metadata was accidentally being retained after the start of the call. For an idle RPC that can be an overwhelming percentage of its memory, so stop retaining it. The other changes are considerably smaller, but I happened to notice them, and they are straightforward and need no magic numbers (by contrast, there are many arrays that could be tuned).
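To illustrate the kind of retention described above, here is a minimal sketch, not the actual diff in this PR. `IdleStreamSketch`, its fields, and the use of a plain `Map` in place of `Metadata` are all hypothetical stand-ins; the only point is that a per-stream object can copy out what it needs at start and then drop its reference to the headers.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-stream object; not a grpc-java class. */
public class IdleStreamSketch {
  // Stand-in for request Metadata. Only needed while the call is starting.
  private Map<String, String> headers;
  // The small piece of state that is still needed after start.
  private String method;

  public IdleStreamSketch(Map<String, String> headers) {
    this.headers = headers;
  }

  /** Copies out what is needed, then releases the headers reference. */
  public void start() {
    method = headers.get(":path");
    // Without this line the whole map stays reachable for as long as the
    // stream is open, even though nothing reads it again.
    headers = null;
  }

  public boolean retainsHeaders() {
    return headers != null;
  }

  public String method() {
    return method;
  }

  public static void main(String[] args) {
    Map<String, String> headers = new HashMap<>();
    headers.put(":path", "/example.Service/Method");      // illustrative value
    headers.put("authorization", "Bearer example-token"); // illustrative value
    IdleStreamSketch stream = new IdleStreamSketch(headers);
    stream.start();
    System.out.println(stream.method() + " retainsHeaders=" + stream.retainsHeaders());
  }
}
```

The larger the headers an application sends, the more an idle stream saves from releasing them this way.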

The regular interop server uses 4600 bytes per full-duplex stream while idle, but much of that is recorded Census events hanging around. Keeping the Census integration but removing the Census implementation (so a no-op is used) drops that to 3000 bytes. This change brings it down to ~2450 bytes, which still includes state from TestServiceImpl. The interop tests carry very little Metadata, so absolute real-life savings should be much higher; relative real-life savings may be lower, because the application will often hold more state of its own.

The measurements were captured using a modified timeout_on_sleeping_server client that had 100,000 concurrent full-duplex calls on one connection.
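For a rough sense of scale, the arithmetic behind these figures can be sketched as below. The class and method names are made up for illustration. The inputs are the numbers quoted above: 3000 bytes with the no-op Census, ~2450 bytes after the change, and 100,000 streams. Measuring the relative saving against the 3000-byte baseline is an assumption about how the title's figure was derived.

```java
/** Back-of-the-envelope math for the per-stream figures; names are hypothetical. */
public class StreamMemoryEstimate {
  /** Bytes saved per idle stream. */
  public static long savedBytesPerStream(long beforeBytes, long afterBytes) {
    return beforeBytes - afterBytes;
  }

  /** Relative saving as a percentage of the "before" figure. */
  public static double savedPercent(long beforeBytes, long afterBytes) {
    return 100.0 * (beforeBytes - afterBytes) / beforeBytes;
  }

  /** Total bytes saved across all concurrent idle streams. */
  public static long totalSavedBytes(long beforeBytes, long afterBytes, long streams) {
    return savedBytesPerStream(beforeBytes, afterBytes) * streams;
  }

  public static void main(String[] args) {
    long before = 3000;     // no-op Census baseline, from the description
    long after = 2450;      // approximate figure after this change
    long streams = 100_000; // concurrent calls in the measurement

    System.out.println("saved per stream: " + savedBytesPerStream(before, after) + " bytes");
    System.out.printf("relative saving: %.1f%%%n", savedPercent(before, after));
    System.out.println("saved in total: " + totalSavedBytes(before, after, streams) + " bytes");
  }
}
```

With these inputs the saving is 550 bytes per stream, about 18.3% of the 3000-byte baseline, or roughly 55 MB across 100,000 idle streams.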

ejona86 requested a review from shivaspeaks on April 10, 2026, 01:47