perf: direct-write stdout bypass StringWriter/StringBuffer allocation by He-Pin · Pull Request #680 · databricks/sjsonnet

He-Pin · 2026-04-05T01:59:40Z

Motivation

When outputting to stdout (no --output-file), sjsonnet renders JSON into a StringWriter, calls toString() to get the full string, then calls println() to write it. This creates a redundant copy: the JSON is materialized in memory as a String before being written. For large outputs (e.g., Kubernetes manifests), this roughly doubles peak memory usage, because the StringWriter's internal buffer and the String copy are live at the same time.

Key Design Decision

When writing to stdout (no output file), use a ByteArrayOutputStream + OutputStreamWriter pipeline instead of StringWriter. This:

Renders JSON directly into a byte buffer (no intermediate String)
Uses baos.writeTo(stdout) for a single bulk write
Discards the buffer on a rendering error, so nothing reaches stdout
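
A minimal sketch of this buffered path, written in Java because every class involved comes from java.io (the PR's own code is Scala). The Renderer interface and the writeBuffered name are hypothetical stand-ins for sjsonnet's renderer and are not taken from the PR; the UTF-8 charset is also an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class BufferedStdoutSketch {
    /** Hypothetical stand-in for the JSON renderer: it writes text into any Writer. */
    interface Renderer {
        void render(Writer out) throws IOException;
    }

    /** Renders fully into memory, then sends the bytes to stdout in one bulk write. */
    static void writeBuffered(Renderer renderer, PrintStream stdout) {
        // 64 KB initial capacity, as described in the PR.
        ByteArrayOutputStream baos = new ByteArrayOutputStream(65536);
        // UTF-8 is an assumption made for this sketch.
        Writer writer = new OutputStreamWriter(baos, StandardCharsets.UTF_8);
        try {
            renderer.render(writer); // if this throws, baos is dropped and stdout is untouched
            writer.flush();          // push chars still held by the encoder into baos
            baos.writeTo(stdout);    // single bulk write, no intermediate String
            stdout.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because nothing is written to stdout until rendering has succeeded, a renderer that throws halfway through leaves stdout empty.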

Modification

sjsonnet/src-jvm-native/sjsonnet/SjsonnetMainBase.scala:

Added a stdout: PrintStream parameter to writeToFile and renderNormal
Added a new code path when stdout != null and no output file is given: ByteArrayOutputStream(65536) → OutputStreamWriter → writeTo(stdout)
Threaded stdout through processFile → renderNormal → writeToFile

Benchmark Results

This optimization targets CLI I/O throughput, not JMH evaluation speed. The benefits are:

Reduced memory: No intermediate String for stdout output
Reduced copies: One writeTo call vs toString() + println()
Measurable via CLI: time sjsonnet large_file.jsonnet > /dev/null

Analysis

Error safety: ByteArrayOutputStream buffers all output. On error, the buffer is discarded — no partial output reaches stdout.
Buffer size: Initial capacity is 65536 bytes (64 KB). Outputs up to that size are rendered without reallocating the buffer; larger outputs grow it as needed.
No functional change: Only affects the I/O path when writing to stdout. --output-file path unchanged.

References

ByteArrayOutputStream.writeTo(OutputStream): writes the internal buffer to stdout without first copying it into a new array (unlike toByteArray())
Original StringWriter.toString allocation pattern

Result

Eliminates intermediate String allocation for stdout output. Reduces peak memory and I/O copies for CLI usage. Draft PR pending native benchmark data.

When writing to stdout (no --output-file), render directly through OutputStreamWriter(BufferedOutputStream(stdout, 65536)) instead of accumulating output in a StringWriter and then calling toString + println. For large outputs this eliminates intermediate char[] allocations totaling roughly 3x the output size, which come from StringBuffer's doubling growth plus the toString copy. The mainConfigured() API gains an optional stdout parameter (default null) that preserves backward compatibility: callers that do not pass stdout get the existing StringWriter behavior. Upstream: b09647c0
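
The streaming variant described in this message can be sketched roughly as follows, again in Java. The Renderer interface and the renderOutput method are hypothetical and are not sjsonnet's mainConfigured() signature; since Java has no default arguments, the "default null" case is modeled by passing null explicitly, and UTF-8 is an assumption.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class DirectStdoutSketch {
    /** Hypothetical stand-in for the JSON renderer: it writes text into any Writer. */
    interface Renderer {
        void render(Writer out) throws IOException;
    }

    /**
     * Hypothetical entry point. With stdout == null it keeps the legacy
     * StringWriter path and returns the rendered text; otherwise it streams
     * to stdout and returns an empty string.
     */
    static String renderOutput(Renderer renderer, PrintStream stdout) {
        try {
            if (stdout == null) {
                // Legacy path: accumulate everything, then copy into a String.
                StringWriter sw = new StringWriter();
                renderer.render(sw);
                return sw.toString();
            }
            // Direct path: chars are encoded and pass through a 64 KB byte buffer.
            Writer writer = new OutputStreamWriter(
                new BufferedOutputStream(stdout, 65536), StandardCharsets.UTF_8);
            renderer.render(writer);
            // flush() drains the encoder, the byte buffer, and stdout.
            // The writer is deliberately not closed, so stdout stays open.
            writer.flush();
            return "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One consequence of streaming is worth keeping in mind: once the 64 KB buffer fills, bytes are passed on to stdout before rendering has finished, so a late rendering error can leave partial output behind. A full in-memory buffer avoids that at the cost of holding the whole output.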

He-Pin marked this pull request as ready for review April 5, 2026 02:04

He-Pin mentioned this pull request Apr 5, 2026: performance optimization #666 (Open)

He-Pin marked this pull request as draft April 5, 2026 18:24

He-Pin force-pushed the perf/direct-write-stdout branch 4 times, most recently from 1b8644f to b2862ad Compare April 10, 2026 03:33

He-Pin force-pushed the perf/direct-write-stdout branch from b2862ad to e48a9f2 Compare April 10, 2026 09:30

He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/direct-write-stdout
