perf: direct-write stdout bypass StringWriter/StringBuffer allocation #680

Draft
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/direct-write-stdout

@He-Pin He-Pin commented Apr 5, 2026

Motivation

When writing to stdout (no --output-file), sjsonnet renders JSON into a StringWriter, calls toString() to obtain the full string, and then writes it with println(). This creates a redundant copy: the whole JSON document is materialized in memory as a String before any of it is written. For large outputs (e.g., Kubernetes manifests), this can roughly double peak memory usage.

Key Design Decision

When writing to stdout (no output file), use a ByteArrayOutputStream + OutputStreamWriter pipeline instead of StringWriter. This pipeline:

  1. Renders JSON directly into a byte buffer (no intermediate String)
  2. Uses baos.writeTo(stdout) for a single bulk write
  3. Discards the buffer on a rendering error, so nothing reaches stdout
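
The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `DirectWriteSketch`, `writeBuffered`, and the `render` callback are hypothetical names standing in for sjsonnet's renderer, and UTF-8 is an assumed encoding. Any trailing-newline handling that the old println() path provided is not shown.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter, PrintStream, Writer}
import java.nio.charset.StandardCharsets

object DirectWriteSketch {
  // Hypothetical helper: `render` stands in for the JSON renderer,
  // which is assumed to write characters to whatever Writer it is given.
  def writeBuffered(stdout: PrintStream)(render: Writer => Unit): Unit = {
    // 65536 is the initial capacity quoted in the PR description.
    val baos = new ByteArrayOutputStream(65536)
    // UTF-8 is an assumption of this sketch.
    val writer = new OutputStreamWriter(baos, StandardCharsets.UTF_8)
    render(writer)       // rendered output is encoded into baos, no String is built
    writer.flush()       // push bytes still held by the encoder into baos
    baos.writeTo(stdout) // one bulk write of the buffered bytes
    stdout.flush()
  }
}
```

A call such as `DirectWriteSketch.writeBuffered(System.out)(w => w.write("{}"))` would emit the rendered text in a single write.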

Modification

sjsonnet/src-jvm-native/sjsonnet/SjsonnetMainBase.scala:

  • Added stdout: PrintStream parameter to writeToFile and renderNormal
  • New code path when stdout != null and no output file: ByteArrayOutputStream(65536) → OutputStreamWriter → writeTo(stdout)
  • Thread stdout through processFile → renderNormal → writeToFile

Benchmark Results

This optimization targets CLI I/O throughput, not JMH evaluation speed. The benefits are:

  • Reduced memory: No intermediate String for stdout output
  • Reduced copies: One writeTo call vs toString() + println()
  • Measurable via CLI: time sjsonnet large_file.jsonnet > /dev/null

Analysis

  • Error safety: ByteArrayOutputStream buffers all output. On error, the buffer is discarded — no partial output reaches stdout.
  • Buffer size: Initial capacity of 65536 bytes (64 KB). Outputs up to that size need no reallocation; larger outputs grow the buffer.
  • No functional change: Only affects the I/O path when writing to stdout. --output-file path unchanged.
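
The error-safety property can be illustrated with a small sketch. It assumes a hypothetical `renderOrDiscard` helper and a `render` callback that may throw part-way through; neither name comes from the PR, and the real code may report errors differently.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter, PrintStream, Writer}
import java.nio.charset.StandardCharsets
import scala.util.control.NonFatal

object ErrorSafetySketch {
  // Hypothetical helper: returns true if the output was committed to stdout,
  // false if rendering failed and the buffered bytes were dropped.
  def renderOrDiscard(stdout: PrintStream)(render: Writer => Unit): Boolean = {
    val baos = new ByteArrayOutputStream(65536)
    val writer = new OutputStreamWriter(baos, StandardCharsets.UTF_8)
    try {
      render(writer)       // may throw after writing part of the output
      writer.flush()
      baos.writeTo(stdout) // reached only when rendering fully succeeded
      stdout.flush()
      true
    } catch {
      // On failure baos simply becomes garbage; stdout was never touched.
      case NonFatal(_) => false
    }
  }
}
```

Because stdout is written only after `render` returns, a failure in the middle of rendering leaves stdout empty in this sketch.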

References

  • ByteArrayOutputStream.writeTo(OutputStream): writes the internal buffer to stdout without first copying it into a new array
  • Original StringWriter.toString allocation pattern

Result

Eliminates intermediate String allocation for stdout output. Reduces peak memory and I/O copies for CLI usage. Draft PR pending native benchmark data.

@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 02:04
@He-Pin He-Pin marked this pull request as draft April 5, 2026 18:24
@He-Pin He-Pin force-pushed the perf/direct-write-stdout branch 4 times, most recently from 1b8644f to b2862ad Compare April 10, 2026 03:33
When writing to stdout (no --output-file), render directly through
OutputStreamWriter(BufferedOutputStream(stdout, 65536)) instead of
accumulating in a StringWriter then calling toString + println.

For large outputs this eliminates ~3x output-size of intermediate
char[] allocations from StringBuffer doubling growth + toString copy.
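
As a rough sketch of the streaming variant this commit message describes, assuming a hypothetical `writeStreaming` helper, a `render` callback in place of the real renderer, and UTF-8 encoding:

```scala
import java.io.{BufferedOutputStream, OutputStreamWriter, PrintStream, Writer}
import java.nio.charset.StandardCharsets

object StreamingSketch {
  // Hypothetical helper: renders straight through a 64 KB buffer onto stdout.
  def writeStreaming(stdout: PrintStream)(render: Writer => Unit): Unit = {
    val out = new BufferedOutputStream(stdout, 65536)
    // UTF-8 is an assumption of this sketch.
    val writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)
    render(writer) // bytes are passed on to stdout whenever the buffer fills
    writer.flush() // flushes the encoder, the buffer, and then stdout
  }
}
```

Memory use here is bounded by the buffer size rather than the output size. One trade-off worth noting: unlike a ByteArrayOutputStream that holds everything until the end, a BufferedOutputStream may already have passed bytes to stdout if rendering fails after the buffer has filled.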

The mainConfigured() API gains an optional stdout parameter (default
null) that preserves backward compatibility — callers not passing
stdout get the existing StringWriter behavior.
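
The backward-compatibility idea can be sketched with a default parameter. `renderOutput` below is a hypothetical stand-in, not the real mainConfigured() signature, which takes other arguments; the `Option[String]` return type is also an assumption made only so the two paths can be told apart.

```scala
import java.io.{BufferedOutputStream, OutputStreamWriter, PrintStream, StringWriter}
import java.nio.charset.StandardCharsets

object OptionalStdoutSketch {
  // Hypothetical stand-in for an entry point that gains an optional stdout.
  // Existing callers that omit `stdout` get the default (null) and keep
  // the StringWriter behavior.
  def renderOutput(json: String, stdout: PrintStream = null): Option[String] =
    if (stdout == null) {
      val sw = new StringWriter()
      sw.write(json)
      Some(sw.toString) // old path: the full String is materialized
    } else {
      val writer = new OutputStreamWriter(
        new BufferedOutputStream(stdout, 65536),
        StandardCharsets.UTF_8
      )
      writer.write(json)
      writer.flush()
      None // new path: bytes went straight to stdout
    }
}
```

Calling `renderOutput("{}")` takes the old path, while `renderOutput("{}", System.out)` takes the direct-write path.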

Upstream: b09647c0
@He-Pin He-Pin force-pushed the perf/direct-write-stdout branch from b2862ad to e48a9f2 Compare April 10, 2026 09:30