fix: base64DecodeBytes unsigned byte values + improve JMH benchmark config #705

Draft
He-Pin wants to merge 1 commit into databricks:master from He-Pin:fix/base64-unsigned-jmh-config

@He-Pin He-Pin commented Apr 6, 2026

Motivation

std.base64DecodeBytes was returning signed Java byte values (-128..127) instead of the Jsonnet-standard unsigned range (0..255). Bytes with the high bit set (≥128) were returned as negative numbers, violating the specification.

Key Design Decision

Apply & 0xff mask to convert signed Java bytes to unsigned integers, matching the Jsonnet specification and behavior of the C++, Go, and Rust implementations.

Modification

sjsonnet/src/sjsonnet/stdlib/EncodingModule.scala (base64DecodeBytes):

  • Added & 0xff mask: decoded(i) & 0xff converts signed byte (-128..127) to unsigned int (0..255)
  • Added comment explaining the conversion
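
As a minimal illustration of the mask itself (not the sjsonnet code; the class and method names are hypothetical), the following Java sketch shows how `& 0xff` widens a signed byte to an int in 0..255. `Byte.toUnsignedInt` from the JDK performs the same conversion.

```java
// Hypothetical demo class for illustration; not part of sjsonnet.
final class UnsignedByteDemo {
    // Widening a byte to int sign-extends it; masking with 0xff keeps only
    // the low 8 bits, which yields the unsigned value in 0..255.
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte high = (byte) 0xff;
        System.out.println(high);                      // prints -1 (signed view)
        System.out.println(toUnsigned(high));          // prints 255 (masked)
        System.out.println(Byte.toUnsignedInt(high));  // prints 255 (JDK helper)
    }
}
```

Bytes in 0..127 are unaffected by the mask, so applying it unconditionally is safe.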

Test:

  • new_test_suite/base64DecodeBytes_unsigned.jsonnet — verifies bytes ≥ 128 are returned as unsigned (e.g., 0xff → 255, not -1)
  • Covers: all-zero bytes, ASCII range, high bytes (128-255), mixed content

Benchmark Results

This is a correctness fix, not a performance optimization. No benchmark impact expected.

Analysis

  • Root cause: Java's byte type is signed (-128..127), and Base64.getDecoder.decode() returns byte[]. Without masking, any byte whose unsigned value is ≥ 128 is widened to a negative number.
  • Specification: The Jsonnet spec defines byte-array elements as integers in the range [0, 255].
  • Compatibility: C++ jsonnet, go-jsonnet, jrsonnet, and rsjsonnet all return unsigned bytes.
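
The root cause can be reproduced outside sjsonnet with plain `java.util.Base64`. The sketch below is an assumption-laden illustration, not the code in EncodingModule.scala: `DecodeBytesSketch`, `decodeSigned`, and `decodeUnsigned` are hypothetical names, and the input `"AH+A/w=="` is the Base64 encoding of the four bytes 0x00, 0x7f, 0x80, 0xff.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch of the decode path; not the sjsonnet implementation.
final class DecodeBytesSketch {
    // Pre-fix behavior: each byte is widened to int with sign extension.
    static int[] decodeSigned(String b64) {
        byte[] decoded = Base64.getDecoder().decode(b64);
        int[] out = new int[decoded.length];
        for (int i = 0; i < decoded.length; i++) {
            out[i] = decoded[i];
        }
        return out;
    }

    // Post-fix behavior: mask each byte so the result lies in 0..255.
    static int[] decodeUnsigned(String b64) {
        byte[] decoded = Base64.getDecoder().decode(b64);
        int[] out = new int[decoded.length];
        for (int i = 0; i < decoded.length; i++) {
            out[i] = decoded[i] & 0xff;
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "AH+A/w=="; // bytes 0x00, 0x7f, 0x80, 0xff
        System.out.println(Arrays.toString(decodeSigned(input)));   // [0, 127, -128, -1]
        System.out.println(Arrays.toString(decodeUnsigned(input))); // [0, 127, 128, 255]
    }
}
```

Only the two high bytes differ between the outputs, which matches the boundary values the regression test targets.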

References

  • Jsonnet specification: byte values are 0-255
  • Java byte type: signed, -128 to 127
  • & 0xff mask: standard Java idiom for unsigned byte conversion

Result

Fixes std.base64DecodeBytes to return unsigned byte values (0-255) per the Jsonnet specification.

…onfig

Fix std.base64DecodeBytes to correctly return unsigned byte values (0-255)
instead of signed Java byte values (-128..127). Java's byte type is signed,
so without masking with 0xff, bytes >= 128 (e.g., 0x80 = 128) would appear
as negative numbers (e.g., -128), violating the Jsonnet specification.

Also increase JMH RegressionBenchmark iterations from 1 to 3 for both
warmup and measurement phases, providing more stable and reliable benchmark
results with proper confidence intervals.

Changes:
- EncodingModule.scala: Apply `& 0xff` mask in base64DecodeBytes to convert
  signed Java bytes to unsigned integers, matching Jsonnet byte semantics
- RegressionBenchmark.scala: Increase warmup/measurement iterations to 3
- Add regression test: base64DecodeBytes_unsigned.jsonnet with boundary
  values (0, 127, 128, 255) and encode/decode round-trip verification
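
The round-trip idea behind the regression test can also be sketched in plain Java. This is not the contents of base64DecodeBytes_unsigned.jsonnet; `RoundTripCheck` and its methods are hypothetical names, and the check covers every byte value rather than only the boundary ones.

```java
import java.util.Base64;

// Hypothetical round-trip check; not the actual regression test.
final class RoundTripCheck {
    // Build a 256-byte array holding every byte value, encode it, and decode it.
    private static byte[] encodeThenDecodeAllValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String encoded = Base64.getEncoder().encodeToString(all);
        return Base64.getDecoder().decode(encoded);
    }

    // With the mask, every decoded element equals its original value 0..255.
    static boolean maskedRoundTripMatches() {
        byte[] decoded = encodeThenDecodeAllValues();
        if (decoded.length != 256) {
            return false;
        }
        for (int i = 0; i < 256; i++) {
            if ((decoded[i] & 0xff) != i) {
                return false;
            }
        }
        return true;
    }

    // Without the mask, the elements for 128..255 come back negative.
    static int countNegativeWithoutMask() {
        int negatives = 0;
        for (byte b : encodeThenDecodeAllValues()) {
            if (b < 0) {
                negatives++;
            }
        }
        return negatives;
    }
}
```

Here `maskedRoundTripMatches()` returns true and `countNegativeWithoutMask()` returns 128, since exactly half of the byte values have the high bit set.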

Upstream reference: jit branch commits b833428, af4832f
@He-Pin He-Pin force-pushed the fix/base64-unsigned-jmh-config branch from bb5274e to d1c2cb4 Compare April 10, 2026 09:28