perf: stdlib batch 3 — while-loop conversions and allocation reduction (19 functions) by He-Pin · Pull Request #702 · databricks/sjsonnet

He-Pin · 2026-04-06T14:53:40Z

Motivation

Continue the systematic stdlib optimization effort. This batch converts 19 functions across 8 files from Scala collection operations (.map, .filter, .forall, .exists, .foreach, .mkString) to while-loops and replaces intermediate allocations with direct array operations (System.arraycopy, StringBuilder).

Key Design Decisions

Lazy semantics preservation: std.count and std.member guard x.value evaluation with empty-array checks — std.count([], error "boom") must return 0, not throw. This was caught during cross-review.
Surrogate-safe string comparison: The compareStringsByCodepoint equality fast-path is restricted to c1 < 0xD800 to prevent misaligned surrogate pair comparisons when one side has valid pairs and the other has unpaired surrogates.
Single-pass sum/avg: Old code did two passes (validate all elements with forall, then map + sum). New code validates and accumulates in a single pass — same error message, earlier failure on invalid input.

Modification

ArrayModule.scala (9 functions)

std.all: forall → while-loop with Val.False(pos) early return
std.any: iterator.exists → while-loop with Val.True(pos) early return
std.count: foreach closure → while-loop; x.value hoisted with empty guard
std.member (array): indexWhere → while-loop with var found; empty guard
std.contains: indexWhere → while-loop
std.remove: indexWhere + slice/++ → while-loop search + System.arraycopy
std.removeAt: slice/++ → System.arraycopy
std.repeat (array): ArrayBuilder → pre-sized array + System.arraycopy
std.sum/avg: two-pass → single-pass while-loop with pattern match

ObjectModule.scala

prune: for-comprehension → ArrayBuilder while-loop, fusing map+filter

StringModule.scala

escapeStringXml: StringWriter + for → java.lang.StringBuilder + while + charAt

ManifestModule.scala (4 functions)

Lines: filter + map + mkString → while-loop + StringBuilder
manifestYamlStream: map + mkString → while-loop + StringBuilder
manifestIni: flatMap + Seq + mkString → StringBuilder direct append
manifestPythonVars: map + mkString → StringBuilder

SetModule.scala

sortArr: reuse single-element argBuf for keyF invocations; while-loops for key computation, index remapping, and strict evaluation

DecimalFormat.scala

leftPad/rightPad: string concat with "0" * n → StringBuilder

Evaluator.scala

visitApply: e.args.map(...) → while-loops (Array[Val] eager, Array[Eval] lazy)

Util.scala

compareStringsByCodepoint: add c1 == c2 && c1 < 0xD800 fast skip + c1 < 0xD800 && c2 < 0xD800 direct subtraction

Benchmark Results

JMH (focused, wi=3, i=5, f=1)

Benchmark	Master	Batch 3	Delta
bench.02	46.042	46.287	~0% (noise)
realistic2	66.377	65.202 ± 0.759	-1.8%
bench.04	33.116	31.963	-3.5%
bench.07	3.557	3.166	-11.0%

Full Regression Suite (22 cases)

No significant regressions across any benchmark case.

Analysis

These are incremental micro-optimizations that primarily reduce GC pressure through fewer intermediate allocations (no lambda closures, no temporary arrays from .map). The gains are most visible in benchmarks that exercise stdlib functions heavily. The visitApply while-loop benefits any code with variable-arity function calls.

References

Upstream jit branch commits: b3c696f2, 9ff850f3, e19833b5, 72eb9711, d1629fde, cd612df7, af4832f2
Prior PRs: perf: While-loop stdlib optimizations + base64DecodeBytes unsigned fix #700 (batch 1), perf(stdlib): while-loop conversions for Array/Object/String modules #701 (batch 2)

Result

All 55 test suites pass. 22 benchmark cases show no regressions.

Convert 19 stdlib/core functions from Scala collection operations to while-loops and replace intermediate allocations with direct array ops: ArrayModule: std.all, std.any, std.count, std.member (array), std.contains, std.remove (while-loop search + arraycopy), std.removeAt (arraycopy), std.repeat (pre-sized array + arraycopy), std.sum/std.avg (single-pass) ObjectModule: prune (fuse map+filter with ArrayBuilder) StringModule: escapeStringXml (StringBuilder + charAt) ManifestModule: Lines, manifestYamlStream, manifestIni, manifestPythonVars (all StringBuilder conversions) SetModule: sortArr (reuse argBuf, while-loop key compute + index remap) DecimalFormat: leftPad/rightPad (StringBuilder) Evaluator: visitApply (while-loop arg evaluation) Util: compareStringsByCodepoint (fast path for equal non-surrogate chars) Key safety guards: - std.count/std.member: guard x.value with empty-array check to preserve lazy semantics (std.count([], error) must return 0, not throw) - compareStringsByCodepoint: equality fast-path restricted to c1 < 0xD800 to avoid misaligned surrogate pair comparison Upstream: jit branch commits b3c696f, 9ff850f, e19833b, 72eb971, d1629fd, cd612df, af4832f

He-Pin · 2026-04-06T18:36:06Z

Code Review

I found several correctness issues in this PR:

1. `std.sum` — returns raw `Double`, not `Val.Num`

builtin("sum", "arr") { (_, _, arr: Val.Arr) =>
  val la = arr.asLazyArray
  var sum = 0.0
  var i = 0
  while (i < la.length) {
    la(i).value match {
      case n: Val.Num => sum += n.asDouble
      case _          => Error.fail("Argument must be an array of numbers")
    }
    i += 1
  }
  sum  // <-- Returns Double directly, but the builtin expects Val
}

In Scala, a Double is implicitly converted to Val.Num only if there's an implicit conversion in scope. However, there is no such implicit conversion in ArrayModule. The return type of the builtin lambda should be Val, and returning a bare Double would be a compile error unless there's an implicit.

Wait — looking more carefully: the builtin helper likely expects the lambda to return Val, but sum is a Double. This should fail to compile unless there's an implicit conversion from Double to Val somewhere. If this compiles, it's likely via an implicit that wraps it in Val.Num(pos, ...), but the position is wrong — it would use some implicit position, not the call site position.

Actually, after checking: the builtin method signature likely uses a type class or implicit conversion. If there's an implicit Double => Val.Num(pos) using the captured pos, this is fine. But if the implicit doesn't exist, this is a compile error. Please verify this compiles.

2. `std.avg` — same `Double` return issue

sum / la.length  // Same concern as std.sum

3. `std.count` — lazy semantics guard is correct

if (a.isEmpty) return Val.Num(pos, 0)
val xVal = x.value

This is correct — x.value is only forced when the array is non-empty. No issue.

4. `std.member` (array) — lazy semantics guard is correct

if (la.isEmpty) false
else {
  val xVal = x.value
  ...
}

Correct — same pattern as std.count. No issue.

5. `std.remove` — `idx` could be at the last position, `arraycopy` is safe

System.arraycopy(lazyArray, idx + 1, result, idx, lazyArray.length - idx - 1)

When idx == lazyArray.length - 1, this becomes arraycopy(src, len, dest, len-1, 0) which copies 0 elements. This is valid. No issue.

6. `compareStringsByCodepoint` — surrogate fast path condition

if (c1 == c2 && c1 < 0xd800.toChar) {

The condition c1 < 0xd800.toChar correctly excludes the surrogate range (0xD800-0xDFFF). But what about the range 0xE000-0xFFFF (valid BMP characters above surrogates)? These are NOT surrogates, so c1 < 0xd800.toChar being false for them means they fall through to the else if branch:

} else if (c1 < 0xd800.toChar && c2 < 0xd800.toChar) {

For chars in 0xE000-0xFFFF, this condition is also false, so they fall through to the full codepoint logic. This is correct but suboptimal — these chars could be compared directly. Not a bug, just a missed optimization.

7. `visitApply` — while-loop arg evaluation changes exception ordering

Previously: e.args.map(visitExpr(_)) evaluated all args left-to-right via map.
Now: while loop evaluates left-to-right explicitly.

The behavior is the same — left-to-right evaluation for tailstrict calls. No issue.

8. `escapeStringXml` — `StringBuilder` initial capacity

val sb = new java.lang.StringBuilder(str.length)

If the string contains characters that need escaping (e.g., & -> &), the output will be longer than the input. The initial capacity will be too small, causing reallocation. This is a minor performance concern, not a correctness issue. The original StringWriter also had no size hint.

9. `manifestIni` — trailing newline behavior change

sb.append(k).append(" = ").append(render(x)).append('\n')

The original code used .mkString which joins elements. The new code appends \n after every line. Let me check if this produces the same output...

The original: lines.flatMap(Seq(_, "\n")).mkString — this produces each line followed by \n, which is exactly what the new code does. Same behavior, no issue.

He-Pin · 2026-04-06T21:50:24Z

Closing: superseded by consolidated stdlib optimization effort. All 19 function optimizations from this PR are preserved and will be resubmitted with comprehensive Scala Native benchmarks.

He-Pin mentioned this pull request Apr 6, 2026

perf: While-loop stdlib optimizations + base64DecodeBytes unsigned fix #700

Closed

He-Pin closed this Apr 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: stdlib batch 3 — while-loop conversions and allocation reduction (19 functions)#702

perf: stdlib batch 3 — while-loop conversions and allocation reduction (19 functions)#702
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/while-loop-stdlib-3

He-Pin commented Apr 6, 2026

Uh oh!

He-Pin commented Apr 6, 2026

Uh oh!

He-Pin commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented Apr 6, 2026

Motivation

Key Design Decisions

Modification

ArrayModule.scala (9 functions)

ObjectModule.scala

StringModule.scala

ManifestModule.scala (4 functions)

SetModule.scala

DecimalFormat.scala

Evaluator.scala

Util.scala

Benchmark Results

JMH (focused, wi=3, i=5, f=1)

Full Regression Suite (22 cases)

Analysis

References

Result

Uh oh!

He-Pin commented Apr 6, 2026

Code Review

1. std.sum — returns raw Double, not Val.Num

2. std.avg — same Double return issue

3. std.count — lazy semantics guard is correct

4. std.member (array) — lazy semantics guard is correct

5. std.remove — idx could be at the last position, arraycopy is safe

6. compareStringsByCodepoint — surrogate fast path condition

7. visitApply — while-loop arg evaluation changes exception ordering

8. escapeStringXml — StringBuilder initial capacity

9. manifestIni — trailing newline behavior change

Uh oh!

He-Pin commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `std.sum` — returns raw `Double`, not `Val.Num`

2. `std.avg` — same `Double` return issue

3. `std.count` — lazy semantics guard is correct

4. `std.member` (array) — lazy semantics guard is correct

5. `std.remove` — `idx` could be at the last position, `arraycopy` is safe

6. `compareStringsByCodepoint` — surrogate fast path condition

7. `visitApply` — while-loop arg evaluation changes exception ordering

8. `escapeStringXml` — `StringBuilder` initial capacity

9. `manifestIni` — trailing newline behavior change