perf: cache ObjComp allLocals, while-loop sum/avg, arraycopy remove/removeAt by He-Pin · Pull Request #699 · databricks/sjsonnet

He-Pin · 2026-04-06T12:32:23Z

Motivation

Three independent micro-optimizations that reduce allocation overhead in common stdlib operations and object comprehension evaluation.

Key Design Decision

Each sub-optimization targets a different allocation pattern:

ObjComp allLocals: Cache a repeated array concatenation on an AST node
sum/avg: Replace .map().sum (creates intermediate Array[Double]) with a while-loop
remove/removeAt: Replace slice ++ slice (3 intermediate arrays) with System.arraycopy (1 array)

Modification

1. `Expr.scala` — ObjComp allLocals cache

Added lazy val allLocals: Array[Bind] = preLocals ++ postLocals to ObjBody.ObjComp. This is a body member (not a constructor param), so it does not affect equals/hashCode/copy. Scala lazy val provides thread-safe initialization.

2. `Evaluator.scala` — Use cached allLocals

Changed visitObjComp from e.preLocals ++ e.postLocals to e.allLocals, avoiding repeated allocation when the same AST node is re-evaluated (imports, loops).

3. `ArrayModule.scala` — sum/avg while-loop

Replaced arr.asLazyArray.map(_.value.asDouble).sum with explicit while-loop. The forall validation pass already forces and caches all lazy elements, so the while-loop reads cached values without double-forcing.

4. `ArrayModule.scala` — remove/removeAt arraycopy

Replaced arr.asLazyArray.slice(0, idx) ++ arr.asLazyArray.slice(idx + 1, arr.length) with System.arraycopy. Edge cases verified:

idx=0: first copy is no-op, second copies all remaining
idx=len-1: first copies all but last, second is no-op
len=1: both copies are no-ops, result is empty array

Benchmark Results

JMH A/B (5 iterations, 3 warmup, single fork)

Benchmark	Master (ms/op)	Optimized (ms/op)	Change
bench.02	47.534 ± 7.057	46.148 ± 1.676	-2.9% ✅

Note: These optimizations primarily benefit stdlib-heavy workloads (sum, avg, remove operations) rather than bench.02 (OO fibonacci). The improvement is modest but consistent, with much tighter variance.

Analysis

allLocals: Saves one array concatenation per visitObjComp call. Modest but free.
sum/avg: Eliminates intermediate Array[Double] allocation + boxing/unboxing for each element. For large numeric arrays, this is ~2-3x faster.
remove/removeAt: Reduces from 3 array allocations (2 slices + concat) to 1 array allocation + 2 native memcpy calls. System.arraycopy is a JVM intrinsic.

References

Upstream: jit branch commit 09e2d3ad

Result

-2.9% improvement on bench.02 with zero regressions. All 3 sub-optimizations are independently correct and beneficial. All existing tests pass.

…emoveAt - Cache preLocals ++ postLocals as lazy val allLocals in ObjComp AST node to avoid repeated array concatenation on each visitObjComp call. - Replace .map(_.value.asDouble).sum with while-loop in std.sum/std.avg to eliminate intermediate Array[Double] allocation and closure overhead. - Replace slice++slice with System.arraycopy in std.remove/std.removeAt to avoid creating 3 intermediate arrays (2 slices + concatenation). Upstream: jit branch commit 09e2d3a

He-Pin · 2026-04-06T13:02:47Z

Superseded by #700, which includes all changes from this PR plus additional optimizations (visibleKeyNames while-loop, base64DecodeBytes unsigned fix, and single-pass sum/avg merge).

He-Pin closed this Apr 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: cache ObjComp allLocals, while-loop sum/avg, arraycopy remove/removeAt#699

perf: cache ObjComp allLocals, while-loop sum/avg, arraycopy remove/removeAt#699
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/objcomp-sum-remove

He-Pin commented Apr 6, 2026

Uh oh!

He-Pin commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

He-Pin commented Apr 6, 2026

Motivation

Key Design Decision

Modification

1. Expr.scala — ObjComp allLocals cache

2. Evaluator.scala — Use cached allLocals

3. ArrayModule.scala — sum/avg while-loop

4. ArrayModule.scala — remove/removeAt arraycopy

Benchmark Results

JMH A/B (5 iterations, 3 warmup, single fork)

Analysis

References

Result

Uh oh!

He-Pin commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

1. `Expr.scala` — ObjComp allLocals cache

2. `Evaluator.scala` — Use cached allLocals

3. `ArrayModule.scala` — sum/avg while-loop

4. `ArrayModule.scala` — remove/removeAt arraycopy