Skip to content

perf: stdlib batch 3 — while-loop conversions and allocation reduction (19 functions)#702

Closed
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/while-loop-stdlib-3
Closed

perf: stdlib batch 3 — while-loop conversions and allocation reduction (19 functions)#702
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/while-loop-stdlib-3

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Apr 6, 2026

Motivation

Continue the systematic stdlib optimization effort. This batch converts 19 functions across 8 files from Scala collection operations (.map, .filter, .forall, .exists, .foreach, .mkString) to while-loops and replaces intermediate allocations with direct array operations (System.arraycopy, StringBuilder).

Key Design Decisions

  1. Lazy semantics preservation: std.count and std.member guard x.value evaluation with empty-array checks — std.count([], error "boom") must return 0, not throw. This was caught during cross-review.

  2. Surrogate-safe string comparison: The compareStringsByCodepoint equality fast-path is restricted to c1 < 0xD800 to prevent misaligned surrogate pair comparisons when one side has valid pairs and the other has unpaired surrogates.

  3. Single-pass sum/avg: Old code did two passes (validate all elements with forall, then map + sum). New code validates and accumulates in a single pass — same error message, earlier failure on invalid input.

Modification

ArrayModule.scala (9 functions)

  • std.all: forall → while-loop with Val.False(pos) early return
  • std.any: iterator.exists → while-loop with Val.True(pos) early return
  • std.count: foreach closure → while-loop; x.value hoisted with empty guard
  • std.member (array): indexWhere → while-loop with var found; empty guard
  • std.contains: indexWhere → while-loop
  • std.remove: indexWhere + slice/++ → while-loop search + System.arraycopy
  • std.removeAt: slice/++System.arraycopy
  • std.repeat (array): ArrayBuilder → pre-sized array + System.arraycopy
  • std.sum/avg: two-pass → single-pass while-loop with pattern match

ObjectModule.scala

  • prune: for-comprehension → ArrayBuilder while-loop, fusing map+filter

StringModule.scala

  • escapeStringXml: StringWriter + forjava.lang.StringBuilder + while + charAt

ManifestModule.scala (4 functions)

  • Lines: filter + map + mkString → while-loop + StringBuilder
  • manifestYamlStream: map + mkString → while-loop + StringBuilder
  • manifestIni: flatMap + Seq + mkStringStringBuilder direct append
  • manifestPythonVars: map + mkStringStringBuilder

SetModule.scala

  • sortArr: reuse single-element argBuf for keyF invocations; while-loops for key computation, index remapping, and strict evaluation

DecimalFormat.scala

  • leftPad/rightPad: string concat with "0" * nStringBuilder

Evaluator.scala

  • visitApply: e.args.map(...) → while-loops (Array[Val] eager, Array[Eval] lazy)

Util.scala

  • compareStringsByCodepoint: add c1 == c2 && c1 < 0xD800 fast skip + c1 < 0xD800 && c2 < 0xD800 direct subtraction

Benchmark Results

JMH (focused, wi=3, i=5, f=1)

Benchmark Master Batch 3 Delta
bench.02 46.042 46.287 ~0% (noise)
realistic2 66.377 65.202 ± 0.759 -1.8%
bench.04 33.116 31.963 -3.5%
bench.07 3.557 3.166 -11.0%

Full Regression Suite (22 cases)

No significant regressions across any benchmark case.

Analysis

These are incremental micro-optimizations that primarily reduce GC pressure through fewer intermediate allocations (no lambda closures, no temporary arrays from .map). The gains are most visible in benchmarks that exercise stdlib functions heavily. The visitApply while-loop benefits any code with variable-arity function calls.

References

Result

All 55 test suites pass. 22 benchmark cases show no regressions.

Convert 19 stdlib/core functions from Scala collection operations to
while-loops and replace intermediate allocations with direct array ops:

ArrayModule: std.all, std.any, std.count, std.member (array), std.contains,
  std.remove (while-loop search + arraycopy), std.removeAt (arraycopy),
  std.repeat (pre-sized array + arraycopy), std.sum/std.avg (single-pass)
ObjectModule: prune (fuse map+filter with ArrayBuilder)
StringModule: escapeStringXml (StringBuilder + charAt)
ManifestModule: Lines, manifestYamlStream, manifestIni, manifestPythonVars
  (all StringBuilder conversions)
SetModule: sortArr (reuse argBuf, while-loop key compute + index remap)
DecimalFormat: leftPad/rightPad (StringBuilder)
Evaluator: visitApply (while-loop arg evaluation)
Util: compareStringsByCodepoint (fast path for equal non-surrogate chars)

Key safety guards:
- std.count/std.member: guard x.value with empty-array check to preserve
  lazy semantics (std.count([], error) must return 0, not throw)
- compareStringsByCodepoint: equality fast-path restricted to c1 < 0xD800
  to avoid misaligned surrogate pair comparison

Upstream: jit branch commits b3c696f, 9ff850f, e19833b, 72eb971,
  d1629fd, cd612df, af4832f
@He-Pin
Copy link
Copy Markdown
Contributor Author

He-Pin commented Apr 6, 2026

Code Review

I found several correctness issues in this PR:

1. std.sum — returns raw Double, not Val.Num

builtin("sum", "arr") { (_, _, arr: Val.Arr) =>
  val la = arr.asLazyArray
  var sum = 0.0
  var i = 0
  while (i < la.length) {
    la(i).value match {
      case n: Val.Num => sum += n.asDouble
      case _          => Error.fail("Argument must be an array of numbers")
    }
    i += 1
  }
  sum  // <-- Returns Double directly, but the builtin expects Val
}

In Scala, a Double is implicitly converted to Val.Num only if there's an implicit conversion in scope. However, there is no such implicit conversion in ArrayModule. The return type of the builtin lambda should be Val, and returning a bare Double would be a compile error unless there's an implicit.

Wait — looking more carefully: the builtin helper likely expects the lambda to return Val, but sum is a Double. This should fail to compile unless there's an implicit conversion from Double to Val somewhere. If this compiles, it's likely via an implicit that wraps it in Val.Num(pos, ...), but the position is wrong — it would use some implicit position, not the call site position.

Actually, after checking: the builtin method signature likely uses a type class or implicit conversion. If there's an implicit Double => Val.Num(pos) using the captured pos, this is fine. But if the implicit doesn't exist, this is a compile error. Please verify this compiles.

2. std.avg — same Double return issue

sum / la.length  // Same concern as std.sum

3. std.count — lazy semantics guard is correct

if (a.isEmpty) return Val.Num(pos, 0)
val xVal = x.value

This is correct — x.value is only forced when the array is non-empty. No issue.

4. std.member (array) — lazy semantics guard is correct

if (la.isEmpty) false
else {
  val xVal = x.value
  ...
}

Correct — same pattern as std.count. No issue.

5. std.removeidx could be at the last position, arraycopy is safe

System.arraycopy(lazyArray, idx + 1, result, idx, lazyArray.length - idx - 1)

When idx == lazyArray.length - 1, this becomes arraycopy(src, len, dest, len-1, 0) which copies 0 elements. This is valid. No issue.

6. compareStringsByCodepoint — surrogate fast path condition

if (c1 == c2 && c1 < 0xd800.toChar) {

The condition c1 < 0xd800.toChar correctly excludes the surrogate range (0xD800-0xDFFF). But what about the range 0xE000-0xFFFF (valid BMP characters above surrogates)? These are NOT surrogates, so c1 < 0xd800.toChar being false for them means they fall through to the else if branch:

} else if (c1 < 0xd800.toChar && c2 < 0xd800.toChar) {

For chars in 0xE000-0xFFFF, this condition is also false, so they fall through to the full codepoint logic. This is correct but suboptimal — these chars could be compared directly. Not a bug, just a missed optimization.

7. visitApply — while-loop arg evaluation changes exception ordering

Previously: e.args.map(visitExpr(_)) evaluated all args left-to-right via map.
Now: while loop evaluates left-to-right explicitly.

The behavior is the same — left-to-right evaluation for tailstrict calls. No issue.

8. escapeStringXmlStringBuilder initial capacity

val sb = new java.lang.StringBuilder(str.length)

If the string contains characters that need escaping (e.g., & -> &amp;), the output will be longer than the input. The initial capacity will be too small, causing reallocation. This is a minor performance concern, not a correctness issue. The original StringWriter also had no size hint.

9. manifestIni — trailing newline behavior change

sb.append(k).append(" = ").append(render(x)).append('\n')

The original code used .mkString which joins elements. The new code appends \n after every line. Let me check if this produces the same output...

The original: lines.flatMap(Seq(_, "\n")).mkString — this produces each line followed by \n, which is exactly what the new code does. Same behavior, no issue.

@He-Pin
Copy link
Copy Markdown
Contributor Author

He-Pin commented Apr 6, 2026

Closing: superseded by consolidated stdlib optimization effort. All 19 function optimizations from this PR are preserved and will be resubmitted with comprehensive Scala Native benchmarks.

@He-Pin He-Pin closed this Apr 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant