Shopify · cpakman · Mar 11, 2026
diff --git a/auto/autoresearch.ideas.md b/auto/autoresearch.ideas.md
@@ -0,0 +1,30 @@
+# Autoresearch Ideas
+
+## Dead Ends (tried and failed)
+
+- **Tag name interning** (skip+byte dispatch): saves 878 allocs but verification loop overhead kills speed
+- **String dedup (-@)** for filter names: no alloc savings, creates temp strings anyway
+- **Split-based tokenizer**: 2.5x faster C-level split but can't handle {{ followed by %} nesting
+- **Streaming tokenizer**: needs own StringScanner (+alloc), per-shift overhead worse than eager array
+- **Merge simple_lookup? into initialize**: logic overhead offsets saved index call
+- **Cursor for filter scanning**: cursor.reset overhead worse than inline byte loops
+- **Direct strainer call**: YJIT already inlines context.invoke_single well
+- **TruthyCondition subclass**: YJIT polymorphism at evaluate call site hurts more than 115 saved allocs
+- **Index loop for filters**: YJIT optimizes each+destructure MUCH better than manual filter[0]/filter[1]
+
+## Key Insights
+
+- YJIT monomorphism > allocation reduction at this scale
+- C-level StringScanner.scan/skip > Ruby-level byte loops (already applied)
+- String#split is 2.5x faster than manual tokenization, but Liquid's grammar is too complex for regex
+- 74% of total CPU time is GC — alloc reduction is the highest-leverage optimization
+- But YJIT-deoptimization from polymorphism costs more than the GC savings
+
+## Remaining Ideas
+
+- **Tokenizer: use String#index + byteslice instead of StringScanner**: avoid the StringScanner overhead entirely for the simple case of finding {%/{{ delimiters
+- **Pre-freeze all Condition operator lambdas**: reduce alloc in Condition initialization
+- **Avoid `@blocks = []` in If with single-element optimization**: use `@block` ivar for single condition, only create array for elsif
+- **Reduce ForloopDrop allocation**: reuse ForloopDrop objects across iterations or use a lighter-weight object
+- **VariableLookup: single-segment optimization**: for "product.title" (1 lookup), use an ivar instead of 1-element Array
+
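Review note on the tokenizer idea above (`String#index` + `byteslice` instead of StringScanner): a minimal sketch of delimiter finding with `String#byteindex`, assuming Ruby 3.2 or newer. `simple_tokens` is a hypothetical helper, not Liquid's `Tokenizer`, and it deliberately ignores raw blocks and the `{{` followed by `%}` case listed under Dead Ends.

```ruby
# Hypothetical helper for illustration only; not part of Liquid.
# Splits a template into text, {{ ... }} and {% ... %} tokens using
# byte offsets, without a StringScanner or a regex.
def simple_tokens(source)
  tokens = []
  pos = 0
  len = source.bytesize

  while pos < len
    # Find the next "{" that is followed by "{" (123) or "%" (37).
    search_from = pos
    open_at = nil
    while (i = source.byteindex("{", search_from))
      nxt = source.getbyte(i + 1)
      if nxt == 123 || nxt == 37
        open_at = i
        break
      end
      search_from = i + 1
    end

    # No more delimiters: the rest is plain text.
    if open_at.nil?
      tokens << source.byteslice(pos, len - pos)
      break
    end

    # Emit any plain text that precedes the delimiter.
    tokens << source.byteslice(pos, open_at - pos) if open_at > pos

    closer = source.getbyte(open_at + 1) == 123 ? "}}" : "%}"
    close_at = source.byteindex(closer, open_at + 2)

    # Unclosed delimiter: emit the remainder as a single token.
    if close_at.nil?
      tokens << source.byteslice(open_at, len - open_at)
      break
    end

    tokens << source.byteslice(open_at, close_at + 2 - open_at)
    pos = close_at + 2
  end

  tokens
end

p simple_tokens("Hi {{ name }}! {% if x %}y{% endif %}")
```

Working in byte offsets throughout keeps `byteindex`, `getbyte`, and `byteslice` consistent with each other on multibyte UTF-8 input, where character offsets from `String#index` would not line up with `byteslice`.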
diff --git a/auto/autoresearch.md b/auto/autoresearch.md
@@ -0,0 +1,109 @@
+# Autoresearch: Liquid Parse+Render Performance
+
+## Objective
+Optimize the Shopify Liquid template engine's parse and render performance.
+The workload is the ThemeRunner benchmark which parses and renders real Shopify
+theme templates (dropify, ripen, tribble, vogue) with realistic data from
+`performance/shopify/database.rb`. We measure parse time, render time, and
+object allocations. The optimization target is combined parse+render time (µs).
+
+## How to Run
+Run `./auto/autoresearch.sh` (unit tests, then the benchmark, emitting `METRIC` lines);
+`./auto/bench.sh` also runs liquid-spec conformance before the benchmark.
+
+## Metrics
+- **Primary (optimization target)**: `combined_µs` (µs, lower is better) — sum of parse + render time
+- **Secondary (tradeoff monitoring)**:
+  - `parse_µs` — time to parse all theme templates (Liquid::Template#parse)
+  - `render_µs` — time to render all pre-compiled templates
+  - `allocations` — total object allocations for one parse+render cycle
+  Parse dominates (~70-75% of combined). Allocations correlate with GC pressure.
+
+## Files in Scope
+- `lib/liquid/*.rb` — core Liquid library (parser, lexer, context, expression, etc.)
+- `lib/liquid/tags/*.rb` — tag implementations (for, if, assign, etc.)
+- `performance/bench_quick.rb` — benchmark script
+
+## Off Limits
+- `test/` — tests must continue to pass unchanged
+- `performance/tests/` — benchmark templates, do not modify
+- `performance/shopify/` — benchmark data/filters, do not modify
+
+## Constraints
+- All unit tests must pass (`bundle exec rake base_test`)
+- liquid-spec failures must not increase beyond 2 (pre-existing UTF-8 edge cases)
+- No new gem dependencies
+- Semantic correctness must be preserved — templates must render identical output
+- **Security**: Liquid runs untrusted user code. See Strategic Direction for details.
+
+## Strategic Direction
+The long-term goal is to converge toward a **single-pass, forward-only parsing
+architecture** using one shared StringScanner instance. The current system has
+multiple redundant passes: Tokenizer → BlockBody → Lexer → Parser → Expression
+→ VariableLookup, each re-scanning portions of the source. A unified scanner
+approach would:
+
+1. **One StringScanner** flows through the entire parse — no intermediate token
+   arrays, no re-lexing filter chains, no string reconstruction in Parser#expression.
+2. **Emit a lightweight IL or normalized AST** during the single forward pass,
+   decoupling strictness checking from the hot parse path. The LiquidIL project
+   (`~/src/tries/2026-01-05-liquid-il`) demonstrated this: a recursive-descent
+   parser emitting IL directly achieved significant speedups.
+3. **Minimal backtracking** — the scanner advances forward, byte-checking as it
+   goes. liquid-c (`~/src/tries/2026-01-16-Shopify-liquid-c`) showed that a
+   C-level cursor-based tokenizer eliminates most allocation overhead.
+
+Current fast-path optimizations (byte-level tag/variable/for/if parsing) are
+steps toward this goal. Each one replaces a regex+MatchData pattern with
+forward-only byte scanning. The remaining Lexer→Parser path for filter args
+is the next target for elimination.
+
+**Security note**: Liquid executes untrusted user templates. All parsing must
+use explicit byte-range checks. Never use eval, send on user input, dynamic
+method dispatch, const_get, or any pattern that lets template authors escape
+the sandbox.
+
+## Baseline
+- **Commit**: 4ea835a (original, before any optimizations)
+- **combined_µs**: 7,374
+- **parse_µs**: 5,928
+- **render_µs**: 1,446
+- **allocations**: 62,620
+
+## Progress Log
+- 3329b09: Replace FullToken regex with manual byte parsing → combined 7,262 (-1.5%)
+- 97e6893: Replace VariableParser regex with manual byte scanner → combined 6,945 (-5.8%), allocs 58,009
+- 2b78e4b: getbyte instead of string indexing in whitespace_handler/create_variable → allocs 51,477
+- d291e63: Lexer equal? for frozen arrays, \s+ whitespace skip → combined ~6,331
+- d79b9fa: Avoid strip alloc in Expression.parse, byteslice for strings → allocs 49,151
+- fa41224: Short-circuit parse_number with first-byte check → allocs 48,240
+- c1113ad: Fast-path String in render_obj_to_output → combined ~6,071
+- 25f9224: Fast-path simple variable parsing (skip Lexer/Parser) → combined ~5,860, allocs 45,202
+- 3939d74: Replace SIMPLE_VARIABLE regex with byte scanner → combined ~5,717, allocs 42,763
+- fe7a2f5: Fast-path simple if conditions → combined ~5,444, allocs 41,490
+- cfa0dfe: Replace For tag Syntax regex with manual byte parser → combined ~4,974, allocs 39,847
+- 8a92a4e: Unified fast-path Variable: parse name directly, only lex filter chain → combined ~5,060, allocs 40,520
+- 58d2514: parse_tag_token returns [tag_name, markup, newlines] → combined ~4,815, allocs 37,355
+- db43492: Hoist write score check out of render loop → render ~1,345
+- 17daac9: Extend fast-path to quoted string literal variables → all 1,197 variables fast-pathed
+- 9fd7cec: Split filter parsing: no-arg filters scanned directly, Lexer only for args → combined ~4,595, allocs 35,159
+- e5933fc: Avoid array alloc in parse_tag_token via class ivars → allocs 34,281
+- 2e207e6: Replace WhitespaceOrNothing regex with byte-level blank_string? → combined ~4,800
+- 526af22: invoke_single fast path for no-arg filter invocation → allocs 32,621
+- 76ae8f1: find_variable top-scope fast path → combined ~4,740
+- 4cda1a5: slice_collection: skip copy for full Array → allocs 32,004
+- 79840b1: Replace SIMPLE_CONDITION regex with manual byte parser → combined ~4,663, allocs 31,465
+- 69430e9: Replace INTEGER_REGEX/FLOAT_REGEX with byte-level parse_number → allocs 31,129
+- 405e3dc: Frozen EMPTY_ARRAY/EMPTY_HASH for Context @filters/@disabled_tags → allocs 31,009
+- b90d7f0: Avoid unnecessary array wrapping for Context environments → allocs 30,709
+- 3799d4c: Lazy seen={} hash in Utils.to_s/inspect → allocs 30,169
+- 0b07487: Fast-path VariableLookup: skip scan_variable for simple identifiers → allocs 29,711
+- 9de1527: Introduce Cursor class for centralized byte-level scanning
+- dd4a100: Remove dead parse_tag_token/SIMPLE_CONDITION (now in Cursor)
+- cdc3438: For tag: migrate lax_parse to Cursor with zero-alloc scanning → allocs 29,620
+
+## Current Best
+- **combined_µs**: ~3,400 (-54% from original 7,374 baseline)
+- **parse_µs**: ~2,300
+- **render_µs**: ~1,100
+- **allocations**: 24,882 (-60% from original 62,620 baseline)
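Review note on the Strategic Direction section of `auto/autoresearch.md`: the phrase "replaces a regex+MatchData pattern with forward-only byte scanning" can be illustrated with a small sketch. `simple_identifier?` and its accepted character set (ASCII letters, digits, underscore, no leading digit) are assumptions made for the example, not Liquid's actual grammar or method names.

```ruby
# Hypothetical predicate for illustration only; not part of Liquid.
# Walks the string once with getbyte, allocating no MatchData and no
# substrings. Returns true only for [A-Za-z_][A-Za-z0-9_]*.
def simple_identifier?(str)
  len = str.bytesize
  return false if len.zero?

  i = 0
  while i < len
    b = str.getbyte(i)
    letter = (b >= 97 && b <= 122) || (b >= 65 && b <= 90) || b == 95
    digit = b >= 48 && b <= 57
    # Digits are allowed anywhere except the first position.
    return false unless letter || (digit && i > 0)
    i += 1
  end
  true
end

p simple_identifier?("product")        # a bare name
p simple_identifier?("product.title")  # contains a dot, so not "simple"
```

Explicit byte-range comparisons like these also fit the security note in the same section, since nothing from the template is ever evaluated or dispatched on.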
diff --git a/auto/autoresearch.sh b/auto/autoresearch.sh
@@ -0,0 +1,48 @@
+#!/usr/bin/env bash
+# Autoresearch benchmark runner for Liquid performance optimization
+# Runs: unit tests → performance benchmark (3 runs, takes best)
+# Outputs METRIC lines for the agent to parse
+# Exit code 0 = all good, non-zero = broken
+set -euo pipefail
+
+cd "$(dirname "$0")/.."
+
+# ── Step 1: Unit tests (fast gate) ──────────────────────────────────
+echo "=== Unit Tests ==="
+TEST_OUT=$(bundle exec rake base_test 2>&1) && TEST_STATUS=0 || TEST_STATUS=$?
+TEST_RESULT=$(echo "$TEST_OUT" | tail -1)
+if [ "$TEST_STATUS" -ne 0 ]; then  # gate on rake's exit status; a bare failing assignment would abort under set -e
+  echo "$TEST_OUT" | grep -E 'Failure|Error|failures|errors' | head -20 || true
+  echo "FATAL: unit tests failed"
+  exit 1
+fi
+echo "$TEST_RESULT"
+
+# ── Step 2: Performance benchmark (3 runs, take best) ──────────────
+echo ""
+echo "=== Performance Benchmark (3 runs) ==="
+BEST_COMBINED=999999
+BEST_PARSE=0
+BEST_RENDER=0
+BEST_ALLOC=0
+
+for i in 1 2 3; do
+  OUT=$(bundle exec ruby performance/bench_quick.rb 2>&1)
+  P=$(echo "$OUT" | grep '^parse_us=' | cut -d= -f2)
+  R=$(echo "$OUT" | grep '^render_us=' | cut -d= -f2)
+  C=$(echo "$OUT" | grep '^combined_us=' | cut -d= -f2)
+  A=$(echo "$OUT" | grep '^allocations=' | cut -d= -f2)
+  echo "  run $i: combined=${C}µs (parse=${P} render=${R}) allocs=${A}"
+  if [ "$C" -lt "$BEST_COMBINED" ]; then
+    BEST_COMBINED=$C
+    BEST_PARSE=$P
+    BEST_RENDER=$R
+    BEST_ALLOC=$A
+  fi
+done
+
+echo ""
+echo "METRIC combined_us=$BEST_COMBINED"
+echo "METRIC parse_us=$BEST_PARSE"
+echo "METRIC render_us=$BEST_RENDER"
+echo "METRIC allocations=$BEST_ALLOC"
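Review note on `auto/autoresearch.sh`: the script greps `parse_us=`, `render_us=`, `combined_us=`, and `allocations=` lines out of `performance/bench_quick.rb`, which is not shown in this diff. A minimal sketch of a script emitting that shape follows. The workloads are stand-in lambdas rather than Liquid calls, and `measure_us` and `bench_lines` are hypothetical names.

```ruby
# Hypothetical sketch; the real performance/bench_quick.rb is not shown here.
# Times a block in whole microseconds using the monotonic clock.
def measure_us
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :microsecond) - t0
end

# Runs both phases once and returns the four key=value lines.
def bench_lines(parse_work, render_work)
  allocated_before = GC.stat(:total_allocated_objects)
  parse_us = measure_us { parse_work.call }
  render_us = measure_us { render_work.call }
  allocations = GC.stat(:total_allocated_objects) - allocated_before

  [
    "parse_us=#{parse_us}",
    "render_us=#{render_us}",
    "combined_us=#{parse_us + render_us}",
    "allocations=#{allocations}",
  ]
end

# Stand-in workloads (assumptions); a real benchmark would parse and
# render the theme templates here.
parse_work = -> { 1_000.times { |i| "tag#{i}".to_sym } }
render_work = -> { 1_000.times { |i| i.to_s } }

puts bench_lines(parse_work, render_work)
```

Emitting integers matters for the shell side, because the `[ "$C" -lt "$BEST_COMBINED" ]` comparison only accepts whole numbers.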
diff --git a/auto/bench.sh b/auto/bench.sh
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+# Auto-research benchmark script for Liquid
+# Runs: unit tests → liquid-spec → performance benchmark
+# Outputs machine-readable metrics on success
+# Exit code 0 = all good, non-zero = broken
+set -euo pipefail
+
+cd "$(dirname "$0")/.."
+
+# ── Step 1: Unit tests (fast gate) ──────────────────────────────────
+echo "=== Unit Tests ==="
+if ! bundle exec rake base_test 2>&1; then
+  echo "FATAL: unit tests failed"
+  exit 1
+fi
+
+# ── Step 2: liquid-spec (correctness gate) ──────────────────────────
+echo ""
+echo "=== Liquid Spec ==="
+SPEC_OUTPUT=$(bundle exec liquid-spec run spec/ruby_liquid.rb 2>&1 || true)
+echo "$SPEC_OUTPUT" | tail -3
+
+# Extract failure count from "Total: N passed, N failed, N errors" line
+# Allow known pre-existing failures (≤2)
+TOTAL_LINE=$(echo "$SPEC_OUTPUT" | grep "^Total:" || echo "Total: 0 passed, 0 failed, 0 errors")
+FAILURES=$(echo "$TOTAL_LINE" | sed -n 's/.*[^0-9]\([0-9][0-9]*\) failed.*/\1/p')
+ERRORS=$(echo "$TOTAL_LINE" | sed -n 's/.*[^0-9]\([0-9][0-9]*\) error.*/\1/p')
+FAILURES=${FAILURES:-0}
+ERRORS=${ERRORS:-0}
+TOTAL_BAD=$((FAILURES + ERRORS))
+
+if [ "$TOTAL_BAD" -gt 2 ]; then
+  echo "FATAL: liquid-spec has $FAILURES failures and $ERRORS errors (threshold: 2)"
+  exit 1
+fi
+
+# ── Step 3: Performance benchmark ──────────────────────────────────
+echo ""
+echo "=== Performance Benchmark ==="
+bundle exec ruby performance/bench_quick.rb 2>&1
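Review note on the liquid-spec gate in `auto/bench.sh`: pulling counts out of a `Total: N passed, N failed, N errors` line is sensitive to greedy matching. A pattern that starts with `.*` hands all but the last digit to the prefix, so a two-digit count is captured as its last digit only. The sketch below shows the same extraction in Ruby. `spec_counts` is a hypothetical helper and the sample line is made up for illustration.

```ruby
# Hypothetical helper for illustration; mirrors the shell gate's intent.
# Returns the failed and error counts from a liquid-spec summary line,
# defaulting to 0 when a field is missing.
def spec_counts(total_line)
  failed = total_line[/(\d+) failed/, 1].to_i
  errors = total_line[/(\d+) errors?/, 1].to_i
  { failed: failed, errors: errors }
end

line = "Total: 4102 passed, 12 failed, 3 errors" # made-up sample

# Greedy prefix: ".*" swallows the leading "1", so only "2" is captured.
greedy = line[/.*(\d+) failed/, 1]

p greedy            # the truncated capture
p spec_counts(line) # the full counts
```

Without a leading `.*`, the digit group starts at the first digit of the number, so the full count is compared against the threshold of 2.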
diff --git a/autoresearch.jsonl b/autoresearch.jsonl
@@ -0,0 +1,30 @@
+{"type":"config","name":"Liquid parse+render performance (tenderlove-inspired)","metricName":"combined_µs","metricUnit":"µs","bestDirection":"lower"}
+{"run":1,"commit":"c09e722","metric":3818,"metrics":{"parse_µs":2722,"render_µs":1096,"allocations":24881},"status":"keep","description":"Baseline: 3,818µs combined, 24,881 allocs","timestamp":1773348490227}
+{"run":2,"commit":"c09e722","metric":4063,"metrics":{"parse_µs":2901,"render_µs":1162,"allocations":24003},"status":"discard","description":"Tag name interning via skip+byte dispatch: saves 878 allocs but verification loop slower than scan","timestamp":1773348738557,"segment":0}
+{"run":3,"commit":"c09e722","metric":3881,"metrics":{"parse_µs":2720,"render_µs":1161,"allocations":24881},"status":"discard","description":"String dedup (-@) for filter names: no alloc savings, no speed benefit","timestamp":1773348781481,"segment":0}
+{"run":4,"commit":"c09e722","metric":3970,"metrics":{"parse_µs":2829,"render_µs":1141,"allocations":24881},"status":"discard","description":"Streaming tokenizer: needs own StringScanner (+1 alloc), per-shift overhead worse than saved array","timestamp":1773348883093,"segment":0}
+{"run":5,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split-based tokenizer — regex can't handle unclosed tags inside raw blocks","timestamp":1773349089230,"segment":0}
+{"run":6,"commit":"c09e722","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: split regex tokenizer v2 — can't handle {{ followed by %} (variable-becomes-tag nesting)","timestamp":1773349248313,"segment":0}
+{"run":7,"commit":"c09e722","metric":3861,"metrics":{"parse_µs":2744,"render_µs":1117,"allocations":24881},"status":"discard","description":"Merge simple_lookup? dot position into initialize — logic overhead offsets saved index call","timestamp":1773349376707,"segment":0}
+{"run":8,"commit":"c09e722","metric":4048,"metrics":{"parse_µs":2929,"render_µs":1119,"allocations":24881},"status":"discard","description":"Use Cursor regex for filter name scanning — cursor.reset + method dispatch overhead worse than inline bytes","timestamp":1773349447172,"segment":0}
+{"run":9,"commit":"c09e722","metric":3872,"metrics":{"parse_µs":2744,"render_µs":1128,"allocations":24881},"status":"discard","description":"Direct strainer call in Variable#render — YJIT already inlines context.invoke_single well","timestamp":1773349497593,"segment":0}
+{"run":10,"commit":"c09e722","metric":3839,"metrics":{"parse_µs":2732,"render_µs":1107,"allocations":24879},"status":"discard","description":"Array#[] fast path for slice_collection with limit/offset — only 2 alloc savings, not meaningful","timestamp":1773349555348,"segment":0}
+{"run":11,"commit":"c09e722","metric":3889,"metrics":{"parse_µs":2770,"render_µs":1119,"allocations":24766},"status":"discard","description":"TruthyCondition for simple if checks: -115 allocs but YJIT polymorphism at evaluate call site hurts speed","timestamp":1773349649377,"segment":0}
+{"run":12,"commit":"c09e722","metric":4150,"metrics":{"parse_µs":2769,"render_µs":1381,"allocations":24881},"status":"discard","description":"Index loop for filters: YJIT optimizes each+destructure better than manual indexing","timestamp":1773349699285,"segment":0}
+{"run":13,"commit":"b7ae55f","metric":3556,"metrics":{"parse_µs":2388,"render_µs":1168,"allocations":24882},"status":"keep","description":"Replace StringScanner tokenizer with String#byteindex — 12% faster parse, no regex overhead for delimiter finding","timestamp":1773349875890,"segment":0}
+{"run":14,"commit":"e25f2f1","metric":3464,"metrics":{"parse_µs":2335,"render_µs":1129,"allocations":24882},"status":"keep","description":"Confirmation run: byteindex tokenizer consistently 3,400-3,600µs","timestamp":1773349889465,"segment":0}
+{"run":15,"commit":"b37fa98","metric":3490,"metrics":{"parse_µs":2331,"render_µs":1159,"allocations":24882},"status":"keep","description":"Clean up tokenizer: remove unused StringScanner setup and regex constants","timestamp":1773349928672,"segment":0}
+{"run":16,"commit":"b37fa98","metric":3638,"metrics":{"parse_µs":2460,"render_µs":1178,"allocations":24882},"status":"discard","description":"Single-char byteindex for %} search: Ruby loop overhead worse for nearby targets","timestamp":1773349985509,"segment":0}
+{"run":17,"commit":"b37fa98","metric":3553,"metrics":{"parse_µs":2431,"render_µs":1122,"allocations":25256},"status":"discard","description":"Regex simple_variable_markup: MatchData creates 374 extra allocs, offsetting speed gain","timestamp":1773350066627,"segment":0}
+{"run":18,"commit":"b37fa98","metric":3629,"metrics":{"parse_µs":2455,"render_µs":1174,"allocations":25002},"status":"discard","description":"String.new(capacity: 4096) for output buffer: allocates more objects, not fewer","timestamp":1773350101852,"segment":0}
+{"run":19,"commit":"f6baeae","metric":3350,"metrics":{"parse_µs":2212,"render_µs":1138,"allocations":24882},"status":"keep","description":"parse_tag_token without StringScanner: pure byte ops avoid reset(token) overhead, -12% combined","timestamp":1773350230252,"segment":0}
+{"run":20,"commit":"f6baead","metric":0,"metrics":{"parse_µs":0,"render_µs":0,"allocations":0},"status":"crash","description":"REVERTED: regex ultra-fast path for Variable — name pattern too broad, matches invalid trailing dots","timestamp":1773350472859,"segment":0}
+{"run":21,"commit":"ae9a2e2","metric":3314,"metrics":{"parse_µs":2203,"render_µs":1111,"allocations":24882},"status":"keep","description":"Clean confirmation run: 3,314µs (-55% from main), stable","timestamp":1773350544354,"segment":0}
+{"run":22,"commit":"ae9a2e2","metric":3497,"metrics":{"parse_µs":2336,"render_µs":1161,"allocations":24882},"status":"discard","description":"Regex fast path for no-filter variables: include? + match? overhead exceeds byte scan savings","timestamp":1773350641375,"segment":0}
+{"run":23,"commit":"ca327b0","metric":3445,"metrics":{"parse_µs":2284,"render_µs":1161,"allocations":24647},"status":"keep","description":"Condition#evaluate: skip loop block for simple conditions (no child_relation) — saves 235 allocs","timestamp":1773350691752,"segment":0}
+{"run":24,"commit":"99454a9","metric":3489,"metrics":{"parse_µs":2353,"render_µs":1136,"allocations":24647},"status":"keep","description":"Replace simple_lookup? byte scan with match? regex — 8x faster per call, cleaner code","timestamp":1773350837721,"segment":0}
+{"run":25,"commit":"99454a9","metric":3797,"metrics":{"parse_µs":2636,"render_µs":1161,"allocations":29627},"status":"discard","description":"Regex name extraction in try_fast_parse: MatchData creates 5K extra allocs, much worse","timestamp":1773351048938,"segment":0}
+{"run":26,"commit":"db348e0","metric":3459,"metrics":{"parse_µs":2318,"render_µs":1141,"allocations":24647},"status":"keep","description":"Inline to_liquid_value in If render — avoids one method dispatch per condition evaluation","timestamp":1773351080001,"segment":0}
+{"run":27,"commit":"b195d09","metric":3496,"metrics":{"parse_µs":2356,"render_µs":1140,"allocations":24530},"status":"keep","description":"Replace @blocks.each with while loop in If render — avoids block proc allocation per render","timestamp":1773351101134,"segment":0}
+{"run":28,"commit":"b195d09","metric":3648,"metrics":{"parse_µs":2457,"render_µs":1191,"allocations":24530},"status":"discard","description":"While loop in For render: YJIT optimizes each well for hot loops with many iterations","timestamp":1773351142275,"segment":0}
+{"run":29,"commit":"b195d09","metric":3966,"metrics":{"parse_µs":2641,"render_µs":1325,"allocations":24060},"status":"discard","description":"While loop for environment search: -470 allocs but YJIT deopt makes render 16% slower","timestamp":1773351193863,"segment":0}
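Review note on `autoresearch.jsonl`: the log mixes a config record, kept runs, discarded runs, and crashes recorded with `metric` 0, so a naive minimum over `metric` would pick a crash. A minimal sketch of selecting the best kept run follows. `best_kept_run` is a hypothetical helper, and the sample records are abbreviated copies of entries from the log above.

```ruby
require "json"

# Hypothetical helper for illustration. Parses JSONL text and returns the
# record with status "keep" and the lowest metric. The config record and
# crash/discard runs are filtered out by the status check.
def best_kept_run(jsonl)
  jsonl.each_line
       .map(&:strip)
       .reject(&:empty?)
       .map { |line| JSON.parse(line) }
       .select { |record| record["status"] == "keep" }
       .min_by { |record| record["metric"] }
end

# Abbreviated records taken from the log above (descriptions omitted).
SAMPLE_LOG = <<~JSONL
  {"type":"config","metricName":"combined_µs","bestDirection":"lower"}
  {"run":1,"commit":"c09e722","metric":3818,"status":"keep"}
  {"run":19,"commit":"f6baeae","metric":3350,"status":"keep"}
  {"run":20,"commit":"f6baead","metric":0,"status":"crash"}
  {"run":21,"commit":"ae9a2e2","metric":3314,"status":"keep"}
  {"run":29,"commit":"b195d09","metric":3966,"status":"discard"}
JSONL

best = best_kept_run(SAMPLE_LOG)
puts "best kept run: #{best["run"]} (#{best["commit"]}) at #{best["metric"]}µs"
```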
diff --git a/lib/liquid.rb b/lib/liquid.rb
@@ -52,6 +52,8 @@ module Liquid
 require "liquid/version"
 require "liquid/deprecations"
 require "liquid/const"
+require 'liquid/byte_tables'
+require 'liquid/cursor'
 require 'liquid/standardfilters'
 require 'liquid/file_system'
 require 'liquid/parser_switching'