perf: split viewDepth into separate depthBuffer for sort cache locality by mvaligursky · Pull Request #8587 · playcanvas/engine

mvaligursky · 2026-04-10T15:36:55Z

Splits viewDepth out of projCache into a dedicated parallel depthBuffer, improving sort pass cache behavior at zero additional memory cost.

Changes:

Reduce CACHE_STRIDE from 8 to 7 by removing viewDepth from projCache slot 7
Add a parallel depthBuffer (1 u32 per splat) written during tile count, read by all sort and rasterize passes
Sort passes (bitonic, bucket sort, chunk sort) now read depth via stride-1 depthBuffer[entryIdx] instead of stride-8 projCache[entryIdx * 8 + 7], eliminating cache thrashing on random depth lookups
Merge globalPairCounter and largeSplatCount into a single countersBuffer[2] to stay within the WebGPU limit of 10 storage buffers per shader stage on Metal
Restructure the large tile count shader to use an isActive flag instead of an early return, as required by WGSL uniform control flow when bounds are derived from atomicLoad
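
The layout change in the first three items can be modeled on the CPU as a rough sketch. The real buffers live on the GPU and are accessed from WGSL. The helper names `splitDepth` and `makeOldCache` are hypothetical. The sketch also assumes that slots 0..6 keep their positions within each entry, which the description does not state.

```javascript
// Hypothetical CPU-side model of the two layouts (not the engine's actual code).
const OLD_STRIDE = 8; // projCache stride before the change, viewDepth in slot 7
const NEW_STRIDE = 7; // CACHE_STRIDE after the change

// Build a small stand-in cache where slot s of entry i holds i * 100 + s.
function makeOldCache(count) {
    const cache = new Uint32Array(count * OLD_STRIDE);
    for (let i = 0; i < count; i++) {
        for (let s = 0; s < OLD_STRIDE; s++) {
            cache[i * OLD_STRIDE + s] = i * 100 + s;
        }
    }
    return cache;
}

// Split an interleaved stride-8 cache into a stride-7 cache
// plus a parallel stride-1 depth array.
function splitDepth(oldCache) {
    const count = oldCache.length / OLD_STRIDE;
    const projCache = new Uint32Array(count * NEW_STRIDE);
    const depthBuffer = new Uint32Array(count);
    for (let i = 0; i < count; i++) {
        // Assumption: slots 0..6 keep their order within the entry.
        for (let s = 0; s < NEW_STRIDE; s++) {
            projCache[i * NEW_STRIDE + s] = oldCache[i * OLD_STRIDE + s];
        }
        // Slot 7 (viewDepth) moves to the parallel buffer.
        depthBuffer[i] = oldCache[i * OLD_STRIDE + 7];
    }
    return { projCache, depthBuffer };
}

const oldCache = makeOldCache(4);
const { projCache, depthBuffer } = splitDepth(oldCache);

// A sort pass that previously read oldCache[entryIdx * 8 + 7]
// now reads depthBuffer[entryIdx].
const entryIdx = 2;
console.log(oldCache[entryIdx * OLD_STRIDE + 7] === depthBuffer[entryIdx]);
```

The combined element count of `projCache` and `depthBuffer` equals that of the old cache, so total memory is unchanged.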

Performance:

Bucket sort: 0.6ms → 0.2ms (3x improvement)
Tile sort: 1.3ms → 1.2ms
Total frame: 8.5ms → 8.2ms (~3.5% improvement)
Measured on 17M-splat scene, zero memory overhead (projCache shrinks by exactly the amount depthBuffer adds)
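
The memory claim and the locality argument can be checked with a little arithmetic. This is a sketch only: the splat count is rounded to exactly 17,000,000, and the 64-byte cache line is an assumption, since actual GPU cache line sizes vary by hardware. The function names are hypothetical.

```javascript
const BYTES_PER_U32 = 4;

// Buffer sizes in bytes before and after the split.
function footprint(splatCount) {
    const before = splatCount * 8 * BYTES_PER_U32;         // projCache at stride 8
    const projCacheAfter = splatCount * 7 * BYTES_PER_U32; // projCache at stride 7
    const depthBuffer = splatCount * BYTES_PER_U32;        // 1 u32 per splat
    return { before, after: projCacheAfter + depthBuffer, depthBuffer };
}

// How many depth values fit in one cache line when depths are
// spaced strideInU32 elements apart.
function depthsPerLine(strideInU32, lineBytes) {
    return Math.floor(lineBytes / (strideInU32 * BYTES_PER_U32));
}

const sizes = footprint(17_000_000); // assumed round figure for "17M"
console.log(sizes.before, sizes.after);                // identical totals
console.log(depthsPerLine(8, 64), depthsPerLine(1, 64)); // 64-byte line is an assumption
```

Under these assumptions, both layouts occupy 544,000,000 bytes. A 64-byte line holds 2 depth values at stride 8 and 16 at stride 1, which is consistent with random depth lookups in the sort passes hitting the cache more often.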

Move viewDepth out of projCache (slot 7) into a parallel depthBuffer, reducing CACHE_STRIDE from 8 to 7. Total memory is unchanged — projCache shrinks by 1 u32/splat while depthBuffer adds 1 u32/splat. Sort passes (bitonic, bucket sort, chunk sort) now read depth via stride-1 depthBuffer[entryIdx] instead of stride-8 projCache[entryIdx * 8 + 7], eliminating cache thrashing on random depth lookups. Measured 3x improvement on bucket sort (0.6ms → 0.2ms) and ~0.3ms total frame improvement on 17M-splat scenes. Also merges globalPairCounter and largeSplatCount into a single countersBuffer[2] to stay within the 10 storage-buffer-per-stage WebGPU limit on Metal.

mvaligursky self-assigned this Apr 10, 2026

vercel bot deployed to Preview – engine-api-docs April 10, 2026 15:37 View deployment

vercel bot deployed to Preview – engine April 10, 2026 15:38 View deployment

mvaligursky added the performance and area: graphics labels Apr 10, 2026

mvaligursky merged commit 75ad28a into main Apr 10, 2026
8 checks passed

mvaligursky deleted the mv-depth-buffer-split branch April 10, 2026 15:38
