bvle-voxels

History

Samuel Bouchet cd9814e494 Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets CPU smooth mesher optimizations (560ms → 17ms): - VoxelData grid cache eliminates redundant readVoxel calls - Pre-cached 27 neighbor chunk pointers (readVoxelFast) - smoothNear dilation (8 lookups/cell instead of 56) - Early exit via containsSmooth flag on chunks - Thread-local scratch buffers (SmoothScratch ~600KB) - wi::jobsystem parallelization across all cores - Persistent staging vectors for upload TopingSystem optimizations (58ms → 6ms): - collectInstancesParallel() with per-chunk local vectors - Neighbor chunk pointer caching GPU compute Surface Nets (Phase 5.3): - Two-pass compute shader: centroid grid + emit with smooth normals - Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags for cells [-1..32], cross-chunk neighbor voxel reading - Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes area-weighted smooth normals from 12 incident edges per vertex - Batched dispatch: all centroid passes then all emit passes with single UAV→SRV barrier (instead of 2 barriers per chunk) - Smooth chunk filtering: only dispatches chunks with containsSmooth - Centroid grid buffer dynamically sized per smooth chunk count - 1-frame readback delay with auto-redispatch on first frame	2026-03-27 22:30:43 +01:00
..
app	Phase 5.1: smooth PS blending uses same logic as blocky PS + debug scene	2026-03-27 14:21:35 +01:00
voxel	Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets	2026-03-27 22:30:43 +01:00

Samuel Bouchet cd9814e494 Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets

CPU smooth mesher optimizations (560ms → 17ms):
- VoxelData grid cache eliminates redundant readVoxel calls
- Pre-cached 27 neighbor chunk pointers (readVoxelFast)
- smoothNear dilation (8 lookups/cell instead of 56)
- Early exit via containsSmooth flag on chunks
- Thread-local scratch buffers (SmoothScratch ~600KB)
- wi::jobsystem parallelization across all cores
- Persistent staging vectors for upload

TopingSystem optimizations (58ms → 6ms):
- collectInstancesParallel() with per-chunk local vectors
- Neighbor chunk pointer caching

GPU compute Surface Nets (Phase 5.3):
- Two-pass compute shader: centroid grid + emit with smooth normals
- Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags
  for cells [-1..32], cross-chunk neighbor voxel reading
- Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes
  area-weighted smooth normals from 12 incident edges per vertex
- Batched dispatch: all centroid passes then all emit passes with
  single UAV→SRV barrier (instead of 2 barriers per chunk)
- Smooth chunk filtering: only dispatches chunks with containsSmooth
- Centroid grid buffer dynamically sized per smooth chunk count
- 1-frame readback delay with auto-redispatch on first frame

2026-03-27 22:30:43 +01:00

app

Phase 5.1: smooth PS blending uses same logic as blocky PS + debug scene

2026-03-27 14:21:35 +01:00

voxel

Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets

2026-03-27 22:30:43 +01:00