Commit graph

5 commits

Author SHA1 Message Date
c2d1a1e0b6 Commit plan and iteration instructions 2026-03-31 20:04:00 +02:00
57ac08f231 Refactor: extract VoxelRTManager, DeferredGPUBuffer, decompose VoxelRenderPath
- Extract DeferredGPUBuffer utility (staging→dirty→capacity GPU buffer pattern)
- Extract VoxelRTManager from VoxelRenderer (~500 lines: BLAS/TLAS, RT shadows+AO)
- Decompose VoxelRenderPath into CameraController, AnimationState, VoxelProfiler
- Replace toping std::sort with O(n) counting sort by (type, variant)
- Update CLAUDE.md architecture docs to reflect new file structure
2026-03-31 13:46:35 +02:00
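
Illustration for 57ac08f231: the counting sort by (type, variant) could look roughly like the sketch below. The `Instance` struct, its field widths, and the `typeCount`/`variantCount` parameters are hypothetical; the commit does not show the real TopingSystem data layout. With k = typeCount * variantCount buckets the cost is O(n + k), which is effectively O(n) when k is small and fixed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instance record; the real layout is not shown in the commit.
struct Instance {
    std::uint8_t type;      // assumed to satisfy type < typeCount
    std::uint8_t variant;   // assumed to satisfy variant < variantCount
    std::uint32_t payload;  // stand-in for transform / chunk data
};

// Stable counting sort keyed on (type, variant).
std::vector<Instance> countingSortByTypeVariant(const std::vector<Instance>& in,
                                                std::size_t typeCount,
                                                std::size_t variantCount) {
    const std::size_t buckets = typeCount * variantCount;
    std::vector<std::size_t> offsets(buckets + 1, 0);

    // Pass 1: histogram of keys, stored one slot to the right
    // so the prefix sum below yields start offsets directly.
    for (const Instance& i : in) {
        ++offsets[std::size_t(i.type) * variantCount + i.variant + 1];
    }

    // Prefix sum: offsets[b] becomes the first output index of bucket b.
    for (std::size_t b = 0; b < buckets; ++b) {
        offsets[b + 1] += offsets[b];
    }

    // Pass 2: scatter each instance to its bucket, preserving input order
    // within a bucket (this is what makes the sort stable).
    std::vector<Instance> out(in.size());
    for (const Instance& i : in) {
        out[offsets[std::size_t(i.type) * variantCount + i.variant]++] = i;
    }
    return out;
}
```

Unlike std::sort, this needs a second buffer of the same size, but it makes no comparisons and keeps instances of the same (type, variant) contiguous for batched drawing.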
f134a5786d Add sky and refs 2026-03-30 21:54:55 +02:00
cd9814e494 Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets
CPU smooth mesher optimizations (560ms → 17ms):
- VoxelData grid cache eliminates redundant readVoxel calls
- Pre-cached 27 neighbor chunk pointers (readVoxelFast)
- smoothNear dilation (8 lookups/cell instead of 56)
- Early exit via containsSmooth flag on chunks
- Thread-local scratch buffers (SmoothScratch ~600KB)
- wi::jobsystem parallelization across all cores
- Persistent staging vectors for upload

TopingSystem optimizations (58ms → 6ms):
- collectInstancesParallel() with per-chunk local vectors
- Neighbor chunk pointer caching

GPU compute Surface Nets (Phase 5.3):
- Two-pass compute shader: centroid grid + emit with smooth normals
- Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags
  for cells [-1..32], cross-chunk neighbor voxel reading
- Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes
  area-weighted smooth normals from 12 incident edges per vertex
- Batched dispatch: all centroid passes then all emit passes with
  single UAV→SRV barrier (instead of 2 barriers per chunk)
- Smooth chunk filtering: only dispatches chunks with containsSmooth
- Centroid grid buffer dynamically sized per smooth chunk count
- 1-frame readback delay with auto-redispatch on first frame
2026-03-27 22:30:43 +01:00
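
Illustration for cd9814e494: a possible addressing scheme for the centroid grid written by pass 1 and read by pass 2. The cell range [-1..32] comes from the commit message and gives 34 cells per axis. The x-fastest layout, the per-chunk slot index, and all names below are assumptions made for this sketch, not the shader's actual layout.

```cpp
#include <cstdint>

// From the commit message: pass 1 covers cells [-1..32] on each axis.
constexpr int kCellMin = -1;
constexpr int kCellMax = 32;
constexpr int kGridDim = kCellMax - kCellMin + 1;  // 34 cells per axis
constexpr std::uint32_t kCellsPerChunk = kGridDim * kGridDim * kGridDim;

// True if (x, y, z) lies inside the padded centroid grid of one chunk.
constexpr bool inCentroidGrid(int x, int y, int z) {
    return x >= kCellMin && x <= kCellMax &&
           y >= kCellMin && y <= kCellMax &&
           z >= kCellMin && z <= kCellMax;
}

// Flat index of a cell in a buffer holding one padded grid per smooth chunk.
// chunkSlot is a hypothetical dense index over chunks with containsSmooth set.
// Assumed layout: x varies fastest, then y, then z.
constexpr std::uint32_t centroidIndex(std::uint32_t chunkSlot, int x, int y, int z) {
    const int local = ((z - kCellMin) * kGridDim + (y - kCellMin)) * kGridDim
                    + (x - kCellMin);
    return chunkSlot * kCellsPerChunk + static_cast<std::uint32_t>(local);
}

// Element count for a buffer sized per smooth chunk count, as the commit describes.
constexpr std::uint32_t centroidGridElementCount(std::uint32_t smoothChunkCount) {
    return smoothChunkCount * kCellsPerChunk;
}
```

The one-cell border on each side is what lets pass 2 read only from the centroid grid: every neighbor a vertex needs is already inside its own chunk's padded slice.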
45af49a659 Phase 2.2: MDI rendering with CPU-filled indirect args
Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount.
The CPU fills the indirect args buffer with the same frustum + backface cull logic as Phase 2.1.

Key discoveries:
- Wicked Engine command signature includes push constant (20-byte stride, not 16)
- SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect
- Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs
  quad offset from GPUChunkInfo lookup
- No explicit DX12 barriers needed (implicit promotion from COMMON suffices)

Also adds voxel_engine_spec.md and updates references from .docx to .md.
2026-03-25 22:07:22 +01:00
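
Illustration for 45af49a659: the push-constant packing and the 20-byte indirect record could be sketched on the CPU side as below. The four draw fields mirror D3D12_DRAW_ARGUMENTS (16 bytes). Placing the push constant before them, the struct name, and the helper names are assumptions made for this sketch; the commit only states the 20-byte stride and the chunkIndex|(faceIndex<<16) packing.

```cpp
#include <cstdint>

// Pack chunkIndex into the low 16 bits and faceIndex into the high 16 bits.
// chunkIndex is masked so an out-of-range value cannot corrupt faceIndex.
constexpr std::uint32_t packChunkFace(std::uint32_t chunkIndex, std::uint32_t faceIndex) {
    return (chunkIndex & 0xFFFFu) | (faceIndex << 16);
}

// Inverse operations, as the vertex shader would perform them before
// looking up the quad offset in GPUChunkInfo.
constexpr std::uint32_t unpackChunkIndex(std::uint32_t packed) { return packed & 0xFFFFu; }
constexpr std::uint32_t unpackFaceIndex(std::uint32_t packed) { return packed >> 16; }

// Hypothetical CPU-side view of one indirect record.
// Field order (push constant first) is an assumption.
struct IndirectDrawRecord {
    std::uint32_t pushConstant;           // packChunkFace(chunkIndex, faceIndex)
    std::uint32_t vertexCountPerInstance; // the remaining four fields match
    std::uint32_t instanceCount;          // D3D12_DRAW_ARGUMENTS
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(IndirectDrawRecord) == 20, "stride must be 20 bytes, not 16");
```

Note that this packing limits chunkIndex to 65536 distinct values; a scene with more visible chunks would need a different split of the 32 bits.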