Stone: add corner fill triangles at adjacent open edges and cap
triangles at strip terminaisons. Grass: replace bevel strips with
tuft-based grass blades — clusters of 3-9 curved double-sided
blades with per-tuft height/lean personality and hash-driven
placement (quadratic inset 0-0.30 from edge). Vegetation PS uses
half-Lambert wrap lighting + translucency for soft stylized shading
(inspired by Airborn Trees). Stone keeps classic Lambert.
Add instanced rendering for toping bevels: dedicated shaders
(voxelTopingVS/PS), PSO, GPU buffers (t4 vertices, t5 instances),
per-group DrawInstanced in a separate render pass with LoadOp::LOAD.
Fix inverted face winding (emitTri auto-winding condition flipped for
CW front-facing), slope normals (use inward direction not outward),
and PS lighting (negate sunDirection like voxelPS). Update CLAUDE.md
with Phase 4.1/4.2 documentation.
- Add bleedMask/resistBleedMask bitmasks to CB for per-material blend control
- Grass: canBleed + resistsBleed (bleeds onto others, nothing bleeds onto it)
- Stone: no bleed (doesn't overflow, but accepts bleed from others)
- Other materials: normal bidirectional blending
- PS checks flags before blending: mainResists → skip, !neighCanBleed → skip
- Flatten terrain (heightScale 64→20) for better surface visibility
- Replace altitude-based material bands with noise-based 2D patches
(3 noise channels create organic patches of all 5 materials on surface)
- Make stone/sand more visually distinct (stone=blue-gray, sand=warm yellow)
- Lower stone heightContrast (1.2→0.5) so neighbors bleed onto it more
Compute shader fills indirect args buffer, replacing CPU cull loop.
Single DrawInstancedIndirectCount renders all visible face groups.
Key fixes:
- Compute shader: pack chunkIndex|(faceIndex<<16) in push constant,
startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix)
- PushConstants must be called AFTER BindPipelineState, not before.
Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when
active_pso is set; after BindComputeShader it targets compute instead.
- Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after
- Buffer decay: DX12 buffers always return to COMMON between frames,
no cross-frame state tracking needed
Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount.
CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1.
Key discoveries:
- Wicked Engine command signature includes push constant (20-byte stride, not 16)
- SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect
- Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs
quad offset from GPUChunkInfo lookup
- No explicit DX12 barriers needed (implicit promotion from COMMON suffices)
Also adds voxel_engine_spec.md and updates references from .docx to .md.
- VS supports dual mode: CPU path (push constants) and MDI path (binary search)
- CPU render loop now does per-face-group draws with backface culling (6 draws/chunk max)
- Frustum planes extracted and populated in constant buffer for GPU cull shader
- GPU cull + MDI path fully implemented but disabled (barrier/state debugging needed)
- GPU timestamp query infrastructure with readback for cull/draw timing
- HUD shows rendering mode (GPU cull vs CPU fallback)
Mega-buffer architecture replacing per-chunk GPU buffers:
- Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB)
- StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups)
- VS reads chunk info via push constants (b999) for driver-safe chunk indexing
- CPU frustum culling with wi::primitive::Frustum + AABB per chunk
- Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts)
- GPU frustum + backface cull compute shader (voxelCullCS.hlsl)
- GPU binary mesher compute shader baseline (voxelMeshCS.hlsl)
- Indirect draw buffers and timestamp query infrastructure
- README with build instructions and project architecture