Commit graph

7 commits

Author SHA1 Message Date
d7e69f97ca Phase 3: PS-based texture blending with winner-takes-all heightmap
Replace pre-encoded quad blend data (v1) with per-pixel voxel data
lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3)
to find neighbor materials dynamically, enabling 2 independent blend
axes, stair-priority neighbor detection, and winner-takes-all
heightmap-driven transitions.

Key design decisions validated through 6 iterations (see
blending_experiments.md):
- Winner-takes-all: material with highest heightmap score wins 100%
  (sharp but organic transitions, not smooth gradient)
- Symmetric bias: bias = 0.5 - weight ensures equal chance at border
- Subtractive corner attenuation (param=0.80): xAdj = xEdge -
  saturate(yEdge - 0.80) reduces blend at corners naturally
- Blend zone = 0.25 voxels from each edge (50% of the face width per axis)
- Debug mode (F4) visualizes blend zones as colors
2026-03-26 12:14:08 +01:00
21f1bd1a12 Phase 2.5: GPU meshing production pipeline + perf optimizations (80+ FPS)
Replace CPU greedy mesher with GPU compute mesher as default rendering pipeline.
Key optimizations identified via CPU profiling (ProfileAccum, 5s averages):
- Fused regenerate+pack: parallel noise gen + memcpy in same jobsystem pass (6ms → 0ms)
- VoxelData memcpy: sizeof(VoxelData)==2 enables direct memcpy instead of bit-shift loop (28ms → <1ms)
- Dirty-skip: GPU dispatch/upload only when chunks change, not every frame
- Animation: 2 fBm octaves + no caves in animation mode (54ms → 8ms)
- Result: 80-110 FPS with 60Hz terrain animation, 700+ FPS static
2026-03-26 09:05:52 +01:00
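
The VoxelData memcpy optimization in 21f1bd1a12 relies on the voxel struct being exactly 2 bytes and trivially copyable. A minimal sketch, assuming a hypothetical two-field layout (the real `VoxelData` fields are not shown in this log) and a hypothetical `packVoxels` helper:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical 2-byte voxel layout; the real field names are unknown.
struct VoxelData {
    uint8_t material;
    uint8_t flags;
};

// The direct copy is only valid while these two properties hold.
static_assert(sizeof(VoxelData) == 2, "VoxelData must stay 2 bytes");
static_assert(std::is_trivially_copyable<VoxelData>::value,
              "VoxelData must be trivially copyable for memcpy");

// Copy a chunk's voxels into a 16-bit upload buffer in one memcpy
// instead of assembling each value with shifts in a per-voxel loop.
inline std::vector<uint16_t> packVoxels(const std::vector<VoxelData>& voxels) {
    std::vector<uint16_t> packed(voxels.size());
    if (!voxels.empty()) {
        std::memcpy(packed.data(), voxels.data(),
                    voxels.size() * sizeof(VoxelData));
    }
    return packed;
}
```

The static assertions make the size assumption fail at compile time if a field is ever added to the struct.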
9a8f80de51 Phase 2.4: GPU compute mesher benchmark (CPU greedy vs GPU baseline)
One-shot benchmark runs automatically after world generation:
- CPU greedy mesher: 277ms, 358K quads (binary greedy merge)
- GPU baseline (1x1): 5.3ms, 2.43M quads (no merge, 52x faster)
- Greedy merge reduces quad count by 6.8x

Implementation:
- State machine: DISPATCH (upload voxels + dispatch) → READBACK → DONE
- GPU timestamps for accurate timing
- Readback buffer for quad counter
- Each chunk's voxel data uploaded and dispatched sequentially
2026-03-25 22:51:22 +01:00
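
The DISPATCH → READBACK → DONE flow in 9a8f80de51 can be sketched as a small per-frame state machine. This is an illustration only: the type and function names are hypothetical, and the GPU work (voxel upload, dispatch, timestamp queries) is reduced to two inputs supplied by the caller.

```cpp
#include <cstdint>

enum class BenchState { Dispatch, Readback, Done };

struct Benchmark {
    BenchState state = BenchState::Dispatch;
    uint32_t quadCount = 0;  // filled from the readback buffer
};

// Advance the benchmark by one frame.
// gpuFinished: whether the GPU has completed the dispatched work.
// readbackCounter: quad counter value currently in the readback buffer.
inline void step(Benchmark& b, bool gpuFinished, uint32_t readbackCounter) {
    switch (b.state) {
    case BenchState::Dispatch:
        // Real code would upload voxels and dispatch the mesher here.
        b.state = BenchState::Readback;
        break;
    case BenchState::Readback:
        // Only read the counter once the GPU results are available.
        if (gpuFinished) {
            b.quadCount = readbackCounter;
            b.state = BenchState::Done;
        }
        break;
    case BenchState::Done:
        break;  // one-shot: nothing more to do
    }
}
```

Splitting dispatch and readback across frames avoids stalling the CPU on the GPU within a single frame.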
1bfadc2f7c Phase 2.3: GPU compute culling with frustum + backface cull
Compute shader fills indirect args buffer, replacing CPU cull loop.
Single DrawInstancedIndirectCount renders all visible face groups.

Key fixes:
- Compute shader: pack chunkIndex|(faceIndex<<16) in push constant,
  startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix)
- PushConstants must be called AFTER BindPipelineState, not before.
  Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when
  active_pso is set; after BindComputeShader it targets compute instead.
- Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after
- Buffer decay: DX12 buffers always return to COMMON between frames,
  no cross-frame state tracking needed
2026-03-25 22:30:50 +01:00
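
The backface part of the cull in 1bfadc2f7c can be illustrated with a conservative per-face-group test, written here in C++ rather than HLSL. The face numbering (0..5 = +X, -X, +Y, -Y, +Z, -Z) and the function name are assumptions, since voxelCullCS.hlsl is not shown in this log, and the frustum test is omitted.

```cpp
struct Vec3 { float x, y, z; };

// Conservative backface test for one face group of a chunk.
// A +X face can only face the camera if the camera lies on the +X side
// of its plane. Every +X face in the chunk sits at or above bmin.x, so
// the whole group is hidden when cam.x <= bmin.x (and likewise per axis).
inline bool faceGroupMayBeVisible(int face, Vec3 cam, Vec3 bmin, Vec3 bmax) {
    switch (face) {
    case 0: return cam.x > bmin.x;  // +X
    case 1: return cam.x < bmax.x;  // -X
    case 2: return cam.y > bmin.y;  // +Y
    case 3: return cam.y < bmax.y;  // -Y
    case 4: return cam.z > bmin.z;  // +Z
    case 5: return cam.z < bmax.z;  // -Z
    default: return false;
    }
}
```

A camera outside the chunk along one axis rejects one of that axis's two face groups before any draw arguments are written.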
45af49a659 Phase 2.2: MDI rendering with CPU-filled indirect args
Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount.
CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1.

Key discoveries:
- Wicked Engine command signature includes push constant (20-byte stride, not 16)
- SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect
- Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs
  quad offset from GPUChunkInfo lookup
- No explicit DX12 barriers needed (implicit promotion from COMMON suffices)

Also adds voxel_engine_spec.md and updates references from .docx to .md.
2026-03-25 22:07:22 +01:00
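
The 20-byte stride and the push-constant packing noted in 45af49a659 can be sketched as follows. The four draw fields match the layout of `D3D12_DRAW_ARGUMENTS` (16 bytes); placing the push constant first and the struct and function names are assumptions, since the engine's command signature is not reproduced in this log.

```cpp
#include <cstdint>

// One indirect command: a 4-byte push constant plus the 16-byte draw
// arguments, giving the 20-byte stride (field order is ASSUMED).
struct IndirectDrawCommand {
    uint32_t pushConstant;  // chunkIndex | (faceIndex << 16)
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;  // left at 0; the VS uses the push constant
    uint32_t startInstanceLocation;
};
static_assert(sizeof(IndirectDrawCommand) == 20, "stride must be 20 bytes");

// Pack chunk and face indices into one 32-bit constant.
// The chunk index is limited to 16 bits by this layout.
inline uint32_t packChunkFace(uint32_t chunkIndex, uint32_t faceIndex) {
    return (chunkIndex & 0xFFFFu) | (faceIndex << 16);
}

inline uint32_t unpackChunk(uint32_t packed) { return packed & 0xFFFFu; }
inline uint32_t unpackFace(uint32_t packed) { return packed >> 16; }
```

With the indices recovered in the vertex shader, the quad offset comes from the GPUChunkInfo lookup instead of relying on `SV_VertexID` including `startVertexLocation`.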
abc640c2d0 cleanup
2026-03-25 19:38:50 +01:00
5f346bb14a Phase 2: GPU-driven voxel rendering pipeline
Mega-buffer architecture replacing per-chunk GPU buffers:
- Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB)
- StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups)
- VS reads chunk info via push constants (b999) for driver-safe chunk indexing
- CPU frustum culling with wi::primitive::Frustum + AABB per chunk
- Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts)
- GPU frustum + backface cull compute shader (voxelCullCS.hlsl)
- GPU binary mesher compute shader baseline (voxelMeshCS.hlsl)
- Indirect draw buffers and timestamp query infrastructure
- README with build instructions and project architecture
2026-03-25 14:24:05 +01:00
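
The mega-buffer figures in 5f346bb14a (2M quads in 16 MB) imply 8 bytes per quad. A sketch of the two buffer element types under that reading: the field layouts are hypothetical, and treating `faceOffsets` as relative to the chunk's first quad is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 8-byte quad; the real bit layout is not shown in the log.
struct PackedQuad {
    uint32_t lo;
    uint32_t hi;
};
static_assert(sizeof(PackedQuad) == 8, "16 MB / 2M quads = 8 bytes each");

constexpr uint32_t kMaxQuads = 2u * 1024u * 1024u;  // 2M quads
constexpr std::size_t kMegaBufferBytes =
    static_cast<std::size_t>(kMaxQuads) * sizeof(PackedQuad);  // 16 MiB

// Per-chunk metadata: position, quad offset, and per-face groups.
struct GPUChunkInfo {
    float position[3];
    uint32_t quadOffset;      // first quad of this chunk in the mega-buffer
    uint32_t faceOffsets[6];  // ASSUMED relative to quadOffset
    uint32_t faceCounts[6];   // quads per face direction
};

// Index of the first quad of one face group in the mega-buffer.
inline uint32_t firstQuadOfFace(const GPUChunkInfo& c, uint32_t face) {
    return c.quadOffset + c.faceOffsets[face];
}
```

Sorting quads by face direction is what makes each face group a contiguous range that a single draw can cover.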