Commit graph

17 commits

Author SHA1 Message Date
cd9814e494 Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets
CPU smooth mesher optimizations (560ms → 17ms):
- VoxelData grid cache eliminates redundant readVoxel calls
- Pre-cached 27 neighbor chunk pointers (readVoxelFast)
- smoothNear dilation (8 lookups/cell instead of 56)
- Early exit via containsSmooth flag on chunks
- Thread-local scratch buffers (SmoothScratch ~600KB)
- wi::jobsystem parallelization across all cores
- Persistent staging vectors for upload

TopingSystem optimizations (58ms → 6ms):
- collectInstancesParallel() with per-chunk local vectors
- Neighbor chunk pointer caching

GPU compute Surface Nets (Phase 5.3):
- Two-pass compute shader: centroid grid + emit with smooth normals
- Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags
  for cells [-1..32], cross-chunk neighbor voxel reading
- Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes
  area-weighted smooth normals from 12 incident edges per vertex
- Batched dispatch: all centroid passes then all emit passes with
  single UAV→SRV barrier (instead of 2 barriers per chunk)
- Smooth chunk filtering: only dispatches chunks with containsSmooth
- Centroid grid buffer dynamically sized per smooth chunk count
- 1-frame readback delay with auto-redispatch on first frame
2026-03-27 22:30:43 +01:00
d075a8492c Phase 5.1: smooth normals, triplanar fix, depth bias, hasSmooth tighten
- Smooth vertex normals: area-weighted accumulation of face normals per
  indexed vertex before triangle expansion. Gives Gouraud-smooth shading
  without adding geometry.
- Triplanar fix: PS uses geometric normal (ddx/ddy of worldPos) for
  texture projection weights, smooth normal for lighting only. Prevents
  texture stretching on smoothed surfaces.
- Depth bias: custom rasterizer state (depth_bias=2, slope_scaled=1.0)
  on smooth PSO resolves z-fighting at smooth↔blocky overlap.
- hasSmooth filter tightened: check face-adjacent voxels of each corner
  (1-voxel reach) instead of neighbor cells' corners (2-cell cascade).
  Prevents smooth mesh from extending into underground blocky territory.
2026-03-27 15:08:35 +01:00
c755f20325 Fix smooth↔blocky gap by extending hasSmooth filter to adjacent cells
Cells at the smooth↔blocky boundary had no smooth corners themselves,
so the strict hasSmooth filter skipped them entirely. This prevented
quad emission between the smooth mesh and blocky territory, leaving
a visible gap. Now checks 6-connected neighbor cells for smooth corners,
ensuring boundary vertices exist for connecting quads.
2026-03-27 14:39:54 +01:00
b45d5a1884 Phase 5.1: smooth PS blending uses same logic as blocky PS + debug scene
Rewrote voxelSmoothPS.hlsl to derive a dominant face axis from the smooth
normal, then use the exact same neighbor verification as voxelPS.hlsl:
faceU/faceV tangent tables, stair-priority getNeighborMat(), face-aligned
fractional coords, blendZone 0.25, corner attenuation, bleedMask checks.

Added generateDebugSmooth() with 11 isolated test configurations
(smooth↔blocky transitions, staircases, surrounded patches, reference
blocky pairs). Launch with: BVLEVoxels.exe debugsmooth
2026-03-27 14:21:35 +01:00
aab38bb9b9 Phase 5.1: Naive Surface Nets smooth rendering
Implement CPU-side Naive Surface Nets for smooth voxel surfaces (SmoothStone,
Snow) coexisting with blocky voxels (Grass, Dirt, Stone, Sand).

Key features:
- SmoothMesher with binary SDF, centroid vertex placement, per-axis boundary
  clamping to align with blocky grid at smooth↔blocky transitions
- Cross-chunk connectivity: PAD=2 SDF grid, vertex range [-1, CHUNK_SIZE),
  canonical edge ownership (no duplicate triangles, no z-fighting)
- Face normals oriented by edge axis+sign (robust with binary SDF, unlike
  SDF gradient dot or centroid sampling approaches)
- Y-axis winding fix: sharing cells have different spatial arrangement,
  requiring opposite winding from X and Z axes
- GPU mesher treats smooth neighbors as solid (no blocky faces toward smooth)
- Material blending: primary (smooth-only) + secondary (all counts) per vertex
- Dedicated shaders: voxelSmoothVS (vertex pulling t6) + voxelSmoothPS
  (triplanar + lerp blending between two materials)
- Separate render pass with LoadOp::LOAD after voxels+topings
- New materials: SmoothStone (mat 6), blocky Stone (mat 3) and Dirt patches
  added to world generation for boundary testing
2026-03-27 13:03:55 +01:00
9086a794a8 Add wind to grass toping 2026-03-26 18:58:19 +01:00
ef89bd8c49 Phase 4.2: grass blade tufts, stone corner fills/caps, vegetation shading
Stone: add corner fill triangles at adjacent open edges and cap
triangles at strip terminaisons. Grass: replace bevel strips with
tuft-based grass blades — clusters of 3-9 curved double-sided
blades with per-tuft height/lean personality and hash-driven
placement (quadratic inset 0-0.30 from edge). Vegetation PS uses
half-Lambert wrap lighting + translucency for soft stylized shading
(inspired by Airborn Trees). Stone keeps classic Lambert.
2026-03-26 18:48:35 +01:00
bc29a02c35 Phase 4.2: GPU toping rendering pipeline + winding/lighting fixes
Add instanced rendering for toping bevels: dedicated shaders
(voxelTopingVS/PS), PSO, GPU buffers (t4 vertices, t5 instances),
per-group DrawInstanced in a separate render pass with LoadOp::LOAD.
Fix inverted face winding (emitTri auto-winding condition flipped for
CW front-facing), slope normals (use inward direction not outward),
and PS lighting (negate sunDirection like voxelPS). Update CLAUDE.md
with Phase 4.1/4.2 documentation.
2026-03-26 17:47:08 +01:00
9e777d653b Phase 4.1: TopingSystem infrastructure + procedural mesh generation
- TopingSystem with TopingDef registry, procedural mesh gen, instance collection
- 2 toping types: stone bevel (h=0.06, smooth) + grass edge (h=0.12, bumpy)
- 16 mesh variants per type indexed by 4-bit adjacency bitmask (~6 unique with symmetry)
- Wedge cross-section: outer wall + sloped top, grass has sinusoidal height profile
- Instance collection scans exposed +Y faces, checks same-material neighbors
- Cross-chunk adjacency via VoxelWorld::getVoxel()
- Integrated into VoxelRenderPath: init at Start(), stats in HUD
- ~191K instances, 1920 mesh vertices for 170 chunks (validated)
- Research doc (research_connected_meshes.md) + plan (plan_phase4.md)
2026-03-26 15:27:15 +01:00
f166394b60 Phase 3: per-material bleed flags + patch-based terrain for blend testing
- Add bleedMask/resistBleedMask bitmasks to CB for per-material blend control
  - Grass: canBleed + resistsBleed (bleeds onto others, nothing bleeds onto it)
  - Stone: no bleed (doesn't overflow, but accepts bleed from others)
  - Other materials: normal bidirectional blending
- PS checks flags before blending: mainResists → skip, !neighCanBleed → skip
- Flatten terrain (heightScale 64→20) for better surface visibility
- Replace altitude-based material bands with noise-based 2D patches
  (3 noise channels create organic patches of all 5 materials on surface)
- Make stone/sand more visually distinct (stone=blue-gray, sand=warm yellow)
- Lower stone heightContrast (1.2→0.5) so neighbors bleed onto it more
2026-03-26 12:47:10 +01:00
d7e69f97ca Phase 3: PS-based texture blending with winner-takes-all heightmap
Replace pre-encoded quad blend data (v1) with per-pixel voxel data
lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3)
to find neighbor materials dynamically, enabling 2 independent blend
axes, stair-priority neighbor detection, and winner-takes-all
heightmap-driven transitions.

Key design decisions validated through 6 iterations (see
blending_experiments.md):
- Winner-takes-all: material with highest heightmap score wins 100%
  (sharp but organic transitions, not smooth gradient)
- Symmetric bias: bias = 0.5 - weight ensures equal chance at border
- Subtractive corner attenuation (param=0.80): xAdj = xEdge -
  saturate(yEdge - 0.80) reduces blend at corners naturally
- Blend zone = 0.25 voxels from each edge (50% of face)
- Debug mode (F4) visualizes blend zones as colors
2026-03-26 12:14:08 +01:00
21f1bd1a12 Phase 2.5: GPU meshing production pipeline + perf optimizations (80+ FPS)
Replace CPU greedy mesher with GPU compute mesher as default rendering pipeline.
Key optimizations identified via CPU profiling (ProfileAccum, 5s averages):
- Fused regenerate+pack: parallel noise gen + memcpy in same jobsystem pass (6ms → 0ms)
- VoxelData memcpy: sizeof(VoxelData)==2 enables direct memcpy instead of bit-shift loop (28ms → <1ms)
- Dirty-skip: GPU dispatch/upload only when chunks change, not every frame
- Animation: 2 fBm octaves + no caves in animation mode (54ms → 8ms)
- Result: 80-110 FPS with 60Hz terrain animation, 700+ FPS static
2026-03-26 09:05:52 +01:00
9a8f80de51 Phase 2.4: GPU compute mesher benchmark (CPU greedy vs GPU baseline)
One-shot benchmark runs automatically after world generation:
- CPU greedy mesher: 277ms, 358K quads (binary greedy merge)
- GPU baseline (1x1): 5.3ms, 2.43M quads (no merge, 52x faster)
- Greedy merge reduces quad count by 6.8x

Implementation:
- State machine: DISPATCH (upload voxels + dispatch) → READBACK → DONE
- GPU timestamps for accurate timing
- Readback buffer for quad counter
- Each chunk's voxel data uploaded and dispatched sequentially
2026-03-25 22:51:22 +01:00
1bfadc2f7c Phase 2.3: GPU compute culling with frustum + backface cull
Compute shader fills indirect args buffer, replacing CPU cull loop.
Single DrawInstancedIndirectCount renders all visible face groups.

Key fixes:
- Compute shader: pack chunkIndex|(faceIndex<<16) in push constant,
  startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix)
- PushConstants must be called AFTER BindPipelineState, not before.
  Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when
  active_pso is set; after BindComputeShader it targets compute instead.
- Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after
- Buffer decay: DX12 buffers always return to COMMON between frames,
  no cross-frame state tracking needed
2026-03-25 22:30:50 +01:00
45af49a659 Phase 2.2: MDI rendering with CPU-filled indirect args
Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount.
CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1.

Key discoveries:
- Wicked Engine command signature includes push constant (20-byte stride, not 16)
- SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect
- Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs
  quad offset from GPUChunkInfo lookup
- No explicit DX12 barriers needed (implicit promotion from COMMON suffices)

Also adds voxel_engine_spec.md and updates references from .docx to .md.
2026-03-25 22:07:22 +01:00
46e8f50f37 Phase 2 complete: per-face-group backface culling, frustum planes, GPU cull infrastructure
- VS supports dual mode: CPU path (push constants) and MDI path (binary search)
- CPU render loop now does per-face-group draws with backface culling (6 draws/chunk max)
- Frustum planes extracted and populated in constant buffer for GPU cull shader
- GPU cull + MDI path fully implemented but disabled (barrier/state debugging needed)
- GPU timestamp query infrastructure with readback for cull/draw timing
- HUD shows rendering mode (GPU cull vs CPU fallback)
2026-03-25 14:50:55 +01:00
5f346bb14a Phase 2: GPU-driven voxel rendering pipeline
Mega-buffer architecture replacing per-chunk GPU buffers:
- Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB)
- StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups)
- VS reads chunk info via push constants (b999) for driver-safe chunk indexing
- CPU frustum culling with wi::primitive::Frustum + AABB per chunk
- Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts)
- GPU frustum + backface cull compute shader (voxelCullCS.hlsl)
- GPU binary mesher compute shader baseline (voxelMeshCS.hlsl)
- Indirect draw buffers and timestamp query infrastructure
- README with build instructions and project architecture
2026-03-25 14:24:05 +01:00