Commit graph

24 commits

SHA1 Message Date
40560c25ef Phase 6.3: temporal accumulation + IGN noise for RT AO
- Interleaved Gradient Noise replaces world-space hash for ray sampling
- Cranley-Patterson rotation (golden ratio × frameIndex) per frame
- Temporal accumulation: blend 5% current + 95% reprojected history (~20 frames)
- aoHistoryTexture_ persists between frames, copy pre-blur for next frame
- prevViewProjection added to VoxelCB for screen-space reprojection
- Push constants: frameIndex + historyValid for temporal control
- Result: nearly noise-free AO with only 8 rays per pixel
2026-03-29 09:55:08 +02:00
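Note on 40560c25ef: the sampling and blend steps listed above can be sketched on the CPU as follows. This is a minimal sketch, not the shader from the commit. The IGN constants are the commonly published ones; the function names, the `frameIndex % 1024` wrap, and the handling of `historyValid` are assumptions made for illustration. The golden-ratio conjugate 0.618... is used because it has the same fractional part as golden ratio × frameIndex.

```cpp
#include <cmath>
#include <cstdint>

// Fractional part, equivalent to HLSL frac() for non-negative inputs.
inline float fracf(float x) { return x - std::floor(x); }

// Interleaved Gradient Noise for a pixel coordinate, result in [0, 1).
inline float interleavedGradientNoise(float px, float py) {
    return fracf(52.9829189f * fracf(0.06711056f * px + 0.00583715f * py));
}

// Cranley-Patterson rotation: shift the per-pixel value by a per-frame
// offset and wrap back into [0, 1).
inline float rotatedNoise(float px, float py, uint32_t frameIndex) {
    const float kGoldenRatioConjugate = 0.61803398875f;
    // Assumption: wrap the frame counter so the product keeps float precision.
    float offset = fracf(kGoldenRatioConjugate * float(frameIndex % 1024u));
    return fracf(interleavedGradientNoise(px, py) + offset);
}

// Temporal accumulation: 5% current sample + 95% reprojected history.
// Assumption: when history is invalid, the current sample is used as-is.
inline float accumulateAO(float currentAO, float historyAO, bool historyValid) {
    const float kCurrentWeight = 0.05f;
    if (!historyValid) return currentAO;
    return historyAO + (currentAO - historyAO) * kCurrentWeight;
}
```

With a 5% weight, a step change in the input reaches 1 - 0.95^20, about 64%, of its final value after 20 frames, which matches the "~20 frames" window quoted in the message.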
9de53e5293 Phase 6.3: RT ambient occlusion with bilateral blur
- 8 cosine-weighted hemisphere rays per pixel (inline ray queries, SM 6.5)
- Distance-weighted AO: quadratic falloff (1-hitT/aoRadius)² instead of binary hit/miss
- World-space hash seed: voxel coord + tangent-plane frac position (stable, no flicker)
- Bilateral blur pipeline: 2-pass separable (H+V), radius 6, depth+normal edge-stopping
- 4-pass dispatch: shadow+rawAO → blur H → blur V → apply
- AO written to separate R8_UNORM texture, blurred, then applied to color buffer
- Debug mode (F5 x3): grayscale AO visualization
2026-03-29 09:31:19 +02:00
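Note on 9de53e5293: the distance-weighted term (1-hitT/aoRadius)² can be illustrated with a small CPU sketch. The function names, the "negative hitT means miss" convention, and the plain average over rays are assumptions for illustration; the commit only states the falloff formula and the ray count.

```cpp
// Occlusion contributed by one ray. hitT < 0 encodes a miss (hypothetical
// convention); hits at or beyond aoRadius contribute nothing.
inline float rayOcclusion(float hitT, float aoRadius) {
    if (hitT < 0.0f || hitT >= aoRadius) return 0.0f;
    float t = 1.0f - hitT / aoRadius;
    return t * t;  // quadratic falloff instead of binary hit/miss
}

// Visibility for one pixel: 1 minus the mean occlusion over all rays.
// Assumption: rays are already cosine-weighted, so a plain average is used.
inline float ambientVisibility(const float* hitT, int rayCount, float aoRadius) {
    if (rayCount <= 0) return 1.0f;
    float sum = 0.0f;
    for (int i = 0; i < rayCount; ++i) sum += rayOcclusion(hitT[i], aoRadius);
    return 1.0f - sum / float(rayCount);
}
```

A hit at half the radius contributes 0.25 rather than 1, so distant geometry darkens the pixel far less than contact geometry does.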
6b41da0932 Phase 6.2: RT shadows — inline ray queries with BLAS/TLAS fix
Add shadow compute shader (voxelShadowCS.hlsl) that traces rays toward
the sun using DXR inline ray queries (RayQuery<>, SM 6.5). Shadows
modulate voxelRT_ in-place via RWTexture2D (no extra render target).

Key fixes to Phase 6.1 BLAS/TLAS infrastructure:
- Sequential index buffer required: Wicked treats IndexCount=0 with
  non-null IndexBuffer as "0 indexed triangles" → empty BLAS
- Memory barriers between BLAS→TLAS→RT: without GPUBarrier::Memory()
  the TLAS build races with BLAS builds, causing zero ray hits
- inverseViewProjection added to VoxelCB for depth reconstruction

F5 toggles shadows OFF→ON→DEBUG (red=hit, green=miss, blue=backface).
2026-03-28 20:01:18 +01:00
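Note on 6b41da0932: the "sequential index buffer" fix amounts to supplying indices 0..N-1 so that non-indexed triangle data is seen as N indexed vertices. A minimal sketch of building that data on the CPU (the helper name is hypothetical; the upload to the GPU is not shown):

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Builds the index list 0, 1, 2, ..., vertexCount - 1.
inline std::vector<uint32_t> makeSequentialIndices(uint32_t vertexCount) {
    std::vector<uint32_t> indices(vertexCount);
    std::iota(indices.begin(), indices.end(), 0u);
    return indices;
}
```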
7f36bdae38 Phase 6.1: RT infrastructure — MRT normals + BLAS/TLAS build
- Normal render target (R16G16B16A16_SNORM) as MRT SV_TARGET1 in all 3 pixel
  shaders (voxelPS, voxelTopingPS, voxelSmoothPS) for future RT shadow/AO
- BLAS extraction compute shader (voxelBLASExtractCS.hlsl): converts PackedQuad
  StructuredBuffer to float3 position buffer for DXR BLAS input
- Blocky BLAS: single BLAS from all GPU-meshed quads (~1.5M triangles)
- Smooth BLAS: single BLAS from smooth vertex buffer directly
- TLAS: 2 instances (blocky + smooth), identity transforms, CreateBuffer2 with
  callback to avoid UpdateBuffer on RAY_TRACING flagged buffers
- Fix: Wicked always accesses index_buffer in CreateRaytracingAccelerationStructure
  via to_internal() even for non-indexed geometry — provide dummy valid buffer
2026-03-28 14:48:11 +01:00
cd9814e494 Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets
CPU smooth mesher optimizations (560ms → 17ms):
- VoxelData grid cache eliminates redundant readVoxel calls
- Pre-cached 27 neighbor chunk pointers (readVoxelFast)
- smoothNear dilation (8 lookups/cell instead of 56)
- Early exit via containsSmooth flag on chunks
- Thread-local scratch buffers (SmoothScratch ~600KB)
- wi::jobsystem parallelization across all cores
- Persistent staging vectors for upload

TopingSystem optimizations (58ms → 6ms):
- collectInstancesParallel() with per-chunk local vectors
- Neighbor chunk pointer caching

GPU compute Surface Nets (Phase 5.3):
- Two-pass compute shader: centroid grid + emit with smooth normals
- Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags
  for cells [-1..32], cross-chunk neighbor voxel reading
- Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes
  area-weighted smooth normals from 12 incident edges per vertex
- Batched dispatch: all centroid passes then all emit passes with
  single UAV→SRV barrier (instead of 2 barriers per chunk)
- Smooth chunk filtering: only dispatches chunks with containsSmooth
- Centroid grid buffer dynamically sized per smooth chunk count
- 1-frame readback delay with auto-redispatch on first frame
2026-03-27 22:30:43 +01:00
d075a8492c Phase 5.1: smooth normals, triplanar fix, depth bias, hasSmooth tighten
- Smooth vertex normals: area-weighted accumulation of face normals per
  indexed vertex before triangle expansion. Gives Gouraud-smooth shading
  without adding geometry.
- Triplanar fix: PS uses geometric normal (ddx/ddy of worldPos) for
  texture projection weights, smooth normal for lighting only. Prevents
  texture stretching on smoothed surfaces.
- Depth bias: custom rasterizer state (depth_bias=2, slope_scaled=1.0)
  on smooth PSO resolves z-fighting at smooth↔blocky overlap.
- hasSmooth filter tightened: check face-adjacent voxels of each corner
  (1-voxel reach) instead of neighbor cells' corners (2-cell cascade).
  Prevents smooth mesh from extending into underground blocky territory.
2026-03-27 15:08:35 +01:00
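Note on d075a8492c: area-weighted vertex normals can be sketched as below. The cross product of two triangle edges has a length of twice the triangle area, so summing the unnormalized face normals per indexed vertex weights each face by its area. The `Vec3` type and function names are illustrative, not the project's own.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Accumulates unnormalized face normals per indexed vertex, then normalizes.
inline std::vector<Vec3> smoothNormals(const std::vector<Vec3>& positions,
                                       const std::vector<uint32_t>& indices) {
    std::vector<Vec3> normals(positions.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        uint32_t a = indices[i], b = indices[i + 1], c = indices[i + 2];
        Vec3 n = cross(positions[b] - positions[a], positions[c] - positions[a]);
        normals[a] = normals[a] + n;
        normals[b] = normals[b] + n;
        normals[c] = normals[c] + n;
    }
    for (Vec3& n : normals) {
        float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        if (len > 0.0f) n = {n.x / len, n.y / len, n.z / len};
    }
    return normals;
}
```

Running this before triangle expansion means every expanded copy of a vertex carries the same averaged normal, which is what gives the Gouraud-style shading without extra geometry.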
c755f20325 Fix smooth↔blocky gap by extending hasSmooth filter to adjacent cells
Cells at the smooth↔blocky boundary had no smooth corners themselves,
so the strict hasSmooth filter skipped them entirely. This prevented
quad emission between the smooth mesh and blocky territory, leaving
a visible gap. Now checks 6-connected neighbor cells for smooth corners,
ensuring boundary vertices exist for connecting quads.
2026-03-27 14:39:54 +01:00
b45d5a1884 Phase 5.1: smooth PS blending uses same logic as blocky PS + debug scene
Rewrote voxelSmoothPS.hlsl to derive a dominant face axis from the smooth
normal, then use the exact same neighbor verification as voxelPS.hlsl:
faceU/faceV tangent tables, stair-priority getNeighborMat(), face-aligned
fractional coords, blendZone 0.25, corner attenuation, bleedMask checks.

Added generateDebugSmooth() with 11 isolated test configurations
(smooth↔blocky transitions, staircases, surrounded patches, reference
blocky pairs). Launch with: BVLEVoxels.exe debugsmooth
2026-03-27 14:21:35 +01:00
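Note on b45d5a1884: deriving a dominant face axis from a smooth normal can be sketched as picking the component with the largest magnitude. The face numbering used here (0:+X, 1:-X, 2:+Y, 3:-Y, 4:+Z, 5:-Z) and the tie-breaking order are assumptions; the shader's actual convention is not stated in the message.

```cpp
#include <cmath>

// Returns a face index 0..5 for the axis the normal points along most strongly.
// Hypothetical numbering: 0:+X, 1:-X, 2:+Y, 3:-Y, 4:+Z, 5:-Z.
inline int dominantFace(float nx, float ny, float nz) {
    float ax = std::fabs(nx), ay = std::fabs(ny), az = std::fabs(nz);
    if (ax >= ay && ax >= az) return nx >= 0.0f ? 0 : 1;
    if (ay >= az) return ny >= 0.0f ? 2 : 3;
    return nz >= 0.0f ? 4 : 5;
}
```

Once a face index exists, the smooth pixel shader can index the same per-face tangent tables as the blocky one, which is the point of the rewrite.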
aab38bb9b9 Phase 5.1: Naive Surface Nets smooth rendering
Implement CPU-side Naive Surface Nets for smooth voxel surfaces (SmoothStone,
Snow) coexisting with blocky voxels (Grass, Dirt, Stone, Sand).

Key features:
- SmoothMesher with binary SDF, centroid vertex placement, per-axis boundary
  clamping to align with blocky grid at smooth↔blocky transitions
- Cross-chunk connectivity: PAD=2 SDF grid, vertex range [-1, CHUNK_SIZE),
  canonical edge ownership (no duplicate triangles, no z-fighting)
- Face normals oriented by edge axis+sign (robust with binary SDF, unlike
  SDF gradient dot or centroid sampling approaches)
- Y-axis winding fix: sharing cells have different spatial arrangement,
  requiring opposite winding from X and Z axes
- GPU mesher treats smooth neighbors as solid (no blocky faces toward smooth)
- Material blending: primary (smooth-only) + secondary (all counts) per vertex
- Dedicated shaders: voxelSmoothVS (vertex pulling t6) + voxelSmoothPS
  (triplanar + lerp blending between two materials)
- Separate render pass with LoadOp::LOAD after voxels+topings
- New materials: SmoothStone (mat 6), blocky Stone (mat 3) and Dirt patches
  added to world generation for boundary testing
2026-03-27 13:03:55 +01:00
72af8af979 Tweak grass blade color
2026-03-26 20:00:33 +01:00
36b8de9285 Phase 4.2: match grass blade colors to voxel faces + stronger translucency
Use same ambient (0.15, 0.18, 0.25) as voxel PS instead of greener
tint. Increase translucency (0.6) to reduce contrast when orbiting
around grass. Wrap at 0.85 for balanced lit-side brightness.
2026-03-26 19:07:04 +01:00
9086a794a8 Add wind to grass toping
2026-03-26 18:58:19 +01:00
ef89bd8c49 Phase 4.2: grass blade tufts, stone corner fills/caps, vegetation shading
Stone: add corner fill triangles at adjacent open edges and cap
triangles at strip terminations. Grass: replace bevel strips with
tuft-based grass blades — clusters of 3-9 curved double-sided
blades with per-tuft height/lean personality and hash-driven
placement (quadratic inset 0-0.30 from edge). Vegetation PS uses
half-Lambert wrap lighting + translucency for soft stylized shading
(inspired by Airborn Trees). Stone keeps classic Lambert.
2026-03-26 18:48:35 +01:00
bc29a02c35 Phase 4.2: GPU toping rendering pipeline + winding/lighting fixes
Add instanced rendering for toping bevels: dedicated shaders
(voxelTopingVS/PS), PSO, GPU buffers (t4 vertices, t5 instances),
per-group DrawInstanced in a separate render pass with LoadOp::LOAD.
Fix inverted face winding (emitTri auto-winding condition flipped for
CW front-facing), slope normals (use inward direction not outward),
and PS lighting (negate sunDirection like voxelPS). Update CLAUDE.md
with Phase 4.1/4.2 documentation.
2026-03-26 17:47:08 +01:00
9e777d653b Phase 4.1: TopingSystem infrastructure + procedural mesh generation
- TopingSystem with TopingDef registry, procedural mesh gen, instance collection
- 2 toping types: stone bevel (h=0.06, smooth) + grass edge (h=0.12, bumpy)
- 16 mesh variants per type indexed by 4-bit adjacency bitmask (~6 unique with symmetry)
- Wedge cross-section: outer wall + sloped top, grass has sinusoidal height profile
- Instance collection scans exposed +Y faces, checks same-material neighbors
- Cross-chunk adjacency via VoxelWorld::getVoxel()
- Integrated into VoxelRenderPath: init at Start(), stats in HUD
- ~191K instances, 1920 mesh vertices for 170 chunks (validated)
- Research doc (research_connected_meshes.md) + plan (plan_phase4.md)
2026-03-26 15:27:15 +01:00
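Note on 9e777d653b: the "16 variants, ~6 unique with symmetry" figure can be checked with a short sketch. It assumes the four adjacency bits are assigned in rotational order around the face (for example +X, +Z, -X, -Z), so that a 90° turn is a 1-bit rotation of the mask; the actual bit assignment is not stated in the message. Under that assumption, the 16 masks collapse to exactly 6 rotation classes.

```cpp
#include <algorithm>
#include <cstdint>
#include <set>

// Rotates a 4-bit adjacency mask by one position (one 90-degree turn).
inline uint8_t rotateMask(uint8_t m) {
    return uint8_t(((m << 1) | (m >> 3)) & 0xF);
}

// Canonical representative: the smallest mask among the four rotations.
inline uint8_t canonicalMask(uint8_t m) {
    uint8_t best = m;
    for (int i = 0; i < 3; ++i) {
        m = rotateMask(m);
        best = std::min(best, m);
    }
    return best;
}

// Counts how many distinct meshes are needed when rotated copies are shared.
inline int countUniqueVariants() {
    std::set<uint8_t> classes;
    for (uint8_t m = 0; m < 16; ++m) classes.insert(canonicalMask(m));
    return int(classes.size());
}
```

The six classes are: no neighbours, one neighbour, two adjacent, two opposite, three, and all four.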
f166394b60 Phase 3: per-material bleed flags + patch-based terrain for blend testing
- Add bleedMask/resistBleedMask bitmasks to CB for per-material blend control
  - Grass: canBleed + resistsBleed (bleeds onto others, nothing bleeds onto it)
  - Stone: no bleed (doesn't overflow, but accepts bleed from others)
  - Other materials: normal bidirectional blending
- PS checks flags before blending: mainResists → skip, !neighCanBleed → skip
- Flatten terrain (heightScale 64→20) for better surface visibility
- Replace altitude-based material bands with noise-based 2D patches
  (3 noise channels create organic patches of all 5 materials on surface)
- Make stone/sand more visually distinct (stone=blue-gray, sand=warm yellow)
- Lower stone heightContrast (1.2→0.5) so neighbors bleed onto it more
2026-03-26 12:47:10 +01:00
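Note on f166394b60: the two flag checks (mainResists → skip, !neighCanBleed → skip) can be sketched as bit tests on the two masks. The struct, function name, and the material IDs for Grass and Dirt are hypothetical; Stone = 3 follows the ID mentioned in commit aab38bb9b9.

```cpp
#include <cstdint>

// Hypothetical material IDs for illustration (only Stone = 3 is from the log).
constexpr uint32_t kGrass = 1, kDirt = 2, kStone = 3;

struct BleedFlags {
    uint32_t bleedMask;        // bit i set: material i may bleed onto neighbours
    uint32_t resistBleedMask;  // bit i set: nothing may bleed onto material i
};

// True when the neighbour material is allowed to blend onto the main material.
inline bool canBlend(const BleedFlags& f, uint32_t mainMat, uint32_t neighMat) {
    bool mainResists = ((f.resistBleedMask >> mainMat) & 1u) != 0u;
    bool neighCanBleed = ((f.bleedMask >> neighMat) & 1u) != 0u;
    return !mainResists && neighCanBleed;
}

// Flags matching the message: grass bleeds and resists, stone does neither,
// dirt blends normally in both directions.
inline BleedFlags exampleFlags() {
    return {(1u << kGrass) | (1u << kDirt), (1u << kGrass)};
}
```

With these flags grass bleeds onto stone, nothing bleeds onto grass, and stone never bleeds onto its neighbours.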
d7e69f97ca Phase 3: PS-based texture blending with winner-takes-all heightmap
Replace pre-encoded quad blend data (v1) with per-pixel voxel data
lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3)
to find neighbor materials dynamically, enabling 2 independent blend
axes, stair-priority neighbor detection, and winner-takes-all
heightmap-driven transitions.

Key design decisions validated through 6 iterations (see
blending_experiments.md):
- Winner-takes-all: material with highest heightmap score wins 100%
  (sharp but organic transitions, not smooth gradient)
- Symmetric bias: bias = 0.5 - weight ensures equal chance at border
- Subtractive corner attenuation (param=0.80): xAdj = xEdge -
  saturate(yEdge - 0.80) reduces blend at corners naturally
- Blend zone = 0.25 voxels from each edge (50% of face)
- Debug mode (F4) visualizes blend zones as colors
2026-03-26 12:14:08 +01:00
21f1bd1a12 Phase 2.5: GPU meshing production pipeline + perf optimizations (80+ FPS)
Replace CPU greedy mesher with GPU compute mesher as default rendering pipeline.
Key optimizations identified via CPU profiling (ProfileAccum, 5s averages):
- Fused regenerate+pack: parallel noise gen + memcpy in same jobsystem pass (6ms → 0ms)
- VoxelData memcpy: sizeof(VoxelData)==2 enables direct memcpy instead of bit-shift loop (28ms → <1ms)
- Dirty-skip: GPU dispatch/upload only when chunks change, not every frame
- Animation: 2 fBm octaves + no caves in animation mode (54ms → 8ms)
- Result: 80-110 FPS with 60Hz terrain animation, 700+ FPS static
2026-03-26 09:05:52 +01:00
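Note on 21f1bd1a12: the memcpy optimization relies on `VoxelData` being a 2-byte trivially copyable type, so an array of voxels already has the layout of the packed upload buffer. A minimal sketch, with a hypothetical single-field layout (the log only states sizeof(VoxelData)==2) and a hypothetical helper name:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical layout: the real struct's fields are not shown in the log.
struct VoxelData { uint16_t bits; };

static_assert(sizeof(VoxelData) == 2, "packing assumes 2-byte voxels");
static_assert(std::is_trivially_copyable<VoxelData>::value, "memcpy requires a trivially copyable type");

// Copies a voxel array into a packed 16-bit buffer in one memcpy instead of
// shifting and masking each voxel individually.
inline void packVoxels(const std::vector<VoxelData>& src, std::vector<uint16_t>& dst) {
    dst.resize(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(VoxelData));
    }
}
```

The static_asserts make the assumption explicit: if the struct ever grows or gains a non-trivial member, the build fails instead of silently corrupting the upload.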
9a8f80de51 Phase 2.4: GPU compute mesher benchmark (CPU greedy vs GPU baseline)
One-shot benchmark runs automatically after world generation:
- CPU greedy mesher: 277ms, 358K quads (binary greedy merge)
- GPU baseline (1x1): 5.3ms, 2.43M quads (no merge, 52x faster)
- Greedy merge reduces quad count by 6.8x

Implementation:
- State machine: DISPATCH (upload voxels + dispatch) → READBACK → DONE
- GPU timestamps for accurate timing
- Readback buffer for quad counter
- Each chunk's voxel data uploaded and dispatched sequentially
2026-03-25 22:51:22 +01:00
1bfadc2f7c Phase 2.3: GPU compute culling with frustum + backface cull
Compute shader fills indirect args buffer, replacing CPU cull loop.
Single DrawInstancedIndirectCount renders all visible face groups.

Key fixes:
- Compute shader: pack chunkIndex|(faceIndex<<16) in push constant,
  startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix)
- PushConstants must be called AFTER BindPipelineState, not before.
  Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when
  active_pso is set; after BindComputeShader it targets compute instead.
- Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after
- Buffer decay: DX12 buffers always return to COMMON between frames,
  no cross-frame state tracking needed
2026-03-25 22:30:50 +01:00
45af49a659 Phase 2.2: MDI rendering with CPU-filled indirect args
Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount.
CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1.

Key discoveries:
- Wicked Engine command signature includes push constant (20-byte stride, not 16)
- SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect
- Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs
  quad offset from GPUChunkInfo lookup
- No explicit DX12 barriers needed (implicit promotion from COMMON suffices)

Also adds voxel_engine_spec.md and updates references from .docx to .md.
2026-03-25 22:07:22 +01:00
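Note on 45af49a659: the packed push constant and the 20-byte indirect record can be sketched as below. The packing expression chunkIndex|(faceIndex<<16) is from the message. The record layout is an assumption: it places the 4-byte constant in front of the four standard 32-bit draw arguments, which gives the stated 20-byte stride, but the actual field order in the engine's command signature is not given here.

```cpp
#include <cstdint>

// Packs chunk and face-group index into one 32-bit push constant.
// Assumes chunkIndex fits in 16 bits.
inline uint32_t packDrawId(uint32_t chunkIndex, uint32_t faceIndex) {
    return (chunkIndex & 0xFFFFu) | (faceIndex << 16);
}
inline uint32_t unpackChunkIndex(uint32_t packed) { return packed & 0xFFFFu; }
inline uint32_t unpackFaceIndex(uint32_t packed) { return packed >> 16; }

// Hypothetical per-draw record: one push constant plus the four draw arguments.
struct IndirectDrawRecord {
    uint32_t pushConstant;            // packDrawId(chunkIndex, faceIndex)
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;     // kept at 0; the VS derives the quad offset itself
    uint32_t startInstanceLocation;
};
static_assert(sizeof(IndirectDrawRecord) == 20, "stride is 20 bytes, not 16");
```

Carrying the identifiers in the constant lets the vertex shader look up the quad offset from the chunk metadata instead of relying on SV_VertexID including startVertexLocation.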
abc640c2d0 cleanup
2026-03-25 19:38:50 +01:00
46e8f50f37 Phase 2 complete: per-face-group backface culling, frustum planes, GPU cull infrastructure
- VS supports dual mode: CPU path (push constants) and MDI path (binary search)
- CPU render loop now does per-face-group draws with backface culling (6 draws/chunk max)
- Frustum planes extracted and populated in constant buffer for GPU cull shader
- GPU cull + MDI path fully implemented but disabled (barrier/state debugging needed)
- GPU timestamp query infrastructure with readback for cull/draw timing
- HUD shows rendering mode (GPU cull vs CPU fallback)
2026-03-25 14:50:55 +01:00
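Note on 46e8f50f37: the message only says the MDI path uses a binary search. One plausible form, sketched here under that assumption, is a search over the sorted per-chunk quad start offsets to find which chunk owns a given global quad index. The function name and the offsets array are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns the index of the chunk whose quad range contains quadIndex.
// quadOffsets must be sorted ascending and start with 0.
inline uint32_t findChunk(const std::vector<uint32_t>& quadOffsets, uint32_t quadIndex) {
    // upper_bound finds the first offset greater than quadIndex;
    // the owning chunk is the entry just before it.
    auto it = std::upper_bound(quadOffsets.begin(), quadOffsets.end(), quadIndex);
    return uint32_t(it - quadOffsets.begin()) - 1u;
}
```

The search costs O(log n) per lookup, which is why later commits replace it with a packed identifier passed directly to the shader.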
5f346bb14a Phase 2: GPU-driven voxel rendering pipeline
Mega-buffer architecture replacing per-chunk GPU buffers:
- Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB)
- StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups)
- VS reads chunk info via push constants (b999) for driver-safe chunk indexing
- CPU frustum culling with wi::primitive::Frustum + AABB per chunk
- Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts)
- GPU frustum + backface cull compute shader (voxelCullCS.hlsl)
- GPU binary mesher compute shader baseline (voxelMeshCS.hlsl)
- Indirect draw buffers and timestamp query infrastructure
- README with build instructions and project architecture
2026-03-25 14:24:05 +01:00
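Note on 5f346bb14a: the two culling tests named above can be sketched in plain C++. This is not the wi::primitive::Frustum API; the `Plane` and `AABB` structs, the inward-facing plane convention, and the face numbering (0:+X, 1:-X, 2:+Y, 3:-Y, 4:+Z, 5:-Z) are assumptions for illustration.

```cpp
#include <cstddef>

struct Plane { float n[3]; float d; };      // a point p is inside when dot(n, p) + d >= 0
struct AABB  { float min[3]; float max[3]; };

// Frustum test: for each plane, check the box corner farthest along the
// plane normal. If even that corner is outside, the whole box is outside.
inline bool aabbInFrustum(const Plane* planes, std::size_t planeCount, const AABB& box) {
    for (std::size_t i = 0; i < planeCount; ++i) {
        float dist = planes[i].d;
        for (int axis = 0; axis < 3; ++axis) {
            float n = planes[i].n[axis];
            dist += n * (n >= 0.0f ? box.max[axis] : box.min[axis]);
        }
        if (dist < 0.0f) return false;
    }
    return true;
}

// Conservative backface test for a face group: faces pointing toward +axis
// can only be seen from a camera beyond the chunk's min bound on that axis,
// and faces pointing toward -axis only from before its max bound.
inline bool faceGroupVisible(int face, const float cam[3], const AABB& box) {
    int axis = face / 2;
    bool positive = (face % 2) == 0;
    return positive ? cam[axis] > box.min[axis] : cam[axis] < box.max[axis];
}
```

Sorting quads by face direction is what makes the second test usable: each of the six groups can be skipped as a whole with one comparison per chunk.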