bvle-voxels

Author	SHA1	Message	Date
Samuel Bouchet	afb86446cd	Fix VRAM leak: capacity-based BLAS/TLAS + deferred toping BLAS upload Per-frame CreateRaytracingAccelerationStructure calls during F3 animation caused VRAM explosion (especially toping BLAS at ~23M vertices). Now all 3 BLASes use capacity-based allocation with 25% headroom — only recreated when vertex count exceeds capacity, otherwise just BuildRaytracingAS with updated desc.vertex_count. TLAS only recreated when instance count changes. Also adds deferred toping BLAS position upload via UpdateBuffer in Render() (topingBLASDirty_ flag), enabling toping shadows to update during animation. Split CLAUDE.md into CLAUDE.md + TROUBLESHOOTING.md for maintainability.	2026-03-30 21:37:39 +02:00
Samuel Bouchet	dac63e3be5	Phase 6.2: toping BLAS shadows + adaptive TMin + perf optimization - Re-enable toping BLAS in TLAS (3 instances: blocky + smooth + topings) with PREFER_FAST_TRACE for optimized BVH traversal (23M tris) - Separate shadow/AO ray origins: shadow uses worldPos directly (zero bias), AO keeps normal bias (0.15) for hemisphere self-avoidance - Adaptive TMin solves self-hit vs gap dilemma: ground (N.y≈1) → TMin=0.002 for tight blade shadows, blade surfaces (N.y≈0) → TMin=0.10 to skip own geometry - Shadow rays 4→3 with tight cone (0.012 rad), AO rays 8→4 (7 total rays/pixel, temporal accumulation compensates) - Remove screen-space contact shadows (doesn't work for thin geometry)	2026-03-30 13:58:57 +02:00
Samuel Bouchet	55c67686f2	Phase 7.1: stylized lighting — hemisphere ambient, colored shadows, rim light, tone mapping Wonderbox-inspired lighting overhaul across all 3 pixel shaders: - Hemisphere ambient (sky blue above, warm brown below) replaces flat ambient - RT shadows lerp toward blue-violet tint instead of plain darkening (factor 0.55) - Rim light (fresnel) with warm golden color on silhouettes (30% on vegetation) - Soft exponential tone mapping + saturation boost in final post-process pass - CB parameters for all lighting values (skyAmbient, groundAmbient, shadowTint, etc.) - Fog color/density centralized from CB instead of hardcoded per-shader - Screenshot mode (CLI "screenshot"): fixed camera, AO convergence, auto-capture - AO noise stability: world-space hash using voxel center + tangent-axis frac position - AO distance-weighted falloff: continuous occlusion values instead of binary hit/miss	2026-03-29 15:00:12 +02:00
Samuel Bouchet	40560c25ef	Phase 6.3: temporal accumulation + IGN noise for RT AO - Interleaved Gradient Noise replaces world-space hash for ray sampling - Cranley-Patterson rotation (golden ratio × frameIndex) per frame - Temporal accumulation: blend 5% current + 95% reprojected history (~20 frames) - aoHistoryTexture_ persists between frames, copy pre-blur for next frame - prevViewProjection added to VoxelCB for screen-space reprojection - Push constants: frameIndex + historyValid for temporal control - Result: nearly noise-free AO with only 8 rays per pixel	2026-03-29 09:55:08 +02:00
Samuel Bouchet	9de53e5293	Phase 6.3: RT ambient occlusion with bilateral blur - 8 cosine-weighted hemisphere rays per pixel (inline ray queries, SM 6.5) - Distance-weighted AO: quadratic falloff (1-hitT/aoRadius)² instead of binary hit/miss - World-space hash seed: voxel coord + tangent-plane frac position (stable, no flicker) - Bilateral blur pipeline: 2-pass separable (H+V), radius 6, depth+normal edge-stopping - 4-pass dispatch: shadow+rawAO → blur H → blur V → apply - AO written to separate R8_UNORM texture, blurred, then applied to color buffer - Debug mode (F5 x3): grayscale AO visualization	2026-03-29 09:31:19 +02:00
Samuel Bouchet	6b41da0932	Phase 6.2: RT shadows — inline ray queries with BLAS/TLAS fix Add shadow compute shader (voxelShadowCS.hlsl) that traces rays toward the sun using DXR inline ray queries (RayQuery<>, SM 6.5). Shadows modulate voxelRT_ in-place via RWTexture2D (no extra render target). Key fixes to Phase 6.1 BLAS/TLAS infrastructure: - Sequential index buffer required: Wicked treats IndexCount=0 with non-null IndexBuffer as "0 indexed triangles" → empty BLAS - Memory barriers between BLAS→TLAS→RT: without GPUBarrier::Memory() the TLAS build races with BLAS builds, causing zero ray hits - inverseViewProjection added to VoxelCB for depth reconstruction F5 toggles shadows OFF→ON→DEBUG (red=hit, green=miss, blue=backface).	2026-03-28 20:01:18 +01:00
Samuel Bouchet	7f36bdae38	Phase 6.1: RT infrastructure — MRT normals + BLAS/TLAS build - Normal render target (R16G16B16A16_SNORM) as MRT SV_TARGET1 in all 3 pixel shaders (voxelPS, voxelTopingPS, voxelSmoothPS) for future RT shadow/AO - BLAS extraction compute shader (voxelBLASExtractCS.hlsl): converts PackedQuad StructuredBuffer to float3 position buffer for DXR BLAS input - Blocky BLAS: single BLAS from all GPU-meshed quads (~1.5M triangles) - Smooth BLAS: single BLAS from smooth vertex buffer directly - TLAS: 2 instances (blocky + smooth), identity transforms, CreateBuffer2 with callback to avoid UpdateBuffer on RAY_TRACING flagged buffers - Fix: Wicked always accesses index_buffer in CreateRaytracingAccelerationStructure via to_internal() even for non-indexed geometry — provide dummy valid buffer	2026-03-28 14:48:11 +01:00
Samuel Bouchet	cd9814e494	Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets CPU smooth mesher optimizations (560ms → 17ms): - VoxelData grid cache eliminates redundant readVoxel calls - Pre-cached 27 neighbor chunk pointers (readVoxelFast) - smoothNear dilation (8 lookups/cell instead of 56) - Early exit via containsSmooth flag on chunks - Thread-local scratch buffers (SmoothScratch ~600KB) - wi::jobsystem parallelization across all cores - Persistent staging vectors for upload TopingSystem optimizations (58ms → 6ms): - collectInstancesParallel() with per-chunk local vectors - Neighbor chunk pointer caching GPU compute Surface Nets (Phase 5.3): - Two-pass compute shader: centroid grid + emit with smooth normals - Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags for cells [-1..32], cross-chunk neighbor voxel reading - Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes area-weighted smooth normals from 12 incident edges per vertex - Batched dispatch: all centroid passes then all emit passes with single UAV→SRV barrier (instead of 2 barriers per chunk) - Smooth chunk filtering: only dispatches chunks with containsSmooth - Centroid grid buffer dynamically sized per smooth chunk count - 1-frame readback delay with auto-redispatch on first frame	2026-03-27 22:30:43 +01:00
Samuel Bouchet	c755f20325	Fix smooth↔blocky gap by extending hasSmooth filter to adjacent cells Cells at the smooth↔blocky boundary had no smooth corners themselves, so the strict hasSmooth filter skipped them entirely. This prevented quad emission between the smooth mesh and blocky territory, leaving a visible gap. Now checks 6-connected neighbor cells for smooth corners, ensuring boundary vertices exist for connecting quads.	2026-03-27 14:39:54 +01:00
Samuel Bouchet	b45d5a1884	Phase 5.1: smooth PS blending uses same logic as blocky PS + debug scene Rewrote voxelSmoothPS.hlsl to derive a dominant face axis from the smooth normal, then use the exact same neighbor verification as voxelPS.hlsl: faceU/faceV tangent tables, stair-priority getNeighborMat(), face-aligned fractional coords, blendZone 0.25, corner attenuation, bleedMask checks. Added generateDebugSmooth() with 11 isolated test configurations (smooth↔blocky transitions, staircases, surrounded patches, reference blocky pairs). Launch with: BVLEVoxels.exe debugsmooth	2026-03-27 14:21:35 +01:00
Samuel Bouchet	aab38bb9b9	Phase 5.1: Naive Surface Nets smooth rendering Implement CPU-side Naive Surface Nets for smooth voxel surfaces (SmoothStone, Snow) coexisting with blocky voxels (Grass, Dirt, Stone, Sand). Key features: - SmoothMesher with binary SDF, centroid vertex placement, per-axis boundary clamping to align with blocky grid at smooth↔blocky transitions - Cross-chunk connectivity: PAD=2 SDF grid, vertex range [-1, CHUNK_SIZE), canonical edge ownership (no duplicate triangles, no z-fighting) - Face normals oriented by edge axis+sign (robust with binary SDF, unlike SDF gradient dot or centroid sampling approaches) - Y-axis winding fix: sharing cells have different spatial arrangement, requiring opposite winding from X and Z axes - GPU mesher treats smooth neighbors as solid (no blocky faces toward smooth) - Material blending: primary (smooth-only) + secondary (all counts) per vertex - Dedicated shaders: voxelSmoothVS (vertex pulling t6) + voxelSmoothPS (triplanar + lerp blending between two materials) - Separate render pass with LoadOp::LOAD after voxels+topings - New materials: SmoothStone (mat 6), blocky Stone (mat 3) and Dirt patches added to world generation for boundary testing	2026-03-27 13:03:55 +01:00
Samuel Bouchet	ef89bd8c49	Phase 4.2: grass blade tufts, stone corner fills/caps, vegetation shading Stone: add corner fill triangles at adjacent open edges and cap triangles at strip ends. Grass: replace bevel strips with tuft-based grass blades: clusters of 3-9 curved double-sided blades with per-tuft height/lean personality and hash-driven placement (quadratic inset 0-0.30 from edge). Vegetation PS uses half-Lambert wrap lighting + translucency for soft stylized shading (inspired by Airborn Trees). Stone keeps classic Lambert.	2026-03-26 18:48:35 +01:00
Samuel Bouchet	bc29a02c35	Phase 4.2: GPU toping rendering pipeline + winding/lighting fixes Add instanced rendering for toping bevels: dedicated shaders (voxelTopingVS/PS), PSO, GPU buffers (t4 vertices, t5 instances), per-group DrawInstanced in a separate render pass with LoadOp::LOAD. Fix inverted face winding (emitTri auto-winding condition flipped for CW front-facing), slope normals (use inward direction not outward), and PS lighting (negate sunDirection like voxelPS). Update CLAUDE.md with Phase 4.1/4.2 documentation.	2026-03-26 17:47:08 +01:00
Samuel Bouchet	d7e69f97ca	Phase 3: PS-based texture blending with winner-takes-all heightmap Replace pre-encoded quad blend data (v1) with per-pixel voxel data lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3) to find neighbor materials dynamically, enabling 2 independent blend axes, stair-priority neighbor detection, and winner-takes-all heightmap-driven transitions. Key design decisions validated through 6 iterations (see blending_experiments.md): - Winner-takes-all: material with highest heightmap score wins 100% (sharp but organic transitions, not smooth gradient) - Symmetric bias: bias = 0.5 - weight ensures equal chance at border - Subtractive corner attenuation (param=0.80): xAdj = xEdge - saturate(yEdge - 0.80) reduces blend at corners naturally - Blend zone = 0.25 voxels from each edge (50% of face) - Debug mode (F4) visualizes blend zones as colors	2026-03-26 12:14:08 +01:00
Samuel Bouchet	21f1bd1a12	Phase 2.5: GPU meshing production pipeline + perf optimizations (80+ FPS) Replace CPU greedy mesher with GPU compute mesher as default rendering pipeline. Key optimizations identified via CPU profiling (ProfileAccum, 5s averages): - Fused regenerate+pack: parallel noise gen + memcpy in same jobsystem pass (6ms → 0ms) - VoxelData memcpy: sizeof(VoxelData)==2 enables direct memcpy instead of bit-shift loop (28ms → <1ms) - Dirty-skip: GPU dispatch/upload only when chunks change, not every frame - Animation: 2 fBm octaves + no caves in animation mode (54ms → 8ms) - Result: 80-110 FPS with 60Hz terrain animation, 700+ FPS static	2026-03-26 09:05:52 +01:00
Samuel Bouchet	9a8f80de51	Phase 2.4: GPU compute mesher benchmark (CPU greedy vs GPU baseline) One-shot benchmark runs automatically after world generation: - CPU greedy mesher: 277ms, 358K quads (binary greedy merge) - GPU baseline (1x1): 5.3ms, 2.43M quads (no merge, 52x faster) - Greedy merge reduces quad count by 6.8x Implementation: - State machine: DISPATCH (upload voxels + dispatch) → READBACK → DONE - GPU timestamps for accurate timing - Readback buffer for quad counter - Each chunk's voxel data uploaded and dispatched sequentially	2026-03-25 22:51:22 +01:00
Samuel Bouchet	1bfadc2f7c	Phase 2.3: GPU compute culling with frustum + backface cull Compute shader fills indirect args buffer, replacing CPU cull loop. Single DrawInstancedIndirectCount renders all visible face groups. Key fixes: - Compute shader: pack chunkIndex|(faceIndex<<16) in push constant, startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix) - PushConstants must be called AFTER BindPipelineState, not before. Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when active_pso is set; after BindComputeShader it targets compute instead. - Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after - Buffer decay: DX12 buffers always return to COMMON between frames, no cross-frame state tracking needed	2026-03-25 22:30:50 +01:00
Samuel Bouchet	45af49a659	Phase 2.2: MDI rendering with CPU-filled indirect args Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount. CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1. Key discoveries: - Wicked Engine command signature includes push constant (20-byte stride, not 16) - SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect - Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs quad offset from GPUChunkInfo lookup - No explicit DX12 barriers needed (implicit promotion from COMMON suffices) Also adds voxel_engine_spec.md and updates references from .docx to .md.	2026-03-25 22:07:22 +01:00
Samuel Bouchet	abc640c2d0	cleanup	2026-03-25 19:38:50 +01:00
Samuel Bouchet	5f346bb14a	Phase 2: GPU-driven voxel rendering pipeline Mega-buffer architecture replacing per-chunk GPU buffers: - Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB) - StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups) - VS reads chunk info via push constants (b999) for driver-safe chunk indexing - CPU frustum culling with wi::primitive::Frustum + AABB per chunk - Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts) - GPU frustum + backface cull compute shader (voxelCullCS.hlsl) - GPU binary mesher compute shader baseline (voxelMeshCS.hlsl) - Indirect draw buffers and timestamp query infrastructure - README with build instructions and project architecture	2026-03-25 14:24:05 +01:00

20 commits
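The capacity rule described in afb86446cd (recreate a BLAS only when the vertex count exceeds its capacity, growing with 25% headroom) reduces to plain bookkeeping. This is a minimal C++ sketch, not the project's code: AccelCapacity and ensureCapacity are hypothetical names, and the actual Wicked Engine calls (CreateRaytracingAccelerationStructure on growth, BuildRaytracingAS otherwise) are left to the caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for one BLAS: how many vertices the current
// GPU allocation can hold, and how many the next build will use.
struct AccelCapacity {
    uint32_t capacity = 0;     // vertices the allocation was created for
    uint32_t vertexCount = 0;  // vertices used by the latest build
};

// Returns true when the acceleration structure must be recreated.
// Otherwise the caller only rebuilds in place with the new vertex count.
inline bool ensureCapacity(AccelCapacity& as, uint32_t requiredVertices) {
    as.vertexCount = requiredVertices;
    if (requiredVertices <= as.capacity) {
        return false;  // fits: rebuild only, no new allocation
    }
    // Grow with 25% headroom so small increases during animation
    // do not force another reallocation on the next frame.
    uint64_t grown = uint64_t(requiredVertices) + uint64_t(requiredVertices) / 4;
    assert(grown <= UINT32_MAX);
    as.capacity = uint32_t(grown);
    return true;
}
```

With this rule a BLAS first sized for 1000 vertices accepts up to 1250 before it is recreated; the TLAS in the same commit follows the simpler rule of recreating only when its instance count changes.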
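The adaptive TMin from dac63e3be5 picks the shadow-ray start distance from the surface orientation. A minimal sketch, assuming a linear blend on N.y between the two endpoints the commit gives (0.002 for ground, 0.10 for blade surfaces); adaptiveShadowTMin is a hypothetical name, and the real shader may shape the transition differently.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: choose the shadow-ray TMin from the normal's Y component.
// Endpoints follow dac63e3be5: 0.002 on flat ground (N.y ~ 1) and 0.10 on
// vertical blade surfaces (N.y ~ 0). The linear blend between them is an
// assumption; downward-facing normals are treated like vertical ones.
inline float adaptiveShadowTMin(float normalY) {
    const float kGroundTMin = 0.002f;  // tight contact shadows under blades
    const float kBladeTMin  = 0.10f;   // skip the blade's own geometry
    float upness = std::clamp(normalY, 0.0f, 1.0f);
    return kBladeTMin + (kGroundTMin - kBladeTMin) * upness;
}
```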
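Several lighting terms listed in 55c67686f2 correspond to standard formulas. A minimal sketch, assuming the usual forms: a linear sky/ground blend on N.y for hemisphere ambient, a power of (1 - N·V) for the fresnel rim, and 1 - exp(-x·exposure) for the soft exponential tone mapping. Function names and default parameters are hypothetical; the project reads its values (skyAmbient, groundAmbient, and so on) from a constant buffer.

```cpp
#include <algorithm>
#include <cmath>

struct Color { float r, g, b; };

inline Color lerpColor(Color a, Color b, float t) {
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t };
}

// Hemisphere ambient: ground color for normals pointing down (N.y = -1),
// sky color for normals pointing up (N.y = +1), blended in between.
inline Color hemisphereAmbient(float normalY, Color sky, Color ground) {
    float t = std::clamp(normalY * 0.5f + 0.5f, 0.0f, 1.0f);
    return lerpColor(ground, sky, t);
}

// Fresnel-style rim term: 0 when viewing the surface head-on, 1 at grazing
// angles. The exponent is a hypothetical parameter.
inline float rimFactor(float nDotV, float power = 3.0f) {
    return std::pow(1.0f - std::clamp(nDotV, 0.0f, 1.0f), power);
}

// Exponential tone mapping: rolls bright values smoothly toward 1
// instead of clipping them.
inline float toneMapExp(float x, float exposure = 1.0f) {
    return 1.0f - std::exp(-std::max(x, 0.0f) * exposure);
}
```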
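For the IGN sampling and Cranley-Patterson rotation in 40560c25ef, here is a minimal sketch using the published Interleaved Gradient Noise constants (Jimenez 2014) and the golden-ratio conjugate as the per-frame offset. The function names are hypothetical, and the constants are assumed rather than taken from the project; in the project the equivalent code runs in HLSL with frameIndex supplied as a push constant.

```cpp
#include <cmath>
#include <cstdint>

inline float fracf(float x) { return x - std::floor(x); }

// Interleaved Gradient Noise (Jimenez 2014) for pixel (x, y); result in [0, 1).
inline float interleavedGradientNoise(float x, float y) {
    return fracf(52.9829189f * fracf(0.06711056f * x + 0.00583715f * y));
}

// Cranley-Patterson rotation: offset every pixel's value by frameIndex times
// the golden-ratio conjugate, wrapped into [0, 1). The offset is computed in
// double so large frame indices keep their fractional precision.
inline float rotatedNoise(float x, float y, uint32_t frameIndex) {
    double offset = std::fmod(double(frameIndex) * 0.6180339887498949, 1.0);
    return fracf(interleavedGradientNoise(x, y) + float(offset));
}
```

Because the golden-ratio offsets are well spread over [0, 1), consecutive frames sample different parts of each pixel's sequence, which the temporal accumulation then averages.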
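The 5% current / 95% history blend in 40560c25ef is an exponential moving average whose effective window is roughly 1/alpha = 20 frames. A minimal sketch; accumulateAO is a hypothetical name, and resetting to the current value when history is invalid is an assumption about how the historyValid push constant is used.

```cpp
// Temporal accumulation as an exponential moving average: each frame keeps
// 95% of the reprojected history and adds 5% of the new noisy AO value.
// Returning the current value when history is invalid (first frame, failed
// reprojection) is an assumption.
inline float accumulateAO(float history, float current, bool historyValid,
                          float alpha = 0.05f) {
    if (!historyValid) return current;
    return history + alpha * (current - history);
}
```

After 20 frames of a constant input, the history has moved about 64% of the way toward it (1 - 0.95^20), which matches the "~20 frames" description.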
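The distance-weighted AO in 9de53e5293 replaces a binary hit/miss count with a quadratic falloff. A minimal CPU-side sketch of that formula; AORay, rayOcclusion, and ambientOcclusion are hypothetical, and combining the per-ray values as 1 - mean occlusion is an assumption about how the shader merges its rays.

```cpp
#include <algorithm>
#include <vector>

struct AORay {
    bool hit;    // did the ray query report a hit?
    float hitT;  // hit distance along the ray (ignored on a miss)
};

// Occlusion from one ray, per 9de53e5293: (1 - hitT / aoRadius)^2 on a hit,
// 0 on a miss. Near hits count fully; hits at the radius count for nothing.
inline float rayOcclusion(const AORay& ray, float aoRadius) {
    if (!ray.hit) return 0.0f;
    float t = std::clamp(1.0f - ray.hitT / aoRadius, 0.0f, 1.0f);
    return t * t;
}

// Hypothetical aggregation: AO = 1 - mean occlusion over the pixel's rays.
inline float ambientOcclusion(const std::vector<AORay>& rays, float aoRadius) {
    if (rays.empty()) return 1.0f;
    float occlusion = 0.0f;
    for (const AORay& r : rays) occlusion += rayOcclusion(r, aoRadius);
    return 1.0f - occlusion / float(rays.size());
}
```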
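For the 8 cosine-weighted hemisphere rays in 9de53e5293, here is a minimal sketch of one standard way to generate such a direction: Malley's method, combined with the branchless tangent frame from Duff et al. 2017. The commit does not say which construction the shader uses, so both are assumptions. In this project the two uniform inputs would come from the per-pixel noise (a world-space hash in that commit, IGN after 40560c25ef).

```cpp
#include <cmath>

struct Vec3f { float x, y, z; };

// Cosine-weighted direction in the hemisphere around unit normal n, from two
// uniform numbers u1, u2 in [0, 1). Malley's method: a uniform point on the
// unit disk is projected up onto the hemisphere.
inline Vec3f cosineHemisphereSample(Vec3f n, float u1, float u2) {
    const float kTwoPi = 6.28318530718f;
    float r = std::sqrt(u1);
    float phi = kTwoPi * u2;
    float lx = r * std::cos(phi);
    float ly = r * std::sin(phi);
    float lz = std::sqrt(std::fmax(0.0f, 1.0f - u1));

    // Orthonormal basis (t1, t2, n) without branches (Duff et al. 2017).
    float sign = std::copysign(1.0f, n.z);
    float a = -1.0f / (sign + n.z);
    float b = n.x * n.y * a;
    Vec3f t1{1.0f + sign * n.x * n.x * a, sign * b, -sign * n.x};
    Vec3f t2{b, sign + n.y * n.y * a, -n.y};

    return { t1.x * lx + t2.x * ly + n.x * lz,
             t1.y * lx + t2.y * ly + n.y * lz,
             t1.z * lx + t2.z * ly + n.z * lz };
}
```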
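The smooth PS in b45d5a1884 derives a dominant face axis from the smooth normal so it can reuse the blocky neighbor lookups. A minimal sketch of that selection, in which the largest absolute normal component picks the axis and its sign picks the face. The face numbering (0=+X, 1=-X, 2=+Y, 3=-Y, 4=+Z, 5=-Z) and the tie-breaking order are hypothetical.

```cpp
#include <cmath>

// Hypothetical face numbering: 0=+X, 1=-X, 2=+Y, 3=-Y, 4=+Z, 5=-Z.
// Ties prefer X, then Y.
inline int dominantFace(float nx, float ny, float nz) {
    float ax = std::fabs(nx), ay = std::fabs(ny), az = std::fabs(nz);
    if (ax >= ay && ax >= az) return nx >= 0.0f ? 0 : 1;
    if (ay >= az) return ny >= 0.0f ? 2 : 3;
    return nz >= 0.0f ? 4 : 5;
}
```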
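The centroid vertex placement in aab38bb9b9 follows Naive Surface Nets. A minimal single-cell sketch, assuming a binary SDF in which each sign-changing edge contributes its midpoint (the SDF carries no distances, so no interpolation is possible). The per-axis boundary clamping, PAD=2 grid, and canonical edge ownership from the commit are omitted, and all names are hypothetical.

```cpp
#include <array>
#include <optional>

struct Float3 { float x, y, z; };

// Corner c of the unit cell sits at (c & 1, (c >> 1) & 1, (c >> 2) & 1).
inline Float3 cornerPosition(int c) {
    return { float(c & 1), float((c >> 1) & 1), float((c >> 2) & 1) };
}

// Naive Surface Nets vertex for one cell of a binary SDF (solid / empty).
// Every edge whose two corners disagree crosses the surface; the crossing is
// placed at the edge midpoint, and the vertex is the centroid of all
// crossings. Returns nullopt for fully solid or fully empty cells.
inline std::optional<Float3> surfaceNetsVertex(const std::array<bool, 8>& solid) {
    Float3 sum{0.0f, 0.0f, 0.0f};
    int crossings = 0;
    for (int c = 0; c < 8; ++c) {
        for (int bit = 1; bit <= 4; bit <<= 1) {
            if (c & bit) continue;  // visit each of the 12 edges once, from its low corner
            int n = c | bit;
            if (solid[c] == solid[n]) continue;
            Float3 a = cornerPosition(c), b = cornerPosition(n);
            sum.x += 0.5f * (a.x + b.x);
            sum.y += 0.5f * (a.y + b.y);
            sum.z += 0.5f * (a.z + b.z);
            ++crossings;
        }
    }
    if (crossings == 0) return std::nullopt;
    float inv = 1.0f / float(crossings);
    return Float3{sum.x * inv, sum.y * inv, sum.z * inv};
}
```

A cell whose bottom four corners are solid yields the cell center, while a single solid corner pulls the vertex toward that corner (1/6 along each axis).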
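The blending rules in d7e69f97ca can be sketched in scalar form. The corner attenuation below is the formula the commit quotes. How weight is derived from the 0.25-voxel blend zone, and the comparison used for winner-takes-all, are assumptions chosen to be consistent with "bias = 0.5 - weight ensures equal chance at border". All names are hypothetical.

```cpp
#include <algorithm>

// Blend zone width in voxels, from d7e69f97ca.
constexpr float kBlendZone = 0.25f;

// Hypothetical neighbor weight: 0.5 on the shared edge, fading linearly to 0
// at kBlendZone voxels inside the face.
inline float blendWeight(float distToEdge) {
    return 0.5f * std::clamp(1.0f - distToEdge / kBlendZone, 0.0f, 1.0f);
}

// Subtractive corner attenuation quoted in the commit (param = 0.80):
// xAdj = xEdge - saturate(yEdge - 0.80).
inline float cornerAttenuate(float xEdge, float yEdge, float param = 0.80f) {
    return xEdge - std::clamp(yEdge - param, 0.0f, 1.0f);
}

// Winner-takes-all (assumed form): the neighbor material replaces the face
// material outright when its heightmap score beats the face's score plus the
// bias. bias = 0.5 - weight is 0 on the edge (equal chance) and 0.5 deep inside.
inline bool neighborWins(float ownHeight, float neighborHeight, float weight) {
    float bias = 0.5f - weight;
    return neighborHeight > ownHeight + bias;
}
```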
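The VoxelData memcpy in 21f1bd1a12 works because a 2-byte, trivially copyable struct has the same bytes as the uint16 words the GPU buffer expects. A minimal sketch. The two fields of VoxelData are hypothetical, since the log does not give its layout, and the static_asserts make that assumption fail at compile time if the struct ever grows.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical 2-byte voxel; the real field layout is not given in the log.
struct VoxelData {
    uint8_t material;
    uint8_t flags;
};
static_assert(sizeof(VoxelData) == 2, "packing relies on a 2-byte voxel");
static_assert(std::is_trivially_copyable_v<VoxelData>, "memcpy needs a trivially copyable type");

// Copy a chunk's voxels into an upload buffer of uint16 words with one memcpy,
// instead of shifting the fields into each word one voxel at a time.
inline void packVoxels(const std::vector<VoxelData>& src, std::vector<uint16_t>& dst) {
    dst.resize(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(VoxelData));
    }
}
```

This assumes the shader unpacks each word in the struct's in-memory byte order (little-endian on x64).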
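The 20-byte indirect stride and the chunkIndex|(faceIndex<<16) packing from 45af49a659 and 1bfadc2f7c fit in a few lines. A minimal sketch: DrawArgs mirrors the four UINTs of D3D12_DRAW_ARGUMENTS. Placing the push constant before the draw arguments and limiting faceIndex to 6 directions are assumptions, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirror of D3D12_DRAW_ARGUMENTS (four 32-bit values).
struct DrawArgs {
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;
    uint32_t startInstanceLocation;
};

// One indirect command as the command signature sees it: a 32-bit push
// constant plus the draw arguments (ordering is an assumption).
struct IndirectDrawCommand {
    uint32_t pushConstant;
    DrawArgs draw;
};
static_assert(sizeof(IndirectDrawCommand) == 20, "stride must be 20 bytes, not 16");

// Low 16 bits: chunk index; high bits: face direction (0..5, assumed).
inline uint32_t packChunkFace(uint32_t chunkIndex, uint32_t faceIndex) {
    assert(chunkIndex <= 0xFFFFu && faceIndex < 6u);
    return chunkIndex | (faceIndex << 16);
}
inline uint32_t unpackChunkIndex(uint32_t packed) { return packed & 0xFFFFu; }
inline uint32_t unpackFaceIndex(uint32_t packed) { return packed >> 16; }
```

Because the VS reconstructs the quad offset from the packed chunk/face pair, startVertexLocation can stay 0, which sidesteps the unreliable SV_VertexID behavior noted in 45af49a659.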