bvle-voxels

Author	SHA1	Message	Date
Samuel Bouchet	40560c25ef	Phase 6.3: temporal accumulation + IGN noise for RT AO - Interleaved Gradient Noise replaces world-space hash for ray sampling - Cranley-Patterson rotation (golden ratio × frameIndex) per frame - Temporal accumulation: blend 5% current + 95% reprojected history (~20 frames) - aoHistoryTexture_ persists between frames, copy pre-blur for next frame - prevViewProjection added to VoxelCB for screen-space reprojection - Push constants: frameIndex + historyValid for temporal control - Result: nearly noise-free AO with only 8 rays per pixel	2026-03-29 09:55:08 +02:00
Samuel Bouchet	9de53e5293	Phase 6.3: RT ambient occlusion with bilateral blur - 8 cosine-weighted hemisphere rays per pixel (inline ray queries, SM 6.5) - Distance-weighted AO: quadratic falloff (1-hitT/aoRadius)² instead of binary hit/miss - World-space hash seed: voxel coord + tangent-plane frac position (stable, no flicker) - Bilateral blur pipeline: 2-pass separable (H+V), radius 6, depth+normal edge-stopping - 4-pass dispatch: shadow+rawAO → blur H → blur V → apply - AO written to separate R8_UNORM texture, blurred, then applied to color buffer - Debug mode (F5 x3): grayscale AO visualization	2026-03-29 09:31:19 +02:00
Samuel Bouchet	6b41da0932	Phase 6.2: RT shadows — inline ray queries with BLAS/TLAS fix Add shadow compute shader (voxelShadowCS.hlsl) that traces rays toward the sun using DXR inline ray queries (RayQuery<>, SM 6.5). Shadows modulate voxelRT_ in-place via RWTexture2D (no extra render target). Key fixes to Phase 6.1 BLAS/TLAS infrastructure: - Sequential index buffer required: Wicked treats IndexCount=0 with non-null IndexBuffer as "0 indexed triangles" → empty BLAS - Memory barriers between BLAS→TLAS→RT: without GPUBarrier::Memory() the TLAS build races with BLAS builds, causing zero ray hits - inverseViewProjection added to VoxelCB for depth reconstruction F5 toggles shadows OFF→ON→DEBUG (red=hit, green=miss, blue=backface).	2026-03-28 20:01:18 +01:00
Samuel Bouchet	7f36bdae38	Phase 6.1: RT infrastructure — MRT normals + BLAS/TLAS build - Normal render target (R16G16B16A16_SNORM) as MRT SV_TARGET1 in all 3 pixel shaders (voxelPS, voxelTopingPS, voxelSmoothPS) for future RT shadow/AO - BLAS extraction compute shader (voxelBLASExtractCS.hlsl): converts PackedQuad StructuredBuffer to float3 position buffer for DXR BLAS input - Blocky BLAS: single BLAS from all GPU-meshed quads (~1.5M triangles) - Smooth BLAS: single BLAS from smooth vertex buffer directly - TLAS: 2 instances (blocky + smooth), identity transforms, CreateBuffer2 with callback to avoid UpdateBuffer on RAY_TRACING flagged buffers - Fix: Wicked always accesses index_buffer in CreateRaytracingAccelerationStructure via to_internal() even for non-indexed geometry — provide dummy valid buffer	2026-03-28 14:48:11 +01:00
Samuel Bouchet	cd9814e494	Phase 5.2-5.3: CPU perf optimizations + GPU compute Surface Nets CPU smooth mesher optimizations (560ms → 17ms): - VoxelData grid cache eliminates redundant readVoxel calls - Pre-cached 27 neighbor chunk pointers (readVoxelFast) - smoothNear dilation (8 lookups/cell instead of 56) - Early exit via containsSmooth flag on chunks - Thread-local scratch buffers (SmoothScratch ~600KB) - wi::jobsystem parallelization across all cores - Persistent staging vectors for upload TopingSystem optimizations (58ms → 6ms): - collectInstancesParallel() with per-chunk local vectors - Neighbor chunk pointer caching GPU compute Surface Nets (Phase 5.3): - Two-pass compute shader: centroid grid + emit with smooth normals - Pass 1 (voxelSmoothCentroidCS): computes centroids + solid flags for cells [-1..32], cross-chunk neighbor voxel reading - Pass 2 (voxelSmoothCS): reads ONLY from centroid grid, computes area-weighted smooth normals from 12 incident edges per vertex - Batched dispatch: all centroid passes then all emit passes with single UAV→SRV barrier (instead of 2 barriers per chunk) - Smooth chunk filtering: only dispatches chunks with containsSmooth - Centroid grid buffer dynamically sized per smooth chunk count - 1-frame readback delay with auto-redispatch on first frame	2026-03-27 22:30:43 +01:00
Samuel Bouchet	d075a8492c	Phase 5.1: smooth normals, triplanar fix, depth bias, hasSmooth tighten - Smooth vertex normals: area-weighted accumulation of face normals per indexed vertex before triangle expansion. Gives Gouraud-smooth shading without adding geometry. - Triplanar fix: PS uses geometric normal (ddx/ddy of worldPos) for texture projection weights, smooth normal for lighting only. Prevents texture stretching on smoothed surfaces. - Depth bias: custom rasterizer state (depth_bias=2, slope_scaled=1.0) on smooth PSO resolves z-fighting at smooth↔blocky overlap. - hasSmooth filter tightened: check face-adjacent voxels of each corner (1-voxel reach) instead of neighbor cells' corners (2-cell cascade). Prevents smooth mesh from extending into underground blocky territory.	2026-03-27 15:08:35 +01:00
Samuel Bouchet	c755f20325	Fix smooth↔blocky gap by extending hasSmooth filter to adjacent cells Cells at the smooth↔blocky boundary had no smooth corners themselves, so the strict hasSmooth filter skipped them entirely. This prevented quad emission between the smooth mesh and blocky territory, leaving a visible gap. Now checks 6-connected neighbor cells for smooth corners, ensuring boundary vertices exist for connecting quads.	2026-03-27 14:39:54 +01:00
Samuel Bouchet	b45d5a1884	Phase 5.1: smooth PS blending uses same logic as blocky PS + debug scene Rewrote voxelSmoothPS.hlsl to derive a dominant face axis from the smooth normal, then use the exact same neighbor verification as voxelPS.hlsl: faceU/faceV tangent tables, stair-priority getNeighborMat(), face-aligned fractional coords, blendZone 0.25, corner attenuation, bleedMask checks. Added generateDebugSmooth() with 11 isolated test configurations (smooth↔blocky transitions, staircases, surrounded patches, reference blocky pairs). Launch with: BVLEVoxels.exe debugsmooth	2026-03-27 14:21:35 +01:00
Samuel Bouchet	aab38bb9b9	Phase 5.1: Naive Surface Nets smooth rendering Implement CPU-side Naive Surface Nets for smooth voxel surfaces (SmoothStone, Snow) coexisting with blocky voxels (Grass, Dirt, Stone, Sand). Key features: - SmoothMesher with binary SDF, centroid vertex placement, per-axis boundary clamping to align with blocky grid at smooth↔blocky transitions - Cross-chunk connectivity: PAD=2 SDF grid, vertex range [-1, CHUNK_SIZE), canonical edge ownership (no duplicate triangles, no z-fighting) - Face normals oriented by edge axis+sign (robust with binary SDF, unlike SDF gradient dot or centroid sampling approaches) - Y-axis winding fix: sharing cells have different spatial arrangement, requiring opposite winding from X and Z axes - GPU mesher treats smooth neighbors as solid (no blocky faces toward smooth) - Material blending: primary (smooth-only) + secondary (all counts) per vertex - Dedicated shaders: voxelSmoothVS (vertex pulling t6) + voxelSmoothPS (triplanar + lerp blending between two materials) - Separate render pass with LoadOp::LOAD after voxels+topings - New materials: SmoothStone (mat 6), blocky Stone (mat 3) and Dirt patches added to world generation for boundary testing	2026-03-27 13:03:55 +01:00
Samuel Bouchet	72af8af979	Tweak grass blade color	2026-03-26 20:00:33 +01:00
Samuel Bouchet	36b8de9285	Phase 4.2: match grass blade colors to voxel faces + stronger translucency Use same ambient (0.15, 0.18, 0.25) as voxel PS instead of greener tint. Increase translucency (0.6) to reduce contrast when orbiting around grass. Wrap at 0.85 for balanced lit-side brightness.	2026-03-26 19:07:04 +01:00
Samuel Bouchet	9086a794a8	Add wind to grass toping	2026-03-26 18:58:19 +01:00
Samuel Bouchet	ef89bd8c49	Phase 4.2: grass blade tufts, stone corner fills/caps, vegetation shading Stone: add corner fill triangles at adjacent open edges and cap triangles at strip terminations. Grass: replace bevel strips with tuft-based grass blades — clusters of 3-9 curved double-sided blades with per-tuft height/lean personality and hash-driven placement (quadratic inset 0-0.30 from edge). Vegetation PS uses half-Lambert wrap lighting + translucency for soft stylized shading (inspired by Airborn Trees). Stone keeps classic Lambert.	2026-03-26 18:48:35 +01:00
Samuel Bouchet	bc29a02c35	Phase 4.2: GPU toping rendering pipeline + winding/lighting fixes Add instanced rendering for toping bevels: dedicated shaders (voxelTopingVS/PS), PSO, GPU buffers (t4 vertices, t5 instances), per-group DrawInstanced in a separate render pass with LoadOp::LOAD. Fix inverted face winding (emitTri auto-winding condition flipped for CW front-facing), slope normals (use inward direction not outward), and PS lighting (negate sunDirection like voxelPS). Update CLAUDE.md with Phase 4.1/4.2 documentation.	2026-03-26 17:47:08 +01:00
Samuel Bouchet	9e777d653b	Phase 4.1: TopingSystem infrastructure + procedural mesh generation - TopingSystem with TopingDef registry, procedural mesh gen, instance collection - 2 toping types: stone bevel (h=0.06, smooth) + grass edge (h=0.12, bumpy) - 16 mesh variants per type indexed by 4-bit adjacency bitmask (~6 unique with symmetry) - Wedge cross-section: outer wall + sloped top, grass has sinusoidal height profile - Instance collection scans exposed +Y faces, checks same-material neighbors - Cross-chunk adjacency via VoxelWorld::getVoxel() - Integrated into VoxelRenderPath: init at Start(), stats in HUD - ~191K instances, 1920 mesh vertices for 170 chunks (validated) - Research doc (research_connected_meshes.md) + plan (plan_phase4.md)	2026-03-26 15:27:15 +01:00
Samuel Bouchet	f166394b60	Phase 3: per-material bleed flags + patch-based terrain for blend testing - Add bleedMask/resistBleedMask bitmasks to CB for per-material blend control - Grass: canBleed + resistsBleed (bleeds onto others, nothing bleeds onto it) - Stone: no bleed (doesn't overflow, but accepts bleed from others) - Other materials: normal bidirectional blending - PS checks flags before blending: mainResists → skip, !neighCanBleed → skip - Flatten terrain (heightScale 64→20) for better surface visibility - Replace altitude-based material bands with noise-based 2D patches (3 noise channels create organic patches of all 5 materials on surface) - Make stone/sand more visually distinct (stone=blue-gray, sand=warm yellow) - Lower stone heightContrast (1.2→0.5) so neighbors bleed onto it more	2026-03-26 12:47:10 +01:00
Samuel Bouchet	d7e69f97ca	Phase 3: PS-based texture blending with winner-takes-all heightmap Replace pre-encoded quad blend data (v1) with per-pixel voxel data lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3) to find neighbor materials dynamically, enabling 2 independent blend axes, stair-priority neighbor detection, and winner-takes-all heightmap-driven transitions. Key design decisions validated through 6 iterations (see blending_experiments.md): - Winner-takes-all: material with highest heightmap score wins 100% (sharp but organic transitions, not smooth gradient) - Symmetric bias: bias = 0.5 - weight ensures equal chance at border - Subtractive corner attenuation (param=0.80): xAdj = xEdge - saturate(yEdge - 0.80) reduces blend at corners naturally - Blend zone = 0.25 voxels from each edge (50% of face) - Debug mode (F4) visualizes blend zones as colors	2026-03-26 12:14:08 +01:00
Samuel Bouchet	21f1bd1a12	Phase 2.5: GPU meshing production pipeline + perf optimizations (80+ FPS) Replace CPU greedy mesher with GPU compute mesher as default rendering pipeline. Key optimizations identified via CPU profiling (ProfileAccum, 5s averages): - Fused regenerate+pack: parallel noise gen + memcpy in same jobsystem pass (6ms → 0ms) - VoxelData memcpy: sizeof(VoxelData)==2 enables direct memcpy instead of bit-shift loop (28ms → <1ms) - Dirty-skip: GPU dispatch/upload only when chunks change, not every frame - Animation: 2 fBm octaves + no caves in animation mode (54ms → 8ms) - Result: 80-110 FPS with 60Hz terrain animation, 700+ FPS static	2026-03-26 09:05:52 +01:00
Samuel Bouchet	9a8f80de51	Phase 2.4: GPU compute mesher benchmark (CPU greedy vs GPU baseline) One-shot benchmark runs automatically after world generation: - CPU greedy mesher: 277ms, 358K quads (binary greedy merge) - GPU baseline (1x1): 5.3ms, 2.43M quads (no merge, 52x faster) - Greedy merge reduces quad count by 6.8x Implementation: - State machine: DISPATCH (upload voxels + dispatch) → READBACK → DONE - GPU timestamps for accurate timing - Readback buffer for quad counter - Each chunk's voxel data uploaded and dispatched sequentially	2026-03-25 22:51:22 +01:00
Samuel Bouchet	1bfadc2f7c	Phase 2.3: GPU compute culling with frustum + backface cull Compute shader fills indirect args buffer, replacing CPU cull loop. Single DrawInstancedIndirectCount renders all visible face groups. Key fixes: - Compute shader: pack chunkIndex|(faceIndex<<16) in push constant, startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix) - PushConstants must be called AFTER BindPipelineState, not before. Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when active_pso is set; after BindComputeShader it targets compute instead. - Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after - Buffer decay: DX12 buffers always return to COMMON between frames, no cross-frame state tracking needed	2026-03-25 22:30:50 +01:00
Samuel Bouchet	45af49a659	Phase 2.2: MDI rendering with CPU-filled indirect args Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount. CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1. Key discoveries: - Wicked Engine command signature includes push constant (20-byte stride, not 16) - SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect - Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs quad offset from GPUChunkInfo lookup - No explicit DX12 barriers needed (implicit promotion from COMMON suffices) Also adds voxel_engine_spec.md and updates references from .docx to .md.	2026-03-25 22:07:22 +01:00
Samuel Bouchet	abc640c2d0	cleanup	2026-03-25 19:38:50 +01:00
Samuel Bouchet	46e8f50f37	Phase 2 complete: per-face-group backface culling, frustum planes, GPU cull infrastructure - VS supports dual mode: CPU path (push constants) and MDI path (binary search) - CPU render loop now does per-face-group draws with backface culling (6 draws/chunk max) - Frustum planes extracted and populated in constant buffer for GPU cull shader - GPU cull + MDI path fully implemented but disabled (barrier/state debugging needed) - GPU timestamp query infrastructure with readback for cull/draw timing - HUD shows rendering mode (GPU cull vs CPU fallback)	2026-03-25 14:50:55 +01:00
Samuel Bouchet	5f346bb14a	Phase 2: GPU-driven voxel rendering pipeline Mega-buffer architecture replacing per-chunk GPU buffers: - Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB) - StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups) - VS reads chunk info via push constants (b999) for driver-safe chunk indexing - CPU frustum culling with wi::primitive::Frustum + AABB per chunk - Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts) - GPU frustum + backface cull compute shader (voxelCullCS.hlsl) - GPU binary mesher compute shader baseline (voxelMeshCS.hlsl) - Indirect draw buffers and timestamp query infrastructure - README with build instructions and project architecture	2026-03-25 14:24:05 +01:00
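
Commits 45af49a659 and 1bfadc2f7c pack chunkIndex|(faceIndex<<16) into a single 32-bit push constant. The sketch below shows what that packing and its inverse might look like on the CPU side. The helper names are hypothetical and not taken from the repository. Masking chunkIndex to 16 bits is an assumption implied by the shift amount; the actual code may rely on the index already being small enough.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the chunkIndex|(faceIndex<<16) layout
// named in the commit messages. Low 16 bits: chunk index. High bits: face index.
constexpr uint32_t packDrawId(uint32_t chunkIndex, uint32_t faceIndex) {
    // Assumption: chunkIndex fits in 16 bits; the mask keeps it from
    // spilling into the face bits if it does not.
    return (chunkIndex & 0xFFFFu) | (faceIndex << 16);
}

// Inverse operations, as a vertex shader would perform them before
// looking up its chunk metadata.
constexpr uint32_t unpackChunkIndex(uint32_t packed) { return packed & 0xFFFFu; }
constexpr uint32_t unpackFaceIndex(uint32_t packed) { return packed >> 16; }

// Runtime round-trip check over the six face directions of one chunk.
inline bool roundTripsForAllFaces(uint32_t chunkIndex) {
    for (uint32_t face = 0; face < 6; ++face) {
        const uint32_t packed = packDrawId(chunkIndex, face);
        if (unpackChunkIndex(packed) != chunkIndex) return false;
        if (unpackFaceIndex(packed) != face) return false;
    }
    return true;
}
```

Carrying both indices in the push constant lets the vertex shader find its quad range from the chunk metadata without depending on startVertexLocation being reflected in SV_VertexID, which the Phase 2.2 commit reports as unreliable under ExecuteIndirect.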

24 commits
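
The two Phase 6.3 commits (9de53e5293 and 40560c25ef) describe the AO sampling math only in words. The following is a minimal CPU-side sketch of those formulas, not the shader code from the repository. The function names are hypothetical. The Interleaved Gradient Noise constants are the commonly published ones from Jimenez's formulation and are assumed here. Returning the current sample unchanged when history is invalid is also an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

inline float fract(float x) { return x - std::floor(x); }

// Interleaved Gradient Noise over pixel coordinates (commonly published
// constants; assumed, the commit does not list them).
inline float interleavedGradientNoise(float px, float py) {
    return fract(52.9829189f * fract(0.06711056f * px + 0.00583715f * py));
}

// Cranley-Patterson rotation: shift the sample by a per-frame offset and wrap.
// The offset is golden ratio x frameIndex; only its fractional part matters.
inline float rotateSample(float noise, uint32_t frameIndex) {
    const float kGoldenRatioFrac = 0.61803398875f;
    return fract(noise + kGoldenRatioFrac * static_cast<float>(frameIndex));
}

// Distance-weighted occlusion for one ray hit: (1 - hitT/aoRadius)^2,
// clamped so hits at or beyond aoRadius contribute nothing.
inline float aoFalloff(float hitT, float aoRadius) {
    const float t = 1.0f - std::min(hitT / aoRadius, 1.0f);
    return t * t;
}

// Temporal accumulation: 5% current + 95% reprojected history.
// Assumption: with no valid history, the current value is used as-is.
inline float temporalBlend(float current, float history, bool historyValid) {
    if (!historyValid) return current;
    return 0.05f * current + 0.95f * history;
}
```

With a 5% blend weight, each new frame contributes 1/20 of the result, which matches the "~20 frames" effective history length quoted in the commit message.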