- Load CC0 FreeStylized textures (6 materials: grass, dirt, stone, sand, snow, smoothstone)
as Texture2DArray: t1=albedo+heightmap RGBA, t7=normal maps GL format
- Height-based texture blending: winner-takes-all with sharpness=16, 40% blend zone,
asymmetric bias (coeff 1.6) for resistBleed materials (grass resists sand bleed)
- UDN triplanar normal mapping with 3 critical fixes:
* Use raw normal (NOT abs) in UDN formula — abs inverts lighting on -X/-Y/-Z faces
* sign(normal) correction on tangent X for back-facing UV mirror
* GL green channel flip on Y-projection only (not X/Z where V=worldY is correct)
- Dirt material rendered smooth (FLAG_SMOOTH), ground_02 texture darkened 0.75
- Sun orbit debug mode (F7): 10s cycle with sinusoidal altitude
- Crosshair + face debug HUD (F8): DDA raycast, camera/target/face/normal info
- Screenshot F6 now writes companion .log file with full debug state
- Document UDN pitfalls and logical vs physical coordinates in TROUBLESHOOTING.md
- Add tools/prepare_textures.py for texture pipeline (ZIP → albedo+height RGBA + normal)
- Remove ~430 lines of dead CPU mesh, MDI, and GPU cull render paths
(rebuildMegaBuffer, IndirectDrawArgs, drawCountBuffer, cullShader, etc.)
- Add voxelTopingBLASCS.hlsl compute shader replacing 196ms CPU loop
for toping BLAS position extraction (<1ms on GPU)
- Reduce animation rate from 60Hz to 30Hz (halves CPU regen cost)
- Simplify render() to GPU mesh path only (no conditional branches)
- Remove benchmark state machine and stale mode strings
Per-frame CreateRaytracingAccelerationStructure calls during F3 animation
caused VRAM explosion (especially toping BLAS at ~23M vertices). Now all
3 BLASes use capacity-based allocation with 25% headroom — only recreated
when vertex count exceeds capacity, otherwise just BuildRaytracingAS with
updated desc.vertex_count. TLAS only recreated when instance count changes.
Also adds deferred toping BLAS position upload via UpdateBuffer in Render()
(topingBLASDirty_ flag), enabling toping shadows to update during animation.
Split CLAUDE.md into CLAUDE.md + TROUBLESHOOTING.md for maintainability.
Wonderbox-inspired lighting overhaul across all 3 pixel shaders:
- Hemisphere ambient (sky blue above, warm brown below) replaces flat ambient
- RT shadows lerp toward blue-violet tint instead of plain darkening (factor 0.55)
- Rim light (fresnel) with warm golden color on silhouettes (30% on vegetation)
- Soft exponential tone mapping + saturation boost in final post-process pass
- CB parameters for all lighting values (skyAmbient, groundAmbient, shadowTint, etc.)
- Fog color/density centralized from CB instead of hardcoded per-shader
- Screenshot mode (CLI "screenshot"): fixed camera, AO convergence, auto-capture
- AO noise stability: world-space hash using voxel center + tangent-axis frac position
- AO distance-weighted falloff: continuous occlusion values instead of binary hit/miss
- Interleaved Gradient Noise replaces world-space hash for ray sampling
- Cranley-Patterson rotation (golden ratio × frameIndex) per frame
- Temporal accumulation: blend 5% current + 95% reprojected history (~20 frames)
- aoHistoryTexture_ persists between frames, copy pre-blur for next frame
- prevViewProjection added to VoxelCB for screen-space reprojection
- Push constants: frameIndex + historyValid for temporal control
- Result: nearly noise-free AO with only 8 rays per pixel
- 8 cosine-weighted hemisphere rays per pixel (inline ray queries, SM 6.5)
- Distance-weighted AO: quadratic falloff (1-hitT/aoRadius)² instead of binary hit/miss
- World-space hash seed: voxel coord + tangent-plane frac position (stable, no flicker)
- Bilateral blur pipeline: 2-pass separable (H+V), radius 6, depth+normal edge-stopping
- 4-pass dispatch: shadow+rawAO → blur H → blur V → apply
- AO written to separate R8_UNORM texture, blurred, then applied to color buffer
- Debug mode (F5 x3): grayscale AO visualization
Add shadow compute shader (voxelShadowCS.hlsl) that traces rays toward
the sun using DXR inline ray queries (RayQuery<>, SM 6.5). Shadows
modulate voxelRT_ in-place via RWTexture2D (no extra render target).
Key fixes to Phase 6.1 BLAS/TLAS infrastructure:
- Sequential index buffer required: Wicked treats IndexCount=0 with
non-null IndexBuffer as "0 indexed triangles" → empty BLAS
- Memory barriers between BLAS→TLAS→RT: without GPUBarrier::Memory()
the TLAS build races with BLAS builds, causing zero ray hits
- inverseViewProjection added to VoxelCB for depth reconstruction
F5 toggles shadows OFF→ON→DEBUG (red=hit, green=miss, blue=backface).
- Normal render target (R16G16B16A16_SNORM) as MRT SV_TARGET1 in all 3 pixel
shaders (voxelPS, voxelTopingPS, voxelSmoothPS) for future RT shadow/AO
- BLAS extraction compute shader (voxelBLASExtractCS.hlsl): converts PackedQuad
StructuredBuffer to float3 position buffer for DXR BLAS input
- Blocky BLAS: single BLAS from all GPU-meshed quads (~1.5M triangles)
- Smooth BLAS: single BLAS from smooth vertex buffer directly
- TLAS: 2 instances (blocky + smooth), identity transforms, CreateBuffer2 with
callback to avoid UpdateBuffer on RAY_TRACING flagged buffers
- Fix: Wicked always accesses index_buffer in CreateRaytracingAccelerationStructure
via to_internal() even for non-indexed geometry — provide dummy valid buffer
- Smooth vertex normals: area-weighted accumulation of face normals per
indexed vertex before triangle expansion. Gives Gouraud-smooth shading
without adding geometry.
- Triplanar fix: PS uses geometric normal (ddx/ddy of worldPos) for
texture projection weights, smooth normal for lighting only. Prevents
texture stretching on smoothed surfaces.
- Depth bias: custom rasterizer state (depth_bias=2, slope_scaled=1.0)
on smooth PSO resolves z-fighting at smooth↔blocky overlap.
- hasSmooth filter tightened: check face-adjacent voxels of each corner
(1-voxel reach) instead of neighbor cells' corners (2-cell cascade).
Prevents smooth mesh from extending into underground blocky territory.
Cells at the smooth↔blocky boundary had no smooth corners themselves,
so the strict hasSmooth filter skipped them entirely. This prevented
quad emission between the smooth mesh and blocky territory, leaving
a visible gap. Now checks 6-connected neighbor cells for smooth corners,
ensuring boundary vertices exist for connecting quads.
Rewrote voxelSmoothPS.hlsl to derive a dominant face axis from the smooth
normal, then use the exact same neighbor verification as voxelPS.hlsl:
faceU/faceV tangent tables, stair-priority getNeighborMat(), face-aligned
fractional coords, blendZone 0.25, corner attenuation, bleedMask checks.
Added generateDebugSmooth() with 11 isolated test configurations
(smooth↔blocky transitions, staircases, surrounded patches, reference
blocky pairs). Launch with: BVLEVoxels.exe debugsmooth
Implement CPU-side Naive Surface Nets for smooth voxel surfaces (SmoothStone,
Snow) coexisting with blocky voxels (Grass, Dirt, Stone, Sand).
Key features:
- SmoothMesher with binary SDF, centroid vertex placement, per-axis boundary
clamping to align with blocky grid at smooth↔blocky transitions
- Cross-chunk connectivity: PAD=2 SDF grid, vertex range [-1, CHUNK_SIZE),
canonical edge ownership (no duplicate triangles, no z-fighting)
- Face normals oriented by edge axis+sign (robust with binary SDF, unlike
SDF gradient dot or centroid sampling approaches)
- Y-axis winding fix: sharing cells have different spatial arrangement,
requiring opposite winding from X and Z axes
- GPU mesher treats smooth neighbors as solid (no blocky faces toward smooth)
- Material blending: primary (smooth-only) + secondary (all counts) per vertex
- Dedicated shaders: voxelSmoothVS (vertex pulling t6) + voxelSmoothPS
(triplanar + lerp blending between two materials)
- Separate render pass with LoadOp::LOAD after voxels+topings
- New materials: SmoothStone (mat 6), blocky Stone (mat 3) and Dirt patches
added to world generation for boundary testing
Stone: add corner fill triangles at adjacent open edges and cap
triangles at strip terminaisons. Grass: replace bevel strips with
tuft-based grass blades — clusters of 3-9 curved double-sided
blades with per-tuft height/lean personality and hash-driven
placement (quadratic inset 0-0.30 from edge). Vegetation PS uses
half-Lambert wrap lighting + translucency for soft stylized shading
(inspired by Airborn Trees). Stone keeps classic Lambert.
Add instanced rendering for toping bevels: dedicated shaders
(voxelTopingVS/PS), PSO, GPU buffers (t4 vertices, t5 instances),
per-group DrawInstanced in a separate render pass with LoadOp::LOAD.
Fix inverted face winding (emitTri auto-winding condition flipped for
CW front-facing), slope normals (use inward direction not outward),
and PS lighting (negate sunDirection like voxelPS). Update CLAUDE.md
with Phase 4.1/4.2 documentation.
- Add bleedMask/resistBleedMask bitmasks to CB for per-material blend control
- Grass: canBleed + resistsBleed (bleeds onto others, nothing bleeds onto it)
- Stone: no bleed (doesn't overflow, but accepts bleed from others)
- Other materials: normal bidirectional blending
- PS checks flags before blending: mainResists → skip, !neighCanBleed → skip
- Flatten terrain (heightScale 64→20) for better surface visibility
- Replace altitude-based material bands with noise-based 2D patches
(3 noise channels create organic patches of all 5 materials on surface)
- Make stone/sand more visually distinct (stone=blue-gray, sand=warm yellow)
- Lower stone heightContrast (1.2→0.5) so neighbors bleed onto it more
Compute shader fills indirect args buffer, replacing CPU cull loop.
Single DrawInstancedIndirectCount renders all visible face groups.
Key fixes:
- Compute shader: pack chunkIndex|(faceIndex<<16) in push constant,
startVertexLocation=0 (aligned with Phase 2.2 SV_VertexID fix)
- PushConstants must be called AFTER BindPipelineState, not before.
Wicked Engine dispatches to SetGraphicsRoot32BitConstants only when
active_pso is set; after BindComputeShader it targets compute instead.
- Barriers: UNDEFINED(COMMON)→UAV before compute, UAV→INDIRECT_ARGUMENT after
- Buffer decay: DX12 buffers always return to COMMON between frames,
no cross-frame state tracking needed
Replace per-chunk DrawInstanced loop with a single DrawInstancedIndirectCount.
CPU fills indirect args buffer with same frustum+backface cull logic as Phase 2.1.
Key discoveries:
- Wicked Engine command signature includes push constant (20-byte stride, not 16)
- SV_VertexID does not reliably include startVertexLocation with ExecuteIndirect
- Solution: pack chunkIndex|(faceIndex<<16) in push constant, VS reconstructs
quad offset from GPUChunkInfo lookup
- No explicit DX12 barriers needed (implicit promotion from COMMON suffices)
Also adds voxel_engine_spec.md and updates references from .docx to .md.
- VS supports dual mode: CPU path (push constants) and MDI path (binary search)
- CPU render loop now does per-face-group draws with backface culling (6 draws/chunk max)
- Frustum planes extracted and populated in constant buffer for GPU cull shader
- GPU cull + MDI path fully implemented but disabled (barrier/state debugging needed)
- GPU timestamp query infrastructure with readback for cull/draw timing
- HUD shows rendering mode (GPU cull vs CPU fallback)
Mega-buffer architecture replacing per-chunk GPU buffers:
- Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB)
- StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups)
- VS reads chunk info via push constants (b999) for driver-safe chunk indexing
- CPU frustum culling with wi::primitive::Frustum + AABB per chunk
- Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts)
- GPU frustum + backface cull compute shader (voxelCullCS.hlsl)
- GPU binary mesher compute shader baseline (voxelMeshCS.hlsl)
- Indirect draw buffers and timestamp query infrastructure
- README with build instructions and project architecture