Compare commits

...

10 commits

Author SHA1 Message Date
626fbaea80 Fix smooth Surface Nets rendering: eliminate faceting, fix blocky junction
- Remove geoN (ddx/ddy) from smooth PS entirely — use smooth interpolated
  normal N for all triplanar sampling (albedo, heightmap, normal map).
  geoN changes discontinuously at triangle edges, causing per-triangle
  faceting in texture weights and normal perturbation.
- Tune consistency-based vertex normal blend to smoothstep(0.70, 0.90):
  snaps to face normal at 90° boundaries (seamless blocky join) while
  preserving smooth normals on curved terrain.
- Unify all 3 edge axes (X/Y/Z) to same smoothstep formula (was mixed
  smoothstep + pow4).
- Remove grass-specific hardcoded shading from both PS (side darkening,
  warm shift, ambient boost) — will be data-driven per-material later.
- Remove CPU SmoothMesher code (GPU-only path).
- Document all findings in TROUBLESHOOTING.md with calibration table.
2026-04-01 20:35:42 +02:00
d5bf499375 Add debug tools 2026-04-01 18:12:58 +02:00
4c50727cb6 Ignore some files 2026-04-01 18:12:53 +02:00
4419c612bd Phase 8: Real stylized textures with UDN triplanar normal mapping
- Load CC0 FreeStylized textures (6 materials: grass, dirt, stone, sand, snow, smoothstone)
  as Texture2DArray: t1=albedo+heightmap RGBA, t7=normal maps GL format
- Height-based texture blending: winner-takes-all with sharpness=16, 40% blend zone,
  asymmetric bias (coeff 1.6) for resistBleed materials (grass resists sand bleed)
- UDN triplanar normal mapping with 3 critical fixes:
  * Use raw normal (NOT abs) in UDN formula — abs inverts lighting on -X/-Y/-Z faces
  * sign(normal) correction on tangent X for back-facing UV mirror
  * GL green channel flip on Y-projection only (not X/Z where V=worldY is correct)
- Dirt material rendered smooth (FLAG_SMOOTH), ground_02 texture darkened 0.75
- Sun orbit debug mode (F7): 10s cycle with sinusoidal altitude
- Crosshair + face debug HUD (F8): DDA raycast, camera/target/face/normal info
- Screenshot F6 now writes companion .log file with full debug state
- Document UDN pitfalls and logical vs physical coordinates in TROUBLESHOOTING.md
- Add tools/prepare_textures.py for texture pipeline (ZIP → albedo+height RGBA + normal)
2026-04-01 13:41:06 +02:00
c2d1a1e0b6 Commit plan and iteration instructions 2026-03-31 20:04:00 +02:00
8ab908054c Fix HDR screenshot, reduce sun size, windowed 1080p by default
- Add F6 in-app screenshot saving voxelRT_ directly (bypasses Windows HDR)
- Shrink sun disc (pow 256), glow (pow 64), and haze (pow 8) for subtler sky
- Launch as centered 1920x1080 window instead of maximized
2026-03-31 14:58:44 +02:00
57ac08f231 Refactor: extract VoxelRTManager, DeferredGPUBuffer, decompose VoxelRenderPath
- Extract DeferredGPUBuffer utility (staging→dirty→capacity GPU buffer pattern)
- Extract VoxelRTManager from VoxelRenderer (~500 lines: BLAS/TLAS, RT shadows+AO)
- Decompose VoxelRenderPath into CameraController, AnimationState, VoxelProfiler
- Replace toping std::sort with O(n) counting sort by (type, variant)
- Update CLAUDE.md architecture docs to reflect new file structure
2026-03-31 13:46:35 +02:00
53df73e5e6 fixes after Improving perfs 2026-03-31 08:53:37 +02:00
0d93cef8f1 GPU profiling + staggered BLAS builds + RT disable during animation
- Add comprehensive GPU timestamp queries for all major operations
  (mesh, smooth mesh, BLAS extract, BLAS build, draw, RT shadows)
- Add full-frame profiling: Wicked Render, GPU Wait/Sync, true FPS
- Stagger BLAS builds during animation: alternate blocky/smooth per
  frame, skip toping BLAS entirely (~130ms savings per frame)
- Auto-disable RT shadows on F3 animation start (prevents stale
  shadow artifacts), auto-restore on F3 stop with full BLAS rebuild
- Split buildAccelerationStructures() with selective build flags
- Result: animation ~24 FPS (CPU-bound on Regenerate 27ms)
  vs previous 2 FPS (GPU-bound on BLAS Build 1368ms)
2026-03-31 02:21:11 +02:00
0d3f8200b4 Refactor: remove dead CPU/MDI paths, GPU BLAS compute, 30Hz animation
- Remove ~430 lines of dead CPU mesh, MDI, and GPU cull render paths
  (rebuildMegaBuffer, IndirectDrawArgs, drawCountBuffer, cullShader, etc.)
- Add voxelTopingBLASCS.hlsl compute shader replacing 196ms CPU loop
  for toping BLAS position extraction (<1ms on GPU)
- Reduce animation rate from 60Hz to 30Hz (halves CPU regen cost)
- Simplify render() to GPU mesh path only (no conditional branches)
- Remove benchmark state machine and stale mode strings
2026-03-31 01:43:53 +02:00
38 changed files with 3184 additions and 2670 deletions

3
.gitignore vendored

@ -29,3 +29,6 @@ Desktop.ini
# Claude Code
.claude/
assets/raw
bvle_screenshot_*.log
bvle_screenshot_*.png


@ -18,7 +18,9 @@ bvle-voxels/
│ │ ├── VoxelTypes.h # Fundamental types (VoxelData, PackedQuad, MaterialDesc, ChunkPos)
│ │ ├── VoxelWorld.h/.cpp # Voxel world (hashmap of chunks, procedural generation)
│ │ ├── VoxelMesher.h/.cpp # Binary Greedy Mesher CPU + SmoothMesher (Naive Surface Nets)
│ │ ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (RenderPath3D subclass)
│ │ ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (CameraController, AnimationState, VoxelProfiler)
│ │ ├── VoxelRTManager.h/.cpp # Ray tracing: BLAS/TLAS lifecycle, shadows+AO dispatches
│ │ ├── DeferredGPUBuffer.h # Staging→dirty→capacity GPU buffer upload utility
│ │ └── TopingSystem.h/.cpp # Toping system (decorative bevels on +Y faces)
│ └── app/
│ └── main.cpp # Win32 entry point + SEH crash handler
@ -36,6 +38,11 @@ bvle-voxels/
│ ├── voxelShadowCS.hlsl # Compute shader RT shadows + raw AO (inline ray queries, Phase 6.2+6.3)
│ ├── voxelAOBlurCS.hlsl # Compute shader bilateral AO blur (separable H/V, Phase 6.3)
│ └── voxelAOApplyCS.hlsl # Compute shader AO apply + tone mapping + saturation (Phase 6.3 + 7)
├── assets/
│ ├── voxel/ # Stylized textures (6 albedo+height RGBA + 6 normal GL, 512x512)
│ └── raw/ # Source ZIPs from FreeStylized.com (CC0)
├── tools/
│ └── prepare_textures.py # Script: ZIP → albedo+heightmap RGBA + normal PNG (512x512)
├── CLAUDE.md
└── TROUBLESHOOTING.md # Technical pitfalls, debugging, Wicked APIs
```
@ -62,6 +69,38 @@ cmake --build build --config Release --target BVLEVoxels --parallel
SDK 10.0.26100 is required because the DX12 headers (`d3dx12_check_feature_support.h`) shipped with Wicked Engine are not compatible with SDK 22621.
### Running
**IMPORTANT**: the CWD must be the **project root**, not `build/Release/`.
The exe uses relative paths for assets (`Content/`) and shader compilation (`engine/WickedEngine/shaders/`).
```bash
# Normal launch (centered 1920x1080 window)
build/Release/BVLEVoxels.exe
# Screenshot mode (640x480, captures 3 views, exits automatically)
build/Release/BVLEVoxels.exe screenshot
# Other arguments
build/Release/BVLEVoxels.exe debug # Faces colored by direction
build/Release/BVLEVoxels.exe debugsmooth # Smooth debug scene
build/Release/BVLEVoxels.exe vulkan # Force the Vulkan backend
```
**Output files** (written to the CWD, i.e. the project root):
- `bvle_backlog.txt`: Wicked Engine log
- `bvle_crash.log` + `bvle_crash.dmp`: SEH crash report (on crash)
- `bvle_screenshot_*.png`: captures from screenshot mode or F6
**Keyboard shortcuts**:
- `F2`: toggle the Wicked backlog
- `F3`: toggle terrain animation (30 Hz)
- `F4`: toggle debug blend
- `F5`: cycle RT shadows/AO (ON → debug shadows → debug AO → OFF)
- `F6`: in-app screenshot (saves `voxelRT_` as PNG + companion `.log`)
- `F7`: toggle sun orbit (10 s cycle, sinusoidal altitude)
- `F8`: toggle crosshair + face debug info (camera, target, face, normal map proj)
### Automatic post-build (CMakeLists.txt)
The build automatically copies:
@ -129,7 +168,11 @@ Perlin noise 3D, fBm 5 octaves (2 en animation), caves 3D, matériaux par altitu
- **Per-chunk info**: `StructuredBuffer<GPUChunkInfo>` (80 bytes/chunk)
- **Height-based blending** (Phase 3): the PS reads `voxelDataBuffer` (t3), winner-takes-all heightmap, corner attenuation
- **Dedicated render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT)
- **CPU profiling**: `ProfileAccum` with averages every 5 s
- **CPU profiling**: `VoxelProfiler` (21 `ProfileAccum`, averages every 5 s)
- **DeferredGPUBuffer**: utility for GPU buffers with CPU staging, a dirty flag, and capacity-based growth (25% headroom)
- **VoxelRTManager** (`VoxelRTManager.h/.cpp`): manages BLAS/TLAS and the RT shadows+AO dispatches, isolated from the renderer
- **VoxelRenderPath** split into: `CameraController` (movement/mouse), `AnimationState` (terrain tick), `VoxelProfiler`
- **Toping sort**: O(n) counting sort by (type, variant) instead of `std::sort`
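
The counting sort mentioned in the last bullet could look roughly like the sketch below. It is only an illustration: `TopingInstance`, `kTypeCount`, `kVariantCount` and `sortByTypeVariant` are hypothetical names, and the real instance layout and bucket counts are not shown in this diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instance record (assumption: the real layout is not shown here).
struct TopingInstance {
    uint8_t type;      // assumed to be < kTypeCount
    uint8_t variant;   // assumed to be < kVariantCount
    uint32_t payload;  // stand-in for position/orientation data
};

constexpr std::size_t kTypeCount = 8;     // assumption
constexpr std::size_t kVariantCount = 4;  // assumption

// Stable O(n + k) counting sort keyed by (type, variant).
std::vector<TopingInstance> sortByTypeVariant(const std::vector<TopingInstance>& in) {
    constexpr std::size_t kBuckets = kTypeCount * kVariantCount;
    std::vector<std::size_t> offsets(kBuckets + 1, 0);
    // Pass 1: histogram, shifted by one slot so the prefix sum yields start offsets.
    for (const TopingInstance& t : in)
        ++offsets[t.type * kVariantCount + t.variant + 1];
    // Pass 2: prefix sum turns counts into the first output index of each bucket.
    for (std::size_t b = 0; b < kBuckets; ++b)
        offsets[b + 1] += offsets[b];
    // Pass 3: scatter; walking the input in order keeps equal keys in order.
    std::vector<TopingInstance> out(in.size());
    for (const TopingInstance& t : in)
        out[offsets[t.type * kVariantCount + t.variant]++] = t;
    return out;
}
```

Because the key space is tiny and fixed, the three linear passes replace the O(n log n) comparison sort, and instances sharing a (type, variant) pair end up contiguous.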
## Development phases
@ -174,6 +217,17 @@ PS-based heightmap blending, winner-takes-all, corner attenuation subtractive. G
- **7.1** [DONE]: Hemisphere ambient, colored shadows, rim light, tone mapping + saturation, screenshot mode
### Phase 8 - Real stylized textures [IN PROGRESS]
- **8.1** [DONE]: Load CC0 FreeStylized textures (6 materials, albedo+heightmap RGBA, GL normal maps)
- **8.2** [DONE]: Texture2DArray (t1=albedo+height, t7=normals), triplanar sampling, stb_image loading
- **8.3** [DONE]: Height-based texture blending (winner-takes-all, sharpness=16, corner attenuation)
- **8.4** [DONE]: Asymmetric blend for resistBleed (coeff 1.6), 40% blend zone
- **8.5** [DONE]: UDN triplanar normal mapping (sign correction, GL green flip Y-proj only, NO abs)
- **8.6** [DONE]: Dirt rendered smooth (FLAG_SMOOTH), ground_02 texture darkened 0.75
- **8.7** [DONE]: Sun orbit debug (F7, 10 s cycle), crosshair + face debug HUD (F8)
- **8.8** [DONE]: F6 screenshot with companion .log (camera, target, debug states, RT stats)
## Target metrics and results
| Metric | Target | Result (Ryzen 7 9800X3D + RX 9070 XT) |


@ -41,6 +41,14 @@ add_custom_command(TARGET BVLEVoxels POST_BUILD
COMMENT "Copying DXC shader compiler DLL"
)
# Copy voxel texture assets to Content/voxel/ next to the exe
add_custom_command(TARGET BVLEVoxels POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_directory
${CMAKE_SOURCE_DIR}/assets/voxel
$<TARGET_FILE_DIR:BVLEVoxels>/Content/voxel
COMMENT "Copying voxel texture assets"
)
# Copy our custom shader sources into Wicked's shader source tree
# so LoadShader can find and compile them as "voxel/voxelVS.cso"
add_custom_command(TARGET BVLEVoxels POST_BUILD


@ -3,6 +3,8 @@
## Table of contents
- [Wicked APIs used](#apis-wicked-utilisées)
- [Logical vs physical coordinates](#coordonnées-logiques-vs-physiques--piège-majeur)
- [Triplanar UDN Normal Mapping](#triplanar-udn-normal-mapping--pièges-majeurs)
- [Custom shaders: important pitfalls](#shaders-custom--pièges-importants)
1. [Mandatory root signature](#1-root-signature-obligatoire)
2. [Wicked root signature (HLSL 6.6+)](#2-root-signature-wicked-hlsl-66)
@ -17,6 +19,7 @@
- [CreateBuffer with capacity > data size](#createbuffer-avec-capacity--data-size)
- [BLAS/TLAS per-frame recreation: VRAM leak](#blastlas-per-frame-recreation--vram-leak)
- [Diagnostics and debugging](#diagnostics-et-debugging)
- [Smooth Surface Nets: faceted rendering and blocky seam](#smooth-surface-nets--rendu-facetté-et-jointure-blocky)
- [DX12 resource state management (buffers)](#gestion-des-resource-states-dx12-buffers)
---
@ -41,6 +44,107 @@
| Render pass | NEVER nest! Only one active render pass per command list |
| Debug DX12 | Pass `"debugdevice"` as an argument to enable the D3D12 debug layer |
| Logging | `wi::backlog::post(message, logLevel)`, preferred over file logging |
| Screen size (draw) | **`GetLogicalWidth()`/`GetLogicalHeight()`** for `wi::font` and `wi::image` (NOT `GetPhysicalWidth`) |
| Solid rect draw | `wi::image::Draw(wi::texturehelper::getWhite(), params, cmd)`; do NOT pass `nullptr` |
---
## Logical vs physical coordinates: major pitfall
Wicked Engine distinguishes two screen coordinate systems:
- **Physical** (`GetPhysicalWidth()`/`GetPhysicalHeight()`): actual backbuffer pixels. Used to create render targets, viewports, and GPU textures.
- **Logical** (`GetLogicalWidth()`/`GetLogicalHeight()`): DPI-scaled pixels. **Wicked's entire 2D system** (`wi::font::Draw`, `wi::image::Draw`, `wi::image::Params::pos/siz`) works in logical coordinates.
**Symptom**: offset HUD elements, off-center crosshair, text off screen.
```cpp
// ❌ WRONG: offset when DPI scaling ≠ 100%
float cx = (float)GetPhysicalWidth() * 0.5f;
wi::font::Params fp; fp.posX = cx;
// ✅ CORRECT
float cx = GetLogicalWidth() * 0.5f;
wi::font::Params fp; fp.posX = cx;
```
**To draw a solid rectangle** (no texture):
```cpp
// ❌ WRONG: draws nothing
wi::image::Draw(nullptr, params, cmd);
// ✅ CORRECT: use the built-in 1x1 white texture
#include "wiTextureHelper.h"
wi::image::Draw(wi::texturehelper::getWhite(), params, cmd);
```
The 2D projection is defined in `wiCanvas.h`:
```cpp
GetProjection() = XMMatrixOrthographicOffCenterLH(0, GetLogicalWidth(), GetLogicalHeight(), 0, -1, 1);
```
---
## Triplanar UDN Normal Mapping: major pitfalls
The triplanar UDN implementation for normal maps has three critical subtleties (the UDN blend takes its name from the Unreal Developer Network, where it was documented):
### 1. Do NOT use `abs(normal)` in the UDN formula
The Ben Golus reference uses `abs(normal)` because it targets terrain (normals always pointing up). For voxels with 6 face directions, `abs()` forces the dominant component to be positive, **inverting the lighting on -X, -Y and -Z faces**.
```hlsl
// ❌ WRONG: flips the normals on 3 faces (NdotL is wrong)
float3 absN = abs(normal);
float3 worldNX = float3(tnX.xy + absN.zy, absN.x).zyx;
// -X face: absN.x = 1 → the result points toward +X instead of -X
// ✅ CORRECT: use the raw normal
float3 worldNX = float3(tnX.xy + normal.zy, normal.x).zyx;
// -X face: normal.x = -1 → the result correctly points toward -X
```
**Diagnosis**: RT shadows are correct (they use the geometry) but direct lighting is inverted on some faces → visual contradiction.
### 2. Sign correction for negative faces
UVs are mirrored on negative faces. `sign(normal)` corrects the tangent-space X component:
```hlsl
float3 axisSign = sign(normal);
tnX.x *= axisSign.x; // Flip U-tangent for -X
tnY.x *= axisSign.y; // Flip U-tangent for -Y
tnZ.x *= axisSign.z; // Flip U-tangent for -Z
```
### 3. Green channel flip for OpenGL normal maps (Y projection only)
`normal_gl` textures have the green channel inverted relative to DX. With triplanar mapping, only the **Y projection** (horizontal faces, UV=xz) needs the flip; the X and Z projections have V=world Y, which is already correct.
```hlsl
// ❌ WRONG: breaks vertical faces
tnX.y = -tnX.y; tnY.y = -tnY.y; tnZ.y = -tnZ.y;
// ✅ CORRECT: Y projection only
tnY.y = -tnY.y;
```
**Complete correct formula**:
```hlsl
// sample() is shorthand for the normal-map fetch; w holds the triplanar weights
float3 axisSign = sign(normal);
float3 tnX = sample(wp.zy).rgb * 2.0 - 1.0;
float3 tnY = sample(wp.xz).rgb * 2.0 - 1.0;
float3 tnZ = sample(wp.xy).rgb * 2.0 - 1.0;
tnY.y = -tnY.y; // GL flip Y-projection only
tnX.x *= axisSign.x; // sign correction
tnY.x *= axisSign.y;
tnZ.x *= axisSign.z;
float3 worldNX = float3(tnX.xy + normal.zy, normal.x).zyx; // RAW normal
float3 worldNY = float3(tnY.xy + normal.xz, normal.y).xzy;
float3 worldNZ = float3(tnZ.xy + normal.xy, normal.z);
return normalize(worldNX * w.x + worldNY * w.y + worldNZ * w.z);
```
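
To sanity-check the swizzles without a GPU, the formula above can be ported to the CPU. The sketch below is such a port under stated assumptions: `Vec3`, `signOf` and `udnTriplanar` are hypothetical helper names, the tangent-space normals are assumed to be already decoded to [-1, 1], and the triplanar weights are passed in by the caller.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

// Mirrors HLSL sign(): returns -1, 0 or +1.
static float signOf(float v) { return static_cast<float>((v > 0.f) - (v < 0.f)); }

// CPU port of the UDN blend. tnX/tnY/tnZ are the decoded tangent-space normals
// of the zy, xz and xy projections; w are the triplanar weights.
Vec3 udnTriplanar(Vec3 n, Vec3 tnX, Vec3 tnY, Vec3 tnZ, Vec3 w) {
    tnY.y = -tnY.y;              // GL green flip, Y projection only
    tnX.x *= signOf(n.x);        // mirror correction on negative faces
    tnY.x *= signOf(n.y);
    tnZ.x *= signOf(n.z);
    // float3(tnX.xy + n.zy, n.x).zyx, using the RAW normal (no abs)
    Vec3 wx = {n.x, tnX.y + n.y, tnX.x + n.z};
    // float3(tnY.xy + n.xz, n.y).xzy
    Vec3 wy = {tnY.x + n.x, n.y, tnY.y + n.z};
    // float3(tnZ.xy + n.xy, n.z)
    Vec3 wz = {tnZ.x + n.x, tnZ.y + n.y, n.z};
    return normalize({wx.x * w.x + wy.x * w.y + wz.x * w.z,
                      wx.y * w.x + wy.y * w.y + wz.y * w.z,
                      wx.z * w.x + wy.z * w.y + wz.z * w.z});
}
```

Feeding a flat tangent normal `(0, 0, 1)` on a -X face (`n = (-1, 0, 0)`, `w = (1, 0, 0)`) should return a vector pointing toward -X, which is the case the `abs()` variant gets wrong.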
---
@ -217,6 +321,66 @@ dev->BuildRaytracingAccelerationStructure(&blas, cmd, nullptr);
---
## Smooth Surface Nets: faceted rendering and blocky seam
### Problem 1: faceted smooth rendering despite smooth normals
**Symptom**: in debug modes (FLAT, NdotL, NORMAL) the smooth surface is perfectly smooth. In the final render (ALL), however, it looks faceted, with visible triangle edges.
**Root cause**: `geoN` (geometric normal from `ddx(worldPos)`/`ddy(worldPos)`) was used for both triplanar sampling (projection weights) AND normal mapping. That value is the **face normal of the on-screen triangle**, so it changes **discontinuously** at every triangle edge. As a result:
1. **Discontinuous triplanar weights** → the texture jumps at edges (visible seams)
2. **Discontinuous normal map** → the normal perturbation differs per triangle → faceted NdotL
The debug modes looked smooth because they used `flatN` (the smooth normal **before** normal map perturbation), not the perturbed `N`.
**Fix**: use `N` (smooth interpolated normal) for **all** triplanar work in `voxelSmoothPS.hlsl`:
- Albedo/heightmap triplanar weights → `N` (not `geoN`)
- Normal map sampling → `N` (not `geoN`)
- `geoN` is no longer computed or used at all
`N` varies continuously between vertices → smooth transitions everywhere.
### Problem 2: visible smooth/blocky seam
**Symptom**: visible contrast between adjacent, nearly coplanar smooth and blocky faces.
**Root causes** (cumulative):
1. **Per-material treatment in only one PS**: the blocky PS had grass-specific shading (60% side darkening, chromatic warm shift, ×1.15 ambient boost) that the smooth PS lacked. On a +X grass face this produced a ~40% brightness gap.
2. **Biased smooth normals at boundaries**: vertex normals at 90° edges (smooth wall → ground) were averaged across perpendicular faces (consistency ≈ 0.707), yielding a normal biased toward +Y instead of pure +X.
**Fix**:
- **Remove the hardcoded per-material treatment** from both PS. When per-material shading is needed, make it data-driven and apply it identically in both shaders.
- **Consistency-based vertex normal blend** in `voxelSmoothCS.hlsl`: the metric `|Σfn| / Σ|fn|` measures how well the incident face normals agree. Low-consistency vertices (sharp edges, boundaries) get the pure face normal; high-consistency vertices (curved surfaces) keep the smooth normal.
### Calibrating the consistency threshold
The threshold `smoothstep(low, high, consistency)` controls the smooth/sharp trade-off (t=0 means pure face normal, t=1 means pure smooth normal):
| Threshold | con=0.707 (90° edge) | con=0.85 (curve) | con=0.95 (slope) | Result |
|---|---|---|---|---|
| `(0.85, 1.0)` | t=0 face ✓ | t=0 face ✗ | t=0.74 (26% face) ✗ | Too aggressive, everything faceted |
| `(0.60, 0.85)` | t≈0.39 (≈61% face) | t=1.0 smooth ✓ | t=1.0 smooth ✓ | Visible boundary, smooth interior |
| `(0.70, 0.90)` | t≈0 face ✓ | t=0.84 smooth ✓ | t=1.0 smooth ✓ | **Good trade-off** |
**Chosen value: `smoothstep(0.70, 0.90)`**: 90° edges (con ≤ 0.707) get ≈100% face normal (clean seam with blocky), while moderate curves (con > 0.85) stay smooth.
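
The consistency metric and the chosen threshold can be checked numerically. The following is a minimal CPU sketch, not the compute shader itself: `Vec3`, `consistency` and `smoothWeight` are hypothetical names, and `smoothstep` follows the standard HLSL definition.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static float length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Standard smoothstep: clamp to [0, 1], then apply 3t^2 - 2t^3.
float smoothstep(float lo, float hi, float x) {
    float t = std::min(1.0f, std::max(0.0f, (x - lo) / (hi - lo)));
    return t * t * (3.0f - 2.0f * t);
}

// |sum(fn)| / sum(|fn|): 1 when all incident face normals agree,
// lower when they diverge (about 0.707 for two perpendicular faces).
float consistency(const std::vector<Vec3>& faceNormals) {
    Vec3 sum = {0.f, 0.f, 0.f};
    float sumLen = 0.f;
    for (const Vec3& fn : faceNormals) {
        sum.x += fn.x; sum.y += fn.y; sum.z += fn.z;
        sumLen += length(fn);
    }
    return sumLen > 0.f ? length(sum) / sumLen : 0.f;
}

// Blend weight of the smooth normal (0 = pure face normal, 1 = pure smooth normal).
float smoothWeight(float con) { return smoothstep(0.70f, 0.90f, con); }
```

With two perpendicular unit normals the metric evaluates to about 0.707, and `smoothWeight` maps that to nearly 0, so the vertex keeps the face normal; a consistency of 0.95 maps to exactly 1.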
### Normal map strength
The smooth PS uses `nmStrength * 0.7` (vs `nmStrength * 1.0` for blocky). Curved surfaces need attenuated normal maps so that the perturbations do not break the visual continuity of the smooth shading.
### Rules
- **Any lighting/texturing change** in `voxelPS.hlsl` must be ported to `voxelSmoothPS.hlsl` (and vice versa)
- **NEVER use `geoN`** (ddx/ddy) in the smooth PS for triplanar sampling or normal mapping; use `N` exclusively
- Both PS must produce identical results on coplanar faces of the same material
**Files**: `shaders/voxelSmoothCS.hlsl` (consistency blend), `shaders/voxelSmoothPS.hlsl` (triplanar + normal map), `shaders/voxelPS.hlsl` (blocky reference)
---
## DX12 resource state management (buffers)
**Wicked Engine does NO automatic state tracking for buffers.** `GPUBarrier::Buffer(buf, before, after)` barriers are passed straight to D3D12 without validation. **`state_before` MUST match the actual DX12 state, otherwise → DXGI_ERROR_INVALID_CALL.**

Binary files not shown (12 images added). Sizes: 327, 260, 472, 228, 262, 208, 431, 224, 178, 186, 488, 222 KiB.

146
docs/plan-lod-skirts.md Normal file

@ -0,0 +1,146 @@
# Multi-resolution LOD with skirts
Inspired by the Roblox SIGGRAPH 2020 talk (p.34-38) and the Transvoxel approach.
Goal: increase the view distance without blowing up the meshing/rendering cost.
## Current problem
- The whole world (512x512x256 = 8192 potential chunks, ~648 active) is meshed and
rendered at full 32^3 resolution
- CPU smooth meshing costs 17 ms for 648 chunks (parallelized)
- Rendering is cheap (0.1 ms GPU mesh), but smooth meshing blocks scaling up
- No variable view distance: all or nothing
## Roblox approach: mip pyramid + skirts
### Principle
1. Each chunk stores a voxel **mip pyramid**: 32^3, 16^3, 8^3, 4^3, 2^3, 1^3
2. A render **octree** decides which mip level to use per chunk (camera distance)
3. Seams between chunks at different LODs are hidden by **skirts**
4. Skirts are extra triangles with a **depth bias** applied in the VS
### Why skirts rather than Transvoxel?
| | Transvoxel | Skirts (Roblox) |
|--|-----------|-----------------|
| Complexity | High (case tables, 73 transition-cell equivalence classes) | Low (1 extra layer + depth bias) |
| Quality | Perfect (continuous mesh) | Good (gaps invisible thanks to the depth bias) |
| Meshing cost | +50% (transition cells) | +15% (1 extra voxel layer) |
| Integration | Invasive (changes the mesher) | Additive (post-process on the mesh) |
## Implementation plan
### Phase A: Mip pyramid storage
**File:** `VoxelWorld.h/.cpp`
```cpp
struct Chunk {
VoxelData voxels[CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE]; // LOD 0 (32^3)
// Mip levels stored on demand
std::array<std::vector<VoxelData>, 4> mips; // LOD 1-4 (16^3, 8^3, 4^3, 2^3)
uint8_t maxAvailableLOD = 0;
};
```
**Downsampling:** for each 2x2x2 group, the dominant voxel (most frequent material,
occupancy > 4/8) is kept. Smooth voxels: the occupancy is averaged.
**Memory:** a full mip pyramid = 32^3 + 16^3 + 8^3 + 4^3 + 2^3 = 37448 voxels
= ~73 KB per chunk (vs 64 KB today), a 14% overhead.
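
As a rough illustration of the downsampling rule, the sketch below reduces one 2x2x2 group to a single coarse voxel. It rests on assumptions: `Voxel` is a hypothetical stand-in for `VoxelData`, "occupancy > 4/8" is read as "more than 4 of the 8 voxels are solid", and ties between materials go to the lowest material id.

```cpp
#include <array>
#include <cstdint>

// Hypothetical voxel layout (assumption: the real VoxelData is not shown here).
struct Voxel {
    uint8_t material;
    bool solid;
};

// Reduce a 2x2x2 group to one voxel of the next coarser mip.
Voxel downsampleGroup(const std::array<Voxel, 8>& group) {
    int solidCount = 0;
    std::array<int, 256> votes{};  // one counter per possible material id
    for (const Voxel& v : group) {
        if (v.solid) {
            ++solidCount;
            ++votes[v.material];
        }
    }
    // The coarse voxel is solid only if more than half of the group is solid.
    if (solidCount <= 4) return {0, false};
    int best = 0;
    for (int m = 1; m < 256; ++m)
        if (votes[m] > votes[best]) best = m;  // strict '>' keeps the lowest id on ties
    return {static_cast<uint8_t>(best), true};
}
```

Applying this to every group of a 32^3 chunk would produce the 16^3 level, and repeating it yields the remaining levels of the pyramid.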
### Phase B: LOD selection
**File:** `VoxelRenderer.cpp` (in the frustum cull, or on the CPU)
```cpp
uint8_t selectLOD(const ChunkPos& pos, const XMFLOAT3& cameraPos) {
float dist = distance(chunkCenter(pos), cameraPos);
if (dist < 64.0f) return 0; // full resolution (32^3)
if (dist < 128.0f) return 1; // 16^3
if (dist < 256.0f) return 2; // 8^3
return 3; // 4^3
}
```
The LOD is passed to the mesher. The binary greedy mesher and Surface Nets operate
on the matching mip (same algorithms, just a smaller array).
The `voxelMeshCS` compute shader receives the LOD level and adjusts the chunk size
accordingly. Quad positions are multiplied by `2^LOD` so that they stay in
world coordinates.
### Phase C: Skirt generation
**Principle:** when a chunk is less detailed (coarser LOD) than a neighbor, gaps
appear at the boundary. An extra "skirt" of geometry is generated to
hide them.
**Implementation:**
1. For each chunk face adjacent to a more detailed (finer LOD) chunk:
- Add an extra layer of voxels duplicated from the high-res neighbor
- Mesh that layer normally (it extends slightly beyond the chunk)
2. Tag skirt vertices in the vertex data (1 bit in the flags)
3. In the VS, apply a **depth bias** to skirt vertices:
```hlsl
if (isSkirt) {
// Push the skirt slightly behind the surface.
// With reverse-Z the far plane is at z = 0, so the bias is subtracted.
output.position.z -= 0.0001; // clip space (reverse-Z: toward far)
}
```
The skirt is only visible where there is a gap, because the regular geometry
hides it everywhere else through the depth test.
### Phase D: Rendering integration
**Buffers:** skirts live in the same quad mega-buffer, tagged by a bit.
No extra draw call.
**Compute cull:** the culling compute shader (`voxelCullCS`) receives the per-chunk LOD
in GPUChunkInfo. Chunks at LOD > 0 have fewer quads, hence fewer vertices
to process.
**RT:** BLAS are built per chunk. Chunks at LOD > 0 have smaller BLAS.
The TLAS stays the same.
### Phase E: Smooth-specific LOD
For smooth chunks (Surface Nets), LOD is trickier:
- The LOD 1 smooth mesh (16^3) has triangles 2x larger
- Normals are less accurate
- Per-material deformation (plan-vertex-deformation.md) must stay consistent
**Approach:** run the smooth mesher on the matching mip. Smooth skirts are generated
the same way (extra layer). Visual consistency is acceptable because
smooth meshes are already "blurry" by nature.
## Effort estimate
| Phase | Effort | Dependency |
|-------|--------|------------|
| A. Mip pyramid | 4h | None |
| B. LOD selection | 2h | A |
| C. Blocky skirts | 4h | B |
| D. Rendering integration | 3h | C |
| E. Smooth LOD | 4h | B + Phase 5.1 |
| **Total** | **~17h** | |
## Risks
- **Popping**: the LOD switch is visible if the distances are too close together.
Solution: cross-fade or hysteresis (switch LOD at dist+10% to avoid oscillation).
- **Skirt artifacts**: if the depth bias is too large, skirts show up as
shadows. Tune the bias per LOD level.
- **Meshing cache**: mips at LOD > 0 change less often. Cache the mesh per LOD
level and re-mesh only when the mip changes.
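
The hysteresis idea from the popping risk can be sketched as follows. This is one possible reading, not the planned implementation: `selectLODWithHysteresis` is a hypothetical name, the 64/128/256 thresholds are the ones used by `selectLOD` in Phase B, and the symmetric ±10% band is an assumption (the plan only mentions "dist+10%").

```cpp
#include <cstdint>

// LOD selection with a hysteresis band around each distance threshold.
// currentLOD is the level the chunk used on the previous frame.
uint8_t selectLODWithHysteresis(float dist, uint8_t currentLOD) {
    static const float kLimits[3] = {64.0f, 128.0f, 256.0f};
    const float kMargin = 0.10f;  // assumed symmetric 10% band
    uint8_t lod = currentLOD;
    // Coarsen only once the distance exceeds the threshold by the margin.
    while (lod < 3 && dist > kLimits[lod] * (1.0f + kMargin)) ++lod;
    // Refine only once the distance drops below the threshold by the margin.
    while (lod > 0 && dist < kLimits[lod - 1] * (1.0f - kMargin)) --lod;
    return lod;
}
```

A chunk hovering around a distance of 64 therefore keeps whichever level it already has, instead of flipping between LOD 0 and LOD 1 every frame.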
## References
- Roblox SIGGRAPH 2020, p.34-38 (skirts + depth bias)
- Transvoxel (Eric Lengyel) : https://transvoxel.org/
- 0fps LOD for blocky voxels : https://0fps.net/2018/03/03/a-level-of-detail-method-for-blocky-voxels/
- Nick Gildea, Dual Contouring seams : http://ngildea.blogspot.com/2014/09/dual-contouring-chunked-terrain.html


@ -0,0 +1,222 @@
# Real stylized textures + quilting
Move from procedural colors to real hand-painted textures in a
Wonderbox / Enshrouded style. Includes the Roblox quilting technique as an optimization.
## Current state
- `textureArray_`: 5 procedurally generated 256x256 layers (noise + flat color)
- `MaterialDesc`: the `albedoTextureIndex`, `normalTextureIndex`, `heightmapTextureIndex` fields
already exist but point to generated textures
- Triplanar mapping: working in `voxelPS.hlsl` (blocky) and `voxelSmoothPS.hlsl`
- Height-based blending: working (Phase 3), winner-takes-all + corner attenuation
- `sampler_`: already created, linear with wrap
The infrastructure is ready; only the textures and their integration are missing.
## Implementation plan
### Phase A: Prepare the textures (art)
**Target format per material:**
| Texture | Format | Content | Size |
|---------|--------|---------|--------|
| Albedo | RGBA8 | RGB = color, A = heightmap | 512x512 |
| Normal | RG8 | Tangent-space normal map (BC5) | 512x512 |
Storing the heightmap in the albedo alpha channel is the Roblox convention and avoids a
separate texture. Height-based blending already reads a height channel.
**Materials to create (6):**
1. **Grass**: hand-painted grass, visible blades, height map with tall peaks
2. **Dirt**: dry earth, cracks, irregular height map
3. **Stone**: grey stone, fissures, height map with protruding ridges
4. **Sand**: fine sand, ripples, gentle height map
5. **Snow**: powder snow, nearly flat surface, very smooth height map
6. **Smoothstone**: polished stone, subtle veins
**Sources of stylized textures (free or to be created):**
- Polyhaven (CC0, PBR): paint over them for the hand-painted style
- Ambientcg (CC0): realistic bases to simplify
- Create from scratch in Krita/Aseprite at 512x512
### Phase B: Load the textures into the texture array
**File:** `VoxelRenderer.cpp`, replacing `generateTextures()`.
```cpp
void VoxelRenderer::loadTextures() {
// Load each material from PNG/DDS files
const char* albedoPaths[] = {
"Content/voxel/grass_albedo.png",
"Content/voxel/dirt_albedo.png",
"Content/voxel/stone_albedo.png",
"Content/voxel/sand_albedo.png",
"Content/voxel/snow_albedo.png",
"Content/voxel/smoothstone_albedo.png",
};
// Create a 512x512 texture array with N layers
// (Wicked's TextureDesc uses snake_case field names)
TextureDesc desc;
desc.width = 512;
desc.height = 512;
desc.array_size = NUM_MATERIALS;
desc.format = Format::R8G8B8A8_UNORM;
desc.bind_flags = BindFlag::SHADER_RESOURCE;
desc.mip_levels = 0; // auto mip generation
// Load each layer via wi::helper::loadTextureFromFile
// then copy it into the texture array via CopyTexture
}
```
**Wicked Engine helper:** `wi::resourcemanager::Load()` loads PNG/DDS and generates
the mips automatically. `wi::helper::CreateTexture()` can also be used with
raw data.
**Post-build:** add a copy step for the textures in CMakeLists.txt:
```cmake
add_custom_command(TARGET BVLEVoxels POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy_directory
${CMAKE_SOURCE_DIR}/assets/voxel
$<TARGET_FILE_DIR:BVLEVoxels>/Content/voxel
)
```
### Phase C: Adapt the shaders
**Changes in `voxelPS.hlsl` (blocky):**
The shader already uses `materialTextures` (texture array) and `triplanarSample()`.
Changes:
```hlsl
// Current: procedural color + subtle texture
float3 color = materialColor * texSample.rgb;
// New: the color comes straight from the texture (sampled once, then split)
float4 texel = triplanarSample(materialTextures, worldPos, normal, matIndex);
float3 albedo = texel.rgb;
float height = texel.a;
```
The height is used for inter-material blending (already in place).
**Changes in `voxelSmoothPS.hlsl`:**
Identical. Triplanar sampling is already in place; only the color source changes.
### Phase D: Detiling (anti-repetition)
Problem: triplanar mapping with 512x512 textures shows visible repetition
every 16 voxels (at a tiling density of 32 texels/voxel).
**Roblox technique (p.25):** pseudo-random rotation + shift per vertex.
```hlsl
// In the VS, derive a detiling seed from the position
uint detileSeed = hash(uint3(floor(worldPos)));
// In the PS, apply a rotation/shift to the UVs
float angle = (detileSeed & 0x3) * (3.14159 / 2.0); // 0, 90, 180, 270 degrees
float2 rotatedUV = rotate2D(uv, angle);
float2 shiftedUV = rotatedUV + float2(
((detileSeed >> 2) & 0xF) / 16.0,
((detileSeed >> 6) & 0xF) / 16.0
);
```
This breaks up the repetition without adding extra samples.
The seed is passed from the VS to the PS through an interpolant (1 uint, which must be declared `nointerpolation`).
**Simpler alternative:** vary the tiling scale per triplanar axis (1.0, 0.97, 1.03).
This already breaks up a good deal of the repetition at almost no cost.
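
The seed decoding used by the rotation/shift variant can be prototyped on the CPU. The sketch below mirrors the bit layout of the HLSL snippet (bits 0-1 for the quarter turn, bits 2-5 and 6-9 for the shifts); `Vec2` and `detileUV` are hypothetical names, and the hash that produces the seed is left out.

```cpp
#include <cstdint>

struct Vec2 { float u, v; };

// Rotate the UVs by a multiple of 90 degrees, then shift them by sixteenths
// of a tile, both decoded from a per-cell seed.
Vec2 detileUV(Vec2 uv, uint32_t seed) {
    Vec2 r = uv;
    // Quarter turns are exact coordinate swaps, so no sin/cos is needed.
    switch (seed & 0x3u) {
        case 1: r = {-uv.v, uv.u}; break;   // 90 degrees
        case 2: r = {-uv.u, -uv.v}; break;  // 180 degrees
        case 3: r = {uv.v, -uv.u}; break;   // 270 degrees
        default: break;                     // 0 degrees
    }
    r.u += static_cast<float>((seed >> 2) & 0xFu) / 16.0f;
    r.v += static_cast<float>((seed >> 6) & 0xFu) / 16.0f;
    return r;
}
```

A seed of 0 leaves the UVs untouched, which makes it easy to compare the detiled and non-detiled results side by side.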
### Phase E: Quilting (optional optimization)
If triplanar sampling (3 fetches per texture * N textures) becomes a bottleneck:
**Roblox technique (p.22):** pick ONE projection plane per vertex among 18 planes
(6 faces * 3 rotations in 30-degree steps). Encode the plane ID in the vertex data (5 bits).
```
18 planes = 6 faces * 3 rotations:
+X face: 0deg, 30deg, 60deg
-X face: 0deg, 30deg, 60deg
+Y face: ...
...
```
The PS then samples once per material instead of 3 times (triplanar).
Reduction: from 9 fetches (3 materials * 3 axes) to 3 fetches (3 materials * 1 plane).
**For blocky:** not needed. Quads have a single face, so triplanar sampling is
already reduced to 1 dominant axis. Quilting brings nothing.
**For smooth:** potentially useful when blending 3+ materials. First measure
whether triplanar sampling really is a bottleneck (unlikely with 6 materials).
**Verdict:** postpone quilting until after measuring. Standard triplanar sampling should
be enough with our number of materials.
### Phase F: Normal mapping
Add a second texture array for the normal maps (or a second set of layers).
```hlsl
// Triplanar normal map sampling
float3 normalMap = triplanarSampleNormal(normalTextures, worldPos, geometricNormal, matIndex);
// Perturb the geometric normal
float3 finalNormal = normalize(geometricNormal + normalMap * normalStrength);
```
Triplanar normal mapping requires rebuilding the TBN per projection axis.
This is extra computation, but a standard one.
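One common shortcut that avoids building an explicit TBN is the UDN blend, where the tangent-space XY simply offsets the two world axes lying in the projection plane. The C++ sketch below shows it for the Z projection only, as an illustration under that assumption. `Float3`, `normalize3` and `udnBlendZ` are hypothetical names.

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// Normalize a vector (the caller is assumed to pass a non-zero vector).
inline Float3 normalize3(Float3 v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// UDN blend for the Z projection: the tangent-space XY offsets the world-space XY
// of the surface normal, and the world-space Z (sign included) is kept as-is.
inline Float3 udnBlendZ(Float3 tangentNormal, Float3 surfaceNormal) {
    return normalize3({ tangentNormal.x + surfaceNormal.x,
                        tangentNormal.y + surfaceNormal.y,
                        surfaceNormal.z });
}
```

The X and Y projections follow the same pattern with the components swizzled, and the three results are then weighted by the triplanar blend weights.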
**Simplified approach:** for a hand-painted style, normal maps are not
mandatory. The albedo carries most of the visual detail. Evaluate
visually before investing time.
## Asset structure
```
assets/
  voxel/
    grass_albedo.png # RGBA: RGB=color, A=heightmap
    dirt_albedo.png
    stone_albedo.png
    sand_albedo.png
    snow_albedo.png
    smoothstone_albedo.png
    # (optional) grass_normal.png, etc.
```
## Effort estimate
| Phase | Effort | Dependency |
|-------|--------|------------|
| A. Create textures (art) | 4-8h | None (parallelizable) |
| B. Texture array loader | 3h | A |
| C. Adapt shaders | 2h | B |
| D. Detiling | 2h | C |
| E. Quilting | 4h | C (optional) |
| F. Normal maps | 3h | C (optional) |
| **Minimum total** | **~11h** | A+B+C+D |
## Risks
- **Inconsistent style**: all textures must share the same hand-painted style.
Better to start with 2 materials (grass+stone) and validate the look before
doing all 6.
- **Mip bleeding**: in a texture array, mips can bleed between layers.
Solution: 4px padding around each texture, or use compressed formats
(BC7) with explicit mips.
- **Visible tiling**: detiling fixes this, but requires per-material tuning.
Textures must be tileable to begin with.
## References
- Roblox SIGGRAPH 2020, p.21-29 (quilting, detiling, height-based blend)
- DreamCat Games, Smooth Voxel Mapping : https://bonsairobo.medium.com/smooth-voxel-mapping-a-technical-deep-dive-on-real-time-surface-nets-and-texturing-ef06d0f8ca14
- Real-time Image Quilting (Hugh Malan, SIGGRAPH 2011)


@ -0,0 +1,140 @@
# Per-material vertex deformation
Inspired by the Roblox SIGGRAPH 2020 talk (p.19). Each material defines a procedural
deformation applied to the Surface Nets vertices after the centroid is computed.
This gives each material a distinct visual character at no extra GPU cost.
## Goal
Currently, all smooth materials produce the same uniform smooth blobs.
With per-material deformation:
- **Stone** would have sharper edges (cubify)
- **Sand** would have wavy surfaces (shift)
- **Snow** would stay smooth (no deformation)
- **Ice** would have crystalline facets (quantize)
## Deformation modes (Roblox)
| Mode | Effect | Formula | Target materials |
|------|--------|---------|------------------|
| `None` | No deformation | identity | snow, water |
| `Shift` | Pseudo-random offset | `pos += hash(pos) * amplitude` | sand, dirt |
| `Cubify` | Lerp toward cube center | `pos = lerp(pos, floor(pos) + 0.5, factor)` | stone, rock |
| `Quantize` | Round to a fixed step | `pos = round(pos * K) / K` | ice, crystal |
| `Barrel` | Cubify in Y only | `pos.y = lerp(pos.y, floor(pos.y) + 0.5, f)` | pillars, trunks |
## Integration into the existing code
### 1. Extend MaterialDesc (VoxelTypes.h)
```cpp
struct MaterialDesc {
    // ... existing fields ...
    uint8_t deformMode = 0; // 0=None, 1=Shift, 2=Cubify, 3=Quantize, 4=Barrel
    uint8_t deformStrength = 0; // 0-255 -> 0.0-1.0
    // replaces _pad or adds 2 bytes (struct stays 16-aligned)
};
```
No GPU change: the deformation is CPU-only, in the mesher.
### 2. Modify SmoothMesher (VoxelMesher.cpp)
The insertion point is after the centroid computation and before the write to the output
buffer. Currently in `meshSurfaceNets()`:
```
centroid = average(edge_crossings)
normal = average(triangle_normals)
-> INSERT DEFORMATION HERE
write SmoothVertex(centroid, normal, material)
```
Implementation:
```cpp
// After the centroid computation and before writing the vertex.
// lerp(a, b, t) is assumed to be std::lerp (C++20) or an equivalent local helper.
XMFLOAT3 deformVertex(XMFLOAT3 pos, const MaterialDesc& mat) {
    switch (mat.deformMode) {
    case 1: { // Shift
        float strength = mat.deformStrength / 255.0f;
        // Stable hash based on the integer cell position (no flicker during animation).
        // floorf (rather than an int cast) keeps the cell index consistent for negative coordinates.
        uint32_t h = hash3((int)floorf(pos.x), (int)floorf(pos.y), (int)floorf(pos.z));
        float rx = ((h & 0xFF) / 255.0f - 0.5f) * strength;
        float ry = (((h >> 8) & 0xFF) / 255.0f - 0.5f) * strength;
        float rz = (((h >> 16) & 0xFF) / 255.0f - 0.5f) * strength;
        return { pos.x + rx, pos.y + ry, pos.z + rz };
    }
    case 2: { // Cubify: pull the vertex toward the center of its cell
        float f = mat.deformStrength / 255.0f;
        float cx = floorf(pos.x) + 0.5f;
        float cy = floorf(pos.y) + 0.5f;
        float cz = floorf(pos.z) + 0.5f;
        return { lerp(pos.x, cx, f), lerp(pos.y, cy, f), lerp(pos.z, cz, f) };
    }
    case 3: { // Quantize
        float K = 2.0f + (mat.deformStrength / 255.0f) * 6.0f; // 2-8 steps
        return { roundf(pos.x * K) / K, roundf(pos.y * K) / K, roundf(pos.z * K) / K };
    }
    case 4: { // Barrel (cubify Y only)
        float f = mat.deformStrength / 255.0f;
        float cy = floorf(pos.y) + 0.5f;
        return { pos.x, lerp(pos.y, cy, f), pos.z };
    }
    default: return pos;
    }
}
```
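The `hash3` helper used by the Shift mode is not defined in this plan. The following is a minimal sketch of one possible implementation, assuming any deterministic integer hash with reasonably well mixed low bits is acceptable. The constants are an arbitrary choice for illustration and are not taken from the project.

```cpp
#include <cstdint>

// Hypothetical integer hash: mixes three lattice coordinates into 32 bits.
// Negative coordinates are handled through well-defined unsigned wraparound.
inline uint32_t hash3(int x, int y, int z) {
    uint32_t h = static_cast<uint32_t>(x) * 73856093u
               ^ static_cast<uint32_t>(y) * 19349663u
               ^ static_cast<uint32_t>(z) * 83492791u;
    // Final avalanche so the three low bytes read by Shift are well mixed.
    h ^= h >> 16; h *= 0x7feb352du;
    h ^= h >> 15; h *= 0x846ca68bu;
    h ^= h >> 16;
    return h;
}
```

Because the result depends only on the integer inputs, the same cell always yields the same offset, which is the stability property the Shift mode relies on.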
### 3. Recompute normals after deformation
The averaged normals must be recomputed AFTER the deformation, otherwise they no longer
match the deformed geometry. Two options:
**Option A (simple):** Recompute the face normals of the adjacent triangles after deformation.
This is what the normal pass in `meshSurfaceNets()` already does; it only needs to be
moved after the deformation.
**Option B (fast):** Keep the original normals. The deformation is subtle and
the normal error is visually acceptable. Recommended for the prototype.
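Option A can be sketched as a standalone pass over an indexed triangle list. This is a hedged C++ illustration, not the existing pass in `meshSurfaceNets()`: `Vec3`, `recomputeNormals` and the index layout are assumptions, and the `(0, 1, 0)` fallback for degenerate vertices is an arbitrary choice.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Area-weighted vertex normals: each triangle adds its unnormalized face normal
// to its three corners, then every vertex normal is normalized once.
inline std::vector<Vec3> recomputeNormals(
        const std::vector<Vec3>& positions,
        const std::vector<std::array<uint32_t, 3>>& triangles) {
    std::vector<Vec3> normals(positions.size(), Vec3{ 0.0f, 0.0f, 0.0f });
    for (const auto& t : triangles) {
        const Vec3 fn = cross(sub(positions[t[1]], positions[t[0]]),
                              sub(positions[t[2]], positions[t[0]]));
        for (uint32_t i : t) {
            normals[i].x += fn.x; normals[i].y += fn.y; normals[i].z += fn.z;
        }
    }
    for (Vec3& n : normals) {
        const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
        n = (len > 1e-4f) ? Vec3{ n.x / len, n.y / len, n.z / len }
                          : Vec3{ 0.0f, 1.0f, 0.0f };
    }
    return normals;
}
```

Calling this on the deformed positions gives normals that match the deformed geometry, at the cost of one extra pass over the triangles.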
### 4. Per-material soft/hard edges
Roblox also controls soft/hard edges per material. We can add:
```cpp
uint8_t edgeHardness = 0; // 0=smooth normals, 255=flat/geometric normals
```
In the PS, interpolate between the smooth normals and the geometric normals (already
available through the triplanar path). Zero cost on the mesher side, a small PS computation.
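The interpolation itself is a normalized lerp driven by `edgeHardness / 255`. The C++ sketch below mirrors what the pixel shader would compute. `Vec3` and `blendEdgeNormal` are hypothetical names, and falling back to the flat normal when the blend degenerates is an assumption.

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Blend from the smooth (interpolated) normal to the flat (geometric) normal.
// edgeHardness: 0 keeps the smooth normal, 255 returns the flat normal.
inline Vec3 blendEdgeNormal(Vec3 smoothN, Vec3 flatN, uint8_t edgeHardness) {
    const float t = static_cast<float>(edgeHardness) / 255.0f;
    const Vec3 n = { smoothN.x + (flatN.x - smoothN.x) * t,
                     smoothN.y + (flatN.y - smoothN.y) * t,
                     smoothN.z + (flatN.z - smoothN.z) * t };
    const float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    // Opposite normals can cancel out; fall back to the flat normal in that case.
    return (len > 1e-4f) ? Vec3{ n.x / len, n.y / len, n.z / len } : flatN;
}
```

Where the flat normal comes from is left to the caller in this sketch.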
### 5. Configure the existing materials
```cpp
// In VoxelWorld::initMaterials() or equivalent
materials[5].deformMode = 0; materials[5].deformStrength = 0; // snow: smooth
materials[3].deformMode = 2; materials[3].deformStrength = 180; // stone: strong cubify
materials[6].deformMode = 2; materials[6].deformStrength = 100; // smoothstone: light cubify
materials[4].deformMode = 1; materials[4].deformStrength = 60; // sand: subtle shift
materials[2].deformMode = 1; materials[2].deformStrength = 30; // dirt: very light shift
```
## Risks and precautions
- **Self-intersection**: too strong a deformation can create inverted triangles.
Cap `deformStrength` at ~200 and check visually.
- **Chunk seams**: the deformation must be identical on both sides of a chunk
boundary. A hash based on world position (not local) guarantees consistency.
- **Animation**: in animation mode (terrain regenerated at 30Hz), the deformation must be
stable. Use the integer position (not the centroid) as the hash seed.
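The chunk-seam point can be illustrated with a small C++ sketch: a border cell reached from two neighboring chunks maps to the same world cell, so it gets the same seed. `Int3`, `toWorldCell` and `cellSeed` are hypothetical names, the hash is a placeholder, and the chunk size of 32 is assumed from the 32³ chunks mentioned elsewhere in these plans.

```cpp
#include <cstdint>

struct Int3 { int x, y, z; };

// Assumed chunk edge length in cells.
constexpr int kChunkSize = 32;

// Convert a chunk coordinate plus a local cell index into a world cell index.
inline Int3 toWorldCell(Int3 chunkCoord, Int3 localCell) {
    return { chunkCoord.x * kChunkSize + localCell.x,
             chunkCoord.y * kChunkSize + localCell.y,
             chunkCoord.z * kChunkSize + localCell.z };
}

// Placeholder hash: any deterministic integer hash of the world cell works here.
inline uint32_t cellSeed(Int3 c) {
    uint32_t h = static_cast<uint32_t>(c.x) * 73856093u
               ^ static_cast<uint32_t>(c.y) * 19349663u
               ^ static_cast<uint32_t>(c.z) * 83492791u;
    h ^= h >> 16; h *= 0x7feb352du; h ^= h >> 15;
    return h;
}
```

Seeding from the local cell index instead would give the two chunks different offsets for the same border vertex and open a visible crack.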
## Effort estimate
- Extend MaterialDesc: 15 min
- deformVertex function: 30 min
- Integration into meshSurfaceNets: 30 min
- Per-material parameter tuning: 1h
- **Total: ~2h**
No shader change, no GPU buffer change, no performance impact.

docs/plan.md

@ -0,0 +1,94 @@
# BVLE Voxels - Work plan
Remaining features and ideas for future work, organized by topic.
Each topic has a detailed implementation document in `docs/`.
The current state of the prototype is documented in `CLAUDE.md` at the repository root.
---
## Remaining topics from the original specification
### 1. GPU Compute Surface Nets (Phase 5.3) ✅
The smooth mesher runs in GPU compute (2-pass: centroid CS + mesh CS).
Automatic GPU/CPU switching. Shaders: `voxelSmoothCentroidCS.hlsl`, `voxelSmoothCS.hlsl`.
**Status:** Done.
### 2. Multi-resolution LOD (Phase 5.4)
LOD 1 implemented: 32³ chunks covering 64³ of world space, lodScale in GPUChunkInfo,
VS multiplies localPos by lodScale. LOD 0 radius=6, LOD 1 ring radius=12 (480 chunks).
No smooth/toppings on LOD 1.
Remaining: LOD 2+, skirts to hide seams, fog at the edges, dynamic LOD.
**Status:** In progress. See `docs/plan-lod-skirts.md`.
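The lodScale step can be written out as a small C++ sketch of what the vertex shader is described as doing. `Float3`, `lodWorldPos` and `lodExtent` are hypothetical names, and the doubling of the scale per LOD level is an assumption extrapolated from the single LOD 1 data point (32³ cells covering 64³).

```cpp
struct Float3 { float x, y, z; };

// Local chunk coordinates are stretched by lodScale, then offset by the chunk origin.
inline Float3 lodWorldPos(Float3 chunkOrigin, Float3 localPos, float lodScale) {
    return { chunkOrigin.x + localPos.x * lodScale,
             chunkOrigin.y + localPos.y * lodScale,
             chunkOrigin.z + localPos.z * lodScale };
}

// World-space edge length covered by a chunk of `chunkSize` cells at a given LOD,
// assuming the scale doubles per level (1 at LOD 0, 2 at LOD 1).
constexpr int lodExtent(int chunkSize, int lodLevel) { return chunkSize << lodLevel; }
```

With `chunkSize = 32`, `lodExtent(32, 1)` gives the 64 units stated for LOD 1.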
### 3. Shadow Maps + SSAO fallback (Phase 6.4)
Ray tracing is mandatory for shadows/AO. GPUs without RT (or low-end
configurations) get no directional lighting. The fallback should use the existing
Wicked Engine pipeline.
**Status:** Not started. Low priority (all target GPUs support RT).
### 4. Connected Blocks / Pipes (spec idea)
Blocks containing custom 3D models with dynamic joins based on identical
neighbors. Example: pipes that connect automatically. An extension of the
toppings system with a 6-face bitmask instead of 4-adjacency.
**Status:** Concept only.
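A 6-face bitmask of this kind could be computed as in the C++ sketch below. The bit order, the `blockAt` callback and the convention that block ID 0 means empty are all assumptions made for illustration. This is a concept sketch, not existing code.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical bit order: +X, -X, +Y, -Y, +Z, -Z.
constexpr int kFaceOffsets[6][3] = {
    { 1, 0, 0 }, { -1, 0, 0 }, { 0, 1, 0 }, { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 }
};

// Returns a 6-bit mask with bit i set when the neighbor across face i
// holds the same block type as the cell at (x, y, z).
inline uint8_t connectionMask(int x, int y, int z,
                              const std::function<uint8_t(int, int, int)>& blockAt) {
    const uint8_t self = blockAt(x, y, z);
    if (self == 0) return 0; // empty cell (assumed ID 0) has no connections
    uint8_t mask = 0;
    for (int i = 0; i < 6; ++i) {
        const int* o = kFaceOffsets[i];
        if (blockAt(x + o[0], y + o[1], z + o[2]) == self)
            mask |= static_cast<uint8_t>(1u << i);
    }
    return mask;
}
```

The mask takes 64 possible values, which can index a table of joint models (straight pipe, elbow, tee and so on).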
---
## New topics (inspired by the Roblox SIGGRAPH 2020 talk)
### 5. Per-material vertex deformation
Roblox defines per-material procedural deformations on the Surface Nets vertices:
shift (random offset), cubify (lerp toward cube center), quantize (round to 1/K),
barrel (cubify in Y), soft/hard edges. Gives each material visual character
at no GPU cost.
**Status:** Not started. See `docs/plan-vertex-deformation.md`.
### 6. LOD with skirts
Roblox uses a mip pyramid per chunk + an octree LOD. Seams between LOD levels
are handled with "skirts" (overhanging triangles + depth bias) instead of
Transvoxel stitching, which is complex. An elegant and simple solution.
**Status:** Not started. See `docs/plan-lod-skirts.md`.
### 7. Real stylized textures
Move from the current procedural colors to real textures (albedo + heightmap +
normal) in a texture array. Improved triplanar mapping with detiling (Roblox-style
per-vertex rotation/shift). Height-based blending is already in place on the shader side.
**Status:** Infrastructure present (5-layer texture array, triplanar, height blend),
but textures are generated procedurally. See `docs/plan-stylized-textures.md`.
### 8. Texture quilting (Roblox)
Alternative to triplanar: 1 projection plane out of 18 per vertex, encoded in the
vertex data. Reduces fetches from 9-27 to 3. To be folded into the textures topic
if triplanar becomes a bottleneck.
**Status:** Not started. Covered in `docs/plan-stylized-textures.md`.
---
## Suggested priorities
| Priority | Topic | Impact | Effort |
|----------|-------|--------|--------|
| 1 | Real stylized textures | Major visual | Medium |
| 2 | Per-material vertex deformation | Strong visual, zero cost | Low |
| 3 | LOD with skirts | Scalability | Medium-high |
| 4 | GPU Surface Nets | Smooth performance | Medium |
| 5 | SM+SSAO fallback | Compatibility | Low |
| 6 | Connected blocks | Gameplay | High |


@ -55,12 +55,13 @@ float3 computeSky(float2 uv) {
sky = lerp(horizonColor, nadirColor, h);
}
// Sun glow near sun direction (soft halo)
// Sun glow near sun direction (compact disc + subtle haze)
float3 L = normalize(-sunDirection.xyz);
float sunDot = saturate(dot(viewDir, L));
float sunGlow = pow(sunDot, 32.0) * 0.4;
float sunHaze = pow(sunDot, 4.0) * 0.15;
sky += float3(1.0, 0.85, 0.5) * (sunGlow + sunHaze);
float sunDisc = pow(sunDot, 256.0) * 0.6; // tight bright disc
float sunGlow = pow(sunDot, 64.0) * 0.2; // narrow glow ring
float sunHaze = pow(sunDot, 8.0) * 0.08; // subtle atmospheric haze
sky += float3(1.0, 0.85, 0.5) * (sunDisc + sunGlow + sunHaze);
return sky;
}


@ -5,6 +5,7 @@
#include "voxelCommon.hlsli"
Texture2DArray materialTextures : register(t1);
Texture2DArray normalTextures : register(t7);
SamplerState materialSampler : register(s0);
// Voxel data buffer (same as compute mesher uses) — bound at t3 in GPU mesh path
@ -105,7 +106,7 @@ float3 triplanarWeights(float3 normal, float sharpness) {
// Triplanar sampling — RGB only (non-blended path)
float3 sampleTriplanar(float3 worldPos, float3 normal, uint texIndex, float tiling) {
float3 w = triplanarWeights(normal, 4.0);
float3 colX = materialTextures.Sample(materialSampler, float3(worldPos.yz * tiling, (float)texIndex)).rgb;
float3 colX = materialTextures.Sample(materialSampler, float3(worldPos.zy * tiling, (float)texIndex)).rgb;
float3 colY = materialTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex)).rgb;
float3 colZ = materialTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex)).rgb;
return colX * w.x + colY * w.y + colZ * w.z;
@ -114,12 +115,46 @@ float3 sampleTriplanar(float3 worldPos, float3 normal, uint texIndex, float tili
// Triplanar sampling — RGBA (includes heightmap in alpha)
float4 sampleTriplanarRGBA(float3 worldPos, float3 normal, uint texIndex, float tiling) {
float3 w = triplanarWeights(normal, 4.0);
float4 colX = materialTextures.Sample(materialSampler, float3(worldPos.yz * tiling, (float)texIndex));
float4 colX = materialTextures.Sample(materialSampler, float3(worldPos.zy * tiling, (float)texIndex));
float4 colY = materialTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex));
float4 colZ = materialTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex));
return colX * w.x + colY * w.y + colZ * w.z;
}
// ── Triplanar normal mapping ───────────────────────────────────────
// UDN triplanar blend (the name comes from the Unreal Developer Network).
// For each projection axis, the tangent-space normal's XY perturbs the
// two world-space axes orthogonal to the projection direction.
float3 sampleTriplanarNormal(float3 worldPos, float3 normal, uint texIndex, float tiling) {
float3 w = triplanarWeights(normal, 4.0);
float3 axisSign = sign(normal);
// Sample tangent-space normals per projection axis (Ben Golus UDN triplanar)
float3 tnX = normalTextures.Sample(materialSampler, float3(worldPos.zy * tiling, (float)texIndex)).rgb * 2.0 - 1.0;
float3 tnY = normalTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex)).rgb * 2.0 - 1.0;
float3 tnZ = normalTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex)).rgb * 2.0 - 1.0;
// OpenGL normal maps: flip green channel ONLY for Y-projection (horizontal faces).
// X/Z projections have texture V = world Y (up), which already matches GL convention.
// Y-projection has texture V = world Z, where GL/DX conventions differ.
tnY.y = -tnY.y;
// Sign correction for back-facing projections (Golus reference)
// Flips the tangent-space X to account for mirrored UVs on negative faces.
tnX.x *= axisSign.x;
tnY.x *= axisSign.y;
tnZ.x *= axisSign.z;
// UDN blend using RAW normal (NOT abs!) so that negative faces (-X,-Y,-Z)
// produce normals pointing in the correct direction. abs() would force
// all dominant components positive, inverting lighting on 3 of 6 faces.
float3 worldNX = float3(tnX.xy + normal.zy, normal.x).zyx;
float3 worldNY = float3(tnY.xy + normal.xz, normal.y).xzy;
float3 worldNZ = float3(tnZ.xy + normal.xy, normal.z);
return normalize(worldNX * w.x + worldNY * w.y + worldNZ * w.z);
}
// ── Debug face colors ──────────────────────────────────────────────
static const float3 faceDebugColors[6] = {
float3(1.0, 0.2, 0.2), // 0: +X = RED
@ -158,8 +193,6 @@ PSOutput main(PSInput input)
// ── NORMAL MODE: triplanar textured with height-based blending ──
float3 N = normalize(input.normal);
float3 L = normalize(-sunDirection.xyz);
float NdotL = max(dot(N, L), 0.0);
uint texIndex = clamp(input.materialID - 1u, 0u, 5u);
float tiling = textureTiling;
@ -198,8 +231,8 @@ PSOutput main(PSInput input)
uint uNeighborMat = getNeighborMat(voxelCoord, uEdgeDir, normalDir, input.chunkIndex);
uint vNeighborMat = getNeighborMat(voxelCoord, vEdgeDir, normalDir, input.chunkIndex);
// Blend zone: 0.25 voxels from each edge (covers 50% of face total)
float blendZone = 0.25;
// Blend zone: 0.40 voxels from each edge (covers 80% of face total)
float blendZone = 0.40;
// Edge distances normalized to 0..1 (0=center, 1=edge) for corner attenuation
float uEdge = abs(faceFracU - 0.5) * 2.0; // 0 at center, 1 at edge
@ -213,12 +246,14 @@ PSOutput main(PSInput input)
float uWeight = saturate((uAdj - blendStart) / (1.0 - blendStart)) * 0.5;
float vWeight = saturate((vAdj - blendStart) / (1.0 - blendStart)) * 0.5;
// Only blend if neighbor has a different material AND blend flags allow it:
// - Current material must NOT resist bleed (resistBleedMask)
// - Neighbor material must be allowed to bleed (bleedMask)
// Blend flags:
// - mainResists: current material resists being bled onto → no blending from this side
// - neighResists: neighbor resists bleed → asymmetric blend (neighbor dominates at edge)
bool mainResists = (resistBleedMask >> input.materialID) & 1u;
bool uNeighCanBleed = (bleedMask >> uNeighborMat) & 1u;
bool vNeighCanBleed = (bleedMask >> vNeighborMat) & 1u;
bool uNeighResists = (resistBleedMask >> uNeighborMat) & 1u;
bool vNeighResists = (resistBleedMask >> vNeighborMat) & 1u;
bool uBlend = (uNeighborMat > 0u && uNeighborMat != input.materialID && uWeight > 0.001
&& !mainResists && uNeighCanBleed);
bool vBlend = (vNeighborMat > 0u && vNeighborMat != input.materialID && vWeight > 0.001
@ -258,9 +293,16 @@ PSOutput main(PSInput input)
uint uTexIdx = clamp(uNeighborMat - 1u, 0u, 5u);
float4 uTex = sampleTriplanarRGBA(input.worldPos, N, uTexIdx, tiling);
// Symmetric proximity bias: at edge (weight=0.5) bias=0 → pure heightmap.
// Away from edge (weight=0) bias=0.5 → main always wins.
float bias = 0.5 - uWeight;
// Proximity bias controls heightmap blending:
// Symmetric: at edge (w=0.5) bias=0 → pure heightmap; center (w=0) bias=0.5 → main wins
// Asymmetric (neighbor resists bleed): at edge bias=0.5-0.5*1.6=-0.3 → neighbor gets +0.6
// score advantage (dominates at equal heights); center bias=0.5 → main wins
float bias;
if (uNeighResists) {
bias = 0.5 - uWeight * 1.6;
} else {
bias = 0.5 - uWeight;
}
float mainScore = mainTex.a + bias;
float neighScore = uTex.a - bias;
@ -272,7 +314,12 @@ PSOutput main(PSInput input)
uint vTexIdx = clamp(vNeighborMat - 1u, 0u, 5u);
float4 vTex = sampleTriplanarRGBA(input.worldPos, N, vTexIdx, tiling);
float bias = 0.5 - vWeight;
float bias;
if (vNeighResists) {
bias = 0.5 - vWeight * 1.6;
} else {
bias = 0.5 - vWeight;
}
float mainScore = mainTex.a + bias;
float neighScore = vTex.a - bias;
@ -292,27 +339,54 @@ PSOutput main(PSInput input)
albedo = (input.materialID > 0u) ? texColor : baseColor;
}
// ── Normal map perturbation ──
float3 flatN = N; // preserve flat face normal for ambient + side-darkening
float nmStrength = toneMapParams.z; // 0 = off (F9 toggle)
if (nmStrength > 0.0) {
float3 perturbedN = sampleTriplanarNormal(input.worldPos, N, texIndex, tiling);
N = normalize(lerp(N, perturbedN, nmStrength));
}
// ── Lighting ──
float hemiLerp = N.y * 0.5 + 0.5; // 0=down, 1=up
// Use FLAT normal for hemisphere ambient + side-darkening (consistent per face)
// Use PERTURBED normal for NdotL only (organic detail variation)
float3 L = normalize(-sunDirection.xyz);
float NdotL = max(dot(N, L), 0.0);
float hemiLerp = flatN.y * 0.5 + 0.5; // flat: consistent per face orientation
float3 ambient = lerp(groundAmbient.rgb, skyAmbient.rgb, hemiLerp);
float3 diffuse = sunColor.rgb * NdotL;
// Grass-specific shading (Wonderbox style)
bool isGrass = (texIndex == 0); // material 1 = grass = texture layer 0
if (isGrass) {
// Vertical face darkening: grass sides are darker green (not black)
float verticalDarken = saturate(abs(N.y)); // 1=top, 0=side
float sideFactor = lerp(0.60, 1.0, verticalDarken); // sides at 60% brightness
albedo *= sideFactor;
// Subtle warm shift: sunlit grass slightly warmer
if (NdotL > 0.0) {
float3 warmShift = float3(0.08, 0.05, -0.03) * NdotL;
diffuse += warmShift;
}
// Boost ambient for grass: inter-reflection from dense foliage
ambient *= 1.15;
// ── Debug lighting modes (F9 cycle) ──
uint dbgLight = (uint)toneMapParams.w;
if (dbgLight == 2) {
// FLAT: uniform color per face, no texture, no blend, no normal map
// Pure lighting with flat face normal. If two +X faces differ here, it's a VS/mesher bug.
float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
float flatHemi = flatN.y * 0.5 + 0.5;
float3 flatAmb = lerp(groundAmbient.rgb, skyAmbient.rgb, flatHemi);
float3 flatColor = float3(0.5, 0.5, 0.5) * (flatAmb + sunColor.rgb * flatNdotL);
output.color = float4(flatColor, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
if (dbgLight == 3) {
// ALBEDO only: texture + blend, no lighting
output.color = float4(albedo, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
if (dbgLight == 4) {
// NdotL only: grayscale NdotL with flat normal (no normal map)
float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
output.color = float4(flatNdotL, flatNdotL, flatNdotL, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
if (dbgLight == 5) {
// NORMAL viz: geometric normal mapped to RGB (XYZ → [0,1])
output.color = float4(flatN * 0.5 + 0.5, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
float3 color = albedo * (ambient + diffuse);


@ -80,11 +80,25 @@ float3 computeQuadFaceNormal(int3 c0, int3 c1, int3 c2, int3 c3,
return fn; // area-weighted (not normalized)
}
// ── Smooth normal for a vertex at cell v ────────────────────────────
// ── Smooth normal + consistency for a vertex at cell v ──────────────
// Checks all 12 incident edges (4 per axis), computes face normals from
// centroid grid, averages them. All reads from grid only.
float3 computeSmoothNormal(int3 v) {
// centroid grid, averages them. Also returns a consistency metric:
// consistency = |sum(fn)| / sum(|fn|)
// = 1.0 when all face normals agree (flat surface)
// ≈ 0.707 at a 90° edge (two perpendicular faces)
// → 0 when faces cancel out
// Used at emission time to blend between smooth normal (interior) and
// face normal (edge vertices).
float3 computeSmoothNormal(int3 v, out float consistency) {
float3 accum = float3(0, 0, 0);
float totalMag = 0;
// Helper macro: accumulate one quad's face normal + its magnitude
#define ACCUM_QUAD(c0,c1,c2,c3,solid,axis) { \
float3 fn_ = computeQuadFaceNormal(c0,c1,c2,c3,solid,axis); \
accum += fn_; \
totalMag += length(fn_); \
}
// X-edges: at (v.x, v.y+dy, v.z+dz) for dy,dz in {0,1}
{
@ -97,30 +111,14 @@ float3 computeSmoothNormal(int3 v) {
bool sv_11 = isCellSolid(int3(v.x, v.y+1, v.z+1));
bool sv_11_x1 = isCellSolid(int3(v.x+1, v.y+1, v.z+1));
// Edge (v.x, v.y, v.z)
if (sv != sv_x1) {
accum += computeQuadFaceNormal(
v + int3(0,-1,-1), v + int3(0,0,-1),
v + int3(0,-1,0), v, sv, 0);
}
// Edge (v.x, v.y+1, v.z)
if (sv_01 != sv_01_x1) {
accum += computeQuadFaceNormal(
int3(v.x, v.y, v.z-1), int3(v.x, v.y+1, v.z-1),
v, int3(v.x, v.y+1, v.z), sv_01, 0);
}
// Edge (v.x, v.y, v.z+1)
if (sv_10 != sv_10_x1) {
accum += computeQuadFaceNormal(
int3(v.x, v.y-1, v.z), v,
int3(v.x, v.y-1, v.z+1), int3(v.x, v.y, v.z+1), sv_10, 0);
}
// Edge (v.x, v.y+1, v.z+1)
if (sv_11 != sv_11_x1) {
accum += computeQuadFaceNormal(
v, int3(v.x, v.y+1, v.z),
int3(v.x, v.y, v.z+1), int3(v.x, v.y+1, v.z+1), sv_11, 0);
}
if (sv != sv_x1)
ACCUM_QUAD(v+int3(0,-1,-1), v+int3(0,0,-1), v+int3(0,-1,0), v, sv, 0)
if (sv_01 != sv_01_x1)
ACCUM_QUAD(int3(v.x,v.y,v.z-1), int3(v.x,v.y+1,v.z-1), v, int3(v.x,v.y+1,v.z), sv_01, 0)
if (sv_10 != sv_10_x1)
ACCUM_QUAD(int3(v.x,v.y-1,v.z), v, int3(v.x,v.y-1,v.z+1), int3(v.x,v.y,v.z+1), sv_10, 0)
if (sv_11 != sv_11_x1)
ACCUM_QUAD(v, int3(v.x,v.y+1,v.z), int3(v.x,v.y,v.z+1), int3(v.x,v.y+1,v.z+1), sv_11, 0)
}
// Y-edges: at (v.x+dx, v.y, v.z+dz) for dx,dz in {0,1}
@ -134,26 +132,14 @@ float3 computeSmoothNormal(int3 v) {
bool sv_11 = isCellSolid(int3(v.x+1, v.y, v.z+1));
bool sv_11_y1 = isCellSolid(int3(v.x+1, v.y+1, v.z+1));
if (sv != sv_y1) {
accum += computeQuadFaceNormal(
v + int3(-1,0,-1), v + int3(0,0,-1),
v + int3(-1,0,0), v, sv, 1);
}
if (sv_10 != sv_10_y1) {
accum += computeQuadFaceNormal(
int3(v.x, v.y, v.z-1), int3(v.x+1, v.y, v.z-1),
v, int3(v.x+1, v.y, v.z), sv_10, 1);
}
if (sv_01 != sv_01_y1) {
accum += computeQuadFaceNormal(
int3(v.x-1, v.y, v.z), v,
int3(v.x-1, v.y, v.z+1), int3(v.x, v.y, v.z+1), sv_01, 1);
}
if (sv_11 != sv_11_y1) {
accum += computeQuadFaceNormal(
v, int3(v.x+1, v.y, v.z),
int3(v.x, v.y, v.z+1), int3(v.x+1, v.y, v.z+1), sv_11, 1);
}
if (sv != sv_y1)
ACCUM_QUAD(v+int3(-1,0,-1), v+int3(0,0,-1), v+int3(-1,0,0), v, sv, 1)
if (sv_10 != sv_10_y1)
ACCUM_QUAD(int3(v.x,v.y,v.z-1), int3(v.x+1,v.y,v.z-1), v, int3(v.x+1,v.y,v.z), sv_10, 1)
if (sv_01 != sv_01_y1)
ACCUM_QUAD(int3(v.x-1,v.y,v.z), v, int3(v.x-1,v.y,v.z+1), int3(v.x,v.y,v.z+1), sv_01, 1)
if (sv_11 != sv_11_y1)
ACCUM_QUAD(v, int3(v.x+1,v.y,v.z), int3(v.x,v.y,v.z+1), int3(v.x+1,v.y,v.z+1), sv_11, 1)
}
// Z-edges: at (v.x+dx, v.y+dy, v.z) for dx,dy in {0,1}
@ -167,30 +153,21 @@ float3 computeSmoothNormal(int3 v) {
bool sv_11 = isCellSolid(int3(v.x+1, v.y+1, v.z));
bool sv_11_z1 = isCellSolid(int3(v.x+1, v.y+1, v.z+1));
if (sv != sv_z1) {
accum += computeQuadFaceNormal(
v + int3(-1,-1,0), v + int3(0,-1,0),
v + int3(-1,0,0), v, sv, 2);
}
if (sv_10 != sv_10_z1) {
accum += computeQuadFaceNormal(
int3(v.x, v.y-1, v.z), int3(v.x+1, v.y-1, v.z),
v, int3(v.x+1, v.y, v.z), sv_10, 2);
}
if (sv_01 != sv_01_z1) {
accum += computeQuadFaceNormal(
int3(v.x-1, v.y, v.z), v,
int3(v.x-1, v.y+1, v.z), int3(v.x, v.y+1, v.z), sv_01, 2);
}
if (sv_11 != sv_11_z1) {
accum += computeQuadFaceNormal(
v, int3(v.x+1, v.y, v.z),
int3(v.x, v.y+1, v.z), int3(v.x+1, v.y+1, v.z), sv_11, 2);
}
if (sv != sv_z1)
ACCUM_QUAD(v+int3(-1,-1,0), v+int3(0,-1,0), v+int3(-1,0,0), v, sv, 2)
if (sv_10 != sv_10_z1)
ACCUM_QUAD(int3(v.x,v.y-1,v.z), int3(v.x+1,v.y-1,v.z), v, int3(v.x+1,v.y,v.z), sv_10, 2)
if (sv_01 != sv_01_z1)
ACCUM_QUAD(int3(v.x-1,v.y,v.z), v, int3(v.x-1,v.y+1,v.z), int3(v.x,v.y+1,v.z), sv_01, 2)
if (sv_11 != sv_11_z1)
ACCUM_QUAD(v, int3(v.x+1,v.y,v.z), int3(v.x,v.y+1,v.z), int3(v.x+1,v.y+1,v.z), sv_11, 2)
}
#undef ACCUM_QUAD
float len = length(accum);
return (len > 0.0001) ? accum / len : float3(0, 1, 0);
float accumLen = length(accum);
// consistency: 1.0 = all faces agree, <1.0 = diverging face directions
consistency = (totalMag > 0.0001) ? accumLen / totalMag : 1.0;
return (accumLen > 0.0001) ? accum / accumLen : float3(0, 1, 0);
}
// ── Emit helpers ────────────────────────────────────────────────────
@ -249,16 +226,30 @@ void main(uint3 DTid : SV_DispatchThreadID)
if (isCentroidValid(cells[0]) && isCentroidValid(cells[1]) &&
isCentroidValid(cells[2]) && isCentroidValid(cells[3])) {
float3 p[4], n[4];
float con[4];
[loop] for (uint i = 0; i < 4; i++)
p[i] = chunkWorldPos + readCentroidPos(cells[i]);
[loop] for (uint i = 0; i < 4; i++)
n[i] = computeSmoothNormal(cells[i]);
n[i] = computeSmoothNormal(cells[i], con[i]);
float3 fn = cross(p[1] - p[0], p[3] - p[0]);
int s = cellSolid ? +1 : -1;
if ((fn.x > 0.0) != (s > 0)) fn = -fn;
bool windingA = !cellSolid;
// Consistency-based blend: sharp edge vertices → face normal, curved → smooth
// consistency ≈ 1.0 = flat, ≈ 0.707 = 90° edge, < 0.5 = sharp corner
// smoothstep(0.70, 0.90): snaps to face normal at 90° boundaries (con<0.70)
// for seamless join with blocky, preserves smooth for terrain curves (con>0.90)
float fnLen = length(fn);
if (fnLen > 0.0001) {
float3 fnN = fn / fnLen;
[loop] for (uint i = 0; i < 4; i++) {
float t = smoothstep(0.70, 0.90, con[i]);
n[i] = normalize(lerp(fnN, n[i], t));
}
}
uint packed = readGridPacked(cells[3]);
uint mat = packed & 0xFF;
uint secMat = (packed >> 8) & 0xFF;
@ -281,10 +272,11 @@ void main(uint3 DTid : SV_DispatchThreadID)
if (isCentroidValid(cells[0]) && isCentroidValid(cells[1]) &&
isCentroidValid(cells[2]) && isCentroidValid(cells[3])) {
float3 p[4], n[4];
float con[4];
[loop] for (uint i = 0; i < 4; i++)
p[i] = chunkWorldPos + readCentroidPos(cells[i]);
[loop] for (uint i = 0; i < 4; i++)
n[i] = computeSmoothNormal(cells[i]);
n[i] = computeSmoothNormal(cells[i], con[i]);
float3 fn = cross(p[1] - p[0], p[3] - p[0]);
int s = cellSolid ? +1 : -1;
@ -292,6 +284,16 @@ void main(uint3 DTid : SV_DispatchThreadID)
bool windingA = !cellSolid;
windingA = !windingA; // Y-axis winding flip
// Consistency-based blend (same formula as X-edge)
float fnLen = length(fn);
if (fnLen > 0.0001) {
float3 fnN = fn / fnLen;
[loop] for (uint i = 0; i < 4; i++) {
float t = smoothstep(0.70, 0.90, con[i]);
n[i] = normalize(lerp(fnN, n[i], t));
}
}
uint packed = readGridPacked(cells[3]);
uint mat = packed & 0xFF;
uint secMat = (packed >> 8) & 0xFF;
@ -314,16 +316,27 @@ void main(uint3 DTid : SV_DispatchThreadID)
if (isCentroidValid(cells[0]) && isCentroidValid(cells[1]) &&
isCentroidValid(cells[2]) && isCentroidValid(cells[3])) {
float3 p[4], n[4];
float con[4];
[loop] for (uint i = 0; i < 4; i++)
p[i] = chunkWorldPos + readCentroidPos(cells[i]);
[loop] for (uint i = 0; i < 4; i++)
n[i] = computeSmoothNormal(cells[i]);
n[i] = computeSmoothNormal(cells[i], con[i]);
float3 fn = cross(p[1] - p[0], p[3] - p[0]);
int s = cellSolid ? +1 : -1;
if ((fn.z > 0.0) != (s > 0)) fn = -fn;
bool windingA = !cellSolid;
// Consistency-based blend (same formula as X-edge)
float fnLen = length(fn);
if (fnLen > 0.0001) {
float3 fnN = fn / fnLen;
[loop] for (uint i = 0; i < 4; i++) {
float t = smoothstep(0.70, 0.90, con[i]);
n[i] = normalize(lerp(fnN, n[i], t));
}
}
uint packed = readGridPacked(cells[3]);
uint mat = packed & 0xFF;
uint secMat = (packed >> 8) & 0xFF;


@ -6,6 +6,7 @@
#include "voxelCommon.hlsli"
Texture2DArray<float4> materialTextures : register(t1);
Texture2DArray<float4> normalTextures : register(t7);
StructuredBuffer<GPUChunkInfo> chunkInfoBuffer : register(t2);
StructuredBuffer<uint> voxelData : register(t3);
SamplerState texSampler : register(s0);
@ -76,7 +77,7 @@ float3 triplanarWeights(float3 n, float sharpness) {
float3 sampleTriplanar(float3 wp, float3 n, uint texIdx, float tiling) {
float3 w = triplanarWeights(n, 4.0);
float3 cx = materialTextures.Sample(texSampler, float3(wp.yz * tiling, (float)texIdx)).rgb;
float3 cx = materialTextures.Sample(texSampler, float3(wp.zy * tiling, (float)texIdx)).rgb;
float3 cy = materialTextures.Sample(texSampler, float3(wp.xz * tiling, (float)texIdx)).rgb;
float3 cz = materialTextures.Sample(texSampler, float3(wp.xy * tiling, (float)texIdx)).rgb;
return cx * w.x + cy * w.y + cz * w.z;
@ -84,12 +85,33 @@ float3 sampleTriplanar(float3 wp, float3 n, uint texIdx, float tiling) {
float4 sampleTriplanarRGBA(float3 wp, float3 n, uint texIdx, float tiling) {
float3 w = triplanarWeights(n, 4.0);
float4 cx = materialTextures.Sample(texSampler, float3(wp.yz * tiling, (float)texIdx));
float4 cx = materialTextures.Sample(texSampler, float3(wp.zy * tiling, (float)texIdx));
float4 cy = materialTextures.Sample(texSampler, float3(wp.xz * tiling, (float)texIdx));
float4 cz = materialTextures.Sample(texSampler, float3(wp.xy * tiling, (float)texIdx));
return cx * w.x + cy * w.y + cz * w.z;
}
// ── Triplanar normal mapping (UDN blend) ────────────────────────
float3 sampleTriplanarNormal(float3 wp, float3 n, uint texIdx, float tiling) {
float3 w = triplanarWeights(n, 4.0);
float3 axisSign = sign(n);
// Ben Golus UDN reference — swizzled coordinates + sign corrections
float3 tnX = normalTextures.Sample(texSampler, float3(wp.zy * tiling, (float)texIdx)).rgb * 2.0 - 1.0;
float3 tnY = normalTextures.Sample(texSampler, float3(wp.xz * tiling, (float)texIdx)).rgb * 2.0 - 1.0;
float3 tnZ = normalTextures.Sample(texSampler, float3(wp.xy * tiling, (float)texIdx)).rgb * 2.0 - 1.0;
// OpenGL normal maps: flip green channel ONLY for Y-projection
tnY.y = -tnY.y;
// Sign correction for back-facing projections
tnX.x *= axisSign.x;
tnY.x *= axisSign.y;
tnZ.x *= axisSign.z;
// UDN blend using RAW normal (NOT abs!) — preserves sign for negative faces
float3 worldNX = float3(tnX.xy + n.zy, n.x).zyx;
float3 worldNY = float3(tnY.xy + n.xz, n.y).xzy;
float3 worldNZ = float3(tnZ.xy + n.xy, n.z);
return normalize(worldNX * w.x + worldNY * w.y + worldNZ * w.z);
}
// ── MRT Output ──────────────────────────────────────────────────
struct PSOutput {
float4 color : SV_TARGET0;
@@ -102,14 +124,11 @@ PSOutput main(PSInput input) {
PSOutput output;
float3 N = normalize(input.normal); // smooth normal (for lighting)
// Geometric normal from screen-space derivatives of worldPos.
// This is the true triangle face normal — use it for triplanar weights
// to avoid texture stretching caused by smooth normal interpolation.
float3 dpx = ddx(input.worldPos);
float3 dpy = ddy(input.worldPos);
float3 geoN = normalize(cross(dpx, dpy));
// Ensure geometric normal faces same hemisphere as smooth normal
if (dot(geoN, N) < 0.0) geoN = -geoN;
// NOTE: geoN (ddx/ddy geometric normal) is NOT used for triplanar sampling
// or normal mapping on smooth surfaces. It changes abruptly at triangle edges,
// causing per-triangle faceting in texture weights, normal perturbation, and
// therefore lighting (NdotL). All triplanar operations use N (smooth interpolated
// normal) which varies continuously across vertices → seamless result.
float tiling = textureTiling;
@@ -160,7 +179,7 @@ PSOutput main(PSInput input) {
uint vNeighborMat = getNeighborMat(voxelCoord, vEdgeDir, normalDir, input.chunkIndex);
// ── Blend weights (SAME params as blocky PS) ──
float blendZone = 0.25;
float blendZone = 0.40;
float uEdge = abs(faceFracU - 0.5) * 2.0;
float vEdge = abs(faceFracV - 0.5) * 2.0;
@@ -175,6 +194,8 @@ PSOutput main(PSInput input) {
bool mainResists = (resistBleedMask >> selfMat) & 1u;
bool uNeighCanBleed = (bleedMask >> uNeighborMat) & 1u;
bool vNeighCanBleed = (bleedMask >> vNeighborMat) & 1u;
bool uNeighResists = (resistBleedMask >> uNeighborMat) & 1u;
bool vNeighResists = (resistBleedMask >> vNeighborMat) & 1u;
bool uBlend = (uNeighborMat > 0u && uNeighborMat != selfMat && uWeight > 0.001
&& !mainResists && uNeighCanBleed);
bool vBlend = (vNeighborMat > 0u && vNeighborMat != selfMat && vWeight > 0.001
@@ -185,14 +206,19 @@ PSOutput main(PSInput input) {
float3 albedo;
if (uBlend || vBlend) {
float4 mainTex = sampleTriplanarRGBA(input.worldPos, geoN, selfTexIdx, tiling);
float4 mainTex = sampleTriplanarRGBA(input.worldPos, N, selfTexIdx, tiling);
float3 result = mainTex.rgb;
float sharpness = 16.0;
if (uBlend) {
uint uTexIdx = clamp(uNeighborMat - 1u, 0u, 5u);
float4 uTex = sampleTriplanarRGBA(input.worldPos, geoN, uTexIdx, tiling);
float bias = 0.5 - uWeight;
float4 uTex = sampleTriplanarRGBA(input.worldPos, N, uTexIdx, tiling);
float bias;
if (uNeighResists) {
bias = 0.5 - uWeight * 1.6;
} else {
bias = 0.5 - uWeight;
}
float mainScore = mainTex.a + bias;
float neighScore = uTex.a - bias;
float blend = saturate((neighScore - mainScore) * sharpness + 0.5);
@@ -201,8 +227,13 @@ PSOutput main(PSInput input) {
if (vBlend) {
uint vTexIdx = clamp(vNeighborMat - 1u, 0u, 5u);
float4 vTex = sampleTriplanarRGBA(input.worldPos, geoN, vTexIdx, tiling);
float bias = 0.5 - vWeight;
float4 vTex = sampleTriplanarRGBA(input.worldPos, N, vTexIdx, tiling);
float bias;
if (vNeighResists) {
bias = 0.5 - vWeight * 1.6;
} else {
bias = 0.5 - vWeight;
}
float mainScore = mainTex.a + bias;
float neighScore = vTex.a - bias;
float blend = saturate((neighScore - mainScore) * sharpness + 0.5);
@@ -211,15 +242,57 @@ PSOutput main(PSInput input) {
albedo = result;
} else {
albedo = sampleTriplanar(input.worldPos, geoN, selfTexIdx, tiling);
albedo = sampleTriplanar(input.worldPos, N, selfTexIdx, tiling);
}
// Lighting
// ── Normal map perturbation ──
float3 flatN = N; // preserve for ambient
float nmStrength = toneMapParams.z;
if (nmStrength > 0.0) {
float3 perturbedN = sampleTriplanarNormal(input.worldPos, N, selfTexIdx, tiling);
N = normalize(lerp(N, perturbedN, nmStrength * 0.7)); // lighter on smooth for softer transitions
}
// ── Lighting ──
float3 L = normalize(-sunDirection.xyz);
float NdotL = max(dot(N, L), 0.0);
float hemiLerp = N.y * 0.5 + 0.5;
float hemiLerp = flatN.y * 0.5 + 0.5;
float3 ambient = lerp(groundAmbient.rgb, skyAmbient.rgb, hemiLerp);
float3 color = albedo * (sunColor.rgb * NdotL + ambient);
float3 diffuse = sunColor.rgb * NdotL;
// ── Debug lighting modes (F9 cycle) ──
uint dbgLight = (uint)toneMapParams.w;
if (dbgLight == 2) {
// FLAT: uniform gray, no texture, no normal map — pure lighting with the unperturbed interpolated normal (flatN)
float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
float flatHemi = flatN.y * 0.5 + 0.5;
float3 flatAmb = lerp(groundAmbient.rgb, skyAmbient.rgb, flatHemi);
float3 flatColor = float3(0.5, 0.5, 0.5) * (flatAmb + sunColor.rgb * flatNdotL);
output.color = float4(flatColor, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
if (dbgLight == 3) {
// ALBEDO only: texture + blend, no lighting
output.color = float4(albedo, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
if (dbgLight == 4) {
// NdotL only: grayscale NdotL with flat normal (no normal map)
float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
output.color = float4(flatNdotL, flatNdotL, flatNdotL, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
if (dbgLight == 5) {
// NORMAL viz: unperturbed interpolated normal (flatN) mapped to RGB (XYZ → [0,1])
output.color = float4(flatN * 0.5 + 0.5, 1.0);
output.normal = float4(flatN, 0.0);
return output;
}
float3 color = albedo * (ambient + diffuse);
// ── Rim light ──
float3 V = normalize(cameraPosition.xyz - input.worldPos);
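
Reviewer note: the height-based blend in the hunks above can be checked on the CPU. The following is a minimal sketch, assuming the alpha channel of each triplanar sample is the heightmap and that the resulting factor is later used to lerp toward the neighbour texture (that lerp is outside the hunk shown). The function name `heightBlend` and its parameters are hypothetical; only the 1.6 coefficient, the sharpness of 16 and the score formula come from the diff.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical CPU mirror of the shader's blend factor (not part of the commit).
// mainHeight / neighHeight stand in for mainTex.a and uTex.a (or vTex.a);
// edgeWeight stands in for uWeight or vWeight.
inline float saturate(float v) { return std::min(std::max(v, 0.0f), 1.0f); }

float heightBlend(float mainHeight, float neighHeight, float edgeWeight,
                  bool neighResists, float sharpness = 16.0f) {
    assert(edgeWeight >= 0.0f && edgeWeight <= 1.0f);
    // Asymmetric bias: a bleed-resistant neighbour uses the 1.6 coefficient.
    float bias = 0.5f - edgeWeight * (neighResists ? 1.6f : 1.0f);
    float mainScore  = mainHeight + bias;
    float neighScore = neighHeight - bias;
    // 0 keeps the main material, 1 selects the neighbour.
    return saturate((neighScore - mainScore) * sharpness + 0.5f);
}
```

With equal heights and an edge weight of 0.5, the plain branch returns exactly 0.5, while the resistant-neighbour branch already saturates to 1, which is the asymmetry the 1.6 coefficient introduces.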


@@ -0,0 +1,80 @@
// BVLE Voxels - Toping BLAS Position Extraction Compute Shader
// Replaces the 196ms CPU loop that computed world-space toping positions.
// Reads vertex templates (t4) + instance positions (t5) + group table (t7),
// writes flat float3 positions (u0) for DXR BLAS construction.
//
// One thread per output vertex. Group table maps global vertex index to
// the correct (instance, local vertex) pair via prefix-sum offsets.
#include "voxelCommon.hlsli"
// Toping mesh vertex (must match C++ TopingVertex, 24 bytes)
struct TopingVtx {
float3 position; // local to voxel [0,1]^3
float3 normal; // unused here, but struct must match
};
// Toping instance (just the world position, 12 bytes)
struct TopingInst {
float3 worldPos;
};
// Draw group descriptor for BLAS extraction (must match C++ TopingBLASGroupGPU, 20 bytes)
struct TopingBLASGroup {
uint globalVertexOffset; // prefix sum: first global vertex index for this group
uint vertexTemplateOffset; // offset into topingVertices (t4)
uint vertexCount; // vertices per instance (mesh slice count)
uint instanceOffset; // offset into topingInstances (t5)
uint instanceCount; // number of instances in this group
};
StructuredBuffer<TopingVtx> topingVertices : register(t4);
StructuredBuffer<TopingInst> topingInstances : register(t5);
StructuredBuffer<TopingBLASGroup> topingGroups : register(t7);
// Output: raw float3 positions (12 bytes each)
RWByteAddressBuffer blasPositions : register(u0);
// Push constants (b999)
struct TopingBLASPush {
uint totalVertices;
uint groupCount;
uint pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7, pad8, pad9;
};
[[vk::push_constant]] ConstantBuffer<TopingBLASPush> push : register(b999);
void storeFloat3(uint byteOffset, float3 v) {
blasPositions.Store(byteOffset, asuint(v.x));
blasPositions.Store(byteOffset + 4, asuint(v.y));
blasPositions.Store(byteOffset + 8, asuint(v.z));
}
[RootSignature(VOXEL_ROOTSIG)]
[numthreads(64, 1, 1)]
void main(uint3 DTid : SV_DispatchThreadID) {
uint globalIdx = DTid.x;
if (globalIdx >= push.totalVertices) return;
// Find which group this vertex belongs to (linear scan, max ~32 groups)
uint groupIdx = 0;
for (uint g = 1; g < push.groupCount; g++) {
if (globalIdx >= topingGroups[g].globalVertexOffset)
groupIdx = g;
else
break;
}
TopingBLASGroup grp = topingGroups[groupIdx];
// Map global vertex to (instance, local vertex) within this group
uint localIdx = globalIdx - grp.globalVertexOffset;
uint instanceIdx = grp.instanceOffset + localIdx / grp.vertexCount;
uint vertexIdx = grp.vertexTemplateOffset + localIdx % grp.vertexCount;
TopingVtx vtx = topingVertices[vertexIdx];
TopingInst inst = topingInstances[instanceIdx];
float3 worldPos = inst.worldPos + vtx.position;
storeFloat3(globalIdx * 12, worldPos);
}
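
Reviewer note: the index mapping in this compute shader can be mirrored on the CPU for validation. This is a sketch under stated assumptions: `BlasGroup`, `VertexRef`, `assignOffsets` and `resolve` are hypothetical names, and it assumes `globalVertexOffset` is an exclusive prefix sum of `vertexCount * instanceCount` per group, as the struct comments suggest.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical CPU mirror of TopingBLASGroup (same five fields, same order).
struct BlasGroup {
    uint32_t globalVertexOffset;
    uint32_t vertexTemplateOffset;
    uint32_t vertexCount;
    uint32_t instanceOffset;
    uint32_t instanceCount;
};

struct VertexRef { uint32_t instanceIdx; uint32_t vertexIdx; };

// Fill globalVertexOffset as an exclusive prefix sum; returns the total vertex count.
uint32_t assignOffsets(std::vector<BlasGroup>& groups) {
    uint32_t running = 0;
    for (BlasGroup& g : groups) {
        g.globalVertexOffset = running;
        running += g.vertexCount * g.instanceCount;
    }
    return running;
}

// Same linear scan and div/mod split as the shader's main().
VertexRef resolve(const std::vector<BlasGroup>& groups, uint32_t globalIdx) {
    assert(!groups.empty());
    std::size_t groupIdx = 0;
    for (std::size_t g = 1; g < groups.size(); ++g) {
        if (globalIdx >= groups[g].globalVertexOffset) groupIdx = g;
        else break;
    }
    const BlasGroup& grp = groups[groupIdx];
    assert(grp.vertexCount > 0);
    uint32_t localIdx = globalIdx - grp.globalVertexOffset;
    return { grp.instanceOffset + localIdx / grp.vertexCount,
             grp.vertexTemplateOffset + localIdx % grp.vertexCount };
}
```

Comparing `resolve` against the GPU output for a handful of indices is a cheap way to catch a mismatch between the C++ group table and the shader struct layout.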


@@ -50,13 +50,15 @@ VSOutput main(uint vertexID : SV_VertexID, uint instanceID : SV_InstanceID) {
// Quadratic scaling: base stays anchored, tips sway the most.
if (push.materialID != 3u) { // not stone
float localHeight = vtx.position.y - 1.0;
float amplitude = 2.0;
float frequency = 1.4;
if (localHeight > 0.0) {
float heightFactor = localHeight * localHeight; // quadratic
float phase = worldPos.x * 1.8 + worldPos.z * 1.3 + windTime * 3.5;
float phase2 = worldPos.x * 0.7 - worldPos.z * 2.1 + windTime * 2.7;
float swayX = sin(phase) * 0.11 * heightFactor;
float swayZ = cos(phase2) * 0.08 * heightFactor;
float swayY = -abs(sin(phase * 0.7)) * 0.02 * heightFactor; // slight droop
float phase = worldPos.x * 1.8 + worldPos.z * 1.3 + windTime * 3.5 * frequency;
float phase2 = worldPos.x * 0.7 - worldPos.z * 2.1 + windTime * 2.7 * frequency;
float swayX = sin(phase) * 0.11 * heightFactor * amplitude;
float swayZ = cos(phase2) * 0.08 * heightFactor * amplitude;
float swayY = -abs(sin(phase * 0.7)) * 0.02 * heightFactor * amplitude; // slight droop
worldPos.x += swayX;
worldPos.y += swayY;
worldPos.z += swayZ;
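
Reviewer note: the effect of the new `amplitude` and `frequency` factors can be inspected without a GPU. The sketch below is a hypothetical CPU mirror (`windSway` and `Sway` are not part of the commit); the constants are copied from the hunk above, and `localHeight` is assumed to be `vtx.position.y - 1.0` as in the shader.

```cpp
#include <cassert>
#include <cmath>

struct Sway { float x, y, z; };

// Hypothetical CPU mirror of the vertex-shader sway; returns the offset added to worldPos.
Sway windSway(float worldX, float worldZ, float localHeight, float windTime,
              float amplitude = 2.0f, float frequency = 1.4f) {
    assert(std::isfinite(localHeight));
    if (localHeight <= 0.0f) return {0.0f, 0.0f, 0.0f};  // base stays anchored
    float heightFactor = localHeight * localHeight;       // quadratic: tips sway most
    float phase  = worldX * 1.8f + worldZ * 1.3f + windTime * 3.5f * frequency;
    float phase2 = worldX * 0.7f - worldZ * 2.1f + windTime * 2.7f * frequency;
    return {
        std::sin(phase) * 0.11f * heightFactor * amplitude,
        -std::fabs(std::sin(phase * 0.7f)) * 0.02f * heightFactor * amplitude, // slight droop
        std::cos(phase2) * 0.08f * heightFactor * amplitude
    };
}
```

Because the height factor is quadratic, doubling `localHeight` quadruples the displacement, so the doubled amplitude is most visible at blade tips.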


@@ -139,19 +139,29 @@ int APIENTRY wWinMain(
wcex.lpszClassName = L"BVLEVoxels";
RegisterClassExW(&wcex);
// Screenshot mode: small minimized window to avoid interrupting user
// Compute window size so the client area is exactly 1920x1080 (640x480 in screenshot mode)
DWORD style = WS_OVERLAPPEDWINDOW;
int clientW = isScreenshot ? 640 : 1920;
int clientH = isScreenshot ? 480 : 1080;
RECT rc = { 0, 0, clientW, clientH };
AdjustWindowRect(&rc, style, FALSE);
int windowW = rc.right - rc.left;
int windowH = rc.bottom - rc.top;
// Center on screen
int screenW = GetSystemMetrics(SM_CXSCREEN);
int screenH = GetSystemMetrics(SM_CYSCREEN);
int posX = isScreenshot ? 0 : (screenW - windowW) / 2;
int posY = isScreenshot ? 0 : (screenH - windowH) / 2;
HWND hWnd = CreateWindowW(
wcex.lpszClassName,
isScreenshot ? L"BVLE Screenshot" : L"BVLE Voxels - Prototype",
WS_OVERLAPPEDWINDOW,
isScreenshot ? 0 : CW_USEDEFAULT,
isScreenshot ? 0 : 0,
isScreenshot ? 640 : 1920,
isScreenshot ? 480 : 1080,
style,
posX, posY, windowW, windowH,
nullptr, nullptr, hInstance, nullptr
);
// SW_SHOWNOACTIVATE: visible but doesn't steal focus (minimized windows don't render)
ShowWindow(hWnd, isScreenshot ? SW_SHOWNOACTIVATE : SW_SHOWMAXIMIZED);
ShowWindow(hWnd, isScreenshot ? SW_SHOWNOACTIVATE : SW_SHOW);
// Initialize Wicked Engine
application.SetWindow(hWnd);
@@ -198,9 +208,10 @@ int APIENTRY wWinMain(
if (renderPath.screenshotMode) {
struct CamView { float x, y, z, pitch, yaw; const char* name; };
static const CamView views[] = {
{ 223.f, 36.5f, 261.f, -0.20f, 0.7f, "closeup" }, // close-up: slightly above grass, looking across
{ 222.5f, 36.2f, 261.f, -0.10f, 0.5f,"blade" }, // eye-level with grass blades
{ 220.f, 39.f, 258.f, -0.35f, 0.7f, "medium" }, // medium shot of grass patch
{ 220.f, 42.f, 258.f, -0.40f, 0.7f, "landscape" }, // higher overview
{ 220.f, 39.f, 258.f, -0.35f, 0.7f, "medium" }, // medium shot, terrain detail
{ 222.f, 37.f, 260.f, -0.20f, 0.5f, "closeup" }, // close-up ground level
{ 220.f, 120.f, 258.f, 1.0f, 0.7f, "birdseye" }, // bird's eye (LOD overview)
};
static const int numViews = sizeof(views) / sizeof(views[0]);
static int currentView = 0;


@@ -0,0 +1,68 @@
#pragma once
#include "WickedEngine.h"
namespace voxel {
// ── Deferred GPU Buffer ─────────────────────────────────────────
// Encapsulates the repeated pattern of:
// 1. CPU staging data prepared during Update()
// 2. GPU buffer with capacity-based growth (25% headroom)
// 3. Dirty flag for deferred upload in Render()
//
// Eliminates ~50 lines of boilerplate per buffer and centralizes
// the invariants (capacity >= count, CreateBuffer with nullptr,
// UpdateBuffer with actual data size).
struct DeferredGPUBuffer {
wi::graphics::GPUBuffer gpu;
mutable uint32_t capacity = 0; // in elements
mutable bool dirty = false;
uint32_t stride = 0; // bytes per element
// Ensure GPU buffer has enough capacity for elementCount elements.
// Creates/recreates buffer only when capacity is insufficient.
// Returns true if buffer was (re)created.
bool ensureCapacity(wi::graphics::GraphicsDevice* device,
uint32_t elementCount,
uint32_t elementStride,
wi::graphics::BindFlag bindFlags,
wi::graphics::ResourceMiscFlag miscFlags = wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED)
{
stride = elementStride;
if (gpu.IsValid() && capacity >= elementCount) return false;
capacity = elementCount + elementCount / 4; // 25% headroom
wi::graphics::GPUBufferDesc desc;
desc.size = (uint64_t)capacity * stride;
desc.bind_flags = bindFlags;
desc.misc_flags = miscFlags;
desc.stride = (miscFlags == wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED) ? stride : 0;
desc.usage = wi::graphics::Usage::DEFAULT;
device->CreateBuffer(&desc, nullptr, &gpu);
dirty = true;
return true;
}
// Upload data to GPU. Call from Render() with a valid CommandList.
// dataCount = number of elements to upload (may be < capacity).
void upload(wi::graphics::GraphicsDevice* device,
wi::graphics::CommandList cmd,
const void* data,
uint32_t dataCount) const
{
if (!dirty || !gpu.IsValid() || dataCount == 0 || !data) return;
size_t uploadSize = (size_t)dataCount * stride;
size_t bufferSize = (size_t)capacity * stride;
if (uploadSize <= bufferSize) {
device->UpdateBuffer(&gpu, data, cmd, uploadSize);
}
dirty = false;
}
// Mark as needing upload (call after staging data changes).
void markDirty() { dirty = true; }
bool isValid() const { return gpu.IsValid(); }
};
} // namespace voxel
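
Reviewer note: the growth policy of `DeferredGPUBuffer::ensureCapacity` can be exercised in isolation. The sketch below models only the capacity arithmetic and makes no Wicked Engine calls. `CapacityTracker` is a hypothetical stand-in, and it treats a non-zero capacity as a proxy for `gpu.IsValid()`, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that models only the growth policy, no GPU resources.
struct CapacityTracker {
    uint32_t capacity = 0;       // in elements
    uint32_t reallocations = 0;  // how many times the buffer would be recreated

    // Returns true when the (imaginary) buffer would be (re)created.
    bool ensureCapacity(uint32_t elementCount) {
        // capacity != 0 is used here as a proxy for gpu.IsValid().
        if (capacity != 0 && capacity >= elementCount) return false;
        capacity = elementCount + elementCount / 4;  // 25% headroom
        ++reallocations;
        return true;
    }
};
```

A request for 100 elements reserves 125, so later requests up to 125 reuse the buffer and only the 126th element triggers a recreation.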


@@ -243,538 +243,11 @@ uint8_t VoxelMesher::calcAO(const VoxelWorld& world, const ChunkPos& cpos,
}
// ══════════════════════════════════════════════════════════════════
// ── Naive Surface Nets Mesher (Phase 5) ─────────────────────────
// ── Smooth meshing (Phase 5) ────────────────────────────────────
// ══════════════════════════════════════════════════════════════════
//
// Algorithm:
// 1. Compute SDF for each voxel: smooth solid = -1, empty = +1
// Non-smooth solid voxels act as hard walls (SDF crushed to -1).
// 2. For each cell on the surface (SDF sign differs from at least one neighbor),
// place a vertex at the centroid of edge crossings.
// 3. For each edge (pair of adjacent cells) with a sign change,
// emit a quad connecting the 4 cells that share that edge, then split to 2 triangles.
// 4. Normals derived from SDF gradient (central differences).
// Padded grid: +2 border for cross-chunk SDF lookups and neighbor smooth detection
static constexpr int PAD = 2;
static constexpr int GRID = CHUNK_SIZE + 2 * PAD; // 36
static inline int gridIdx(int x, int y, int z) {
return (x + PAD) + (y + PAD) * GRID + (z + PAD) * GRID * GRID;
}
// Helper: read voxel data at chunk-local coords (with cross-chunk fallback)
static VoxelData readVoxel(const Chunk& chunk, const VoxelWorld& world, int x, int y, int z) {
if (chunk.isInBounds(x, y, z))
return chunk.at(x, y, z);
return world.getVoxel(
chunk.pos.x * CHUNK_SIZE + x,
chunk.pos.y * CHUNK_SIZE + y,
chunk.pos.z * CHUNK_SIZE + z);
}
float SmoothMesher::computeSDF(const Chunk& chunk, const VoxelWorld& world,
int x, int y, int z) {
VoxelData v = readVoxel(chunk, world, x, y, z);
if (v.isEmpty()) return 1.0f; // empty → positive SDF
return -1.0f; // any solid → negative SDF
}
void SmoothMesher::computeNormal(const Chunk& chunk, const VoxelWorld& world,
int x, int y, int z,
float& nx, float& ny, float& nz) {
// Central differences of the SDF
float dx = computeSDF(chunk, world, x+1, y, z) - computeSDF(chunk, world, x-1, y, z);
float dy = computeSDF(chunk, world, x, y+1, z) - computeSDF(chunk, world, x, y-1, z);
float dz = computeSDF(chunk, world, x, y, z+1) - computeSDF(chunk, world, x, y, z-1);
float len = std::sqrt(dx*dx + dy*dy + dz*dz);
if (len > 0.0001f) {
nx = dx / len;
ny = dy / len;
nz = dz / len;
} else {
nx = 0.0f; ny = 1.0f; nz = 0.0f;
}
}
// Thread-local scratch buffers to avoid per-chunk allocation overhead.
// Each worker thread gets its own set, eliminating malloc/free thrashing.
struct SmoothScratch {
float sdf[GRID * GRID * GRID];
uint8_t smoothGrid[GRID * GRID * GRID];
uint8_t smoothNear[GRID * GRID * GRID]; // dilated: 1 if smooth OR face-adjacent to smooth
VoxelData voxelGrid[GRID * GRID * GRID];
int32_t vertexMap[33 * 33 * 33]; // VERT_RANGE³
};
static thread_local SmoothScratch* tls_scratch = nullptr;
uint32_t SmoothMesher::meshChunk(Chunk& chunk, const VoxelWorld& world) {
chunk.smoothVertices.clear();
chunk.hasSmooth = false;
// ── Early exit: skip chunks far from any smooth voxels ──────
// Check this chunk + 26 neighbors for containsSmooth flag.
// This avoids the expensive 36³ grid fill for ~70% of chunks.
{
bool nearSmooth = chunk.containsSmooth;
if (!nearSmooth) {
for (int dz = -1; dz <= 1 && !nearSmooth; dz++)
for (int dy = -1; dy <= 1 && !nearSmooth; dy++)
for (int dx = -1; dx <= 1 && !nearSmooth; dx++) {
if (dx == 0 && dy == 0 && dz == 0) continue;
const Chunk* nc = world.getChunk(
ChunkPos{chunk.pos.x + dx, chunk.pos.y + dy, chunk.pos.z + dz});
if (nc && nc->containsSmooth) nearSmooth = true;
}
}
if (!nearSmooth) return 0;
}
// Allocate thread-local scratch once per thread (persists across calls)
if (!tls_scratch) tls_scratch = new SmoothScratch();
auto& scratch = *tls_scratch;
// ── Step 1: Build SDF grid + smooth flag grid + voxel cache ──
// PAD=2 so we have SDF data for cells at [-1..CHUNK_SIZE] (all 8 corners accessible)
// Also build a "isSmooth" grid for the same range to detect proximity to smooth voxels.
// voxelGrid caches VoxelData to avoid repeated cross-chunk hashmap lookups later.
float* sdf = scratch.sdf;
uint8_t* smoothGrid = scratch.smoothGrid;
VoxelData* voxelGrid = scratch.voxelGrid;
constexpr int GRID3 = GRID * GRID * GRID;
std::memset(smoothGrid, 0, GRID3);
// SDF defaults to 1.0f (empty) — fill below
for (int i = 0; i < GRID3; i++) sdf[i] = 1.0f;
bool anySmooth = false;
// Pre-cache neighbor chunk pointers for fast cross-chunk access
const Chunk* neighborChunks[3][3][3] = {};
for (int dz = -1; dz <= 1; dz++)
for (int dy = -1; dy <= 1; dy++)
for (int dx = -1; dx <= 1; dx++) {
neighborChunks[dx+1][dy+1][dz+1] = world.getChunk(
ChunkPos{chunk.pos.x + dx, chunk.pos.y + dy, chunk.pos.z + dz});
}
// Helper: fast voxel read using cached neighbor chunk pointers
auto readVoxelFast = [&](int x, int y, int z) -> VoxelData {
if (x >= 0 && x < CHUNK_SIZE && y >= 0 && y < CHUNK_SIZE && z >= 0 && z < CHUNK_SIZE)
return chunk.at(x, y, z);
// Determine which neighbor chunk
int cx = (x < 0) ? 0 : (x >= CHUNK_SIZE) ? 2 : 1;
int cy = (y < 0) ? 0 : (y >= CHUNK_SIZE) ? 2 : 1;
int cz = (z < 0) ? 0 : (z >= CHUNK_SIZE) ? 2 : 1;
const Chunk* nc = neighborChunks[cx][cy][cz];
if (!nc) return VoxelData{}; // empty if chunk not loaded
int lx = ((x % CHUNK_SIZE) + CHUNK_SIZE) % CHUNK_SIZE;
int ly = ((y % CHUNK_SIZE) + CHUNK_SIZE) % CHUNK_SIZE;
int lz = ((z % CHUNK_SIZE) + CHUNK_SIZE) % CHUNK_SIZE;
return nc->at(lx, ly, lz);
};
for (int z = -PAD; z < CHUNK_SIZE + PAD; z++) {
for (int y = -PAD; y < CHUNK_SIZE + PAD; y++) {
for (int x = -PAD; x < CHUNK_SIZE + PAD; x++) {
int gi = gridIdx(x, y, z);
VoxelData v = readVoxelFast(x, y, z);
voxelGrid[gi] = v;
sdf[gi] = v.isEmpty() ? 1.0f : -1.0f;
if (v.isSmooth()) {
smoothGrid[gi] = 1;
// Only need anySmooth for this chunk's own voxels
if (chunk.isInBounds(x, y, z)) anySmooth = true;
}
}
}
}
// Also check 1 beyond the chunk (neighbor chunks may have smooth voxels that
// affect cells at the chunk boundary)
if (!anySmooth) {
// Check if any neighbor voxels just outside the chunk are smooth
for (int z = -1; z <= CHUNK_SIZE && !anySmooth; z++)
for (int y = -1; y <= CHUNK_SIZE && !anySmooth; y++)
for (int x = -1; x <= CHUNK_SIZE && !anySmooth; x++) {
if (chunk.isInBounds(x, y, z)) continue; // already checked
if (smoothGrid[gridIdx(x, y, z)]) anySmooth = true;
}
}
if (!anySmooth) return 0;
chunk.hasSmooth = true;
// ── Step 1b: Dilate smoothGrid → smoothNear ──────────────────
// Pre-compute "smooth or face-adjacent to smooth" to reduce the
// per-cell hasSmooth check from 56 lookups to 8 lookups.
uint8_t* smoothNear = scratch.smoothNear;
std::memcpy(smoothNear, smoothGrid, GRID3);
for (int z = -PAD + 1; z < CHUNK_SIZE + PAD - 1; z++)
for (int y = -PAD + 1; y < CHUNK_SIZE + PAD - 1; y++)
for (int x = -PAD + 1; x < CHUNK_SIZE + PAD - 1; x++) {
if (smoothGrid[gridIdx(x, y, z)]) {
smoothNear[gridIdx(x+1, y, z)] = 1;
smoothNear[gridIdx(x-1, y, z)] = 1;
smoothNear[gridIdx(x, y+1, z)] = 1;
smoothNear[gridIdx(x, y-1, z)] = 1;
smoothNear[gridIdx(x, y, z+1)] = 1;
smoothNear[gridIdx(x, y, z-1)] = 1;
}
}
// ── Step 2: Generate vertices for surface cells ──────────────
// Extended range: [-1, CHUNK_SIZE) for cross-chunk connectivity.
// This chunk generates vertices for cells at [-1..CHUNK_SIZE-1].
// The vertex map covers [-1..CHUNK_SIZE-1] → size = CHUNK_SIZE+1, offset by +1.
static constexpr int VERT_MIN = -1;
static constexpr int VERT_MAX = CHUNK_SIZE; // exclusive
static constexpr int VERT_RANGE = VERT_MAX - VERT_MIN; // CHUNK_SIZE + 1 = 33
int32_t* vertexMap = scratch.vertexMap;
std::memset(vertexMap, -1, VERT_RANGE * VERT_RANGE * VERT_RANGE * sizeof(int32_t));
auto vertMapIdx = [](int x, int y, int z) -> int {
// shift coordinates by -VERT_MIN = +1 so index range is [0, VERT_RANGE)
return (x - VERT_MIN) + (y - VERT_MIN) * VERT_RANGE + (z - VERT_MIN) * VERT_RANGE * VERT_RANGE;
};
// World offset for this chunk
float ox = (float)(chunk.pos.x * CHUNK_SIZE);
float oy = (float)(chunk.pos.y * CHUNK_SIZE);
float oz = (float)(chunk.pos.z * CHUNK_SIZE);
// Corner offsets: (dx,dy,dz) for corner index 0-7 of a cell
static const int cornerOff[8][3] = {
{0,0,0}, {1,0,0}, {0,1,0}, {1,1,0},
{0,0,1}, {1,0,1}, {0,1,1}, {1,1,1},
};
static const float cornerOffF[8][3] = {
{0,0,0}, {1,0,0}, {0,1,0}, {1,1,0},
{0,0,1}, {1,0,1}, {0,1,1}, {1,1,1},
};
static const int edges[12][2] = {
{0,1}, {2,3}, {4,5}, {6,7}, // X-axis edges
{0,2}, {1,3}, {4,6}, {5,7}, // Y-axis edges
{0,4}, {1,5}, {2,6}, {3,7}, // Z-axis edges
};
for (int z = VERT_MIN; z < VERT_MAX; z++) {
for (int y = VERT_MIN; y < VERT_MAX; y++) {
for (int x = VERT_MIN; x < VERT_MAX; x++) {
// hasSmooth check via dilated grid: at least one corner must be
// smooth or face-adjacent to smooth. Uses pre-dilated smoothNear
// grid → only 8 lookups instead of 56.
bool hasSmooth = false;
for (int c = 0; c < 8 && !hasSmooth; c++) {
if (smoothNear[gridIdx(x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])])
hasSmooth = true;
}
if (!hasSmooth) continue;
// Get SDF at 8 corners of cell (x,y,z)
float corner[8];
bool hasPos = false, hasNeg = false;
for (int c = 0; c < 8; c++) {
corner[c] = sdf[gridIdx(x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])];
if (corner[c] < 0.0f) hasNeg = true;
else hasPos = true;
}
if (!hasPos || !hasNeg) continue; // no sign change → not on surface
// Compute vertex position as centroid of edge crossings.
// +0.5 offset: SDF is sampled at voxel centers, so the cell spans
// from (x+0.5) to (x+1.5) in world space. This naturally aligns
// the isosurface with the integer grid (voxel face positions).
float sumX = 0, sumY = 0, sumZ = 0;
int crossCount = 0;
for (int e = 0; e < 12; e++) {
float s0 = corner[edges[e][0]];
float s1 = corner[edges[e][1]];
if ((s0 < 0.0f) == (s1 < 0.0f)) continue;
float t = s0 / (s0 - s1);
t = std::clamp(t, 0.01f, 0.99f);
const float* c0 = cornerOffF[edges[e][0]];
const float* c1 = cornerOffF[edges[e][1]];
sumX += c0[0] + t * (c1[0] - c0[0]);
sumY += c0[1] + t * (c1[1] - c0[1]);
sumZ += c0[2] + t * (c1[2] - c0[2]);
crossCount++;
}
if (crossCount == 0) continue;
float invCross = 1.0f / (float)crossCount;
// centroid in [0,1] within the cell
float cx = sumX * invCross;
float cy = sumY * invCross;
float cz = sumZ * invCross;
// ── Per-axis clamping at blocky boundaries ───────────
// With +0.5 offset, the cell spans [x+0.5, x+1.5] in world space.
// The integer grid (blocky faces) is at x+1. In centroid coords,
// that's centroid = 0.5 (the midpoint of the cell).
// If the +side corners (dx=1) contain a blocky solid, clamp centroid ≤ 0.5
// If the -side corners (dx=0) contain a blocky solid, clamp centroid ≥ 0.5
// This prevents the smooth mesh from extending into blocky territory.
bool blockyXlo = false, blockyXhi = false;
bool blockyYlo = false, blockyYhi = false;
bool blockyZlo = false, blockyZhi = false;
for (int c = 0; c < 8; c++) {
if (corner[c] >= 0.0f) continue; // empty corner
VoxelData v = voxelGrid[gridIdx(
x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])];
if (!v.isEmpty() && !v.isSmooth()) {
// This corner is a blocky solid
if (cornerOff[c][0] == 0) blockyXlo = true; else blockyXhi = true;
if (cornerOff[c][1] == 0) blockyYlo = true; else blockyYhi = true;
if (cornerOff[c][2] == 0) blockyZlo = true; else blockyZhi = true;
}
}
if (blockyXhi) cx = std::min(cx, 0.5f);
if (blockyXlo) cx = std::max(cx, 0.5f);
if (blockyYhi) cy = std::min(cy, 0.5f);
if (blockyYlo) cy = std::max(cy, 0.5f);
if (blockyZhi) cz = std::min(cz, 0.5f);
if (blockyZlo) cz = std::max(cz, 0.5f);
// World position with +0.5 offset (SDF at voxel centers)
float vx = (float)x + 0.5f + cx;
float vy = (float)y + 0.5f + cy;
float vz = (float)z + 0.5f + cz;
// Determine material: prefer smooth voxels' materials to avoid
// picking up subsurface blocky materials (e.g., dirt under stone)
uint8_t smoothMatCounts[256] = {};
uint8_t allMatCounts[256] = {};
int smoothCount = 0;
for (int c = 0; c < 8; c++) {
if (corner[c] < 0.0f) {
VoxelData v = voxelGrid[gridIdx(
x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])];
if (!v.isEmpty()) {
allMatCounts[v.getMaterialID()]++;
if (v.isSmooth()) {
smoothMatCounts[v.getMaterialID()]++;
smoothCount++;
}
}
}
}
// Primary material: prefer smooth-only counts to avoid subsurface bleed
uint8_t* primaryCounts = (smoothCount > 0) ? smoothMatCounts : allMatCounts;
uint8_t bestMat = 6, bestCount = 0;
for (int m = 1; m < 256; m++) {
if (primaryCounts[m] > bestCount) {
bestMat = (uint8_t)m; bestCount = primaryCounts[m];
}
}
// Secondary material: only count SURFACE-EXPOSED voxels (at least one
// empty neighbor). This prevents underground materials (dirt under stone)
// from bleeding through — same principle as blocky face blending.
static const int dirs6[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
uint8_t surfaceMatCounts[256] = {};
for (int c = 0; c < 8; c++) {
if (corner[c] >= 0.0f) continue;
int cx = x + cornerOff[c][0], cy = y + cornerOff[c][1], cz = z + cornerOff[c][2];
VoxelData v = voxelGrid[gridIdx(cx, cy, cz)];
if (v.isEmpty()) continue;
// Check if this voxel is on the surface
bool onSurface = false;
for (int d = 0; d < 6 && !onSurface; d++) {
if (sdf[gridIdx(cx + dirs6[d][0], cy + dirs6[d][1], cz + dirs6[d][2])] > 0.0f)
onSurface = true;
}
if (onSurface) surfaceMatCounts[v.getMaterialID()]++;
}
uint8_t secMat = bestMat, secCount = 0;
for (int m = 1; m < 256; m++) {
if (m == bestMat) continue;
if (surfaceMatCounts[m] > secCount) {
secMat = (uint8_t)m; secCount = surfaceMatCounts[m];
}
}
// blendWeight: binary flag — 255 at material boundary, 0 at interior.
// GPU interpolation creates the smooth edge-to-interior falloff.
uint8_t blendW = (secCount > 0 && secMat != bestMat) ? 255 : 0;
// Store vertex (normals zeroed — computed later from face normals in Step 4)
int32_t vertIdx = (int32_t)chunk.smoothVertices.size();
vertexMap[vertMapIdx(x, y, z)] = vertIdx;
SmoothVertex sv;
sv.px = ox + vx;
sv.py = oy + vy;
sv.pz = oz + vz;
sv.nx = 0;
sv.ny = 0;
sv.nz = 0;
sv.materialID = bestMat;
sv.secondaryMat = secMat;
sv.blendWeight = blendW;
sv._pad1 = 0;
sv.chunkIndex = 0;
sv._pad2 = 0;
chunk.smoothVertices.push_back(sv);
}
}
}
if (chunk.smoothVertices.empty()) {
chunk.hasSmooth = false;
return 0;
}
// ── Step 3: Emit quads for edges with sign change ────────────
// Canonical ownership: this chunk owns edges whose lower endpoint
// is in [0, CHUNK_SIZE). Extended to check edges at the chunk
// boundary (lower endpoint at CHUNK_SIZE-1, upper at CHUNK_SIZE).
// The sharing cells may be at [-1..CHUNK_SIZE-1], all covered by vertex map.
// Tri with edge axis info for correct normal orientation.
// normalAxis: 0=X, 1=Y, 2=Z — the axis of the edge that generated this quad.
// normalSign: +1 if the normal should point in +axis direction, -1 for -axis.
struct Tri { int32_t a, b, c; int8_t normalAxis; int8_t normalSign; };
std::vector<Tri> triangles;
triangles.reserve(chunk.smoothVertices.size() * 2);
// Helper: safe vertex map lookup (returns -1 if out of range)
auto safeVertMap = [&](int x, int y, int z) -> int32_t {
if (x < VERT_MIN || x >= VERT_MAX ||
y < VERT_MIN || y >= VERT_MAX ||
z < VERT_MIN || z >= VERT_MAX) return -1;
return vertexMap[vertMapIdx(x, y, z)];
};
// Helper: emit 2 triangles for a quad (a,b,c,d) with known desired normal.
// The Y-axis sharing cells have a different spatial arrangement from X and Z,
// requiring opposite winding to produce correct front-facing triangles.
auto emitQuad = [&](int a, int b, int c, int d, float s0, int8_t axis) {
if (a < 0 || b < 0 || c < 0 || d < 0) return;
int8_t sign = (s0 < 0.0f) ? +1 : -1;
// Y-axis has natural winding swapped relative to X and Z
bool useWindingA = (s0 > 0.0f);
if (axis == 1) useWindingA = !useWindingA;
if (useWindingA) {
triangles.push_back({a, b, d, axis, sign});
triangles.push_back({a, d, c, axis, sign});
} else {
triangles.push_back({a, d, b, axis, sign});
triangles.push_back({a, c, d, axis, sign});
}
};
// Iterate over edges owned by this chunk: grid points [0, CHUNK_SIZE)
for (int z = 0; z < CHUNK_SIZE; z++) {
for (int y = 0; y < CHUNK_SIZE; y++) {
for (int x = 0; x < CHUNK_SIZE; x++) {
float s0 = sdf[gridIdx(x, y, z)];
// X-axis edge: (x,y,z) → (x+1,y,z)
{
float s1 = sdf[gridIdx(x+1, y, z)];
if ((s0 < 0.0f) != (s1 < 0.0f)) {
emitQuad(
safeVertMap(x, y-1, z-1), safeVertMap(x, y, z-1),
safeVertMap(x, y-1, z), safeVertMap(x, y, z),
s0, 0);
}
}
// Y-axis edge: (x,y,z) → (x,y+1,z)
{
float s1 = sdf[gridIdx(x, y+1, z)];
if ((s0 < 0.0f) != (s1 < 0.0f)) {
emitQuad(
safeVertMap(x-1, y, z-1), safeVertMap(x, y, z-1),
safeVertMap(x-1, y, z), safeVertMap(x, y, z),
s0, 1);
}
}
// Z-axis edge: (x,y,z) → (x,y,z+1)
{
float s1 = sdf[gridIdx(x, y, z+1)];
if ((s0 < 0.0f) != (s1 < 0.0f)) {
emitQuad(
safeVertMap(x-1, y-1, z), safeVertMap(x, y-1, z),
safeVertMap(x-1, y, z), safeVertMap(x, y, z),
s0, 2);
}
}
}
}
}
// ── Step 4: Compute smooth vertex normals ──────────────────────
// Accumulate area-weighted face normals into each indexed vertex,
// then normalize. This gives Gouraud-style smooth shading across
// the Surface Nets mesh without adding geometry.
const int vertCount = (int)chunk.smoothVertices.size();
// Zero out vertex normals (will accumulate face normals)
for (auto& sv : chunk.smoothVertices) {
sv.nx = 0; sv.ny = 0; sv.nz = 0;
}
// For each triangle: compute oriented face normal, accumulate into vertices.
// The cross product magnitude is proportional to triangle area, so larger
// triangles contribute more — this is the standard area-weighted approach.
for (const auto& tri : triangles) {
const SmoothVertex& va = chunk.smoothVertices[tri.a];
const SmoothVertex& vb = chunk.smoothVertices[tri.b];
const SmoothVertex& vc = chunk.smoothVertices[tri.c];
float e1x = vb.px - va.px, e1y = vb.py - va.py, e1z = vb.pz - va.pz;
float e2x = vc.px - va.px, e2y = vc.py - va.py, e2z = vc.pz - va.pz;
float fnx = e1y * e2z - e1z * e2y;
float fny = e1z * e2x - e1x * e2z;
float fnz = e1x * e2y - e1y * e2x;
// Orient using the known edge axis (same logic as before)
float component = (tri.normalAxis == 0) ? fnx : (tri.normalAxis == 1) ? fny : fnz;
if ((component > 0.0f) != (tri.normalSign > 0)) {
fnx = -fnx; fny = -fny; fnz = -fnz;
}
// Accumulate unnormalized: the cross product's length is twice the triangle
// area, which is what provides the area weighting
chunk.smoothVertices[tri.a].nx += fnx;
chunk.smoothVertices[tri.a].ny += fny;
chunk.smoothVertices[tri.a].nz += fnz;
chunk.smoothVertices[tri.b].nx += fnx;
chunk.smoothVertices[tri.b].ny += fny;
chunk.smoothVertices[tri.b].nz += fnz;
chunk.smoothVertices[tri.c].nx += fnx;
chunk.smoothVertices[tri.c].ny += fny;
chunk.smoothVertices[tri.c].nz += fnz;
}
// Normalize accumulated vertex normals
for (auto& sv : chunk.smoothVertices) {
float len = std::sqrt(sv.nx*sv.nx + sv.ny*sv.ny + sv.nz*sv.nz);
if (len > 0.0001f) {
sv.nx /= len; sv.ny /= len; sv.nz /= len;
} else {
sv.nx = 0; sv.ny = 1; sv.nz = 0;
}
}
// ── Step 5: Expand indexed triangles to triangle list ─────────
std::vector<SmoothVertex> expanded;
expanded.reserve(triangles.size() * 3);
for (const auto& tri : triangles) {
expanded.push_back(chunk.smoothVertices[tri.a]);
expanded.push_back(chunk.smoothVertices[tri.b]);
expanded.push_back(chunk.smoothVertices[tri.c]);
}
chunk.smoothVertices = std::move(expanded);
chunk.smoothVertexCount = (uint32_t)chunk.smoothVertices.size();
return chunk.smoothVertexCount;
}
// The CPU SmoothMesher has been removed. Smooth meshing is now handled
// exclusively by the GPU compute shaders (voxelSmoothCentroidCS.hlsl
// + voxelSmoothCS.hlsl) which include crease-angle correction for
// correct normals at sharp edges (e.g. vertical walls).
} // namespace voxel
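Review note: the Step 4 normal pass of the removed CPU mesher can be reproduced in isolation for comparison against the GPU path. This is a minimal sketch, not the project's code: `Vec3`, `Tri` and `smoothNormals` are hypothetical stand-ins for `SmoothVertex` and the triangle records, and the axis-based orientation flip is left out, so the input triangles are assumed to be consistently wound.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the project's vertex and triangle types.
struct Vec3 { float x, y, z; };
struct Tri { uint32_t a, b, c; };

// Area-weighted smooth normals: every triangle adds its unnormalized
// cross product (length = 2 x triangle area) to its three vertices,
// then each accumulated vector is normalized.
std::vector<Vec3> smoothNormals(const std::vector<Vec3>& pos,
                                const std::vector<Tri>& tris) {
    std::vector<Vec3> n(pos.size(), Vec3{0.0f, 0.0f, 0.0f});
    for (const Tri& t : tris) {
        const Vec3& a = pos[t.a];
        const Vec3& b = pos[t.b];
        const Vec3& c = pos[t.c];
        const Vec3 e1{b.x - a.x, b.y - a.y, b.z - a.z};
        const Vec3 e2{c.x - a.x, c.y - a.y, c.z - a.z};
        const Vec3 fn{e1.y * e2.z - e1.z * e2.y,
                      e1.z * e2.x - e1.x * e2.z,
                      e1.x * e2.y - e1.y * e2.x};
        const uint32_t corners[3] = {t.a, t.b, t.c};
        for (uint32_t i : corners) {
            n[i].x += fn.x;
            n[i].y += fn.y;
            n[i].z += fn.z;
        }
    }
    for (Vec3& v : n) {
        const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        if (len > 0.0001f) {
            v.x /= len; v.y /= len; v.z /= len;
        } else {
            // Same fallback as the removed code: unreferenced or
            // degenerate vertices point straight up.
            v = Vec3{0.0f, 1.0f, 0.0f};
        }
    }
    return n;
}
```

Because the accumulation uses the raw cross product, a large triangle pulls a shared vertex normal toward its own face normal more strongly than a sliver does, which is the behavior the original comments describe.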


@ -37,25 +37,4 @@ private:
int x, int y, int z, uint8_t face);
};
// ── Naive Surface Nets Mesher (Phase 5) ─────────────────────────
// Generates smooth triangle mesh for voxels marked FLAG_SMOOTH.
// Algorithm: one vertex per surface cell, positioned at edge-crossing centroid.
// Quads emitted for each edge with sign change, then split into 2 triangles.
class SmoothMesher {
public:
// Mesh smooth voxels in a chunk, populating chunk.smoothVertices.
// Returns number of smooth vertices generated (always multiple of 3, triangle list).
static uint32_t meshChunk(Chunk& chunk, const VoxelWorld& world);
private:
// SDF value at a voxel position (solid smooth = -1, empty = +1)
// Non-smooth solid voxels are treated as walls (SDF = -1 at boundary)
static float computeSDF(const Chunk& chunk, const VoxelWorld& world,
int x, int y, int z);
// Compute SDF gradient (numerical central differences) for normal
static void computeNormal(const Chunk& chunk, const VoxelWorld& world,
int x, int y, int z, float& nx, float& ny, float& nz);
};
} // namespace voxel
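Review note: the removed header describes the vertex placement as "one vertex per surface cell, positioned at edge-crossing centroid". A minimal sketch of that rule for a single unit cell follows. `Vec3`, `surfaceNetsVertex` and the corner numbering are assumptions made for illustration, and the crossing point is found by linear interpolation along each edge, which may differ from what the project's shaders do.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };  // hypothetical helper type

// Corner i of the unit cell sits at (i & 1, (i >> 1) & 1, (i >> 2) & 1).
// Writes the centroid of all sign-changing edge crossings to `out`.
// Returns false when no edge changes sign (the cell misses the surface).
bool surfaceNetsVertex(const float sdf[8], Vec3& out) {
    // The 12 cell edges as pairs of corner indices.
    static const int edges[12][2] = {
        {0, 1}, {2, 3}, {4, 5}, {6, 7},   // X-aligned
        {0, 2}, {1, 3}, {4, 6}, {5, 7},   // Y-aligned
        {0, 4}, {1, 5}, {2, 6}, {3, 7}};  // Z-aligned
    float sx = 0.0f, sy = 0.0f, sz = 0.0f;
    int count = 0;
    for (const auto& e : edges) {
        const float s0 = sdf[e[0]];
        const float s1 = sdf[e[1]];
        if ((s0 < 0.0f) == (s1 < 0.0f)) continue;  // no sign change
        // Linear zero crossing; s0 != s1 because the signs differ.
        const float t = s0 / (s0 - s1);
        const float x0 = float(e[0] & 1), x1 = float(e[1] & 1);
        const float y0 = float((e[0] >> 1) & 1), y1 = float((e[1] >> 1) & 1);
        const float z0 = float((e[0] >> 2) & 1), z1 = float((e[1] >> 2) & 1);
        sx += x0 + t * (x1 - x0);
        sy += y0 + t * (y1 - y0);
        sz += z0 + t * (z1 - z0);
        ++count;
    }
    if (count == 0) return false;
    out = Vec3{sx / count, sy / count, sz / count};
    return true;
}
```

With the binary SDF described in the header (solid = -1, empty = +1), every crossing lands on an edge midpoint, so the vertex position depends only on which corners are solid.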


@ -0,0 +1,610 @@
#include "VoxelRTManager.h"
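#include <cstdio>   // snprintf
#include <cstring>  // std::memset
#include <string>   // std::to_string
#include <vector>   // std::vector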
using namespace wi::graphics;
namespace voxel {
void VoxelRTManager::initialize(GraphicsDevice* dev, uint32_t maxBlasVertices) {
device_ = dev;
maxBlasVertices_ = maxBlasVertices;
available_ = dev->CheckCapability(GraphicsDeviceCapability::RAYTRACING);
if (!available_) {
wi::backlog::post("VoxelRTManager: RT not available (GPU does not support ray tracing)");
return;
}
wi::renderer::LoadShader(ShaderStage::CS, blasExtractShader_, "voxel/voxelBLASExtractCS.cso");
if (blasExtractShader_.IsValid()) {
// BLAS position buffer: 6 float3 per quad (two triangles, no shared vertices), raw buffer
GPUBufferDesc posDesc;
posDesc.size = (uint64_t)maxBlasVertices * sizeof(float) * 3;
posDesc.bind_flags = BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE;
posDesc.misc_flags = ResourceMiscFlag::BUFFER_RAW;
posDesc.stride = 0;
posDesc.usage = Usage::DEFAULT;
bool ok = dev->CreateBuffer(&posDesc, nullptr, &blasPositionBuffer_);
// Sequential index buffer for BLAS
GPUBufferDesc idxDesc;
idxDesc.size = (uint64_t)maxBlasVertices * sizeof(uint32_t);
idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
idxDesc.usage = Usage::DEFAULT;
auto fillIndices = [maxBlasVertices](void* dest) {
uint32_t* p = (uint32_t*)dest;
for (uint32_t i = 0; i < maxBlasVertices; i++)
p[i] = i;
};
bool okIdx = dev->CreateBuffer2(&idxDesc, fillIndices, &blasIndexBuffer_);
if (ok && blasPositionBuffer_.IsValid() && okIdx && blasIndexBuffer_.IsValid()) {
dev->SetName(&blasPositionBuffer_, "VoxelRTManager::blasPositionBuffer");
dev->SetName(&blasIndexBuffer_, "VoxelRTManager::blasIndexBuffer");
wi::backlog::post("VoxelRTManager: RT available (BLAS pos "
+ std::to_string(posDesc.size / (1024*1024)) + " MB + idx "
+ std::to_string(idxDesc.size / (1024*1024)) + " MB)");
} else {
available_ = false;
wi::backlog::post("VoxelRTManager: RT buffer creation failed", wi::backlog::LogLevel::Warning);
}
} else {
available_ = false;
wi::backlog::post("VoxelRTManager: BLAS extraction shader failed", wi::backlog::LogLevel::Warning);
}
// Toping BLAS CS
wi::renderer::LoadShader(ShaderStage::CS, topingBLASShader_, "voxel/voxelTopingBLASCS.cso");
if (topingBLASShader_.IsValid()) {
static constexpr uint32_t MAX_GROUPS = 64;
GPUBufferDesc grpDesc;
grpDesc.size = MAX_GROUPS * 20; // 5 × uint32 per group
grpDesc.bind_flags = BindFlag::SHADER_RESOURCE;
grpDesc.misc_flags = ResourceMiscFlag::BUFFER_STRUCTURED;
grpDesc.stride = 20;
grpDesc.usage = Usage::DEFAULT;
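bool okGrp = dev->CreateBuffer(&grpDesc, nullptr, &topingBLASGroupBuffer_);
if (okGrp && topingBLASGroupBuffer_.IsValid()) {
wi::backlog::post("VoxelRTManager: toping BLAS CS available");
} else {
wi::backlog::post("VoxelRTManager: toping BLAS group buffer creation failed", wi::backlog::LogLevel::Warning);
}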
} else {
wi::backlog::post("VoxelRTManager: toping BLAS CS failed", wi::backlog::LogLevel::Warning);
}
// RT Shadows + AO
wi::renderer::LoadShader(ShaderStage::CS, shadowShader_, "voxel/voxelShadowCS.cso",
ShaderModel::SM_6_5);
wi::renderer::LoadShader(ShaderStage::CS, aoBlurShader_, "voxel/voxelAOBlurCS.cso");
wi::renderer::LoadShader(ShaderStage::CS, aoApplyShader_, "voxel/voxelAOApplyCS.cso");
if (shadowShader_.IsValid() && aoBlurShader_.IsValid() && aoApplyShader_.IsValid()) {
shadowsEnabled_ = true;
wi::backlog::post("VoxelRTManager: RT shadows + AO blur available");
} else {
wi::backlog::post("VoxelRTManager: RT shadow/AO shader(s) failed",
wi::backlog::LogLevel::Warning);
}
}
// ── BLAS extraction: blocky quads → float3 positions ────────────
void VoxelRTManager::dispatchBLASExtract(CommandList cmd,
const GPUBuffer& quadBuffer,
const GPUBuffer& chunkInfoBuffer,
uint32_t quadCount) const
{
if (!available_ || !blasExtractShader_.IsValid() || quadCount == 0) return;
auto* dev = device_;
GPUBarrier preBarriers[] = {
GPUBarrier::Buffer(&blasPositionBuffer_,
ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(preBarriers, 1, cmd);
dev->BindComputeShader(&blasExtractShader_, cmd);
dev->BindResource(&quadBuffer, 0, cmd); // t0
dev->BindResource(&chunkInfoBuffer, 2, cmd); // t2
dev->BindUAV(&blasPositionBuffer_, 0, cmd); // u0
struct BLASPush {
uint32_t quadCount;
uint32_t pad[11];
} pushData = {};
pushData.quadCount = quadCount;
dev->PushConstants(&pushData, sizeof(pushData), cmd);
uint32_t groupCount = (quadCount + 63) / 64;
dev->Dispatch(groupCount, 1, 1, cmd);
GPUBarrier postBarriers[] = {
GPUBarrier::Buffer(&blasPositionBuffer_,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postBarriers, 1, cmd);
blockyVertexCount_ = quadCount * 6;
}
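Review note: two conventions recur in the dispatch functions of this file: the thread-group count is a ceiling division by 64, and every push-constant struct is padded to twelve 32-bit words (48 bytes). The sketch below pins both down at compile time. `groupCountFor` and `BLASPushSketch` are hypothetical names; the group size of 64 is assumed to match the shader's thread-group size, which this diff does not show, and the 48-byte size is assumed to be intentional.

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division: the smallest group count such that
// groups * groupSize >= items.
constexpr uint32_t groupCountFor(uint32_t items, uint32_t groupSize) {
    return (items + groupSize - 1) / groupSize;
}

// Mirrors the BLASPush layout: one payload word padded to 12 words.
struct BLASPushSketch {
    uint32_t quadCount;
    uint32_t pad[11];
};

// Fails the build if someone adds a field without shrinking the padding.
static_assert(sizeof(BLASPushSketch) == 48,
              "push block is expected to stay at 48 bytes");
static_assert(groupCountFor(65, 64) == 2,
              "a partial group still needs a full dispatch");
```

A `static_assert` next to each push struct would catch a field being added without the pad array being adjusted, which the compiler otherwise accepts silently.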
// ── Toping BLAS extraction (GPU compute) ────────────────────────
void VoxelRTManager::dispatchTopingBLASExtract(CommandList cmd,
const GPUBuffer& topingVertexBuffer,
const GPUBuffer& topingInstanceBuffer,
const void* groupsGPUData, size_t groupsGPUSize,
uint32_t groupCount, uint32_t totalVertices) const
{
if (!topingBLASShader_.IsValid() || !topingBLASGroupBuffer_.IsValid() ||
!topingBLASPositionBuf_.isValid() || !topingVertexBuffer.IsValid() ||
!topingInstanceBuffer.IsValid() || totalVertices == 0 || groupCount == 0)
return;
auto* dev = device_;
// Upload group table
dev->UpdateBuffer(&topingBLASGroupBuffer_, groupsGPUData, cmd, groupsGPUSize);
GPUBarrier preBarriers[] = {
GPUBarrier::Buffer(&topingBLASGroupBuffer_,
ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(preBarriers, 2, cmd);
dev->BindComputeShader(&topingBLASShader_, cmd);
dev->BindResource(&topingVertexBuffer, 4, cmd); // t4
dev->BindResource(&topingInstanceBuffer, 5, cmd); // t5
dev->BindResource(&topingBLASGroupBuffer_, 7, cmd); // t7
dev->BindUAV(&topingBLASPositionBuf_.gpu, 0, cmd); // u0
struct {
uint32_t totalVertices;
uint32_t groupCount;
uint32_t pad[10];
} pushData = {};
pushData.totalVertices = totalVertices;
pushData.groupCount = groupCount;
dev->PushConstants(&pushData, sizeof(pushData), cmd);
uint32_t threadGroups = (totalVertices + 63) / 64;
dev->Dispatch(threadGroups, 1, 1, cmd);
GPUBarrier postBarriers[] = {
GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postBarriers, 1, cmd);
topingVertexCount_ = totalVertices;
dirty = true;
topingBLASDirty = false;
}
// ── Ensure toping BLAS buffer capacity ──────────────────────────
bool VoxelRTManager::ensureTopingBLASCapacity(uint32_t totalVertices) {
if (totalVertices == 0) return false;
bool recreated = topingBLASPositionBuf_.ensureCapacity(device_, totalVertices,
3 * sizeof(float),
BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE,
ResourceMiscFlag::BUFFER_RAW);
if (recreated) {
char msg[256];
snprintf(msg, sizeof(msg), "VoxelRTManager: toping BLAS pos buffer (%u capacity, %.1f MB)",
topingBLASPositionBuf_.capacity,
(size_t)topingBLASPositionBuf_.capacity * 3 * sizeof(float) / (1024.0 * 1024.0));
wi::backlog::post(msg);
}
// Index buffer: grow if needed
if (topingBLASIndexCount_ < topingBLASPositionBuf_.capacity) {
uint32_t idxCount = topingBLASPositionBuf_.capacity;
std::vector<uint32_t> indices(idxCount);
for (uint32_t j = 0; j < idxCount; j++) indices[j] = j;
GPUBufferDesc idxDesc;
idxDesc.size = (size_t)idxCount * sizeof(uint32_t);
idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
idxDesc.misc_flags = ResourceMiscFlag::NONE;
idxDesc.usage = Usage::DEFAULT;
device_->CreateBuffer(&idxDesc, indices.data(), &topingBLASIndexBuffer_);
topingBLASIndexCount_ = idxCount;
recreated = true;
}
topingBLASDirty = true;
return recreated;
}
// ── Acceleration structure build ────────────────────────────────
void VoxelRTManager::buildAccelerationStructures(CommandList cmd,
uint32_t buildFlags,
const GPUBuffer& smoothVB,
uint32_t smoothVertCount) const
{
if (!available_) return;
auto* dev = device_;
// ── Blocky BLAS ──
uint32_t blockyVertCount = blockyVertexCount_;
if (blockyVertCount < 3) blockyVertCount = 0;
if ((buildFlags & BUILD_BLOCKY) && blockyVertCount > 0 && blasPositionBuffer_.IsValid()) {
if (!blockyBLAS_.IsValid() || blockyVertCount > blockyBLASCapacity_) {
blockyBLASCapacity_ = blockyVertCount + blockyVertCount / 4;
RaytracingAccelerationStructureDesc desc;
desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.bottom_level.geometries.resize(1);
auto& geom = desc.bottom_level.geometries[0];
geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
geom.triangles.vertex_buffer = blasPositionBuffer_;
geom.triangles.vertex_byte_offset = 0;
geom.triangles.vertex_count = blockyBLASCapacity_;
geom.triangles.vertex_stride = sizeof(float) * 3;
geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
geom.triangles.index_buffer = blasIndexBuffer_;
geom.triangles.index_count = blockyBLASCapacity_;
geom.triangles.index_format = IndexBufferFormat::UINT32;
geom.triangles.index_offset = 0;
bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &blockyBLAS_);
if (ok) {
dev->SetName(&blockyBLAS_, "VoxelRTManager::blockyBLAS");
wi::backlog::post("VoxelRTManager: blocky BLAS created (capacity "
+ std::to_string(blockyBLASCapacity_ / 3) + " tris)");
} else {
wi::backlog::post("VoxelRTManager: failed to create blocky BLAS", wi::backlog::LogLevel::Error);
available_ = false;
return;
}
}
blockyBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = blockyVertCount;
blockyBLAS_.desc.bottom_level.geometries[0].triangles.index_count = blockyVertCount;
dev->BuildRaytracingAccelerationStructure(&blockyBLAS_, cmd, nullptr);
}
// ── Smooth BLAS ──
if (smoothVertCount < 3) smoothVertCount = 0;
if ((buildFlags & BUILD_SMOOTH) && smoothVertCount > 0 && smoothVB.IsValid()) {
if (!smoothBLAS_.IsValid() || smoothVertCount > smoothBLASCapacity_) {
smoothBLASCapacity_ = smoothVertCount + smoothVertCount / 4;
RaytracingAccelerationStructureDesc desc;
desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.bottom_level.geometries.resize(1);
auto& geom = desc.bottom_level.geometries[0];
geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
geom.triangles.vertex_buffer = smoothVB;
geom.triangles.vertex_byte_offset = 0;
geom.triangles.vertex_count = smoothBLASCapacity_;
geom.triangles.vertex_stride = 32;
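// Reuses the sequential blocky index buffer, which holds maxBlasVertices_ entries;
// smoothBLASCapacity_ must not exceed that count.
geom.triangles.index_buffer = blasIndexBuffer_;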
geom.triangles.index_count = smoothBLASCapacity_;
geom.triangles.index_format = IndexBufferFormat::UINT32;
geom.triangles.index_offset = 0;
geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &smoothBLAS_);
if (ok) {
dev->SetName(&smoothBLAS_, "VoxelRTManager::smoothBLAS");
wi::backlog::post("VoxelRTManager: smooth BLAS created (capacity "
+ std::to_string(smoothBLASCapacity_ / 3) + " tris)");
} else {
wi::backlog::post("VoxelRTManager: failed to create smooth BLAS", wi::backlog::LogLevel::Error);
}
}
if (smoothBLAS_.IsValid()) {
smoothBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = smoothVertCount;
smoothBLAS_.desc.bottom_level.geometries[0].triangles.index_count = smoothVertCount;
dev->BuildRaytracingAccelerationStructure(&smoothBLAS_, cmd, nullptr);
}
smoothVertexCount_ = smoothVertCount;
}
// ── Toping BLAS ──
uint32_t topingVertCount = topingVertexCount_;
if ((buildFlags & BUILD_TOPING) && topingVertCount >= 3 && topingBLASPositionBuf_.isValid()) {
if (!topingBLAS_.IsValid() || topingVertCount > topingBLASASCapacity_) {
topingBLASASCapacity_ = topingVertCount + topingVertCount / 4;
RaytracingAccelerationStructureDesc desc;
desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.bottom_level.geometries.resize(1);
auto& geom = desc.bottom_level.geometries[0];
geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
geom.triangles.vertex_buffer = topingBLASPositionBuf_.gpu;
geom.triangles.vertex_byte_offset = 0;
geom.triangles.vertex_count = topingBLASASCapacity_;
geom.triangles.vertex_stride = sizeof(float) * 3;
geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
geom.triangles.index_buffer = topingBLASIndexBuffer_;
geom.triangles.index_count = topingBLASASCapacity_;
geom.triangles.index_format = IndexBufferFormat::UINT32;
geom.triangles.index_offset = 0;
bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &topingBLAS_);
if (ok) {
dev->SetName(&topingBLAS_, "VoxelRTManager::topingBLAS");
wi::backlog::post("VoxelRTManager: toping BLAS created (capacity "
+ std::to_string(topingBLASASCapacity_ / 3) + " tris)");
} else {
wi::backlog::post("VoxelRTManager: failed to create toping BLAS", wi::backlog::LogLevel::Error);
}
}
if (topingBLAS_.IsValid()) {
topingBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = topingVertCount;
topingBLAS_.desc.bottom_level.geometries[0].triangles.index_count = topingVertCount;
dev->BuildRaytracingAccelerationStructure(&topingBLAS_, cmd, nullptr);
}
}
// Memory barrier: sync BLAS builds before TLAS
{
GPUBarrier barriers[] = { GPUBarrier::Memory() };
dev->Barrier(barriers, 1, cmd);
}
// ── TLAS ──
uint32_t instanceCount = 0;
if (blockyBLAS_.IsValid()) instanceCount++;
if (smoothBLAS_.IsValid() && smoothVertCount > 0) instanceCount++;
if (topingBLAS_.IsValid() && topingVertCount >= 3) instanceCount++;
if (instanceCount == 0) { dirty = false; return; }
if (!tlas_.IsValid() || instanceCount != tlasInstanceCount_) {
const size_t instSize = dev->GetTopLevelAccelerationStructureInstanceSize();
auto setIdentity = [](float transform[3][4]) {
std::memset(transform, 0, sizeof(float) * 12);
transform[0][0] = 1.0f;
transform[1][1] = 1.0f;
transform[2][2] = 1.0f;
};
const RaytracingAccelerationStructure* blockyPtr = blockyBLAS_.IsValid() ? &blockyBLAS_ : nullptr;
const RaytracingAccelerationStructure* smoothPtr = (smoothBLAS_.IsValid() && smoothVertCount > 0) ? &smoothBLAS_ : nullptr;
const RaytracingAccelerationStructure* topingPtr = (topingBLAS_.IsValid() && topingVertCount >= 3) ? &topingBLAS_ : nullptr;
RaytracingAccelerationStructureDesc desc;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.type = RaytracingAccelerationStructureDesc::Type::TOPLEVEL;
desc.top_level.count = instanceCount;
GPUBufferDesc bufdesc;
bufdesc.misc_flags = ResourceMiscFlag::RAY_TRACING;
bufdesc.stride = (uint32_t)instSize;
bufdesc.size = bufdesc.stride * desc.top_level.count;
auto initInstances = [&](void* dest) {
uint32_t idx = 0;
auto addInstance = [&](const RaytracingAccelerationStructure* blas, uint32_t id) {
if (!blas) return;
RaytracingAccelerationStructureDesc::TopLevel::Instance inst;
setIdentity(inst.transform);
inst.instance_id = id; inst.instance_mask = 0xFF;
inst.instance_contribution_to_hit_group_index = 0; inst.flags = 0;
inst.bottom_level = blas;
dev->WriteTopLevelAccelerationStructureInstance(&inst, (uint8_t*)dest + idx * instSize);
idx++;
};
addInstance(blockyPtr, 0);
addInstance(smoothPtr, 1);
addInstance(topingPtr, 2);
};
bool ok = dev->CreateBuffer2(&bufdesc, initInstances, &desc.top_level.instance_buffer);
if (!ok) {
wi::backlog::post("VoxelRTManager: failed to create TLAS instance buffer", wi::backlog::LogLevel::Error);
dirty = false;
return;
}
ok = dev->CreateRaytracingAccelerationStructure(&desc, &tlas_);
if (!ok) {
wi::backlog::post("VoxelRTManager: failed to create TLAS", wi::backlog::LogLevel::Error);
dirty = false;
return;
}
tlasInstanceCount_ = instanceCount;
wi::backlog::post("VoxelRTManager: TLAS created (" + std::to_string(instanceCount) + " instances)");
}
dev->BuildRaytracingAccelerationStructure(&tlas_, cmd, nullptr);
{
GPUBarrier barriers[] = { GPUBarrier::Memory(&tlas_) };
dev->Barrier(barriers, 1, cmd);
}
dirty = false;
}
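Review note: all three BLAS branches above share one sizing rule: recreate the acceleration structure only when the vertex count exceeds the stored capacity, and then reserve 25% headroom. A minimal sketch of that rule in isolation, using a hypothetical `GrowableCapacity` helper that is not part of the codebase:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper capturing the "count + count / 4" growth rule.
struct GrowableCapacity {
    uint32_t capacity = 0;  // 0 means "nothing allocated yet"

    // Returns true when the backing object has to be (re)created.
    bool reserve(uint32_t needed) {
        if (needed == 0) return false;         // nothing to build
        if (needed <= capacity) return false;  // still fits
        capacity = needed + needed / 4;        // 25% headroom
        return true;
    }
};
```

The headroom trades some memory for fewer recreations while the mesh grows slowly. In the smooth branch the padded capacity also becomes `index_count` against `blasIndexBuffer_`, so the headroom counts toward that buffer's size.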
// ── RT Shadow + AO dispatch ─────────────────────────────────────
void VoxelRTManager::dispatchShadows(CommandList cmd,
const Texture& depthBuffer,
const Texture& renderTarget,
const Texture& normalTarget,
const GPUBuffer& constantBuffer) const
{
if (!shadowsEnabled_ || !shadowShader_.IsValid() || !tlas_.IsValid())
return;
auto* dev = device_;
uint32_t w = renderTarget.GetDesc().width;
uint32_t h = renderTarget.GetDesc().height;
uint32_t gx = (w + 7) / 8;
uint32_t gy = (h + 7) / 8;
// Pass 1: Shadow + raw AO
{
GPUBarrier preBarriers[] = {
GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
ResourceState::DEPTHSTENCIL, ResourceState::SHADER_RESOURCE),
GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
GPUBarrier::Image(&aoRawTexture,
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(preBarriers, 3, cmd);
dev->BindComputeShader(&shadowShader_, cmd);
dev->BindResource(&depthBuffer, 0, cmd);
dev->BindResource(&normalTarget, 1, cmd);
dev->BindResource(&tlas_, 2, cmd);
dev->BindResource(&aoHistoryTexture, 3, cmd);
dev->BindUAV(&renderTarget, 0, cmd);
dev->BindUAV(&aoRawTexture, 1, cmd);
dev->BindConstantBuffer(&constantBuffer, 0, cmd);
struct ShadowPush {
uint32_t width, height;
float normalBias, shadowMaxDist;
uint32_t debugMode;
float aoRadius;
uint32_t aoRayCount;
float aoStrength;
uint32_t frameIndex;
uint32_t historyValid;
uint32_t pad[2];
} pushData = {};
pushData.width = w;
pushData.height = h;
pushData.normalBias = 0.15f;
pushData.shadowMaxDist = 512.0f;
pushData.debugMode = shadowDebug_;
pushData.aoRadius = 8.0f;
pushData.aoRayCount = 4;
pushData.aoStrength = 0.7f;
pushData.frameIndex = frameCounter++;
pushData.historyValid = aoHistoryValid ? 1u : 0u;
dev->PushConstants(&pushData, sizeof(pushData), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Pass 1.5: Copy raw AO → history
{
GPUBarrier copyBarriers[] = {
GPUBarrier::Image(&aoRawTexture,
ResourceState::UNORDERED_ACCESS, ResourceState::COPY_SRC),
GPUBarrier::Image(&aoHistoryTexture,
ResourceState::SHADER_RESOURCE, ResourceState::COPY_DST),
};
dev->Barrier(copyBarriers, 2, cmd);
dev->CopyResource(&aoHistoryTexture, &aoRawTexture, cmd);
GPUBarrier postCopyBarriers[] = {
GPUBarrier::Image(&aoRawTexture,
ResourceState::COPY_SRC, ResourceState::SHADER_RESOURCE),
GPUBarrier::Image(&aoHistoryTexture,
ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postCopyBarriers, 2, cmd);
aoHistoryValid = true;
}
// Pass 2: Bilateral blur horizontal
{
GPUBarrier barriers[] = {
GPUBarrier::Image(&aoBlurredTexture,
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(barriers, 1, cmd);
dev->BindComputeShader(&aoBlurShader_, cmd);
dev->BindResource(&aoRawTexture, 0, cmd);
dev->BindResource(&depthBuffer, 1, cmd);
dev->BindResource(&normalTarget, 2, cmd);
dev->BindUAV(&aoBlurredTexture, 0, cmd);
struct BlurPush {
uint32_t width, height, direction, radius;
float depthThreshold, normalThreshold;
uint32_t pad[6];
} blurPush = {};
blurPush.width = w; blurPush.height = h;
blurPush.direction = 0; blurPush.radius = 6;
blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Pass 3: Bilateral blur vertical
{
GPUBarrier barriers[] = {
GPUBarrier::Image(&aoBlurredTexture,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
GPUBarrier::Image(&aoRawTexture,
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(barriers, 2, cmd);
dev->BindComputeShader(&aoBlurShader_, cmd);
dev->BindResource(&aoBlurredTexture, 0, cmd);
dev->BindResource(&depthBuffer, 1, cmd);
dev->BindResource(&normalTarget, 2, cmd);
dev->BindUAV(&aoRawTexture, 0, cmd);
struct BlurPush {
uint32_t width, height, direction, radius;
float depthThreshold, normalThreshold;
uint32_t pad[6];
} blurPush = {};
blurPush.width = w; blurPush.height = h;
blurPush.direction = 1; blurPush.radius = 6;
blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Pass 4: Apply blurred AO
{
GPUBarrier barriers[] = {
GPUBarrier::Image(&aoRawTexture,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(barriers, 1, cmd);
dev->BindComputeShader(&aoApplyShader_, cmd);
dev->BindResource(&aoRawTexture, 0, cmd);
dev->BindResource(&depthBuffer, 1, cmd);
dev->BindUAV(&renderTarget, 0, cmd);
struct ApplyPush {
uint32_t width, height, debugMode;
uint32_t pad[9];
} applyPush = {};
applyPush.width = w; applyPush.height = h;
applyPush.debugMode = shadowDebug_;
dev->PushConstants(&applyPush, sizeof(applyPush), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Restore resource states
GPUBarrier postBarriers[] = {
GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
ResourceState::SHADER_RESOURCE, ResourceState::DEPTHSTENCIL),
GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postBarriers, 2, cmd);
}
} // namespace voxel
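Review note: passes 2 and 3 of `dispatchShadows` run the same blur shader twice, with `direction` 0 and then 1, ping-ponging between the raw and blurred AO textures. The shader source is not part of this diff, so the following is only a CPU sketch of what one such pass might look like: it assumes a box kernel and a depth-difference rejection test, and it omits the normal test entirely. `blurPass` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One 1D pass of a depth-aware box blur over a w x h image.
// direction 0 = horizontal, 1 = vertical (as in the BlurPush block).
// Neighbours whose depth differs from the centre pixel by more than
// depthThreshold are skipped, which keeps AO from bleeding across edges.
std::vector<float> blurPass(const std::vector<float>& ao,
                            const std::vector<float>& depth,
                            int w, int h, int direction, int radius,
                            float depthThreshold) {
    std::vector<float> out(ao.size());
    const int dx = (direction == 0) ? 1 : 0;
    const int dy = (direction == 0) ? 0 : 1;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const int c = y * w + x;
            float sum = 0.0f;
            int n = 0;
            for (int k = -radius; k <= radius; ++k) {
                const int sx = x + k * dx;
                const int sy = y + k * dy;
                if (sx < 0 || sx >= w || sy < 0 || sy >= h) continue;
                const int s = sy * w + sx;
                if (std::fabs(depth[s] - depth[c]) > depthThreshold) continue;
                sum += ao[s];
                ++n;
            }
            out[c] = sum / n;  // n >= 1: the centre pixel always passes
        }
    }
    return out;
}
```

Running the horizontal pass and feeding its output to the vertical pass approximates a 2D blur at roughly 2 × (2r + 1) taps per pixel instead of (2r + 1)², which is the usual reason for splitting the blur into two dispatches.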

src/voxel/VoxelRTManager.h

@ -0,0 +1,124 @@
#pragma once
#include "DeferredGPUBuffer.h"
#include "WickedEngine.h"
namespace voxel {
// ── Ray Tracing Manager (Phase 6) ──────────────────────────────
// Groups all RT state: BLAS/TLAS management, shadow/AO dispatches.
// Extracted from VoxelRenderer to isolate the ~500 lines of RT code
// and its 20+ members for easier debugging and maintenance.
class VoxelRTManager {
public:
// ── Initialization ──────────────────────────────────────────
void initialize(wi::graphics::GraphicsDevice* device, uint32_t maxBlasVertices);
// ── BLAS extraction (compute shaders) ───────────────────────
// Extract blocky quad positions into BLAS vertex buffer.
void dispatchBLASExtract(wi::graphics::CommandList cmd,
const wi::graphics::GPUBuffer& quadBuffer,
const wi::graphics::GPUBuffer& chunkInfoBuffer,
uint32_t quadCount) const;
// Extract toping instance positions via GPU compute.
// groupsGPUData/groupsGPUSize: CPU-side toping BLAS group table and its size in bytes.
void dispatchTopingBLASExtract(wi::graphics::CommandList cmd,
const wi::graphics::GPUBuffer& topingVertexBuffer,
const wi::graphics::GPUBuffer& topingInstanceBuffer,
const void* groupsGPUData, size_t groupsGPUSize,
uint32_t groupCount, uint32_t totalVertices) const;
// ── Acceleration structure build ────────────────────────────
static constexpr uint32_t BUILD_BLOCKY = 1 << 0;
static constexpr uint32_t BUILD_SMOOTH = 1 << 1;
static constexpr uint32_t BUILD_TOPING = 1 << 2;
static constexpr uint32_t BUILD_ALL = BUILD_BLOCKY | BUILD_SMOOTH | BUILD_TOPING;
void buildAccelerationStructures(wi::graphics::CommandList cmd,
uint32_t buildFlags,
const wi::graphics::GPUBuffer& smoothVB,
uint32_t smoothVertCount) const;
// ── RT Shadows + AO dispatch ────────────────────────────────
void dispatchShadows(wi::graphics::CommandList cmd,
const wi::graphics::Texture& depthBuffer,
const wi::graphics::Texture& renderTarget,
const wi::graphics::Texture& normalTarget,
const wi::graphics::GPUBuffer& constantBuffer) const;
// ── Toping BLAS buffer management ───────────────────────────
// Ensure capacity for toping BLAS position + index buffers.
// Returns true if buffers were (re)created.
bool ensureTopingBLASCapacity(uint32_t totalVertices);
// ── State queries ───────────────────────────────────────────
bool isAvailable() const { return available_; }
bool isReady() const { return available_ && tlas_.IsValid(); }
bool isShadowsEnabled() const { return shadowsEnabled_; }
void setShadowsEnabled(bool v) { shadowsEnabled_ = v; }
uint32_t getShadowDebug() const { return shadowDebug_; }
void setShadowDebug(uint32_t v) { shadowDebug_ = v; }
uint32_t getBlockyTriCount() const { return blockyVertexCount_ / 3; }
uint32_t getSmoothTriCount() const { return smoothVertexCount_ / 3; }
uint32_t getTopingTriCount() const { return topingVertexCount_ / 3; }
uint32_t getTopingVertexCount() const { return topingVertexCount_; }
uint32_t getTlasInstanceCount() const { return tlasInstanceCount_; }
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
// Dirty flags (public for VoxelRenderPath orchestration)
mutable bool dirty = true; // BLAS/TLAS need rebuild
mutable bool topingBLASDirty = false; // toping BLAS extract + rebuild needed
mutable bool aoHistoryValid = false;
mutable uint32_t frameCounter = 0;
mutable XMFLOAT4X4 prevViewProjection;
// AO textures (created by VoxelRenderPath::createRenderTargets)
mutable wi::graphics::Texture aoRawTexture;
mutable wi::graphics::Texture aoBlurredTexture;
mutable wi::graphics::Texture aoHistoryTexture;
private:
wi::graphics::GraphicsDevice* device_ = nullptr;
mutable bool available_ = false;
mutable bool shadowsEnabled_ = false;
mutable uint32_t shadowDebug_ = 0;
// Shaders
wi::graphics::Shader blasExtractShader_;
wi::graphics::Shader topingBLASShader_;
wi::graphics::Shader shadowShader_;
wi::graphics::Shader aoBlurShader_;
wi::graphics::Shader aoApplyShader_;
// Blocky BLAS resources
mutable wi::graphics::GPUBuffer blasPositionBuffer_;
wi::graphics::GPUBuffer blasIndexBuffer_;
mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
mutable uint32_t blockyBLASCapacity_ = 0;
mutable uint32_t blockyVertexCount_ = 0;
// Smooth BLAS
mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
mutable uint32_t smoothBLASCapacity_ = 0;
mutable uint32_t smoothVertexCount_ = 0;
// Toping BLAS
mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
mutable uint32_t topingBLASASCapacity_ = 0;
mutable uint32_t topingVertexCount_ = 0;
mutable DeferredGPUBuffer topingBLASPositionBuf_;
mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_;
mutable uint32_t topingBLASIndexCount_ = 0;
wi::graphics::GPUBuffer topingBLASGroupBuffer_;
// TLAS
mutable wi::graphics::RaytracingAccelerationStructure tlas_;
mutable uint32_t tlasInstanceCount_ = 0;
uint32_t maxBlasVertices_ = 0;
};
} // namespace voxel
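Review note: the `BUILD_*` constants form a bitmask, so a caller can rebuild only the BLAS categories that changed. The sketch below redeclares the constants so that it compiles on its own; `selectBuildFlags` and its three boolean inputs are hypothetical, and how the render path actually tracks per-category changes is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Same values as VoxelRTManager::BUILD_*, redeclared for a standalone sketch.
constexpr uint32_t BUILD_BLOCKY = 1u << 0;
constexpr uint32_t BUILD_SMOOTH = 1u << 1;
constexpr uint32_t BUILD_TOPING = 1u << 2;
constexpr uint32_t BUILD_ALL = BUILD_BLOCKY | BUILD_SMOOTH | BUILD_TOPING;

// Hypothetical helper: fold per-category change flags into one mask.
constexpr uint32_t selectBuildFlags(bool blockyChanged,
                                    bool smoothChanged,
                                    bool topingChanged) {
    return (blockyChanged ? BUILD_BLOCKY : 0u) |
           (smoothChanged ? BUILD_SMOOTH : 0u) |
           (topingChanged ? BUILD_TOPING : 0u);
}
```

Passing the resulting mask as `buildFlags` skips the BLAS builds whose bit is clear; the TLAS build at the end of `buildAccelerationStructures` still runs whenever at least one BLAS is valid.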

File diff suppressed because it is too large


@ -2,6 +2,8 @@
#include "VoxelWorld.h"
#include "VoxelMesher.h"
#include "TopingSystem.h"
#include "DeferredGPUBuffer.h"
#include "VoxelRTManager.h"
#include "WickedEngine.h"
namespace voxel {
@ -27,7 +29,7 @@ struct GPUChunkInfo {
uint32_t pad2[2]; // pad to 112 bytes (7 × float4)
};
// ── Voxel Renderer (Phase 2: mega-buffer + MDI pipeline) ────────
// ── Voxel Renderer (GPU mesh pipeline) ──────────────────────────
class VoxelRenderer {
friend class VoxelRenderPath;
public:
@ -49,8 +51,8 @@ public:
const wi::graphics::Texture& normalTarget
) const;
// Generate procedural textures for materials
void generateTextures();
// Load material textures from PNG files (RGB=albedo, A=heightmap)
void loadTextures();
// Stats
uint32_t getTotalQuads() const { return totalQuads_; }
@ -58,16 +60,16 @@ public:
uint32_t getDrawCalls() const { return drawCalls_; }
uint32_t getChunkCount() const { return chunkCount_; }
bool isInitialized() const { return initialized_; }
bool isGpuCulling() const { return gpuCullingEnabled_; }
bool isMdiEnabled() const { return mdiEnabled_; }
bool debugFaceColors_ = false;
bool debugBlend_ = false;
float windTime_ = 0.0f; // set by VoxelRenderPath::Update each frame
float normalStrength_ = 0.7f; // normal map strength (0=off)
int debugLighting_ = 0; // 0=all, 1=no nmap, 2=flat, 3=albedo, 4=NdotL
XMFLOAT4 sunDirection_ = { -0.7f, -0.4f, -0.3f, 0.0f }; // set by VoxelRenderPath::Update
private:
void createPipeline();
void rebuildMegaBuffer(VoxelWorld& world);
wi::graphics::GraphicsDevice* device_ = nullptr;
@ -75,16 +77,12 @@ private:
wi::graphics::Shader vertexShader_;
wi::graphics::Shader pixelShader_;
wi::graphics::PipelineState pso_;
wi::graphics::Shader cullShader_; // Frustum cull compute shader
// Shaders & Pipeline (topings, Phase 4)
wi::graphics::Shader topingVS_;
wi::graphics::Shader topingPS_;
wi::graphics::PipelineState topingPso_;
wi::graphics::GPUBuffer topingVertexBuffer_; // StructuredBuffer<TopingVertex>, SRV t4
wi::graphics::GPUBuffer topingInstanceBuffer_; // StructuredBuffer<float3>, SRV t5
mutable uint32_t topingInstanceCapacity_ = 0; // pre-allocated capacity (avoid per-frame CreateBuffer)
mutable bool topingInstanceDirty_ = false; // deferred upload via UpdateBuffer in Render()
DeferredGPUBuffer topingInstanceBuf_; // StructuredBuffer<float3>, SRV t5
static constexpr uint32_t MAX_TOPING_INSTANCES = 256 * 1024; // 256K instances max
// Persistent staging buffers for toping upload (avoids per-frame allocations)
struct TopingSortedInst { float wx, wy, wz; uint16_t type, variant; };
@ -93,30 +91,41 @@ private:
std::vector<TopingGPUInst> topingGpuInsts_;
mutable uint32_t topingDrawCalls_ = 0;
// ── Toping draw groups (shared between render + BLAS CS) ─────
struct TopingDrawGroup {
uint16_t type, variant;
uint32_t instanceOffset, instanceCount;
uint32_t vertexTemplateOffset, vertexCount; // from TopingDef::variants[]
};
std::vector<TopingDrawGroup> topingDrawGroups_; // built in uploadTopingData, reused in renderTopings
// ── Toping BLAS group staging (passed to VoxelRTManager) ──────
struct TopingBLASGroupGPU {
uint32_t globalVertexOffset; // prefix sum of total vertices before this group
uint32_t vertexTemplateOffset; // offset into topingVertices (t4)
uint32_t vertexCount; // vertices per instance
uint32_t instanceOffset; // offset into topingInstances (t5)
uint32_t instanceCount; // instances in this group
};
std::vector<TopingBLASGroupGPU> topingBLASGroupsGPU_; // CPU staging for group table
mutable uint32_t topingBLASTotalVertices_ = 0;
// Shaders & Pipeline (smooth surfaces, Phase 5)
wi::graphics::Shader smoothVS_;
wi::graphics::Shader smoothPS_;
wi::graphics::RasterizerState smoothRasterizer_;
wi::graphics::PipelineState smoothPso_;
wi::graphics::GPUBuffer smoothVertexBuffer_; // StructuredBuffer<SmoothVertex>, SRV t6
mutable uint32_t smoothVertexCapacity_ = 0; // pre-allocated capacity (avoid per-frame CreateBuffer)
std::vector<SmoothVertex> smoothStagingVerts_; // persistent staging buffer (avoids per-frame alloc)
static constexpr uint32_t MAX_SMOOTH_VERTICES = 4 * 1024 * 1024; // 4M vertices max
mutable uint32_t smoothVertexCount_ = 0;
mutable uint32_t smoothDrawCalls_ = 0;
mutable bool smoothVertexDirty_ = false; // deferred upload via UpdateBuffer in Render()
bool smoothDirty_ = true;
// Texture array for materials (256x256, 5 layers for prototype)
wi::graphics::Texture textureArray_;
// Texture arrays for materials (512x512, 6 layers each)
wi::graphics::Texture textureArray_; // RGBA: RGB=albedo, A=heightmap (t1)
wi::graphics::Texture normalArray_; // RGB: tangent-space normal map (t7)
wi::graphics::Sampler sampler_;
// ── Mega-buffer architecture (Phase 2) ──────────────────────
static constexpr uint32_t MEGA_BUFFER_CAPACITY = 2 * 1024 * 1024; // 2M quads max (16 MB)
static constexpr uint32_t MAX_CHUNKS = 2048;
static constexpr uint32_t MAX_DRAWS = MAX_CHUNKS * 6; // up to 6 face groups per chunk
wi::graphics::GPUBuffer megaQuadBuffer_; // StructuredBuffer<PackedQuad>, SRV t0
wi::graphics::GPUBuffer chunkInfoBuffer_; // StructuredBuffer<GPUChunkInfo>, SRV t2
// CPU-side tracking
@ -127,27 +136,9 @@ private:
};
std::vector<ChunkSlot> chunkSlots_;
std::vector<GPUChunkInfo> cpuChunkInfo_;
std::vector<PackedQuad> cpuMegaQuads_; // CPU staging for mega-buffer
uint32_t chunkCount_ = 0;
bool megaBufferDirty_ = true;
// ── Indirect draw (Phase 2 MDI) ─────────────────────────────
// Wicked Engine's DrawInstancedIndirectCount command signature includes a
// push constant (1 × uint32 at b999) BEFORE each D3D12_DRAW_ARGUMENTS.
// Total stride = 4 + 16 = 20 bytes per draw entry.
struct IndirectDrawArgs {
uint32_t pushConstant; // written to b999[0] by ExecuteIndirect
uint32_t vertexCountPerInstance;
uint32_t instanceCount;
uint32_t startVertexLocation;
uint32_t startInstanceLocation;
};
wi::graphics::GPUBuffer indirectArgsBuffer_; // IndirectDrawArgs[MAX_DRAWS]
wi::graphics::GPUBuffer drawCountBuffer_; // uint32_t[1]
mutable std::vector<IndirectDrawArgs> cpuIndirectArgs_;
bool gpuCullingEnabled_ = true; // Phase 2.3: GPU compute cull (true) vs CPU fallback (false)
bool mdiEnabled_ = true; // Phase 2.2: MDI rendering with CPU-filled indirect args
// Constants buffer (must match HLSL VoxelCB)
struct VoxelConstants {
XMFLOAT4X4 viewProjection;
@ -184,7 +175,6 @@ private:
wi::graphics::GPUBuffer gpuQuadCounter_; // atomic counter for GPU mesh output
wi::graphics::GPUBuffer meshCounterReadback_; // READBACK buffer for quad counter
bool gpuMesherAvailable_ = false;
bool gpuMeshEnabled_ = true; // Use GPU meshing instead of CPU greedy
mutable uint32_t gpuMeshQuadCount_ = 0; // Readback from previous frame (1-frame delay)
mutable uint32_t voxelDataCapacity_ = 0; // Current capacity of voxelDataBuffer_ (in uint32s)
mutable std::vector<uint32_t> packedVoxelCache_; // cached packed voxel data for all chunks
@ -204,81 +194,39 @@ private:
mutable uint32_t gpuSmoothVertexCount_ = 0; // readback from previous frame
mutable bool gpuSmoothMeshDirty_ = true;
// ── Ray Tracing (Phase 6.1) ─────────────────────────────────────
wi::graphics::Shader blasExtractShader_; // voxelBLASExtractCS compute shader
mutable wi::graphics::GPUBuffer blasPositionBuffer_; // float3[] for blocky BLAS (6 verts per quad)
wi::graphics::GPUBuffer blasIndexBuffer_; // sequential uint32 indices [0,1,2,...] for BLAS
mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
mutable wi::graphics::RaytracingAccelerationStructure tlas_;
mutable wi::graphics::GPUBuffer topingBLASPositionBuffer_; // float3[] world-space toping positions
mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_; // sequential indices for toping BLAS
mutable uint32_t topingBLASPositionCapacity_ = 0; // pre-allocated capacity (vertices)
mutable uint32_t topingBLASIndexCount_ = 0; // size of toping index buffer
mutable bool topingBLASDirty_ = false; // deferred BLAS position upload + rebuild
mutable uint32_t topingBLASVertexCount_ = 0; // actual vertex count for current frame
std::vector<float> topingBLASPositionStaging_; // CPU staging for deferred upload
// ── Ray Tracing (Phase 6) ────────────────────────────────────────
static constexpr uint32_t MAX_BLAS_VERTICES = MEGA_BUFFER_CAPACITY * 6; // 6 verts per quad
mutable bool rtAvailable_ = false; // GPU supports RT
mutable bool rtDirty_ = true; // BLAS/TLAS need rebuild
mutable uint32_t rtBlockyVertexCount_ = 0; // current blocky BLAS vertex count
mutable uint32_t rtSmoothVertexCount_ = 0; // current smooth BLAS vertex count
mutable uint32_t rtTopingVertexCount_ = 0; // current toping BLAS vertex count
// BLAS capacity tracking: only recreate AS when vertex count exceeds capacity
mutable uint32_t blockyBLASCapacity_ = 0; // vertex count at BLAS creation
mutable uint32_t smoothBLASCapacity_ = 0;
mutable uint32_t topingBLASASCapacity_ = 0; // separate from topingBLASPositionCapacity_ (buffer capacity)
mutable uint32_t tlasInstanceCount_ = 0; // track TLAS instance count to avoid per-frame recreation
mutable VoxelRTManager rt_;
void dispatchBLASExtract(wi::graphics::CommandList cmd) const;
void buildAccelerationStructures(wi::graphics::CommandList cmd) const;
// ── RT Shadows + AO (Phase 6.2 + 6.3) ──────────────────────────
wi::graphics::Shader shadowShader_; // voxelShadowCS compute shader
wi::graphics::Shader aoBlurShader_; // voxelAOBlurCS compute shader
wi::graphics::Shader aoApplyShader_; // voxelAOApplyCS compute shader
mutable wi::graphics::Texture aoRawTexture_; // R8_UNORM: raw AO from shadow CS
mutable wi::graphics::Texture aoBlurredTexture_; // R8_UNORM: after bilateral blur
mutable wi::graphics::Texture aoHistoryTexture_; // R8_UNORM: previous frame's temporally accumulated AO
mutable XMFLOAT4X4 prevViewProjection_; // previous frame's VP matrix
mutable uint32_t frameCounter_ = 0;
mutable bool aoHistoryValid_ = false;
mutable bool rtShadowsEnabled_ = false; // true when shader + TLAS ready
mutable uint32_t rtShadowDebug_ = 0; // 0=off, 1=debug shadows, 2=debug AO
void dispatchShadows(wi::graphics::CommandList cmd,
const wi::graphics::Texture& depthBuffer,
const wi::graphics::Texture& renderTarget,
const wi::graphics::Texture& normalTarget) const;
// Benchmark state machine: runs once after world gen
enum class BenchState { IDLE, DISPATCH, READBACK, DONE };
mutable BenchState benchState_ = BenchState::IDLE;
mutable float cpuMeshTimeMs_ = 0.0f;
mutable uint32_t gpuBaselineQuads_ = 0;
void dispatchGpuMeshBenchmark(wi::graphics::CommandList cmd, const VoxelWorld& world) const;
void readbackGpuMeshBenchmark() const;
void dispatchGpuMesh(wi::graphics::CommandList cmd, const VoxelWorld& world,
ProfileAccum* profPack = nullptr, ProfileAccum* profUpload = nullptr,
ProfileAccum* profDispatch = nullptr) const;
void dispatchGpuSmoothMesh(wi::graphics::CommandList cmd, const VoxelWorld& world) const;
void rebuildChunkInfoOnly(VoxelWorld& world);
// ── GPU Timestamp Queries (Phase 2 benchmark) ────────────────
// ── GPU Timestamp Queries (comprehensive GPU profiling) ────────
wi::graphics::GPUQueryHeap timestampHeap_;
wi::graphics::GPUBuffer timestampReadback_;
static constexpr uint32_t TS_CULL_BEGIN = 0;
static constexpr uint32_t TS_CULL_END = 1;
static constexpr uint32_t TS_DRAW_BEGIN = 2;
static constexpr uint32_t TS_DRAW_END = 3;
static constexpr uint32_t TS_MESH_BEGIN = 4;
static constexpr uint32_t TS_MESH_END = 5;
static constexpr uint32_t TS_COUNT = 6;
mutable float gpuCullTimeMs_ = 0.0f;
mutable float gpuDrawTimeMs_ = 0.0f;
// Timestamp slots: pairs of (BEGIN, END) for each GPU phase
static constexpr uint32_t TS_GPU_MESH_BEGIN = 0;
static constexpr uint32_t TS_GPU_MESH_END = 1;
static constexpr uint32_t TS_GPU_SMOOTH_BEGIN = 2;
static constexpr uint32_t TS_GPU_SMOOTH_END = 3;
static constexpr uint32_t TS_BLAS_EXTRACT_BEGIN = 4;
static constexpr uint32_t TS_BLAS_EXTRACT_END = 5;
static constexpr uint32_t TS_BLAS_BUILD_BEGIN = 6;
static constexpr uint32_t TS_BLAS_BUILD_END = 7;
static constexpr uint32_t TS_DRAW_BEGIN = 8;
static constexpr uint32_t TS_DRAW_END = 9;
static constexpr uint32_t TS_RT_SHADOWS_BEGIN = 10;
static constexpr uint32_t TS_RT_SHADOWS_END = 11;
static constexpr uint32_t TS_COUNT = 12;
mutable float gpuMeshTimeMs_ = 0.0f;
mutable float gpuSmoothMeshTimeMs_ = 0.0f;
mutable float gpuBLASExtractTimeMs_ = 0.0f;
mutable float gpuBLASBuildTimeMs_ = 0.0f;
mutable float gpuDrawTimeMs_ = 0.0f;
mutable float gpuRTShadowsTimeMs_ = 0.0f;
// Stats (mutable: updated during const Render() call)
mutable uint32_t totalQuads_ = 0;
@ -288,10 +236,15 @@ private:
bool initialized_ = false;
public:
float getGpuCullTimeMs() const { return gpuCullTimeMs_; }
float getGpuDrawTimeMs() const { return gpuDrawTimeMs_; }
bool isGpuMeshEnabled() const { return gpuMeshEnabled_ && gpuMesherAvailable_; }
float getGpuMeshTimeMs() const { return gpuMeshTimeMs_; }
float getGpuSmoothMeshTimeMs() const { return gpuSmoothMeshTimeMs_; }
float getGpuBLASExtractTimeMs() const { return gpuBLASExtractTimeMs_; }
float getGpuBLASBuildTimeMs() const { return gpuBLASBuildTimeMs_; }
float getGpuRTShadowsTimeMs() const { return gpuRTShadowsTimeMs_; }
bool isGpuMeshEnabled() const { return gpuMesherAvailable_; }
uint32_t getGpuMeshQuadCount() const { return gpuMeshQuadCount_; }
VoxelRTManager& rt() const { return rt_; }
// Phase 4: Toping rendering
void uploadTopingData(const TopingSystem& topingSystem);
@ -304,26 +257,105 @@ public:
) const;
uint32_t getTopingDrawCalls() const { return topingDrawCalls_; }
// Phase 5: Smooth surface rendering
void uploadSmoothData(VoxelWorld& world);
void uploadSmoothDataFast(VoxelWorld& world); // chunkIndex already stamped
// Phase 5: Smooth surface rendering (GPU compute only)
void renderSmooth(
wi::graphics::CommandList cmd,
const wi::graphics::Texture& depthBuffer,
const wi::graphics::Texture& renderTarget,
const wi::graphics::Texture& normalTarget
) const;
uint32_t getSmoothVertexCount() const { return (smoothCentroidShader_.IsValid() && smoothMeshShader_.IsValid()) ? gpuSmoothVertexCount_ : smoothVertexCount_; }
uint32_t getSmoothVertexCount() const { return gpuSmoothVertexCount_; }
uint32_t getSmoothDrawCalls() const { return smoothDrawCalls_; }
// Phase 6: Ray Tracing
bool isRTAvailable() const { return rtAvailable_; }
bool isRTReady() const { return rtAvailable_ && tlas_.IsValid(); }
bool isRTShadowsEnabled() const { return rtShadowsEnabled_; }
uint32_t getRTBlockyTriCount() const { return rtBlockyVertexCount_ / 3; }
uint32_t getRTSmoothTriCount() const { return rtSmoothVertexCount_ / 3; }
uint32_t getRTTopingTriCount() const { return rtTopingVertexCount_ / 3; }
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
// Phase 6: Ray Tracing (delegated to VoxelRTManager)
bool isRTAvailable() const { return rt_.isAvailable(); }
bool isRTReady() const { return rt_.isReady(); }
bool isRTShadowsEnabled() const { return rt_.isShadowsEnabled(); }
uint32_t getRTBlockyTriCount() const { return rt_.getBlockyTriCount(); }
uint32_t getRTSmoothTriCount() const { return rt_.getSmoothTriCount(); }
uint32_t getRTTopingTriCount() const { return rt_.getTopingTriCount(); }
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return rt_.getTLAS(); }
};
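The new timestamp layout above stores one (BEGIN, END) slot pair per GPU phase. A minimal sketch of how such a pair could be turned into milliseconds follows. It assumes the readback buffer exposes raw 64-bit ticks and that the tick frequency (ticks per second) is queried from the graphics device; `GpuPhase`, `beginSlot`, `endSlot` and `phaseTimeMs` are hypothetical names for illustration, not part of the diff.

```cpp
#include <cstdint>

// Hypothetical phase enum mirroring the TS_* constants: each GPU phase owns
// two consecutive slots, BEGIN at 2*phase and END at 2*phase + 1.
enum GpuPhase : uint32_t {
    PHASE_GPU_MESH = 0,
    PHASE_GPU_SMOOTH,
    PHASE_BLAS_EXTRACT,
    PHASE_BLAS_BUILD,
    PHASE_DRAW,
    PHASE_RT_SHADOWS,
    PHASE_COUNT
};

constexpr uint32_t beginSlot(GpuPhase p) { return 2u * static_cast<uint32_t>(p); }
constexpr uint32_t endSlot(GpuPhase p)   { return 2u * static_cast<uint32_t>(p) + 1u; }

// Six phases, two slots each: same total as TS_COUNT = 12 in the diff.
static_assert(endSlot(PHASE_RT_SHADOWS) + 1u == 12u, "slot layout must match TS_COUNT");

// Converts one (BEGIN, END) tick pair to milliseconds.
// frequencyHz is an assumed input: ticks per second reported by the device.
inline float phaseTimeMs(const uint64_t* ticks, GpuPhase p, uint64_t frequencyHz) {
    const uint64_t b = ticks[beginSlot(p)];
    const uint64_t e = ticks[endSlot(p)];
    if (frequencyHz == 0 || e < b) return 0.0f; // unwritten or invalid pair
    return static_cast<float>(static_cast<double>(e - b) * 1000.0 /
                              static_cast<double>(frequencyHz));
}
```

Returning 0 for an invalid pair keeps a phase that was skipped this frame from reporting garbage in the profiler log.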
// ── Camera Controller ────────────────────────────────────────────
struct CameraController {
float speed = 50.0f;
float sensitivity = 0.003f;
XMFLOAT3 pos = { 256.0f, 100.0f, 256.0f };
float pitch = -0.3f;
float yaw = 0.0f;
bool mouseCaptured = false;
void set(float x, float y, float z, float p, float yw) {
pos = { x, y, z }; pitch = p; yaw = yw;
}
void handleInput(float dt, wi::scene::CameraComponent* camera);
};
// ── Animation State ─────────────────────────────────────────────
struct AnimationState {
float windTime = 0.0f; // continuous, always running
bool terrainAnimated = false; // toggled with F3
bool sunOrbit = false; // toggled with F7: sun orbits in ~10s cycle
bool showCrosshair = true; // toggled with F8: crosshair + face debug info
// F9 debug cycle: 0=all ON, 1=normals OFF, 2=flat lighting, 3=albedo only, 4=NdotL only, 5=normal viz
int debugLighting = 0;
static constexpr int DEBUG_LIGHTING_MODES = 6;
float time = 0.0f; // current animation time offset
float accum = 0.0f; // accumulator for 30 Hz timer
static constexpr float INTERVAL = 1.0f / 30.0f; // ~33.3ms = 30 Hz
// Returns true when an animation tick should fire (call every frame).
bool tick(float dt) {
windTime += dt;
if (!terrainAnimated) return false;
accum += dt;
if (accum < INTERVAL) return false;
accum -= INTERVAL;
time += INTERVAL;
return true;
}
};
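The `tick()` method above is a fixed-interval accumulator. The sketch below isolates that pattern so its behaviour can be checked in a small test; `FixedTick` and `countTicks` are hypothetical names, and the interval is a constructor parameter here (the diff hard-codes 1/30 s).

```cpp
// Minimal stand-alone version of the accumulator used by AnimationState::tick.
struct FixedTick {
    float interval;      // seconds between ticks
    float accum = 0.0f;  // time accumulated since the last tick
    float time  = 0.0f;  // total animation time, advanced in whole intervals

    explicit FixedTick(float intervalSeconds) : interval(intervalSeconds) {}

    // Returns true at most once per call, like the original.
    bool tick(float dt) {
        accum += dt;
        if (accum < interval) return false;
        accum -= interval;
        time += interval;
        return true;
    }
};

// Helper for illustration: feeds `frames` frames of length dt, counts fired ticks.
inline int countTicks(FixedTick& t, float dt, int frames) {
    int fired = 0;
    for (int i = 0; i < frames; ++i) {
        if (t.tick(dt)) ++fired;
    }
    return fired;
}
```

Because only one interval is subtracted per call, a single long frame does not fire several ticks at once; the backlog stays in `accum` and is worked off over the following frames.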
// ── CPU Profiling (averages every INTERVAL seconds) ─────────────
struct VoxelProfiler {
static constexpr float INTERVAL = 5.0f;
// Update() phase
ProfileAccum regenerate; // regenerateAnimated
ProfileAccum updateMeshes; // updateMeshes (rebuildChunkInfoOnly)
ProfileAccum topingCollect; // topingSystem.collectInstances
ProfileAccum topingUpload; // uploadTopingData
ProfileAccum smoothMesh; // (legacy, unused — GPU smooth only)
ProfileAccum smoothUpload; // (legacy, unused — GPU smooth only)
ProfileAccum frame; // full frame (Update only - legacy)
// Render() phase
ProfileAccum voxelPack; // voxel data packing in dispatchGpuMesh
ProfileAccum gpuUpload; // GPU upload in dispatchGpuMesh
ProfileAccum gpuDispatch; // compute dispatches in dispatchGpuMesh
ProfileAccum gpuMeshDispatch; // GPU mesh compute dispatch (in Render)
ProfileAccum gpuSmoothDispatch; // GPU smooth mesh dispatch (in Render)
ProfileAccum blasExtract; // BLAS position extraction compute
ProfileAccum blasBuild; // BLAS/TLAS build
ProfileAccum deferredUpload; // deferred GPU buffer uploads
ProfileAccum render; // render() draw calls
ProfileAccum rtShadows; // RT shadows + AO dispatch
// Totals
ProfileAccum fullFrame; // true full frame (Update + Render + Compose)
ProfileAccum gpuWait; // GPU sync: time between Compose end and next Update start
ProfileAccum wickedRender; // RenderPath3D::Render() (Wicked internal)
ProfileAccum trueFrame; // wall-clock frame-to-frame time
// Timing helpers
std::chrono::high_resolution_clock::time_point frameStart;
std::chrono::high_resolution_clock::time_point lastComposeEnd;
bool lastComposeEndValid = false;
float timer = 0.0f;
void log(const VoxelRenderer& renderer) const;
void resetAll();
};
// ── Custom RenderPath that integrates voxel rendering ───────────
@ -336,15 +368,14 @@ public:
bool debugMode = false;
bool debugSmooth = false;
bool screenshotMode = false; // CLI "screenshot": auto-position camera, capture, quit
void setCamera(float x, float y, float z, float pitch, float yaw);
void setCamera(float x, float y, float z, float pitch, float yaw) {
camera_.set(x, y, z, pitch, yaw);
}
void resetAOHistory(); // invalidate temporal AO after camera jump
float cameraSpeed = 50.0f;
float cameraSensitivity = 0.003f;
XMFLOAT3 cameraPos = { 256.0f, 100.0f, 256.0f };
float cameraPitch = -0.3f;
float cameraYaw = 0.0f;
bool mouseCaptured = false;
CameraController camera_;
AnimationState anim_;
mutable VoxelProfiler prof_;
const wi::graphics::Texture& getVoxelRT() const { return voxelRT_; }
@ -354,42 +385,32 @@ public:
void Compose(wi::graphics::CommandList cmd) const override;
private:
void handleInput(float dt);
void createRenderTargets();
mutable bool worldGenerated_ = false;
mutable int frameCount_ = 0;
mutable float lastDt_ = 0.016f;
mutable float smoothFps_ = 60.0f;
// Wind animation (continuous, always running)
float windTime_ = 0.0f;
// Animated terrain (wave effect at 60 Hz, toggled with F3)
bool animatedTerrain_ = false;
float animTime_ = 0.0f;
float animAccum_ = 0.0f;
static constexpr float ANIM_INTERVAL = 1.0f / 60.0f; // ~16.7ms = 60 Hz
wi::graphics::Texture voxelRT_;
wi::graphics::Texture voxelNormalRT_; // Phase 6: world-space normals for RT shadows/AO
wi::graphics::Texture voxelDepth_;
mutable bool rtCreated_ = false;
// ── CPU Profiling (averages every 5 seconds) ─────────────────
mutable ProfileAccum profRegenerate_; // regenerateAnimated
mutable ProfileAccum profUpdateMeshes_; // updateMeshes (rebuildChunkInfoOnly or CPU mesh)
mutable ProfileAccum profVoxelPack_; // voxel data packing in dispatchGpuMesh
mutable ProfileAccum profGpuUpload_; // GPU upload in dispatchGpuMesh
mutable ProfileAccum profGpuDispatch_; // compute dispatches in dispatchGpuMesh
mutable ProfileAccum profRender_; // render() total
mutable ProfileAccum profFrame_; // full frame (Update + Render + Compose)
mutable ProfileAccum profSmoothMesh_; // SmoothMesher::meshChunk (all chunks)
mutable ProfileAccum profSmoothUpload_; // uploadSmoothData
mutable ProfileAccum profTopingCollect_; // topingSystem.collectInstances
mutable ProfileAccum profTopingUpload_; // uploadTopingData
mutable float profTimer_ = 0.0f;
static constexpr float PROF_INTERVAL = 5.0f;
void logProfilingAverages() const;
mutable uint32_t rtBuildSkipCounter_ = 0; // stagger BLAS builds during animation
mutable bool rtWasEnabled_ = false; // saved RT state before animation
// Cached crosshair raycast result (updated each frame in Compose)
struct CrosshairHit {
bool valid = false;
int x = 0, y = 0, z = 0;
int face = -1; // 0=+X,1=-X,2=+Y,3=-Y,4=+Z,5=-Z
uint8_t matID = 0;
bool smooth = false;
};
mutable CrosshairHit crosshairHit_;
// Build a full debug log string (used by HUD overlay and screenshot .log)
std::string buildDebugLog() const;
};
} // namespace voxel


@ -115,7 +115,7 @@ void VoxelWorld::generateChunk(Chunk& chunk, float timeOffset) {
const float caveScale = 0.05f;
const float caveThreshold = 0.3f;
// Animation mode: fewer octaves + skip caves (much faster for 20Hz regen)
// Animation mode: fewer octaves + skip caves + cached materials (much faster for 30Hz regen)
const bool animating = (timeOffset != 0.0f);
const int heightOctaves = animating ? 2 : 5;
@ -130,34 +130,47 @@ void VoxelWorld::generateChunk(Chunk& chunk, float timeOffset) {
float height = baseHeight + heightScale * fbm(wx * scale, timeOffset, wz * scale, heightOctaves);
// ── Surface material via noise-based patches ──
// Use 2D noise at different frequencies/seeds to create organic patches
// of each material on the surface, instead of altitude bands.
float matNoise1 = fbm(wx * 0.03f + 500.0f, 0.0f, wz * 0.03f + 500.0f, 3); // large patches
float matNoise2 = fbm(wx * 0.08f + 1000.0f, 0.0f, wz * 0.08f + 1000.0f, 2); // medium detail
float matNoise3 = fbm(wx * 0.05f + 2000.0f, 0.0f, wz * 0.05f + 2000.0f, 3); // third channel
// Combined noise for material selection (range roughly -1..1)
float matVal = matNoise1 * 0.6f + matNoise2 * 0.4f;
// Material noise is time-independent (uses y=0.0f, no timeOffset).
// During animation, reuse cached values to skip 8 noise3D calls/column.
const int colIdx = x + z * CHUNK_SIZE;
uint8_t surfaceMat;
bool surfaceSmooth = false;
if (matVal < -0.30f) {
surfaceMat = 4; // Sand
} else if (matVal < -0.15f) {
surfaceMat = 2; // Dirt (adjacent to sand for sand↔dirt testing)
} else if (matVal < -0.05f) {
surfaceMat = 3; // Stone (blocky, with topings)
} else if (matVal < 0.05f) {
surfaceMat = 6; // SmoothStone (smooth surface)
surfaceSmooth = true;
} else if (matVal < 0.20f) {
surfaceMat = 1; // Grass
} else if (matVal < 0.30f) {
surfaceMat = 4; // Sand (adjacent to grass for sand↔grass testing)
} else if (matNoise3 > 0.1f) {
surfaceMat = 5; // Snow (smooth)
surfaceSmooth = true;
if (animating) {
// Fast path: read cached material from initial generation
surfaceMat = chunk.cachedSurfaceMat[colIdx];
surfaceSmooth = (chunk.cachedSurfaceFlags[colIdx] != 0);
} else {
surfaceMat = 2; // Dirt
// Full path: compute material noise and cache it
float matNoise1 = fbm(wx * 0.03f + 500.0f, 0.0f, wz * 0.03f + 500.0f, 3); // large patches
float matNoise2 = fbm(wx * 0.08f + 1000.0f, 0.0f, wz * 0.08f + 1000.0f, 2); // medium detail
float matNoise3 = fbm(wx * 0.05f + 2000.0f, 0.0f, wz * 0.05f + 2000.0f, 3); // third channel
float matVal = matNoise1 * 0.6f + matNoise2 * 0.4f;
if (matVal < -0.30f) {
surfaceMat = 4; // Sand
} else if (matVal < -0.15f) {
surfaceMat = 2; // Dirt
} else if (matVal < -0.05f) {
surfaceMat = 3; // Stone (blocky, with topings)
} else if (matVal < 0.05f) {
surfaceMat = 6; // SmoothStone (smooth surface)
surfaceSmooth = true;
} else if (matVal < 0.20f) {
surfaceMat = 1; // Grass
} else if (matVal < 0.30f) {
surfaceMat = 4; // Sand
} else if (matNoise3 > 0.1f) {
surfaceMat = 5; // Snow (smooth)
surfaceSmooth = true;
} else {
surfaceMat = 2; // Dirt (smooth)
surfaceSmooth = true;
}
// Cache for future animation frames
chunk.cachedSurfaceMat[colIdx] = surfaceMat;
chunk.cachedSurfaceFlags[colIdx] = surfaceSmooth ? 1 : 0;
}
for (int y = 0; y < CHUNK_SIZE; y++) {


@ -19,12 +19,14 @@ struct Chunk {
uint32_t faceOffsets[6] = {}; // offset (in quads) for each face group within quads[]
uint32_t faceCounts[6] = {}; // number of quads per face group
// Smooth mesh data (output of Surface Nets mesher, Phase 5)
std::vector<SmoothVertex> smoothVertices;
uint32_t smoothVertexCount = 0;
bool hasSmooth = false; // true if chunk has smooth mesh output (set by mesher)
// Smooth voxel flags (used by GPU smooth mesher to decide which chunks to dispatch)
bool containsSmooth = false; // true if chunk contains any FLAG_SMOOTH voxels (set during generation)
// Cached surface material per column (set during initial generation, reused during animation)
// This avoids recomputing 8 noise3D calls per column that are time-independent.
uint8_t cachedSurfaceMat[CHUNK_SIZE * CHUNK_SIZE] = {}; // material ID per (x,z) column
uint8_t cachedSurfaceFlags[CHUNK_SIZE * CHUNK_SIZE] = {}; // smooth flag per (x,z) column
VoxelData& at(int x, int y, int z) {
return voxels[x + y * CHUNK_SIZE + z * CHUNK_SIZE * CHUNK_SIZE];
}
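The per-column cache added above uses a 2D index (`x + z * CHUNK_SIZE`), while `at()` uses the 3D layout with x fastest, then y, then z. A small sketch of both index functions follows. The value `CHUNK_SIZE = 32` is an assumption made only for this example (the diff does not show it), and `voxelIndex` / `columnIndex` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>

constexpr int CHUNK_SIZE = 32; // assumed value, not shown in the diff

// Same layout as Chunk::at(): x varies fastest, then y, then z.
constexpr int voxelIndex(int x, int y, int z) {
    return x + y * CHUNK_SIZE + z * CHUNK_SIZE * CHUNK_SIZE;
}

// Same layout as colIdx in generateChunk(): one entry per (x, z) column.
constexpr int columnIndex(int x, int z) {
    return x + z * CHUNK_SIZE;
}

// Footprint of the two cache arrays (material ID + smooth flag, one byte each).
constexpr std::size_t cacheBytesPerChunk =
    2 * sizeof(std::uint8_t) * CHUNK_SIZE * CHUNK_SIZE;

static_assert(columnIndex(CHUNK_SIZE - 1, CHUNK_SIZE - 1) == CHUNK_SIZE * CHUNK_SIZE - 1,
              "last column maps to the last cache slot");
```

Under the assumed chunk size the cache costs 2 KiB per chunk, which is the price paid for skipping the material noise during animated regeneration.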

tools/prepare_textures.py (new file, 125 lines)

@ -0,0 +1,125 @@
"""
Prepare voxel textures from FreeStylized.com ZIPs.
Outputs per material:
- *_albedo.png : RGBA (RGB=albedo, A=heightmap)
- *_normal.png : RGB normal map (OpenGL convention, Y-up)
"""
import io
import os
import zipfile
from PIL import Image, ImageEnhance

# (zip_name, color_pattern, height_pattern, normal_pattern, brightness_factor)
# brightness_factor: <1 = darken, >1 = brighten, 1.0 = unchanged
MATERIALS = [
    ("grass_01_1k", "color", "height", "normal_gl", 1.0),
    ("ground_02_1k", "color", "height", "normal_gl", 0.75),  # dirt: darkened
    ("ground_stones_01_1k", "baseColor", "height", "normal_gl", 1.0),
    ("sand_01_1k", "color", "height", "normal_gl", 1.0),
    ("snow_01_1k", "color", "height", "normal_gl", 1.0),
    ("rock_01_1k", "color", "height", "normal_gl", 1.0),
]
OUTPUT_NAMES = [
    "grass",
    "dirt",
    "stone",
    "sand",
    "snow",
    "smoothstone",
]
TARGET_SIZE = 512
RAW_DIR = os.path.join(os.path.dirname(__file__), "..", "assets", "raw")
OUT_DIR = os.path.join(os.path.dirname(__file__), "..", "assets", "voxel")


def find_file_in_zip(zf, pattern):
    """Find a file in the zip matching a pattern substring."""
    for name in zf.namelist():
        basename = os.path.basename(name).lower()
        if pattern.lower() in basename and basename.endswith(".png"):
            return name
    return None


def load_image_from_zip(zf, filename, mode="RGB"):
    data = zf.read(filename)
    img = Image.open(io.BytesIO(data))
    # Handle 16-bit heightmaps: Pillow's .convert("L") on I;16 images
    # doesn't scale properly. We must manually scale 0-65535 → 0-255.
    if img.mode in ("I;16", "I") and mode == "L":
        # Convert to 32-bit int first, then scale down
        img = img.convert("I")
        img = img.point(lambda v: v / 256)
        return img.convert("L")
    return img.convert(mode)


def process_material(zip_path, color_pat, height_pat, normal_pat, brightness, out_name):
    with zipfile.ZipFile(zip_path, "r") as zf:
        color_file = find_file_in_zip(zf, color_pat)
        height_file = find_file_in_zip(zf, height_pat)
        normal_file = find_file_in_zip(zf, normal_pat)
        if not color_file:
            print(f" ERROR: no color file matching '{color_pat}' in {zip_path}")
            return False
        # ── Albedo + Heightmap → RGBA ──
        color_img = load_image_from_zip(zf, color_file, "RGB")
        if brightness != 1.0:
            color_img = ImageEnhance.Brightness(color_img).enhance(brightness)
        if height_file:
            height_img = load_image_from_zip(zf, height_file, "L")
        else:
            print(f" WARNING: no height map, deriving from luminance")
            height_img = color_img.convert("L")
        color_img = color_img.resize((TARGET_SIZE, TARGET_SIZE), Image.LANCZOS)
        height_img = height_img.resize((TARGET_SIZE, TARGET_SIZE), Image.LANCZOS)
        r, g, b = color_img.split()
        rgba = Image.merge("RGBA", (r, g, b, height_img))
        albedo_path = os.path.join(OUT_DIR, f"{out_name}_albedo.png")
        rgba.save(albedo_path, "PNG")
        print(f" OK: {out_name}_albedo.png ({TARGET_SIZE}x{TARGET_SIZE})")
        # ── Normal map → RGB ──
        if normal_file:
            normal_img = load_image_from_zip(zf, normal_file, "RGB")
            normal_img = normal_img.resize((TARGET_SIZE, TARGET_SIZE), Image.LANCZOS)
            normal_path = os.path.join(OUT_DIR, f"{out_name}_normal.png")
            normal_img.save(normal_path, "PNG")
            print(f" OK: {out_name}_normal.png ({TARGET_SIZE}x{TARGET_SIZE})")
        else:
            print(f" WARNING: no normal map found")
    return True


def main():
    os.makedirs(OUT_DIR, exist_ok=True)
    print(f"Output directory: {os.path.abspath(OUT_DIR)}")
    print()
    success = 0
    for i, (zip_name, color_pat, height_pat, normal_pat, brightness) in enumerate(MATERIALS):
        zip_path = os.path.join(RAW_DIR, zip_name + ".zip")
        print(f"[{i+1}/6] {OUTPUT_NAMES[i]} <- {zip_name}.zip")
        if not os.path.exists(zip_path):
            print(f" ERROR: {zip_path} not found")
            continue
        if process_material(zip_path, color_pat, height_pat, normal_pat, brightness, OUTPUT_NAMES[i]):
            success += 1
    print(f"\nDone: {success}/6 materials generated in {os.path.abspath(OUT_DIR)}")


if __name__ == "__main__":
    main()


@ -324,9 +324,7 @@ The VoxelRenderer plugs into Wicked's render path via hooks in the R
I'd like to test something: a new block type that contains only custom 3D models and has dynamic joining behavior depending on identical neighboring blocks. Specifically, I'd like to create pipes that connect to one another, or create new connections, so that they always touch the neighboring pipe blocks.
You like the sky, perfect! Let's continue toward Wonderbox. What would you like to improve next? Comparing with the reference, I see several avenues:
More saturated/deeper colors: Wonderbox's grass green is richer and deeper
Target wonderbox
Atmospheric fog: the warm haze in the distance that blends the terrain into the sky
Stronger shadows: the shadow/light contrast is more pronounced in Wonderbox
Block side faces: more textured/detailed in Wonderbox
Block side faces: more textured/detailed in Wonderbox