Fix smooth Surface Nets rendering: eliminate faceting, fix blocky junction

- Remove geoN (ddx/ddy) from smooth PS entirely: use smooth interpolated normal N for all triplanar sampling (albedo, heightmap, normal map). geoN changes discontinuously at triangle edges, causing per-triangle faceting in texture weights and normal perturbation.
- Tune consistency-based vertex normal blend to smoothstep(0.70, 0.90): snaps to face normal at 90° boundaries (seamless blocky join) while preserving smooth normals on curved terrain.
- Unify all 3 edge axes (X/Y/Z) to same smoothstep formula (was mixed smoothstep + pow4).
- Remove grass-specific hardcoded shading from both PS (side darkening, warm shift, ambient boost); will be data-driven per-material later.
- Remove CPU SmoothMesher code (GPU-only path).
- Document all findings in TROUBLESHOOTING.md with calibration table.
Add debug tools
2026-04-01 20:35:42 +02:00 · 2026-04-01 18:12:58 +02:00 · 2026-04-01 18:12:53 +02:00 · 2026-04-01 13:41:06 +02:00 · 2026-03-31 20:04:00 +02:00 · 2026-03-31 14:58:44 +02:00
38 changed files with 3184 additions and 2670 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -29,3 +29,6 @@ Desktop.ini

 # Claude Code
 .claude/
+assets/raw
+bvle_screenshot_*.log
+bvle_screenshot_*.png
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -18,7 +18,9 @@ bvle-voxels/
 │   │   ├── VoxelTypes.h        # Core types (VoxelData, PackedQuad, MaterialDesc, ChunkPos)
 │   │   ├── VoxelWorld.h/.cpp   # Voxel world (chunk hashmap, procedural generation)
 │   │   ├── VoxelMesher.h/.cpp  # Binary Greedy Mesher CPU + SmoothMesher (Naive Surface Nets)
-│   │   ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (subclass of RenderPath3D)
+│   │   ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (CameraController, AnimationState, VoxelProfiler)
+│   │   ├── VoxelRTManager.h/.cpp # Ray tracing: BLAS/TLAS lifecycle, shadows+AO dispatches
+│   │   ├── DeferredGPUBuffer.h # Utility for staging→dirty→capacity GPU buffer uploads
 │   │   └── TopingSystem.h/.cpp # Toping system (decorative bevels on +Y faces)
 │   └── app/
 │       └── main.cpp            # Win32 entry point + SEH crash handler
@@ -36,6 +38,11 @@ bvle-voxels/
 │   ├── voxelShadowCS.hlsl     # Compute shader RT shadows + raw AO (inline ray queries, Phase 6.2+6.3)
 │   ├── voxelAOBlurCS.hlsl     # Compute shader bilateral AO blur (separable H/V, Phase 6.3)
 │   └── voxelAOApplyCS.hlsl    # Compute shader AO apply + tone mapping + saturation (Phase 6.3 + 7)
+├── assets/
+│   ├── voxel/                  # Stylized textures (6 albedo+height RGBA + 6 GL normal maps, 512x512)
+│   └── raw/                    # Source ZIPs from FreeStylized.com (CC0)
+├── tools/
+│   └── prepare_textures.py     # Script: ZIP → albedo+heightmap RGBA + normal PNG (512x512)
 ├── CLAUDE.md
 └── TROUBLESHOOTING.md          # Technical pitfalls, debugging, Wicked APIs
 ```
@@ -62,6 +69,38 @@ cmake --build build --config Release --target BVLEVoxels --parallel

 SDK 10.0.26100 is required because the DX12 headers (`d3dx12_check_feature_support.h`) shipped with Wicked Engine are not compatible with SDK 22621.

+### Running
+
+**IMPORTANT**: the CWD must be the **project root**, not `build/Release/`.
+The exe uses relative paths for assets (`Content/`) and for shader compilation (`engine/WickedEngine/shaders/`).
+
+```bash
+# Normal launch (centered 1920x1080 window)
+build/Release/BVLEVoxels.exe
+
+# Screenshot mode (640x480, captures 3 views, exits automatically)
+build/Release/BVLEVoxels.exe screenshot
+
+# Other arguments
+build/Release/BVLEVoxels.exe debug         # Faces colored by direction
+build/Release/BVLEVoxels.exe debugsmooth   # Smooth debug scene
+build/Release/BVLEVoxels.exe vulkan        # Force the Vulkan backend
+```
+
+**Output files** (written to the CWD, i.e. the project root):
+- `bvle_backlog.txt`: Wicked Engine log
+- `bvle_crash.log` + `bvle_crash.dmp`: SEH crash report (on crash)
+- `bvle_screenshot_*.png`: captures from screenshot mode or F6
+
+**Keyboard shortcuts**:
+- `F2`: toggle the Wicked backlog
+- `F3`: toggle terrain animation (30 Hz)
+- `F4`: toggle debug blend
+- `F5`: cycle RT shadows/AO (ON → debug shadows → debug AO → OFF)
+- `F6`: in-app screenshot (saves `voxelRT_` as PNG + companion `.log`)
+- `F7`: toggle sun orbit (10s cycle, sinusoidal altitude)
+- `F8`: toggle crosshair + debug face info (camera, target, face, normal map proj)
+
 ### Automatic post-build (CMakeLists.txt)

 The build automatically copies:
@@ -129,7 +168,11 @@ Perlin noise 3D, fBm 5 octaves (2 en animation), caves 3D, matériaux par altitu
 - **Per-chunk info**: `StructuredBuffer<GPUChunkInfo>` (80 bytes/chunk)
 - **Height-based blending** (Phase 3): PS reads `voxelDataBuffer` (t3), winner-takes-all heightmap, corner attenuation
 - **Dedicated render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT)
-- **CPU profiling**: `ProfileAccum` with averages every 5s
+- **CPU profiling**: `VoxelProfiler` (21 `ProfileAccum`, averages every 5s)
+- **DeferredGPUBuffer**: utility for GPU buffers with CPU staging, dirty flag, capacity-based growth (25% headroom)
+- **VoxelRTManager** (`VoxelRTManager.h/.cpp`): manages BLAS/TLAS and the RT shadows+AO dispatches, isolated from the renderer
+- **VoxelRenderPath** split into: `CameraController` (movement/mouse), `AnimationState` (terrain tick), `VoxelProfiler`
+- **Toping sort**: O(n) counting sort by (type, variant) instead of `std::sort`

 ## Development phases

@@ -174,6 +217,17 @@ PS-based heightmap blending, winner-takes-all, corner attenuation subtractive. G

 - **7.1** [DONE]: Hemisphere ambient, colored shadows, rim light, tone mapping + saturation, screenshot mode

+### Phase 8 - Real stylized textures [IN PROGRESS]
+
+- **8.1** [DONE]: Loading of CC0 FreeStylized textures (6 materials, albedo+heightmap RGBA, GL normal maps)
+- **8.2** [DONE]: Texture2DArray (t1=albedo+height, t7=normals), triplanar sampling, stb_image loading
+- **8.3** [DONE]: Height-based texture blending (winner-takes-all, sharpness=16, corner attenuation)
+- **8.4** [DONE]: Asymmetric blend for resistBleed (coeff 1.6), 40% blend zone
+- **8.5** [DONE]: UDN triplanar normal mapping (sign correction, GL green flip Y-proj only, NO abs)
+- **8.6** [DONE]: Dirt rendered smooth (FLAG_SMOOTH), ground_02 texture darkened (0.75)
+- **8.7** [DONE]: Sun orbit debug (F7, 10s cycle), crosshair + face debug HUD (F8)
+- **8.8** [DONE]: F6 screenshot with companion .log (camera, target, debug states, RT stats)
+
 ## Target metrics and results

 | Metric | Target | Result (Ryzen 7 9800X3D + RX 9070 XT) |
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -41,6 +41,14 @@ add_custom_command(TARGET BVLEVoxels POST_BUILD
    COMMENT "Copying DXC shader compiler DLL"
 )

+# Copy voxel texture assets to Content/voxel/ next to the exe
+add_custom_command(TARGET BVLEVoxels POST_BUILD
+    COMMAND ${CMAKE_COMMAND} -E copy_directory
+        ${CMAKE_SOURCE_DIR}/assets/voxel
+        $<TARGET_FILE_DIR:BVLEVoxels>/Content/voxel
+    COMMENT "Copying voxel texture assets"
+)
+
 # Copy our custom shader sources into Wicked's shader source tree
 # so LoadShader can find and compile them as "voxel/voxelVS.cso"
 add_custom_command(TARGET BVLEVoxels POST_BUILD
--- a/TROUBLESHOOTING.md
+++ b/TROUBLESHOOTING.md
@@ -3,6 +3,8 @@
 ## Table des matières

 - [APIs Wicked utilisées](#apis-wicked-utilisées)
+- [Coordonnées logiques vs physiques](#coordonnées-logiques-vs-physiques--piège-majeur)
+- [Triplanar UDN Normal Mapping](#triplanar-udn-normal-mapping--pièges-majeurs)
 - [Shaders custom — Pièges importants](#shaders-custom--pièges-importants)
  1. [Root signature obligatoire](#1-root-signature-obligatoire)
  2. [Root signature Wicked (HLSL 6.6+)](#2-root-signature-wicked-hlsl-66)
@@ -17,6 +19,7 @@
 - [CreateBuffer avec capacity > data size](#createbuffer-avec-capacity--data-size)
 - [BLAS/TLAS per-frame recreation — VRAM leak](#blastlas-per-frame-recreation--vram-leak)
 - [Diagnostics et debugging](#diagnostics-et-debugging)
+- [Smooth Surface Nets — Rendu facetté et jointure blocky](#smooth-surface-nets--rendu-facetté-et-jointure-blocky)
 - [Gestion des resource states DX12 (buffers)](#gestion-des-resource-states-dx12-buffers)

 ---
@@ -41,6 +44,107 @@
 | Render pass | NEVER nest! Only one active render pass per command list |
 | Debug DX12 | Pass `"debugdevice"` as an argument to enable the D3D12 debug layer |
 | Logging | `wi::backlog::post(message, logLevel)`, preferred over file logging |
+| Screen size (draw) | **`GetLogicalWidth()`/`GetLogicalHeight()`** for `wi::font` and `wi::image` (NOT `GetPhysicalWidth`) |
+| Solid rect draw | `wi::image::Draw(wi::texturehelper::getWhite(), params, cmd)`; do NOT pass `nullptr` |
+
+---
+
+## Coordonnées logiques vs physiques — Piège majeur
+
+Wicked Engine distinguishes two screen coordinate systems:
+
+- **Physical** (`GetPhysicalWidth()`/`GetPhysicalHeight()`): actual backbuffer pixels. Used to create render targets, viewports, and GPU textures.
+- **Logical** (`GetLogicalWidth()`/`GetLogicalHeight()`): DPI-scaled pixels. **Wicked's entire 2D system** (`wi::font::Draw`, `wi::image::Draw`, `wi::image::Params::pos/siz`) works in logical coordinates.
+
+**Symptom**: HUD elements offset, crosshair off-center, text off-screen.
+
+```cpp
+// ❌ WRONG: offset when DPI scaling ≠ 100%
+float cx = (float)GetPhysicalWidth() * 0.5f;
+wi::font::Params fp; fp.posX = cx;
+
+// ✅ CORRECT
+float cx = GetLogicalWidth() * 0.5f;
+wi::font::Params fp; fp.posX = cx;
+```
+
+**To draw a solid rectangle** (no texture):
+
+```cpp
+// ❌ WRONG: draws nothing
+wi::image::Draw(nullptr, params, cmd);
+
+// ✅ CORRECT: use the built-in 1x1 white texture
+#include "wiTextureHelper.h"
+wi::image::Draw(wi::texturehelper::getWhite(), params, cmd);
+```
+
+The 2D projection is defined in `wiCanvas.h`:
+```cpp
+GetProjection() = XMMatrixOrthographicOffCenterLH(0, GetLogicalWidth(), GetLogicalHeight(), 0, -1, 1);
+```
+
+---
+
+## Triplanar UDN Normal Mapping — Pièges majeurs
+
+The triplanar UDN implementation for normal maps (the UDN blend is named after the Unreal Developer Network) has three critical subtleties:
+
+### 1. Do NOT use `abs(normal)` in the UDN formula
+
+The Ben Golus reference uses `abs(normal)` because it targets terrain (normals always pointing up). For voxels with 6 face directions, `abs()` forces the dominant component to be positive, **inverting the lighting on the -X, -Y and -Z faces**.
+
+```hlsl
+// ❌ WRONG: inverts the normals on 3 faces (NdotL is wrong)
+float3 absN = abs(normal);
+float3 worldNX = float3(tnX.xy + absN.zy, absN.x).zyx;
+// Face -X: absN.x = 1 → result points toward +X instead of -X
+
+// ✅ CORRECT: use the raw normal
+float3 worldNX = float3(tnX.xy + normal.zy, normal.x).zyx;
+// Face -X: normal.x = -1 → result correctly points toward -X
+```
+
+**Diagnostic**: RT shadows are correct (they use the geometry) but direct lighting is inverted on some faces → visual contradiction.
+
+### 2. Sign correction for negative faces
+
+UVs are mirrored on negative faces. `sign(normal)` corrects the tangent-space X component:
+
+```hlsl
+float3 axisSign = sign(normal);
+tnX.x *= axisSign.x;  // Flip U-tangent for -X
+tnY.x *= axisSign.y;  // Flip U-tangent for -Y
+tnZ.x *= axisSign.z;  // Flip U-tangent for -Z
+```
+
+### 3. Green channel flip for OpenGL normal maps (Y projection only)
+
+The `normal_gl` textures have their green channel inverted relative to DX. In triplanar, only the **Y projection** (horizontal faces, UV=xz) needs the flip; the X and Z projections have V=world Y, which is already correct.
+
+```hlsl
+// ❌ WRONG: breaks the vertical faces
+tnX.y = -tnX.y; tnY.y = -tnY.y; tnZ.y = -tnZ.y;
+
+// ✅ CORRECT: Y projection only
+tnY.y = -tnY.y;
+```
+
+**Complete correct formula**:
+```hlsl
+float3 axisSign = sign(normal);
+float3 tnX = sample(wp.zy).rgb * 2.0 - 1.0;
+float3 tnY = sample(wp.xz).rgb * 2.0 - 1.0;
+float3 tnZ = sample(wp.xy).rgb * 2.0 - 1.0;
+tnY.y = -tnY.y;                    // GL flip Y-projection only
+tnX.x *= axisSign.x;               // sign correction
+tnY.x *= axisSign.y;
+tnZ.x *= axisSign.z;
+float3 worldNX = float3(tnX.xy + normal.zy, normal.x).zyx;  // RAW normal
+float3 worldNY = float3(tnY.xy + normal.xz, normal.y).xzy;
+float3 worldNZ = float3(tnZ.xy + normal.xy, normal.z);
+return normalize(worldNX * w.x + worldNY * w.y + worldNZ * w.z);
+```

 ---

@@ -217,6 +321,66 @@ dev->BuildRaytracingAccelerationStructure(&blas, cmd, nullptr);

 ---

+## Smooth Surface Nets — Rendu facetté et jointure blocky
+
+### Problem 1: Smooth rendering is faceted despite smooth normals
+
+**Symptom**: in debug mode (FLAT, NdotL, NORMAL), the smooth surface is perfectly smooth. In the final render (ALL), however, it looks faceted, with visible triangle edges.
+
+**Root cause**: `geoN` (geometric normal from `ddx(worldPos)`/`ddy(worldPos)`) was used for triplanar sampling (projection weights) AND for normal mapping. This value is the **face normal of the on-screen triangle**, so it changes **discontinuously** at every triangle edge. As a result:
+
+1. **Discontinuous triplanar weights** → the texture jumps at edges (visible seams)
+2. **Discontinuous normal map** → the normal perturbation differs per triangle → faceted NdotL
+
+The debug modes looked smooth because they used `flatN` (the smooth normal **before** normal map perturbation), not the perturbed `N`.
+
+**Fix**: use `N` (smooth interpolated normal) for **all** triplanar work in `voxelSmoothPS.hlsl`:
+- Albedo/heightmap triplanar weights → `N` (not `geoN`)
+- Normal map sampling → `N` (not `geoN`)
+- `geoN` is no longer computed or used at all
+
+`N` varies continuously between vertices → smooth transitions everywhere.
+
+### Problem 2: Visible smooth/blocky junction
+
+**Symptom**: visible contrast between adjacent, nearly coplanar smooth and blocky faces.
+
+**Root causes** (cumulative):
+
+1. **Per-material treatment in only one PS**: the blocky PS had grass-specific shading (60% side darkening, chromatic warm shift, ×1.15 ambient boost) that the smooth PS lacked. For a +X grass face, this created a ~40% brightness gap.
+
+2. **Biased smooth normals at boundaries**: vertex normals at 90° edges (smooth wall → floor) were averaged between perpendicular faces (consistency ≈ 0.707), producing a normal biased toward +Y instead of pure +X.
+
+**Fix**:
+- **Remove the hardcoded per-material treatment** from both PS. When per-material shading is needed, make it data-driven and apply it identically in both shaders.
+- **Consistency-based vertex normal blend** in `voxelSmoothCS.hlsl`: the metric `|Σfn| / Σ|fn|` measures how well the incident face normals agree. Low-consistency vertices (sharp edges, boundaries) get the pure face normal; high-consistency vertices (curved surfaces) keep the smooth normal.
+
+### Calibrating the consistency threshold
+
+The `smoothstep(low, high, consistency)` threshold controls the smooth/sharp trade-off:
+
+| Threshold | con=0.707 (90° edge) | con=0.85 (curve) | con=0.95 (slope) | Result |
+|---|---|---|---|---|
+| `(0.85, 1.0)` | t=0 face ✓ | t=0 face ✗ | t=0.26 ≈ face ✗ | Too aggressive, everything faceted |
+| `(0.60, 0.85)` | t=0.27 ≈ 73% face | t=1.0 smooth ✓ | t=1.0 smooth ✓ | Boundary visible, interior smooth |
+| `(0.70, 0.90)` | t≈0 face ✓ | t=0.84 smooth ✓ | t=1.0 smooth ✓ | **Good compromise** |
+
+**Retained value: `smoothstep(0.70, 0.90)`.** 90° edges (con ≤ 0.707) get 100% face normal (crisp junction with blocky), while moderate curves (con > 0.85) stay smooth.
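
As a rough illustration of the consistency metric and the retained threshold, here is a minimal CPU-side C++ sketch. It is not the code from `voxelSmoothCS.hlsl`: the `Vec3` type and the `consistency` and `smoothWeight` helpers are hypothetical names, and `smoothstepf` is assumed to follow the standard HLSL `smoothstep` definition.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

static float len(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Standard smoothstep: Hermite interpolation on a ramp clamped to [0, 1].
float smoothstepf(float lo, float hi, float x) {
    float t = std::max(0.0f, std::min(1.0f, (x - lo) / (hi - lo)));
    return t * t * (3.0f - 2.0f * t);
}

// consistency = |sum(fn)| / sum(|fn|) over the face normals incident to a vertex.
// 1.0 means all faces agree; two perpendicular faces give about 0.707.
float consistency(const std::vector<Vec3>& faceNormals) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    float total = 0.0f;
    for (const Vec3& n : faceNormals) {
        sum.x += n.x; sum.y += n.y; sum.z += n.z;
        total += len(n);
    }
    // Assumption: a vertex without incident faces is treated as fully consistent.
    return total > 0.0f ? len(sum) / total : 1.0f;
}

// Blend weight t: 0 = pure face normal, 1 = smooth vertex normal.
float smoothWeight(float con) {
    return smoothstepf(0.70f, 0.90f, con);
}
```

With these definitions, a 90° edge (con ≈ 0.707) gets a weight close to 0, a moderate curve (con = 0.85) gets about 0.84, and a gentle slope (con = 0.95) gets 1.0, which matches the `(0.70, 0.90)` row of the table.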
+
+### Normal map strength
+
+The smooth PS uses `nmStrength * 0.7` (vs `nmStrength * 1.0` for blocky). Curved surfaces need attenuated normal maps so that the perturbations do not break the visual continuity of the smooth shading.
+
+### Rules
+
+- **Any lighting/texturing change** in `voxelPS.hlsl` must be ported to `voxelSmoothPS.hlsl` (and vice versa)
+- **NEVER use `geoN`** (ddx/ddy) in the smooth PS for triplanar or normal mapping; use `N` exclusively
+- Both PS must produce an identical result on coplanar faces of the same material
+
+**Files**: `shaders/voxelSmoothCS.hlsl` (consistency blend), `shaders/voxelSmoothPS.hlsl` (triplanar + normal map), `shaders/voxelPS.hlsl` (blocky reference)
+
+---
+
 ## Gestion des resource states DX12 (buffers)

 **Wicked Engine does NO automatic state tracking for buffers.** `GPUBarrier::Buffer(buf, before, after)` calls are passed straight to D3D12 without validation. **`state_before` MUST match the actual DX12 state, otherwise → DXGI_ERROR_INVALID_CALL.**
--- a/assets/voxel/dirt_albedo.png
+++ b/assets/voxel/dirt_albedo.png
--- a/assets/voxel/dirt_normal.png
+++ b/assets/voxel/dirt_normal.png
--- a/assets/voxel/grass_albedo.png
+++ b/assets/voxel/grass_albedo.png
--- a/assets/voxel/grass_normal.png
+++ b/assets/voxel/grass_normal.png
--- a/assets/voxel/sand_albedo.png
+++ b/assets/voxel/sand_albedo.png
--- a/assets/voxel/sand_normal.png
+++ b/assets/voxel/sand_normal.png
--- a/assets/voxel/smoothstone_albedo.png
+++ b/assets/voxel/smoothstone_albedo.png
--- a/assets/voxel/smoothstone_normal.png
+++ b/assets/voxel/smoothstone_normal.png
--- a/assets/voxel/snow_albedo.png
+++ b/assets/voxel/snow_albedo.png
--- a/assets/voxel/snow_normal.png
+++ b/assets/voxel/snow_normal.png
--- a/assets/voxel/stone_albedo.png
+++ b/assets/voxel/stone_albedo.png
--- a/assets/voxel/stone_normal.png
+++ b/assets/voxel/stone_normal.png
--- a/docs/plan-lod-skirts.md
+++ b/docs/plan-lod-skirts.md
@@ -0,0 +1,146 @@
+# Multi-resolution LOD with skirts
+
+Inspired by the Roblox SIGGRAPH 2020 talk (p.34-38) and by the Transvoxel approach.
+Goal: increase the view distance without blowing up the meshing/rendering cost.
+
+## Current problem
+
+- The whole world (512x512x256 = 8192 potential chunks, ~648 active) is meshed and
+  rendered at full 32^3 resolution
+- CPU smooth meshing costs 17ms for 648 chunks (parallelized)
+- Rendering is cheap (0.1ms GPU mesh), but smooth meshing blocks scaling up
+- No variable view distance: all or nothing
+
+## Roblox approach: mip pyramid + skirts
+
+### Principle
+
+1. Each chunk stores a **mip pyramid** of voxels: 32^3, 16^3, 8^3, 4^3, 2^3, 1^3
+2. A render **octree** decides which mip level to use per chunk (camera distance)
+3. Seams between chunks of different LOD are hidden by **skirts**
+4. Skirts are extra triangles with a **depth bias** applied in the VS
+
+### Why skirts rather than Transvoxel?
+
+| | Transvoxel | Skirts (Roblox) |
+|--|-----------|-----------------|
+| Complexity | High (case tables, 73 transition cells) | Low (1 extra layer + depth bias) |
+| Quality | Perfect (continuous mesh) | Good (gaps invisible thanks to the depth bias) |
+| Meshing cost | +50% (transition cells) | +15% (1 extra layer of voxels) |
+| Integration | Invasive (changes the mesher) | Additive (post-process on the mesh) |
+
+## Implementation plan
+
+### Phase A: Mip pyramid storage
+
+**File:** `VoxelWorld.h/.cpp`
+
+```cpp
+struct Chunk {
+    VoxelData voxels[CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE]; // LOD 0 (32^3)
+    // Mip levels stored on demand
+    std::array<std::vector<VoxelData>, 4> mips; // LOD 1-4 (16^3, 8^3, 4^3, 2^3)
+    uint8_t maxAvailableLOD = 0;
+};
+```
+
+**Downsampling:** for each 2x2x2 group, the dominant voxel (most frequent material,
+occupancy > 4/8) is kept. Smooth voxels: the occupancy is averaged.
+
+**Memory:** a full mip pyramid = 32^3 + 16^3 + 8^3 + 4^3 + 2^3 = 37448 voxels
+= ~73 KB per chunk (vs 64 KB today). A 14% overhead.
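
The downsampling rule above can be sketched in C++ as follows. This is only one possible reading of the plan, under stated assumptions: material ID 0 is taken to mean air, a parent cell is solid only when more than 4 of its 8 children are solid, and ties between materials go to the lowest ID. `downsampleMaterial`, `downsampleOccupancy` and `pyramidVoxelCount` are hypothetical helpers, not existing project functions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption for this sketch: material ID 0 is air (empty).
constexpr std::uint8_t kAir = 0;

// Collapse the materials of a 2x2x2 group into one parent material.
// The parent is solid only if more than 4 of the 8 children are solid,
// and it takes the most frequent solid material (lowest ID on ties).
std::uint8_t downsampleMaterial(const std::array<std::uint8_t, 8>& children) {
    std::array<int, 256> counts{};
    int solid = 0;
    for (std::uint8_t m : children) {
        if (m != kAir) { ++solid; ++counts[m]; }
    }
    if (solid <= 4) return kAir;  // occupancy must exceed 4/8

    std::uint8_t best = kAir;
    int bestCount = 0;
    for (int m = 1; m < 256; ++m) {
        if (counts[m] > bestCount) {
            bestCount = counts[m];
            best = static_cast<std::uint8_t>(m);
        }
    }
    return best;
}

// Smooth voxels: the parent occupancy is the rounded average of the 8 children.
std::uint8_t downsampleOccupancy(const std::array<std::uint8_t, 8>& occ) {
    int sum = 0;
    for (std::uint8_t o : occ) sum += o;
    return static_cast<std::uint8_t>((sum + 4) / 8);
}

// Voxel count of the pyramid described above: 32^3 + 16^3 + 8^3 + 4^3 + 2^3.
constexpr std::size_t pyramidVoxelCount() {
    return 32u * 32u * 32u + 16u * 16u * 16u + 8u * 8u * 8u + 4u * 4u * 4u + 2u * 2u * 2u;
}
```

`pyramidVoxelCount()` evaluates to 37448, so the memory figures above correspond to 2 bytes per voxel (74896 bytes against 65536 for LOD 0 alone).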
+
+### Phase B: LOD selection
+
+**File:** `VoxelRenderer.cpp` (in the frustum cull or on the CPU)
+
+```cpp
+uint8_t selectLOD(const ChunkPos& pos, const XMFLOAT3& cameraPos) {
+    float dist = distance(chunkCenter(pos), cameraPos);
+    if (dist < 64.0f)  return 0; // full resolution
+    if (dist < 128.0f) return 1; // 16^3
+    if (dist < 256.0f) return 2; // 8^3
+    return 3;                     // 4^3
+}
+```
+
+The LOD is passed to the mesher. The binary greedy mesher and Surface Nets work
+on the corresponding mip (identical code, just a smaller array).
+
+The `voxelMeshCS` compute shader receives the LOD level and adjusts the chunk size
+accordingly. Quad positions are multiplied by `2^LOD` to stay in
+world coordinates.
+
+### Phase C: Skirt generation
+
+**Principle:** when a chunk has a lower level of detail (less detailed) than a neighbor, gaps
+appear at the boundary. An extra "skirt" of geometry is generated to
+hide them.
+
+**Implementation:**
+
+1. For each chunk face adjacent to a higher-detail chunk:
+   - Add an extra layer of voxels duplicated from the high-res neighbor
+   - Mesh this layer normally (it extends slightly beyond the chunk)
+
+2. Tag the skirt vertices in the vertex data (1 bit in the flags)
+
+3. In the VS, apply a **depth bias** to the skirt vertices:
+   ```hlsl
+   if (isSkirt) {
+       // Push the skirt slightly behind the surface
+       output.position.z -= 0.0001; // in clip space (reverse-Z: far is z=0, so subtract)
+   }
+   ```
+   The skirt is only visible where there is a gap, because the regular geometry
+   hides it everywhere else through the depth test.
+
+### Phase D: Rendering integration
+
+**Buffers:** skirts are stored in the same quad mega-buffer, tagged by one bit.
+No extra draw call.
+
+**Compute cull:** the culling compute shader (`voxelCullCS`) receives the per-chunk LOD
+in GPUChunkInfo. Chunks with LOD > 0 have fewer quads, hence fewer vertices
+to process.
+
+**RT:** BLAS are built per chunk. Chunks with LOD > 0 have smaller BLAS.
+The TLAS stays the same.
+
+### Phase E: Smooth-specific LOD
+
+For smooth chunks (Surface Nets), LOD is trickier:
+- The LOD 1 smooth mesh (16^3) has triangles twice as large
+- Normals are less accurate
+- Per-material deformation (plan-vertex-deformation.md) must stay consistent
+
+**Approach:** run the smooth mesher on the corresponding mip. Smooth skirts are generated
+the same way (extra layer). Visual consistency is acceptable because
+smooth terrain is already "blurry" by nature.
+
+## Effort estimate
+
+| Phase | Effort | Dependency |
+|-------|--------|------------|
+| A. Mip pyramid | 4h | None |
+| B. LOD selection | 2h | A |
+| C. Blocky skirts | 4h | B |
+| D. Rendering integration | 3h | C |
+| E. Smooth LOD | 4h | B + Phase 5.1 |
+| **Total** | **~17h** | |
+
+## Risks
+
+- **Popping**: the LOD switch is visible if the distances are too close.
+  Solution: cross-fade or hysteresis (switch LOD at dist+10% to avoid oscillation).
+- **Skirt artifacts**: if the depth bias is too large, skirts show up as
+  shadows. Tune the bias per LOD level.
+- **Meshing cache**: mips with LOD > 0 change less often. Cache the mesh per LOD
+  level and re-mesh only when the mip changes.
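
The hysteresis idea from the popping risk could look roughly like the C++ sketch below. It reuses the 64/128/256 thresholds from `selectLOD`. The symmetric ±10% band and the names `kLodDistances`, `kHysteresis` and `selectLODWithHysteresis` are assumptions made for illustration, since the plan only mentions switching at dist+10%.

```cpp
#include <cstdint>

// Base switch distances taken from selectLOD in Phase B (LOD 0|1, 1|2, 2|3).
constexpr float kLodDistances[3] = {64.0f, 128.0f, 256.0f};
// Assumed hysteresis band: 10% on each side of a threshold.
constexpr float kHysteresis = 0.10f;

// Returns the LOD to use this frame, given the chunk's current LOD.
// A chunk only gets coarser once it is 10% beyond a threshold and only
// gets finer once it is 10% inside it, so a camera hovering near a
// threshold does not make the chunk flip back and forth.
std::uint8_t selectLODWithHysteresis(float dist, std::uint8_t currentLOD) {
    std::uint8_t lod = currentLOD > 3 ? 3 : currentLOD;
    // Move to a coarser level while the distance is clearly past the threshold.
    while (lod < 3 && dist >= kLodDistances[lod] * (1.0f + kHysteresis)) {
        ++lod;
    }
    // Move to a finer level while the distance is clearly below the threshold.
    while (lod > 0 && dist < kLodDistances[lod - 1] * (1.0f - kHysteresis)) {
        --lod;
    }
    return lod;
}
```

For example, a LOD 0 chunk at distance 66 stays at LOD 0 (the switch happens at 70.4), and a LOD 1 chunk at distance 60 stays at LOD 1 (it only refines below 57.6).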
+
+## References
+
+- Roblox SIGGRAPH 2020, p.34-38 (skirts + depth bias)
+- Transvoxel (Eric Lengyel) : https://transvoxel.org/
+- 0fps LOD for blocky voxels : https://0fps.net/2018/03/03/a-level-of-detail-method-for-blocky-voxels/
+- Nick Gildea, Dual Contouring seams : http://ngildea.blogspot.com/2014/09/dual-contouring-chunked-terrain.html
--- a/docs/plan-stylized-textures.md
+++ b/docs/plan-stylized-textures.md
@@ -0,0 +1,222 @@
+# Real stylized textures + quilting
+
+Move from procedural colors to real hand-painted textures in a
+Wonderbox / Enshrouded style. Includes the Roblox quilting technique as an optimization.
+
+## Current state
+
+- `textureArray_`: 5 procedurally generated 256x256 layers (noise + flat color)
+- `MaterialDesc`: fields `albedoTextureIndex`, `normalTextureIndex`, `heightmapTextureIndex`
+  already exist but point to generated textures
+- Triplanar mapping: working in `voxelPS.hlsl` (blocky) and `voxelSmoothPS.hlsl`
+- Height-based blending: working (Phase 3), winner-takes-all + corner attenuation
+- `sampler_`: already created, linear with wrap
+
+The infrastructure is ready; only the textures and their integration are missing.
+
+## Implementation plan
+
+### Phase A: Prepare the textures (art)
+
+**Target format per material:**
+
+| Texture | Format | Content | Size |
+|---------|--------|---------|------|
+| Albedo | RGBA8 | RGB = color, A = heightmap | 512x512 |
+| Normal | RG8 | Tangent-space normal map (BC5) | 512x512 |
+
+Storing the heightmap in the albedo alpha channel is the Roblox convention and avoids a
+separate texture. Height-based blending already reads a height channel.
+
+**Materials to create (6):**
+
+1. **Grass**: hand-painted grass, visible blades, height map with high tips
+2. **Dirt**: dry soil, cracks, irregular height map
+3. **Stone**: gray stone, fissures, height map with protruding ridges
+4. **Sand**: fine sand, ripples, soft height map
+5. **Snow**: powder snow, nearly flat surface, very smooth height map
+6. **Smoothstone**: polished stone, subtle veins
+
+**Sources of stylized textures (free or to be created):**
+- Polyhaven (CC0, PBR): paint over them for the hand-painted style
+- Ambientcg (CC0): realistic bases to simplify
+- Create from scratch in Krita/Aseprite at 512x512
+
+### Phase B: Load the textures into the texture array
+
+**File:** `VoxelRenderer.cpp`, replacing `generateTextures()`.
+
+```cpp
+void VoxelRenderer::loadTextures() {
+    // Load each material from PNG/DDS files
+    const char* albedoPaths[] = {
+        "Content/voxel/grass_albedo.png",
+        "Content/voxel/dirt_albedo.png",
+        "Content/voxel/stone_albedo.png",
+        "Content/voxel/sand_albedo.png",
+        "Content/voxel/snow_albedo.png",
+        "Content/voxel/smoothstone_albedo.png",
+    };
+
+    // Create a 512x512 x N layers texture array
+    // (Wicked's TextureDesc uses snake_case field names)
+    TextureDesc desc;
+    desc.width = 512;
+    desc.height = 512;
+    desc.array_size = NUM_MATERIALS;
+    desc.format = Format::R8G8B8A8_UNORM;
+    desc.bind_flags = BindFlag::SHADER_RESOURCE;
+    desc.mip_levels = 0; // auto mip generation
+
+    // Load each layer via wi::helper::loadTextureFromFile
+    // then copy it into the texture array via CopyTexture
+}
+```
+
+**Wicked Engine helper:** `wi::resourcemanager::Load()` loads PNG/DDS and generates
+the mips automatically. `wi::helper::CreateTexture()` can also be used with
+raw data.
+
+**Post-build:** add a copy of the textures in CMakeLists.txt:
+```cmake
+add_custom_command(TARGET BVLEVoxels POST_BUILD
+    COMMAND ${CMAKE_COMMAND} -E copy_directory
+        ${CMAKE_SOURCE_DIR}/assets/voxel
+        $<TARGET_FILE_DIR:BVLEVoxels>/Content/voxel
+)
+```
+
+### Phase C: Adapt the shaders
+
+**Changes in `voxelPS.hlsl` (blocky):**
+
+The shader already uses `materialTextures` (texture array) and `triplanarSample()`.
+Modifications:
+
+```hlsl
+// Current: procedural color + subtle texture
+float3 color = materialColor * texSample.rgb;
+
+// New: direct texture, the color comes from the texture (sampled once)
+float4 texel  = triplanarSample(materialTextures, worldPos, normal, matIndex);
+float3 albedo = texel.rgb;
+float  height = texel.a;
+```
+
+The height is used for inter-material blending (already in place).
+
+**Changes in `voxelSmoothPS.hlsl`:**
+
+Identical. Triplanar is already in place; only the color source needs replacing.
+
+### Phase D: Detiling (anti-repetition)
+
+Problem: triplanar with 512x512 textures shows visible repetition
+every 16 voxels (if tiling = 1 texel/voxel).
+
+**Roblox technique (p.25):** pseudo-random rotation + shift per vertex.
+
+```hlsl
+// In the VS, compute a detiling seed from the position
+uint detileSeed = hash(uint3(floor(worldPos)));
+
+// In the PS, apply a rotation/shift to the UVs
+float angle = (detileSeed & 0x3) * (3.14159 / 2.0); // 0, 90, 180, 270 degrees
+float2 rotatedUV = rotate2D(uv, angle);
+float2 shiftedUV = rotatedUV + float2(
+    ((detileSeed >> 2) & 0xF) / 16.0,
+    ((detileSeed >> 6) & 0xF) / 16.0
+);
+```
+
+This breaks up the repetition without adding extra samples.
+The seed is passed from the VS to the PS through an interpolant (1 uint).
+
+**Simpler alternative:** vary the tiling scale per triplanar axis (1.0, 0.97, 1.03).
+This already breaks up much of the repetition at almost no cost.
+
+### Phase E: Quilting (optional optimization)
+
+If triplanar (3 fetches per texture * N textures) becomes a bottleneck:
+
+**Roblox technique (p.22):** pick ONE projection plane per vertex among 18 planes
+(6 axes * 3 rotations of 30 degrees). Encode the plane ID in the vertex data (5 bits).
+
+```
+18 planes = 6 faces * 3 rotations:
+  +X face : 0deg, 30deg, 60deg
+  -X face : 0deg, 30deg, 60deg
+  +Y face : ...
+  ...
+```
+
+The PS samples only once per material instead of 3 times (triplanar).
+Reduction: from 9 fetches (3 materials * 3 axes) to 3 fetches (3 materials * 1 plane).
+
+**For blocky:** not needed. Quads have a single face, so triplanar is
+already reduced to 1 dominant axis. Quilting brings nothing.
+
+**For smooth:** potentially useful when blending 3+ materials. Measure first
+whether triplanar is really a bottleneck (unlikely with 6 materials).
+
+**Verdict:** postpone quilting until after measuring. Standard triplanar should
+be enough with our number of materials.
+
+### Phase F: Normal mapping
+
+Add a second texture array for the normal maps (or a second set of layers).
+
+```hlsl
+// Triplanar normal map sampling
+float3 normalMap = triplanarSampleNormal(normalTextures, worldPos, geometricNormal, matIndex);
+// Perturb the geometric normal
+float3 finalNormal = normalize(geometricNormal + normalMap * normalStrength);
+```
+
+Triplanar normal mapping requires rebuilding the TBN per projection axis.
+This is extra computation, but a standard one.
+
+**Simplified approach:** for a hand-painted style, normal maps are not
+mandatory. The albedo carries most of the visual detail. Evaluate
+visually before investing time.
+
+## Asset layout
+
+```
+assets/
+  voxel/
+    grass_albedo.png      # RGBA: RGB=color, A=heightmap
+    dirt_albedo.png
+    stone_albedo.png
+    sand_albedo.png
+    snow_albedo.png
+    smoothstone_albedo.png
+    # (optional) grass_normal.png, etc.
+```
+
+## Effort estimate
+
+| Phase | Effort | Dependency |
+|-------|--------|------------|
+| A. Create textures (art) | 4-8h | None (can run in parallel) |
+| B. Texture array loader | 3h | A |
+| C. Adapt shaders | 2h | B |
+| D. Detiling | 2h | C |
+| E. Quilting | 4h | C (optional) |
+| F. Normal maps | 3h | C (optional) |
+| **Minimum total** | **~11h** | A+B+C+D |
+
+## Risks
+
+- **Inconsistent style**: all textures must share the same hand-painted style.
+  Better to start with 2 materials (grass+stone) and validate the look before
+  doing all 6.
+- **Mip bleeding**: in a texture array, mips can bleed between layers.
+  Solution: 4px padding around each texture, or use compressed formats
+  (BC7) with explicit mips.
+- **Visible tiling**: detiling solves this, but requires per-material tuning.
+  The textures must be tileable to begin with.
+
+## References
+
+- Roblox SIGGRAPH 2020, p.21-29 (quilting, detiling, height-based blend)
+- DreamCat Games, Smooth Voxel Mapping : https://bonsairobo.medium.com/smooth-voxel-mapping-a-technical-deep-dive-on-real-time-surface-nets-and-texturing-ef06d0f8ca14
+- Real-time Image Quilting (Hugh Malan, SIGGRAPH 2011)
--- a/docs/plan-vertex-deformation.md
+++ b/docs/plan-vertex-deformation.md
@ -0,0 +1,140 @@
+# Per-material vertex deformation
+
+Inspired by the Roblox SIGGRAPH 2020 talk (p.19). Each material defines a procedural
+deformation applied to the Surface Nets vertices after the centroid is computed.
+This gives each material a distinct visual character at no extra GPU cost.
+
+## Goal
+
+Currently, all smooth materials produce the same uniform smooth blobs.
+With per-material deformation:
+- **Stone** would have sharper edges (cubify)
+- **Sand** would have wavy surfaces (shift)
+- **Snow** would stay smooth (no deformation)
+- **Ice** would have crystalline facets (quantize)
+
+## Deformation modes (Roblox)
+
+| Mode | Effect | Formula | Target materials |
+|------|--------|---------|------------------|
+| `None` | No deformation | identity | snow, water |
+| `Shift` | Pseudo-random offset | `pos += hash(pos) * amplitude` | sand, dirt |
+| `Cubify` | Lerp toward cell center | `pos = lerp(pos, floor(pos) + 0.5, factor)` | stone, rock |
+| `Quantize` | Round to a fixed step | `pos = round(pos * K) / K` | ice, crystal |
+| `Barrel` | Cubify on Y only | `pos.y = lerp(pos.y, floor(pos.y) + 0.5, f)` | pillars, trunks |
+
+## Integration into the existing code
+
+### 1. Extend MaterialDesc (VoxelTypes.h)
+
+```cpp
+struct MaterialDesc {
+    // ... existing fields ...
+    uint8_t deformMode = 0;     // 0=None, 1=Shift, 2=Cubify, 3=Quantize, 4=Barrel
+    uint8_t deformStrength = 0; // 0-255 -> 0.0-1.0
+    // replaces _pad or adds 2 bytes (struct stays 16-byte aligned)
+};
+```
+
+No GPU change: the deformation is CPU-only, inside the mesher.
+
+### 2. Modify SmoothMesher (VoxelMesher.cpp)
+
+The insertion point is after the centroid computation, before writing to the output
+buffer. Currently in `meshSurfaceNets()`:
+
+```
+centroid = average(edge_crossings)
+normal = average(triangle_normals)
+-> INSERT DEFORMATION HERE
+write SmoothVertex(centroid, normal, material)
+```
+
+Implementation:
+
+```cpp
+// After the centroid computation and before the vertex is written
+XMFLOAT3 deformVertex(XMFLOAT3 pos, const MaterialDesc& mat) {
+    switch (mat.deformMode) {
+    case 1: { // Shift
+        float strength = mat.deformStrength / 255.0f;
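+        // Stable hash keyed on the integer cell (no flicker during animation).
+        // floorf rather than int(): truncation would merge cells on either side of 0.
+        uint32_t h = hash3((int)floorf(pos.x), (int)floorf(pos.y), (int)floorf(pos.z));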
+        float rx = ((h & 0xFF) / 255.0f - 0.5f) * strength;
+        float ry = (((h >> 8) & 0xFF) / 255.0f - 0.5f) * strength;
+        float rz = (((h >> 16) & 0xFF) / 255.0f - 0.5f) * strength;
+        return { pos.x + rx, pos.y + ry, pos.z + rz };
+    }
+    case 2: { // Cubify
+        float f = mat.deformStrength / 255.0f;
+        float cx = floorf(pos.x) + 0.5f;
+        float cy = floorf(pos.y) + 0.5f;
+        float cz = floorf(pos.z) + 0.5f;
+        return { lerp(pos.x, cx, f), lerp(pos.y, cy, f), lerp(pos.z, cz, f) };
+    }
+    case 3: { // Quantize
+        float K = 2.0f + (mat.deformStrength / 255.0f) * 6.0f; // 2-8 steps
+        return { roundf(pos.x * K) / K, roundf(pos.y * K) / K, roundf(pos.z * K) / K };
+    }
+    case 4: { // Barrel (cubify Y only)
+        float f = mat.deformStrength / 255.0f;
+        float cy = floorf(pos.y) + 0.5f;
+        return { pos.x, lerp(pos.y, cy, f), pos.z };
+    }
+    default: return pos;
+    }
+}
+```
+
+### 3. Recompute normals after deformation
+
+The averaged normals should be recomputed AFTER the deformation, otherwise they no
+longer match the deformed geometry. Two options:
+
+**Option A (simple):** Recompute the face normals of the adjacent triangles after deformation.
+The normal pass in `meshSurfaceNets()` already does this; it only needs to be
+moved after the deformation.
+
+**Option B (fast):** Keep the original normals. The deformation is subtle, so
+the normal error is visually acceptable. Recommended for the prototype.
+
+### 4. Soft/hard edges per material
+
+Roblox also controls soft/hard edges per material. We can add:
+
+```cpp
+uint8_t edgeHardness = 0; // 0=smooth normals, 255=flat/geometric normals
+```
+
+In the PS, interpolate between the smooth normals and the geometric normals (already
+available via the triplanar path). Zero cost on the mesher side, small PS computation.
+
+### 5. Configure the existing materials
+
+```cpp
+// In VoxelWorld::initMaterials() or equivalent
+materials[5].deformMode = 0; materials[5].deformStrength = 0;   // snow: smooth
+materials[3].deformMode = 2; materials[3].deformStrength = 180; // stone: strong cubify
+materials[6].deformMode = 2; materials[6].deformStrength = 100; // smoothstone: light cubify
+materials[4].deformMode = 1; materials[4].deformStrength = 60;  // sand: subtle shift
+materials[2].deformMode = 1; materials[2].deformStrength = 30;  // dirt: very light shift
+```
+
+## Risks and precautions
+
+- **Self-intersection**: too strong a deformation can create inverted triangles.
+  Cap `deformStrength` at ~200 and check visually.
+- **Chunk seams**: the deformation must be identical on both sides of a chunk
+  boundary. Hashing the world position (not the local one) guarantees consistency.
+- **Animation**: in animation mode (terrain regenerated at 30Hz), the deformation must be
+  stable. Use the integer position (not the centroid) as the hash seed.
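The seam and animation constraints above both come down to keying the hash on the integer world cell. A minimal sketch, assuming a hypothetical `hash3` (the plan does not specify one; any well-mixed integer hash would do) and a hypothetical `shiftOffset` helper mirroring the `Shift` mode:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 3D integer hash: combine the coordinates, then run a
// bijective 32-bit finalizer so nearby cells get unrelated values.
static uint32_t hash3(int32_t x, int32_t y, int32_t z) {
    uint32_t h = (uint32_t)x * 73856093u ^ (uint32_t)y * 19349663u ^ (uint32_t)z * 83492791u;
    h ^= h >> 16; h *= 0x7feb352du;
    h ^= h >> 15; h *= 0x846ca68bu;
    h ^= h >> 16;
    return h;
}

struct Offset { float x, y, z; };

// Shift offset for the cell containing a WORLD-space position.
// floor() keeps negative coordinates in their own cell, and the result
// depends only on the cell, so it is stable while the centroid moves.
static Offset shiftOffset(float wx, float wy, float wz, uint8_t deformStrength) {
    float strength = deformStrength / 255.0f;
    uint32_t h = hash3((int32_t)std::floor(wx), (int32_t)std::floor(wy), (int32_t)std::floor(wz));
    return { ((h & 0xFF) / 255.0f - 0.5f) * strength,
             (((h >> 8) & 0xFF) / 255.0f - 0.5f) * strength,
             (((h >> 16) & 0xFF) / 255.0f - 0.5f) * strength };
}
```

Two chunks that share a boundary vertex compute the same world position for it, so they get the same offset without exchanging any data.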
+
+## Effort estimate
+
+- Extend MaterialDesc: 15 min
+- deformVertex function: 30 min
+- Integration into meshSurfaceNets: 30 min
+- Per-material parameter tuning: 1h
+- **Total: ~2h**
+
+No GPU buffer change and no performance impact. No shader change either, apart from the optional `edgeHardness` blend in the PS (section 4).
--- a/docs/plan.md
+++ b/docs/plan.md
@ -0,0 +1,94 @@
+# BVLE Voxels - Work plan
+
+Remaining features and ideas for future work, organized by topic.
+Each topic has a detailed implementation document in `docs/`.
+
+The current state of the prototype is documented in `CLAUDE.md` at the repository root.
+
+---
+
+## Remaining topics from the original specification
+
+### 1. GPU Compute Surface Nets (Phase 5.3) ✅
+
+The smooth mesher runs in GPU compute (2-pass: centroid CS + mesh CS).
+Automatic GPU/CPU switching. Shaders: `voxelSmoothCentroidCS.hlsl`, `voxelSmoothCS.hlsl`.
+
+**Status:** Done.
+
+### 2. Multi-resolution LOD (Phase 5.4)
+
+LOD 1 is implemented: 32³ chunks covering 64³ of world space, lodScale in GPUChunkInfo,
+the VS multiplies localPos by lodScale. LOD 0 radius=6, LOD 1 ring radius=12 (480 chunks).
+No smooth/topings on LOD 1.
+
+Remaining: LOD 2+, skirts to hide the seams, fog at the edges, dynamic LOD.
+
+**Status:** In progress. See `docs/plan-lod-skirts.md`.
+
+### 3. Shadow Maps + SSAO fallback (Phase 6.4)
+
+Ray tracing is mandatory for shadows/AO. GPUs without RT (or low-end
+configurations) get no directional lighting. The fallback should use the existing
+Wicked Engine pipeline.
+
+**Status:** Not started. Low priority (all target GPUs support RT).
+
+### 4. Connected Blocks / Pipes (spec idea)
+
+Blocks containing custom 3D models that join dynamically depending on identical
+neighbors. Example: pipes that connect automatically. An extension of the topings
+system with a 6-face bitmask instead of 4-adjacency.
+
+**Status:** Concept only.
+
+---
+
+## New topics (inspired by the Roblox SIGGRAPH 2020 talk)
+
+### 5. Per-material vertex deformation
+
+Roblox defines per-material procedural deformations on Surface Nets vertices:
+shift (random offset), cubify (lerp toward the cube center), quantize (round to 1/K),
+barrel (cubify on Y), soft/hard edges. This gives each material visual character
+at no GPU cost.
+
+**Status:** Not started. See `docs/plan-vertex-deformation.md`.
+
+### 6. LOD with skirts
+
+Roblox uses a per-chunk mip pyramid plus an octree LOD. Seams between LOD levels
+are handled with "skirts" (overhanging triangles + depth bias) instead of
+Transvoxel stitching, which is complex. An elegant and simple solution.
+
+**Status:** Not started. See `docs/plan-lod-skirts.md`.
+
+### 7. Real stylized textures
+
+Move from the current procedural colors to real textures (albedo + heightmap +
+normal) in a texture array. Improved triplanar mapping with detiling (per-vertex
+rotation/shift, Roblox style). Height-based blending is already in place on the shader side.
+
+**Status:** Infrastructure in place (5-layer texture array, triplanar, height blend),
+but the textures are procedurally generated. See `docs/plan-stylized-textures.md`.
+
+### 8. Texture quilting (Roblox)
+
+An alternative to triplanar: 1 projection plane out of 18 per vertex, encoded in the
+vertex data. Cuts fetches from 9-27 down to 3. To be folded into the textures
+topic if triplanar becomes a bottleneck.
+
+**Status:** Not started. Covered in `docs/plan-stylized-textures.md`.
+
+---
+
+## Suggested priorities
+
+| Priority | Topic | Impact | Effort |
+|----------|-------|--------|--------|
+| 1 | Real stylized textures | Major visual | Medium |
+| 2 | Per-material vertex deformation | Strong visual, zero cost | Low |
+| 3 | LOD with skirts | Scalability | Medium-high |
+| 4 | GPU Surface Nets | Smooth performance | Medium |
+| 5 | SM+SSAO fallback | Compatibility | Low |
+| 6 | Connected blocks | Gameplay | High |
--- a/shaders/voxelAOApplyCS.hlsl
+++ b/shaders/voxelAOApplyCS.hlsl
@ -55,12 +55,13 @@ float3 computeSky(float2 uv) {
        sky = lerp(horizonColor, nadirColor, h);
    }

-    // Sun glow near sun direction (soft halo)
+    // Sun glow near sun direction (compact disc + subtle haze)
    float3 L = normalize(-sunDirection.xyz);
    float sunDot = saturate(dot(viewDir, L));
-    float sunGlow = pow(sunDot, 32.0) * 0.4;
-    float sunHaze = pow(sunDot, 4.0) * 0.15;
-    sky += float3(1.0, 0.85, 0.5) * (sunGlow + sunHaze);
+    float sunDisc = pow(sunDot, 256.0) * 0.6;   // tight bright disc
+    float sunGlow = pow(sunDot, 64.0) * 0.2;    // narrow glow ring
+    float sunHaze = pow(sunDot, 8.0) * 0.08;    // subtle atmospheric haze
+    sky += float3(1.0, 0.85, 0.5) * (sunDisc + sunGlow + sunHaze);

    return sky;
 }
--- a/shaders/voxelPS.hlsl
+++ b/shaders/voxelPS.hlsl
@ -5,6 +5,7 @@
 #include "voxelCommon.hlsli"

 Texture2DArray materialTextures : register(t1);
+Texture2DArray normalTextures  : register(t7);
 SamplerState materialSampler : register(s0);

 // Voxel data buffer (same as compute mesher uses) — bound at t3 in GPU mesh path
@ -105,7 +106,7 @@ float3 triplanarWeights(float3 normal, float sharpness) {
 // Triplanar sampling — RGB only (non-blended path)
 float3 sampleTriplanar(float3 worldPos, float3 normal, uint texIndex, float tiling) {
    float3 w = triplanarWeights(normal, 4.0);
-    float3 colX = materialTextures.Sample(materialSampler, float3(worldPos.yz * tiling, (float)texIndex)).rgb;
+    float3 colX = materialTextures.Sample(materialSampler, float3(worldPos.zy * tiling, (float)texIndex)).rgb;
    float3 colY = materialTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex)).rgb;
    float3 colZ = materialTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex)).rgb;
    return colX * w.x + colY * w.y + colZ * w.z;
@ -114,12 +115,46 @@ float3 sampleTriplanar(float3 worldPos, float3 normal, uint texIndex, float tili
 // Triplanar sampling — RGBA (includes heightmap in alpha)
 float4 sampleTriplanarRGBA(float3 worldPos, float3 normal, uint texIndex, float tiling) {
    float3 w = triplanarWeights(normal, 4.0);
-    float4 colX = materialTextures.Sample(materialSampler, float3(worldPos.yz * tiling, (float)texIndex));
+    float4 colX = materialTextures.Sample(materialSampler, float3(worldPos.zy * tiling, (float)texIndex));
    float4 colY = materialTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex));
    float4 colZ = materialTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex));
    return colX * w.x + colY * w.y + colZ * w.z;
 }

+// ── Triplanar normal mapping ───────────────────────────────────────
+// UDN (Unreal Developer Network) triplanar blend.
+// For each projection axis, the tangent-space normal's XY perturbs the
+// two world-space axes orthogonal to the projection direction.
+float3 sampleTriplanarNormal(float3 worldPos, float3 normal, uint texIndex, float tiling) {
+    float3 w = triplanarWeights(normal, 4.0);
+    float3 axisSign = sign(normal);
+
+    // Sample tangent-space normals per projection axis (Ben Golus UDN triplanar)
+    float3 tnX = normalTextures.Sample(materialSampler, float3(worldPos.zy * tiling, (float)texIndex)).rgb * 2.0 - 1.0;
+    float3 tnY = normalTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex)).rgb * 2.0 - 1.0;
+    float3 tnZ = normalTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex)).rgb * 2.0 - 1.0;
+
+    // OpenGL normal maps: flip green channel ONLY for Y-projection (horizontal faces).
+    // X/Z projections have texture V = world Y (up), which already matches GL convention.
+    // Y-projection has texture V = world Z, where GL/DX conventions differ.
+    tnY.y = -tnY.y;
+
+    // Sign correction for back-facing projections (Golus reference)
+    // Flips the tangent-space X to account for mirrored UVs on negative faces.
+    tnX.x *= axisSign.x;
+    tnY.x *= axisSign.y;
+    tnZ.x *= axisSign.z;
+
+    // UDN blend using RAW normal (NOT abs!) so that negative faces (-X,-Y,-Z)
+    // produce normals pointing in the correct direction. abs() would force
+    // all dominant components positive, inverting lighting on 3 of 6 faces.
+    float3 worldNX = float3(tnX.xy + normal.zy, normal.x).zyx;
+    float3 worldNY = float3(tnY.xy + normal.xz, normal.y).xzy;
+    float3 worldNZ = float3(tnZ.xy + normal.xy, normal.z);
+
+    return normalize(worldNX * w.x + worldNY * w.y + worldNZ * w.z);
+}
+
 // ── Debug face colors ──────────────────────────────────────────────
 static const float3 faceDebugColors[6] = {
    float3(1.0, 0.2, 0.2),   // 0: +X = RED
@ -158,8 +193,6 @@ PSOutput main(PSInput input)

    // ── NORMAL MODE: triplanar textured with height-based blending ──
    float3 N = normalize(input.normal);
-    float3 L = normalize(-sunDirection.xyz);
-    float NdotL = max(dot(N, L), 0.0);

    uint texIndex = clamp(input.materialID - 1u, 0u, 5u);
    float tiling = textureTiling;
@ -198,8 +231,8 @@ PSOutput main(PSInput input)
        uint uNeighborMat = getNeighborMat(voxelCoord, uEdgeDir, normalDir, input.chunkIndex);
        uint vNeighborMat = getNeighborMat(voxelCoord, vEdgeDir, normalDir, input.chunkIndex);

-        // Blend zone: 0.25 voxels from each edge (covers 50% of face total)
-        float blendZone = 0.25;
+        // Blend zone: 0.40 voxels from each edge (covers 80% of face total)
+        float blendZone = 0.40;

        // Edge distances normalized to 0..1 (0=center, 1=edge) for corner attenuation
        float uEdge = abs(faceFracU - 0.5) * 2.0; // 0 at center, 1 at edge
@ -213,12 +246,14 @@ PSOutput main(PSInput input)
        float uWeight = saturate((uAdj - blendStart) / (1.0 - blendStart)) * 0.5;
        float vWeight = saturate((vAdj - blendStart) / (1.0 - blendStart)) * 0.5;

-        // Only blend if neighbor has a different material AND blend flags allow it:
-        // - Current material must NOT resist bleed (resistBleedMask)
-        // - Neighbor material must be allowed to bleed (bleedMask)
+        // Blend flags:
+        // - mainResists: current material resists being bled onto → no blending from this side
+        // - neighResists: neighbor resists bleed → asymmetric blend (neighbor dominates at edge)
        bool mainResists = (resistBleedMask >> input.materialID) & 1u;
        bool uNeighCanBleed = (bleedMask >> uNeighborMat) & 1u;
        bool vNeighCanBleed = (bleedMask >> vNeighborMat) & 1u;
+        bool uNeighResists = (resistBleedMask >> uNeighborMat) & 1u;
+        bool vNeighResists = (resistBleedMask >> vNeighborMat) & 1u;
        bool uBlend = (uNeighborMat > 0u && uNeighborMat != input.materialID && uWeight > 0.001
                       && !mainResists && uNeighCanBleed);
        bool vBlend = (vNeighborMat > 0u && vNeighborMat != input.materialID && vWeight > 0.001
@ -258,9 +293,16 @@ PSOutput main(PSInput input)
                uint uTexIdx = clamp(uNeighborMat - 1u, 0u, 5u);
                float4 uTex = sampleTriplanarRGBA(input.worldPos, N, uTexIdx, tiling);

-                // Symmetric proximity bias: at edge (weight=0.5) bias=0 → pure heightmap.
-                // Away from edge (weight=0) bias=0.5 → main always wins.
-                float bias = 0.5 - uWeight;
+                // Proximity bias controls heightmap blending:
+                // Symmetric: at edge (w=0.5) bias=0 → pure heightmap; center (w=0) bias=0.5 → main wins
+                // Asymmetric (neighbor resists bleed): at edge (w=0.5) bias=0.5-0.8=-0.3 → neighbor
+                //   score +0.3, main score -0.3 (neighbor dominates at equal heights); center bias=0.5 → main wins
+                float bias;
+                if (uNeighResists) {
+                    bias = 0.5 - uWeight * 1.6;
+                } else {
+                    bias = 0.5 - uWeight;
+                }
                float mainScore  = mainTex.a + bias;
                float neighScore = uTex.a   - bias;

@ -272,7 +314,12 @@ PSOutput main(PSInput input)
                uint vTexIdx = clamp(vNeighborMat - 1u, 0u, 5u);
                float4 vTex = sampleTriplanarRGBA(input.worldPos, N, vTexIdx, tiling);

-                float bias = 0.5 - vWeight;
+                float bias;
+                if (vNeighResists) {
+                    bias = 0.5 - vWeight * 1.6;
+                } else {
+                    bias = 0.5 - vWeight;
+                }
                float mainScore  = mainTex.a + bias;
                float neighScore = vTex.a   - bias;

@ -292,27 +339,54 @@ PSOutput main(PSInput input)
        albedo = (input.materialID > 0u) ? texColor : baseColor;
    }

+    // ── Normal map perturbation ──
+    float3 flatN = N; // preserve flat face normal for ambient + side-darkening
+    float nmStrength = toneMapParams.z; // 0 = off (F9 toggle)
+    if (nmStrength > 0.0) {
+        float3 perturbedN = sampleTriplanarNormal(input.worldPos, N, texIndex, tiling);
+        N = normalize(lerp(N, perturbedN, nmStrength));
+    }
+
    // ── Lighting ──
-    float hemiLerp = N.y * 0.5 + 0.5; // 0=down, 1=up
+    // Use FLAT normal for hemisphere ambient + side-darkening (consistent per face)
+    // Use PERTURBED normal for NdotL only (organic detail variation)
+    float3 L = normalize(-sunDirection.xyz);
+    float NdotL = max(dot(N, L), 0.0);
+    float hemiLerp = flatN.y * 0.5 + 0.5; // flat: consistent per face orientation
    float3 ambient = lerp(groundAmbient.rgb, skyAmbient.rgb, hemiLerp);
    float3 diffuse = sunColor.rgb * NdotL;

-    // Grass-specific shading (Wonderbox style)
-    bool isGrass = (texIndex == 0); // material 1 = grass = texture layer 0
-    if (isGrass) {
-        // Vertical face darkening: grass sides are darker green (not black)
-        float verticalDarken = saturate(abs(N.y)); // 1=top, 0=side
-        float sideFactor = lerp(0.60, 1.0, verticalDarken); // sides at 60% brightness
-        albedo *= sideFactor;
-
-        // Subtle warm shift: sunlit grass slightly warmer
-        if (NdotL > 0.0) {
-            float3 warmShift = float3(0.08, 0.05, -0.03) * NdotL;
-            diffuse += warmShift;
-        }
-
-        // Boost ambient for grass: inter-reflection from dense foliage
-        ambient *= 1.15;
+    // ── Debug lighting modes (F9 cycle) ──
+    uint dbgLight = (uint)toneMapParams.w;
+    if (dbgLight == 2) {
+        // FLAT: uniform color per face, no texture, no blend, no normal map
+        // Pure lighting with flat face normal. If two +X faces differ here, it's a VS/mesher bug.
+        float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
+        float flatHemi = flatN.y * 0.5 + 0.5;
+        float3 flatAmb = lerp(groundAmbient.rgb, skyAmbient.rgb, flatHemi);
+        float3 flatColor = float3(0.5, 0.5, 0.5) * (flatAmb + sunColor.rgb * flatNdotL);
+        output.color = float4(flatColor, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+    if (dbgLight == 3) {
+        // ALBEDO only: texture + blend, no lighting
+        output.color = float4(albedo, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+    if (dbgLight == 4) {
+        // NdotL only: grayscale NdotL with flat normal (no normal map)
+        float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
+        output.color = float4(flatNdotL, flatNdotL, flatNdotL, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+    if (dbgLight == 5) {
+        // NORMAL viz: geometric normal mapped to RGB (XYZ → [0,1])
+        output.color = float4(flatN * 0.5 + 0.5, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
    }

    float3 color = albedo * (ambient + diffuse);
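The proximity-bias arithmetic in the blend hunks above can be checked on the CPU. This is a sketch under stated assumptions: `proximityBias` and `blendScores` are hypothetical names, the weight is taken in `[0, 0.5]` as in the shader, and the material with the higher score is assumed to win (the selection code is not part of the excerpt).

```cpp
#include <cassert>

// Bias applied to the heightmap scores; weight is 0 at the face center
// and 0.5 at the shared edge.
static float proximityBias(float weight, bool neighborResists) {
    assert(weight >= 0.0f && weight <= 0.5f);
    // Symmetric:  0.5 at center, 0.0 at the edge.
    // Asymmetric: 0.5 at center, 0.5 - 0.5 * 1.6 = -0.3 at the edge.
    return neighborResists ? 0.5f - weight * 1.6f : 0.5f - weight;
}

struct Scores { float mainScore; float neighborScore; };

// Scores compared by the height-based blend (higher score assumed to win).
static Scores blendScores(float mainHeight, float neighborHeight,
                          float weight, bool neighborResists) {
    float bias = proximityBias(weight, neighborResists);
    return { mainHeight + bias, neighborHeight - bias };
}
```

At the edge with equal heights, the symmetric case ties (pure heightmap decision), while the asymmetric case gives the neighbor a 0.6 lead (+0.3 for the neighbor, -0.3 for the main material).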
--- a/shaders/voxelSmoothCS.hlsl
+++ b/shaders/voxelSmoothCS.hlsl
@ -80,11 +80,25 @@ float3 computeQuadFaceNormal(int3 c0, int3 c1, int3 c2, int3 c3,
    return fn; // area-weighted (not normalized)
 }

-// ── Smooth normal for a vertex at cell v ────────────────────────────
+// ── Smooth normal + consistency for a vertex at cell v ──────────────
 // Checks all 12 incident edges (4 per axis), computes face normals from
-// centroid grid, averages them. All reads from grid only.
-float3 computeSmoothNormal(int3 v) {
+// centroid grid, averages them. Also returns a consistency metric:
+//   consistency = |sum(fn)| / sum(|fn|)
+//   = 1.0 when all face normals agree (flat surface)
+//   ≈ 0.707 at a 90° edge (two perpendicular faces)
+//   → 0 when faces cancel out
+// Used at emission time to blend between smooth normal (interior) and
+// face normal (edge vertices).
+float3 computeSmoothNormal(int3 v, out float consistency) {
    float3 accum = float3(0, 0, 0);
+    float totalMag = 0;
+
+    // Helper macro: accumulate one quad's face normal + its magnitude
+    #define ACCUM_QUAD(c0,c1,c2,c3,solid,axis) { \
+        float3 fn_ = computeQuadFaceNormal(c0,c1,c2,c3,solid,axis); \
+        accum += fn_; \
+        totalMag += length(fn_); \
+    }

    // X-edges: at (v.x, v.y+dy, v.z+dz) for dy,dz in {0,1}
    {
@ -97,30 +111,14 @@ float3 computeSmoothNormal(int3 v) {
        bool sv_11 = isCellSolid(int3(v.x, v.y+1, v.z+1));
        bool sv_11_x1 = isCellSolid(int3(v.x+1, v.y+1, v.z+1));

-        // Edge (v.x, v.y, v.z)
-        if (sv != sv_x1) {
-            accum += computeQuadFaceNormal(
-                v + int3(0,-1,-1), v + int3(0,0,-1),
-                v + int3(0,-1,0),  v, sv, 0);
-        }
-        // Edge (v.x, v.y+1, v.z)
-        if (sv_01 != sv_01_x1) {
-            accum += computeQuadFaceNormal(
-                int3(v.x, v.y, v.z-1), int3(v.x, v.y+1, v.z-1),
-                v, int3(v.x, v.y+1, v.z), sv_01, 0);
-        }
-        // Edge (v.x, v.y, v.z+1)
-        if (sv_10 != sv_10_x1) {
-            accum += computeQuadFaceNormal(
-                int3(v.x, v.y-1, v.z), v,
-                int3(v.x, v.y-1, v.z+1), int3(v.x, v.y, v.z+1), sv_10, 0);
-        }
-        // Edge (v.x, v.y+1, v.z+1)
-        if (sv_11 != sv_11_x1) {
-            accum += computeQuadFaceNormal(
-                v, int3(v.x, v.y+1, v.z),
-                int3(v.x, v.y, v.z+1), int3(v.x, v.y+1, v.z+1), sv_11, 0);
-        }
+        if (sv != sv_x1)
+            ACCUM_QUAD(v+int3(0,-1,-1), v+int3(0,0,-1), v+int3(0,-1,0), v, sv, 0)
+        if (sv_01 != sv_01_x1)
+            ACCUM_QUAD(int3(v.x,v.y,v.z-1), int3(v.x,v.y+1,v.z-1), v, int3(v.x,v.y+1,v.z), sv_01, 0)
+        if (sv_10 != sv_10_x1)
+            ACCUM_QUAD(int3(v.x,v.y-1,v.z), v, int3(v.x,v.y-1,v.z+1), int3(v.x,v.y,v.z+1), sv_10, 0)
+        if (sv_11 != sv_11_x1)
+            ACCUM_QUAD(v, int3(v.x,v.y+1,v.z), int3(v.x,v.y,v.z+1), int3(v.x,v.y+1,v.z+1), sv_11, 0)
    }

    // Y-edges: at (v.x+dx, v.y, v.z+dz) for dx,dz in {0,1}
@ -134,26 +132,14 @@ float3 computeSmoothNormal(int3 v) {
        bool sv_11 = isCellSolid(int3(v.x+1, v.y, v.z+1));
        bool sv_11_y1 = isCellSolid(int3(v.x+1, v.y+1, v.z+1));

-        if (sv != sv_y1) {
-            accum += computeQuadFaceNormal(
-                v + int3(-1,0,-1), v + int3(0,0,-1),
-                v + int3(-1,0,0),  v, sv, 1);
-        }
-        if (sv_10 != sv_10_y1) {
-            accum += computeQuadFaceNormal(
-                int3(v.x, v.y, v.z-1), int3(v.x+1, v.y, v.z-1),
-                v, int3(v.x+1, v.y, v.z), sv_10, 1);
-        }
-        if (sv_01 != sv_01_y1) {
-            accum += computeQuadFaceNormal(
-                int3(v.x-1, v.y, v.z), v,
-                int3(v.x-1, v.y, v.z+1), int3(v.x, v.y, v.z+1), sv_01, 1);
-        }
-        if (sv_11 != sv_11_y1) {
-            accum += computeQuadFaceNormal(
-                v, int3(v.x+1, v.y, v.z),
-                int3(v.x, v.y, v.z+1), int3(v.x+1, v.y, v.z+1), sv_11, 1);
-        }
+        if (sv != sv_y1)
+            ACCUM_QUAD(v+int3(-1,0,-1), v+int3(0,0,-1), v+int3(-1,0,0), v, sv, 1)
+        if (sv_10 != sv_10_y1)
+            ACCUM_QUAD(int3(v.x,v.y,v.z-1), int3(v.x+1,v.y,v.z-1), v, int3(v.x+1,v.y,v.z), sv_10, 1)
+        if (sv_01 != sv_01_y1)
+            ACCUM_QUAD(int3(v.x-1,v.y,v.z), v, int3(v.x-1,v.y,v.z+1), int3(v.x,v.y,v.z+1), sv_01, 1)
+        if (sv_11 != sv_11_y1)
+            ACCUM_QUAD(v, int3(v.x+1,v.y,v.z), int3(v.x,v.y,v.z+1), int3(v.x+1,v.y,v.z+1), sv_11, 1)
    }

    // Z-edges: at (v.x+dx, v.y+dy, v.z) for dx,dy in {0,1}
@ -167,30 +153,21 @@ float3 computeSmoothNormal(int3 v) {
        bool sv_11 = isCellSolid(int3(v.x+1, v.y+1, v.z));
        bool sv_11_z1 = isCellSolid(int3(v.x+1, v.y+1, v.z+1));

-        if (sv != sv_z1) {
-            accum += computeQuadFaceNormal(
-                v + int3(-1,-1,0), v + int3(0,-1,0),
-                v + int3(-1,0,0),  v, sv, 2);
-        }
-        if (sv_10 != sv_10_z1) {
-            accum += computeQuadFaceNormal(
-                int3(v.x, v.y-1, v.z), int3(v.x+1, v.y-1, v.z),
-                v, int3(v.x+1, v.y, v.z), sv_10, 2);
-        }
-        if (sv_01 != sv_01_z1) {
-            accum += computeQuadFaceNormal(
-                int3(v.x-1, v.y, v.z), v,
-                int3(v.x-1, v.y+1, v.z), int3(v.x, v.y+1, v.z), sv_01, 2);
-        }
-        if (sv_11 != sv_11_z1) {
-            accum += computeQuadFaceNormal(
-                v, int3(v.x+1, v.y, v.z),
-                int3(v.x, v.y+1, v.z), int3(v.x+1, v.y+1, v.z), sv_11, 2);
-        }
+        if (sv != sv_z1)
+            ACCUM_QUAD(v+int3(-1,-1,0), v+int3(0,-1,0), v+int3(-1,0,0), v, sv, 2)
+        if (sv_10 != sv_10_z1)
+            ACCUM_QUAD(int3(v.x,v.y-1,v.z), int3(v.x+1,v.y-1,v.z), v, int3(v.x+1,v.y,v.z), sv_10, 2)
+        if (sv_01 != sv_01_z1)
+            ACCUM_QUAD(int3(v.x-1,v.y,v.z), v, int3(v.x-1,v.y+1,v.z), int3(v.x,v.y+1,v.z), sv_01, 2)
+        if (sv_11 != sv_11_z1)
+            ACCUM_QUAD(v, int3(v.x+1,v.y,v.z), int3(v.x,v.y+1,v.z), int3(v.x+1,v.y+1,v.z), sv_11, 2)
    }
+    #undef ACCUM_QUAD

-    float len = length(accum);
-    return (len > 0.0001) ? accum / len : float3(0, 1, 0);
+    float accumLen = length(accum);
+    // consistency: 1.0 = all faces agree, <1.0 = diverging face directions
+    consistency = (totalMag > 0.0001) ? accumLen / totalMag : 1.0;
+    return (accumLen > 0.0001) ? accum / accumLen : float3(0, 1, 0);
 }

 // ── Emit helpers ────────────────────────────────────────────────────
@ -249,16 +226,30 @@ void main(uint3 DTid : SV_DispatchThreadID)
            if (isCentroidValid(cells[0]) && isCentroidValid(cells[1]) &&
                isCentroidValid(cells[2]) && isCentroidValid(cells[3])) {
                float3 p[4], n[4];
+                float con[4];
                [loop] for (uint i = 0; i < 4; i++)
                    p[i] = chunkWorldPos + readCentroidPos(cells[i]);
                [loop] for (uint i = 0; i < 4; i++)
-                    n[i] = computeSmoothNormal(cells[i]);
+                    n[i] = computeSmoothNormal(cells[i], con[i]);

                float3 fn = cross(p[1] - p[0], p[3] - p[0]);
                int s = cellSolid ? +1 : -1;
                if ((fn.x > 0.0) != (s > 0)) fn = -fn;
                bool windingA = !cellSolid;

+                // Consistency-based blend: sharp edge vertices → face normal, curved → smooth
+                // consistency ≈ 1.0 = flat, ≈ 0.707 = 90° edge, < 0.5 = sharp corner
+                // smoothstep(0.70, 0.90): snaps to face normal at 90° boundaries (con<0.70)
+                // for seamless join with blocky, preserves smooth for terrain curves (con>0.90)
+                float fnLen = length(fn);
+                if (fnLen > 0.0001) {
+                    float3 fnN = fn / fnLen;
+                    [loop] for (uint i = 0; i < 4; i++) {
+                        float t = smoothstep(0.70, 0.90, con[i]);
+                        n[i] = normalize(lerp(fnN, n[i], t));
+                    }
+                }
+
                uint packed = readGridPacked(cells[3]);
                uint mat = packed & 0xFF;
                uint secMat = (packed >> 8) & 0xFF;
@ -281,10 +272,11 @@ void main(uint3 DTid : SV_DispatchThreadID)
            if (isCentroidValid(cells[0]) && isCentroidValid(cells[1]) &&
                isCentroidValid(cells[2]) && isCentroidValid(cells[3])) {
                float3 p[4], n[4];
+                float con[4];
                [loop] for (uint i = 0; i < 4; i++)
                    p[i] = chunkWorldPos + readCentroidPos(cells[i]);
                [loop] for (uint i = 0; i < 4; i++)
-                    n[i] = computeSmoothNormal(cells[i]);
+                    n[i] = computeSmoothNormal(cells[i], con[i]);

                float3 fn = cross(p[1] - p[0], p[3] - p[0]);
                int s = cellSolid ? +1 : -1;
@ -292,6 +284,16 @@ void main(uint3 DTid : SV_DispatchThreadID)
                bool windingA = !cellSolid;
                windingA = !windingA; // Y-axis winding flip

+                // Consistency-based blend (same formula as X-edge)
+                float fnLen = length(fn);
+                if (fnLen > 0.0001) {
+                    float3 fnN = fn / fnLen;
+                    [loop] for (uint i = 0; i < 4; i++) {
+                        float t = smoothstep(0.70, 0.90, con[i]);
+                        n[i] = normalize(lerp(fnN, n[i], t));
+                    }
+                }
+
                uint packed = readGridPacked(cells[3]);
                uint mat = packed & 0xFF;
                uint secMat = (packed >> 8) & 0xFF;
@ -314,16 +316,27 @@ void main(uint3 DTid : SV_DispatchThreadID)
            if (isCentroidValid(cells[0]) && isCentroidValid(cells[1]) &&
                isCentroidValid(cells[2]) && isCentroidValid(cells[3])) {
                float3 p[4], n[4];
+                float con[4];
                [loop] for (uint i = 0; i < 4; i++)
                    p[i] = chunkWorldPos + readCentroidPos(cells[i]);
                [loop] for (uint i = 0; i < 4; i++)
-                    n[i] = computeSmoothNormal(cells[i]);
+                    n[i] = computeSmoothNormal(cells[i], con[i]);

                float3 fn = cross(p[1] - p[0], p[3] - p[0]);
                int s = cellSolid ? +1 : -1;
                if ((fn.z > 0.0) != (s > 0)) fn = -fn;
                bool windingA = !cellSolid;

+                // Consistency-based blend (same formula as X-edge)
+                float fnLen = length(fn);
+                if (fnLen > 0.0001) {
+                    float3 fnN = fn / fnLen;
+                    [loop] for (uint i = 0; i < 4; i++) {
+                        float t = smoothstep(0.70, 0.90, con[i]);
+                        n[i] = normalize(lerp(fnN, n[i], t));
+                    }
+                }
+
                uint packed = readGridPacked(cells[3]);
                uint mat = packed & 0xFF;
                uint secMat = (packed >> 8) & 0xFF;
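Reviewer note on the consistency-based blend in the hunks above: each vertex normal is pulled toward the quad's face normal when its consistency value is low, and left as the smooth normal when it is high. The following is a minimal CPU-side sketch of that formula in C++, not the shader itself. `Vec3`, `lerp3`, `normalize3`, `smoothstepf` and `blendNormal` are hypothetical names, and it assumes `con[i]` is a scalar in [0, 1] where higher means the surrounding normals agree more (the diff does not show how `computeSmoothNormal` derives it).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal vector type standing in for HLSL float3.
struct Vec3 { float x, y, z; };

static Vec3 lerp3(Vec3 a, Vec3 b, float t) {
    return { a.x + (b.x - a.x) * t,
             a.y + (b.y - a.y) * t,
             a.z + (b.z - a.z) * t };
}

static Vec3 normalize3(Vec3 v, Vec3 fallback) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len < 1e-6f) return fallback;  // degenerate blend: keep the fallback
    return { v.x / len, v.y / len, v.z / len };
}

// Same shape as the HLSL smoothstep intrinsic: clamp, then Hermite curve.
static float smoothstepf(float e0, float e1, float x) {
    assert(e1 > e0);
    float t = std::min(std::max((x - e0) / (e1 - e0), 0.0f), 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// faceN:       unit face normal of the quad (fnN in the shader)
// smoothN:     unit smooth vertex normal (n[i] in the shader)
// consistency: assumed to be in [0, 1] (con[i] in the shader)
static Vec3 blendNormal(Vec3 faceN, Vec3 smoothN, float consistency) {
    float t = smoothstepf(0.70f, 0.90f, consistency);
    return normalize3(lerp3(faceN, smoothN, t), faceN);
}
```

With these thresholds, any consistency at or below 0.70 yields exactly the face normal (the seamless join to blocky geometry described in the commit message), and anything at or above 0.90 yields the untouched smooth normal.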
--- a/shaders/voxelSmoothPS.hlsl
+++ b/shaders/voxelSmoothPS.hlsl
@ -6,6 +6,7 @@
 #include "voxelCommon.hlsli"

 Texture2DArray<float4> materialTextures : register(t1);
+Texture2DArray<float4> normalTextures  : register(t7);
 StructuredBuffer<GPUChunkInfo> chunkInfoBuffer : register(t2);
 StructuredBuffer<uint> voxelData : register(t3);
 SamplerState texSampler : register(s0);
@ -76,7 +77,7 @@ float3 triplanarWeights(float3 n, float sharpness) {

 float3 sampleTriplanar(float3 wp, float3 n, uint texIdx, float tiling) {
    float3 w = triplanarWeights(n, 4.0);
-    float3 cx = materialTextures.Sample(texSampler, float3(wp.yz * tiling, (float)texIdx)).rgb;
+    float3 cx = materialTextures.Sample(texSampler, float3(wp.zy * tiling, (float)texIdx)).rgb;
    float3 cy = materialTextures.Sample(texSampler, float3(wp.xz * tiling, (float)texIdx)).rgb;
    float3 cz = materialTextures.Sample(texSampler, float3(wp.xy * tiling, (float)texIdx)).rgb;
    return cx * w.x + cy * w.y + cz * w.z;
@ -84,12 +85,33 @@ float3 sampleTriplanar(float3 wp, float3 n, uint texIdx, float tiling) {

 float4 sampleTriplanarRGBA(float3 wp, float3 n, uint texIdx, float tiling) {
    float3 w = triplanarWeights(n, 4.0);
-    float4 cx = materialTextures.Sample(texSampler, float3(wp.yz * tiling, (float)texIdx));
+    float4 cx = materialTextures.Sample(texSampler, float3(wp.zy * tiling, (float)texIdx));
    float4 cy = materialTextures.Sample(texSampler, float3(wp.xz * tiling, (float)texIdx));
    float4 cz = materialTextures.Sample(texSampler, float3(wp.xy * tiling, (float)texIdx));
    return cx * w.x + cy * w.y + cz * w.z;
 }

+// ── Triplanar normal mapping (UDN blend) ────────────────────────
+float3 sampleTriplanarNormal(float3 wp, float3 n, uint texIdx, float tiling) {
+    float3 w = triplanarWeights(n, 4.0);
+    float3 axisSign = sign(n);
+    // Ben Golus UDN reference — swizzled coordinates + sign corrections
+    float3 tnX = normalTextures.Sample(texSampler, float3(wp.zy * tiling, (float)texIdx)).rgb * 2.0 - 1.0;
+    float3 tnY = normalTextures.Sample(texSampler, float3(wp.xz * tiling, (float)texIdx)).rgb * 2.0 - 1.0;
+    float3 tnZ = normalTextures.Sample(texSampler, float3(wp.xy * tiling, (float)texIdx)).rgb * 2.0 - 1.0;
+    // OpenGL normal maps: flip green channel ONLY for Y-projection
+    tnY.y = -tnY.y;
+    // Sign correction for back-facing projections
+    tnX.x *= axisSign.x;
+    tnY.x *= axisSign.y;
+    tnZ.x *= axisSign.z;
+    // UDN blend using RAW normal (NOT abs!) — preserves sign for negative faces
+    float3 worldNX = float3(tnX.xy + n.zy, n.x).zyx;
+    float3 worldNY = float3(tnY.xy + n.xz, n.y).xzy;
+    float3 worldNZ = float3(tnZ.xy + n.xy, n.z);
+    return normalize(worldNX * w.x + worldNY * w.y + worldNZ * w.z);
+}
+
 // ── MRT Output ──────────────────────────────────────────────────
 struct PSOutput {
    float4 color  : SV_TARGET0;
@ -102,14 +124,11 @@ PSOutput main(PSInput input) {
    PSOutput output;
    float3 N = normalize(input.normal); // smooth normal (for lighting)

-    // Geometric normal from screen-space derivatives of worldPos.
-    // This is the true triangle face normal — use it for triplanar weights
-    // to avoid texture stretching caused by smooth normal interpolation.
-    float3 dpx = ddx(input.worldPos);
-    float3 dpy = ddy(input.worldPos);
-    float3 geoN = normalize(cross(dpx, dpy));
-    // Ensure geometric normal faces same hemisphere as smooth normal
-    if (dot(geoN, N) < 0.0) geoN = -geoN;
+    // NOTE: geoN (ddx/ddy geometric normal) is NOT used for triplanar sampling
+    // or normal mapping on smooth surfaces. It changes abruptly at triangle edges,
+    // causing per-triangle faceting in texture weights, normal perturbation, and
+    // therefore lighting (NdotL). All triplanar operations use N (smooth interpolated
+    // normal) which varies continuously across vertices → seamless result.

    float tiling = textureTiling;

@ -160,7 +179,7 @@ PSOutput main(PSInput input) {
    uint vNeighborMat = getNeighborMat(voxelCoord, vEdgeDir, normalDir, input.chunkIndex);

    // ── Blend weights (SAME params as blocky PS) ──
-    float blendZone = 0.25;
+    float blendZone = 0.40;
    float uEdge = abs(faceFracU - 0.5) * 2.0;
    float vEdge = abs(faceFracV - 0.5) * 2.0;

@ -175,6 +194,8 @@ PSOutput main(PSInput input) {
    bool mainResists = (resistBleedMask >> selfMat) & 1u;
    bool uNeighCanBleed = (bleedMask >> uNeighborMat) & 1u;
    bool vNeighCanBleed = (bleedMask >> vNeighborMat) & 1u;
+    bool uNeighResists = (resistBleedMask >> uNeighborMat) & 1u;
+    bool vNeighResists = (resistBleedMask >> vNeighborMat) & 1u;
    bool uBlend = (uNeighborMat > 0u && uNeighborMat != selfMat && uWeight > 0.001
                   && !mainResists && uNeighCanBleed);
    bool vBlend = (vNeighborMat > 0u && vNeighborMat != selfMat && vWeight > 0.001
@ -185,14 +206,19 @@ PSOutput main(PSInput input) {
    float3 albedo;

    if (uBlend || vBlend) {
-        float4 mainTex = sampleTriplanarRGBA(input.worldPos, geoN, selfTexIdx, tiling);
+        float4 mainTex = sampleTriplanarRGBA(input.worldPos, N, selfTexIdx, tiling);
        float3 result = mainTex.rgb;
        float sharpness = 16.0;

        if (uBlend) {
            uint uTexIdx = clamp(uNeighborMat - 1u, 0u, 5u);
-            float4 uTex = sampleTriplanarRGBA(input.worldPos, geoN, uTexIdx, tiling);
-            float bias = 0.5 - uWeight;
+            float4 uTex = sampleTriplanarRGBA(input.worldPos, N, uTexIdx, tiling);
+            float bias;
+            if (uNeighResists) {
+                bias = 0.5 - uWeight * 1.6;
+            } else {
+                bias = 0.5 - uWeight;
+            }
            float mainScore  = mainTex.a + bias;
            float neighScore = uTex.a   - bias;
            float blend = saturate((neighScore - mainScore) * sharpness + 0.5);
@ -201,8 +227,13 @@ PSOutput main(PSInput input) {

        if (vBlend) {
            uint vTexIdx = clamp(vNeighborMat - 1u, 0u, 5u);
-            float4 vTex = sampleTriplanarRGBA(input.worldPos, geoN, vTexIdx, tiling);
-            float bias = 0.5 - vWeight;
+            float4 vTex = sampleTriplanarRGBA(input.worldPos, N, vTexIdx, tiling);
+            float bias;
+            if (vNeighResists) {
+                bias = 0.5 - vWeight * 1.6;
+            } else {
+                bias = 0.5 - vWeight;
+            }
            float mainScore  = mainTex.a + bias;
            float neighScore = vTex.a   - bias;
            float blend = saturate((neighScore - mainScore) * sharpness + 0.5);
@ -211,15 +242,57 @@ PSOutput main(PSInput input) {

        albedo = result;
    } else {
-        albedo = sampleTriplanar(input.worldPos, geoN, selfTexIdx, tiling);
+        albedo = sampleTriplanar(input.worldPos, N, selfTexIdx, tiling);
    }

-    // Lighting
+    // ── Normal map perturbation ──
+    float3 flatN = N; // unperturbed smooth normal, kept for ambient and the debug views
+    float nmStrength = toneMapParams.z;
+    if (nmStrength > 0.0) {
+        float3 perturbedN = sampleTriplanarNormal(input.worldPos, N, selfTexIdx, tiling);
+        N = normalize(lerp(N, perturbedN, nmStrength * 0.7)); // lighter on smooth for softer transitions
+    }
+
+    // ── Lighting ──
    float3 L = normalize(-sunDirection.xyz);
    float NdotL = max(dot(N, L), 0.0);
-    float hemiLerp = N.y * 0.5 + 0.5;
+    float hemiLerp = flatN.y * 0.5 + 0.5;
    float3 ambient = lerp(groundAmbient.rgb, skyAmbient.rgb, hemiLerp);
-    float3 color = albedo * (sunColor.rgb * NdotL + ambient);
+    float3 diffuse = sunColor.rgb * NdotL;
+
+    // ── Debug lighting modes (F9 cycle) ──
+    uint dbgLight = (uint)toneMapParams.w;
+    if (dbgLight == 2) {
+        // FLAT: uniform gray, no texture, no normal map; pure lighting with the unperturbed smooth normal (flatN)
+        float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
+        float flatHemi = flatN.y * 0.5 + 0.5;
+        float3 flatAmb = lerp(groundAmbient.rgb, skyAmbient.rgb, flatHemi);
+        float3 flatColor = float3(0.5, 0.5, 0.5) * (flatAmb + sunColor.rgb * flatNdotL);
+        output.color = float4(flatColor, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+    if (dbgLight == 3) {
+        // ALBEDO only: texture + blend, no lighting
+        output.color = float4(albedo, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+    if (dbgLight == 4) {
+        // NdotL only: grayscale NdotL with flat normal (no normal map)
+        float flatNdotL = max(dot(flatN, normalize(-sunDirection.xyz)), 0.0);
+        output.color = float4(flatNdotL, flatNdotL, flatNdotL, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+    if (dbgLight == 5) {
+        // NORMAL viz: unperturbed smooth normal (flatN) mapped to RGB (XYZ → [0,1])
+        output.color = float4(flatN * 0.5 + 0.5, 1.0);
+        output.normal = float4(flatN, 0.0);
+        return output;
+    }
+
+    float3 color = albedo * (ambient + diffuse);

    // ── Rim light ──
    float3 V = normalize(cameraPosition.xyz - input.worldPos);
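Reviewer note on the material blend in voxelSmoothPS.hlsl above: the `uBlend` and `vBlend` branches compare two biased scores and sharpen the difference into a blend factor. A minimal C++ sketch of that arithmetic follows. `saturatef` and `heightBlend` are hypothetical names, and it assumes the texture alpha channel carries a height value and that the resulting factor is used to mix toward the neighbor material (the diff elides that line).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

static float saturatef(float x) {
    return std::min(std::max(x, 0.0f), 1.0f);
}

// mainAlpha / neighAlpha: alpha of the two triplanar samples (assumed to be height)
// edgeWeight:             uWeight or vWeight from the shader
// neighResists:           uNeighResists or vNeighResists from the shader
static float heightBlend(float mainAlpha, float neighAlpha,
                         float edgeWeight, bool neighResists) {
    const float sharpness = 16.0f;
    // A bleed-resistant neighbor scales the edge weight by 1.6, so the
    // bias drops faster and the neighbor wins over a wider band.
    float bias = 0.5f - edgeWeight * (neighResists ? 1.6f : 1.0f);
    float mainScore  = mainAlpha  + bias;
    float neighScore = neighAlpha - bias;
    return saturatef((neighScore - mainScore) * sharpness + 0.5f);
}
```

For equal heights of 0.5 and an edge weight of 0.5, the non-resisting case gives a factor of 0.5, while the resisting case saturates to 1.0, which is the widening effect the `* 1.6` term introduces.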
--- a/shaders/voxelTopingBLASCS.hlsl
+++ b/shaders/voxelTopingBLASCS.hlsl
@ -0,0 +1,80 @@
+// BVLE Voxels - Toping BLAS Position Extraction Compute Shader
+// Replaces the 196ms CPU loop that computed world-space toping positions.
+// Reads vertex templates (t4) + instance positions (t5) + group table (t7),
+// writes flat float3 positions (u0) for DXR BLAS construction.
+//
+// One thread per output vertex. Group table maps global vertex index to
+// the correct (instance, local vertex) pair via prefix-sum offsets.
+
+#include "voxelCommon.hlsli"
+
+// Toping mesh vertex (must match C++ TopingVertex, 24 bytes)
+struct TopingVtx {
+    float3 position; // local to voxel [0,1]^3
+    float3 normal;   // unused here, but struct must match
+};
+
+// Toping instance (just the world position, 12 bytes)
+struct TopingInst {
+    float3 worldPos;
+};
+
+// Draw group descriptor for BLAS extraction (must match C++ TopingBLASGroupGPU, 20 bytes)
+struct TopingBLASGroup {
+    uint globalVertexOffset;    // prefix sum: first global vertex index for this group
+    uint vertexTemplateOffset;  // offset into topingVertices (t4)
+    uint vertexCount;           // vertices per instance (mesh slice count)
+    uint instanceOffset;        // offset into topingInstances (t5)
+    uint instanceCount;         // number of instances in this group
+};
+
+StructuredBuffer<TopingVtx>       topingVertices  : register(t4);
+StructuredBuffer<TopingInst>      topingInstances : register(t5);
+StructuredBuffer<TopingBLASGroup> topingGroups    : register(t7);
+
+// Output: raw float3 positions (12 bytes each)
+RWByteAddressBuffer blasPositions : register(u0);
+
+// Push constants (b999)
+struct TopingBLASPush {
+    uint totalVertices;
+    uint groupCount;
+    uint pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7, pad8, pad9;
+};
+[[vk::push_constant]] ConstantBuffer<TopingBLASPush> push : register(b999);
+
+void storeFloat3(uint byteOffset, float3 v) {
+    blasPositions.Store(byteOffset,      asuint(v.x));
+    blasPositions.Store(byteOffset + 4,  asuint(v.y));
+    blasPositions.Store(byteOffset + 8,  asuint(v.z));
+}
+
+[RootSignature(VOXEL_ROOTSIG)]
+[numthreads(64, 1, 1)]
+void main(uint3 DTid : SV_DispatchThreadID) {
+    uint globalIdx = DTid.x;
+    if (globalIdx >= push.totalVertices) return;
+
+    // Find which group this vertex belongs to (linear scan, max ~32 groups)
+    uint groupIdx = 0;
+    for (uint g = 1; g < push.groupCount; g++) {
+        if (globalIdx >= topingGroups[g].globalVertexOffset)
+            groupIdx = g;
+        else
+            break;
+    }
+
+    TopingBLASGroup grp = topingGroups[groupIdx];
+
+    // Map global vertex to (instance, local vertex) within this group
+    uint localIdx    = globalIdx - grp.globalVertexOffset;
+    uint instanceIdx = grp.instanceOffset + localIdx / grp.vertexCount;
+    uint vertexIdx   = grp.vertexTemplateOffset + localIdx % grp.vertexCount;
+
+    TopingVtx vtx   = topingVertices[vertexIdx];
+    TopingInst inst  = topingInstances[instanceIdx];
+
+    float3 worldPos = inst.worldPos + vtx.position;
+
+    storeFloat3(globalIdx * 12, worldPos);
+}
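Reviewer note on the index mapping in voxelTopingBLASCS.hlsl above: each thread turns a global vertex index into a (group, instance, template vertex) triple using the prefix-sum offsets in the group table. The C++ sketch below mirrors that lookup on the CPU so it can be checked against the shader output. `BlasGroup`, `VertexRef` and `resolveVertex` are hypothetical names, and the sketch assumes the groups are sorted by `globalVertexOffset`, as the early `break` in the shader implies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the fields of TopingBLASGroup in the shader.
struct BlasGroup {
    uint32_t globalVertexOffset;   // prefix sum: first global vertex of the group
    uint32_t vertexTemplateOffset; // offset into the vertex template buffer
    uint32_t vertexCount;          // vertices per instance
    uint32_t instanceOffset;       // offset into the instance buffer
    uint32_t instanceCount;        // instances in this group
};

struct VertexRef {
    uint32_t group;
    uint32_t instance; // index into the instance buffer
    uint32_t vertex;   // index into the vertex template buffer
};

static VertexRef resolveVertex(const std::vector<BlasGroup>& groups,
                               uint32_t globalIdx) {
    assert(!groups.empty());
    // Linear scan: the last group whose offset is <= globalIdx owns the vertex.
    uint32_t groupIdx = 0;
    for (uint32_t g = 1; g < groups.size(); ++g) {
        if (globalIdx >= groups[g].globalVertexOffset) groupIdx = g;
        else break;
    }
    const BlasGroup& grp = groups[groupIdx];
    assert(grp.vertexCount > 0); // guards the division below
    uint32_t localIdx = globalIdx - grp.globalVertexOffset;
    return { groupIdx,
             grp.instanceOffset + localIdx / grp.vertexCount,
             grp.vertexTemplateOffset + localIdx % grp.vertexCount };
}
```

The `vertexCount > 0` assertion is an addition of this sketch; the shader divides by `grp.vertexCount` without a check, so the CPU side presumably never emits empty groups.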
--- a/shaders/voxelTopingVS.hlsl
+++ b/shaders/voxelTopingVS.hlsl
@ -50,13 +50,15 @@ VSOutput main(uint vertexID : SV_VertexID, uint instanceID : SV_InstanceID) {
    // Quadratic scaling: base stays anchored, tips sway the most.
    if (push.materialID != 3u) { // not stone
        float localHeight = vtx.position.y - 1.0;
+        float amplitude = 2.0;
+        float frequency = 1.4;
        if (localHeight > 0.0) {
            float heightFactor = localHeight * localHeight; // quadratic
-            float phase = worldPos.x * 1.8 + worldPos.z * 1.3 + windTime * 3.5;
-            float phase2 = worldPos.x * 0.7 - worldPos.z * 2.1 + windTime * 2.7;
-            float swayX = sin(phase) * 0.11 * heightFactor;
-            float swayZ = cos(phase2) * 0.08 * heightFactor;
-            float swayY = -abs(sin(phase * 0.7)) * 0.02 * heightFactor; // slight droop
+            float phase = worldPos.x * 1.8 + worldPos.z * 1.3 + windTime * 3.5 * frequency;
+            float phase2 = worldPos.x * 0.7 - worldPos.z * 2.1 + windTime * 2.7 * frequency;
+            float swayX = sin(phase) * 0.11 * heightFactor * amplitude;
+            float swayZ = cos(phase2) * 0.08 * heightFactor * amplitude;
+            float swayY = -abs(sin(phase * 0.7)) * 0.02 * heightFactor * amplitude; // slight droop
            worldPos.x += swayX;
            worldPos.y += swayY;
            worldPos.z += swayZ;
--- a/src/app/main.cpp
+++ b/src/app/main.cpp
@ -139,19 +139,29 @@ int APIENTRY wWinMain(
    wcex.lpszClassName = L"BVLEVoxels";
    RegisterClassExW(&wcex);

-    // Screenshot mode: small minimized window to avoid interrupting user
+    // Compute window size so the client area is exactly clientW x clientH (1920x1080, or 640x480 in screenshot mode)
+    DWORD style = WS_OVERLAPPEDWINDOW;
+    int clientW = isScreenshot ? 640 : 1920;
+    int clientH = isScreenshot ? 480 : 1080;
+    RECT rc = { 0, 0, clientW, clientH };
+    AdjustWindowRect(&rc, style, FALSE);
+    int windowW = rc.right - rc.left;
+    int windowH = rc.bottom - rc.top;
+
+    // Center on screen
+    int screenW = GetSystemMetrics(SM_CXSCREEN);
+    int screenH = GetSystemMetrics(SM_CYSCREEN);
+    int posX = isScreenshot ? 0 : (screenW - windowW) / 2;
+    int posY = isScreenshot ? 0 : (screenH - windowH) / 2;
+
    HWND hWnd = CreateWindowW(
        wcex.lpszClassName,
        isScreenshot ? L"BVLE Screenshot" : L"BVLE Voxels - Prototype",
-        WS_OVERLAPPEDWINDOW,
-        isScreenshot ? 0 : CW_USEDEFAULT,
-        isScreenshot ? 0 : 0,
-        isScreenshot ? 640 : 1920,
-        isScreenshot ? 480 : 1080,
+        style,
+        posX, posY, windowW, windowH,
        nullptr, nullptr, hInstance, nullptr
    );
-    // SW_SHOWNOACTIVATE: visible but doesn't steal focus (minimized windows don't render)
-    ShowWindow(hWnd, isScreenshot ? SW_SHOWNOACTIVATE : SW_SHOWMAXIMIZED);
+    ShowWindow(hWnd, isScreenshot ? SW_SHOWNOACTIVATE : SW_SHOW);

    // Initialize Wicked Engine
    application.SetWindow(hWnd);
@ -198,9 +208,10 @@ int APIENTRY wWinMain(
            if (renderPath.screenshotMode) {
                struct CamView { float x, y, z, pitch, yaw; const char* name; };
                static const CamView views[] = {
-                    { 223.f, 36.5f, 261.f, -0.20f, 0.7f, "closeup"   },  // close-up: slightly above grass, looking across
-                    { 222.5f, 36.2f, 261.f, -0.10f, 0.5f,"blade"     },  // eye-level with grass blades
-                    { 220.f, 39.f, 258.f, -0.35f, 0.7f,  "medium"    },  // medium shot of grass patch
+                    { 220.f, 42.f, 258.f, -0.40f, 0.7f, "landscape" },  // higher overview
+                    { 220.f, 39.f, 258.f, -0.35f, 0.7f, "medium"   },  // medium shot, terrain detail
+                    { 222.f, 37.f, 260.f, -0.20f, 0.5f, "closeup"  },  // close-up ground level
+                    { 220.f, 120.f, 258.f, 1.0f, 0.7f, "birdseye"  },  // bird's eye (LOD overview)
                };
                static const int numViews = sizeof(views) / sizeof(views[0]);
                static int currentView = 0;
--- a/src/voxel/DeferredGPUBuffer.h
+++ b/src/voxel/DeferredGPUBuffer.h
@ -0,0 +1,68 @@
+#pragma once
+#include "WickedEngine.h"
+
+namespace voxel {
+
+// ── Deferred GPU Buffer ─────────────────────────────────────────
+// Encapsulates the repeated pattern of:
+//   1. CPU staging data prepared during Update()
+//   2. GPU buffer with capacity-based growth (25% headroom)
+//   3. Dirty flag for deferred upload in Render()
+//
+// Eliminates ~50 lines of boilerplate per buffer and centralizes
+// the invariants (capacity >= count, CreateBuffer with nullptr,
+// UpdateBuffer with actual data size).
+
+struct DeferredGPUBuffer {
+    wi::graphics::GPUBuffer gpu;
+    mutable uint32_t capacity = 0;  // in elements
+    mutable bool dirty = false;
+    uint32_t stride = 0;            // bytes per element
+
+    // Ensure GPU buffer has enough capacity for elementCount elements.
+    // Creates/recreates buffer only when capacity is insufficient.
+    // Returns true if buffer was (re)created.
+    bool ensureCapacity(wi::graphics::GraphicsDevice* device,
+                        uint32_t elementCount,
+                        uint32_t elementStride,
+                        wi::graphics::BindFlag bindFlags,
+                        wi::graphics::ResourceMiscFlag miscFlags = wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED)
+    {
+        stride = elementStride;
+        if (gpu.IsValid() && capacity >= elementCount) return false;
+
+        capacity = elementCount + elementCount / 4; // 25% headroom
+        wi::graphics::GPUBufferDesc desc;
+        desc.size = (uint64_t)capacity * stride;
+        desc.bind_flags = bindFlags;
+        desc.misc_flags = miscFlags;
+        desc.stride = (miscFlags == wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED) ? stride : 0;
+        desc.usage = wi::graphics::Usage::DEFAULT;
+        device->CreateBuffer(&desc, nullptr, &gpu);
+        dirty = true;
+        return true;
+    }
+
+    // Upload data to GPU. Call from Render() with a valid CommandList.
+    // dataCount = number of elements to upload (may be < capacity).
+    void upload(wi::graphics::GraphicsDevice* device,
+                wi::graphics::CommandList cmd,
+                const void* data,
+                uint32_t dataCount) const
+    {
+        if (!dirty || !gpu.IsValid() || dataCount == 0 || !data) return;
+        size_t uploadSize = (size_t)dataCount * stride;
+        size_t bufferSize = (size_t)capacity * stride;
+        if (uploadSize <= bufferSize) {
+            device->UpdateBuffer(&gpu, data, cmd, uploadSize);
+        }
+        dirty = false;
+    }
+
+    // Mark as needing upload (call after staging data changes).
+    void markDirty() { dirty = true; }
+
+    bool isValid() const { return gpu.IsValid(); }
+};
+
+} // namespace voxel
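Reviewer note on the growth policy in DeferredGPUBuffer.h above: the buffer is recreated only when the requested element count exceeds the current capacity, and each recreation adds 25% headroom. The C++ sketch below isolates that bookkeeping without any Wicked Engine types so the arithmetic is easy to verify. `CapacityTracker` and its `allocated` flag are hypothetical stand-ins for the struct and for `gpu.IsValid()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical engine-free model of DeferredGPUBuffer's capacity logic.
struct CapacityTracker {
    uint32_t capacity = 0;   // in elements
    bool allocated = false;  // stands in for gpu.IsValid()
    bool dirty = false;      // set when contents must be re-uploaded

    // Returns true when the backing buffer would be (re)created.
    bool ensureCapacity(uint32_t elementCount) {
        if (allocated && capacity >= elementCount) return false;
        capacity = elementCount + elementCount / 4; // 25% headroom
        allocated = true;
        dirty = true; // a fresh buffer has no contents yet
        return true;
    }
};
```

Requesting 100 elements allocates room for 125, so a later request for 120 is served without reallocation, and a request for 126 grows the capacity to 157.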
--- a/src/voxel/VoxelMesher.cpp
+++ b/src/voxel/VoxelMesher.cpp
@ -243,538 +243,11 @@ uint8_t VoxelMesher::calcAO(const VoxelWorld& world, const ChunkPos& cpos,
 }

 // ══════════════════════════════════════════════════════════════════
-// ── Naive Surface Nets Mesher (Phase 5) ─────────────────────────
+// ── Smooth meshing (Phase 5) ────────────────────────────────────
 // ══════════════════════════════════════════════════════════════════
-//
-// Algorithm:
-//   1. Compute SDF for each voxel: smooth solid = -1, empty = +1
-//      Non-smooth solid voxels act as hard walls (SDF crushed to -1).
-//   2. For each cell on the surface (SDF sign differs from at least one neighbor),
-//      place a vertex at the centroid of edge crossings.
-//   3. For each edge (pair of adjacent cells) with a sign change,
-//      emit a quad connecting the 4 cells that share that edge, then split to 2 triangles.
-//   4. Normals derived from SDF gradient (central differences).
-
-// Padded grid: +2 border for cross-chunk SDF lookups and neighbor smooth detection
-static constexpr int PAD = 2;
-static constexpr int GRID = CHUNK_SIZE + 2 * PAD; // 36
-
-static inline int gridIdx(int x, int y, int z) {
-    return (x + PAD) + (y + PAD) * GRID + (z + PAD) * GRID * GRID;
-}
-
-// Helper: read voxel data at chunk-local coords (with cross-chunk fallback)
-static VoxelData readVoxel(const Chunk& chunk, const VoxelWorld& world, int x, int y, int z) {
-    if (chunk.isInBounds(x, y, z))
-        return chunk.at(x, y, z);
-    return world.getVoxel(
-        chunk.pos.x * CHUNK_SIZE + x,
-        chunk.pos.y * CHUNK_SIZE + y,
-        chunk.pos.z * CHUNK_SIZE + z);
-}
-
-float SmoothMesher::computeSDF(const Chunk& chunk, const VoxelWorld& world,
-                                int x, int y, int z) {
-    VoxelData v = readVoxel(chunk, world, x, y, z);
-    if (v.isEmpty()) return 1.0f;       // empty → positive SDF
-    return -1.0f;                        // any solid → negative SDF
-}
-
-void SmoothMesher::computeNormal(const Chunk& chunk, const VoxelWorld& world,
-                                  int x, int y, int z,
-                                  float& nx, float& ny, float& nz) {
-    // Central differences of the SDF
-    float dx = computeSDF(chunk, world, x+1, y, z) - computeSDF(chunk, world, x-1, y, z);
-    float dy = computeSDF(chunk, world, x, y+1, z) - computeSDF(chunk, world, x, y-1, z);
-    float dz = computeSDF(chunk, world, x, y, z+1) - computeSDF(chunk, world, x, y, z-1);
-
-    float len = std::sqrt(dx*dx + dy*dy + dz*dz);
-    if (len > 0.0001f) {
-        nx = dx / len;
-        ny = dy / len;
-        nz = dz / len;
-    } else {
-        nx = 0.0f; ny = 1.0f; nz = 0.0f;
-    }
-}
-
-// Thread-local scratch buffers to avoid per-chunk allocation overhead.
-// Each worker thread gets its own set, eliminating malloc/free thrashing.
-struct SmoothScratch {
-    float sdf[GRID * GRID * GRID];
-    uint8_t smoothGrid[GRID * GRID * GRID];
-    uint8_t smoothNear[GRID * GRID * GRID]; // dilated: 1 if smooth OR face-adjacent to smooth
-    VoxelData voxelGrid[GRID * GRID * GRID];
-    int32_t vertexMap[33 * 33 * 33]; // VERT_RANGE³
-};
-static thread_local SmoothScratch* tls_scratch = nullptr;
-
-uint32_t SmoothMesher::meshChunk(Chunk& chunk, const VoxelWorld& world) {
-    chunk.smoothVertices.clear();
-    chunk.hasSmooth = false;
-
-    // ── Early exit: skip chunks far from any smooth voxels ──────
-    // Check this chunk + 26 neighbors for containsSmooth flag.
-    // This avoids the expensive 36³ grid fill for ~70% of chunks.
-    {
-        bool nearSmooth = chunk.containsSmooth;
-        if (!nearSmooth) {
-            for (int dz = -1; dz <= 1 && !nearSmooth; dz++)
-            for (int dy = -1; dy <= 1 && !nearSmooth; dy++)
-            for (int dx = -1; dx <= 1 && !nearSmooth; dx++) {
-                if (dx == 0 && dy == 0 && dz == 0) continue;
-                const Chunk* nc = world.getChunk(
-                    ChunkPos{chunk.pos.x + dx, chunk.pos.y + dy, chunk.pos.z + dz});
-                if (nc && nc->containsSmooth) nearSmooth = true;
-            }
-        }
-        if (!nearSmooth) return 0;
-    }
-
-    // Allocate thread-local scratch once per thread (persists across calls)
-    if (!tls_scratch) tls_scratch = new SmoothScratch();
-    auto& scratch = *tls_scratch;
-
-    // ── Step 1: Build SDF grid + smooth flag grid + voxel cache ──
-    // PAD=2 so we have SDF data for cells at [-1..CHUNK_SIZE] (all 8 corners accessible)
-    // Also build a "isSmooth" grid for the same range to detect proximity to smooth voxels.
-    // voxelGrid caches VoxelData to avoid repeated cross-chunk hashmap lookups later.
-    float* sdf = scratch.sdf;
-    uint8_t* smoothGrid = scratch.smoothGrid;
-    VoxelData* voxelGrid = scratch.voxelGrid;
-    constexpr int GRID3 = GRID * GRID * GRID;
-    std::memset(smoothGrid, 0, GRID3);
-    // SDF defaults to 1.0f (empty) — fill below
-    for (int i = 0; i < GRID3; i++) sdf[i] = 1.0f;
-    bool anySmooth = false;
-
-    // Pre-cache neighbor chunk pointers for fast cross-chunk access
-    const Chunk* neighborChunks[3][3][3] = {};
-    for (int dz = -1; dz <= 1; dz++)
-    for (int dy = -1; dy <= 1; dy++)
-    for (int dx = -1; dx <= 1; dx++) {
-        neighborChunks[dx+1][dy+1][dz+1] = world.getChunk(
-            ChunkPos{chunk.pos.x + dx, chunk.pos.y + dy, chunk.pos.z + dz});
-    }
-
-    // Helper: fast voxel read using cached neighbor chunk pointers
-    auto readVoxelFast = [&](int x, int y, int z) -> VoxelData {
-        if (x >= 0 && x < CHUNK_SIZE && y >= 0 && y < CHUNK_SIZE && z >= 0 && z < CHUNK_SIZE)
-            return chunk.at(x, y, z);
-        // Determine which neighbor chunk
-        int cx = (x < 0) ? 0 : (x >= CHUNK_SIZE) ? 2 : 1;
-        int cy = (y < 0) ? 0 : (y >= CHUNK_SIZE) ? 2 : 1;
-        int cz = (z < 0) ? 0 : (z >= CHUNK_SIZE) ? 2 : 1;
-        const Chunk* nc = neighborChunks[cx][cy][cz];
-        if (!nc) return VoxelData{};  // empty if chunk not loaded
-        int lx = ((x % CHUNK_SIZE) + CHUNK_SIZE) % CHUNK_SIZE;
-        int ly = ((y % CHUNK_SIZE) + CHUNK_SIZE) % CHUNK_SIZE;
-        int lz = ((z % CHUNK_SIZE) + CHUNK_SIZE) % CHUNK_SIZE;
-        return nc->at(lx, ly, lz);
-    };
-
-    for (int z = -PAD; z < CHUNK_SIZE + PAD; z++) {
-        for (int y = -PAD; y < CHUNK_SIZE + PAD; y++) {
-            for (int x = -PAD; x < CHUNK_SIZE + PAD; x++) {
-                int gi = gridIdx(x, y, z);
-                VoxelData v = readVoxelFast(x, y, z);
-                voxelGrid[gi] = v;
-                sdf[gi] = v.isEmpty() ? 1.0f : -1.0f;
-                if (v.isSmooth()) {
-                    smoothGrid[gi] = 1;
-                    // Only need anySmooth for this chunk's own voxels
-                    if (chunk.isInBounds(x, y, z)) anySmooth = true;
-                }
-            }
-        }
-    }
-
-    // Also check 1 beyond the chunk (neighbor chunks may have smooth voxels that
-    // affect cells at the chunk boundary)
-    if (!anySmooth) {
-        // Check if any neighbor voxels just outside the chunk are smooth
-        for (int z = -1; z <= CHUNK_SIZE && !anySmooth; z++)
-        for (int y = -1; y <= CHUNK_SIZE && !anySmooth; y++)
-        for (int x = -1; x <= CHUNK_SIZE && !anySmooth; x++) {
-            if (chunk.isInBounds(x, y, z)) continue; // already checked
-            if (smoothGrid[gridIdx(x, y, z)]) anySmooth = true;
-        }
-    }
-
-    if (!anySmooth) return 0;
-    chunk.hasSmooth = true;
-
-    // ── Step 1b: Dilate smoothGrid → smoothNear ──────────────────
-    // Pre-compute "smooth or face-adjacent to smooth" to reduce the
-    // per-cell hasSmooth check from 56 lookups to 8 lookups.
-    uint8_t* smoothNear = scratch.smoothNear;
-    std::memcpy(smoothNear, smoothGrid, GRID3);
-    for (int z = -PAD + 1; z < CHUNK_SIZE + PAD - 1; z++)
-    for (int y = -PAD + 1; y < CHUNK_SIZE + PAD - 1; y++)
-    for (int x = -PAD + 1; x < CHUNK_SIZE + PAD - 1; x++) {
-        if (smoothGrid[gridIdx(x, y, z)]) {
-            smoothNear[gridIdx(x+1, y, z)] = 1;
-            smoothNear[gridIdx(x-1, y, z)] = 1;
-            smoothNear[gridIdx(x, y+1, z)] = 1;
-            smoothNear[gridIdx(x, y-1, z)] = 1;
-            smoothNear[gridIdx(x, y, z+1)] = 1;
-            smoothNear[gridIdx(x, y, z-1)] = 1;
-        }
-    }
-
-    // ── Step 2: Generate vertices for surface cells ──────────────
-    // Extended range: [-1, CHUNK_SIZE) for cross-chunk connectivity.
-    // This chunk generates vertices for cells at [-1..CHUNK_SIZE-1].
-    // The vertex map covers [-1..CHUNK_SIZE-1] → size = CHUNK_SIZE+1, offset by +1.
-    static constexpr int VERT_MIN = -1;
-    static constexpr int VERT_MAX = CHUNK_SIZE; // exclusive
-    static constexpr int VERT_RANGE = VERT_MAX - VERT_MIN; // CHUNK_SIZE + 1 = 33
-    int32_t* vertexMap = scratch.vertexMap;
-    std::memset(vertexMap, -1, VERT_RANGE * VERT_RANGE * VERT_RANGE * sizeof(int32_t));
-
-    auto vertMapIdx = [](int x, int y, int z) -> int {
-        // shift coordinates by -VERT_MIN = +1 so index range is [0, VERT_RANGE)
-        return (x - VERT_MIN) + (y - VERT_MIN) * VERT_RANGE + (z - VERT_MIN) * VERT_RANGE * VERT_RANGE;
-    };
-
-    // World offset for this chunk
-    float ox = (float)(chunk.pos.x * CHUNK_SIZE);
-    float oy = (float)(chunk.pos.y * CHUNK_SIZE);
-    float oz = (float)(chunk.pos.z * CHUNK_SIZE);
-
-    // Corner offsets: (dx,dy,dz) for corner index 0-7 of a cell
-    static const int cornerOff[8][3] = {
-        {0,0,0}, {1,0,0}, {0,1,0}, {1,1,0},
-        {0,0,1}, {1,0,1}, {0,1,1}, {1,1,1},
-    };
-    static const float cornerOffF[8][3] = {
-        {0,0,0}, {1,0,0}, {0,1,0}, {1,1,0},
-        {0,0,1}, {1,0,1}, {0,1,1}, {1,1,1},
-    };
-    static const int edges[12][2] = {
-        {0,1}, {2,3}, {4,5}, {6,7}, // X-axis edges
-        {0,2}, {1,3}, {4,6}, {5,7}, // Y-axis edges
-        {0,4}, {1,5}, {2,6}, {3,7}, // Z-axis edges
-    };
-
-    for (int z = VERT_MIN; z < VERT_MAX; z++) {
-        for (int y = VERT_MIN; y < VERT_MAX; y++) {
-            for (int x = VERT_MIN; x < VERT_MAX; x++) {
-                // hasSmooth check via dilated grid: at least one corner must be
-                // smooth or face-adjacent to smooth. Uses pre-dilated smoothNear
-                // grid → only 8 lookups instead of 56.
-                bool hasSmooth = false;
-                for (int c = 0; c < 8 && !hasSmooth; c++) {
-                    if (smoothNear[gridIdx(x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])])
-                        hasSmooth = true;
-                }
-                if (!hasSmooth) continue;
-
-                // Get SDF at 8 corners of cell (x,y,z)
-                float corner[8];
-                bool hasPos = false, hasNeg = false;
-                for (int c = 0; c < 8; c++) {
-                    corner[c] = sdf[gridIdx(x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])];
-                    if (corner[c] < 0.0f) hasNeg = true;
-                    else hasPos = true;
-                }
-
-                if (!hasPos || !hasNeg) continue; // no sign change → not on surface
-
-                // Compute vertex position as centroid of edge crossings.
-                // +0.5 offset: SDF is sampled at voxel centers, so the cell spans
-                // from (x+0.5) to (x+1.5) in world space. This naturally aligns
-                // the isosurface with the integer grid (voxel face positions).
-                float sumX = 0, sumY = 0, sumZ = 0;
-                int crossCount = 0;
-
-                for (int e = 0; e < 12; e++) {
-                    float s0 = corner[edges[e][0]];
-                    float s1 = corner[edges[e][1]];
-                    if ((s0 < 0.0f) == (s1 < 0.0f)) continue;
-
-                    float t = s0 / (s0 - s1);
-                    t = std::clamp(t, 0.01f, 0.99f);
-
-                    const float* c0 = cornerOffF[edges[e][0]];
-                    const float* c1 = cornerOffF[edges[e][1]];
-                    sumX += c0[0] + t * (c1[0] - c0[0]);
-                    sumY += c0[1] + t * (c1[1] - c0[1]);
-                    sumZ += c0[2] + t * (c1[2] - c0[2]);
-                    crossCount++;
-                }
-
-                if (crossCount == 0) continue;
-
-                float invCross = 1.0f / (float)crossCount;
-                // centroid in [0,1] within the cell
-                float cx = sumX * invCross;
-                float cy = sumY * invCross;
-                float cz = sumZ * invCross;
-
-                // ── Per-axis clamping at blocky boundaries ───────────
-                // With +0.5 offset, the cell spans [x+0.5, x+1.5] in world space.
-                // The integer grid (blocky faces) is at x+1. In centroid coords,
-                // that's centroid = 0.5 (the midpoint of the cell).
-                // If the +side corners (dx=1) contain a blocky solid, clamp centroid ≤ 0.5
-                // If the -side corners (dx=0) contain a blocky solid, clamp centroid ≥ 0.5
-                // This prevents the smooth mesh from extending into blocky territory.
-                bool blockyXlo = false, blockyXhi = false;
-                bool blockyYlo = false, blockyYhi = false;
-                bool blockyZlo = false, blockyZhi = false;
-                for (int c = 0; c < 8; c++) {
-                    if (corner[c] >= 0.0f) continue; // empty corner
-                    VoxelData v = voxelGrid[gridIdx(
-                        x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])];
-                    if (!v.isEmpty() && !v.isSmooth()) {
-                        // This corner is a blocky solid
-                        if (cornerOff[c][0] == 0) blockyXlo = true; else blockyXhi = true;
-                        if (cornerOff[c][1] == 0) blockyYlo = true; else blockyYhi = true;
-                        if (cornerOff[c][2] == 0) blockyZlo = true; else blockyZhi = true;
-                    }
-                }
-                if (blockyXhi) cx = std::min(cx, 0.5f);
-                if (blockyXlo) cx = std::max(cx, 0.5f);
-                if (blockyYhi) cy = std::min(cy, 0.5f);
-                if (blockyYlo) cy = std::max(cy, 0.5f);
-                if (blockyZhi) cz = std::min(cz, 0.5f);
-                if (blockyZlo) cz = std::max(cz, 0.5f);
-
-                // World position with +0.5 offset (SDF at voxel centers)
-                float vx = (float)x + 0.5f + cx;
-                float vy = (float)y + 0.5f + cy;
-                float vz = (float)z + 0.5f + cz;
-
-                // Determine material: prefer smooth voxels' materials to avoid
-                // picking up subsurface blocky materials (e.g., dirt under stone)
-                uint8_t smoothMatCounts[256] = {};
-                uint8_t allMatCounts[256] = {};
-                int smoothCount = 0;
-                for (int c = 0; c < 8; c++) {
-                    if (corner[c] < 0.0f) {
-                        VoxelData v = voxelGrid[gridIdx(
-                            x + cornerOff[c][0], y + cornerOff[c][1], z + cornerOff[c][2])];
-                        if (!v.isEmpty()) {
-                            allMatCounts[v.getMaterialID()]++;
-                            if (v.isSmooth()) {
-                                smoothMatCounts[v.getMaterialID()]++;
-                                smoothCount++;
-                            }
-                        }
-                    }
-                }
-                // Primary material: prefer smooth-only counts to avoid subsurface bleed
-                uint8_t* primaryCounts = (smoothCount > 0) ? smoothMatCounts : allMatCounts;
-                uint8_t bestMat = 6, bestCount = 0;
-                for (int m = 1; m < 256; m++) {
-                    if (primaryCounts[m] > bestCount) {
-                        bestMat = (uint8_t)m; bestCount = primaryCounts[m];
-                    }
-                }
-                // Secondary material: only count SURFACE-EXPOSED voxels (at least one
-                // empty neighbor). This prevents underground materials (dirt under stone)
-                // from bleeding through — same principle as blocky face blending.
-                static const int dirs6[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
-                uint8_t surfaceMatCounts[256] = {};
-                for (int c = 0; c < 8; c++) {
-                    if (corner[c] >= 0.0f) continue;
-                    int cx = x + cornerOff[c][0], cy = y + cornerOff[c][1], cz = z + cornerOff[c][2];
-                    VoxelData v = voxelGrid[gridIdx(cx, cy, cz)];
-                    if (v.isEmpty()) continue;
-                    // Check if this voxel is on the surface
-                    bool onSurface = false;
-                    for (int d = 0; d < 6 && !onSurface; d++) {
-                        if (sdf[gridIdx(cx + dirs6[d][0], cy + dirs6[d][1], cz + dirs6[d][2])] > 0.0f)
-                            onSurface = true;
-                    }
-                    if (onSurface) surfaceMatCounts[v.getMaterialID()]++;
-                }
-                uint8_t secMat = bestMat, secCount = 0;
-                for (int m = 1; m < 256; m++) {
-                    if (m == bestMat) continue;
-                    if (surfaceMatCounts[m] > secCount) {
-                        secMat = (uint8_t)m; secCount = surfaceMatCounts[m];
-                    }
-                }
-                // blendWeight: binary flag — 255 at material boundary, 0 at interior.
-                // GPU interpolation creates the smooth edge-to-interior falloff.
-                uint8_t blendW = (secCount > 0 && secMat != bestMat) ? 255 : 0;
-
-                // Store vertex (normals zeroed — computed later from face normals in Step 4)
-                int32_t vertIdx = (int32_t)chunk.smoothVertices.size();
-                vertexMap[vertMapIdx(x, y, z)] = vertIdx;
-
-                SmoothVertex sv;
-                sv.px = ox + vx;
-                sv.py = oy + vy;
-                sv.pz = oz + vz;
-                sv.nx = 0;
-                sv.ny = 0;
-                sv.nz = 0;
-                sv.materialID = bestMat;
-                sv.secondaryMat = secMat;
-                sv.blendWeight = blendW;
-                sv._pad1 = 0;
-                sv.chunkIndex = 0;
-                sv._pad2 = 0;
-                chunk.smoothVertices.push_back(sv);
-            }
-        }
-    }
-
-    if (chunk.smoothVertices.empty()) {
-        chunk.hasSmooth = false;
-        return 0;
-    }
-
-    // ── Step 3: Emit quads for edges with sign change ────────────
-    // Canonical ownership: this chunk owns edges whose lower endpoint
-    // is in [0, CHUNK_SIZE). Extended to check edges at the chunk
-    // boundary (lower endpoint at CHUNK_SIZE-1, upper at CHUNK_SIZE).
-    // The sharing cells may be at [-1..CHUNK_SIZE-1], all covered by vertex map.
-
-    // Tri with edge axis info for correct normal orientation.
-    // normalAxis: 0=X, 1=Y, 2=Z — the axis of the edge that generated this quad.
-    // normalSign: +1 if the normal should point in +axis direction, -1 for -axis.
-    struct Tri { int32_t a, b, c; int8_t normalAxis; int8_t normalSign; };
-    std::vector<Tri> triangles;
-    triangles.reserve(chunk.smoothVertices.size() * 2);
-
-    // Helper: safe vertex map lookup (returns -1 if out of range)
-    auto safeVertMap = [&](int x, int y, int z) -> int32_t {
-        if (x < VERT_MIN || x >= VERT_MAX ||
-            y < VERT_MIN || y >= VERT_MAX ||
-            z < VERT_MIN || z >= VERT_MAX) return -1;
-        return vertexMap[vertMapIdx(x, y, z)];
-    };
-
-    // Helper: emit 2 triangles for a quad (a,b,c,d) with known desired normal.
-    // The Y-axis sharing cells have a different spatial arrangement from X and Z,
-    // requiring opposite winding to produce correct front-facing triangles.
-    auto emitQuad = [&](int a, int b, int c, int d, float s0, int8_t axis) {
-        if (a < 0 || b < 0 || c < 0 || d < 0) return;
-        int8_t sign = (s0 < 0.0f) ? +1 : -1;
-        // Y-axis has natural winding swapped relative to X and Z
-        bool useWindingA = (s0 > 0.0f);
-        if (axis == 1) useWindingA = !useWindingA;
-        if (useWindingA) {
-            triangles.push_back({a, b, d, axis, sign});
-            triangles.push_back({a, d, c, axis, sign});
-        } else {
-            triangles.push_back({a, d, b, axis, sign});
-            triangles.push_back({a, c, d, axis, sign});
-        }
-    };
-
-    // Iterate over edges owned by this chunk: grid points [0, CHUNK_SIZE)
-    for (int z = 0; z < CHUNK_SIZE; z++) {
-        for (int y = 0; y < CHUNK_SIZE; y++) {
-            for (int x = 0; x < CHUNK_SIZE; x++) {
-                float s0 = sdf[gridIdx(x, y, z)];
-
-                // X-axis edge: (x,y,z) → (x+1,y,z)
-                {
-                    float s1 = sdf[gridIdx(x+1, y, z)];
-                    if ((s0 < 0.0f) != (s1 < 0.0f)) {
-                        emitQuad(
-                            safeVertMap(x, y-1, z-1), safeVertMap(x, y, z-1),
-                            safeVertMap(x, y-1, z),   safeVertMap(x, y, z),
-                            s0, 0);
-                    }
-                }
-
-                // Y-axis edge: (x,y,z) → (x,y+1,z)
-                {
-                    float s1 = sdf[gridIdx(x, y+1, z)];
-                    if ((s0 < 0.0f) != (s1 < 0.0f)) {
-                        emitQuad(
-                            safeVertMap(x-1, y, z-1), safeVertMap(x, y, z-1),
-                            safeVertMap(x-1, y, z),   safeVertMap(x, y, z),
-                            s0, 1);
-                    }
-                }
-
-                // Z-axis edge: (x,y,z) → (x,y,z+1)
-                {
-                    float s1 = sdf[gridIdx(x, y, z+1)];
-                    if ((s0 < 0.0f) != (s1 < 0.0f)) {
-                        emitQuad(
-                            safeVertMap(x-1, y-1, z), safeVertMap(x, y-1, z),
-                            safeVertMap(x-1, y, z),   safeVertMap(x, y, z),
-                            s0, 2);
-                    }
-                }
-            }
-        }
-    }
-
-    // ── Step 4: Compute smooth vertex normals ──────────────────────
-    // Accumulate area-weighted face normals into each indexed vertex,
-    // then normalize. This gives Gouraud-style smooth shading across
-    // the Surface Nets mesh without adding geometry.
-
-    const int vertCount = (int)chunk.smoothVertices.size();
-
-    // Zero out vertex normals (will accumulate face normals)
-    for (auto& sv : chunk.smoothVertices) {
-        sv.nx = 0; sv.ny = 0; sv.nz = 0;
-    }
-
-    // For each triangle: compute oriented face normal, accumulate into vertices.
-    // The cross product magnitude is proportional to triangle area, so larger
-    // triangles contribute more — this is the standard area-weighted approach.
-    for (const auto& tri : triangles) {
-        const SmoothVertex& va = chunk.smoothVertices[tri.a];
-        const SmoothVertex& vb = chunk.smoothVertices[tri.b];
-        const SmoothVertex& vc = chunk.smoothVertices[tri.c];
-
-        float e1x = vb.px - va.px, e1y = vb.py - va.py, e1z = vb.pz - va.pz;
-        float e2x = vc.px - va.px, e2y = vc.py - va.py, e2z = vc.pz - va.pz;
-        float fnx = e1y * e2z - e1z * e2y;
-        float fny = e1z * e2x - e1x * e2z;
-        float fnz = e1x * e2y - e1y * e2x;
-
-        // Orient using the known edge axis (same logic as before)
-        float component = (tri.normalAxis == 0) ? fnx : (tri.normalAxis == 1) ? fny : fnz;
-        if ((component > 0.0f) != (tri.normalSign > 0)) {
-            fnx = -fnx; fny = -fny; fnz = -fnz;
-        }
-
-        // Accumulate (area-weighted — cross product magnitude IS the area×2)
-        chunk.smoothVertices[tri.a].nx += fnx;
-        chunk.smoothVertices[tri.a].ny += fny;
-        chunk.smoothVertices[tri.a].nz += fnz;
-        chunk.smoothVertices[tri.b].nx += fnx;
-        chunk.smoothVertices[tri.b].ny += fny;
-        chunk.smoothVertices[tri.b].nz += fnz;
-        chunk.smoothVertices[tri.c].nx += fnx;
-        chunk.smoothVertices[tri.c].ny += fny;
-        chunk.smoothVertices[tri.c].nz += fnz;
-    }
-
-    // Normalize accumulated vertex normals
-    for (auto& sv : chunk.smoothVertices) {
-        float len = std::sqrt(sv.nx*sv.nx + sv.ny*sv.ny + sv.nz*sv.nz);
-        if (len > 0.0001f) {
-            sv.nx /= len; sv.ny /= len; sv.nz /= len;
-        } else {
-            sv.nx = 0; sv.ny = 1; sv.nz = 0;
-        }
-    }
-
-    // ── Step 5: Expand indexed triangles to triangle list ─────────
-    std::vector<SmoothVertex> expanded;
-    expanded.reserve(triangles.size() * 3);
-    for (const auto& tri : triangles) {
-        expanded.push_back(chunk.smoothVertices[tri.a]);
-        expanded.push_back(chunk.smoothVertices[tri.b]);
-        expanded.push_back(chunk.smoothVertices[tri.c]);
-    }
-
-    chunk.smoothVertices = std::move(expanded);
-    chunk.smoothVertexCount = (uint32_t)chunk.smoothVertices.size();
-
-    return chunk.smoothVertexCount;
-}
+// The CPU SmoothMesher has been removed. Smooth meshing is now handled
+// exclusively by the GPU compute shaders (voxelSmoothCentroidCS.hlsl
+// + voxelSmoothCS.hlsl) which include crease-angle correction for
+// correct normals at sharp edges (e.g. vertical walls).

 } // namespace voxel
--- a/src/voxel/VoxelMesher.h
+++ b/src/voxel/VoxelMesher.h
@ -37,25 +37,4 @@ private:
                          int x, int y, int z, uint8_t face);
 };

-// ── Naive Surface Nets Mesher (Phase 5) ─────────────────────────
-// Generates smooth triangle mesh for voxels marked FLAG_SMOOTH.
-// Algorithm: one vertex per surface cell, positioned at edge-crossing centroid.
-// Quads emitted for each edge with sign change, then split into 2 triangles.
-class SmoothMesher {
-public:
-    // Mesh smooth voxels in a chunk, populating chunk.smoothVertices.
-    // Returns number of smooth vertices generated (always multiple of 3, triangle list).
-    static uint32_t meshChunk(Chunk& chunk, const VoxelWorld& world);
-
-private:
-    // SDF value at a voxel position (solid smooth = -1, empty = +1)
-    // Non-smooth solid voxels are treated as walls (SDF = -1 at boundary)
-    static float computeSDF(const Chunk& chunk, const VoxelWorld& world,
-                            int x, int y, int z);
-
-    // Compute SDF gradient (numerical central differences) for normal
-    static void computeNormal(const Chunk& chunk, const VoxelWorld& world,
-                              int x, int y, int z, float& nx, float& ny, float& nz);
-};
-
 } // namespace voxel
--- a/src/voxel/VoxelRTManager.cpp
+++ b/src/voxel/VoxelRTManager.cpp
@ -0,0 +1,610 @@
+#include "VoxelRTManager.h"
+#include <cstring>
+
+using namespace wi::graphics;
+
+namespace voxel {
+
+void VoxelRTManager::initialize(GraphicsDevice* dev, uint32_t maxBlasVertices) {
+    device_ = dev;
+    maxBlasVertices_ = maxBlasVertices;
+
+    available_ = dev->CheckCapability(GraphicsDeviceCapability::RAYTRACING);
+    if (!available_) {
+        wi::backlog::post("VoxelRTManager: RT not available (GPU does not support ray tracing)");
+        return;
+    }
+
+    wi::renderer::LoadShader(ShaderStage::CS, blasExtractShader_, "voxel/voxelBLASExtractCS.cso");
+    if (blasExtractShader_.IsValid()) {
+        // BLAS position buffer: 6 float3 per quad (2 triangles, no shared vertices), raw buffer
+        GPUBufferDesc posDesc;
+        posDesc.size = (uint64_t)maxBlasVertices * sizeof(float) * 3;
+        posDesc.bind_flags = BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE;
+        posDesc.misc_flags = ResourceMiscFlag::BUFFER_RAW;
+        posDesc.stride = 0;
+        posDesc.usage = Usage::DEFAULT;
+        bool ok = dev->CreateBuffer(&posDesc, nullptr, &blasPositionBuffer_);
+
+        // Identity index buffer for BLAS (index i refers to vertex i)
+        GPUBufferDesc idxDesc;
+        idxDesc.size = (uint64_t)maxBlasVertices * sizeof(uint32_t);
+        idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
+        idxDesc.usage = Usage::DEFAULT;
+        auto fillIndices = [maxBlasVertices](void* dest) {
+            uint32_t* p = (uint32_t*)dest;
+            for (uint32_t i = 0; i < maxBlasVertices; i++)
+                p[i] = i;
+        };
+        bool okIdx = dev->CreateBuffer2(&idxDesc, fillIndices, &blasIndexBuffer_);
+
+        if (ok && blasPositionBuffer_.IsValid() && okIdx && blasIndexBuffer_.IsValid()) {
+            dev->SetName(&blasPositionBuffer_, "VoxelRTManager::blasPositionBuffer");
+            dev->SetName(&blasIndexBuffer_, "VoxelRTManager::blasIndexBuffer");
+            wi::backlog::post("VoxelRTManager: RT available (BLAS pos "
+                + std::to_string(posDesc.size / (1024*1024)) + " MB + idx "
+                + std::to_string(idxDesc.size / (1024*1024)) + " MB)");
+        } else {
+            available_ = false;
+            wi::backlog::post("VoxelRTManager: RT buffer creation failed", wi::backlog::LogLevel::Warning);
+        }
+    } else {
+        available_ = false;
+        wi::backlog::post("VoxelRTManager: BLAS extraction shader failed to load", wi::backlog::LogLevel::Warning);
+    }
+
+    // Toping BLAS CS
+    wi::renderer::LoadShader(ShaderStage::CS, topingBLASShader_, "voxel/voxelTopingBLASCS.cso");
+    if (topingBLASShader_.IsValid()) {
+        static constexpr uint32_t MAX_GROUPS = 64;
+        GPUBufferDesc grpDesc;
+        grpDesc.size = MAX_GROUPS * 20; // 5 × uint32 per group
+        grpDesc.bind_flags = BindFlag::SHADER_RESOURCE;
+        grpDesc.misc_flags = ResourceMiscFlag::BUFFER_STRUCTURED;
+        grpDesc.stride = 20;
+        grpDesc.usage = Usage::DEFAULT;
+        dev->CreateBuffer(&grpDesc, nullptr, &topingBLASGroupBuffer_);
+        wi::backlog::post("VoxelRTManager: toping BLAS CS available");
+    } else {
+        wi::backlog::post("VoxelRTManager: toping BLAS CS failed to load", wi::backlog::LogLevel::Warning);
+    }
+
+    // RT Shadows + AO
+    wi::renderer::LoadShader(ShaderStage::CS, shadowShader_, "voxel/voxelShadowCS.cso",
+        ShaderModel::SM_6_5);
+    wi::renderer::LoadShader(ShaderStage::CS, aoBlurShader_, "voxel/voxelAOBlurCS.cso");
+    wi::renderer::LoadShader(ShaderStage::CS, aoApplyShader_, "voxel/voxelAOApplyCS.cso");
+    if (shadowShader_.IsValid() && aoBlurShader_.IsValid() && aoApplyShader_.IsValid()) {
+        shadowsEnabled_ = true;
+        wi::backlog::post("VoxelRTManager: RT shadows + AO blur available");
+    } else {
+        wi::backlog::post("VoxelRTManager: RT shadow/AO shader(s) failed to load",
+            wi::backlog::LogLevel::Warning);
+    }
+}
+
+// ── BLAS extraction: blocky quads → float3 positions ────────────
+
+void VoxelRTManager::dispatchBLASExtract(CommandList cmd,
+    const GPUBuffer& quadBuffer,
+    const GPUBuffer& chunkInfoBuffer,
+    uint32_t quadCount) const
+{
+    if (!available_ || !blasExtractShader_.IsValid() || quadCount == 0) return;
+
+    auto* dev = device_;
+
+    GPUBarrier preBarriers[] = {
+        GPUBarrier::Buffer(&blasPositionBuffer_,
+            ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
+    };
+    dev->Barrier(preBarriers, 1, cmd);
+
+    dev->BindComputeShader(&blasExtractShader_, cmd);
+    dev->BindResource(&quadBuffer, 0, cmd);        // t0
+    dev->BindResource(&chunkInfoBuffer, 2, cmd);   // t2
+    dev->BindUAV(&blasPositionBuffer_, 0, cmd);    // u0
+
+    struct BLASPush {
+        uint32_t quadCount;
+        uint32_t pad[11];
+    } pushData = {};
+    pushData.quadCount = quadCount;
+    dev->PushConstants(&pushData, sizeof(pushData), cmd);
+
+    uint32_t groupCount = (quadCount + 63) / 64;
+    dev->Dispatch(groupCount, 1, 1, cmd);
+
+    GPUBarrier postBarriers[] = {
+        GPUBarrier::Buffer(&blasPositionBuffer_,
+            ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
+    };
+    dev->Barrier(postBarriers, 1, cmd);
+
+    blockyVertexCount_ = quadCount * 6;
+}
+
+// ── Toping BLAS extraction (GPU compute) ────────────────────────
+
+void VoxelRTManager::dispatchTopingBLASExtract(CommandList cmd,
+    const GPUBuffer& topingVertexBuffer,
+    const GPUBuffer& topingInstanceBuffer,
+    const void* groupsGPUData, size_t groupsGPUSize,
+    uint32_t groupCount, uint32_t totalVertices) const
+{
+    if (!topingBLASShader_.IsValid() || !topingBLASGroupBuffer_.IsValid() ||
+        !topingBLASPositionBuf_.isValid() || !topingVertexBuffer.IsValid() ||
+        !topingInstanceBuffer.IsValid() || totalVertices == 0 || groupCount == 0)
+        return;
+
+    auto* dev = device_;
+
+    // Upload group table
+    dev->UpdateBuffer(&topingBLASGroupBuffer_, groupsGPUData, cmd, groupsGPUSize);
+
+    GPUBarrier preBarriers[] = {
+        GPUBarrier::Buffer(&topingBLASGroupBuffer_,
+            ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
+        GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
+            ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
+    };
+    dev->Barrier(preBarriers, 2, cmd);
+
+    dev->BindComputeShader(&topingBLASShader_, cmd);
+    dev->BindResource(&topingVertexBuffer, 4, cmd);      // t4
+    dev->BindResource(&topingInstanceBuffer, 5, cmd);    // t5
+    dev->BindResource(&topingBLASGroupBuffer_, 7, cmd);  // t7
+    dev->BindUAV(&topingBLASPositionBuf_.gpu, 0, cmd);   // u0
+
+    struct {
+        uint32_t totalVertices;
+        uint32_t groupCount;
+        uint32_t pad[10];
+    } pushData = {};
+    pushData.totalVertices = totalVertices;
+    pushData.groupCount = groupCount;
+    dev->PushConstants(&pushData, sizeof(pushData), cmd);
+
+    uint32_t threadGroups = (totalVertices + 63) / 64;
+    dev->Dispatch(threadGroups, 1, 1, cmd);
+
+    GPUBarrier postBarriers[] = {
+        GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
+            ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
+    };
+    dev->Barrier(postBarriers, 1, cmd);
+
+    topingVertexCount_ = totalVertices;
+    dirty = true;
+    topingBLASDirty = false;
+}
+
+// ── Ensure toping BLAS buffer capacity ──────────────────────────
+
+bool VoxelRTManager::ensureTopingBLASCapacity(uint32_t totalVertices) {
+    if (totalVertices == 0) return false;
+
+    bool recreated = topingBLASPositionBuf_.ensureCapacity(device_, totalVertices,
+        3 * sizeof(float),
+        BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE,
+        ResourceMiscFlag::BUFFER_RAW);
+
+    if (recreated) {
+        char msg[256];
+        snprintf(msg, sizeof(msg), "VoxelRTManager: toping BLAS pos buffer (%u capacity, %.1f MB)",
+            topingBLASPositionBuf_.capacity,
+            (size_t)topingBLASPositionBuf_.capacity * 3 * sizeof(float) / (1024.0 * 1024.0));
+        wi::backlog::post(msg);
+    }
+
+    // Index buffer: grow if needed
+    if (topingBLASIndexCount_ < topingBLASPositionBuf_.capacity) {
+        uint32_t idxCount = topingBLASPositionBuf_.capacity;
+        std::vector<uint32_t> indices(idxCount);
+        for (uint32_t j = 0; j < idxCount; j++) indices[j] = j;
+
+        GPUBufferDesc idxDesc;
+        idxDesc.size = (size_t)idxCount * sizeof(uint32_t);
+        idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
+        idxDesc.misc_flags = ResourceMiscFlag::NONE;
+        idxDesc.usage = Usage::DEFAULT;
+        device_->CreateBuffer(&idxDesc, indices.data(), &topingBLASIndexBuffer_);
+        topingBLASIndexCount_ = idxCount;
+        recreated = true;
+    }
+
+    topingBLASDirty = true;
+    return recreated;
+}
+
+// ── Acceleration structure build ────────────────────────────────
+
+void VoxelRTManager::buildAccelerationStructures(CommandList cmd,
+    uint32_t buildFlags,
+    const GPUBuffer& smoothVB,
+    uint32_t smoothVertCount) const
+{
+    if (!available_) return;
+
+    auto* dev = device_;
+
+    // ── Blocky BLAS ──
+    uint32_t blockyVertCount = blockyVertexCount_;
+    if (blockyVertCount < 3) blockyVertCount = 0;
+    if ((buildFlags & BUILD_BLOCKY) && blockyVertCount > 0 && blasPositionBuffer_.IsValid()) {
+        if (!blockyBLAS_.IsValid() || blockyVertCount > blockyBLASCapacity_) {
+            blockyBLASCapacity_ = blockyVertCount + blockyVertCount / 4;
+
+            RaytracingAccelerationStructureDesc desc;
+            desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
+            desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
+
+            desc.bottom_level.geometries.resize(1);
+            auto& geom = desc.bottom_level.geometries[0];
+            geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
+            geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
+            geom.triangles.vertex_buffer = blasPositionBuffer_;
+            geom.triangles.vertex_byte_offset = 0;
+            geom.triangles.vertex_count = blockyBLASCapacity_;
+            geom.triangles.vertex_stride = sizeof(float) * 3;
+            geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
+            geom.triangles.index_buffer = blasIndexBuffer_;
+            geom.triangles.index_count = blockyBLASCapacity_;
+            geom.triangles.index_format = IndexBufferFormat::UINT32;
+            geom.triangles.index_offset = 0;
+
+            bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &blockyBLAS_);
+            if (ok) {
+                dev->SetName(&blockyBLAS_, "VoxelRTManager::blockyBLAS");
+                wi::backlog::post("VoxelRTManager: blocky BLAS created (capacity "
+                    + std::to_string(blockyBLASCapacity_ / 3) + " tris)");
+            } else {
+                wi::backlog::post("VoxelRTManager: failed to create blocky BLAS", wi::backlog::LogLevel::Error);
+                available_ = false;
+                return;
+            }
+        }
+
+        blockyBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = blockyVertCount;
+        blockyBLAS_.desc.bottom_level.geometries[0].triangles.index_count = blockyVertCount;
+        dev->BuildRaytracingAccelerationStructure(&blockyBLAS_, cmd, nullptr);
+    }
+
+    // ── Smooth BLAS ──
+    if (smoothVertCount < 3) smoothVertCount = 0;
+    if ((buildFlags & BUILD_SMOOTH) && smoothVertCount > 0 && smoothVB.IsValid()) {
+        if (!smoothBLAS_.IsValid() || smoothVertCount > smoothBLASCapacity_) {
+            smoothBLASCapacity_ = smoothVertCount + smoothVertCount / 4;
+
+            RaytracingAccelerationStructureDesc desc;
+            desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
+            desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
+
+            desc.bottom_level.geometries.resize(1);
+            auto& geom = desc.bottom_level.geometries[0];
+            geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
+            geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
+            geom.triangles.vertex_buffer = smoothVB;
+            geom.triangles.vertex_byte_offset = 0;
+            geom.triangles.vertex_count = smoothBLASCapacity_;
+            geom.triangles.vertex_stride = 32;
+            geom.triangles.index_buffer = blasIndexBuffer_;
+            geom.triangles.index_count = smoothBLASCapacity_;
+            geom.triangles.index_format = IndexBufferFormat::UINT32;
+            geom.triangles.index_offset = 0;
+            geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
+
+            bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &smoothBLAS_);
+            if (ok) {
+                dev->SetName(&smoothBLAS_, "VoxelRTManager::smoothBLAS");
+                wi::backlog::post("VoxelRTManager: smooth BLAS created (capacity "
+                    + std::to_string(smoothBLASCapacity_ / 3) + " tris)");
+            } else {
+                wi::backlog::post("VoxelRTManager: failed to create smooth BLAS", wi::backlog::LogLevel::Error);
+            }
+        }
+
+        if (smoothBLAS_.IsValid()) {
+            smoothBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = smoothVertCount;
+            smoothBLAS_.desc.bottom_level.geometries[0].triangles.index_count = smoothVertCount;
+            dev->BuildRaytracingAccelerationStructure(&smoothBLAS_, cmd, nullptr);
+        }
+
+        smoothVertexCount_ = smoothVertCount;
+    }
+
+    // ── Toping BLAS ──
+    uint32_t topingVertCount = topingVertexCount_;
+    if ((buildFlags & BUILD_TOPING) && topingVertCount >= 3 && topingBLASPositionBuf_.isValid()) {
+        if (!topingBLAS_.IsValid() || topingVertCount > topingBLASASCapacity_) {
+            topingBLASASCapacity_ = topingVertCount + topingVertCount / 4;
+
+            RaytracingAccelerationStructureDesc desc;
+            desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
+            desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
+
+            desc.bottom_level.geometries.resize(1);
+            auto& geom = desc.bottom_level.geometries[0];
+            geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
+            geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
+            geom.triangles.vertex_buffer = topingBLASPositionBuf_.gpu;
+            geom.triangles.vertex_byte_offset = 0;
+            geom.triangles.vertex_count = topingBLASASCapacity_;
+            geom.triangles.vertex_stride = sizeof(float) * 3;
+            geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
+            geom.triangles.index_buffer = topingBLASIndexBuffer_;
+            geom.triangles.index_count = topingBLASASCapacity_;
+            geom.triangles.index_format = IndexBufferFormat::UINT32;
+            geom.triangles.index_offset = 0;
+
+            bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &topingBLAS_);
+            if (ok) {
+                dev->SetName(&topingBLAS_, "VoxelRTManager::topingBLAS");
+                wi::backlog::post("VoxelRTManager: toping BLAS created (capacity "
+                    + std::to_string(topingBLASASCapacity_ / 3) + " tris)");
+            } else {
+                wi::backlog::post("VoxelRTManager: failed to create toping BLAS", wi::backlog::LogLevel::Error);
+            }
+        }
+
+        if (topingBLAS_.IsValid()) {
+            topingBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = topingVertCount;
+            topingBLAS_.desc.bottom_level.geometries[0].triangles.index_count = topingVertCount;
+            dev->BuildRaytracingAccelerationStructure(&topingBLAS_, cmd, nullptr);
+        }
+    }
+
+    // Memory barrier: BLAS builds must complete before the TLAS build reads them
+    {
+        GPUBarrier barriers[] = { GPUBarrier::Memory() };
+        dev->Barrier(barriers, 1, cmd);
+    }
+
+    // ── TLAS ──
+    uint32_t instanceCount = 0;
+    if (blockyBLAS_.IsValid()) instanceCount++;
+    if (smoothBLAS_.IsValid() && smoothVertCount > 0) instanceCount++;
+    if (topingBLAS_.IsValid() && topingVertCount >= 3) instanceCount++;
+    if (instanceCount == 0) { dirty = false; return; }
+
+    if (!tlas_.IsValid() || instanceCount != tlasInstanceCount_) {
+        const size_t instSize = dev->GetTopLevelAccelerationStructureInstanceSize();
+
+        auto setIdentity = [](float transform[3][4]) {
+            std::memset(transform, 0, sizeof(float) * 12);
+            transform[0][0] = 1.0f;
+            transform[1][1] = 1.0f;
+            transform[2][2] = 1.0f;
+        };
+
+        const RaytracingAccelerationStructure* blockyPtr = blockyBLAS_.IsValid() ? &blockyBLAS_ : nullptr;
+        const RaytracingAccelerationStructure* smoothPtr = (smoothBLAS_.IsValid() && smoothVertCount > 0) ? &smoothBLAS_ : nullptr;
+        const RaytracingAccelerationStructure* topingPtr = (topingBLAS_.IsValid() && topingVertCount >= 3) ? &topingBLAS_ : nullptr;
+
+        RaytracingAccelerationStructureDesc desc;
+        desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
+        desc.type = RaytracingAccelerationStructureDesc::Type::TOPLEVEL;
+        desc.top_level.count = instanceCount;
+
+        GPUBufferDesc bufdesc;
+        bufdesc.misc_flags = ResourceMiscFlag::RAY_TRACING;
+        bufdesc.stride = (uint32_t)instSize;
+        bufdesc.size = bufdesc.stride * desc.top_level.count;
+
+        auto initInstances = [&](void* dest) {
+            uint32_t idx = 0;
+            auto addInstance = [&](const RaytracingAccelerationStructure* blas, uint32_t id) {
+                if (!blas) return;
+                RaytracingAccelerationStructureDesc::TopLevel::Instance inst;
+                setIdentity(inst.transform);
+                inst.instance_id = id;  inst.instance_mask = 0xFF;
+                inst.instance_contribution_to_hit_group_index = 0;  inst.flags = 0;
+                inst.bottom_level = blas;
+                dev->WriteTopLevelAccelerationStructureInstance(&inst, (uint8_t*)dest + idx * instSize);
+                idx++;
+            };
+            addInstance(blockyPtr, 0);
+            addInstance(smoothPtr, 1);
+            addInstance(topingPtr, 2);
+        };
+
+        bool ok = dev->CreateBuffer2(&bufdesc, initInstances, &desc.top_level.instance_buffer);
+        if (!ok) {
+            wi::backlog::post("VoxelRTManager: failed to create TLAS instance buffer", wi::backlog::LogLevel::Error);
+            dirty = false;
+            return;
+        }
+
+        ok = dev->CreateRaytracingAccelerationStructure(&desc, &tlas_);
+        if (!ok) {
+            wi::backlog::post("VoxelRTManager: failed to create TLAS", wi::backlog::LogLevel::Error);
+            dirty = false;
+            return;
+        }
+
+        tlasInstanceCount_ = instanceCount;
+        wi::backlog::post("VoxelRTManager: TLAS created (" + std::to_string(instanceCount) + " instances)");
+    }
+
+    dev->BuildRaytracingAccelerationStructure(&tlas_, cmd, nullptr);
+
+    {
+        GPUBarrier barriers[] = { GPUBarrier::Memory(&tlas_) };
+        dev->Barrier(barriers, 1, cmd);
+    }
+
+    dirty = false;
+}
+
+// ── RT Shadow + AO dispatch ─────────────────────────────────────
+
+void VoxelRTManager::dispatchShadows(CommandList cmd,
+    const Texture& depthBuffer,
+    const Texture& renderTarget,
+    const Texture& normalTarget,
+    const GPUBuffer& constantBuffer) const
+{
+    if (!shadowsEnabled_ || !shadowShader_.IsValid() || !tlas_.IsValid())
+        return;
+
+    auto* dev = device_;
+    uint32_t w = renderTarget.GetDesc().width;
+    uint32_t h = renderTarget.GetDesc().height;
+    uint32_t gx = (w + 7) / 8;
+    uint32_t gy = (h + 7) / 8;
+
+    // Pass 1: Shadow + raw AO
+    {
+        GPUBarrier preBarriers[] = {
+            GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
+                ResourceState::DEPTHSTENCIL, ResourceState::SHADER_RESOURCE),
+            GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
+                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
+            GPUBarrier::Image(&aoRawTexture,
+                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
+        };
+        dev->Barrier(preBarriers, 3, cmd);
+
+        dev->BindComputeShader(&shadowShader_, cmd);
+        dev->BindResource(&depthBuffer, 0, cmd);
+        dev->BindResource(&normalTarget, 1, cmd);
+        dev->BindResource(&tlas_, 2, cmd);
+        dev->BindResource(&aoHistoryTexture, 3, cmd);
+        dev->BindUAV(&renderTarget, 0, cmd);
+        dev->BindUAV(&aoRawTexture, 1, cmd);
+        dev->BindConstantBuffer(&constantBuffer, 0, cmd);
+
+        struct ShadowPush {
+            uint32_t width, height;
+            float normalBias, shadowMaxDist;
+            uint32_t debugMode;
+            float aoRadius;
+            uint32_t aoRayCount;
+            float aoStrength;
+            uint32_t frameIndex;
+            uint32_t historyValid;
+            uint32_t pad[2];
+        } pushData = {};
+        pushData.width = w;
+        pushData.height = h;
+        pushData.normalBias = 0.15f;
+        pushData.shadowMaxDist = 512.0f;
+        pushData.debugMode = shadowDebug_;
+        pushData.aoRadius = 8.0f;
+        pushData.aoRayCount = 4;
+        pushData.aoStrength = 0.7f;
+        pushData.frameIndex = frameCounter++;
+        pushData.historyValid = aoHistoryValid ? 1u : 0u;
+        dev->PushConstants(&pushData, sizeof(pushData), cmd);
+        dev->Dispatch(gx, gy, 1, cmd);
+    }
+
+    // Pass 1.5: Copy raw AO → history
+    {
+        GPUBarrier copyBarriers[] = {
+            GPUBarrier::Image(&aoRawTexture,
+                ResourceState::UNORDERED_ACCESS, ResourceState::COPY_SRC),
+            GPUBarrier::Image(&aoHistoryTexture,
+                ResourceState::SHADER_RESOURCE, ResourceState::COPY_DST),
+        };
+        dev->Barrier(copyBarriers, 2, cmd);
+        dev->CopyResource(&aoHistoryTexture, &aoRawTexture, cmd);
+
+        GPUBarrier postCopyBarriers[] = {
+            GPUBarrier::Image(&aoRawTexture,
+                ResourceState::COPY_SRC, ResourceState::SHADER_RESOURCE),
+            GPUBarrier::Image(&aoHistoryTexture,
+                ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
+        };
+        dev->Barrier(postCopyBarriers, 2, cmd);
+        aoHistoryValid = true;
+    }
+
+    // Pass 2: Bilateral blur horizontal
+    {
+        GPUBarrier barriers[] = {
+            GPUBarrier::Image(&aoBlurredTexture,
+                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
+        };
+        dev->Barrier(barriers, 1, cmd);
+
+        dev->BindComputeShader(&aoBlurShader_, cmd);
+        dev->BindResource(&aoRawTexture, 0, cmd);
+        dev->BindResource(&depthBuffer, 1, cmd);
+        dev->BindResource(&normalTarget, 2, cmd);
+        dev->BindUAV(&aoBlurredTexture, 0, cmd);
+
+        struct BlurPush {
+            uint32_t width, height, direction, radius;
+            float depthThreshold, normalThreshold;
+            uint32_t pad[6];
+        } blurPush = {};
+        blurPush.width = w; blurPush.height = h;
+        blurPush.direction = 0; blurPush.radius = 6;
+        blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
+        dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
+        dev->Dispatch(gx, gy, 1, cmd);
+    }
+
+    // Pass 3: Bilateral blur vertical
+    {
+        GPUBarrier barriers[] = {
+            GPUBarrier::Image(&aoBlurredTexture,
+                ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
+            GPUBarrier::Image(&aoRawTexture,
+                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
+        };
+        dev->Barrier(barriers, 2, cmd);
+
+        dev->BindComputeShader(&aoBlurShader_, cmd);
+        dev->BindResource(&aoBlurredTexture, 0, cmd);
+        dev->BindResource(&depthBuffer, 1, cmd);
+        dev->BindResource(&normalTarget, 2, cmd);
+        dev->BindUAV(&aoRawTexture, 0, cmd);
+
+        struct BlurPush {
+            uint32_t width, height, direction, radius;
+            float depthThreshold, normalThreshold;
+            uint32_t pad[6];
+        } blurPush = {};
+        blurPush.width = w; blurPush.height = h;
+        blurPush.direction = 1; blurPush.radius = 6;
+        blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
+        dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
+        dev->Dispatch(gx, gy, 1, cmd);
+    }
+
+    // Pass 4: Apply blurred AO
+    {
+        GPUBarrier barriers[] = {
+            GPUBarrier::Image(&aoRawTexture,
+                ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
+        };
+        dev->Barrier(barriers, 1, cmd);
+
+        dev->BindComputeShader(&aoApplyShader_, cmd);
+        dev->BindResource(&aoRawTexture, 0, cmd);
+        dev->BindResource(&depthBuffer, 1, cmd);
+        dev->BindUAV(&renderTarget, 0, cmd);
+
+        struct ApplyPush {
+            uint32_t width, height, debugMode;
+            uint32_t pad[9];
+        } applyPush = {};
+        applyPush.width = w; applyPush.height = h;
+        applyPush.debugMode = shadowDebug_;
+        dev->PushConstants(&applyPush, sizeof(applyPush), cmd);
+        dev->Dispatch(gx, gy, 1, cmd);
+    }
+
+    // Restore resource states
+    GPUBarrier postBarriers[] = {
+        GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
+            ResourceState::SHADER_RESOURCE, ResourceState::DEPTHSTENCIL),
+        GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
+            ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
+    };
+    dev->Barrier(postBarriers, 2, cmd);
+}
+
+} // namespace voxel
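
The dispatch sizing and push-constant layout in `dispatchShadows` above can be checked in isolation. The following is a minimal sketch, not code from this commit: `kGroupSize` and `dispatchGroups` are hypothetical names, and the 8x8 thread-group size is an assumption inferred from the `(w + 7) / 8` expressions (the compute shaders are not shown here). The three structs copy the field lists from the diff; the `static_assert`s only record that each one, as declared, occupies 48 bytes.

```cpp
#include <cstdint>

// Assumed thread-group edge length (hypothetical; inferred from "(w + 7) / 8").
constexpr uint32_t kGroupSize = 8;

// Ceiling division: smallest number of groups whose threads cover `pixels`.
constexpr uint32_t dispatchGroups(uint32_t pixels) {
    return (pixels + kGroupSize - 1) / kGroupSize;
}

// Field-for-field copies of the push-constant structs in dispatchShadows.
struct ShadowPush {
    uint32_t width, height;
    float normalBias, shadowMaxDist;
    uint32_t debugMode;
    float aoRadius;
    uint32_t aoRayCount;
    float aoStrength;
    uint32_t frameIndex;
    uint32_t historyValid;
    uint32_t pad[2];
};

struct BlurPush {
    uint32_t width, height, direction, radius;
    float depthThreshold, normalThreshold;
    uint32_t pad[6];
};

struct ApplyPush {
    uint32_t width, height, debugMode;
    uint32_t pad[9];
};

// Every member is 4 bytes wide, so no implicit padding is inserted:
// each struct is 12 * 4 = 48 bytes.
static_assert(sizeof(ShadowPush) == 48, "ShadowPush layout changed");
static_assert(sizeof(BlurPush) == 48, "BlurPush layout changed");
static_assert(sizeof(ApplyPush) == 48, "ApplyPush layout changed");
```

Since both blur passes declare an identical `BlurPush`, a single shared definition with only `direction` changed between passes would be one way to keep the two copies from drifting apart.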
--- a/src/voxel/VoxelRTManager.h
+++ b/src/voxel/VoxelRTManager.h
@ -0,0 +1,124 @@
+#pragma once
+#include "DeferredGPUBuffer.h"
+#include "WickedEngine.h"
+
+namespace voxel {
+
+// ── Ray Tracing Manager (Phase 6) ──────────────────────────────
+// Groups all RT state: BLAS/TLAS management, shadow/AO dispatches.
+// Extracted from VoxelRenderer to isolate the ~500 lines of RT code
+// and its 20+ members for easier debugging and maintenance.
+
+class VoxelRTManager {
+public:
+    // ── Initialization ──────────────────────────────────────────
+    void initialize(wi::graphics::GraphicsDevice* device, uint32_t maxBlasVertices);
+
+    // ── BLAS extraction (compute shaders) ───────────────────────
+
+    // Extract blocky quad positions into BLAS vertex buffer.
+    void dispatchBLASExtract(wi::graphics::CommandList cmd,
+        const wi::graphics::GPUBuffer& quadBuffer,
+        const wi::graphics::GPUBuffer& chunkInfoBuffer,
+        uint32_t quadCount) const;
+
+    // Extract toping instance positions via GPU compute.
+    // groupsGPUData/groupsGPUSize: CPU staging data for the toping BLAS group table (groupCount entries).
+    void dispatchTopingBLASExtract(wi::graphics::CommandList cmd,
+        const wi::graphics::GPUBuffer& topingVertexBuffer,
+        const wi::graphics::GPUBuffer& topingInstanceBuffer,
+        const void* groupsGPUData, size_t groupsGPUSize,
+        uint32_t groupCount, uint32_t totalVertices) const;
+
+    // ── Acceleration structure build ────────────────────────────
+    static constexpr uint32_t BUILD_BLOCKY = 1 << 0;
+    static constexpr uint32_t BUILD_SMOOTH = 1 << 1;
+    static constexpr uint32_t BUILD_TOPING = 1 << 2;
+    static constexpr uint32_t BUILD_ALL    = BUILD_BLOCKY | BUILD_SMOOTH | BUILD_TOPING;
+
+    void buildAccelerationStructures(wi::graphics::CommandList cmd,
+        uint32_t buildFlags,
+        const wi::graphics::GPUBuffer& smoothVB,
+        uint32_t smoothVertCount) const;
+
+    // ── RT Shadows + AO dispatch ────────────────────────────────
+    void dispatchShadows(wi::graphics::CommandList cmd,
+        const wi::graphics::Texture& depthBuffer,
+        const wi::graphics::Texture& renderTarget,
+        const wi::graphics::Texture& normalTarget,
+        const wi::graphics::GPUBuffer& constantBuffer) const;
+
+    // ── Toping BLAS buffer management ───────────────────────────
+    // Ensure capacity for toping BLAS position + index buffers.
+    // Returns true if buffers were (re)created.
+    bool ensureTopingBLASCapacity(uint32_t totalVertices);
+
+    // ── State queries ───────────────────────────────────────────
+    bool isAvailable() const { return available_; }
+    bool isReady() const { return available_ && tlas_.IsValid(); }
+    bool isShadowsEnabled() const { return shadowsEnabled_; }
+    void setShadowsEnabled(bool v) { shadowsEnabled_ = v; }
+    uint32_t getShadowDebug() const { return shadowDebug_; }
+    void setShadowDebug(uint32_t v) { shadowDebug_ = v; }
+
+    uint32_t getBlockyTriCount() const { return blockyVertexCount_ / 3; }
+    uint32_t getSmoothTriCount() const { return smoothVertexCount_ / 3; }
+    uint32_t getTopingTriCount() const { return topingVertexCount_ / 3; }
+    uint32_t getTopingVertexCount() const { return topingVertexCount_; }
+    uint32_t getTlasInstanceCount() const { return tlasInstanceCount_; }
+    const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
+
+    // Dirty flags (public for VoxelRenderPath orchestration)
+    mutable bool dirty = true;           // BLAS/TLAS need rebuild
+    mutable bool topingBLASDirty = false; // toping BLAS extract + rebuild needed
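+    mutable bool aoHistoryValid = false;   // true once raw AO has been copied into aoHistoryTexture
+    mutable uint32_t frameCounter = 0;     // incremented each time dispatchShadows runs; sent to the shader as frameIndex
+    mutable XMFLOAT4X4 prevViewProjection; // previous frame's view-projection matrix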
+
+    // AO textures (created by VoxelRenderPath::createRenderTargets)
+    mutable wi::graphics::Texture aoRawTexture;
+    mutable wi::graphics::Texture aoBlurredTexture;
+    mutable wi::graphics::Texture aoHistoryTexture;
+
+private:
+    wi::graphics::GraphicsDevice* device_ = nullptr;
+    mutable bool available_ = false;
+    mutable bool shadowsEnabled_ = false;
+    mutable uint32_t shadowDebug_ = 0;
+
+    // Shaders
+    wi::graphics::Shader blasExtractShader_;
+    wi::graphics::Shader topingBLASShader_;
+    wi::graphics::Shader shadowShader_;
+    wi::graphics::Shader aoBlurShader_;
+    wi::graphics::Shader aoApplyShader_;
+
+    // Blocky BLAS resources
+    mutable wi::graphics::GPUBuffer blasPositionBuffer_;
+    wi::graphics::GPUBuffer blasIndexBuffer_;
+    mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
+    mutable uint32_t blockyBLASCapacity_ = 0;
+    mutable uint32_t blockyVertexCount_ = 0;
+
+    // Smooth BLAS
+    mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
+    mutable uint32_t smoothBLASCapacity_ = 0;
+    mutable uint32_t smoothVertexCount_ = 0;
+
+    // Toping BLAS
+    mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
+    mutable uint32_t topingBLASASCapacity_ = 0;
+    mutable uint32_t topingVertexCount_ = 0;
+    mutable DeferredGPUBuffer topingBLASPositionBuf_;
+    mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_;
+    mutable uint32_t topingBLASIndexCount_ = 0;
+    wi::graphics::GPUBuffer topingBLASGroupBuffer_;
+
+    // TLAS
+    mutable wi::graphics::RaytracingAccelerationStructure tlas_;
+    mutable uint32_t tlasInstanceCount_ = 0;
+
+    uint32_t maxBlasVertices_ = 0;
+};
+
+} // namespace voxel
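
The `BUILD_*` constants in the header above form a bitmask passed to `buildAccelerationStructures` as `buildFlags`. A minimal sketch of how a caller might compose and test such a mask follows; the constant values are copied from the header, while `selectBuildFlags` and `wantsBuild` are hypothetical helpers introduced only for illustration and do not appear in this commit.

```cpp
#include <cstdint>

// Values copied from VoxelRTManager (written with unsigned literals here).
constexpr uint32_t BUILD_BLOCKY = 1u << 0;
constexpr uint32_t BUILD_SMOOTH = 1u << 1;
constexpr uint32_t BUILD_TOPING = 1u << 2;
constexpr uint32_t BUILD_ALL    = BUILD_BLOCKY | BUILD_SMOOTH | BUILD_TOPING;

// Hypothetical helper: build a mask from per-geometry "changed this frame" flags.
constexpr uint32_t selectBuildFlags(bool blockyChanged, bool smoothChanged, bool topingChanged) {
    return (blockyChanged ? BUILD_BLOCKY : 0u)
         | (smoothChanged ? BUILD_SMOOTH : 0u)
         | (topingChanged ? BUILD_TOPING : 0u);
}

// Hypothetical helper: true when `bit` is set in `flags`.
constexpr bool wantsBuild(uint32_t flags, uint32_t bit) {
    return (flags & bit) != 0;
}
```

With helpers like these, a frame in which only the smooth mesh changed would request `BUILD_SMOOTH` alone, leaving the blocky and toping BLAS untouched.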
--- a/src/voxel/VoxelRenderer.cpp
+++ b/src/voxel/VoxelRenderer.cpp
--- a/src/voxel/VoxelRenderer.h
+++ b/src/voxel/VoxelRenderer.h
@ -2,6 +2,8 @@
 #include "VoxelWorld.h"
 #include "VoxelMesher.h"
 #include "TopingSystem.h"
+#include "DeferredGPUBuffer.h"
+#include "VoxelRTManager.h"
 #include "WickedEngine.h"

 namespace voxel {
@ -27,7 +29,7 @@ struct GPUChunkInfo {
    uint32_t pad2[2];          // pad to 112 bytes (7 × float4)
 };

-// ── Voxel Renderer (Phase 2: mega-buffer + MDI pipeline) ────────
+// ── Voxel Renderer (GPU mesh pipeline) ──────────────────────────
 class VoxelRenderer {
    friend class VoxelRenderPath;
 public:
@ -49,8 +51,8 @@ public:
        const wi::graphics::Texture& normalTarget
    ) const;

-    // Generate procedural textures for materials
-    void generateTextures();
+    // Load material textures from PNG files (RGB=albedo, A=heightmap)
+    void loadTextures();

    // Stats
    uint32_t getTotalQuads() const { return totalQuads_; }
@ -58,16 +60,16 @@ public:
    uint32_t getDrawCalls() const { return drawCalls_; }
    uint32_t getChunkCount() const { return chunkCount_; }
    bool isInitialized() const { return initialized_; }
-    bool isGpuCulling() const { return gpuCullingEnabled_; }
-    bool isMdiEnabled() const { return mdiEnabled_; }

    bool debugFaceColors_ = false;
    bool debugBlend_ = false;
    float windTime_ = 0.0f;  // set by VoxelRenderPath::Update each frame
+    float normalStrength_ = 0.7f; // normal map strength (0=off)
+    int debugLighting_ = 0;       // 0=all, 1=no nmap, 2=flat, 3=albedo, 4=NdotL, 5=normal viz
+    XMFLOAT4 sunDirection_ = { -0.7f, -0.4f, -0.3f, 0.0f }; // set by VoxelRenderPath::Update

 private:
    void createPipeline();
-    void rebuildMegaBuffer(VoxelWorld& world);

    wi::graphics::GraphicsDevice* device_ = nullptr;

@ -75,16 +77,12 @@ private:
    wi::graphics::Shader vertexShader_;
    wi::graphics::Shader pixelShader_;
    wi::graphics::PipelineState pso_;
-    wi::graphics::Shader cullShader_; // Frustum cull compute shader
-
    // Shaders & Pipeline (topings, Phase 4)
    wi::graphics::Shader topingVS_;
    wi::graphics::Shader topingPS_;
    wi::graphics::PipelineState topingPso_;
    wi::graphics::GPUBuffer topingVertexBuffer_;   // StructuredBuffer<TopingVertex>, SRV t4
-    wi::graphics::GPUBuffer topingInstanceBuffer_; // StructuredBuffer<float3>, SRV t5
-    mutable uint32_t topingInstanceCapacity_ = 0;  // pre-allocated capacity (avoid per-frame CreateBuffer)
-    mutable bool topingInstanceDirty_ = false;     // deferred upload via UpdateBuffer in Render()
+    DeferredGPUBuffer topingInstanceBuf_;            // StructuredBuffer<float3>, SRV t5
    static constexpr uint32_t MAX_TOPING_INSTANCES = 256 * 1024; // 256K instances max
    // Persistent staging buffers for toping upload (avoids per-frame allocations)
    struct TopingSortedInst { float wx, wy, wz; uint16_t type, variant; };
@ -93,30 +91,41 @@ private:
    std::vector<TopingGPUInst> topingGpuInsts_;
    mutable uint32_t topingDrawCalls_ = 0;

+    // ── Toping draw groups (shared between render + BLAS CS) ─────
+    struct TopingDrawGroup {
+        uint16_t type, variant;
+        uint32_t instanceOffset, instanceCount;
+        uint32_t vertexTemplateOffset, vertexCount; // from TopingDef::variants[]
+    };
+    std::vector<TopingDrawGroup> topingDrawGroups_; // built in uploadTopingData, reused in renderTopings
+
+    // ── Toping BLAS group staging (passed to VoxelRTManager) ──────
+    struct TopingBLASGroupGPU {
+        uint32_t globalVertexOffset;    // prefix sum of total vertices before this group
+        uint32_t vertexTemplateOffset;  // offset into topingVertices (t4)
+        uint32_t vertexCount;           // vertices per instance
+        uint32_t instanceOffset;        // offset into topingInstances (t5)
+        uint32_t instanceCount;         // instances in this group
+    };
+    std::vector<TopingBLASGroupGPU> topingBLASGroupsGPU_; // CPU staging for group table
+    mutable uint32_t topingBLASTotalVertices_ = 0;
+
    // Shaders & Pipeline (smooth surfaces, Phase 5)
    wi::graphics::Shader smoothVS_;
    wi::graphics::Shader smoothPS_;
    wi::graphics::RasterizerState smoothRasterizer_;
    wi::graphics::PipelineState smoothPso_;
-    wi::graphics::GPUBuffer smoothVertexBuffer_;   // StructuredBuffer<SmoothVertex>, SRV t6
-    mutable uint32_t smoothVertexCapacity_ = 0;    // pre-allocated capacity (avoid per-frame CreateBuffer)
-    std::vector<SmoothVertex> smoothStagingVerts_;  // persistent staging buffer (avoids per-frame alloc)
-    static constexpr uint32_t MAX_SMOOTH_VERTICES = 4 * 1024 * 1024; // 4M vertices max
-    mutable uint32_t smoothVertexCount_ = 0;
    mutable uint32_t smoothDrawCalls_ = 0;
-    mutable bool smoothVertexDirty_ = false;  // deferred upload via UpdateBuffer in Render()
-    bool smoothDirty_ = true;

-    // Texture array for materials (256x256, 5 layers for prototype)
-    wi::graphics::Texture textureArray_;
+    // Texture arrays for materials (512x512, 6 layers each)
+    wi::graphics::Texture textureArray_;      // RGBA: RGB=albedo, A=heightmap (t1)
+    wi::graphics::Texture normalArray_;       // RGB: tangent-space normal map (t7)
    wi::graphics::Sampler sampler_;

    // ── Mega-buffer architecture (Phase 2) ──────────────────────
    static constexpr uint32_t MEGA_BUFFER_CAPACITY = 2 * 1024 * 1024; // 2M quads max (16 MB)
    static constexpr uint32_t MAX_CHUNKS = 2048;
-    static constexpr uint32_t MAX_DRAWS = MAX_CHUNKS * 6; // up to 6 face groups per chunk

-    wi::graphics::GPUBuffer megaQuadBuffer_;    // StructuredBuffer<PackedQuad>, SRV t0
    wi::graphics::GPUBuffer chunkInfoBuffer_;   // StructuredBuffer<GPUChunkInfo>, SRV t2

    // CPU-side tracking
@ -127,27 +136,9 @@ private:
    };
    std::vector<ChunkSlot> chunkSlots_;
    std::vector<GPUChunkInfo> cpuChunkInfo_;
-    std::vector<PackedQuad> cpuMegaQuads_;       // CPU staging for mega-buffer
    uint32_t chunkCount_ = 0;
    bool megaBufferDirty_ = true;

-    // ── Indirect draw (Phase 2 MDI) ─────────────────────────────
-    // Wicked Engine's DrawInstancedIndirectCount command signature includes a
-    // push constant (1 × uint32 at b999) BEFORE each D3D12_DRAW_ARGUMENTS.
-    // Total stride = 4 + 16 = 20 bytes per draw entry.
-    struct IndirectDrawArgs {
-        uint32_t pushConstant;              // written to b999[0] by ExecuteIndirect
-        uint32_t vertexCountPerInstance;
-        uint32_t instanceCount;
-        uint32_t startVertexLocation;
-        uint32_t startInstanceLocation;
-    };
-    wi::graphics::GPUBuffer indirectArgsBuffer_;   // IndirectDrawArgs[MAX_DRAWS]
-    wi::graphics::GPUBuffer drawCountBuffer_;      // uint32_t[1]
-    mutable std::vector<IndirectDrawArgs> cpuIndirectArgs_;
-    bool gpuCullingEnabled_ = true;                // Phase 2.3: GPU compute cull (true) vs CPU fallback (false)
-    bool mdiEnabled_ = true;                       // Phase 2.2: MDI rendering with CPU-filled indirect args
-
    // Constants buffer (must match HLSL VoxelCB)
    struct VoxelConstants {
        XMFLOAT4X4 viewProjection;
@ -184,7 +175,6 @@ private:
    wi::graphics::GPUBuffer gpuQuadCounter_;  // atomic counter for GPU mesh output
    wi::graphics::GPUBuffer meshCounterReadback_; // READBACK buffer for quad counter
    bool gpuMesherAvailable_ = false;
-    bool gpuMeshEnabled_ = true;              // Use GPU meshing instead of CPU greedy
    mutable uint32_t gpuMeshQuadCount_ = 0;   // Readback from previous frame (1-frame delay)
    mutable uint32_t voxelDataCapacity_ = 0;  // Current capacity of voxelDataBuffer_ (in uint32s)
    mutable std::vector<uint32_t> packedVoxelCache_; // cached packed voxel data for all chunks
@ -204,81 +194,39 @@ private:
    mutable uint32_t gpuSmoothVertexCount_ = 0;       // readback from previous frame
    mutable bool gpuSmoothMeshDirty_ = true;

-    // ── Ray Tracing (Phase 6.1) ─────────────────────────────────────
-    wi::graphics::Shader blasExtractShader_;              // voxelBLASExtractCS compute shader
-    mutable wi::graphics::GPUBuffer blasPositionBuffer_;  // float3[] for blocky BLAS (6 verts per quad)
-    wi::graphics::GPUBuffer blasIndexBuffer_;             // sequential uint32 indices [0,1,2,...] for BLAS
-    mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
-    mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
-    mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
-    mutable wi::graphics::RaytracingAccelerationStructure tlas_;
-    mutable wi::graphics::GPUBuffer topingBLASPositionBuffer_; // float3[] world-space toping positions
-    mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_;    // sequential indices for toping BLAS
-    mutable uint32_t topingBLASPositionCapacity_ = 0;          // pre-allocated capacity (vertices)
-    mutable uint32_t topingBLASIndexCount_ = 0;                // size of toping index buffer
-    mutable bool topingBLASDirty_ = false;                     // deferred BLAS position upload + rebuild
-    mutable uint32_t topingBLASVertexCount_ = 0;               // actual vertex count for current frame
-    std::vector<float> topingBLASPositionStaging_;             // CPU staging for deferred upload
+    // ── Ray Tracing (Phase 6) ────────────────────────────────────────
    static constexpr uint32_t MAX_BLAS_VERTICES = MEGA_BUFFER_CAPACITY * 6; // 6 verts per quad
-    mutable bool rtAvailable_ = false;                    // GPU supports RT
-    mutable bool rtDirty_ = true;                         // BLAS/TLAS need rebuild
-    mutable uint32_t rtBlockyVertexCount_ = 0;            // current blocky BLAS vertex count
-    mutable uint32_t rtSmoothVertexCount_ = 0;            // current smooth BLAS vertex count
-    mutable uint32_t rtTopingVertexCount_ = 0;            // current toping BLAS vertex count
-    // BLAS capacity tracking: only recreate AS when vertex count exceeds capacity
-    mutable uint32_t blockyBLASCapacity_ = 0;             // vertex count at BLAS creation
-    mutable uint32_t smoothBLASCapacity_ = 0;
-    mutable uint32_t topingBLASASCapacity_ = 0;           // separate from topingBLASPositionCapacity_ (buffer capacity)
-    mutable uint32_t tlasInstanceCount_ = 0;              // track TLAS instance count to avoid per-frame recreation
+    mutable VoxelRTManager rt_;

-    void dispatchBLASExtract(wi::graphics::CommandList cmd) const;
-    void buildAccelerationStructures(wi::graphics::CommandList cmd) const;
-
-    // ── RT Shadows + AO (Phase 6.2 + 6.3) ──────────────────────────
-    wi::graphics::Shader shadowShader_;           // voxelShadowCS compute shader
-    wi::graphics::Shader aoBlurShader_;           // voxelAOBlurCS compute shader
-    wi::graphics::Shader aoApplyShader_;          // voxelAOApplyCS compute shader
-    mutable wi::graphics::Texture aoRawTexture_;      // R8_UNORM: raw AO from shadow CS
-    mutable wi::graphics::Texture aoBlurredTexture_;  // R8_UNORM: after bilateral blur
-    mutable wi::graphics::Texture aoHistoryTexture_;    // R8_UNORM: previous frame's temporally accumulated AO
-    mutable XMFLOAT4X4 prevViewProjection_;              // previous frame's VP matrix
-    mutable uint32_t frameCounter_ = 0;
-    mutable bool aoHistoryValid_ = false;
-    mutable bool rtShadowsEnabled_ = false;       // true when shader + TLAS ready
-    mutable uint32_t rtShadowDebug_ = 0;           // 0=off, 1=debug shadows, 2=debug AO
-
-    void dispatchShadows(wi::graphics::CommandList cmd,
-                         const wi::graphics::Texture& depthBuffer,
-                         const wi::graphics::Texture& renderTarget,
-                         const wi::graphics::Texture& normalTarget) const;
-
-    // Benchmark state machine: runs once after world gen
-    enum class BenchState { IDLE, DISPATCH, READBACK, DONE };
-    mutable BenchState benchState_ = BenchState::IDLE;
-    mutable float cpuMeshTimeMs_ = 0.0f;
-    mutable uint32_t gpuBaselineQuads_ = 0;
-
-    void dispatchGpuMeshBenchmark(wi::graphics::CommandList cmd, const VoxelWorld& world) const;
-    void readbackGpuMeshBenchmark() const;
    void dispatchGpuMesh(wi::graphics::CommandList cmd, const VoxelWorld& world,
        ProfileAccum* profPack = nullptr, ProfileAccum* profUpload = nullptr,
        ProfileAccum* profDispatch = nullptr) const;
    void dispatchGpuSmoothMesh(wi::graphics::CommandList cmd, const VoxelWorld& world) const;
    void rebuildChunkInfoOnly(VoxelWorld& world);

-    // ── GPU Timestamp Queries (Phase 2 benchmark) ────────────────
+    // ── GPU Timestamp Queries (comprehensive GPU profiling) ────────
    wi::graphics::GPUQueryHeap timestampHeap_;
    wi::graphics::GPUBuffer timestampReadback_;
-    static constexpr uint32_t TS_CULL_BEGIN = 0;
-    static constexpr uint32_t TS_CULL_END = 1;
-    static constexpr uint32_t TS_DRAW_BEGIN = 2;
-    static constexpr uint32_t TS_DRAW_END = 3;
-    static constexpr uint32_t TS_MESH_BEGIN = 4;
-    static constexpr uint32_t TS_MESH_END = 5;
-    static constexpr uint32_t TS_COUNT = 6;
-    mutable float gpuCullTimeMs_ = 0.0f;
-    mutable float gpuDrawTimeMs_ = 0.0f;
+    // Timestamp slots: pairs of (BEGIN, END) for each GPU phase
+    static constexpr uint32_t TS_GPU_MESH_BEGIN = 0;
+    static constexpr uint32_t TS_GPU_MESH_END = 1;
+    static constexpr uint32_t TS_GPU_SMOOTH_BEGIN = 2;
+    static constexpr uint32_t TS_GPU_SMOOTH_END = 3;
+    static constexpr uint32_t TS_BLAS_EXTRACT_BEGIN = 4;
+    static constexpr uint32_t TS_BLAS_EXTRACT_END = 5;
+    static constexpr uint32_t TS_BLAS_BUILD_BEGIN = 6;
+    static constexpr uint32_t TS_BLAS_BUILD_END = 7;
+    static constexpr uint32_t TS_DRAW_BEGIN = 8;
+    static constexpr uint32_t TS_DRAW_END = 9;
+    static constexpr uint32_t TS_RT_SHADOWS_BEGIN = 10;
+    static constexpr uint32_t TS_RT_SHADOWS_END = 11;
+    static constexpr uint32_t TS_COUNT = 12;
    mutable float gpuMeshTimeMs_ = 0.0f;
+    mutable float gpuSmoothMeshTimeMs_ = 0.0f;
+    mutable float gpuBLASExtractTimeMs_ = 0.0f;
+    mutable float gpuBLASBuildTimeMs_ = 0.0f;
+    mutable float gpuDrawTimeMs_ = 0.0f;
+    mutable float gpuRTShadowsTimeMs_ = 0.0f;

    // Stats (mutable: updated during const Render() call)
    mutable uint32_t totalQuads_ = 0;
@ -288,10 +236,15 @@ private:
    bool initialized_ = false;

 public:
-    float getGpuCullTimeMs() const { return gpuCullTimeMs_; }
    float getGpuDrawTimeMs() const { return gpuDrawTimeMs_; }
-    bool isGpuMeshEnabled() const { return gpuMeshEnabled_ && gpuMesherAvailable_; }
+    float getGpuMeshTimeMs() const { return gpuMeshTimeMs_; }
+    float getGpuSmoothMeshTimeMs() const { return gpuSmoothMeshTimeMs_; }
+    float getGpuBLASExtractTimeMs() const { return gpuBLASExtractTimeMs_; }
+    float getGpuBLASBuildTimeMs() const { return gpuBLASBuildTimeMs_; }
+    float getGpuRTShadowsTimeMs() const { return gpuRTShadowsTimeMs_; }
+    bool isGpuMeshEnabled() const { return gpuMesherAvailable_; }
    uint32_t getGpuMeshQuadCount() const { return gpuMeshQuadCount_; }
+    VoxelRTManager& rt() const { return rt_; }

    // Phase 4: Toping rendering
    void uploadTopingData(const TopingSystem& topingSystem);
@ -304,26 +257,105 @@ public:
    ) const;
    uint32_t getTopingDrawCalls() const { return topingDrawCalls_; }

-    // Phase 5: Smooth surface rendering
-    void uploadSmoothData(VoxelWorld& world);
-    void uploadSmoothDataFast(VoxelWorld& world); // chunkIndex already stamped
+    // Phase 5: Smooth surface rendering (GPU compute only)
    void renderSmooth(
        wi::graphics::CommandList cmd,
        const wi::graphics::Texture& depthBuffer,
        const wi::graphics::Texture& renderTarget,
        const wi::graphics::Texture& normalTarget
    ) const;
-    uint32_t getSmoothVertexCount() const { return (smoothCentroidShader_.IsValid() && smoothMeshShader_.IsValid()) ? gpuSmoothVertexCount_ : smoothVertexCount_; }
+    uint32_t getSmoothVertexCount() const { return gpuSmoothVertexCount_; }
    uint32_t getSmoothDrawCalls() const { return smoothDrawCalls_; }

-    // Phase 6: Ray Tracing
-    bool isRTAvailable() const { return rtAvailable_; }
-    bool isRTReady() const { return rtAvailable_ && tlas_.IsValid(); }
-    bool isRTShadowsEnabled() const { return rtShadowsEnabled_; }
-    uint32_t getRTBlockyTriCount() const { return rtBlockyVertexCount_ / 3; }
-    uint32_t getRTSmoothTriCount() const { return rtSmoothVertexCount_ / 3; }
-    uint32_t getRTTopingTriCount() const { return rtTopingVertexCount_ / 3; }
-    const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
+    // Phase 6: Ray Tracing (delegated to VoxelRTManager)
+    bool isRTAvailable() const { return rt_.isAvailable(); }
+    bool isRTReady() const { return rt_.isReady(); }
+    bool isRTShadowsEnabled() const { return rt_.isShadowsEnabled(); }
+    uint32_t getRTBlockyTriCount() const { return rt_.getBlockyTriCount(); }
+    uint32_t getRTSmoothTriCount() const { return rt_.getSmoothTriCount(); }
+    uint32_t getRTTopingTriCount() const { return rt_.getTopingTriCount(); }
+    const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return rt_.getTLAS(); }
+};
+
+// ── Camera Controller ────────────────────────────────────────────
+struct CameraController {
+    float speed = 50.0f;
+    float sensitivity = 0.003f;
+    XMFLOAT3 pos = { 256.0f, 100.0f, 256.0f };
+    float pitch = -0.3f;
+    float yaw = 0.0f;
+    bool mouseCaptured = false;
+
+    void set(float x, float y, float z, float p, float yw) {
+        pos = { x, y, z }; pitch = p; yaw = yw;
+    }
+    void handleInput(float dt, wi::scene::CameraComponent* camera);
+};
+
+// ── Animation State ─────────────────────────────────────────────
+struct AnimationState {
+    float windTime = 0.0f;       // continuous, always running
+    bool terrainAnimated = false; // toggled with F3
+    bool sunOrbit = false;       // toggled with F7: sun orbits in ~10s cycle
+    bool showCrosshair = true;   // toggled with F8: crosshair + face debug info
+    // F9 debug cycle: 0=all ON, 1=normals OFF, 2=flat lighting, 3=albedo only, 4=NdotL only, 5=normal viz
+    int debugLighting = 0;
+    static constexpr int DEBUG_LIGHTING_MODES = 6;
+    float time = 0.0f;           // current animation time offset
+    float accum = 0.0f;          // accumulator for 30 Hz timer
+    static constexpr float INTERVAL = 1.0f / 30.0f; // ~33.3ms = 30 Hz
+
+    // Returns true when an animation tick should fire (call every frame).
+    bool tick(float dt) {
+        windTime += dt;
+        if (!terrainAnimated) return false;
+        accum += dt;
+        if (accum < INTERVAL) return false;
+        accum -= INTERVAL;
+        time += INTERVAL;
+        return true;
+    }
+};
+
+// ── CPU Profiling (averages every INTERVAL seconds) ─────────────
+struct VoxelProfiler {
+    static constexpr float INTERVAL = 5.0f;
+
+    // Update() phase
+    ProfileAccum regenerate;      // regenerateAnimated
+    ProfileAccum updateMeshes;    // updateMeshes (rebuildChunkInfoOnly)
+    ProfileAccum topingCollect;   // topingSystem.collectInstances
+    ProfileAccum topingUpload;    // uploadTopingData
+    ProfileAccum smoothMesh;      // (legacy, unused — GPU smooth only)
+    ProfileAccum smoothUpload;    // (legacy, unused — GPU smooth only)
+    ProfileAccum frame;           // full frame (Update only - legacy)
+
+    // Render() phase
+    ProfileAccum voxelPack;       // voxel data packing in dispatchGpuMesh
+    ProfileAccum gpuUpload;       // GPU upload in dispatchGpuMesh
+    ProfileAccum gpuDispatch;     // compute dispatches in dispatchGpuMesh
+    ProfileAccum gpuMeshDispatch; // GPU mesh compute dispatch (in Render)
+    ProfileAccum gpuSmoothDispatch; // GPU smooth mesh dispatch (in Render)
+    ProfileAccum blasExtract;     // BLAS position extraction compute
+    ProfileAccum blasBuild;       // BLAS/TLAS build
+    ProfileAccum deferredUpload;  // deferred GPU buffer uploads
+    ProfileAccum render;          // render() draw calls
+    ProfileAccum rtShadows;       // RT shadows + AO dispatch
+
+    // Totals
+    ProfileAccum fullFrame;       // true full frame (Update + Render + Compose)
+    ProfileAccum gpuWait;         // GPU sync: time between Compose end and next Update start
+    ProfileAccum wickedRender;    // RenderPath3D::Render() (Wicked internal)
+    ProfileAccum trueFrame;       // wall-clock frame-to-frame time
+
+    // Timing helpers
+    std::chrono::high_resolution_clock::time_point frameStart;
+    std::chrono::high_resolution_clock::time_point lastComposeEnd;
+    bool lastComposeEndValid = false;
+    float timer = 0.0f;
+
+    void log(const VoxelRenderer& renderer) const;
+    void resetAll();
 };

 // ── Custom RenderPath that integrates voxel rendering ───────────
@@ -336,15 +368,14 @@ public:
    bool debugMode = false;
    bool debugSmooth = false;
    bool screenshotMode = false;  // CLI "screenshot": auto-position camera, capture, quit
-    void setCamera(float x, float y, float z, float pitch, float yaw);
+    void setCamera(float x, float y, float z, float pitch, float yaw) {
+        camera_.set(x, y, z, pitch, yaw);
+    }
    void resetAOHistory();  // invalidate temporal AO after camera jump

-    float cameraSpeed = 50.0f;
-    float cameraSensitivity = 0.003f;
-    XMFLOAT3 cameraPos = { 256.0f, 100.0f, 256.0f };
-    float cameraPitch = -0.3f;
-    float cameraYaw = 0.0f;
-    bool mouseCaptured = false;
+    CameraController camera_;
+    AnimationState anim_;
+    mutable VoxelProfiler prof_;

    const wi::graphics::Texture& getVoxelRT() const { return voxelRT_; }

@@ -354,42 +385,32 @@ public:
    void Compose(wi::graphics::CommandList cmd) const override;

 private:
-    void handleInput(float dt);
    void createRenderTargets();
    mutable bool worldGenerated_ = false;
    mutable int frameCount_ = 0;
    mutable float lastDt_ = 0.016f;
    mutable float smoothFps_ = 60.0f;

-    // Wind animation (continuous, always running)
-    float windTime_ = 0.0f;
-
-    // Animated terrain (wave effect at 60 Hz, toggled with F3)
-    bool animatedTerrain_ = false;
-    float animTime_ = 0.0f;
-    float animAccum_ = 0.0f;
-    static constexpr float ANIM_INTERVAL = 1.0f / 60.0f; // ~16.7ms = 60 Hz
-
    wi::graphics::Texture voxelRT_;
    wi::graphics::Texture voxelNormalRT_;  // Phase 6: world-space normals for RT shadows/AO
    wi::graphics::Texture voxelDepth_;
    mutable bool rtCreated_ = false;

-    // ── CPU Profiling (averages every 5 seconds) ─────────────────
-    mutable ProfileAccum profRegenerate_;     // regenerateAnimated
-    mutable ProfileAccum profUpdateMeshes_;   // updateMeshes (rebuildChunkInfoOnly or CPU mesh)
-    mutable ProfileAccum profVoxelPack_;      // voxel data packing in dispatchGpuMesh
-    mutable ProfileAccum profGpuUpload_;      // GPU upload in dispatchGpuMesh
-    mutable ProfileAccum profGpuDispatch_;    // compute dispatches in dispatchGpuMesh
-    mutable ProfileAccum profRender_;         // render() total
-    mutable ProfileAccum profFrame_;          // full frame (Update + Render + Compose)
-    mutable ProfileAccum profSmoothMesh_;     // SmoothMesher::meshChunk (all chunks)
-    mutable ProfileAccum profSmoothUpload_;   // uploadSmoothData
-    mutable ProfileAccum profTopingCollect_;  // topingSystem.collectInstances
-    mutable ProfileAccum profTopingUpload_;   // uploadTopingData
-    mutable float profTimer_ = 0.0f;
-    static constexpr float PROF_INTERVAL = 5.0f;
-    void logProfilingAverages() const;
+    mutable uint32_t rtBuildSkipCounter_ = 0;  // stagger BLAS builds during animation
+    mutable bool rtWasEnabled_ = false;       // saved RT state before animation
+
+    // Cached crosshair raycast result (updated each frame in Compose)
+    struct CrosshairHit {
+        bool valid = false;
+        int x = 0, y = 0, z = 0;
+        int face = -1;       // 0=+X,1=-X,2=+Y,3=-Y,4=+Z,5=-Z
+        uint8_t matID = 0;
+        bool smooth = false;
+    };
+    mutable CrosshairHit crosshairHit_;
+
+    // Build a full debug log string (used by HUD overlay and screenshot .log)
+    std::string buildDebugLog() const;
 };

 } // namespace voxel
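
Review note on `AnimationState::tick` (added in this file): it is a fixed-step accumulator that fires at most one tick per call. The sketch below is a standalone copy of that logic, assuming `terrainAnimated` is always true; `TickSketch` and `countTicks` are hypothetical names used only for illustration. It suggests that at 60 FPS a tick fires every second frame, while at frame times longer than `INTERVAL` a tick fires on every frame and the leftover time stays in `accum`, so `time` advances more slowly than wall-clock time.

```cpp
#include <cassert>

// Standalone copy of the tick logic (hypothetical type, not the engine's struct).
// Assumption: terrainAnimated is true, so the early-out is omitted.
struct TickSketch {
    static constexpr float INTERVAL = 1.0f / 30.0f; // 30 Hz, as in the diff
    float time = 0.0f;   // animation time offset
    float accum = 0.0f;  // time not yet consumed by a tick

    bool tick(float dt) {
        assert(dt >= 0.0f);              // negative frame times are not handled
        accum += dt;
        if (accum < INTERVAL) return false;
        accum -= INTERVAL;               // consumes ONE interval per call
        time += INTERVAL;
        return true;
    }
};

// Hypothetical helper: simulate `frames` frames of length `dt`, count fired ticks.
int countTicks(TickSketch& s, float dt, int frames) {
    int fired = 0;
    for (int i = 0; i < frames; ++i) {
        if (s.tick(dt)) ++fired;
    }
    return fired;
}
```

If catching up after slow frames is wanted, a `while (accum >= INTERVAL)` loop or a clamp on `accum` are the usual options; whether that matters here depends on how `time` is consumed, which this diff does not show.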
--- a/src/voxel/VoxelWorld.cpp
+++ b/src/voxel/VoxelWorld.cpp
@@ -115,7 +115,7 @@ void VoxelWorld::generateChunk(Chunk& chunk, float timeOffset) {
    const float caveScale = 0.05f;
    const float caveThreshold = 0.3f;

-    // Animation mode: fewer octaves + skip caves (much faster for 20Hz regen)
+    // Animation mode: fewer octaves + skip caves + cached materials (much faster for 30Hz regen)
    const bool animating = (timeOffset != 0.0f);
    const int heightOctaves = animating ? 2 : 5;

@@ -130,34 +130,47 @@ void VoxelWorld::generateChunk(Chunk& chunk, float timeOffset) {
            float height = baseHeight + heightScale * fbm(wx * scale, timeOffset, wz * scale, heightOctaves);

            // ── Surface material via noise-based patches ──
-            // Use 2D noise at different frequencies/seeds to create organic patches
-            // of each material on the surface, instead of altitude bands.
-            float matNoise1 = fbm(wx * 0.03f + 500.0f, 0.0f, wz * 0.03f + 500.0f, 3);   // large patches
-            float matNoise2 = fbm(wx * 0.08f + 1000.0f, 0.0f, wz * 0.08f + 1000.0f, 2);  // medium detail
-            float matNoise3 = fbm(wx * 0.05f + 2000.0f, 0.0f, wz * 0.05f + 2000.0f, 3);  // third channel
-            // Combined noise for material selection (range roughly -1..1)
-            float matVal = matNoise1 * 0.6f + matNoise2 * 0.4f;
-
+            // Material noise is time-independent (uses y=0.0f, no timeOffset).
+            // During animation, reuse cached values to skip 8 noise3D calls/column.
+            const int colIdx = x + z * CHUNK_SIZE;
            uint8_t surfaceMat;
            bool surfaceSmooth = false;
-            if (matVal < -0.30f) {
-                surfaceMat = 4; // Sand
-            } else if (matVal < -0.15f) {
-                surfaceMat = 2; // Dirt (adjacent to sand for sand↔dirt testing)
-            } else if (matVal < -0.05f) {
-                surfaceMat = 3; // Stone (blocky, with topings)
-            } else if (matVal < 0.05f) {
-                surfaceMat = 6; // SmoothStone (smooth surface)
-                surfaceSmooth = true;
-            } else if (matVal < 0.20f) {
-                surfaceMat = 1; // Grass
-            } else if (matVal < 0.30f) {
-                surfaceMat = 4; // Sand (adjacent to grass for sand↔grass testing)
-            } else if (matNoise3 > 0.1f) {
-                surfaceMat = 5; // Snow (smooth)
-                surfaceSmooth = true;
+
+            if (animating) {
+                // Fast path: read cached material from initial generation
+                surfaceMat = chunk.cachedSurfaceMat[colIdx];
+                surfaceSmooth = (chunk.cachedSurfaceFlags[colIdx] != 0);
            } else {
-                surfaceMat = 2; // Dirt
+                // Full path: compute material noise and cache it
+                float matNoise1 = fbm(wx * 0.03f + 500.0f, 0.0f, wz * 0.03f + 500.0f, 3);   // large patches
+                float matNoise2 = fbm(wx * 0.08f + 1000.0f, 0.0f, wz * 0.08f + 1000.0f, 2);  // medium detail
+                float matNoise3 = fbm(wx * 0.05f + 2000.0f, 0.0f, wz * 0.05f + 2000.0f, 3);  // third channel
+                float matVal = matNoise1 * 0.6f + matNoise2 * 0.4f;
+
+                if (matVal < -0.30f) {
+                    surfaceMat = 4; // Sand
+                } else if (matVal < -0.15f) {
+                    surfaceMat = 2; // Dirt
+                } else if (matVal < -0.05f) {
+                    surfaceMat = 3; // Stone (blocky, with topings)
+                } else if (matVal < 0.05f) {
+                    surfaceMat = 6; // SmoothStone (smooth surface)
+                    surfaceSmooth = true;
+                } else if (matVal < 0.20f) {
+                    surfaceMat = 1; // Grass
+                } else if (matVal < 0.30f) {
+                    surfaceMat = 4; // Sand
+                } else if (matNoise3 > 0.1f) {
+                    surfaceMat = 5; // Snow (smooth)
+                    surfaceSmooth = true;
+                } else {
+                    surfaceMat = 2; // Dirt (smooth)
+                    surfaceSmooth = true;
+                }
+
+                // Cache for future animation frames
+                chunk.cachedSurfaceMat[colIdx] = surfaceMat;
+                chunk.cachedSurfaceFlags[colIdx] = surfaceSmooth ? 1 : 0;
            }

            for (int y = 0; y < CHUNK_SIZE; y++) {
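
Review note on the material cascade in `generateChunk`: the thresholds depend only on `matVal` and `matNoise3`, so they could be checked in isolation. The sketch below mirrors the added branch as a pure function; `SurfaceChoice` and `pickSurface` are hypothetical names that do not exist in the repository, and the material IDs are copied from the comments in the diff. It reproduces the cascade as written, including dirt being returned with `smooth == false` in the -0.30..-0.15 band and with `smooth == true` in the fallback branch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: material ID plus the per-column smooth flag.
struct SurfaceChoice {
    uint8_t mat;
    bool smooth;
};

// Hypothetical pure function mirroring the threshold cascade from the diff.
SurfaceChoice pickSurface(float matVal, float matNoise3) {
    // NaN fails every comparison and would silently reach the last branches.
    assert(matVal == matVal && matNoise3 == matNoise3);

    if (matVal < -0.30f) return {4, false}; // Sand
    if (matVal < -0.15f) return {2, false}; // Dirt
    if (matVal < -0.05f) return {3, false}; // Stone (blocky, with topings)
    if (matVal <  0.05f) return {6, true};  // SmoothStone
    if (matVal <  0.20f) return {1, false}; // Grass
    if (matVal <  0.30f) return {4, false}; // Sand
    if (matNoise3 > 0.1f) return {5, true}; // Snow (smooth)
    return {2, true};                       // Dirt (smooth)
}
```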
--- a/src/voxel/VoxelWorld.h
+++ b/src/voxel/VoxelWorld.h
@@ -19,12 +19,14 @@ struct Chunk {
    uint32_t faceOffsets[6] = {}; // offset (in quads) for each face group within quads[]
    uint32_t faceCounts[6] = {};  // number of quads per face group

-    // Smooth mesh data (output of Surface Nets mesher, Phase 5)
-    std::vector<SmoothVertex> smoothVertices;
-    uint32_t smoothVertexCount = 0;
-    bool hasSmooth = false; // true if chunk has smooth mesh output (set by mesher)
+    // Smooth voxel flags (used by GPU smooth mesher to decide which chunks to dispatch)
    bool containsSmooth = false; // true if chunk contains any FLAG_SMOOTH voxels (set during generation)

+    // Cached surface material per column (set during initial generation, reused during animation)
+    // This avoids recomputing 8 noise3D calls per column that are time-independent.
+    uint8_t cachedSurfaceMat[CHUNK_SIZE * CHUNK_SIZE] = {};   // material ID per (x,z) column
+    uint8_t cachedSurfaceFlags[CHUNK_SIZE * CHUNK_SIZE] = {}; // smooth flag per (x,z) column
+
    VoxelData& at(int x, int y, int z) {
        return voxels[x + y * CHUNK_SIZE + z * CHUNK_SIZE * CHUNK_SIZE];
    }
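
Review note on the per-column cache added to `Chunk`: both arrays are zero-initialized and indexed by `x + z * CHUNK_SIZE`, so the animated fast path appears to rely on a non-animated `generateChunk` call (`timeOffset == 0.0f`) having filled them first. The sketch below is a minimal stand-in under stated assumptions: `ColumnCache` is a hypothetical type, and `CHUNK_SIZE = 32` is a guess because the real value is not visible in this diff.

```cpp
#include <cassert>
#include <cstdint>

constexpr int CHUNK_SIZE = 32; // ASSUMPTION: actual value not shown in the diff

// Hypothetical stand-in for the two cache arrays added to Chunk.
struct ColumnCache {
    uint8_t mat[CHUNK_SIZE * CHUNK_SIZE] = {};   // material ID per (x,z) column
    uint8_t flags[CHUNK_SIZE * CHUNK_SIZE] = {}; // 1 = smooth, 0 = blocky

    // Same column index formula as colIdx in generateChunk().
    static int index(int x, int z) {
        assert(x >= 0 && x < CHUNK_SIZE && z >= 0 && z < CHUNK_SIZE);
        return x + z * CHUNK_SIZE;
    }

    // Full path: remember what the noise cascade selected.
    void store(int x, int z, uint8_t m, bool smooth) {
        const int i = index(x, z);
        mat[i] = m;
        flags[i] = smooth ? 1 : 0;
    }

    // Fast path: read back without touching any noise function.
    uint8_t load(int x, int z, bool& smooth) const {
        const int i = index(x, z);
        smooth = (flags[i] != 0);
        return mat[i];
    }
};
```

With the assumed chunk size the two arrays cost 2 KiB per chunk. A column read before any `store` yields material 0 and `smooth == false`.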
--- a/tools/prepare_textures.py
+++ b/tools/prepare_textures.py
@@ -0,0 +1,125 @@
+"""
+Prepare voxel textures from FreeStylized.com ZIPs.
+Outputs per material:
+  - *_albedo.png  : RGBA (RGB=albedo, A=heightmap)
+  - *_normal.png  : RGB normal map (OpenGL convention, Y-up)
+"""
+import io
+import os
+import zipfile
+from PIL import Image, ImageEnhance
+
+# (zip_name, color_pattern, height_pattern, normal_pattern, brightness_factor)
+# brightness_factor: <1 = darken, >1 = brighten, 1.0 = unchanged
+MATERIALS = [
+    ("grass_01_1k",         "color",     "height", "normal_gl", 1.0),
+    ("ground_02_1k",        "color",     "height", "normal_gl", 0.75),  # dirt: darkened
+    ("ground_stones_01_1k", "baseColor", "height", "normal_gl", 1.0),
+    ("sand_01_1k",          "color",     "height", "normal_gl", 1.0),
+    ("snow_01_1k",          "color",     "height", "normal_gl", 1.0),
+    ("rock_01_1k",          "color",     "height", "normal_gl", 1.0),
+]
+
+OUTPUT_NAMES = [
+    "grass",
+    "dirt",
+    "stone",
+    "sand",
+    "snow",
+    "smoothstone",
+]
+
+TARGET_SIZE = 512
+RAW_DIR = os.path.join(os.path.dirname(__file__), "..", "assets", "raw")
+OUT_DIR = os.path.join(os.path.dirname(__file__), "..", "assets", "voxel")
+
+
+def find_file_in_zip(zf, pattern):
+    """Find a file in the zip matching a pattern substring."""
+    for name in zf.namelist():
+        basename = os.path.basename(name).lower()
+        if pattern.lower() in basename and basename.endswith(".png"):
+            return name
+    return None
+
+
+def load_image_from_zip(zf, filename, mode="RGB"):
+    data = zf.read(filename)
+    img = Image.open(io.BytesIO(data))
+    # Handle 16-bit heightmaps: Pillow's .convert("L") on I;16 images
+    # doesn't scale properly. We must manually scale 0-65535 → 0-255.
+    if img.mode in ("I;16", "I") and mode == "L":
+        # Convert to 32-bit int first, then scale down
+        img = img.convert("I")
+        img = img.point(lambda v: v * (1.0 / 256))  # documented "arg * scale + offset" form for mode "I"
+        return img.convert("L")
+    return img.convert(mode)
+
+
+def process_material(zip_path, color_pat, height_pat, normal_pat, brightness, out_name):
+    with zipfile.ZipFile(zip_path, "r") as zf:
+        color_file = find_file_in_zip(zf, color_pat)
+        height_file = find_file_in_zip(zf, height_pat)
+        normal_file = find_file_in_zip(zf, normal_pat)
+
+        if not color_file:
+            print(f"  ERROR: no color file matching '{color_pat}' in {zip_path}")
+            return False
+
+        # ── Albedo + Heightmap → RGBA ──
+        color_img = load_image_from_zip(zf, color_file, "RGB")
+
+        if brightness != 1.0:
+            color_img = ImageEnhance.Brightness(color_img).enhance(brightness)
+
+        if height_file:
+            height_img = load_image_from_zip(zf, height_file, "L")
+        else:
+            print("  WARNING: no height map, deriving from luminance")
+            height_img = color_img.convert("L")
+
+        color_img = color_img.resize((TARGET_SIZE, TARGET_SIZE), Image.LANCZOS)
+        height_img = height_img.resize((TARGET_SIZE, TARGET_SIZE), Image.LANCZOS)
+
+        r, g, b = color_img.split()
+        rgba = Image.merge("RGBA", (r, g, b, height_img))
+
+        albedo_path = os.path.join(OUT_DIR, f"{out_name}_albedo.png")
+        rgba.save(albedo_path, "PNG")
+        print(f"  OK: {out_name}_albedo.png ({TARGET_SIZE}x{TARGET_SIZE})")
+
+        # ── Normal map → RGB ──
+        if normal_file:
+            normal_img = load_image_from_zip(zf, normal_file, "RGB")
+            normal_img = normal_img.resize((TARGET_SIZE, TARGET_SIZE), Image.LANCZOS)
+            normal_path = os.path.join(OUT_DIR, f"{out_name}_normal.png")
+            normal_img.save(normal_path, "PNG")
+            print(f"  OK: {out_name}_normal.png ({TARGET_SIZE}x{TARGET_SIZE})")
+        else:
+            print("  WARNING: no normal map found")
+
+        return True
+
+
+def main():
+    os.makedirs(OUT_DIR, exist_ok=True)
+    print(f"Output directory: {os.path.abspath(OUT_DIR)}")
+    print()
+
+    success = 0
+    for i, (zip_name, color_pat, height_pat, normal_pat, brightness) in enumerate(MATERIALS):
+        zip_path = os.path.join(RAW_DIR, zip_name + ".zip")
+        print(f"[{i+1}/{len(MATERIALS)}] {OUTPUT_NAMES[i]} <- {zip_name}.zip")
+
+        if not os.path.exists(zip_path):
+            print(f"  ERROR: {zip_path} not found")
+            continue
+
+        if process_material(zip_path, color_pat, height_pat, normal_pat, brightness, OUTPUT_NAMES[i]):
+            success += 1
+
+    print(f"\nDone: {success}/{len(MATERIALS)} materials generated in {os.path.abspath(OUT_DIR)}")
+
+
+if __name__ == "__main__":
+    main()
--- a/voxel_engine_spec.md
+++ b/voxel_engine_spec.md
@@ -324,9 +324,7 @@ Le VoxelRenderer s'insère dans le render path de Wicked via des hooks dans le R

 I'd like to test something: a new block type that contains only custom 3D models and joins dynamically depending on identical neighboring blocks. Specifically, I'd like to create pipes that connect to one another, or create new connections, so that they always touch neighboring pipe blocks.

-You like the sky, perfect! Let's keep moving toward Wonderbox. What would you like to improve next? Comparing with the reference, I see several options:
-
-More saturated/deeper colors: Wonderbox's grass green is richer and deeper
+Target wonderbox
 Atmospheric fog: the warm distant haze that blends the terrain into the sky
 Stronger shadows: the light/shadow contrast is more pronounced in Wonderbox
-Block side faces: more textured/detailed in Wonderbox
+Block side faces: more textured/detailed in Wonderbox
Author	SHA1	Message	Date
Samuel Bouchet	626fbaea80	Fix smooth Surface Nets rendering: eliminate faceting, fix blocky junction - Remove geoN (ddx/ddy) from smooth PS entirely — use smooth interpolated normal N for all triplanar sampling (albedo, heightmap, normal map). geoN changes discontinuously at triangle edges, causing per-triangle faceting in texture weights and normal perturbation. - Tune consistency-based vertex normal blend to smoothstep(0.70, 0.90): snaps to face normal at 90° boundaries (seamless blocky join) while preserving smooth normals on curved terrain. - Unify all 3 edge axes (X/Y/Z) to same smoothstep formula (was mixed smoothstep + pow4). - Remove grass-specific hardcoded shading from both PS (side darkening, warm shift, ambient boost) — will be data-driven per-material later. - Remove CPU SmoothMesher code (GPU-only path). - Document all findings in TROUBLESHOOTING.md with calibration table.	2026-04-01 20:35:42 +02:00
Samuel Bouchet	d5bf499375	Add debug tools	2026-04-01 18:12:58 +02:00
Samuel Bouchet	4c50727cb6	Ignore some files	2026-04-01 18:12:53 +02:00
Samuel Bouchet	4419c612bd	Phase 8: Real stylized textures with UDN triplanar normal mapping - Load CC0 FreeStylized textures (6 materials: grass, dirt, stone, sand, snow, smoothstone) as Texture2DArray: t1=albedo+heightmap RGBA, t7=normal maps GL format - Height-based texture blending: winner-takes-all with sharpness=16, 40% blend zone, asymmetric bias (coeff 1.6) for resistBleed materials (grass resists sand bleed) - UDN triplanar normal mapping with 3 critical fixes: * Use raw normal (NOT abs) in UDN formula — abs inverts lighting on -X/-Y/-Z faces * sign(normal) correction on tangent X for back-facing UV mirror * GL green channel flip on Y-projection only (not X/Z where V=worldY is correct) - Dirt material rendered smooth (FLAG_SMOOTH), ground_02 texture darkened 0.75 - Sun orbit debug mode (F7): 10s cycle with sinusoidal altitude - Crosshair + face debug HUD (F8): DDA raycast, camera/target/face/normal info - Screenshot F6 now writes companion .log file with full debug state - Document UDN pitfalls and logical vs physical coordinates in TROUBLESHOOTING.md - Add tools/prepare_textures.py for texture pipeline (ZIP → albedo+height RGBA + normal)	2026-04-01 13:41:06 +02:00
Samuel Bouchet	c2d1a1e0b6	Commit plan and iteration instructions	2026-03-31 20:04:00 +02:00
Samuel Bouchet	8ab908054c	Fix HDR screenshot, reduce sun size, windowed 1080p by default - Add F6 in-app screenshot saving voxelRT_ directly (bypasses Windows HDR) - Shrink sun disc (pow 256), glow (pow 64), and haze (pow 8) for subtler sky - Launch as centered 1920x1080 window instead of maximized	2026-03-31 14:58:44 +02:00
Samuel Bouchet	57ac08f231	Refactor: extract VoxelRTManager, DeferredGPUBuffer, decompose VoxelRenderPath - Extract DeferredGPUBuffer utility (staging→dirty→capacity GPU buffer pattern) - Extract VoxelRTManager from VoxelRenderer (~500 lines: BLAS/TLAS, RT shadows+AO) - Decompose VoxelRenderPath into CameraController, AnimationState, VoxelProfiler - Replace toping std::sort with O(n) counting sort by (type, variant) - Update CLAUDE.md architecture docs to reflect new file structure	2026-03-31 13:46:35 +02:00
Samuel Bouchet	53df73e5e6	fixes after Improving perfs	2026-03-31 08:53:37 +02:00
Samuel Bouchet	0d93cef8f1	GPU profiling + staggered BLAS builds + RT disable during animation - Add comprehensive GPU timestamp queries for all major operations (mesh, smooth mesh, BLAS extract, BLAS build, draw, RT shadows) - Add full-frame profiling: Wicked Render, GPU Wait/Sync, true FPS - Stagger BLAS builds during animation: alternate blocky/smooth per frame, skip toping BLAS entirely (~130ms savings per frame) - Auto-disable RT shadows on F3 animation start (prevents stale shadow artifacts), auto-restore on F3 stop with full BLAS rebuild - Split buildAccelerationStructures() with selective build flags - Result: animation ~24 FPS (CPU-bound on Regenerate 27ms) vs previous 2 FPS (GPU-bound on BLAS Build 1368ms)	2026-03-31 02:21:11 +02:00
Samuel Bouchet	0d3f8200b4	Refactor: remove dead CPU/MDI paths, GPU BLAS compute, 30Hz animation - Remove ~430 lines of dead CPU mesh, MDI, and GPU cull render paths (rebuildMegaBuffer, IndirectDrawArgs, drawCountBuffer, cullShader, etc.) - Add voxelTopingBLASCS.hlsl compute shader replacing 196ms CPU loop for toping BLAS position extraction (<1ms on GPU) - Reduce animation rate from 60Hz to 30Hz (halves CPU regen cost) - Simplify render() to GPU mesh path only (no conditional branches) - Remove benchmark state machine and stale mode strings	2026-03-31 01:43:53 +02:00