
# BVLE Voxels - Hybrid Voxel Engine Prototype

## Overview

Voxel engine prototype built on **Wicked Engine** (MIT, C++17, DX12/Vulkan) to validate rendering performance on modern GPUs (AMD RDNA 2+ / Nvidia RTX 3060+). The full specification lives in `voxel_engine_spec.md` at the project root.

Target: 60+ fps at 1440p with a visible world of 512x512x256 voxels.

## Architecture

```
bvle-voxels/
├── CMakeLists.txt              # Root CMake build
├── engine/                     # Wicked Engine (clone --depth 1, main branch)
│   └── WickedEngine/shaders/voxel/  # Our shaders, copied here for DXC compilation
├── src/
│   ├── voxel/                  # VoxelEngine library (static lib)
│   │   ├── VoxelTypes.h        # Core types (VoxelData, PackedQuad, MaterialDesc, ChunkPos)
│   │   ├── VoxelWorld.h/.cpp   # Voxel world (chunk hashmap, procedural generation)
│   │   ├── VoxelMesher.h/.cpp  # CPU Binary Greedy Mesher + SmoothMesher (Naive Surface Nets)
│   │   ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (RenderPath3D subclass)
│   │   └── TopingSystem.h/.cpp # Toping system (decorative bevels on +Y faces)
│   └── app/
│       └── main.cpp            # Win32 entry point + SEH crash handler
├── shaders/                    # HLSL sources of the voxel shaders (copied into engine/ at build time)
│   ├── voxelCommon.hlsli       # Shared root signature and CBs (included by every shader)
│   ├── voxelVS.hlsl            # Vertex shader (vertex pulling, triple-mode: CPU/MDI/GPU mesh)
│   ├── voxelPS.hlsl            # Pixel shader (triplanar + lighting)
│   ├── voxelCullCS.hlsl        # Frustum + backface cull compute shader (Phase 2.3)
│   ├── voxelMeshCS.hlsl        # GPU mesher compute shader, 1×1 quads (Phase 2.4-2.5)
│   ├── voxelTopingVS.hlsl      # Toping vertex shader (instanced vertex pulling, t4/t5)
│   ├── voxelTopingPS.hlsl      # Toping pixel shader (triplanar + directional lighting)
│   ├── voxelSmoothVS.hlsl      # Smooth Surface Nets vertex shader (vertex pulling, t6)
│   └── voxelSmoothPS.hlsl      # Smooth pixel shader (triplanar + material blending)
└── CLAUDE.md
```

## Build

### Prerequisites

- CMake 3.19+ (`winget install Kitware.CMake`)
- Visual Studio 2022 Build Tools (`winget install Microsoft.VisualStudio.2022.BuildTools`)
- Windows SDK 10.0.26100+ (`winget install Microsoft.WindowsSDK.10.0.26100`)

### Commands

```bash
# Configure (from the project root)
cmake -B build -G "Visual Studio 17 2022" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.26100.0

# Build
cmake --build build --config Release --target BVLEVoxels --parallel

# The executable is written to build/Release/BVLEVoxels.exe
```

SDK 10.0.26100 is required because the DX12 headers shipped with Wicked Engine (`d3dx12_check_feature_support.h`) are not compatible with SDK 22621.

### Automatic post-build steps (CMakeLists.txt)

The build automatically copies:
1. `dxcompiler.dll` → next to the exe (required for runtime shader compilation)
2. `shaders/*.hlsl` → `engine/WickedEngine/shaders/voxel/` (so that `LoadShader` finds them through `SHADERSOURCEPATH`)
3. `engine/Content/` → next to the exe (Wicked Engine assets)

## Wicked Engine integration

### Graphics backend

Wicked Engine uses **DX12 by default on Windows** and Vulkan on Linux. Shaders are written in **HLSL** and compiled with DXC to:
- `shaders/hlsl6/*.cso` for DX12
- `shaders/spirv/*.spv` for Vulkan

To force Vulkan on Windows, pass `"vulkan"` as a command-line argument.

### Entry point and rendering architecture

`VoxelRenderPath` inherits from `wi::RenderPath3D`. **IMPORTANT**: voxel rendering uses its own render targets (`voxelRT_`, `voxelDepth_`) and runs in `Render()` on a **dedicated command list** (`device->BeginCommandList()`). The result is then composited in `Compose()` through `wi::image::Draw()`.

**NEVER create a render pass in `Compose()`**: that method is called inside the swapchain render pass. Nesting render passes is forbidden in D3D12 (it causes `DXGI_ERROR_INVALID_CALL → device removed`).

Correct architecture:
```
Render()  → RenderPath3D::Render()     // Wicked renders its scene
          → device->BeginCommandList() // New cmd list
          → renderer.render(cmd, ...)  // Our render pass (clear + draw voxels → voxelRT_)
Compose() → RenderPath3D::Compose()    // Wicked presents its result
          → wi::image::Draw(voxelRT_)  // We overlay our voxels on top
```

The camera is driven manually in `Update()` by writing `camera->Eye`, `camera->At` (LookTo direction) and `camera->Up` directly.

### Wicked APIs used

| Need | Wicked API |
|------|-----------|
| WASD keyboard | `wi::input::Down(CHARACTER_RANGE_START + offset)` (there is no `KEYBOARD_BUTTON_W`) |
| Mouse delta | `wi::input::GetMouseState().delta_position` |
| Hide cursor | `wi::input::HidePointer(bool)` |
| Shader loading | `wi::renderer::LoadShader()` compiles .hlsl to .cso automatically when the .cso is missing |
| PSO states | `wi::renderer::GetRasterizerState()` etc. return pointers (no `&` needed) |
| Render pass | `RenderPassImage::RenderTarget(texture, loadOp, storeOp, layoutBefore, layoutAfter, subresource=-1)` |
| Font overlay | `wi::font::Params` is a struct: set its members one by one |
| Camera | `CameraComponent::At` is a **direction** (used with `XMMatrixLookToLH`), not a target point |
| Buffer create | `device->CreateBuffer(desc, raw_data_ptr, buffer)`: NO `SubresourceData` for buffers! |
| Texture create | `device->CreateTexture(desc, subresourceData_ptr, texture)`: takes `SubresourceData*` (unlike CreateBuffer) |
| Buffer update | `device->UpdateBuffer(buffer, data, cmd, size, offset)` |
| Push constants | `device->PushConstants(data, size, cmd)`: mapped to `register(b999)`, fixed size of 48 bytes (12 × uint32) |
| Command list | `device->BeginCommandList()`: new cmd list for separate render passes |
| Render pass nesting | NEVER nest! Only one active render pass per command list |
| Debug DX12 | Pass `"debugdevice"` as an argument to enable the D3D12 debug layer |
| Logging | `wi::backlog::post(message, logLevel)`: prefer this over file logging |

### Custom shaders: IMPORTANT PITFALLS

Custom shaders must follow the **Wicked Engine binding model**:

1. **Root signature required**: every shader MUST embed a DX12 root signature, either through `#include "globals.hlsli"` (automatic) or through `[RootSignature(MACRO)]` on the entry point.

2. **Wicked root signature** (HLSL 6.6+):
   - `b999` → push constants (12 × uint32 = 48 bytes max)
   - `b0, b1, b2` → CBV root descriptors
   - `t0-t15, u0-u15` → in a shared descriptor table
   - `s0-s7` → dynamic samplers
   - `s100-s109` → static samplers (linear, point, aniso, etc.)

3. **Shader paths**:
   - `SHADERPATH` = `<exe_dir>/shaders/hlsl6/`: where compiled `.cso` files are stored
   - `SHADERSOURCEPATH` = `../../engine/WickedEngine/shaders/`: where `.hlsl` sources are looked up
   - Custom shaders must be copied into `SHADERSOURCEPATH` (`voxel/` subfolder)
   - `LoadShader(stage, shader, "voxel/voxelVS.cso")` → compiles `SHADERSOURCEPATH/voxel/voxelVS.hlsl` when the `.cso` is missing

4. **`dxcompiler.dll` must sit next to the exe**, otherwise runtime compilation fails silently.

5. **CreateBuffer takes `void*`**, not `SubresourceData*`. The texture API (`CreateTexture`) does take `SubresourceData*`.

6. **Triangle winding: MAJOR PITFALL**:

   Wicked Engine uses `front_counter_clockwise = true` + `CullMode::BACK` (state `RSTYPE_FRONT`). Despite that, voxel quads must use **CW** (clockwise) winding by default, not CCW. Confirmed empirically with `SV_IsFrontFace`: with standard CCW corners, DX12 sees every triangle as **back-facing**.

   Rule for our U/V tangent axes:
   - `cross(U,V) = N` (faces +X, -Y, +Z) → **CW** corners to be front-facing
   - `cross(U,V) ≠ N` (faces -X, +Y, -Z) → **CCW** corners to be front-facing

   ```
   CW  corners: (0,0)(0,1)(1,0), (1,0)(0,1)(1,1)  ← default
   CCW corners: (0,0)(1,0)(0,1), (0,1)(1,0)(1,1)  ← faces 1,2,5
   ```

7. **DrawInstancedIndirectCount: MAJOR PITFALL**:

   Wicked Engine's command signatures for `*IndirectCount` include a **push constant** (1 × uint32, written to `b999[0]`) BEFORE each `D3D12_DRAW_ARGUMENTS`. The stride per draw entry is therefore **20 bytes**, not 16.

   Memory layout of the indirect args buffer:
   ```
   [uint32 pushConstant][uint32 vertexCount][uint32 instanceCount][uint32 startVertex][uint32 startInstance]
        4 bytes                              16 bytes (D3D12_DRAW_ARGUMENTS)
   = 20 bytes per draw entry
   ```

   `ExecuteIndirect` writes the push constant automatically into `b999[0]` (the first field of the push constants struct, `chunkIndex` in our case). The other b999 fields (quadOffset, flags...) keep the values set by the `PushConstants()` call issued before `DrawInstancedIndirectCount`.

   **In MDI mode, the push constant packs `chunkIndex | (faceIndex << 16)`**. The VS decodes both values and rebuilds the quadOffset from `GPUChunkInfo`:
   ```hlsl
   chunkIndex = push.chunkIndex & 0xFFFF;
   faceIdx    = push.chunkIndex >> 16;
   quadIndex  = chunkInfo[chunkIndex].quadOffset + faceOffset[faceIdx] + (vertexID / 6);
   ```

   **Source**: `wiGraphicsDevice_DX12.cpp` lines 3930-3939. The command signature is created per PSO with `D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT` + `D3D12_INDIRECT_ARGUMENT_TYPE_DRAW`.

8. **SV_VertexID and startVertexLocation: MAJOR PITFALL**:

   With `ExecuteIndirect` (DrawInstancedIndirectCount), `SV_VertexID` does **NOT reliably include** the `startVertexLocation` from `D3D12_DRAW_ARGUMENTS`. Observed on an AMD RX 5700 XT (RDNA 1): SV_VertexID always starts at 0 for every draw, ignoring startVertexLocation.

   **Fix**: always set `startVertexLocation = 0` in the indirect args and pass the quad offset through another channel (push constant + GPUChunkInfo lookup). NEVER rely on `startVertexLocation` to encode an offset into the mega-buffer.

9. **Barriers on indirect buffers: NOT NEEDED in practice**:

   `Usage::DEFAULT` buffers start in COMMON and decay back to COMMON after each command list execution. Implicit promotion COMMON → COPY_DST (through UpdateBuffer) and COMMON → INDIRECT_ARGUMENT (through DrawInstancedIndirectCount) works without explicit barriers. This is the same pattern as the SRV buffers (megaQuadBuffer_, chunkInfoBuffer_), which go from COPY_DST to SRV usage without a barrier in Phase 2.1.

   **⚠️ For Phase 2.3 (compute cull)**, explicit barriers ARE required:
   - `drawCountBuffer_`: COPY_DST → UAV (after the zeroing UpdateBuffer), then UAV → INDIRECT_ARGUMENT (after the dispatch)
   - `indirectArgsBuffer_`: UNDEFINED → UAV (COMMON after decay; `ResourceState::UNDEFINED = 0` maps to COMMON in Wicked), then UAV → INDIRECT_ARGUMENT
   - Wicked Engine calls `DiscardResource()` when `state_before == UNDEFINED`, which is fine here (the compute pass overwrites the data)

10. **PushConstants after BindComputeShader: MAJOR PITFALL**:

    `PushConstants()` dispatches to `SetGraphicsRoot32BitConstants` or `SetComputeRoot32BitConstants` depending on the active state:
    - If `active_pso != nullptr` → **GRAPHICS** push constants
    - Otherwise, if `active_cs != nullptr` → **COMPUTE** push constants

    After `BindComputeShader` + `Dispatch`, `active_cs` stays active. Calling `PushConstants` at that point writes the **compute** push constants, not the **graphics** ones. The vertex shader never sees the value!

    **Rule**: always call `PushConstants` **AFTER** `BindPipelineState` (which sets `active_pso`) to target the graphics push constants. The correct order:
    ```cpp
    BindPipelineState(&pso_);   // ← active_pso = &pso_
    PushConstants(&data, ...);  // ← SetGraphicsRoot32BitConstants ✓
    Draw*(...);
    ```
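
The 20-byte indirect-argument entry from pitfall 7 and the `chunkIndex | (faceIndex << 16)` packing can be mirrored on the CPU side as follows. This is a minimal sketch: the struct name `IndirectDrawArgs` appears in the Phase 2.2 notes, but its field names and the helpers `packChunkFace` and `makeFaceGroupDraw` are assumptions made for illustration, not the project's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed CPU-side mirror of one entry in the indirect args buffer.
struct IndirectDrawArgs {
    uint32_t pushConstant;           // written to b999[0] by ExecuteIndirect
    uint32_t vertexCountPerInstance; // D3D12_DRAW_ARGUMENTS starts here
    uint32_t instanceCount;
    uint32_t startVertexLocation;    // always 0, see pitfall 8
    uint32_t startInstanceLocation;
};
static_assert(sizeof(IndirectDrawArgs) == 20, "stride must be 20 bytes, not 16");

// Hypothetical helper: packs both indices into the single uint32 push constant.
inline uint32_t packChunkFace(uint32_t chunkIndex, uint32_t faceIndex) {
    assert(chunkIndex <= 0xFFFFu && faceIndex < 6u);
    return chunkIndex | (faceIndex << 16);
}

// Hypothetical helper: one draw entry for one face group of one chunk.
// Each quad expands to 6 vertices, matching the `vertexID / 6` decode in the VS.
inline IndirectDrawArgs makeFaceGroupDraw(uint32_t chunkIndex, uint32_t faceIndex,
                                          uint32_t quadCount) {
    return { packChunkFace(chunkIndex, faceIndex), quadCount * 6u, 1u, 0u, 0u };
}
```

The `static_assert` catches an accidental 16-byte layout at compile time, which is cheaper than diagnosing a misaligned args buffer on the GPU.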

### Diagnostics and debugging

**SEH crash handler** (`main.cpp`): `SetUnhandledExceptionFilter` writes:
- `bvle_crash.log`: stack trace with symbols + addresses
- `bvle_crash.dmp`: minidump that can be opened in Visual Studio
- Requires `dbghelp.lib` and a build with symbols (`RelWithDebInfo` or `Debug`)

**D3D12 Debug Layer**: launch with `BVLEVoxels.exe debugdevice` to enable it. This also enables DRED (Device Removed Extended Data) for diagnosing GPU hangs.

**Common GPU errors**:
- `DXGI_ERROR_INVALID_CALL` → nested render pass or invalid resource state
- `DXGI_ERROR_DEVICE_HUNG` → shader stuck in an infinite loop or out-of-bounds memory access
- Blocking dialog from `messageBox` → comes from `wi::helper::messageBox()`; do not mistake it for a crash

**⚠️ Detecting GPU crashes from the CLI (Claude Code)**: GPU crashes (`DXGI_ERROR_INVALID_CALL`, device removed) pop up a **blocking Windows modal** through `wi::helper::messageBox()`. `timeout` kills the process without detecting the crash. To detect crashes properly:
1. **DO NOT use `timeout`** for testing; ask the user to launch the app manually
2. Check `bvle_backlog.txt` after the run (it contains the DX12 errors)
3. Check `bvle_crash.log` and `bvle_crash.dmp` for SEH crashes
4. Launch with `debugdevice` to get detailed D3D12 validation messages in the backlog
5. A non-zero exit code is NOT reliable: `timeout` returns 124 and the modal waits forever

**Wicked backlog**: `wi::backlog::SetLogFile("bvle_backlog.txt")` redirects logs to a file. Press `~` (tilde) to toggle the on-screen console.

### DX12 resource state management (buffers)

**Wicked Engine does NO automatic state tracking for buffers.** `GPUBarrier::Buffer(buf, before, after)` barriers are passed straight to D3D12 without validation. **`state_before` MUST match the actual DX12 state, otherwise → DXGI_ERROR_INVALID_CALL.**

**Critical pitfalls:**
- `UpdateBuffer()` → calls `CopyBufferRegion` without any barrier. The buffer **MUST** be in COPY_DST (or in COMMON, for implicit promotion on frame 1).
- After `DrawInstancedIndirectCount`, the indirect buffers stay in **INDIRECT_ARGUMENT**. Calling `UpdateBuffer` on them in the next frame → crash, because there is no INDIRECT_ARGUMENT → COPY_DST transition.
- Buffers created with `Usage::DEFAULT` start in the **COMMON** state (D3D12). COMMON supports implicit promotion to COPY_DST, SRV, etc. but **NOT to UAV**.
- Recommended fix: **track the state manually** with a `mutable ResourceState` and issue explicit barriers between usages.
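
The manual tracking recommended above could look like the sketch below. It is self-contained on purpose: the `State` enum and `TrackedBuffer` type are stand-ins invented for this example, and the recorded `(before, after)` pairs represent the places where real code would issue a `GPUBarrier::Buffer(buf, before, after)`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for the engine's resource states (assumption, not the Wicked enum).
enum class State { Common, CopyDst, ShaderResource, UnorderedAccess, IndirectArgument };

struct TrackedBuffer {
    State state = State::Common;  // Usage::DEFAULT buffers start in COMMON
    std::vector<std::pair<State, State>> barriers;  // recorded (before, after) pairs

    // Records a transition only when the state actually changes, so that
    // `before` always matches the last state this buffer was left in.
    void transition(State next) {
        if (state == next) return;
        barriers.emplace_back(state, next);
        state = next;
    }
};
```

With this pattern, the per-frame sequence "upload, then indirect draw" would call `transition(State::CopyDst)` before `UpdateBuffer` and `transition(State::IndirectArgument)` before the draw, which covers the INDIRECT_ARGUMENT → COPY_DST case on the following frame.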

**Face-color debug mode**: launch with `BVLEVoxels.exe debug` to enable it. It generates a test world (isolated blocks) and colors each face by direction:
- Bright Red / Dark Red = +X / -X
- Bright Green / Dark Green = +Y / -Y
- Bright Blue / Dark Blue = +Z / -Z

## Implementation details

### VoxelData (16 bits)

```
[15:8] material ID (256 materials)
[7:4]  flags (smooth, transparent, emissive, custom)
[3:0]  metadata (orientation, variant)
```
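
The 16-bit layout above can be packed and unpacked with plain shifts and masks. The helper names below are hypothetical; only the bit positions and `FLAG_SMOOTH = 0x1` come from this document.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t FLAG_SMOOTH = 0x1;  // value from the Conventions section

// Hypothetical helpers following the [15:8] / [7:4] / [3:0] layout.
inline uint16_t packVoxel(uint8_t material, uint8_t flags, uint8_t metadata) {
    assert(flags <= 0xF && metadata <= 0xF);  // both fields are 4 bits wide
    return static_cast<uint16_t>((material << 8) | (flags << 4) | metadata);
}
inline uint8_t voxelMaterial(uint16_t v) { return static_cast<uint8_t>(v >> 8); }
inline uint8_t voxelFlags(uint16_t v)    { return static_cast<uint8_t>((v >> 4) & 0xF); }
inline uint8_t voxelMetadata(uint16_t v) { return static_cast<uint8_t>(v & 0xF); }
inline bool    voxelIsSmooth(uint16_t v) { return (voxelFlags(v) & FLAG_SMOOTH) != 0; }
```

For example, snow (material 5) with the smooth flag and metadata 3 packs to `0x0513`.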

### PackedQuad (64 bits = 8 bytes per quad)

```
[5:0]   position X (0-63)
[11:6]  position Y (0-63)
[17:12] position Z (0-63)
[23:18] width (1-32)
[29:24] height (1-32)
[32:30] face (0-5 : +X,-X,+Y,-Y,+Z,-Z)
[40:33] material ID
[48:41] blendMatID (8 bits, neighboring material for height-based blending)
[59:49] chunkIndex (11 bits, used by the GPU mesh path for the GPUChunkInfo lookup)
[63:60] blendEdges (4 bits: +U(0), -U(1), +V(2), -V(3); edges that border a different material)
```
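
A CPU-side sketch of this 64-bit layout is shown below. The `QuadFields` struct and the helper names are invented for the example, and it assumes width and height are stored as-is (not as `value - 1`), which the layout above does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unpacked view of one quad.
struct QuadFields {
    uint32_t x, y, z;        // 0-63
    uint32_t width, height;  // 1-32, assumed stored directly
    uint32_t face;           // 0-5
    uint32_t material;       // 0-255
    uint32_t blendMat;       // 0-255
    uint32_t chunkIndex;     // 0-2047 (11 bits)
    uint32_t blendEdges;     // 4-bit mask
};

inline uint64_t packQuad(const QuadFields& q) {
    assert(q.x < 64 && q.y < 64 && q.z < 64);
    assert(q.width >= 1 && q.width <= 32 && q.height >= 1 && q.height <= 32);
    assert(q.face < 6 && q.material < 256 && q.blendMat < 256);
    assert(q.chunkIndex < 2048 && q.blendEdges < 16);
    return  static_cast<uint64_t>(q.x)
         | (static_cast<uint64_t>(q.y)          <<  6)
         | (static_cast<uint64_t>(q.z)          << 12)
         | (static_cast<uint64_t>(q.width)      << 18)
         | (static_cast<uint64_t>(q.height)     << 24)
         | (static_cast<uint64_t>(q.face)       << 30)
         | (static_cast<uint64_t>(q.material)   << 33)
         | (static_cast<uint64_t>(q.blendMat)   << 41)
         | (static_cast<uint64_t>(q.chunkIndex) << 49)
         | (static_cast<uint64_t>(q.blendEdges) << 60);
}

// Extracts `count` bits starting at bit `lo`.
inline uint32_t quadBits(uint64_t quad, unsigned lo, unsigned count) {
    return static_cast<uint32_t>((quad >> lo) & ((1ull << count) - 1));
}
inline uint32_t quadFace(uint64_t quad)       { return quadBits(quad, 30, 3); }
inline uint32_t quadChunkIndex(uint64_t quad) { return quadBits(quad, 49, 11); }
```

Note that the `face` field straddles the 32-bit boundary (bits 30 to 32), so a shader that reads the quad as two `uint` words has to combine both words to decode it.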

### Binary Greedy Mesher (CPU, `VoxelMesher.cpp`)

1. **Binary masks**: for each axis (X,Y,Z), `solid[u][v]` = 32-bit bitmask of solid voxels
2. **Face culling**: `visible = solid & ~(solid >> 1)` for positive faces (the shift is adapted per direction), with a cross-chunk lookup at chunk borders
3. **Greedy merge**: per depth slice, on a 2D grid of material IDs, maximal rectangular expansion (width first, then height)
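
Step 2 can be illustrated on a single 32-bit column. In this sketch, bit `i` of `solid` is assumed to represent the voxel at depth `i` along the axis, and the two boolean parameters stand in for the cross-chunk lookup; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Faces pointing toward increasing depth: visible when voxel i is solid
// and voxel i+1 is not. `nextChunkFirstSolid` covers the voxel past bit 31.
inline uint32_t visiblePositiveFaces(uint32_t solid, bool nextChunkFirstSolid) {
    uint32_t neighbor = solid >> 1;  // bit i now holds voxel i+1
    if (nextChunkFirstSolid) neighbor |= 0x80000000u;
    return solid & ~neighbor;
}

// Faces pointing toward decreasing depth: same idea with the opposite shift.
// `prevChunkLastSolid` covers the voxel just before bit 0.
inline uint32_t visibleNegativeFaces(uint32_t solid, bool prevChunkLastSolid) {
    uint32_t neighbor = solid << 1;  // bit i now holds voxel i-1
    if (prevChunkLastSolid) neighbor |= 1u;
    return solid & ~neighbor;
}
```

For a column where voxels 0 to 2 are solid (`0x7`), only voxel 2 exposes a positive face and only voxel 0 exposes a negative face, so a full run of solid voxels costs two faces instead of six.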

### Procedural generation (`VoxelWorld.cpp`)

- 3D Perlin noise (permutation-based, configurable seed)
- 5-octave fBm for the heightmap (initial generation), 2 octaves during animation (perf)
- Caves: `|fbm(x,y,z)| < threshold` in 3D (disabled in animation mode)
- Materials by altitude: sand < 25, grass 25-70, stone 70-90, snow > 90
- Chunks generated for Y = 0..7 (max height of 256 blocks)
- 60 Hz animation: `regenerateAnimated()` runs the fused generation + GPU pack in parallel through `wi::jobsystem`
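
The altitude bands translate into a small lookup. The function name `surfaceMaterial` is hypothetical, and since the list above leaves the exact thresholds ambiguous, this sketch assumes height 70 belongs to stone and height 90 still belongs to stone (snow starts strictly above 90).

```cpp
#include <cassert>
#include <cstdint>

// Material IDs from the Conventions section.
enum : uint8_t { MAT_GRASS = 1, MAT_STONE = 3, MAT_SAND = 4, MAT_SNOW = 5 };

// Assumed ownership of the boundary values: 25 -> grass, 70 -> stone, 90 -> stone.
inline uint8_t surfaceMaterial(int height) {
    assert(height >= 0 && height < 256);  // 8 chunks of 32 voxels in Y
    if (height < 25)  return MAT_SAND;
    if (height < 70)  return MAT_GRASS;
    if (height <= 90) return MAT_STONE;
    return MAT_SNOW;
}
```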

### Renderer (`VoxelRenderer.cpp`)

- **Triple-mode VS**: CPU path (`flags=0`), MDI path (`flags & 1`), GPU mesh path (`flags & 2`)
- **GPU mesh path (enabled by default)**: the `voxelMeshCS` compute shader generates 1×1 quads, drawn with `DrawInstanced` using a 1-frame-delayed readback of the atomic counter
- **Mega-buffer**: all quads of all chunks in a single `StructuredBuffer<PackedQuad>` (2M quads, 16 MB), used in CPU/MDI mode
- **Vertex pulling**: the VS reads the quad buffer through `SV_VertexID`; there is no classic vertex buffer
- **Pipeline**: PSO with `RSTYPE_FRONT` (backface cull), `DSSTYPE_DEFAULT` (depth test), `BSTYPE_OPAQUE`
- **Per-chunk info**: `StructuredBuffer<GPUChunkInfo>` (80 bytes/chunk) with worldPos, quadOffset, faceOffsets[6], faceCounts[6]
- **Push constants** (b999, 48 bytes): chunkIndex + quadOffset + flags (bit 0 = MDI mode, bit 1 = GPU mesh mode)
- **CPU culling**: frustum AABB (`wi::primitive::Frustum`) + backface per face group (camera vs AABB), MDI mode only
- **MDI rendering** (Phase 2.2): a single `DrawInstancedIndirectCount` replaces the per-chunk loop. Push constant = `chunkIndex | (faceIndex << 16)`; the VS rebuilds quadOffset from GPUChunkInfo
- **Per-face-group draws** (Phase 2.1 fallback): up to 6 `DrawInstanced` per visible chunk
- **Textures**: procedurally generated 2D texture array (256x256, 5 layers), triplanar mapping in the PS. Alpha = procedural heightmap used for blending
- **Height-based blending** (Phase 3): the PS reads `voxelDataBuffer` (SRV t3) directly to look up neighboring materials per pixel. Winner-takes-all: the material with the highest heightmap wins 100%. Transitions are sharp, but their shape is organic because the heightmaps draw it. Subtractive corner attenuation (param=0.80). Blend debug mode (F4)
- **Own render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT), rendered in `Render()` on a dedicated cmd list
- **Composition**: overlay onto the swapchain through `wi::image::Draw()` in `Compose()`
- **Stats overlay**: HUD showing chunks/quads/draw calls through `wi::font::Draw`
- **Frustum planes**: Gribb-Hartmann extraction into the CB for the cull compute shader
- **GPU timestamp queries**: 6 slots (cull begin/end, draw begin/end, mesh begin/end)
- **CPU profiling**: `ProfileAccum` with averages every 5 s in the backlog (Regenerate, UpdateMeshes, VoxelPack, GPU Upload, GPU Dispatch, Render, Frame)
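
The vertex pulling step can be emulated on the CPU to check the index math. The sketch below combines the `vertexID / 6` quad lookup with the CW/CCW corner tables from the winding pitfall; `pullVertex` and the other names are hypothetical, and in the real VS the face would be read from the quad rather than passed in.

```cpp
#include <cassert>
#include <cstdint>

struct Corner { uint32_t u, v; };

// Default CW corner order: two triangles per quad.
constexpr Corner kCornersCW[6]  = { {0,0}, {0,1}, {1,0},  {1,0}, {0,1}, {1,1} };
// CCW order, used for faces 1, 2 and 5 (-X, +Y, -Z).
constexpr Corner kCornersCCW[6] = { {0,0}, {1,0}, {0,1},  {0,1}, {1,0}, {1,1} };

inline bool faceUsesCCW(uint32_t face) { return face == 1 || face == 2 || face == 5; }

struct PulledVertex {
    uint32_t quadIndex;  // index into the quad buffer
    Corner   corner;     // which corner of that quad this vertex is
};

// Emulates what the VS derives from SV_VertexID for a draw that starts at quadOffset.
inline PulledVertex pullVertex(uint32_t vertexID, uint32_t quadOffset, uint32_t face) {
    assert(face < 6);
    const Corner* table = faceUsesCCW(face) ? kCornersCCW : kCornersCW;
    return { quadOffset + vertexID / 6, table[vertexID % 6] };
}
```

The final vertex position would then be the quad origin plus `corner.u * width` along U and `corner.v * height` along V, with U and V chosen per face.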

## Development phases (spec)

### Phase 1 - Setup and basic meshing [DONE]

- Wicked Engine fork, module structure
- VoxelWorld with Perlin procedural generation (radius of 4 chunks = ~150 chunks)
- CPU Binary Greedy Mesher (~300K quads for the initial world)
- Basic rendering with vertex pulling and a texture array
- Free-fly navigation camera (WASD + mouse)
- SEH crash handler with symbolic stack trace

### Phase 2 - GPU performance [DONE]

Split into sub-phases to isolate potential sources of bugs:

#### Phase 2.1 - Mega-buffer + CPU cull + per-face DrawInstanced [DONE]

- Mega-buffer: all quads in a single SRV, packed per chunk
- Sorting by face group in the mesher (`faceOffsets[6]`, `faceCounts[6]`)
- CPU frustum culling (AABB vs `wi::primitive::Frustum`)
- CPU backface culling per face group (camera.Eye vs chunk AABB)
- Per-face-group `DrawInstanced` (max 6 draws per visible chunk)
- `GPUChunkInfo` StructuredBuffer for the VS lookup

#### Phase 2.2 - CPU-filled indirect args + DrawInstancedIndirectCount [DONE]

- The CPU fills `IndirectDrawArgs[]` with the same logic as 2.1 (frustum + backface)
- The CPU writes the draw count
- Both buffers are uploaded to the GPU (no explicit barriers; implicit promotion)
- A single `DrawInstancedIndirectCount` replaces the per-chunk loop
- The VS decodes `chunkIndex | (faceIndex << 16)` from the push constant and rebuilds the quadOffset
- **Purpose**: tests MDI rendering WITHOUT a compute shader (isolates barrier problems)
- **Pitfalls resolved**:
  - `IndirectDrawArgs` is 20 bytes (not 16): see pitfall 7 in the custom shaders section
  - `SV_VertexID` does not include `startVertexLocation` with ExecuteIndirect: see pitfall 8
  - No explicit barriers needed: see pitfall 9

#### Phase 2.3 - GPU compute culling [DONE]

- The `voxelCullCS.hlsl` compute shader replaces the CPU for filling the indirect args
- DX12 barriers: UNDEFINED → UAV (pre-compute) → INDIRECT_ARGUMENT (post-compute)
- GPU timestamp queries enabled (GPU Cull ~0.006 ms for 168 chunks)
- **Pitfalls resolved**:
  - `PushConstants` MUST be called AFTER `BindPipelineState`: see pitfall 10
  - Compute shader fixed: push constant packing + startVertexLocation=0 (see pitfalls 7-8)
  - `ResourceState::UNDEFINED` = COMMON in Wicked (value 0) and triggers `DiscardResource()`, which is fine for buffers that get rewritten

#### Phase 2.4 - GPU compute mesher (benchmark) [DONE]

- The `voxelMeshCS.hlsl` compute shader does 1×1 meshing on the GPU (1 thread per voxel, 8×8×8 thread groups)
- Automatic benchmark on the first frame after world generation (CPU fallback mode)
- Results (168 chunks, Ryzen 7 3700X + RX 5700 XT):
  - CPU greedy: 277 ms, 358K quads → greedy merge cuts the quad count by 6.8×
  - GPU baseline (1×1): 5.3 ms, 2.43M quads → 52× faster than the CPU
- GPU greedy merge is not implemented (it could combine GPU speed with the quad reduction)
- The benchmark is one-shot: state machine IDLE → DISPATCH → READBACK → DONE

#### Phase 2.5 - Production GPU meshing + perf optimizations [DONE]

- **GPU meshing in production**: replaces the CPU greedy mesher as the default pipeline
  - `voxelMeshCS.hlsl`: chunkIndex encoded in bits [59:49] of each quad (11 bits)
  - `voxelVS.hlsl`: the `flags & 2` mode extracts chunkIndex from the quad and looks up `GPUChunkInfo`
  - `VoxelRenderer`: compute shader dispatch → UAV→SRV barrier → `DrawInstanced`
  - 1-frame-delayed readback of the atomic counter for the vertex count
  - `gpuQuadBuffer_` has the bind flags `UNORDERED_ACCESS | SHADER_RESOURCE`
- **CPU perf optimizations** (profiled and measured):
  - **VoxelPack by memcpy**: `sizeof(VoxelData) == 2`, so `voxels[]` is directly compatible with the GPU format (uint16 pairs). Replaces the bit-shift loop (28ms → <1ms)
  - **Dirty cache**: `packedVoxelCache_` is only repacked when chunks change, not every frame
  - **Fused regenerate+pack**: `regenerateAnimated()` accepts a destination pointer, and each parallel job does generate + memcpy on the same thread. This removes the double hashmap iteration and the sequential pack (6ms → 0ms)
  - **Skip GPU dispatch**: the `gpuMeshDirty_` flag prevents re-dispatch/upload when nothing changed
  - **Conditional upload**: `chunkInfoBuffer_` is only re-uploaded when `chunkInfoDirty_` is set
  - **Lighter animation**: 2 fBm octaves (instead of 5) + no caves in animation mode (54ms → 8ms)
- **Final results** (171 chunks, Ryzen 7 3700X + RX 5700 XT, 60 Hz animation):
  - Regenerate: 8.7ms (parallel, 2 octaves)
  - VoxelPack: 0ms (fused into regenerate)
  - GPU Upload: 4.5ms (~11 MB voxel data)
  - GPU Dispatch: 0.1ms (171 × 64 thread groups)
  - Frame total: ~9ms → **80-110 FPS** with 60 Hz terrain animation
  - Without animation: **700+ FPS**
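
The "VoxelPack by memcpy" optimization relies on the fact that two consecutive 16-bit voxels already have the same bytes in memory as one packed `uint32`. The sketch below compares both approaches; the function names are hypothetical, and it assumes a little-endian CPU with the even-indexed voxel in the low 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using VoxelData = uint16_t;  // 16-bit layout described in this document
static_assert(sizeof(VoxelData) == 2, "memcpy packing relies on 2-byte voxels");

// Slow path: combine each voxel pair into one uint32 with shifts.
inline std::vector<uint32_t> packByShift(const std::vector<VoxelData>& voxels) {
    assert(voxels.size() % 2 == 0);
    std::vector<uint32_t> out(voxels.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = uint32_t(voxels[2 * i]) | (uint32_t(voxels[2 * i + 1]) << 16);
    return out;
}

// Fast path: on a little-endian CPU the in-memory bytes are already identical,
// so one bulk copy replaces the per-voxel loop.
inline std::vector<uint32_t> packByMemcpy(const std::vector<VoxelData>& voxels) {
    assert(voxels.size() % 2 == 0);
    std::vector<uint32_t> out(voxels.size() / 2);
    std::memcpy(out.data(), voxels.data(), voxels.size() * sizeof(VoxelData));
    return out;
}
```

On a big-endian target the two functions would disagree, so the shift loop remains the portable reference.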

### Phase 3 - Texture blending [DONE]

**PS-based** approach: the pixel shader reads the voxel data directly (no pre-encoding in the quads). See `blending_experiments.md` for the details of each iteration.

- **Procedural heightmaps** in the alpha channel of each material texture (5 materials, each with different freq/contrast parameters)
- **PS neighbor lookup** (`voxelPS.hlsl`): binds `voxelDataBuffer` to `t3` and `chunkInfoBuffer` to `t2`. Reads neighboring materials per pixel through `readVoxelMat(coord, chunkIdx)`
- **Stair priority**: for each edge, checks `pos + edgeDir + normalDir` first (the block that visually hides the corner), then falls back to `pos + edgeDir`
- **2 independent axes**: U and V are processed separately, with nearest-edge detection through `sign(faceFrac - 0.5)`
- **Winner-takes-all heightmap**: `mainScore = h_main + bias`, `neighScore = h_neigh - bias`, `bias = 0.5 - weight`. The material with the highest score wins at 100%. Sharpness=16 for anti-aliasing
- **Subtractive corner attenuation**: `xAdj = xEdge - saturate(yEdge - 0.80)` reduces the blend at corners where both axes meet
- **Blend zone**: 0.25 voxels from each edge (50% of the face)
- **CB**: `blendEnabled` (float, 1.0 in the GPU mesh path, 0.0 otherwise) + `debugBlend` (float, F4 toggle)
- **VS** (`voxelVS.hlsl`): passes `chunkIndex` (nointerpolation uint) to the PS for voxel lookups
- **GPU mesher** (`voxelMeshCS.hlsl`): simplified (no blend computation); only encodes `chunkIndex` in bits [27:17] of the quad's high 32-bit word, which corresponds to bits [59:49] of the 64-bit PackedQuad
- **Debug mode** (F4): visualizes the blend zones (red=U, blue=V, green=no blend, bright red=data mismatch)
- **Only works in the GPU mesh path** (1×1 quads); the CPU/MDI paths have `blendEnabled=0`
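
The winner-takes-all scoring and the corner attenuation can be written out as plain functions. The score and attenuation formulas come from the list above; how `sharpness` is applied (a steep ramp around the tie point) and the range of `weight` (0 far from the edge, 0.5 on the edge) are assumptions of this sketch, as are the function names.

```cpp
#include <algorithm>
#include <cassert>

inline float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Returns the neighbor's share of the final color: 0 = main material only,
// 1 = neighbor only.
inline float neighborShare(float hMain, float hNeigh, float weight,
                           float sharpness = 16.0f) {
    assert(weight >= 0.0f && weight <= 0.5f);
    const float bias       = 0.5f - weight;  // favors the main material away from the edge
    const float mainScore  = hMain + bias;
    const float neighScore = hNeigh - bias;
    // Assumed use of sharpness: near-binary result with a thin anti-aliased band.
    return saturate((neighScore - mainScore) * sharpness + 0.5f);
}

// Subtractive corner attenuation: xAdj = xEdge - saturate(yEdge - 0.80).
inline float attenuateCorner(float xEdge, float yEdge) {
    return xEdge - saturate(yEdge - 0.80f);
}
```

With heightmaps in [0, 1], `weight = 0` makes the bias large enough that the neighbor can at best tie, which keeps the blend confined to the zone near the edge.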

### Phase 4 - Toping [IN PROGRESS]

Decorative bevel system ("topings") on exposed +Y faces to soften the transitions between blocks.

#### Phase 4.1 - TopingSystem infrastructure [DONE]

- **TopingSystem** (`TopingSystem.h/.cpp`): data structures + mesh generation + instance collection
- **4-bit adjacency bitmask**: for each exposed +Y face, checks the 4 cardinal neighbors (±X, ±Z) for the same material with an exposed +Y face → 16 variants
- **Priority-based adjacency**: `TopingDef.priority` decides which toping yields at material boundaries. Grass (priority=1) generates bevels over stone (priority=0)
- **Mesh per material**:
  - **Stone**: wedge cross-section (outer wall + slope) + corner fills + caps at strip ends
  - **Grass**: individual grass blades grouped into tufts, 2 curved segments, double-sided
- ~191K instances for ~170 chunks
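
The 4-bit adjacency mask can be sketched as below. The bit assignment, the `TopCell` struct and the function names are assumptions for illustration, and the priority rule is left out to keep the example focused on the "same material with exposed +Y" test.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit assignment; the document only states 4 neighbors → 16 variants.
enum : uint8_t { ADJ_POS_X = 1, ADJ_NEG_X = 2, ADJ_POS_Z = 4, ADJ_NEG_Z = 8 };

// Hypothetical summary of a column's top voxel.
struct TopCell {
    uint8_t material;
    bool    topExposed;  // true when the +Y face is exposed
};

inline bool connects(const TopCell& self, const TopCell& neighbor) {
    return neighbor.topExposed && neighbor.material == self.material;
}

// Returns a value in 0..15 that selects one of the 16 mesh variants.
inline uint8_t adjacencyMask(const TopCell& self, const TopCell& posX, const TopCell& negX,
                             const TopCell& posZ, const TopCell& negZ) {
    assert(self.topExposed);
    uint8_t mask = 0;
    if (connects(self, posX)) mask |= ADJ_POS_X;
    if (connects(self, negX)) mask |= ADJ_NEG_X;
    if (connects(self, posZ)) mask |= ADJ_POS_Z;
    if (connects(self, negZ)) mask |= ADJ_NEG_Z;
    return mask;
}
```

Cleared bits mark open edges, which is presumably where a bevel is generated; a mask of 15 means the cell is fully surrounded by matching neighbors.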

#### Phase 4.2 - GPU rendering + vegetation shading [DONE]

- **Dedicated shaders**: `voxelTopingVS.hlsl` (instanced vertex pulling) + `voxelTopingPS.hlsl` (per-material shading)
- **Vertex pulling**: `StructuredBuffer<TopingVertex>` (t4) + `StructuredBuffer<float3>` (t5 instances)
- **Push constants**: `vertexOffset`, `instanceOffset`, `materialID` reuse the first 3 fields of b999
- **Per-group DrawInstanced**: instances sorted by (type, variant), one `DrawInstanced` per contiguous group
- **Separate render pass** with `LoadOp::LOAD`: topings are rendered after the voxels and preserve RT and depth
- **PSO**: same rasterizer/depth/blend states as the voxels (`RSTYPE_FRONT`, `DSSTYPE_DEFAULT`, `BSTYPE_OPAQUE`)
- **Stylized vegetation shading** (inspired by Airborn Trees, `voxelTopingPS.hlsl`):
  - **Half-Lambert wrap lighting**: `(N·L * 0.5 + 0.5)²` wraps the light around the surface, with no hard terminator
  - **Translucency**: `dot(V, L) * (1 - NdotL) * 0.4` models light passing through thin blades when backlit
  - **Warm ambient**: `(0.22, 0.28, 0.20)`, brighter and greener than the stone ambient
  - **Stone**: classic Lambert identical to the voxels (branch on `materialID == 3`)
- **Grass tuft generation** (`TopingSystem.cpp`):
  - **Tufts**: clusters of 3–9 blades sharing a common center (scatter ±0.03)
  - **Tuft position**: hash-driven along the edge + quadratic inset of 0.0–0.30 from the edge
  - **Per-tuft personality**: heightScale (0.20–1.0), leanScale (0.3–1.8), blade count (3–9)
  - **Per-blade variety**: height, width, angle (±55° fan + jitter), curvature (midLeanRatio 0.08–0.43)
  - **Deterministic hash**: golden-ratio-based `hashF(a,b,c)` for reproducibility
- **Stone corner fills**: diagonal slope triangle at corners where two open edges meet
- **Stone caps**: triangle closing the bevel cross-section at strip ends
- **Pitfalls resolved**:
  - **CW winding**: `emitTri()` auto-corrects the winding through `dot(geom, desired) > 0` → swap B↔C
  - **Slope normal = inward + up**: use `(e.ix, e.iz)`, NOT `(e.nx, e.nz)`
  - **sunDirection**: `L = normalize(-sunDirection.xyz)` (travel direction → direction toward the sun)
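
The half-Lambert term and the `sunDirection` sign convention are easy to check on the CPU. This is a minimal sketch with a hand-rolled `Vec3`; the function names are hypothetical and the translucency term is omitted because its clamping is not specified here.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return { v.x / len, v.y / len, v.z / len };
}

// sunDirection is the direction the light travels; L must point toward the sun.
inline Vec3 lightVector(Vec3 sunDirection) {
    return normalize({ -sunDirection.x, -sunDirection.y, -sunDirection.z });
}

// Half-Lambert wrap lighting: (N·L * 0.5 + 0.5)².
inline float halfLambert(Vec3 n, Vec3 l) {
    const float wrapped = dot(n, l) * 0.5f + 0.5f;
    return wrapped * wrapped;
}
```

A surface perpendicular to the light still receives 0.25 instead of 0 as with classic Lambert, which is what removes the hard terminator on grass blades.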

#### Phase 4.3 - Polish and extensions [TODO]

- More toping types (snow, moss, etc.)
- LOD: drop topings at a distance
- Subtle animation (wind on the grass through the vertex shader)
- Optimization: compute shader for the instance collection

### Phase 5 - Smooth rendering [IN PROGRESS]

#### Phase 5.1 - CPU Naive Surface Nets [DONE]

- **Algorithm**: Naive Surface Nets (simplified dual contouring) in `SmoothMesher::meshChunk()`
- **Binary SDF**: solid = -1, empty = +1 (no continuous distance field)
- **Vertex placement**: centroid of the edge crossings for each cell on the surface
- **Smooth materials**: SmoothStone (mat 6, `FLAG_SMOOTH`) and Snow (mat 5, `FLAG_SMOOTH`)
- **Blocky materials**: Stone (mat 3), Grass (mat 1), Dirt (mat 2), Sand (mat 4)
- **SmoothVertex** (32 bytes): position, face normal, materialID, secondaryMat, blendWeight, chunkIndex
- **Dedicated shaders**: `voxelSmoothVS.hlsl` (vertex pulling t6) + `voxelSmoothPS.hlsl` (triplanar + blending)
- **Separate render pass** with `LoadOp::LOAD`: smooth geometry is rendered after voxels+topings and preserves RT and depth
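
The vertex placement rule can be illustrated for a single cell. With a binary SDF of exactly -1 and +1, every sign change crosses zero at the edge midpoint, so the vertex is the average of the midpoints of all crossed edges. The corner numbering and the name `cellVertex` are assumptions of this sketch; it is not the code of `SmoothMesher::meshChunk()`.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Corner i of the unit cell sits at (i & 1, (i >> 1) & 1, (i >> 2) & 1).
// solid[i] mirrors the binary SDF: true = -1 (solid), false = +1 (empty).
// Returns false when the cell does not touch the surface.
inline bool cellVertex(const std::array<bool, 8>& solid, Vec3& out) {
    static constexpr int kEdges[12][2] = {
        {0,1}, {2,3}, {4,5}, {6,7},    // X-aligned edges
        {0,2}, {1,3}, {4,6}, {5,7},    // Y-aligned edges
        {0,4}, {1,5}, {2,6}, {3,7} };  // Z-aligned edges
    Vec3 sum{0.0f, 0.0f, 0.0f};
    int crossings = 0;
    for (const auto& e : kEdges) {
        if (solid[e[0]] == solid[e[1]]) continue;  // no sign change on this edge
        // Midpoint = half of each endpoint's coordinates, accumulated per endpoint.
        for (int c : {e[0], e[1]}) {
            sum.x += 0.5f * float(c & 1);
            sum.y += 0.5f * float((c >> 1) & 1);
            sum.z += 0.5f * float((c >> 2) & 1);
        }
        ++crossings;
    }
    if (crossings == 0) return false;
    out = { sum.x / crossings, sum.y / crossings, sum.z / crossings };
    return true;
}
```

A cell whose bottom four corners are solid gets its vertex at the cell center, while a cell with a single solid corner pulls the vertex toward that corner.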

**Cross-chunk connectivity**:
- **PAD=2** in the SDF grid to reach cells [-1..CHUNK_SIZE]
- **Extended vertex range**: `[-1, CHUNK_SIZE)` instead of `[0, CHUNK_SIZE)`, so cells on the neighboring chunk's border generate vertices
- **Canonical ownership**: each edge is emitted by a single chunk (the one containing the lower grid point), with no duplication

**Smooth↔blocky boundary**:
- **`hasSmooth` filter**: only generates vertices if at least one corner of the cell is a smooth voxel (avoids spilling onto blocky territory)
- **Per-axis boundary clamping**: vertices at smooth↔blocky boundaries are clamped to the integer grid (keeps the smooth mesh from overhanging blocky faces)
- **GPU mesher**: smooth voxels are treated as solid in `isNeighborAir()`, so blocky faces are not emitted toward smooth voxels (the smooth mesh covers the boundary)

**Face normals: MAJOR PITFALLS**:
- **Face normals, not SDF gradient**: the binary SDF yields 45° gradients at steps, which causes triplanar stretching. Face normals (cross product of the triangle edges) are geometrically correct.
- **Orientation by edge axis**: each quad comes from an X, Y or Z edge. The `solid→empty` direction is known. We check that the face normal's component along that axis has the right sign, and flip otherwise.
- **Inverted Y-axis winding**: the Y sharing cells are arranged differently from X and Z. The natural winding of the Y quad is reversed → `if (axis == 1) useWindingA = !useWindingA;`
- **SDF gradient dot product**: DO NOT use it to orient normals (it fails when the gradient is zero or ambiguous with a binary SDF)
- **Centroid SDF sampling**: DO NOT use it either (both sides often round to the same voxel)

**Material blending**:
- **Two materials per vertex**: primaryMat (smooth-only counts, avoids subsurface bleed) + secondaryMat (all counts, includes blocky materials for blending at boundaries)
- **blendWeight**: uint8 0-255, share of the secondary material in the 8-corner vote
- **PS**: `lerp(primaryColor, secondaryColor, blendWeight)` between two triplanar samples
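
The blend weight round trip (vote on the CPU, lerp in the pixel shader) can be sketched as follows. The rounding rule and the normalization of the uint8 to [0, 1] before the lerp are assumptions, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Share of the secondary material among the voted corners, quantized to uint8.
inline uint8_t encodeBlendWeight(int secondaryVotes, int totalVotes) {
    assert(totalVotes > 0 && secondaryVotes >= 0 && secondaryVotes <= totalVotes);
    return static_cast<uint8_t>((secondaryVotes * 255 + totalVotes / 2) / totalVotes);
}

struct Color { float r, g, b; };

inline Color lerp(Color a, Color b, float t) {
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t };
}

// Shader-side equivalent: the weight is assumed to be normalized before the lerp.
inline Color blendMaterials(Color primary, Color secondary, uint8_t blendWeight) {
    return lerp(primary, secondary, blendWeight / 255.0f);
}
```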

#### Phase 5.2 - Optimizations and polish [TODO]

- Smoothed SDF (approximate distance field instead of binary ±1)
- Smooth normals (vertex normals instead of face normals for smooth surfaces)
- GPU compute Surface Nets (compute shader instead of CPU)
- LOD: triangle reduction at distance

### Phase 6 - Hybrid ray tracing [TODO]

- BLAS per chunk (from the greedy mesh), TLAS per frame
- RT shadows via ray queries (compute shader)
- RT AO (4-8 rays, short range)
- Fallback to shadow maps / SSAO if RT is unavailable

## Target metrics and results

| Metric | Target | Result (Ryzen 7 3700X + RX 5700 XT) |
|--------|--------|--------------------------------------|
| FPS at 1440p | > 60 fps | ✅ 80-110 FPS (60 Hz animation), 700+ FPS (static) |
| GPU meshing | < 200 µs/chunk | ✅ ~0.6 µs/chunk (0.1 ms / 171 chunks) |
| Full re-mesh | < 16 ms | ✅ ~13 ms (regen 8.7 ms + upload 4.5 ms) |
| GPU memory | < 500 MB | ✅ ~30 MB (11 MB voxels + 16 MB quads + buffers) |
| RT shadows + AO | < 4 ms at 1440p | ⏳ Phase 6 |
| Draw calls | < 100 | ✅ 1 (GPU mesh) or 1 (MDI) |
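
The memory and meshing figures in the table can be cross-checked with a little arithmetic. The sketch below rests on two assumptions: that the voxel buffer holds the same 171 chunks quoted in the meshing row, and that the quad mega-buffer has a capacity of 2,000,000 quads. All constant and function names are illustrative.

```cpp
#include <cstdint>

constexpr std::uint64_t kChunkSize     = 32;         // voxels per chunk edge
constexpr std::uint64_t kBytesPerVoxel = 2;          // VoxelData is 16 bits
constexpr std::uint64_t kBytesPerQuad  = 8;          // PackedQuad is 64 bits
constexpr std::uint64_t kChunkCount    = 171;        // assumed: same count as the meshing row
constexpr std::uint64_t kQuadCapacity  = 2'000'000;  // assumed mega-buffer capacity

// Voxel storage for all chunks, in bytes.
constexpr std::uint64_t voxelBytes() {
    return kChunkCount * kChunkSize * kChunkSize * kChunkSize * kBytesPerVoxel;
}

// Quad mega-buffer size, in bytes.
constexpr std::uint64_t quadBytes() { return kQuadCapacity * kBytesPerQuad; }

// 0.1 ms of GPU meshing spread over all chunks, in microseconds per chunk.
constexpr double meshMicrosPerChunk() {
    return 0.1 * 1000.0 / static_cast<double>(kChunkCount);
}
```

Under these assumptions the voxel data comes to 11,206,656 bytes (about 11 MB), the quad buffer to 16 MB, and the meshing cost to roughly 0.58 µs per chunk, consistent with the rounded values in the table.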

## Conventions

- Namespaces: all voxel code lives in `namespace voxel`
- Chunks: 32x32x32, configurable via `CHUNK_SIZE`
- Coordinates: Y = up, infinite world in X/Z, sparse hashmap
- Materials: palette of 256, index 0 = air (empty), 1=grass, 2=dirt, 3=stone (blocky), 4=sand, 5=snow (smooth), 6=smoothstone (smooth)
- Faces: 0=+X, 1=-X, 2=+Y, 3=-Y, 4=+Z, 5=-Z
- Smooth flag: `FLAG_SMOOTH = 0x1` in the VoxelData flags, enables Surface Nets instead of blocky rendering
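
Because the world extends into negative X/Z, mapping a world voxel coordinate to a chunk key needs floor division rather than C++'s truncating `/`. The sketch below shows one way to do it. It lives in a separate `voxel_sketch` namespace because its `ChunkPos`, `floorDiv`, and `ChunkPosHash` are illustrative stand-ins, not the definitions in `VoxelTypes.h`.

```cpp
#include <cstddef>
#include <functional>

namespace voxel_sketch {

constexpr int CHUNK_SIZE = 32;

struct ChunkPos {
    int x, y, z;
    bool operator==(const ChunkPos& o) const { return x == o.x && y == o.y && z == o.z; }
};

// Floor division: rounds toward negative infinity, unlike `/` which truncates toward zero.
constexpr int floorDiv(int a, int b) {
    return a / b - (((a % b) != 0 && ((a < 0) != (b < 0))) ? 1 : 0);
}

// Chunk containing the given world voxel coordinate.
constexpr ChunkPos worldToChunk(int wx, int wy, int wz) {
    return {floorDiv(wx, CHUNK_SIZE), floorDiv(wy, CHUNK_SIZE), floorDiv(wz, CHUNK_SIZE)};
}

// Local coordinate inside the chunk, always in [0, CHUNK_SIZE).
constexpr int worldToLocal(int w) { return w - floorDiv(w, CHUNK_SIZE) * CHUNK_SIZE; }

// Simple hash so ChunkPos can key a sparse std::unordered_map.
struct ChunkPosHash {
    std::size_t operator()(const ChunkPos& p) const noexcept {
        std::size_t h = std::hash<int>{}(p.x);
        h = h * 1000003u ^ std::hash<int>{}(p.y);
        h = h * 1000003u ^ std::hash<int>{}(p.z);
        return h;
    }
};

}  // namespace voxel_sketch
```

World coordinate -1 lands in chunk -1 at local index 31. Plain integer division would put it in chunk 0 and produce a seam at the origin.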

## Build

### Prerequisites

- CMake 3.19+ (`winget install Kitware.CMake`)
- Visual Studio 2022 Build Tools (`winget install Microsoft.VisualStudio.2022.BuildTools`)
- Windows SDK 10.0.26100+ (`winget install Microsoft.WindowsSDK.10.0.26100`)

### Commands

```bash
# Configure (from the project root)
cmake -B build -G "Visual Studio 17 2022" -A x64 -DCMAKE_SYSTEM_VERSION=10.0.26100.0
# Build
cmake --build build --config Release --target BVLEVoxels --parallel
# The executable is written to build/Release/BVLEVoxels.exe
```

SDK 10.0.26100 is required because the DX12 headers (`d3dx12_check_feature_support.h`) shipped with Wicked Engine are not compatible with SDK 22621.

### Automatic post-build (CMakeLists.txt)

The build automatically copies:
1. `dxcompiler.dll` → next to the exe (required for runtime shader compilation)
2. `shaders/*.hlsl` → `engine/WickedEngine/shaders/voxel/` (so that `LoadShader` finds them via `SHADERSOURCEPATH`)
3. `engine/Content/` → next to the exe (Wicked Engine assets)

## Wicked Engine integration

### Graphics backend

Wicked Engine uses **DX12 by default on Windows** and Vulkan on Linux. Shaders are written in **HLSL** and compiled via DXC to:
- `shaders/hlsl6/*.cso` for DX12
- `shaders/spirv/*.spv` for Vulkan

To force Vulkan on Windows, pass `"vulkan"` as a command-line argument.

### Entry point and rendering architecture

`VoxelRenderPath` inherits from `wi::RenderPath3D`. **IMPORTANT**: voxel rendering uses its own render targets (`voxelRT_`, `voxelDepth_`) and runs in `Render()` on a **dedicated command list** (`device->BeginCommandList()`). The result is then composited in `Compose()` via `wi::image::Draw()`.

**NEVER create a render pass in `Compose()`**: this method is called inside the swapchain's render pass. Nesting render passes is forbidden in D3D12 (causes `DXGI_ERROR_INVALID_CALL → device removed`).

Correct architecture:
```
Render()  → RenderPath3D::Render()     // Wicked renders its scene
          → device->BeginCommandList() // New cmd list
          → renderer.render(cmd, ...)  // Our render pass (clear + draw voxels → voxelRT_)
Compose() → RenderPath3D::Compose()    // Wicked displays its result
          → wi::image::Draw(voxelRT_)  // We overlay our voxels on top
```

The camera is driven manually in `Update()` by writing `camera->Eye`, `camera->At` (LookTo direction), and `camera->Up` directly.
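
Since `camera->At` holds a direction rather than a target point, a free-look camera only needs to turn its yaw and pitch angles into a unit vector. The sketch below assumes a left-handed convention in which yaw = 0, pitch = 0 looks down +Z. `Float3` and `lookDirection` are hypothetical names, not Wicked Engine types.

```cpp
#include <cmath>

struct Float3 { float x, y, z; };

// Hypothetical helper: yaw rotates around +Y, pitch tilts up/down, both in radians.
// Returns a unit-length view direction suitable for a LookTo-style camera.
inline Float3 lookDirection(float yaw, float pitch) {
    const float cp = std::cos(pitch);
    return {std::sin(yaw) * cp, std::sin(pitch), std::cos(yaw) * cp};
}
```

The result would be written to `camera->At` as-is. A look-at target would have to be converted with `target - Eye` first.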

### Wicked APIs used

| Need | Wicked API |
|------|-----------|
| WASD keyboard | `wi::input::Down(CHARACTER_RANGE_START + offset)` (there is no `KEYBOARD_BUTTON_W`) |
| Mouse delta | `wi::input::GetMouseState().delta_position` |
| Hide cursor | `wi::input::HidePointer(bool)` |
| Shader loading | `wi::renderer::LoadShader()`: automatically compiles the .hlsl to .cso if missing |
| PSO states | `wi::renderer::GetRasterizerState()` etc. return pointers (no `&` needed) |
| Render pass | `RenderPassImage::RenderTarget(texture, loadOp, storeOp, layoutBefore, layoutAfter, subresource=-1)` |
| Font overlay | `wi::font::Params` is a struct: set its members one by one |
| Camera | `CameraComponent::At` is a **direction** (used with `XMMatrixLookToLH`), not a target point |
| Buffer create | `device->CreateBuffer(desc, raw_data_ptr, buffer)`: NO `SubresourceData` for buffers! |
| Texture create | `device->CreateTexture(desc, subresourceData_ptr, texture)`: uses `SubresourceData*` (unlike CreateBuffer) |
| Buffer update | `device->UpdateBuffer(buffer, data, cmd, size, offset)` |
| Push constants | `device->PushConstants(data, size, cmd)`: mapped to `register(b999)`, fixed size of 48 bytes (12 × uint32) |
| Command list | `device->BeginCommandList()`: new cmd list for separate render passes |
| Render pass | NEVER nest! Only one active render pass per command list |
| DX12 debug | Pass `"debugdevice"` as an argument to enable the D3D12 debug layer |
| Logging | `wi::backlog::post(message, logLevel)`: prefer this over file logging |

### Custom shaders: IMPORTANT PITFALLS

Custom shaders must follow the **Wicked Engine binding model**:

1. **Mandatory root signature**: every shader MUST have an embedded DX12 root signature, either via `#include "globals.hlsli"` (automatic) or via `[RootSignature(MACRO)]` on the entry point.
2. **Wicked root signature** (HLSL 6.6+):
   - `b999` → push constants (12 × uint32 = 48 bytes max)
   - `b0, b1, b2` → CBV root descriptors
   - `t0-t15, u0-u15` → in a shared descriptor table
   - `s0-s7` → dynamic samplers
   - `s100-s109` → static samplers (linear, point, aniso, etc.)
3. **Shader paths**:
   - `SHADERPATH` = `<exe_dir>/shaders/hlsl6/`: where the compiled `.cso` files are stored
   - `SHADERSOURCEPATH` = `../../engine/WickedEngine/shaders/`: where the `.hlsl` sources are looked up
   - Custom shaders must be copied into `SHADERSOURCEPATH` (`voxel/` subfolder)
   - `LoadShader(stage, shader, "voxel/voxelVS.cso")` → compiles `SHADERSOURCEPATH/voxel/voxelVS.hlsl` if the `.cso` is missing
4. **`dxcompiler.dll` must sit next to the exe**, otherwise runtime compilation fails silently.
5. **CreateBuffer takes `void*`**, not `SubresourceData*`. The texture API (`CreateTexture`) does take `SubresourceData*`.
6. **Triangle winding: MAJOR PITFALL**:
   Wicked Engine uses `front_counter_clockwise = true` + `CullMode::BACK` (state `RSTYPE_FRONT`). Despite this, voxel quads must use **CW** (clockwise) winding by default, not CCW. Confirmed empirically via `SV_IsFrontFace`: with standard CCW corners, DX12 sees every triangle as **back-facing**.
   The rule for our U/V tangent axes:
   - `cross(U,V) = N` (faces +X, -Y, +Z) → **CW** corners to be front-facing
   - `cross(U,V) ≠ N` (faces -X, +Y, -Z) → **CCW** corners to be front-facing
   ```
   CW  corners: (0,0)(0,1)(1,0), (1,0)(0,1)(1,1)  ← default
   CCW corners: (0,0)(1,0)(0,1), (0,1)(1,0)(1,1)  ← faces 1,2,5
   ```
7. **DrawInstancedIndirectCount: MAJOR PITFALL**:
   Wicked Engine's command signatures for `*IndirectCount` include a **push constant** (1 × uint32, written to `b999[0]`) BEFORE each `D3D12_DRAW_ARGUMENTS`. The stride per draw entry is therefore **20 bytes**, not 16.
   Memory layout of the indirect args buffer:
   ```
   [uint32 pushConstant][uint32 vertexCount][uint32 instanceCount][uint32 startVertex][uint32 startInstance]
    4 bytes              16 bytes (D3D12_DRAW_ARGUMENTS)
   = 20 bytes per draw entry
   ```
   The push constant is written automatically by `ExecuteIndirect` into `b999[0]` (the first field of the push constants struct, i.e. `chunkIndex` in our case). The other b999 fields (quadOffset, flags...) keep the values set by the `PushConstants()` call made before `DrawInstancedIndirectCount`.
   **In MDI mode, the push constant packs `chunkIndex | (faceIndex << 16)`**. The VS decodes both values and rebuilds the quadOffset from `GPUChunkInfo`:
   ```hlsl
   chunkIndex = push.chunkIndex & 0xFFFF;
   faceIdx    = push.chunkIndex >> 16;
   quadIndex  = chunkInfo[chunkIndex].quadOffset + faceOffset[faceIdx] + (vertexID / 6);
   ```
   **Source**: `wiGraphicsDevice_DX12.cpp` lines 3930-3939. The command signature is created per PSO with `D3D12_INDIRECT_ARGUMENT_TYPE_CONSTANT` + `D3D12_INDIRECT_ARGUMENT_TYPE_DRAW`.
8. **SV_VertexID and startVertexLocation: MAJOR PITFALL**:
   With `ExecuteIndirect` (DrawInstancedIndirectCount), `SV_VertexID` does **NOT reliably include** the `startVertexLocation` from `D3D12_DRAW_ARGUMENTS`. Observed on AMD RDNA 1 (RX 5700 XT): SV_VertexID always starts at 0 for each draw, ignoring startVertexLocation.
   **Fix**: always set `startVertexLocation = 0` in the indirect args, and pass the quad offset through another channel (push constant + GPUChunkInfo lookup). NEVER rely on `startVertexLocation` to encode an offset into the mega-buffer.
9. **Barriers on indirect buffers: NOT NEEDED in practice**:
   `Usage::DEFAULT` buffers start in COMMON and decay back to COMMON after each command list execution. Implicit promotion COMMON → COPY_DST (via UpdateBuffer) and COMMON → INDIRECT_ARGUMENT (via DrawInstancedIndirectCount) works without explicit barriers. This is the same pattern as the SRV buffers (megaQuadBuffer_, chunkInfoBuffer_), which go from COPY_DST to SRV usage without a barrier in Phase 2.1.
   **⚠️ For Phase 2.3 (compute cull)**, explicit barriers ARE required:
   - `drawCountBuffer_`: COPY_DST → UAV (after the zeroing UpdateBuffer), then UAV → INDIRECT_ARGUMENT (after the dispatch)
   - `indirectArgsBuffer_`: UNDEFINED → UAV (COMMON after decay; `ResourceState::UNDEFINED = 0` = COMMON in Wicked), then UAV → INDIRECT_ARGUMENT
   - Wicked Engine calls `DiscardResource()` when `state_before == UNDEFINED`, which is fine (the compute pass overwrites the data)
10. **PushConstants after BindComputeShader: MAJOR PITFALL**:
    `PushConstants()` dispatches to `SetGraphicsRoot32BitConstants` or `SetComputeRoot32BitConstants` depending on the active state:
    - If `active_pso != nullptr` → **GRAPHICS** push constants
    - Otherwise, if `active_cs != nullptr` → **COMPUTE** push constants
    After `BindComputeShader` + `Dispatch`, `active_cs` stays active. Calling `PushConstants` at that point writes to the **compute** push constants, not the **graphics** ones. The vertex shader never sees the value!
    **Rule**: always call `PushConstants` **AFTER** `BindPipelineState` (which sets `active_pso`) to target the graphics push constants. The correct order:
    ```cpp
    BindPipelineState(&pso_);   // ← active_pso = &pso_
    PushConstants(&data, ...);  // ← SetGraphicsRoot32BitConstants ✓
    Draw*(...);
    ```
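
The 20-byte indirect entry and the `chunkIndex | (faceIndex << 16)` packing described in the list above can be mirrored on the CPU side roughly as follows. `IndirectDrawEntry`, `packChunkFace`, and `makeFaceGroupDraw` are hypothetical names for illustration. Only the field order (push constant first, then the four `D3D12_DRAW_ARGUMENTS` fields) and the six vertices per quad come from the notes above.

```cpp
#include <cstddef>
#include <cstdint>

// One entry of the indirect args buffer: push constant, then D3D12_DRAW_ARGUMENTS.
struct IndirectDrawEntry {
    std::uint32_t pushConstant;            // chunkIndex | (faceIndex << 16), lands in b999[0]
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;     // always 0: SV_VertexID may ignore it
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(IndirectDrawEntry) == 20, "stride must be 20 bytes, not 16");
static_assert(offsetof(IndirectDrawEntry, vertexCountPerInstance) == 4,
              "draw arguments start right after the push constant");

constexpr std::uint32_t packChunkFace(std::uint32_t chunkIndex, std::uint32_t faceIndex) {
    return (chunkIndex & 0xFFFFu) | (faceIndex << 16);
}
constexpr std::uint32_t unpackChunk(std::uint32_t packed) { return packed & 0xFFFFu; }
constexpr std::uint32_t unpackFace(std::uint32_t packed)  { return packed >> 16; }

// Builds the draw for one face group of one chunk (6 vertices per quad).
inline IndirectDrawEntry makeFaceGroupDraw(std::uint32_t chunkIndex,
                                           std::uint32_t faceIndex,
                                           std::uint32_t quadCount) {
    return {packChunkFace(chunkIndex, faceIndex), quadCount * 6u, 1u, 0u, 0u};
}
```

The `static_assert`s turn the stride pitfall into a compile-time check. If the struct ever drifts back to 16 bytes, the build fails instead of the GPU reading misaligned arguments.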

### Diagnostics and debugging

**SEH crash handler** (`main.cpp`): `SetUnhandledExceptionFilter` writes:
- `bvle_crash.log`: stack trace with symbols + addresses
- `bvle_crash.dmp`: minidump that can be analyzed in Visual Studio
- Requires `dbghelp.lib` and a build with symbols (`RelWithDebInfo` or `Debug`)

**D3D12 Debug Layer**: launch with `BVLEVoxels.exe debugdevice` to enable it. This also enables DRED (Device Removed Extended Data) for diagnosing GPU hangs.

**Common GPU errors**:
- `DXGI_ERROR_INVALID_CALL` → nested render pass or invalid resource state
- `DXGI_ERROR_DEVICE_HUNG` → shader stuck in an infinite loop or out-of-bounds memory access
- Blocking `messageBox` dialog → comes from `wi::helper::messageBox()`; do not mistake it for a crash

**⚠️ Detecting GPU crashes from the CLI (Claude Code)**: GPU crashes (`DXGI_ERROR_INVALID_CALL`, device removed) show a **blocking Windows modal** via `wi::helper::messageBox()`. `timeout` kills the process without detecting the crash. To detect crashes properly:
1. **Do NOT use `timeout`** for testing; ask the user to launch the program manually
2. Check `bvle_backlog.txt` after the run (it contains the DX12 errors)
3. Check `bvle_crash.log` and `bvle_crash.dmp` for SEH crashes
4. Launch with `debugdevice` to get detailed D3D12 validation messages in the backlog
5. A non-zero exit code is NOT reliable: `timeout` returns 124, and the modal waits indefinitely

**Wicked backlog**: `wi::backlog::SetLogFile("bvle_backlog.txt")` redirects the logs to a file. Press `~` (tilde) to toggle the on-screen console.

### DX12 resource state management (buffers)

**Wicked Engine does NO automatic state tracking for buffers.** `GPUBarrier::Buffer(buf, before, after)` barriers are passed straight to D3D12 without validation. **`state_before` MUST match the actual DX12 state, otherwise → DXGI_ERROR_INVALID_CALL.**

**Critical pitfalls:**
- `UpdateBuffer()` → calls `CopyBufferRegion` without any barrier. The buffer **MUST** be in COPY_DST (or COMMON for implicit promotion on frame 1).
- After `DrawInstancedIndirectCount`, indirect buffers stay in **INDIRECT_ARGUMENT**. Calling `UpdateBuffer` on them in the next frame → crash, because there is no INDIRECT_ARGUMENT → COPY_DST transition.
- Buffers created with `Usage::DEFAULT` start in the **COMMON** state (D3D12). COMMON supports implicit promotion to COPY_DST, SRV, etc., but **NOT to UAV**.
- Recommended fix: **track the state manually** with a `mutable ResourceState` and issue explicit barriers between each use.
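
The manual tracking recommended above could look like this minimal sketch. `BufState`, `Transition`, and `TrackedBuffer` are hypothetical stand-ins so the example compiles without the engine. In the real renderer the stored state would be Wicked's `ResourceState`, and each returned transition would become a `GPUBarrier::Buffer(buf, before, after)`.

```cpp
#include <cstdint>
#include <optional>

// Stand-in for the engine's resource state enum (illustrative values only).
enum class BufState : std::uint8_t {
    Common, CopyDst, ShaderResource, UnorderedAccess, IndirectArgument
};

struct Transition { BufState before; BufState after; };

class TrackedBuffer {
public:
    // Returns the barrier to record before using the buffer in state `next`,
    // or std::nullopt when the buffer is already in that state.
    std::optional<Transition> transitionTo(BufState next) {
        if (next == state_) return std::nullopt;
        Transition t{state_, next};
        state_ = next;  // remember the state for the following use
        return t;
    }
    BufState state() const { return state_; }

private:
    BufState state_ = BufState::Common;  // Usage::DEFAULT buffers start in COMMON
};
```

Because every use goes through `transitionTo`, the `before` state of each barrier is whatever the previous call left behind, which is the invariant the pitfalls above require.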

**Face-color debug mode**: launch with `BVLEVoxels.exe debug` to enable it. It generates a test world (isolated blocks) and colors each face by direction:
- Bright Red / Dark Red = +X / -X
- Bright Green / Dark Green = +Y / -Y
- Bright Blue / Dark Blue = +Z / -Z

## Implementation details

### VoxelData (16 bits)

```
[15:8] material ID (256 materials)
[7:4]  flags (smooth, transparent, emissive, custom)
[3:0]  metadata (orientation, variant)
```
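
A minimal sketch of packing and unpacking this 16-bit layout with shifts and masks. The helper names (`packVoxel`, `voxelMaterial`, and so on) are illustrative and may not match the accessors in `VoxelTypes.h`. `FLAG_SMOOTH = 0x1` is the value given in the conventions.

```cpp
#include <cstdint>

constexpr std::uint8_t FLAG_SMOOTH = 0x1;  // bit 0 of the 4-bit flags field

// [15:8] material, [7:4] flags, [3:0] metadata
constexpr std::uint16_t packVoxel(std::uint8_t material, std::uint8_t flags,
                                  std::uint8_t metadata) {
    return static_cast<std::uint16_t>((material << 8) | ((flags & 0xF) << 4) |
                                      (metadata & 0xF));
}

constexpr std::uint8_t voxelMaterial(std::uint16_t v) { return static_cast<std::uint8_t>(v >> 8); }
constexpr std::uint8_t voxelFlags(std::uint16_t v)    { return static_cast<std::uint8_t>((v >> 4) & 0xF); }
constexpr std::uint8_t voxelMetadata(std::uint16_t v) { return static_cast<std::uint8_t>(v & 0xF); }

// True when the voxel should be meshed with Surface Nets instead of blocky quads.
constexpr bool isSmooth(std::uint16_t v) { return (voxelFlags(v) & FLAG_SMOOTH) != 0; }
```

For example, smoothstone (material 6) with the smooth flag and metadata 3 packs to `0x0613`, while plain blocky stone packs to `0x0300`.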

### PackedQuad (64 bits = 8 bytes per quad)

```
[5:0]   position X (0-63)
[11:6]  position Y (0-63)
[17:12] position Z (0-63)
[23:18] width (1-32)
[29:24] height (1-32)
[32:30] face (0-5: +X,-X,+Y,-Y,+Z,-Z)
[40:33] material ID
+								[48:41] blendMatID (8 bits, matériau voisin pour height-based blending)
-												Phase 2.5: GPU meshing production pipeline + perf optimizations (80+ FPS)

Replace CPU greedy mesher with GPU compute mesher as default rendering pipeline.
Key optimizations identified via CPU profiling (ProfileAccum, 5s averages):
- Fused regenerate+pack: parallel noise gen + memcpy in same jobsystem pass (6ms → 0ms)
- VoxelData memcpy: sizeof(VoxelData)==2 enables direct memcpy instead of bit-shift loop (28ms → <1ms)
- Dirty-skip: GPU dispatch/upload only when chunks change, not every frame
- Animation: 2 fBm octaves + no caves in animation mode (54ms → 8ms)
- Result: 80-110 FPS with 60Hz terrain animation, 700+ FPS static

											
										
										
											2026-03-26 09:05:52 +01:00
+								[59:49] chunkIndex (11 bits, utilisé par GPU mesh path pour lookup GPUChunkInfo)
-												Phase 3: PS-based texture blending with winner-takes-all heightmap

Replace pre-encoded quad blend data (v1) with per-pixel voxel data
lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3)
to find neighbor materials dynamically, enabling 2 independent blend
axes, stair-priority neighbor detection, and winner-takes-all
heightmap-driven transitions.

Key design decisions validated through 6 iterations (see
blending_experiments.md):
- Winner-takes-all: material with highest heightmap score wins 100%
  (sharp but organic transitions, not smooth gradient)
- Symmetric bias: bias = 0.5 - weight ensures equal chance at border
- Subtractive corner attenuation (param=0.80): xAdj = xEdge -
  saturate(yEdge - 0.80) reduces blend at corners naturally
- Blend zone = 0.25 voxels from each edge (50% of face)
- Debug mode (F4) visualizes blend zones as colors

											
										
										
											2026-03-26 12:14:08 +01:00
+								[63:60] blendEdges (4 bits : +U(0), -U(1), +V(2), -V(3) — bords avec matériau différent)
-												Phase 2: GPU-driven voxel rendering pipeline

Mega-buffer architecture replacing per-chunk GPU buffers:
- Single StructuredBuffer<PackedQuad> for all chunks (2M quads, 16 MB)
- StructuredBuffer<GPUChunkInfo> with per-chunk metadata (position, quad offsets, face groups)
- VS reads chunk info via push constants (b999) for driver-safe chunk indexing
- CPU frustum culling with wi::primitive::Frustum + AABB per chunk
- Quads sorted by face direction in greedy mesher (faceOffsets/faceCounts)
- GPU frustum + backface cull compute shader (voxelCullCS.hlsl)
- GPU binary mesher compute shader baseline (voxelMeshCS.hlsl)
- Indirect draw buffers and timestamp query infrastructure
- README with build instructions and project architecture

											
										
										
											2026-03-25 14:24:05 +01:00
+								```
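
The 64-bit layout above could be packed on the CPU roughly as follows. This is a sketch under assumptions: the `QuadFields` struct and the function names are hypothetical, and width/height are assumed to be stored as their literal value (1-32) rather than as value minus one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unpacked form; field ranges follow the PackedQuad table.
struct QuadFields {
    uint32_t x, y, z;        // 0-63
    uint32_t width, height;  // 1-32, assumed stored as-is
    uint32_t face;           // 0-5: +X,-X,+Y,-Y,+Z,-Z
    uint32_t material;       // 0-255
    uint32_t blendMat;       // 0-255
    uint32_t chunkIndex;     // 0-2047 (11 bits)
    uint32_t blendEdges;     // 4-bit mask
};

inline uint64_t packQuad(const QuadFields& q) {
    assert(q.x < 64 && q.y < 64 && q.z < 64);
    assert(q.width >= 1 && q.width <= 32 && q.height >= 1 && q.height <= 32);
    assert(q.face < 6 && q.material < 256 && q.blendMat < 256);
    assert(q.chunkIndex < 2048 && q.blendEdges < 16);
    uint64_t bits = 0;
    bits |= uint64_t(q.x);
    bits |= uint64_t(q.y) << 6;
    bits |= uint64_t(q.z) << 12;
    bits |= uint64_t(q.width) << 18;
    bits |= uint64_t(q.height) << 24;
    bits |= uint64_t(q.face) << 30;       // straddles the 32-bit boundary
    bits |= uint64_t(q.material) << 33;
    bits |= uint64_t(q.blendMat) << 41;
    bits |= uint64_t(q.chunkIndex) << 49;
    bits |= uint64_t(q.blendEdges) << 60;
    return bits;
}

inline uint32_t quadFace(uint64_t bits)       { return uint32_t((bits >> 30) & 0x7); }
inline uint32_t quadMaterial(uint64_t bits)   { return uint32_t((bits >> 33) & 0xFF); }
inline uint32_t quadChunkIndex(uint64_t bits) { return uint32_t((bits >> 49) & 0x7FF); }
inline uint32_t quadBlendEdges(uint64_t bits) { return uint32_t(bits >> 60); }
```

Note that the face field spans bits 30 to 32, so a shader reading the quad as two 32-bit words has to combine bits from both words to recover it.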

### Binary Greedy Mesher (CPU, `VoxelMesher.cpp`)

1. **Binary masks**: for each axis (X,Y,Z), `solid[u][v]` = 32-bit bitmask of solid voxels
2. **Face culling**: `visible = solid & ~(solid >> 1)` for positive faces (shift adapted per direction), with cross-chunk lookup at the boundaries
3. **Greedy merge**: per depth slice, a 2D grid of material IDs, maximal rectangular expansion (width first, then height)
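
The face-culling step can be illustrated with a small C++ sketch. It assumes bit `i` of a column holds the voxel at depth `i` along the meshing axis, and that the cross-chunk lookup is reduced to a single boolean for the neighboring chunk's adjacent voxel; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Positive-facing faces: voxel i is solid and voxel i+1 is empty.
// neighborFirstSolid is the voxel just past bit 31, taken from the next chunk.
inline uint32_t visiblePositive(uint32_t solid, bool neighborFirstSolid) {
    const uint32_t next = (solid >> 1) | (neighborFirstSolid ? 0x80000000u : 0u);
    return solid & ~next;
}

// Negative-facing faces: voxel i is solid and voxel i-1 is empty.
// neighborLastSolid is the voxel just before bit 0, taken from the previous chunk.
inline uint32_t visibleNegative(uint32_t solid, bool neighborLastSolid) {
    const uint32_t prev = (solid << 1) | (neighborLastSolid ? 1u : 0u);
    return solid & ~prev;
}
```

For a column with voxels 0 to 2 solid (`0x7`), only bit 2 survives in the positive direction and only bit 0 in the negative one, so a solid run contributes exactly one face per side.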

### Procedural generation (`VoxelWorld.cpp`)

- Perlin noise 3D (permutation-based, configurable seed)
- 5-octave fBm for the heightmap (initial generation), 2 octaves during animation (perf)
- Caves: `|fbm(x,y,z)| < threshold` in 3D (disabled in animation mode)
- Materials by altitude: sand < 25, grass 25-70, stone 70-90, snow > 90
- Chunks generated for Y = 0..7 (max height 256 blocks)
- 60 Hz animation: `regenerateAnimated()` runs generation and the fused GPU pack in parallel via `wi::jobsystem`
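
The octave count and the altitude bands can be sketched as below. This is not the code from `VoxelWorld.cpp`: the lacunarity (2.0), gain (0.5), the `Material` enum, and the handling of the exact boundaries at 70 and 90 are assumptions, and the noise source is passed in as a callable so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstdint>

enum class Material : uint8_t { Sand, Grass, Stone, Snow }; // hypothetical IDs

// fBm: sum of `octaves` noise layers, each at double the frequency and half
// the amplitude of the previous one, normalized by the total amplitude.
template <typename Noise>
float fbm(Noise&& noise, float x, float z, int octaves) {
    assert(octaves > 0);
    float sum = 0.0f, norm = 0.0f, amplitude = 1.0f, frequency = 1.0f;
    for (int i = 0; i < octaves; ++i) {
        sum += amplitude * noise(x * frequency, z * frequency);
        norm += amplitude;
        amplitude *= 0.5f;
        frequency *= 2.0f;
    }
    return sum / norm;
}

// Altitude bands from the list above; boundary ownership is an assumption.
inline Material materialForHeight(float h) {
    if (h < 25.0f)  return Material::Sand;
    if (h < 70.0f)  return Material::Grass;
    if (h <= 90.0f) return Material::Stone;
    return Material::Snow;
}
```

Dropping from 5 to 2 octaves removes three of the five noise evaluations per sample, which is consistent with the animation path being much cheaper than the initial generation.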

### Renderer (`VoxelRenderer.cpp`)

- **Triple-mode VS**: CPU path (`flags=0`), MDI path (`flags & 1`), GPU mesh path (`flags & 2`)
- **GPU mesh path (active by default)**: the `voxelMeshCS` compute shader generates 1×1 quads, drawn with `DrawInstanced` using a 1-frame-delayed readback of the atomic counter
- **Mega-buffer**: all quads of all chunks in a single `StructuredBuffer<PackedQuad>` (2M quads, 16 MB), used in CPU/MDI mode
- **Vertex pulling**: the VS reads the quad buffer via `SV_VertexID`; there is no classic vertex buffer
- **Pipeline**: PSO with `RSTYPE_FRONT` (backface cull), `DSSTYPE_DEFAULT` (depth test), `BSTYPE_OPAQUE`
- **Per-chunk info**: `StructuredBuffer<GPUChunkInfo>` (80 bytes/chunk) with worldPos, quadOffset, faceOffsets[6], faceCounts[6]
- **Push constants** (b999, 48 bytes): chunkIndex + quadOffset + flags (bit 0 = MDI mode, bit 1 = GPU mesh mode)
- **CPU culling**: frustum AABB (`wi::primitive::Frustum`) + backface per face group (camera vs AABB), MDI mode only
- **MDI rendering** (Phase 2.2): a single `DrawInstancedIndirectCount` replaces the per-chunk loop. Push constant = `chunkIndex | (faceIndex << 16)`; the VS reconstructs quadOffset from GPUChunkInfo
- **Per-face-group draws** (Phase 2.1 fallback): up to 6 `DrawInstanced` per visible chunk
- **Textures**: procedurally generated 2D texture array (256x256, 5 layers), triplanar mapping in the PS. Alpha = procedural heightmap used for blending
- **Height-based blending** (Phase 3): the PS reads `voxelDataBuffer` (SRV t3) directly to look up neighboring materials per pixel. Winner-takes-all: the material with the highest heightmap value wins 100%. Transitions are sharp, but their outline is organic because it is drawn by the heightmaps. Subtractive corner attenuation (param=0.80). Blend debug mode (F4)
- **Dedicated render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT), rendered in `Render()` on a dedicated cmd list
- **Composition**: overlay on the swapchain via `wi::image::Draw()` in `Compose()`
- **Stats overlay**: HUD display of chunks/quads/draw calls via `wi::font::Draw`
- **Frustum planes**: Gribb-Hartmann extraction into the CB for the cull compute shader
- **GPU timestamp queries**: 6 slots (cull begin/end, draw begin/end, mesh begin/end)
- **CPU profiling**: `ProfileAccum` with averages every 5 s in the backlog (Regenerate, UpdateMeshes, VoxelPack, GPU Upload, GPU Dispatch, Render, Frame)
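
For reference, Gribb-Hartmann plane extraction can be sketched in plain C++ as follows. The sketch assumes a row-vector convention (`clip = pos * M`, matrix indexed as `m[row][col]`) and a D3D-style clip-space depth range of [0, 1]; the `Plane` and `Mat4` types are stand-ins, not Wicked Engine types.

```cpp
#include <array>
#include <cassert>

struct Plane { float a, b, c, d; };               // inside when a*x + b*y + c*z + d >= 0
using Mat4 = std::array<std::array<float, 4>, 4>; // m[row][col], clip = pos * M

// Order: left, right, bottom, top, near, far. Planes are left unnormalized.
inline std::array<Plane, 6> extractFrustumPlanes(const Mat4& m) {
    auto col = [&](int c) { return Plane{m[0][c], m[1][c], m[2][c], m[3][c]}; };
    auto add = [](Plane p, Plane q) { return Plane{p.a + q.a, p.b + q.b, p.c + q.c, p.d + q.d}; };
    auto sub = [](Plane p, Plane q) { return Plane{p.a - q.a, p.b - q.b, p.c - q.c, p.d - q.d}; };
    const Plane x = col(0), y = col(1), z = col(2), w = col(3);
    return {{ add(w, x), sub(w, x),   // -w <= x <= w
              add(w, y), sub(w, y),   // -w <= y <= w
              z,         sub(w, z) }}; //  0 <= z <= w
}

inline bool pointInside(const std::array<Plane, 6>& planes, float px, float py, float pz) {
    for (const Plane& p : planes)
        if (p.a * px + p.b * py + p.c * pz + p.d < 0.0f) return false;
    return true;
}
```

Unnormalized planes are sufficient for sign tests such as point or AABB culling; a sphere test would need each plane divided by the length of its `(a, b, c)` normal first.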

## Development phases (spec)

### Phase 1 - Setup and basic meshing [DONE]

- Fork Wicked Engine, module structure
- VoxelWorld with procedural Perlin generation (radius 4 chunks = ~150 chunks)
- CPU Binary Greedy Mesher (~300K quads for the initial world)
- Basic rendering with vertex pulling and texture array
- Free-fly navigation camera (WASD + mouse)
- SEH crash handler with symbolic stack trace

### Phase 2 - GPU performance [DONE]

Split into sub-phases to isolate potential sources of bugs:

#### Phase 2.1 - Mega-buffer + CPU cull + per-face DrawInstanced [DONE]

- Mega-buffer: all quads in a single SRV, packed per chunk
- Sort by face group in the mesher (`faceOffsets[6]`, `faceCounts[6]`)
- CPU frustum culling (AABB vs `wi::primitive::Frustum`)
- CPU backface culling per face group (camera.Eye vs chunk AABB)
- Per-face-group `DrawInstanced` (max 6 draws per visible chunk)
- `GPUChunkInfo` StructuredBuffer for VS lookup
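
The per-face-group backface test can be sketched as a conservative comparison of the eye position against the chunk bounds. The `Vec3` and `AABB` types and the function name below are stand-ins, not the `wi::primitive` types, and the exact comparison used in the renderer is an assumption.

```cpp
#include <cassert>
#include <cstdint>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Returns a 6-bit mask in PackedQuad face order: +X,-X,+Y,-Y,+Z,-Z.
// A +X face can only be seen from an eye position beyond the chunk's minimum X,
// and a -X face only from before its maximum X; likewise for Y and Z.
inline uint32_t visibleFaceGroups(const Vec3& eye, const AABB& box) {
    uint32_t mask = 0;
    if (eye.x > box.min.x) mask |= 1u << 0;
    if (eye.x < box.max.x) mask |= 1u << 1;
    if (eye.y > box.min.y) mask |= 1u << 2;
    if (eye.y < box.max.y) mask |= 1u << 3;
    if (eye.z > box.min.z) mask |= 1u << 4;
    if (eye.z < box.max.z) mask |= 1u << 5;
    return mask;
}
```

A camera inside the chunk's bounds keeps all six groups; a camera fully on one side of the chunk along an axis drops the group facing away from it, which is what makes the per-face-group sort in the mesher pay off.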

#### Phase 2.2 - CPU-filled indirect args + DrawInstancedIndirectCount [DONE]

- The CPU fills `IndirectDrawArgs[]` with the same logic as 2.1 (frustum + backface)
- The CPU writes the draw count
- Both buffers are uploaded to the GPU (no explicit barriers, thanks to implicit promotion)
- A single `DrawInstancedIndirectCount` replaces the per-chunk loop
- The VS decodes `chunkIndex | (faceIndex << 16)` from the push constant and reconstructs the quadOffset
- **Benefit**: tests MDI rendering WITHOUT a compute shader (isolates barrier problems)
- **Pitfalls resolved**:
  - `IndirectDrawArgs` is 20 bytes (not 16): see point 7 in "Shaders custom — PIÈGES IMPORTANTS"
  - `SV_VertexID` does not include `startVertexLocation` with ExecuteIndirect: see point 8
  - No explicit barriers needed: see point 9
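
The two pitfalls about argument size and `SV_VertexID` can be made concrete with a C++ sketch of the CPU side. The struct layout is an assumption (push constant first, followed by the four standard draw arguments), `ChunkInfoSubset` is a hypothetical cut-down stand-in for `GPUChunkInfo`, and `faceOffsets` is assumed to be relative to the chunk's `quadOffset`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: one 32-bit push constant followed by the draw arguments.
struct IndirectDrawArgs {
    uint32_t pushConstant;           // chunkIndex | (faceIndex << 16)
    uint32_t vertexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startVertexLocation;    // kept at 0; the VS does not rely on it
    uint32_t startInstanceLocation;
};
static_assert(sizeof(IndirectDrawArgs) == 20, "stride must be 20 bytes, not 16");

// Hypothetical subset of GPUChunkInfo, enough to rebuild the quad offset.
struct ChunkInfoSubset {
    uint32_t quadOffset;
    uint32_t faceOffsets[6];
    uint32_t faceCounts[6];
};

inline uint32_t packDrawId(uint32_t chunkIndex, uint32_t faceIndex) {
    assert(chunkIndex < (1u << 16) && faceIndex < 6);
    return chunkIndex | (faceIndex << 16);
}

// Mirrors what the VS does: decode the draw id, then look up the first quad.
inline uint32_t firstQuadForDraw(uint32_t drawId, const ChunkInfoSubset* chunks) {
    const uint32_t chunkIndex = drawId & 0xFFFFu;
    const uint32_t faceIndex  = drawId >> 16;
    return chunks[chunkIndex].quadOffset + chunks[chunkIndex].faceOffsets[faceIndex];
}
```

Because the offset is rebuilt from the push constant, the draw stays correct even when `startVertexLocation` is not reflected in `SV_VertexID`.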

#### Phase 2.3 - GPU compute culling [DONE]

- The `voxelCullCS.hlsl` compute shader replaces the CPU for filling the indirect args
- DX12 barriers: UNDEFINED → UAV (pre-compute) → INDIRECT_ARGUMENT (post-compute)
- GPU timestamp queries active (GPU Cull ~0.006 ms for 168 chunks)
- **Pitfalls resolved**:
  - `PushConstants` MUST be called AFTER `BindPipelineState`: see point 10
  - Compute shader fixed: push constant packing + startVertexLocation=0 (see points 7-8)
  - `ResourceState::UNDEFINED` = COMMON in Wicked (value 0) and triggers `DiscardResource()`, which is fine for buffers that are fully rewritten

#### Phase 2.4 - GPU compute mesher (benchmark) [DONE]

- The `voxelMeshCS.hlsl` compute shader does 1×1 meshing on the GPU (1 thread per voxel, 8×8×8 thread groups)
- Automatic benchmark on the first frame after world generation (CPU fallback mode)
- Results (168 chunks, Ryzen 7 3700X + RX 5700 XT):
  - CPU greedy: 277 ms, 358K quads → greedy merge reduces the quad count by 6.8×
  - GPU baseline (1×1): 5.3 ms, 2.43M quads → 52× faster than the CPU
- GPU greedy merge not implemented (it could combine GPU speed with the quad reduction)
- The benchmark is one-shot: state machine IDLE → DISPATCH → READBACK → DONE

#### Phase 2.5 - GPU meshing in production + perf optimizations [DONE]

- **GPU meshing in production**: replaces the CPU greedy mesher as the default pipeline
  - `voxelMeshCS.hlsl`: chunkIndex encoded in bits [59:49] of each quad (11 bits)
  - `voxelVS.hlsl`: the `flags & 2` mode extracts the chunkIndex from the quad and looks up `GPUChunkInfo`
  - `VoxelRenderer`: compute shader dispatch → UAV→SRV barrier → `DrawInstanced`
  - 1-frame-delayed readback of the atomic counter for the vertex count
  - `gpuQuadBuffer_` has the bind flags `UNORDERED_ACCESS | SHADER_RESOURCE`
- **CPU perf optimizations** (profiled and measured):
  - **VoxelPack via memcpy**: `sizeof(VoxelData) == 2`, so `voxels[]` is directly compatible with the GPU format (uint16 pairs). Replaces the bit-shift loop (28 ms → <1 ms)
  - **Dirty cache**: `packedVoxelCache_` is only repacked when chunks change, not every frame
  - **Fused regenerate+pack**: `regenerateAnimated()` accepts a destination pointer; each parallel job does generate + memcpy on the same thread. This eliminates the double hashmap iteration and the sequential pack (6 ms → 0 ms)
  - **Skip GPU dispatch**: the `gpuMeshDirty_` flag prevents re-dispatch/upload when nothing has changed
  - **Conditional upload**: `chunkInfoBuffer_` is only re-uploaded when `chunkInfoDirty_` is set
  - **Lighter animation**: 2 fBm octaves (instead of 5) + no caves in animation mode (54 ms → 8 ms)
- **Final results** (171 chunks, Ryzen 7 3700X + RX 5700 XT, 60 Hz animation):
  - Regenerate: 8.7 ms (parallel, 2 octaves)
  - VoxelPack: 0 ms (fused into regenerate)
  - GPU Upload: 4.5 ms (~11 MB voxel data)
  - GPU Dispatch: 0.1 ms (171 × 64 thread groups)
  - Frame total: ~9 ms → **80-110 FPS** with 60 Hz terrain animation
  - Without animation: **700+ FPS**
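
The memcpy optimization relies on the voxel array already having the memory image the GPU expects. The sketch below compares a bit-shift loop with a direct copy; the `VoxelData` struct is a 2-byte stand-in for the project's type, the function names are hypothetical, and the equivalence assumes a little-endian host with the first voxel of each pair in the low 16 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct VoxelData { uint16_t bits; }; // stand-in for the project's 16-bit voxel
static_assert(sizeof(VoxelData) == 2, "memcpy packing relies on 2-byte voxels");

// Reference path: two voxels per 32-bit word, first voxel in the low half.
inline void packVoxelsShift(const VoxelData* voxels, std::size_t count, uint32_t* dst) {
    assert(count % 2 == 0);
    for (std::size_t i = 0; i < count / 2; ++i)
        dst[i] = uint32_t(voxels[2 * i].bits) | (uint32_t(voxels[2 * i + 1].bits) << 16);
}

// Fast path: on a little-endian host the bytes are already in that order.
inline void packVoxelsMemcpy(const VoxelData* voxels, std::size_t count, uint32_t* dst) {
    assert(count % 2 == 0);
    std::memcpy(dst, voxels, count * sizeof(VoxelData));
}
```

Keeping a `static_assert` on the size next to the copy guards against a later change to the voxel struct silently breaking the GPU format.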

### Phase 3 - Texture blending [DONE]

**PS-based** approach: the pixel shader reads the voxel data directly (no pre-encoding in the quads). See `blending_experiments.md` for the details of each iteration.
- **Procedural heightmaps** in the alpha channel of each material texture (5 materials, each with different freq/contrast parameters)
- **PS neighbor lookup** (`voxelPS.hlsl`): binds `voxelDataBuffer` to `t3` and `chunkInfoBuffer` to `t2`. Reads neighbor materials per pixel via `readVoxelMat(coord, chunkIdx)`
- **Stair priority**: for each edge, checks `pos + edgeDir + normalDir` first (the block that visually hides the corner), then falls back to `pos + edgeDir`
- **2 independent axes**: U and V are handled separately, with nearest-edge detection via `sign(faceFrac - 0.5)`
- **Winner-takes-all heightmap**: `mainScore = h_main + bias`, `neighScore = h_neigh - bias`, `bias = 0.5 - weight`. The material with the highest score wins at 100%. Sharpness=16 for anti-aliasing
- **Subtractive corner attenuation**: `xAdj = xEdge - saturate(yEdge - 0.80)`, which reduces the blend at corners where the two axes cross
- **Blend zone**: 0.25 voxels from each edge (50% of the face)
- **CB**: `blendEnabled` (float, 1.0 in the GPU mesh path, 0.0 otherwise) + `debugBlend` (float, F4 toggle)
- **VS** (`voxelVS.hlsl`): passes `chunkIndex` (nointerpolation uint) to the PS for voxel lookups
- **GPU mesher** (`voxelMeshCS.hlsl`): simplified (no blend computation), only encodes `chunkIndex` in bits [27:17] of the quad
- **Debug mode** (F4): visualizes the blend zones (red=U, blue=V, green=no blend, bright red=data mismatch)
- **Works only in the GPU mesh path** (1×1 quads); the CPU/MDI paths have `blendEnabled=0`
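
The winner-takes-all scoring and the corner attenuation listed above can be checked on the CPU with a small C++ transcription. This is a sketch under stated assumptions, not the shader source: the mapping from edge distance to `weight` (`edgeWeight`) and the way the sharpness constant turns the score difference into a blend factor are guesses made for illustration, and every function name is hypothetical. Only the formulas for `bias`, `mainScore`, `neighScore` and `xAdj` come from the notes.

```cpp
#include <algorithm>
#include <cassert>

inline float saturate(float x) { return std::clamp(x, 0.0f, 1.0f); }

// Assumed mapping (not from the notes): weight is 0.5 on the face edge and
// falls linearly to 0 at the inner limit of the 0.25-voxel blend zone.
inline float edgeWeight(float distToEdge) {
    assert(distToEdge >= 0.0f);
    constexpr float kBlendZone = 0.25f;
    return 0.5f * saturate(1.0f - distToEdge / kBlendZone);
}

// Subtractive corner attenuation, exactly as written in the notes.
inline float cornerAttenuate(float xEdge, float yEdge) {
    return xEdge - saturate(yEdge - 0.80f);
}

// Fraction of the neighbor material at this pixel (0 = main, 1 = neighbor).
inline float neighborFraction(float hMain, float hNeigh, float weight) {
    constexpr float kSharpness = 16.0f;
    const float bias = 0.5f - weight;
    const float mainScore = hMain + bias;
    const float neighScore = hNeigh - bias;
    // Assumed use of the sharpness: a steep ramp centered on the score tie,
    // so the winner takes 100% except in a thin anti-aliased band.
    return saturate((neighScore - mainScore) * kSharpness + 0.5f);
}
```

With `weight = 0.5` the bias vanishes and both materials compete on their heightmaps alone, which matches the "equal chance at the border" behavior; as `weight` drops toward 0 the main material gains up to a full unit of score advantage.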

### Phase 4 - Toping [IN PROGRESS]

Decorative bevel system ("topings") applied to exposed +Y faces to soften the transitions between blocks.

#### Phase 4.1 - TopingSystem infrastructure [DONE]

- **TopingSystem** (`TopingSystem.h/.cpp`): data structures + mesh generation + instance collection
- **4-bit adjacency bitmask**: for each exposed +Y face, checks the 4 cardinal neighbors (±X, ±Z) for the same material with +Y exposed → 16 variants
- **Priority-based adjacency**: `TopingDef.priority` determines which toping yields at material borders. Grass (priority=1) generates bevels on top of stone (priority=0)
- **Per-material mesh**:
  - **Stone**: wedge cross-section (outer wall + slope) + corner fills + caps at strip ends
  - **Grass**: individual grass blades grouped into tufts, 2 curved segments, double-sided
- ~191K instances for ~170 chunks
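
The 4-bit adjacency bitmask can be illustrated with the sketch below. The bit assignment, `TopCell`, `CellLookup` and `adjacencyMask` are all hypothetical names chosen for this example; the notes only state that the four cardinal neighbors are tested for the same material with +Y exposed. Priority handling between materials is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Bit assignment is an assumption; the notes only name the four neighbors.
enum AdjBit : std::uint8_t {
    ADJ_POS_X = 1, ADJ_NEG_X = 2, ADJ_POS_Z = 4, ADJ_NEG_Z = 8
};

// Hypothetical view of one column top: its material and whether +Y is exposed.
struct TopCell {
    std::uint8_t material;
    bool topExposed;
};

// Callback returning the cell at (x, z) on the same Y level.
using CellLookup = std::function<TopCell(int, int)>;

// Returns a value in [0, 15], one per toping variant.
inline std::uint8_t adjacencyMask(int x, int z, const CellLookup& cellAt) {
    const TopCell self = cellAt(x, z);
    assert(self.topExposed);  // only exposed +Y faces receive a toping
    auto same = [&](int nx, int nz) {
        const TopCell n = cellAt(nx, nz);
        return n.topExposed && n.material == self.material;
    };
    std::uint8_t mask = 0;
    if (same(x + 1, z)) mask |= ADJ_POS_X;
    if (same(x - 1, z)) mask |= ADJ_NEG_X;
    if (same(x, z + 1)) mask |= ADJ_POS_Z;
    if (same(x, z - 1)) mask |= ADJ_NEG_Z;
    return mask;
}
```

A cleared bit marks an open edge, which is where a bevel strip (stone) or grass tufts would be generated in this reading of the notes.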

#### Phase 4.2 - GPU rendering + vegetation shading [DONE]

- **Dedicated shaders**: `voxelTopingVS.hlsl` (instanced vertex pulling) + `voxelTopingPS.hlsl` (per-material shading)
- **Vertex pulling**: `StructuredBuffer<TopingVertex>` (t4) + `StructuredBuffer<float3>` (t5 instances)
- **Push constants**: `vertexOffset`, `instanceOffset` and `materialID` reuse the first 3 fields of b999
- **Per-group DrawInstanced**: instances are sorted by (type, variant), with one `DrawInstanced` per contiguous group
- **Separate render pass** with `LoadOp::LOAD`: topings are rendered after the voxels and preserve the RT and depth
- **PSO**: same rasterizer/depth/blend states as the voxels (`RSTYPE_FRONT`, `DSSTYPE_DEFAULT`, `BSTYPE_OPAQUE`)
- **Stylized vegetation shading** (inspired by Airborn Trees, `voxelTopingPS.hlsl`):
  - **Half-Lambert wrap lighting**: `(N·L * 0.5 + 0.5)²`, which wraps the light around the surface with no hard terminator
  - **Translucency**: `dot(V, L) * (1 - NdotL) * 0.4`, light passing through thin blades when backlit
  - **Warm ambient**: `(0.22, 0.28, 0.20)`, brighter and greener than the stone ambient
  - **Stone**: classic Lambert, identical to the voxels (branch on `materialID == 3`)
- **Grass tuft generation** (`TopingSystem.cpp`):
  - **Tufts**: clusters of 3–9 blades sharing a common center (scatter ±0.03)
  - **Tuft position**: hash-driven along the edge + quadratic inset of 0.0–0.30 from the edge
  - **Per-tuft personality**: heightScale (0.20–1.0), leanScale (0.3–1.8), blade count (3–9)
  - **Per-blade variety**: height, width, angle (±55° fan + jitter), curvature (midLeanRatio 0.08–0.43)
  - **Deterministic hash**: golden-ratio-based `hashF(a,b,c)` for reproducibility
- **Stone corner fills**: diagonal slope triangle at corners where two open edges meet
- **Stone caps**: triangle closing the bevel cross-section at strip ends
- **Pitfalls resolved**:
  - **CW winding**: `emitTri()` auto-corrects the winding via `dot(geom, desired) > 0` → swap B↔C
  - **Slope normal = inward + up**: use `(e.ix, e.iz)`, NOT `(e.nx, e.nz)`
  - **sunDirection**: `L = normalize(-sunDirection.xyz)` (direction of travel → direction toward the sun)
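
The vegetation lighting terms and the `sunDirection` negation can be transcribed to C++ to check their ranges. This is a hedged sketch rather than the contents of `voxelTopingPS.hlsl`: `Vec3`, `lightVector`, `halfLambert` and `translucency` are illustrative names, and clamping the two dot products in the translucency term to [0, 1] is an assumption, since the notes give only the bare formula and do not state the convention used for `V`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// sunDirection is the direction the light travels; L points toward the sun.
inline Vec3 lightVector(Vec3 sunDirection) {
    return normalize({-sunDirection.x, -sunDirection.y, -sunDirection.z});
}

// Half-Lambert wrap lighting: (N·L * 0.5 + 0.5)^2, always in [0, 1].
inline float halfLambert(Vec3 n, Vec3 l) {
    const float wrapped = dot(n, l) * 0.5f + 0.5f;
    return wrapped * wrapped;
}

// Translucency term: dot(V, L) * (1 - NdotL) * 0.4.
// Clamping both dot products to [0, 1] is an assumption of this sketch.
inline float translucency(Vec3 v, Vec3 l, Vec3 n) {
    const float vDotL = std::clamp(dot(v, l), 0.0f, 1.0f);
    const float nDotL = std::clamp(dot(n, l), 0.0f, 1.0f);
    return vDotL * (1.0f - nDotL) * 0.4f;
}
```

A surface perpendicular to the light still receives 0.25 from the half-Lambert term, where classic Lambert would return 0; this is what removes the hard terminator on the blades.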

#### Phase 4.3 - Polish and extensions [TODO]

- More toping types (snow, moss, etc.)
- LOD: drop topings at a distance
- Subtle animation (wind on the grass via the vertex shader)
- Optimization: compute shader for instance collection

### Phase 5 - Smooth rendering [IN PROGRESS]

#### Phase 5.1 - CPU Naive Surface Nets [DONE]

- **Algorithm**: Naive Surface Nets (simplified dual contouring) in `SmoothMesher::meshChunk()`
- **Binary SDF**: solid = -1, empty = +1 (no continuous distance field)
- **Vertex placement**: centroid of the edge crossings for each cell on the surface
- **Smooth materials**: SmoothStone (mat 6, `FLAG_SMOOTH`) and Snow (mat 5, `FLAG_SMOOTH`)
- **Blocky materials**: Stone (mat 3), Grass (mat 1), Dirt (mat 2), Sand (mat 4)
- **SmoothVertex** (32 bytes): position, face normal, materialID, secondaryMat, blendWeight, chunkIndex
- **Dedicated shaders**: `voxelSmoothVS.hlsl` (vertex pulling t6) + `voxelSmoothPS.hlsl` (triplanar + blending)
- **Separate render pass** with `LoadOp::LOAD`: smooth geometry is rendered after voxels+topings and preserves the RT and depth
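
The centroid vertex placement with a binary SDF can be sketched as below. This is an illustrative reconstruction and not the code of `SmoothMesher::meshChunk()`: the corner numbering, `kCellEdges`, `cornerPos` and `cellVertex` are assumptions made for the example. It relies on one fact from the notes: with SDF values of exactly -1 and +1, every edge crossing falls at the midpoint of its edge.

```cpp
#include <cassert>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Assumed numbering: corner i sits at (i & 1, (i >> 1) & 1, (i >> 2) & 1).
constexpr int kCellEdges[12][2] = {
    {0, 1}, {2, 3}, {4, 5}, {6, 7},  // X-aligned edges
    {0, 2}, {1, 3}, {4, 6}, {5, 7},  // Y-aligned edges
    {0, 4}, {1, 5}, {2, 6}, {3, 7},  // Z-aligned edges
};

inline Vec3 cornerPos(int i) {
    return {float(i & 1), float((i >> 1) & 1), float((i >> 2) & 1)};
}

// Bit i of solidMask is set when corner i is solid (SDF = -1).
// Writes the cell-local vertex to `out`; returns false when the cell does
// not straddle the surface (all corners solid or all empty).
inline bool cellVertex(std::uint8_t solidMask, Vec3& out) {
    if (solidMask == 0x00 || solidMask == 0xFF) return false;
    Vec3 sum{0.0f, 0.0f, 0.0f};
    int crossings = 0;
    for (const auto& edge : kCellEdges) {
        const bool a = ((solidMask >> edge[0]) & 1) != 0;
        const bool b = ((solidMask >> edge[1]) & 1) != 0;
        if (a == b) continue;  // no sign change on this edge
        // Binary SDF: the zero crossing is the midpoint of the edge.
        const Vec3 p = cornerPos(edge[0]);
        const Vec3 q = cornerPos(edge[1]);
        sum.x += 0.5f * (p.x + q.x);
        sum.y += 0.5f * (p.y + q.y);
        sum.z += 0.5f * (p.z + q.z);
        ++crossings;
    }
    assert(crossings > 0);
    const float inv = 1.0f / float(crossings);
    out = {sum.x * inv, sum.y * inv, sum.z * inv};
    return true;
}
```

A flat layer (the four corners of one face solid) places the vertex at the cell center, while a single solid corner pulls it toward that corner.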

**Cross-chunk connectivity**:
- **PAD=2** in the SDF grid to reach cells [-1..CHUNK_SIZE]
- **Extended vertex range**: `[-1, CHUNK_SIZE)` instead of `[0, CHUNK_SIZE)`, so cells on the border of the neighboring chunk generate vertices
- **Canonical ownership**: each edge is emitted by a single chunk (the one containing the lower grid point), with no duplication
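
The padded indexing and the ownership rule can be expressed as a few predicates. This is a sketch under assumptions: symmetric padding (samples spanning `[-PAD, CHUNK_SIZE + PAD)`), an x-fastest memory layout, and the helper names `sdfIndex`, `isVertexCell` and `ownsEdge` are all hypothetical; only `PAD = 2`, `CHUNK_SIZE` and the two ranges come from the notes.

```cpp
#include <cassert>

constexpr int CHUNK_SIZE = 32;
constexpr int PAD = 2;
// Assumption: symmetric padding, i.e. samples span [-PAD, CHUNK_SIZE + PAD).
constexpr int PADDED_SIZE = CHUNK_SIZE + 2 * PAD;

inline bool inPaddedRange(int c) { return c >= -PAD && c < CHUNK_SIZE + PAD; }

// Flat index of the SDF sample at chunk-local (x, y, z), x varying fastest.
inline int sdfIndex(int x, int y, int z) {
    assert(inPaddedRange(x) && inPaddedRange(y) && inPaddedRange(z));
    return (x + PAD) + PADDED_SIZE * ((y + PAD) + PADDED_SIZE * (z + PAD));
}

inline bool inVertexRange(int c) { return c >= -1 && c < CHUNK_SIZE; }

// Cells in [-1, CHUNK_SIZE) on every axis may produce a vertex in this chunk.
inline bool isVertexCell(int x, int y, int z) {
    return inVertexRange(x) && inVertexRange(y) && inVertexRange(z);
}

inline bool inChunk(int c) { return c >= 0 && c < CHUNK_SIZE; }

// Canonical ownership: only the chunk containing the edge's lower grid
// point emits the quad for that edge.
inline bool ownsEdge(int lowX, int lowY, int lowZ) {
    return inChunk(lowX) && inChunk(lowY) && inChunk(lowZ);
}
```

An edge whose lower grid point is at local x = 32 is rejected here but accepted by the +X neighbor, where the same point is at local x = 0, so each quad is emitted exactly once.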

**Smooth↔blocky boundary**:
- **`hasSmooth` filter**: only generates vertices if at least one corner of the cell is a smooth voxel (prevents spilling over into blocky territory)
- **Per-axis boundary clamping**: vertices at smooth↔blocky borders are clamped to the integer grid (prevents the smooth mesh from protruding over blocky faces)
- **GPU mesher**: smooth voxels are treated as solid in `isNeighborAir()`, so blocky faces are not emitted toward smooth voxels (the smooth mesh covers the border)
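
The `hasSmooth` filter and the "smooth counts as solid" rule can be sketched as two predicates. The `Voxel` layout and the helpers `isAir`, `isSmooth` and `emitsBlockyFace` are hypothetical and the real `VoxelData` packing may differ; `FLAG_SMOOTH = 0x1` and material 0 = air come from the Conventions section. Per-axis clamping is not covered here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint8_t FLAG_SMOOTH = 0x1;

// Hypothetical 2-byte voxel layout; the real VoxelData packing may differ.
struct Voxel {
    std::uint8_t material;  // 0 = air
    std::uint8_t flags;
};
static_assert(sizeof(Voxel) == 2, "sketch assumes a 2-byte voxel");

inline bool isAir(Voxel v) { return v.material == 0; }
inline bool isSmooth(Voxel v) { return !isAir(v) && (v.flags & FLAG_SMOOTH) != 0; }

// Surface Nets side: a cell yields a vertex only if one corner is smooth.
inline bool hasSmooth(const std::array<Voxel, 8>& corners) {
    for (const Voxel& v : corners) {
        if (isSmooth(v)) return true;
    }
    return false;
}

// Blocky mesher side: a face is emitted only toward air. A smooth neighbor
// counts as solid, and smooth voxels never emit blocky faces themselves.
inline bool emitsBlockyFace(Voxel self, Voxel neighbor) {
    assert(!isAir(self));
    if (isSmooth(self)) return false;
    return isAir(neighbor);
}
```

Taken together, the two predicates give each boundary to exactly one mesher: the blocky mesher leaves the smooth side alone, and the smooth mesher only works in cells that touch a smooth voxel.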

**Face normals: MAJOR PITFALLS**:
- **Face normals, not SDF gradient**: the binary SDF yields 45° gradients at steps, which causes triplanar stretching. Face normals (cross product of the triangle's edges) are geometrically correct.
- **Orientation by edge axis**: each quad comes from an X, Y, or Z edge, so the `solid→empty` direction is known. Check that the face normal's component along that axis has the right sign, otherwise flip.
- **Inverted Y-axis winding**: the Y sharing cells are arranged differently from those of X and Z. The natural winding of the Y quad is the opposite → `if (axis == 1) useWindingA = !useWindingA;`
- **SDF gradient dot product**: do NOT use it to orient normals (it fails when the gradient is zero or ambiguous with a binary SDF)
- **Centroid SDF sampling**: do NOT use it either (both sides often round to the same voxel)
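
The "orientation by edge axis" rule can be sketched as a check-and-flip on one triangle. This is an illustration, not the mesher's code: `orientTriangle` and its parameters are hypothetical, and the sketch only guarantees that the returned normal points from solid to empty along the generating axis. Which vertex order the rasterizer treats as front-facing is a separate convention that the sketch does not address.

```cpp
#include <cassert>
#include <utility>

struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

inline float component(Vec3 v, int axis) {
    return axis == 0 ? v.x : (axis == 1 ? v.y : v.z);
}

// axis: 0 = X, 1 = Y, 2 = Z (the edge the quad was generated from).
// solidToEmpty: +1 when the empty voxel lies on the +axis side, -1 otherwise.
// Swaps b and c when the face normal points the wrong way, and returns the
// (unnormalized) oriented face normal.
inline Vec3 orientTriangle(Vec3 a, Vec3& b, Vec3& c, int axis, int solidToEmpty) {
    assert(axis >= 0 && axis <= 2);
    assert(solidToEmpty == 1 || solidToEmpty == -1);
    Vec3 n = cross(sub(b, a), sub(c, a));
    if (component(n, axis) * float(solidToEmpty) < 0.0f) {
        std::swap(b, c);
        n = {-n.x, -n.y, -n.z};
    }
    return n;
}
```

Because the test uses only the sign of one normal component, it stays well defined even where the binary SDF gradient would be zero or ambiguous.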

**Material blending**:
- **Two materials per vertex**: primaryMat (smooth-only counts, avoids subsurface bleed) + secondaryMat (all counts, includes blocky materials for blending at borders)
- **blendWeight**: uint8 0-255, share of the secondary material in the 8-corner vote
- **PS**: `lerp(primaryColor, secondaryColor, blendWeight)` between two triplanar samples
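
One possible reading of the 8-corner vote is sketched below. The exact rule is not spelled out in these notes, so several choices here are assumptions: ties go to the lowest material ID, the secondary is the most frequent material other than the primary, and `blendWeight` is normalized by the number of non-air corners. `Corner`, `MaterialVote` and `voteMaterials` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Corner {
    std::uint8_t material;  // 0 = air
    bool smooth;            // true when the voxel carries FLAG_SMOOTH
};

struct MaterialVote {
    std::uint8_t primaryMat;
    std::uint8_t secondaryMat;
    std::uint8_t blendWeight;  // 0-255
};

inline MaterialVote voteMaterials(const std::array<Corner, 8>& corners) {
    std::array<int, 256> smoothCounts{};
    std::array<int, 256> allCounts{};
    int solid = 0;
    for (const Corner& c : corners) {
        if (c.material == 0) continue;  // air does not vote
        ++solid;
        ++allCounts[c.material];
        if (c.smooth) ++smoothCounts[c.material];
    }
    // Primary: most frequent material among smooth corners only.
    int primary = 0;
    for (int m = 1; m < 256; ++m) {
        if (smoothCounts[m] > smoothCounts[primary]) primary = m;
    }
    assert(primary != 0);  // caller guarantees at least one smooth corner
    // Secondary: most frequent other material, blocky corners included.
    int secondary = primary;
    int best = 0;
    for (int m = 1; m < 256; ++m) {
        if (m != primary && allCounts[m] > best) {
            best = allCounts[m];
            secondary = m;
        }
    }
    // Assumed normalization: share of the secondary among non-air corners.
    const int weight = (255 * best + solid / 2) / solid;
    return {static_cast<std::uint8_t>(primary),
            static_cast<std::uint8_t>(secondary),
            static_cast<std::uint8_t>(weight)};
}
```

Counting only smooth corners for the primary keeps a blocky material from becoming the dominant texture of a smooth vertex, which is the "subsurface bleed" the notes mention.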

#### Phase 5.2 - Optimizations and polish [TODO]

- Smoothed SDF (approximate distance field instead of binary ±1)
- Smooth normals (vertex normals instead of face normals for smooth surfaces)
- GPU compute Surface Nets (compute shader instead of CPU)
- LOD: triangle reduction at a distance

### Phase 6 - Hybrid ray tracing [TODO]

- BLAS per chunk (built from the greedy mesh), TLAS per frame
- RT shadows via ray queries (compute shader)
- RT AO (4-8 rays, short range)
- Fallback to shadow maps / SSAO if RT is unavailable

## Target metrics and results

| Metric | Target | Result (Ryzen 7 3700X + RX 5700 XT) |
|--------|--------|--------------------------------------|
| FPS at 1440p | > 60 fps | ✅ 80-110 FPS (60 Hz animation), 700+ FPS (static) |
| GPU meshing | < 200 µs/chunk | ✅ ~0.6 µs/chunk (0.1ms / 171 chunks) |
| Full re-mesh | < 16ms | ✅ ~13ms (regen 8.7ms + upload 4.5ms) |
| GPU memory | < 500 MB | ✅ ~30 MB (11 MB voxels + 16 MB quads + buffers) |
| RT shadows + AO | < 4ms at 1440p | ⏳ Phase 6 |
| Draw calls | < 100 | ✅ 1 (GPU mesh) or 1 (MDI) |
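
The derived figures in the table can be reproduced with a few lines of arithmetic. This sketch assumes 2 bytes per voxel, which is consistent with the ~11 MB upload for 171 chunks of 32³ voxels; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t CHUNK_SIZE = 32;
// Assumption: 2 bytes per voxel, consistent with the ~11 MB upload figure.
constexpr std::size_t kBytesPerVoxel = 2;

// Total voxel payload uploaded for `chunks` full chunks.
constexpr std::size_t voxelBytes(std::size_t chunks) {
    return chunks * CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE * kBytesPerVoxel;
}

// Average cost per chunk, in microseconds, of a pass that took `totalMs`.
inline double microsPerChunk(double totalMs, std::size_t chunks) {
    assert(chunks > 0);
    return totalMs * 1000.0 / static_cast<double>(chunks);
}

// Full re-mesh cost: regeneration plus upload, in milliseconds.
inline double fullRemeshMs(double regenMs, double uploadMs) {
    return regenMs + uploadMs;
}
```

For 171 chunks this gives 11,206,656 bytes of voxel data, about 0.58 µs of dispatch time per chunk, and 13.2 ms for a full re-mesh, in line with the rounded values in the table.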

## Conventions

- Namespaces: all voxel code lives in `namespace voxel`
- Chunks: 32x32x32, configurable via `CHUNK_SIZE`
- Coordinates: Y = up, infinite world in X/Z, sparse hashmap
- Materials: 256-entry palette, index 0 = air (empty), 1=grass, 2=dirt, 3=stone (blocky), 4=sand, 5=snow (smooth), 6=smoothstone (smooth)
- Faces: 0=+X, 1=-X, 2=+Y, 3=-Y, 4=+Z, 5=-Z
- Smooth flag: `FLAG_SMOOTH = 0x1` in the VoxelData flags, enables Surface Nets instead of blocky rendering
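
The face and material numbering above maps naturally onto small helpers. This is a sketch: the numeric values come from the list, while the enumerator and function names (`faceNormal`, `oppositeFace`, `usesSurfaceNets`, `MAT_*`) are hypothetical and may not match `VoxelTypes.h`.

```cpp
#include <cassert>
#include <cstdint>

struct Int3 { int x, y, z; };

// Face indices as listed: even = positive direction, odd = negative,
// and (face >> 1) selects the axis (0 = X, 1 = Y, 2 = Z).
inline Int3 faceNormal(int face) {
    assert(face >= 0 && face < 6);
    const int axis = face >> 1;
    const int sign = (face & 1) ? -1 : 1;
    return {axis == 0 ? sign : 0, axis == 1 ? sign : 0, axis == 2 ? sign : 0};
}

// With this numbering, flipping the low bit gives the opposite face.
inline int oppositeFace(int face) {
    assert(face >= 0 && face < 6);
    return face ^ 1;
}

// Enumerator names are hypothetical; the numeric IDs come from the list.
enum Material : std::uint8_t {
    MAT_AIR = 0,
    MAT_GRASS = 1,
    MAT_DIRT = 2,
    MAT_STONE = 3,
    MAT_SAND = 4,
    MAT_SNOW = 5,
    MAT_SMOOTHSTONE = 6
};

// Materials listed as smooth, i.e. meshed with Surface Nets.
inline bool usesSurfaceNets(Material m) {
    return m == MAT_SNOW || m == MAT_SMOOTHSTONE;
}
```

In the engine itself the smooth/blocky decision is carried by `FLAG_SMOOTH` on each voxel; the material-based helper here only mirrors the current palette.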