Phase 3: PS-based texture blending with winner-takes-all heightmap

Replace pre-encoded quad blend data (v1) with per-pixel voxel data lookups in the pixel shader. The PS reads voxelDataBuffer (SRV t3) to find neighbor materials dynamically, enabling 2 independent blend axes, stair-priority neighbor detection, and winner-takes-all heightmap-driven transitions.

Key design decisions validated through 6 iterations (see blending_experiments.md):
- Winner-takes-all: material with highest heightmap score wins 100% (sharp but organic transitions, not smooth gradient)
- Symmetric bias: bias = 0.5 - weight ensures equal chance at border
- Subtractive corner attenuation (param=0.80): xAdj = xEdge - saturate(yEdge - 0.80) reduces blend at corners naturally
- Blend zone = 0.25 voxels from each edge (50% of face)
- Debug mode (F4) visualizes blend zones as colors
2026-03-26 12:14:08 +01:00
commit d7e69f97ca
parent 21f1bd1a12
9 changed files with 430 additions and 77 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -253,9 +253,9 @@ Custom shaders must follow the **Wicked Engine binding model**:
 [29:24] height (1-32)
 [32:30] face (0-5 : +X,-X,+Y,-Y,+Z,-Z)
 [40:33] material ID
-[48:41] AO (4x2 bits per corner)
+[48:41] blendMatID (8 bits, neighbor material for height-based blending)
 [59:49] chunkIndex (11 bits, used by the GPU mesh path to look up GPUChunkInfo)
-[63:60] flags (reserved)
+[63:60] blendEdges (4 bits: +U(0), -U(1), +V(2), -V(3); edges bordering a different material)
 ```
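As a rough CPU-side illustration of the bit layout above, the sketch below packs and unpacks a quad as two 32-bit words. The 6-bit `x`/`y`/`z`/`w` fields in bits [23:0] are taken from the shader-side `packQuad`; the `QuadFields` and `PackedWords` structs and both function names are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unpacked view of one quad (field widths follow the layout above).
struct QuadFields {
    uint32_t x, y, z, w, h;  // 6 bits each
    uint32_t face;           // 3 bits, straddles the lo/hi word boundary
    uint32_t matID;          // 8 bits
    uint32_t blendMatID;     // 8 bits
    uint32_t chunkIndex;     // 11 bits
    uint32_t blendEdges;     // 4 bits
};

struct PackedWords { uint32_t lo, hi; };

PackedWords packQuadWords(const QuadFields& q) {
    PackedWords p;
    p.lo = (q.x & 0x3Fu) | ((q.y & 0x3Fu) << 6) | ((q.z & 0x3Fu) << 12)
         | ((q.w & 0x3Fu) << 18) | ((q.h & 0x3Fu) << 24) | ((q.face & 0x3u) << 30);
    // Global bit 32 is bit 0 of hi, so global bit N maps to hi bit N - 32.
    p.hi = ((q.face >> 2) & 0x1u)             // [32]    face, top bit
         | ((q.matID & 0xFFu) << 1)           // [40:33] material ID
         | ((q.blendMatID & 0xFFu) << 9)      // [48:41] blendMatID
         | ((q.chunkIndex & 0x7FFu) << 17)    // [59:49] chunkIndex
         | ((q.blendEdges & 0xFu) << 28);     // [63:60] blendEdges
    return p;
}

QuadFields unpackQuadWords(PackedWords p) {
    QuadFields q;
    q.x = p.lo & 0x3Fu;
    q.y = (p.lo >> 6) & 0x3Fu;
    q.z = (p.lo >> 12) & 0x3Fu;
    q.w = (p.lo >> 18) & 0x3Fu;
    q.h = (p.lo >> 24) & 0x3Fu;
    q.face = ((p.lo >> 30) & 0x3u) | ((p.hi & 0x1u) << 2);
    q.matID = (p.hi >> 1) & 0xFFu;
    q.blendMatID = (p.hi >> 9) & 0xFFu;
    q.chunkIndex = (p.hi >> 17) & 0x7FFu;
    q.blendEdges = (p.hi >> 28) & 0xFu;
    return q;
}
```

The `chunkIndex` extraction, `(hi >> 17) & 0x7FF`, is the same expression the vertex shader uses in the GPU mesh path.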
 ### Binary Greedy Mesher (CPU, `VoxelMesher.cpp`)
@ -285,7 +285,8 @@ Custom shaders must follow the **Wicked Engine binding model**:
 - **CPU culling**: frustum AABB (`wi::primitive::Frustum`) + backface culling per face group (camera vs AABB), MDI mode only
 - **MDI rendering** (Phase 2.2): a single `DrawInstancedIndirectCount` replaces the per-chunk loop. Push constant = `chunkIndex | (faceIndex << 16)`; the VS reconstructs quadOffset from GPUChunkInfo
 - **Per-face-group draws** (Phase 2.1 fallback): up to 6 `DrawInstanced` per visible chunk
- **Textures**: 2D texture array (256x256, 5 layers) generated procedurally, triplanar mapping in the PS
+- **Textures**: 2D texture array (256x256, 5 layers) generated procedurally, triplanar mapping in the PS. Alpha = procedural heightmap for blending
 - **Height-based blending** (Phase 3): the PS reads `voxelDataBuffer` (SRV t3) directly to look up neighbor materials per pixel. Winner-takes-all: the material with the highest heightmap value wins 100%. Transitions are sharp, but their shape is organic, drawn by the heightmaps. Subtractive corner attenuation (param=0.80). Blend debug mode (F4)
 - **Dedicated render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT), rendered in `Render()` on a dedicated cmd list
 - **Composition**: overlay on the swapchain via `wi::image::Draw()` in `Compose()`
 - **Stats overlay**: HUD display of chunks/quads/draw calls via `wi::font::Draw`
@ -373,12 +374,22 @@ Split into sub-phases to isolate potential sources of bugs:
  - Frame total: ~9ms → **80-110 FPS** with 60 Hz terrain animation
  - Without animation: **700+ FPS**
-### Phase 3 - Texture blending [TODO]
+### Phase 3 - Texture blending [DONE]
- Triplanar mapping (already in place, to be refined)
+**PS-based** approach: the pixel shader reads the voxel data directly (no pre-encoding in the quads). See `blending_experiments.md` for details of the iterations.
- Height-based blending at material boundaries
+
- Heightmaps in the alpha channel or a separate texture
+- **Procedural heightmaps** in the alpha channel of each material texture (5 materials, different freq/contrast parameters)
- Neighbor material ID in the vertex format (8 bits in the reserved flags)
+- **PS neighbor lookup** (`voxelPS.hlsl`): binds `voxelDataBuffer` to `t3` and `chunkInfoBuffer` to `t2`. Reads neighbor materials per pixel via `readVoxelMat(coord, chunkIdx)`
 - **Stair priority**: for each edge, checks `pos + edgeDir + normalDir` first (the block that visually hides the corner), then falls back to `pos + edgeDir`
 - **2 independent axes**: U and V are handled separately, with nearest-edge detection via `sign(faceFrac - 0.5)`
 - **Winner-takes-all heightmap**: `mainScore = h_main + bias`, `neighScore = h_neigh - bias`, `bias = 0.5 - weight`. The material with the highest score wins 100%. Sharpness=16 for anti-aliasing
 - **Subtractive corner attenuation**: `xAdj = xEdge - saturate(yEdge - 0.80)`, which reduces the blend at corners where the two axes meet
 - **Blend zone**: 0.25 voxels from each edge (50% of the face)
 - **CB**: `blendEnabled` (float, 1.0 in the GPU mesh path, 0.0 otherwise) + `debugBlend` (float, F4 toggle)
 - **VS** (`voxelVS.hlsl`): passes `chunkIndex` (nointerpolation uint) to the PS for voxel lookups
 - **GPU mesher** (`voxelMeshCS.hlsl`): simplified (no blend computation), only encodes `chunkIndex` in bits [27:17] of the quad's high word
 - **Debug mode** (F4): visualizes blend zones (red=U, blue=V, green=no blend, bright red=data mismatch)
 - **Only works in the GPU mesh path** (1×1 quads); the CPU/MDI paths have `blendEnabled=0`
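The neighbor lookup with stair priority can be mirrored on the CPU, which may help when checking the shader logic offline. This is a minimal sketch under simplifying assumptions: it handles a single chunk with chunk-local coordinates, whereas the shader first subtracts the chunk origin and offsets into the buffer by `chunkIdx`. The `Int3` alias and the `add` and `writeVoxelMat` helpers are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kChunkSize = 32;  // CSIZE in the shader
constexpr int kChunkVolume = kChunkSize * kChunkSize * kChunkSize;

using Int3 = std::array<int, 3>;

Int3 add(Int3 a, Int3 b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }

uint32_t flatIndex(Int3 p) {
    return static_cast<uint32_t>(p[0] + p[1] * kChunkSize + p[2] * kChunkSize * kChunkSize);
}

// Two 16-bit voxels share one 32-bit word; the material ID is the high byte of each voxel.
uint32_t readVoxelMat(const std::vector<uint32_t>& words, Int3 local) {
    for (int c : local)
        if (c < 0 || c >= kChunkSize) return 0;  // outside the chunk: treated as air
    uint32_t flat = flatIndex(local);
    uint32_t shift = (flat & 1u) * 16u;
    uint32_t voxel = (words[flat >> 1] >> shift) & 0xFFFFu;
    return voxel >> 8;
}

// Hypothetical helper for the example: stores a material ID, clearing the voxel's low byte.
void writeVoxelMat(std::vector<uint32_t>& words, Int3 local, uint32_t mat) {
    uint32_t flat = flatIndex(local);
    uint32_t shift = (flat & 1u) * 16u;
    uint32_t& word = words[flat >> 1];
    word = (word & ~(0xFFFFu << shift)) | ((mat & 0xFFu) << (shift + 8u));
}

// Stair priority: the block at pos + edgeDir + normalDir wins if it is solid,
// otherwise fall back to the in-plane neighbor at pos + edgeDir.
uint32_t getNeighborMat(const std::vector<uint32_t>& words, Int3 pos, Int3 edgeDir, Int3 normalDir) {
    uint32_t stairMat = readVoxelMat(words, add(add(pos, edgeDir), normalDir));
    if (stairMat > 0) return stairMat;
    return readVoxelMat(words, add(pos, edgeDir));
}
```

For a top face (`normalDir = {0, 1, 0}`) this means a block sitting one step up on the neighboring column takes precedence over the block level with the current voxel.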
 ### Phase 4 - Topping [TODO]
--- a/blending_experiments.md
+++ b/blending_experiments.md
@ -0,0 +1,101 @@
 # Experiments -- Texture Blending (Phase 3)
 ## Context
 - Prototype voxel engine based on Wicked Engine (DX12)
 - Goal: organic transitions between adjacent voxel materials (grass/dirt/stone/sand/snow)
 - Chosen approach: PS-based voxel data lookup (the pixel shader reads the voxel data directly to determine neighbor materials)
 ---
 ## Phase 3 v1 -- Blend pre-encoded in the quads (abandoned)
 - **Approach**: encode `blendMatID` (8 bits) + `blendEdges` (4 bits) in each `PackedQuad` during GPU meshing
 - **Problem 1**: limited to a single blend material per quad (no support for 2 independent axes)
 - **Problem 2**: on stairs, the material of the block underneath (dirt under grass) "bled" upward
 - **Problem 3**: at tri-material junctions, the seams were very visible
 - **Decision**: abandon this approach in favor of a per-pixel lookup in the PS
 ---
 ## Phase 3 v2 -- PS-based neighbor lookup
 ### Iteration 1 -- Linear blend + heightmap boundary shift
 - **Approach**: `lerp(main, neigh, weight)` with weight 0->1; the heightmap shifts the boundary by +/-0.08 voxels
 - **Blend zone**: 0.45 (90% of the face covered)
 - **Result**: massive artifacts -- the overly wide zone blended with underground blocks (dirt under grass). The asymmetric heightmap shift created discontinuities at the boundary.
 ### Iteration 2 -- Cap the weight at 0.5
 - **Fix**: `weight *= 0.5` to guarantee continuity (`lerp(A,B,0.5) == lerp(B,A,0.5)`)
 - **Result**: seam still too visible -- the heightmap modulation broke the symmetry (side A: `heightBlend = f(hA-hB)`, side B: `heightBlend = f(hB-hA)`, opposite results)
 ### Iteration 3 -- Heightmap as boundary displacement (not modulation of the amount)
 - **Fix**: heightmap shift added to the distance (`uDist + heightShift`), not to the weight
 - **Result**: checkerboard artifacts -- the shift moved the boundary erratically because the triplanar heightmaps gave inconsistent values between adjacent faces
 ### Iteration 4 -- Radical simplification (pure linear gradient)
 - **Approach**: remove EVERYTHING (heightmap, noise, corner attenuation). Just `lerp(main, neigh, weight)` with weight 0->0.5.
 - **Blend zone** reduced to 0.25 (50% of the face)
 - Added a debug mode (F4) to visualize blend zones (red=U, blue=V, green=no blend)
 - **Result**: **it works!** Smooth, continuous gradient, no artifacts. The debug mode confirmed that the voxel data was read correctly (no red data-mismatch pixels).
 - **Conclusion**: the problem was not the data but the transformations applied to it.
 ### Iteration 5 -- Corner attenuation
 Three methods tested with a selection UI (F5 cycles, F6/F7 adjust the param):
 #### Mode 0 -- Threshold Fade
 - **Formula**: `cornerFade = saturate(otherDist / param)` (default param: 0.15)
 - Linear fade within `param` voxels of the corner
 - **Result**: corners too visible, abrupt transition
 #### Mode 1 -- Subtractive (Unity reference) -- RETAINED
 - **Formula**: `xDist_adj = xEdge - saturate(yEdge - param)` (default param: 0.60, optimal: 0.80)
 - When the other axis exceeds `param` (close to its edge), it subtracts from this axis
 - **Result**: **the most natural** -- the attenuation is gradual and does not create a distinct corner shape
 #### Mode 2 -- Smoothstep
 - **Formula**: `cornerFade = smoothstep(0, param, otherDist)` (default param: 0.15)
 - S-curve instead of linear
 - **Result**: similar to the threshold but slightly softer, corners still somewhat visible
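The subtractive mode (Mode 1) can be reproduced on the CPU to see how the per-axis weight behaves at a corner. The sketch below follows the pixel shader's formulas with the final settings (param=0.80, blend zone 0.25); the function names `edgeProximity` and `blendWeight` are hypothetical, and `saturate` is a local stand-in for the HLSL intrinsic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Local stand-in for HLSL saturate(): clamp to [0, 1].
float saturate(float v) { return std::clamp(v, 0.0f, 1.0f); }

// faceFrac in [0, 1): fractional position across the voxel face along one axis.
// Returns 0 at the face center and 1 at either edge.
float edgeProximity(float faceFrac) { return std::fabs(faceFrac - 0.5f) * 2.0f; }

// Blend weight for one axis, attenuated by how close the OTHER axis is to its edge.
// Result is in [0, 0.5]: 0 outside the blend zone, 0.5 exactly on the edge.
float blendWeight(float thisEdge, float otherEdge,
                  float cornerParam = 0.80f, float blendZone = 0.25f) {
    float blendStart = 1.0f - blendZone * 2.0f;  // 0.5 for a 0.25-voxel zone
    float adjusted = thisEdge - saturate(otherEdge - cornerParam);
    return saturate((adjusted - blendStart) / (1.0f - blendStart)) * 0.5f;
}
```

On an edge far from any corner, `blendWeight(1.0f, 0.0f)` gives the full 0.5. In the exact corner, `blendWeight(1.0f, 1.0f)` drops to about 0.3, because the other axis subtracts 0.2 before the weight is normalized.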
 ### Iteration 6 -- Winner-takes-all heightmap blending
 - Dropped `lerp(main, neigh, weight)` (smooth/muddy gradient)
 - New: score comparison, `mainScore = h_main + bias` vs `neighScore = h_neigh - bias`
 - `bias = 0.5 - weight`: far from the edge bias=0.5 (main always wins), at the edge bias=0 (the heightmap decides)
 - `blend = saturate((neighScore - mainScore) * sharpness + 0.5)` with sharpness=16
 - **Bug fixed**: the initial bias was asymmetric (`main + (0.5-w)` vs `neigh + w`), giving the neighbor a +0.5 advantage at the edge. Fix: symmetric bias `main + bias` / `neigh - bias`.
 - **Result**: **sharp but organic transitions** -- the shape of the transition is drawn by the heightmaps, not by a linear gradient
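To make the symmetric-bias fix concrete, the sketch below evaluates both score formulas on the CPU. It assumes heightmap values in [0, 1] and a weight in [0, 0.5] as produced by the blend-zone computation; the function names are hypothetical, and `saturate` is a local stand-in for the HLSL intrinsic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Local stand-in for HLSL saturate(): clamp to [0, 1].
float saturate(float v) { return std::clamp(v, 0.0f, 1.0f); }

// Fraction of the neighbor material to show (0 = main only, 1 = neighbor only).
float winnerTakesAll(float hMain, float hNeigh, float weight, float sharpness = 16.0f) {
    float bias = 0.5f - weight;  // 0.5 far from the edge, 0 on the edge
    float mainScore = hMain + bias;
    float neighScore = hNeigh - bias;
    return saturate((neighScore - mainScore) * sharpness + 0.5f);
}

// The earlier asymmetric variant described above, kept only for comparison.
float winnerTakesAllAsymmetric(float hMain, float hNeigh, float weight, float sharpness = 16.0f) {
    float mainScore = hMain + (0.5f - weight);
    float neighScore = hNeigh + weight;  // neighbor gains +0.5 at the edge
    return saturate((neighScore - mainScore) * sharpness + 0.5f);
}
```

With equal heights on the border (`weight = 0.5`), the symmetric version returns 0.5 while the asymmetric one saturates to 1.0, which is the neighbor advantage the fix removed. The symmetric version also yields complementary results when the two materials swap roles across the border, so both sides resolve to the same color there.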
 ---
 ## Final configuration retained
 | Parameter | Value |
 |-----------|--------|
 | Blend zone | 0.25 voxels from each edge |
 | Corner attenuation | Subtractive with param=0.80 |
 | Blending | Winner-takes-all heightmap (sharpness=16) |
 | Bias | Symmetric: `bias = 0.5 - weight` |
 | Main score | `mainScore = h_main + bias` |
 | Neighbor score | `neighScore = h_neigh - bias` |
 | Neighbor | Stair priority (`pos + edgeDir + normalDir` first, then fallback `pos + edgeDir`) |
 | Debug mode | F4: visualization of blend zones |
 ---
 ## Lessons learned
 1. **Start simple**: the pure linear gradient made it possible to validate that the data was correct before adding complexity
 2. **The debug mode is indispensable**: F4 immediately confirmed that `readVoxelMat` was working correctly
 3. **Symmetry is critical**: any asymmetric computation between the two sides of a boundary creates a visible discontinuity
 4. **The heightmap modulates WHERE, not HOW MUCH**: shifting the boundary (shift) rather than modulating the weight (multiply) is more stable, but winner-takes-all is better still
 5. **The blend zone must be small**: 0.25 (50% of the face) vs 0.45 (90%) makes a huge difference in quality
--- a/shaders/voxelCommon.hlsli
+++ b/shaders/voxelCommon.hlsli
@ -43,7 +43,8 @@ cbuffer VoxelCB : register(b0) {
    float4 sunColor;
    float chunkSize;
    float textureTiling;
-    float2 _pad;
+    float blendEnabled;
    float debugBlend; // >0.5 = show blend zones as debug colors
    // Frustum culling data (used by cull compute shader)
    float4 frustumPlanes[6]; // ax+by+cz+d=0, xyz=normal, w=distance
    uint chunkCount;
--- a/shaders/voxelMeshCS.hlsl
+++ b/shaders/voxelMeshCS.hlsl
@ -1,6 +1,7 @@
 // BVLE Voxels - GPU Compute Mesher (Binary Face Culling only)
 // 1 thread per voxel: checks 6 neighbors, emits 1x1 PackedQuad per visible face.
-// No greedy merge — this is the simple GPU baseline for benchmark comparison.
+// No greedy merge — this is the simple GPU baseline.
 // Phase 3: blend info is computed per-pixel in the PS (not pre-encoded here).
 #include "voxelCommon.hlsli"
@ -44,10 +45,10 @@ bool isNeighborAir(int3 pos, int3 dir) {
 }
 // Pack a quad into uint2 (matches CPU PackedQuad format)
-// chunkIdx is stored in the flags field [63:49] = hi bits [31:17] for VS lookup
+// chunkIdx is stored in bits [27:17] of hi word for VS lookup
 uint2 packQuad(uint x, uint y, uint z, uint w, uint h, uint face, uint matID, uint chunkIdx) {
    uint lo = x | (y << 6) | (z << 12) | (w << 18) | (h << 24) | (face << 30);
-    uint hi = (face >> 2) | (matID << 1) | (0 << 9) | ((chunkIdx & 0x7FF) << 17);
+    uint hi = (face >> 2) | (matID << 1) | ((chunkIdx & 0x7FF) << 17);
    return uint2(lo, hi);
 }
--- a/shaders/voxelPS.hlsl
+++ b/shaders/voxelPS.hlsl
@ -1,10 +1,16 @@
-// BVLE Voxels - Pixel Shader (Triplanar textured with simple lighting)
+// BVLE Voxels - Pixel Shader (Triplanar textured with PS-based height blending)
 // Phase 3 v2: reads voxel data directly in PS for neighbor material lookups.
 // Two independent blend axes (U/V), corner attenuation, winner-takes-all heightmap.
 #include "voxelCommon.hlsli"
 Texture2DArray materialTextures : register(t1);
 SamplerState materialSampler : register(s0);
 // Voxel data buffer (same as compute mesher uses) — bound at t3 in GPU mesh path
 StructuredBuffer<uint> voxelData : register(t3);
 StructuredBuffer<GPUChunkInfo> chunkInfoBuffer : register(t2);
 struct PSInput {
    float4 position : SV_POSITION;
    float3 worldPos : WORLDPOS;
@ -13,27 +19,108 @@ struct PSInput {
    nointerpolation uint materialID : MATERIALID;
    nointerpolation uint faceID : FACEID;
    nointerpolation float debugFlag : DEBUGFLAG;
-    float ao        : AO;
+    nointerpolation uint chunkIndex : CHUNKINDEX;
 };
-// Triplanar blend weights
+// ── Constants ──────────────────────────────────────────────────────
 static const uint CSIZE = 32;
 static const uint CVOL = CSIZE * CSIZE * CSIZE;
 // Face normals: +X, -X, +Y, -Y, +Z, -Z
 static const int3 faceNormals[6] = {
    int3( 1, 0, 0), int3(-1, 0, 0),
    int3( 0, 1, 0), int3( 0,-1, 0),
    int3( 0, 0, 1), int3( 0, 0,-1)
 };
 // Face tangent axes (U, V) — must match voxelVS.hlsl faceU/faceV
 static const int3 faceUDirs[6] = {
    int3(0, 1, 0), int3(0, 1, 0),
    int3(1, 0, 0), int3(1, 0, 0),
    int3(1, 0, 0), int3(1, 0, 0)
 };
 static const int3 faceVDirs[6] = {
    int3(0, 0, 1), int3(0, 0, 1),
    int3(0, 0, 1), int3(0, 0, 1),
    int3(0, 1, 0), int3(0, 1, 0)
 };
 // ── Voxel data read helpers ────────────────────────────────────────
 // Read material ID from voxel data (16-bit voxels packed as uint16 pairs)
 // Returns high 8 bits = material ID, 0 = air
 uint readVoxelMat(int3 coord, uint chunkIdx) {
    // Compute chunk-local coords and check bounds
    GPUChunkInfo info = chunkInfoBuffer[chunkIdx];
    float3 chunkOrigin = info.worldPos.xyz;
    // coord is in world voxel space — convert to chunk-local
    int3 local = coord - (int3)chunkOrigin;
    // Out of this chunk's bounds → treat as air (no cross-chunk lookup for now)
    if (any(local < 0) || any(local >= (int3)CSIZE))
        return 0;
    uint flatIdx = (uint)local.x + (uint)local.y * CSIZE + (uint)local.z * CSIZE * CSIZE;
    uint pairIndex = flatIdx >> 1;
    uint shift = (flatIdx & 1) * 16;
    // voxelData is laid out as: all chunks packed sequentially
    // Each chunk is CVOL/2 uints (16384 uints = 32^3 voxels / 2 per uint)
    uint bufferOffset = chunkIdx * (CVOL / 2);
    uint voxel = (voxelData[bufferOffset + pairIndex] >> shift) & 0xFFFF;
    return voxel >> 8; // high 8 bits = material ID
 }
 // Get neighbor material with stair priority:
 // Check pos + edgeDir + normalDir FIRST (the stair block that visually masks the edge),
 // then fallback to pos + edgeDir if stair is air.
 uint getNeighborMat(int3 voxelCoord, int3 edgeDir, int3 normalDir, uint chunkIdx) {
    // Stair neighbor (priority): the block that sits at the edge AND is offset by the normal
    int3 stairPos = voxelCoord + edgeDir + normalDir;
    uint stairMat = readVoxelMat(stairPos, chunkIdx);
    if (stairMat > 0)
        return stairMat;
    // Planar neighbor (fallback): the adjacent block in the face plane
    int3 planarPos = voxelCoord + edgeDir;
    return readVoxelMat(planarPos, chunkIdx);
 }
 // ── Noise for transition variation ─────────────────────────────────
 float hash31(float3 p) {
    float3 q = frac(p * float3(127.1, 311.7, 74.7));
    q += dot(q, q.yzx + 33.33);
    return frac((q.x + q.y) * q.z);
 }
 // ── Triplanar helpers ──────────────────────────────────────────────
 float3 triplanarWeights(float3 normal, float sharpness) {
    float3 w = abs(normal);
    w = pow(w, (float3)sharpness);
    return w / (w.x + w.y + w.z + 0.0001);
 }
 // Triplanar sampling — RGB only (non-blended path)
 float3 sampleTriplanar(float3 worldPos, float3 normal, uint texIndex, float tiling) {
    float3 w = triplanarWeights(normal, 4.0);
    float3 colX = materialTextures.Sample(materialSampler, float3(worldPos.yz * tiling, (float)texIndex)).rgb;
    float3 colY = materialTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex)).rgb;
    float3 colZ = materialTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex)).rgb;
    return colX * w.x + colY * w.y + colZ * w.z;
 }
-// Debug face colors
+// Triplanar sampling — RGBA (includes heightmap in alpha)
 float4 sampleTriplanarRGBA(float3 worldPos, float3 normal, uint texIndex, float tiling) {
    float3 w = triplanarWeights(normal, 4.0);
    float4 colX = materialTextures.Sample(materialSampler, float3(worldPos.yz * tiling, (float)texIndex));
    float4 colY = materialTextures.Sample(materialSampler, float3(worldPos.xz * tiling, (float)texIndex));
    float4 colZ = materialTextures.Sample(materialSampler, float3(worldPos.xy * tiling, (float)texIndex));
    return colX * w.x + colY * w.y + colZ * w.z;
 }
 // ── Debug face colors ──────────────────────────────────────────────
 static const float3 faceDebugColors[6] = {
    float3(1.0, 0.2, 0.2),   // 0: +X = RED
    float3(0.5, 0.0, 0.0),   // 1: -X = DARK RED
@ -43,6 +130,8 @@ static const float3 faceDebugColors[6] = {
    float3(0.0, 0.0, 0.5),   // 5: -Z = DARK BLUE
 };
 // ── Main PS ────────────────────────────────────────────────────────
 [RootSignature(VOXEL_ROOTSIG)]
 float4 main(PSInput input) : SV_TARGET0
 {
@ -57,23 +146,137 @@ float4 main(PSInput input) : SV_TARGET0
        return float4(faceColor, 1.0);
    }
-    // ── NORMAL MODE: triplanar textured ──
+    // ── NORMAL MODE: triplanar textured with height-based blending ──
    float3 N = normalize(input.normal);
    float3 L = normalize(-sunDirection.xyz);
    float NdotL = max(dot(N, L), 0.0);
    float3 baseColor = N * 0.5 + 0.5;
    uint texIndex = clamp(input.materialID - 1u, 0u, 4u);
    float tiling = textureTiling;
    float3 albedo;
    // ── Height-based blending via PS voxel data lookup ──
    if (blendEnabled > 0.5 && input.materialID > 0u)
    {
        uint face = min(input.faceID, 5u);
        int3 normalDir = faceNormals[face];
        int3 uDir = faceUDirs[face];
        int3 vDir = faceVDirs[face];
        // Compute voxel coordinate from world position
        // Offset inward by normal * 0.001 to handle positive faces at integer boundaries
        float3 samplePos = input.worldPos - (float3)normalDir * 0.001;
        int3 voxelCoord = (int3)floor(samplePos);
        // Fractional position within the voxel face
        // Use worldPos directly (chunk origin is integer-aligned, so frac is same)
        float faceFracU = frac(dot(input.worldPos, (float3)uDir));
        float faceFracV = frac(dot(input.worldPos, (float3)vDir));
        // Distance from nearest edge (0 = at edge, 0.5 = at center)
        float uDist = 0.5 - abs(faceFracU - 0.5);
        float vDist = 0.5 - abs(faceFracV - 0.5);
        // Nearest edge direction: which side of the voxel face is this pixel closer to?
        int uSign = (faceFracU >= 0.5) ? 1 : -1;
        int vSign = (faceFracV >= 0.5) ? 1 : -1;
        int3 uEdgeDir = uDir * uSign;
        int3 vEdgeDir = vDir * vSign;
        // Get neighbor materials (with stair priority)
        uint uNeighborMat = getNeighborMat(voxelCoord, uEdgeDir, normalDir, input.chunkIndex);
        uint vNeighborMat = getNeighborMat(voxelCoord, vEdgeDir, normalDir, input.chunkIndex);
        // Blend zone: 0.25 voxels from each edge (covers 50% of face total)
        float blendZone = 0.25;
        // Edge distances normalized to 0..1 (0=center, 1=edge) for corner attenuation
        float uEdge = abs(faceFracU - 0.5) * 2.0; // 0 at center, 1 at edge
        float vEdge = abs(faceFracV - 0.5) * 2.0;
        // Corner attenuation — Subtractive (Unity reference style)
        // When one axis is very close to its edge (>0.80), it subtracts from the other axis
        float blendStart = 1.0 - blendZone * 2.0;
        float uAdj = uEdge - saturate(vEdge - 0.80);
        float vAdj = vEdge - saturate(uEdge - 0.80);
        float uWeight = saturate((uAdj - blendStart) / (1.0 - blendStart)) * 0.5;
        float vWeight = saturate((vAdj - blendStart) / (1.0 - blendStart)) * 0.5;
        // Only blend if neighbor has a different material
        bool uBlend = (uNeighborMat > 0u && uNeighborMat != input.materialID && uWeight > 0.001);
        bool vBlend = (vNeighborMat > 0u && vNeighborMat != input.materialID && vWeight > 0.001);
        // ── DEBUG BLEND MODE (F4): show blend zones as colors ──
        if (debugBlend > 0.5) {
            float3 debugColor = float3(0.3, 0.3, 0.3); // gray base; green is raised below when no axis blends
            uint selfMat = readVoxelMat(voxelCoord, input.chunkIndex);
            if (selfMat != input.materialID) {
                return float4(1, 0, 0, 1); // RED = data mismatch bug
            }
            if (uBlend) debugColor.r = uWeight * 2.0;
            if (vBlend) debugColor.b = vWeight * 2.0;
            if (!uBlend && !vBlend) debugColor.g = 0.5;
            return float4(debugColor, 1.0);
        }
        if (uBlend || vBlend) {
            // Sample main material (RGBA: rgb=color, a=heightmap)
            float4 mainTex = sampleTriplanarRGBA(input.worldPos, N, texIndex, tiling);
            float3 result = mainTex.rgb;
            // Winner-takes-all height blending:
            // Each material's "score" = its heightmap + a proximity bias.
            // Near the edge (weight→0.5), the bias vanishes → heightmap decides.
            // Far from the edge (weight→0), main gets a large bias → always wins.
            // The highest score wins 100%: the transition is SHARP but its shape is organic.
            // A finite sharpness factor softens the very edge to avoid aliasing.
            float sharpness = 16.0; // higher = sharper transition (∞ = binary)
            if (uBlend) {
                uint uTexIdx = clamp(uNeighborMat - 1u, 0u, 4u);
                float4 uTex = sampleTriplanarRGBA(input.worldPos, N, uTexIdx, tiling);
                // Symmetric proximity bias: at edge (weight=0.5) bias=0 → pure heightmap.
                // Away from edge (weight=0) bias=0.5 → main always wins.
                float bias = 0.5 - uWeight;
                float mainScore  = mainTex.a + bias;
                float neighScore = uTex.a   - bias;
                float blend = saturate((neighScore - mainScore) * sharpness + 0.5);
                result = lerp(result, uTex.rgb, blend);
            }
            if (vBlend) {
                uint vTexIdx = clamp(vNeighborMat - 1u, 0u, 4u);
                float4 vTex = sampleTriplanarRGBA(input.worldPos, N, vTexIdx, tiling);
                float bias = 0.5 - vWeight;
                float mainScore  = mainTex.a + bias;
                float neighScore = vTex.a   - bias;
                float blend = saturate((neighScore - mainScore) * sharpness + 0.5);
                result = lerp(result, vTex.rgb, blend);
            }
            albedo = result;
        } else {
            albedo = sampleTriplanar(input.worldPos, N, texIndex, tiling);
        }
    }
    else
    {
        float3 baseColor = N * 0.5 + 0.5;
        float3 texColor = sampleTriplanar(input.worldPos, N, texIndex, tiling);
        albedo = (input.materialID > 0u) ? texColor : baseColor;
    }
-    float3 albedo = (input.materialID > 0u) ? texColor : baseColor;
+    // ── Lighting ──
    float3 ambient = float3(0.15, 0.18, 0.25);
    float3 diffuse = sunColor.rgb * NdotL;
-    float3 color = albedo * (ambient + diffuse) * input.ao;
+    float3 color = albedo * (ambient + diffuse);
    // ── Distance fog ──
    float dist = length(input.worldPos - cameraPosition.xyz);
    float fog = 1.0 - exp(-dist * 0.003);
    float3 fogColor = float3(0.55, 0.70, 0.90);
--- a/shaders/voxelVS.hlsl
+++ b/shaders/voxelVS.hlsl
@ -1,5 +1,6 @@
 // BVLE Voxels - Vertex Shader (Vertex Pulling from mega-buffer)
-// Phase 2: supports both CPU draw loop (push constants) and GPU MDI (binary search).
+// Phase 2: supports CPU draw loop, GPU MDI, and GPU mesh modes.
 // Phase 3: passes chunkIndex to PS for voxel data neighbor lookups.
 #include "voxelCommon.hlsli"
@ -11,12 +12,10 @@ StructuredBuffer<PackedQuad> quadBuffer : register(t0);
 StructuredBuffer<GPUChunkInfo> chunkInfoBuffer : register(t2);
 // Push constants (48 bytes = 12 x uint32)
 //   CPU path: chunkIndex + quadOffset explicit
 //   MDI path: flags bit 0 set, VS derives chunk from SV_VertexID via binary search
 struct VoxelPush {
    uint chunkIndex;
    uint quadOffset;   // offset into mega quad buffer (in quads)
-    uint flags;        // bit 0: 1 = MDI mode (binary search), 0 = CPU mode
+    uint flags;        // bit 0: 1 = MDI mode, bit 1: GPU mesh mode
    uint pad0, pad1, pad2, pad3, pad4, pad5, pad6, pad7, pad8;
 };
 [[vk::push_constant]] ConstantBuffer<VoxelPush> push : register(b999);
@ -29,13 +28,12 @@ struct VSOutput {
    nointerpolation uint materialID : MATERIALID;
    nointerpolation uint faceID : FACEID;
    nointerpolation float debugFlag : DEBUGFLAG;
-    float ao        : AO;
+    nointerpolation uint chunkIndex : CHUNKINDEX;
 };
 // Unpack 64 bits from 2 x uint32
 void unpackQuad(uint2 raw, out uint px, out uint py, out uint pz,
-                out uint w, out uint h, out uint face,
-                out uint matID, out uint ao)
+                out uint w, out uint h, out uint face, out uint matID)
 {
    uint lo = raw.x;
    uint hi = raw.y;
@ -46,12 +44,9 @@ void unpackQuad(uint2 raw, out uint px, out uint py, out uint pz,
    h     = (lo >> 24) & 0x3F;
    face  = ((lo >> 30) & 0x3) | ((hi & 0x1) << 2);
    matID = (hi >> 1) & 0xFF;
-    ao    = (hi >> 9) & 0xFF;
 }
 // Binary search: find which chunk owns a given global quad index.
 // Chunks are packed contiguously in the mega-buffer, sorted by chunk index.
 // O(log2(chunkCount)) = ~11 iterations for 2048 chunks.
 uint findChunkIndex(uint globalQuadIndex) {
    uint lo = 0, hi = chunkCount;
    [loop]
@ -91,28 +86,21 @@ VSOutput main(uint vertexID : SV_VertexID)
 {
    VSOutput output;
    // Determine quad index and chunk index based on rendering mode
    uint quadIndex;
    uint chunkIndex = 0;
    if (push.flags & 2) {
-        // GPU mesh path: quads are in a flat buffer, chunk index is embedded
-        // in each quad's flags field (bits [31:17] of hi word = 11-bit chunk index).
+        // GPU mesh path
        // push.quadOffset = base offset into the GPU quad buffer.
        quadIndex = push.quadOffset + (vertexID / 6);
    } else if (push.flags & 1) {
-        // MDI path: push.chunkIndex is packed by ExecuteIndirect command signature:
+        // MDI path
        //   low 16 bits  = chunk index into chunkInfoBuffer
        //   high 16 bits = face index (0-5)
        // SV_VertexID starts at 0 (startVertexLocation=0), so we compute the
        // global quad index from the GPUChunkInfo face offset.
        chunkIndex = push.chunkIndex & 0xFFFF;
        uint faceIdx = push.chunkIndex >> 16;
        GPUChunkInfo ci = chunkInfoBuffer[chunkIndex];
        uint faceOff = getFaceOffset(ci, faceIdx);
        quadIndex = ci.quadOffset + faceOff + (vertexID / 6);
    } else {
-        // CPU path: push constants provide explicit offsets
+        // CPU path
        quadIndex = push.quadOffset + (vertexID / 6);
        chunkIndex = push.chunkIndex;
    }
@ -120,10 +108,10 @@ VSOutput main(uint vertexID : SV_VertexID)
    uint cornerIndex = vertexID % 6;
    PackedQuad packed = quadBuffer[quadIndex];
-    uint px, py, pz, w, h, face, matID, ao;
+    uint px, py, pz, w, h, face, matID;
-    unpackQuad(packed.data, px, py, pz, w, h, face, matID, ao);
+    unpackQuad(packed.data, px, py, pz, w, h, face, matID);
-    // GPU mesh path: extract chunk index from quad flags field (bits [31:17] of hi word)
+    // GPU mesh path: extract chunk index from quad data bits [27:17] of hi word
    if (push.flags & 2) {
        chunkIndex = (packed.data.y >> 17) & 0x7FF;
    }
@ -131,8 +119,6 @@ VSOutput main(uint vertexID : SV_VertexID)
    GPUChunkInfo info = chunkInfoBuffer[chunkIndex];
    // Corner offsets for 2 triangles (6 vertices per quad)
    // cross(U,V) matches N for faces: +X(0), -Y(3), +Z(4) -> CW corners
    // cross(U,V) opposes N for faces: -X(1), +Y(2), -Z(5) -> CCW corners
    static const float2 cornersCW[6] = {
        float2(0, 0), float2(0, 1), float2(1, 0),
        float2(1, 0), float2(0, 1), float2(1, 1)
@ -163,11 +149,7 @@ VSOutput main(uint vertexID : SV_VertexID)
    output.materialID = matID;
    output.faceID = face;
    output.debugFlag = info.worldPos.w;
-
-    // AO: 4 corners x 2 bits
-    uint aoCorner = min(cornerIndex, 3u);
-    float aoValue = (float)((ao >> (aoCorner * 2u)) & 3u) / 3.0;
-    output.ao = 1.0 - aoValue * 0.4;
+    output.chunkIndex = chunkIndex;
    return output;
 }
--- a/src/voxel/VoxelRenderer.cpp
+++ b/src/voxel/VoxelRenderer.cpp
@ -184,9 +184,10 @@ void VoxelRenderer::createPipeline() {
 static void generateNoiseTexture(uint8_t* pixels, int w, int h,
    uint8_t r0, uint8_t g0, uint8_t b0,
    uint8_t r1, uint8_t g1, uint8_t b1,
-    uint32_t seed)
+    uint32_t seed, float heightFreq = 1.0f, float heightContrast = 1.0f)
 {
    uint32_t s = seed;
    uint32_t s2 = seed * 7919u + 104729u; // separate seed for heightmap
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            s = s * 1664525u + 1013904223u;
@ -201,7 +202,15 @@ static void generateNoiseTexture(uint8_t* pixels, int w, int h,
            pixels[idx + 0] = (uint8_t)(r0 + (r1 - r0) * t);
            pixels[idx + 1] = (uint8_t)(g0 + (g1 - g0) * t);
            pixels[idx + 2] = (uint8_t)(b0 + (b1 - b0) * t);
-            pixels[idx + 3] = 255;
+
            // Heightmap in alpha: separate noise for height-based material blending
            s2 = s2 * 1664525u + 1013904223u;
            float hn = (float)(s2 & 0xFFFF) / 65535.0f;
            float hPattern = 0.5f + 0.5f * std::sin(fx * 12.0f * heightFreq + hn * 2.0f) *
                             std::cos(fy * 12.0f * heightFreq + hn * 2.0f);
            float heightVal = hn * 0.5f + hPattern * 0.5f;
            heightVal = std::clamp(heightVal * heightContrast, 0.0f, 1.0f);
            pixels[idx + 3] = (uint8_t)(heightVal * 255.0f);
        }
    }
 }
@ -212,13 +221,18 @@ void VoxelRenderer::generateTextures() {
    std::vector<uint8_t> allPixels(TEX_SIZE * TEX_SIZE * 4 * NUM_MATERIALS);
-    struct MatColor { uint8_t r0,g0,b0, r1,g1,b1; uint32_t seed; };
+    struct MatColor {
        uint8_t r0,g0,b0, r1,g1,b1;
        uint32_t seed;
        float heightFreq;     // heightmap noise frequency
        float heightContrast; // heightmap contrast (higher = more defined peaks)
    };
    MatColor colors[NUM_MATERIALS] = {
-        { 60, 140, 40,   80, 180, 60,  101 },  // Grass
+        { 60, 140, 40,   80, 180, 60,  101, 1.5f, 0.8f },  // Grass: medium bumps
-        { 100, 70, 40,  140, 100, 60,  202 },  // Dirt
+        { 100, 70, 40,  140, 100, 60,  202, 0.8f, 0.6f },  // Dirt: smooth mounds
-        { 110, 110, 105, 140, 140, 130, 303 },  // Stone
+        { 110, 110, 105, 140, 140, 130, 303, 2.5f, 1.2f },  // Stone: rough, high peaks
-        { 200, 190, 140, 230, 220, 170, 404 },  // Sand
+        { 200, 190, 140, 230, 220, 170, 404, 3.0f, 0.4f },  // Sand: fine, uniform
-        { 220, 225, 230, 245, 248, 252, 505 },  // Snow
+        { 220, 225, 230, 245, 248, 252, 505, 1.0f, 0.5f },  // Snow: smooth, soft
    };
    for (int i = 0; i < NUM_MATERIALS; i++) {
@ -226,7 +240,8 @@ void VoxelRenderer::generateTextures() {
        generateNoiseTexture(
            allPixels.data() + i * TEX_SIZE * TEX_SIZE * 4,
            TEX_SIZE, TEX_SIZE,
-            c.r0, c.g0, c.b0, c.r1, c.g1, c.b1, c.seed
+            c.r0, c.g0, c.b0, c.r1, c.g1, c.b1, c.seed,
+            c.heightFreq, c.heightContrast
        );
    }
@ -672,7 +687,12 @@ void VoxelRenderer::render(
        cb.sunColor = XMFLOAT4(1.2f, 1.1f, 0.9f, 1.0f);
        cb.chunkSize = (float)CHUNK_SIZE;
        cb.textureTiling = 0.25f;
+        cb.blendEnabled = 1.0f; // Phase 3: PS-based blending enabled in GPU mesh path
+        cb.debugBlend = debugBlend_ ? 1.0f : 0.0f;
        cb.chunkCount = chunkCount_;
+        cb._cullPad0 = 0;
+        cb._cullPad1 = 0;
+        cb._cullPad2 = 0;
        dev->UpdateBuffer(&constantBuffer_, &cb, cmd, sizeof(cb));
        // Render pass
@ -710,6 +730,7 @@ void VoxelRenderer::render(
        dev->BindResource(&gpuQuadBuffer_, 0, cmd);  // GPU quads, not mega-buffer
        dev->BindResource(&textureArray_, 1, cmd);
        dev->BindResource(&chunkInfoBuffer_, 2, cmd);
+        dev->BindResource(&voxelDataBuffer_, 3, cmd); // Phase 3: voxel data for PS neighbor lookups
        dev->BindSampler(&sampler_, 0, cmd);
        // GPU mesh mode: flags=2, MUST be after BindPipelineState
@ -754,6 +775,11 @@ void VoxelRenderer::render(
    cb.sunColor = XMFLOAT4(1.2f, 1.1f, 0.9f, 1.0f);
    cb.chunkSize = (float)CHUNK_SIZE;
    cb.textureTiling = 0.25f;
+    cb.blendEnabled = 0.0f; // Phase 3: blending disabled in CPU/MDI paths (no voxel data SRV)
+    cb.debugBlend = 0.0f;
+    cb._cullPad0 = 0;
+    cb._cullPad1 = 0;
+    cb._cullPad2 = 0;
    cb.chunkCount = chunkCount_;
    extractFrustumPlanes(vpMatrix, cb.frustumPlanes);
    dev->UpdateBuffer(&constantBuffer_, &cb, cmd, sizeof(cb));
@ -1173,6 +1199,20 @@ static constexpr wi::input::BUTTON KEY_S = (wi::input::BUTTON)(wi::input::CHARAC
 static constexpr wi::input::BUTTON KEY_D = (wi::input::BUTTON)(wi::input::CHARACTER_RANGE_START + ('D' - 'A'));
 void VoxelRenderPath::handleInput(float dt) {
+    // F2: toggle backlog console
+    if (wi::input::Press(wi::input::KEYBOARD_BUTTON_F2)) {
+        wi::backlog::Toggle();
+    }
+    // F3: toggle animated terrain
+    if (wi::input::Press(wi::input::KEYBOARD_BUTTON_F3)) {
+        animatedTerrain_ = !animatedTerrain_;
+        wi::backlog::post(animatedTerrain_ ? "Animation: ON (60 Hz)" : "Animation: OFF");
+    }
+    // F4: toggle blend debug visualization
+    if (wi::input::Press(wi::input::KEYBOARD_BUTTON_F4)) {
+        renderer.debugBlend_ = !renderer.debugBlend_;
+        wi::backlog::post(renderer.debugBlend_ ? "Blend debug: ON" : "Blend debug: OFF");
+    }
    if (wi::input::Press(wi::input::MOUSE_BUTTON_RIGHT)) {
        mouseCaptured = !mouseCaptured;
        wi::input::HidePointer(mouseCaptured);
@ -1352,7 +1392,7 @@ void VoxelRenderPath::Compose(CommandList cmd) const {
    char dtStr[16];
    snprintf(dtStr, sizeof(dtStr), "%.2f", lastDt_ * 1000.0f);
-    std::string stats = "BVLE Voxel Engine (Phase 2 — GPU-driven)\n";
+    std::string stats = "BVLE Voxel Engine (Phase 3 — Texture Blending)\n";
    stats += "FPS: " + std::string(fpsStr) + " (" + std::string(dtStr) + " ms)\n";
    if (debugMode) {
        stats += "=== DEBUG FACE MODE ===\n";
@ -1381,7 +1421,9 @@ void VoxelRenderPath::Compose(CommandList cmd) const {
        snprintf(drawStr, sizeof(drawStr), "%.3f", renderer.getGpuDrawTimeMs());
        stats += "GPU Cull: " + std::string(cullStr) + " ms | Draw: " + std::string(drawStr) + " ms\n";
    }
-    stats += "WASD+Space/Ctrl: move | Shift: fast | Right-click: capture mouse";
+    stats += "WASD+Space/Ctrl: move | Shift: fast | Right-click: capture mouse\n";
+    stats += "F2: console | F3: anim [" + std::string(animatedTerrain_ ? "ON" : "OFF")
+        + "] | F4: dbg [" + std::string(renderer.debugBlend_ ? "ON" : "OFF") + "]";
    wi::font::Draw(stats, fp, cmd);
 }
--- a/src/voxel/VoxelRenderer.h
+++ b/src/voxel/VoxelRenderer.h
@ -58,6 +58,7 @@ public:
    bool isMdiEnabled() const { return mdiEnabled_; }
    bool debugFaceColors_ = false;
+    bool debugBlend_ = false;
 private:
    void createPipeline();
@ -120,7 +121,8 @@ private:
        XMFLOAT4 sunColor;
        float chunkSize;
        float textureTiling;
-        float _pad[2];
+        float blendEnabled;
+        float debugBlend;
        XMFLOAT4 frustumPlanes[6]; // ax+by+cz+d=0
        uint32_t chunkCount;
        uint32_t _cullPad0;
@ -213,8 +215,8 @@ private:
    mutable float lastDt_ = 0.016f;
    mutable float smoothFps_ = 60.0f;
-    // Animated terrain (wave effect at 20 Hz)
+    // Animated terrain (wave effect at 60 Hz, toggled with F3)
-    bool animatedTerrain_ = true;
+    bool animatedTerrain_ = false;
    float animTime_ = 0.0f;
    float animAccum_ = 0.0f;
    static constexpr float ANIM_INTERVAL = 1.0f / 60.0f; // ~16.7ms = 60 Hz
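The constant-buffer hunk above swaps `float _pad[2]` for two live floats, `blendEnabled` and `debugBlend`. Since HLSL constant buffers pack members into 16-byte registers, the swap only stays safe if `frustumPlanes` keeps its offset. The sketch below checks that on a reduced stand-in: `Float4` replaces `XMFLOAT4`, the struct names are hypothetical, and the fields before `sunColor` are omitted because the diff does not show them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Float4 { float x, y, z, w; }; // stand-in for DirectX::XMFLOAT4

// Tail of the constant buffer before the commit (leading fields omitted).
struct CBTailBefore {
    Float4 sunColor;
    float chunkSize;
    float textureTiling;
    float _pad[2];
    Float4 frustumPlanes[6];
    uint32_t chunkCount;
    uint32_t _cullPad0, _cullPad1, _cullPad2;
};

// Same tail after the commit: the padding slots now carry blend state.
struct CBTailAfter {
    Float4 sunColor;
    float chunkSize;
    float textureTiling;
    float blendEnabled;
    float debugBlend;
    Float4 frustumPlanes[6];
    uint32_t chunkCount;
    uint32_t _cullPad0, _cullPad1, _cullPad2;
};

static_assert(sizeof(CBTailBefore) == sizeof(CBTailAfter),
              "reusing padding must not change the buffer size");
static_assert(offsetof(CBTailBefore, frustumPlanes) == offsetof(CBTailAfter, frustumPlanes),
              "frustumPlanes must not move");
static_assert(offsetof(CBTailAfter, frustumPlanes) % 16 == 0,
              "float4 arrays must start on a 16-byte register boundary");
```

Reusing existing padding, rather than appending fields, means the shader-side cbuffer only needs the two names changed and no other offsets shift.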
--- a/src/voxel/VoxelTypes.h
+++ b/src/voxel/VoxelTypes.h
@ -34,17 +34,25 @@ static constexpr int CHUNK_SIZE = 32;
 static constexpr int CHUNK_VOLUME = CHUNK_SIZE * CHUNK_SIZE * CHUNK_SIZE;
 // ── Packed Vertex for Greedy Mesh Quads (8 bytes per quad) ──────
-// Layout per spec:
+// Layout:
-//   6 bits posX | 6 bits posY | 6 bits posZ |
+//   [5:0]   posX (6 bits)
-//   6 bits width | 6 bits height | 3 bits face |
+//   [11:6]  posY (6 bits)
-//   8 bits materialID | 8 bits AO | 15 bits flags
+//   [17:12] posZ (6 bits)
+//   [23:18] width (6 bits)
+//   [29:24] height (6 bits)
+//   [32:30] face (3 bits, split across lo/hi)
+//   [40:33] materialID (8 bits)
+//   [48:41] blendMatID (8 bits) — neighbor material for height-based blending
+//   [59:49] chunkIndex (11 bits, GPU mesh path)
+//   [63:60] blendEdges (4 bits) — which edges have the blend material (+U,-U,+V,-V)
 struct PackedQuad {
    uint64_t data;
    static PackedQuad create(
        uint8_t x, uint8_t y, uint8_t z,
        uint8_t w, uint8_t h, uint8_t face,
-        uint8_t materialID, uint8_t ao = 0, uint16_t flags = 0
+        uint8_t materialID, uint8_t blendMatID = 0,
+        uint16_t chunkIndex = 0, uint8_t blendEdges = 0
    ) {
        PackedQuad q;
        q.data =
@ -55,8 +63,9 @@ struct PackedQuad {
            (uint64_t(h & 0x3F) << 24)    |
            (uint64_t(face & 0x7) << 30)  |
            (uint64_t(materialID) << 33)  |
-            (uint64_t(ao) << 41)          |
+            (uint64_t(blendMatID) << 41)  |
-            (uint64_t(flags & 0x7FFF) << 49);
+            (uint64_t(chunkIndex & 0x7FF) << 49) |
+            (uint64_t(blendEdges & 0xF) << 60);
        return q;
    }
@ -67,8 +76,9 @@ struct PackedQuad {
    uint8_t getHeight() const { return uint8_t((data >> 24) & 0x3F); }
    uint8_t getFace() const { return uint8_t((data >> 30) & 0x7); }
    uint8_t getMaterialID() const { return uint8_t((data >> 33) & 0xFF); }
-    uint8_t getAO() const { return uint8_t((data >> 41) & 0xFF); }
+    uint8_t getBlendMatID() const { return uint8_t((data >> 41) & 0xFF); }
-    uint16_t getFlags() const { return uint16_t((data >> 49) & 0x7FFF); }
+    uint16_t getChunkIndex() const { return uint16_t((data >> 49) & 0x7FF); }
+    uint8_t getBlendEdges() const { return uint8_t((data >> 60) & 0xF); }
 };
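The new 64-bit layout can be exercised with a pack/unpack round trip. The sketch below is reconstructed from the layout comment in this hunk (the shifts for posX, posY, posZ and width are not visible in the diff and are taken from the comment). `QuadFields`, `sketchPack`, `sketchUnpack` and `faceFromHalves` are hypothetical names. The last helper illustrates what "split across lo/hi" implies for a reader that sees the quad as two 32-bit words, which is assumed here to be how the shader consumes it.

```cpp
#include <cassert>
#include <cstdint>

struct QuadFields {
    uint8_t x, y, z, w, h, face, materialID, blendMatID;
    uint16_t chunkIndex;
    uint8_t blendEdges;
};

// Packs fields at the bit positions listed in the layout comment.
uint64_t sketchPack(const QuadFields& f) {
    return (uint64_t(f.x & 0x3F))                 |
           (uint64_t(f.y & 0x3F) << 6)            |
           (uint64_t(f.z & 0x3F) << 12)           |
           (uint64_t(f.w & 0x3F) << 18)           |
           (uint64_t(f.h & 0x3F) << 24)           |
           (uint64_t(f.face & 0x7) << 30)         |
           (uint64_t(f.materialID) << 33)         |
           (uint64_t(f.blendMatID) << 41)         |
           (uint64_t(f.chunkIndex & 0x7FF) << 49) |
           (uint64_t(f.blendEdges & 0xF) << 60);
}

// Inverse of sketchPack, mirroring the getters in the diff.
QuadFields sketchUnpack(uint64_t d) {
    QuadFields f;
    f.x          = uint8_t(d & 0x3F);
    f.y          = uint8_t((d >> 6) & 0x3F);
    f.z          = uint8_t((d >> 12) & 0x3F);
    f.w          = uint8_t((d >> 18) & 0x3F);
    f.h          = uint8_t((d >> 24) & 0x3F);
    f.face       = uint8_t((d >> 30) & 0x7);
    f.materialID = uint8_t((d >> 33) & 0xFF);
    f.blendMatID = uint8_t((d >> 41) & 0xFF);
    f.chunkIndex = uint16_t((d >> 49) & 0x7FF);
    f.blendEdges = uint8_t((d >> 60) & 0xF);
    return f;
}

// Face occupies bits 30..32, so it straddles the two 32-bit halves:
// two bits come from the top of lo, one bit from the bottom of hi.
uint32_t faceFromHalves(uint32_t lo, uint32_t hi) {
    return ((lo >> 30) & 0x3u) | ((hi & 0x1u) << 2);
}
```

The field widths sum to 6×5 + 3 + 8 + 8 + 11 + 4 = 64 bits, so the layout is fully used; any further per-quad data would need a wider record or a side buffer.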
 // Face directions: +X, -X, +Y, -Y, +Z, -Z