Refactor: extract VoxelRTManager, DeferredGPUBuffer, decompose VoxelRenderPath

- Extract DeferredGPUBuffer utility (staging→dirty→capacity GPU buffer pattern)
- Extract VoxelRTManager from VoxelRenderer (~500 lines: BLAS/TLAS, RT shadows+AO)
- Decompose VoxelRenderPath into CameraController, AnimationState, VoxelProfiler
- Replace toping std::sort with O(n) counting sort by (type, variant)
- Update CLAUDE.md architecture docs to reflect new file structure
Samuel Bouchet 2026-03-31 13:46:35 +02:00
parent 53df73e5e6
commit 57ac08f231
7 changed files with 1294 additions and 1070 deletions


@@ -18,7 +18,9 @@ bvle-voxels/
│ │ ├── VoxelTypes.h # Fundamental types (VoxelData, PackedQuad, MaterialDesc, ChunkPos)
│ │ ├── VoxelWorld.h/.cpp # Voxel world (chunk hashmap, procedural generation)
│ │ ├── VoxelMesher.h/.cpp # CPU Binary Greedy Mesher + SmoothMesher (Naive Surface Nets)
│ │ ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (RenderPath3D subclass)
│ │ ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (CameraController, AnimationState, VoxelProfiler)
│ │ ├── VoxelRTManager.h/.cpp # Ray tracing: BLAS/TLAS lifecycle, shadows+AO dispatches
│ │ ├── DeferredGPUBuffer.h # Staging→dirty→capacity GPU buffer upload utility
│ │ └── TopingSystem.h/.cpp # Toping system (decorative bevels on +Y faces)
│ └── app/
│ └── main.cpp # Win32 entry point + SEH crash handler
@@ -129,7 +131,11 @@ 3D Perlin noise, 5-octave fBm (2 during animation), 3D caves, materials by altitu
- **Per-chunk info**: `StructuredBuffer<GPUChunkInfo>` (80 bytes/chunk)
- **Height-based blending** (Phase 3): the PS reads `voxelDataBuffer` (t3), winner-takes-all heightmap, corner attenuation
- **Dedicated render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT)
- **CPU profiling**: `ProfileAccum` with averages every 5 s
- **CPU profiling**: `VoxelProfiler` (21 `ProfileAccum`, averages every 5 s)
- **DeferredGPUBuffer**: utility for GPU buffers with CPU staging, a dirty flag, and capacity-based growth (25% headroom)
- **VoxelRTManager** (`VoxelRTManager.h/.cpp`): manages BLAS/TLAS and the RT shadow+AO dispatches, isolated from the renderer
- **VoxelRenderPath** decomposed into `CameraController` (movement/mouse), `AnimationState` (terrain tick), and `VoxelProfiler`
- **Toping sort**: O(n) counting sort by (type, variant) instead of `std::sort`
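The `ProfileAccum` struct itself is not part of this diff. As an illustration only, a per-timer accumulator that publishes an average once per 5 s window could look like the sketch below; `ProfileAccumSketch`, its fields, and `flushIfElapsed` are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a ProfileAccum-style accumulator (the real struct is
// not shown in this diff). It sums per-frame timings and publishes an average
// once the reporting window (5 s per CLAUDE.md) has elapsed.
struct ProfileAccumSketch {
    double sumMs = 0.0;      // sum of samples in the current window
    uint32_t count = 0;      // number of samples in the current window
    double lastAvgMs = 0.0;  // last published average

    void add(double ms) {
        assert(ms >= 0.0);   // timings are durations
        sumMs += ms;
        ++count;
    }

    // Call once per frame with the time elapsed since the window started.
    // Returns true when a new average was published and the window reset.
    bool flushIfElapsed(double windowElapsedSec, double windowSec = 5.0) {
        if (windowElapsedSec < windowSec || count == 0) return false;
        lastAvgMs = sumMs / count;
        sumMs = 0.0;
        count = 0;
        return true;
    }
};
```

With 21 such accumulators, one per measured section, the profiler only has to call `flushIfElapsed` on each of them when its shared 5 s timer expires.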
## Development phases


@@ -0,0 +1,68 @@
#pragma once
#include "WickedEngine.h"
namespace voxel {
// ── Deferred GPU Buffer ─────────────────────────────────────────
// Encapsulates the repeated pattern of:
// 1. CPU staging data prepared during Update()
// 2. GPU buffer with capacity-based growth (25% headroom)
// 3. Dirty flag for deferred upload in Render()
//
// Eliminates ~50 lines of boilerplate per buffer and centralizes
// the invariants (capacity >= count, CreateBuffer with nullptr,
// UpdateBuffer with actual data size).
struct DeferredGPUBuffer {
    wi::graphics::GPUBuffer gpu;
    mutable uint32_t capacity = 0; // in elements
    mutable bool dirty = false;
    uint32_t stride = 0;           // bytes per element

    // Ensure the GPU buffer has room for elementCount elements.
    // Creates/recreates the buffer only when capacity is insufficient or the
    // element stride changed (capacity is counted in elements).
    // Returns true if the buffer was (re)created.
    bool ensureCapacity(wi::graphics::GraphicsDevice* device,
                        uint32_t elementCount,
                        uint32_t elementStride,
                        wi::graphics::BindFlag bindFlags,
                        wi::graphics::ResourceMiscFlag miscFlags = wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED)
    {
        // A zero-sized buffer cannot be created; keep the current one.
        if (elementCount == 0 || elementStride == 0) return false;
        const bool strideChanged = (stride != elementStride);
        stride = elementStride;
        if (gpu.IsValid() && !strideChanged && capacity >= elementCount) return false;
        capacity = elementCount + elementCount / 4; // 25% headroom
        wi::graphics::GPUBufferDesc desc;
        desc.size = (uint64_t)capacity * stride;
        desc.bind_flags = bindFlags;
        desc.misc_flags = miscFlags;
        desc.stride = (miscFlags == wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED) ? stride : 0;
        desc.usage = wi::graphics::Usage::DEFAULT;
        if (!device->CreateBuffer(&desc, nullptr, &gpu)) {
            capacity = 0; // creation failed: force a retry on the next call
            return false;
        }
        dirty = true; // new buffer contents are undefined until uploaded
        return true;
    }

    // Upload data to the GPU. Call from Render() with a valid CommandList.
    // dataCount = number of elements to upload (may be < capacity).
    void upload(wi::graphics::GraphicsDevice* device,
                wi::graphics::CommandList cmd,
                const void* data,
                uint32_t dataCount) const
    {
        if (!dirty || !gpu.IsValid() || dataCount == 0 || !data) return;
        const size_t uploadSize = (size_t)dataCount * stride;
        const size_t bufferSize = (size_t)capacity * stride;
        // Data does not fit: stay dirty so the caller can grow the buffer with
        // ensureCapacity() instead of silently dropping the upload.
        if (uploadSize > bufferSize) return;
        device->UpdateBuffer(&gpu, data, cmd, uploadSize);
        dirty = false;
    }

    // Mark as needing upload (call after staging data changes).
    void markDirty() { dirty = true; }
    bool isValid() const { return gpu.IsValid(); }
};
} // namespace voxel
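To see what the 25% headroom buys, the growth rule of `ensureCapacity` can be modelled on the CPU with the `CreateBuffer` call replaced by a counter. `CapacityModel` is a hypothetical test double for illustration, not part of the project.

```cpp
#include <cassert>
#include <cstdint>

// CPU-only model of DeferredGPUBuffer::ensureCapacity's growth rule.
// The GPU allocation is replaced by a counter so the policy can be checked
// without a graphics device.
struct CapacityModel {
    uint32_t capacity = 0;      // elements currently allocated
    bool valid = false;         // stands in for gpu.IsValid()
    bool dirty = false;
    uint32_t reallocations = 0;

    // Mirrors ensureCapacity: grow only when the request does not fit,
    // adding 25% headroom so small increases do not reallocate again.
    bool ensureCapacity(uint32_t elementCount) {
        if (valid && capacity >= elementCount) return false;
        capacity = elementCount + elementCount / 4;
        valid = true;
        dirty = true;           // new contents are undefined until uploaded
        ++reallocations;
        assert(capacity >= elementCount);
        return true;
    }
};
```

Growing 1000 → 1200 → 1300 elements reallocates only twice: the 1200-element request fits in the 1250-element buffer created for the first one.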


@@ -0,0 +1,610 @@
#include "VoxelRTManager.h"
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>
using namespace wi::graphics;
namespace voxel {
void VoxelRTManager::initialize(GraphicsDevice* dev, uint32_t maxBlasVertices) {
device_ = dev;
maxBlasVertices_ = maxBlasVertices;
available_ = dev->CheckCapability(GraphicsDeviceCapability::RAYTRACING);
if (!available_) {
wi::backlog::post("VoxelRTManager: RT not available (GPU does not support ray tracing)");
return;
}
wi::renderer::LoadShader(ShaderStage::CS, blasExtractShader_, "voxel/voxelBLASExtractCS.cso");
if (blasExtractShader_.IsValid()) {
// BLAS position buffer: 6 float3 per quad (non-indexed triangles), raw buffer
GPUBufferDesc posDesc;
posDesc.size = (uint64_t)maxBlasVertices * sizeof(float) * 3;
posDesc.bind_flags = BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE;
posDesc.misc_flags = ResourceMiscFlag::BUFFER_RAW;
posDesc.stride = 0;
posDesc.usage = Usage::DEFAULT;
bool ok = dev->CreateBuffer(&posDesc, nullptr, &blasPositionBuffer_);
// Sequential index buffer for BLAS
GPUBufferDesc idxDesc;
idxDesc.size = (uint64_t)maxBlasVertices * sizeof(uint32_t);
idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
idxDesc.usage = Usage::DEFAULT;
auto fillIndices = [maxBlasVertices](void* dest) {
uint32_t* p = (uint32_t*)dest;
for (uint32_t i = 0; i < maxBlasVertices; i++)
p[i] = i;
};
bool okIdx = dev->CreateBuffer2(&idxDesc, fillIndices, &blasIndexBuffer_);
if (ok && blasPositionBuffer_.IsValid() && okIdx && blasIndexBuffer_.IsValid()) {
dev->SetName(&blasPositionBuffer_, "VoxelRTManager::blasPositionBuffer");
dev->SetName(&blasIndexBuffer_, "VoxelRTManager::blasIndexBuffer");
wi::backlog::post("VoxelRTManager: RT available (BLAS pos "
+ std::to_string(posDesc.size / (1024*1024)) + " MB + idx "
+ std::to_string(idxDesc.size / (1024*1024)) + " MB)");
} else {
available_ = false;
wi::backlog::post("VoxelRTManager: RT buffer creation failed", wi::backlog::LogLevel::Warning);
}
} else {
available_ = false;
wi::backlog::post("VoxelRTManager: BLAS extraction shader failed", wi::backlog::LogLevel::Warning);
}
// Toping BLAS CS
wi::renderer::LoadShader(ShaderStage::CS, topingBLASShader_, "voxel/voxelTopingBLASCS.cso");
if (topingBLASShader_.IsValid()) {
static constexpr uint32_t MAX_GROUPS = 64;
GPUBufferDesc grpDesc;
grpDesc.size = MAX_GROUPS * 20; // 5 × uint32 per group
grpDesc.bind_flags = BindFlag::SHADER_RESOURCE;
grpDesc.misc_flags = ResourceMiscFlag::BUFFER_STRUCTURED;
grpDesc.stride = 20;
grpDesc.usage = Usage::DEFAULT;
dev->CreateBuffer(&grpDesc, nullptr, &topingBLASGroupBuffer_);
wi::backlog::post("VoxelRTManager: toping BLAS CS available");
} else {
wi::backlog::post("VoxelRTManager: toping BLAS CS failed", wi::backlog::LogLevel::Warning);
}
// RT Shadows + AO
wi::renderer::LoadShader(ShaderStage::CS, shadowShader_, "voxel/voxelShadowCS.cso",
ShaderModel::SM_6_5);
wi::renderer::LoadShader(ShaderStage::CS, aoBlurShader_, "voxel/voxelAOBlurCS.cso");
wi::renderer::LoadShader(ShaderStage::CS, aoApplyShader_, "voxel/voxelAOApplyCS.cso");
if (shadowShader_.IsValid() && aoBlurShader_.IsValid() && aoApplyShader_.IsValid()) {
shadowsEnabled_ = true;
wi::backlog::post("VoxelRTManager: RT shadows + AO blur available");
} else {
wi::backlog::post("VoxelRTManager: RT shadow/AO shader(s) failed",
wi::backlog::LogLevel::Warning);
}
}
// ── BLAS extraction: blocky quads → float3 positions ────────────
void VoxelRTManager::dispatchBLASExtract(CommandList cmd,
const GPUBuffer& quadBuffer,
const GPUBuffer& chunkInfoBuffer,
uint32_t quadCount) const
{
if (!available_ || !blasExtractShader_.IsValid() || quadCount == 0) return;
auto* dev = device_;
GPUBarrier preBarriers[] = {
GPUBarrier::Buffer(&blasPositionBuffer_,
ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(preBarriers, 1, cmd);
dev->BindComputeShader(&blasExtractShader_, cmd);
dev->BindResource(&quadBuffer, 0, cmd); // t0
dev->BindResource(&chunkInfoBuffer, 2, cmd); // t2
dev->BindUAV(&blasPositionBuffer_, 0, cmd); // u0
struct BLASPush {
uint32_t quadCount;
uint32_t pad[11];
} pushData = {};
pushData.quadCount = quadCount;
dev->PushConstants(&pushData, sizeof(pushData), cmd);
uint32_t groupCount = (quadCount + 63) / 64;
dev->Dispatch(groupCount, 1, 1, cmd);
GPUBarrier postBarriers[] = {
GPUBarrier::Buffer(&blasPositionBuffer_,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postBarriers, 1, cmd);
blockyVertexCount_ = quadCount * 6;
}
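The extraction shader (`voxelBLASExtractCS`) is not shown in this diff. A CPU reference of what this dispatch sets up might look like the sketch below: one 64-thread group per 64 quads, and 6 non-indexed positions per quad, matching `blockyVertexCount_ = quadCount * 6`. The corner order and the `(0,1,2)/(0,2,3)` triangle split are assumptions, since `PackedQuad` decoding is not visible here; `Float3`, `groupsFor`, and `appendQuadTriangles` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

struct Float3 { float x, y, z; };

// Number of thread groups needed to cover `items` (ceiling division),
// as in the (quadCount + 63) / 64 dispatch size.
constexpr uint32_t groupsFor(uint32_t items, uint32_t groupSize = 64) {
    return (items + groupSize - 1) / groupSize;
}

// Hypothetical CPU reference for one quad. Given four corners c0..c3 in
// winding order, emit two non-indexed triangles (c0,c1,c2) and (c0,c2,c3),
// i.e. 6 positions per quad.
void appendQuadTriangles(const std::array<Float3, 4>& c, std::vector<Float3>& out) {
    const int order[6] = {0, 1, 2, 0, 2, 3};
    for (int i : order) out.push_back(c[i]);
    assert(out.size() % 3 == 0);  // output stays a whole number of triangles
}
```

Because every triangle is written as three fresh vertices, the index buffer only needs to be the sequence 0, 1, 2, ..., which is exactly what `initialize()` fills `blasIndexBuffer_` with.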
// ── Toping BLAS extraction (GPU compute) ────────────────────────
void VoxelRTManager::dispatchTopingBLASExtract(CommandList cmd,
const GPUBuffer& topingVertexBuffer,
const GPUBuffer& topingInstanceBuffer,
const void* groupsGPUData, size_t groupsGPUSize,
uint32_t groupCount, uint32_t totalVertices) const
{
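if (!topingBLASShader_.IsValid() || !topingBLASGroupBuffer_.IsValid() ||
!topingBLASPositionBuf_.isValid() || !topingVertexBuffer.IsValid() ||
!topingInstanceBuffer.IsValid() || totalVertices == 0 || groupCount == 0 ||
!groupsGPUData)
return;
// The group table must fit the fixed-size group buffer (MAX_GROUPS entries), and
// the shader writes totalVertices positions into topingBLASPositionBuf_.
if (groupsGPUSize > topingBLASGroupBuffer_.GetDesc().size ||
totalVertices > topingBLASPositionBuf_.capacity)
return;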
auto* dev = device_;
// Upload group table
dev->UpdateBuffer(&topingBLASGroupBuffer_, groupsGPUData, cmd, groupsGPUSize);
GPUBarrier preBarriers[] = {
GPUBarrier::Buffer(&topingBLASGroupBuffer_,
ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(preBarriers, 2, cmd);
dev->BindComputeShader(&topingBLASShader_, cmd);
dev->BindResource(&topingVertexBuffer, 4, cmd); // t4
dev->BindResource(&topingInstanceBuffer, 5, cmd); // t5
dev->BindResource(&topingBLASGroupBuffer_, 7, cmd); // t7
dev->BindUAV(&topingBLASPositionBuf_.gpu, 0, cmd); // u0
struct {
uint32_t totalVertices;
uint32_t groupCount;
uint32_t pad[10];
} pushData = {};
pushData.totalVertices = totalVertices;
pushData.groupCount = groupCount;
dev->PushConstants(&pushData, sizeof(pushData), cmd);
uint32_t threadGroups = (totalVertices + 63) / 64;
dev->Dispatch(threadGroups, 1, 1, cmd);
GPUBarrier postBarriers[] = {
GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postBarriers, 1, cmd);
topingVertexCount_ = totalVertices;
dirty = true;
topingBLASDirty = false;
}
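Each thread of `voxelTopingBLASCS` produces one output vertex, so it has to work out which group, instance, and template vertex it belongs to. The sketch below shows one way to do that with the prefix-summed `globalVertexOffset` field: a binary search over at most 64 groups. The shader's actual lookup is not shown in this diff, and `templateVertexCount` is an assumed name for the `TopingBLASGroupGPU` field that falls outside the visible hunks; `GroupSketch`, `assignOffsets`, and `locate` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Mirrors the visible fields of TopingBLASGroupGPU; templateVertexCount is
// an ASSUMED name for the field not shown in the diff.
struct GroupSketch {
    uint32_t globalVertexOffset;   // prefix sum of vertices before this group
    uint32_t vertexTemplateOffset; // first template vertex (t4)
    uint32_t templateVertexCount;  // ASSUMED: vertices per instance
    uint32_t instanceOffset;       // first instance position (t5)
    uint32_t instanceCount;        // instances in this group
};

// Fill globalVertexOffset as an exclusive prefix sum; returns total vertices.
uint32_t assignOffsets(std::vector<GroupSketch>& groups) {
    uint32_t total = 0;
    for (auto& g : groups) {
        g.globalVertexOffset = total;
        total += g.templateVertexCount * g.instanceCount;
    }
    return total;
}

struct VertexSource { uint32_t group, instance, templateVertex; };

// Per-thread lookup for a global vertex id (precondition: vid < total):
// the last group whose offset is <= vid, then the instance and template
// vertex inside that group.
VertexSource locate(const std::vector<GroupSketch>& groups, uint32_t vid) {
    auto it = std::upper_bound(groups.begin(), groups.end(), vid,
        [](uint32_t v, const GroupSketch& g) { return v < g.globalVertexOffset; });
    assert(it != groups.begin());
    auto gi = std::prev(it);
    const uint32_t local = vid - gi->globalVertexOffset;
    return { uint32_t(gi - groups.begin()),
             gi->instanceOffset + local / gi->templateVertexCount,
             gi->vertexTemplateOffset + local % gi->templateVertexCount };
}
```

With the table uploaded once per rebuild, each thread only needs this lookup plus one instance-position read to emit its vertex. A CPU loop over the same data does all of that work serially, which is what the GPU path replaces.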
// ── Ensure toping BLAS buffer capacity ──────────────────────────
bool VoxelRTManager::ensureTopingBLASCapacity(uint32_t totalVertices) {
if (totalVertices == 0) return false;
bool recreated = topingBLASPositionBuf_.ensureCapacity(device_, totalVertices,
3 * sizeof(float),
BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE,
ResourceMiscFlag::BUFFER_RAW);
if (recreated) {
char msg[256];
snprintf(msg, sizeof(msg), "VoxelRTManager: toping BLAS pos buffer (%u capacity, %.1f MB)",
topingBLASPositionBuf_.capacity,
(size_t)topingBLASPositionBuf_.capacity * 3 * sizeof(float) / (1024.0 * 1024.0));
wi::backlog::post(msg);
}
// Index buffer: grow if needed
if (topingBLASIndexCount_ < topingBLASPositionBuf_.capacity) {
uint32_t idxCount = topingBLASPositionBuf_.capacity;
std::vector<uint32_t> indices(idxCount);
for (uint32_t j = 0; j < idxCount; j++) indices[j] = j;
GPUBufferDesc idxDesc;
idxDesc.size = (size_t)idxCount * sizeof(uint32_t);
idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
idxDesc.misc_flags = ResourceMiscFlag::NONE;
idxDesc.usage = Usage::DEFAULT;
device_->CreateBuffer(&idxDesc, indices.data(), &topingBLASIndexBuffer_);
topingBLASIndexCount_ = idxCount;
recreated = true;
}
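if (recreated) {
// topingBLAS_ keeps references to the buffers it was created with; drop it
// so buildAccelerationStructures() recreates it against the new buffers.
topingBLAS_ = RaytracingAccelerationStructure();
topingBLASASCapacity_ = 0;
}
topingBLASDirty = true;
return recreated;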
}
// ── Acceleration structure build ────────────────────────────────
void VoxelRTManager::buildAccelerationStructures(CommandList cmd,
uint32_t buildFlags,
const GPUBuffer& smoothVB,
uint32_t smoothVertCount) const
{
if (!available_) return;
auto* dev = device_;
// A BLAS that gets (re)created lives at a new GPU address, and the TLAS
// instance buffer stores those addresses: track it so the TLAS is rewritten.
bool blasRecreated = false;
// blasIndexBuffer_ holds maxBlasVertices_ sequential indices and is shared by the
// blocky and smooth BLAS: clamp their counts/capacities to whole triangles within it.
const uint32_t maxIndexedVerts = maxBlasVertices_ - maxBlasVertices_ % 3;
// ── Blocky BLAS ──
uint32_t blockyVertCount = blockyVertexCount_;
if (blockyVertCount > maxIndexedVerts) blockyVertCount = maxIndexedVerts;
if (blockyVertCount < 3) blockyVertCount = 0;
if ((buildFlags & BUILD_BLOCKY) && blockyVertCount > 0 && blasPositionBuffer_.IsValid()) {
if (!blockyBLAS_.IsValid() || blockyVertCount > blockyBLASCapacity_) {
blockyBLASCapacity_ = blockyVertCount + blockyVertCount / 4;
if (blockyBLASCapacity_ > maxIndexedVerts) blockyBLASCapacity_ = maxIndexedVerts;
RaytracingAccelerationStructureDesc desc;
desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.bottom_level.geometries.resize(1);
auto& geom = desc.bottom_level.geometries[0];
geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
geom.triangles.vertex_buffer = blasPositionBuffer_;
geom.triangles.vertex_byte_offset = 0;
geom.triangles.vertex_count = blockyBLASCapacity_;
geom.triangles.vertex_stride = sizeof(float) * 3;
geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
geom.triangles.index_buffer = blasIndexBuffer_;
geom.triangles.index_count = blockyBLASCapacity_;
geom.triangles.index_format = IndexBufferFormat::UINT32;
geom.triangles.index_offset = 0;
bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &blockyBLAS_);
if (ok) {
blasRecreated = true;
dev->SetName(&blockyBLAS_, "VoxelRTManager::blockyBLAS");
wi::backlog::post("VoxelRTManager: blocky BLAS created (capacity "
+ std::to_string(blockyBLASCapacity_ / 3) + " tris)");
} else {
wi::backlog::post("VoxelRTManager: failed to create blocky BLAS", wi::backlog::LogLevel::Error);
available_ = false;
return;
}
}
blockyBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = blockyVertCount;
blockyBLAS_.desc.bottom_level.geometries[0].triangles.index_count = blockyVertCount;
dev->BuildRaytracingAccelerationStructure(&blockyBLAS_, cmd, nullptr);
}
// ── Smooth BLAS ──
if (smoothVertCount > maxIndexedVerts) smoothVertCount = maxIndexedVerts;
if (smoothVertCount < 3) smoothVertCount = 0;
if ((buildFlags & BUILD_SMOOTH) && smoothVertCount > 0 && smoothVB.IsValid()) {
// The BLAS desc keeps a reference to the vertex buffer it was created with,
// so a reallocated smoothVB also requires a new BLAS.
const bool smoothVBChanged = smoothBLAS_.IsValid() &&
smoothBLAS_.desc.bottom_level.geometries[0].triangles.vertex_buffer.internal_state != smoothVB.internal_state;
if (!smoothBLAS_.IsValid() || smoothVertCount > smoothBLASCapacity_ || smoothVBChanged) {
smoothBLASCapacity_ = smoothVertCount + smoothVertCount / 4;
if (smoothBLASCapacity_ > maxIndexedVerts) smoothBLASCapacity_ = maxIndexedVerts;
RaytracingAccelerationStructureDesc desc;
desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.bottom_level.geometries.resize(1);
auto& geom = desc.bottom_level.geometries[0];
geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
geom.triangles.vertex_buffer = smoothVB;
geom.triangles.vertex_byte_offset = 0;
geom.triangles.vertex_count = smoothBLASCapacity_;
geom.triangles.vertex_stride = 32;
geom.triangles.index_buffer = blasIndexBuffer_;
geom.triangles.index_count = smoothBLASCapacity_;
geom.triangles.index_format = IndexBufferFormat::UINT32;
geom.triangles.index_offset = 0;
geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &smoothBLAS_);
if (ok) {
blasRecreated = true;
dev->SetName(&smoothBLAS_, "VoxelRTManager::smoothBLAS");
wi::backlog::post("VoxelRTManager: smooth BLAS created (capacity "
+ std::to_string(smoothBLASCapacity_ / 3) + " tris)");
} else {
wi::backlog::post("VoxelRTManager: failed to create smooth BLAS", wi::backlog::LogLevel::Error);
}
}
if (smoothBLAS_.IsValid()) {
smoothBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = smoothVertCount;
smoothBLAS_.desc.bottom_level.geometries[0].triangles.index_count = smoothVertCount;
dev->BuildRaytracingAccelerationStructure(&smoothBLAS_, cmd, nullptr);
}
smoothVertexCount_ = smoothVertCount;
}
// ── Toping BLAS ──
uint32_t topingVertCount = topingVertexCount_;
if ((buildFlags & BUILD_TOPING) && topingVertCount >= 3 && topingBLASPositionBuf_.isValid()) {
if (!topingBLAS_.IsValid() || topingVertCount > topingBLASASCapacity_) {
topingBLASASCapacity_ = topingVertCount + topingVertCount / 4;
RaytracingAccelerationStructureDesc desc;
desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.bottom_level.geometries.resize(1);
auto& geom = desc.bottom_level.geometries[0];
geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
geom.triangles.vertex_buffer = topingBLASPositionBuf_.gpu;
geom.triangles.vertex_byte_offset = 0;
geom.triangles.vertex_count = topingBLASASCapacity_;
geom.triangles.vertex_stride = sizeof(float) * 3;
geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
geom.triangles.index_buffer = topingBLASIndexBuffer_;
geom.triangles.index_count = topingBLASASCapacity_;
geom.triangles.index_format = IndexBufferFormat::UINT32;
geom.triangles.index_offset = 0;
bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &topingBLAS_);
if (ok) {
blasRecreated = true;
dev->SetName(&topingBLAS_, "VoxelRTManager::topingBLAS");
wi::backlog::post("VoxelRTManager: toping BLAS created (capacity "
+ std::to_string(topingBLASASCapacity_ / 3) + " tris)");
} else {
wi::backlog::post("VoxelRTManager: failed to create toping BLAS", wi::backlog::LogLevel::Error);
}
}
if (topingBLAS_.IsValid()) {
topingBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = topingVertCount;
topingBLAS_.desc.bottom_level.geometries[0].triangles.index_count = topingVertCount;
dev->BuildRaytracingAccelerationStructure(&topingBLAS_, cmd, nullptr);
}
}
// Memory barrier: sync BLAS builds before TLAS
{
GPUBarrier barriers[] = { GPUBarrier::Memory() };
dev->Barrier(barriers, 1, cmd);
}
// ── TLAS ──
uint32_t instanceCount = 0;
if (blockyBLAS_.IsValid()) instanceCount++;
if (smoothBLAS_.IsValid() && smoothVertCount > 0) instanceCount++;
if (topingBLAS_.IsValid() && topingVertCount >= 3) instanceCount++;
if (instanceCount == 0) { dirty = false; return; }
if (!tlas_.IsValid() || instanceCount != tlasInstanceCount_ || blasRecreated) {
const size_t instSize = dev->GetTopLevelAccelerationStructureInstanceSize();
auto setIdentity = [](float transform[3][4]) {
std::memset(transform, 0, sizeof(float) * 12);
transform[0][0] = 1.0f;
transform[1][1] = 1.0f;
transform[2][2] = 1.0f;
};
const RaytracingAccelerationStructure* blockyPtr = blockyBLAS_.IsValid() ? &blockyBLAS_ : nullptr;
const RaytracingAccelerationStructure* smoothPtr = (smoothBLAS_.IsValid() && smoothVertCount > 0) ? &smoothBLAS_ : nullptr;
const RaytracingAccelerationStructure* topingPtr = (topingBLAS_.IsValid() && topingVertCount >= 3) ? &topingBLAS_ : nullptr;
RaytracingAccelerationStructureDesc desc;
desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
desc.type = RaytracingAccelerationStructureDesc::Type::TOPLEVEL;
desc.top_level.count = instanceCount;
GPUBufferDesc bufdesc;
bufdesc.misc_flags = ResourceMiscFlag::RAY_TRACING;
bufdesc.stride = (uint32_t)instSize;
bufdesc.size = bufdesc.stride * desc.top_level.count;
auto initInstances = [&](void* dest) {
uint32_t idx = 0;
auto addInstance = [&](const RaytracingAccelerationStructure* blas, uint32_t id) {
if (!blas) return;
RaytracingAccelerationStructureDesc::TopLevel::Instance inst;
setIdentity(inst.transform);
inst.instance_id = id; inst.instance_mask = 0xFF;
inst.instance_contribution_to_hit_group_index = 0; inst.flags = 0;
inst.bottom_level = blas;
dev->WriteTopLevelAccelerationStructureInstance(&inst, (uint8_t*)dest + idx * instSize);
idx++;
};
addInstance(blockyPtr, 0);
addInstance(smoothPtr, 1);
addInstance(topingPtr, 2);
};
bool ok = dev->CreateBuffer2(&bufdesc, initInstances, &desc.top_level.instance_buffer);
if (!ok) {
wi::backlog::post("VoxelRTManager: failed to create TLAS instance buffer", wi::backlog::LogLevel::Error);
dirty = false;
return;
}
ok = dev->CreateRaytracingAccelerationStructure(&desc, &tlas_);
if (!ok) {
wi::backlog::post("VoxelRTManager: failed to create TLAS", wi::backlog::LogLevel::Error);
dirty = false;
return;
}
tlasInstanceCount_ = instanceCount;
wi::backlog::post("VoxelRTManager: TLAS created (" + std::to_string(instanceCount) + " instances)");
}
dev->BuildRaytracingAccelerationStructure(&tlas_, cmd, nullptr);
{
GPUBarrier barriers[] = { GPUBarrier::Memory(&tlas_) };
dev->Barrier(barriers, 1, cmd);
}
dirty = false;
}
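`initInstances` writes only the BLASes that exist, so slot positions shift when, for example, the smooth BLAS is absent, while `instance_id` stays fixed (0 blocky, 1 smooth, 2 toping). The sketch below reproduces that packing on the CPU; `InstanceSlot` and `packInstances` are hypothetical helpers. In DXR HLSL, `InstanceID()` returns the user-assigned `instance_id`, whereas `InstanceIndex()` returns the slot, so a shader that needs to know the geometry type should rely on the former.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct InstanceSlot { uint32_t slot; uint32_t instanceId; };

// Sketch of the packing done by initInstances: present BLASes are written
// to consecutive slots, but each keeps a fixed instance_id
// (0 = blocky, 1 = smooth, 2 = toping).
std::vector<InstanceSlot> packInstances(bool blocky, bool smooth, bool toping) {
    const bool present[3] = {blocky, smooth, toping};
    std::vector<InstanceSlot> out;
    for (uint32_t id = 0; id < 3; ++id)
        if (present[id]) out.push_back({uint32_t(out.size()), id});
    assert(out.size() <= 3);
    return out;
}
```

With only blocky and toping geometry present, the toping instance lands in slot 1 but still reports `instance_id` 2.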
// ── RT Shadow + AO dispatch ─────────────────────────────────────
void VoxelRTManager::dispatchShadows(CommandList cmd,
const Texture& depthBuffer,
const Texture& renderTarget,
const Texture& normalTarget,
const GPUBuffer& constantBuffer) const
{
if (!shadowsEnabled_ || !shadowShader_.IsValid() || !tlas_.IsValid())
return;
auto* dev = device_;
uint32_t w = renderTarget.GetDesc().width;
uint32_t h = renderTarget.GetDesc().height;
uint32_t gx = (w + 7) / 8;
uint32_t gy = (h + 7) / 8;
// Pass 1: Shadow + raw AO
{
GPUBarrier preBarriers[] = {
GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
ResourceState::DEPTHSTENCIL, ResourceState::SHADER_RESOURCE),
GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
GPUBarrier::Image(&aoRawTexture,
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(preBarriers, 3, cmd);
dev->BindComputeShader(&shadowShader_, cmd);
dev->BindResource(&depthBuffer, 0, cmd);
dev->BindResource(&normalTarget, 1, cmd);
dev->BindResource(&tlas_, 2, cmd);
dev->BindResource(&aoHistoryTexture, 3, cmd);
dev->BindUAV(&renderTarget, 0, cmd);
dev->BindUAV(&aoRawTexture, 1, cmd);
dev->BindConstantBuffer(&constantBuffer, 0, cmd);
struct ShadowPush {
uint32_t width, height;
float normalBias, shadowMaxDist;
uint32_t debugMode;
float aoRadius;
uint32_t aoRayCount;
float aoStrength;
uint32_t frameIndex;
uint32_t historyValid;
uint32_t pad[2];
} pushData = {};
pushData.width = w;
pushData.height = h;
pushData.normalBias = 0.15f;
pushData.shadowMaxDist = 512.0f;
pushData.debugMode = shadowDebug_;
pushData.aoRadius = 8.0f;
pushData.aoRayCount = 4;
pushData.aoStrength = 0.7f;
pushData.frameIndex = frameCounter++;
pushData.historyValid = aoHistoryValid ? 1u : 0u;
dev->PushConstants(&pushData, sizeof(pushData), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Pass 1.5: Copy raw AO → history
{
GPUBarrier copyBarriers[] = {
GPUBarrier::Image(&aoRawTexture,
ResourceState::UNORDERED_ACCESS, ResourceState::COPY_SRC),
GPUBarrier::Image(&aoHistoryTexture,
ResourceState::SHADER_RESOURCE, ResourceState::COPY_DST),
};
dev->Barrier(copyBarriers, 2, cmd);
dev->CopyResource(&aoHistoryTexture, &aoRawTexture, cmd);
GPUBarrier postCopyBarriers[] = {
GPUBarrier::Image(&aoRawTexture,
ResourceState::COPY_SRC, ResourceState::SHADER_RESOURCE),
GPUBarrier::Image(&aoHistoryTexture,
ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postCopyBarriers, 2, cmd);
aoHistoryValid = true;
}
// Pass 2: Bilateral blur horizontal
{
GPUBarrier barriers[] = {
GPUBarrier::Image(&aoBlurredTexture,
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(barriers, 1, cmd);
dev->BindComputeShader(&aoBlurShader_, cmd);
dev->BindResource(&aoRawTexture, 0, cmd);
dev->BindResource(&depthBuffer, 1, cmd);
dev->BindResource(&normalTarget, 2, cmd);
dev->BindUAV(&aoBlurredTexture, 0, cmd);
struct BlurPush {
uint32_t width, height, direction, radius;
float depthThreshold, normalThreshold;
uint32_t pad[6];
} blurPush = {};
blurPush.width = w; blurPush.height = h;
blurPush.direction = 0; blurPush.radius = 6;
blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Pass 3: Bilateral blur vertical
{
GPUBarrier barriers[] = {
GPUBarrier::Image(&aoBlurredTexture,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
GPUBarrier::Image(&aoRawTexture,
ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
};
dev->Barrier(barriers, 2, cmd);
dev->BindComputeShader(&aoBlurShader_, cmd);
dev->BindResource(&aoBlurredTexture, 0, cmd);
dev->BindResource(&depthBuffer, 1, cmd);
dev->BindResource(&normalTarget, 2, cmd);
dev->BindUAV(&aoRawTexture, 0, cmd);
struct BlurPush {
uint32_t width, height, direction, radius;
float depthThreshold, normalThreshold;
uint32_t pad[6];
} blurPush = {};
blurPush.width = w; blurPush.height = h;
blurPush.direction = 1; blurPush.radius = 6;
blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Pass 4: Apply blurred AO
{
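GPUBarrier barriers[] = {
GPUBarrier::Image(&aoRawTexture,
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
// Pass 4 reads and rewrites the color that Pass 1 wrote through the same UAV.
GPUBarrier::Memory(&renderTarget),
};
dev->Barrier(barriers, 2, cmd);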
dev->BindComputeShader(&aoApplyShader_, cmd);
dev->BindResource(&aoRawTexture, 0, cmd);
dev->BindResource(&depthBuffer, 1, cmd);
dev->BindUAV(&renderTarget, 0, cmd);
struct ApplyPush {
uint32_t width, height, debugMode;
uint32_t pad[9];
} applyPush = {};
applyPush.width = w; applyPush.height = h;
applyPush.debugMode = shadowDebug_;
dev->PushConstants(&applyPush, sizeof(applyPush), cmd);
dev->Dispatch(gx, gy, 1, cmd);
}
// Restore resource states
GPUBarrier postBarriers[] = {
GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
ResourceState::SHADER_RESOURCE, ResourceState::DEPTHSTENCIL),
GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
};
dev->Barrier(postBarriers, 2, cmd);
}
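Passes 2 and 3 run the same `voxelAOBlurCS` shader horizontally and then vertically with `radius = 6`, `depthThreshold = 0.001`, and `normalThreshold = 0.9`. The shader source is not part of this diff, so the CPU sketch of one horizontal pass below assumes Gaussian spatial weights (sigma = radius / 2), an absolute depth-difference test, and hard rejection of taps that fail either edge test; `bilateralRow` and `N3` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct N3 { float x, y, z; };

static float dot3(const N3& a, const N3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One horizontal pass of a depth/normal-aware (bilateral) blur over a row.
// ASSUMPTIONS: Gaussian spatial weights with sigma = radius / 2, absolute
// depth difference test, hard rejection of taps across depth/normal edges.
std::vector<float> bilateralRow(const std::vector<float>& ao,
                                const std::vector<float>& depth,
                                const std::vector<N3>& normal,
                                int radius = 6,
                                float depthThreshold = 0.001f,
                                float normalThreshold = 0.9f) {
    const int n = int(ao.size());
    assert(int(depth.size()) == n && int(normal.size()) == n);
    const float sigma = radius * 0.5f;
    std::vector<float> out(n);
    for (int x = 0; x < n; ++x) {
        float sum = 0.0f, wsum = 0.0f;
        for (int o = -radius; o <= radius; ++o) {
            const int s = x + o;
            if (s < 0 || s >= n) continue;                                 // outside the row
            if (std::fabs(depth[s] - depth[x]) > depthThreshold) continue; // depth edge
            if (dot3(normal[s], normal[x]) < normalThreshold) continue;    // normal edge
            const float w = std::exp(-float(o * o) / (2.0f * sigma * sigma));
            sum += ao[s] * w;
            wsum += w;
        }
        out[x] = wsum > 0.0f ? sum / wsum : ao[x];  // keep input if every tap was rejected
    }
    return out;
}
```

Running it on a row with a depth discontinuity shows the intent: AO on each side of the edge is averaged only with samples from its own surface, so occlusion does not bleed across silhouettes.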
} // namespace voxel

src/voxel/VoxelRTManager.h

@@ -0,0 +1,124 @@
#pragma once
#include "DeferredGPUBuffer.h"
#include "WickedEngine.h"
namespace voxel {
// ── Ray Tracing Manager (Phase 6) ──────────────────────────────
// Groups all RT state: BLAS/TLAS management, shadow/AO dispatches.
// Extracted from VoxelRenderer to isolate the ~500 lines of RT code
// and its 20+ members for easier debugging and maintenance.
class VoxelRTManager {
public:
// ── Initialization ──────────────────────────────────────────
void initialize(wi::graphics::GraphicsDevice* device, uint32_t maxBlasVertices);
// ── BLAS extraction (compute shaders) ───────────────────────
// Extract blocky quad positions into BLAS vertex buffer.
void dispatchBLASExtract(wi::graphics::CommandList cmd,
const wi::graphics::GPUBuffer& quadBuffer,
const wi::graphics::GPUBuffer& chunkInfoBuffer,
uint32_t quadCount) const;
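// Extract toping instance positions via GPU compute.
// groupsGPUData/groupsGPUSize: CPU copy of the toping BLAS group table
// (at most 64 groups), uploaded to the group buffer (t7) before the dispatch.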
void dispatchTopingBLASExtract(wi::graphics::CommandList cmd,
const wi::graphics::GPUBuffer& topingVertexBuffer,
const wi::graphics::GPUBuffer& topingInstanceBuffer,
const void* groupsGPUData, size_t groupsGPUSize,
uint32_t groupCount, uint32_t totalVertices) const;
// ── Acceleration structure build ────────────────────────────
static constexpr uint32_t BUILD_BLOCKY = 1 << 0;
static constexpr uint32_t BUILD_SMOOTH = 1 << 1;
static constexpr uint32_t BUILD_TOPING = 1 << 2;
static constexpr uint32_t BUILD_ALL = BUILD_BLOCKY | BUILD_SMOOTH | BUILD_TOPING;
void buildAccelerationStructures(wi::graphics::CommandList cmd,
uint32_t buildFlags,
const wi::graphics::GPUBuffer& smoothVB,
uint32_t smoothVertCount) const;
// ── RT Shadows + AO dispatch ────────────────────────────────
void dispatchShadows(wi::graphics::CommandList cmd,
const wi::graphics::Texture& depthBuffer,
const wi::graphics::Texture& renderTarget,
const wi::graphics::Texture& normalTarget,
const wi::graphics::GPUBuffer& constantBuffer) const;
// ── Toping BLAS buffer management ───────────────────────────
// Ensure capacity for toping BLAS position + index buffers.
// Returns true if buffers were (re)created.
bool ensureTopingBLASCapacity(uint32_t totalVertices);
// ── State queries ───────────────────────────────────────────
bool isAvailable() const { return available_; }
bool isReady() const { return available_ && tlas_.IsValid(); }
bool isShadowsEnabled() const { return shadowsEnabled_; }
void setShadowsEnabled(bool v) { shadowsEnabled_ = v; }
uint32_t getShadowDebug() const { return shadowDebug_; }
void setShadowDebug(uint32_t v) { shadowDebug_ = v; }
uint32_t getBlockyTriCount() const { return blockyVertexCount_ / 3; }
uint32_t getSmoothTriCount() const { return smoothVertexCount_ / 3; }
uint32_t getTopingTriCount() const { return topingVertexCount_ / 3; }
uint32_t getTopingVertexCount() const { return topingVertexCount_; }
uint32_t getTlasInstanceCount() const { return tlasInstanceCount_; }
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
// Dirty flags (public for VoxelRenderPath orchestration)
mutable bool dirty = true; // BLAS/TLAS need rebuild
mutable bool topingBLASDirty = false; // toping BLAS extract + rebuild needed
mutable bool aoHistoryValid = false;
mutable uint32_t frameCounter = 0;
mutable XMFLOAT4X4 prevViewProjection;
// AO textures (created by VoxelRenderPath::createRenderTargets)
mutable wi::graphics::Texture aoRawTexture;
mutable wi::graphics::Texture aoBlurredTexture;
mutable wi::graphics::Texture aoHistoryTexture;
private:
wi::graphics::GraphicsDevice* device_ = nullptr;
mutable bool available_ = false;
mutable bool shadowsEnabled_ = false;
mutable uint32_t shadowDebug_ = 0;
// Shaders
wi::graphics::Shader blasExtractShader_;
wi::graphics::Shader topingBLASShader_;
wi::graphics::Shader shadowShader_;
wi::graphics::Shader aoBlurShader_;
wi::graphics::Shader aoApplyShader_;
// Blocky BLAS resources
mutable wi::graphics::GPUBuffer blasPositionBuffer_;
wi::graphics::GPUBuffer blasIndexBuffer_;
mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
mutable uint32_t blockyBLASCapacity_ = 0;
mutable uint32_t blockyVertexCount_ = 0;
// Smooth BLAS
mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
mutable uint32_t smoothBLASCapacity_ = 0;
mutable uint32_t smoothVertexCount_ = 0;
// Toping BLAS
mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
mutable uint32_t topingBLASASCapacity_ = 0;
mutable uint32_t topingVertexCount_ = 0;
mutable DeferredGPUBuffer topingBLASPositionBuf_;
mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_;
mutable uint32_t topingBLASIndexCount_ = 0;
wi::graphics::GPUBuffer topingBLASGroupBuffer_;
// TLAS
mutable wi::graphics::RaytracingAccelerationStructure tlas_;
mutable uint32_t tlasInstanceCount_ = 0;
uint32_t maxBlasVertices_ = 0;
};
} // namespace voxel

File diff suppressed because it is too large


@@ -2,6 +2,8 @@
#include "VoxelWorld.h"
#include "VoxelMesher.h"
#include "TopingSystem.h"
#include "DeferredGPUBuffer.h"
#include "VoxelRTManager.h"
#include "WickedEngine.h"
namespace voxel {
@@ -77,9 +79,7 @@ private:
wi::graphics::Shader topingPS_;
wi::graphics::PipelineState topingPso_;
wi::graphics::GPUBuffer topingVertexBuffer_; // StructuredBuffer<TopingVertex>, SRV t4
wi::graphics::GPUBuffer topingInstanceBuffer_; // StructuredBuffer<float3>, SRV t5
mutable uint32_t topingInstanceCapacity_ = 0; // pre-allocated capacity (avoid per-frame CreateBuffer)
mutable bool topingInstanceDirty_ = false; // deferred upload via UpdateBuffer in Render()
DeferredGPUBuffer topingInstanceBuf_; // StructuredBuffer<float3>, SRV t5
static constexpr uint32_t MAX_TOPING_INSTANCES = 256 * 1024; // 256K instances max
// Persistent staging buffers for toping upload (avoids per-frame allocations)
struct TopingSortedInst { float wx, wy, wz; uint16_t type, variant; };
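The commit replaces `std::sort` on these instances with a counting sort keyed on `(type, variant)`. The actual implementation lives in the suppressed diff; the sketch below shows the general technique (histogram, exclusive prefix sum, stable scatter) on a struct with the same layout as `TopingSortedInst`. `TopingInstSketch`, `countingSortTopings`, and the `numTypes`/`numVariants` bounds are assumptions for illustration, and the returned `offsets` array is one plausible way to derive per-(type, variant) draw groups.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same layout as TopingSortedInst in VoxelRenderer.h.
struct TopingInstSketch { float wx, wy, wz; uint16_t type, variant; };

// Stable counting sort by (type, variant), O(n + k) with k = numTypes * numVariants.
// numTypes / numVariants are ASSUMED upper bounds (the real values come from
// TopingSystem, not shown here). On return, [offsets[key], offsets[key + 1])
// is the run of instances for that key.
void countingSortTopings(const std::vector<TopingInstSketch>& in,
                         std::vector<TopingInstSketch>& out,
                         std::vector<uint32_t>& offsets,
                         uint32_t numTypes, uint32_t numVariants) {
    const uint32_t k = numTypes * numVariants;
    offsets.assign(k + 1, 0);
    auto keyOf = [numVariants](const TopingInstSketch& t) {
        return uint32_t(t.type) * numVariants + t.variant;
    };
    for (const auto& t : in) {                 // 1. histogram (shifted by one)
        assert(t.type < numTypes && t.variant < numVariants);
        ++offsets[keyOf(t) + 1];
    }
    for (uint32_t i = 0; i < k; ++i)           // 2. exclusive prefix sum
        offsets[i + 1] += offsets[i];
    std::vector<uint32_t> cursor(offsets.begin(), offsets.end() - 1);
    out.resize(in.size());
    for (const auto& t : in)                   // 3. stable scatter
        out[cursor[keyOf(t)]++] = t;
}
```

The cost is O(n + k) for k = numTypes × numVariants buckets, which beats O(n log n) as long as k stays small relative to the instance count. The offsets array also yields every group's start and count without a second pass.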
@@ -96,8 +96,7 @@ private:
};
std::vector<TopingDrawGroup> topingDrawGroups_; // built in uploadTopingData, reused in renderTopings
// ── GPU compute toping BLAS extraction (replaces 196ms CPU loop) ──
wi::graphics::Shader topingBLASShader_; // voxelTopingBLASCS compute shader
// ── Toping BLAS group staging (passed to VoxelRTManager) ──────
struct TopingBLASGroupGPU {
uint32_t globalVertexOffset; // prefix sum of total vertices before this group
uint32_t vertexTemplateOffset; // offset into topingVertices (t4)
@ -105,24 +104,19 @@ private:
uint32_t instanceOffset; // offset into topingInstances (t5)
uint32_t instanceCount; // instances in this group
};
wi::graphics::GPUBuffer topingBLASGroupBuffer_; // StructuredBuffer<TopingBLASGroupGPU>, SRV t7
std::vector<TopingBLASGroupGPU> topingBLASGroupsGPU_; // CPU staging for group table
mutable uint32_t topingBLASTotalVertices_ = 0;
static constexpr uint32_t MAX_TOPING_BLAS_GROUPS = 64;
void dispatchTopingBLASExtract(wi::graphics::CommandList cmd) const;
// Shaders & Pipeline (smooth surfaces, Phase 5)
wi::graphics::Shader smoothVS_;
wi::graphics::Shader smoothPS_;
wi::graphics::RasterizerState smoothRasterizer_;
wi::graphics::PipelineState smoothPso_;
wi::graphics::GPUBuffer smoothVertexBuffer_; // StructuredBuffer<SmoothVertex>, SRV t6
mutable uint32_t smoothVertexCapacity_ = 0; // pre-allocated capacity (avoid per-frame CreateBuffer)
DeferredGPUBuffer smoothVertexBuf_; // StructuredBuffer<SmoothVertex>, SRV t6
std::vector<SmoothVertex> smoothStagingVerts_; // persistent staging buffer (avoids per-frame alloc)
static constexpr uint32_t MAX_SMOOTH_VERTICES = 4 * 1024 * 1024; // 4M vertices max
mutable uint32_t smoothVertexCount_ = 0;
mutable uint32_t smoothDrawCalls_ = 0;
mutable bool smoothVertexDirty_ = false; // deferred upload via UpdateBuffer in Render()
bool smoothDirty_ = true;
// Texture array for materials (256x256, 5 layers for prototype)
@ -201,58 +195,9 @@ private:
mutable uint32_t gpuSmoothVertexCount_ = 0; // readback from previous frame
mutable bool gpuSmoothMeshDirty_ = true;
// ── Ray Tracing (Phase 6.1) ─────────────────────────────────────
wi::graphics::Shader blasExtractShader_; // voxelBLASExtractCS compute shader
mutable wi::graphics::GPUBuffer blasPositionBuffer_; // float3[] for blocky BLAS (6 verts per quad)
wi::graphics::GPUBuffer blasIndexBuffer_; // sequential uint32 indices [0,1,2,...] for BLAS
mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
mutable wi::graphics::RaytracingAccelerationStructure tlas_;
mutable wi::graphics::GPUBuffer topingBLASPositionBuffer_; // float3[] world-space toping positions
mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_; // sequential indices for toping BLAS
mutable uint32_t topingBLASPositionCapacity_ = 0; // pre-allocated capacity (vertices)
mutable uint32_t topingBLASIndexCount_ = 0; // size of toping index buffer
mutable bool topingBLASDirty_ = false; // GPU compute BLAS extract + rebuild needed
mutable uint32_t topingBLASVertexCount_ = 0; // actual vertex count for current frame
// ── Ray Tracing (Phase 6) ────────────────────────────────────────
static constexpr uint32_t MAX_BLAS_VERTICES = MEGA_BUFFER_CAPACITY * 6; // 6 verts per quad
mutable bool rtAvailable_ = false; // GPU supports RT
mutable bool rtDirty_ = true; // BLAS/TLAS need rebuild
mutable uint32_t rtBlockyVertexCount_ = 0; // current blocky BLAS vertex count
mutable uint32_t rtSmoothVertexCount_ = 0; // current smooth BLAS vertex count
mutable uint32_t rtTopingVertexCount_ = 0; // current toping BLAS vertex count
// BLAS capacity tracking: only recreate AS when vertex count exceeds capacity
mutable uint32_t blockyBLASCapacity_ = 0; // vertex count at BLAS creation
mutable uint32_t smoothBLASCapacity_ = 0;
mutable uint32_t topingBLASASCapacity_ = 0; // separate from topingBLASPositionCapacity_ (buffer capacity)
mutable uint32_t tlasInstanceCount_ = 0; // track TLAS instance count to avoid per-frame recreation
void dispatchBLASExtract(wi::graphics::CommandList cmd) const;
// Flags for selective BLAS rebuild
static constexpr uint32_t RT_BUILD_BLOCKY = 1 << 0;
static constexpr uint32_t RT_BUILD_SMOOTH = 1 << 1;
static constexpr uint32_t RT_BUILD_TOPING = 1 << 2;
static constexpr uint32_t RT_BUILD_ALL = RT_BUILD_BLOCKY | RT_BUILD_SMOOTH | RT_BUILD_TOPING;
void buildAccelerationStructures(wi::graphics::CommandList cmd,
uint32_t buildFlags = RT_BUILD_ALL) const;
// ── RT Shadows + AO (Phase 6.2 + 6.3) ──────────────────────────
wi::graphics::Shader shadowShader_; // voxelShadowCS compute shader
wi::graphics::Shader aoBlurShader_; // voxelAOBlurCS compute shader
wi::graphics::Shader aoApplyShader_; // voxelAOApplyCS compute shader
mutable wi::graphics::Texture aoRawTexture_; // R8_UNORM: raw AO from shadow CS
mutable wi::graphics::Texture aoBlurredTexture_; // R8_UNORM: after bilateral blur
mutable wi::graphics::Texture aoHistoryTexture_; // R8_UNORM: previous frame's temporally accumulated AO
mutable XMFLOAT4X4 prevViewProjection_; // previous frame's VP matrix
mutable uint32_t frameCounter_ = 0;
mutable bool aoHistoryValid_ = false;
mutable bool rtShadowsEnabled_ = false; // true when shader + TLAS ready
mutable uint32_t rtShadowDebug_ = 0; // 0=off, 1=debug shadows, 2=debug AO
void dispatchShadows(wi::graphics::CommandList cmd,
const wi::graphics::Texture& depthBuffer,
const wi::graphics::Texture& renderTarget,
const wi::graphics::Texture& normalTarget) const;
mutable VoxelRTManager rt_;
void dispatchGpuMesh(wi::graphics::CommandList cmd, const VoxelWorld& world,
ProfileAccum* profPack = nullptr, ProfileAccum* profUpload = nullptr,
@ -298,9 +243,9 @@ public:
float getGpuBLASExtractTimeMs() const { return gpuBLASExtractTimeMs_; }
float getGpuBLASBuildTimeMs() const { return gpuBLASBuildTimeMs_; }
float getGpuRTShadowsTimeMs() const { return gpuRTShadowsTimeMs_; }
void toggleRTShadows() { rtShadowsEnabled_ = !rtShadowsEnabled_; }
bool isGpuMeshEnabled() const { return gpuMesherAvailable_; }
uint32_t getGpuMeshQuadCount() const { return gpuMeshQuadCount_; }
VoxelRTManager& rt() const { return rt_; }
// Phase 4: Toping rendering
void uploadTopingData(const TopingSystem& topingSystem);
@ -325,14 +270,90 @@ public:
uint32_t getSmoothVertexCount() const { return (smoothCentroidShader_.IsValid() && smoothMeshShader_.IsValid()) ? gpuSmoothVertexCount_ : smoothVertexCount_; }
uint32_t getSmoothDrawCalls() const { return smoothDrawCalls_; }
// Phase 6: Ray Tracing
bool isRTAvailable() const { return rtAvailable_; }
bool isRTReady() const { return rtAvailable_ && tlas_.IsValid(); }
bool isRTShadowsEnabled() const { return rtShadowsEnabled_; }
uint32_t getRTBlockyTriCount() const { return rtBlockyVertexCount_ / 3; }
uint32_t getRTSmoothTriCount() const { return rtSmoothVertexCount_ / 3; }
uint32_t getRTTopingTriCount() const { return rtTopingVertexCount_ / 3; }
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
// Phase 6: Ray Tracing (delegated to VoxelRTManager)
bool isRTAvailable() const { return rt_.isAvailable(); }
bool isRTReady() const { return rt_.isReady(); }
bool isRTShadowsEnabled() const { return rt_.isShadowsEnabled(); }
uint32_t getRTBlockyTriCount() const { return rt_.getBlockyTriCount(); }
uint32_t getRTSmoothTriCount() const { return rt_.getSmoothTriCount(); }
uint32_t getRTTopingTriCount() const { return rt_.getTopingTriCount(); }
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return rt_.getTLAS(); }
};
// ── Camera Controller ────────────────────────────────────────────
struct CameraController {
float speed = 50.0f;
float sensitivity = 0.003f;
XMFLOAT3 pos = { 256.0f, 100.0f, 256.0f };
float pitch = -0.3f;
float yaw = 0.0f;
bool mouseCaptured = false;
void set(float x, float y, float z, float p, float yw) {
pos = { x, y, z }; pitch = p; yaw = yw;
}
void handleInput(float dt, wi::scene::CameraComponent* camera);
};
// ── Animation State ─────────────────────────────────────────────
struct AnimationState {
float windTime = 0.0f; // continuous, always running
bool terrainAnimated = false; // toggled with F3
float time = 0.0f; // current animation time offset
float accum = 0.0f; // accumulator for 30 Hz timer
static constexpr float INTERVAL = 1.0f / 30.0f; // ~33.3ms = 30 Hz
// Returns true when an animation tick should fire (call every frame).
bool tick(float dt) {
windTime += dt;
if (!terrainAnimated) return false;
accum += dt;
if (accum < INTERVAL) return false;
accum -= INTERVAL;
time += INTERVAL;
return true;
}
};
// ── CPU Profiling (averages every INTERVAL seconds) ─────────────
struct VoxelProfiler {
static constexpr float INTERVAL = 5.0f;
// Update() phase
ProfileAccum regenerate; // regenerateAnimated
ProfileAccum updateMeshes; // updateMeshes (rebuildChunkInfoOnly)
ProfileAccum topingCollect; // topingSystem.collectInstances
ProfileAccum topingUpload; // uploadTopingData
ProfileAccum smoothMesh; // SmoothMesher::meshChunk (all chunks)
ProfileAccum smoothUpload; // uploadSmoothData
ProfileAccum frame; // full frame (Update only - legacy)
// Render() phase
ProfileAccum voxelPack; // voxel data packing in dispatchGpuMesh
ProfileAccum gpuUpload; // GPU upload in dispatchGpuMesh
ProfileAccum gpuDispatch; // compute dispatches in dispatchGpuMesh
ProfileAccum gpuMeshDispatch; // GPU mesh compute dispatch (in Render)
ProfileAccum gpuSmoothDispatch; // GPU smooth mesh dispatch (in Render)
ProfileAccum blasExtract; // BLAS position extraction compute
ProfileAccum blasBuild; // BLAS/TLAS build
ProfileAccum deferredUpload; // deferred GPU buffer uploads
ProfileAccum render; // render() draw calls
ProfileAccum rtShadows; // RT shadows + AO dispatch
// Totals
ProfileAccum fullFrame; // true full frame (Update + Render + Compose)
ProfileAccum gpuWait; // GPU sync: time between Compose end and next Update start
ProfileAccum wickedRender; // RenderPath3D::Render() (Wicked internal)
ProfileAccum trueFrame; // wall-clock frame-to-frame time
// Timing helpers
std::chrono::high_resolution_clock::time_point frameStart;
std::chrono::high_resolution_clock::time_point lastComposeEnd;
bool lastComposeEndValid = false;
float timer = 0.0f;
void log(const VoxelRenderer& renderer) const;
void resetAll();
};
// ── Custom RenderPath that integrates voxel rendering ───────────
@ -345,15 +366,14 @@ public:
bool debugMode = false;
bool debugSmooth = false;
bool screenshotMode = false; // CLI "screenshot": auto-position camera, capture, quit
void setCamera(float x, float y, float z, float pitch, float yaw);
void setCamera(float x, float y, float z, float pitch, float yaw) {
camera_.set(x, y, z, pitch, yaw);
}
void resetAOHistory(); // invalidate temporal AO after camera jump
float cameraSpeed = 50.0f;
float cameraSensitivity = 0.003f;
XMFLOAT3 cameraPos = { 256.0f, 100.0f, 256.0f };
float cameraPitch = -0.3f;
float cameraYaw = 0.0f;
bool mouseCaptured = false;
CameraController camera_;
AnimationState anim_;
mutable VoxelProfiler prof_;
const wi::graphics::Texture& getVoxelRT() const { return voxelRT_; }
@ -363,57 +383,19 @@ public:
void Compose(wi::graphics::CommandList cmd) const override;
private:
void handleInput(float dt);
void createRenderTargets();
mutable bool worldGenerated_ = false;
mutable int frameCount_ = 0;
mutable float lastDt_ = 0.016f;
mutable float smoothFps_ = 60.0f;
// Wind animation (continuous, always running)
float windTime_ = 0.0f;
// Animated terrain (wave effect at 30 Hz, toggled with F3)
bool animatedTerrain_ = false;
float animTime_ = 0.0f;
float animAccum_ = 0.0f;
static constexpr float ANIM_INTERVAL = 1.0f / 30.0f; // ~33.3ms = 30 Hz
wi::graphics::Texture voxelRT_;
wi::graphics::Texture voxelNormalRT_; // Phase 6: world-space normals for RT shadows/AO
wi::graphics::Texture voxelDepth_;
mutable bool rtCreated_ = false;
// ── CPU Profiling (averages every 5 seconds) ─────────────────
mutable ProfileAccum profRegenerate_; // regenerateAnimated
mutable ProfileAccum profUpdateMeshes_; // updateMeshes (rebuildChunkInfoOnly or CPU mesh)
mutable ProfileAccum profVoxelPack_; // voxel data packing in dispatchGpuMesh
mutable ProfileAccum profGpuUpload_; // GPU upload in dispatchGpuMesh
mutable ProfileAccum profGpuDispatch_; // compute dispatches in dispatchGpuMesh
mutable ProfileAccum profRender_; // render() draw calls
mutable ProfileAccum profFrame_; // full frame (Update only - legacy)
mutable ProfileAccum profFullFrame_; // true full frame (Update + Render + Compose)
mutable ProfileAccum profSmoothMesh_; // SmoothMesher::meshChunk (all chunks)
mutable ProfileAccum profSmoothUpload_; // uploadSmoothData
mutable ProfileAccum profTopingCollect_; // topingSystem.collectInstances
mutable ProfileAccum profTopingUpload_; // uploadTopingData
mutable ProfileAccum profGpuMeshDispatch_; // GPU mesh compute dispatch (in Render)
mutable ProfileAccum profGpuSmoothDispatch_; // GPU smooth mesh dispatch (in Render)
mutable ProfileAccum profBLASExtract_; // BLAS position extraction compute
mutable ProfileAccum profBLASBuild_; // BLAS/TLAS build
mutable ProfileAccum profDeferredUpload_; // deferred GPU buffer uploads
mutable ProfileAccum profRTShadows_; // RT shadows + AO dispatch
mutable ProfileAccum profGpuWait_; // GPU sync: time between Compose end and next Update start
mutable ProfileAccum profWickedRender_; // RenderPath3D::Render() (Wicked internal)
mutable ProfileAccum profTrueFrame_; // wall-clock frame-to-frame time
mutable std::chrono::high_resolution_clock::time_point frameStartTime_; // for full frame timing
mutable std::chrono::high_resolution_clock::time_point lastComposeEnd_; // for GPU wait measurement
mutable bool lastComposeEndValid_ = false;
mutable uint32_t rtBuildSkipCounter_ = 0; // stagger BLAS builds during animation
mutable bool rtWasEnabled_ = false; // saved RT state before animation
mutable float profTimer_ = 0.0f;
static constexpr float PROF_INTERVAL = 5.0f;
void logProfilingAverages() const;
};
} // namespace voxel

@ -329,4 +329,116 @@ You like the sky, perfect! Let's keep moving toward Wonderbox. What would you
More saturated/deeper colors: Wonderbox's grass green is richer and deeper
Atmospheric fog: the warm distant haze that blends the terrain into the sky
Stronger shadows: the light/shadow contrast is more pronounced in Wonderbox
Block side faces: more textured/detailed in Wonderbox
Remaining (deferred):
DeferredGPUBuffer utility extraction — moderate refactor, 3 instances, lower priority
VoxelRTManager extraction — significant refactor, deferred
Toping instance sort+pack parallelization — P3
Here is my full analysis and my proposals.
## 1. Memory management: current state and clarification
**Current pattern (repeated 5× in the code):**
```
uploadXxxData() {
staging.resize(count); // CPU staging vector
/* fill staging */
if (capacity < count) {
capacity = count * 1.25;
CreateBuffer(nullptr); // GPU buffer, no data
}
dirty = true; // flag for Render()
}
Render() {
if (dirty) {
UpdateBuffer(staging); // actual GPU upload
dirty = false;
}
}
```
**Problems:**
- The pattern is duplicated for `topingInstanceBuffer_`, `topingBLASPositionBuffer_`, `smoothVertexBuffer_`, and the 3 BLAS + TLAS
- Each copy has produced its own bugs (the `memmove` crash, the BLAS VRAM leak, the frozen shadows)
- The dirty flags are scattered (`topingInstanceDirty_`, `smoothVertexDirty_`, `topingBLASDirty_`, `rtDirty_`), with non-obvious ordering dependencies (the BLAS upload must happen before the BLAS build)
- 15 `mutable` members exist just for flags and capacities
**Proposal:** Extract a `DeferredGPUBuffer` that encapsulates this pattern:
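```cpp
#include <cstdint>
#include <vector>
#include "WickedEngine.h" // wi::graphics::GPUBuffer, GraphicsDevice, CommandList, BindFlag

struct DeferredGPUBuffer {
    wi::graphics::GPUBuffer gpu;   // GPU buffer, created with nullptr data
    std::vector<uint8_t> staging;  // CPU staging copy, uploaded later in Render()
    uint32_t count = 0;            // live elements
    uint32_t capacity = 0;         // elements the GPU buffer can hold
    uint32_t stride = 0;           // bytes per element
    bool dirty = false;            // staging not yet uploaded
    void prepare(uint32_t newCount, const void* data);  // resize + fill + dirty = true
    void upload(wi::graphics::GraphicsDevice* dev, wi::graphics::CommandList cmd);  // UpdateBuffer + dirty = false
    void ensureCapacity(wi::graphics::GraphicsDevice* dev, uint32_t newCount, wi::graphics::BindFlag flags);
};
```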
This removes ~50 lines of boilerplate per buffer and centralizes the invariants (capacity >= count, create with `nullptr` data, upload with the actual size).
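To make these invariants concrete, here is a minimal, self-contained sketch of the staging→dirty→capacity flow. It is an illustration under assumptions, not the committed `DeferredGPUBuffer`: `MockDevice` is a hypothetical stand-in for `GraphicsDevice` that only records create/upload calls, and this version folds `ensureCapacity()` into `prepare()`. Growth uses the 25% headroom from the pattern above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the graphics device: it only records calls,
// so the control flow can be exercised without Wicked Engine.
struct MockDevice {
    uint32_t createCalls = 0;
    uint32_t uploadCalls = 0;
    uint32_t gpuBytes = 0;         // size of the current GPU allocation
    uint32_t lastUploadBytes = 0;  // size of the most recent upload
    void createBuffer(uint32_t bytes) { ++createCalls; gpuBytes = bytes; }  // created empty (nullptr data)
    void updateBuffer(const void* /*src*/, uint32_t bytes) { ++uploadCalls; lastUploadBytes = bytes; }
};

// Sketch of the staging -> dirty -> capacity pattern (illustrative, not the committed class).
struct DeferredGPUBufferSketch {
    std::vector<uint8_t> staging;  // CPU copy filled during Update()
    uint32_t count = 0;            // live elements
    uint32_t capacity = 0;         // elements the GPU allocation can hold
    uint32_t stride = 0;           // bytes per element
    bool dirty = false;            // staging not yet uploaded

    explicit DeferredGPUBufferSketch(uint32_t elemStride) : stride(elemStride) {}

    // Update(): grow the GPU allocation only when needed, then fill staging and mark dirty.
    void prepare(MockDevice& dev, uint32_t newCount, const void* data) {
        if (newCount > capacity) {
            capacity = newCount + newCount / 4;   // 25% headroom
            dev.createBuffer(capacity * stride);  // no initial data
        }
        staging.resize(std::size_t(newCount) * stride);
        if (newCount > 0) std::memcpy(staging.data(), data, staging.size());
        count = newCount;
        dirty = true;
    }

    // Render(): upload only the live range, at most once per prepare().
    void upload(MockDevice& dev) {
        if (!dirty) return;
        if (count > 0) dev.updateBuffer(staging.data(), count * stride);
        dirty = false;
    }
};
```

Because `prepare()` only recreates the GPU buffer when `newCount` exceeds `capacity`, steady-state frames cost one upload and no allocation, and a second `upload()` in the same frame is a no-op.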
---
## 2. Performance: proposals without functional regression
### 2.2 Parallelize toping instance sorting + packing (~5ms → ~1ms)
The `std::sort` over 30K elements and the copy into `topingGpuInsts_` are single-threaded. Either use `wi::jobsystem` to partition by type (2 types = 2 jobs), or use a counting sort (16 variant buckets × 2 types = 32 buckets), which is O(N) instead of O(N log N).
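A minimal sketch of the counting-sort option, assuming (as stated above) 2 toping types and 16 variants per type, so the key `type * 16 + variant` fits in 32 buckets. `TopingSortedInst` mirrors the struct declared in `VoxelRenderer.h`; `bucketOf` and `countingSortTopings` are hypothetical names. The sort is stable and runs in O(N + 32); the returned bucket offsets could also serve as draw-group boundaries.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Same layout as TopingSortedInst in VoxelRenderer.h.
struct TopingSortedInst { float wx, wy, wz; uint16_t type, variant; };

// Assumed key space: 2 types x 16 variants = 32 buckets.
constexpr uint32_t NUM_TYPES = 2;
constexpr uint32_t NUM_VARIANTS = 16;
constexpr uint32_t NUM_BUCKETS = NUM_TYPES * NUM_VARIANTS;

inline uint32_t bucketOf(const TopingSortedInst& inst) {
    assert(inst.type < NUM_TYPES && inst.variant < NUM_VARIANTS);
    return uint32_t(inst.type) * NUM_VARIANTS + inst.variant;
}

// Stable counting sort by (type, variant). Bucket b ends up in
// out[offsets[b], offsets[b + 1]).
std::array<uint32_t, NUM_BUCKETS + 1> countingSortTopings(
    const std::vector<TopingSortedInst>& in, std::vector<TopingSortedInst>& out) {
    std::array<uint32_t, NUM_BUCKETS + 1> offsets{};  // zero-initialized
    // Pass 1: histogram, shifted by one slot so the prefix sum yields bucket starts.
    for (const auto& inst : in) ++offsets[bucketOf(inst) + 1];
    // Pass 2: exclusive prefix sum -> offsets[b] = first output slot of bucket b.
    for (uint32_t b = 1; b <= NUM_BUCKETS; ++b) offsets[b] += offsets[b - 1];
    // Pass 3: scatter in input order, which keeps the sort stable.
    std::array<uint32_t, NUM_BUCKETS> cursor{};
    for (uint32_t b = 0; b < NUM_BUCKETS; ++b) cursor[b] = offsets[b];
    out.resize(in.size());
    for (const auto& inst : in) out[cursor[bucketOf(inst)]++] = inst;
    return offsets;
}
```

Each pass is a plain linear loop, so the work could also be split across `wi::jobsystem` jobs (per-job histograms merged before the prefix sum) if the single-threaded version is still too slow.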
### 2.3 Skip the BLAS rebuild when only blocky geometry changes
Currently `buildAccelerationStructures()` rebuilds all 3 BLAS + the TLAS on every animation frame. If only the blocky terrain changes (no wind/toping update), rebuilding the toping BLAS is wasted work. Add granular dirty flags:
```cpp
mutable bool blockyBLASDirty_ = false;
mutable bool smoothBLASDirty_ = false;
// topingBLASDirty_ already exists
```
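One way to wire such flags into the selective-rebuild entry point, sketched under assumptions: the `RT_BUILD_*` values match those declared in `VoxelRenderer.h`, while `BLASDirtyFlags` and `consume()` are hypothetical helpers, not existing code. Each producer sets its own flag; the render pass turns the set flags into the `buildFlags` mask and clears them.

```cpp
#include <cstdint>

// Flag values as declared in VoxelRenderer.h.
constexpr uint32_t RT_BUILD_BLOCKY = 1 << 0;
constexpr uint32_t RT_BUILD_SMOOTH = 1 << 1;
constexpr uint32_t RT_BUILD_TOPING = 1 << 2;

// Hypothetical grouping of the per-BLAS dirty flags proposed above.
struct BLASDirtyFlags {
    bool blocky = false;
    bool smooth = false;
    bool toping = false;

    // Convert the dirty flags into a buildFlags mask, then clear them.
    // Returns 0 when no BLAS is dirty, so the caller can skip the build.
    uint32_t consume() {
        uint32_t flags = 0;
        if (blocky) flags |= RT_BUILD_BLOCKY;
        if (smooth) flags |= RT_BUILD_SMOOTH;
        if (toping) flags |= RT_BUILD_TOPING;
        blocky = smooth = toping = false;
        return flags;
    }
};
```

A caller would then write something like `if (uint32_t f = dirty.consume()) buildAccelerationStructures(cmd, f);`, so a frame where only the blocky terrain moved passes `RT_BUILD_BLOCKY` alone.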
## 3. Refactoring: areas to simplify
### 3.2 Extract RT into a dedicated class
`VoxelRenderer` is 2900+ lines and mixes rendering, meshing, and ray tracing. Extract a `VoxelRTManager`:
```cpp
class VoxelRTManager {
// BLAS/TLAS management, capacity tracking
// dispatchBLASExtract(), buildAccelerationStructures()
// dispatchShadows()
// All RT-related state (rtAvailable_, rtDirty_, aoTextures_, etc.)
};
```
This isolates the ~500 lines of RT code and its 20+ members, which makes debugging more targeted.
### 3.3 Unify the deferred-upload pattern
As described in §1, a centralized `DeferredGPUBuffer` avoids error-prone duplication. Every bug encountered so far (the `memmove` crash, the VRAM leak, the frozen shadows) came from an incorrectly implemented variant of this same pattern.
### 3.4 Simplify `VoxelRenderPath`
`VoxelRenderPath` acts as a "god object": camera, input, animation, profiling, render targets, wind. Extract:
- Input/camera → struct `CameraController`
- Profiling → struct `VoxelProfiler` (already a good candidate, since the `ProfileAccum` members are self-contained)
- Animation → struct `AnimationState`
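The profiling part is easy to isolate because it is only accumulators plus a window timer. A minimal sketch under assumptions: the real `ProfileAccum` is not shown in this diff, so `ProfileAccumSketch` below is a hypothetical stand-in, and `MiniProfiler` reduces the `VoxelProfiler` idea to a single accumulator with the same 5 s `INTERVAL`.

```cpp
#include <cstdint>

// Hypothetical stand-in for ProfileAccum: running sum + sample count.
struct ProfileAccumSketch {
    double totalMs = 0.0;
    uint32_t samples = 0;
    void add(double ms) { totalMs += ms; ++samples; }
    double averageMs() const { return samples ? totalMs / samples : 0.0; }
    void reset() { totalMs = 0.0; samples = 0; }
};

// One-accumulator version of the VoxelProfiler idea: feed a sample every frame,
// publish the average and reset once per INTERVAL seconds.
struct MiniProfiler {
    static constexpr float INTERVAL = 5.0f;
    ProfileAccumSketch render;
    float timer = 0.0f;
    double lastAverageMs = 0.0;

    // Returns true when a window closes (the point where a log() call would run).
    bool endFrame(float dt) {
        timer += dt;
        if (timer < INTERVAL) return false;
        lastAverageMs = render.averageMs();
        render.reset();
        timer -= INTERVAL;
        return true;
    }
};
```

Since this state touches nothing else in the render path, moving it out of `VoxelRenderPath` is mostly a matter of renaming members.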
## Recommended prioritization
| Priority | Action | Impact | Effort |
|----------|--------|--------|--------|
| **P1** | `DeferredGPUBuffer` (§3.3) | Bug prevention | Medium |
| **P2** | Extract RT into a class (§3.2) | Maintainability | Medium |
| **P2** | Granular BLAS dirty flags (§2.3) | ~2-5ms/frame | Low |
| **P3** | Parallelize toping sort (§2.2) | ~4ms | Low |
| **P3** | Toping LOD during animation (§4.1) | Raster + BLAS | Medium |
**P0 alone would bring the frame time down from 232ms to ~35ms (~28 FPS), about 6.5× better.** Combined with the P2 dirty flags, we get close to the 60 FPS target.
Tell me which priorities you want to tackle and in what order.