Refactor: extract VoxelRTManager, DeferredGPUBuffer, decompose VoxelRenderPath
- Extract DeferredGPUBuffer utility (staging→dirty→capacity GPU buffer pattern)
- Extract VoxelRTManager from VoxelRenderer (~500 lines: BLAS/TLAS, RT shadows+AO)
- Decompose VoxelRenderPath into CameraController, AnimationState, VoxelProfiler
- Replace toping std::sort with O(n) counting sort by (type, variant)
- Update CLAUDE.md architecture docs to reflect new file structure
parent 53df73e5e6
commit 57ac08f231
7 changed files with 1294 additions and 1070 deletions
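The commit message mentions replacing `std::sort` with an O(n) counting sort keyed on (type, variant). The toping sort itself is not part of the files shown here, so the following is only a minimal sketch of the technique. The `TopingInstance` layout, the key ranges `kTypeCount` and `kVariantCount`, and the function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instance record; only the two sort keys matter here.
struct TopingInstance {
    uint8_t  type;     // assumed to be < kTypeCount
    uint8_t  variant;  // assumed to be < kVariantCount
    uint32_t payload;  // stand-in for position/orientation data
};

constexpr uint32_t kTypeCount    = 8;  // assumed key ranges
constexpr uint32_t kVariantCount = 4;

// Stable counting sort keyed on (type, variant).
// Cost is O(n + k) with k = kTypeCount * kVariantCount buckets.
std::vector<TopingInstance> sortByTypeVariant(const std::vector<TopingInstance>& in) {
    constexpr uint32_t kBuckets = kTypeCount * kVariantCount;
    uint32_t offsets[kBuckets + 1] = {};

    // Pass 1: histogram, shifted by one so the prefix sum yields start offsets.
    for (const TopingInstance& t : in)
        ++offsets[t.type * kVariantCount + t.variant + 1];

    // Pass 2: prefix sum, so offsets[b] becomes the first output slot of bucket b.
    for (uint32_t b = 0; b < kBuckets; ++b)
        offsets[b + 1] += offsets[b];

    // Pass 3: scatter; walking the input in order keeps equal keys stable.
    std::vector<TopingInstance> out(in.size());
    for (const TopingInstance& t : in)
        out[offsets[t.type * kVariantCount + t.variant]++] = t;
    return out;
}
```

Because the number of buckets is small and fixed, the histogram lives on the stack and the whole sort is three linear passes, which is where the O(n) claim comes from.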
10
CLAUDE.md
@@ -18,7 +18,9 @@ bvle-voxels/
 │ │ ├── VoxelTypes.h # Fundamental types (VoxelData, PackedQuad, MaterialDesc, ChunkPos)
 │ │ ├── VoxelWorld.h/.cpp # Voxel world (hashmap of chunks, procedural generation)
 │ │ ├── VoxelMesher.h/.cpp # Binary Greedy Mesher CPU + SmoothMesher (Naive Surface Nets)
-│ │ ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (RenderPath3D subclass)
+│ │ ├── VoxelRenderer.h/.cpp# Renderer + VoxelRenderPath (CameraController, AnimationState, VoxelProfiler)
+│ │ ├── VoxelRTManager.h/.cpp # Ray tracing: BLAS/TLAS lifecycle, shadows+AO dispatches
+│ │ ├── DeferredGPUBuffer.h # Utility for staging→dirty→capacity GPU buffer upload
 │ │ └── TopingSystem.h/.cpp # Toping system (decorative bevels on +Y faces)
 │ └── app/
 │ └── main.cpp # Win32 entry point + SEH crash handler
@@ -129,7 +131,11 @@ Perlin noise 3D, fBm 5 octaves (2 en animation), caves 3D, matériaux par altitu
 - **Per-chunk info**: `StructuredBuffer<GPUChunkInfo>` (80 bytes/chunk)
 - **Height-based blending** (Phase 3): the PS reads `voxelDataBuffer` (t3), winner-takes-all heightmap, corner attenuation
 - **Dedicated render targets**: `voxelRT_` (R8G8B8A8) + `voxelDepth_` (D32_FLOAT)
-- **CPU profiling**: `ProfileAccum` with averages every 5 s
+- **CPU profiling**: `VoxelProfiler` (21 `ProfileAccum`, averages every 5 s)
+- **DeferredGPUBuffer**: utility for GPU buffers with CPU staging, dirty flag, capacity-based growth (25% headroom)
+- **VoxelRTManager** (`VoxelRTManager.h/.cpp`): manages BLAS/TLAS and the RT shadows+AO dispatches, isolated from the renderer
+- **VoxelRenderPath** decomposed into: `CameraController` (movement/mouse), `AnimationState` (terrain tick), `VoxelProfiler`
+- **Toping sort**: O(n) counting sort by (type, variant) instead of `std::sort`
 
 ## Development phases
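The CLAUDE.md notes above describe `VoxelProfiler` as 21 `ProfileAccum` slots averaged every 5 s, but the profiler source is not part of the files shown here. The sketch below shows one way such an accumulator could work. All type names, fields and methods (`ProfileAccumSketch`, `VoxelProfilerSketch`, `add`, `flush`, `tick`) are hypothetical and are not the project's actual API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator: sums samples and reports the mean once per window.
struct ProfileAccumSketch {
    double   totalMs = 0.0;
    uint32_t samples = 0;

    void add(double ms) { totalMs += ms; ++samples; }

    // Returns the mean of the current window and resets it; 0 when empty.
    double flush() {
        double mean = samples ? totalMs / samples : 0.0;
        totalMs = 0.0;
        samples = 0;
        return mean;
    }
};

// Hypothetical profiler: a fixed set of accumulators plus a window timer.
struct VoxelProfilerSketch {
    static constexpr uint32_t kSlots = 21;     // slot count taken from the notes above
    static constexpr double   kWindowSec = 5.0;
    ProfileAccumSketch slots[kSlots];
    double elapsedSec = 0.0;

    // Advance the window timer; returns true when averages should be flushed.
    bool tick(double dtSec) {
        elapsedSec += dtSec;
        if (elapsedSec < kWindowSec) return false;
        elapsedSec = 0.0;
        return true;
    }
};
```

A caller would `add()` a timing into the relevant slot every frame and, whenever `tick()` returns true, `flush()` each slot to log its average.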
68
src/voxel/DeferredGPUBuffer.h
Normal file
@@ -0,0 +1,68 @@
#pragma once
#include "WickedEngine.h"

namespace voxel {

// ── Deferred GPU Buffer ─────────────────────────────────────────
// Encapsulates the repeated pattern of:
//   1. CPU staging data prepared during Update()
//   2. GPU buffer with capacity-based growth (25% headroom)
//   3. Dirty flag for deferred upload in Render()
//
// Eliminates ~50 lines of boilerplate per buffer and centralizes
// the invariants (capacity >= count, CreateBuffer with nullptr,
// UpdateBuffer with actual data size).

struct DeferredGPUBuffer {
    wi::graphics::GPUBuffer gpu;
    mutable uint32_t capacity = 0; // in elements
    mutable bool dirty = false;
    uint32_t stride = 0; // bytes per element

    // Ensure GPU buffer has enough capacity for elementCount elements.
    // Creates/recreates buffer only when capacity is insufficient.
    // Returns true if buffer was (re)created.
    bool ensureCapacity(wi::graphics::GraphicsDevice* device,
        uint32_t elementCount,
        uint32_t elementStride,
        wi::graphics::BindFlag bindFlags,
        wi::graphics::ResourceMiscFlag miscFlags = wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED)
    {
        stride = elementStride;
        if (gpu.IsValid() && capacity >= elementCount) return false;

        capacity = elementCount + elementCount / 4; // 25% headroom
        wi::graphics::GPUBufferDesc desc;
        desc.size = (uint64_t)capacity * stride;
        desc.bind_flags = bindFlags;
        desc.misc_flags = miscFlags;
        desc.stride = (miscFlags == wi::graphics::ResourceMiscFlag::BUFFER_STRUCTURED) ? stride : 0;
        desc.usage = wi::graphics::Usage::DEFAULT;
        device->CreateBuffer(&desc, nullptr, &gpu);
        dirty = true;
        return true;
    }

    // Upload data to GPU. Call from Render() with a valid CommandList.
    // dataCount = number of elements to upload (may be < capacity).
    void upload(wi::graphics::GraphicsDevice* device,
        wi::graphics::CommandList cmd,
        const void* data,
        uint32_t dataCount) const
    {
        if (!dirty || !gpu.IsValid() || dataCount == 0 || !data) return;
        size_t uploadSize = (size_t)dataCount * stride;
        size_t bufferSize = (size_t)capacity * stride;
        if (uploadSize <= bufferSize) {
            device->UpdateBuffer(&gpu, data, cmd, uploadSize);
        }
        dirty = false;
    }

    // Mark as needing upload (call after staging data changes).
    void markDirty() { dirty = true; }

    bool isValid() const { return gpu.IsValid(); }
};

} // namespace voxel
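The staging→dirty→capacity pattern in the header above can be exercised without a graphics device. The sketch below is a stand-alone approximation, not the engine code: `FakeGPUBuffer` and `DeferredBufferSketch` are hypothetical stand-ins that use host memory in place of `wi::graphics::GPUBuffer`, and `upload` here returns a status and leaves the buffer dirty when the data does not fit, which is a deliberate difference from the header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for a GPU buffer: plain host memory, so the sketch runs without a device.
struct FakeGPUBuffer {
    std::vector<uint8_t> bytes;
    bool valid() const { return !bytes.empty(); }
};

struct DeferredBufferSketch {
    FakeGPUBuffer gpu;
    uint32_t capacity = 0;  // in elements
    uint32_t stride = 0;    // bytes per element
    bool dirty = false;

    // Reallocate only when the request exceeds capacity; grow with 25% headroom.
    bool ensureCapacity(uint32_t elementCount, uint32_t elementStride) {
        stride = elementStride;
        if (gpu.valid() && capacity >= elementCount) return false;
        capacity = elementCount + elementCount / 4;
        gpu.bytes.assign(static_cast<size_t>(capacity) * stride, 0);
        dirty = true;  // fresh storage has no contents yet
        return true;
    }

    // Copy staged data only when something changed since the last upload.
    bool upload(const void* data, uint32_t count) {
        if (!dirty || !gpu.valid() || count == 0 || !data) return false;
        if (count > capacity) return false;  // stay dirty so the caller can grow and retry
        std::memcpy(gpu.bytes.data(), data, static_cast<size_t>(count) * stride);
        dirty = false;
        return true;
    }
};
```

With 25% headroom, a request for 100 elements allocates 125, so later requests up to 125 elements reuse the same allocation.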
610
src/voxel/VoxelRTManager.cpp
Normal file
@@ -0,0 +1,610 @@
#include "VoxelRTManager.h"
#include <cstring>

using namespace wi::graphics;

namespace voxel {

void VoxelRTManager::initialize(GraphicsDevice* dev, uint32_t maxBlasVertices) {
    device_ = dev;
    maxBlasVertices_ = maxBlasVertices;

    available_ = dev->CheckCapability(GraphicsDeviceCapability::RAYTRACING);
    if (!available_) {
        wi::backlog::post("VoxelRTManager: RT not available (GPU does not support ray tracing)");
        return;
    }

    wi::renderer::LoadShader(ShaderStage::CS, blasExtractShader_, "voxel/voxelBLASExtractCS.cso");
    if (blasExtractShader_.IsValid()) {
        // BLAS position buffer: 6 float3 per quad (non-indexed triangles), raw buffer
        GPUBufferDesc posDesc;
        posDesc.size = (uint64_t)maxBlasVertices * sizeof(float) * 3;
        posDesc.bind_flags = BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE;
        posDesc.misc_flags = ResourceMiscFlag::BUFFER_RAW;
        posDesc.stride = 0;
        posDesc.usage = Usage::DEFAULT;
        bool ok = dev->CreateBuffer(&posDesc, nullptr, &blasPositionBuffer_);

        // Sequential index buffer for BLAS
        GPUBufferDesc idxDesc;
        idxDesc.size = (uint64_t)maxBlasVertices * sizeof(uint32_t);
        idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
        idxDesc.usage = Usage::DEFAULT;
        auto fillIndices = [maxBlasVertices](void* dest) {
            uint32_t* p = (uint32_t*)dest;
            for (uint32_t i = 0; i < maxBlasVertices; i++)
                p[i] = i;
        };
        bool okIdx = dev->CreateBuffer2(&idxDesc, fillIndices, &blasIndexBuffer_);

        if (ok && blasPositionBuffer_.IsValid() && okIdx && blasIndexBuffer_.IsValid()) {
            dev->SetName(&blasPositionBuffer_, "VoxelRTManager::blasPositionBuffer");
            dev->SetName(&blasIndexBuffer_, "VoxelRTManager::blasIndexBuffer");
            wi::backlog::post("VoxelRTManager: RT available (BLAS pos "
                + std::to_string(posDesc.size / (1024*1024)) + " MB + idx "
                + std::to_string(idxDesc.size / (1024*1024)) + " MB)");
        } else {
            available_ = false;
            wi::backlog::post("VoxelRTManager: RT buffer creation failed", wi::backlog::LogLevel::Warning);
        }
    } else {
        available_ = false;
        wi::backlog::post("VoxelRTManager: BLAS extraction shader failed", wi::backlog::LogLevel::Warning);
    }

    // Toping BLAS CS
    wi::renderer::LoadShader(ShaderStage::CS, topingBLASShader_, "voxel/voxelTopingBLASCS.cso");
    if (topingBLASShader_.IsValid()) {
        static constexpr uint32_t MAX_GROUPS = 64;
        GPUBufferDesc grpDesc;
        grpDesc.size = MAX_GROUPS * 20; // 5 × uint32 per group
        grpDesc.bind_flags = BindFlag::SHADER_RESOURCE;
        grpDesc.misc_flags = ResourceMiscFlag::BUFFER_STRUCTURED;
        grpDesc.stride = 20;
        grpDesc.usage = Usage::DEFAULT;
        dev->CreateBuffer(&grpDesc, nullptr, &topingBLASGroupBuffer_);
        wi::backlog::post("VoxelRTManager: toping BLAS CS available");
    } else {
        wi::backlog::post("VoxelRTManager: toping BLAS CS failed", wi::backlog::LogLevel::Warning);
    }

    // RT Shadows + AO
    wi::renderer::LoadShader(ShaderStage::CS, shadowShader_, "voxel/voxelShadowCS.cso",
        ShaderModel::SM_6_5);
    wi::renderer::LoadShader(ShaderStage::CS, aoBlurShader_, "voxel/voxelAOBlurCS.cso");
    wi::renderer::LoadShader(ShaderStage::CS, aoApplyShader_, "voxel/voxelAOApplyCS.cso");
    if (shadowShader_.IsValid() && aoBlurShader_.IsValid() && aoApplyShader_.IsValid()) {
        shadowsEnabled_ = true;
        wi::backlog::post("VoxelRTManager: RT shadows + AO blur available");
    } else {
        wi::backlog::post("VoxelRTManager: RT shadow/AO shader(s) failed",
            wi::backlog::LogLevel::Warning);
    }
}

// ── BLAS extraction: blocky quads → float3 positions ────────────

void VoxelRTManager::dispatchBLASExtract(CommandList cmd,
    const GPUBuffer& quadBuffer,
    const GPUBuffer& chunkInfoBuffer,
    uint32_t quadCount) const
{
    if (!available_ || !blasExtractShader_.IsValid() || quadCount == 0) return;

    auto* dev = device_;

    GPUBarrier preBarriers[] = {
        GPUBarrier::Buffer(&blasPositionBuffer_,
            ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
    };
    dev->Barrier(preBarriers, 1, cmd);

    dev->BindComputeShader(&blasExtractShader_, cmd);
    dev->BindResource(&quadBuffer, 0, cmd); // t0
    dev->BindResource(&chunkInfoBuffer, 2, cmd); // t2
    dev->BindUAV(&blasPositionBuffer_, 0, cmd); // u0

    struct BLASPush {
        uint32_t quadCount;
        uint32_t pad[11];
    } pushData = {};
    pushData.quadCount = quadCount;
    dev->PushConstants(&pushData, sizeof(pushData), cmd);

    uint32_t groupCount = (quadCount + 63) / 64;
    dev->Dispatch(groupCount, 1, 1, cmd);

    GPUBarrier postBarriers[] = {
        GPUBarrier::Buffer(&blasPositionBuffer_,
            ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
    };
    dev->Barrier(postBarriers, 1, cmd);

    blockyVertexCount_ = quadCount * 6;
}

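The dispatch above relies on two small pieces of arithmetic: a ceiling division to size the dispatch, and a push-constant block padded to twelve 32-bit words (48 bytes), which is the size every push struct in this file adds up to. A minimal sketch of both follows. The helper names are hypothetical, and the group size of 64 is inferred from the `(quadCount + 63) / 64` expression rather than from the shader source, which is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Number of thread groups needed to cover `items` with `groupSize` threads each.
constexpr uint32_t dispatchGroups(uint32_t items, uint32_t groupSize) {
    return (items + groupSize - 1) / groupSize;
}

// Same shape as the BLASPush struct above: one payload word padded to 12 dwords.
struct BLASPushSketch {
    uint32_t quadCount;
    uint32_t pad[11];
};
static_assert(sizeof(BLASPushSketch) == 48, "push block is expected to be 12 x 4 bytes");

// Each quad expands to two non-indexed triangles, i.e. six vertices.
constexpr uint32_t verticesForQuads(uint32_t quadCount) { return quadCount * 6; }
```

The `static_assert` makes the padding convention checkable at compile time, so a field added without shrinking `pad` fails the build instead of silently changing the layout the shader sees.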
// ── Toping BLAS extraction (GPU compute) ────────────────────────

void VoxelRTManager::dispatchTopingBLASExtract(CommandList cmd,
    const GPUBuffer& topingVertexBuffer,
    const GPUBuffer& topingInstanceBuffer,
    const void* groupsGPUData, size_t groupsGPUSize,
    uint32_t groupCount, uint32_t totalVertices) const
{
    if (!topingBLASShader_.IsValid() || !topingBLASGroupBuffer_.IsValid() ||
        !topingBLASPositionBuf_.isValid() || !topingVertexBuffer.IsValid() ||
        !topingInstanceBuffer.IsValid() || totalVertices == 0 || groupCount == 0)
        return;

    auto* dev = device_;

    // Upload group table
    dev->UpdateBuffer(&topingBLASGroupBuffer_, groupsGPUData, cmd, groupsGPUSize);

    GPUBarrier preBarriers[] = {
        GPUBarrier::Buffer(&topingBLASGroupBuffer_,
            ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
        GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
            ResourceState::UNDEFINED, ResourceState::UNORDERED_ACCESS),
    };
    dev->Barrier(preBarriers, 2, cmd);

    dev->BindComputeShader(&topingBLASShader_, cmd);
    dev->BindResource(&topingVertexBuffer, 4, cmd); // t4
    dev->BindResource(&topingInstanceBuffer, 5, cmd); // t5
    dev->BindResource(&topingBLASGroupBuffer_, 7, cmd); // t7
    dev->BindUAV(&topingBLASPositionBuf_.gpu, 0, cmd); // u0

    struct {
        uint32_t totalVertices;
        uint32_t groupCount;
        uint32_t pad[10];
    } pushData = {};
    pushData.totalVertices = totalVertices;
    pushData.groupCount = groupCount;
    dev->PushConstants(&pushData, sizeof(pushData), cmd);

    uint32_t threadGroups = (totalVertices + 63) / 64;
    dev->Dispatch(threadGroups, 1, 1, cmd);

    GPUBarrier postBarriers[] = {
        GPUBarrier::Buffer(&topingBLASPositionBuf_.gpu,
            ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
    };
    dev->Barrier(postBarriers, 1, cmd);

    topingVertexCount_ = totalVertices;
    dirty = true;
    topingBLASDirty = false;
}

// ── Ensure toping BLAS buffer capacity ──────────────────────────

bool VoxelRTManager::ensureTopingBLASCapacity(uint32_t totalVertices) {
    if (totalVertices == 0) return false;

    bool recreated = topingBLASPositionBuf_.ensureCapacity(device_, totalVertices,
        3 * sizeof(float),
        BindFlag::UNORDERED_ACCESS | BindFlag::SHADER_RESOURCE,
        ResourceMiscFlag::BUFFER_RAW);

    if (recreated) {
        char msg[256];
        snprintf(msg, sizeof(msg), "VoxelRTManager: toping BLAS pos buffer (%u capacity, %.1f MB)",
            topingBLASPositionBuf_.capacity,
            (size_t)topingBLASPositionBuf_.capacity * 3 * sizeof(float) / (1024.0 * 1024.0));
        wi::backlog::post(msg);
    }

    // Index buffer: grow if needed
    if (topingBLASIndexCount_ < topingBLASPositionBuf_.capacity) {
        uint32_t idxCount = topingBLASPositionBuf_.capacity;
        std::vector<uint32_t> indices(idxCount);
        for (uint32_t j = 0; j < idxCount; j++) indices[j] = j;

        GPUBufferDesc idxDesc;
        idxDesc.size = (size_t)idxCount * sizeof(uint32_t);
        idxDesc.bind_flags = BindFlag::SHADER_RESOURCE;
        idxDesc.misc_flags = ResourceMiscFlag::NONE;
        idxDesc.usage = Usage::DEFAULT;
        device_->CreateBuffer(&idxDesc, indices.data(), &topingBLASIndexBuffer_);
        topingBLASIndexCount_ = idxCount;
        recreated = true;
    }

    topingBLASDirty = true;
    return recreated;
}

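The index buffer built above is just the identity sequence 0..n-1, regrown only when the position buffer's capacity outgrows it. A stand-alone sketch of that grow-only policy, using host vectors in place of GPU buffers, might look like this. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Identity index list 0..count-1, for APIs that want indices even though the
// geometry is really a flat, non-indexed triangle list.
std::vector<uint32_t> makeSequentialIndices(uint32_t count) {
    std::vector<uint32_t> indices(count);
    std::iota(indices.begin(), indices.end(), 0u);
    return indices;
}

// Grow-only policy: rebuild the list only when it is shorter than required.
// Returns true when the list was rebuilt.
bool growIndices(std::vector<uint32_t>& indices, uint32_t requiredCount) {
    if (indices.size() >= requiredCount) return false;
    indices = makeSequentialIndices(requiredCount);
    return true;
}
```

Sizing the index list to the position buffer's capacity rather than to the current vertex count means both grow at the same moments, so the indices never need a rebuild while the vertex data still fits.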
// ── Acceleration structure build ────────────────────────────────

void VoxelRTManager::buildAccelerationStructures(CommandList cmd,
    uint32_t buildFlags,
    const GPUBuffer& smoothVB,
    uint32_t smoothVertCount) const
{
    if (!available_) return;

    auto* dev = device_;

    // ── Blocky BLAS ──
    uint32_t blockyVertCount = blockyVertexCount_;
    if (blockyVertCount < 3) blockyVertCount = 0;
    if ((buildFlags & BUILD_BLOCKY) && blockyVertCount > 0 && blasPositionBuffer_.IsValid()) {
        if (!blockyBLAS_.IsValid() || blockyVertCount > blockyBLASCapacity_) {
            blockyBLASCapacity_ = blockyVertCount + blockyVertCount / 4;

            RaytracingAccelerationStructureDesc desc;
            desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
            desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;

            desc.bottom_level.geometries.resize(1);
            auto& geom = desc.bottom_level.geometries[0];
            geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
            geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
            geom.triangles.vertex_buffer = blasPositionBuffer_;
            geom.triangles.vertex_byte_offset = 0;
            geom.triangles.vertex_count = blockyBLASCapacity_;
            geom.triangles.vertex_stride = sizeof(float) * 3;
            geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
            geom.triangles.index_buffer = blasIndexBuffer_;
            geom.triangles.index_count = blockyBLASCapacity_;
            geom.triangles.index_format = IndexBufferFormat::UINT32;
            geom.triangles.index_offset = 0;

            bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &blockyBLAS_);
            if (ok) {
                dev->SetName(&blockyBLAS_, "VoxelRTManager::blockyBLAS");
                wi::backlog::post("VoxelRTManager: blocky BLAS created (capacity "
                    + std::to_string(blockyBLASCapacity_ / 3) + " tris)");
            } else {
                wi::backlog::post("VoxelRTManager: failed to create blocky BLAS", wi::backlog::LogLevel::Error);
                available_ = false;
                return;
            }
        }

        blockyBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = blockyVertCount;
        blockyBLAS_.desc.bottom_level.geometries[0].triangles.index_count = blockyVertCount;
        dev->BuildRaytracingAccelerationStructure(&blockyBLAS_, cmd, nullptr);
    }

    // ── Smooth BLAS ──
    if (smoothVertCount < 3) smoothVertCount = 0;
    if ((buildFlags & BUILD_SMOOTH) && smoothVertCount > 0 && smoothVB.IsValid()) {
        if (!smoothBLAS_.IsValid() || smoothVertCount > smoothBLASCapacity_) {
            smoothBLASCapacity_ = smoothVertCount + smoothVertCount / 4;

            RaytracingAccelerationStructureDesc desc;
            desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
            desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;

            desc.bottom_level.geometries.resize(1);
            auto& geom = desc.bottom_level.geometries[0];
            geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
            geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
            geom.triangles.vertex_buffer = smoothVB;
            geom.triangles.vertex_byte_offset = 0;
            geom.triangles.vertex_count = smoothBLASCapacity_;
            geom.triangles.vertex_stride = 32;
            geom.triangles.index_buffer = blasIndexBuffer_;
            geom.triangles.index_count = smoothBLASCapacity_;
            geom.triangles.index_format = IndexBufferFormat::UINT32;
            geom.triangles.index_offset = 0;
            geom.triangles.vertex_format = Format::R32G32B32_FLOAT;

            bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &smoothBLAS_);
            if (ok) {
                dev->SetName(&smoothBLAS_, "VoxelRTManager::smoothBLAS");
                wi::backlog::post("VoxelRTManager: smooth BLAS created (capacity "
                    + std::to_string(smoothBLASCapacity_ / 3) + " tris)");
            } else {
                wi::backlog::post("VoxelRTManager: failed to create smooth BLAS", wi::backlog::LogLevel::Error);
            }
        }

        if (smoothBLAS_.IsValid()) {
            smoothBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = smoothVertCount;
            smoothBLAS_.desc.bottom_level.geometries[0].triangles.index_count = smoothVertCount;
            dev->BuildRaytracingAccelerationStructure(&smoothBLAS_, cmd, nullptr);
        }

        smoothVertexCount_ = smoothVertCount;
    }

    // ── Toping BLAS ──
    uint32_t topingVertCount = topingVertexCount_;
    if ((buildFlags & BUILD_TOPING) && topingVertCount >= 3 && topingBLASPositionBuf_.isValid()) {
        if (!topingBLAS_.IsValid() || topingVertCount > topingBLASASCapacity_) {
            topingBLASASCapacity_ = topingVertCount + topingVertCount / 4;

            RaytracingAccelerationStructureDesc desc;
            desc.type = RaytracingAccelerationStructureDesc::Type::BOTTOMLEVEL;
            desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;

            desc.bottom_level.geometries.resize(1);
            auto& geom = desc.bottom_level.geometries[0];
            geom.type = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::Type::TRIANGLES;
            geom.flags = RaytracingAccelerationStructureDesc::BottomLevel::Geometry::FLAG_OPAQUE;
            geom.triangles.vertex_buffer = topingBLASPositionBuf_.gpu;
            geom.triangles.vertex_byte_offset = 0;
            geom.triangles.vertex_count = topingBLASASCapacity_;
            geom.triangles.vertex_stride = sizeof(float) * 3;
            geom.triangles.vertex_format = Format::R32G32B32_FLOAT;
            geom.triangles.index_buffer = topingBLASIndexBuffer_;
            geom.triangles.index_count = topingBLASASCapacity_;
            geom.triangles.index_format = IndexBufferFormat::UINT32;
            geom.triangles.index_offset = 0;

            bool ok = dev->CreateRaytracingAccelerationStructure(&desc, &topingBLAS_);
            if (ok) {
                dev->SetName(&topingBLAS_, "VoxelRTManager::topingBLAS");
                wi::backlog::post("VoxelRTManager: toping BLAS created (capacity "
                    + std::to_string(topingBLASASCapacity_ / 3) + " tris)");
            } else {
                wi::backlog::post("VoxelRTManager: failed to create toping BLAS", wi::backlog::LogLevel::Error);
            }
        }

        if (topingBLAS_.IsValid()) {
            topingBLAS_.desc.bottom_level.geometries[0].triangles.vertex_count = topingVertCount;
            topingBLAS_.desc.bottom_level.geometries[0].triangles.index_count = topingVertCount;
            dev->BuildRaytracingAccelerationStructure(&topingBLAS_, cmd, nullptr);
        }
    }

    // Memory barrier: sync BLAS builds before TLAS
    {
        GPUBarrier barriers[] = { GPUBarrier::Memory() };
        dev->Barrier(barriers, 1, cmd);
    }

    // ── TLAS ──
    uint32_t instanceCount = 0;
    if (blockyBLAS_.IsValid()) instanceCount++;
    if (smoothBLAS_.IsValid() && smoothVertCount > 0) instanceCount++;
    if (topingBLAS_.IsValid() && topingVertCount >= 3) instanceCount++;
    if (instanceCount == 0) { dirty = false; return; }

    if (!tlas_.IsValid() || instanceCount != tlasInstanceCount_) {
        const size_t instSize = dev->GetTopLevelAccelerationStructureInstanceSize();

        auto setIdentity = [](float transform[3][4]) {
            std::memset(transform, 0, sizeof(float) * 12);
            transform[0][0] = 1.0f;
            transform[1][1] = 1.0f;
            transform[2][2] = 1.0f;
        };

        const RaytracingAccelerationStructure* blockyPtr = blockyBLAS_.IsValid() ? &blockyBLAS_ : nullptr;
        const RaytracingAccelerationStructure* smoothPtr = (smoothBLAS_.IsValid() && smoothVertCount > 0) ? &smoothBLAS_ : nullptr;
        const RaytracingAccelerationStructure* topingPtr = (topingBLAS_.IsValid() && topingVertCount >= 3) ? &topingBLAS_ : nullptr;

        RaytracingAccelerationStructureDesc desc;
        desc.flags = RaytracingAccelerationStructureDesc::FLAG_PREFER_FAST_BUILD;
        desc.type = RaytracingAccelerationStructureDesc::Type::TOPLEVEL;
        desc.top_level.count = instanceCount;

        GPUBufferDesc bufdesc;
        bufdesc.misc_flags = ResourceMiscFlag::RAY_TRACING;
        bufdesc.stride = (uint32_t)instSize;
        bufdesc.size = bufdesc.stride * desc.top_level.count;

        auto initInstances = [&](void* dest) {
            uint32_t idx = 0;
            auto addInstance = [&](const RaytracingAccelerationStructure* blas, uint32_t id) {
                if (!blas) return;
                RaytracingAccelerationStructureDesc::TopLevel::Instance inst;
                setIdentity(inst.transform);
                inst.instance_id = id; inst.instance_mask = 0xFF;
                inst.instance_contribution_to_hit_group_index = 0; inst.flags = 0;
                inst.bottom_level = blas;
                dev->WriteTopLevelAccelerationStructureInstance(&inst, (uint8_t*)dest + idx * instSize);
                idx++;
            };
            addInstance(blockyPtr, 0);
            addInstance(smoothPtr, 1);
            addInstance(topingPtr, 2);
        };

        bool ok = dev->CreateBuffer2(&bufdesc, initInstances, &desc.top_level.instance_buffer);
        if (!ok) {
            wi::backlog::post("VoxelRTManager: failed to create TLAS instance buffer", wi::backlog::LogLevel::Error);
            dirty = false;
            return;
        }

        ok = dev->CreateRaytracingAccelerationStructure(&desc, &tlas_);
        if (!ok) {
            wi::backlog::post("VoxelRTManager: failed to create TLAS", wi::backlog::LogLevel::Error);
            dirty = false;
            return;
        }

        tlasInstanceCount_ = instanceCount;
        wi::backlog::post("VoxelRTManager: TLAS created (" + std::to_string(instanceCount) + " instances)");
    }

    dev->BuildRaytracingAccelerationStructure(&tlas_, cmd, nullptr);

    {
        GPUBarrier barriers[] = { GPUBarrier::Memory(&tlas_) };
        dev->Barrier(barriers, 1, cmd);
    }

    dirty = false;
}

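The TLAS setup above packs up to three instances densely, skipping any BLAS that is absent while keeping fixed instance IDs (0 = blocky, 1 = smooth, 2 = toping), and gives each a 3x4 identity transform. A device-free sketch of that packing logic follows. `InstanceSketch` and `packInstances` are hypothetical names, and the struct holds only the fields needed to show the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Row-major 3x4 identity: zero everything, then set the diagonal.
void setIdentity3x4(float m[3][4]) {
    std::memset(m, 0, sizeof(float) * 12);
    m[0][0] = m[1][1] = m[2][2] = 1.0f;
}

// Reduced stand-in for a TLAS instance record.
struct InstanceSketch {
    uint32_t id;
    float transform[3][4];
};

// Build a densely packed instance list: absent BLASes are skipped, but the
// surviving ones keep their fixed IDs (0 = blocky, 1 = smooth, 2 = toping).
std::vector<InstanceSketch> packInstances(bool hasBlocky, bool hasSmooth, bool hasToping) {
    std::vector<InstanceSketch> out;
    const bool present[3] = { hasBlocky, hasSmooth, hasToping };
    for (uint32_t id = 0; id < 3; ++id) {
        if (!present[id]) continue;
        InstanceSketch inst{};
        inst.id = id;
        setIdentity3x4(inst.transform);
        out.push_back(inst);
    }
    return out;
}
```

Keeping the IDs stable while the slots are packed lets a shader tell the geometry kinds apart by instance ID regardless of which BLASes exist in a given frame.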
// ── RT Shadow + AO dispatch ─────────────────────────────────────

void VoxelRTManager::dispatchShadows(CommandList cmd,
    const Texture& depthBuffer,
    const Texture& renderTarget,
    const Texture& normalTarget,
    const GPUBuffer& constantBuffer) const
{
    if (!shadowsEnabled_ || !shadowShader_.IsValid() || !tlas_.IsValid())
        return;

    auto* dev = device_;
    uint32_t w = renderTarget.GetDesc().width;
    uint32_t h = renderTarget.GetDesc().height;
    uint32_t gx = (w + 7) / 8;
    uint32_t gy = (h + 7) / 8;

    // Pass 1: Shadow + raw AO
    {
        GPUBarrier preBarriers[] = {
            GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
                ResourceState::DEPTHSTENCIL, ResourceState::SHADER_RESOURCE),
            GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
            GPUBarrier::Image(&aoRawTexture,
                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
        };
        dev->Barrier(preBarriers, 3, cmd);

        dev->BindComputeShader(&shadowShader_, cmd);
        dev->BindResource(&depthBuffer, 0, cmd);
        dev->BindResource(&normalTarget, 1, cmd);
        dev->BindResource(&tlas_, 2, cmd);
        dev->BindResource(&aoHistoryTexture, 3, cmd);
        dev->BindUAV(&renderTarget, 0, cmd);
        dev->BindUAV(&aoRawTexture, 1, cmd);
        dev->BindConstantBuffer(&constantBuffer, 0, cmd);

        struct ShadowPush {
            uint32_t width, height;
            float normalBias, shadowMaxDist;
            uint32_t debugMode;
            float aoRadius;
            uint32_t aoRayCount;
            float aoStrength;
            uint32_t frameIndex;
            uint32_t historyValid;
            uint32_t pad[2];
        } pushData = {};
        pushData.width = w;
        pushData.height = h;
        pushData.normalBias = 0.15f;
        pushData.shadowMaxDist = 512.0f;
        pushData.debugMode = shadowDebug_;
        pushData.aoRadius = 8.0f;
        pushData.aoRayCount = 4;
        pushData.aoStrength = 0.7f;
        pushData.frameIndex = frameCounter++;
        pushData.historyValid = aoHistoryValid ? 1u : 0u;
        dev->PushConstants(&pushData, sizeof(pushData), cmd);
        dev->Dispatch(gx, gy, 1, cmd);
    }

    // Pass 1.5: Copy raw AO → history
    {
        GPUBarrier copyBarriers[] = {
            GPUBarrier::Image(&aoRawTexture,
                ResourceState::UNORDERED_ACCESS, ResourceState::COPY_SRC),
            GPUBarrier::Image(&aoHistoryTexture,
                ResourceState::SHADER_RESOURCE, ResourceState::COPY_DST),
        };
        dev->Barrier(copyBarriers, 2, cmd);
        dev->CopyResource(&aoHistoryTexture, &aoRawTexture, cmd);

        GPUBarrier postCopyBarriers[] = {
            GPUBarrier::Image(&aoRawTexture,
                ResourceState::COPY_SRC, ResourceState::SHADER_RESOURCE),
            GPUBarrier::Image(&aoHistoryTexture,
                ResourceState::COPY_DST, ResourceState::SHADER_RESOURCE),
        };
        dev->Barrier(postCopyBarriers, 2, cmd);
        aoHistoryValid = true;
    }

    // Pass 2: Bilateral blur horizontal
    {
        GPUBarrier barriers[] = {
            GPUBarrier::Image(&aoBlurredTexture,
                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
        };
        dev->Barrier(barriers, 1, cmd);

        dev->BindComputeShader(&aoBlurShader_, cmd);
        dev->BindResource(&aoRawTexture, 0, cmd);
        dev->BindResource(&depthBuffer, 1, cmd);
        dev->BindResource(&normalTarget, 2, cmd);
        dev->BindUAV(&aoBlurredTexture, 0, cmd);

        struct BlurPush {
            uint32_t width, height, direction, radius;
            float depthThreshold, normalThreshold;
            uint32_t pad[6];
        } blurPush = {};
        blurPush.width = w; blurPush.height = h;
        blurPush.direction = 0; blurPush.radius = 6;
        blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
        dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
        dev->Dispatch(gx, gy, 1, cmd);
    }

    // Pass 3: Bilateral blur vertical
    {
        GPUBarrier barriers[] = {
            GPUBarrier::Image(&aoBlurredTexture,
                ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
            GPUBarrier::Image(&aoRawTexture,
                ResourceState::SHADER_RESOURCE, ResourceState::UNORDERED_ACCESS),
        };
        dev->Barrier(barriers, 2, cmd);

        dev->BindComputeShader(&aoBlurShader_, cmd);
        dev->BindResource(&aoBlurredTexture, 0, cmd);
        dev->BindResource(&depthBuffer, 1, cmd);
        dev->BindResource(&normalTarget, 2, cmd);
        dev->BindUAV(&aoRawTexture, 0, cmd);

        struct BlurPush {
            uint32_t width, height, direction, radius;
            float depthThreshold, normalThreshold;
            uint32_t pad[6];
        } blurPush = {};
        blurPush.width = w; blurPush.height = h;
        blurPush.direction = 1; blurPush.radius = 6;
        blurPush.depthThreshold = 0.001f; blurPush.normalThreshold = 0.9f;
        dev->PushConstants(&blurPush, sizeof(blurPush), cmd);
        dev->Dispatch(gx, gy, 1, cmd);
    }

// Pass 4: Apply blurred AO
|
||||||
|
{
|
||||||
|
GPUBarrier barriers[] = {
|
||||||
|
GPUBarrier::Image(&aoRawTexture,
|
||||||
|
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
|
||||||
|
};
|
||||||
|
dev->Barrier(barriers, 1, cmd);
|
||||||
|
|
||||||
|
dev->BindComputeShader(&aoApplyShader_, cmd);
|
||||||
|
dev->BindResource(&aoRawTexture, 0, cmd);
|
||||||
|
dev->BindResource(&depthBuffer, 1, cmd);
|
||||||
|
dev->BindUAV(&renderTarget, 0, cmd);
|
||||||
|
|
||||||
|
struct ApplyPush {
|
||||||
|
uint32_t width, height, debugMode;
|
||||||
|
uint32_t pad[9];
|
||||||
|
} applyPush = {};
|
||||||
|
applyPush.width = w; applyPush.height = h;
|
||||||
|
applyPush.debugMode = shadowDebug_;
|
||||||
|
dev->PushConstants(&applyPush, sizeof(applyPush), cmd);
|
||||||
|
dev->Dispatch(gx, gy, 1, cmd);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Restore resource states
|
||||||
|
GPUBarrier postBarriers[] = {
|
||||||
|
GPUBarrier::Image(&const_cast<Texture&>(depthBuffer),
|
||||||
|
ResourceState::SHADER_RESOURCE, ResourceState::DEPTHSTENCIL),
|
||||||
|
GPUBarrier::Image(&const_cast<Texture&>(renderTarget),
|
||||||
|
ResourceState::UNORDERED_ACCESS, ResourceState::SHADER_RESOURCE),
|
||||||
|
};
|
||||||
|
dev->Barrier(postBarriers, 2, cmd);
|
||||||
|
}
|
||||||
|
|
||||||
|
} // namespace voxel
|
||||||
124
src/voxel/VoxelRTManager.h
Normal file
|
|
@ -0,0 +1,124 @@
#pragma once
#include "DeferredGPUBuffer.h"
#include "WickedEngine.h"

namespace voxel {

// ── Ray Tracing Manager (Phase 6) ──────────────────────────────
// Groups all RT state: BLAS/TLAS management, shadow/AO dispatches.
// Extracted from VoxelRenderer to isolate the ~500 lines of RT code
// and its 20+ members for easier debugging and maintenance.

class VoxelRTManager {
public:
    // ── Initialization ──────────────────────────────────────────
    void initialize(wi::graphics::GraphicsDevice* device, uint32_t maxBlasVertices);

    // ── BLAS extraction (compute shaders) ───────────────────────

    // Extract blocky quad positions into BLAS vertex buffer.
    void dispatchBLASExtract(wi::graphics::CommandList cmd,
        const wi::graphics::GPUBuffer& quadBuffer,
        const wi::graphics::GPUBuffer& chunkInfoBuffer,
        uint32_t quadCount) const;

    // Extract toping instance positions via GPU compute.
    // groupBuffer/groupsGPU: toping BLAS group table.
    void dispatchTopingBLASExtract(wi::graphics::CommandList cmd,
        const wi::graphics::GPUBuffer& topingVertexBuffer,
        const wi::graphics::GPUBuffer& topingInstanceBuffer,
        const void* groupsGPUData, size_t groupsGPUSize,
        uint32_t groupCount, uint32_t totalVertices) const;

    // ── Acceleration structure build ────────────────────────────
    static constexpr uint32_t BUILD_BLOCKY = 1 << 0;
    static constexpr uint32_t BUILD_SMOOTH = 1 << 1;
    static constexpr uint32_t BUILD_TOPING = 1 << 2;
    static constexpr uint32_t BUILD_ALL = BUILD_BLOCKY | BUILD_SMOOTH | BUILD_TOPING;

    void buildAccelerationStructures(wi::graphics::CommandList cmd,
        uint32_t buildFlags,
        const wi::graphics::GPUBuffer& smoothVB,
        uint32_t smoothVertCount) const;

    // ── RT Shadows + AO dispatch ────────────────────────────────
    void dispatchShadows(wi::graphics::CommandList cmd,
        const wi::graphics::Texture& depthBuffer,
        const wi::graphics::Texture& renderTarget,
        const wi::graphics::Texture& normalTarget,
        const wi::graphics::GPUBuffer& constantBuffer) const;

    // ── Toping BLAS buffer management ───────────────────────────
    // Ensure capacity for toping BLAS position + index buffers.
    // Returns true if buffers were (re)created.
    bool ensureTopingBLASCapacity(uint32_t totalVertices);

    // ── State queries ───────────────────────────────────────────
    bool isAvailable() const { return available_; }
    bool isReady() const { return available_ && tlas_.IsValid(); }
    bool isShadowsEnabled() const { return shadowsEnabled_; }
    void setShadowsEnabled(bool v) { shadowsEnabled_ = v; }
    uint32_t getShadowDebug() const { return shadowDebug_; }
    void setShadowDebug(uint32_t v) { shadowDebug_ = v; }

    uint32_t getBlockyTriCount() const { return blockyVertexCount_ / 3; }
    uint32_t getSmoothTriCount() const { return smoothVertexCount_ / 3; }
    uint32_t getTopingTriCount() const { return topingVertexCount_ / 3; }
    uint32_t getTopingVertexCount() const { return topingVertexCount_; }
    uint32_t getTlasInstanceCount() const { return tlasInstanceCount_; }
    const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }

    // Dirty flags (public for VoxelRenderPath orchestration)
    mutable bool dirty = true;             // BLAS/TLAS need rebuild
    mutable bool topingBLASDirty = false;  // toping BLAS extract + rebuild needed
    mutable bool aoHistoryValid = false;
    mutable uint32_t frameCounter = 0;
    mutable XMFLOAT4X4 prevViewProjection;

    // AO textures (created by VoxelRenderPath::createRenderTargets)
    mutable wi::graphics::Texture aoRawTexture;
    mutable wi::graphics::Texture aoBlurredTexture;
    mutable wi::graphics::Texture aoHistoryTexture;

private:
    wi::graphics::GraphicsDevice* device_ = nullptr;
    mutable bool available_ = false;
    mutable bool shadowsEnabled_ = false;
    mutable uint32_t shadowDebug_ = 0;

    // Shaders
    wi::graphics::Shader blasExtractShader_;
    wi::graphics::Shader topingBLASShader_;
    wi::graphics::Shader shadowShader_;
    wi::graphics::Shader aoBlurShader_;
    wi::graphics::Shader aoApplyShader_;

    // Blocky BLAS resources
    mutable wi::graphics::GPUBuffer blasPositionBuffer_;
    wi::graphics::GPUBuffer blasIndexBuffer_;
    mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
    mutable uint32_t blockyBLASCapacity_ = 0;
    mutable uint32_t blockyVertexCount_ = 0;

    // Smooth BLAS
    mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
    mutable uint32_t smoothBLASCapacity_ = 0;
    mutable uint32_t smoothVertexCount_ = 0;

    // Toping BLAS
    mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
    mutable uint32_t topingBLASASCapacity_ = 0;
    mutable uint32_t topingVertexCount_ = 0;
    mutable DeferredGPUBuffer topingBLASPositionBuf_;
    mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_;
    mutable uint32_t topingBLASIndexCount_ = 0;
    wi::graphics::GPUBuffer topingBLASGroupBuffer_;

    // TLAS
    mutable wi::graphics::RaytracingAccelerationStructure tlas_;
    mutable uint32_t tlasInstanceCount_ = 0;

    uint32_t maxBlasVertices_ = 0;
};

} // namespace voxel
File diff suppressed because it is too large
|
|
@ -2,6 +2,8 @@
|
||||||
#include "VoxelWorld.h"
|
#include "VoxelWorld.h"
|
||||||
#include "VoxelMesher.h"
|
#include "VoxelMesher.h"
|
||||||
#include "TopingSystem.h"
|
#include "TopingSystem.h"
|
||||||
|
#include "DeferredGPUBuffer.h"
|
||||||
|
#include "VoxelRTManager.h"
|
||||||
#include "WickedEngine.h"
|
#include "WickedEngine.h"
|
||||||
|
|
||||||
namespace voxel {
|
namespace voxel {
|
||||||
|
|
@ -77,9 +79,7 @@ private:
|
||||||
wi::graphics::Shader topingPS_;
|
wi::graphics::Shader topingPS_;
|
||||||
wi::graphics::PipelineState topingPso_;
|
wi::graphics::PipelineState topingPso_;
|
||||||
wi::graphics::GPUBuffer topingVertexBuffer_; // StructuredBuffer<TopingVertex>, SRV t4
|
wi::graphics::GPUBuffer topingVertexBuffer_; // StructuredBuffer<TopingVertex>, SRV t4
|
||||||
wi::graphics::GPUBuffer topingInstanceBuffer_; // StructuredBuffer<float3>, SRV t5
|
DeferredGPUBuffer topingInstanceBuf_; // StructuredBuffer<float3>, SRV t5
|
||||||
mutable uint32_t topingInstanceCapacity_ = 0; // pre-allocated capacity (avoid per-frame CreateBuffer)
|
|
||||||
mutable bool topingInstanceDirty_ = false; // deferred upload via UpdateBuffer in Render()
|
|
||||||
static constexpr uint32_t MAX_TOPING_INSTANCES = 256 * 1024; // 256K instances max
|
static constexpr uint32_t MAX_TOPING_INSTANCES = 256 * 1024; // 256K instances max
|
||||||
// Persistent staging buffers for toping upload (avoids per-frame allocations)
|
// Persistent staging buffers for toping upload (avoids per-frame allocations)
|
||||||
struct TopingSortedInst { float wx, wy, wz; uint16_t type, variant; };
|
struct TopingSortedInst { float wx, wy, wz; uint16_t type, variant; };
|
||||||
|
|
@ -96,8 +96,7 @@ private:
|
||||||
};
|
};
|
||||||
std::vector<TopingDrawGroup> topingDrawGroups_; // built in uploadTopingData, reused in renderTopings
|
std::vector<TopingDrawGroup> topingDrawGroups_; // built in uploadTopingData, reused in renderTopings
|
||||||
|
|
||||||
// ── GPU compute toping BLAS extraction (replaces 196ms CPU loop) ──
|
// ── Toping BLAS group staging (passed to VoxelRTManager) ──────
|
||||||
wi::graphics::Shader topingBLASShader_; // voxelTopingBLASCS compute shader
|
|
||||||
struct TopingBLASGroupGPU {
|
struct TopingBLASGroupGPU {
|
||||||
uint32_t globalVertexOffset; // prefix sum of total vertices before this group
|
uint32_t globalVertexOffset; // prefix sum of total vertices before this group
|
||||||
uint32_t vertexTemplateOffset; // offset into topingVertices (t4)
|
uint32_t vertexTemplateOffset; // offset into topingVertices (t4)
|
||||||
|
|
@ -105,24 +104,19 @@ private:
|
||||||
uint32_t instanceOffset; // offset into topingInstances (t5)
|
uint32_t instanceOffset; // offset into topingInstances (t5)
|
||||||
uint32_t instanceCount; // instances in this group
|
uint32_t instanceCount; // instances in this group
|
||||||
};
|
};
|
||||||
wi::graphics::GPUBuffer topingBLASGroupBuffer_; // StructuredBuffer<TopingBLASGroupGPU>, SRV t7
|
|
||||||
std::vector<TopingBLASGroupGPU> topingBLASGroupsGPU_; // CPU staging for group table
|
std::vector<TopingBLASGroupGPU> topingBLASGroupsGPU_; // CPU staging for group table
|
||||||
mutable uint32_t topingBLASTotalVertices_ = 0;
|
mutable uint32_t topingBLASTotalVertices_ = 0;
|
||||||
static constexpr uint32_t MAX_TOPING_BLAS_GROUPS = 64;
|
|
||||||
void dispatchTopingBLASExtract(wi::graphics::CommandList cmd) const;
|
|
||||||
|
|
||||||
// Shaders & Pipeline (smooth surfaces, Phase 5)
|
// Shaders & Pipeline (smooth surfaces, Phase 5)
|
||||||
wi::graphics::Shader smoothVS_;
|
wi::graphics::Shader smoothVS_;
|
||||||
wi::graphics::Shader smoothPS_;
|
wi::graphics::Shader smoothPS_;
|
||||||
wi::graphics::RasterizerState smoothRasterizer_;
|
wi::graphics::RasterizerState smoothRasterizer_;
|
||||||
wi::graphics::PipelineState smoothPso_;
|
wi::graphics::PipelineState smoothPso_;
|
||||||
wi::graphics::GPUBuffer smoothVertexBuffer_; // StructuredBuffer<SmoothVertex>, SRV t6
|
DeferredGPUBuffer smoothVertexBuf_; // StructuredBuffer<SmoothVertex>, SRV t6
|
||||||
mutable uint32_t smoothVertexCapacity_ = 0; // pre-allocated capacity (avoid per-frame CreateBuffer)
|
|
||||||
std::vector<SmoothVertex> smoothStagingVerts_; // persistent staging buffer (avoids per-frame alloc)
|
std::vector<SmoothVertex> smoothStagingVerts_; // persistent staging buffer (avoids per-frame alloc)
|
||||||
static constexpr uint32_t MAX_SMOOTH_VERTICES = 4 * 1024 * 1024; // 4M vertices max
|
static constexpr uint32_t MAX_SMOOTH_VERTICES = 4 * 1024 * 1024; // 4M vertices max
|
||||||
mutable uint32_t smoothVertexCount_ = 0;
|
mutable uint32_t smoothVertexCount_ = 0;
|
||||||
mutable uint32_t smoothDrawCalls_ = 0;
|
mutable uint32_t smoothDrawCalls_ = 0;
|
||||||
mutable bool smoothVertexDirty_ = false; // deferred upload via UpdateBuffer in Render()
|
|
||||||
bool smoothDirty_ = true;
|
bool smoothDirty_ = true;
|
||||||
|
|
||||||
// Texture array for materials (256x256, 5 layers for prototype)
|
// Texture array for materials (256x256, 5 layers for prototype)
|
||||||
|
|
@ -201,58 +195,9 @@ private:
|
||||||
mutable uint32_t gpuSmoothVertexCount_ = 0; // readback from previous frame
|
mutable uint32_t gpuSmoothVertexCount_ = 0; // readback from previous frame
|
||||||
mutable bool gpuSmoothMeshDirty_ = true;
|
mutable bool gpuSmoothMeshDirty_ = true;
|
||||||
|
|
||||||
// ── Ray Tracing (Phase 6.1) ─────────────────────────────────────
|
// ── Ray Tracing (Phase 6) ────────────────────────────────────────
|
||||||
wi::graphics::Shader blasExtractShader_; // voxelBLASExtractCS compute shader
|
|
||||||
mutable wi::graphics::GPUBuffer blasPositionBuffer_; // float3[] for blocky BLAS (6 verts per quad)
|
|
||||||
wi::graphics::GPUBuffer blasIndexBuffer_; // sequential uint32 indices [0,1,2,...] for BLAS
|
|
||||||
mutable wi::graphics::RaytracingAccelerationStructure blockyBLAS_;
|
|
||||||
mutable wi::graphics::RaytracingAccelerationStructure smoothBLAS_;
|
|
||||||
mutable wi::graphics::RaytracingAccelerationStructure topingBLAS_;
|
|
||||||
mutable wi::graphics::RaytracingAccelerationStructure tlas_;
|
|
||||||
mutable wi::graphics::GPUBuffer topingBLASPositionBuffer_; // float3[] world-space toping positions
|
|
||||||
mutable wi::graphics::GPUBuffer topingBLASIndexBuffer_; // sequential indices for toping BLAS
|
|
||||||
mutable uint32_t topingBLASPositionCapacity_ = 0; // pre-allocated capacity (vertices)
|
|
||||||
mutable uint32_t topingBLASIndexCount_ = 0; // size of toping index buffer
|
|
||||||
mutable bool topingBLASDirty_ = false; // GPU compute BLAS extract + rebuild needed
|
|
||||||
mutable uint32_t topingBLASVertexCount_ = 0; // actual vertex count for current frame
|
|
||||||
static constexpr uint32_t MAX_BLAS_VERTICES = MEGA_BUFFER_CAPACITY * 6; // 6 verts per quad
|
static constexpr uint32_t MAX_BLAS_VERTICES = MEGA_BUFFER_CAPACITY * 6; // 6 verts per quad
|
||||||
mutable bool rtAvailable_ = false; // GPU supports RT
|
mutable VoxelRTManager rt_;
|
||||||
mutable bool rtDirty_ = true; // BLAS/TLAS need rebuild
|
|
||||||
mutable uint32_t rtBlockyVertexCount_ = 0; // current blocky BLAS vertex count
|
|
||||||
mutable uint32_t rtSmoothVertexCount_ = 0; // current smooth BLAS vertex count
|
|
||||||
mutable uint32_t rtTopingVertexCount_ = 0; // current toping BLAS vertex count
|
|
||||||
// BLAS capacity tracking: only recreate AS when vertex count exceeds capacity
|
|
||||||
mutable uint32_t blockyBLASCapacity_ = 0; // vertex count at BLAS creation
|
|
||||||
mutable uint32_t smoothBLASCapacity_ = 0;
|
|
||||||
mutable uint32_t topingBLASASCapacity_ = 0; // separate from topingBLASPositionCapacity_ (buffer capacity)
|
|
||||||
mutable uint32_t tlasInstanceCount_ = 0; // track TLAS instance count to avoid per-frame recreation
|
|
||||||
|
|
||||||
void dispatchBLASExtract(wi::graphics::CommandList cmd) const;
|
|
||||||
// Flags for selective BLAS rebuild
|
|
||||||
static constexpr uint32_t RT_BUILD_BLOCKY = 1 << 0;
|
|
||||||
static constexpr uint32_t RT_BUILD_SMOOTH = 1 << 1;
|
|
||||||
static constexpr uint32_t RT_BUILD_TOPING = 1 << 2;
|
|
||||||
static constexpr uint32_t RT_BUILD_ALL = RT_BUILD_BLOCKY | RT_BUILD_SMOOTH | RT_BUILD_TOPING;
|
|
||||||
void buildAccelerationStructures(wi::graphics::CommandList cmd,
|
|
||||||
uint32_t buildFlags = RT_BUILD_ALL) const;
|
|
||||||
|
|
||||||
// ── RT Shadows + AO (Phase 6.2 + 6.3) ──────────────────────────
|
|
||||||
wi::graphics::Shader shadowShader_; // voxelShadowCS compute shader
|
|
||||||
wi::graphics::Shader aoBlurShader_; // voxelAOBlurCS compute shader
|
|
||||||
wi::graphics::Shader aoApplyShader_; // voxelAOApplyCS compute shader
|
|
||||||
mutable wi::graphics::Texture aoRawTexture_; // R8_UNORM: raw AO from shadow CS
|
|
||||||
mutable wi::graphics::Texture aoBlurredTexture_; // R8_UNORM: after bilateral blur
|
|
||||||
mutable wi::graphics::Texture aoHistoryTexture_; // R8_UNORM: previous frame's temporally accumulated AO
|
|
||||||
mutable XMFLOAT4X4 prevViewProjection_; // previous frame's VP matrix
|
|
||||||
mutable uint32_t frameCounter_ = 0;
|
|
||||||
mutable bool aoHistoryValid_ = false;
|
|
||||||
mutable bool rtShadowsEnabled_ = false; // true when shader + TLAS ready
|
|
||||||
mutable uint32_t rtShadowDebug_ = 0; // 0=off, 1=debug shadows, 2=debug AO
|
|
||||||
|
|
||||||
void dispatchShadows(wi::graphics::CommandList cmd,
|
|
||||||
const wi::graphics::Texture& depthBuffer,
|
|
||||||
const wi::graphics::Texture& renderTarget,
|
|
||||||
const wi::graphics::Texture& normalTarget) const;
|
|
||||||
|
|
||||||
void dispatchGpuMesh(wi::graphics::CommandList cmd, const VoxelWorld& world,
|
void dispatchGpuMesh(wi::graphics::CommandList cmd, const VoxelWorld& world,
|
||||||
ProfileAccum* profPack = nullptr, ProfileAccum* profUpload = nullptr,
|
ProfileAccum* profPack = nullptr, ProfileAccum* profUpload = nullptr,
|
||||||
|
|
@ -298,9 +243,9 @@ public:
|
||||||
float getGpuBLASExtractTimeMs() const { return gpuBLASExtractTimeMs_; }
|
float getGpuBLASExtractTimeMs() const { return gpuBLASExtractTimeMs_; }
|
||||||
float getGpuBLASBuildTimeMs() const { return gpuBLASBuildTimeMs_; }
|
float getGpuBLASBuildTimeMs() const { return gpuBLASBuildTimeMs_; }
|
||||||
float getGpuRTShadowsTimeMs() const { return gpuRTShadowsTimeMs_; }
|
float getGpuRTShadowsTimeMs() const { return gpuRTShadowsTimeMs_; }
|
||||||
void toggleRTShadows() { rtShadowsEnabled_ = !rtShadowsEnabled_; }
|
|
||||||
bool isGpuMeshEnabled() const { return gpuMesherAvailable_; }
|
bool isGpuMeshEnabled() const { return gpuMesherAvailable_; }
|
||||||
uint32_t getGpuMeshQuadCount() const { return gpuMeshQuadCount_; }
|
uint32_t getGpuMeshQuadCount() const { return gpuMeshQuadCount_; }
|
||||||
|
VoxelRTManager& rt() const { return rt_; }
|
||||||
|
|
||||||
// Phase 4: Toping rendering
|
// Phase 4: Toping rendering
|
||||||
void uploadTopingData(const TopingSystem& topingSystem);
|
void uploadTopingData(const TopingSystem& topingSystem);
|
||||||
|
|
@ -325,14 +270,90 @@ public:
|
||||||
uint32_t getSmoothVertexCount() const { return (smoothCentroidShader_.IsValid() && smoothMeshShader_.IsValid()) ? gpuSmoothVertexCount_ : smoothVertexCount_; }
|
uint32_t getSmoothVertexCount() const { return (smoothCentroidShader_.IsValid() && smoothMeshShader_.IsValid()) ? gpuSmoothVertexCount_ : smoothVertexCount_; }
|
||||||
uint32_t getSmoothDrawCalls() const { return smoothDrawCalls_; }
|
uint32_t getSmoothDrawCalls() const { return smoothDrawCalls_; }
|
||||||
|
|
||||||
// Phase 6: Ray Tracing
|
// Phase 6: Ray Tracing (delegated to VoxelRTManager)
|
||||||
bool isRTAvailable() const { return rtAvailable_; }
|
bool isRTAvailable() const { return rt_.isAvailable(); }
|
||||||
bool isRTReady() const { return rtAvailable_ && tlas_.IsValid(); }
|
bool isRTReady() const { return rt_.isReady(); }
|
||||||
bool isRTShadowsEnabled() const { return rtShadowsEnabled_; }
|
bool isRTShadowsEnabled() const { return rt_.isShadowsEnabled(); }
|
||||||
uint32_t getRTBlockyTriCount() const { return rtBlockyVertexCount_ / 3; }
|
uint32_t getRTBlockyTriCount() const { return rt_.getBlockyTriCount(); }
|
||||||
uint32_t getRTSmoothTriCount() const { return rtSmoothVertexCount_ / 3; }
|
uint32_t getRTSmoothTriCount() const { return rt_.getSmoothTriCount(); }
|
||||||
uint32_t getRTTopingTriCount() const { return rtTopingVertexCount_ / 3; }
|
uint32_t getRTTopingTriCount() const { return rt_.getTopingTriCount(); }
|
||||||
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return tlas_; }
|
const wi::graphics::RaytracingAccelerationStructure& getTLAS() const { return rt_.getTLAS(); }
|
||||||
|
};
|
||||||
|
|
||||||
|
// ── Camera Controller ────────────────────────────────────────────
|
||||||
|
struct CameraController {
|
||||||
|
float speed = 50.0f;
|
||||||
|
float sensitivity = 0.003f;
|
||||||
|
XMFLOAT3 pos = { 256.0f, 100.0f, 256.0f };
|
||||||
|
float pitch = -0.3f;
|
||||||
|
float yaw = 0.0f;
|
||||||
|
bool mouseCaptured = false;
|
||||||
|
|
||||||
|
void set(float x, float y, float z, float p, float yw) {
|
||||||
|
pos = { x, y, z }; pitch = p; yaw = yw;
|
||||||
|
}
|
||||||
|
void handleInput(float dt, wi::scene::CameraComponent* camera);
|
||||||
|
};
|
||||||
|
|
||||||
|
// ── Animation State ─────────────────────────────────────────────
|
||||||
|
struct AnimationState {
|
||||||
|
float windTime = 0.0f; // continuous, always running
|
||||||
|
bool terrainAnimated = false; // toggled with F3
|
||||||
|
float time = 0.0f; // current animation time offset
|
||||||
|
float accum = 0.0f; // accumulator for 30 Hz timer
|
||||||
|
static constexpr float INTERVAL = 1.0f / 30.0f; // ~33.3ms = 30 Hz
|
||||||
|
|
||||||
|
// Returns true when an animation tick should fire (call every frame).
|
||||||
|
bool tick(float dt) {
|
||||||
|
windTime += dt;
|
||||||
|
if (!terrainAnimated) return false;
|
||||||
|
accum += dt;
|
||||||
|
if (accum < INTERVAL) return false;
|
||||||
|
accum -= INTERVAL;
|
||||||
|
time += INTERVAL;
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
// ── CPU Profiling (averages every INTERVAL seconds) ─────────────
|
||||||
|
struct VoxelProfiler {
|
||||||
|
static constexpr float INTERVAL = 5.0f;
|
||||||
|
|
||||||
|
// Update() phase
|
||||||
|
ProfileAccum regenerate; // regenerateAnimated
|
||||||
|
ProfileAccum updateMeshes; // updateMeshes (rebuildChunkInfoOnly)
|
||||||
|
ProfileAccum topingCollect; // topingSystem.collectInstances
|
||||||
|
ProfileAccum topingUpload; // uploadTopingData
|
||||||
|
ProfileAccum smoothMesh; // SmoothMesher::meshChunk (all chunks)
|
||||||
|
ProfileAccum smoothUpload; // uploadSmoothData
|
||||||
|
ProfileAccum frame; // full frame (Update only - legacy)
|
||||||
|
|
||||||
|
// Render() phase
|
||||||
|
ProfileAccum voxelPack; // voxel data packing in dispatchGpuMesh
|
||||||
|
ProfileAccum gpuUpload; // GPU upload in dispatchGpuMesh
|
||||||
|
ProfileAccum gpuDispatch; // compute dispatches in dispatchGpuMesh
|
||||||
|
ProfileAccum gpuMeshDispatch; // GPU mesh compute dispatch (in Render)
|
||||||
|
ProfileAccum gpuSmoothDispatch; // GPU smooth mesh dispatch (in Render)
|
||||||
|
ProfileAccum blasExtract; // BLAS position extraction compute
|
||||||
|
ProfileAccum blasBuild; // BLAS/TLAS build
|
||||||
|
ProfileAccum deferredUpload; // deferred GPU buffer uploads
|
||||||
|
ProfileAccum render; // render() draw calls
|
||||||
|
ProfileAccum rtShadows; // RT shadows + AO dispatch
|
||||||
|
|
||||||
|
// Totals
|
||||||
|
ProfileAccum fullFrame; // true full frame (Update + Render + Compose)
|
||||||
|
ProfileAccum gpuWait; // GPU sync: time between Compose end and next Update start
|
||||||
|
ProfileAccum wickedRender; // RenderPath3D::Render() (Wicked internal)
|
||||||
|
ProfileAccum trueFrame; // wall-clock frame-to-frame time
|
||||||
|
|
||||||
|
// Timing helpers
|
||||||
|
std::chrono::high_resolution_clock::time_point frameStart;
|
||||||
|
std::chrono::high_resolution_clock::time_point lastComposeEnd;
|
||||||
|
bool lastComposeEndValid = false;
|
||||||
|
float timer = 0.0f;
|
||||||
|
|
||||||
|
void log(const VoxelRenderer& renderer) const;
|
||||||
|
void resetAll();
|
||||||
};
|
};
|
||||||
|
|
||||||
// ── Custom RenderPath that integrates voxel rendering ───────────
|
// ── Custom RenderPath that integrates voxel rendering ───────────
|
||||||
|
|
@ -345,15 +366,14 @@ public:
|
||||||
bool debugMode = false;
|
bool debugMode = false;
|
||||||
bool debugSmooth = false;
|
bool debugSmooth = false;
|
||||||
bool screenshotMode = false; // CLI "screenshot": auto-position camera, capture, quit
|
bool screenshotMode = false; // CLI "screenshot": auto-position camera, capture, quit
|
||||||
void setCamera(float x, float y, float z, float pitch, float yaw);
|
void setCamera(float x, float y, float z, float pitch, float yaw) {
|
||||||
|
camera_.set(x, y, z, pitch, yaw);
|
||||||
|
}
|
||||||
void resetAOHistory(); // invalidate temporal AO after camera jump
|
void resetAOHistory(); // invalidate temporal AO after camera jump
|
||||||
|
|
||||||
float cameraSpeed = 50.0f;
|
CameraController camera_;
|
||||||
float cameraSensitivity = 0.003f;
|
AnimationState anim_;
|
||||||
XMFLOAT3 cameraPos = { 256.0f, 100.0f, 256.0f };
|
mutable VoxelProfiler prof_;
|
||||||
float cameraPitch = -0.3f;
|
|
||||||
float cameraYaw = 0.0f;
|
|
||||||
bool mouseCaptured = false;
|
|
||||||
|
|
||||||
const wi::graphics::Texture& getVoxelRT() const { return voxelRT_; }
|
const wi::graphics::Texture& getVoxelRT() const { return voxelRT_; }
|
||||||
|
|
||||||
|
|
@ -363,57 +383,19 @@ public:
|
||||||
void Compose(wi::graphics::CommandList cmd) const override;
|
void Compose(wi::graphics::CommandList cmd) const override;
|
||||||
|
|
||||||
private:
|
private:
|
||||||
void handleInput(float dt);
|
|
||||||
void createRenderTargets();
|
void createRenderTargets();
|
||||||
mutable bool worldGenerated_ = false;
|
mutable bool worldGenerated_ = false;
|
||||||
mutable int frameCount_ = 0;
|
mutable int frameCount_ = 0;
|
||||||
mutable float lastDt_ = 0.016f;
|
mutable float lastDt_ = 0.016f;
|
||||||
mutable float smoothFps_ = 60.0f;
|
mutable float smoothFps_ = 60.0f;
|
||||||
|
|
||||||
// Wind animation (continuous, always running)
|
|
||||||
float windTime_ = 0.0f;
|
|
||||||
|
|
||||||
// Animated terrain (wave effect at 30 Hz, toggled with F3)
|
|
||||||
bool animatedTerrain_ = false;
|
|
||||||
float animTime_ = 0.0f;
|
|
||||||
float animAccum_ = 0.0f;
|
|
||||||
static constexpr float ANIM_INTERVAL = 1.0f / 30.0f; // ~33.3ms = 30 Hz
|
|
||||||
|
|
||||||
wi::graphics::Texture voxelRT_;
|
wi::graphics::Texture voxelRT_;
|
||||||
wi::graphics::Texture voxelNormalRT_; // Phase 6: world-space normals for RT shadows/AO
|
wi::graphics::Texture voxelNormalRT_; // Phase 6: world-space normals for RT shadows/AO
|
||||||
wi::graphics::Texture voxelDepth_;
|
wi::graphics::Texture voxelDepth_;
|
||||||
mutable bool rtCreated_ = false;
|
mutable bool rtCreated_ = false;
|
||||||
|
|
||||||
// ── CPU Profiling (averages every 5 seconds) ─────────────────
|
|
||||||
mutable ProfileAccum profRegenerate_; // regenerateAnimated
|
|
||||||
mutable ProfileAccum profUpdateMeshes_; // updateMeshes (rebuildChunkInfoOnly or CPU mesh)
|
|
||||||
mutable ProfileAccum profVoxelPack_; // voxel data packing in dispatchGpuMesh
|
|
||||||
mutable ProfileAccum profGpuUpload_; // GPU upload in dispatchGpuMesh
|
|
||||||
mutable ProfileAccum profGpuDispatch_; // compute dispatches in dispatchGpuMesh
|
|
||||||
mutable ProfileAccum profRender_; // render() draw calls
|
|
||||||
mutable ProfileAccum profFrame_; // full frame (Update only - legacy)
|
|
||||||
mutable ProfileAccum profFullFrame_; // true full frame (Update + Render + Compose)
|
|
||||||
mutable ProfileAccum profSmoothMesh_; // SmoothMesher::meshChunk (all chunks)
|
|
||||||
mutable ProfileAccum profSmoothUpload_; // uploadSmoothData
|
|
||||||
mutable ProfileAccum profTopingCollect_; // topingSystem.collectInstances
|
|
||||||
mutable ProfileAccum profTopingUpload_; // uploadTopingData
|
|
||||||
mutable ProfileAccum profGpuMeshDispatch_; // GPU mesh compute dispatch (in Render)
|
|
||||||
mutable ProfileAccum profGpuSmoothDispatch_; // GPU smooth mesh dispatch (in Render)
|
|
||||||
mutable ProfileAccum profBLASExtract_; // BLAS position extraction compute
|
|
||||||
mutable ProfileAccum profBLASBuild_; // BLAS/TLAS build
|
|
||||||
mutable ProfileAccum profDeferredUpload_; // deferred GPU buffer uploads
|
|
||||||
mutable ProfileAccum profRTShadows_; // RT shadows + AO dispatch
|
|
||||||
mutable ProfileAccum profGpuWait_; // GPU sync: time between Compose end and next Update start
|
|
||||||
mutable ProfileAccum profWickedRender_; // RenderPath3D::Render() (Wicked internal)
|
|
||||||
mutable ProfileAccum profTrueFrame_; // wall-clock frame-to-frame time
|
|
||||||
mutable std::chrono::high_resolution_clock::time_point frameStartTime_; // for full frame timing
|
|
||||||
mutable std::chrono::high_resolution_clock::time_point lastComposeEnd_; // for GPU wait measurement
|
|
||||||
mutable bool lastComposeEndValid_ = false;
|
|
||||||
mutable uint32_t rtBuildSkipCounter_ = 0; // stagger BLAS builds during animation
|
mutable uint32_t rtBuildSkipCounter_ = 0; // stagger BLAS builds during animation
|
||||||
mutable bool rtWasEnabled_ = false; // saved RT state before animation
|
mutable bool rtWasEnabled_ = false; // saved RT state before animation
|
||||||
mutable float profTimer_ = 0.0f;
|
|
||||||
static constexpr float PROF_INTERVAL = 5.0f;
|
|
||||||
void logProfilingAverages() const;
|
|
||||||
};
|
};
|
||||||
|
|
||||||
} // namespace voxel
|
} // namespace voxel
|
||||||
|
|
|
||||||
|
|
@ -330,3 +330,115 @@ Couleurs plus saturées/profondes — le vert de l'herbe Wonderbox est plus rich
|
||||||
Atmospheric fog: the warm haze in the distance that blends the terrain into the sky
Stronger shadows: the shadow/light contrast is more pronounced in Wonderbox
Block side faces: more textured/detailed in Wonderbox

Remaining (deferred):

DeferredGPUBuffer utility extraction: moderate refactor, 3 instances, lower priority
VoxelRTManager extraction: significant refactor, deferred
Toping instance sort+pack parallelization: P3
Here is my complete analysis and my proposals.

## 1. Memory management: current state and clarification

**Current pattern (repeated 5× in the code):**
```
uploadXxxData() {
    staging.resize(count);        // CPU staging vector
    /* fill staging */
    if (capacity < count) {
        capacity = count * 1.25;
        CreateBuffer(nullptr);    // GPU buffer, no data
    }
    dirty = true;                 // flag for Render()
}

Render() {
    if (dirty) {
        UpdateBuffer(staging);    // actual GPU upload
        dirty = false;
    }
}
```

**Problems:**
|
||||||
|
- The pattern is duplicated for `topingInstanceBuffer_`, `topingBLASPositionBuffer_`, `smoothVertexBuffer_`, and the 3 BLAS + TLAS
|
||||||
|
- Each duplication has produced bugs (the `memmove` crash, the BLAS VRAM leak, the frozen shadows)
|
||||||
|
- The dirty flags are scattered (`topingInstanceDirty_`, `smoothVertexDirty_`, `topingBLASDirty_`, `rtDirty_`) and have non-obvious ordering dependencies (the BLAS upload must precede the BLAS build)
|
||||||
|
- 15 `mutable` members exist just for the flags and capacities
|
||||||
|
|
||||||
|
**Proposal:** Extract a `DeferredGPUBuffer` that encapsulates this pattern:
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
struct DeferredGPUBuffer {
|
||||||
|
GPUBuffer gpu;
|
||||||
|
std::vector<uint8_t> staging;
|
||||||
|
uint32_t count = 0;
|
||||||
|
uint32_t capacity = 0;
|
||||||
|
uint32_t stride = 0;
|
||||||
|
bool dirty = false;
|
||||||
|
|
||||||
|
void prepare(uint32_t newCount, const void* data); // resize + fill + dirty=true
|
||||||
|
void upload(GraphicsDevice* dev, CommandList cmd); // UpdateBuffer + dirty=false
|
||||||
|
void ensureCapacity(GraphicsDevice* dev, uint32_t newCount, BindFlag flags);
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
This removes ~50 lines of boilerplate per buffer and centralizes the invariants (capacity ≥ count, create with nullptr, upload with the actual size).
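A minimal CPU-only sketch of the staging → dirty → capacity flow may help make these invariants concrete. `MockGPUBuffer` and `DeferredBufferSketch` are hypothetical stand-ins: the `GraphicsDevice*` and `CommandList` parameters of the proposed struct are replaced here by a plain byte array so the example compiles on its own, and the 25% headroom mirrors the `count * 1.25` growth in the pseudocode above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a GPU buffer: a plain byte array.
struct MockGPUBuffer {
    std::vector<uint8_t> bytes;
};

// Sketch only; not the actual DeferredGPUBuffer implementation.
struct DeferredBufferSketch {
    MockGPUBuffer gpu;
    std::vector<uint8_t> staging;
    uint32_t count = 0;     // elements currently staged
    uint32_t capacity = 0;  // elements the GPU buffer can hold
    uint32_t stride = 0;    // bytes per element
    bool dirty = false;

    // Copy newCount elements into the CPU staging vector and mark dirty.
    void prepare(uint32_t newCount, const void* data) {
        assert(stride > 0);
        staging.resize(static_cast<std::size_t>(newCount) * stride);
        if (newCount > 0 && data != nullptr)
            std::memcpy(staging.data(), data, staging.size());
        count = newCount;
        dirty = true;
    }

    // Grow with 25% headroom; returns true when the buffer was recreated.
    bool ensureCapacity(uint32_t newCount) {
        if (capacity >= newCount) return false;
        capacity = newCount + newCount / 4;
        // Recreated without initial data, like CreateBuffer(nullptr).
        gpu.bytes.assign(static_cast<std::size_t>(capacity) * stride, 0);
        return true;
    }

    // Deferred upload: copies only the staged bytes, then clears the flag.
    void upload() {
        if (!dirty) return;
        ensureCapacity(count);
        assert(capacity >= count);  // the central invariant
        if (!staging.empty())
            std::memcpy(gpu.bytes.data(), staging.data(), staging.size());
        dirty = false;
    }
};
```

In this sketch `upload()` calls `ensureCapacity()` itself, so a caller cannot upload into a buffer that is too small; whether the real utility couples the two calls this way is a design choice left open here.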
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Performance: proposals with no functional regression
|
||||||
|
|
||||||
|
### 2.2 - Parallelize toping instance sort + packing (~5ms → ~1ms)
|
||||||
|
|
||||||
|
The `std::sort` over 30K elements and the copy into `topingGpuInsts_` are single-threaded. Either use `wi::jobsystem` to partition by type (2 types = 2 jobs), or use a counting sort (16 variants × 2 types = 32 buckets), which is O(N) instead of O(N log N).
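The counting-sort option could look roughly like the sketch below. The `TopingInstance` layout, the `payload` field, and the function names are assumptions made for illustration; only the 2 types × 16 variants = 32 buckets come from the text above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical instance layout; the real struct is not shown in this document.
struct TopingInstance {
    uint8_t type;      // assumed range: 0..1
    uint8_t variant;   // assumed range: 0..15
    uint32_t payload;  // placeholder for position/orientation data
};

constexpr uint32_t kTypes = 2;
constexpr uint32_t kVariants = 16;
constexpr uint32_t kBuckets = kTypes * kVariants;  // 32 buckets

inline uint32_t bucketOf(const TopingInstance& t) {
    assert(t.type < kTypes && t.variant < kVariants);
    return t.type * kVariants + t.variant;
}

// Stable O(N) sort by (type, variant): histogram, prefix sum, scatter.
std::vector<TopingInstance> countingSortByTypeVariant(
        const std::vector<TopingInstance>& in) {
    std::array<uint32_t, kBuckets + 1> offsets{};
    for (const auto& t : in)
        ++offsets[bucketOf(t) + 1];        // count per bucket, shifted by one
    for (uint32_t b = 0; b < kBuckets; ++b)
        offsets[b + 1] += offsets[b];      // offsets[b] = first slot of bucket b
    std::vector<TopingInstance> out(in.size());
    for (const auto& t : in)
        out[offsets[bucketOf(t)]++] = t;   // scatter, preserving input order
    return out;
}
```

Because the scatter pass walks the input in order, instances sharing the same (type, variant) keep their relative order, which `std::sort` does not guarantee.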
|
||||||
|
|
||||||
|
### 2.3 - Skip unneeded BLAS rebuilds when only the blocky terrain changes
|
||||||
|
|
||||||
|
Currently `buildAccelerationStructures()` rebuilds all 3 BLAS + the TLAS on every animation frame. If only the blocky terrain changes (no wind/toping), rebuilding the toping BLAS is unnecessary. Add granular dirty flags:
|
||||||
|
|
||||||
|
mutable bool blockyBLASDirty_ = false;
|
||||||
|
mutable bool smoothBLASDirty_ = false;
|
||||||
|
// topingBLASDirty_ already exists
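A rough sketch of how such flags could drive the rebuild decision follows. `BLASDirtyFlags`, `RebuildPlan`, and `planRebuild` are hypothetical names, and the sketch assumes the TLAS is rebuilt whenever at least one BLAS was rebuilt; the document does not state the actual TLAS policy.

```cpp
#include <cstdint>

// Hypothetical grouping of the per-BLAS dirty flags proposed above.
struct BLASDirtyFlags {
    bool blocky = false;
    bool smooth = false;
    bool toping = false;
};

// Which acceleration structures to rebuild this frame.
struct RebuildPlan {
    bool blocky;
    bool smooth;
    bool toping;
    bool tlas;

    uint32_t blasBuilds() const {
        return uint32_t(blocky) + uint32_t(smooth) + uint32_t(toping);
    }
};

// Consume the flags: only dirty BLAS are rebuilt.
// Assumption: the TLAS is rebuilt if and only if some BLAS was rebuilt.
inline RebuildPlan planRebuild(BLASDirtyFlags& flags) {
    RebuildPlan plan;
    plan.blocky = flags.blocky;
    plan.smooth = flags.smooth;
    plan.toping = flags.toping;
    plan.tlas = flags.blocky || flags.smooth || flags.toping;
    flags = BLASDirtyFlags{};  // clear so the next frame starts clean
    return plan;
}
```

With only `blocky` set, the plan contains a single BLAS build plus the TLAS, instead of the three BLAS builds performed today.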
|
||||||
|
|
||||||
|
## 3. Refactoring: areas to simplify
|
||||||
|
|
||||||
|
### 3.2 - Extract RT into a dedicated class
|
||||||
|
|
||||||
|
`VoxelRenderer` is 2900+ lines long and mixes rendering, meshing, and ray tracing. Extract a `VoxelRTManager`:
|
||||||
|
|
||||||
|
```cpp
|
||||||
|
class VoxelRTManager {
|
||||||
|
// BLAS/TLAS management, capacity tracking
|
||||||
|
// dispatchBLASExtract(), buildAccelerationStructures()
|
||||||
|
// dispatchShadows()
|
||||||
|
// All RT-related state (rtAvailable_, rtDirty_, aoTextures_, etc.)
|
||||||
|
};
|
||||||
|
```
|
||||||
|
|
||||||
|
This isolates the ~500 lines of RT code and its 20+ members, which makes debugging more targeted.
|
||||||
|
|
||||||
|
### 3.3 - Unify the deferred upload pattern
|
||||||
|
|
||||||
|
As described in §1, a centralized `DeferredGPUBuffer` avoids the error-prone duplication. Every bug encountered so far (memmove crash, VRAM leak, frozen shadows) came from a badly implemented variation of this same pattern.
|
||||||
|
|
||||||
|
### 3.4 - Simplify `VoxelRenderPath`
|
||||||
|
|
||||||
|
`VoxelRenderPath` acts as a "god object": camera, input, animation, profiling, render targets, wind. Extract:
|
||||||
|
- Input/camera → struct `CameraController`
|
||||||
|
- Profiling → struct `VoxelProfiler` (already a good candidate; the `ProfileAccum` instances are easy to isolate)
|
||||||
|
- Animation → struct `AnimationState`
|
||||||
|
|
||||||
|
## Recommended prioritization
|
||||||
|
|
||||||
|
| Priority | Action | Perf impact | Effort |
|
||||||
|
|----------|--------|-------------|--------|
|
||||||
|
| **P1** | `DeferredGPUBuffer` (§3.3) | Bug prevention | Medium |
|
||||||
|
| **P2** | Extract RT into a class (§3.2) | Maintainability | Medium |
|
||||||
|
| **P2** | Granular BLAS dirty flags (§2.3) | ~2-5ms/frame | Low |
|
||||||
|
| **P3** | Parallelize toping sort (§2.2) | ~4ms | Low |
|
||||||
|
| **P3** | Toping LOD during animation (§4.1) | Raster + BLAS | Medium |
|
||||||
|
|
||||||
|
**P0 alone would bring the frame time from 232ms down to ~35ms (~28 FPS), about 6.5× better.** Combined with the P2 dirty flags, we approach the 60 FPS target.
|
||||||
|
|
||||||
|
Tell me which priorities you want to tackle, and in what order.
|
||||||