Hi and thanks,
I’m using triangle meshes with Trbvh for now. I will probably try to support PLY point clouds as voxels later on, but for now it is purely triangles, usually quite uniformly sized.
Here are the device specs:
Enabled Device id: 0
Device: GeForce GTX 980 Ti, clock 1228000, compute capability [5, 2]
Memory: 5337261670 / 6442450944, max textures count: 1048576
Multiprocessor count: 22, threads per block: 1024
CUDA GPU Device 0:GeForce GTX 980 Ti cm 5.2
Build times are not really an issue right now, since the acceleration structure only has to be rebuilt when a new object is loaded, so the file loading and parsing times will dwarf the acceleration build times. Here are the timings for a model of roughly 2.5M triangles:
Frame 1 for entry 0 took: 2262.53 [msecs]
Frame 2 for entry 0 took: 0.00176 [msecs]
(As opposed to about 7000 ms for a plain BVH Accelerator)
So the time difference between the two frames should roughly account for copying the buffers from host to device and building the acceleration structure. That is not really an issue compared with the file parsing times anyway, as you can see below:
filesize: 283338207
load time: 0.073019 [msecs]
# of threads = 8
total parsing time: 22908.5 ms
line detection : 6167.46 ms
alloc buf : 653.088 ms
parse : 12704.9 ms
merge : 3372.69 ms
construct : 2094.18 ms
Geometry triangles count = 2589196
upload to device time: 9779.82 [msecs]
bmin = -9.359957, -8.549820, 1.697836
bmax = 21.165928, 5.092374, 10.014181
Available main device memory: 4084815462
Memory, however, will be a problem. I’d like to be able to work on 20M triangles and up, which I can’t fit on the GPU. So I was planning to keep the model in host memory and process it in chunks on the GPU. However, the chunks need to be spatially coherent, so I’ll need a BVH on the host as well. I was hoping to save memory and make spatially coherent swapping easier by holding a BVH of the full model in host memory and uploading only a sub-tree of it to the device each time I swap chunks.
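For reference, a rough way to size such chunks from the log above. This is only a sketch, and it assumes that the first number on the "Memory:" line and the "Available main device memory" figure are free device memory before and after the upload (the logs do not say so explicitly, and it is unclear whether the second figure already includes the Trbvh itself). Under that assumption the 2589196 triangles cost about 1.25 GB, i.e. roughly 484 bytes per triangle, so 20M triangles would need on the order of 9.7 GB, well over the 6 GB on the card. The function names and the budget parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Device bytes consumed per triangle, rounded up.
// ASSUMPTION: freeBefore/freeAfter are free-memory readings taken
// before and after uploading triangleCount triangles.
std::uint64_t bytesPerTriangle(std::uint64_t freeBefore,
                               std::uint64_t freeAfter,
                               std::uint64_t triangleCount)
{
    assert(triangleCount > 0 && freeBefore >= freeAfter);
    const std::uint64_t used = freeBefore - freeAfter;
    return (used + triangleCount - 1) / triangleCount;  // ceiling division
}

// Largest chunk (in triangles) that fits into budgetBytes of device memory.
// budgetBytes is a hypothetical, user-chosen limit.
std::uint64_t maxTrianglesPerChunk(std::uint64_t budgetBytes,
                                   std::uint64_t perTriangle)
{
    assert(perTriangle > 0);
    return budgetBytes / perTriangle;
}
```

In practice the budget should leave headroom for output buffers and for whatever temporary memory the acceleration build needs, so the measured per-triangle cost is better treated as a lower bound.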
Maybe, without giving access to its innards, allowing the acceleration structure to be mapped to host memory and unmapped back to the device could be a possibility?
So I guess I’d need to build a BVH myself and use the RTUtraversal_api instead?
Because right now the “built-in” solution would not be very optimal: build a BVH on the host, use it to select spatially coherent chunks of the desired size, upload one chunk to the device, and ask for a Trbvh build on it each time (and then speed might become an issue again, since a brand new Trbvh would have to be built for each chunk?).
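To make the host-side step concrete, here is a minimal sketch of selecting spatially coherent chunks of a bounded size. It is not a full BVH and not anything OptiX provides: it is a plain recursive median split over triangle centroids along the longest axis, which yields the same kind of leaf groups a host BVH would. `Vec3`, `Chunk`, `splitRange` and `makeChunks` are hypothetical names introduced for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };          // hypothetical centroid type
using Chunk = std::vector<std::size_t>;  // triangle indices in one chunk

static float component(const Vec3& v, int axis)
{
    return axis == 0 ? v.x : (axis == 1 ? v.y : v.z);
}

// Recursively halve indices[begin, end) at the median of the longest
// centroid-bounds axis until each range holds at most maxChunkSize items.
static void splitRange(const std::vector<Vec3>& centroids,
                       std::vector<std::size_t>& indices,
                       std::size_t begin, std::size_t end,
                       std::size_t maxChunkSize,
                       std::vector<Chunk>& out)
{
    const auto first = indices.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last  = indices.begin() + static_cast<std::ptrdiff_t>(end);
    if (end - begin <= maxChunkSize) {
        if (end > begin) out.emplace_back(first, last);
        return;
    }
    // Bounds of the centroids in this range.
    float lo[3], hi[3];
    for (int a = 0; a < 3; ++a) {
        lo[a] = std::numeric_limits<float>::max();
        hi[a] = std::numeric_limits<float>::lowest();
    }
    for (auto it = first; it != last; ++it)
        for (int a = 0; a < 3; ++a) {
            const float c = component(centroids[*it], a);
            lo[a] = std::min(lo[a], c);
            hi[a] = std::max(hi[a], c);
        }
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (hi[a] - lo[a] > hi[axis] - lo[axis]) axis = a;

    // Partition around the median centroid along that axis.
    const std::size_t mid = begin + (end - begin) / 2;
    std::nth_element(first,
                     indices.begin() + static_cast<std::ptrdiff_t>(mid),
                     last,
                     [&](std::size_t a, std::size_t b) {
                         return component(centroids[a], axis) <
                                component(centroids[b], axis);
                     });
    splitRange(centroids, indices, begin, mid, maxChunkSize, out);
    splitRange(centroids, indices, mid, end, maxChunkSize, out);
}

// centroids[i] is the centroid of triangle i; returns groups of triangle
// indices, each group spatially compact and no larger than maxChunkSize.
std::vector<Chunk> makeChunks(const std::vector<Vec3>& centroids,
                              std::size_t maxChunkSize)
{
    assert(maxChunkSize > 0);
    std::vector<std::size_t> indices(centroids.size());
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    std::vector<Chunk> chunks;
    splitRange(centroids, indices, 0, indices.size(), maxChunkSize, chunks);
    return chunks;
}
```

Each returned chunk would then be uploaded as its own geometry, with a fresh Trbvh build per chunk, which is exactly the rebuild cost the question above is worried about.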