The reason I always use a staging image is to perform a blit operation to decompress potentially compressed formats; the code does this whether or not the texture is actually compressed. The mapping method is not only for reading, it's for reading and writing; otherwise a pre-allocated buffer for reads would be viable. However, the code is supposed to perform an actual memory map, and may have more than one resource mapped at a time, which makes it hard to predict how big a scratch buffer would have to be. I know about persistently mapped buffers, used them frequently in OpenGL, and use them wherever mapping and unmapping is unnecessary, for example when updating uniforms, or vertex/index buffers for dynamic geometry.
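To illustrate why a fixed-size scratch buffer is hard to size, here is a minimal bookkeeping sketch (all names hypothetical, not from my actual code) that tracks several simultaneously mapped resources; the total mapped size is only known at run time:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical record of one mapped resource. In the real code the pointer
// would come from vkMapMemory; here it is just stored for lookup.
struct MappedRange {
    std::size_t offset;
    std::size_t size;
    void*       hostPtr;
};

class MapTable {
public:
    void onMap(std::uint64_t resource, MappedRange range) {
        ranges_[resource] = range;
    }
    void onUnmap(std::uint64_t resource) {
        ranges_.erase(resource);
    }
    // Total bytes currently mapped across all resources; this is the number
    // a single pre-sized scratch buffer would have to guess in advance.
    std::size_t totalMapped() const {
        std::size_t sum = 0;
        for (const auto& kv : ranges_) sum += kv.second.size;
        return sum;
    }
    std::size_t mappedCount() const { return ranges_.size(); }
private:
    std::unordered_map<std::uint64_t, MappedRange> ranges_;
};
```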
Figuring out which buffers are used in which shader stage is not so simple either. The only way I can see it working in a pipeline is to have some kind of annotation language and implement a preprocessor that detects which buffers are used in which shaders, or to hardcode the usages in the program. As far as I have seen, the Khronos GLSL reference compiler cannot be used to retrieve which shader stages a uniform buffer is used in either, otherwise this would've been trivial. Right now I can't easily distinguish where they are used, and even if I could, I would want my descriptor set layouts to be as identical as possible between shaders, which they will not be if one shader uses certain stage flags while another uses different ones.
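If per-shader usage information were available, one way to keep the layouts identical would be to OR the stage masks from every shader together per binding, so each binding gets the union of all its usages. A minimal sketch, assuming reflection data keyed by block name (the bit values mirror Vulkan's VkShaderStageFlagBits; the function name is made up):

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Bit values match Vulkan's VkShaderStageFlagBits (vulkan_core.h).
enum ShaderStageBits : std::uint32_t {
    STAGE_VERTEX       = 0x00000001,
    STAGE_GEOMETRY     = 0x00000008,
    STAGE_FRAGMENT     = 0x00000010,
    STAGE_ALL_GRAPHICS = 0x0000001F,
};

// Merge per-shader reflection data into one stage mask per uniform block, so
// every pipeline can share one identical descriptor set layout. The coarse
// fallback is simply declaring every binding with STAGE_ALL_GRAPHICS.
std::map<std::string, std::uint32_t> mergeStageFlags(
    const std::map<std::string, std::uint32_t>* perShader, std::size_t count) {
    std::map<std::string, std::uint32_t> merged;
    for (std::size_t i = 0; i < count; ++i)
        for (const auto& kv : perShader[i])
            merged[kv.first] |= kv.second;  // union of all stages using it
    return merged;
}
```

The price of the union (or of STAGE_ALL_GRAPHICS) is slightly looser validation, but the layouts stay compatible across shaders.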
When it comes to descriptor set binding, it's mostly just ugly to have a system which incrementally saves every descriptor set being applied in a list, and then applies this entire list every time the descriptor sets need to be bound, instead of having each stage of the rendering code update the state. Although I find that when using secondary command buffers within render passes, one must apply all descriptor sets anyway, since each secondary command buffer has its own local state. In that case, however, we can apply all 'shared' descriptor sets when we start recording the secondary buffer, and then incrementally apply only the sets that change, which is what I am doing. But it also means it is important for the code to be able to determine which descriptor sets are compatible and which are not. In my case, I let all descriptor sets up to and including number 5 be shared, so that I know sets 0-5 can be applied once and used by all. Then only the descriptor sets unique to that shader change per object, with offsets into each uniform buffer for that set describing the slice used by the draw. This method allows me to avoid changing descriptor sets and instead just supply them with offsets, which I vaguely recall NVIDIA claimed would be more efficient, since the driver can detect when only the offsets change. I would assume it's somewhat similar to glBindBufferRange, where only offsets changed.
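One detail worth noting about the offsets-per-draw scheme: the dynamic offsets passed to vkCmdBindDescriptorSets must be multiples of VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment, so each per-object slice has to be padded to that alignment. A small sketch of the arithmetic (helper names are mine), assuming the alignment is a power of two as the limits guarantee:

```cpp
#include <cstddef>

// Round an offset up to the next multiple of a power-of-two alignment, e.g.
// VkPhysicalDeviceLimits::minUniformBufferOffsetAlignment (commonly 256).
constexpr std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Dynamic offset of the i-th per-object slice inside a shared uniform
// buffer, where each slice is padded to the minimum alignment.
constexpr std::size_t sliceOffset(std::size_t index,
                                  std::size_t sliceSize,
                                  std::size_t minAlignment) {
    return index * alignUp(sliceSize, minAlignment);
}
```

With this, the same descriptor set stays bound and only the offset array passed to vkCmdBindDescriptorSets changes between draws.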
The reason for the many uniform buffers is that many subsystems are responsible for updating their share of the shader state. The descriptor sets are fixed from startup, meaning I have only one descriptor set per unique set layout, and a single buffer bound to every binding slot. The buffer is expanded when a new instance is requested, and each subsystem gets a slice of that buffer to work with. This keeps the descriptor set count down to a minimum, which is necessary because I found myself running out of memory on my AMD driver when using one descriptor set per material, so I figured that is not the proper use of descriptor sets. At the same time, updating a descriptor set has to wait until the GPU work using that descriptor set is done, so updating descriptor sets to emulate 'binding' as in OpenGL or DirectX is not viable either. The idea derives from keeping the total number of allocations and buffers to a minimum in order to improve memory access, as explained here: [url]https://developer.nvidia.com/vulkan-memory-management[/url].
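The grow-on-demand slice scheme could be sketched roughly like this (a simplified stand-in, not my actual implementation; the class and its doubling policy are assumptions):

```cpp
#include <cstddef>

// Hypothetical per-binding slice allocator: one buffer per binding slot, and
// each subsystem requests a slice when a new instance appears. When the
// buffer runs out, capacity doubles; the real code would then recreate the
// VkBuffer and rewrite the descriptor once, outside any frame in flight.
class SliceAllocator {
public:
    explicit SliceAllocator(std::size_t initialCapacity)
        : capacity_(initialCapacity), used_(0), grew_(false) {}

    // Returns the byte offset of the new slice inside the shared buffer.
    std::size_t allocate(std::size_t size) {
        while (used_ + size > capacity_) {
            capacity_ *= 2;  // signal that the backing buffer must be rebuilt
            grew_ = true;
        }
        std::size_t offset = used_;
        used_ += size;
        return offset;
    }
    std::size_t capacity() const { return capacity_; }
    bool needsRebuild() const { return grew_; }
private:
    std::size_t capacity_;
    std::size_t used_;
    bool grew_;
};
```

Because every subsystem writes into disjoint slices of the same buffer, the descriptor pointing at it never has to change; only the slice offsets do.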
I also implement texturing using the method proposed by AMD, where all textures are bound in a single descriptor set and addressed using integer indices stored in uniform buffers. While the AMD driver allows 2^32 textures to be bound at the same time, the NVIDIA driver supports around 49k, which is sufficient for most applications. To be honest, I can't quite figure out how to have different textures per object without either having a descriptor set per material (which ran me out of memory on the AMD driver; no clue about the NVIDIA one) or updating the same descriptor set between every draw (which stomps data still in flight and cannot work). It could work if there were a vkCmd-style command for updating descriptor sets, but there isn't. I guess putting textures in their own descriptor set, to keep the descriptor set memory footprint as small as possible, might be a solution, but I really prefer the AMD idea of just binding them all at once.
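The bind-all-at-once scheme boils down to handing every texture a stable slot in the big array and uploading that integer in the material's uniform data. A minimal sketch of such an index pool with free-list recycling (the class is hypothetical; deferred release until the GPU is done with the slot is left out):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical index pool for the "bind all textures once" scheme: one large
// descriptor set holds every texture, and materials store plain integer
// indices (uploaded in a uniform buffer) instead of per-material sets.
class TextureIndexPool {
public:
    // Hand out a slot in the global texture array, reusing freed slots first.
    std::uint32_t acquire() {
        if (!freeList_.empty()) {
            std::uint32_t idx = freeList_.back();
            freeList_.pop_back();
            return idx;
        }
        return next_++;
    }
    // Return a slot to the pool. The real code would defer this until the
    // GPU no longer references the slot.
    void release(std::uint32_t idx) { freeList_.push_back(idx); }
    // Highest slot ever used; bounds the descriptor array size needed.
    std::uint32_t highWatermark() const { return next_; }
private:
    std::vector<std::uint32_t> freeList_;
    std::uint32_t next_ = 0;
};
```

The shader then samples with something like `textures[material.albedoIndex]`, so per-object texturing never touches a descriptor set.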
So the problem isn't really with Vulkan itself; the problem is putting it into practice. I know I have some redundancies, for example GlobalLightBlock, CSMParamBlock and LightForwardBlock could be merged into one and the same, so I think that's fine. I was just not prepared to be limited by the number of uniform buffer declarations; I would've assumed something slightly less obvious was causing the issue.