"VkDeviceLost-4" error keeps being thrown

OS: Windows 10
Vulkan Version: 1.0.65.0
Nvidia driver version: 388.43
GPU: 750ti

I’m working on a project to build a high-performance rendering server based on Vulkan. The server is multi-threaded so that it can process rendering requests sent by users. Initially, each unique model got its own exclusive logical device, handled by one rendering thread. This basically worked until we found that there seems to be a limit on the maximum number of logical devices that can be created, and that this limit varies from GPU to GPU (my test results: 750ti [Windows 10 driver] → max ≈ 81; Titan Xp [Linux driver] → max ≈ 35).

To get around this, we switched to a single logical device shared by all model-rendering threads. Each model has its own command buffer, which is built in a multi-threaded fashion and submitted to one graphics queue whenever a render request is received. Locks are used heavily in the drawCall, buildSecondBuffer and uploadModelData stages to prevent race conditions. But this time it doesn’t work: the error message “VkDeviceLost-4” keeps being thrown when multiple rendering requests are received at the same time.
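One thing worth checking against the shared-device design above: the Vulkan specification's threading rules require the `queue` passed to `vkQueueSubmit` to be externally synchronized, and recording into a command buffer also requires external synchronization of the `VkCommandPool` it was allocated from. Locks placed around drawCall, buildSecondBuffer and uploadModelData do not necessarily cover either of these. A common pattern is one mutex guarding every submit to the shared queue, plus one command pool per recording thread. The sketch below models that pattern in plain C++ so it builds without the SDK; `SharedQueue`, `ThreadContext` and `runRenderThreads` are hypothetical names, and the comments mark where the real `vkQueueSubmit` and command-buffer recording calls would go. It is a general sketch, not a diagnosis of the undisclosed code.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared VkQueue. Real code would hold the VkQueue
// handle and call vkQueueSubmit inside submit(). Because the queue must be
// externally synchronized, every thread goes through this one mutex.
class SharedQueue {
public:
    // 'work' stands in for the vkQueueSubmit call; it runs with the lock held.
    void submit(const std::function<void()>& work) {
        std::lock_guard<std::mutex> lock(mutex_);
        work();
        ++submitCount_;
    }
    int submitCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return submitCount_;
    }
private:
    std::mutex mutex_;
    int submitCount_ = 0;
};

// Hypothetical per-thread recording context. Real code would own a
// VkCommandPool created for this thread only, so recording never touches a
// pool that another thread is using at the same time.
struct ThreadContext {
    int recordedBuffers = 0;
    // Stands in for vkBeginCommandBuffer ... vkEndCommandBuffer.
    void record() { ++recordedBuffers; }
};

// Starts 'threads' workers; each records 'perThread' command buffers with its
// own context and submits them through the shared, mutex-guarded queue.
int runRenderThreads(SharedQueue& queue, int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    std::vector<ThreadContext> contexts(threads);  // never resized after threads start
    std::vector<std::thread> workers;
    int submitted = 0;  // only modified inside submit(), so the queue mutex protects it
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = 0; i < perThread; ++i) {
                contexts[t].record();                 // no lock: per-thread "pool"
                queue.submit([&] { ++submitted; });   // serialized "vkQueueSubmit"
            }
        });
    }
    for (auto& w : workers) w.join();
    return submitted;
}
```

For example, `runRenderThreads(q, 8, 1000)` on a fresh `SharedQueue q` performs 8000 serialized submits. Keeping recording lock-free (one pool per thread) and serializing only the submit keeps the critical section short.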

According to the Vulkan specification:

  • A logical device may become lost because of hardware errors, execution timeouts, power management events and/or platform-specific events.

I guess this problem may be caused by an execution timeout, but I have no idea how that could happen.
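For reference, the `-4` in the message presumably is the raw `VkResult` value: in the Vulkan headers, `VK_ERROR_DEVICE_LOST` is defined as -4, which is the device-loss case quoted above. A small decoder makes such logs easier to read. The sketch below is hypothetical (`vkResultName` is not an SDK function) and lists only a subset of `VkResult` values, copied as plain integers so it builds without the SDK; real code would include `<vulkan/vulkan.h>` and switch on the enumerators directly.

```cpp
#include <cassert>
#include <string>

// Maps a subset of VkResult values (as plain ints) to their enumerator names.
std::string vkResultName(int result) {
    switch (result) {
        case 0:  return "VK_SUCCESS";
        case 1:  return "VK_NOT_READY";
        case 2:  return "VK_TIMEOUT";
        case -1: return "VK_ERROR_OUT_OF_HOST_MEMORY";
        case -2: return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
        case -3: return "VK_ERROR_INITIALIZATION_FAILED";
        case -4: return "VK_ERROR_DEVICE_LOST";
        case -5: return "VK_ERROR_MEMORY_MAP_FAILED";
        default: return "UNKNOWN_VK_RESULT(" + std::to_string(result) + ")";
    }
}
```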

Due to patent issues, I can’t share the source code (~5000 lines). Still, does anyone have an idea what could be causing this problem? Has anyone else run into the same issue? I’d really appreciate any help. Thanks in advance!