Vulkan Performance Support on GeForce 840M

redty11 · November 10, 2017, 3:47pm

For my final year project I am doing a performance comparison between DirectX 11 and Vulkan on a multitude of different hardware and operating systems. Early signs currently point to DirectX 11 being faster in my implementation.
I was wondering if anybody had any suggestions for Vulkan features or settings to look into? I was hoping that Vulkan would outperform DirectX 11 really…

My Vulkan project was built using: https://vulkan-tutorial.com/

Currently both projects just load in the Sponza scene as an OBJ, with 8 dynamic spotlights and a camera following a pre-determined path.

I am personally working on a laptop with an Intel i7 CPU and an NVIDIA GeForce 840M GPU.

I am using Vulkan 1.0.61.1.

Any advice or suggestions would be welcome. I am happy to post specific pieces of code if that would help.

Thanks!

virtual_storm · November 11, 2017, 11:49am

I assume you want to compare CPU performance of DX11 and Vulkan, since the GPU side is the same for both. In that case loading a “small” scene with some lights is not going to do a lot. Since the DX11 drivers optimize for you in the background, your Vulkan implementation needs to beat the optimized driver heuristics. But beating DX11 with Vulkan is not hard if you craft a CPU-heavy workload, at least for me. Here are some of the things that run really nicely on Vulkan, but not so nicely on DX11:

1. 10k+ individual draw calls per frame, changing the bound textures between each draw call. In Vulkan you can pre-bake the descriptor sets and then only need to bind them; in DX11 this costs more CPU time. The more textures you bind, the worse it gets: if you have 2000 textures in total and 10k sets (one unique set per draw call), each selecting and binding 100 random textures out of those 2000, that's 1 million calls into the DX11 driver per frame just to bind textures. If you pre-bake them on Vulkan into distinct descriptor sets, the difference will be night and day.
2. 10k+ individual draw calls per frame, alternating PSOs/pipelines. To make it hurt, use one pipeline with a tessellation/geometry shader and the other without. The same applies as in point 1: if you have 10k unique PSOs/pipelines, all different, and you draw them in random order each frame, DX11 will be brought to its knees. If you pre-bake them on Vulkan, on the other hand, you will have no trouble.
3. If you draw a “static” scene, in the sense that only uniform data changes between frames, record the entire frame twice at startup, each time with a different uniform buffer offset, and then only submit one of the pre-recorded frames each frame without recording anything, using double-buffered uniforms. This will make DX11 look really old. A submit on my system (Win 7 x64, stone-age i7 920 2.6 GHz, GTX 970) takes about 20-40 microseconds with Vulkan. If you draw a couple thousand objects, DX11 has no chance of matching that.
4. If stability is a performance metric (it should be), being able to explicitly upload via the DMA queue in Vulkan is awesome and fully multithreaded. In DX11 this is not so easy and requires hinting the driver into using the DMA hardware. The same argument goes for explicit PSO/pipeline compiles: precompiling all pipelines on multiple threads saves a lot of hitches.
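
To make the pre-recorded static frame idea above a bit more concrete, here is a minimal sketch of just the bookkeeping, with no Vulkan calls in it. The names `PrerecordedFrame` and `selectFrame` are made up for illustration, and it assumes the two uniform regions sit back to back in one buffer, which is only one possible layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one pre-recorded frame: which of the two command
// buffers to submit, and which uniform region it was recorded against.
struct PrerecordedFrame {
    std::size_t commandBufferSlot;  // index into the two command buffers recorded at startup
    std::size_t uniformOffset;      // byte offset that was baked in at record time
};

// Pick the pre-recorded frame for a given frame index (simple ping-pong).
// Assumption: region 0 starts at offset 0, region 1 directly after it.
inline PrerecordedFrame selectFrame(std::uint64_t frameIndex,
                                    std::size_t uniformBytesPerFrame) {
    const std::size_t slot = static_cast<std::size_t>(frameIndex % 2);
    return PrerecordedFrame{slot, slot * uniformBytesPerFrame};
}
```

Each frame the CPU would then write the new uniform data at `uniformOffset` (after waiting on whatever fence guards that slot) and hand the matching command buffer to `vkQueueSubmit`, with no recording at all.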

And just to mention it: you should profile Vulkan without the validation layers to get accurate measurements. Measuring CPU times is also not so easy; GPUView on Windows helps. Raw FPS, on the other hand, is pretty unreliable.
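
If you want a rough CPU-side number without a full profiler, one option is to wrap the recording or submit code in a timer. This is only a sketch using `std::chrono`; `measureMicroseconds` is a made-up helper name, and it measures wall-clock time on the calling thread, so it says nothing about what the GPU is doing.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the elapsed wall-clock time in microseconds.
// steady_clock is used because it never jumps backwards.
template <typename F>
double measureMicroseconds(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}
```

You would call it as `measureMicroseconds([&] { /* record or submit here */ })` in both the DX11 and the Vulkan build and collect the values over many frames, since a single sample is very noisy.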

Regards

redty11 · November 11, 2017, 3:38pm

Thanks virtual_storm! That’s incredibly informative and useful!

I have about a month or so to make the tests as valid as possible. I’m currently in the process of multithreading the command buffers, as I’ve read a few different reports that all say that Vulkan’s performance will be a lot stronger with it.

I’m not entirely sure that my laptop would handle 10k+ draw calls, unfortunately. I have a cube test where I render 500+ cubes and I hit about 70-85 FPS, so I think that I’d start running out of memory above 1000.

What do you mean by stability being a performance metric? Do you mean the stability of frames, as in a consistent frame rate? Or do you mean something else?

Validation layers are disabled in Release Mode. :)

virtual_storm · November 13, 2017, 10:59am

Multithreaded recording in Vulkan is a really nice feature, but you can max out the GPU with one thread easily. On my old CPU it takes about 1 ms to record 10k separate draw calls, including all the pipeline and descriptor bindings. So if we talk just about recording, I could record 100k draw calls in a single thread on an 8-year-old 2.6 GHz CPU and still get 60 FPS if my GPU were fast enough.
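
The arithmetic behind that claim, as a small sketch. It takes the 1 ms per 10k draw calls figure from above and assumes recording cost scales linearly with the number of draw calls, which is a simplification; the function names are made up for illustration.

```cpp
#include <cstdint>

// Figure quoted above: roughly 1 ms of CPU time per 10,000 recorded draw calls.
constexpr double kMsPerTenThousandDraws = 1.0;

// Estimated single-threaded recording time, assuming linear scaling.
constexpr double recordingTimeMs(std::uint64_t drawCalls) {
    return static_cast<double>(drawCalls) / 10000.0 * kMsPerTenThousandDraws;
}

// Time available per frame at a given target frame rate.
constexpr double frameBudgetMs(double targetFps) {
    return 1000.0 / targetFps;
}
```

With these numbers, `recordingTimeMs(100000)` comes out at 10 ms, which fits inside the roughly 16.7 ms that `frameBudgetMs(60.0)` allows.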

If your GPU is too slow for you to measure the CPU side, simply reduce the GPU work without reducing the CPU work: render into a 1x1 render target, sample from 1x1 textures, face the camera away from the objects being drawn, etc.

I basically mean stability of frames or predictability of frame times. Since DX11 has a lot of magic in the driver, you have no control over what really happens on the hardware side. Random single-frame hiccups are quite common in a lot of DX11 games and really break immersion; it is even worse in VR. Asset/texture streaming is used quite often in games and needs to be done on the render thread in DX11, but can be done in a separate thread in Vulkan. So in Vulkan an upload of a large texture can easily span multiple frames without interrupting rendering, since you can explicitly target the DMA hardware.
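
One possible way to put a number on this is to log the time of every single frame and summarize the spread, rather than looking at an averaged FPS value. The sketch below is just that, a sketch: `FrameTimeStats` and `summarizeFrameTimes` are made-up names, and counting a frame as a hitch when it takes more than twice the mean is an arbitrary threshold chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct FrameTimeStats {
    double meanMs;        // average frame time
    double stddevMs;      // population standard deviation of frame times
    double worstMs;       // slowest single frame
    std::size_t hitches;  // frames slower than hitchFactor * mean (assumed threshold)
};

// Summarizes per-frame times given in milliseconds. Requires at least one sample.
inline FrameTimeStats summarizeFrameTimes(const std::vector<double>& frameTimesMs,
                                          double hitchFactor = 2.0) {
    assert(!frameTimesMs.empty());
    const double count = static_cast<double>(frameTimesMs.size());

    double sum = 0.0;
    for (double t : frameTimesMs) sum += t;
    const double mean = sum / count;

    double squaredDiffs = 0.0;
    std::size_t hitches = 0;
    for (double t : frameTimesMs) {
        squaredDiffs += (t - mean) * (t - mean);
        if (t > hitchFactor * mean) ++hitches;
    }

    const double worst = *std::max_element(frameTimesMs.begin(), frameTimesMs.end());
    return FrameTimeStats{mean, std::sqrt(squaredDiffs / count), worst, hitches};
}
```

Two runs with the same mean can then still be told apart: the one with the lower standard deviation and fewer hitches is the more predictable one.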

Most games that can run with Vulkan at the moment still have a DX9/DX10/DX11 engine design, so they need to compromise in Vulkan and do things like rewrite descriptor sets every frame, have a pipeline cache that creates pipelines on first use, etc. If, on the other hand, you design a game engine with DX12 or Vulkan in mind from the start, you can get much better results. In my own renderer, designed from scratch for Vulkan, I not only pre-build all the layouts and pipelines, but I also use sparse buffers/textures for the streamed-in assets. That allows me to pre-create all buffers/images/views and, in turn, pre-write all descriptor sets, since the views stay static even if the assets are streamed in or out of actual memory. With the NVIDIA extension for device-generated commands you can take this one step further and have a fully dynamic scene with basically no CPU recording at all on the render thread, except memory barriers and image transitions for streamed-in assets.

Regards

redty11 · November 19, 2017, 12:20pm

Again, thank you for your reply! :)

Showing that an engine built from scratch using Vulkan has the potential for better performance is a large part of what I’m trying to prove, as tests performed in existing AAA titles like DOOM have shown that adding Vulkan into an existing codebase doesn’t yield great results.

So, for stability of frames, I should record how many frames are rendered each second, average them out, then compare each second to the average to determine how much the frame rate wavers? Maybe from a static camera with a static scene, to minimise false results like there being twice as much geometry in a frame?

Thanks again!