Execution time difference between WinXP and Win7

We are profiling GPU code to ensure that the execution time on each GPU is similar. The code was tested on two nearly identical machines, one running Windows XP 32-bit and the other dual-booting XP and Win7 64-bit. The specs for the two machines are in the attached document, in the “Experiments” tab. To test the GPUs, the same code was run on each individual GPU 5 times, with each run consisting of 1000 GPU calls. Every GPU call was identical, to remove randomness from the test.

In the results (attached), both machines performed as expected under XP: 5 GPUs were identical and 1 was slightly higher. This is expected, since one of the GPUs is driving the display, and it is acceptable because that GPU can be avoided when running the code across multiple GPUs. However, on the Win7 side of the dual-boot machine, 3 of the GPUs were identical and 3 GPUs were quite random in execution time. This result was unexpected and is undesirable for our application.

Since the code behaved as expected on two different computers, and with different driver and CUDA SDK versions, we think a driver issue or a Win7 feature may be causing the unexpected results. We have updated both the Win7 Nvidia drivers and the CUDA SDK to the latest versions. Neither update resolved the issue.

Are there any known issues that cause a difference in execution times between Win7 and WinXP? Do you have any suggestions on how to fix the unexpected results on the Win7 machine?

Profiling results are below:

NvidiaQuestion.xls (1.69 MB)
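For anyone who wants to reproduce the measurement, the procedure described above (5 runs per GPU, 1000 identical calls per run) could be sketched as a host-side harness along these lines. This is only an illustration, not the code that produced the attached spreadsheet: `RunStats`, `summarize`, `time_device`, `is_erratic`, and the 5% threshold are all made-up names and values, and `gpu_call` stands in for whatever launches the real kernel. It assumes `gpu_call` blocks until the GPU work has finished (for example by synchronizing the device after the launch); otherwise the host clock only measures launch overhead.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <functional>
#include <vector>

// Summary of repeated timing runs for one device (hypothetical type).
struct RunStats {
    double mean_ms;    // mean wall-clock time per run
    double stddev_ms;  // population standard deviation across runs
};

// Reduce a list of per-run times (milliseconds) to mean and spread.
RunStats summarize(const std::vector<double>& run_ms) {
    assert(!run_ms.empty());
    double sum = 0.0;
    for (double t : run_ms) sum += t;
    const double mean = sum / run_ms.size();

    double sq = 0.0;
    for (double t : run_ms) sq += (t - mean) * (t - mean);
    return RunStats{mean, std::sqrt(sq / run_ms.size())};
}

// Time `runs` runs of `calls_per_run` invocations of `gpu_call`.
// ASSUMPTION: `gpu_call` selects the device under test, launches the
// work, and does not return until that work has completed.
RunStats time_device(const std::function<void()>& gpu_call,
                     int runs = 5, int calls_per_run = 1000) {
    using clock = std::chrono::steady_clock;
    std::vector<double> run_ms;
    for (int r = 0; r < runs; ++r) {
        const auto t0 = clock::now();
        for (int i = 0; i < calls_per_run; ++i) gpu_call();
        const auto t1 = clock::now();
        run_ms.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return summarize(run_ms);
}

// Flag a device whose run-to-run spread exceeds a fraction of its mean.
// The 5% default is an arbitrary illustrative threshold.
bool is_erratic(const RunStats& s, double max_rel_spread = 0.05) {
    return s.stddev_ms > max_rel_spread * s.mean_ms;
}
```

Calling `time_device` once per GPU and comparing the resulting `mean_ms` and `stddev_ms` values gives the same kind of per-device comparison as the spreadsheet: a device with a consistently higher mean looks like the display GPU, while one flagged by `is_erratic` matches the "random" behavior seen under Win7.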

XP has a completely different Hardware Abstraction Layer.

The best thing to do is get in a time machine and go back to before the time (spring of 2004) that teenager Sven Jaschan in Germany wrote the Sasser worm, which made computers all over the world keep rebooting and rebooting and caused many businesses to lose billions of dollars. http://www.pcworld.com/article/121709/german_teen_confirms_he_created_the_sasser_worm.html

This is what prompted Microsoft to drastically change the way the operating system does hardware abstraction. I’m surprised that Sven has not received his due karmic reward yet.

I have no personal experience with Win7 driver performance issues, as I use Linux and, occasionally, WinXP64. However, I am aware that Windows XP and Vista/Win7 use very different driver models for 3D graphics cards, and that the WDDM driver model used by Vista/Win7 introduces inefficiencies. You may want to look into using the TCC driver with Win7, which uses a different driver model that can improve performance:

TCC DRIVER FOR WINDOWS
The TCC (Tesla Compute Cluster) driver is a Windows driver for CUDA C/C++ that enables remote desktop, services, and reduces the CUDA kernel launch overhead on Windows. Note that the TCC driver disables graphics on the Tesla products.

This sounds like it could be the solution to our problem, but can the TCC driver be used with a GeForce 9800 GX2?