I have a quick question regarding the performance of the cudaMalloc operation on a Windows 7 machine versus an XP machine.
For an allocation of about 3.5 KB, it takes about 1.4 ms on the Windows 7 machine and less than 0.1 ms on the XP machine.
Is it a known fact that cudaMalloc is a lot slower on Windows 7 than on XP? If so, what is the reason, and is there any possible workaround?
This is expected: on Vista/Win7 the CUDA driver has to go through the Windows display driver model (WDDM) and its scheduler, which adds overhead to operations such as cudaMalloc. TCC mode doesn’t have this performance impact.
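Where TCC mode is not an option, one common way to reduce the impact is to pay the cudaMalloc cost once and hand out pieces of that single allocation. The following is only a minimal sketch of that idea, not something from the driver or the toolkit: DevicePool, poolInit, poolAlloc and poolDestroy are hypothetical names, and the 256-byte alignment is an assumption chosen to match the alignment cudaMalloc is documented to provide.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical bump allocator: one cudaMalloc up front, after which each
// sub-allocation is plain pointer arithmetic on the host (no driver call).
struct DevicePool {
    char*  base;      // start of the single device allocation
    size_t capacity;  // total bytes reserved
    size_t offset;    // first unused byte
};

// Reserve `bytes` of device memory once. Returns false on failure.
bool poolInit(DevicePool& p, size_t bytes) {
    p.base = NULL;
    p.capacity = 0;
    p.offset = 0;
    if (cudaMalloc((void**)&p.base, bytes) != cudaSuccess) return false;
    p.capacity = bytes;
    return true;
}

// Hand out `bytes` from the pool, rounded up to an assumed 256-byte boundary.
// Returns NULL when the pool is exhausted. Nothing is freed individually.
void* poolAlloc(DevicePool& p, size_t bytes) {
    const size_t align = 256;  // assumption, see the note above
    size_t start = (p.offset + align - 1) / align * align;
    if (start > p.capacity || bytes > p.capacity - start) return NULL;
    p.offset = start + bytes;
    return p.base + start;
}

// Release the whole pool with a single cudaFree.
void poolDestroy(DevicePool& p) {
    cudaFree(p.base);
    p.base = NULL;
    p.capacity = 0;
    p.offset = 0;
}
```

With this, poolInit would be called once at startup, and each 3.5 KB request becomes poolAlloc(pool, 3584) instead of a separate cudaMalloc. The trade-off is that individual buffers cannot be returned; everything is released together in poolDestroy.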
Hmm, yeah, I am in the process of implementing that.
I had another question regarding CUDA and the OS. Does the OS play any role once the kernel is launched, for example in thread scheduling or memory management? This may sound stupid, but the reason I am asking is that I have a kernel that takes about 700 ms on XP and 1.4 seconds on Windows 7, and this is the kernel execution time only. I have a GTX 285 on both machines. The slowdown seems to appear only when different threads work on memory areas that are far apart.
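To compare the two machines on equal terms, it may help to time the kernel with CUDA events rather than a host timer, and to vary how far apart the threads' accesses are. What follows is a rough sketch under stated assumptions, not the kernel from this post: stridedCopy and timeStridedMs are hypothetical names, and the access pattern (thread i reads element (i * stride) % n) is just one way to imitate threads working on widely separated memory.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical test kernel: thread i reads an element `stride` slots apart
// from its neighbour's and writes the result to its own output slot.
__global__ void stridedCopy(const float* in, float* out, size_t n, size_t stride) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        size_t j = (i * stride) % n;  // stride == 1 gives contiguous reads
        out[i] = in[j] * 2.0f;
    }
}

// Returns the elapsed time of one launch in milliseconds as measured by
// CUDA events, or -1.0f if any step fails.
float timeStridedMs(size_t n, size_t stride) {
    float* in = NULL;
    float* out = NULL;
    if (cudaMalloc((void**)&in, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMalloc((void**)&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return -1.0f;
    }
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    // Warm-up launch so one-time setup costs are not part of the measurement.
    stridedCopy<<<blocks, threads>>>(in, out, n, stride);
    cudaDeviceSynchronize();

    cudaEventRecord(start, 0);
    stridedCopy<<<blocks, threads>>>(in, out, n, stride);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);  // wait until the kernel and stop event finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return ok ? ms : -1.0f;
}
```

Running, say, timeStridedMs(1 << 20, 1) and timeStridedMs(1 << 20, 4097) on both machines would show whether the gap between XP and Windows 7 follows the access pattern or is present even for contiguous reads.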