There are four build configurations when creating a CUDA project: Debug, EmuDebug, Release, and EmuRelease.
Of these, Debug and Release run kernels on the GPU, while EmuDebug and EmuRelease run on the CPU. Am I right?
When I run a simple MatrixMul CUDA project, I observe the following timings:
Debug: 0.399 ms
Release: 0.466 ms
EmuDebug: 1369.699 ms
EmuRelease: 1390.366 ms
Is this expected? Why do the EmuXXX builds take so much longer?
If I put a breakpoint in a __global__ function and build in EmuDebug mode, execution stops inside the function and I am able to debug it.
If I put a breakpoint in a __global__ function and build in Debug mode, execution never stops inside the function and I cannot debug it. Why?
Which build mode should I use: Release or EmuRelease?
Section 4.5.2.9 of the 2.1 CUDA Programming Guide describes “Debugging using the Device Emulation Mode”.
Emulation is slower because CPU threads are being used to emulate the GPU hardware.
You should be using the Release build for normal computations when the GPU is available and EMU builds for debugging purposes or when a GPU is not available.
I suggest you check the Programming Guide for more details.
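To make the idea of emulation concrete, here is a minimal conceptual sketch in plain C++. It is not how nvcc's device emulation is actually implemented (that uses host threads, as noted above); it just runs a kernel-style function once per logical thread in an ordinary loop. The names `ThreadIdx`, `matMulKernel`, and `launchEmulated` are hypothetical and made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in thread index (not the real type).
struct ThreadIdx { int x; int y; };

// Kernel-style body: each logical thread computes one element of C = A * B
// for square n x n matrices stored in row-major order.
void matMulKernel(ThreadIdx t, const float* a, const float* b, float* c, int n) {
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) {
        sum += a[t.y * n + k] * b[k * n + t.x];
    }
    c[t.y * n + t.x] = sum;
}

// Emulated "launch": visits every logical thread one after another on the CPU.
// On a GPU, these n * n threads would be spread across many cores instead.
std::vector<float> launchEmulated(const std::vector<float>& a,
                                  const std::vector<float>& b, int n) {
    const std::size_t count = static_cast<std::size_t>(n) * n;
    assert(a.size() == count);
    assert(b.size() == count);
    std::vector<float> c(count, 0.0f);
    for (int y = 0; y < n; ++y) {
        for (int x = 0; x < n; ++x) {
            matMulKernel(ThreadIdx{x, y}, a.data(), b.data(), c.data(), n);
        }
    }
    return c;
}
```

Because everything here is ordinary host code, a host debugger can stop inside `matMulKernel`, which mirrors why breakpoints work in the emulation builds. The cost is that work the GPU would do in parallel is done on the CPU, with extra per-thread overhead in a real emulator.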
That’s correct - the ‘emu’ modes work in emulation on the CPU. The other modes run kernels on the GPU.
The emu builds are slow because they’re emulating the GPU on the CPU, rather than trying to run fast on the CPU. It is not inconceivable that NVIDIA could do better (and there is the next version of nvcc, which is supposed to include a multi-core backend…), but why would they want to do that?
Besides, those slowdowns aren’t so bad. If you really want your code to run slowly, you should compile emudebug, and then run in valgrind :)
As for the breakpoint that is never hit in Debug mode: how is gdb (or whatever), running on the CPU, going to affect code on the GPU? There is supposed to be a GPU-based debugger, but I’ve got no experience of it.
I have found all modes useful, although plain ‘emu’ less so. If I’m running in emulation, I probably want all the symbol information too… Plain ‘debug’ is useful because it turns on all those CUDA_SAFE_CALL macros which you’re liberally scattering through your code, but won’t cripple performance like emulation does.
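For anyone wondering what a macro that is "turned on" only in debug builds looks like, here is a rough sketch of the general pattern in plain C++. It is not the actual CUDA_SAFE_CALL definition. `SAFE_CALL`, `SKETCH_DEBUG`, `Status`, and `fakeRuntimeCall` are all hypothetical names invented for this example, and this version only logs and counts failures so the behaviour is easy to observe.

```cpp
#include <cstdio>

// Hypothetical status code standing in for a runtime API error value.
enum Status { kSuccess = 0, kFailure = 1 };

// Hypothetical build switch: 1 mimics a Debug build, 0 a Release build.
#ifndef SKETCH_DEBUG
#define SKETCH_DEBUG 1
#endif

// Number of failures reported so far (only updated when checking is enabled).
static int g_reportedFailures = 0;

#if SKETCH_DEBUG
// Checked variant: evaluate the call once and report where it failed.
#define SAFE_CALL(call)                                          \
    do {                                                         \
        Status status_ = (call);                                 \
        if (status_ != kSuccess) {                               \
            std::fprintf(stderr, "%s:%d: call failed: %s\n",     \
                         __FILE__, __LINE__, #call);             \
            ++g_reportedFailures;                                \
        }                                                        \
    } while (0)
#else
// Unchecked variant: the call still runs, but its result is ignored.
#define SAFE_CALL(call) ((void)(call))
#endif

// Hypothetical API call that fails on request, used to exercise the macro.
Status fakeRuntimeCall(bool shouldFail) {
    return shouldFail ? kFailure : kSuccess;
}
```

With `SKETCH_DEBUG` set to 1, `SAFE_CALL(fakeRuntimeCall(true));` prints the file, line, and failing expression; with it set to 0, the same line compiles down to the bare call. That is the sense in which a debug build gives you error checking without the cost of emulation.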