Compilation with and without debug info, different outcomes
First of all I think this topic is not in the right forum, but seeing that I could not start a new topic in the Nsight section, I have no choice but to ask here.

When compiled without GPU debug information, my CUDA code seems to run well, with no crashes or reported errors. When I compile with debug info and run, I get errors. Yet when I step through the GPU code in the debugger, no problems are reported and I can't see anything going wrong... How can this be? Which info can I trust?

#1
Posted 05/01/2012 05:08 PM   
Same type of problem here, not errors but different animation for some unknown reason.

Congratulations on Nsight 2.2, single GPU debugging, that's very major!

#2
Posted 05/15/2012 01:16 PM   
Of what nature are the errors? Completely incorrect results could be the consequence of a race condition in the code (a missing __syncthreads(), for example). Small numerical differences could be the result of different rates of FMAD/FMA merging occurring between debug and optimized builds. To test the latter hypothesis, you can compile the code with -fmad=false to turn off FMAD/FMA merging. Note that the flag requires the NVVM compiler, which is used only for the sm_20 architecture and above.
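To illustrate the FMAD/FMA point, here is a minimal host-side C++ sketch (not CUDA device code, and not taken from anyone's project in this thread). The function names mul_then_add and fused and the constants kA, kB, kC are made up for the example. It shows how rounding the product before the add can give a different result from a single fused multiply-add, which is the kind of small difference that -fmad=false is meant to rule out.

```cpp
#include <cmath>

// Hypothetical inputs chosen so the exact product needs one more bit
// than a float can hold: (1 + 2^-12)^2 = 1 + 2^-11 + 2^-24.
const float kA = 1.0f + 1.0f / 4096.0f;     // 1 + 2^-12
const float kB = 1.0f + 1.0f / 4096.0f;     // 1 + 2^-12
const float kC = -(1.0f + 1.0f / 2048.0f);  // -(1 + 2^-11)

// Multiply, round the product to float, then add (two roundings).
// The volatile store keeps the compiler from fusing the two steps.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c is computed exactly and rounded once.
float fused(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With these inputs, mul_then_add(kA, kB, kC) returns exactly 0, while fused(kA, kB, kC) keeps the residual 2^-24 (about 5.96e-8). Neither result is "wrong"; they just round at different points, so a build that merges more or fewer multiply-add pairs can drift slightly from another build of the same source.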

#3
Posted 05/15/2012 02:31 PM   
Still don't see it. I tried -nvvm -fmad=false and also tried different sm_XX targets. If I take the release build and only add -G, the animation is no good.

There's use of Thrust in the function, and also a __device__ array of floats. The host code uses _beginthreadex, so I tried making it single threaded; that didn't help, it was just slower.

No __syncthreads() used at all.

Occurs in both x86 and x64 with -G.

#4
Posted 05/15/2012 11:15 PM   
I am sorry I forgot about this post! I made a second post after I tracked down the problem (I believe I did find it, unless error reporting is off in Nsight), and this is a link to the post explaining what I am seeing: http://forums.nvidia.com/index.php?showtopic=229554.

Let me summarize the issue. Things work quite well when compiled without GPU debug info, and in fact they also run well with GPU debug info. The bit I failed to mention is that I only get a "first chance exception" error when I try to GPU debug. I am using 2 Tesla C2050s, CUDA 4.1, Nsight 2.1.

Unfortunately I can't try your suggestions as I will be away from the machine for the weekend...

#5
Posted 05/25/2012 04:31 PM   
Gorune,

You could probably fix that error message by upgrading to 4.2. I tried 4.1 to fix my own problem and saw that message too.

Still no luck with my animation though. Maybe it's functor/iterator related? NVIDIA, what info do you need?

Thanks

#6
Posted 05/30/2012 09:58 PM   
Hey, did you try 4.2 or 5.0? Are you seeing the same behaviour? I mean, when running without debug things work well, but when you debug CUDA with Nsight, at some point you get an error?

I am thinking maybe the nested functor is the problem: the debugger simply cannot tell what to do there and shows an error. These functors are inlinable and very lightweight, though, and I am quite sure the compiler deals with them efficiently.

I will do a small experiment soon, trying to make tiny kernels with nested functors and see what Nsight thinks of those.

#7
Posted 05/31/2012 09:59 AM   
Hi,

I'm using 4.2, and looking into trying 5.0 now that you mention it. I saw that error with 4.1 when compiled with -G0, whether using Nsight or not. The program keeps running but prints many error messages; with 4.2 there are no error messages. Can you try 4.2?

#8
Posted 05/31/2012 11:03 PM   
I do not want to attempt any changes on that level because I don't want to have more things to worry about right now. I will certainly try CUDA 5.0 and Nsight 2.2 in a couple of weeks.

#9
Posted 06/01/2012 09:49 AM   