I have been struggling to profile my GPU program using Nsight with VS2010 on Windows 7. So far I have not been able to profile the GPU program that I actually want to optimize.
I have CUDA SDK 4.2 installed on my ThinkPad W530 laptop, which has an NVIDIA Quadro K1000M graphics card. I then installed Nsight 2.2 to use as a profiling tool. I configured the project to use the vc90 toolset, following the suggestions in this article: http://http.developer.nvidia.com/ParallelNsight/1.51/UserGuide/HTML/Configuring_a_VS2010_Project.html
With my current setup I was able to profile a very simple CUDA application, a project that contains only one test.cu file and one “Add” kernel. But when I use the same settings to profile a slightly bigger project, Nsight always shows “There was 1 collection error encountered” >> “No events captured”…
The only differences between these two projects, I believe, are:
- CUDA device setting:
  the bigger project uses “compute_20,sm_21” so that I can use the atomicAdd function;
  the simple project uses “compute_10,sm_10”.
- The location of the main function:
  in the simple project that I can profile, the main function is in the only test.cu file;
  in the larger project, the main function is in main.cpp, and the kernels are launched from other files.
I couldn’t find much useful information on the web, so any help would be appreciated. Thank you!