Application crashes during dwInitialize with remote debugging
I'm using the Drive PX 2 as target platform and a PC with Ubuntu 14.04 as host, DriveWorks SDK v0.3 and PDK v4.1.6.1.
My host PC has a NVS 510, which has only 3.0 compute capability (shouldn't be a problem, right?).

I can cross-compile the DriveNet application and even run my compiled sample directly on the Drive PX 2 without problems.
When I try to run it on the Drive PX 2 from a remote debugging session with Nsight Eclipse on my host PC, I can step through the sample until dwInitialize, but as soon as execution enters this function the application ends abruptly and I cannot debug it.

Within the gdb traces I can see the following error:
error,msg="fatal: No CUDA capable device was found. (error code = CUDBG_ERROR_NO_DEVICE_\
AVAILABLE(0x27)"

The remote shell shows the following:
Warning: Adjusting return value of linux_common_core_of_thread (pid=1391, tid=1391).
core = 17 >= num_cores = 6!
Program Arguments:
--camera-index=0
--camera-type=ar0231-rccb-ssc
--csi-port=ab
--input-type=video
--slave=0
--stopFrame=0
--video=/usr/local/driveworks/data/samples/raw/rccb.raw

Initialize DriveWorks SDK v0.3.400
Release build with GNU 4.9.2 from v0.3.0-rc8-0-g3eeebea against PDK v4.1.6.1
SDK: Resources mounted from /usr/local/driveworks/data/resources
Killing all inferiors
logout

Do you know what the cause of this problem is?

#1
Posted 09/26/2017 12:42 PM   
Dear Frank_cisco,

Could you please check below requirement for HostPC? Thanks.

Prerequisites

Basic Hardware Requirements
•NVIDIA DRIVE™ PX 2 with the latest PDK flashed in the system.
•Linux x86/x64 desktop

Linux System Requirements

These are the basic prerequisites for Linux:
•Ubuntu Linux 14.04 (out of the box installation)
•GCC >= 4.8.x && GCC <= 4.9.x

•cmake version >= 3.2.2

By default, Ubuntu 14.04 installs cmake version 2.8. For guidance on installing cmake 3.x, see:

http://askubuntu.com/questions/610291/how-to-install-cmake-3-2-on-ubuntu-14-04


•NVIDIA® CUDA® Toolkit 8.0 or later
•NVIDIA® Vibrante™ PDK installation for DRIVE PX 2 on the Linux Host
•You may also need to install (using apt-get install) the following packages: libx11-dev
libxrandr-dev
libxcursor-dev
libxxf86vm-dev
libxinerama-dev
libxi-dev
libglu1-mesa-dev


Desktop development relies on NVCUVID for video decoding, which is included with the NVIDIA drivers. In general, the cmake build scripts can find the NVCUVID installation. However, if this fails, you must create a symbolic link /usr/lib/nvidia-current pointing to your NVIDIA driver library directory, for example /usr/lib/nvidia-367.

#2
Posted 09/27/2017 01:35 AM   
Dear SteveNV,

All requirements that you mention are met, except that I'm not using SDK/PDK 4.1.8.0 but 4.1.6.1. However, I checked the release notes of v4.1.8.0 and I didn't see anything that could explain/solve the problem I'm having. Moreover, like I mentioned, my compiled sample runs without problems if I start it directly on the Drive PX 2.
Do you have any other suggestions of what could I check/modify to solve this problem?

#3
Posted 09/27/2017 09:03 AM   
Dear Frank_cisco,

Could you please check the CUDA environment setup on the Drive PX 2 as below?

$ gedit ~/.bashrc
export PATH=/usr/local/cuda-8.0/bin/:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/targets/aarch64-linux/lib:$LD_LIBRARY_PATH

$ source ~/.bashrc
$ nvcc --version

$ /usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh ~/

$ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery

#4
Posted 09/28/2017 01:00 AM   
Dear SteveNV,

I checked what you asked me, including editing .bashrc, and everything looks fine.
To be honest, I fail to see how this relates to the problem. As I mentioned, there is no problem running the sample directly on the board, only when trying to execute it with remote debugging.

The output of nvcc --version and deviceQuery looks like this:
nvidia@nvidia:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Mon_Mar_20_17:07:33_CDT_2017
Cuda compilation tools, release 8.0, V8.0.72

nvidia@nvidia:~$ ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery
/home/nvidia/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GP10B"
CUDA Driver Version / Runtime Version 8.0 / 8.0
CUDA Capability Major/Minor version number: 6.2
Total amount of global memory: 6660 MBytes (6983643136 bytes)
( 2) Multiprocessors, (128) CUDA Cores/MP: 256 CUDA Cores
GPU Max Clock rate: 1275 MHz (1.27 GHz)
Memory Clock rate: 1600 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GP10B
Result = PASS



Any other suggestions?

#5
Posted 09/28/2017 08:26 AM   
Dear Frank_cisco,

We didn’t observe this issue while testing with Nsight IDE.
I suspect the NVS 510 (Kepler) does not support cuda-gdb (8.0) on the host PC.
I would suggest trying a more recent Maxwell / Parker GPU rather than the NVS 510. Thanks.

#6
Posted 10/12/2017 12:52 AM   
Hi Frank_cisco,

To help locate which part has the issue, you can first try using cuda-gdb directly on the remote target (Drive PX 2) and see what happens.
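For reference, a minimal local cuda-gdb session on the target might look like the sketch below (the binary name, install path, and arguments are assumptions based on the sample's usual layout; adjust them to your setup):

```shell
# Run this over ssh on the Drive PX 2 itself (not from the host).
# Assumed sample location -- adjust to where your binary was deployed:
cd /usr/local/driveworks/bin
cuda-gdb ./sample_drivenet

# Then, inside the cuda-gdb prompt:
#   (cuda-gdb) break dwInitialize   # stop at the function that crashes remotely
#   (cuda-gdb) run --input-type=video --video=/usr/local/driveworks/data/samples/raw/rccb.raw
#   (cuda-gdb) next                 # step over dwInitialize
#   (cuda-gdb) backtrace            # inspect the stack if anything goes wrong
```

If this local session gets past dwInitialize cleanly, the fault is likely in the host-side debugger setup rather than on the target.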

#7
Posted 10/12/2017 02:36 AM   

Hi Vera J,

I tried cuda-gdb directly on the Drive PX 2 and it works, it is slow, but it works.
It seems like the suspicion from SteveNV might be right and a GPU with enough compute capability on the host is also necessary (even though the application runs on the target).
I'll try to get another graphic card with compute capability of at least 5.0 on the host to be able to remote debug CUDA/DriveWorks applications running on the Drive PX 2.

#8
Posted 10/18/2017 08:21 AM   
Hi,

I finally got a better graphic card for my host PC, a GTX 1060.

I updated the Drive PX board and the PC to the latest SDK/PDK version (5.0.5.0).
With the 1060 I am able to run and debug the samples on the host PC.

Unfortunately the problem remains the same: the compiled sample runs on the board only if I start the binary directly on the Drive PX board, but the remote debug session aborts with the error "fatal: No CUDA capable device was found. (error code = CUDBG_ERROR_NO_DEVICE_AVAILABLE(0x27)".

Do you have any other ideas on how to solve this problem?

#9
Posted 12/22/2017 02:42 PM   
Hi, Frank_cisco

Back to the original description, you said

"When I try to run it on the Drive PX 2 using a remote debugging session with Nsight-Eclipse on my host PC, I am able to step the sample until dwInitialize"


1. Do you mean debugging basically works on the Drive PX 2, and the error happens only when your code calls dwInitialize? What does this function do? Something related to graphics display?

2. Do you see the same behavior with cuda-gdb, or does only Nsight EE have the problem?

#10
Posted 12/25/2017 02:22 AM   

Hi veraj,

1. Yes, debugging basically works on the Drive PX 2. The error comes specifically when cudaFree(0) is called in the constructor of DriveWorksSample.
2. With cuda-gdb directly on the Drive PX 2 there is no problem.

#11
Posted 01/08/2018 08:43 AM   
Hi, Frank_cisco

Can you provide the sample to reproduce?

Also, please clarify your steps, such as which breakpoint you have set.

#12
Posted 01/08/2018 08:49 AM   
Hi veraj,

I'm using the DriveNet sample that is included in the latest SDK/PDK version (5.0.5.0).
The problem occurs with or without breakpoints; I just let it run in a remote debug session.

#13
Posted 01/08/2018 08:53 AM   
Hi, Frank_cisco


We checked with the Drive 5.0.5.0 SDK/PDK and were able to run the sample application successfully.
We did not see the issue you reported, but we did see some sluggishness when the application is run remotely on the target.

We also have some known issues with Nsight EE that may be related to your problem.


As cuda-gdb works for you, can you use it as a temporary workaround (WAR)?
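For the command-line workaround, a server-based remote session can keep the debugger on the host while the code runs on the target. This is only a sketch under assumptions: the port (2345), the target IP placeholder, and the binary path are all hypothetical, and it assumes the cross-development cuda-gdb shipped with the host-side CUDA toolkit:

```shell
# On the Drive PX 2 (target): serve the application for remote debugging
cuda-gdbserver :2345 ./sample_drivenet

# On the host PC: attach from the cross cuda-gdb and drive the session
cuda-gdb ./sample_drivenet
#   (cuda-gdb) target remote <px2-ip>:2345
#   (cuda-gdb) break dwInitialize
#   (cuda-gdb) continue
```

Compared to running cuda-gdb directly on the board, this moves the interactive part to the host, which may reduce the slowness seen on the target.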

#14
Posted 01/09/2018 05:11 AM   
Hi Veraj,

cuda-gdb works for remote debugging when I use it directly on the command line, but not with Nsight.
As you mention, it is (extremely) slow.
The problem I have with this approach is that I get the following warnings in the debug session:

warning: Cuda API error detected: cudaGetLastError returned (0xb)
warning: Cuda API error detected: cudaHostGetFlags returned (0xb)

If I let DriveNet run (remotely, on the command line), I get these warnings several times per second, so I cannot really use this approach.
Do you also get these warnings?

#15
Posted 01/17/2018 10:06 AM   