Application crashes during dwInitialize with remote debugging

I’m using the Drive PX 2 as the target platform and a PC with Ubuntu 14.04 as the host, with DriveWorks SDK v0.3 and PDK v4.1.6.1.
My host PC has an NVS 510, which only has compute capability 3.0 (that shouldn’t be a problem, right?).

I can cross-compile the DriveNet application and even run my compiled sample directly on the Drive PX 2 without problems.
When I try to run it on the Drive PX 2 in a remote debugging session with Nsight Eclipse on my host PC, I can step through the sample up to dwInitialize, but as soon as it enters this function the application ends abruptly and I cannot debug it further.
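
For reference, dwInitialize is the first DriveWorks call the sample makes; roughly, the call site looks like this (a sketch from memory, exact types and parameters may differ between SDK versions):

#include <dw/core/Context.h>
#include <dw/core/Version.h>
#include <cstdio>

int main()
{
    dwContextHandle_t sdk = DW_NULL_HANDLE;
    dwContextParameters sdkParams = {};

    // First contact with the DriveWorks SDK (and, through it, the CUDA
    // driver); in the remote Nsight session the process dies inside here,
    // so the error-handling branch below is never reached.
    dwStatus status = dwInitialize(&sdk, DW_VERSION, &sdkParams);
    if (status != DW_SUCCESS)
    {
        std::printf("dwInitialize failed: %d\n", (int)status);
        return 1;
    }

    dwRelease(&sdk);
    return 0;
}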

Within the gdb traces I can see the following error:
error,msg="fatal: No CUDA capable device was found. (error code = CUDBG_ERROR_NO_DEVICE_AVAILABLE(0x27))"

The remote shell shows the following:
Warning: Adjusting return value of linux_common_core_of_thread (pid=1391, tid=1391).
core = 17 >= num_cores = 6!
Program Arguments:
--camera-index=0
--camera-type=ar0231-rccb-ssc
--csi-port=ab
--input-type=video
--slave=0
--stopFrame=0
--video=/usr/local/driveworks/data/samples/raw/rccb.raw

Initialize DriveWorks SDK v0.3.400
Release build with GNU 4.9.2 from v0.3.0-rc8-0-g3eeebea against PDK v4.1.6.1
SDK: Resources mounted from /usr/local/driveworks/data/resources
Killing all inferiors
logout

Do you know what the cause of this problem is?

Dear Frank_cisco,

Could you please check the requirements below for the host PC? Thanks.

Prerequisites

Basic Hardware Requirements
•NVIDIA DRIVE™ PX 2 with the latest PDK flashed on the system.
•Linux desktop on x86/x64

Linux System Requirements

These are the basic prerequisites for Linux:
•Ubuntu Linux 14.04 (out of the box installation)
•GCC >= 4.8.X && GCC <= 4.9.x

•cmake version >= 3.2.2

By default, Ubuntu 14.04 installs cmake version 2.8. For guidance on installing cmake 3.x, see:
http://askubuntu.com/questions/610291/how-to-install-cmake-3-2-on-ubuntu-14-04

•NVIDIA® CUDA® Toolkit 8.0 or later
•NVIDIA® Vibrante™ PDK installation for DRIVE PX 2 on the Linux Host
•You may also need to install (using apt-get install) the following packages: libx11-dev, libxrandr-dev, libxcursor-dev, libxxf86vm-dev, libxinerama-dev, libxi-dev, libglu1-mesa-dev

Desktop development relies on NVCUVID for video decoding, which is included with the NVIDIA drivers. In general, the cmake build scripts can find the NVCUVID installation. However, if this fails, you must create a symbolic link /usr/lib/nvidia-current pointing to your NVIDIA driver lib directory, for example: sudo ln -s /usr/lib/nvidia-367 /usr/lib/nvidia-current

Dear SteveNV,

All the requirements you mention are met, except that I’m using SDK/PDK 4.1.6.1 rather than 4.1.8.0. However, I checked the release notes of v4.1.8.0 and didn’t see anything that could explain or solve the problem I’m having. Moreover, as I mentioned, my compiled sample runs without problems if I start it directly on the Drive PX 2.
Do you have any other suggestions for what I could check or modify to solve this problem?

Dear Frank_cisco,

Could you please check the CUDA environment setup on the Drive PX 2, as below?

$ gedit ~/.bashrc
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/targets/aarch64-linux/lib:$LD_LIBRARY_PATH

$ source ~/.bashrc
$ nvcc --version

$ /usr/local/cuda-8.0/bin/cuda-install-samples-8.0.sh ~/

$ cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
$ make
$ ./deviceQuery

Dear SteveNV,

I checked what you asked, including editing .bashrc, and everything looks fine.
To be honest, I fail to see how this is related to the problem. As I mentioned, there is no problem running the sample directly on the board, only when trying to execute it in a remote debugging session.

The output of nvcc --version and deviceQuery looks like this:

nvidia@nvidia:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Mon_Mar_20_17:07:33_CDT_2017
Cuda compilation tools, release 8.0, V8.0.72

nvidia@nvidia:~$ ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery 
/home/nvidia/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GP10B"
  CUDA Driver Version / Runtime Version          8.0 / 8.0
  CUDA Capability Major/Minor version number:    6.2
  Total amount of global memory:                 6660 MBytes (6983643136 bytes)
  ( 2) Multiprocessors, (128) CUDA Cores/MP:     256 CUDA Cores
  GPU Max Clock rate:                            1275 MHz (1.27 GHz)
  Memory Clock rate:                             1600 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GP10B
Result = PASS

Any other suggestions?

Dear Frank_cisco,

We didn’t observe this issue while testing with the Nsight IDE.
I suspect the NVS 510 (Kepler) is not supported by cuda-gdb 8.0 on the host PC.
I would suggest trying a more recent Maxwell or Pascal GPU rather than the NVS 510 (a quick way to check a GPU’s compute capability is sketched below). Thanks.
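
For reference, a minimal standalone sketch that reports a host GPU’s compute capability via the CUDA runtime (the deviceQuery sample reports the same information):

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
    {
        std::printf("No CUDA capable device found on this machine.\n");
        return 1;
    }
    for (int i = 0; i < count; ++i)
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // The NVS 510 reports 3.0 here; a GTX 1060 reports 6.1.
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    i, prop.name, prop.major, prop.minor);
    }
    return 0;
}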

Hi, Frank_cisco

To help locate which part has the issue, you can first try using cuda-gdb directly on the remote target (Drive PX 2) and see what happens.

Hi Vera J,

I tried cuda-gdb directly on the Drive PX 2 and it works; it is slow, but it works.
It seems the suspicion from SteveNV might be right, and a GPU with enough compute capability is also necessary on the host (even though the application runs on the target).
I’ll try to get another graphics card with a compute capability of at least 5.0 for the host, so that I can remote debug CUDA/DriveWorks applications running on the Drive PX 2.

Hi,

I finally got a better graphics card for my host PC, a GTX 1060.

I updated the Drive PX board and the PC to the latest SDK/PDK version (5.0.5.0).
With the GTX 1060 I am able to run and debug the samples on the host PC.

Unfortunately the problem remains the same: the compiled sample runs on the board only if I start the binary directly on the Drive PX board, and the remote debug session aborts with the error “fatal: No CUDA capable device was found. (error code = CUDBG_ERROR_NO_DEVICE_AVAILABLE(0x27))”.

Do you have any other ideas on how to solve this problem?

Hi, Frank_cisco

Back to the original description, you said

“When I try to run it on the Drive PX 2 in a remote debugging session with Nsight Eclipse on my host PC, I can step through the sample up to dwInitialize”

  1. Do you mean debugging basically works on the Drive PX 2, and the error happens only when your code calls dwInitialize? What does this function do? Something related to graphics display?

  2. Do you see the same behavior with cuda-gdb, or does only Nsight EE have the problem?

Hi veraj,

  1. Yes, debugging basically works on the Drive PX 2. The error comes specifically when cudaFree(0) is called in the constructor of DriveWorksSample (see the sketch after this list).
  2. With cuda-gdb directly on the Drive PX 2 there is no problem.
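
For context, cudaFree(0) is a common idiom to force creation of the CUDA context on first use; here is a minimal standalone sketch (my own illustration, not the actual DriveWorksSample code) that touches the CUDA runtime at the same point:

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    // cudaFree(0) frees nothing, but it forces the CUDA runtime to
    // initialize and create its context. Under cuda-gdb this is the
    // first point where the debugger has to find a CUDA device, so
    // CUDBG_ERROR_NO_DEVICE_AVAILABLE surfaces exactly here.
    cudaError_t err = cudaFree(0);
    std::printf("cudaFree(0): %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}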

Hi, Frank_cisco

Can you provide a sample to reproduce the issue?

Also, please clarify your steps, e.g. which breakpoints you have set.

Hi veraj,

I’m using the DriveNet sample that is included in the latest SDK/PDK version (5.0.5.0).
The problem occurs with or without breakpoints; I just let it run in a remote debug session.

Hi, Frank_cisco

We checked with the Drive 5.0.5.0 SDK/PDK and were able to run the sample application successfully.
We did not see the issue you reported, but we did see some sluggishness when the application is run remotely on the target.

We are now also aware of some known Nsight EE issues that may be related to your problem.

Since cuda-gdb works for you, can you use it as a temporary workaround (WAR)?

Hi Veraj,

cuda-gdb works for remote debugging when used directly on the command line, but not from Nsight.
As you mention, it is (extremely) slow.
The problem I have with this approach is that I get the following warnings in the debug session:

warning: Cuda API error detected: cudaGetLastError returned (0xb)
warning: Cuda API error detected: cudaHostGetFlags returned (0xb)

If I let DriveNet run (remotely, on the command line), I get these warnings several times per second, so I cannot really use this approach.
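
For what it’s worth, 0xb is decimal 11, i.e. cudaErrorInvalidValue. Here is a minimal sketch (my own illustration, not the DriveNet code) of one way cudaHostGetFlags can produce exactly this warning, namely when asked about host memory that was never pinned:

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    void* p = std::malloc(64); // plain pageable host memory, not pinned

    unsigned int flags = 0;
    // cudaHostGetFlags is only valid for pointers obtained from
    // cudaHostAlloc or registered with cudaHostRegister; on a pageable
    // pointer it returns cudaErrorInvalidValue (11 == 0xb), which
    // cuda-gdb reports as "Cuda API error detected".
    cudaError_t err = cudaHostGetFlags(&flags, p);
    std::printf("cudaHostGetFlags: %s (%d)\n", cudaGetErrorString(err), (int)err);

    // The failure also sets the runtime's last-error state, which is why
    // a subsequent cudaGetLastError call reports 0xb as well.
    std::printf("cudaGetLastError: %d\n", (int)cudaGetLastError());

    std::free(p);
    return 0;
}
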
Do you also get these warnings?

Hi, Frank_cisco

Yes, we get the same errors frequently.

The developers have already confirmed that this is a demo app issue; it needs to be fixed so that it does not make these invalid calls in the first place.

There is also a cuda-gdb aspect: it should bypass these API failures when breaking on API errors is disabled (the cuda-gdb setting "set cuda api_failures" controls this), and our developers are working on it.

So for now I think you can just wait for the new DriveInstall release.
Sorry about that.