cudaErrorNotSupported when calling cudaHostRegister (via cv::cuda::registerPageLocked) on NVIDIA TX2

MaciejMatuszak · April 14, 2018, 11:24am

Hi,
I am trying to integrate OpenCV CUDA stereo block matching into ROS.
The code works well on my laptop (Device 0: “Quadro K2100M” 2000Mb, sm_30, 576 cores, Driver/Runtime ver.9.10/9.10), but when I run it on the TX2 (Device 0: “NVIDIA Tegra X2” 7854Mb, sm_62, Driver/Runtime ver.9.0/9.0), I get this error:

OpenCV Error: Gpu API call (operation not supported) in registerPageLocked, file /data/git/opencv/modules/core/src/cuda_host_mem.cpp, line 323
terminate called after throwing an instance of 'cv::Exception'
  what():  /data/git/opencv/modules/core/src/cuda_host_mem.cpp:323: error: (-217) operation not supported in function registerPageLocked

I freshly flashed the TX2 with JetPack 3.2; CUDA 9.0 comes from JetPack. I got the same error with JetPack 3.1 and CUDA 8.0.

OpenCV's registerPageLocked looks like this:

void cv::cuda::registerPageLocked(Mat& m)
{
#ifndef HAVE_CUDA
    (void) m;
    throw_no_cuda();
#else
    CV_Assert( m.isContinuous() );
    // Pins the Mat's existing buffer in place; this is the call that fails on the TX2
    cudaSafeCall( cudaHostRegister(m.data, m.step * m.rows, cudaHostRegisterPortable) );
#endif
}

Looking at the API docs: http://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1ge8d5c17670f16ac4fc8fcb4181cb490c
I assume the error returned from cudaHostRegister is cudaErrorNotSupported.
The documentation mentions one thing regarding support:

cudaHostRegister is not supported on non I/O coherent devices.

Is my analysis above correct?
Does the TX2 support the cudaHostRegister function?
If not, what are the alternatives?
I am trying to use OpenCV streams.

Any advice greatly appreciated!

P.S. This was originally posted on the CUDA forum, but I was advised that the TX2 forum may be better.
https://devtalk.nvidia.com/default/topic/1032205/cudaerrornotsupported-when-calling-cv-cuda-cudahostregister-on-nvidia-tx2/

AastaLLL · April 16, 2018, 2:37am

Hi,

cudaHostRegister() is not supported on ARM platforms.
This is because the caching attribute of an existing allocation can’t be changed on the fly.

If required, please use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory.
Thanks.

MaciejMatuszak · May 8, 2018, 5:01am

Thanks, AastaLLL.
I used cudaHostAlloc via an existing OpenCV call, and it works.
It's a shame you cannot change the caching attribute; it means two extra copies of the image…
Maciej