Hey folks,
I have started implementing some CUDA processing using the runtime compilation library (NVRTC) to compile kernels at run time. It is image processing where large parts of the code depend on user input, so I thought a short compilation step before processing a batch of images would be the best approach.
But now I have a problem executing the code: cuLaunchKernel fails with CUDA_ERROR_INVALID_VALUE.
I have packed my experiments into a single-file demo, pasted at https://bpaste.net/show/a90b0436770d
It compiles on my OS X Yosemite 10.10.5 system (CUDA 7.5, clang/LLVM version 7.0.2) with
clang++ nvrtc_test_single.cpp -o cudartctest-single -I $CUDA_PATH/include -L $CUDA_PATH/lib -lnvrtc -lcuda -lcudart -F/Library/Frameworks -framework CUDA -Wl,-rpath,$CUDA_PATH/lib
The resulting output is
Using CUDA device [0]: GeForce GTX 750
CUDA init - time: 87.081001 ms
Fileinfo: Width=1920, Height=1080
CUDA
CUDA - Memory-Prep - time: 9.111000 ms
nvrtcProgramLog:
CUDA - Kernel RTC - time: 872.619019 ms
Grid dimensions: 15 x 1080
error: cuLaunchKernel( kernel, CUDA_X_DIM, 1, 1, grid_dim_x, rgb->height, 1, 0, NULL, args, NULL) failed with error CUDA_ERROR_INVALID_VALUE
So I’d expect the problem to be in either line 192 or line 198 of the pasted code, but I don’t really see what is wrong, as it should match the sample from http://docs.nvidia.com/cuda/nvrtc/index.html with only slight modifications.
(The saxpy sample from the NVRTC documentation works fine as is, with the same compiler options.)
Does anyone have an idea what the problem is? I have searched for over a day but can’t find any helpful information on how to debug this.
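One thing that may be worth checking, going only by the call printed in the log above: cuLaunchKernel takes the grid dimensions first and the block dimensions second (f, gridDimX, gridDimY, gridDimZ, blockDimX, blockDimY, blockDimZ, sharedMemBytes, hStream, kernelParams, extra). Read that way, the logged call would use CUDA_X_DIM x 1 x 1 as the grid and 15 x 1080 x 1 as the block, i.e. 16200 threads per block. Below is a minimal sketch of a sanity check for that, in plain C++ without any CUDA calls. Dim3, LaunchLimits, launchDimsValid and reportLoggedCall are hypothetical helpers, not part of the CUDA API. The limits are assumed values for a compute capability 5.0 device such as the GTX 750 (the real ones can be queried with cuDeviceGetAttribute), and CUDA_X_DIM = 128 is only a guess from 1920 / 15.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper types for this sketch (not CUDA API types).
struct Dim3 { unsigned x, y, z; };

struct LaunchLimits {
    unsigned maxThreadsPerBlock;
    Dim3 maxBlock;  // per-axis block limits
    Dim3 maxGrid;   // per-axis grid limits
};

// ASSUMPTION: limits for a compute capability 5.0 device.
// On a real system, query them with cuDeviceGetAttribute instead.
static const LaunchLimits kAssumedLimits = {
    1024, {1024, 1024, 64}, {2147483647u, 65535u, 65535u}
};

// True if every axis of d is at least 1 and at most the matching axis of max.
static bool withinBox(Dim3 d, Dim3 max) {
    return d.x >= 1 && d.y >= 1 && d.z >= 1 &&
           d.x <= max.x && d.y <= max.y && d.z <= max.z;
}

// Checks grid and block dimensions in the order cuLaunchKernel expects them:
// grid first, block second.
static bool launchDimsValid(Dim3 grid, Dim3 block, const LaunchLimits& lim) {
    assert(lim.maxThreadsPerBlock > 0);
    const unsigned long long threads =
        static_cast<unsigned long long>(block.x) * block.y * block.z;
    return withinBox(grid, lim.maxGrid) &&
           withinBox(block, lim.maxBlock) &&
           threads <= lim.maxThreadsPerBlock;
}

// Evaluates both readings of the logged call and prints the result.
// Returns true if only the swapped order passes the check.
static bool reportLoggedCall() {
    const unsigned assumedCudaXDim = 128;        // ASSUMPTION: 1920 / 15
    const Dim3 first  = {assumedCudaXDim, 1, 1}; // arguments 2-4 in the log
    const Dim3 second = {15, 1080, 1};           // arguments 5-7 in the log

    const bool asLogged = launchDimsValid(first, second, kAssumedLimits);
    const bool swapped  = launchDimsValid(second, first, kAssumedLimits);

    std::printf("as logged (grid=%ux%u, block=%ux%u): %s\n",
                first.x, first.y, second.x, second.y,
                asLogged ? "valid" : "invalid");
    std::printf("swapped   (grid=%ux%u, block=%ux%u): %s\n",
                second.x, second.y, first.x, first.y,
                swapped ? "valid" : "invalid");
    return !asLogged && swapped;
}
```

Calling reportLoggedCall() from main prints both readings. If the swapped order is the only one that passes, the launch might simply need the grid and block arguments exchanged, so that 15 x 1080 is the grid and CUDA_X_DIM is the block width. This is a check under the assumptions stated above, not a confirmed diagnosis of the pasted code.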