Since upgrading to CUDA 8, I have been having trouble with Thrust. I launch a “sort” kernel through the driver API to sort a buffer of data according to some keys. The whole kernel is as follows:
#include <thrust/sort.h>
#include <thrust/execution_policy.h>
#include "CudaCommon.h" //some defines, basic math, etc... MyDataType is defined there
extern "C"
__global__ void sortKernel(
    uint64_t* keys,
    MyDataType* data,
    unsigned int dataSize
)
{
    thrust::sort_by_key(thrust::device, keys, keys + dataSize, data);
}
The size of MyDataType is 48 bytes (12 ints).
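To make the expected result explicit, here is a minimal host-side sketch in plain C++ of what the kernel should produce: keys in ascending order, with each data element moved along with its key. It is only a reference for checking the GPU output, not how Thrust implements the sort. The MyDataType struct below is a stand-in that assumes the 12-int layout mentioned above, and sortByKeyReference is a hypothetical helper name that is not part of Thrust or of my project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Stand-in for MyDataType from CudaCommon.h (assumption: 12 ints, 48 bytes).
struct MyDataType {
    int v[12];
};
static_assert(sizeof(MyDataType) == 48, "stand-in must match the 48-byte payload");

// Hypothetical host-side reference: reorders keys and data together so that
// the keys end up in ascending order.
void sortByKeyReference(std::vector<std::uint64_t>& keys,
                        std::vector<MyDataType>& data)
{
    assert(keys.size() == data.size());

    // Sort a permutation of indices by key instead of moving the 48-byte payloads.
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&keys](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });

    // Apply the permutation to both arrays.
    std::vector<std::uint64_t> sortedKeys(keys.size());
    std::vector<MyDataType> sortedData(data.size());
    for (std::size_t i = 0; i < order.size(); ++i) {
        sortedKeys[i] = keys[order[i]];
        sortedData[i] = data[order[i]];
    }
    keys.swap(sortedKeys);
    data.swap(sortedData);
}
```

The sketch uses std::stable_sort, so elements with equal keys keep their original order. thrust::sort_by_key does not promise that, so when comparing against the GPU result, only the key order and the key-to-data pairing should be expected to match.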
The kernel itself is launched as a single instance (grid size and block size both equal to 1):
checkCudaErrors(cuLaunchKernel(_sortKernel, 1, 1, 1, 1, 1, 1, 0, 0, sortArgs, nullptr));
checkCudaErrors(cuEventRecord(_kernelSyncEvent, 0));
checkCudaErrors(cuEventSynchronize(_kernelSyncEvent));
This code works fine on CUDA 7.5, but on CUDA 8 (both RC and release) it fails with CUDA_ERROR_UNKNOWN, which is reported by the cuEventSynchronize call.
System specs: Windows 10 x64, i7 4770K, VS2015 (no updates installed), 16GB RAM, GTX 780, drivers 369.30 (shipped with CUDA 8), CUDA installer “cuda_8.0.44_win10.exe”.
What is wrong? Is this an error in my code, or a bug in Thrust?
Thanks in advance.