Problem using GPU with Matlab with gpuArray/gather functions

Hi all,

I am using Matlab for GPU computing. To do so, I use .ptx files compiled from .cu files with nvcc. I run into the following error when I run several kernels one after another:

Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_UNKNOWN

This error appears after several successive calls to CUDA kernels, and it is raised by either “gpuArray” or “gather”. It should not be a size problem, because the maximum GridSize is 2.1475e+09 and my grid size is smaller, but it might be a problem with memory not being re-initialized between calls.
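For reference, the grid-size reasoning can be checked with a few lines. This is only a rough sketch: the two limits are copied from the device listing in this post (2.1475e+09 is the displayed form of 2147483647), and the value of N is a hypothetical placeholder, not my real problem size.

```matlab
% Limits copied from the gpuDevice listing in this post.
maxThreadsPerBlock = 1024;        % MaxThreadsPerBlock
maxGridX = 2147483647;            % MaxGridSize(1), shown as 2.1475e+09

% Hypothetical problem size, chosen only for illustration.
N = 5e6;

% Same launch configuration as in the kernel setup: one thread per element.
numBlocks = ceil(N / maxThreadsPerBlock);

% The launch is within the limit only if the block count fits in the grid.
fitsInGrid = numBlocks <= maxGridX;
fprintf('Blocks needed: %d, fits in grid: %d\n', numBlocks, fitsInGrid);
```

With one thread per element, the block count only approaches the limit for extremely large N, which is why I do not suspect the grid size.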

Here are my device properties, as returned by the ‘gpuDevice’ function:

CUDADevice with properties:

                      Name: 'GeForce GTX TITAN X'
                     Index: 1
         ComputeCapability: '5.2'
            SupportsDouble: 1
             DriverVersion: 7.5000
            ToolkitVersion: 5.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 1.2885e+10
                FreeMemory: 1.2608e+10
       MultiprocessorCount: 24
              ClockRateKHz: 1215500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1

Here is the code I use to call the kernel:

% Parameters with proper types %
g_data = gpuArray(int32(data)); 
s_Fs = single(Fs);
i_Nt = int32(Nt);
s_t0 = single(t0);
i_Ne = int32(Ne);
g_Xe = gpuArray(single(Xe));
g_Ye = gpuArray(single(Ye));
g_Ze = gpuArray(single(Ze));
i_N = int32(N);
g_X = gpuArray(single(X));
g_Y = gpuArray(single(Y));
g_Z = gpuArray(single(Z));
s_c1 = single(c1);
s_c2 = single(c2);
s_tol = single(tol);
g_O = gpuArray(zeros(N,1,'int32'));

% Load the kernel %
kernel = parallel.gpu.CUDAKernel('name.ptx', 'name.cu');
kernel.GridSize = [ceil(N/kernel.MaxThreadsPerBlock) 1];
kernel.ThreadBlockSize = [kernel.MaxThreadsPerBlock 1 1];

% GPU computation %
[g_data,g_Xe,g_Ye,g_Ze,g_X,g_Y,g_Z,g_O] = feval(    kernel, ...
                                                    g_data, ...
                                                    s_Fs, ...
                                                    i_Nt, ...
                                                    s_t0, ...
                                                    i_Ne, ...
                                                    g_Xe, ...
                                                    g_Ye, ...
                                                    g_Ze, ...
                                                    i_N, ...
                                                    g_X, ...
                                                    g_Y, ...
                                                    g_Z, ...
                                                    s_c1, ...
                                                    s_c2, ...
                                                    s_tol, ...
                                                    g_O);

% Gather the data %                    
O = gather(g_O);
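Since kernel launches are asynchronous, the error may only surface at a later “gpuArray” or “gather” call rather than at the launch that caused it. A minimal sketch of how the failing step could be localized is to block on the device right after the GPU work. Here an element-wise multiplication on made-up data stands in for the feval call above, so the names g_x and g_y are placeholders and not part of my real code.

```matlab
% Query the selected device; KernelExecutionTimeout = 1 means the
% operating system watchdog can interrupt long-running kernels.
dev = gpuDevice;
fprintf('Watchdog active: %d\n', dev.KernelExecutionTimeout);

% Placeholder data and GPU work standing in for feval(kernel, ...).
g_x = gpuArray(rand(1e6, 1, 'single'));
tic;
g_y = g_x .* 2;

% Block until all queued GPU work has finished, so that a failure is
% reported here instead of at a later gpuArray or gather call.
wait(dev);
elapsed = toc;
fprintf('GPU work took %.3f s\n', elapsed);

% Copy the result back to the host.
y = gather(g_y);
```

If the measured time of the real kernel comes close to the watchdog delay, that would point toward a timeout rather than a memory problem.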

Thanks!

Is this on Windows? If so, you may be running into a WDDM timeout.

Thanks for your quick answer, txbob. I am working on Windows and had heard about the WDDM timeout problem, so I increased TdrDelay, but I still have the problem.