Problem using zero-copy / mapped memory in CUDA 2.2 beta

Hi,

I’m trying to use the memory-mapping feature of CUDA 2.2 beta. My code is:

[codebox]
#include "cuda/cutil.h"

int main(void) {
    float4 *ptr_h, *ptr_d;

    CUDA_SAFE_CALL( cudaSetDevice(0) );
    CUDA_SAFE_CALL( cudaSetDeviceFlags( cudaDeviceMapHost ) );
    CUDA_SAFE_CALL( cudaHostAlloc( (void**) &(ptr_h), sizeof(float4) * 30000, cudaHostAllocMapped | cudaHostAllocPortable ) );
    CUDA_SAFE_CALL( cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 ) );
}
[/codebox]

This gives the error:

[codebox]
Cuda error in file 'f.cu' in line 10 : unspecified launch failure in prior launch.
[/codebox]

which is the cudaHostGetDevicePointer() call.

I have the beta release 185 driver, a Red Hat 5.3 x86-64 system and a Tesla C1060.

Any ideas what I’m doing wrong?

Cheers,

Matt

It probably won’t make a difference, but have you tried putting the cudaSetDeviceFlags call before the cudaSetDevice call? The manual does say:

but I don’t really think that cudaSetDevice is a “CUDA operation”…

I’m at a conference; otherwise I would try it out myself.
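On the ordering question: as far as I know, cudaSetDevice only selects a device and does not create a context, so the order of those two calls should not matter; what matters is that cudaSetDeviceFlags runs before the first call that does create a context (an allocation, a kernel launch, and so on). Here is a minimal sketch under that assumption. It also checks the canMapHostMemory field of cudaDeviceProp before asking for mapping. The helper name init_mapped_device is made up for this example, and returning cudaErrorInvalidValue for "device cannot map" is an arbitrary choice.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical helper: select `dev` and enable mapped (zero-copy) host memory.
   Only device-management calls are made before cudaSetDeviceFlags, on the
   assumption that none of them creates a context. */
static cudaError_t init_mapped_device(int dev)
{
    struct cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess)
        return err;
    if (!prop.canMapHostMemory)           /* device cannot map host memory */
        return cudaErrorInvalidValue;     /* arbitrary error code for this sketch */

    err = cudaSetDevice(dev);             /* selects the device only */
    if (err != cudaSuccess)
        return err;
    return cudaSetDeviceFlags(cudaDeviceMapHost);
}

int main(void)
{
    cudaError_t err = init_mapped_device(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cannot enable mapped memory: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("device 0: mapped host memory enabled\n");
    return 0;
}
```

This keeps the cudaSetDevice-then-cudaSetDeviceFlags order used in the original post, so the capability check is the only addition.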

Will poke at this in the afternoon.

Cool. If you get this to work, please report the speed gains compared to explicit copying.
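For anyone trying that comparison, here is a sketch of the zero-copy side only: a kernel that reads and writes the mapped buffer through the device pointer, with no cudaMemcpy at all. The names scale_mapped and run_zero_copy are hypothetical, and no timing is done here. Because launches are asynchronous, the host has to synchronize before reading results from the mapped buffer. cudaThreadSynchronize is the 2.x-era call; newer toolkits name it cudaDeviceSynchronize.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical kernel: scale each float4 in place. `data` is the device-side
   alias of pinned host memory, so every load and store goes over PCIe. */
__global__ void scale_mapped(float4 *data, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float4 v = data[i];
        data[i] = make_float4(v.x * s, v.y * s, v.z * s, v.w * s);
    }
}

/* Hypothetical driver: fills n mapped float4s with (1,2,3,4), scales them on the
   GPU, and reports the first .x and last .w. Returns 0 on success.
   Call once per process: cudaSetDeviceFlags fails after a context exists. */
static int run_zero_copy(int n, float s, float *first_x, float *last_w)
{
    float4 *ptr_h = NULL, *ptr_d = NULL;

    if (cudaSetDevice(0) != cudaSuccess ||
        cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess ||
        cudaHostAlloc((void **)&ptr_h, sizeof(float4) * n, cudaHostAllocMapped) != cudaSuccess)
        return 1;
    if (cudaHostGetDevicePointer((void **)&ptr_d, ptr_h, 0) != cudaSuccess) {
        cudaFreeHost(ptr_h);
        return 1;
    }

    for (int i = 0; i < n; ++i)           /* host writes directly into pinned memory */
        ptr_h[i] = make_float4(1.0f, 2.0f, 3.0f, 4.0f);

    scale_mapped<<<(n + 255) / 256, 256>>>(ptr_d, n, s);

    /* Wait for the kernel before the host reads the mapped buffer. */
    if (cudaThreadSynchronize() != cudaSuccess) {
        cudaFreeHost(ptr_h);
        return 1;
    }

    *first_x = ptr_h[0].x;
    *last_w = ptr_h[n - 1].w;
    cudaFreeHost(ptr_h);
    return 0;
}

int main(void)
{
    float x, w;
    if (run_zero_copy(30000, 2.0f, &x, &w) != 0) {
        fprintf(stderr, "zero-copy run failed\n");
        return 1;
    }
    printf("%f %f\n", x, w);   /* prints 2.000000 8.000000 */
    return 0;
}
```

A copy-based version for comparison would allocate with cudaMalloc and move the data with cudaMemcpy before and after the same kernel.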

It works just fine for me (2.2 beta, RHEL4 64-bit):

[codebox]
#include <stdio.h>   /* for printf */
#include "cuda_runtime.h"

int main(void) {
    float4 *ptr_h, *ptr_d;
    cudaError_t err;

    cudaSetDevice(0);
    cudaSetDeviceFlags( cudaDeviceMapHost );

    err = cudaHostAlloc( (void**) &(ptr_h), sizeof(float4) * 30000, cudaHostAllocMapped | cudaHostAllocPortable );
    if (err != cudaSuccess) printf ("Failed to allocate pinned memory: %s\n", cudaGetErrorString(err));

    err = cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 );
    if (err != cudaSuccess) printf ("Failed to get device pointer: %s\n", cudaGetErrorString(err));

    return 0;
}
[/codebox]

I compiled it both with gcc (gcc -I/usr/local/cuda/include bug.c -L/usr/local/cuda/lib -lcudart) and with nvcc (nvcc bug.c).

BTW, using cutil is usually a bad idea: its error checks go away in release mode.
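A minimal sketch of an always-on replacement, assuming you only want "print and exit" behaviour. CHECK_CUDA, init_mapped and alloc_mapped are hypothetical names, not part of cutil or the runtime. The macro is not tied to any debug define, so it stays in release builds. It only inspects the status the wrapped call returns and makes no runtime calls of its own, so wrapping cudaSetDevice cannot change when a context gets created.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical always-on check: evaluates `call` once, and on failure prints
   the file, line and error string, then exits. No extra CUDA calls are made. */
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error in file '%s' in line %d: %s\n",   \
                    __FILE__, __LINE__, cudaGetErrorString(err_));        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

/* Call once, before anything that creates a context. */
static void init_mapped(int dev)
{
    CHECK_CUDA(cudaSetDevice(dev));
    CHECK_CUDA(cudaSetDeviceFlags(cudaDeviceMapHost));
}

/* Allocates n mapped, portable float4s and returns both views of them. */
static void alloc_mapped(size_t n, float4 **ptr_h, float4 **ptr_d)
{
    CHECK_CUDA(cudaHostAlloc((void **)ptr_h, sizeof(float4) * n,
                             cudaHostAllocMapped | cudaHostAllocPortable));
    CHECK_CUDA(cudaHostGetDevicePointer((void **)ptr_d, *ptr_h, 0));
}

int main(void)
{
    float4 *ptr_h, *ptr_d;
    init_mapped(0);
    alloc_mapped(30000, &ptr_h, &ptr_d);
    printf("host %p -> device %p\n", (void *)ptr_h, (void *)ptr_d);
    CHECK_CUDA(cudaFreeHost(ptr_h));
    return 0;
}
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, so it is safe inside an unbraced if/else.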

Sage words: the CUDA_SAFE_CALL macro I was using (derived originally from cutil) was broken.

Your example, and mine with the corrected macro, both work fine now. Thanks!

Matt