MJH22
Hi,
I’m trying to use the memory mapping feature of Cuda 2.2 beta. My code is:
[codebox]
#include "cuda/cutil.h"

int main(void) {
    float4 *ptr_h, *ptr_d;
    CUDA_SAFE_CALL( cudaSetDevice(0) );
    CUDA_SAFE_CALL( cudaSetDeviceFlags( cudaDeviceMapHost ) );
    CUDA_SAFE_CALL( cudaHostAlloc( (void**) &(ptr_h), sizeof(float4) * 30000, cudaHostAllocMapped | cudaHostAllocPortable ) );
    CUDA_SAFE_CALL( cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 ) );
}
[/codebox]
This gives the error:
[codebox]
Cuda error in file ‘f.cu’ in line 10 : unspecified launch failure in prior launch.
[/codebox]
Line 10 is the cudaHostGetDevicePointer() call.
I have the beta release 185 driver, a Red Hat 5.3 x86-64 system and a Tesla C1060.
Any ideas what I’m doing wrong?
Cheers,
Matt
It probably won’t make a difference, but have you tried putting the setDeviceFlags before the setDevice call? The manual does say:
but I don’t really think that cudaSetDevice is a “CUDA operation”…
I’m at a conference, otherwise I would try it out myself.
Will poke at this in the afternoon.
wumpus
Cool, if you get this to work please report the speed gains you get compared to copying.
It works just fine for me (2.2 beta, RHEL4 64-bit):
[codebox]
#include <stdio.h>
#include "cuda_runtime.h"

int main(void) {
    float4 *ptr_h, *ptr_d;
    cudaError_t err;

    cudaSetDevice(0);
    cudaSetDeviceFlags( cudaDeviceMapHost );

    /* Pinned host allocation that is also mapped into the device address space */
    err = cudaHostAlloc( (void**) &(ptr_h), sizeof(float4) * 30000, cudaHostAllocMapped | cudaHostAllocPortable );
    if (err != cudaSuccess) printf ("Failed to allocate pinned memory \n");

    /* Device-side pointer to the same allocation */
    err = cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 );
    if (err != cudaSuccess) printf ("Failed to get device pointer \n");

    return 0;
}
[/codebox]
It compiles with either gcc (gcc -I/usr/local/cuda/include bug.c -L/usr/local/cuda/lib -lcudart) or nvcc (nvcc bug.c).
BTW, using cutil is usually a bad idea: the error checks will go away in release mode.
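If you still want a one-line check around each runtime call, one option is a small macro of your own that stays active in every build. The sketch below is only an illustration: cuda_ok and CHECK_CUDA are hypothetical names (they are not part of CUDA or cutil), and it assumes that printing the message and exiting is acceptable behaviour for your program.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Hypothetical helper (not part of CUDA or cutil): returns 1 when err is
   cudaSuccess; otherwise prints the error string with its location and
   returns 0. */
static int cuda_ok(cudaError_t err, const char *file, int line)
{
    if (err == cudaSuccess)
        return 1;
    fprintf(stderr, "CUDA error in file '%s' in line %d : %s.\n",
            file, line, cudaGetErrorString(err));
    return 0;
}

/* Hypothetical macro: always evaluates the call and always checks the
   result, independent of _DEBUG or NDEBUG. Exits on failure. */
#define CHECK_CUDA(call)                              \
    do {                                              \
        if (!cuda_ok((call), __FILE__, __LINE__))     \
            exit(EXIT_FAILURE);                       \
    } while (0)
```

Usage is the same as with the cutil macro, e.g. CHECK_CUDA( cudaSetDeviceFlags( cudaDeviceMapHost ) ); the difference is that the check is not compiled out in release builds.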
MJH22
Sage words: the CUDA_SAFE_CALL macro I was using (derived originally from cutil) was broken.
Your example, and mine with the corrected macro, both work just fine now, thanks!
Matt