Problem using zero-copy / mapped memory Cuda 2.2 beta
Hi,

I'm trying to use the memory mapping feature of Cuda 2.2 beta. My code is:

[codebox]
#include "cuda/cutil.h"

int main(void) {
    float4 *ptr_h, *ptr_d;

    CUDA_SAFE_CALL( cudaSetDevice(0) );
    CUDA_SAFE_CALL( cudaSetDeviceFlags( cudaDeviceMapHost ) );
    CUDA_SAFE_CALL( cudaHostAlloc( (void**) &(ptr_h), sizeof(float4) * 30000, cudaHostAllocMapped | cudaHostAllocPortable ) );
    CUDA_SAFE_CALL( cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 ) );
}
[/codebox]

This gives the error:

[codebox]
Cuda error in file 'f.cu' in line 10 : unspecified launch failure in prior launch.
[/codebox]

which is the cudaHostGetDevicePointer() call.

I have the beta release 185 driver, a Red Hat 5.3 x86-64 system and a Tesla C1060.

Any ideas what I'm doing wrong?

Cheers,

Matt

#1
Posted 03/18/2009 03:00 PM   
It probably won't make a difference, but have you tried putting the cudaSetDeviceFlags() call before the cudaSetDevice() call? The manual does say:
[quote]To be able to retrieve the device pointer to any mapped page-locked memory within
a given host thread, page-locked memory mapping must be enabled by calling
cudaSetDeviceFlags() with the cudaDeviceMapHost flag before any other
CUDA operations is performed by the thread. Otherwise,
cudaHostGetDevicePointer() will return an error.[/quote]
but I don't really think that cudaSetDevice() counts as a "CUDA operation"...

I'm at a conference, otherwise I would try it out myself.
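One way to follow the quoted rule literally is to make cudaSetDeviceFlags() the very first runtime call in the thread. A minimal sketch, assuming device 0 (the default device, so no cudaSetDevice() call is made) and a hypothetical helper name alloc_mapped_float4(); it also checks the canMapHostMemory field of cudaDeviceProp, since not every GPU can map host memory. Returning cudaErrorInvalidValue when mapping is unsupported is an arbitrary stand-in chosen for this sketch, not a runtime convention.

```cuda
#include <stddef.h>
#include <cuda_runtime.h>

/* Hypothetical helper: allocates n float4 values of mapped, portable,
   page-locked host memory and fetches the device alias of that buffer.
   Assumes device 0 and that no other CUDA runtime call has been made yet
   in this host thread. Returns cudaSuccess or the first error seen. */
static cudaError_t alloc_mapped_float4(size_t n, float4 **ptr_h, float4 **ptr_d)
{
    struct cudaDeviceProp prop;
    cudaError_t err;

    *ptr_h = NULL;
    *ptr_d = NULL;

    /* 1. Enable mapping before anything else uses the runtime. */
    err = cudaSetDeviceFlags(cudaDeviceMapHost);
    if (err != cudaSuccess)
        return err;

    /* 2. Check that this GPU supports mapped host memory at all. */
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess)
        return err;
    if (!prop.canMapHostMemory)
        return cudaErrorInvalidValue; /* stand-in code for this sketch */

    /* 3. Same allocation flags as in the original post. */
    err = cudaHostAlloc((void **)ptr_h, n * sizeof(float4),
                        cudaHostAllocMapped | cudaHostAllocPortable);
    if (err != cudaSuccess) {
        *ptr_h = NULL;
        return err;
    }

    /* 4. Device-side pointer to the same buffer; the flags argument must be 0. */
    err = cudaHostGetDevicePointer((void **)ptr_d, *ptr_h, 0);
    if (err != cudaSuccess) {
        cudaFreeHost(*ptr_h);
        *ptr_h = NULL;
        *ptr_d = NULL;
    }
    return err;
}
```

The caller owns the buffer afterwards and releases it with cudaFreeHost(ptr_h); the device pointer needs no separate free.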

#2
Posted 03/18/2009 06:08 PM   
Will poke at this in the afternoon.

#3
Posted 03/18/2009 06:48 PM   
Cool, if you get this to work please report the speed gains you get compared to copying.
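To make that comparison, the same toy kernel could be timed both ways with CUDA events. A rough sketch, assuming the caller has already enabled cudaDeviceMapHost, allocated h_in/h_out with cudaHostAllocMapped, and obtained d_in/d_out from cudaHostGetDevicePointer(); scale2, time_zero_copy and time_explicit_copy are hypothetical names. The kernel touches each element exactly once, which is the access pattern zero-copy is generally aimed at. No timings are claimed here.

```cuda
#include <cuda_runtime.h>

/* Toy kernel used for both measurements: out[i] = 2 * in[i]. */
__global__ void scale2(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];
}

/* Zero-copy: the kernel reads and writes host memory directly through the
   device aliases d_in/d_out. Returns elapsed milliseconds, or -1 on error. */
static float time_zero_copy(const float *d_in, float *d_out, int n)
{
    cudaEvent_t start, stop;
    float ms = -1.0f;

    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    scale2<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaEventRecord(stop, 0);
    /* Waiting on the event also guarantees the kernel's writes are visible on the host. */
    if (cudaEventSynchronize(stop) == cudaSuccess &&
        cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

/* Explicit copies: H2D copy, kernel on device memory, D2H copy, all timed.
   Returns elapsed milliseconds, or -1 on error. */
static float time_explicit_copy(const float *h_in, float *h_out, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *d_in = NULL, *d_out = NULL;
    cudaEvent_t start, stop;
    float ms = -1.0f;

    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess)
        return -1.0f;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return -1.0f;
    }
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    scale2<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop, 0);
    /* cudaGetLastError() also reports a failed cudaMemcpy or launch above. */
    if (cudaEventSynchronize(stop) == cudaSuccess &&
        cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
    return ms;
}
```

A single launch includes one-off setup costs, so averaging several runs of each function would give a fairer comparison.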

#4
Posted 03/18/2009 07:36 PM   
It works just fine for me (2.2 beta, RHEL4 64-bit):

[code]#include <stdio.h>
#include "cuda_runtime.h"

int main(void) {
    float4 *ptr_h, *ptr_d;
    cudaError_t err;

    cudaSetDevice(0);
    cudaSetDeviceFlags( cudaDeviceMapHost );
    err = cudaHostAlloc( (void**) &(ptr_h), sizeof(float4) * 30000, cudaHostAllocMapped | cudaHostAllocPortable );
    if (err != cudaSuccess) {
        printf("Failed to allocate pinned memory: %s\n", cudaGetErrorString(err));
        return 1;
    }

    err = cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 );
    if (err != cudaSuccess)
        printf("Failed to get device pointer: %s\n", cudaGetErrorString(err));

    cudaFreeHost(ptr_h);
    return err == cudaSuccess ? 0 : 1;
}[/code]

Compiled with both gcc (gcc -I/usr/local/cuda/include bug.c -L/usr/local/cuda/lib -lcudart) and nvcc (nvcc bug.c).
BTW, using cutil is usually a bad idea: its error checks go away in release mode.
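An always-on check avoids that problem. A minimal sketch; CHECK_CUDA and report_cuda_error are hypothetical names, not part of cutil or the CUDA runtime. Nothing in it depends on a debug/release macro, so the check stays compiled into release builds, and the wrapped call is evaluated exactly once.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* Prints a diagnostic for a failed runtime call.
   Returns 0 on success and 1 on failure. */
static int report_cuda_error(cudaError_t err, const char *expr,
                             const char *file, int line)
{
    if (err == cudaSuccess)
        return 0;
    fprintf(stderr, "%s:%d: %s failed: %s\n",
            file, line, expr, cudaGetErrorString(err));
    return 1;
}

/* Evaluates `call` once and exits on failure. The do { } while (0)
   wrapper makes the macro behave like a single statement after if/else. */
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        if (report_cuda_error((call), #call, __FILE__, __LINE__))     \
            exit(EXIT_FAILURE);                                       \
    } while (0)
```

For example, CHECK_CUDA( cudaHostGetDevicePointer( (void**) &(ptr_d), ptr_h, 0 ) ); would print the file, line, failing expression and the cudaGetErrorString() text, then exit.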

#5
Posted 03/18/2009 08:27 PM   
[quote]BTW using cutil is usually a bad idea, the error checks will go away in release mode.[/quote]

Sage words: the CUDA_SAFE_CALL macro I was using (derived originally from cutil) was broken.
Your example, and mine with the corrected macro, now work just fine. Thanks!

Matt

#6
Posted 03/19/2009 12:09 PM   