cudaHostAlloc - very slow the first time
Hi,
I have a speed problem with cudaHostAlloc.
Basically, my CUDA routine (let's call it John) is:

1 cudaSetDeviceFlags(cudaDeviceMapHost);
2 cudaHostAlloc((void**)&A, size, cudaHostAllocMapped);
3 cudaHostAlloc((void**)&B, size, cudaHostAllocMapped);
4 ...calculations...kernels...
5 cudaFreeHost(A);
6 cudaFreeHost(B);

Execution time of 2: 2 seconds
Execution time of 3: 0.0001 seconds
Execution time of 1-2-3-4-5-6: 10 seconds

Why is the first allocation so slow?

I tried calling John twice from main: the second call is fast, with an execution time of 0.0001 seconds for both 2 and 3.
What is happening during the first call to cudaHostAlloc?

Thanks,
Nicolas

#1
Posted 04/25/2012 01:58 PM   
Hi,
The first call to a CUDA function such as cudaMalloc (and apparently cudaHostAlloc too) triggers the creation of the CUDA context, and potentially the wake-up of the card as well.
You can reduce this time by enabling persistence mode on the card (nvidia-smi -pm 1). To keep the context creation out of your timings, trigger it earlier, for example with a call to "cudaMalloc(&ptr, 0)" (where ptr is a pointer to anything).
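To make the effect visible, here is a minimal sketch that times the context warm-up separately from the two pinned allocations. A few things in it are my own assumptions rather than anything from the original routine: the helper names (CHECK, now_seconds, warm_up_context, timed_host_alloc), the 64 MiB buffer size, and the use of cudaFree(0) as the warm-up call, which is a commonly used idiom that I swapped in for the zero-byte cudaMalloc. I also call cudaSetDeviceFlags before the warm-up, since some CUDA versions reject that call once the context is already active.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA call fails.
#define CHECK(call)                                                 \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            std::fprintf(stderr, "%s failed: %s\n", #call,          \
                         cudaGetErrorString(err_));                 \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

// Hypothetical helper: monotonic wall-clock time in seconds.
static double now_seconds() {
    using clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(
               clock::now().time_since_epoch()).count();
}

// Force context creation up front and return how long it took.
// cudaFree(0) is used here as the warm-up call (an assumption; any
// call that initializes the runtime should have the same effect).
static double warm_up_context() {
    double t0 = now_seconds();
    CHECK(cudaFree(nullptr));
    return now_seconds() - t0;
}

// Allocate mapped pinned host memory and return the elapsed seconds.
static double timed_host_alloc(void** p, size_t size) {
    double t0 = now_seconds();
    CHECK(cudaHostAlloc(p, size, cudaHostAllocMapped));
    return now_seconds() - t0;
}

int main() {
    const size_t size = size_t(64) << 20;  // 64 MiB, arbitrary choice
    void* A = nullptr;
    void* B = nullptr;

    // Set the mapping flag before the context exists.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    double t_ctx = warm_up_context();
    double t_a = timed_host_alloc(&A, size);
    double t_b = timed_host_alloc(&B, size);

    std::printf("context warm-up: %f s\n", t_ctx);
    std::printf("cudaHostAlloc A: %f s\n", t_a);
    std::printf("cudaHostAlloc B: %f s\n", t_b);

    CHECK(cudaFreeHost(A));
    CHECK(cudaFreeHost(B));
    return 0;
}
```

If the explanation above is right, most of the start-up cost should show up in the warm-up line instead of in the first cudaHostAlloc. The actual numbers will depend on the card, the driver, and whether persistence mode is enabled.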

#2
Posted 04/25/2012 02:37 PM   
Thank you for your answer!

#3
Posted 04/26/2012 06:59 AM   