Can someone explain how to use cudaMallocHost? My code works with cudaMalloc. Naively, I thought I could simply change cudaMalloc to cudaMallocHost and the code would still work, but it seems other changes are required. I need help.
Here are code snippets that show how I make all the calls. The kernel does exactly what I want and everything works, but it is too slow.
So I tried changing cudaMalloc to cudaMallocHost to see how the speed would differ, but then I get no results at all. The code compiles and I get no run-time errors, yet either the kernel call RejectingonDevice does nothing or the way I am coding the calls is faulty…
I have tried all I can think of. Should cudaMemcpyHostToDevice also be changed to cudaMemcpyHostToHost? I tried that, and it did not help. I am sure someone will know instantly what trivial thing I am doing wrong, and I hope they can tell me. Thanks. Maybe the answer will also help others who are as naive as I am.
All I want to do is change cudaMalloc to cudaMallocHost and be told what else to alter in the code snippets below. Or maybe I misunderstand something and it is not possible? Thanks.
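#include <cuda_runtime.h>

// Kernel defined elsewhere in the program (not shown in this snippet).
__global__ void FunctionOnDevice(int* dev_nots);

void dumb_rejecting(int* g_nots, int* dev_nots, int gs)
{
    size_t bytes = gs * gs * sizeof(int);
    // Host -> device copy of the input array.
    cudaMemcpy(dev_nots, g_nots, bytes, cudaMemcpyHostToDevice);
    int blockSize = gs * gs; // always less than 512
    int nBlocks = 1;
    FunctionOnDevice<<<nBlocks, blockSize>>>(dev_nots);
    // Device -> host copy; on the default stream this waits for the kernel.
    cudaMemcpy(g_nots, dev_nots, bytes, cudaMemcpyDeviceToHost);
}

void dumb_not_allowed(int* g_nots, int* dev_nots, int gs)
{
    dumb_rejecting(g_nots, dev_nots, gs);
}

int main(int argc, char** argv)
{
    int gs = 8;
    int* g_nots = new int[gs * gs];  // host buffer (ordinary pageable memory)
    int* dev_nots = NULL;            // device buffer
    cudaMalloc((void**)&dev_nots, sizeof(int) * gs * gs);
    dumb_not_allowed(g_nots, dev_nots, gs);
    cudaFree(dev_nots);
    delete[] g_nots;
    return 0;
}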
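A minimal sketch, assuming a CUDA-capable device and the CUDA runtime API, of where cudaMallocHost usually fits. It allocates page-locked (pinned) *host* memory, so it typically replaces the `new int[...]` host buffer, not the cudaMalloc device buffer. The device buffer still comes from cudaMalloc, and the copy directions stay cudaMemcpyHostToDevice / cudaMemcpyDeviceToHost because the data still moves between host and device. `IncrementKernel` and `run_with_pinned_host` are hypothetical stand-ins for FunctionOnDevice and the calls above, not part of the original code.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for FunctionOnDevice: adds 1 to each element.
__global__ void IncrementKernel(int* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: fills a pinned host buffer with 0..n-1, runs the
// kernel on a device copy, copies back, and returns the sum of the results
// (or -1 if any CUDA call fails).
long long run_with_pinned_host(int gs)
{
    const int n = gs * gs;
    const size_t bytes = n * sizeof(int);
    int* h_data = NULL;  // pinned host memory (replaces new int[n])
    int* d_data = NULL;  // device memory (still allocated with cudaMalloc)

    if (cudaMallocHost((void**)&h_data, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&d_data, bytes) != cudaSuccess) {
        cudaFreeHost(h_data);
        return -1;
    }
    for (int i = 0; i < n; ++i) h_data[i] = i;

    // Copy directions are unchanged: pinned memory is still host memory.
    cudaError_t err = cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // One block of n threads; n must not exceed the device's
        // max threads per block (512 on early GPUs).
        IncrementKernel<<<1, n>>>(d_data, n);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    long long sum = -1;
    if (err == cudaSuccess) {
        sum = 0;
        for (int i = 0; i < n; ++i) sum += h_data[i];
    }
    cudaFree(d_data);
    cudaFreeHost(h_data);  // pinned memory is released with cudaFreeHost, not delete[]
    return sum;
}
```

Called as `run_with_pinned_host(8)` on a machine with a working GPU, this should return 2080 (the values 1 through 64 summed). Pinned memory mainly speeds up the cudaMemcpy transfers; it does not by itself turn the allocation into device memory. If the goal is for a kernel to read host memory directly, the runtime offers mapped (zero-copy) pinned memory through cudaHostAlloc with the cudaHostAllocMapped flag plus cudaHostGetDevicePointer, and on systems with unified virtual addressing pinned host pointers can generally be passed to kernels directly; the exact requirements depend on the toolkit and device.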