Hello,
I recently bought a new GTX 1080 as a replacement for my GTX 980 Ti. I ran a few CUDA benchmarks and found that calling cudaMalloc on the GTX 1080 is almost 20x slower than on the GTX 980 Ti. Please see my code sample below.
Is this just a glitch in the release candidate of CUDA 8 that will be fixed in the final release?
Thanks a lot in advance
Cestmir
Environment:
OS: Windows 7 64 bit
NVIDIA Driver: 368.81 WHQL
CUDA Toolkit: both CUDA 7.5 and CUDA 8 RC
Source code:
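#include <cstdio>
#include <cuda_runtime.h>
#include <helper_cuda.h>  // checkCudaErrors, from the CUDA samples

// fnElapsedTime(): timer helper, definition not included in this post;
// used here as "seconds since the previous call".

int main(int argc, char **argv)
{
    float *f_A, *f_B;

    // warm up CUDA with a dummy 100 MB allocation
    checkCudaErrors(cudaMalloc((void **)&f_A, 100 * 1024 * 1024));

    fnElapsedTime();  // start timing
    checkCudaErrors(cudaMalloc((void **)&f_B, 8053063680ULL));  // allocate 7.5 GB
    printf("cudaMalloc time: %.1f sec.\n", fnElapsedTime());

    // clean up memory
    checkCudaErrors(cudaFree(f_A));
    checkCudaErrors(cudaFree(f_B));
    return 0;
}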
Output:
cudaMalloc time: 9.8 sec.
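
The fnElapsedTime() helper called in the code above is not shown in the post. For anyone trying to reproduce the measurement, here is a minimal sketch of what such a helper might look like, assuming it returns the seconds elapsed since the previous call (the first call only starts the timer). The body is a hypothetical stand-in built on std::chrono, not the original implementation:

```cpp
#include <chrono>

// Hypothetical stand-in for the fnElapsedTime() helper used in the post.
// Assumption: it returns the seconds elapsed since the previous call.
// The first call returns 0.0 and only starts the timer.
double fnElapsedTime()
{
    using clock = std::chrono::steady_clock;
    static bool started = false;
    static clock::time_point last;

    const clock::time_point now = clock::now();
    double seconds = 0.0;
    if (started) {
        // Interval between the previous call and this one, in seconds.
        seconds = std::chrono::duration<double>(now - last).count();
    }
    last = now;
    started = true;
    return seconds;
}
```

Because cudaMalloc blocks the calling host thread until it returns, a host-side wall-clock timer like this should be enough for the measurement: call it once right before the allocation and once right after, as in the sample. A monotonic clock is used so that system clock adjustments cannot skew the result.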