Hi cuFFT developers,
I hope you can help. The following test case fails on a K20X with CUDA 7.5, gcc 4.8.2, RHEL 6.7.
Setup: double precision, out-of-place complex-to-complex, 1D, round trip (forward followed by inverse).
// pseudo
for( e : {262144, 67108863, 262144} ) {
  try {
    alloc data
    create plan
    fft_forward
    fft_inverse
    clean data
    destroy plan
  } catch(...) {
    clean data
    destroy plan
  }
}
Code to reproduce this yourself is attached: http://hostcode.sourceforge.net/view/7741
Error report:
262144 … works
67108863 … plan: out of memory (expected) [1]
262144 … fft_forward: out of memory (unexpected) [2]
[1] CUFFT_ALLOC_FAILED (code 2) at test_cufft.cpp:106: cufftPlan1d(&plan, extents[0], CUFFT_Z2Z, 1)
[2] CUFFT_EXEC_FAILED (code 6) at test_cufft.cpp:110: cufftExecZ2Z(plan, data, data_transform, CUFFT_FORWARD)
Although the data is cleaned up after error [1], something seems to remain misconfigured within cuFFT.
This leads to another out-of-memory error at [2] (according to cuda-memcheck), even though memory is available (checked with cudaMemGetInfo).
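To put the extents into perspective, here is a rough sketch of the data footprint per test case. It assumes 16 bytes per double-precision complex value and one input plus one output buffer (data and data_transform); the helper names are made up for illustration, and the plan's own work area is not included because its size is not known here.

```cpp
#include <complex>
#include <cstdint>

// Bytes for one buffer of `extent` double-precision complex values.
// Assumption: 16 bytes per element, the same layout as cufftDoubleComplex.
constexpr std::uint64_t buffer_bytes(std::uint64_t extent) {
    return extent * sizeof(std::complex<double>);
}

// Out-of-place transform: one input and one output buffer.
// The cuFFT work area comes on top of this and is not modeled.
constexpr std::uint64_t data_bytes(std::uint64_t extent) {
    return 2 * buffer_bytes(extent);
}
```

With these assumptions, data_bytes(262144) is 8388608 (8 MiB) and data_bytes(67108863) is 2147483616 (just under 2 GiB), so the small case should need very little memory once the large one has been freed.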
cuda-memcheck output:
w.r.t. [1]: Program hit cudaErrorMemoryAllocation (error 2) due to “out of memory” on CUDA API call to cudaMalloc.
w.r.t. [2]: Program hit cudaErrorMemoryAllocation (error 2) due to “out of memory” on CUDA API call to cudaPeekAtLastError.
The second error occurred in the kernel “bluestein_init”; at least, this was the last kernel called according to nvprof.
A colleague suggested that local memory allocation could be the problem, and bluestein_init does indeed use local memory. But why does this kernel raise an out-of-memory error at [2]?
Or did I overlook something stupid?
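One thing that may help narrow this down is the factorization of the tested extents. As far as I understand the cuFFT documentation, sizes with large prime factors are handled by Bluestein's algorithm, so it is worth checking which extent would take that path. The sketch below uses plain trial division; prime_factors is a hypothetical helper and not part of the attached test.

```cpp
#include <cstdint>
#include <vector>

// Trial-division factorization, returning prime factors in ascending order
// with multiplicity. Sufficient for the small extents tested here.
std::vector<std::uint64_t> prime_factors(std::uint64_t n) {
    std::vector<std::uint64_t> factors;
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            factors.push_back(p);
            n /= p;
        }
    }
    if (n > 1) {
        factors.push_back(n);  // remaining cofactor is prime
    }
    return factors;
}
```

prime_factors(262144) yields eighteen factors of 2 (2^18), while prime_factors(67108863) yields 3, 2731 and 8191. If the assumption about Bluestein holds, bluestein_init would belong to the failed 67108863 plan rather than to the 262144 transform, which might hint at state left over from [1].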
Best Regards