cuFFT out of memory yields "irreparable" context

Hi cuFFT developers,

I hope you can help. The following test case was run on a K20X with CUDA 7.5, gcc 4.8.2, and RHEL 6.7.

Double precision, out-of-place complex-to-complex (Z2Z), 1D, round trip (forward, then inverse)

// pseudo
for( e : {262144, 67108863, 262144} ) {
 try {
  alloc data
  create plan
  fft_forward
  fft_inverse
  clean data
  destroy plan
 }catch(...){
  clean data
  destroy plan
 }
}

Code for your own testing is available at http://hostcode.sourceforge.net/view/7741

Error report:

262144 … works
67108863 … plan: out of memory (expected) [1]
262144 … fft_forward: out of memory (unexpected) [2]

[1] CUFFT_ALLOC_FAILED [2] test_cufft.cpp:106 cufftPlan1d(&plan, extents[0], CUFFT_Z2Z, 1)
[2] CUFFT_EXEC_FAILED [6] test_cufft.cpp:110 cufftExecZ2Z(plan, data, data_transform, CUFFT_FORWARD)

Although the data is freed after error [1], something appears to remain in a bad state within cuFFT.
This leads to another out-of-memory error at [2] (according to cuda-memcheck), even though cudaMemGetInfo reports that memory is available.

cuda-memcheck output:
For [1]: Program hit cudaErrorMemoryAllocation (error 2) due to “out of memory” on CUDA API call to cudaMalloc.
For [2]: Program hit cudaErrorMemoryAllocation (error 2) due to “out of memory” on CUDA API call to cudaPeekAtLastError.
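
One possible reading of the cuda-memcheck report for [2] is that the out-of-memory code left behind by the failed cudaMalloc in [1] was never reset, so a later cudaPeekAtLastError check saw it again. In the CUDA runtime, cudaPeekAtLastError returns the last error without resetting it, while cudaGetLastError returns it and resets it to cudaSuccess. The sketch below is a toy model of that behaviour in plain C++, not cuFFT's actual internals; every name in it (Err, toy_malloc, toy_exec_checked_with_peek, second_exec_succeeds) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-ins for runtime error codes (hypothetical, not the CUDA enum).
enum class Err { Success, OutOfMemory };

// One "last error" slot per host thread, loosely modeled on the CUDA runtime.
thread_local Err g_last_error = Err::Success;

// Analogue of cudaPeekAtLastError(): report the slot, leave it untouched.
Err peek_last_error() { return g_last_error; }

// Analogue of cudaGetLastError(): report the slot and reset it to Success.
Err get_last_error() {
    Err e = g_last_error;
    g_last_error = Err::Success;
    return e;
}

// A failing allocation records its error in the slot; a successful one
// leaves the slot as it was.
bool toy_malloc(std::size_t bytes, std::size_t free_bytes) {
    if (bytes > free_bytes) {
        g_last_error = Err::OutOfMemory;
        return false;
    }
    return true;
}

// A launch that itself succeeds, followed by a peek-style check.
// It reports failure whenever an older error is still in the slot.
bool toy_exec_checked_with_peek() {
    // ... the kernel launch would go here and succeed ...
    return peek_last_error() == Err::Success;
}

// Replays the sequence from the report: a large request fails, then a small
// transform is attempted. Optionally reset the slot after the failure.
bool second_exec_succeeds(bool clear_after_failure) {
    g_last_error = Err::Success;                 // start from a clean slot
    const std::size_t free_bytes = 1000;         // made-up numbers
    bool big_ok = toy_malloc(5000, free_bytes);  // fails, like [1]
    assert(!big_ok);
    (void)big_ok;
    if (clear_after_failure) (void)get_last_error();
    bool small_ok = toy_malloc(10, free_bytes);  // enough memory this time
    return small_ok && toy_exec_checked_with_peek();
}
```

In this model, second_exec_succeeds(false) returns false and second_exec_succeeds(true) returns true. On the real runtime, the analogous experiment would be to call cudaGetLastError() in the catch block after the failed cufftPlan1d; whether that avoids [2] on CUDA 7.5 is an assumption that has not been tested here.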

The second error seems to be associated with the kernel “bluestein_init”; at least, this was the last kernel launched according to nvprof.

A colleague suggested that local memory allocation could be the problem, and bluestein_init does indeed use local memory. But how can this kernel raise an out-of-memory error at [2]?
Or did I overlook something obvious?
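
For context on where bluestein_init might come from: the cuFFT documentation describes its best-optimized path as sizes of the form 2^a * 3^b * 5^c * 7^d, with a Bluestein-based path for sizes that have larger prime factors (the exact thresholds vary by version, so treat this as an assumption for CUDA 7.5). The small check below, with hypothetical helper names, shows that 67108863 = 2^26 - 1 factors as 3 * 2731 * 8191, while 262144 = 2^18 is a pure power of two.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trial-division factorization; fast enough for the 32-bit sizes used here.
std::vector<std::uint64_t> prime_factors(std::uint64_t n) {
    assert(n >= 1);
    std::vector<std::uint64_t> factors;
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        while (n % p == 0) {
            factors.push_back(p);
            n /= p;
        }
    }
    if (n > 1) factors.push_back(n);  // whatever remains is prime
    return factors;
}

// True if n = 2^a * 3^b * 5^c * 7^d, i.e. no prime factor larger than 7.
bool is_small_radix_size(std::uint64_t n) {
    if (n == 0) return false;
    for (std::uint64_t p : {2u, 3u, 5u, 7u}) {
        while (n % p == 0) n /= p;
    }
    return n == 1;
}
```

If that reading is right, a Bluestein kernel would be expected for the 67108863 plan but not for the 262144 transforms. That remains a guess, not something the thread confirms.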

Best Regards

Try CUDA 8. I was able to reproduce the 2nd error on CUDA 7.5 but not on CUDA 8. On CUDA 8 I get this:

$ nvcc -std=c++11 -o t1228 t1228.cu -lcufft
t1228.cu(96): warning: variable "s" was declared but never referenced

t1228.cu(96): warning: variable "s" was declared but never referenced

$ ./t1228
5623 MiB, 5590 MiB
Success: nx=262144
3554 MiB, Error for nx=67108863: CUDA error cufft CUFFT_ALLOC_FAILED [2] t1228.cu:106 cufftPlan1d(&plan, extents[0], CUFFT_Z2Z, 1)
5594 MiB, 5590 MiB
Success: nx=262144
$

CUDA 8.0RC, Tesla K20X, RHEL 7.2
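
The free-memory figures above can be cross-checked against the buffer sizes. A cufftDoubleComplex element is 16 bytes, so the two buffers of an out-of-place Z2Z transform need about 8 MiB for nx=262144 and about 2048 MiB for nx=67108863. Assuming the first figure on each line is the free memory after the data buffers are allocated (the meaning of the figures is not stated in the thread), 3554 MiB + 2048 MiB and 5594 MiB + 8 MiB both give 5602 MiB. A sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// A double-precision complex element (cufftDoubleComplex) is two doubles.
const std::uint64_t kBytesPerElement = 2 * sizeof(double);

// Bytes for the input plus output buffer of an out-of-place 1D Z2Z
// transform of length nx. Plan work areas are NOT included.
std::uint64_t out_of_place_buffer_bytes(std::uint64_t nx) {
    assert(nx <= UINT64_MAX / (2 * kBytesPerElement));  // guard against overflow
    return 2 * nx * kBytesPerElement;
}

// Round a byte count to the nearest MiB.
std::uint64_t to_mib_rounded(std::uint64_t bytes) {
    const std::uint64_t mib = 1024ull * 1024ull;
    return (bytes + mib / 2) / mib;
}
```

This only accounts for the data buffers; the work area that cufftPlan1d tries to allocate on top of them is what fails in the nx=67108863 case, and its size is not derived here.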

Hi txbob,

Thanks for testing and confirming the results on RHEL 7.2.
Since CUDA 8.0 RC is only a release candidate, our HPC team wants to wait for the final release.
I hope it will be released within the next few weeks :)

Best Regards