Asynchronous cuFFT batched execution

I have a 3D grid with a float3 assigned to each cell and I want to perform, for each component of the float3, a 3D FFT.

The cuFFT plan is as follows:

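int n = 200;
int dim[] = {n, n, n};
cufftHandle cufft_plan;
cufftPlanMany(&cufft_plan,
              3, dim,     /* rank 3: three-dimensional FFT of size n x n x n */
              dim,        /* inembed */
              /* istride = 3: elements are interleaved as x0 y0 z0 x1 y1 z1...
                 idist = 1: each FFT starts one element after the previous one, FFTx at 0 */
              3, 1,
              dim,        /* onembed */
              3, 1,       /* ostride, odist: same interleaved layout for the output */
              CUFFT_R2C,
              3);         /* batch: perform 3 FFTs */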

So I am saying, do three 3D FFTs with this data.
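To make the stride and distance arguments concrete, here is a small host-side sketch of how cuFFT's advanced data layout addresses an element of a rank-3 transform. The helper name `layout_index` is made up for illustration and is not part of cuFFT; only the indexing formula comes from the cuFFT documentation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a cuFFT function). Linear index of element (x, y, z)
// of transform number `batch` under cuFFT's advanced data layout for rank 3:
//   index = batch * dist + ((x * embed[1] + y) * embed[2] + z) * stride
std::size_t layout_index(std::size_t batch,
                         std::size_t x, std::size_t y, std::size_t z,
                         const int embed[3],
                         std::size_t stride, std::size_t dist) {
    assert(embed[1] > 0 && embed[2] > 0);
    // Row-major position of the cell inside the embedding volume.
    std::size_t cell = (x * static_cast<std::size_t>(embed[1]) + y)
                           * static_cast<std::size_t>(embed[2]) + z;
    return batch * dist + cell * stride;
}
```

With stride 3 and distance 1, as in the plan above, batch 0 reads the x components, batch 1 the y components, and batch 2 the z components of the interleaved float3 array.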

I want the three FFTs to execute concurrently. I guess I could do this by creating three plans and assigning a stream to each one.
But I wonder, can one configure cuFFT to perform the batched calls asynchronously?

So I can just call:

cufftExecR2C(cufft_plan, (cufftReal*)in, (cufftComplex*)out);

And let cuFFT execute the three FFTs at the same time (if the device has enough resources!)

Greetings!

Batching the calls allows cuFFT to use the device most effectively. It will do as much work as it can on all three transforms at once (in this example). You don't need to do anything specific to enable this, and it is a superior approach to issuing single transforms in separate streams.
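A minimal sketch of the batched approach, for illustration only. The function name `run_batched_r2c` and the stream parameter are assumptions, not something from this thread, and `d_in` and `d_out` are assumed to be device buffers that are already allocated and large enough for this layout. Binding the plan to a stream with `cufftSetStream` is optional; it is shown because cuFFT execution calls are normally asynchronous with respect to the host, so the single batched call can overlap with other host work.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: run the three interleaved 3D R2C transforms as one
// batched plan on a caller-supplied stream.
cufftResult run_batched_r2c(cufftReal *d_in, cufftComplex *d_out,
                            int n, cudaStream_t stream) {
    int dim[] = {n, n, n};
    cufftHandle plan;
    cufftResult status = cufftPlanMany(&plan, 3, dim,
                                       dim, 3, 1,   // input: stride 3, distance 1
                                       dim, 3, 1,   // output: same interleaved layout
                                       CUFFT_R2C, 3);
    if (status != CUFFT_SUCCESS) return status;

    // Associate the plan with the stream; exec calls on this plan are issued there.
    status = cufftSetStream(plan, stream);
    if (status == CUFFT_SUCCESS) {
        // One call enqueues the work for all three transforms.
        status = cufftExecR2C(plan, d_in, d_out);
    }

    // Wait for the transforms to finish before releasing the plan.
    cudaStreamSynchronize(stream);
    cufftDestroy(plan);
    return status;
}
```

In real code the plan would usually be created once and reused across calls, since plan creation is comparatively expensive.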

Thank you for your answer!
So, always choose batching over multiple plans on multiple streams whenever possible, right?

Yes, for the case where you are using a single GPU.

Got it, thanks Bob!