Greetings,
I am a complete beginner in CUDA (I had never heard of it until a few weeks ago). I was given a project that requires using the CUFFT library to perform transforms in one and two dimensions. To test whether I had set up CUFFT properly, I used a 1D array of 1’s, which I expected to return 0’s after being transformed (strictly, 0’s everywhere except the first, DC, element, which should equal the signal length). The data transformed with the cufftPlan1d plan is a 1D array of complex numbers, as shown in the following code:
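#include <stdio.h>
#include <stdlib.h>
#include <cufft.h>
#include <cutil_inline.h>
// Alias used for the device buffer below
typedef cufftComplex Complex;
void runTest(int argc, char** argv);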
#define SIGNAL_SIZE 4096
#define REPEAT 5000
int main(int argc, char** argv)
{
runTest(argc, argv);
cutilExit(argc, argv);
}
void runTest(int argc, char** argv)
{
if( cutCheckCmdLineFlag(argc, (const char**)argv, "device") )
cutilDeviceInit(argc, argv);
else
cudaSetDevice( cutGetMaxGflopsDeviceId() );
// Allocate host memory for the signal
cufftComplex* h_signal = (cufftComplex*)malloc(SIGNAL_SIZE * REPEAT * sizeof(cufftComplex));
// Initialize the memory for every batch of the signal, not just the first
for (unsigned int i = 0; i < SIGNAL_SIZE * REPEAT; i++) {
h_signal[i].x = 1.0f; //real
h_signal[i].y = 0.0f; //imag
}
// display the signal
for (unsigned int i = 0; i < SIGNAL_SIZE; i++) {
printf("%g %g\n", h_signal[i].x, h_signal[i].y);
}
printf("End of signal\n");
// Allocate device memory for signal
Complex* d_signal;
cudaMalloc((void**)&d_signal, SIGNAL_SIZE * REPEAT * sizeof(Complex));
// Copy host memory to device
cudaMemcpy(d_signal, h_signal, SIGNAL_SIZE * REPEAT * sizeof(Complex),
cudaMemcpyHostToDevice);
// Create a 1D FFT plan and check the returned status
// (CUFFT_SETUP_FAILED is an enum constant, so comparing it to 0 tests nothing)
cufftHandle plan;
if (cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, REPEAT) == CUFFT_SUCCESS)
printf("CUFFT plan created\n");
// Use the CUFFT plan to transform the signal in place
cufftResult execStatus = cufftExecC2C(plan, (cufftComplex *)d_signal,
(cufftComplex *)d_signal, CUFFT_FORWARD);
// Check if CUFFT executed the transform on the GPU
if (execStatus == CUFFT_SUCCESS)
printf( "FFT successfully executed on the GPU\n" );
// Copy result from device to host
cufftComplex* h_transformed_signal = h_signal;
cutilSafeCall(cudaMemcpy(h_transformed_signal, d_signal,
SIGNAL_SIZE * REPEAT * sizeof(Complex), cudaMemcpyDeviceToHost));
// Display results
for (unsigned int i = 0; i < SIGNAL_SIZE; i++) {
printf("%g %g\n", h_transformed_signal[i].x, h_transformed_signal[i].y);
}
printf("End of result\n");
// Destroy the CUFFT plan
cufftDestroy(plan);
// Free host and device memories
free(h_signal);
cutilSafeCall(cudaFree(d_signal));
cudaThreadExit();
}
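As a sanity check on what the 1D test above should print, here is a minimal plain-C sketch of an unnormalized forward DFT applied to a signal of all 1’s. It is not CUFFT code: `complex_f`, `dft4`, and `dft4_of_ones` are hypothetical names made up for illustration, and the size is fixed at 4 only so that the twiddle factors are exact.

```c
#include <assert.h>
#include <stdio.h>

#define N 4  /* small size chosen so the twiddle factors are exact */

/* Hypothetical stand-in for cufftComplex: x = real part, y = imaginary part. */
typedef struct { float x; float y; } complex_f;

/* exp(-2*pi*i*m/4) for m = 0..3, i.e. 1, -i, -1, +i */
static const complex_f TWIDDLE[N] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

/* Unnormalized forward DFT: out[k] = sum over j of in[j] * exp(-2*pi*i*j*k/N). */
void dft4(const complex_f *in, complex_f *out)
{
    assert(in != NULL && out != NULL);
    for (int k = 0; k < N; k++) {
        float re = 0.0f, im = 0.0f;
        for (int j = 0; j < N; j++) {
            complex_f w = TWIDDLE[(j * k) % N];
            /* complex multiply-accumulate: (re, im) += in[j] * w */
            re += in[j].x * w.x - in[j].y * w.y;
            im += in[j].x * w.y + in[j].y * w.x;
        }
        out[k].x = re;
        out[k].y = im;
    }
}

/* Transform a signal of all 1's (real = 1, imag = 0), as in the test above. */
void dft4_of_ones(complex_f *out)
{
    complex_f ones[N];
    for (int j = 0; j < N; j++) {
        ones[j].x = 1.0f;
        ones[j].y = 0.0f;
    }
    dft4(ones, out);
}
```

Only element 0 is nonzero here, and it equals the signal length. Since CUFFT transforms are also unnormalized, the same reasoning suggests the 4096-point test should print 4096 in the first element of each batch and values at or near 0 everywhere else.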
I’ve been struggling to figure out how to initialize a 2D array of complex numbers and pass it to a 2D C2C CUFFT plan. I’ve read everything on the forums that I could, but it’s still not clear to me. I know most people say it is better to flatten multidimensional arrays, but even getting to that point is proving very frustrating. I’ve tried the following with no success:
// Allocate memory for host signal
cufftComplex *h_idata = (cufftComplex *)malloc(size);
for (unsigned int col = 0; col < NX; col++) {
for (unsigned int row = 0; row < NY; row++) {
h_idata[row][col].x = 1.0f; //real
h_idata[row][col].y = 0.0f; //imag
}
}
But I do believe that CUDA flattens multidimensional arrays(?).
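For reference, the usual meaning of "flattening" is sketched below in plain C: the grid lives in one contiguous block, and element (row, col) is found with row-major arithmetic instead of `[row][col]`. That double index does not compile on a single `cufftComplex *`, because indexing it once already yields a struct. The names `complex_f`, `flat_index`, and `make_ones_2d` are hypothetical helpers for illustration, and row-major order is an assumption about the layout wanted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for cufftComplex: x = real part, y = imaginary part. */
typedef struct { float x; float y; } complex_f;

/* Row-major offset of element (row, col) in a grid that has `cols` columns. */
size_t flat_index(size_t row, size_t col, size_t rows, size_t cols)
{
    assert(row < rows && col < cols);
    return row * cols + col;
}

/* Allocate a rows x cols grid as one contiguous block and set every
   element to 1 + 0i. Returns NULL on allocation failure; caller frees. */
complex_f *make_ones_2d(size_t rows, size_t cols)
{
    complex_f *data = (complex_f *)malloc(rows * cols * sizeof(complex_f));
    if (data == NULL)
        return NULL;
    for (size_t row = 0; row < rows; row++) {
        for (size_t col = 0; col < cols; col++) {
            size_t i = flat_index(row, col, rows, cols);
            data[i].x = 1.0f;  /* real */
            data[i].y = 0.0f;  /* imag */
        }
    }
    return data;
}
```

With the names from the attempt above (NY rows, NX columns), the assignment would become `h_idata[row * NX + col].x = 1.0f;` with `size` equal to `NX * NY * sizeof(cufftComplex)`. The flat buffer can then be copied with a single cudaMemcpy, just like the 1D signal. The cuFFT documentation describes the first size argument of cufftPlan2d as the slowest-varying dimension, so for this layout the plan would presumably be `cufftPlan2d(&plan, NY, NX, CUFFT_C2C)`.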
I sincerely appreciate any help.
Thanks