Compile .cu like .cpp

Hello everyone,

I guess the title isn’t that explicit…

But I’m wondering if I can take my .cu files and compile them for .cpp usage…

I’ll explain:

I’m coding a program in CUDA C/C++ and I wanted the project to be usable in plain C++ as well. So I made a #define that checks whether __CUDACC__ is defined: if it is, I take one code path, and if not, I take another.

For host/device functions I defined it like this:

#ifdef __CUDACC__
# define __CUDA__ __host__ __device__
#else
# define __CUDA__
#endif

And for the main function I made two different mains (more understandable), each with its own initialization functions.

But (I know it’s wrong) I wrote all of my program in ONE .cu file.

Before you ask, I’m on Windows using CUDA 7.5 with Visual Studio 2013.

Can I compile a .cu file as if it were a .cpp file?

Thanks! =)

Edit: I want to add something. I would like (if it’s possible) to have something that works without the CUDA toolkit. That is, if I take my code to a computer that does not have an NVIDIA graphics card, I’d like it to work there too.

I’m not sure about compiling a .cu file as .cpp (e.g. in Visual Studio). But you can definitely compile a .cpp file as if it were a .cu file. nvcc has an option to do that: -x cu

So you could put all your CUDA code in .cpp files, and in your CUDA projects make sure that nvcc is passed -x cu. It should work the same as if the file were named .cu.
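For example, the invocation could look something like this (the file name here is just an illustration):

> nvcc -x cu -o myprog myprog.cpp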

So by doing this “trick”, I’m able to build the program with nvcc to run the CUDA code, and with g++ to run the C++ code?

It is not clear to me what it is you are asking.

If the question is whether you can write code that automagically runs as a serial program when compiled as a C++ program for the host, but as a parallel program when compiled with CUDA for the device, the answer is “no”.

If the question is whether you can write code with appropriate #ifdefs that can compile as a C++ program for the host, as well as a CUDA program for the device, the answer is “yes”.

Your statement “if I take my code to a computer that does not have an NVIDIA graphics card, I’d like it to work there too” suggests to me you are looking for the former scenario.

So, if I understand you correctly: I can’t have ONE program that runs on a computer without CUDA when compiled with g++ AND runs on a computer with CUDA when compiled with nvcc?

Even if the CUDA part of the code is guarded by #ifdef, something like this can’t work on every machine?

Example: main.cpp (using your “trick”, I renamed it .cpp instead of .cu)

#ifdef __CUDACC__
# define __CUDA__ __host__ __device__
#else
# define __CUDA__
#endif

#ifdef __CUDACC__
__global__ void kernel(int *a, int *b, int *c)
{
  int tid;

  tid = threadIdx.x + blockIdx.x * blockDim.x;
  c[tid] = a[tid] + b[tid];
}

int main(void)
{
 int *cuda_a, *cuda_b, *cuda_c, a[100], b[100], c[100];
 for (int i = 0; i < 100; i++)
 {
   a[i] = rand() % 1000;
   b[i] = rand() % 100;
 }
 cudaMalloc((void **)&cuda_a, sizeof(int) * 100);
 cudaMalloc((void **)&cuda_b, sizeof(int) * 100);
 cudaMalloc((void **)&cuda_c, sizeof(int) * 100);
 cudaMemcpy(cuda_a, a, sizeof(int) * 100, cudaMemcpyHostToDevice);
 cudaMemcpy(cuda_b, b, sizeof(int) * 100, cudaMemcpyHostToDevice);
 kernel <<< 10, 10 >>> (cuda_a, cuda_b, cuda_c);
 cudaMemcpy(c, cuda_c, sizeof(int) * 100, cudaMemcpyDeviceToHost);
 cudaFree(cuda_a);
 cudaFree(cuda_b);
 cudaFree(cuda_c);
 return (0);
}
#else
void add(int *a, int *b, int *c)
{
  for (int i = 0; i < 100)
    c[i] = a[i] + b[i];
}

int main(void)
{
 int a[100], int b[100], int *c;

 c = (int *)malloc(sizeof(int) * 100);
 for (int i = 0; i < 100; i++)
 {
   a[i] = rand() % 1000;
   b[i] = rand() % 100;
 }
 add(a, b, c);
}
#endif

So for this code, if it’s a .cpp file and I compile it with g++, will it fail? Of course it didn’t fail with nvcc (I tried).

So what I wanted to know is how to compile code like this both on a computer that does not have CUDA (no nvcc) AND on a computer that does have CUDA (nvcc).
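In other words, roughly these two invocations of the same source file (just a sketch of what I have in mind, using the file name from above):

> nvcc -x cu -o main main.cpp   (on the machine with CUDA)
> g++ -o main main.cpp          (on the machine without CUDA)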

Edit: I keep the common functions, marked with __CUDA__ (which replaces __host__ __device__), outside the #ifdef.
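For example, a shared helper kept outside the #ifdef might look like this (the function itself is just a hypothetical illustration):

__CUDA__ int add_one(int x)
{
    return x + 1;
}

Under nvcc, __CUDA__ expands to __host__ __device__, so the function can be called from both host and device code; under a plain C++ compiler it expands to nothing and the function is ordinary C++.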

Your code contains errors and non-standard idioms. After cleaning those up, I don’t have any problems compiling the code with nvcc and icl (that’s the Intel compiler, I don’t have gcc here). When I compile with icl, I specify /Tp to tell it to treat the code as C++, regardless of file extension.

> icl /W4 /Tp two-platform.cu
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.3.198 Build 20130607
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

two-platform.cu
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:two-platform.exe
two-platform.obj

Now nvcc:

> nvcc -arch=sm_50 -o two-platform.exe  two-platform.cu
two-platform.cu
   Creating library two-platform.lib and object two-platform.exp

The cleaned-up code in file two-platform.cu:

#include <stdio.h>
#include <stdlib.h>

#ifdef __CUDACC__
# define __CUDA__ __host__ __device__
#else
# define __CUDA__
#endif

#ifdef __CUDACC__
__global__ void kernel(int *a, int *b, int *c)
{
    int tid;

    tid = threadIdx.x + blockIdx.x * blockDim.x;
    c[tid] = a[tid] + b[tid];
}

int main(void)
{
    int *cuda_a, *cuda_b, *cuda_c, a[100], b[100], c[100];

    printf ("Hello from CUDA\n");

    for (int i = 0; i < 100; i++) {
        a[i] = rand() % 1000;
        b[i] = rand() % 100;
    }
    cudaMalloc((void **)&cuda_a, sizeof(int) * 100);
    cudaMalloc((void **)&cuda_b, sizeof(int) * 100);
    cudaMalloc((void **)&cuda_c, sizeof(int) * 100);
    cudaMemcpy(cuda_a, a, sizeof(int) * 100, cudaMemcpyHostToDevice);
    cudaMemcpy(cuda_b, b, sizeof(int) * 100, cudaMemcpyHostToDevice);
    kernel <<< 10, 10 >>> (cuda_a, cuda_b, cuda_c);
    cudaMemcpy(c, cuda_c, sizeof(int) * 100, cudaMemcpyDeviceToHost);
    cudaFree(cuda_a);
    cudaFree(cuda_b);
    cudaFree(cuda_c);
    return (0);
}
#else
void add(int *a, int *b, int *c)
{
    for (int i = 0; i < 100; i++) {
        c[i] = a[i] + b[i];
    }
}

int main(void)
{
    int a[100], b[100], *c;
    
    printf ("Hello from C++\n");

    c = (int *)malloc(sizeof(int) * 100);
    for (int i = 0; i < 100; i++) {
        a[i] = rand() % 1000;
        b[i] = rand() % 100;
    }
    add(a, b, c);
}
#endif

Thanks a lot for your answer! So as I can see, you can compile a .cu file with a C++ compiler. Is that normal? If yes, that’s “better” for me =)! Now that I know it works and how to do it, I can move forward!!!

Thanks again!

Edit: oh, I just saw that you specified a flag to treat it as C++!

Pretty much all C/C++ compilers have a command line switch for explicitly specifying the source language, rather than deriving it from the filename extension. For the GNU toolchain, it should be -x (see the help output of the compiler or documentation for details):

-x language
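So with the GNU toolchain, compiling the same file as C++ could look something like this (sketch only, using the file name from the post above):

> g++ -x c++ -o two-platform two-platform.cu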