Multi-GPU system, running MPI/CUDA

Hi All,

I am working on a multi-GPU system. Until yesterday we had 2 CPUs (6 cores each) and 2 Tesla K20 cards in the system, but we have just installed 2 additional Tesla K40 cards. We now have 4 cards in a dual-CPU system.

Until now I was selecting the device like this:

void set_device_before_mpi_init_()
{
    char *localRankStr = NULL;
    int rank = 0, devCount = 0;
    cudaError_t cudaStat1;

    // Extract the local rank from an environment variable
    // (ENV_LOCAL_RANK is defined elsewhere)
    if ((localRankStr = getenv(ENV_LOCAL_RANK)) != NULL)
    {
        rank = atoi(localRankStr);
    }

    cudaDeviceReset();
    cudaThreadExit();   // deprecated; equivalent to cudaDeviceReset()

    cudaGetDeviceCount(&devCount);
    printf("device count %d %d %d\n", devCount, rank, rank % devCount);

    // Assign devices to local ranks round-robin
    cudaStat1 = cudaSetDevice(rank % devCount);
    if (cudaStat1 != cudaSuccess)
        printf("ERROR DEVICE SET FAILED\n");
}

It works fine this way.

But now I want to use only the K40 cards, so I thought of doing something like this:

void set_device_before_mpi_init_()
{
    char *localRankStr = NULL;
    int rank = 0, devCount = 0, current_device = -10;
    cudaError_t cudaStat1;

    // Extract the local rank from an environment variable
    // (ENV_LOCAL_RANK is defined elsewhere)
    if ((localRankStr = getenv(ENV_LOCAL_RANK)) != NULL)
    {
        rank = atoi(localRankStr);
    }

    cudaDeviceReset();
    cudaThreadExit();   // deprecated; equivalent to cudaDeviceReset()

    struct cudaDeviceProp prop[4];

    cudaGetDeviceProperties(&prop[0], 0);
    cudaGetDeviceProperties(&prop[1], 1);
    cudaGetDeviceProperties(&prop[2], 2);
    cudaGetDeviceProperties(&prop[3], 3);

    printf(" device id = 0, name = %s \n", prop[0].name);
    printf(" device id = 1, name = %s \n", prop[1].name);
    printf(" device id = 2, name = %s \n", prop[2].name);
    printf(" device id = 3, name = %s \n", prop[3].name);

    // Hard-coded IDs of the two K40 cards. The second call replaces the
    // first, so device 3 ends up as the current device for this thread.
    cudaStat1 = cudaSetDevice(0);
    cudaStat1 = cudaSetDevice(3);

    if (cudaStat1 != cudaSuccess)
        printf("ERROR DEVICE SET FAILED\n");

    cudaGetDevice(&current_device);
    printf("current device %d\n", current_device);

    cudaDeviceReset();
    cudaThreadExit();   // deprecated; equivalent to cudaDeviceReset()
}

The output of cudaGetDeviceProperties is

device id = 0, name = Tesla K40c
device id = 1, name = Tesla K20c
device id = 2, name = Tesla K20c
device id = 3, name = Tesla K40c

So I hard-coded cudaSetDevice(0) and cudaSetDevice(3), since the K40 cards are reported with IDs 0 and 3, but it does not work. I suppose it still selects a K20.

What should I do? Is setting the device based on the output of cudaGetDeviceProperties reliable?
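
One alternative to hard-coded IDs would be to match on the reported device name and spread the local ranks over the matching cards. Below is a minimal sketch in plain C, assuming the names have already been collected from cudaGetDeviceProperties (prop[i].name). The helper pick_device_by_name and its parameters are hypothetical and not part of the CUDA API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CUDA): returns the ID of the device a
 * given local rank should use, considering only devices whose name
 * contains `wanted` (e.g. "K40"). Ranks are spread round-robin over the
 * matching devices. Returns -1 if nothing matches or the input is invalid.
 * In real use, names[i] would be prop[i].name from cudaGetDeviceProperties. */
int pick_device_by_name(const char *const names[], int count,
                        const char *wanted, int local_rank)
{
    int matches = 0;
    int target;
    int i;

    if (names == NULL || wanted == NULL || count <= 0 || local_rank < 0)
        return -1;

    /* First pass: count the devices whose name contains the substring. */
    for (i = 0; i < count; ++i)
        if (names[i] != NULL && strstr(names[i], wanted) != NULL)
            ++matches;

    if (matches == 0)
        return -1;

    /* Second pass: return the (local_rank % matches)-th matching device. */
    target = local_rank % matches;
    for (i = 0; i < count; ++i) {
        if (names[i] != NULL && strstr(names[i], wanted) != NULL) {
            if (target == 0)
                return i;
            --target;
        }
    }
    return -1; /* not reached when matches > 0 */
}
```

With the four names listed above and wanted set to "K40", local ranks 0 and 1 would map to IDs 0 and 3. The returned ID would then be passed to a single cudaSetDevice call per rank, after checking that it is not -1.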

Thanks…

I noticed another problem. When I launch one MPI rank per card, I can use one K40, two K40s, or a K40 combined with a K20, but I cannot use a single K20 on its own. I forgot to mention that all 4 cards are connected to the same PCI bus.

Any insight into what is happening would be helpful.