Hi All,
I am working on a multi-GPU system. Until yesterday we had 2 CPUs (6 cores each) and 2 Tesla K20 cards in the system, but we just installed 2 more Tesla K40 cards. Now we have 4 cards in a dual-CPU system.
Until now I was doing something like this:
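#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* ENV_LOCAL_RANK is assumed to be #defined elsewhere as the name of the
   MPI launcher's local-rank environment variable. */
void set_device_before_mpi_init_(void)
{
    char *localRankStr = NULL;
    int rank = 0, devCount = 0;
    cudaError_t cudaStat1;

    // Extract the local rank from an environment variable
    if ((localRankStr = getenv(ENV_LOCAL_RANK)) != NULL)
    {
        rank = atoi(localRankStr);
    }

    cudaDeviceReset();
    // Guard against a failed query or zero devices (avoids modulo by zero)
    if (cudaGetDeviceCount(&devCount) != cudaSuccess || devCount == 0)
    {
        printf("ERROR: no CUDA device found\n");
        return;
    }
    printf("device count %d %d %d\n", devCount, rank, rank % devCount);

    // Assign local ranks to devices round-robin
    cudaStat1 = cudaSetDevice(rank % devCount);
    if (cudaStat1 != cudaSuccess)
        printf("ERROR DEVICE SET FAILED\n");
}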
It works fine this way.
But now I want to use only the K40 cards, so I tried something like this:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

void set_device_before_mpi_init_(void)
{
    char *localRankStr = NULL;
    int rank = 0, devCount = 0, current_device = -10;
    cudaError_t cudaStat1;

    // Extract the local rank from an environment variable
    if ((localRankStr = getenv(ENV_LOCAL_RANK)) != NULL)
    {
        rank = atoi(localRankStr);
    }

    cudaDeviceReset();

    // Print the name of every device, in CUDA runtime index order
    cudaGetDeviceCount(&devCount);
    for (int i = 0; i < devCount; ++i)
    {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf(" device id = %d, name = %s \n", i, prop.name);
    }

    // Only the last cudaSetDevice() call takes effect for this host thread
    cudaStat1 = cudaSetDevice(0);
    cudaStat1 = cudaSetDevice(3);
    if (cudaStat1 != cudaSuccess)
        printf("ERROR DEVICE SET FAILED\n");

    cudaGetDevice(&current_device);
    printf("current device %d\n", current_device);

    cudaDeviceReset();
}
The output of cudaGetDeviceProperties is
device id = 0, name = Tesla K40c
device id = 1, name = Tesla K20c
device id = 2, name = Tesla K20c
device id = 3, name = Tesla K40c
So I hard-coded cudaSetDevice(0) and cudaSetDevice(3), since it reports the K40s as devices 0 and 3, but it does not work: it still seems to select a K20.
What should I do? Is it reliable to choose the device index based on the output of cudaGetDeviceProperties?
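One way to avoid hard-coding indices 0 and 3 is to pick devices by name. Below is a minimal sketch of a hypothetical helper, `pick_device_by_name` (not part of the CUDA API), written in plain C so the selection logic can be checked without a GPU. It assumes the device names have already been collected from `cudaGetDeviceProperties()` in CUDA runtime index order, and assigns local ranks round-robin among the devices whose name contains a given substring such as "K40".

```c
#include <string.h>

/* Hypothetical helper (not a CUDA API): given device names in CUDA runtime
 * index order, return the index of the device that `local_rank` should use,
 * chosen round-robin among devices whose name contains `wanted`.
 * Returns -1 if no device matches or local_rank is negative. */
int pick_device_by_name(const char *const names[], int count,
                        const char *wanted, int local_rank)
{
    int matches = 0;

    /* First pass: count the devices whose name contains `wanted`. */
    for (int i = 0; i < count; ++i)
        if (strstr(names[i], wanted) != NULL)
            ++matches;

    if (matches == 0 || local_rank < 0)
        return -1;

    /* Second pass: return the (local_rank % matches)-th matching device. */
    int target = local_rank % matches;
    for (int i = 0; i < count; ++i) {
        if (strstr(names[i], wanted) != NULL) {
            if (target == 0)
                return i;
            --target;
        }
    }
    return -1; /* not reached */
}
```

To use it, keep one `struct cudaDeviceProp` per device (or copy the names) so the pointers stay valid, set `names[i]` to `prop[i].name` for each `i` below the count returned by `cudaGetDeviceCount()`, and pass the returned index to `cudaSetDevice()` when it is not -1. With the names listed above, local ranks 0 and 1 would map to devices 0 and 3. Note that a substring match on "K40" also matches other K40 variants, which may or may not be wanted.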
Thanks…