Incrementing a shared variable

Greetings all!

I have the following code, which computes PageRank. To increment globally shared variables I implemented a lock/unlock mechanism. The Lock is implemented as below:

struct Lock{
  int *mutex;
  Lock(void){
    int state = 0; // 0 = unlocked
    gpuErrchk(cudaMalloc((void**) &mutex, sizeof(int)));
    gpuErrchk(cudaMemcpy(mutex, &state, sizeof(int), cudaMemcpyHostToDevice));
  }
  ~Lock(void){
    cudaFree(mutex);
  }
  __device__ void lock(void){
    while(atomicCAS(mutex, 0, 1) != 0);
  }
  __device__ void unlock(void){
    atomicExch(mutex, 0);
  }
};

I have a function called PageRank that performs the PageRank computation, as shown below.

<i>(some code)</i>

bool PageRank()
{
     unsigned int *num_modified, *num_constant; //variable declaration
     <i>(some code)</i>

     cudaMalloc((void**) &num_modified, sizeof(unsigned int) ); //variable allocation
     cudaMalloc((void**) &num_constant, sizeof(unsigned int) );

     <i>(some more code)</i>

     kernel.spmv(argument..., MyLock, num_modified, num_constant);
}

In the function spmv I call another __global__ function (csr32DynamicWarp), which does the following:

__global__ void csr32DynamicWarp (Lock MyLock, unsigned int *num_modified, unsigned int *num_constant)
{
   MyLock.lock();
   <i>(some more code to set a flag called cur_flag)</i>

    if(cur_flag == false)
           *num_constant ++;
     else
          *num_modified ++;
   MyLock.unlock();
}

Note that I have a global lock that enforces mutual exclusion across all threads; a thread acquires it by calling MyLock.lock();

So each time the above function (csr32DynamicWarp) runs, exactly one of num_constant or num_modified should be incremented. But when I copy these variables back to host memory and print them in PageRank, I always see 0 and 0 for num_constant and num_modified respectively. Shouldn't at least one of them have the value 1?

Is csr32DynamicWarp being launched from host code or from device code (i.e. another GPU kernel)? It's not clear from the code you have shown. kernel.spmv appears to be a host function, but it's not clear whether it contains other GPU kernels that call csr32DynamicWarp, or whether csr32DynamicWarp is launched directly from host code inside kernel.spmv.

Also, what happens if you run your code with cuda-memcheck ?

Which CUDA version are you using?

Note that for the very simple usage shown here (updating a single variable) atomics would be a better choice than the locking mechanism.

Thank you for the prompt reply. csr32DynamicWarp is a __global__ kernel launched directly from host code; it is the only kernel in the program. I am using CUDA 7.0. I need the locking mechanism because I also update a common shared table; these two variables are just for gathering stats.

Running cuda-memcheck I get the following messages:

========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaMemcpy.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 [0x2e7c03]
========= Host Frame:./PageRankCuda [0x5823f]
========= Host Frame:./PageRankCuda [0x1de74]
========= Host Frame:./PageRankCuda [0x6a2a]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b25]
========= Host Frame:./PageRankCuda [0x50ff]

========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaMemcpy.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 [0x2e7c03]
========= Host Frame:./PageRankCuda [0x5823f]
========= Host Frame:./PageRankCuda [0x1de93]
========= Host Frame:./PageRankCuda [0x6a2a]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b25]
========= Host Frame:./PageRankCuda [0x50ff]

========= Program hit cudaErrorInvalidValue (error 11) due to "invalid argument" on CUDA API call to cudaMemcpy.
========= Saved host backtrace up to driver entry point at error
========= Host Frame:/usr/lib64/libcuda.so.1 [0x2e7c03]
========= Host Frame:./PageRankCuda [0x5823f]
========= Host Frame:./PageRankCuda [0x1deb2]
========= Host Frame:./PageRankCuda [0x6a2a]
========= Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b25]
========= Host Frame:./PageRankCuda [0x50ff]

You need to sort out those cuda-memcheck errors.

You should run cuda-memcheck on your code before asking for help in a public forum.

Thanks for the pointer, and apologies for not running cuda-memcheck earlier. The problem turned out not to be what memcheck reported, though: the increment I wrote as *num_constant++ should actually be (*num_constant)++. A minor but significant change.