PyCUDA + Thrust Example in case someone is curious

I need to write a toy Monte Carlo in a Python application, but want to reuse the Thrust device-side RNG functions. It turns out this is not so hard. In case someone is curious, here’s a quick example application showing how:

import pycuda.autoinit, pycuda.driver

from pycuda.compiler import SourceModule

import numpy

import time

mod = SourceModule('''

#include <thrust/random.h>

extern "C" {

__global__ void test_rand(int numRand, float *samples)

{

  int threadID = blockIdx.x * gridDim.x + threadIdx.x;

  thrust::default_random_engine rng;

  rng.discard(numRand * threadID);

thrust::uniform_real_distribution<float> rand01(0,1);

float acc=0.0;

  for (int i=0; i < numRand; i++) {

	float r = rand01(rng);

	acc += r;

  }

samples[threadID] = acc/numRand;  // Normalize back to range [0,1)

}

}

''', no_extern_c=True, include_dirs=['/home/seibert/tmp/'])

test_rand = mod.get_function('test_rand')

### Compute many sums of random numbers

numRand = 512

threads_per_block = 256

blocks = 512

threads = threads_per_block * blocks

print 'Sampling sum of', numRand, 'random numbers', threads, 'times.'

samples = numpy.zeros(threads, dtype=numpy.float32)

start = time.time()

test_rand(numpy.int32(numRand), pycuda.driver.Out(samples), grid=(blocks,1), block=(threads_per_block,1,1))

stop = time.time()

print 'Execution+copy time = %1.1f msec' % ( (stop-start) * 1000.0,)

### Look at results in a lame text-only way

### Squint hard!  Central Limit Theorem says this is approximately a Gaussian.

hist, bin_edges = numpy.histogram(samples, bins=20, range=(0.3, 0.7))

print 'Distribution:', hist

Notes:

  • You need no_extern_c=True, otherwise PyCUDA will wrap your Thrust #includes with extern “C” and break compilation.

  • Then you need to manually wrap your kernel in extern “C”, otherwise the name will be mangled and you won’t be able to look it up later with get_function().

  • If you put thrust in a non-standard location, include_dirs is required. (Remember to go up a level if you put “thrust” in the #include path)

  • Buyer-beware on the use of the default linear congruence generator. Unfortunately, taus88 performs terribly in this example because there is no efficient implementation of the discard() function. I’ll have to see if hashing the threadID for a seed, as in one of the thrust examples, is acceptable.

1 Like