I am using CURand (curand_init / curand_uniform) for the first time, and I noticed that when I set the sequence number the same (0) for all threads that the curand_init() function (I have a separate kernel that just calls it, my other kernel uses curand_uniform() in it) that performance is drastically better (O(10 ms) vs. O(30s)). However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

Thanks,
Matt

Update: I'm using 1954 threads/block, 512 blocks (~1M threads) for this, running on a Tesla C1060.

I am using CURand (curand_init / curand_uniform) for the first time, and I noticed that when I set the sequence number the same (0) for all threads that the curand_init() function (I have a separate kernel that just calls it, my other kernel uses curand_uniform() in it) that performance is drastically better (O(10 ms) vs. O(30s)). However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

Thanks,

Matt

Update: I'm using 1954 threads/block, 512 blocks (~1M threads) for this, running on a Tesla C1060.

[quote name='sinclair' date='17 April 2011 - 01:59 PM' timestamp='1303073941' post='1226284']
...
However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?
[/quote]

In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.

[quote name='sinclair' date='17 April 2011 - 01:59 PM' timestamp='1303073941' post='1226284']

...

However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.

I saw that in the 3.2 guide too, but I guess it didn't take the first time or I'm missing something. So, if I give them all the same sequence number N, that means that they all start at position N in their psuedo-random sequence, right? However, if I have different seeds, then they're really (hopefully) entirely different sequences, because they've been seeded differently, right? This seems like it would render the sequence number value fairly inconsequential...

Thanks,
Matt

EDIT: I should say, thanks for the input, and I will still try what you've said and post back with the results, I'm just still a bit confused on the "greater meaning" of everything.

[quote name='NathanW' date='18 April 2011 - 03:49 PM' timestamp='1303163376' post='1226799']
In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.
[/quote]

I saw that in the 3.2 guide too, but I guess it didn't take the first time or I'm missing something. So, if I give them all the same sequence number N, that means that they all start at position N in their psuedo-random sequence, right? However, if I have different seeds, then they're really (hopefully) entirely different sequences, because they've been seeded differently, right? This seems like it would render the sequence number value fairly inconsequential...

Thanks,

Matt

EDIT: I should say, thanks for the input, and I will still try what you've said and post back with the results, I'm just still a bit confused on the "greater meaning" of everything.

[quote name='NathanW' date='18 April 2011 - 03:49 PM' timestamp='1303163376' post='1226799']

In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.

As promised, I ran the test Nathan suggested. When I have the threads use different seeds, but the same sequence number, I get that it took 0.1 s, as opposed to ~30s.

I had the offsets the same too, if anyone was curious. When I set the offset to 100, it took ~6.5 s (when the sequences were the same).

As promised, I ran the test Nathan suggested. When I have the threads use different seeds, but the same sequence number, I get that it took 0.1 s, as opposed to ~30s.

I had the offsets the same too, if anyone was curious. When I set the offset to 100, it took ~6.5 s (when the sequences were the same).

The state space of the generator is about 2^190. No matter how you setup the state, you will end up at 1 position on a long loop of 2^190 states. If you just had one thread, there's no problem. You pick a seed, use that seed to generate a starting state, and you're good to go.

When you have multiple threads you have to start thinking about how to get the threads working independently. The simplest way is for every thread to have a different seed. Each seed maps to some position in the 2^190 loop, so for 1 million threads you will end up with 1 million "cuts" of the loop. The distance between the cuts will be erratic, they will depend on weird interactions between the seeding algorithm and mathematical properties of the pseudorandom generator. There aren't any mathematical results about when this is safe or not. For any particular pattern of seeds you can run statistical tests and verify that it's OK. But when you choose a different pattern of seeds for the next experiment you don't have assurance that this new pattern will be OK.

The idea of the sequence number and the 2^67 spacing is that the "cuts" are evenly spaced. No matter which seed you pick, every thread will be spaced 2^67 apart. That way you can analyze the statistical properties of the generator once, and the results hold for ANY seed.

As an example, suppose you use seeds of 1, 2, 3, ..., n, and no sequence number. Suppose somehow the hash function that sets up the state from the seed value has a collision on 3 and 7. That means threads 3 and 7 are generating exactly the same numbers, forever! That's obviously bad. Guaranteeing this won't happen is hard; hash functions are designed to be hard to analyze. There are lots of subtle ways that the hash function could introduce some sort of bias, and no mathematical guarantees that it won't happen.

[conclusion]

In the end the sequence number is an optional feature, it's perfectly reasonable to use it or not. On the plus side it makes it possible to analyze the quality of the RNG results and extrapolate to multiple seed values. On the negative it is computationally expensive.

The state space of the generator is about 2^190. No matter how you setup the state, you will end up at 1 position on a long loop of 2^190 states. If you just had one thread, there's no problem. You pick a seed, use that seed to generate a starting state, and you're good to go.

When you have multiple threads you have to start thinking about how to get the threads working independently. The simplest way is for every thread to have a different seed. Each seed maps to some position in the 2^190 loop, so for 1 million threads you will end up with 1 million "cuts" of the loop. The distance between the cuts will be erratic, they will depend on weird interactions between the seeding algorithm and mathematical properties of the pseudorandom generator. There aren't any mathematical results about when this is safe or not. For any particular pattern of seeds you can run statistical tests and verify that it's OK. But when you choose a different pattern of seeds for the next experiment you don't have assurance that this new pattern will be OK.

The idea of the sequence number and the 2^67 spacing is that the "cuts" are evenly spaced. No matter which seed you pick, every thread will be spaced 2^67 apart. That way you can analyze the statistical properties of the generator once, and the results hold for ANY seed.

As an example, suppose you use seeds of 1, 2, 3, ..., n, and no sequence number. Suppose somehow the hash function that sets up the state from the seed value has a collision on 3 and 7. That means threads 3 and 7 are generating exactly the same numbers, forever! That's obviously bad. Guaranteeing this won't happen is hard; hash functions are designed to be hard to analyze. There are lots of subtle ways that the hash function could introduce some sort of bias, and no mathematical guarantees that it won't happen.

[conclusion]

In the end the sequence number is an optional feature, it's perfectly reasonable to use it or not. On the plus side it makes it possible to analyze the quality of the RNG results and extrapolate to multiple seed values. On the negative it is computationally expensive.

I am using CURand (curand_init / curand_uniform) for the first time, and I noticed that when I set the sequence number the same (0) for all threads that the curand_init() function (I have a separate kernel that just calls it, my other kernel uses curand_uniform() in it) that performance is drastically better (O(10 ms) vs. O(30s)). However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

Thanks,

Matt

Update: I'm using 1954 threads/block, 512 blocks (~1M threads) for this, running on a Tesla C1060.

I am using CURand (curand_init / curand_uniform) for the first time, and I noticed that when I set the sequence number the same (0) for all threads that the curand_init() function (I have a separate kernel that just calls it, my other kernel uses curand_uniform() in it) that performance is drastically better (O(10 ms) vs. O(30s)). However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

Thanks,

Matt

Update: I'm using 1954 threads/block, 512 blocks (~1M threads) for this, running on a Tesla C1060.

...

However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

[/quote]

In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.

...

However, in reading the CURand Library documentation, it never really explains what the sequence number in the input parameters does. So I was wondering if anyone knew/understood what this parameter does?

In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.

I saw that in the 3.2 guide too, but I guess it didn't take the first time or I'm missing something. So, if I give them all the same sequence number N, that means that they all start at position N in their psuedo-random sequence, right? However, if I have different seeds, then they're really (hopefully) entirely different sequences, because they've been seeded differently, right? This seems like it would render the sequence number value fairly inconsequential...

Thanks,

Matt

EDIT: I should say, thanks for the input, and I will still try what you've said and post back with the results, I'm just still a bit confused on the "greater meaning" of everything.

[quote name='NathanW' date='18 April 2011 - 03:49 PM' timestamp='1303163376' post='1226799']

In the CURAND 4.0 pdf there is some explanation in the "Device API Overview" section, page 11 (might be a different page number in the 3.2 version).

The sequence number spaces out starting states by 2^67 positions. So if every thread has the same seed but then sequence number 0, 1, 2, ..., then the threads will be spaced out 2^67 positions from each other. Theoretically if you could generate 2^67 values from the first thread, it would then start overlapping the beginning values from the second thread. But 2^67 is a really big number.

Another way to organize the states is to give each thread a different seed and just keep the sequence number at 0 for every thread. This will be much faster, since the setup doesn't have to do the tricky computations to advance exactly 2^67 positions. The disadvantage is that there could be theoretical weaknesses and correlations between threads (but it's unlikely).

Since you're using 1 million threads, I would go with the second option.

[/quote]

I saw that in the 3.2 guide too, but I guess it didn't take the first time or I'm missing something. So, if I give them all the same sequence number N, that means that they all start at position N in their psuedo-random sequence, right? However, if I have different seeds, then they're really (hopefully) entirely different sequences, because they've been seeded differently, right? This seems like it would render the sequence number value fairly inconsequential...

Thanks,

Matt

EDIT: I should say, thanks for the input, and I will still try what you've said and post back with the results, I'm just still a bit confused on the "greater meaning" of everything.

[quote name='NathanW' date='18 April 2011 - 03:49 PM' timestamp='1303163376' post='1226799']

Since you're using 1 million threads, I would go with the second option.

I had the offsets the same too, if anyone was curious. When I set the offset to 100, it took ~6.5 s (when the sequences were the same).

Matt

I had the offsets the same too, if anyone was curious. When I set the offset to 100, it took ~6.5 s (when the sequences were the same).

Matt

The state space of the generator is about 2^190. No matter how you setup the state, you will end up at 1 position on a long loop of 2^190 states. If you just had one thread, there's no problem. You pick a seed, use that seed to generate a starting state, and you're good to go.

When you have multiple threads you have to start thinking about how to get the threads working independently. The simplest way is for every thread to have a different seed. Each seed maps to some position in the 2^190 loop, so for 1 million threads you will end up with 1 million "cuts" of the loop. The distance between the cuts will be erratic, they will depend on weird interactions between the seeding algorithm and mathematical properties of the pseudorandom generator. There aren't any mathematical results about when this is safe or not. For any particular pattern of seeds you can run statistical tests and verify that it's OK. But when you choose a different pattern of seeds for the next experiment you don't have assurance that this new pattern will be OK.

The idea of the sequence number and the 2^67 spacing is that the "cuts" are evenly spaced. No matter which seed you pick, every thread will be spaced 2^67 apart. That way you can analyze the statistical properties of the generator once, and the results hold for ANY seed.

As an example, suppose you use seeds of 1, 2, 3, ..., n, and no sequence number. Suppose somehow the hash function that sets up the state from the seed value has a collision on 3 and 7. That means threads 3 and 7 are generating exactly the same numbers, forever! That's obviously bad. Guaranteeing this won't happen is hard; hash functions are designed to be hard to analyze. There are lots of subtle ways that the hash function could introduce some sort of bias, and no mathematical guarantees that it won't happen.

[conclusion]

In the end the sequence number is an optional feature, it's perfectly reasonable to use it or not. On the plus side it makes it possible to analyze the quality of the RNG results and extrapolate to multiple seed values. On the negative it is computationally expensive.

The state space of the generator is about 2^190. No matter how you setup the state, you will end up at 1 position on a long loop of 2^190 states. If you just had one thread, there's no problem. You pick a seed, use that seed to generate a starting state, and you're good to go.

When you have multiple threads you have to start thinking about how to get the threads working independently. The simplest way is for every thread to have a different seed. Each seed maps to some position in the 2^190 loop, so for 1 million threads you will end up with 1 million "cuts" of the loop. The distance between the cuts will be erratic, they will depend on weird interactions between the seeding algorithm and mathematical properties of the pseudorandom generator. There aren't any mathematical results about when this is safe or not. For any particular pattern of seeds you can run statistical tests and verify that it's OK. But when you choose a different pattern of seeds for the next experiment you don't have assurance that this new pattern will be OK.

The idea of the sequence number and the 2^67 spacing is that the "cuts" are evenly spaced. No matter which seed you pick, every thread will be spaced 2^67 apart. That way you can analyze the statistical properties of the generator once, and the results hold for ANY seed.

As an example, suppose you use seeds of 1, 2, 3, ..., n, and no sequence number. Suppose somehow the hash function that sets up the state from the seed value has a collision on 3 and 7. That means threads 3 and 7 are generating exactly the same numbers, forever! That's obviously bad. Guaranteeing this won't happen is hard; hash functions are designed to be hard to analyze. There are lots of subtle ways that the hash function could introduce some sort of bias, and no mathematical guarantees that it won't happen.

[conclusion]

In the end the sequence number is an optional feature, it's perfectly reasonable to use it or not. On the plus side it makes it possible to analyze the quality of the RNG results and extrapolate to multiple seed values. On the negative it is computationally expensive.

Matt

Matt