[question] simulation of strings, independent processes

Hello everybody,

I hope that the post is clear enough; English is not my mother tongue.
I have a question for the community, but before asking it I should probably introduce my problem (in simple terms).
In my work I need to simulate the pulling of a string that has some parameters. The string starts from a random
conformation and, by applying force, it reaches a straight conformation. I do this with a Monte Carlo simulation, and
the code works on the CPU (it takes several hours just for a single simulation using one core).

The point is the following: I need to simulate many strings (think on the order of 200) that have slightly different
properties, but I do not have access to a cluster of CPUs.

Each simulation is -->independent<-- of the others, so the cores/execution units do not have to talk to or synchronize
with each other, and the simulation doesn’t require a lot of memory either (500 kB per simulation is enough).
The code is roughly 200 lines; it is not extremely complicated, but it uses several “for” loops, “if” statements and mathematical
functions, plus random number generators.

I thought that, even if the GPU is slower at a single simulation, I could run several simulations in parallel,
which should mean a gain in time. Am I wrong?
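
Just to illustrate what I have in mind (this is only a sketch, not my actual code; the parameter, the trial move and the energy are placeholders), I imagine something like one thread per simulation, each with its own cuRAND state:

```cpp
#include <cuda_runtime.h>
#include <curand_kernel.h>

__global__ void run_simulations(const float *stiffness,   // one placeholder parameter per string
                                float *final_extension,   // one result per string
                                int n_sims, int n_steps,
                                unsigned long long seed)
{
    int sim = blockIdx.x * blockDim.x + threadIdx.x;
    if (sim >= n_sims) return;

    // independent random number stream for each simulation
    curandState rng;
    curand_init(seed, sim, 0, &rng);

    float extension = 0.0f;                                // placeholder chain state
    for (int step = 0; step < n_steps; ++step) {
        float trial = extension + 0.1f * (curand_uniform(&rng) - 0.5f);
        float dE    = stiffness[sim] * (trial * trial - extension * extension);
        // Metropolis acceptance (temperature folded into the parameter for brevity)
        if (dE <= 0.0f || curand_uniform(&rng) < expf(-dE))
            extension = trial;
    }
    final_extension[sim] = extension;
}

// launch, e.g.: run_simulations<<<(200 + 63) / 64, 64>>>(d_stiff, d_ext, 200, n_steps, 1234ULL);
```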

What do you think, is it worth porting it to the GPU?

Any opinion is welcome.

200 independent simulations are not many from the GPU perspective - GPUs prefer having tens to hundreds of thousands of threads running. So you will have to extract more parallelism from each individual simulation. This should not be too hard, however; parallelizing the outermost for loop will likely be good enough.

A question just out of interest, if I may: Why are you using a Monte Carlo simulation? Aren’t there faster ways of simulating dissipation? Is the element of randomness needed to get the right sound in an ensemble of instruments?

Hello,

I am running an MC simulation for an N-body problem. At each MC step I move one particle. For the new energy, N interactions are calculated, and this is done on the GPU. Because my system is not so large it does not fill the GPU, but because I need many measurements, I use streams to simulate independent configurations and get more statistics.
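
Roughly, the stream part looks like this (just a sketch; mc_sweep is only a stand-in for my real energy-update kernel):

```cpp
#include <cuda_runtime.h>

__global__ void mc_sweep(float *state, int n_particles)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_particles)
        state[i] += 0.0f;                 // placeholder for the real move / energy update
}

void run_configurations(float **d_states, int n_configs, int n_particles)
{
    const int n_streams = 4;              // a handful of streams is usually enough
    cudaStream_t streams[n_streams];
    for (int i = 0; i < n_streams; ++i)
        cudaStreamCreate(&streams[i]);

    int threads = 128;
    int blocks  = (n_particles + threads - 1) / threads;

    // kernels issued on different streams may overlap when each one is
    // too small to fill the GPU on its own
    for (int c = 0; c < n_configs; ++c)
        mc_sweep<<<blocks, threads, 0, streams[c % n_streams]>>>(d_states[c], n_particles);

    for (int i = 0; i < n_streams; ++i) {
        cudaStreamSynchronize(streams[i]);
        cudaStreamDestroy(streams[i]);
    }
}
```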

You are saying that it is not worth it unless I squeeze out all the parallelism the GPU offers, did I get you correctly?

About the simulation, I will try to answer your question; however, I don’t know if I will be able to give you the answer that you want.

The idea is that at each timestep you apply a force F(t), and this force increases with each timestep.

At the same time the chain attempts a random change of conformation (well, not completely random; let’s say that a segment of the chain can change direction in space).

The chain is in fact a polymer at a certain temperature, so each segment moves (and the chain accordingly), with some constraints.

The change of energy (DeltaE) is calculated considering the applied force and the change of extension of the chain, and this DeltaE is compared against a random number for the acceptance.
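
In rough pseudo-code, the acceptance test looks like this (just a sketch of the idea; the trial move and the handling of kT are simplified, and the names are placeholders):

```cpp
#include <cmath>
#include <cstdlib>

bool accept_move(double force,          // F(t), grows with the timestep
                 double old_extension,
                 double new_extension,
                 double delta_internal, // energy change of the chain itself
                 double kT)
{
    // work done by the pulling force lowers the energy when the chain extends
    double deltaE = delta_internal - force * (new_extension - old_extension);

    if (deltaE <= 0.0)
        return true;                           // always accept downhill moves

    double r = (double)rand() / RAND_MAX;      // uniform random number in [0,1]
    return r < std::exp(-deltaE / kT);         // Metropolis criterion
}
```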

If you can parallelize enough to get at least a few blocks, you can fill the GPU. It is not going to achieve the maximum performance, but for me it gave enough of a speedup to use it for production runs.

You don’t need to squeeze out all the parallelism, but you will likely need more than just the ~200 independent simulations. The prospects are quite bright, though: using one block of between 64 and 1024 threads for each simulation should be easy (threads within a block can fully communicate with each other and run in sync), and then you are right at the number of threads you need.
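
As a minimal sketch of what I mean (the per-segment energy and the array layout are placeholders, not your actual model): blockIdx.x picks the string, and the threads of the block cooperate on the energy sum:

```cpp
#include <cuda_runtime.h>

__global__ void chain_energy(const float *angles,   // n_sims * n_segments segment angles
                             float       *energy,   // one value per simulation
                             int          n_segments)
{
    extern __shared__ float partial[];              // blockDim.x floats of shared memory

    int sim = blockIdx.x;                           // which of the ~200 strings
    const float *my_angles = angles + sim * n_segments;

    // each thread sums a strided subset of the segments
    float local = 0.0f;
    for (int s = threadIdx.x; s < n_segments; s += blockDim.x)
        local += cosf(my_angles[s]);                // placeholder segment energy

    partial[threadIdx.x] = local;
    __syncthreads();

    // standard shared-memory tree reduction within the block
    // (assumes blockDim.x is a power of two)
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        energy[sim] = partial[0];
}

// launch with e.g. <<<200, 256, 256 * sizeof(float)>>> for 200 strings
```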

Thanks for the explanation!