To promote the use of CUDA for more than machine learning and image processing, I am starting a series of blog posts showing how to convert well-known algorithms into hybrid CPU/GPU CUDA implementations.
The first in the series covers the discrete knapsack problem, with an implementation that returns all items used to generate the optimal result:
[url]https://sites.google.com/site/cudadiscreteknapsack/[/url]
It is not the most elegant implementation, but it is a good starting point for those new to CUDA. On a well-configured Titan X it runs more than 25 times faster than a serial, single-threaded CPU implementation on a 4.5 GHz processor.
This is not an ‘embarrassingly parallel’ algorithm, and it is interesting to work out how to break the problem down into portions that can be mapped to a GPU.
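For anyone who wants a feel for one possible decomposition before reading the full write-up, here is a minimal sketch. It is not the implementation from the linked site. It assumes the 0/1 variant with non-negative integer weights and uses the standard dynamic-programming recurrence V[i][c] = max(V[i-1][c], V[i-1][c - w_i] + v_i), where the second term applies only when c >= w_i. Row i depends on row i-1, so items are processed one kernel launch at a time, but every capacity c within a row is independent and gets its own thread. The GPU fills the table and the CPU backtracks through it to recover the chosen items, which is one plausible CPU/GPU split. The names `knapsackRowKernel`, `knapsackGpu` and `CUDA_CHECK` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Minimal error check for the sketch: print the CUDA error and abort.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Hypothetical kernel: computes one DP row, cur[c] = best value using the
// items seen so far with capacity c. Each c depends only on the previous
// row, so one thread handles one capacity.
__global__ void knapsackRowKernel(const int* prev, int* cur,
                                  int weight, int value, int capacity) {
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c > capacity) return;
    int best = prev[c];                          // option 1: skip the item
    if (c >= weight) {
        int take = prev[c - weight] + value;     // option 2: take the item
        if (take > best) best = take;
    }
    cur[c] = best;
}

// Hypothetical host driver (assumes non-negative weights and capacity).
// Items are processed sequentially, capacities in parallel. Returns the
// optimal value and writes the 0-based indices of chosen items to `chosen`.
int knapsackGpu(const std::vector<int>& weights, const std::vector<int>& values,
                int capacity, std::vector<int>& chosen) {
    const int n = static_cast<int>(weights.size());
    const size_t rowLen = static_cast<size_t>(capacity) + 1;
    const size_t bytes = (static_cast<size_t>(n) + 1) * rowLen * sizeof(int);

    int* dTable = nullptr;
    CUDA_CHECK(cudaMalloc(&dTable, bytes));
    CUDA_CHECK(cudaMemset(dTable, 0, rowLen * sizeof(int)));  // row 0: no items

    const int threads = 256;
    const int blocks = static_cast<int>((rowLen + threads - 1) / threads);
    for (int i = 1; i <= n; ++i) {
        // Launches on the default stream execute in order, so row i-1 is
        // complete before row i starts.
        knapsackRowKernel<<<blocks, threads>>>(dTable + (i - 1) * rowLen,
                                               dTable + i * rowLen,
                                               weights[i - 1], values[i - 1],
                                               capacity);
        CUDA_CHECK(cudaGetLastError());
    }

    // Copy the full table back (cudaMemcpy waits for the kernels) and
    // backtrack on the CPU to recover the chosen items.
    std::vector<int> table((static_cast<size_t>(n) + 1) * rowLen);
    CUDA_CHECK(cudaMemcpy(table.data(), dTable, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dTable));

    chosen.clear();
    int c = capacity;
    for (int i = n; i >= 1; --i) {
        // A change between rows means item i-1 was taken at this capacity.
        if (table[i * rowLen + c] != table[(i - 1) * rowLen + c]) {
            chosen.push_back(i - 1);
            c -= weights[i - 1];
        }
    }
    std::reverse(chosen.begin(), chosen.end());
    return table[n * rowLen + capacity];
}
```

For example, with weights {1, 3, 4, 5}, values {1, 4, 5, 7} and capacity 7, `knapsackGpu` should return 9 with items {1, 2} chosen. Keeping the whole (n+1) × (W+1) table on the device keeps the backtracking simple but costs O(nW) memory; a larger-scale version might store only a take/skip bit per cell instead.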