NPP tutorials?

Dear someone,
I hope this is the right forum to ask: are there any good blogs or tutorials (PDF or video) on using the NPP primitives for image processing? My own googling has only produced the PDF on NPP primitives. Any links are greatly appreciated.

There are some resources on the NVIDIA GTC website, such as this one:

[url]http://on-demand.gputechconf.com/gtc/2014/presentations/HANDS-ON-LAB-S4793-image-processing-using-npp.pdf[/url]

Here’s a sample NPP search on the GTC website:

[url]http://on-demand-gtc.gputechconf.com/gtc-quicklink/eepQUou[/url]

Disclaimer: opinion only.

In my experience, all the NVIDIA libraries included in the SDK (cuBLAS, cuSPARSE, and the open source Thrust) are well documented, reliable, and fast, with the exception of NPP.

IMO you will spend less time learning to write your own convolution and filtering kernels in CUDA than figuring out how to use NPP.
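To give an idea of what "your own convolution kernel" might look like, here is a minimal sketch, not a tuned implementation. The names `convolve2D` and `convolveOnGpu` are made up for illustration and are not part of NPP or any NVIDIA library. It assumes a single-channel float image in row-major order, clamps reads at the borders, and does not flip the filter (so strictly speaking it is a correlation, which is identical for symmetric filters such as box or Gaussian).

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Naive 2D filter: one thread per output pixel, global memory only.
// Hypothetical helper for illustration; no shared-memory tiling or other tuning.
__global__ void convolve2D(const float* in, float* out, int width, int height,
                           const float* filter, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // threads outside the image do nothing

    int fsize = 2 * radius + 1;
    float sum = 0.0f;
    for (int fy = -radius; fy <= radius; ++fy) {
        int sy = min(max(y + fy, 0), height - 1);      // clamp to top/bottom edge
        for (int fx = -radius; fx <= radius; ++fx) {
            int sx = min(max(x + fx, 0), width - 1);   // clamp to left/right edge
            sum += in[sy * width + sx] * filter[(fy + radius) * fsize + (fx + radius)];
        }
    }
    out[y * width + x] = sum;
}

// Host wrapper: copies data to the GPU, runs the kernel, copies the result back.
std::vector<float> convolveOnGpu(const std::vector<float>& image, int width, int height,
                                 const std::vector<float>& filter, int radius)
{
    size_t imgBytes = image.size() * sizeof(float);
    size_t filterBytes = filter.size() * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr, *dFilter = nullptr;
    cudaMalloc(&dIn, imgBytes);
    cudaMalloc(&dOut, imgBytes);
    cudaMalloc(&dFilter, filterBytes);
    cudaMemcpy(dIn, image.data(), imgBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dFilter, filter.data(), filterBytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    convolve2D<<<grid, block>>>(dIn, dOut, width, height, dFilter, radius);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));

    std::vector<float> result(image.size());
    cudaMemcpy(result.data(), dOut, imgBytes, cudaMemcpyDeviceToHost);  // blocks until the kernel is done
    cudaFree(dIn);
    cudaFree(dOut);
    cudaFree(dFilter);
    return result;
}
```

As a quick sanity check, calling `convolveOnGpu(img, 4, 4, box, 1)` with a 4x4 image of ones and a 3x3 box filter whose entries are all 1/9 should return values of approximately 1.0 everywhere, since the clamped borders keep every neighbourhood constant.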

On top of that, by their own admission (in that PDF link) the library is only 5-10 times faster than a single-core CPU, which is just terrible. If you search GitHub you will find open source CUDA projects with 2D and 3D routines (like 3D Gaussian blur, separable convolutions, etc.) which are about 10-20x faster than NPP and about 100x faster than a CPU implementation.

Granted, it is a complicated library and needs to be compatible with industry standards, but this might be a good opportunity to “roll your own” CUDA code for image processing.
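As one example of rolling your own, the separable convolution mentioned above can be sketched as two 1D passes. A filter of radius r applied as a full 2D window costs (2r+1)^2 multiply-adds per pixel, while a separable filter run as a row pass followed by a column pass costs 2(2r+1). The code below is only a sketch under assumptions: `convolve1D` and `separableBlur` are hypothetical names, the image is single-channel float in row-major order, borders are clamped, and nothing is tuned (no shared or constant memory).

```cuda
#include <cuda_runtime.h>
#include <vector>

// One 1D pass of a separable filter.
// (stepX, stepY) = (1, 0) filters along rows, (0, 1) filters along columns.
__global__ void convolve1D(const float* in, float* out, int width, int height,
                           const float* taps, int radius, int stepX, int stepY)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int k = -radius; k <= radius; ++k) {
        int sx = min(max(x + k * stepX, 0), width - 1);   // clamp horizontally
        int sy = min(max(y + k * stepY, 0), height - 1);  // clamp vertically
        sum += in[sy * width + sx] * taps[k + radius];
    }
    out[y * width + x] = sum;
}

// Hypothetical host wrapper: row pass into a scratch buffer, then column pass back.
// taps must have odd length (2 * radius + 1).
std::vector<float> separableBlur(const std::vector<float>& image, int width, int height,
                                 const std::vector<float>& taps)
{
    int radius = static_cast<int>(taps.size()) / 2;
    size_t imgBytes = image.size() * sizeof(float);
    size_t tapBytes = taps.size() * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dTaps = nullptr;
    cudaMalloc(&dA, imgBytes);
    cudaMalloc(&dB, imgBytes);
    cudaMalloc(&dTaps, tapBytes);
    cudaMemcpy(dA, image.data(), imgBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dTaps, taps.data(), tapBytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    // Both launches go to the default stream, so the column pass starts
    // only after the row pass has finished.
    convolve1D<<<grid, block>>>(dA, dB, width, height, dTaps, radius, 1, 0);  // rows
    convolve1D<<<grid, block>>>(dB, dA, width, height, dTaps, radius, 0, 1);  // columns

    std::vector<float> result(image.size());
    cudaMemcpy(result.data(), dA, imgBytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dTaps);
    return result;
}
```

For instance, `separableBlur(img, 5, 5, {0.25f, 0.5f, 0.25f})` on an image that is zero except for a single 1 at the centre should spread that impulse into a 3x3 patch with 0.25 at the centre, 0.125 on the four edge neighbours, and 0.0625 on the diagonals. A faster version would typically stage tiles in shared memory, which is left out here to keep the sketch short.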

How did you conclude single core from that PDF link?

The performance slide (5) makes a comparison to Intel IPP on a 6-core Intel processor, and IPP is specifically referred to by Intel as a “pre-threaded” library. It is natively a multi-threaded, multi-core capable library.

thanks, typical of me to want to use the least documented library then…

Missed the multi-core qualifier for the CPU comparison, sorry about that.

I guess a point I was trying to make was that image processing is an ideal choice for GPUs, and this category of algorithm is a good starting point for learning CUDA.

It may be that NPP does suit your specific use case.