cublasDgetrfBatched and cublasDgetriBatched

Never mind, I am slowly figuring it all out!

Here is a fully worked example showing how to invert a matrix using these functions:

http://stackoverflow.com/questions/22887167/cublas-incorrect-inversion-for-matrix-with-zero-pivot

The example was originally written to demonstrate a bug, but the code is functional and should produce correct results if you are using CUDA 6, which fixes that bug.
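
Here is a minimal sketch of the same two-step sequence. It is my own code, not the code from the linked answer, and it assumes a toolkit recent enough that the cuBLAS headers declare the const-qualified batched signatures. cublasDgetrfBatched LU-factorises every matrix in place with partial pivoting and fills a device pivot array and a device info array. cublasDgetriBatched then builds each inverse from those factors into a separate output array, because the output is not allowed to overlap the input. The helper name invert_batched, the test matrices and the build line are illustrative assumptions. The second matrix has a zero in the leading position, which is the zero-pivot case from the linked question.

```cuda
// Build (assumed): nvcc batched_inverse.cu -lcublas -o batched_inverse
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Abort with a message if a CUDA runtime or cuBLAS call fails.
#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error '%s' at line %d\n", cudaGetErrorString(e_), __LINE__); exit(1); } } while (0)
#define CUBLAS_CHECK(call) do { cublasStatus_t s_ = (call); if (s_ != CUBLAS_STATUS_SUCCESS) { \
    fprintf(stderr, "cuBLAS error %d at line %d\n", (int)s_, __LINE__); exit(1); } } while (0)

// Hypothetical helper: inverts `batch` column-major n x n matrices stored back
// to back in hA and writes the inverses to hAinv in the same layout.
// Returns 0 on success, or k+1 if getrf reported a non-zero info for matrix k.
int invert_batched(const double *hA, double *hAinv, int n, int batch)
{
    const size_t elems = (size_t)n * n;
    double *dA, *dC;              // contiguous device storage: inputs / inverses
    double **dAarray, **dCarray;  // device arrays holding one pointer per matrix
    int *dPivots, *dInfo;
    CUDA_CHECK(cudaMalloc(&dA, sizeof(double) * elems * batch));
    CUDA_CHECK(cudaMalloc(&dC, sizeof(double) * elems * batch));
    CUDA_CHECK(cudaMalloc(&dAarray, sizeof(double *) * batch));
    CUDA_CHECK(cudaMalloc(&dCarray, sizeof(double *) * batch));
    CUDA_CHECK(cudaMalloc(&dPivots, sizeof(int) * n * batch));
    CUDA_CHECK(cudaMalloc(&dInfo, sizeof(int) * batch));
    CUDA_CHECK(cudaMemcpy(dA, hA, sizeof(double) * elems * batch, cudaMemcpyHostToDevice));

    // The batched API takes an array of *device* pointers, one per matrix.
    std::vector<double *> hAptr(batch), hCptr(batch);
    for (int i = 0; i < batch; ++i) {
        hAptr[i] = dA + i * elems;
        hCptr[i] = dC + i * elems;
    }
    CUDA_CHECK(cudaMemcpy(dAarray, hAptr.data(), sizeof(double *) * batch, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dCarray, hCptr.data(), sizeof(double *) * batch, cudaMemcpyHostToDevice));

    cublasHandle_t handle;
    CUBLAS_CHECK(cublasCreate(&handle));

    // Step 1: in-place LU factorisation with partial pivoting, P*A = L*U.
    CUBLAS_CHECK(cublasDgetrfBatched(handle, n, dAarray, n, dPivots, dInfo, batch));
    std::vector<int> hInfo(batch);
    CUDA_CHECK(cudaMemcpy(hInfo.data(), dInfo, sizeof(int) * batch, cudaMemcpyDeviceToHost));
    int status = 0;
    for (int i = 0; i < batch && status == 0; ++i)
        if (hInfo[i] != 0) status = i + 1;   // info > 0 means U(info,info) is exactly zero

    // Step 2: inverse from the LU factors. Output (dCarray) must not alias the input.
    if (status == 0) {
        CUBLAS_CHECK(cublasDgetriBatched(handle, n, dAarray, n, dPivots, dCarray, n, dInfo, batch));
        CUDA_CHECK(cudaMemcpy(hAinv, dC, sizeof(double) * elems * batch, cudaMemcpyDeviceToHost));
    }

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dC); cudaFree(dAarray); cudaFree(dCarray);
    cudaFree(dPivots); cudaFree(dInfo);
    return status;
}

int main()
{
    const int n = 3, batch = 2;
    // Two column-major 3x3 test matrices (assumed data): a diagonal matrix and a
    // row-swap matrix whose leading entry is zero, so pivoting is required.
    const double hA[batch * n * n] = {2, 0, 0,  0, 4, 0,  0, 0, 5,
                                      0, 1, 0,  1, 0, 0,  0, 0, 1};
    double hAinv[batch * n * n];
    int status = invert_batched(hA, hAinv, n, batch);
    if (status != 0) { printf("matrix %d is singular\n", status - 1); return 1; }

    // Host-side check: largest deviation of A * inv(A) from the identity.
    for (int b = 0; b < batch; ++b) {
        const double *A = hA + b * n * n, *X = hAinv + b * n * n;
        double maxErr = 0.0;
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double s = 0.0;
                for (int k = 0; k < n; ++k) s += A[i + k * n] * X[k + j * n];
                maxErr = std::fmax(maxErr, std::fabs(s - (i == j ? 1.0 : 0.0)));
            }
        printf("matrix %d: max |A*inv(A) - I| = %g\n", b, maxErr);
    }
    return 0;
}
```

Because both functions take arrays of device pointers, one call processes the whole batch, which is what amortises the launch overhead when you have many small matrices.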

As described in the documentation:
[url]http://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-getrfbatched[/url]
"This function is intended to be used for matrices of small sizes where the launch overhead is a significant factor."

So I think there may be more efficient methods for inverting a single large matrix, but I am not an expert.
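
For a single large matrix, one option I'm aware of (assuming CUDA 7.0 or later, where the cuSOLVER library ships with the toolkit) is the dense, non-batched LU in cuSOLVER: factorise once with cusolverDnDgetrf, then solve A X = I with cusolverDnDgetrs. This is a swapped-in library, not something the cuBLAS page quoted above describes, and I have not benchmarked it against the batched route. The helper name invert_dense, the test matrix and the build line are illustrative assumptions.

```cuda
// Build (assumed): nvcc dense_inverse.cu -lcusolver -o dense_inverse
#include <cstdio>
#include <cstdlib>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Abort with a message if a CUDA runtime or cuSOLVER call fails.
#define CUDA_CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error '%s' at line %d\n", cudaGetErrorString(e_), __LINE__); exit(1); } } while (0)
#define CUSOLVER_CHECK(call) do { cusolverStatus_t s_ = (call); if (s_ != CUSOLVER_STATUS_SUCCESS) { \
    fprintf(stderr, "cuSOLVER error %d at line %d\n", (int)s_, __LINE__); exit(1); } } while (0)

// Hypothetical helper: inverts one column-major n x n matrix by LU-factorising
// it (getrf) and solving A * X = I (getrs). Returns getrf's info value:
// 0 on success, i > 0 if U(i,i) is exactly zero (A is singular).
int invert_dense(const double *hA, double *hAinv, int n)
{
    const size_t bytes = sizeof(double) * n * n;
    std::vector<double> hI((size_t)n * n, 0.0);
    for (int i = 0; i < n; ++i) hI[i + (size_t)i * n] = 1.0;   // identity right-hand side

    double *dA, *dB, *dWork;
    int *dPiv, *dInfo, lwork = 0, info = 0;
    CUDA_CHECK(cudaMalloc(&dA, bytes));
    CUDA_CHECK(cudaMalloc(&dB, bytes));
    CUDA_CHECK(cudaMalloc(&dPiv, sizeof(int) * n));
    CUDA_CHECK(cudaMalloc(&dInfo, sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, hI.data(), bytes, cudaMemcpyHostToDevice));

    cusolverDnHandle_t handle;
    CUSOLVER_CHECK(cusolverDnCreate(&handle));
    CUSOLVER_CHECK(cusolverDnDgetrf_bufferSize(handle, n, n, dA, n, &lwork));
    CUDA_CHECK(cudaMalloc(&dWork, sizeof(double) * lwork));

    // One LU factorisation of the whole matrix; A is overwritten by L and U.
    CUSOLVER_CHECK(cusolverDnDgetrf(handle, n, n, dA, n, dWork, dPiv, dInfo));
    CUDA_CHECK(cudaMemcpy(&info, dInfo, sizeof(int), cudaMemcpyDeviceToHost));
    if (info == 0) {
        // n right-hand sides at once: the columns of I become the columns of inv(A).
        CUSOLVER_CHECK(cusolverDnDgetrs(handle, CUBLAS_OP_N, n, n, dA, n, dPiv, dB, n, dInfo));
        CUDA_CHECK(cudaMemcpy(hAinv, dB, bytes, cudaMemcpyDeviceToHost));
    }

    cusolverDnDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dWork); cudaFree(dPiv); cudaFree(dInfo);
    return info;
}

int main()
{
    const int n = 3;
    // Column-major 3x3 example matrix (assumed data, determinant 9).
    const double hA[n * n] = {4, 3, 2,  7, 6, 5,  2, 1, 3};
    double hAinv[n * n];
    int info = invert_dense(hA, hAinv, n);
    if (info != 0) { printf("getrf info = %d (singular or bad argument)\n", info); return 1; }

    // Host-side check: largest deviation of A * inv(A) from the identity.
    double maxErr = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int k = 0; k < n; ++k) s += hA[i + k * n] * hAinv[k + j * n];
            maxErr = std::fmax(maxErr, std::fabs(s - (i == j ? 1.0 : 0.0)));
        }
    printf("max |A*inv(A) - I| = %g\n", maxErr);
    return 0;
}
```

If what you actually need is inv(A) * b, passing b to getrs directly gives the same result without forming the inverse, which is usually cheaper and more accurate.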