Simple Matrix-Vector Multiplication

Hello all,

I am sure this is a very simple problem, but my brain does not want to work properly today. Given a square matrix and a vector, I would like to perform a matrix-vector multiplication using CUDA. I found some examples in the CUDA SDK for matrix multiplication where the dimensions of the matrices are multiples of a defined block size, but I would like a more generalized operation if possible. Are there any examples/tutorials for multiplying an [M-by-N] matrix by an [N-by-1] vector?

Thank you.

If you want to know how to do matrix multiplication on general dimensions, you can check my earlier thread on the NVIDIA forums.

I wrote a report on SGEMM and discuss how to extend SGEMM to arbitrary dimensions in section 8.

Wow, great work.

Thank you

Another quick question. Given that I am just trying to learn CUDA with regard to matrix-vector operations, is there a simple CUDA code for matrix-vector multiplication (it does not have to be all that efficient)? I like the code you built, but it seems like a bit much for what I need - I just want to learn the basics of matrix-vector multiplication on the GPU.

Thanks again.

For matrix-vector multiplication, you can look at the reduction example in the SDK: each element of the output vector is a dot product of one matrix row with the vector, and a dot product is just a reduction.
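To make the basic idea concrete, here is a minimal sketch of a naive y = A*x kernel - one thread per output row, each thread computing one dot product. This is not SDK code; the kernel name matVec and all the parameter choices below are made up for illustration. It works for arbitrary M and N because threads whose row index falls outside the matrix simply do nothing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Naive matrix-vector multiply: y = A * x, where A is M-by-N, row-major.
// One thread computes one element of y; no shared memory, not optimized.
__global__ void matVec(const float *A, const float *x, float *y, int M, int N)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M) {                       // extra threads past the last row do nothing
        float sum = 0.0f;
        for (int j = 0; j < N; ++j)
            sum += A[row * N + j] * x[j];
        y[row] = sum;
    }
}

int main()
{
    const int M = 100, N = 37;           // deliberately not multiples of the block size
    float *A = new float[M * N], *x = new float[N], *y = new float[M];
    for (int i = 0; i < M * N; ++i) A[i] = 1.0f;
    for (int j = 0; j < N; ++j)     x[j] = 1.0f;

    float *dA, *dx, *dy;
    cudaMalloc(&dA, M * N * sizeof(float));
    cudaMalloc(&dx, N * sizeof(float));
    cudaMalloc(&dy, M * sizeof(float));
    cudaMemcpy(dA, A, M * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, x, N * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (M + threads - 1) / threads;   // round up so every row is covered
    matVec<<<blocks, threads>>>(dA, dx, dy, M, N);

    cudaMemcpy(y, dy, M * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f (expected %d)\n", y[0], N);

    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    delete[] A; delete[] x; delete[] y;
    return 0;
}
```

With all entries set to 1.0, every element of y should equal N, which makes it easy to check the result before moving on to faster versions.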

For matrix-matrix multiplication, matrixMul in the SDK uses shared memory but only works when the dimensions are multiples of the block size.

You can try to extend matrixMul in the SDK to arbitrary dimensions.
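As a sketch of that extension (this is not the SDK code itself; matMulGeneral and TILE are hypothetical names), the usual trick is to keep the shared-memory tiling but guard every global load and store with a bounds check, padding out-of-range tile elements with zeros:

```cuda
#define TILE 16

// Shared-memory tiled C = A * B for arbitrary sizes (A: M-by-K, B: K-by-N,
// all row-major). Same tiling idea as the SDK's matrixMul, but out-of-range
// elements load as zero, so M, N, K need not be multiples of TILE.
__global__ void matMulGeneral(const float *A, const float *B, float *C,
                              int M, int K, int N)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Guard the loads: pad with zeros outside the matrices.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)              // guard the store as well
        C[row * N + col] = sum;
}
```

A launch would use dim3 block(TILE, TILE) and dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE), so partial tiles at the right and bottom edges are covered. The zero padding costs nothing numerically since the extra products contribute zero to the sum.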

If you want to know how to use registers instead of shared memory, then I strongly recommend Volkov's paper and his code; you can download them in this thread:

http://forums.nvidia.com/index.php?showtopic=89084

Here we shared thoughts, ideas, and some code on GEMV: http://forums.nvidia.com/index.php?showtop...62330&st=20

Thanks very much for all the great information.

I have another question - I hope it is not too far off topic. Can Gaussian elimination be implemented on CUDA? I know that it contains a lot of back/forward substitutions, which don't seem to leave a lot of room for parallelism via CUDA, but it would be interesting to see.
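For what it's worth, here is one hedged sketch of how the elimination phase could be parallelized (the kernel name eliminateColumn is made up, and there is no pivoting or numerical-stability handling at all). The outer loop over pivots is inherently sequential, but within each step every row below the pivot can be updated independently, which is where the parallelism lives; the back-substitution afterwards is mostly sequential and might as well stay on the CPU.

```cuda
// One elimination step of Gaussian elimination on the system A*x = b:
// eliminate column `pivot` from every row below the pivot row.
// A is n-by-n, row-major. Each thread owns one row, so there are no races:
// threads only read the pivot row and write their own row.
__global__ void eliminateColumn(float *A, float *b, int n, int pivot)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x + pivot + 1;
    if (row < n) {
        // Assumes a nonzero pivot; real code needs (partial) pivoting.
        float factor = A[row * n + pivot] / A[pivot * n + pivot];
        for (int col = pivot; col < n; ++col)
            A[row * n + col] -= factor * A[pivot * n + col];
        b[row] -= factor * b[pivot];
    }
}

// Host-side driver (sketch): the loop over pivots stays sequential,
// launching one kernel per elimination step.
//
// for (int p = 0; p < n - 1; ++p) {
//     int rows   = n - p - 1;
//     int blocks = (rows + 127) / 128;
//     eliminateColumn<<<blocks, 128>>>(dA, db, n, p);
// }
```

Note that the amount of parallel work shrinks as the pivot advances, so this is far from optimal, but it shows that the elimination itself is not as sequential as it first looks.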
