I am sure this is a very simple problem, but my brain does not want to work properly today. Given a square matrix and a vector, I would like to perform a matrix-vector operation using CUDA. I found some examples in the CUDA SDK for matrix multiplication where the dimensions of the matrices are multiples of a defined block size, but I would like a more general operation if possible. Are there any examples/tutorials for the multiplication of an [M-by-N] matrix by an [N-by-1] vector?

Thank you.

If you want to know about matrix multiplication for general dimensions, you can check this thread:
[url="http://forums.nvidia.com/index.php?showtopic=159033&hl=Hand-Tuned+SGEMM"]http://forums.nvidia.com/index.php?showtop...and-Tuned+SGEMM[/url]

I wrote a report on SGEMM and discuss how to extend SGEMM to arbitrary dimensions in section 8.

Lung Sheng Chien
Department of Mathematics, Tsing Hua University, R.O.C.

[quote name='LSChien' post='1034861' date='Apr 6 2010, 11:06 AM']If you want to know matrix multiplication on general dimension, you can check this thread
[url="http://forums.nvidia.com/index.php?showtopic=159033&hl=Hand-Tuned+SGEMM"]http://forums.nvidia.com/index.php?showtop...and-Tuned+SGEMM[/url]

I write a report on SGEMM and discuss how to extend SGEMM to arbitrary dimension in section 8[/quote]

Wow, great work.

Thank you.

[quote name='LSChien' post='1034861' date='Apr 6 2010, 11:06 AM']If you want to know matrix multiplication on general dimension, you can check this thread
[url="http://forums.nvidia.com/index.php?showtopic=159033&hl=Hand-Tuned+SGEMM"]http://forums.nvidia.com/index.php?showtop...and-Tuned+SGEMM[/url]

I write a report on SGEMM and discuss how to extend SGEMM to arbitrary dimension in section 8[/quote]

Another quick question: given that I am just trying to learn CUDA with regard to matrix-vector operations, is there a simple CUDA code for matrix-vector multiplication (it does not have to be all that efficient)? I like the code you built, but it seems like a bit much for what I need; I just want to learn the basics of matrix-vector multiplication on the GPU.

Thanks again.

[quote name='dinaharchery' post='1035030' date='Apr 6 2010, 11:55 AM']Another quick question. Given that I am just trying to really learn CUDA with regards to matrix-vector operation(s) is there a simple CUDA code for matrix-vector multiplication (does not have to be all that efficient)? I like the code you built but it seems like a bit much for what I need - I just want to learn the basics of a matrix-vector multiplication on GPU.

Thanks again.[/quote]

For matrix-vector multiplication, you can look at the reduction example in the SDK.
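For reference, a simple (not especially efficient) matrix-vector kernel along these lines can be written with one thread per row; this is a hypothetical minimal sketch, not SDK code, and the bounds check means M and N need not be multiples of the block size:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// y = A * x, where A is M-by-N (row-major) and x is N-by-1.
__global__ void matvec(const float *A, const float *x, float *y, int M, int N)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M) {                      // guard: M need not be a multiple of blockDim.x
        float sum = 0.0f;
        for (int j = 0; j < N; ++j)     // dot product of row 'row' with x
            sum += A[row * N + j] * x[j];
        y[row] = sum;
    }
}

int main()
{
    const int M = 5, N = 3;             // deliberately not multiples of the block size
    float hA[M * N], hx[N], hy[M];
    for (int i = 0; i < M * N; ++i) hA[i] = 1.0f;
    for (int j = 0; j < N; ++j)     hx[j] = 2.0f;

    float *dA, *dx, *dy;
    cudaMalloc(&dA, sizeof(hA)); cudaMalloc(&dx, sizeof(hx)); cudaMalloc(&dy, sizeof(hy));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dx, hx, sizeof(hx), cudaMemcpyHostToDevice);

    int block = 128;
    matvec<<<(M + block - 1) / block, block>>>(dA, dx, dy, M, N);
    cudaMemcpy(hy, dy, sizeof(hy), cudaMemcpyDeviceToHost);

    for (int i = 0; i < M; ++i) printf("y[%d] = %f\n", i, hy[i]);
    cudaFree(dA); cudaFree(dx); cudaFree(dy);
    return 0;
}
```

One thread per row keeps the reads of x coalesced-ish and the code trivial; the reduction example in the SDK shows the faster alternative, where many threads cooperate on each dot product.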
For matrix-matrix multiplication, matrixMul in the SDK uses shared memory but only works for specific dimensions.
You can try to extend matrixMul in the SDK to arbitrary dimensions.
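The usual way to extend the tiled scheme to arbitrary dimensions is to keep the same shared-memory tiling but have out-of-range threads load zeros and skip the final store. A hypothetical sketch of that idea (not the SDK code itself):

```cuda
#define TILE 16

// C (M-by-N) = A (M-by-K) * B (K-by-N), all row-major, any M, N, K.
__global__ void matmul(const float *A, const float *B, float *C,
                       int M, int N, int K)
{
    __shared__ float sA[TILE][TILE];
    __shared__ float sB[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // out-of-range threads load 0 instead of reading past the arrays
        sA[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        sB[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)
            sum += sA[threadIdx.y][k] * sB[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)             // threads past the edge write nothing
        C[row * N + col] = sum;
}
```

The zero-padding in shared memory means the inner loop needs no conditionals, so the common case stays as fast as the fixed-size version.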

If you want to know how to use registers instead of shared memory, then I strongly recommend Volkov's paper and his code; you can download them in this thread:
[url="http://forums.nvidia.com/index.php?showtopic=89084"]http://forums.nvidia.com/index.php?showtopic=89084[/url]

Here we shared thoughts, ideas, and some code on GEMV: [url="http://forums.nvidia.com/index.php?showtopic=162330&st=20"]http://forums.nvidia.com/index.php?showtop...62330&st=20[/url]

[quote name='Jimmy Pettersson' post='1036999' date='Apr 9 2010, 03:47 PM']Here we shared thoughts, ideas, and some code on GEMV: [url="http://forums.nvidia.com/index.php?showtopic=162330&st=20"]http://forums.nvidia.com/index.php?showtop...62330&st=20[/url][/quote]

Thanks very much for all the great information.

I have another question; I hope it is not too far off topic. Can Gaussian elimination be implemented in CUDA? I know that it involves a lot of back/forward substitution, which does not seem to leave much room for parallelism via CUDA, but it would be interesting to see.
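For what it is worth, the forward-elimination phase itself parallelizes reasonably well: for a fixed pivot column k, every row update below the pivot is independent, so each pivot step can be one kernel launch. It is the triangular back/forward substitution that stays largely sequential. A hypothetical, illustration-only sketch (no pivoting; all names here are invented):

```cuda
// Eliminate column k of the n-by-n system A x = b: each thread
// updates one row below the pivot, and all such updates are independent.
__global__ void eliminate(float *A, float *b, int n, int k)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x + k + 1; // rows below the pivot
    if (row < n) {
        float factor = A[row * n + k] / A[k * n + k];
        for (int j = k; j < n; ++j)
            A[row * n + j] -= factor * A[k * n + j];
        b[row] -= factor * b[k];
    }
}

// Host side: the pivot loop is sequential, one launch per pivot.
// for (int k = 0; k < n - 1; ++k)
//     eliminate<<<(n + 127) / 128, 128>>>(dA, db, n, k);
```

Note the lack of pivoting makes this numerically unsafe for general matrices; it is only meant to show where the parallelism lives.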
