matrix multiplication for large matrices
Can anyone tell me how to multiply a 200000 x 200000 matrix by another 200000 x 200000 matrix using shared memory and tiling? The examples given in the programming guide and in CUDA by Example do not support matrices larger than 1024 x 1024. Or is it not possible to use shared memory with tiling for such large matrices? Is it necessary to launch the grid in the shape of the resultant matrix? Thanks in advance :)
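
For what it's worth, the tiled kernel from the programming guide needs only bounds checks to handle sizes that are not a multiple of the tile width, and yes, the grid is sized to cover the result matrix. A minimal sketch, with the tile width and kernel name chosen purely for illustration:

    // Tiled shared-memory multiply for arbitrary N (not just multiples
    // of the tile width). Square row-major matrices assumed.
    #define TILE 16

    __global__ void matMulTiled(const float *A, const float *B, float *C, int N)
    {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        // Walk across the K dimension one tile at a time.
        for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
            // Guarded loads: out-of-range elements are read as zero,
            // which is what removes the "multiple of the tile" restriction.
            int aCol = t * TILE + threadIdx.x;
            int bRow = t * TILE + threadIdx.y;
            As[threadIdx.y][threadIdx.x] = (row < N && aCol < N) ? A[row * N + aCol] : 0.0f;
            Bs[threadIdx.y][threadIdx.x] = (bRow < N && col < N) ? B[bRow * N + col] : 0.0f;
            __syncthreads();

            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }

        if (row < N && col < N)
            C[row * N + col] = acc;
    }

    // The grid covers the *result* matrix, one block per output tile:
    //   dim3 block(TILE, TILE);
    //   dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    //   matMulTiled<<<grid, block>>>(dA, dB, dC, N);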

#1
Posted 08/14/2011 09:01 AM   
The matrix product you are asking about requires about 480 GB of memory in single precision, 960 GB in double precision. I would be much more worried about how to do this on a device with a maximum of 6 GB of RAM than about any of the intricacies of the CUDA implementation.
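
Just to show where those numbers come from (assuming 4-byte floats and 8-byte doubles, and counting all three operands A, B, and C):

    // Quick check of the 480 GB / 960 GB figures.
    #include <cstdio>

    int main()
    {
        long long n = 200000LL;
        long long elems = n * n;                  // 4e10 elements per matrix
        double gbSingle = 3.0 * elems * 4 / 1e9;  // A, B and C in float
        double gbDouble = 3.0 * elems * 8 / 1e9;  // A, B and C in double
        std::printf("single: %.0f GB, double: %.0f GB\n", gbSingle, gbDouble);
        return 0;  // prints: single: 480 GB, double: 960 GB
    }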

#2
Posted 08/14/2011 02:49 PM   
Matrix multiplication is a blocked algorithm, is it not? So you can use streaming, although you have to stream chunks from the hard drive to main memory and then to the card.
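
Something like the following host-side sketch. The loadPanelToDevice/storeBlockFromDevice helpers are hypothetical stand-ins for the disk-to-host-to-device I/O, panels are assumed column-major to match cuBLAS, and the block size B is assumed to divide N evenly:

    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // Hypothetical helpers (not shown): read/write one B x B panel of the
    // named on-disk matrix and copy it to/from the given device buffer.
    void loadPanelToDevice(float *dPanel, const char *name, int i, int k, int B);
    void storeBlockFromDevice(const float *dPanel, const char *name, int i, int j, int B);

    // C is produced one B x B block at a time, accumulating over K panels
    // streamed from disk. Column-major storage assumed throughout.
    void outOfCoreGemm(cublasHandle_t h, int N, int B)
    {
        float *dA, *dB, *dC;  // one panel / block of each operand on the GPU
        cudaMalloc(&dA, (size_t)B * B * sizeof(float));
        cudaMalloc(&dB, (size_t)B * B * sizeof(float));
        cudaMalloc(&dC, (size_t)B * B * sizeof(float));
        const float one = 1.0f;

        for (int i = 0; i < N / B; ++i)        // block row of C
            for (int j = 0; j < N / B; ++j) {  // block column of C
                cudaMemset(dC, 0, (size_t)B * B * sizeof(float));
                for (int k = 0; k < N / B; ++k) {
                    loadPanelToDevice(dA, "A", i, k, B);  // disk -> host -> device
                    loadPanelToDevice(dB, "B", k, j, B);
                    // dC += dA * dB, accumulated across the K panels (beta = 1)
                    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, B, B, B,
                                &one, dA, B, dB, B, &one, dC, B);
                }
                storeBlockFromDevice(dC, "C", i, j, B);
            }

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
    }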

#3
Posted 08/22/2011 02:21 PM   
Of course, but the mechanics of that sort of out-of-core gemm implementation completely dwarf the minutiae of what goes on in the GPU. At that size, it would be folly to use anything other than CUBLAS or MagmaBLAS for the GPU gemm kernel.
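
For reference, the per-block GPU gemm then reduces to a single library call, something like this (column-major n x n operands already resident on the device):

    #include <cublas_v2.h>

    // C = A * B via cuBLAS; handle creation shown here for completeness,
    // though in practice you would reuse one handle across calls.
    void deviceGemm(int n, const float *dA, const float *dB, float *dC)
    {
        cublasHandle_t h;
        cublasCreate(&h);
        const float one = 1.0f, zero = 0.0f;
        cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &one, dA, n, dB, n, &zero, dC, n);
        cublasDestroy(h);
    }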

#4
Posted 08/22/2011 02:37 PM   