Hi! I'm totally new to CUDA. I've read the CUDA C Programming Guide (CUDA 4.0), and section 3.2.3 explains shared memory using matrix multiplication as the example. However, I don't understand how to use the stride efficiently.
This is the struct it uses:

// M(row, col) = *(M.elements + row * M.stride + col)
typedef struct {
    int width;
    int height;
    int stride;
    float* elements;
} Matrix;

Is there an example main that uses this struct together with the kernels from the guide? The SDK samples don't use this struct. I'm still a student, so I'd like something not too difficult to understand (without safeCall and the like) that uses the CUDA runtime API.

I don't know how to choose the parameters (width, height, BLOCK_SIZE...) well enough to get good performance on the GPU. My GPU has compute capability 2.1. If you need more information, don't hesitate to ask!

[quote name='Frstdies' date='21 June 2011 - 10:19 AM' timestamp='1308669580' post='1254783']
Hi! I'm totally new to CUDA. I've read the CUDA C Programming Guide (CUDA 4.0), and section 3.2.3 explains shared memory using matrix multiplication as the example. However, I don't understand how to use the stride efficiently.
[/quote]

I was trying to understand that example just two weeks ago, so I wrote a main function for it. It isn't a good sample for showing off CUDA's performance, but it should help you understand the example. Good luck!

[attachment=21480:matrix_shared.cu]