Handling an array of structs on the device
Good morning,

I'm starting out in CUDA. My problem uses a struct "Point":

struct Point {
    double x, y, z, radius;
    Point* neighbors[NUM_VIZINHOS];
    Point* possiveisVizinhos[NUM_POSSIVEISVIZINHOS];
};

I can copy / access / change / return a single "Point" when sending it to a kernel,
but I cannot do the same with an array of "Point"s using the following code.

const int size = nPoints * sizeof(Point);

cudaMalloc((void**)&arrayPoints_d, size);
cudaMalloc((void**)&arrayAux_d, nPoints * nPoints * sizeof(int));
cudaMemcpy(arrayPoints_d, arrayPoint, size, cudaMemcpyHostToDevice);
kernel<<<blocks, threads>>>(arrayPoints_d, nPoints, arrayAux_d);
cudaMemcpy(arrayAux, arrayAux_d, nPoints * nPoints * sizeof(int), cudaMemcpyDeviceToHost);

__global__ void kernel(Point* arrayPoints, int nPoints, int* arrayAux) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    double value = 0;

    if (x < nPoints) {
        for (int j = 0; j < nPoints; ++j) {
            value = arrayPoints[x].radius;
            arrayAux[j + x * nPoints] = value;
        }
    }
}

I cannot access the struct data in the case above.

Please don't bother fixing simple errors in the code, since I translated it. I'd like to know whether you can send an array of structs to the device, and if possible see an example of how to allocate / copy / modify / return such an array. Thank you!
#1
Posted 04/30/2012 12:29 PM   
Hi,
I don't quite understand the following:
[quote name='Kr0n' date='30 April 2012 - 01:29 PM' timestamp='1335788945' post='1402577']
Do not repair simple errors in code because I translated the code.
[/quote]Is it that you just don't want to know what's wrong in your code, or was this code automatically translated from a different language and you don't care whether the translation is correct?

And yes, you can allocate, send, use and retrieve an array of structs on and from the device, basically like this:
[code]
myStruct *h_array = (myStruct*)malloc(nbElem * sizeof(myStruct));
myStruct *d_array;
cudaMalloc((void**)&d_array, nbElem * sizeof(myStruct));
cudaMemcpy(d_array, h_array, nbElem * sizeof(myStruct), cudaMemcpyHostToDevice);
myKernel<<<blocks, threads>>>(d_array, nbElem);
cudaMemcpy(h_array, d_array, nbElem * sizeof(myStruct), cudaMemcpyDeviceToHost);
[/code]
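To complete the sketch, a kernel operating on such an array could look like the following (myStruct, its member "value", and the names here are made up for illustration; plain member access works once the array is in device memory):
[code]
struct myStruct { double value; };

__global__ void myKernel(myStruct *d_array, int nbElem)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < nbElem)
        d_array[i].value *= 2.0;  // ordinary member access on device memory
}
[/code]
Note that this only works cleanly when the struct contains no pointers into host memory.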
#2
Posted 04/30/2012 12:57 PM   
Hi gilles_c,

Sorry for the translation! I'll post the original code, since only the variable names are in Portuguese.

After some cleanup, my problem is accessing the data of my array of structs inside the kernel.

The condition in bold does not return what I expect: it never evaluates to true, even though the file is loaded correctly on the host. Even with a simpler test, such as "vetorDePontos[x].xmin == 2", the result is not what I expect. Is the way I access the data correct? Please take a look. Thanks in advance.

struct Ponto{
double x, y, z, raio;
Ponto* vizinhos[NUM_VIZINHOS];
Ponto* possiveisVizinhos[NUM_POSSIVEISVIZINHOS];
int quantidadeVizinhos, qtdPossiveisVizinhos;
int xmin, xmax, ymin, ymax, zmin, zmax;
int quadranteX, quadranteY, quadranteZ;
};

__global__ void localizaPossiveisVizinhosQuadrante_d(Ponto* vetorDePontos, int nbPontos, int* vetorAux){
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;

    while(x < nbPontos){
        for(int j = 0; j < nbPontos; ++j){
            if(x != j){
                if( [b]( vetorDePontos[j].quadranteX <= vetorDePontos[x].xmax ) && ( vetorDePontos[j].quadranteX >= vetorDePontos[x].xmin ) &&
                    ( vetorDePontos[j].quadranteY <= vetorDePontos[x].ymax ) && ( vetorDePontos[j].quadranteY >= vetorDePontos[x].ymin ) &&
                    ( vetorDePontos[j].quadranteZ <= vetorDePontos[x].zmax ) && ( vetorDePontos[j].quadranteZ >= vetorDePontos[x].zmin )[/b] )
                {
                    vetorAux[j + x * nbPontos] = 1;
                }
                else
                    vetorAux[j + x * nbPontos] = 2;
            }
            else
                vetorAux[j + x * nbPontos] = 3;
        }
        //vetorAux[x] = 1;
        x += blockDim.x * gridDim.x;
    }
}


int main(){
    adicionaPontos(ptr, "10coordenadas20ComRaioPadrao.txt");

    const int size = nbPontos * sizeof(Ponto);

    Ponto* vetorDePontos_d;
    int* vetorAux_d;
    int* vetorAux = new int[nbPontos * nbPontos];

    cudaMalloc((void**)&vetorDePontos_d, nbPontos * sizeof(Ponto));
    cudaMalloc((void**)&vetorAux_d, nbPontos * nbPontos * sizeof(int));
    cudaMemcpy(vetorDePontos_d, vetorDePontos, size, cudaMemcpyHostToDevice);
    localizaPossiveisVizinhosQuadrante_d<<<blocks, threads>>>(vetorDePontos_d, nbPontos, vetorAux_d);

    cudaMemcpy(vetorAux, vetorAux_d, nbPontos * nbPontos * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(vetorDePontos_d);
    cudaFree(vetorAux_d);
}
#3
Posted 04/30/2012 11:58 PM   
Hi,
From what I can see in your snippet, the code looks OK. However, there are a few potential issues that the lack of the corresponding code makes impossible to assess:
[list]
[*]No error checking: it might happen that your allocations simply fail, or (the most likely explanation) that your kernel fails, either from some pre-launch error or during its run. Please add the necessary error-checking mechanism.
[*]No main array allocation and initialisation: are you sure you allocated and initialised your array "vetorDePontos" properly on the host side? See especially the next point...
[*]Some dodgy member pointers: the member pointers "vizinhos" and "possiveisVizinhos" are most likely allocated on the host, pointing to host memory. If you transfer your array of structs like that and try to dereference those pointers on the device, you may get in trouble (or not, depending on the compute capability of the device, which you don't mention).
[*]No information about thread and block sizes: here again, the sizes of your blocks and grid are very important pieces of information missing for assessing the validity of your code. Maybe you ask for too many blocks or too many threads... And see also the next remark.
[*]Some possible race conditions depending on the use of the y dimension: your kernel defines the index "y" corresponding to a 2nd dimension in your thread blocks, but since you only use "x" to index the work, threads that differ only in their "y" index will compete to access the same memory areas. In the code snippet you gave, that should only translate into poor performance, but if the real code is more complex, it might lead to undefined results.
[/list]
This is just what comes to mind as possible issues / explanations for the unwanted behaviour you encounter. Bottom line: check the errors returned by CUDA calls first. In addition, a run through the memory checker and the debugger should give you some useful information.
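A minimal error-checking wrapper could look like this (the macro name and usage are my own, not from the poster's code):
[code]
#include <cstdio>
#include <cstdlib>

#define CUDA_CHECK(call)                                          \
    do {                                                          \
        cudaError_t err = (call);                                 \
        if (err != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",          \
                    __FILE__, __LINE__, cudaGetErrorString(err)); \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Usage:
//   CUDA_CHECK(cudaMalloc((void**)&vetorDePontos_d, nbPontos * sizeof(Ponto)));
//   kernel<<<blocks, threads>>>(...);
//   CUDA_CHECK(cudaGetLastError());      // catches launch-configuration errors
//   CUDA_CHECK(cudaDeviceSynchronize()); // catches errors during the kernel run
[/code]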
HTH
#4
Posted 05/01/2012 05:53 AM   
Hi Gilles,

I found two mistakes thanks to your tips:

- Through error checking I found that "const int size = nbPontos * sizeof(Point)" was not being accepted as a valid argument to cudaMemcpy. I removed the variable and passed the expression directly, which fixed the error.
- Since you assured me the device code was correct, I looked for mistakes in the old code that compares the results, and found a mistake in how a parameter was passed.

That's all for now. Thanks for your attention; it was a great help for this novice!
#5
Posted 05/01/2012 03:50 PM   