I have two problems that have been confusing me and keeping me stuck for days:
- As the title says, I ran a performance test on my NVIDIA Tesla M2090. The only thing I did was create a new object inside the kernel, nothing more, and the result I got was terrible, which really surprised me. Here's my code:
// Simplified stand-ins for the project types used below, so the snippet
// is self-contained (real64/int32 are aliases; Point2D and TRTree are cut down here):
typedef double real64;
typedef int    int32;
struct Point2D { real64 x, y; };
template <class IdT, class CoordT, int NDims> class TRTree; // forward declaration only

template <class T>
class TArray
{
protected:
    int m_Size;
    int m_Capacity;
    int m_GrowBy;
    T*  m_pData;
public:
    __host__ __device__ TArray(int growby)
    {
        m_Size = 0;      // empty to start
        m_Capacity = 0;  // remember the size
        m_GrowBy = growby;
        m_pData = NULL;
    }
};
struct Rect2D
{
    real64 left;
    real64 top;
    real64 right;
    real64 bottom;
};
class GERect2D : public Rect2D
{
public:
    __host__ __device__ GERect2D() {
        left = 0;
        top = 0;
        right = 0;
        bottom = 0;
    }
};
class GELineString
{
protected:
    GERect2D* m_pMBR;
    TArray<Point2D>* m_parrPoint;
    bool m_bContain;
    bool m_bRTree;
    TRTree<int32, real64, 2>* m_pRT;
public:
    __host__ __device__ GELineString(int32 growby = 100) {
        m_parrPoint = new TArray<Point2D>(growby);
        m_pMBR = new GERect2D();
        m_bContain = false;
        m_bRTree = false;
        m_pRT = NULL;
    }
};
// My kernel: every thread allocates and frees 1000 objects on the device heap
__global__ void kernel_BuildTree() {
    int tid = threadIdx.x + blockIdx.x * blockDim.x; // thread index (unused in this test)
    GELineString* pLs = NULL;
    for (int i = 0; i < 1000; i++)
    {
        pLs = new GELineString(17);
        delete pLs;
    }
}
int main() {
    kernel_BuildTree<<<1, 128>>>();
    cudaDeviceSynchronize();
    return 0;
}
The result when I ran it, which I never expected, was about 18 seconds, while the same loop takes less than 1 millisecond on the CPU. Moreover, when I increased the loop count to 10,000 (just 10 times the original 1,000), the time was not 180 seconds (10 times 18 seconds) as I expected; it was about 10 minutes.
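In case the way I measured matters: I believe the standard way to time only the kernel is with CUDA events, roughly like this (a sketch around the launch above; the event calls are from the CUDA runtime API):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    kernel_BuildTree<<<1, 128>>>();   // the kernel from above
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);       // wait until the kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}
```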
I've read a lot about warp divergence, occupancy, and the GPU memory hierarchy, but I still can't explain this, which is very confusing. Maybe there is a limitation when using the heap on the GPU?
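To make that part of the question concrete: as far as I understand, in-kernel new/malloc allocate from a separate device-side heap whose default size is only 8 MB, and it can be queried and resized from the host with cudaDeviceSetLimit before launching any kernel that uses it. A minimal sketch (the 128 MB below is just an arbitrary example value, not a recommendation):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    size_t heapBytes = 0;
    cudaDeviceGetLimit(&heapBytes, cudaLimitMallocHeapSize);
    printf("device heap: %zu bytes\n", heapBytes); // 8 MB by default

    // Must be called before launching kernels that use new/malloc;
    // 128 MB is only an illustrative value.
    cudaDeviceSetLimit(cudaLimitMallocHeapSize, (size_t)128 * 1024 * 1024);
    return 0;
}
```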
- I want to port the R-tree structure to the GPU. I've read a lot of papers about R-trees implemented on GPUs, but all of them seem to be based on arrays, not pointers. So maybe pointers are not well suited to the GPU? Do you know of any library for pointer-based tree structures on the GPU?
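For reference, my understanding of the "array-based" approach in those papers is that all nodes live in one flat array and children are referenced by integer indices instead of pointers, so the whole tree stays valid after a single cudaMemcpy to device memory. A minimal host-side sketch of what I mean, with made-up names (Node, sumTree), not taken from any specific library:

```cpp
#include <vector>
#include <cstdio>

// Index-based tree node: children are array indices, -1 means "no child".
struct Node {
    int left;   // index of left child in the nodes array, or -1
    int right;  // index of right child, or -1
    int value;
};

// Walk the tree by following indices instead of chasing pointers.
// The same traversal works unchanged in a __device__ function, because
// the indices stay valid after the array is copied to GPU memory.
int sumTree(const std::vector<Node>& nodes, int root) {
    if (root < 0) return 0;
    const Node& n = nodes[root];
    return n.value + sumTree(nodes, n.left) + sumTree(nodes, n.right);
}

int main() {
    // node 0 (value 1) has children node 1 (value 2) and node 2 (value 3)
    std::vector<Node> nodes = {
        {1, 2, 1},   // node 0
        {-1, -1, 2}, // node 1
        {-1, -1, 3}, // node 2
    };
    printf("%d\n", sumTree(nodes, 0)); // prints 6
    return 0;
}
```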
I hope you can help!