Host memory consumption spike with cudaMalloc

Hello,

I have a GeForce 960M chip in an Ubuntu 14.04 laptop.

The recursive function below allocates device memory. However, I see host memory usage spike from about 1.5 GB to 8 GB (confirmed by running free -g). nvidia-smi shows device memory usage increasing in parallel with host memory usage until it reaches 1.9 GB.

The host memory spike disappears if the cudaMalloc lines are commented out, while the recursion itself is left unchanged. This rules out recursion as the cause of the host memory spike.

------------------------------- CODE ----------------------------------------------

Node * copyTreeToGPU() {

    if (hasChildren) {

        // Copy children to GPU
        for (int i = 0; i < 8; i++) {
            d_ptr[i] = ptr[i]->copyTreeToGPU();
        }

        // Copy self
        Node *d_ptrtmp;
        CHECK(cudaMalloc((Node **) &d_ptrtmp, sizeof(Node)));
        //CHECK(cudaMemcpy(d_ptrtmp, this, sizeof(Node), cudaMemcpyHostToDevice));
        copiedToGPU = true;
        return d_ptrtmp;

    } else {

        // Copy self
        Node *d_ptrtmp;
        CHECK(cudaMalloc((Node **) &d_ptrtmp, sizeof(Node)));
        //CHECK(cudaMemcpy(d_ptrtmp, this, sizeof(Node), cudaMemcpyHostToDevice));
        copiedToGPU = true;
        return d_ptrtmp;

    }

} // copy from host to GPU