NVCaffe docker memory leak when using pycaffe

We are using NVCaffe to train one of our networks because it has much better support for grouped convolutions and depthwise/pointwise convolutions. Thanks to the NVIDIA team for making this possible.

Sadly, when using pycaffe inside a container based on the latest nvcr.io/nvidia/caffe:18.04-py2 image, we are experiencing a memory leak.

We are using a server with three GTX 1080 cards; the nvidia-smi output is shown below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0 Off |                  N/A |
| 27%   28C    P0    39W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:03:00.0 Off |                  N/A |
| 27%   30C    P0    39W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   33C    P0    38W / 180W |      0MiB /  8119MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

We are running our training scripts directly inside the Docker container by first opening a bash shell and then launching the training manually at the prompt. Every few iterations we can clearly see the memory usage increase.
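For reference, the check we run between iterations boils down to something like the sketch below. This is our own illustration, not part of the image: the query_gpu_memory helper is a made-up name, and it simply parses the CSV output of nvidia-smi.

# Minimal sketch (not part of job.py): poll per-GPU memory between
# iterations by parsing `nvidia-smi --query-gpu=...` CSV output.
import subprocess

def query_gpu_memory():
    """Return a list of (gpu_index, used_MiB, total_MiB) tuples."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ])
    rows = []
    for line in out.decode("utf-8").strip().splitlines():
        index, used, total = [int(v) for v in line.split(",")]
        rows.append((index, used, total))
    return rows

if __name__ == "__main__":
    for index, used, total in query_gpu_memory():
        print("GPU %d: %d MiB / %d MiB used" % (index, used, total))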

We have put together a basic training example using pycaffe to replicate the memory leak.

In order to run this example using nvidia-docker you can do the following:

docker pull camerai/nvcr.io-nvidia-caffe-18.04-py2:mem_leak
nvidia-docker run -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 camerai/nvcr.io-nvidia-caffe-18.04-py2:mem_leak /bin/bash
cd mem_leak_test
python job.py <gpu_num>

This initialises a very simple CNN and enters a training loop in which a Python data layer loads a single image and label every time net.forward() is called.
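In essence, job.py does something like the sketch below. The layer class, prototxt name and blob shapes are placeholders to illustrate the structure; the real script lives inside the docker image, and its prototxt declares the data layer with type: "Python" pointing at a class like this one.

# Rough sketch of the reproduction script; file names, shapes and the
# layer class are placeholders, the actual job.py is in the image.
import sys
import numpy as np
import caffe

class SingleImageDataLayer(caffe.Layer):
    """Python data layer producing one image and one label per forward pass."""

    def setup(self, bottom, top):
        self.idx = 0  # position in the (placeholder) image list

    def reshape(self, bottom, top):
        top[0].reshape(1, 3, 224, 224)  # one image per call (placeholder shape)
        top[1].reshape(1, 1)            # one label per call

    def forward(self, bottom, top):
        # the real script reads an image from disk; random data stands in here
        top[0].data[...] = np.random.rand(1, 3, 224, 224).astype(np.float32)
        top[1].data[...] = np.random.randint(0, 2, size=(1, 1))
        self.idx += 1

    def backward(self, top, propagate_down, bottom):
        pass  # a data layer has nothing to back-propagate

if __name__ == "__main__":
    gpu_num = int(sys.argv[1])
    caffe.set_device(gpu_num)
    caffe.set_mode_gpu()

    # train.prototxt (placeholder name) declares the layer above via type: "Python"
    net = caffe.Net("train.prototxt", caffe.TRAIN)
    for it in range(1, 100001):
        net.forward()          # each call invokes the Python data layer once
        if it % 500 == 0:
            print("iteration: %d" % it)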

Example output:

iteration: 500
| ID | GPU | MEM |
------------------
|  0 | 33% |  7% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
iteration: 1000
| ID | GPU | MEM |
------------------
|  0 | 28% |  8% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |
iteration: 1500
| ID | GPU | MEM |
------------------
|  0 | 27% |  9% |
|  1 |  0% |  0% |
|  2 |  0% |  0% |

We have tried deleting the pycaffe solver object and then reloading the network from the last snapshot (which should free the net and blob data), but unfortunately this does not release any memory; usage keeps increasing until the job finally crashes with: Check failed: error == cudaSuccess (2 vs. 0) out of memory.
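Concretely, the workaround we attempted looks roughly like the sketch below. The solver prototxt and snapshot file names are placeholders standing in for our own prototxt and snapshot prefix.

# Sketch of the attempted workaround: periodically drop the solver and
# rebuild it from the most recent snapshot, hoping the old net and blobs
# are released. File names are placeholders. GPU memory still grows.
import caffe

caffe.set_device(0)
caffe.set_mode_gpu()

solver = caffe.SGDSolver("solver.prototxt")

for restart in range(10):
    for _ in range(500):
        solver.step(1)
    solver.snapshot()                      # writes <prefix>_iter_<N>.solverstate
    last_iter = solver.iter
    del solver                             # should free the net and blob data...
    solver = caffe.SGDSolver("solver.prototxt")
    solver.restore("snapshot_iter_%d.solverstate" % last_iter)  # placeholder prefix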

Any help or advice on how to debug or fix this would be greatly appreciated.