Out of memory message trying to run cnn network benchmark

Hi,

I am running the TensorFlow CNN benchmark using the command:

root@7e8e9113fa85:/workspace/nvidia-examples/cnn# python3 nvcnn.py --model=vgg19 --batch_size=256 --num_gpus=3

I have increased the shared-memory (SHMEM) allocation limit, but the system still throws the message below:
“2018-04-07 18:46:39.083434: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.”

Also, I need to know how to monitor GPU performance while running this benchmark. I have tried nvidia-smi, but I get the message “Invalid combination of input arguments”.

Could you please help with both issues?

Thanks,

VS

Could you please answer the following questions to help us debug this:

  1. Does the training run fine with batch size 128?
  2. Can you please share your docker run command?
  3. Can you please share the nvidia-smi command you are using, complete with the arguments?
  4. Also, just curious, but why did you decide to use 3 GPUs?

Here is the requested info:

  1. Does the training run fine with batch size 128?
    For 128 batch size / 1 GPU, the training runs fine.

For 128 batch size / 2 or 3 GPUs, it throws the message:
Initializing variables
Unexpected end of /proc/mounts line overlay / overlay rw,relatime,lowerdir=/var/lib/docker/overlay2/l/QMKMRAUCQ5JIVHP5GZQ3O7PCNS:/var/lib/docker/overlay2/l/Q5QRVQX2RIASGB3ON73STZWNSK:/var/lib/docker/overlay2/l/IAW4NQQBQZR3TMI6CCN5ZPILUJ:/var/lib/docker/overlay2/l/MEJKNJMDXSN2R25Q5W52QE45NW:/var/lib/docker/overlay2/l/V6MYOSGX4HD65LUB6S7PGTNGSR:/var/lib/docker/overlay2/l/UYHWSUNQ3D4UPO7UNXJHSRJXOA:/var/lib/docker/overlay2/l/FSMRQZIRD3QBN5T3TSPUXKAXQG:/var/lib/docker/overlay2/l/OW7H4HIAG66BSVO7LLGQ4MHQAM:/var/lib/docker/overlay2/l/P7R6YUXGD3O2A' Unexpected end of /proc/mounts line DTO7NF5OKQ76B:/var/lib/docker/overlay2/l/EDPAZPDQQGCI4A4NNDTNHBAQ2G:/var/lib/docker/overlay2/l/JNMAGLPUR2TK6QMNOZZXPH6C7B:/var/lib/docker/overlay2/l/N2TY3YVRIWD4EY4Z2PY3FGRFVQ:/var/lib/docker/overlay2/l/GJHVR3Q2VUZ7AAYZSMHTLR34HV:/var/lib/docker/overlay2/l/SMJOSBRISKTVURIT6SISVSDRXH:/var/lib/docker/overlay2/l/ZFJJ4777GN4XN7W6TSOOMOQFOZ:/var/lib/docker/overlay2/l/TCRRKFLD623SVIYVRYAHER7QKQ:/var/lib/docker/overlay2/l/DNTV366AGJ3C7OMR7WUE2WKQIN:/var/lib/docker/overlay2/l/V4MDBKGPSABUTYIHEFGSGQBINO:/var/lib/do’
Unexpected end of /proc/mounts line cker/overlay2/l/YN24UX4ROKGCWZS5QWPC4XGOVF:/var/lib/docker/overlay2/l/4GT7YCEOSLRQTKBNDUF4R6XCBE:/var/lib/docker/overlay2/l/6YUC5Z5NFMK4MMMLJ6BTLWEHII:/var/lib/docker/overlay2/l/2F4Y53MVPEAZBLUOU42YYO23VW:/var/lib/docker/overlay2/l/2VU36ALVHULZCQYR3NPTTPZG46:/var/lib/docker/overlay2/l/GZCTL2S2PBAE5WNHMBNREGA2TL:/var/lib/docker/overlay2/l/L2FNH3FRBSZD3KBVKZ7BV546IJ:/var/lib/docker/overlay2/l/2PFOSJ3DCMADGLMDNKPVMS2RBS:/var/lib/docker/overlay2/l/C2YR3XE2I2HF4GZG3G6B4V3VDK:/var/lib/docker/overlay2/l/JFUC65Q3A' Unexpected end of /proc/mounts line XVYU73NL7OJA3LLYV:/var/lib/docker/overlay2/l/KNHA4RWXLTKWZORPD724J5EECA:/var/lib/docker/overlay2/l/NS7TDD3SL4MCUPTWXPEHPCWNPI:/var/lib/docker/overlay2/l/XON36A36EXXGDHZAINQY7ULQ7P:/var/lib/docker/overlay2/l/PHB6AFJAFONRSRAEHPAUMDRJIJ:/var/lib/docker/overlay2/l/B4AELY72WPCNDSFDJUG5FN3ZFY:/var/lib/docker/overlay2/l/NJMK23DUY77D563JSYADNDZSNC:/var/lib/docker/overlay2/l/3QLD57BN2XZXAUP3DW6Q4JNP7D:/var/lib/docker/overlay2/l/HQYUONMSPBWLBFDE47OZ3UGGJ5,upperdir=/var/lib/docker/overlay2/176879c32e21d4c3d9b46e3e196’

For 256 batch size / 1, 2, or 3 GPUs, it throws the message:
2018-04-07 21:14:15.848048: W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

For 512 batch size / 1 GPU, it throws the message:
No results, did not get past burn-in phase (20 steps)
Out of memory error detected, exiting

  2. Can you please share your docker run command?
    Here is an example:
    root@c06440ad6425:/workspace/nvidia-examples/cnn# python3 nvcnn.py --model=vgg19 --batch_size=256 --num_gpus=1

  3. Can you please share the nvidia-smi command you are using, complete with the arguments?
    nvidia-smi python3 nvcnn.py --model=vgg19 --batch_size=256 --num_gpus=1

  4. Also, just curious, but why did you decide to use 3 GPUs?
    Sure, I am benchmarking servers that have more than one GPU. Please let me know if there are other scripts more suitable for benchmarking these kinds of servers.

Warnings that start with “Unexpected end of /proc/mounts line overlay / overlay …” are harmless and can be ignored. We have a fix coming that will prevent those messages from being printed.

What type of GPUs are you using? P3 Instance on AWS or something else?

I did not see an ‘nvidia-docker run …’ command in your answers above. Are you using NGC containers? This forum is for support of NGC users running NGC containers on supported platforms.
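
For reference, an NGC TensorFlow container is usually launched with something along these lines (a sketch only; the shared-memory and ulimit values shown are the commonly recommended ones, so adjust them to your setup):

    nvidia-docker run -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorflow:18.03-py3

On the monitoring question: nvidia-smi does not take another command as an argument, which is why “nvidia-smi python3 nvcnn.py …” reports an invalid combination of arguments. It is normally run in a second terminal while the benchmark is going, for example (illustrative options, not the only ones):

    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,power.draw --format=csv -l 1

or

    nvidia-smi dmon -s pucm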

What about these messages that mention a possible performance gain:
“W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 3.45GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.”

Or this one, which did not give any result at all for 512 batch size / 1 GPU:
“No results, did not get past burn-in phase (20 steps)
Out of memory error detected, exiting”

I am using private servers with Tesla P40 GPUs, and I am using NGC containers (the nvidia-docker TensorFlow image). Here is the run command: nvidia-docker run -it nvcr.io/nvidia/tensorflow:18.03-py3

The script used was nvcnn.py v1.4. Please let me know if there are other scripts/tools more suitable for benchmarking these kinds of servers. These are the metrics I need to benchmark: images/sec, images/watt, latency, accuracy.

Thanks,

VS