W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at pack_op.cc:88 : Resource exhausted: OOM when allocating tensor with shape[32,3,2160,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

2019-11-28 12:13:31.886802: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ************************************_______________________________________________****____*********
2019-11-28 12:13:31.886830: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at fused_batch_norm_op.cc:574 : Resource exhausted: OOM when allocating tensor with shape[4,64,1080,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 632, in main
  File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 490, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 136, in run_training_loop
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 676, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,64,1080,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node resnet50_nopool_bn_detectnet_v2/bn_conv1/FusedBatchNorm (defined at /usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1839) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[node resnet50_nopool_bn_detectnet_v2/block_2b_bn_2/AssignMovingAvg_1 (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:186) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op u'resnet50_nopool_bn_detectnet_v2/bn_conv1/FusedBatchNorm', defined at:
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 632, in main
  File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 466, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 320, in build_training_graph
  File "./detectnet_v2/model/detectnet_model.py", line 470, in build_training_graph
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 564, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 721, in run_internal_graph
    layer.call(computed_tensor, **kwargs))
  File "/opt/nvidia/third_party/keras/mixed_precision.py", line 181, in _batch_normalization_call
    epsilon=self.epsilon)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1864, in normalize_batch_in_training
    epsilon=epsilon)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1839, in _fused_normalize_batch_in_training
    data_format=tf_data_format)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1182, in fused_batch_norm
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 3756, in _fused_batch_norm
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4,64,1080,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node resnet50_nopool_bn_detectnet_v2/bn_conv1/FusedBatchNorm (defined at /usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1839) ]]

I have gone through https://github.com/tensorflow/models/issues/1993 and, following it, reduced batch_size_per_gpu from 16 to 2. But I am still getting the same error after reducing the batch size.
DetectNet_V2_train_config.txt (9.21 KB)
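
For reference, a rough size check of the two tensors named in the OOM messages above. This is only a sketch: it assumes "type float" means 4-byte float32, and the helper name tensor_bytes is made up for the illustration, not part of TLT or TensorFlow.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log is 32-bit
GIB = 1024 ** 3


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed for one dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


# Shapes copied from the OOM messages in the log.
shapes = {
    "pack_op": (32, 3, 2160, 4096),
    "bn_conv1/FusedBatchNorm": (4, 64, 1080, 2048),
}

for name, shape in shapes.items():
    size = tensor_bytes(shape)
    print("%s %s: %d bytes (%.2f GiB)" % (name, shape, size, size / float(GIB)))

# Cost of the bn_conv1 output for a single 4096x2160 image.
per_image = tensor_bytes((1, 64, 1080, 2048))
print("bn_conv1 per image: %.2f GiB" % (per_image / float(GIB)))
```

Each figure is for one tensor only. The training graph keeps many activations alive at once, so the per-image cost at this input resolution is already large before it is multiplied by the batch size.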

Hi samjith888,
Could you paste the command along with the full log?

[code]root@3b0b9c604317:/home/samjth/NVIDIA_Transfer_Learning _Toolkit# tlt-train detectnet_v2 -e specs/train_config.txt -r result -k KEY
Using TensorFlow backend.
2019-11-29 04:18:55.397607: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-29 04:18:55.477263: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-29 04:18:55.477696: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x698d520 executing computations on platform CUDA. Devices:
2019-11-29 04:18:55.477720: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2019-11-29 04:18:55.479598: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-11-29 04:18:55.480523: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x6aa9c10 executing computations on platform Host. Devices:
2019-11-29 04:18:55.480554: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-11-29 04:18:55.480685: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7465
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.67GiB
2019-11-29 04:18:55.480715: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-29 04:18:55.481234: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-29 04:18:55.481249: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-11-29 04:18:55.481256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-11-29 04:18:55.481311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7461 MB memory) → physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-11-29 04:18:55,482 [INFO] iva.detectnet_v2.scripts.train: Loading experiment spec at specs/train_config.txt.
2019-11-29 04:18:55,483 [INFO] iva.detectnet_v2.spec_handler.spec_loader: Merging specification from specs/train_config.txt
WARNING:tensorflow:From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
2019-11-29 04:18:55,494 [WARNING] tensorflow: From ./detectnet_v2/dataloader/utilities.py:114: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and:
tf.data.TFRecordDataset(path)
2019-11-29 04:18:55,552 [INFO] iva.detectnet_v2.scripts.train: Cannot iterate over exactly 1562 samples with a batch size of 4; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-11-29 04:18:55,558 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
2019-11-29 04:18:55,570 [WARNING] tensorflow: From /usr/local/lib/python2.7/dist-packages/horovod/tensorflow/__init__.py:91: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.


Layer (type) Output Shape Param # Connected to

input_1 (InputLayer) (None, 3, 2160, 4096 0


conv1 (Conv2D) (None, 64, 1080, 204 9472 input_1[0][0]


bn_conv1 (BatchNormalization) (None, 64, 1080, 204 256 conv1[0][0]


activation_1 (Activation) (None, 64, 1080, 204 0 bn_conv1[0][0]


block_1a_conv_1 (Conv2D) (None, 64, 540, 1024 4160 activation_1[0][0]


block_1a_bn_1 (BatchNormalizati (None, 64, 540, 1024 256 block_1a_conv_1[0][0]


activation_2 (Activation) (None, 64, 540, 1024 0 block_1a_bn_1[0][0]


block_1a_conv_2 (Conv2D) (None, 64, 540, 1024 36928 activation_2[0][0]


block_1a_bn_2 (BatchNormalizati (None, 64, 540, 1024 256 block_1a_conv_2[0][0]


activation_3 (Activation) (None, 64, 540, 1024 0 block_1a_bn_2[0][0]


block_1a_conv_3 (Conv2D) (None, 256, 540, 102 16640 activation_3[0][0]


block_1a_conv_shortcut (Conv2D) (None, 256, 540, 102 16640 activation_1[0][0]


block_1a_bn_3 (BatchNormalizati (None, 256, 540, 102 1024 block_1a_conv_3[0][0]


block_1a_bn_shortcut (BatchNorm (None, 256, 540, 102 1024 block_1a_conv_shortcut[0][0]


add_1 (Add) (None, 256, 540, 102 0 block_1a_bn_3[0][0]
block_1a_bn_shortcut[0][0]


activation_4 (Activation) (None, 256, 540, 102 0 add_1[0][0]


block_1b_conv_1 (Conv2D) (None, 64, 540, 1024 16448 activation_4[0][0]


block_1b_bn_1 (BatchNormalizati (None, 64, 540, 1024 256 block_1b_conv_1[0][0]


activation_5 (Activation) (None, 64, 540, 1024 0 block_1b_bn_1[0][0]


block_1b_conv_2 (Conv2D) (None, 64, 540, 1024 36928 activation_5[0][0]


block_1b_bn_2 (BatchNormalizati (None, 64, 540, 1024 256 block_1b_conv_2[0][0]


activation_6 (Activation) (None, 64, 540, 1024 0 block_1b_bn_2[0][0]


block_1b_conv_3 (Conv2D) (None, 256, 540, 102 16640 activation_6[0][0]


block_1b_bn_3 (BatchNormalizati (None, 256, 540, 102 1024 block_1b_conv_3[0][0]


add_2 (Add) (None, 256, 540, 102 0 block_1b_bn_3[0][0]
activation_4[0][0]


activation_7 (Activation) (None, 256, 540, 102 0 add_2[0][0]


block_1c_conv_1 (Conv2D) (None, 64, 540, 1024 16448 activation_7[0][0]


block_1c_bn_1 (BatchNormalizati (None, 64, 540, 1024 256 block_1c_conv_1[0][0]


activation_8 (Activation) (None, 64, 540, 1024 0 block_1c_bn_1[0][0]


block_1c_conv_2 (Conv2D) (None, 64, 540, 1024 36928 activation_8[0][0]


block_1c_bn_2 (BatchNormalizati (None, 64, 540, 1024 256 block_1c_conv_2[0][0]


activation_9 (Activation) (None, 64, 540, 1024 0 block_1c_bn_2[0][0]


block_1c_conv_3 (Conv2D) (None, 256, 540, 102 16640 activation_9[0][0]


block_1c_bn_3 (BatchNormalizati (None, 256, 540, 102 1024 block_1c_conv_3[0][0]


add_3 (Add) (None, 256, 540, 102 0 block_1c_bn_3[0][0]
activation_7[0][0]


activation_10 (Activation) (None, 256, 540, 102 0 add_3[0][0]


block_2a_conv_1 (Conv2D) (None, 128, 270, 512 32896 activation_10[0][0]


block_2a_bn_1 (BatchNormalizati (None, 128, 270, 512 512 block_2a_conv_1[0][0]


activation_11 (Activation) (None, 128, 270, 512 0 block_2a_bn_1[0][0]


block_2a_conv_2 (Conv2D) (None, 128, 270, 512 147584 activation_11[0][0]


block_2a_bn_2 (BatchNormalizati (None, 128, 270, 512 512 block_2a_conv_2[0][0]


activation_12 (Activation) (None, 128, 270, 512 0 block_2a_bn_2[0][0]


block_2a_conv_3 (Conv2D) (None, 512, 270, 512 66048 activation_12[0][0]


block_2a_conv_shortcut (Conv2D) (None, 512, 270, 512 131584 activation_10[0][0]


block_2a_bn_3 (BatchNormalizati (None, 512, 270, 512 2048 block_2a_conv_3[0][0]


block_2a_bn_shortcut (BatchNorm (None, 512, 270, 512 2048 block_2a_conv_shortcut[0][0]


add_4 (Add) (None, 512, 270, 512 0 block_2a_bn_3[0][0]
block_2a_bn_shortcut[0][0]


activation_13 (Activation) (None, 512, 270, 512 0 add_4[0][0]


block_2b_conv_1 (Conv2D) (None, 128, 270, 512 65664 activation_13[0][0]


block_2b_bn_1 (BatchNormalizati (None, 128, 270, 512 512 block_2b_conv_1[0][0]


activation_14 (Activation) (None, 128, 270, 512 0 block_2b_bn_1[0][0]


block_2b_conv_2 (Conv2D) (None, 128, 270, 512 147584 activation_14[0][0]


block_2b_bn_2 (BatchNormalizati (None, 128, 270, 512 512 block_2b_conv_2[0][0]


activation_15 (Activation) (None, 128, 270, 512 0 block_2b_bn_2[0][0]


block_2b_conv_3 (Conv2D) (None, 512, 270, 512 66048 activation_15[0][0]


block_2b_bn_3 (BatchNormalizati (None, 512, 270, 512 2048 block_2b_conv_3[0][0]


add_5 (Add) (None, 512, 270, 512 0 block_2b_bn_3[0][0]
activation_13[0][0]


activation_16 (Activation) (None, 512, 270, 512 0 add_5[0][0]


block_2c_conv_1 (Conv2D) (None, 128, 270, 512 65664 activation_16[0][0]


block_2c_bn_1 (BatchNormalizati (None, 128, 270, 512 512 block_2c_conv_1[0][0]


activation_17 (Activation) (None, 128, 270, 512 0 block_2c_bn_1[0][0]


block_2c_conv_2 (Conv2D) (None, 128, 270, 512 147584 activation_17[0][0]


block_2c_bn_2 (BatchNormalizati (None, 128, 270, 512 512 block_2c_conv_2[0][0]


activation_18 (Activation) (None, 128, 270, 512 0 block_2c_bn_2[0][0]


block_2c_conv_3 (Conv2D) (None, 512, 270, 512 66048 activation_18[0][0]


block_2c_bn_3 (BatchNormalizati (None, 512, 270, 512 2048 block_2c_conv_3[0][0]


add_6 (Add) (None, 512, 270, 512 0 block_2c_bn_3[0][0]
activation_16[0][0]


activation_19 (Activation) (None, 512, 270, 512 0 add_6[0][0]


block_2d_conv_1 (Conv2D) (None, 128, 270, 512 65664 activation_19[0][0]


block_2d_bn_1 (BatchNormalizati (None, 128, 270, 512 512 block_2d_conv_1[0][0]


activation_20 (Activation) (None, 128, 270, 512 0 block_2d_bn_1[0][0]


block_2d_conv_2 (Conv2D) (None, 128, 270, 512 147584 activation_20[0][0]


block_2d_bn_2 (BatchNormalizati (None, 128, 270, 512 512 block_2d_conv_2[0][0]


activation_21 (Activation) (None, 128, 270, 512 0 block_2d_bn_2[0][0]


block_2d_conv_3 (Conv2D) (None, 512, 270, 512 66048 activation_21[0][0]


block_2d_bn_3 (BatchNormalizati (None, 512, 270, 512 2048 block_2d_conv_3[0][0]


add_7 (Add) (None, 512, 270, 512 0 block_2d_bn_3[0][0]
activation_19[0][0]


activation_22 (Activation) (None, 512, 270, 512 0 add_7[0][0]


block_3a_conv_1 (Conv2D) (None, 256, 135, 256 131328 activation_22[0][0]


block_3a_bn_1 (BatchNormalizati (None, 256, 135, 256 1024 block_3a_conv_1[0][0]


activation_23 (Activation) (None, 256, 135, 256 0 block_3a_bn_1[0][0]


block_3a_conv_2 (Conv2D) (None, 256, 135, 256 590080 activation_23[0][0]


block_3a_bn_2 (BatchNormalizati (None, 256, 135, 256 1024 block_3a_conv_2[0][0]


activation_24 (Activation) (None, 256, 135, 256 0 block_3a_bn_2[0][0]


block_3a_conv_3 (Conv2D) (None, 1024, 135, 25 263168 activation_24[0][0]


block_3a_conv_shortcut (Conv2D) (None, 1024, 135, 25 525312 activation_22[0][0]


block_3a_bn_3 (BatchNormalizati (None, 1024, 135, 25 4096 block_3a_conv_3[0][0]


block_3a_bn_shortcut (BatchNorm (None, 1024, 135, 25 4096 block_3a_conv_shortcut[0][0]


add_8 (Add) (None, 1024, 135, 25 0 block_3a_bn_3[0][0]
block_3a_bn_shortcut[0][0]


activation_25 (Activation) (None, 1024, 135, 25 0 add_8[0][0]


block_3b_conv_1 (Conv2D) (None, 256, 135, 256 262400 activation_25[0][0]


block_3b_bn_1 (BatchNormalizati (None, 256, 135, 256 1024 block_3b_conv_1[0][0]


activation_26 (Activation) (None, 256, 135, 256 0 block_3b_bn_1[0][0]


block_3b_conv_2 (Conv2D) (None, 256, 135, 256 590080 activation_26[0][0]


block_3b_bn_2 (BatchNormalizati (None, 256, 135, 256 1024 block_3b_conv_2[0][0]


activation_27 (Activation) (None, 256, 135, 256 0 block_3b_bn_2[0][0]


block_3b_conv_3 (Conv2D) (None, 1024, 135, 25 263168 activation_27[0][0]


block_3b_bn_3 (BatchNormalizati (None, 1024, 135, 25 4096 block_3b_conv_3[0][0]


add_9 (Add) (None, 1024, 135, 25 0 block_3b_bn_3[0][0]
activation_25[0][0]


activation_28 (Activation) (None, 1024, 135, 25 0 add_9[0][0]


block_3c_conv_1 (Conv2D) (None, 256, 135, 256 262400 activation_28[0][0]


block_3c_bn_1 (BatchNormalizati (None, 256, 135, 256 1024 block_3c_conv_1[0][0]


activation_29 (Activation) (None, 256, 135, 256 0 block_3c_bn_1[0][0]


block_3c_conv_2 (Conv2D) (None, 256, 135, 256 590080 activation_29[0][0]


block_3c_bn_2 (BatchNormalizati (None, 256, 135, 256 1024 block_3c_conv_2[0][0]


activation_30 (Activation) (None, 256, 135, 256 0 block_3c_bn_2[0][0]


block_3c_conv_3 (Conv2D) (None, 1024, 135, 25 263168 activation_30[0][0]


block_3c_bn_3 (BatchNormalizati (None, 1024, 135, 25 4096 block_3c_conv_3[0][0]


add_10 (Add) (None, 1024, 135, 25 0 block_3c_bn_3[0][0]
activation_28[0][0]


activation_31 (Activation) (None, 1024, 135, 25 0 add_10[0][0]


block_3d_conv_1 (Conv2D) (None, 256, 135, 256 262400 activation_31[0][0]


block_3d_bn_1 (BatchNormalizati (None, 256, 135, 256 1024 block_3d_conv_1[0][0]


activation_32 (Activation) (None, 256, 135, 256 0 block_3d_bn_1[0][0]


block_3d_conv_2 (Conv2D) (None, 256, 135, 256 590080 activation_32[0][0]


block_3d_bn_2 (BatchNormalizati (None, 256, 135, 256 1024 block_3d_conv_2[0][0]


activation_33 (Activation) (None, 256, 135, 256 0 block_3d_bn_2[0][0]


block_3d_conv_3 (Conv2D) (None, 1024, 135, 25 263168 activation_33[0][0]


block_3d_bn_3 (BatchNormalizati (None, 1024, 135, 25 4096 block_3d_conv_3[0][0]


add_11 (Add) (None, 1024, 135, 25 0 block_3d_bn_3[0][0]
activation_31[0][0]


activation_34 (Activation) (None, 1024, 135, 25 0 add_11[0][0]


block_3e_conv_1 (Conv2D) (None, 256, 135, 256 262400 activation_34[0][0]


block_3e_bn_1 (BatchNormalizati (None, 256, 135, 256 1024 block_3e_conv_1[0][0]


activation_35 (Activation) (None, 256, 135, 256 0 block_3e_bn_1[0][0]


block_3e_conv_2 (Conv2D) (None, 256, 135, 256 590080 activation_35[0][0]


block_3e_bn_2 (BatchNormalizati (None, 256, 135, 256 1024 block_3e_conv_2[0][0]


activation_36 (Activation) (None, 256, 135, 256 0 block_3e_bn_2[0][0]


block_3e_conv_3 (Conv2D) (None, 1024, 135, 25 263168 activation_36[0][0]


block_3e_bn_3 (BatchNormalizati (None, 1024, 135, 25 4096 block_3e_conv_3[0][0]


add_12 (Add) (None, 1024, 135, 25 0 block_3e_bn_3[0][0]
activation_34[0][0]


activation_37 (Activation) (None, 1024, 135, 25 0 add_12[0][0]


block_3f_conv_1 (Conv2D) (None, 256, 135, 256 262400 activation_37[0][0]


block_3f_bn_1 (BatchNormalizati (None, 256, 135, 256 1024 block_3f_conv_1[0][0]


activation_38 (Activation) (None, 256, 135, 256 0 block_3f_bn_1[0][0]


block_3f_conv_2 (Conv2D) (None, 256, 135, 256 590080 activation_38[0][0]


block_3f_bn_2 (BatchNormalizati (None, 256, 135, 256 1024 block_3f_conv_2[0][0]


activation_39 (Activation) (None, 256, 135, 256 0 block_3f_bn_2[0][0]


block_3f_conv_3 (Conv2D) (None, 1024, 135, 25 263168 activation_39[0][0]


block_3f_bn_3 (BatchNormalizati (None, 1024, 135, 25 4096 block_3f_conv_3[0][0]


add_13 (Add) (None, 1024, 135, 25 0 block_3f_bn_3[0][0]
activation_37[0][0]


activation_40 (Activation) (None, 1024, 135, 25 0 add_13[0][0]


block_4a_conv_1 (Conv2D) (None, 512, 135, 256 524800 activation_40[0][0]


block_4a_bn_1 (BatchNormalizati (None, 512, 135, 256 2048 block_4a_conv_1[0][0]


activation_41 (Activation) (None, 512, 135, 256 0 block_4a_bn_1[0][0]


block_4a_conv_2 (Conv2D) (None, 512, 135, 256 2359808 activation_41[0][0]


block_4a_bn_2 (BatchNormalizati (None, 512, 135, 256 2048 block_4a_conv_2[0][0]


activation_42 (Activation) (None, 512, 135, 256 0 block_4a_bn_2[0][0]


block_4a_conv_3 (Conv2D) (None, 2048, 135, 25 1050624 activation_42[0][0]


block_4a_conv_shortcut (Conv2D) (None, 2048, 135, 25 2099200 activation_40[0][0]


block_4a_bn_3 (BatchNormalizati (None, 2048, 135, 25 8192 block_4a_conv_3[0][0]


block_4a_bn_shortcut (BatchNorm (None, 2048, 135, 25 8192 block_4a_conv_shortcut[0][0]


add_14 (Add) (None, 2048, 135, 25 0 block_4a_bn_3[0][0]
block_4a_bn_shortcut[0][0]


activation_43 (Activation) (None, 2048, 135, 25 0 add_14[0][0]


block_4b_conv_1 (Conv2D) (None, 512, 135, 256 1049088 activation_43[0][0]


block_4b_bn_1 (BatchNormalizati (None, 512, 135, 256 2048 block_4b_conv_1[0][0]


activation_44 (Activation) (None, 512, 135, 256 0 block_4b_bn_1[0][0]


block_4b_conv_2 (Conv2D) (None, 512, 135, 256 2359808 activation_44[0][0]


block_4b_bn_2 (BatchNormalizati (None, 512, 135, 256 2048 block_4b_conv_2[0][0]


activation_45 (Activation) (None, 512, 135, 256 0 block_4b_bn_2[0][0]


block_4b_conv_3 (Conv2D) (None, 2048, 135, 25 1050624 activation_45[0][0]


block_4b_bn_3 (BatchNormalizati (None, 2048, 135, 25 8192 block_4b_conv_3[0][0]


add_15 (Add) (None, 2048, 135, 25 0 block_4b_bn_3[0][0]
activation_43[0][0]


activation_46 (Activation) (None, 2048, 135, 25 0 add_15[0][0]


block_4c_conv_1 (Conv2D) (None, 512, 135, 256 1049088 activation_46[0][0]


block_4c_bn_1 (BatchNormalizati (None, 512, 135, 256 2048 block_4c_conv_1[0][0]


activation_47 (Activation) (None, 512, 135, 256 0 block_4c_bn_1[0][0]


block_4c_conv_2 (Conv2D) (None, 512, 135, 256 2359808 activation_47[0][0]


block_4c_bn_2 (BatchNormalizati (None, 512, 135, 256 2048 block_4c_conv_2[0][0]


activation_48 (Activation) (None, 512, 135, 256 0 block_4c_bn_2[0][0]


block_4c_conv_3 (Conv2D) (None, 2048, 135, 25 1050624 activation_48[0][0]


block_4c_bn_3 (BatchNormalizati (None, 2048, 135, 25 8192 block_4c_conv_3[0][0]


add_16 (Add) (None, 2048, 135, 25 0 block_4c_bn_3[0][0]
activation_46[0][0]


activation_49 (Activation) (None, 2048, 135, 25 0 add_16[0][0]


output_bbox (Conv2D) (None, 28, 135, 256) 57372 activation_49[0][0]


output_cov (Conv2D) (None, 7, 135, 256) 14343 activation_49[0][0]

Total params: 23,659,427
Trainable params: 23,606,307
Non-trainable params: 53,120


target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2019-11-29 04:19:46,833 [INFO] iva.detectnet_v2.scripts.train: Found 1562 samples in training set
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
target/truncation is not updated to match the crop areaif the dataset contains target/truncation.
2019-11-29 04:19:55,687 [INFO] iva.detectnet_v2.scripts.train: Found 390 samples in validation set
INFO:tensorflow:Create CheckpointSaverHook.
2019-11-29 04:20:01,853 [INFO] tensorflow: Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2019-11-29 04:20:04,794 [INFO] tensorflow: Graph was finalized.
2019-11-29 04:20:04.794794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-11-29 04:20:04.794820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-29 04:20:04.794827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-11-29 04:20:04.794834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-11-29 04:20:04.794894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7461 MB memory) → physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Running local_init_op.
2019-11-29 04:20:08,496 [INFO] tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2019-11-29 04:20:08,857 [INFO] tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-0.
2019-11-29 04:20:26,874 [INFO] tensorflow: Saving checkpoints for step-0.

2019-11-29 04:21:36.984125: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b49f00 of size 256
2019-11-29 04:21:36.984135: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b4a000 of size 256
2019-11-29 04:21:36.984144: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b4a100 of size 256
2019-11-29 04:21:36.984153: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b4a200 of size 256
2019-11-29 04:21:36.984162: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b4a300 of size 256
2019-11-29 04:21:36.984171: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f03b4a400 of size 14592
2019-11-29 04:21:36.984181: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b4dd00 of size 8192
2019-11-29 04:21:36.984191: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03b4fd00 of size 2211840
2019-11-29 04:21:36.984200: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03d6bd00 of size 2211840
2019-11-29 04:21:36.984210: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f03f87d00 of size 3964928
2019-11-29 04:21:36.984219: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0434fd00 of size 8388608
2019-11-29 04:21:36.984229: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f04b4fd00 of size 4096
2019-11-29 04:21:36.984238: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04b50d00 of size 4096
2019-11-29 04:21:36.984247: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04b51d00 of size 552960
2019-11-29 04:21:36.984256: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04bd8d00 of size 552960
2019-11-29 04:21:36.984265: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04c5fd00 of size 991232
2019-11-29 04:21:36.984275: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04d51d00 of size 2097152
2019-11-29 04:21:36.984284: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f04f51d00 of size 2048
2019-11-29 04:21:36.984294: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04f52500 of size 2048
2019-11-29 04:21:36.984303: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f04f52d00 of size 524288
2019-11-29 04:21:36.984313: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f04fd2d00 of size 524288
2019-11-29 04:21:36.984322: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05052d00 of size 1024
2019-11-29 04:21:36.984331: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05053100 of size 1024
2019-11-29 04:21:36.984340: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05053500 of size 65536
2019-11-29 04:21:36.984350: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05063500 of size 65536
2019-11-29 04:21:36.984359: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05073500 of size 1024
2019-11-29 04:21:36.984369: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05073900 of size 1024
2019-11-29 04:21:36.984380: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05073d00 of size 256
2019-11-29 04:21:36.984390: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05073e00 of size 256
2019-11-29 04:21:36.984402: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05073f00 of size 147456
2019-11-29 04:21:36.984414: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05097f00 of size 147456
2019-11-29 04:21:36.984425: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050bbf00 of size 256
2019-11-29 04:21:36.984434: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050bc000 of size 256
2019-11-29 04:21:36.984444: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050bc100 of size 16384
2019-11-29 04:21:36.984453: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050c0100 of size 16384
2019-11-29 04:21:36.984463: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050c4100 of size 65536
2019-11-29 04:21:36.984472: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050d4100 of size 65536
2019-11-29 04:21:36.984482: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050e4100 of size 1024
2019-11-29 04:21:36.984491: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050e4500 of size 1024
2019-11-29 04:21:36.984500: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050e4900 of size 256
2019-11-29 04:21:36.984511: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050e4a00 of size 256
2019-11-29 04:21:36.984520: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050e4b00 of size 256
2019-11-29 04:21:36.984531: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050e4c00 of size 256
2019-11-29 04:21:36.984540: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f050e4d00 of size 65536
2019-11-29 04:21:36.984550: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f050f4d00 of size 65536
2019-11-29 04:21:36.984560: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05104d00 of size 147456
2019-11-29 04:21:36.984570: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05128d00 of size 147456
2019-11-29 04:21:36.984580: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0514cd00 of size 65536
2019-11-29 04:21:36.984591: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0515cd00 of size 65536
2019-11-29 04:21:36.984599: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0516cd00 of size 1024
2019-11-29 04:21:36.984605: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0516d100 of size 1024
2019-11-29 04:21:36.984613: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0516d500 of size 256
2019-11-29 04:21:36.984624: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0516d600 of size 256
2019-11-29 04:21:36.984635: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0516d700 of size 256
2019-11-29 04:21:36.984643: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0516d800 of size 256
2019-11-29 04:21:36.984649: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0516d900 of size 65536
2019-11-29 04:21:36.984657: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0517d900 of size 65536
2019-11-29 04:21:36.984669: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0518d900 of size 147456
2019-11-29 04:21:36.984679: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f051b1900 of size 147456
2019-11-29 04:21:36.984689: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f051d5900 of size 65536
2019-11-29 04:21:36.984699: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f051e5900 of size 65536
2019-11-29 04:21:36.984708: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f051f5900 of size 2048
2019-11-29 04:21:36.984718: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f051f6100 of size 2048
2019-11-29 04:21:36.984726: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f051f6900 of size 512
2019-11-29 04:21:36.984737: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f051f6b00 of size 512
2019-11-29 04:21:36.984747: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f051f6d00 of size 589824
2019-11-29 04:21:36.984757: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05286d00 of size 589824
2019-11-29 04:21:36.984764: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05316d00 of size 512
2019-11-29 04:21:36.984774: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05316f00 of size 512
2019-11-29 04:21:36.984785: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05317100 of size 131072
2019-11-29 04:21:36.984797: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05337100 of size 131072
2019-11-29 04:21:36.984807: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05357100 of size 262144
2019-11-29 04:21:36.984818: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05397100 of size 262144
2019-11-29 04:21:36.984829: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f053d7100 of size 2048
2019-11-29 04:21:36.984839: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f053d7900 of size 2048
2019-11-29 04:21:36.984846: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f053d8100 of size 512
2019-11-29 04:21:36.984856: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f053d8300 of size 512
2019-11-29 04:21:36.984865: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f053d8500 of size 589824
2019-11-29 04:21:36.984873: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05468500 of size 589824
2019-11-29 04:21:36.984884: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f054f8500 of size 512
2019-11-29 04:21:36.984892: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f054f8700 of size 512
2019-11-29 04:21:36.984903: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f054f8900 of size 262144
2019-11-29 04:21:36.984912: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05538900 of size 262144
2019-11-29 04:21:36.984923: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05578900 of size 262144
2019-11-29 04:21:36.984933: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f055b8900 of size 262144
2019-11-29 04:21:36.984944: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f055f8900 of size 2048
2019-11-29 04:21:36.984952: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f055f9100 of size 2048
2019-11-29 04:21:36.984961: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f055f9900 of size 512
2019-11-29 04:21:36.984970: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f055f9b00 of size 512
2019-11-29 04:21:36.984986: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f055f9d00 of size 589824
2019-11-29 04:21:36.984997: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05689d00 of size 589824
2019-11-29 04:21:36.985006: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05719d00 of size 512
2019-11-29 04:21:36.985015: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05719f00 of size 512
2019-11-29 04:21:36.985023: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0571a100 of size 262144
2019-11-29 04:21:36.985032: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0575a100 of size 262144
2019-11-29 04:21:36.985041: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0579a100 of size 262144
2019-11-29 04:21:36.985050: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f057da100 of size 262144
2019-11-29 04:21:36.985059: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0581a100 of size 2048
2019-11-29 04:21:36.985068: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0581a900 of size 2048
2019-11-29 04:21:36.985077: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0581b100 of size 512
2019-11-29 04:21:36.985086: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0581b300 of size 512
2019-11-29 04:21:36.985094: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0581b500 of size 589824
2019-11-29 04:21:36.985103: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f058ab500 of size 589824
2019-11-29 04:21:36.985111: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0593b500 of size 512
2019-11-29 04:21:36.985119: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0593b700 of size 512
2019-11-29 04:21:36.985128: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f0593b900 of size 262144
2019-11-29 04:21:36.985137: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0597b900 of size 262144
2019-11-29 04:21:36.985146: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f059bb900 of size 262144
2019-11-29 04:21:36.985155: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f059fb900 of size 262144
2019-11-29 04:21:36.985164: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05a3b900 of size 4096
2019-11-29 04:21:36.985174: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05a3c900 of size 4096
2019-11-29 04:21:36.985183: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05a3d900 of size 1024
2019-11-29 04:21:36.985193: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05a3dd00 of size 1024
2019-11-29 04:21:36.985205: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05a3e100 of size 1024
2019-11-29 04:21:36.985215: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05a3e500 of size 1024
2019-11-29 04:21:36.985225: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free at 0x7f1f05a3e900 of size 524288
2019-11-29 04:21:36.985235: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05abe900 of size 524288
2019-11-29 04:21:36.985246: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05b3e900 of size 2359296
2019-11-29 04:21:36.985257: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f05d7e900 of size 2627328
2019-11-29 04:21:36.985266: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06000000 of size 1048576
2019-11-29 04:21:36.985276: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06100000 of size 4096
2019-11-29 04:21:36.985286: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06101000 of size 1024
2019-11-29 04:21:36.985296: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06101400 of size 2359296
2019-11-29 04:21:36.985305: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06341400 of size 1024
2019-11-29 04:21:36.985315: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06341800 of size 1048576
2019-11-29 04:21:36.985325: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06441800 of size 1048576
2019-11-29 04:21:36.985335: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06541800 of size 4096
2019-11-29 04:21:36.985344: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06542800 of size 1024
2019-11-29 04:21:36.985351: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06542c00 of size 1024
2019-11-29 04:21:36.985357: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06543000 of size 1048576
2019-11-29 04:21:36.985365: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06643000 of size 2359296
2019-11-29 04:21:36.985375: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06883000 of size 1048576
2019-11-29 04:21:36.985386: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06983000 of size 4096
2019-11-29 04:21:36.985393: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06984000 of size 1024
2019-11-29 04:21:36.985399: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06984400 of size 1024
2019-11-29 04:21:36.985408: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06984800 of size 1048576
2019-11-29 04:21:36.985418: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06a84800 of size 2359296
2019-11-29 04:21:36.985428: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06cc4800 of size 1048576
2019-11-29 04:21:36.985439: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06dc4800 of size 4096
2019-11-29 04:21:36.985449: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06dc5800 of size 1024
2019-11-29 04:21:36.985460: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f06dc5c00 of size 2359296
2019-11-29 04:21:36.985474: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07005c00 of size 1024
2019-11-29 04:21:36.985489: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07006000 of size 1048576
2019-11-29 04:21:36.985501: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07106000 of size 1048576
2019-11-29 04:21:36.985511: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07206000 of size 4096
2019-11-29 04:21:36.985526: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07207000 of size 1024
2019-11-29 04:21:36.985539: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07207400 of size 2359296
2019-11-29 04:21:36.985548: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07447400 of size 1024
2019-11-29 04:21:36.985557: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07447800 of size 1048576
2019-11-29 04:21:36.985567: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07547800 of size 1048576
2019-11-29 04:21:36.985579: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07647800 of size 8192
2019-11-29 04:21:36.985588: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f07649800 of size 2048
2019-11-29 04:21:36.985596: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f0764a000 of size 10182656
2019-11-29 04:21:36.985607: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f16000000 of size 4194304
2019-11-29 04:21:36.985619: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f16400000 of size 12582912
2019-11-29 04:21:36.985629: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800000 of size 256
2019-11-29 04:21:36.985640: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800100 of size 256
2019-11-29 04:21:36.985651: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800200 of size 256
2019-11-29 04:21:36.985661: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800300 of size 256
2019-11-29 04:21:36.985669: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800400 of size 256
2019-11-29 04:21:36.985677: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800500 of size 256
2019-11-29 04:21:36.985686: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800600 of size 256
2019-11-29 04:21:36.985695: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800700 of size 256
2019-11-29 04:21:36.985704: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800800 of size 256
2019-11-29 04:21:36.985714: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800900 of size 256
2019-11-29 04:21:36.985724: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800a00 of size 256
2019-11-29 04:21:36.985733: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800b00 of size 256
2019-11-29 04:21:36.985742: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800c00 of size 256
2019-11-29 04:21:36.985751: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800d00 of size 256
2019-11-29 04:21:36.985759: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800e00 of size 256
2019-11-29 04:21:36.985768: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f800f00 of size 256
2019-11-29 04:21:36.985777: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801000 of size 256
2019-11-29 04:21:36.985786: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801100 of size 256
2019-11-29 04:21:36.985794: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801200 of size 256
2019-11-29 04:21:36.985801: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801300 of size 256
2019-11-29 04:21:36.985810: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801400 of size 256
2019-11-29 04:21:36.985819: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801500 of size 256
2019-11-29 04:21:36.985828: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801600 of size 256
2019-11-29 04:21:36.985837: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801700 of size 256
2019-11-29 04:21:36.985845: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1f9f801800 of size 256
2019-
2019-11-29 04:21:36.987255: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1fc2a10900 of size 147456
2019-11-29 04:21:36.987263: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1fc2a34900 of size 256
2019-11-29 04:21:36.987273: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1fc2a34a00 of size 16384
2019-11-29 04:21:36.987282: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1fc2a38a00 of size 65536
2019-11-29 04:21:36.987291: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f1fc2a48a00 of size 1024
2019-11-29 04:
experiment_spec.txt (9.73 KB)

2019-11-29 04:55:53.508629: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296a8d600 of size 65536
2019-11-29 04:55:53.508639: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296a9d600 of size 147456
2019-11-29 04:55:53.508648: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ac1600 of size 65536
2019-11-29 04:55:53.508657: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ad1600 of size 2048
2019-11-29 04:55:53.508664: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ad1e00 of size 512
2019-11-29 04:55:53.508671: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ad2000 of size 589824
2019-11-29 04:55:53.508680: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296b62000 of size 512
2019-11-29 04:55:53.508687: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296b62200 of size 131072
2019-11-29 04:55:53.508695: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296b82200 of size 262144
2019-11-29 04:55:53.508703: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296bc2200 of size 2048
2019-11-29 04:55:53.508711: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296bc2a00 of size 512
2019-11-29 04:55:53.508721: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296bc2c00 of size 589824
2019-11-29 04:55:53.508727: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296c52c00 of size 512
2019-11-29 04:55:53.508735: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296c52e00 of size 262144
2019-11-29 04:55:53.508745: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296c92e00 of size 262144
2019-11-29 04:55:53.508752: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296cd2e00 of size 2048
2019-11-29 04:55:53.508760: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296cd3600 of size 512
2019-11-29 04:55:53.508770: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296cd3800 of size 512
2019-11-29 04:55:53.508776: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296cd3a00 of size 262144
2019-11-29 04:55:53.508782: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296d13a00 of size 589824
2019-11-29 04:55:53.508792: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296da3a00 of size 262144
2019-11-29 04:55:53.508801: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296de3a00 of size 2048
2019-11-29 04:55:53.508810: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296de4200 of size 512
2019-11-29 04:55:53.508819: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296de4400 of size 589824
2019-11-29 04:55:53.508828: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296e74400 of size 512
2019-11-29 04:55:53.508838: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296e74600 of size 262144
2019-11-29 04:55:53.508847: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296eb4600 of size 262144
2019-11-29 04:55:53.508856: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ef4600 of size 4096
2019-11-29 04:55:53.508866: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ef5600 of size 1024
2019-11-29 04:55:53.508875: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ef5a00 of size 1024
2019-11-29 04:55:53.508884: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296ef5e00 of size 524288
2019-11-29 04:55:53.508894: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fc296f75e00 of size 2662912
2019-11-29 04:55:53.508900: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
2019-11-29 04:55:53.508911: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 436 Chunks of size 256 totalling 109.0KiB
2019-11-29 04:55:53.508922: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 170 Chunks of size 512 totalling 85.0KiB
2019-11-29 04:55:53.508931: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 768 totalling 1.5KiB
2019-11-29 04:55:53.508942: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 348 Chunks of size 1024 totalling 348.0KiB
2019-11-29 04:55:53.508951: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
2019-11-29 04:55:53.508962: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 248 Chunks of size 2048 totalling 496.0KiB
2019-11-29 04:55:53.508973: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 153 Chunks of size 4096 totalling 612.0KiB
2019-11-29 04:55:53.508989: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 84 Chunks of size 8192 totalling 672.0KiB
2019-11-29 04:55:53.509002: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 7 Chunks of size 16384 totalling 112.0KiB
2019-11-29 04:55:53.509012: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 7 Chunks of size 37632 totalling 257.2KiB
2019-11-29 04:55:53.509023: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 57344 totalling 336.0KiB
2019-11-29 04:55:53.509034: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 38 Chunks of size 65536 totalling 2.38MiB
2019-11-29 04:55:53.509044: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 7 Chunks of size 131072 totalling 896.0KiB
2019-11-29 04:55:53.509055: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 19 Chunks of size 147456 totalling 2.67MiB
2019-11-29 04:55:53.509064: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 229376 totalling 1.31MiB
2019-11-29 04:55:53.509075: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 46 Chunks of size 262144 totalling 11.50MiB
2019-11-29 04:55:53.509084: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 13 Chunks of size 524288 totalling 6.50MiB
2019-11-29 04:55:53.509095: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 552960 totalling 2.11MiB
2019-11-29 04:55:53.509104: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 29 Chunks of size 589824 totalling 16.31MiB
2019-11-29 04:55:53.509116: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 941824 totalling 919.8KiB
2019-11-29 04:55:53.509124: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 991232 totalling 1.89MiB
2019-11-29 04:55:53.509135: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 81 Chunks of size 1048576 totalling 81.00MiB
2019-11-29 04:55:53.509143: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1056768 totalling 1.01MiB
2019-11-29 04:55:53.509154: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1441792 totalling 1.38MiB
2019-11-29 04:55:53.509163: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 14 Chunks of size 2097152 totalling 28.00MiB
2019-11-29 04:55:53.509173: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 2211840 totalling 6.33MiB
2019-11-29 04:55:53.509181: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 42 Chunks of size 2359296 totalling 94.50MiB
2019-11-29 04:55:53.509191: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 2662912 totalling 2.54MiB
2019-11-29 04:55:53.509199: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 3754240 totalling 3.58MiB
2019-11-29 04:55:53.509207: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 41 Chunks of size 4194304 totalling 164.00MiB
2019-11-29 04:55:53.509217: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 8388608 totalling 48.00MiB
2019-11-29 04:55:53.509228: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 16 Chunks of size 9437184 totalling 144.00MiB
2019-11-29 04:55:53.509239: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 10182656 totalling 9.71MiB
2019-11-29 04:55:53.509247: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 12582912 totalling 12.00MiB
2019-11-29 04:55:53.509256: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 15374336 totalling 14.66MiB
2019-11-29 04:55:53.509267: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 15673088 totalling 14.95MiB
2019-11-29 04:55:53.509277: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 16777216 totalling 32.00MiB
2019-11-29 04:55:53.509287: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 2264924160 totalling 2.11GiB
2019-11-29 04:55:53.509297: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 2.80GiB
2019-11-29 04:55:53.509309: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                  7847523124
InUse:                  3006324992
MaxInUse:               3577017856
NumAllocs:                    4368
MaxAllocSize:           2264924160

2019-11-29 04:55:53.509372: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ************************************_______________________________________________*****___*********
2019-11-29 04:55:53.509402: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at fused_batch_norm_op.cc:574 : Resource exhausted: OOM when allocating tensor with shape[4,64,1080,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 632, in main
  File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 490, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 136, in run_training_loop
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 676, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1270, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1327, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1091, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,64,1080,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node resnet50_nopool_bn_detectnet_v2/bn_conv1/FusedBatchNorm (defined at /usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1839) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[node resnet50_nopool_bn_detectnet_v2/block_1b_bn_1/AssignMovingAvg (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:186) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op u'resnet50_nopool_bn_detectnet_v2/bn_conv1/FusedBatchNorm', defined at:
  File "/usr/local/bin/tlt-train-g1", line 10, in <module>
    sys.exit(main())
  File "./common/magnet_train.py", line 37, in main
  File "</usr/local/lib/python2.7/dist-packages/decorator.pyc:decorator-gen-2>", line 2, in main
  File "./detectnet_v2/utilities/timer.py", line 46, in wrapped_fn
  File "./detectnet_v2/scripts/train.py", line 632, in main
  File "./detectnet_v2/scripts/train.py", line 556, in run_experiment
  File "./detectnet_v2/scripts/train.py", line 466, in train_gridbox
  File "./detectnet_v2/scripts/train.py", line 320, in build_training_graph
  File "./detectnet_v2/model/detectnet_model.py", line 470, in build_training_graph
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 564, in call
    output_tensors, _, _ = self.run_internal_graph(inputs, masks)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/network.py", line 721, in run_internal_graph
    layer.call(computed_tensor, **kwargs))
  File "/opt/nvidia/third_party/keras/mixed_precision.py", line 181, in _batch_normalization_call
    epsilon=self.epsilon)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1864, in normalize_batch_in_training
    epsilon=epsilon)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1839, in _fused_normalize_batch_in_training
    data_format=tf_data_format)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/nn_impl.py", line 1182, in fused_batch_norm
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 3756, in _fused_batch_norm
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[4,64,1080,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node resnet50_nopool_bn_detectnet_v2/bn_conv1/FusedBatchNorm (defined at /usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py:1839) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[node resnet50_nopool_bn_detectnet_v2/block_1b_bn_1/AssignMovingAvg (defined at /opt/nvidia/third_party/keras/tensorflow_backend.py:186) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Above is the remaining part of the log.

Thanks for the info.
For the OOM, could you please try the experiment below?
1. Set the following before training:
$ export TF_FORCE_GPU_ALLOW_GROWTH=true

I have already tried that, but I still get the same error.

Hi samjith888,
Could you please set a lower batch size and try again?
Before and during training, please use nvidia-smi to monitor the GPU memory.
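
For reference, a minimal sketch of polling GPU memory from a script instead of watching nvidia-smi by hand. It assumes nvidia-smi is on the PATH; the function names and the polling interval are illustrative and not part of TLT.

```python
import subprocess
import time


def parse_memory_csv(text):
    """Parse 'used, total' pairs (in MiB), one line per GPU."""
    result = []
    for line in text.strip().splitlines():
        used, total = (int(v.strip()) for v in line.split(","))
        result.append((used, total))
    return result


def query_gpu_memory():
    """Ask nvidia-smi for used/total memory of every visible GPU."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ])
    return parse_memory_csv(out.decode("utf-8"))


def watch(interval_s=2.0, samples=10):
    """Print memory usage a fixed number of times (hypothetical helper)."""
    for _ in range(samples):
        for idx, (used, total) in enumerate(query_gpu_memory()):
            print("GPU %d: %d / %d MiB" % (idx, used, total))
        time.sleep(interval_s)
```

Calling watch() in a second terminal while tlt-train starts should show whether memory fills up immediately or grows over time.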

I have already tried smaller batch sizes, but I still get the same error.

The error only goes away when I set output_image_width and output_image_height to smaller values (but my image size is 4096 x 2160). If I reduce the width and height to, e.g., 768 x 768, I get a mean average precision of 0 in every evaluation. (I thought this was wrong.)

augmentation_config {
preprocessing {
output_image_width: 4096
output_image_height: 2160

Hi Morganh,

Do I need to adjust the learning rate in the spec file if I change 'batch_size_per_gpu: 16' to 2? If so, which value should I adjust, and in which field of my spec file? Any suggestions, please?

Hi samjith888,
No, that is not needed for the OOM issue.
For OOM, as I mentioned in your other thread, please consider setting a lower input size.
But for detectnet_v2, the tlt-train tool does not support training on images of multiple resolutions, or resizing images during training.
So you should resize all of the images offline to the lower training size, and the corresponding bounding boxes must be scaled accordingly.

Resizing the images and scaling the bboxes will be a big task for my dataset. All the images in my dataset have the same resolution.

I have seen in https://github.com/NVIDIA-AI-IOT/tlt-iva-examples/issues/1#issuecomment-560587003 that if I reduce the batch size, I also have to adjust the learning rate.

Hi samjith888,
According to the TLT doc, detectnet_v2 currently supports only the soft-start annealing learning rate schedule, which may be configured using the following parameters:

soft_start (float): Defines the time to ramp up the learning rate from the minimum learning rate to the maximum learning rate.
annealing (float): Defines the time to cool down the learning rate from the maximum learning rate to the minimum learning rate.
minimum_learning_rate (float): Minimum learning rate in the learning rate schedule.
maximum_learning_rate (float): Maximum learning rate in the learning rate schedule.
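
To make the four parameters concrete, here is a minimal sketch of a soft-start annealing schedule. It assumes that soft_start and annealing are fractions of total training progress in [0, 1] and that both ramps are linear; the doc text quoted above does not state either, so the actual curve used by tlt-train may differ. The function name and the example values are illustrative only.

```python
def soft_start_annealing_lr(progress, soft_start, annealing, min_lr, max_lr):
    """Learning rate at a given training progress (0.0 = start, 1.0 = end).

    Assumption: linear ramp up until `soft_start`, flat at `max_lr` until
    `annealing`, then linear cool down back to `min_lr` at the end.
    """
    if progress < soft_start:
        t = progress / soft_start
        return min_lr + t * (max_lr - min_lr)
    if progress > annealing:
        t = (progress - annealing) / (1.0 - annealing)
        return max_lr - t * (max_lr - min_lr)
    return max_lr


# Illustrative values only, not taken from any spec file in this thread.
for p in (0.0, 0.05, 0.5, 0.85, 1.0):
    print(p, soft_start_annealing_lr(p, 0.1, 0.7, 5e-6, 5e-4))
```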

I observe that the OOM issue happens immediately when you trigger training, so you do not need to adjust the lr.

For “Resizing images and scaling bbox”, can you try to resize offline?

  1. Resize the images.
  2. Adjust the labels’ bboxes: write a script to adjust the columns (xmin, ymin, xmax, ymax).

Finally, I’m going to reduce the size of the images using a script. But my images are high resolution; can you suggest the highest resolution that the detectnet augmentation can accept?

Hi samjith888,
What do you mean by “which is the higher resolution”?

Because when I set output_image_width: 4096 and output_image_height: 2160, I got the error due to the high resolution. So what resolution should I resize the input images to?

This should be decided case by case.
In your case, you are training with a GTX 1070 and hit OOM at (4096, 2160).
With multiple GPUs or a different GPU, the OOM issue may not happen.

So, please keep your hardware setting and resize the images/labels, to see if the OOM issue is gone.
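
For a rough sense of why this resolution is heavy for an 8 GB GTX 1070, the size of a single tensor can be estimated from the shapes reported in the OOM messages, assuming 4 bytes per element for "type float" (float32). The helper name is illustrative, and this counts only one tensor, not the full set of activations and weights held during training.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for one dense tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Shapes copied from the OOM messages in the log.
for shape in [(32, 3, 2160, 4096), (4, 64, 1080, 2048)]:
    gib = tensor_bytes(shape) / float(1024 ** 3)
    print("%s -> %.2f GiB" % (shape, gib))
```

This works out to about 3.16 GiB and 2.11 GiB respectively, so even the batch-of-4 activation after the first convolution takes roughly a quarter of the card on its own.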


I have resized the images from 4096 x 2160 to 1248 x 384. Is there a formula to compute the new xmin, ymin, xmax and ymax?

Hi samjith888,
Take one label.txt as an example.
$ cat label.txt
car 0.00 0 0.00 111.1 222.2 333.3 444.4 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Then the [xmin, ymin, xmax, ymax] of it is [111.1, 222.2, 333.3, 444.4]

If you resize the image from 4096 x 2160 to 1248 x 384, then
[111.1, 222.2, 333.3, 444.4] will be resized into [111.1*1248/4096, 222.2*384/2160, 333.3*1248/4096, 444.4*384/2160]
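
The same scaling can be sketched as a small script. It assumes KITTI-style label lines like the one above, with xmin, ymin, xmax, ymax in columns 5 to 8 (indices 4 to 7); the function name and the two-decimal output format are illustrative choices.

```python
def scale_kitti_line(line, src_size, dst_size):
    """Scale the bbox columns of one KITTI label line.

    src_size and dst_size are (width, height) tuples. x coordinates are
    scaled by the width ratio and y coordinates by the height ratio.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx = dst_w / float(src_w)
    sy = dst_h / float(src_h)
    fields = line.split()
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    scaled = [xmin * sx, ymin * sy, xmax * sx, ymax * sy]
    fields[4:8] = ["%.2f" % v for v in scaled]
    return " ".join(fields)


label = "car 0.00 0 0.00 111.1 222.2 333.3 444.4 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(scale_kitti_line(label, (4096, 2160), (1248, 384)))
```

For the example label this gives a box of roughly [33.85, 39.50, 101.55, 79.00]. Note that the x and y scale factors differ here, because 1248 x 384 does not have the same aspect ratio as 4096 x 2160.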

I have resized the images and adjusted the labels accordingly. But I’m still getting mAP 0 and average precision 0 for all classes, even after changing the resolution and labels. I have attached the updated spec file used for this training.

Epoch 36/120
=========================

Validation cost: -0.000009
Mean average_precision (in %): 0.0000

Please attach the updated spec file. I did not find it.
BTW, is the OOM issue resolved?