Memory problem with P100

I am trying to run the TensorFlow benchmarks on a P100. Here are the parameters:

File "tf_cnn_benchmarks.py", line 56, in main
TensorFlow: 1.6
Model: alexnet
Dataset: imagenet (synthetic)
Mode: training
SingleSess: False
Batch size: 16 global
16 per device
Num batches: 100
Num epochs: 0.00
Devices: ['/gpu:0']
Data format: NCHW
Layout optimizer: False
Optimizer: sgd
Variables: parameter_server

I am using a fairly small batch size, but I still get an out-of-memory error:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[9216,4096] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: v/cg/affine0/weights/Initializer/truncated_normal/TruncatedNormal = TruncatedNormal[T=DT_INT32, _class=["loc:@v/cg/affine0/weights"], dtype=DT_FLOAT, seed=1234, seed2=147, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

I still don't understand why I am getting this error.
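For scale, here is a quick back-of-the-envelope check of the allocation named in the error. It assumes "type float" means 32-bit float (4 bytes per element). The shape [9216,4096] has no batch dimension. Going by the node name (v/cg/affine0/weights), it is the weight matrix of the first fully connected layer, so its size would be the same at batch size 16 or 256. This sketch only computes that size; it does not diagnose what else is occupying the GPU.

```python
import math

# Shape and dtype taken from the OOM message: shape[9216,4096], type float.
# Assumption: "float" here is float32, i.e. 4 bytes per element.
SHAPE = (9216, 4096)
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_elem):
    """Return the size in bytes of a dense tensor with the given shape."""
    return math.prod(shape) * bytes_per_elem


size = tensor_bytes(SHAPE, BYTES_PER_FLOAT32)
# 9216 * 4096 * 4 = 150,994,944 bytes, i.e. exactly 144 MiB.
print(f"{size} bytes = {size / 2**20:.0f} MiB")
```

Roughly 144 MiB is a small fraction of a P100's memory (12 or 16 GB). That may suggest most of the GPU memory is already in use by the time this weight is initialized.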