AlexNet with INT8, trained with Caffe, gives CUDA error: "cudnnEngine.cpp (357) - Cuda Error in execute: 77"

I am running ImageNet using 1000 batches with only 2 classes, Dog and Cat, using Caffe.
When I try to run my deploy_imagenet.prototxt and caffe_imagenet.caffemodel with TensorRT, I get the error below.

Does anyone have an idea what the reason for this could be? Please help.

/TensorRT-2.1.2/bin> ./sample_int8 imagenet

INT8 run:4 batches of size 10 starting at 10
cudnnEngine.cpp (357) - Cuda Error in execute: 77

Q: How do I decide how many batches to make for a particular dataset (CIFAR, MNIST, ImageNet, etc.)?
Q: What is the significance of the batches? They are all the same size. What do they contain?
Q: Running sample_int8 mnist shows 400 batches of size 100 each, processing 40000 images. Where are these images?

How are the batches connected to the images used to test inference?

Thanks a lot in advance for your help in clarifying my questions.

putty05.log (68 KB)

Attached cuda-gdb logs and cuda-memcheck logs:

d1230@linse3:~/no_backup/d1230/TensorRT-2.1.2/bin> cuda-gdb sample_int8
NVIDIA (R) CUDA Debugger
8.0 release
Portions Copyright (C) 2007-2016 NVIDIA Corporation
GNU gdb (GDB) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /net/linse8-sn/no_backup_00/d1230/TensorRT-2.1.2/targets/x86_64-linux-gnu/bin/sample_int8…done.
(cuda-gdb) r imagenet
Starting program: /net/linse8-sn/no_backup_00/d1230/TensorRT-2.1.2/targets/x86_64-linux-gnu/bin/sample_int8 imagenet
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

INT8 run:1 batches of size 64 starting at 0
[New Thread 0x7fffd4fac700 (LWP 11790)]
[New Thread 0x7fffd472a700 (LWP 11791)]
[New Thread 0x7fffd3f29700 (LWP 11793)]
[New Thread 0x7fffd36c2700 (LWP 11794)]
[New Thread 0x7fffd2ec1700 (LWP 11795)]

CUDA Exception: Warp Illegal Address

Program received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 0, grid 772, block (0,0,0), thread (32,0,0), device 1, sm 0, warp 2, lane 0]
0x0000000004bdf008 in .text.trtwell_scudnn_128x32_relu_interior_nn<<<(1513,3,1),(128,1,1)>>> ()
(cuda-gdb)
(cuda-gdb)

Hi,

It looks like you probably did not allocate enough host memory for your outputs.

Maybe you need to resize your output buffer and allocate enough memory for each output with cudaMalloc?

Did you account for the batch size in your cudaMalloc call?
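
To make the batch-size point concrete, here is a minimal sketch of how the byte count for each buffer could be computed before it is passed to cudaMalloc. The helper name `bindingBytes` is hypothetical, and the shapes in the comments (a 3x227x227 float input and a 2-value float output for the Dog/Cat classes) are assumptions, not values taken from your network. The batch size of 64 is the one shown in the cuda-gdb log.

```cpp
#include <cstddef>

// Hypothetical helper (not part of TensorRT): number of bytes needed for one
// buffer that holds `batchSize` samples of shape c x h x w stored as float.
std::size_t bindingBytes(int batchSize, int c, int h, int w)
{
    // Cast first so the multiplication is done in size_t, not int.
    return static_cast<std::size_t>(batchSize) * c * h * w * sizeof(float);
}

// Example sizes under the assumed shapes, with the batch size from the log:
//   input : bindingBytes(64, 3, 227, 227)
//   output: bindingBytes(64, 2, 1, 1)
// Each result would then be the `size` argument of cudaMalloc(&ptr, size).
// Allocating for a single sample (batchSize == 1) while executing with a
// batch of 64 would leave the buffer 64 times too small.
```

If the buffers were sized for one image, or for a smaller batch than the one actually executed, the kernels would read or write past the end of the allocation, which would be consistent with an illegal address error.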

David