Hi,
I’m running into a segfault while trying to run the TensorRT sample_uff_ssd app with the --int8 flag on a Jetson AGX Xavier board.
I’ve successfully run simpler samples such as the UFF MNIST one; this is the first sample I’m running in INT8 mode, which requires calibration. Without the --int8 flag, it ran fine in FP32 mode and identified the objects in the sample PPM images. (To get FP32 mode working, I downloaded the model, ran the script to convert the frozen graph to UFF, and worked out which file was the working ssd.prototxt, which was not obvious.)
For the calibration images, I downloaded the COCO 2017 val zip file and unzipped the images into a temporary directory. I then converted them from JPG to PPM via ‘mogrify --format ppm *.jpg’ and moved all the resulting PPM files to /workspace/tensorrt/data/ssd. Finally, I created a list.txt file containing the names of all the PPM files with the ‘.ppm’ extension removed, one name per line.
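For reference, the list.txt step described above can be sketched in Python roughly as follows. This is a minimal sketch, not the exact commands I ran: the function name write_calibration_list is hypothetical, and it assumes list.txt lives in the same directory as the PPM files.

```python
import os


def write_calibration_list(image_dir, list_name="list.txt"):
    """Write the base name of every .ppm file in image_dir to list_name.

    One name per line, with the '.ppm' extension removed.
    Returns the sorted list of names that was written.
    """
    names = sorted(
        os.path.splitext(entry)[0]
        for entry in os.listdir(image_dir)
        if entry.lower().endswith(".ppm")
    )
    list_path = os.path.join(image_dir, list_name)
    with open(list_path, "w") as f:
        for name in names:
            f.write(name + "\n")
    return names


# Example call (the path is the one from this post):
# write_calibration_list("/workspace/tensorrt/data/ssd")
```

Sorting the names keeps the file order stable between runs, which makes it easier to match a failing batch to a specific image.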
I set up the board very recently (about 1 week ago) with a fresh install of JetPack. Unfortunately I’m not 100% sure how to report the version that’s running on the AGX board itself, so if there’s any other helpful info I can collect, let me know.
nvidia@jetson-0423418010368:~/tensorrt/bin$ ./sample_uff_ssd --int8
../data/ssd/sample_ssd_relu6.uff
Begin parsing model...
End parsing model...
Begin building engine...
Batch #0
Calibrating with file 000000000139.ppm
Calibrating with file 000000000285.ppm
Calibrating with file 000000000632.ppm
Calibrating with file 000000000724.ppm
Calibrating with file 000000000776.ppm
Calibrating with file 000000000785.ppm
Calibrating with file 000000000802.ppm
Calibrating with file 000000000872.ppm
Calibrating with file 000000000885.ppm
Calibrating with file 000000001000.ppm
Calibrating with file 000000001268.ppm
Calibrating with file 000000001296.ppm
Calibrating with file 000000001353.ppm
Calibrating with file 000000001425.ppm
Calibrating with file 000000001490.ppm
Calibrating with file 000000001503.ppm
Calibrating with file 000000001532.ppm
Calibrating with file 000000001584.ppm
Calibrating with file 000000001675.ppm
Calibrating with file 000000001761.ppm
Calibrating with file 000000001818.ppm
Calibrating with file 000000001993.ppm
Calibrating with file 000000002006.ppm
Calibrating with file 000000002149.ppm
Calibrating with file 000000002153.ppm
Calibrating with file 000000002157.ppm
Calibrating with file 000000002261.ppm
Calibrating with file 000000002299.ppm
Calibrating with file 000000002431.ppm
Calibrating with file 000000002473.ppm
Calibrating with file 000000002532.ppm
Calibrating with file 000000002587.ppm
Calibrating with file 000000002592.ppm
Calibrating with file 000000002685.ppm
Calibrating with file 000000002923.ppm
Calibrating with file 000000003156.ppm
Calibrating with file 000000003255.ppm
Calibrating with file 000000003501.ppm
Calibrating with file 000000003553.ppm
Calibrating with file 000000003661.ppm
Calibrating with file 000000003845.ppm
Calibrating with file 000000003934.ppm
Calibrating with file 000000004134.ppm
Calibrating with file 000000004395.ppm
Calibrating with file 000000004495.ppm
Calibrating with file 000000004765.ppm
Calibrating with file 000000004795.ppm
Calibrating with file 000000005001.ppm
Calibrating with file 000000005037.ppm
Calibrating with file 000000005060.ppm
Segmentation fault (core dumped)
Rerunning the debug version with gdb, I see the following stack trace:
Calibrating with file 000000005060.ppm
Thread 1 "sample_uff_ssd_" received signal SIGSEGV, Segmentation fault.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:108
108 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) bt
#0 __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:108
#1 0x0000007fab726ae4 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#2 0x0000007fab726e3c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/aarch64-linux-gnu/libstdc++.so.6
#3 0x00000055555627ec in samplesCommon::readPPMFile<3, 300, 300> (filename="../data/ssd/000000000285.ppm", ppm=...) at ../common/common.h:447
#4 0x000000555555ec8c in BatchStream::update (this=0x7fffffe198) at BatchStreamPPM.h:110
#5 0x000000555555e6f4 in BatchStream::next (this=0x7fffffe198) at BatchStreamPPM.h:51
#6 0x000000555555f478 in Int8EntropyCalibrator::getBatch (this=0x7fffffe190, bindings=0x55b1b26300, names=0x55b1d76a40, nbBindings=1)
at BatchStreamPPM.h:170
#7 0x0000007fb0974890 in nvinfer1::builder::calibrateEngine(nvinfer1::IInt8Calibrator&, nvinfer1::ICudaEngine&, std::unordered_map<std::string, float, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, float> > >&, bool) ()
from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#8 0x0000007fb0946250 in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::rt::HardwareContext const&, nvinfer1::Network const&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#9 0x0000007fb09b02ec in nvinfer1::builder::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) ()
from /usr/lib/aarch64-linux-gnu/libnvinfer.so.5
#10 0x000000555555ac7c in loadModelAndCreateEngine (uffFile=0x55558422c0 "../data/ssd/sample_ssd_relu6.uff", maxBatchSize=2, parser=0x5555824730,
calibrator=0x7fffffe190, trtModelStream=@0x7fffffdf50: 0x0) at sampleUffSSD.cpp:162
#11 0x000000555555b5dc in main (argc=2, argv=0x7fffffef48) at sampleUffSSD.cpp:539
(gdb)
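One thing that could be checked here: frame #3 shows readPPMFile<3, 300, 300>, which suggests the reader expects 3-channel 300x300 images, while COCO images vary in size and a format-only conversion does not resize them. This is only a guess at the cause, not something confirmed. A minimal Python sketch to list calibration files whose PPM header is not 300x300 follows; the function names are hypothetical and the expected size is taken from the template arguments in the backtrace.

```python
import os


def read_ppm_header(path):
    """Return (magic, width, height, maxval) from a PPM file header."""
    tokens = []
    with open(path, "rb") as f:
        while len(tokens) < 4:
            line = f.readline()
            if not line:
                raise ValueError(f"{path}: truncated PPM header")
            # Header comments start with '#' and run to the end of the line.
            line = line.split(b"#", 1)[0]
            tokens.extend(line.split())
    magic = tokens[0].decode("ascii")
    width, height, maxval = (int(t) for t in tokens[1:4])
    return magic, width, height, maxval


def find_mismatched(image_dir, names, expected=(300, 300)):
    """Return (name, magic, width, height) for files that are not binary
    PPM (P6) with the expected (width, height)."""
    bad = []
    for name in names:
        path = os.path.join(image_dir, name + ".ppm")
        magic, width, height, _ = read_ppm_header(path)
        if magic != "P6" or (width, height) != expected:
            bad.append((name, magic, width, height))
    return bad


# Example call, using the names listed in list.txt:
# with open("/workspace/tensorrt/data/ssd/list.txt") as f:
#     names = [line.strip() for line in f if line.strip()]
# print(find_mismatched("/workspace/tensorrt/data/ssd", names))
```

If this reports mismatches, resizing the calibration images to 300x300 before conversion would be the next thing to try.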
Let me know if there’s anything else I can provide that might help. I saw a similar topic on the forum from someone who hit this inside a Docker container, but searching further I didn’t find any other report of it. Apologies if I missed it!
Thanks,
- Josh.