Deep Learning Inference Benchmarking Instructions

Hi all, below you will find the procedures to run the Jetson Nano deep learning inferencing benchmarks from this blog post with TensorRT.

Note: for updated JetPack 4.4 benchmarks, please use github.com/NVIDIA-AI-IOT/jetson_benchmarks

While using one of the recommended power supplies, make sure your Nano is in 10W performance mode (which is the default mode):

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Using other lower-capacity power supplies may lead to system instabilities or shutdown during the benchmarks.
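
To confirm the settings took effect before starting a benchmark, you can query both tools (a quick sanity check; the exact output format varies between JetPack releases):

$ sudo nvpmodel -q
$ sudo jetson_clocks --show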

SSD-Mobilenet-V2

  1. Download the ssd-mobilenet-v2 archive from here to the ~/Downloads folder on your Nano.
    $ cd ~/Downloads/
    $ wget --no-check-certificate 'https://nvidia.box.com/shared/static/8oqvmd79llr6lq1fr43s4fu1ph37v8nt.gz' -O ssd-mobilenet-v2.tar.gz
    $ tar -xvf ssd-mobilenet-v2.tar.gz
    $ cd ssd-mobilenet-v2
    $ sudo cp -R sampleUffSSD_rect /usr/src/tensorrt/samples
    $ sudo cp sample_unpruned_mobilenet_v2.uff /usr/src/tensorrt/data/ssd/
    $ sudo cp image1.ppm /usr/src/tensorrt/data/ssd/
    
  2. Apply the following patches to the sample, depending on your JetPack version:
    JetPack 4.4 or newer
    • patch for /usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp
    20,21d19
    < using namespace sample;
    < using namespace std;
    23c21
    < /*static Logger gLogger;*/
    ---
    > static Logger gLogger;
    171c169
    <     builder->setMaxWorkspaceSize(1024 * 1024 * 128); // We need about 1GB of scratch space for the plugin layer for batch size 5.
    ---
    >     builder->setMaxWorkspaceSize(128_MB); // We need about 1GB of scratch space for the plugin layer for batch size 5.
    
    • patch for /usr/src/tensorrt/samples/sampleUffSSD_rect/Makefile
    3d2
    < EXTRA_DIRECTORIES = ../common
    
    JetPack 4.3 or JetPack 4.2.1
    • patch for /usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp
    19a20
    > using namespace std;
    21c22
    < static Logger gLogger;
    ---
    > /*static*/ Logger gLogger;
    169c170
    <     builder->setMaxWorkspaceSize(128_MB); // We need about 1GB of scratch space for the plugin layer for batch size 5.
    ---
    >     builder->setMaxWorkspaceSize(1024 * 1024 * 128); // We need about 1GB of scratch space for the plugin layer for batch size 5.
    

  3. Compile the sample
    $ cd /usr/src/tensorrt/samples/sampleUffSSD_rect
    $ sudo make
    
  4. Run the sample to measure inference performance
    $ cd /usr/src/tensorrt/bin
    $ sudo ./sample_uff_ssd_rect
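
    If the sample fails to build or aborts at startup, first confirm that the files copied in step 1 actually landed where the sample expects them (a simple sanity check using the same paths as above):
    $ ls -l /usr/src/tensorrt/samples/sampleUffSSD_rect
    $ ls -l /usr/src/tensorrt/data/ssd/sample_unpruned_mobilenet_v2.uff /usr/src/tensorrt/data/ssd/image1.ppm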
    

Image Classification (ResNet-50, Inception V4, VGG-19)

  1. The resources needed to run these models are available here. Copy each of these .prototxt files to the /usr/src/tensorrt/data/googlenet folder on your Jetson Nano.
  2. ResNet-50
    $ cd /usr/src/tensorrt/bin
    $ ./trtexec --output=prob --deploy=../data/googlenet/ResNet50_224x224.prototxt --fp16 --batch=1
    
  3. Inception V4
    $ cd /usr/src/tensorrt/bin
    $ ./trtexec --output=prob --deploy=../data/googlenet/inception_v4.prototxt --fp16 --batch=1
    
  4. VGG-19
    $ cd /usr/src/tensorrt/bin
    $ ./trtexec --output=prob --deploy=../data/googlenet/VGG19_N2.prototxt --fp16 --batch=1
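
    To run all three classification benchmarks back-to-back and keep the output for later comparison, the same trtexec invocation can be wrapped in a loop (a sketch only; the log file names are just a suggestion):
    $ cd /usr/src/tensorrt/bin
    $ for net in ResNet50_224x224 inception_v4 VGG19_N2; do ./trtexec --output=prob --deploy=../data/googlenet/${net}.prototxt --fp16 --batch=1 | tee ~/trtexec_${net}.log; done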
    

U-Net Segmentation

  1. Copy the output_graph.uff model file from here to the home folder on your Jetson Nano or any directory of your preference.
  2. Run the U-Net inference benchmark:
    $ cd /usr/src/tensorrt/bin
    $ sudo ./trtexec --uff=~/output_graph.uff --uffInput=input_1,1,512,512 --output=conv2d_19/Sigmoid --fp16
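
    If trtexec cannot open the model because the ~ in the --uff= argument is not expanded by your shell, spell out the absolute path instead (using the same <username> placeholder convention as the Tiny YOLO section below):
    $ sudo ./trtexec --uff=/home/<username>/output_graph.uff --uffInput=input_1,1,512,512 --output=conv2d_19/Sigmoid --fp16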
    

Pose Estimation

  1. Copy the pose_estimation.prototxt file from here to the /usr/src/tensorrt/data/googlenet folder of your Nano.
  2. Run the OpenPose inference benchmark:
    $ cd /usr/src/tensorrt/bin/
    $ sudo ./trtexec --output=Mconv7_stage2_L2 --deploy=../data/googlenet/pose_estimation.prototxt --fp16 --batch=1
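
    If trtexec complains that it cannot find the requested output blob, a quick sanity check is to confirm that the layer name passed via --output actually appears in the prototxt you copied in step 1:
    $ grep -n "Mconv7_stage2_L2" /usr/src/tensorrt/data/googlenet/pose_estimation.prototxt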
    

Super Resolution

  1. Download the required files to run inference on the Super Resolution neural network.
    $ sudo wget --no-check-certificate 'https://nvidia.box.com/shared/static/a99l8ttk21p3tubjbyhfn4gh37o45rn8.gz' -O Super-Resolution-BSD500.tar.gz
    
  2. Extract the downloaded archive
    $ sudo tar -xvf Super-Resolution-BSD500.tar.gz
    
  3. Run the Super Resolution inferencing benchmark:
    $ cd /usr/src/tensorrt/bin
    $ sudo ./trtexec --output=output_0 --onnx=<path to the .onnx file in the unzipped folder above> --fp16 --batch=1
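
    To find the exact path to substitute for the placeholder above, you can list the .onnx file inside the archive from the directory where you downloaded it (a quick sketch; the folder name inside the tarball may differ from the archive name):
    $ tar -tzf Super-Resolution-BSD500.tar.gz | grep '\.onnx$'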
    

Tiny YOLO v3

  1. Install prerequisite packages
    $ sudo apt-get install libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgflags-dev
    
  2. Download trt-yolo-app
    $ cd ~
    $ git clone -b restructure https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps
    
  3. If you are using JetPack 4.3 or newer, apply the following git patch to the deepstream_reference_apps source (a sketch for applying it with git apply follows the steps below):
    diff --git a/yolo/config/yolov3-tiny.txt b/yolo/config/yolov3-tiny.txt
    index ec12c53..47e46a6 100644
    --- a/yolo/config/yolov3-tiny.txt
    +++ b/yolo/config/yolov3-tiny.txt
    @@ -47,7 +47,7 @@
     # nms_thresh : IOU threshold for bounding box candidates. Default value is 0.5
     
     #Uncomment the lines below to use a specific config param
    -#--precision=kINT8
    +--precision=kHALF
     #--calibration_table_path=data/calibration/yolov3-tiny-calibration.table
     #--engine_file_path=
     #--print_prediction_info=true
    diff --git a/yolo/lib/ds_image.cpp b/yolo/lib/ds_image.cpp
    index 36a394c..9e4ff5b 100644
    --- a/yolo/lib/ds_image.cpp
    +++ b/yolo/lib/ds_image.cpp
    @@ -88,7 +88,7 @@ DsImage::DsImage(const std::string& path, const int& inputH, const int& inputW)
         cv::copyMakeBorder(m_LetterboxImage, m_LetterboxImage, m_YOffset, m_YOffset, m_XOffset,
                            m_XOffset, cv::BORDER_CONSTANT, cv::Scalar(128, 128, 128));
         // converting to RGB
    -    cv::cvtColor(m_LetterboxImage, m_LetterboxImage, CV_BGR2RGB);
    +    cv::cvtColor(m_LetterboxImage, m_LetterboxImage, cv::COLOR_BGR2RGB);
     }
     
     void DsImage::addBBox(BBoxInfo box, const std::string& labelName)
    @@ -106,7 +106,7 @@ void DsImage::addBBox(BBoxInfo box, const std::string& labelName)
             = cv::getTextSize(labelName, cv::FONT_HERSHEY_COMPLEX_SMALL, 0.5, 1, nullptr);
         cv::rectangle(m_MarkedImage, cv::Rect(x, y, tsize.width + 3, tsize.height + 4), color, -1);
         cv::putText(m_MarkedImage, labelName.c_str(), cv::Point(x, y + tsize.height),
    -                cv::FONT_HERSHEY_COMPLEX_SMALL, 0.5, cv::Scalar(255, 255, 255), 1, CV_AA);
    +                cv::FONT_HERSHEY_COMPLEX_SMALL, 0.5, cv::Scalar(255, 255, 255), 1, cv::LINE_AA);
     }
     
     void DsImage::showImage() const
    @@ -142,4 +142,4 @@ std::string DsImage::exportJson() const
                 json << "}";
         }
         return json.str();
    -}
    \ No newline at end of file
    +}
    diff --git a/yolo/lib/trt_utils.h b/yolo/lib/trt_utils.h
    index 359bfea..96a5a39 100644
    --- a/yolo/lib/trt_utils.h
    +++ b/yolo/lib/trt_utils.h
    @@ -28,11 +28,12 @@ SOFTWARE.
     #define __TRT_UTILS_H__
     
     /* OpenCV headers */
    -#include <opencv/cv.h>
    +//#include <opencv/cv.h>
     #include <opencv2/core/core.hpp>
     #include <opencv2/dnn/dnn.hpp>
     #include <opencv2/highgui/highgui.hpp>
     #include <opencv2/imgproc/imgproc.hpp>
    +#include <opencv2/imgcodecs/legacy/constants_c.h>
     
     #include <set>
     
    diff --git a/yolo/lib/yolo.cpp b/yolo/lib/yolo.cpp
    index 117a49f..2b7435e 100644
    --- a/yolo/lib/yolo.cpp
    +++ b/yolo/lib/yolo.cpp
    @@ -423,7 +423,7 @@ void Yolo::createYOLOEngine(const nvinfer1::DataType dataType, Int8EntropyCalibr
                   << " precision : " << m_Precision << " and batch size :" << m_BatchSize << std::endl;
     
         m_Builder->setMaxBatchSize(m_BatchSize);
    -    m_Builder->setMaxWorkspaceSize(1 << 20);
    +    m_Builder->setMaxWorkspaceSize(1024 * 1024 * 8);
     
         if (dataType == nvinfer1::DataType::kINT8)
         {
    
  4. Install other requirements
    $ cd ~/deepstream_reference_apps/yolo
    $ sudo sh prebuild.sh
    
  5. Compile and install app
    $ cd apps/trt-yolo
    $ mkdir build && cd build
    $ cmake -D CMAKE_BUILD_TYPE=Release ..
    $ make && sudo make install
    $ cd ../../..
    
  6. For the sample image data set, you can download 500 images (which need to be in .png format) to any folder on your Jetson Nano, just use 1 image file, or use the test set of 5 images that we've provided here.
    • Navigate your terminal to:
      $ cd ~/deepstream_reference_apps/yolo/data
      
    • Open the file “test_images.txt”
    • In the above file, you need to provide the full path to each of the 500 images you downloaded. For example, if your first image is located in the Downloads directory, the path you would enter in line 1 would be:
      /home/<username>/Downloads/<image file name>.png
      
    • Alternatively, you could provide the path to just one image and repeat that line 500 times in the file (a one-line shell sketch for doing this follows the steps below).
    • A sample set of images (5 images of varying resolutions, repeated 100 times) along with the test_images.txt file have been uploaded here. You can use this data set if you don’t want to download your own images.
    • Go to the folder ‘config’ and open file ‘yolov3-tiny.txt'
    • In the file yolov3-tiny.txt, search for “--precision=kINT8” and replace “kINT8” with “kHALF” to change the inference precision to FP16 mode. You will also need to uncomment this line. (If you applied the JetPack 4.3 patch above, this step has already been done.)
    • Save the file
  7. Now run the Tiny YOLO inference:
    $ cd ~/deepstream_reference_apps/yolo
    $ sudo trt-yolo-app --flagfile=config/yolov3-tiny.txt
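
    The JetPack 4.3 patch from step 3 does not have to be applied by hand: if you save the diff verbatim to a file (with the leading indentation removed; the file name below is just an example), git apply from the repository root should apply it in one step:
    $ cd ~/deepstream_reference_apps
    $ git apply ~/jetpack43-tiny-yolo.patch

    Likewise, as mentioned in step 6, if you only have a single test image, the 500-line test_images.txt can be generated in one go instead of duplicating the line by hand (a sketch using the same placeholder path as step 6):
    $ for i in $(seq 500); do echo "/home/<username>/Downloads/<image file name>.png"; done > ~/deepstream_reference_apps/yolo/data/test_images.txt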
    

Hello, I followed the commands for SSD-Mobilenet-V2 and got a crash.
I think it's because I only used the micro-USB charger (10 W) to power the Jetson Nano.
And since then it doesn't boot up either.
Can you confirm that the Jetson Nano is unable to boot from a micro-USB supply after executing the jetson_clocks command?
Is it possible to modify some file on the SD card (from another device) to revert the changes produced by jetson_clocks?
Thanks

Hi luisma, can you try re-flashing your SD card with the original image?

I’ll add a note to the post above about using one of the recommended power supplies to run the benchmarks, thanks.

Sorry, I do not want to re-flash it because I've worked so hard on it. I'd like to reconfigure it instead.

It's possible that during the abrupt shutdown, the filesystem on the SD card got corrupted, which is why it may no longer boot. Do you have a second SD card that you could try flashing with the original image? Alternatively, I would recommend trying one of the DC barrel jack adapters or one of the recommended USB power supplies and seeing if that helps (although the jetson_clocks behavior gets reset upon reboot, and the nvpmodel -m 0 profile is already the default).

You could also try plugging your SD card into a Linux PC (or another machine that can read ext4) and see if you can mount it to recover your files.
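
For reference, a typical sequence on a Linux PC looks roughly like this (a sketch only; /dev/sdX1 is a placeholder for whatever lsblk reports as the card's rootfs partition, and ~/nano-backup is just an example destination):

$ lsblk
$ sudo mount /dev/sdX1 /mnt
$ mkdir -p ~/nano-backup
$ sudo cp -r /mnt/home/<username> ~/nano-backup/
$ sudo umount /mnt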

Thanks, that's great!
And which files must be restored?
Perhaps l4t_dfs.conf, plus a little script in /etc/rc.local with:
jetson_clocks --restore

It is unclear which files are corrupt/damaged and would need to be restored. You could try using the fsck utility from a PC to check for errors.

Barring that, the purpose of mounting the SD card on a PC would be to back up your files before re-flashing the SD card.
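
A minimal example of that check from a Linux PC, assuming the card's rootfs partition shows up as /dev/sdX1 (a placeholder) and is not currently mounted:

$ sudo fsck.ext4 -f /dev/sdX1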

Thank you very much. I will try.

Do you have the benchmarking instructions for the SSD ResNet-18 model?

Hi hunterjm, looking into what these are now. Stay tuned, thanks.

Hi,

Can you specify which openpose network did you use and can you also post the weights?

Thanks.

@a7ypical Sadly, I don't think they have the weights. They used an open-source model from here, which does not post the weights: https://github.com/opencv/open_model_zoo/blob/master/intel_models/human-pose-estimation-0001/human-pose-estimation-0001.prototxt

You'll either have to train your own or try to convert another model to TensorRT.

Hi,

I tried the above-mentioned mobilenet_v2 SSD example and the results are not encouraging, to be honest. It detects nothing on the sample images. Are you sure the image data are being normalized correctly for this network?

What is the TF source model for sample_unpruned_mobilenet_v2.uff? According to the sample source, it should have 37 classes, but MS COCO has many more classes.

I would like to be able to go through TF → UFF → TensorRT with mobilenet_v2 SSD and to try different dimensions, too. Can you share your code somewhere?

Thank you

Hi Freemanix, you would want to freeze the PB graph from TensorFlow and export it to UFF, similar to what is described in these documents:
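
As a rough sketch of that flow (assuming the UFF converter that JetPack typically installs alongside TensorRT, and a preprocessing config.py along the lines of the one used by the stock sampleUffSSD; the file names and converter path are placeholders to adjust for your setup):

$ python3 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py frozen_inference_graph.pb -O NMS -p config.py -o sample_ssd.uff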

I’m having a strange issue now. The mobilenet sample code you posted works just fine. But now when I attempt to build regular sampleUffSSD instead of sampleUffSSD_rect, the executable is named sampleUffSSD but runs the code of sampleUffSSD_rect. So I now have two executables, sampleUffSSD and sampleUffSSD_rect, that both seem to run the code of sampleUffSSD_rect. Is something messed up with the makefiles?


Update: Renaming the files and running make clean fixed it.

@Freemanix I noticed the same: nothing is detected by this network. It seems suspicious that the code that generated detections was commented out in the example program.

The benchmark is for the network itself: that sample's post-processing was commented out to get an accurate performance result for the network, as different applications and platforms apply pre-/post-processing differently.

Of course, I wrote my own result parsing. The problem is not in the commented-out code, but in the network inference results. I tried to work with the similar sample, sampleUffSSD, in the TensorRT samples, but when I convert the frozen graph for ssd_inception_v2_coco_2017_11_17 to a .uff file, the sample fails with:

../data/ssd/sample_ssd_relu6.uff
Begin parsing model...
ERROR: UFFParser: Graph error: Cycle graph detected
ERROR: sample_uff_ssd: Fail to parse

As a result, I have so far been unable to run a reasonably fast, valid SSD on the Jetson Nano.

Hello! It seems that the link to the U-Net files is wrong. Can you fix that? Can you provide any details about the U-Net architecture used, or other useful resources for segmentation? Thank you!

Hi bl5218, the UNet and pose estimation model share the same folder on Google Drive. The UNet model is output_graph.uff (and the prototxt from that folder is for the pose estimation benchmark). Sorry for the confusion — it should work ok though.

For other resources on semantic segmentation networks, see this tutorial:
https://github.com/dusty-nv/jetson-inference/blob/master/docs/segnet-dataset.md