DIGITS 4 on Jetson TX1

arashloo · September 20, 2016, 7:58am

Dear all,

I am wondering whether the “DIGITS 4” development environment runs on the “Jetson TX1” platform?

I am planning to use the DetectNet deep neural network model provided by NVIDIA to perform vehicle detection on the Jetson TX1 platform. I would appreciate it if you could share any experience you have with this kind of setup.

Many thanks,
Shervin

dusty_nv · September 20, 2016, 12:22pm

Hi Shervin, the DIGITS training system is not supported on ARM/Jetson; it is meant to run on a PC. This is partly because the nvcaffe build on TX1 is optimized for FP16 inference rather than training. To train a network, run DIGITS in the cloud (for example on AWS or Azure) or on a local x86 machine. With each training epoch, DIGITS saves a network model checkpoint, which you can copy over to your Jetson for inference. The same applies to DetectNet: once it is trained in DIGITS to your liking, copy it over to your Jetson, where you can load it with TensorRT using example code like this.
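
The checkpoint-copy step described above could be scripted along these lines. This is only a minimal sketch: it assumes the DIGITS job directory holds Caffe checkpoints named `snapshot_iter_<N>.caffemodel`, and `latest_snapshot` is a hypothetical helper, not part of DIGITS.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed checkpoint naming inside a DIGITS job directory:
# snapshot_iter_<N>.caffemodel
_SNAPSHOT_RE = re.compile(r"snapshot_iter_(\d+)\.caffemodel")


def latest_snapshot(job_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest iteration number, or None."""
    best = None
    best_iter = -1
    for path in Path(job_dir).iterdir():
        match = _SNAPSHOT_RE.fullmatch(path.name)
        if match and int(match.group(1)) > best_iter:
            best_iter = int(match.group(1))
            best = path
    return best
```

The file this returns, together with the network definition from the same job, is what you would then copy to the Jetson (for example with `scp`).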

kanakiyab · September 27, 2017, 7:38pm

Hello,

I have been following the posts related to DIGITS training and inference on Jetson TX1. I understand that DIGITS can be set up only in the cloud or on a computer with a GPU. I am going through the tutorial at https://github.com/dusty-nv/jetson-inference#system-setup. It explains the steps to install JetPack on the Jetson, and in doing so it also installs the necessary CUDA toolkit on the machine that will be used to run DIGITS.

However, I have already installed JetPack on the TX1 from a local computer and would like to use a Google Cloud VM instance for setting up DIGITS. Are there any related posts on doing so? Please let me know.

Thank you,
Bhargav

kanakiyab · September 27, 2017, 9:02pm

[UPDATE#1]
Hello,

I found the setup instructions at https://github.com/NVIDIA/DIGITS/blob/digits-6.0/docs/BuildDigits.md#prerequisites. I was able to install the CUDA toolkit and drivers (please note that I installed CUDA 9.0 rather than 8.0 as mentioned in the link). In the next step, when installing the Machine Learning repo, I get the following error:

Preparing to unpack /tmp/ml-repo.deb ...
Unpacking nvidia-machine-learning-repo-ubuntu1604 (1.0.0-1) over (1.0.0-1) ...
Setting up nvidia-machine-learning-repo-ubuntu1604 (1.0.0-1) ...
gpg: no valid OpenPGP data found.
Failed to add GPGKEY at http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/7fa2af80.pub to apt keys.

I am not sure how to resolve this. I am still looking for a solution, but if anyone reads this, please point me to some resources.

Any help is appreciated.

Thanks,
Bhargav

kanakiyab · September 27, 2017, 9:22pm

[UPDATE#2]

With some more digging, I noticed that the 7fa2af80.pub file is not available at http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/. However, it is available at http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64/. But I believe that I can’t use a file for 14.04 to install on a 16.04 machine.

Will keep digging more…

(I will post the solution here if I find one; in the meantime, feel free to suggest anything.)
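
The `gpg: no valid OpenPGP data found` message generally means that whatever was fetched from the key URL was not a key, for example an HTML error page. One quick way to confirm this is to inspect the downloaded bytes. The sketch below assumes the key file is ASCII-armored; `looks_like_armored_pgp_key` is a hypothetical helper and does not verify the key cryptographically.

```python
ARMOR_HEADER = b"-----BEGIN PGP PUBLIC KEY BLOCK-----"
ARMOR_FOOTER = b"-----END PGP PUBLIC KEY BLOCK-----"


def looks_like_armored_pgp_key(data: bytes) -> bool:
    """Cheap sanity check: is the payload wrapped in PGP public-key armor?"""
    text = data.strip()
    return text.startswith(ARMOR_HEADER) and text.endswith(ARMOR_FOOTER)
```

Reading a saved copy of `7fa2af80.pub` in binary mode and passing its contents to this function would show whether the download is a key at all before it is handed to `apt-key`.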

dusty_nv · September 28, 2017, 5:30pm

For Ubuntu, it looks like the CUDA downloads currently support 16.04 or 17.04. Does your cloud provider support 16.04? If you still have trouble installing the DEB package, you can try the runfile instead to install directly.

If you do go the runfile route, you’ll want to install cuDNN and NVcaffe too, which is covered in the tutorial. You can re-join it at this step: GitHub - dusty-nv/jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

kanakiyab · September 28, 2017, 9:11pm

Thanks, Dustin!

I had set up Ubuntu 16.04 on Cloud platform. I ended up following the exact same steps last night and had success. Haven’t had time to update the post.

I have started using DIGITS and am also reading about it. I have one question about the platform. I understand that DetectNet can take in images of various sizes and still train an object detection model (from this post: https://devblogs.nvidia.com/parallelforall/detectnet-deep-neural-network-object-detection-digits/). However, when I was uploading the dataset to DIGITS, it wouldn’t allow me to upload arbitrarily sized images. Did I miss something?

Moreover, are there any other official guides besides https://github.com/dusty-nv/jetson-inference#locating-object-coordinates-using-detectnet?

Please let me know. Really appreciate it and look forward to utilizing the powerful DetectNet for training and inference.

Regards,
Bhargav

kanakiyab · September 29, 2017, 1:00am

On another note, I have set up DIGITS 6 in the cloud and was trying to train a model on a custom dataset. Every time I create a model, it fails with error code -6. After looking into the log file, I saw the following error message:

I0929 00:37:08.050968  9779 layer_factory.hpp:77] Creating layer cluster
[libprotobuf FATAL google/protobuf/stubs/common.cc:61] This program requires version 3.2.0 of the Protocol Buffer runtime library, but the installed version is 2.6.1.  Please update your library.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "google/protobuf/descriptor.pb.cc".)
terminate called after throwing an instance of 'google::protobuf::FatalException'
what():  This program requires version 3.2.0 of the Protocol Buffer runtime library, but the installed version is 2.6.1.  Please update your library.  If you compiled the program yourself, make sure that your headers are from the same version of Protocol Buffers as your link-time library.  (Version verification failed in "google/protobuf/descriptor.pb.cc".)
*** Aborted at 1506645428 (unix time) try "date -d @1506645428" if you are using GNU date ***
PC: @     0x7fda5d10b428 gsignal
*** SIGABRT (@0x3ea00002633) received by PID 9779 (TID 0x7fda5f433ac0) from PID 9779; stack trace: ***

I installed the protobuf compiler as mentioned in the installation steps. However, when I run

protoc --version

I get

libprotoc 2.6.1

Not sure which step went wrong.

Maybe I should install protobuf 3.2 from source and then rebuild Caffe? I will try that, but if someone reads this and can confirm, it would save me some futile attempts.
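
The abort in the log comes from a version check: the program expects the protobuf 3.2.0 runtime but finds 2.6.1. The comparison can be sketched as follows. `parse_protoc_version` and `runtime_is_new_enough` are hypothetical helper names, and treating "installed version is not older than the required one" as compatible is a simplification of what the library actually verifies.

```python
import re
from typing import Tuple


def parse_protoc_version(output: str) -> Tuple[int, ...]:
    """Parse text such as 'libprotoc 2.6.1' into (2, 6, 1)."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def runtime_is_new_enough(installed: str, required: str) -> bool:
    """Simplified check: the installed version must not be older."""
    return parse_protoc_version(installed) >= parse_protoc_version(required)
```

With the values from this post, `runtime_is_new_enough("libprotoc 2.6.1", "3.2.0")` returns `False`, which is consistent with the failure in the log.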

AastaLLL · October 5, 2017, 9:29am

Hi,

Could you test MNIST training first to confirm that DIGITS itself is working?

https://github.com/NVIDIA/DIGITS/blob/master/docs/GettingStarted.md

# Getting Started

Now that you have successfully installed DIGITS, this guide will teach you the basics of how to use it.
By the end, you will have trained a Caffe model to recognize hand-written digits.
We will be using the [MNIST handwritten digit database](http://yann.lecun.com/exdb/mnist) as our dataset and [LeNet-5](http://yann.lecun.com/exdb/lenet/) for our network.
Both are generously made available by Yann LeCun on [his website](http://yann.lecun.com/).

## Download the data

Use the following command to download the MNIST dataset onto your server:
```sh
$ python -m digits.download_data mnist ~/mnist
Downloading url=http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
Downloading url=http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz ...
Downloading url=http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz ...
Downloading url=http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz ...
Uncompressing file=train-images-idx3-ubyte.gz ...
Uncompressing file=train-labels-idx1-ubyte.gz ...
Uncompressing file=t10k-images-idx3-ubyte.gz ...
Uncompressing file=t10k-labels-idx1-ubyte.gz ...

```

(Preview truncated.)

Thanks.

dusty_nv · October 5, 2017, 1:00pm

If you want to have multiple image sizes within a dataset, make sure the “Enforce same shape” option is de-selected when importing the dataset.

The jetson-inference guide uses DIGITS on a PC for training and TensorRT on Jetson for deployment. For other official material, you can consult the examples in the DIGITS master branch:

https://github.com/NVIDIA/DIGITS/tree/master/examples
https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection

kanakiyab · October 5, 2017, 6:25pm

Thanks, dusty_nv. I had also noticed the checkbox, and with it de-selected I was able to use images of different sizes.

Sure, I have stumbled upon these links too. :)

Thanks, AastaLLL. I was successfully able to test everything including object detection.

For others who might stumble upon this post: I had missed the following two steps after installing protobuf-3.2.0 from source.

cd python
python setup.py install --cpp_implementation

These need to be executed inside the folder where the protobuf source was downloaded, after it has been built from source as per the instructions in the installation guide.
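
To check that the Python bindings picked up the new build, you can ask the `protobuf` package which version and backend it is using. This is a sketch and not part of the installation guide: `describe_protobuf` is a hypothetical helper, and `api_implementation` lives in protobuf's internal namespace, so treat it as a diagnostic only.

```python
from typing import Dict, Optional


def describe_protobuf() -> Optional[Dict[str, str]]:
    """Report the Python protobuf version and backend, or None if missing."""
    try:
        import google.protobuf
        from google.protobuf.internal import api_implementation
    except ImportError:
        # protobuf is not installed for this Python interpreter
        return None
    return {
        "version": google.protobuf.__version__,
        "implementation": api_implementation.Type(),
    }
```

After the two commands above, one would expect `version` to read `3.2.0` and `implementation` to read `cpp`; a value of `python` would suggest that the C++ extension was not built.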

Thanks,
Bhargav