NVCaffe support on TX2

Hi,

It would seem that NVCaffe cannot be compiled on the TX2 because it requires NVML, which does not support Tegra chips.

Is there a workaround for this?

Hi,

NvCaffe runs on the TX2 without error. Please remember to use NvCaffe-0.15, since TensorRT 2.1 does not support NvCaffe-0.16 models yet.

Here are the steps:
1. Prerequisites

sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository universe 
sudo add-apt-repository multiverse
sudo apt-get install libboost-dev libboost-all-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev libblas-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler

2. NvCaffe

git clone -b caffe-0.15 https://github.com/NVIDIA/caffe.git

Apply the following change:

diff --git a/3rdparty/cub/host/mutex.cuh b/3rdparty/cub/host/mutex.cuh
index be29d3e..e25afbe 100644
--- a/3rdparty/cub/host/mutex.cuh
+++ b/3rdparty/cub/host/mutex.cuh
@@ -121,7 +121,7 @@ struct Mutex
          */
         __forceinline__ void YieldProcessor()
         {
-        #ifndef __arm__
+        #if !defined(__arm__) && !defined(__aarch64__)
                 asm volatile("pause\n": : :"memory");
         #endif  // __arm__
         }
diff --git a/Makefile b/Makefile
index 44e1fe5..3054792 100644
--- a/Makefile
+++ b/Makefile
@@ -180,7 +180,7 @@ ifneq ($(CPU_ONLY), 1)
        LIBRARIES := cudart cublas curand
 endif
 
-LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
+LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
 
 # handle IO dependencies
 USE_LEVELDB ?= 1
diff --git a/Makefile.config.example b/Makefile.config.example
index d5f269f..9c81f11 100644
--- a/Makefile.config.example
+++ b/Makefile.config.example
@@ -3,7 +3,7 @@
 
 # cuDNN acceleration switch (uncomment to build with cuDNN).
 # cuDNN version 4 or higher is required.
-# USE_CUDNN := 1
+USE_CUDNN := 1
 
 # NCCL acceleration switch (uncomment to build with NCCL)
 # See https://github.com/NVIDIA/nccl
@@ -43,6 +43,7 @@ CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
                -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=sm_50 \
+               -gencode arch=compute_62,code=sm_62 \
                -gencode arch=compute_50,code=compute_50
 
 # BLAS choice:
@@ -93,7 +94,7 @@ PYTHON_LIB := /usr/lib
 # WITH_PYTHON_LAYER := 1
 
 # Whatever else you find you need goes here.
-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
 LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
 
 # If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies

Then build:

cp Makefile.config.example Makefile.config
make -j4
make pycaffe

By the way, if you are looking for DIGITS, a user interface for NvCaffe, it can’t run on the TX2 because there is no NVML support.
You can still train your model with NvCaffe directly.

Thanks.

It seems to have worked with this adjustment:

https://github.com/BVLC/caffe/issues/4808

Hi,

Both Caffe (the one you posted) and NvCaffe work on Jetson.
Installation steps are in comment #2.

Thank you for publishing the steps!
Does version 0.16 work with Tensor 1.2.1/1.3 (if installed on the Jetson)? Would it make sense to use TensorRT instead? Are there instructions somewhere on how to use TensorRT for image recognition (with a retrained model?)

Hi,

NvCaffe-0.15 is compatible with TensorRT 2.1 (JetPack 3.1).
NvCaffe-0.16 needs TensorRT 3.0, which is not currently available on Jetson.

Please use NvCaffe-0.15 instead.
Here is a tutorial that may help:

Hello!

For regular Caffe, should we change the line

-gencode arch=compute_61,code=sm_61

to

-gencode arch=compute_62,code=sm_62

as well?

Yes, the TX2 GPU architecture is sm_62.
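
As a small sketch, the flag can be generated from the compute capability rather than typed by hand. The helper name gencode_flag is hypothetical (it is not part of Caffe or CUDA); 62 corresponds to the sm_62 architecture mentioned above.

```shell
# gencode_flag is a hypothetical helper, not part of Caffe or CUDA.
# It prints the nvcc -gencode flag for a compute capability given
# without the dot (for example 62 for the TX2).
gencode_flag() {
    cc="$1"
    # Pass the flag as data so the leading dash is never parsed as an option
    printf '%s\n' "-gencode arch=compute_${cc},code=sm_${cc}"
}
```

Calling gencode_flag 62 prints the line to add to CUDA_ARCH in Makefile.config.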

Hi,
#make all -j 16
/usr/bin/ld: skipping incompatible /usr/lib/libnvidia-ml.so when searching for -lnvidia-ml
/usr/bin/ld: skipping incompatible /usr/lib/libnvidia-ml.so when searching for -lnvidia-ml
/usr/bin/ld: cannot find -lopencv_imgcode
collect2: error: ld returned 1 exit status
make: *** [.build_release/lib/libcaffe-nv.so.0.16.5] Error 1
make: *** Waiting for unfinished jobs…

Can you tell me how to solve this?

331744738,
Which application or sample were you trying to build? Could you share more information? Thanks.

In post #2 you have a script that applies changes to the Makefile and Makefile.config.

Can you please explain exactly which tools are needed to run that script?

Hi,

Could you share your environment settings with us?
Please remember that comment #2 is for NvCaffe-0.15 and JetPack 3.1.

For JetPack 3.2, we have changed OpenCV to version 3.3.1.
Please modify this configuration to match the corresponding libraries before building:
https://github.com/NVIDIA/caffe/blob/caffe-0.16/Makefile.config.example#L21
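
A rough sketch of that change follows. It assumes the linked line is the OPENCV_VERSION switch in Makefile.config; the link target is not quoted in this thread, so please treat that as an assumption. The helper opencv_config_line is hypothetical and only prints the line to use for a given OpenCV version string.

```shell
# opencv_config_line is a hypothetical helper for illustration only.
# Given an OpenCV version string such as 3.3.1, it prints the
# Makefile.config line to use: the OPENCV_VERSION switch for 3.x,
# or the commented-out default for any other version.
# Assumption: the line referenced above is the OPENCV_VERSION switch.
opencv_config_line() {
    major="${1%%.*}"    # keep only the text before the first dot
    if [ "$major" = "3" ]; then
        echo "OPENCV_VERSION := 3"
    else
        echo "# OPENCV_VERSION := 3"
    fi
}
```

On the device, the version string could come from pkg-config --modversion opencv, provided OpenCV's pkg-config file is installed.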

Thanks.

I am confused: is this a script, or a particular makefile I should be referring to? Where do I apply this change?

Hi,

Please git clone our NvCaffe source first:

$ git clone -b caffe-0.15 https://github.com/NVIDIA/caffe.git

Then apply the change with this command:

$ git apply [file]
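
In case it helps, here is a minimal sketch of that step. It assumes the diff from comment #2 has been saved to a text file inside the cloned caffe directory. The function name apply_patch_checked and the file name tx2.patch used below are placeholders, not something shipped with NvCaffe.

```shell
# apply_patch_checked is a hypothetical wrapper around git apply.
# It dry-runs the patch first, so a malformed or mismatched patch
# leaves the working tree untouched.
apply_patch_checked() {
    patch_file="$1"
    if [ ! -f "$patch_file" ]; then
        echo "patch file not found: $patch_file" >&2
        return 1
    fi
    # --check only verifies that the patch applies; it changes no files
    git apply --check "$patch_file" || return 1
    git apply "$patch_file"
}
```

From the caffe directory this would be run as apply_patch_checked tx2.patch.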

Thanks.

Hello AastaLLL,
I followed your instructions and got a fatal error from make -j4:
CXX src/caffe/layers/hdf5_output_layer.cpp
CXX src/caffe/layers/neuron_layer.cpp
CXX src/caffe/layers/cudnn_conv_layer.cpp
CXX src/caffe/layers/input_layer.cpp
src/caffe/layers/hdf5_output_layer.cpp:3:18: fatal error: hdf5.h: No such file or directory
I even tried changing the path to the exact location:
src/caffe/layers/hdf5_output_layer.cpp:3:42: fatal error: usr/include/hdf5/serial/hdf5.h: No such file or directory
compilation terminated.
I'm not sure what the problem is. Can you help me?

Hi,

Please make sure you have installed the prerequisites in comment #2.
Thanks.

I double-checked, and everything is installed. The actual usr/include/hdf5/serial/hdf5.h file is there, but the error tells me there is no such file or directory.

I’ve even followed the "Fix HDF5 Linking Issue" solution from GitHub:

$ sudo ln -s /usr/lib/aarch64-linux-gnu/libhdf5_serial.so.10 /usr/lib/aarch64-linux-gnu/libhdf5.so
$ sudo ln -s /usr/lib/aarch64-linux-gnu/libhdf5_serial_hl.so.10 /usr/lib/aarch64-linux-gnu/libhdf5_hl.so

It still hasn't worked.

Hi,

Try this change in $CAFFE_ROOT/Makefile.config:

-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
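
To confirm which directory to add, a small check like the following may help. It is only a sketch: find_hdf5_include is a hypothetical helper, and the directories passed to it are up to you.

```shell
# find_hdf5_include is a hypothetical helper for illustration.
# It prints the first directory among its arguments that contains
# hdf5.h, and returns non-zero if none of them does.
find_hdf5_include() {
    for dir in "$@"; do
        if [ -f "$dir/hdf5.h" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

For example, find_hdf5_include /usr/include/hdf5/serial /usr/include prints the directory that belongs on the INCLUDE_DIRS line. The path needs to be absolute, with the leading slash.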

Thanks.

Following #2, when I try to apply the changes with git apply (#14), I get the following error:

fatal: patch fragment without header at line 41: @@ -43,6 +43,7 @@ CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \

I'm not sure how to proceed.