Caffe tests get stuck

Hi,

When I install Caffe on my fresh TX1 (version details below), it gets stuck during the tests.
It always happens during some of the Gradient tests. GDB shows that the tests are stuck in cuMemcpy, looping indefinitely and
calling the sleep function.
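
For reference, this is roughly how I captured the stuck state (the test binary name assumes the standard Caffe Makefile build, which produces test_all.testbin):

# Find the hung test process and dump a backtrace of every thread
pid=$(pgrep -f test_all.testbin | head -n1)
sudo gdb -p "$pid" -batch -ex "thread apply all bt"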

Has anybody encountered such a situation and found a fix?
How can I validate that my board is not faulty? Are there any utilities that can test it
and are guaranteed to run well on a healthy board?

Details about the board and the software:
The board is new, received a few weeks ago, and runs on the original power supply.
SW version:
head -n 1 /etc/nv_tegra_release

# R24 (release), REVISION: 2.1, GCID: 8028265, BOARD: t210ref, EABI: aarch64, DATE: Thu Nov 10 03:51:59 UTC 2016

Plenty of memory:
# free
              total        used        free      shared  buff/cache   available
Mem:        4090604     2197844      521628       42648     1371132     2154284
Swap:             0           0           0

Caffe was installed using the following instructions:

I also tried the Caffe installation from "Caffe Deep Learning Framework - 64-bit NVIDIA Jetson TX1 - JetsonHacks",
with the same results.

Any ideas?

Hi,

Thanks for your question.
I have just verified nvCaffe TOT (top of tree) and it works well.
Could you switch to nvCaffe TOT and try again?

Thanks.

Steps
1. Install dependencies

sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository universe 
sudo add-apt-repository multiverse
sudo apt-get install libboost-dev libboost-all-dev
sudo apt-get install libgflags-dev libgoogle-glog-dev liblmdb-dev libatlas-base-dev libblas-dev libprotobuf-dev libleveldb-dev libsnappy-dev libhdf5-serial-dev protobuf-compiler
git clone https://github.com/NVIDIA/caffe.git

2. Apply the following patch

diff --git a/3rdparty/cub/host/mutex.cuh b/3rdparty/cub/host/mutex.cuh
index be29d3e..e25afbe 100644
--- a/3rdparty/cub/host/mutex.cuh
+++ b/3rdparty/cub/host/mutex.cuh
@@ -121,7 +121,7 @@ struct Mutex
          */
         __forceinline__ void YieldProcessor()
         {
-        #ifndef __arm__
+        #if !defined(__arm__) && !defined(__aarch64__)
                 asm volatile("pause\n": : :"memory");
         #endif  // __arm__
         }
diff --git a/Makefile b/Makefile
index 44e1fe5..3054792 100644
--- a/Makefile
+++ b/Makefile
@@ -180,7 +180,7 @@ ifneq ($(CPU_ONLY), 1)
        LIBRARIES := cudart cublas curand
 endif
 
-LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_hl hdf5
+LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial
 
 # handle IO dependencies
 USE_LEVELDB ?= 1
diff --git a/Makefile.config.example b/Makefile.config.example
index d5f269f..8d2b9f3 100644
--- a/Makefile.config.example
+++ b/Makefile.config.example
@@ -3,7 +3,7 @@
 
 # cuDNN acceleration switch (uncomment to build with cuDNN).
 # cuDNN version 4 or higher is required.
-# USE_CUDNN := 1
+USE_CUDNN := 1
 
 # NCCL acceleration switch (uncomment to build with NCCL)
 # See https://github.com/NVIDIA/nccl
@@ -93,7 +93,7 @@ PYTHON_LIB := /usr/lib
 # WITH_PYTHON_LAYER := 1
 
 # Whatever else you find you need goes here.
-INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
+INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
 LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib
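
One way to apply the changes above, assuming the diff is saved as a file named tx1.patch (a name chosen here just for illustration) inside the cloned caffe directory:

cd caffe
# Apply the unified diff shown above
git apply tx1.patch
# or, equivalently:
patch -p1 < tx1.patch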

3. Make

cp Makefile.config.example Makefile.config
make -j4
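
If you also want to run the unit tests afterwards, the standard Caffe Makefile targets are:

make test -j4
make runtest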

Thanks for the reply.
I have reflashed the board using JetPack and executed your instructions line by line.

The build was flawless.

However, when I executed “make test” and then “make runtest”, some of the tests failed:
[==========] 2122 tests from 287 test cases ran. (3322914 ms total)
[ PASSED ] 2116 tests.
[ FAILED ] 6 tests, listed below:
[ FAILED ] DetectNetTransformationLayerTest/3.TestRotation, where TypeParam = caffe::GPUDevice
[ FAILED ] DetectNetTransformationLayerTest/3.TestDesaturation, where TypeParam = caffe::GPUDevice
[ FAILED ] DetectNetTransformationLayerTest/3.TestScaleDown, where TypeParam = caffe::GPUDevice
[ FAILED ] DetectNetTransformationLayerTest/3.TestScaleUp, where TypeParam = caffe::GPUDevice
[ FAILED ] DetectNetTransformationLayerTest/3.TestFlip, where TypeParam = caffe::GPUDevice
[ FAILED ] DetectNetTransformationLayerTest/3.TestNoAugmentation, where TypeParam = caffe::GPUDevice

I then tried executing jetson_clocks.sh and repeating “make runtest”.

This time the Gradient tests got stuck in exactly the same way they used to on my other Caffe installations.

I have repeated both runtest executions twice and the results are fully consistent.
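
For reference, the exact sequence was as follows (the jetson_clocks.sh path is where JetPack placed the script on my board; adjust it if yours differs):

# First pass: default clocks
make runtest
# Second pass: maximize CPU/GPU/EMC clocks, then rerun
sudo ~/jetson_clocks.sh
make runtest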

Does “make runtest” succeed on your board with and without jetson_clocks.sh?

Are there any non-Caffe diagnostic tests that I can run on the board to determine the health of the hardware?

Please run the TensorRT samples as a hardware check. This will show you whether cuDNN and TensorRT work well.

cp -r /usr/src/gie_samples/ ~/
cd ~/gie_samples/samples
make TARGET=aarch64
mkdir -p ~/gie_samples/samples/bin/data/
cp -r ~/gie_samples/samples/data/* ~/gie_samples/samples/bin/data/

cd ~/gie_samples/samples/bin
./sample_mnist
./sample_mnist_gie
./sample_googlenet

Here is the output of the last three commands.

ubuntu@tegra-ubuntu:~/gie_samples/samples/bin$ ./sample_mnist




---------------------------



@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@+  :@@@@@@@@
@@@@@@@@@@@@@@%= :. --%@@@@@
@@@@@@@@@@@@@%. -@= - :@@@@@
@@@@@@@@@@@@@: -@@#%@@ #@@@@
@@@@@@@@@@@@: #@@@@@@@-#@@@@
@@@@@@@@@@@= #@@@@@@@@=%@@@@
@@@@@@@@@@= #@@@@@@@@@:@@@@@
@@@@@@@@@+ -@@@@@@@@@%.@@@@@
@@@@@@@@@::@@@@@@@@@@+-@@@@@
@@@@@@@@-.%@@@@@@@@@@.*@@@@@
@@@@@@@@ *@@@@@@@@@@@ *@@@@@
@@@@@@@% %@@@@@@@@@%.-@@@@@@
@@@@@@@:*@@@@@@@@@+. %@@@@@@
@@@@@@# @@@@@@@@@# .*@@@@@@@
@@@@@@# @@@@@@@@=  +@@@@@@@@
@@@@@@# @@@@@@%. .+@@@@@@@@@
@@@@@@# @@@@@*. -%@@@@@@@@@@
@@@@@@# ---    =@@@@@@@@@@@@
@@@@@@#      *%@@@@@@@@@@@@@
@@@@@@@%: -=%@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@


0: **********
1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 

ubuntu@tegra-ubuntu:~/gie_samples/samples/bin$ 
ubuntu@tegra-ubuntu:~/gie_samples/samples/bin$ ./sample_mnist_gie



---------------------------



@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@%.:@@@@@@@@@@@@
@@@@@@@@@@@@@: *@@@@@@@@@@@@
@@@@@@@@@@@@* =@@@@@@@@@@@@@
@@@@@@@@@@@% :@@@@@@@@@@@@@@
@@@@@@@@@@@- *@@@@@@@@@@@@@@
@@@@@@@@@@# .@@@@@@@@@@@@@@@
@@@@@@@@@@: #@@@@@@@@@@@@@@@
@@@@@@@@@+ -@@@@@@@@@@@@@@@@
@@@@@@@@@: %@@@@@@@@@@@@@@@@
@@@@@@@@+ +@@@@@@@@@@@@@@@@@
@@@@@@@@:.%@@@@@@@@@@@@@@@@@
@@@@@@@% -@@@@@@@@@@@@@@@@@@
@@@@@@@% -@@@@@@#..:@@@@@@@@
@@@@@@@% +@@@@@-    :@@@@@@@
@@@@@@@% =@@@@%.#@@- +@@@@@@
@@@@@@@@..%@@@*+@@@@ :@@@@@@
@@@@@@@@= -%@@@@@@@@ :@@@@@@
@@@@@@@@@- .*@@@@@@+ +@@@@@@
@@@@@@@@@@+  .:-+-: .@@@@@@@
@@@@@@@@@@@@+:    :*@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@


0: 
1: 
2: 
3: 
4: 
5: 
6: **********
7: 
8: 
9: 

ubuntu@tegra-ubuntu:~/gie_samples/samples/bin$ ./sample_googlenet
Building and running a GPU inference engine for GoogleNet, N=4...
data_zipper                              0.236ms
conv1/7x7_s2 + conv1/relu_7x7            6.522ms
pool1/3x3_s2                             0.701ms
pool1/norm1                              0.319ms
conv2/3x3_reduce + conv2/relu_3x3_reduce 0.242ms
conv2/3x3 + conv2/relu_3x3               2.756ms
conv2/norm2                              0.948ms
pool2/3x3_s2                             0.550ms
inception_3a/1x1 + inception_3a/relu_1x1 0.164ms
inception_3a/pool                        0.282ms
inception_3a/3x3_reduce + inception_3a/r 0.263ms
inception_3a/pool_proj + inception_3a/re 0.158ms
inception_3a/3x3 + inception_3a/relu_3x3 0.680ms
inception_3a/5x5 + inception_3a/relu_5x5 0.270ms
inception_3a/output                      0.001ms
inception_3b/1x1 + inception_3b/relu_1x1 0.336ms
inception_3b/pool                        0.371ms
inception_3b/3x3_reduce + inception_3b/r 0.543ms
inception_3b/pool_proj + inception_3b/re 0.204ms
inception_3b/3x3 + inception_3b/relu_3x3 1.302ms
inception_3b/5x5 + inception_3b/relu_5x5 0.935ms
inception_3b/output                      0.001ms
pool3/3x3_s2                             0.215ms
inception_4a/1x1 + inception_4a/relu_1x1 0.308ms
inception_4a/pool                        0.109ms
inception_4a/3x3_reduce + inception_4a/r 0.168ms
inception_4a/pool_proj + inception_4a/re 0.175ms
inception_4a/3x3 + inception_4a/relu_3x3 0.298ms
inception_4a/5x5 + inception_4a/relu_5x5 0.101ms
inception_4a/output                      0.001ms
inception_4b/1x1 + inception_4b/relu_1x1 0.341ms
inception_4b/pool                        0.123ms
inception_4b/3x3_reduce + inception_4b/r 0.327ms
inception_4b/pool_proj + inception_4b/re 0.147ms
inception_4b/3x3 + inception_4b/relu_3x3 0.336ms
inception_4b/5x5 + inception_4b/relu_5x5 0.147ms
inception_4b/output                      0.000ms
inception_4c/1x1 + inception_4c/relu_1x1 0.179ms
inception_4c/pool                        0.118ms
inception_4c/3x3_reduce + inception_4c/r 0.334ms
inception_4c/pool_proj + inception_4c/re 0.154ms
inception_4c/3x3 + inception_4c/relu_3x3 0.423ms
inception_4c/5x5 + inception_4c/relu_5x5 0.147ms
inception_4c/output                      0.000ms
inception_4d/1x1 + inception_4d/relu_1x1 0.179ms
inception_4d/pool                        0.117ms
inception_4d/3x3_reduce + inception_4d/r 0.329ms
inception_4d/pool_proj + inception_4d/re 0.147ms
inception_4d/3x3 + inception_4d/relu_3x3 0.545ms
inception_4d/5x5 + inception_4d/relu_5x5 0.184ms
inception_4d/output                      0.000ms
inception_4e/1x1 + inception_4e/relu_1x1 0.353ms
inception_4e/pool                        0.120ms
inception_4e/3x3_reduce + inception_4e/r 0.339ms
inception_4e/pool_proj + inception_4e/re 0.183ms
inception_4e/3x3 + inception_4e/relu_3x3 0.652ms
inception_4e/5x5 + inception_4e/relu_5x5 0.263ms
inception_4e/output                      0.001ms
pool4/3x3_s2                             0.095ms
inception_5a/1x1 + inception_5a/relu_1x1 0.184ms
inception_5a/pool                        0.054ms
inception_5a/3x3_reduce + inception_5a/r 0.227ms
inception_5a/pool_proj + inception_5a/re 0.174ms
inception_5a/3x3 + inception_5a/relu_3x3 0.322ms
inception_5a/5x5 + inception_5a/relu_5x5 0.175ms
inception_5a/output                      0.000ms
inception_5b/1x1 + inception_5b/relu_1x1 0.275ms
inception_5b/pool                        0.053ms
inception_5b/3x3_reduce + inception_5b/r 0.186ms
inception_5b/pool_proj + inception_5b/re 0.174ms
inception_5b/3x3 + inception_5b/relu_3x3 0.448ms
inception_5b/5x5 + inception_5b/relu_5x5 0.240ms
inception_5b/output                      0.000ms
pool5/7x7_s1                             0.226ms
loss3/classifier                         0.048ms
prob                                     0.063ms
prob_unzipper                            0.007ms
Time over all layers: 28.296
Done.
ubuntu@tegra-ubuntu:~/gie_samples/samples/bin$

After running jetson_clocks.sh, the test still succeeds, with the overall time dropping to about 23 seconds.

Please let me know what the next steps should be. Caffe clearly behaves differently on this board compared to yours: cuMemcpy gets stuck when running after jetson_clocks.sh, and a few tests fail without it.

Hi,

I tried running the tests and some items also failed for me.
However, we have used Caffe several times and found no errors.

I think you can try your own use case directly.
If you then encounter an error in your case, please let us know and we will help.

Caffe uses the cuDNN library heavily, and your cuDNN works normally since you can pass the TensorRT test cases.
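
As a quick end-to-end exercise outside the unit tests, you could time a standard network with the caffe tool. This sketch assumes the bvlc_alexnet deploy prototxt that ships in the repository and the default Makefile build output:

cd ~/caffe
# Time 10 forward/backward passes of AlexNet on the GPU
./build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0 --iterations=10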