Issues with the dlib library on Jetson Nano (NaN face embeddings)

I am trying to run a Python script on a Jetson Nano board that performs face detection and embedding calculation using the dlib library. When I print the embeddings, the 128 values sometimes come out as very large numbers and sometimes as NaN. The same script displays the values correctly on all other devices, such as a TX2 or an i386/amd64 Linux machine.

I have tried installing dlib both ways: with “pip install dlib” and by building from source. In both cases, I get a similar result.

Could anyone please suggest how this problem can be resolved?

Thanks in advance

Hi,

To give further information, would you mind sharing your script with us?
Thanks.

Please find the code below

import face_recognition
import cv2
import os
import numpy as np
import dlib

### Path where images are present for testing
imagefolderpath = "Images/"

### Model for face detection
face_detector = dlib.get_frontal_face_detector()

for filename in os.listdir(imagefolderpath):
    image = cv2.imread(os.path.join(imagefolderpath, filename), 1)

    ### face_recognition expects RGB input; OpenCV loads images as BGR
    rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    ### Detect faces (second argument 0 = no upsampling)
    faces = face_detector(rgb, 0)

    ### Reset per image; face_recognition expects (top, right, bottom, left) tuples
    face_locations = []
    for face in faces:
        face_locations.append((face.top(), face.right(), face.bottom(), face.left()))

    face_encodings = face_recognition.face_encodings(rgb, known_face_locations=face_locations, num_jitters=1)
    print(face_encodings)

    ### Draw all detections, then show the image once
    for (top, right, bottom, left) in face_locations:
        cv2.rectangle(image, (left, top), (right, bottom), (0, 0, 255), 2)

    cv2.imshow('Image', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

Hi,

We are installing the dlib library, which will take some time.
In the meantime, could you share the dlib and OpenCV versions on the Nano and the TX2 with us?

Thanks.

I have exactly the same issue. I have been testing C++ code and get NaN or very large numbers as the descriptor output.

dlib 19.17

I have the same issue. My code runs on the TX1, TX2, and Xavier without problems, but it produces the same error on the Nano.
I tried both dlib 19.16 and 19.17.
The “face_recognition.batch_face_locations” function outputs the correct face locations using the “CNN” model, but the issue is with the “face_recognition.face_encodings” function: the output is very large numbers or NaNs.

Hi,

We tested the script with OpenCV 3 and dlib 19.17.

As abdu307 said, we are also facing the issue only when calculating embeddings. Face detection and identifying the face locations work fine.

Hi,

We originally thought this might be caused by a different OpenCV or dlib version across platforms, but that does not appear to be the case.

For the use case where the error occurs, our guess is that it may be related to an out-of-memory (OOM) condition.
Could you run your application with cuda-memcheck to get more information?

cuda-memcheck python myApp.py

Thanks.

Hi,

I ran the memcheck command with my Python script and got the following output.

========= CUDA-MEMCHECK
========= Internal Memcheck Error: Initialization failed
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1 (cuDevicePrimaryCtxRetain + 0x154) [0x1fd7d4]
=========     Host Frame:/usr/local/lib/python3.6/dist-packages/dlib.cpython-36m-aarch64-linux-gnu.so [0x8389c4]
=========

Hi,

Thanks for your log. It does not look like a memory issue.

We are checking the dlib source code and still need some time to give a suggestion.
http://dlib.net/files/dlib-19.6.tar.bz2

Stay tuned.

Hello, I also encountered the same NaN problem. Have you found a solution? Please let me know, thank you very much!

Same problem on a Jetson Nano. I sometimes get values, but most of the time NaN. The face_encoding examples from dlib aren’t working either.

Hi,

We are working on this issue but still need some time.
Stay tuned.

Hi,

We found that this issue may come from cuDNN and are checking with our internal team now.
Thanks.

Glad to hear it sounds like the developers are making progress on identifying this bug.

FWIW the memcheck error above appears to come from not running the utility as root.

I get no initialization errors after running the cuda-memcheck utility as root.
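For reference, that just means prefixing the same command suggested earlier with sudo:

sudo cuda-memcheck python myApp.py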

Using the code listed above, below are the expected and received results for the first entry in my NumPy array:

Expected: -0.08488056
Received: 1.13017666e+18

I hope that helps.

Hi,

We found a workaround to unblock this issue.
Please use the basic cudnnConvolutionForward algorithm instead. The change below comments out the line that overrides dlib’s default algorithm choice with the “fastest” one reported by cuDNN.

1. Download source

wget http://dlib.net/files/dlib-19.16.tar.bz2
tar jxvf dlib-19.16.tar.bz2

2. Apply this change (see the note after step 3 if you are unsure how to apply a diff):

diff --git a/dlib/cuda/cudnn_dlibapi.cpp b/dlib/cuda/cudnn_dlibapi.cpp
index a32fcf6..6952584 100644
--- a/dlib/cuda/cudnn_dlibapi.cpp
+++ b/dlib/cuda/cudnn_dlibapi.cpp
@@ -851,7 +851,7 @@ namespace dlib
                         dnn_prefer_fastest_algorithms()?CUDNN_CONVOLUTION_FWD_PREFER_FASTEST:CUDNN_CONVOLUTION_FWD_NO_WORKSPACE,
                         std::numeric_limits<size_t>::max(),
                         &forward_best_algo));
-                forward_algo = forward_best_algo;
+                //forward_algo = forward_best_algo;
                 CHECK_CUDNN(cudnnGetConvolutionForwardWorkspaceSize( 
                         context(),
                         descriptor(data),

3. Build and install

mkdir build
cd build
cmake ..
cmake --build .
cd ..
sudo python setup.py install
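If you are not sure how to apply the diff in step 2: save it to a file inside the extracted dlib-19.16 directory (the file name below is just an example) and run

patch -p1 < dlib_fix.patch

Equivalently, open dlib/cuda/cudnn_dlibapi.cpp in a text editor and comment out the forward_algo = forward_best_algo; line yourself, exactly as the diff shows.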

Our internal team keeps looking into the cuDNN issue, and we will let you know if there is any progress.
Thanks.


I will give this a try and let you know if it works. Thank you for the help and the swift response.

Hey @AastaLLL

It appears that the patch you have provided works as a temporary solution.

Again using the sample code from earlier in the thread, below are my results from testing.

Expected: -0.08488056
Received: -0.08488055

There is a slight change in accuracy, but that is probably from using a different model in the dlib library?

Will still be waiting for the patch when it comes out, but I can confirm that this works for an immediate solution.

For anyone else who needs to use the workaround above: if you previously installed dlib via pip, ensure that you remove that copy before (or after) running the setup.py step in the instructions above. If you do not, you will still get NaN and accuracy errors despite manually compiling and installing dlib.
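For example:

pip uninstall dlib

(Use pip3 instead if that is how you originally installed it.)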

I can also confirm that pulling the current version of dlib (19.17 at the moment) via git and applying this patch works.
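For a quick sanity check that the rebuilt dlib is the one actually in use, a minimal sketch like this (the image path is just a placeholder for any face photo) should print finite values instead of NaN:

import numpy as np
import face_recognition

# After the patched rebuild, the 128-d encodings should be finite,
# small-magnitude values rather than NaN or ~1e18.
# "Images/test.jpg" is a placeholder; point it at any face image.
image = face_recognition.load_image_file("Images/test.jpg")
for encoding in face_recognition.face_encodings(image):
    assert np.all(np.isfinite(encoding)), "still getting NaN/inf embeddings"
    print(encoding[:5])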

Thank you @AastaLLL

Hello,

I tried to install this patch, but I don’t know how to apply the changes.
I am not a Linux expert. ;)

Thanks for any advice.
