I can’t seem to get MVTec Halcon to recognize the Pascal GPU inside a TX2. Here is the code for the test app I’m running:
// HalconTest.cpp
#include <iostream>
#include <string>
#include <cstdlib>
#include <cctype>      // for std::tolower
#include <halconcpp/HalconCpp.h>
// function prototypes
bool stringContainsSubstringIgnoreCase(std::string fullString, std::string substring);
int main(int argc, char *argv[])
{
    // comment out this first try-catch if you don't want to try to use the GPU
    try
    {
        // query for GPUs
        HalconCpp::HTuple possibleGpuIdentifiers;
        HalconCpp::QueryAvailableComputeDevices(&possibleGpuIdentifiers);
        std::cout << "1" << "\n\n";
        auto possibleGpuIdentifiersLength = possibleGpuIdentifiers.Length();
        std::cout << "possibleGpuIdentifiersLength = " << possibleGpuIdentifiersLength << "\n\n";
        if (possibleGpuIdentifiersLength <= 0)
        {
            std::cout << "HalconCpp::QueryAvailableComputeDevices() was not able to identify any compute devices" << "\n\n";
        }
        else
        {
            HalconCpp::HTuple gpuDeviceId;
            for (int i = 0; i < possibleGpuIdentifiers.Length(); i++)
            {
                std::cout << "1a" << "\n\n";
                // get the GPU name as an HTuple
                HalconCpp::HTuple tupPossibleGpuName;
                HalconCpp::GetComputeDeviceInfo(possibleGpuIdentifiers[i], "name", &tupPossibleGpuName);
                std::cout << "1b" << "\n\n";
                // convert the GPU name to a string and log the name
                auto charArrPossibleGpuName = tupPossibleGpuName.SArr();
                std::string possibleGpuName(*charArrPossibleGpuName);
                std::cout << "possibleGpuName = " << possibleGpuName << "\n\n";
                std::cout << "1c" << "\n\n";
                if ((stringContainsSubstringIgnoreCase(possibleGpuName, "GTX") && (stringContainsSubstringIgnoreCase(possibleGpuName, "1070") || stringContainsSubstringIgnoreCase(possibleGpuName, "1080"))) ||
                    (stringContainsSubstringIgnoreCase(possibleGpuName, "Quadro") && stringContainsSubstringIgnoreCase(possibleGpuName, "P2000")) ||
                    (stringContainsSubstringIgnoreCase(possibleGpuName, "Pascal")))
                {
                    gpuDeviceId = possibleGpuIdentifiers[i];
                }
                std::cout << "1d" << "\n\n";
            }
            std::cout << "2" << "\n\n";
            // get the GPU name as an HTuple
            HalconCpp::HTuple tupGpuName;
            HalconCpp::GetComputeDeviceInfo(gpuDeviceId, "name", &tupGpuName);
            std::cout << "3" << "\n\n";
            // convert the GPU name to a string and log the name
            auto charArrGpuName = tupGpuName.SArr();
            std::string gpuName(*charArrGpuName);
            std::cout << "4" << "\n\n";
            // open the device handle
            HalconCpp::HTuple deviceHandle;
            HalconCpp::OpenComputeDevice(gpuDeviceId, &deviceHandle);
            std::cout << "5" << "\n\n";
            // set the GPU params
            HalconCpp::SetComputeDeviceParam(deviceHandle, "asynchronous_execution", "false");
            std::cout << "6" << "\n\n";
            // use the GPU for all possible Halcon functions
            HalconCpp::InitComputeDevice(deviceHandle, "all");
            std::cout << "7" << "\n\n";
            // finally we can activate the GPU with Halcon
            HalconCpp::ActivateComputeDevice(deviceHandle);
            std::cout << "8" << "\n\n";
            std::cout << "GPU configuration successful, gpuName = " << gpuName << "\n\n";
        }
    }
    catch (HalconCpp::HException &ex)
    {
        std::cout << "unable to configure GPU with Halcon" << "\n" << ex.ErrorCode() << "\n" << ex.ErrorMessage() << "\n" << "\n";
        //return (0);
    }
    // from here down is GPU-independent, show an image as a test
    try
    {
        // open the image
        HalconCpp::HImage image("image.png");
        // get the image width and height
        HalconCpp::HTuple imageWidth;
        HalconCpp::HTuple imageHeight;
        HalconCpp::GetImageSize(image, &imageWidth, &imageHeight);
        // show the image width and height
        std::cout << "imageWidth = " << imageWidth.ToString() << "\n\n";
        std::cout << "imageHeight = " << imageHeight.ToString() << "\n\n";
        // instantiate an HWindow
        HalconCpp::HWindow hWindow(0, 0, imageWidth, imageHeight);
        // show the HImage in the HWindow
        hWindow.DispImage(image);
        // wait for a click, then clear the window
        hWindow.Click();
        hWindow.ClearWindow();
    }
    catch (HalconCpp::HException& exception)
    {
        std::cout << "Halcon error: " << exception.ErrorCode() << "\n" << exception.ErrorMessage() << "\n";
    }
}
bool stringContainsSubstringIgnoreCase(std::string fullString, std::string substring)
{
    // note that these string parameters are passed by value, so changing them here does not affect the variables in the calling function
    // convert fullString to lower case for case-insensitive comparison
    // (cast to unsigned char first; passing a plain char that may be negative to std::tolower is undefined behavior)
    for (std::size_t i = 0; i < fullString.length(); i++)
    {
        fullString[i] = std::tolower(static_cast<unsigned char>(fullString[i]));
    }
    // convert substring to lower case for case-insensitive comparison
    for (std::size_t i = 0; i < substring.length(); i++)
    {
        substring[i] = std::tolower(static_cast<unsigned char>(substring[i]));
    }
    // if we find the substring before going off the end of the full string, then the full string contains the substring
    if (fullString.find(substring) != std::string::npos)
    {
        return true;
    }
    else // otherwise it doesn't
    {
        return false;
    }
}
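As an aside, the case-insensitive check above can also be written without the two lowercasing loops by using std::search with a comparison predicate. This is just a sketch of an equivalent alternative (the name containsIgnoreCase is mine, not part of the test app):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Case-insensitive substring check equivalent to stringContainsSubstringIgnoreCase,
// using std::search with a character-by-character lowercase comparison.
bool containsIgnoreCase(const std::string &haystack, const std::string &needle)
{
    // an empty needle is always "found", matching std::string::find's behavior
    if (needle.empty())
    {
        return true;
    }
    auto it = std::search(
        haystack.begin(), haystack.end(),
        needle.begin(), needle.end(),
        [](unsigned char a, unsigned char b) {
            return std::tolower(a) == std::tolower(b);
        });
    return it != haystack.end();
}
```

It avoids copying and mutating both strings, though for short GPU-name strings the difference is negligible.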
This code successfully recognizes and uses a GTX 1070 or 1080 GPU in a desktop computer and a Quadro P2000 in a server; however, when I run it on a TX2, it never recognizes the GPU. The image-display portion at the end still works on the TX2, and since the GPU portion of the code (the first 2/3) works in the other two cases, I’m pretty convinced the code is doing the proper things to shake hands with Halcon.
I’ve run this code on a desktop with a GTX 1070/1080 under both Ubuntu 16.04 with CUDA 9.0 and cuDNN 7.1 and Ubuntu 18.04 with CUDA 9.2 and cuDNN 7.2, and it works on both. It also works on a server running Ubuntu 18.04 server with a Quadro P2000, CUDA 9.2, and cuDNN 7.2.
For the TX2, I’m developing on a natively installed Ubuntu 16.04 host. I installed JetPack 3.3 and performed the full flash of the TX2 as recommended by the JetPack 3.3 install process. I’m convinced the flash went well and the TX2 hardware is good, since I can run GPU-accelerated OpenCV 3.3.1 (as installed/flashed by JetPack), and I successfully compiled TensorFlow 1.10 with Bazel 0.18.0 on the TX2, which also runs GPU-accelerated successfully.
I’m developing using Nsight on the Ubuntu 16.04 host. I’m confident I’m performing the cross-compile steps correctly, since the GPU-accelerated OpenCV and TensorFlow programs work well.
I also know the JetPack flash was successful based on the following commands and output:
nvidia@tegra-ubuntu:~/HalconTest$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Sun_Nov_19_03:16:56_CST_2017
Cuda compilation tools, release 9.0, V9.0.252
nvidia@tegra-ubuntu:~/HalconTest$ cat /usr/local/cuda/version.txt
CUDA Version 9.0.252
nvidia@tegra-ubuntu:~/HalconTest$ cat /usr/include/aarch64-linux-gnu/cudnn_v7.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
nvidia@tegra-ubuntu:~/HalconTest$ dpkg -l | grep libopencv
ii libopencv 3.3.1 arm64 Open Computer Vision Library
I’d like to reiterate that the Halcon test app runs and shows the test image successfully; it just does not recognize the TX2’s Pascal GPU, so as far as I can tell Halcon itself is correctly installed and configured. I did not have to perform any special steps to get Halcon to recognize the GPU on the desktop or the server. I’m using Halcon 18, which is stated to work on all these platforms. Unfortunately, failing to recognize the TX2’s Pascal GPU is unacceptable in my case, since we will be deploying a vision application on the TX2 and the processing power of the GPU is essential.
Any idea what I’m doing wrong? Has anybody else gotten Halcon to recognize the TX2’s Pascal GPU? Any suggestions as to other steps to try or stuff to check?
– Edit –
I was just looking at the Halcon documentation for the function QueryAvailableComputeDevices and I found the following:
At present, HALCON only supports OpenCL compatible GPUs supporting the OpenCL extension cl_khr_byte_addressable_store and image objects. If you are not sure whether a certain device is supported, please refer to the manufacturer.
Does the TX2’s Pascal GPU support the OpenCL extension cl_khr_byte_addressable_store and image objects? What does this mean, and how can I check?
– Edit 2 –
I’ve discovered some more information here, and things are not looking good for getting the TX2 to run Halcon.
Bearing in mind that, per the Halcon documentation, a platform must support OpenCL (and specifically the cl_khr_byte_addressable_store device extension) for Halcon to use its GPU, I ran the following on my Ubuntu 16.04 host with a GTX 1080, where Halcon can successfully use the GPU:
sudo apt-get install clinfo
clinfo
I got these results:
$ clinfo
Number of platforms 1
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 10.0.132
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform Extensions function suffix NV
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 1080
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 415.27
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 01:00.0
Max compute units 20
Max clock frequency 1733MHz
Compute Capability (NV) 6.1
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 8511488000 (7.927GiB)
Error Correction support No
Max memory allocation 2127872000 (1.982GiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 327680
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x32768 pixels
Max 3D image size 16384x16384x16384 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [NV]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform
The important part of these results is that cl_khr_byte_addressable_store is listed under Device Extensions towards the bottom.
Then I did the same on the TX2:
sudo apt-get install clinfo
clinfo
And I got this:
Number of platforms 0
Clearly this is not encouraging. Is there some way to install OpenCL and the cl_khr_byte_addressable_store device extension on the TX2? Or am I dead in the water here?