How to use the AprilTag GPU C/C++ API outside of ISAAC

dk1900 · January 9, 2020, 9:17pm

We are particularly interested in integrating the ISAAC GPU version of AprilTag into our C/C++ application, but we do not want to stay within the ISAAC SDK.

I see that the AprilTag logic has been wrapped into a dynamic library file, “libapril_tags_module.so”, so I assume this should be possible. If so, where is the documentation for that? I did not find such information in the SDK documentation.

Thanks in advance

shrinv · January 10, 2020, 6:35am

First you have to download the SDK.

From your C/C++ code base you can use our C API to pull in the JSON file containing the April Tag messages.

Message API for April Tag is at https://docs.nvidia.com/isaac/isaac/doc/message_api.html#fiducial-list

C API is documented @ https://docs.nvidia.com/isaac/isaac/engine/alice/c_api/doc/c_api.html#isaac-c-api

Receiving the message from Isaac @ https://docs.nvidia.com/isaac/isaac/engine/alice/c_api/doc/c_api.html#receiving-a-message-from-isaac

Building the C API @ https://docs.nvidia.com/isaac/isaac/engine/alice/c_api/doc/c_api.html#building-the-c-api

dk1900 · January 13, 2020, 7:37pm

Accessing the AprilTag GPU code through ISAAC might be a bit too heavy for our application. Is there a way to call the GPU code directly, bypassing ISAAC?
I checked the dependencies of libapril_tags_module.so and it does not seem to depend on any ISAAC code, so it looks possible to use it without ISAAC.

Thanks

Below are the libapril_tags_module.so dependencies:

ldd ./packages_x86_64/perception/libapril_tags_module.so
./packages_x86_64/perception/libapril_tags_module.so: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found (required by ./packages_x86_64/perception/libapril_tags_module.so)
linux-vdso.so.1 =>  (0x00007ffd063fe000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007ffa47fb8000)
libnvToolsExt.so.1 => /home/jzhang/svn/sw/static/linux/x86_64/usr/local/cuda/lib64/libnvToolsExt.so.1 (0x00007ffa47daf000)
libcudart.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0 (0x00007ffa47b35000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffa47931000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffa47628000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffa472a6000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffa47090000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ffa46e73000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffa46aa9000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffa493c6000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ffa468a1000)
libnvidia-fatbinaryloader.so.384.130 => /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.130 (0x00007ffa4664f000)

dchichkovd0qb3 · January 13, 2020, 10:35pm

As far as I understand it, the AprilTag GPU code carries the license of the Isaac SDK code. If you can accept that license, linking directly to the library might be an option. An example of that linkage can be seen in the packages.bzl build file (see ‘apriltags’) and the associated third-party library archive. This library is not a part of the Isaac API. But, as long as the version of the Isaac SDK release is supported and you work under the licensing agreement, you will probably be able to continue accessing the particular version of the library you are linking to.

dk1900 · January 14, 2020, 7:53pm

Hello @dchichkovd0qb3,

Thanks very much for your reply. This is exactly what I want.
I am trying out the AprilTag GPU code, downloaded from the place you mentioned (ISAAC 2019.3), and got some strange results on my x86_64 host with a Quadro P3200 GPU (Ubuntu 16 + CUDA 10.2 + 440.33.01 driver). I can create an AprilTag object and destroy it, but when I exit the main function I get a CUDA segfault. Could you please give me a working example of using this AprilTag code? Thanks.

int main() {
    nvAprilTagsHandle hApriltags;
    nvAprilTagsCameraIntrinsics_t cam = {100, 100, 320, 240 };

    int ret = nvCreateAprilTagsDetector(&hApriltags, 640, 480, NVAT_TAG36H11, &cam, 10.0);
    // do nothing
    nvAprilTagsDestroy(hApriltags);
    return 0;
}

Note that the ret value is 0, so it seems the detector has been created correctly. The segfault happens when the code exits the scope of the main function.

CUDA segfault:

0x0000000000435443 in cudart::contextState::markChangeModuleUnload(cudart::globalModule*) ()
(cuda-gdb) bt
#0  0x0000000000435443 in cudart::contextState::markChangeModuleUnload(cudart::globalModule*) ()
#1  0x000000000043d8d1 in cudart::contextStateManager::notifyContextStatesOfModuleUnload(cudart::globalModule*) ()
#2  0x000000000042ef4e in cudart::globalState::destroyModule(cudart::globalModule*) ()
#3  0x000000000042f650 in cudart::globalState::unregisterFatBinary(cudart::globalModule*) ()
#4  0x00007ffff6b7cff8 in secure_getenv () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007ffff6b7d045 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007ffff6b63837 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6

dk1900 · January 14, 2020, 10:15pm

Another question is related to the input data struct of this GPU AprilTag detector:

typedef struct nvAprilTagsImageInput_st
{
    uchar4* dev_ptr;    //!< Device pointer to the buffer
    size_t pitch;       //!< Pitch in bytes
    uint16_t width;     //!< Width in pixels
    uint16_t height;    //!< Buffer height
}nvAprilTagsImageInput_t;

The input requires a uchar4 GPU device buffer; however, in our use case we only have grayscale images.
So the question is: could NVIDIA provide a new API which supports grayscale images?
Or is the detection done on each channel separately? If so, could I pack 4 grayscale images into one buffer and have a way to know which results correspond to which channel?

Thanks
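
Until a grayscale entry point exists, one possible workaround is to expand each gray pixel into a four-channel pixel on the host before uploading it. The sketch below is only an assumption about what the detector accepts: it copies the gray value into R, G and B and sets alpha to 255. `Rgba8` is a hypothetical stand-in with the same 4-byte layout as CUDA's `uchar4`, and `grayToRgba` is an illustrative helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in with the same 4-byte layout as CUDA's uchar4.
struct Rgba8 {
    std::uint8_t x, y, z, w;  // R, G, B, A
};

// Expand a grayscale image into tightly packed RGBA pixels.
// gray_pitch is the distance in bytes between the starts of two source rows.
std::vector<Rgba8> grayToRgba(const std::uint8_t* gray, std::size_t width,
                              std::size_t height, std::size_t gray_pitch) {
    assert(gray != nullptr && gray_pitch >= width);
    std::vector<Rgba8> rgba(width * height);
    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* src = gray + row * gray_pitch;
        Rgba8* dst = rgba.data() + row * width;
        for (std::size_t col = 0; col < width; ++col) {
            const std::uint8_t v = src[col];
            dst[col] = Rgba8{v, v, v, 255};  // replicate gray, opaque alpha
        }
    }
    return rgba;
}
```

The returned vector holds width * height * 4 bytes, so it could be copied to the device buffer with cudaMemcpy and described with a pitch of width * sizeof(uchar4).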

FrancoisCarouge · January 15, 2020, 12:33am

Hello dk1900,
Below is your code with additional details to get you to a functional example.
We would also recommend Ubuntu 18.04 / CUDA 10.0 for compatibility with Isaac 2019.3.

int main() {
    nvAprilTagsHandle hApriltags;
    nvAprilTagsCameraIntrinsics_t cam = {100, 100, 320, 240 };
    cudaStream_t cuda_stream = {};

    int ret = nvCreateAprilTagsDetector(&hApriltags, 640, 480, NVAT_TAG36H11, &cam, 0.18);
    // cudaStreamCreate(&(cuda_stream));
    // std::vector<nvAprilTagsID_t> tags;
    // uint32_t num_tags;
    // int error = nvAprilTagsDetect(hApriltags, &(image), tags.data(), &num_tags, 20, cuda_stream);
    // cudaStreamDestroy(cuda_stream);
    nvAprilTagsDestroy(hApriltags);
    return 0;
}

We will have to get back to you with more information about your other question.

dk1900 · January 15, 2020, 8:18pm

@FrancoisCarouge, Thanks for your example.
I think the crash might be an Ubuntu 16 issue, as I cannot reproduce it on my Jetson TX2 dev board.

I am looking forward to your answer to my second question.

shrinv · January 17, 2020, 12:49am

Yes, the currently supported version is Ubuntu 18.04.

dk1900 · January 19, 2020, 6:58pm

I have not had any luck getting the AprilTag library to detect a tag yet. For some reason, whether I feed the library an RGBA image converted from grayscale (R, G and B all set to the grayscale value, with a fixed alpha channel) or a native RGBA image, it is unable to decode anything. Would you be able to provide a sample image file together with one complete working code sample?

I am also wondering whether we could just get the source code for this AprilTag library. It is a bit hard to debug without source code and a working example.

A summary of my requests, listed by priority :-)
1. A full working example and JPEG image to demonstrate how to use the AprilTag GPU code to detect tags and estimate the pose
2. A new API which supports grayscale input images
3. Source code access so we can do more optimisation and customisation.

dk1900 · January 20, 2020, 8:42pm

Good news: I finally found the reason the detector could not detect anything, and after fixing it I can now detect tags on the TX2. Thanks for the support you have provided.

The issue was Ubuntu 16 plus the pitch size setting in the input image passed to the detector. The code is really fast for SD images: detection + pose estimation takes only 8.5 ms on the TX2 (averaged over 1000 frames).

Now the only urgent help I still need is a new API supporting detection on grayscale images :-).
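
The struct comment quoted earlier describes `pitch` as "Pitch in bytes", so for a tightly packed uchar4 image it would be the width times four rather than the width in pixels. A minimal sketch of that arithmetic follows. `tightPitchBytes`, `bufferBytes` and `copyRows` are hypothetical helpers, and the 4-byte pixel size assumes the uchar4 layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kBytesPerPixel = 4;  // assumed size of one uchar4 pixel

// Pitch of a tightly packed RGBA row, in bytes.
constexpr std::size_t tightPitchBytes(std::size_t width_px) {
    return width_px * kBytesPerPixel;
}

// Total size of a pitched image buffer, in bytes.
constexpr std::size_t bufferBytes(std::size_t pitch_bytes, std::size_t height_px) {
    return pitch_bytes * height_px;
}

// Copy a source image whose rows may carry padding into a tightly packed buffer.
std::vector<std::uint8_t> copyRows(const std::uint8_t* src,
                                   std::size_t src_pitch_bytes,
                                   std::size_t width_px, std::size_t height_px) {
    const std::size_t dst_pitch = tightPitchBytes(width_px);
    assert(src != nullptr && src_pitch_bytes >= dst_pitch);
    std::vector<std::uint8_t> dst(bufferBytes(dst_pitch, height_px));
    for (std::size_t row = 0; row < height_px; ++row) {
        // Copy only the pixel bytes of each row and skip the source padding.
        std::memcpy(dst.data() + row * dst_pitch,
                    src + row * src_pitch_bytes, dst_pitch);
    }
    return dst;
}
```

For the 640x480 images used in this thread, tightPitchBytes(640) is 2560 and the matching buffer is 1228800 bytes.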

jsysteam · June 24, 2020, 8:44am

Hello. I cannot find any links to these headers and the library. Could you please say where you got them? :) Can I get the other headers there too?
I want to use it directly, without the node system…

bhanner · July 23, 2020, 8:43pm

I too would love to access the GPU-optimized version of AprilTag detection outside of Isaac. I have the CPU-based version up and running on a TX2, but it's killing me that I can’t put the GPU to work on this…

Firrel · September 22, 2020, 1:14pm

For those that are still looking for it, the header file and library can be found here.

I am trying to use the AprilTag GPU library as well. I think I am not creating the image data (nvAprilTagsImageInput_st) correctly: I get a segfault when calling nvAprilTagsDetect. Could you please point out what I am doing wrong?

Here is my code sample:

#define ROWS 480
#define COLS 640

int main() {
    nvAprilTagsHandle hApriltags;
    nvAprilTagsCameraIntrinsics_t cam = {100, 100, COLS/2, ROWS/2 };
    cudaStream_t cuda_stream = {};

    int ret = nvCreateAprilTagsDetector(&hApriltags, COLS, ROWS, NVAT_TAG36H11, &cam, 0.18);
    cudaStreamCreate(&(cuda_stream));

    //... get image from camera

    const unsigned int bytes = COLS * ROWS * sizeof(uchar4);
    uchar4 *h_a = (uchar4*)malloc(bytes);
    memcpy(h_a, color_frame.get_data(), bytes);

    uchar4 *d_a;
    cudaMalloc((uchar4**)&d_a, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);

    nvAprilTagsImageInput_st image;
    image.dev_ptr = d_a;
    image.height = ROWS;
    image.width = COLS;
    image.pitch = COLS;

    std::vector<nvAprilTagsID_t> tags;
    uint32_t num_tags;
    int error = nvAprilTagsDetect(hApriltags, &image, tags.data(), &num_tags, 20, cuda_stream);

    cudaStreamDestroy(cuda_stream);
    nvAprilTagsDestroy(hApriltags);
    return 0;
}

This is from the core dump:

#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:186
#1  0x00007f8f48ca127c in TagDecoder::decode(DeviceDataView<uchar4> const&, nvAprilTagsID_st*, unsigned int) () from .../libapril_tags_module.so
#2  0x00007f8f48cac749 in AprilTags::DetectAndDecodeAprilTags(DeviceDataView<uchar4> const&, nvAprilTagsID_st*, unsigned int*, unsigned int, CUstream_st*) () from .../libapril_tags_module.so
#3  0x00007f8f48c9ac3c in nvAprilTagsDetect () from .../libapril_tags_module.so
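
One thing that stands out when reading the sample against the backtrace: `tags` is an empty `std::vector`, so `tags.data()` gives the library no storage for the 20 entries it is told it may write. That would be consistent with a crash inside `memset` in `TagDecoder::decode`, but it is a reading of the snippet, not a confirmed diagnosis. The sketch below shows the sizing pattern only. `TagRecord` is a hypothetical stand-in for `nvAprilTagsID_t`, and `fakeDetect` is a hypothetical stand-in for the library call.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for nvAprilTagsID_t; the real layout is in the header.
struct TagRecord {
    std::uint16_t id;
    float corners[4][2];
};

// Hypothetical stand-in for a C-style detect call: it clears max_tags entries
// of the caller's buffer, then reports how many entries it filled in.
int fakeDetect(TagRecord* tags_out, std::uint32_t* num_tags,
               std::uint32_t max_tags) {
    if (tags_out == nullptr || num_tags == nullptr) return -1;
    std::memset(tags_out, 0, sizeof(TagRecord) * max_tags);
    *num_tags = 0;  // this stand-in never finds a tag
    return 0;
}

// Size the output buffer before the call so that data() points at real storage.
std::vector<TagRecord> detectInto(std::uint32_t max_tags, int* status) {
    std::vector<TagRecord> tags(max_tags);  // max_tags elements, not empty
    std::uint32_t num_tags = 0;
    *status = fakeDetect(tags.data(), &num_tags, max_tags);
    assert(num_tags <= max_tags);
    tags.resize(num_tags);  // keep only the entries reported as valid
    return tags;
}
```

The same sample also sets image.pitch = COLS, whereas the struct comment gives the pitch in bytes, so COLS * sizeof(uchar4) would be the value to try there.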

zzhihao23 · November 25, 2020, 2:43am

@dk1900 hi! I have the same question as you: I have a grayscale image from OpenCV, and I want to know how you figured it out. Could you please provide a working example? I can create and destroy the detector successfully, but I don’t know how to form the input image. Thanks!

Andrey1984 · February 23, 2021, 3:31am

@shrinv
Is there any sample for continuously reading/printing the tag position as text output, e.g. in the terminal?

hemals · February 23, 2021, 6:27pm

We have released a ROS2 native package around our NVAprilTags GPU-accelerated detector library on NVIDIA-AI-IOT here. You can post questions about it on GitHub itself and I’ll try to get to them.

hemals · February 23, 2021, 6:30pm

There isn’t really an example to do that, no, but you could write another component that receives the detections and prints them to terminal easily enough.
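
A minimal sketch of such a printing step, kept independent of any Isaac component, might look like the following. `Detection` is a hypothetical stand-in for `nvAprilTagsID_t`: only `corners` with `x`/`y` members appears in code elsewhere in this thread, so the `id` and `translation` fields are assumptions to check against the header.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

struct Point2f {
    float x, y;
};

// Hypothetical stand-in for one detection record.
struct Detection {
    std::uint16_t id;      // assumed field: tag ID
    Point2f corners[4];    // image-space corners in pixels
    float translation[3];  // assumed field: tag position relative to the camera
};

// Format one detection as a single line of text.
std::string formatDetection(const Detection& d) {
    std::ostringstream out;
    out << "tag " << d.id << " t=(" << d.translation[0] << ", "
        << d.translation[1] << ", " << d.translation[2] << ") corners=";
    for (std::size_t i = 0; i < 4; ++i) {
        out << (i ? " " : "") << "(" << d.corners[i].x << ", "
            << d.corners[i].y << ")";
    }
    return out.str();
}

// Join the lines for all detections of one frame.
std::string formatFrame(const std::vector<Detection>& detections) {
    std::string text;
    for (const Detection& d : detections) {
        text += formatDetection(d);
        text += '\n';
    }
    return text;
}
```

Writing the result of formatFrame to std::cout once per processed frame would give a continuous text stream in the terminal.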

Andrey1984 · February 23, 2021, 6:41pm

Folks tried to use rosbridge as that other component,
but we are having a hard time running it too, as it results in a segfault even if we just execute

bazel run //apps/samples/navigation_rosbridge

from the default uncompressed folder.

Update:
We will try https://github.com/NVIDIA-AI-IOT/ros2-nvapriltags
@hemals

    Subscription taking message
    [component_container-1] [DEBUG] [1614244532.503365570] [rcl]: Subscription take succeeded: true
    [component_container-1] component_container: /home/ty-desktop-20/ros2_ws/src/ros2-nvapriltags/src/AprilTagNode.cpp:68: void AprilTagNode::AprilTagsImpl::initialize(const AprilTagNode&, uint32_t, uint32_t, size_t, size_t, const ConstSharedPtr&): Assertion `april_tags_handle != nullptr' failed.
    [ERROR] [component_container-1]: process has died [pid 63595, exit code -6, cmd '/opt/ros/foxy/lib/rclcpp_components/component_container --ros-args --log-level DEBUG --ros-args -r __node:=tag_container -r __ns:=/apriltag'].

ctxqlxs · April 16, 2021, 7:53am

@hemals @FrancoisCarouge @shrinv

There was no response on NVIDIA-AI-IOT…
I want to know how to handle the BGR image from an OpenCV video stream.
cv::cvtColor(frame, img_rgba8, cv::COLOR_BGR2RGBA)
I use the conversion above to get the RGBA image, but it cannot detect anything…

The complete code is as follows:

#include <cassert>
#include <chrono>
#include <cstdio>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#include "nvapriltags/nvAprilTags.h"
#include "cuda.h"
#include "cuda_runtime.h"
#include <opencv2/opencv.hpp>
using namespace cv;

// copy from https://github.com/NVIDIA-AI-IOT/ros2-nvapriltags/blob/main/src/AprilTagNode.cpp

struct AprilTagsImpl {
    // Handle used to interface with the AprilTags library.
    nvAprilTagsHandle april_tags_handle = nullptr;
    // Camera intrinsics
    nvAprilTagsCameraIntrinsics_t cam_intrinsics;

    // Output vector of detected Tags
    std::vector<nvAprilTagsID_t> tags;

    // CUDA stream
    cudaStream_t main_stream = {};

    // CUDA buffers to store the input image.
    nvAprilTagsImageInput_t input_image;

    // CUDA memory buffer container for RGBA images.
    uchar4 *input_image_buffer = nullptr;

    // Size of image buffer
    size_t input_image_buffer_size = 0;

    int max_tags;

    void initialize(const uint32_t width,
                    const uint32_t height, const size_t image_buffer_size,
                    const size_t pitch_bytes,
                    const float fx, const float fy, const float cx, const float cy,
                    float tag_edge_size_, int max_tags_) {
        assert(!april_tags_handle && "Already initialized.");

        // Get camera intrinsics
        cam_intrinsics = {fx, fy, cx, cy};

        // Create AprilTags detector instance and get handle
        const int error = nvCreateAprilTagsDetector(
                &april_tags_handle, width, height, nvAprilTagsFamily::NVAT_TAG36H11,
                &cam_intrinsics, tag_edge_size_);
        if (error != 0) {
            throw std::runtime_error(
                    "Failed to create NV April Tags detector (error code " +
                    std::to_string(error) + ")");
        }

        // Create stream for detection
        cudaStreamCreate(&main_stream);

        // Allocate the output vector to contain detected AprilTags.
        tags.resize(max_tags_);
        max_tags = max_tags_;
        // Setup input image CUDA buffer.
        const cudaError_t cuda_error =
                cudaMalloc(&input_image_buffer, image_buffer_size);
        if (cuda_error != cudaSuccess) {
            throw std::runtime_error("Could not allocate CUDA memory (error code " +
                                     std::to_string(cuda_error) + ")");
        }

        // Setup input image.
        input_image_buffer_size = image_buffer_size;
        input_image.width = width;
        input_image.height = height;
        input_image.dev_ptr = reinterpret_cast<uchar4 *>(input_image_buffer);
        input_image.pitch = pitch_bytes;
    }

    ~AprilTagsImpl() {
        if (april_tags_handle != nullptr) {
            cudaStreamDestroy(main_stream);
            nvAprilTagsDestroy(april_tags_handle);
            cudaFree(input_image_buffer);
        }
    }
};

int main() {
    printf("cuda main");
    VideoCapture capture;
    int width = 640;
    int height = 480;
    float fx = 388.239;
    float fy = 388.239;
    float ppx = 317.285;
    float ppy = 245.185;

    capture.open(-1);
    capture.set(cv::CAP_PROP_FRAME_WIDTH, width);
    capture.set(cv::CAP_PROP_FRAME_HEIGHT, height);
    Mat frame;
    Mat img_rgba8;
    capture>>frame;
    cv::cvtColor(frame, img_rgba8, cv::COLOR_BGR2RGBA);
    auto *impl_ = new AprilTagsImpl();
    impl_->initialize(img_rgba8.cols, img_rgba8.rows,
                      img_rgba8.total() * img_rgba8.elemSize(),  img_rgba8.step,
                      fx,fy,ppx,ppy,
                      0.5,
                      6);

    while (capture.isOpened())
    {

        capture>>frame;
        cv::cvtColor(frame, img_rgba8, cv::COLOR_BGR2RGBA);
        auto start = std::chrono::system_clock::now();

        const cudaError_t cuda_error =
                cudaMemcpy(impl_->input_image_buffer, (uchar4 *)img_rgba8.ptr<unsigned char>(0),
                           impl_->input_image_buffer_size, cudaMemcpyHostToDevice);

        if (cuda_error != cudaSuccess) {
            throw std::runtime_error(
                    "Could not memcpy to device CUDA memory (error code " +
                    std::to_string(cuda_error) + ")");
        }

        uint32_t num_detections;
        const int error = nvAprilTagsDetect(
                impl_->april_tags_handle, &(impl_->input_image), impl_->tags.data(),
                &num_detections, impl_->max_tags, impl_->main_stream);
        if (error != 0) {
            throw std::runtime_error("Failed to run AprilTags detector (error code " +
                                     std::to_string(error) + ")");
        }

        for (uint32_t i = 0; i < num_detections; i++) {
            const nvAprilTagsID_t &detection = impl_->tags[i];

            // corners
            for (auto corner : detection.corners) {
               float x = corner.x;
               float y = corner.y;
            }

        }

        auto end = std::chrono::system_clock::now();
        int fps = int(1000 / ( std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() + 1));
        cv::putText(frame, "FPS: "+ std::to_string(fps), cv::Point(100,100),
                    cv::FONT_HERSHEY_PLAIN, 5, cv::Scalar(0xFF, 0xFF, 0), 2);
        std::cout<<"num_detections: "<<num_detections<<std::endl;

        cv::namedWindow("frame", 0);
        cv::resizeWindow("frame", 1280,800);
        cv::imshow("frame", frame);
        if (cv::waitKey(10)==27)
            break;
    }
    delete(impl_);
    return 0;
}