How to transform points from image space to world space?

Hello, I have a quick question about rendering the points obtained from LaneGraph. We have followed the workflow described in the documentation and are using LaneNet as the source, with H.264-serialized data from one of our drives recorded with a Sekonix AR-231-RCCB camera. The LaneNet results are correct, and we feed the lane detection results into dwLaneGraph_setLanesMeasurement followed by dwLaneGraph_processHost.
The documentation states:
“The geometry includes the center paths of lanes or as the dual, their boundaries, represented in the ego vehicle’s local coordinate system”.
So I assume that the ego vehicle “local coordinate system” means rig space. We have loaded our rig file, obtained the sensor-to-rig transform with dwRigConfiguration_getSensorToRigTransformation, and stored the inverse of this matrix (computed with Mat4_IsoInv) as the rig-to-camera transform. My assumption is that if I transform each 3D point obtained from LaneGraph (which is in the ego vehicle’s local coordinate system) by this rig-to-camera matrix and then project it with dwCalibratedCamera_ray2Pixel, I should get valid points in image space. We have tried this scheme before with Mobileye 3D points in rig (world) space and it worked, but the LaneGraph points do not come out correctly. Is this the right approach? There are, AFAIK, no samples showing how to render LaneGraph results, so I’d appreciate any hints or info. I am running this on a Drive PX2 with SDK 5.0.10.3 and DriveWorks 1.2.
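
For concreteness, this is roughly what we do per point (the variable names here are ours; Mat4_IsoInv and Mat4_Axp are the matrix helpers from the DriveWorks samples):

    // sensorToRig comes from dwRigConfiguration_getSensorToRigTransformation
    dwTransformation rigToCam{};
    Mat4_IsoInv(rigToCam.array, sensorToRig.array);   // invert the camera->rig extrinsic

    // lanePoint is one LaneGraph polyline point, assumed to be in rig coordinates
    float32_t rigPt[3] = {lanePoint.x, lanePoint.y, lanePoint.z};
    float32_t camPt[3] = {0.f, 0.f, 0.f};
    Mat4_Axp(camPt, rigToCam.array, rigPt);            // rig -> camera coordinates

    // project the camera-space ray into the image
    float32_t u = -1.f, v = -1.f;
    dwCalibratedCamera_ray2Pixel(&u, &v, camPt[0], camPt[1], camPt[2], calibratedCamera);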

Allen

Hi Allen,

In order to render the lane graph in image space you can either

  • use an HD-map lane graph, which takes a localized position that is further refined by the lane detections, or
  • use a LaneNet lane graph built directly from the lane detections.

The following code shows in detail how to perform this task:

    // converting a 3D point on the lanegraph to pixel coordinates on an image 
    dwVector2f rig2Pixel(const dwVector3f& rigPt,                   // query lg pt represented in rig 
                         dwCalibratedCameraHandle_t cameraHandle,   // camera intrinsic 
                         const dwTransformation& cam2Rig)           // camera extrinsic 
    { 
        dwVector2f out; 
        out.x = -1.0;   // invalid coordinates, out of bounds 
        out.y = -1.0; 
     
        // apply transformation on rigPt -> camPt 
        float camarr[3] = {0, 0, 0}; 
        float rigarr[3] = {rigPt.x, rigPt.y, rigPt.z}; 
     
        float rig2cam[16]; 
        Mat4_IsoInv(rig2cam, cam2Rig.array); 
        Mat4_Axp(camarr, rig2cam, rigarr); 
     
        // check whether the camarr will be inside the camera field of view 
        bool isInFov{false}; 
        dwCalibratedCamera_isRayInsideFOV(&isInFov, camarr[0], camarr[1], camarr[2], cameraHandle); 
     
        // convert the ray to pixel coordinates using the camera intrinsic model 
        // the result is valid only if the ray is in the fov 
        if (isInFov) 
            dwCalibratedCamera_ray2Pixel(&out.x, &out.y, camarr[0], camarr[1], camarr[2], cameraHandle); 
     
        return out; 
    } 
     
    // to convert a polyline from rig representation to pixel coordinates
    void convertPolylinesRig2Pixel(dwLaneGraphPolyline* polyline,
                                   dwCalibratedCameraHandle_t cameraHandle,
                                   const dwTransformation& cam2Rig)
    {
        for (uint32_t ptIdx = 0; ptIdx < polyline->pointCount; ++ptIdx)
        {
            auto& pt   = polyline->points[ptIdx];
            auto camPt = rig2Pixel(pt, cameraHandle, cam2Rig);
            pt.x       = camPt.x;
            pt.y       = camPt.y;
            pt.z       = 0.0;
        }
    }
     
    // to render a complete lanegraph
    for (uint32_t laneIdx = 0; laneIdx < lanegraph.laneCount; ++laneIdx)
    {
        // center line
        convertPolylinesRig2Pixel(&lanegraph.lanes[laneIdx].geometry, cameraHandle, cam2Rig);

        // left boundary
        if (lanegraph.lanes[laneIdx].isLaneDividerValid[0])
        {
            convertPolylinesRig2Pixel(&lanegraph.lanes[laneIdx].laneDividers[0].geometry, cameraHandle, cam2Rig);
        }

        // right boundary
        if (lanegraph.lanes[laneIdx].isLaneDividerValid[1])
        {
            convertPolylinesRig2Pixel(&lanegraph.lanes[laneIdx].laneDividers[1].geometry, cameraHandle, cam2Rig);
        }
    }
    

Hope that helps.

• Fabian

Hi Fabian,

Thanks for the reply. I think this is pretty much exactly what I am doing, although I was not doing the dwCalibratedCamera_isRayInsideFOV check, and I should add it.
I was wondering about this part:

    auto& pt   = polyline->points[ptIdx];
    auto camPt = rig2Pixel(pt, cameraHandle, cam2Rig);
    pt.x       = camPt.x;
    pt.y       = camPt.y;
    pt.z       = 0.0;

Why set up a 3D point in image space with z = 0? Do you render using:

    dwRenderEngine_render(DW_RENDER_ENGINE_PRIMITIVE_TYPE_LINESTRIP_2D, pt, sizeof(dwVector3f)…

or

    dwRenderEngine_render(DW_RENDER_ENGINE_PRIMITIVE_TYPE_LINESTRIP_3D, pt, sizeof(dwVector3f)…

and initialize the render engine with a LookAt and projection matrix?
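
For concreteness, the 2D variant I have been trying looks roughly like this (imageWidth, imageHeight and m_renderEngine are my own variables, and I am assuming the 2D coordinate range should be the image resolution):

    // treat the converted polyline points as pixel coordinates and draw them in image space
    dwVector2f range{static_cast<float32_t>(imageWidth), static_cast<float32_t>(imageHeight)};
    dwRenderEngine_setCoordinateRange2D(range, m_renderEngine);
    dwRenderEngine_render(DW_RENDER_ENGINE_PRIMITIVE_TYPE_LINESTRIP_2D,
                          polyline->points, sizeof(dwVector3f),
                          0, polyline->pointCount, m_renderEngine);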

BTW, I have tried all of them and I think I am still doing something wrong. I am also loading a rig file we generated when we calibrated our cameras at NVIDIA more than a year ago, and I see that the rig system has changed a lot since then. Could it be that my rig file is not compatible with this SDK and the camera extrinsics/intrinsics are somehow not loaded correctly? I’ll start printing out the matrices and see if they make any sense.
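
Something along these lines, where sensorToRig is the transform we get from dwRigConfiguration_getSensorToRigTransformation:

    // dump the sensor-to-rig matrix for a sanity check
    // (dwTransformation is column-major, so the translation sits in elements 12, 13 and 14)
    for (uint32_t row = 0; row < 4; ++row)
        printf("%8.4f %8.4f %8.4f %8.4f\n",
               sensorToRig.array[row + 0], sensorToRig.array[row + 4],
               sensorToRig.array[row + 8], sensorToRig.array[row + 12]);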

Regards
Allen

Hi Allen,

In the following sample I show how to transform each pixel from image space to world space without using the lane graph at all.

    // Draw lanes
            for (uint32_t laneIdx = 0U; laneIdx < m_laneDetectionResult.numLaneMarkings; ++laneIdx) {
                const dwLaneMarking& laneMarking = m_laneDetectionResult.laneMarkings[laneIdx];
    
    /******************************************************************************
    NVIDIA, Fabian Weise, fweise@nvidia.com                              12/10/2018
    
        1. Get the points from our lane markings
        2. For each point we need to transform it into world space
    ******************************************************************************/
                for (uint32_t pointIdx = 0U; pointIdx < laneMarking.numPoints; ++pointIdx) {
                    dwVector2f point = laneMarking.imagePoints[pointIdx];
    
    /******************************************************************************
    NVIDIA, Fabian Weise, fweise@nvidia.com                              12/10/2018
    
        3. Print out the point in image space
    ******************************************************************************/
                    printf("[Image Point] x: %f, y: %f\n", point.x, point.y);
    
    /******************************************************************************
    NVIDIA, Fabian Weise, fweise@nvidia.com                              12/04/2018
    
        4. u and v are our coordinates in image space
        5. To transform them, we need some 3D world space coordinates, let them be
           the following x, y and z
    ******************************************************************************/
                    float32_t u = point.x;
                    float32_t v = point.y;
    
                    // Transform bottom center of the camera into 3D world space
                    float32_t x, y, z;
    
    /******************************************************************************
    NVIDIA, Fabian Weise, fweise@nvidia.com                              12/10/2018
    
        6. This is the camera which acquired the original image
        7. Print its values in ray space
    ******************************************************************************/
                    CHECK_DW_ERROR(dwCalibratedCamera_pixel2Ray(&x, &y, &z, u, v, m_calibratedCamera));
                    printf("[Ray] x: %f, y: %f, z: %f\n", x, y, z);
    
    /******************************************************************************
    NVIDIA, Fabian Weise, fweise@nvidia.com                              12/10/2018
    
        8. Now we are moving from camera coordinate system to world coordinate
           system, hence we need some empty variables first
        9. Transform using our previous calculated Sensor --> Rig matrix into our
           rayInRig array
       10. Print its values in world space
    ******************************************************************************/
                    float32_t pointH[3] = {x, y, z};
                    float32_t rayInRig[3];
                    Mat4_Rxp(rayInRig, m_transformationSensor2Rig.array, pointH);
                    printf("[rayInRig] x: %f, y: %f, z: %f\n", rayInRig[0], rayInRig[1], rayInRig[2]);
    
    /******************************************************************************
    NVIDIA, Fabian Weise, fweise@nvidia.com                              12/10/2018
    
       11. Calculate the intersection to get the 2D positions (image space again)in
           order to e.g. render them in another view (needs to be done on your
           side)
       12. This equals a simple plane model with some set camera height
       13. Imagine the world view where the camera it positioned at the bottom,
           now let us print our the values according to the lane marking points
    ******************************************************************************/
                    float32_t t = m_transformationSensor2Rig.array[14] / rayInRig[2];
    
                    float32_t xGround = t*rayInRig[0] + m_transformationSensor2Rig.array[12];
                    float32_t yGround = t*rayInRig[1] + m_transformationSensor2Rig.array[13];
    
                    std::cout << "Ground X:"  << xGround << " Y:" << yGround << std::endl;
                }
    
                dwLanePositionType category = laneMarking.positionType;
                CHECK_DW_ERROR(dwRenderEngine_setColor(getLaneMarkingColor(category), m_renderEngine));
                CHECK_DW_ERROR(dwRenderEngine_setLineWidth(6.0f, m_renderEngine));
                CHECK_DW_ERROR(dwRenderEngine_render(DW_RENDER_ENGINE_PRIMITIVE_TYPE_LINESTRIP_2D,
                                                     laneMarking.imagePoints,
                                                     sizeof(dwVector2f), 0, laneMarking.numPoints,
                                                     m_renderEngine));
    
                // Render lane type at the far end of the lane
                char laneType[2] = "";
                switch (laneMarking.lineType) {
                    case DW_LANEMARK_TYPE_SOLID:
                        laneType[0] = 'S';
                        break;
                    case DW_LANEMARK_TYPE_DASHED:
                        laneType[0] = 'D';
                        break;
                    case DW_LANEMARK_TYPE_ROAD_BOUNDARY:
                        laneType[0] = 'B';
                        break;
                    case DW_LANEMARK_TYPE_UNDEFINED:
                    default:
                        break;
                }
                CHECK_DW_ERROR(dwRenderEngine_renderText2D(laneType,
                                                           laneMarking.imagePoints[laneMarking.numPoints-1],
                                                           m_renderEngine));
            }
    

Hope that helps; the comments should be fairly self-explanatory.

• Fabian

Hi Fabian,

Thanks for the sample code. I’ve managed to integrate it into the laneDetection sample code and get Ground X and Y points out. However, I’m a bit unclear on what exactly these Ground X and Y values correspond to, and what their reference point is.

I see that some scaling is being done with:

    float32_t t = m_transformationSensor2Rig.array[14] / rayInRig[2];

Is this fixing the height at the ground, and then calculating the depth and the lateral distance from the center of the rear wheel axle?
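
Spelled out, my current reading of that step is the following (same names as in your sample; please correct me if I am interpreting it wrong):

    // dwTransformation is column-major, so array[12], array[13], array[14] are the camera
    // position in rig coordinates; array[14] would then be the camera height above ground.
    // My assumption: t scales the ray until it reaches ground height, and the result is
    // an offset from the rig origin (the center of the rear axle?)
    float32_t t       = m_transformationSensor2Rig.array[14] / rayInRig[2];
    float32_t xGround = t * rayInRig[0] + m_transformationSensor2Rig.array[12];
    float32_t yGround = t * rayInRig[1] + m_transformationSensor2Rig.array[13];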

Thanks

Göksan

Hi Göksan,

Would you please file a new topic for your current issue? Then we can follow up and support you there.

Thanks

Hi,

We implemented your second code snippet for the image-to-world-space transformation in the Camera Calibration sample, but after a few seconds it freezes; it then works for about 2 seconds before freezing again. Where should we put your code (we tried in onProcess, at the bottom of the if (m_pendingCude) statement), and do we need to consider anything else?

Andy