nppiGetPerspectiveTransform() bug
Problem with nppiGetPerspectiveTransform() library function

Recent experimentation with the nppiGetPerspectiveTransform function, which is part of the NVIDIA Performance Primitives (NPP) library, has exposed a potential problem. A short C++ program was written to call the function, and its result was compared with those of a MATLAB program and a second C++ program based on the OpenCV library, all using the same data. The results did not agree.
The call to the NPP function has the following syntax:
nppiGetPerspectiveTransform(srcRoi, quad, coefs);
We used the following rectangle for our test (upper left corner at (380, 52), width of 90 and height of 100):
srcRoi.x = 380;
srcRoi.y = 52;
srcRoi.width = 90;
srcRoi.height = 100;
This is equivalent to a rectangle with the following vertices:
src[0][0] = 380.; src[0][1] = 52.;
src[1][0] = 470.; src[1][1] = 52.;
src[2][0] = 470.; src[2][1] = 152.;
src[3][0] = 380.; src[3][1] = 152.;
The 2-D array quad contains the coordinates of the corners of a quadrilateral, clockwise from the upper left corner:
quad[0][0] = 383.; quad[0][1] = 51.;
quad[1][0] = 473.; quad[1][1] = 56.;
quad[2][0] = 468.; quad[2][1] = 147.;
quad[3][0] = 380.; quad[3][1] = 143.;
The variable coefs is a 3x3 array initialized to zeros. Upon exit, it should contain the perspective transform matrix H such that Q = H * S, where Q is the homogeneous version of the transpose of quad, and S is the homogeneous version of the transpose of src. The homogeneous versions of both Q and S simply used ones for the third row. When the program is executed, the matrix H returned is:
1.064197 0.059106 -24.468256
0.062450 0.962939 -22.803817
0.000112 0.000235 0.945217
If S is then left-multiplied by H, the following matrix is obtained:
383.000116 478.777846 484.688446 388.910716
51.000011 56.620511 152.914411 147.293911
0.999997 1.010077 1.033577 1.023497
Converting from homogeneous to non-homogeneous, we have our estimate of (the transpose of) quad:
383.001265 474.001335 468.942755 379.982273
51.000164 56.055638 147.946801 143.912401
Comparing with the original quad, the error terms are:
0.001265 1.001335 0.942755 -0.017727
0.000164 0.055638 0.946801 0.912401
Note that four of the error terms are much closer to one than to zero. A MATLAB-based homography solver was then used to find H, given the same Q and S as above. The resulting matrix is:
1.104397 0.061408 -20.930201
0.064809 1.000437 -23.128942
0.000116 0.000244 0.992569
Repeating the checking procedure used above, and calculating the error terms, we have:
0.036027 0.051977 0.069702 0.050864
0.004779 0.006133 0.021853 0.019102
Note that all of the error terms are much closer to zero than to one (or -1, for that matter). Also note that the double-precision floating-point values obtained using MATLAB were rounded to the six decimal places shown above for a like-for-like comparison with the values obtained using the NPP function. When the full-precision MATLAB values are used, the error terms are all on the order of 1e-10.
Finally, a similar function from the OpenCV library, cvGetPerspectiveTransform, was used to find H given Q and S. That function returns a matrix equivalent to the inverse of the H returned by the NPP function, so in order to compare the two, its inverse is shown here:
1.104401 0.061403 -20.930171
0.064814 1.000432 -23.128913
0.000116 0.000245 0.992568
Checking this result in the usual manner and calculating the error terms gives:
0.018640 0.030770 0.005580 -0.001839
0.003891 0.005438 0.002873 0.000081
Again, all the error terms are much closer to zero than to any other integer, even more so than the MATLAB-based results. Note that the values used for the OpenCV version of H were also rounded to six decimal places for comparison to the NPP and MATLAB cases.
In summary, three independent homography solvers were used to find a matrix H that maps one set of four points in a plane to a second set of four points in the same plane. Two of the solvers, one written in MATLAB, and one from the OpenCV library, gave very similar results and produced very similar errors (which were all close to zero). The NPP solver produced a result with significant errors in half the terms when the homography was applied to the source matrix. The errors were approximately one pixel in magnitude, compared to errors on the order of 0.01 pixel for the other methods. For this reason, we believe there is a potential problem with the NPP function nppiGetPerspectiveTransform.
nppPerspectiveTransform.zip (5.93 KB)

We reproduced this bug under the Windows 4.2 release and got the same incorrect results as under the 4.1 Unix release.

Hi Beau,

I’ve just spent some time looking into your issue. We have completely rewritten the Warp code for our upcoming 5.0 release. If you’re a registered developer you should already be able to download a preview build of this release.

I had done quite a bit of hand-verification of the code and in addition to our existing unit tests I added a test using your parameters. I also added some code to compute the transformed source-rect using the transformation obtained from a call to nppiGetPerspectiveTransform. Here’s a piece of console output from running that test:

Dest Quad: [v0 = (383, 51), v1 = (473, 56), v2 = (468, 147), v3 = (380, 143)]
NPP Transform: [(1.11266, 0.0618677, -21.0869)
                (0.0652942, 1.00793, -23.3021)]
                (0.000117067, 0.00024628, 1)]
v0 = (383, 51)
v1 = (473, 56)
v2 = (468, 147)
v3 = (380, 143)

The values v0, …, v3 at the bottom are the source rect’s vertices transformed using the NPP Transform (which was obtained using nppiGetPerspectiveTransform). These values perfectly match the original Dest Quad (see first line of the console log).

If you have a chance to check out NPP 5.0, please let me know if your issue is fixed (which my little experiment seems to indicate).

Frank,

Thanks for looking into this. We are registered on the site; could you please let me know how to obtain the 5.0 NPP release? Can we use it with our existing RHEL 4.1 CUDA installation?

much appreciated

Hi Beau,

The CUDA 5.0 preview release should go up on the registered developer site sometime today. We’ve just officially announced it at GTC.

As for your existing installation, you would have to upgrade the toolkit and the driver to the latest release. Since NPP from the preview release is linked against the new CUDA runtime libraries, it wouldn’t work if you just dropped the newer NPP dynamic libraries into an existing 4.2 release. Sorry :-(

–Frank

The location of the download is not obvious on the developer site. Would you please post the link?

much appreciated,

http://developer.nvidia.com/rdp/cuda-50-preview-license-agreement

Frank,
We installed the version 5 release on Windows and the fix works fine, thank you.

For our device, a Quadro 4000, we are getting about 250 MFLOPS, but the device can reach about 30 GFLOPS in other applications. I suspect that the device is underutilized for the size of the computation and data.

For our application we want to calculate numerous homographies, e.g. 300+, and then use those in a subsequent pipeline step. Might it be possible to extend the nppiGetPerspectiveTransform function to take a list of arguments and return a list of corresponding H matrices? I think that this could provide much better utilization of the GPU and much better throughput than looping over each H calculation.

thanks

Hi, I think there is a misunderstanding: nppiGetPerspectiveTransform does not run on the GPU. It runs completely on the host, and we do not intend to provide a device implementation of it. In your case a batched device implementation might provide a performance benefit, but there is still the question of how you would pass those matrices from the device to the Warp primitives. If you had to read them back to the host, any potential performance gain would likely be lost.

I’d also be curious to see how much of the overall time of your algorithm is spent in the transform computation vs. the actual warping of the image. Even computing 300 of those transforms is not that many instructions vs millions of instructions to warp a single image.

Frank, thank you for the clarification; that was not clear in the nppi documentation. How can we tell from the documentation which functions run on the GPU and which run on the host? We came to the same conclusion on cost: the warping operations will dwarf the perspective-transform computation, so doing that on the CPU might be fine.