YUV 420 to RGB conversion

Hi,

I’m trying to get decoded video frames back into RAM, so I’m building on top of the source code of the NvDecodeGL sample. I’m successfully decoding the frames and the Y channel looks fine. As soon as I try to get color, however, I fail and get a lot of glitches. I’m not sure what the YUV data layout looks like. Is there documentation on this? The chroma format is YUV420 in my case, with progressive frames, so what I tried is this:

inline void yuv2rgb( unsigned char y, unsigned char u, unsigned char v, unsigned char &r, unsigned char &g, unsigned char &b )
{
	r = clamp<int>( y + 1.403f * v, 0, 255 );
	g = clamp<int>( y - 0.344f * u - 1.403 * v, 0, 255 );
	b = clamp<int>( y + 1.770f * u, 0, 255 );
}

//...

const unsigned char *y = src;
const unsigned char *u = src + ( srcStride * height );
const unsigned char *v = u + ( srcStride * height ) / 4;

for( int j = 0; j < height; j++ )
{
	for( int i = 0; i < width; i++ )
	{
		yuv2rgb( y[i], u[i/2], v[i/2], dst[0], dst[1], dst[2] );

		dst += 3;
	}

	if( j & 0x01 )
	{
		u += srcStride / 2;
		v += srcStride / 2;
	}

	y += srcStride;
	dst += ( dstStride - width * 3 );
}

where src is the host memory pointer that the data is copied to with cuMemcpyDtoHAsync (cf. frameYUV in the GL sample) and srcStride is the pitch returned by cuvidMapVideoFrame. dstWidth and dstHeight are simply the texture dimensions (that is, the video target size, not the coded size). I experimented a bit, but I keep failing. What I get looks like interlacing artifacts, since every other row is shifted, so I’m guessing my assumptions about the data layout are wrong. Could anybody shed some light on this?

Also, being a novice at CUDA programming, I’m wondering how you guarantee that the data copy has finished by the time the data is used. After all, the sample seems to use an async copy operation, cuMemcpyDtoHAsync.

Hi

As soon as I try to get color however, I fail, getting a lot of glitches:

The output YUV format supported by NVDECODE is NV12. The enum cudaVideoSurfaceFormat_enum in dynlink_cuviddec.h lists the output formats. NV12 has two planes: a Y plane and an interleaved UV plane, rather than separate U and V planes. Look at this link for a visualization: https://msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx#nv12 . The offsets for u and v seem to be the problem in your code.
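
To make the NV12 layout concrete, here is a minimal sketch of a host-side conversion loop. It rests on a few assumptions that are not confirmed anywhere in this thread: the UV plane starts at srcPitch * surfaceHeight bytes (surfaceHeight is a hypothetical parameter for the number of luma rows actually copied), the UV plane uses the same pitch as the Y plane, and full-range BT.601 coefficients are acceptable (your stream may well be BT.709 or limited range). The names nv12ToRgb and yuvToRgb are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Convert one pixel. Assumes full-range BT.601 coefficients (an assumption;
// the actual stream may use a different matrix or range).
inline void yuvToRgb(uint8_t y, uint8_t u, uint8_t v, uint8_t* rgb)
{
    const float c = static_cast<float>(y);
    const float d = static_cast<float>(u) - 128.0f; // chroma is stored with a +128 bias
    const float e = static_cast<float>(v) - 128.0f;
    auto clampByte = [](float x) {
        // Round to nearest and clamp to [0, 255].
        return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, x + 0.5f)));
    };
    rgb[0] = clampByte(c + 1.402f * e);
    rgb[1] = clampByte(c - 0.344f * d - 0.714f * e);
    rgb[2] = clampByte(c + 1.772f * d);
}

// src: NV12 frame in host memory, Y plane followed by an interleaved UV plane.
// surfaceHeight: number of luma rows in the copied surface (assumed parameter).
void nv12ToRgb(const uint8_t* src, size_t srcPitch, int surfaceHeight,
               int width, int height, uint8_t* dst, size_t dstPitch)
{
    const uint8_t* yPlane  = src;
    const uint8_t* uvPlane = src + srcPitch * surfaceHeight;

    for (int j = 0; j < height; ++j) {
        const uint8_t* yRow  = yPlane + j * srcPitch;
        // One chroma row serves two luma rows; it has the same pitch as luma
        // because each U,V pair takes two bytes for two horizontal pixels.
        const uint8_t* uvRow = uvPlane + (j / 2) * srcPitch;
        uint8_t* out = dst + j * dstPitch;

        for (int i = 0; i < width; ++i) {
            const int c = (i / 2) * 2; // U at the even byte, V at the next byte
            yuvToRgb(yRow[i], uvRow[c], uvRow[c + 1], out + 3 * i);
        }
    }
}
```

Compared with the loop in the question, the differences are that U and V are read from the same row at adjacent bytes, the chroma row advances by the full pitch every second luma row, and 128 is subtracted from both chroma values before the matrix is applied.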

Also, being a novice at CUDA programming, I’m wondering how you guarantee that data copying is finished by the time it’s used, after all it seems you’re using an async copy operation in the sample with cuMemcpyDtoHAsync:

You should call cuStreamSynchronize(hStream), where hStream is the stream handle that was passed to cuMemcpyDtoHAsync; it blocks until all work queued on that stream, including the copy, has completed. Alternatively, you can poll cuStreamQuery(hStream) until it returns CUDA_SUCCESS.

Thanks!