NVDEC Video Size Limitation for GTX 960

I have problems decoding 4K video (8-bit 3840x2160) on a GTX 960 card. The decoder seems to decode only the top 503 lines. When I look at the decoded NV12 video data, the first 503 rows have Y values > 0, while starting with row 504 all the Y values are 0.

When I look at the CbCr data, the first 247 rows have values close to 80 (the first frame is mostly black). But starting with row 248, they are all 0.

Each row starts on a 4096-byte boundary (the surface pitch is 4096 bytes), so the buffer looks like this:
ptr + 4096 * 0 … ptr + 4096 * 503 are all good (0x0F)
ptr + 4096 * 504 … ptr + 4096 * 2159 are all 0s
ptr + 4096 * 2160 … ptr + 4096 * 2407 are all good (0x80)
ptr + 4096 * 2408 … ptr + 4096 * 3239 are all 0s
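
For reference, this is roughly how I interpret the layout above: the first height rows of the surface are the Y plane and the next height/2 rows are the interleaved CbCr plane, each row pitch bytes apart. The helper below is an illustrative sketch, not my actual code.

// Illustrative sketch: find the first all-zero row in each plane of an NV12 surface.
// For my case: pitch = 4096, width = 3840, height = 2160.
#include <stdio.h>
#include <stddef.h>

static int first_zero_row(const unsigned char* plane, size_t pitch, size_t width, int rows)
{
    for (int r = 0; r < rows; ++r) {
        const unsigned char* row = plane + (size_t)r * pitch;
        size_t c = 0;
        while (c < width && row[c] == 0)
            ++c;
        if (c == width)            // every byte of the visible part of this row is zero
            return r;
    }
    return -1;                     // no fully-zero row found
}

void check_nv12(const unsigned char* ptr, size_t pitch, size_t width, int height)
{
    const unsigned char* yPlane  = ptr;                           // rows 0 .. height-1
    const unsigned char* uvPlane = ptr + (size_t)height * pitch;  // rows height .. height + height/2 - 1
    printf("first all-zero Y row : %d\n", first_zero_row(yPlane, pitch, width, height));
    printf("first all-zero UV row: %d\n", first_zero_row(uvPlane, pitch, width, height / 2));
}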

I believe this card has the GM206 chip (although I don't know how to determine this, since the NVIDIA specifications page doesn't mention anything about the chipset). If it is indeed GM206, then according to https://developer.nvidia.com/nvidia-video-codec-sdk#NVDECFeatures, this card should be able to handle video up to 10-bit 4096x2304, so 3840x2160 8-bit should work.
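
In case it is relevant: with Video SDK 8 the driver can apparently report the limits directly via cuvidGetDecoderCaps, instead of guessing from the chip name. A rough sketch of what I mean (it assumes that API is available and that a CUDA context is current on the GPU; error handling omitted):

// Sketch: ask the driver what HEVC 4:2:0 decode it supports (Video SDK 8+).
#include <cuda.h>
#include <nvcuvid.h>
#include <stdio.h>
#include <string.h>

void print_hevc_decode_caps(void)
{
    CUVIDDECODECAPS caps;
    memset(&caps, 0, sizeof(caps));
    caps.eCodecType      = cudaVideoCodec_HEVC;
    caps.eChromaFormat   = cudaVideoChromaFormat_420;
    caps.nBitDepthMinus8 = 0;                 // 8-bit; set to 2 to check 10-bit support

    if (cuvidGetDecoderCaps(&caps) == CUDA_SUCCESS) {
        printf("HEVC 4:2:0 8-bit supported: %u, max size: %ux%u\n",
               caps.bIsSupported, caps.nMaxWidth, caps.nMaxHeight);
    }
}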

A second problem is that this card fails to decode 1080p 10-bit video. It seems to treat it as 8-bit, and the decoded data is all garbage.

Yes, GTX 960 should decode your 4K just fine.

To get 10-bit you need either Video SDK 8 or modified headers that can be found on the internet for SDK 7.

To help properly we’re going to need to see your code.

I just updated to the latest device driver (382.05) and the first problem of getting partial frames for 3840x2160 is fixed. So I am now going to focus on the problem of decoding 10-bit video.

The code is rather long to post in full here.

What is the format of the uncompressed 10-bit data? For example, the output bytes at the beginning of the buffer are like this:

0x14AA0000 2b af 06 c1 73 20 5d 3d 2c 68 1a 78 72 0d 50 17
0x14AA0010 0a 5b 24 97 52 e5 75 48 46 78 31 c2 6b fc ad ef

I use a parser created with cuvidCreateVideoParser. The CUVIDEOFORMAT structure passed to the SequenceCallback has chroma_format set to cudaVideoChromaFormat_420.
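
For context, the parser is created roughly like this (a sketch with placeholder callback names, not my exact code):

// Sketch of the parser setup; the callback names are placeholders for your own functions.
#include <cuda.h>
#include <nvcuvid.h>
#include <string.h>

static int CUDAAPI HandleVideoSequence(void*, CUVIDEOFORMAT*);
static int CUDAAPI HandlePictureDecode(void*, CUVIDPICPARAMS*);
static int CUDAAPI HandlePictureDisplay(void*, CUVIDPARSERDISPINFO*);

CUvideoparser create_parser(void)
{
    CUVIDPARSERPARAMS parserParams;
    memset(&parserParams, 0, sizeof(parserParams));
    parserParams.CodecType              = cudaVideoCodec_HEVC;
    parserParams.ulMaxNumDecodeSurfaces = 20;
    parserParams.ulMaxDisplayDelay      = 1;
    parserParams.pUserData              = NULL;                 // or a pointer to your decoder object
    parserParams.pfnSequenceCallback    = HandleVideoSequence;  // receives the CUVIDEOFORMAT*
    parserParams.pfnDecodePicture       = HandlePictureDecode;  // receives the CUVIDPICPARAMS*
    parserParams.pfnDisplayPicture      = HandlePictureDisplay;

    CUvideoparser parser = NULL;
    cuvidCreateVideoParser(&parser, &parserParams);
    return parser;
}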

I think that leaves the CUVIDPICPARAMS structure passed to the DecodePicture callback. Which member of this structure tells me the output is 10-bit?

I see that Video SDK v8 has these two extra fields in CUVIDHEVCPICPARAMS:
unsigned char bit_depth_luma_minus8;
unsigned char bit_depth_chroma_minus8;

These two values are set to 2 for my 10-bit video, so the NVIDIA decoder realizes the data is 10-bit.
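
For anyone looking for the same thing later: those fields sit in the CodecSpecific union of CUVIDPICPARAMS, so in the DecodePicture callback reading them looks roughly like this (sketch):

// Sketch: reading the HEVC bit depth inside the DecodePicture callback (SDK 8 headers).
#include <cuda.h>
#include <nvcuvid.h>

static int CUDAAPI HandlePictureDecode(void* pUserData, CUVIDPICPARAMS* pPicParams)
{
    int lumaBits   = 8 + pPicParams->CodecSpecific.hevc.bit_depth_luma_minus8;
    int chromaBits = 8 + pPicParams->CodecSpecific.hevc.bit_depth_chroma_minus8;
    // Both are 10 for the 10-bit stream discussed here.
    // ... cuvidDecodePicture(decoder, pPicParams) would follow ...
    (void)lumaBits; (void)chromaBits; (void)pUserData;
    return 1;
}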

If I set
oVideoDecodeCreateInfo.OutputFormat = cudaVideoSurfaceFormat_P016;
oVideoDecodeCreateInfo.bitDepthMinus8 = 2;
before calling cuvidCreateDecoder, I do seem to get the correct 16-bit output data.
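
For completeness, the decoder creation then looks roughly like the sketch below; apart from OutputFormat and bitDepthMinus8, the values are just typical settings for a 1080p HEVC stream, not anything specific to 10-bit.

// Sketch: decoder creation for 10-bit HEVC 4:2:0 with 16-bit (P016) output surfaces.
#include <cuda.h>
#include <nvcuvid.h>
#include <string.h>

CUvideodecoder create_10bit_hevc_decoder(void)
{
    CUVIDDECODECREATEINFO oVideoDecodeCreateInfo;
    memset(&oVideoDecodeCreateInfo, 0, sizeof(oVideoDecodeCreateInfo));
    oVideoDecodeCreateInfo.CodecType           = cudaVideoCodec_HEVC;
    oVideoDecodeCreateInfo.ChromaFormat        = cudaVideoChromaFormat_420;
    oVideoDecodeCreateInfo.OutputFormat        = cudaVideoSurfaceFormat_P016;
    oVideoDecodeCreateInfo.bitDepthMinus8      = 2;      // 10-bit source
    oVideoDecodeCreateInfo.ulWidth             = 1920;   // coded size from CUVIDEOFORMAT
    oVideoDecodeCreateInfo.ulHeight            = 1080;
    oVideoDecodeCreateInfo.ulTargetWidth       = 1920;
    oVideoDecodeCreateInfo.ulTargetHeight      = 1080;
    oVideoDecodeCreateInfo.ulNumDecodeSurfaces = 20;
    oVideoDecodeCreateInfo.ulNumOutputSurfaces = 2;
    oVideoDecodeCreateInfo.ulCreationFlags     = cudaVideoCreate_PreferCUVID;
    oVideoDecodeCreateInfo.DeinterlaceMode     = cudaVideoDeinterlaceMode_Weave;

    CUvideodecoder decoder = NULL;
    cuvidCreateDecoder(&decoder, &oVideoDecodeCreateInfo);
    return decoder;
}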

So it looks like I have everything to handle this properly now.

That’s great, silviu22! Thanks for updating the status.

I plan to implement this also. May I ask, the U and V are still interleaved (but 16 bits of course), yes?

Yes, P016 is a 16-bit version of NV12. This is pretty much the same as the output of the Intel Media SDK decoder for the same file.
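
In case it helps, this is roughly how I read the mapped P016 frame after copying it to host memory. The sketch assumes the 10 significant bits are MSB-aligned in each 16-bit sample (P010-style), so check that against your own output:

// Sketch: interpreting a P016 frame copied to host memory.
// Assumes the 10 significant bits are MSB-aligned in each 16-bit sample.
#include <stdint.h>
#include <stddef.h>

void read_p016_sample(const uint8_t* frame, size_t pitch, int height)
{
    const uint16_t* yPlane  = (const uint16_t*)frame;                             // 16-bit Y samples
    const uint16_t* uvPlane = (const uint16_t*)(frame + (size_t)height * pitch);  // interleaved Cb,Cr pairs

    uint16_t y0  = yPlane[0]  >> 6;   // top-left luma, back to 10-bit range
    uint16_t cb0 = uvPlane[0] >> 6;   // Cb for the top-left 2x2 block
    uint16_t cr0 = uvPlane[1] >> 6;   // Cr for the top-left 2x2 block
    (void)y0; (void)cb0; (void)cr0;
}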

Thank you! Rock on.

I want to decode 10-bit video. If I want to get correct 8-bit output data, what should oVideoDecodeCreateInfo.OutputFormat and oVideoDecodeCreateInfo.bitDepthMinus8 be set to?

OutputFormat: cudaVideoSurfaceFormat_NV12.

The source stream bit depth is given to you in the pFormat passed to HandleVideoSequence(). Do not change it.
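
In code, that amounts to something like this in the sequence callback (a sketch, assuming the SDK 8 CUVIDEOFORMAT fields):

// Sketch: 8-bit (NV12) output from a 10-bit stream; the bit depth comes from the parsed format.
#include <cuda.h>
#include <nvcuvid.h>
#include <string.h>

static int CUDAAPI HandleVideoSequence(void* pUserData, CUVIDEOFORMAT* pFormat)
{
    CUVIDDECODECREATEINFO ci;
    memset(&ci, 0, sizeof(ci));
    ci.CodecType           = pFormat->codec;
    ci.ChromaFormat        = pFormat->chroma_format;
    ci.OutputFormat        = cudaVideoSurfaceFormat_NV12;      // 8-bit output surfaces
    ci.bitDepthMinus8      = pFormat->bit_depth_luma_minus8;   // keep the stream's bit depth (2 for 10-bit)
    ci.ulWidth             = pFormat->coded_width;
    ci.ulHeight            = pFormat->coded_height;
    ci.ulTargetWidth       = pFormat->coded_width;
    ci.ulTargetHeight      = pFormat->coded_height;
    ci.ulNumDecodeSurfaces = 20;
    ci.ulNumOutputSurfaces = 2;
    ci.DeinterlaceMode     = cudaVideoDeinterlaceMode_Weave;
    // cuvidCreateDecoder(&decoder, &ci);
    (void)pUserData;
    return 1;
}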

I finally got the correct 8-bit output data; this problem had been bothering me for several days. Thank you!

Great to hear and you are welcome. Glad to assist.

I have a new question: can CUDA decode lossless-compressed video? If so, how is it different from decoding lossy HEVC video? Is it necessary to modify some parameters or perform other operations?