NVenc's Output Bitstream is not readable

I have a question about Nvidia's NVenc API. I want to use the API to encode some OpenGL graphics. My problem is that the API reports no error throughout the whole program; everything seems to be fine. But the generated output is not readable by, e.g., VLC. If I try to play the generated file, VLC flashes a black screen for about 0.5 s, then ends the playback. The video has a length of 0, and the file seems rather small, too: the resolution is 1280*720 and the size of a 5 s recording is only 700 kB. Is this realistic?

The flow of the application is as follows:

  1. Render to a secondary framebuffer.
  2. Download the framebuffer into one of two PBOs (glReadPixels()).
  3. Map the PBO of the previous frame to get a pointer usable by Cuda.
  4. Call a simple Cuda kernel converting OpenGL's RGBA to ARGB, which should be understandable by NVenc according to this (p. 18). The kernel reads the content of the PBO and writes the converted content into a Cuda buffer (created with cudaMalloc) which is registered as an input resource with NVenc. (A sketch of steps 3 and 4 follows the list.)
  5. The content of the converted buffer gets encoded. A completion event plus the corresponding output bitstream buffer get queued.
  6. A secondary thread listens for the queued completion events; once an event is signaled, the output bitstream gets mapped and written to disk.
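
For reference, steps 3 and 4 look roughly like this (a simplified sketch, not my exact code; pbo, dstDevPtr and ConvertKernel are placeholder names):

CUgraphicsResource pboRes;
// Done once after PBO creation: make the PBO visible to Cuda.
cuGraphicsGLRegisterBuffer(&pboRes, pbo, CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY);

// Per frame: map the PBO of the previous frame and get a device pointer.
cuGraphicsMapResources(1, &pboRes, 0);
CUdeviceptr srcDevPtr;
size_t srcSize;
cuGraphicsResourceGetMappedPointer(&srcDevPtr, &srcSize, pboRes);

// Step 4: swizzle the PBO content into the buffer registered with NVenc.
dim3 block(16, 16);
dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
ConvertKernel<<<grid, block>>>((const uchar4*) (uintptr_t) srcDevPtr,
    (uchar4*) (uintptr_t) dstDevPtr, width, height);

cuGraphicsUnmapResources(1, &pboRes, 0);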

The initialization of the NVenc encoder:

InitParams* ip = new InitParams();
m_initParams = ip;
memset(ip, 0, sizeof(InitParams));
ip->version = NV_ENC_INITIALIZE_PARAMS_VER;
ip->encodeGUID = m_encoderGuid;  //Used Codec
ip->encodeWidth = width; // Frame Width
ip->encodeHeight = height; // Frame Height
ip->maxEncodeWidth = 0; // Zero means no dynamic res changes
ip->maxEncodeHeight = 0; 
ip->darWidth = width; // Aspect Ratio
ip->darHeight = height; 
ip->frameRateNum = 60; // 60 fps
ip->frameRateDen = 1; 
ip->reportSliceOffsets = 0; // According to programming guide
ip->enableSubFrameWrite = 0;
ip->presetGUID = m_presetGuid; // Used Preset for Encoder Config

NV_ENC_PRESET_CONFIG presetCfg; // Load the Preset Config
memset(&presetCfg, 0, sizeof(NV_ENC_PRESET_CONFIG));
presetCfg.version = NV_ENC_PRESET_CONFIG_VER;
presetCfg.presetCfg.version = NV_ENC_CONFIG_VER;
CheckApiError(m_apiFunctions.nvEncGetEncodePresetConfig(m_Encoder,
    m_encoderGuid, m_presetGuid, &presetCfg));
memcpy(&m_encodingConfig, &presetCfg.presetCfg, sizeof(NV_ENC_CONFIG));
// And add information about Bitrate etc
m_encodingConfig.rcParams.averageBitRate = 500000;
m_encodingConfig.rcParams.maxBitRate = 600000;
m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_MODE::NV_ENC_PARAMS_RC_CBR;
ip->encodeConfig = &m_encodingConfig;
ip->enableEncodeAsync = 1; // Async Encoding
ip->enablePTD = 1; // Encoder handles picture ordering
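
The settings are then committed with the usual initialize call (omitted above):

CheckApiError(m_apiFunctions.nvEncInitializeEncoder(m_Encoder, ip));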

Registration of the Cuda resource:

m_cuContext->SetCurrent(); // Make the clients cuCtx current
NV_ENC_REGISTER_RESOURCE res;
memset(&res, 0, sizeof(NV_ENC_REGISTER_RESOURCE));
NV_ENC_REGISTERED_PTR resPtr; // handle to the cuda resource for future use
res.bufferFormat = m_inputFormat; // Format is ARGB
res.height = m_height;
res.width = m_width;
// NOTE: I've set the pitch to the width of the frame, because the resource is a
// non-pitched Cuda allocation (cudaMalloc). Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch; 
res.resourceToRegister = (void*) (uintptr_t) resourceToRegister; //CUdevptr to resource
res.resourceType = 
    NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
res.version = NV_ENC_REGISTER_RESOURCE_VER;
CheckApiError(m_apiFunctions.nvEncRegisterResource(m_Encoder, &res));
m_registeredInputResources.push_back(res.registeredResource);
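
MapInputResource in the next snippet is a small wrapper of mine; it essentially does the following (sketch):

NV_ENC_MAP_INPUT_RESOURCE mapRes;
memset(&mapRes, 0, sizeof(NV_ENC_MAP_INPUT_RESOURCE));
mapRes.version = NV_ENC_MAP_INPUT_RESOURCE_VER;
mapRes.registeredResource = m_registeredInputResources[id];
CheckApiError(m_apiFunctions.nvEncMapInputResource(m_Encoder, &mapRes));
m_currentlyMappedInputBuffer = mapRes; // provides mappedResource / mappedBufferFmt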

Encoding:

m_cuContext->SetCurrent(); // Make Clients context current
MapInputResource(id); //Map the CudaInputResource
NV_ENC_PIC_PARAMS temp;
memset(&temp, 0, sizeof(NV_ENC_PIC_PARAMS));
temp.version = NV_ENC_PIC_PARAMS_VER;
unsigned int currentBufferAndEvent = m_counter % m_registeredEvents.size(); //Counter is inc'ed in every Frame
temp.bufferFmt = m_currentlyMappedInputBuffer.mappedBufferFmt;
temp.inputBuffer = m_currentlyMappedInputBuffer.mappedResource; //got set by MapInputResource
temp.completionEvent = m_registeredEvents[currentBufferAndEvent];
temp.outputBitstream = m_registeredOutputBuffers[currentBufferAndEvent];
temp.inputWidth = m_width;
temp.inputHeight = m_height;
temp.inputPitch = m_width;
temp.inputTimeStamp = m_counter;
temp.pictureStruct = NV_ENC_PIC_STRUCT_FRAME; // According to samples
temp.qpDeltaMap = NULL;
temp.qpDeltaMapSize = 0;

EventWithId latestEvent(currentBufferAndEvent,
    m_registeredEvents[currentBufferAndEvent]);
PushBackEncodeEvent(latestEvent); // Store the Event with its ID in a Queue

CheckApiError(m_apiFunctions.nvEncEncodePicture(m_Encoder, &temp));
m_counter++;
UnmapInputResource(id); // Unmap
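
The secondary thread (step 6) essentially does this per queued event (sketch; evt.id, evt.handle and outFile are placeholder names):

// Wait until the encoder signals completion of the frame.
WaitForSingleObject(evt.handle, INFINITE);

// Lock the corresponding bitstream buffer and write the frame to disk.
NV_ENC_LOCK_BITSTREAM lock;
memset(&lock, 0, sizeof(NV_ENC_LOCK_BITSTREAM));
lock.version = NV_ENC_LOCK_BITSTREAM_VER;
lock.outputBitstream = m_registeredOutputBuffers[evt.id];
CheckApiError(m_apiFunctions.nvEncLockBitstream(m_Encoder, &lock));
outFile.write((const char*) lock.bitstreamBufferPtr, lock.bitstreamSizeInBytes);
CheckApiError(m_apiFunctions.nvEncUnlockBitstream(m_Encoder, lock.outputBitstream));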

Every little hint about where to look is very much appreciated. I'm running out of ideas about what might be wrong.

Thanks a lot!

Hello Christoph,

I'm working on a very similar problem with similar issues, so I'm very interested in any information that gets posted here. I haven't tried connecting OpenGL to the encoder's input (through CUDA) yet. However, I do have a few observations that may be helpful to you.

  1. The H.264 and HEVC encoders are capable of 300x-400x compression relative to plain RGBA, though this is highly dependent on the quality and bitrate settings you use; lossless compression, for example, may only achieve about 3x. In other words, a small output is not surprising, although it might come at an undesirable quality. (At the 500 kbit/s CBR setting from your snippet, 5 s works out to roughly 312 kB, so 700 kB is in the right ballpark.)

  2. The output of the encoder is not directly playable in VLC. However, if you run the output through FFmpeg with -codec copy (see the example after this list), it will package the output into a playable .mp4 file.

  3. I've only gotten VLC to successfully play the sample file; I suspect that I'm not packaging up all the metadata properly on my own input. The NVDecode sample, however, has no trouble reading and displaying anything I put through the encoder, so I recommend using it to verify whether your video file is corrupt.
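
For example, assuming your raw stream is H.264 (the filenames are placeholders):

ffmpeg -i output.h264 -codec copy output.mp4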

Bear in mind that I've only started studying video encoding/decoding a couple of weeks ago, so take anything posted here with a grain of salt ;)

-Philippe

Hey Philippe,

thanks for your reply. What I've learned so far: VLC is not capable of playing the raw bitstream of NVenc. Nevertheless, telling VLC which codec you used, by renaming your video file to e.g. <fileName.h264>, should enable playback.

Would you mind sharing your configuration of the NVenc API? I would like to see if I missed something important. Or maybe you could take a look at my posted snippets and compare my setup to yours?

At the moment, the encoding result of my OpenGL content looks like this:
[External media: screenshot of the encoded output]

Thanks, Christoph

Hello Christoph,

Looking at your output, it looks like a misalignment issue. My first guess would be that somewhere in your memory allocation or data transfer you have bytes (sizeof a color channel) confused with uint32 (sizeof a pixel), given that your output just happens to be 25% of the size of the frame.

My second guess is about how you're using pitch. My understanding is that pitch is a padded version of width that allows for 2-D spatial locality, and that you may need to use the API to get the pitch value. Currently I'm using nvEncLockInputBuffer to get the pitch. For CUDA interop, I think you get the pitch when allocating the input buffer; the sample code uses cuMemAllocPitch.
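
Something along these lines (driver API; note that the returned pitch is in bytes):

CUdeviceptr devPtr;
size_t pitchInBytes;
// Width is given in bytes here (4 bytes per RGBA pixel); the returned
// pitch is the padded row size the hardware expects.
cuMemAllocPitch(&devPtr, &pitchInBytes, width * 4, height, 16);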

My current settings for the encoder are nearly identical to the sample code (the NvEncoder.cpp defaults, except for frame size and input & output files).

Have you verified that the frames coming out of your CUDA kernel are uncorrupted? Also, if you are using a shader to write to the framebuffer, you could do a simple swizzle there to convert to ARGB instead of using a kernel, which would reduce some complexity.

-Philippe

It just occurred to me that your problem is probably both of the issues I proposed in the previous post: pitch is measured in bytes while width is measured in pixels. If you multiply your pitch by 4, it might fix the problem.
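
In terms of your snippets, that would be roughly:

res.pitch = m_width * 4;       // registration: pitch is in bytes, not pixels
temp.inputPitch = m_width * 4; // same unit for the per-picture input pitch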

-Philippe


Hello Philippe,

thanks for your proposals! They helped a lot. Indeed, multiplying the pitch by 4 solved one of the problems.

This is the input:
[External media: screenshot of the input frame]
This is the result:
[External media: screenshot of the encoded result]

Still a long road ahead. I don't know if creating pitched memory is explicitly needed for OpenGL interop,
but in my opinion the wrong colors plus the image being upside down are not pitch-related.

If you or anyone else has further ideas, everything is very much appreciated.

Christoph

The road may not be as long as you think. The upside-down issue is due to screen coordinates being flipped compared to texture coordinates; just render your texture upside down to fix it. The colors look like your RGB channels are scrambled, which is likely a small bug in your CUDA kernel.
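
Rendering the texture upside down just means flipping the V coordinate of your fullscreen quad, e.g.:

// Fullscreen quad with flipped V coordinates (v -> 1 - v),
// so the texture ends up bottom-up in the target framebuffer.
const GLfloat quad[] = {
//    x,     y,    u,    v
    -1.f,  -1.f,  0.f,  1.f,
     1.f,  -1.f,  1.f,  1.f,
    -1.f,   1.f,  0.f,  0.f,
     1.f,   1.f,  1.f,  0.f,
};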

You may also want to reduce the QP on the encoder to improve the quality; this will also increase the file size.

-Philippe

It works. Again, Philippe, thanks a lot!

As you said, the RGB channels were scrambled. Because of NV_ENC_BUFFER_FORMAT::NV_ENC_BUFFER_FORMAT_ARGB I thought I was supposed to provide ARGB ordering. But actually the byte ordering meant by this enum is BGRA, so the alpha value, which is always 255, polluted the blue channel, and therefore the image was blue. ^^
Edit: This may be because NVidia uses little endian internally. I'm writing my pixel data to a byte array; choosing another type like int32 may allow one to pass actual ARGB data.

Up to now I've been using my Cuda kernel to flip the image vertically and swap the blue and the red channel. The execution takes about 3 ms per frame.
Currently I'm rendering into a renderbuffer-backed framebuffer and blit this buffer into GL_BACK afterwards. It might be interesting whether rendering to a texture with upside-down coordinates would outperform my current approach.
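
The kernel boils down to this (simplified sketch, assuming tightly packed 8-bit RGBA):

__global__ void FlipAndSwizzle(const uchar4* src, uchar4* dst,
    int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    // Read the vertically mirrored row to undo OpenGL's bottom-up layout.
    uchar4 p = src[(height - 1 - y) * width + x];

    // NV_ENC_BUFFER_FORMAT_ARGB actually expects BGRA byte order,
    // so swap the red and blue channels.
    dst[y * width + x] = make_uchar4(p.z, p.y, p.x, p.w);
}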

One additional piece of information that may be important for anyone interested in OpenGL interop: you don't need to provide pitched memory. A CUdeviceptr created with cuMemAlloc is perfectly fine; the pitch when registering this resource should then be set to frameWidth * numColorChannels.
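
In code:

CUdeviceptr devPtr;
cuMemAlloc(&devPtr, width * height * 4); // plain linear memory, no pitch
// ...
res.pitch = width * 4; // row size of the tightly packed buffer, in bytes
res.resourceToRegister = (void*) (uintptr_t) devPtr;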

Would you mind elaborating a bit more on this? Which QP value do you mean, and where can I find it? At the moment the created video is OK, but still a bit unsmooth, and the quality could be better even at high bitrates.

Hello Christoph,

QP stands for Quantization Parameter, which controls the amount of allowable information loss by removing lower bits. Since I've been using NvHWEncoder from the samples as a wrapper, the QP is easy for me to access. Manipulating it directly through the nvEncodeAPI is nontrivial, as there are a number of different areas and frame types where you can remove bits.
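
If you do want to set it directly, the rate-control block in NV_ENC_CONFIG is the place to look; a constant-QP setup would look roughly like this (untested sketch against your config code):

m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_CONSTQP;
m_encodingConfig.rcParams.constQP.qpIntra = 28;  // lower QP = higher quality, bigger file
m_encodingConfig.rcParams.constQP.qpInterP = 28;
m_encodingConfig.rcParams.constQP.qpInterB = 28;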

It may be better for you to try different bitrates and presets. Also take a look at the table at the end of the NVEnc programmer's guide.

-Philippe