Encoding Video with NVCUVENC DEVICE_MEMORY_INPUT NVVE_DEVICE_MEMORY_INPUT

Hi,

I successfully used the C API of the NVCUVENC video encoder to encode videos. As input I passed a pointer to host memory.

The images that I am encoding are RGBA, and I use a CUDA kernel to convert them to YV12.
Like this:

char* d_buf;
cutilSafeCall(cudaMalloc((void**) &d_buf, mem_size_buf));

… kernel to generate YV12 frame there…

cutilSafeCall(cudaMemcpy(buf, d_buf, mem_size_buf,cudaMemcpyDeviceToHost));

to transfer the YV12 pixels to an unsigned char* buffer, which I then pass to the EncodeFrame function with
something like this:
efparams.picBuf = buf;
HRESULT hr = NVEncodeFrame(pCudaEncoder,&efparams,0,NULL);
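
For reference, a tightly packed YV12 frame stores the full-resolution Y plane first, followed by the quarter-size V plane and then the U plane. Below is a minimal sketch of the plane offsets and the total buffer size, assuming even dimensions and no row padding. `Yv12Layout` and `yv12_layout` are hypothetical names used only for illustration, and whether the encoder expects exactly this packing for device memory input is not confirmed anywhere in this thread.

```c
#include <stddef.h>

/* Byte offsets of the three planes in a tightly packed YV12 frame.
   Assumption: even width and height, no padding at the end of rows. */
typedef struct {
    size_t y_offset;   /* full-resolution luma plane */
    size_t v_offset;   /* V (Cr) plane, comes before U in YV12 */
    size_t u_offset;   /* U (Cb) plane */
    size_t total_size; /* bytes needed for the whole frame */
} Yv12Layout;

static Yv12Layout yv12_layout(size_t width, size_t height)
{
    Yv12Layout layout;
    size_t y_size = width * height;
    size_t chroma_size = (width / 2) * (height / 2); /* 2x2 subsampling */

    layout.y_offset = 0;
    layout.v_offset = y_size;
    layout.u_offset = y_size + chroma_size;
    layout.total_size = y_size + 2 * chroma_size;
    return layout;
}
```

For a 704x480 frame this gives a V plane starting at byte 337920, a U plane starting at byte 422400, and a total of 506880 bytes, so mem_size_buf would need to be at least that large under these assumptions.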

Since the data is on the GPU anyway, I thought I would get a speedup by setting the NVVE_DEVICE_MEMORY_INPUT flag
and passing d_buf directly to NVEncodeFrame as pData, as described in the docs:
HRESULT hr = NVEncodeFrame(pCudaEncoder,&efparams,0,d_buf);
But then I only get rubbish. Any ideas?

Best regards,
dimo

It seems that I haven’t set up the context correctly. Buried in the docs I found this:
“Device Context Lock parameter must also be set if device memory input is enabled. Context lock should be created from cuvidCtxLockCreate API available in NVCUVID.”

I am not familiar with the driver API. How do I get the current context from the runtime API so that I can pass it correctly to the encoder?

Best regards,
dimo

Hi again,

Did anyone manage to start the NVHWEncoder with a context lock?

This is what I am trying now:

[codebox]
CUcontext cucontext;
error = cudaGetDevice(&d);
CUdevice cud;
result = cuDeviceGet(&cud,d);
result = cuCtxCreate(&cucontext,CU_CTX_SCHED_AUTO,cud);
int useDeviceMemory = 1;
CUvideoctxlock g_CtxLock = NULL;
cuCtxPopCurrent(cuContext); //tried with and without this line
result = cuvidCtxLockCreate(&g_CtxLock, *cuContext);
hr = SetParamValue(encoder->pCudaEncoder,NVVE_DEVICE_MEMORY_INPUT, &useDeviceMemory);
hr = SetParamValue(encoder->pCudaEncoder,NVVE_DEVICE_CTX_LOCK,g_CtxLock );
HRESULT hr = NVCreateHWEncoder(pCudaEncoder) //crashes here!
[/codebox]

This crashes. When I comment out the line with SetParamValue(encoder->pCudaEncoder,NVVE_DEVICE_CTX_LOCK,g_CtxLock ) it doesn’t crash, but then the memory the encoder reads is not the memory that I filled.

Any ideas?

Best regards,

dimo

Hi again,

It’s a pity that this thread is a monologue, but maybe someone can profit from my findings ;).

I still haven’t managed to encode from device memory. However, it doesn’t crash anymore.

I had made a stupid mistake. I changed this line:

[codebox]
hr = SetParamValue(encoder->pCudaEncoder,NVVE_DEVICE_CTX_LOCK,g_CtxLock );

to:

hr = SetParamValue(encoder->pCudaEncoder,NVVE_DEVICE_CTX_LOCK,&g_CtxLock );
[/codebox]

I also wrapped all my CUDA calls in cuvidCtxLock and cuvidCtxUnlock, as described in cuviddec.h.

This is not documented anywhere in the encoder documentation.

It no longer crashes and it encodes faster, which is good. The bad thing is that the encoded frames are pitch black.

Is there really no one out there who has managed to use device memory for encoding?

Best regards,

dimo
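
One way to narrow down where the black frames come from is to copy the device buffer back to the host (as in the working host-memory path) and check that the Y plane actually contains picture data. Below is a minimal sketch, assuming a tightly packed YV12 frame whose first width * height bytes are luma. `yv12_mean_luma` is a hypothetical helper for illustration, not part of the encoder API.

```c
#include <stddef.h>

/* Mean luma of a tightly packed YV12 frame.
   Assumption: the Y plane occupies the first width * height bytes.
   Returns 0.0 for a NULL or empty frame. */
static double yv12_mean_luma(const unsigned char *frame,
                             size_t width, size_t height)
{
    size_t count = width * height;
    unsigned long long sum = 0;
    size_t i;

    if (frame == NULL || count == 0) {
        return 0.0;
    }
    for (i = 0; i < count; ++i) {
        sum += frame[i];
    }
    return (double)sum / (double)count;
}
```

A mean close to 0 (or close to 16 for video-range data) would suggest that the buffer is already black before it reaches the encoder, while a plausible mean would point at how the buffer is handed to the encoder instead.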

Dimo, my friend,

I will be interested to see how you solve this problem, so keep posting.

It will be useful to the community. If I get any clue, I will get back to you on this thread.

Meanwhile, keep updating.

Thanks,
Best Regards,
Sarnath

Hi Dimo,

  1. cuvidCtxLockCreate() takes a “CUcontext” as its 2nd argument. You are not passing it correctly. Did you not get a warning about it?

  2. The SetParamValue() function has a problem. It is implemented in the app only; it is not a library call.

    Just disable the printing done in this function. It has a bug: it indexes past the end of an array while printing. Just fix it (hope someone from NV is listening).

    It overflows for DEVICE_MEM_INPUT (44) and the 45th one. NV needs to fix it. Well, technically it works for their app… up to them.

Best Regards,

Sarnath
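
The overflow described in point 2 is the usual result of indexing a name table with a parameter index that is larger than the table. Below is a minimal sketch of a guarded lookup that could be used when printing. The table `kParamNames`, its placeholder entries, and the helper `param_name` are all hypothetical; the sample's real table is not reproduced here.

```c
#include <stddef.h>

/* Hypothetical name table for illustration only; the entries are
   placeholders, not the SDK sample's real parameter names. */
static const char *const kParamNames[] = {
    "PARAM_ZERO",
    "PARAM_ONE",
    "PARAM_TWO",
};

#define PARAM_NAME_COUNT (sizeof(kParamNames) / sizeof(kParamNames[0]))

/* Return a printable name for a parameter index without ever
   reading past the end of the table. */
static const char *param_name(int index)
{
    if (index < 0 || (size_t)index >= PARAM_NAME_COUNT) {
        return "<unknown parameter>";
    }
    return kParamNames[index];
}
```

With a guard like this, a parameter added after the table was written would print as unknown instead of reading out-of-bounds memory.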

  1. cuInit() is required for “NVCreateHWEncoder()” to succeed. Otherwise, creating the context fails.
  2. The context needs to be locked before cudaMalloc and cudaMemcpy and unlocked afterwards.
  3. It is better to pop the created context (as you had suggested) to make it floating. We think this does not matter, because “ctxLock()” locks and pushes the related context automatically, but it is better to have it in place.

After all this, the API calls succeed, but we still get black frames. However, even the original code (which does NOT use device memory input) gives us only black frames.

No idea whether this works at all or not… :-(

Hi Sarnath,

Thanks for the pointers.

I think I got the CUcontext stuff right now; at least there are no crashes. (I also changed the printing code you mentioned.) I have set the NVVE_DEVICE_CTX_LOCK option successfully without any crashes so far. Encoding several videos in parallel in different threads also works. With host memory everything works as expected here. However, when NVVE_DEVICE_MEMORY_INPUT is set, I get black frames. Maybe the doc isn’t right about passing the pixels as the last argument, as in NVEncodeFrame(pCudaEncoder,&efparams,0,d_buf)?

Or maybe it is not expecting YV12 but RGB in this case?

I am out of ideas.

Best regards,

Dieter

Oh! I am surprised to see that the original SDK code worked fine for you,
because I am getting only black frames here, even without DEVICE_MEM_INPUT.

I also see that the binary for “cudaEncode” is not shipped in the “bin” directory of the SDK. I thought they did this on purpose.
I am using the 3.2RC SDK code. Which one are you using? I would be interested to know your configuration.
This is what I am using:

VIDEO_SOURCE_FILE “plush_480p_60fr.yuv”
VIDEO_CONFIG_FILE “704x480-h264.cfg”
VIDEO_OUTPUT_FILE “plush_480p_60fr.264”
Format is YV12

cudaEncode.exe data\plush_480p_60fr.yuv data\704x480-h264.cfg plush_480p_60fr.264 -format=YV12

Can you let me know if this one worked for you?

Many Thanks,
Best Regards,
Sarnath

Yes, this worked for me, but only the 32-bit build.

I think I read somewhere that 64-bit isn’t supported yet.

The cuviddec.h states it explicitly:

[codebox]
#if defined(__x86_64) || defined(AMD64) || defined(_M_AMD64)
#if (CUDA_VERSION >= 3020) && (!defined(CUDA_FORCE_API_VERSION) || (CUDA_FORCE_API_VERSION >= 3020))
#error “CUVID currently does not support cuda 3.2 64-bit apis”
#endif
#endif
[/codebox]

I wasn’t able to do any 64-bit builds.

Best regards,

dimo

Hmm… Interesting… I am also doing 32-bit stuff…

Why do I get black frames even for “host memory” input? Oops…
Are you using CUDA 3.2RC?