There are more "official" numbers from NVIDIA for more chips (Kepler, Maxwell Gen 1, Maxwell Gen 2, Pascal) and for many encoding parameters (quality vs. speed): NVIDIA VIDEO CODEC SDK | NVIDIA Developer
I had seen all of those documents before we created this table, but I was unable to find which GTX cards (not Quadro) have 2x NVENC chips, nor any NVDEC/CUDA performance figures, so this table could help somebody learn the true power of those cards…
The GTX 1080 has 2x the NVENC engines for a total of 5200 fps. Do you know whether streaming in OBS (encoding H.264 using NVENC) will give double the performance compared to a GTX 1070 with a single engine at 2600 fps?
How many NVENC engines does the new GTX 1070 Ti have?
Given that the GTX 1070 Ti is a slightly cut-down GTX 1080, I'm keen to know whether it has 1 or 2 NVENC engines and whether both are enabled.
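For what it's worth, you can at least observe the encoder empirically while sessions are running. Below is a minimal sketch using NVML (this is my own test idea, not something from the SDK docs; link with -lnvidia-ml). NVML does not report the engine count directly, but comparing utilization with one vs. two simultaneous NVENC sessions hints at whether throughput scales:

    #include <stdio.h>
    #include <nvml.h>   /* link with -lnvidia-ml */

    int main(void)
    {
        nvmlDevice_t dev;
        unsigned int util = 0, period_us = 0;

        if (nvmlInit() != NVML_SUCCESS)
            return 1;
        /* GPU 0; adjust the index for multi-GPU systems */
        if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
            nvmlDeviceGetEncoderUtilization(dev, &util, &period_us) == NVML_SUCCESS)
            printf("NVENC utilization: %u%% (sampled over %u us)\n",
                   util, period_us);
        nvmlShutdown();
        return 0;
    }

Run it in a loop while encoding: if a second simultaneous encode roughly doubles total fps without saturating utilization, that suggests more than one engine is in play.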
I also tried the hw_decode.c sample from the ffmpeg/doc/examples folder.
It took about 3x longer to decode the same input.mp4 file than the ffmpeg command given above.
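For reference, this is how I timed it: a minimal sketch around the packet loop in hw_decode.c's main(), using av_gettime_relative() from libavutil/time.h (the timing lines are my addition; the loop itself is from the sample):

    #include <libavutil/time.h>   /* av_gettime_relative(): microseconds */

    int64_t t_start = av_gettime_relative();
    while (ret >= 0) {
        if ((ret = av_read_frame(input_ctx, &packet)) < 0)
            break;
        if (video_stream == packet.stream_index)
            ret = decode_write(decoder_ctx, &packet);
        av_packet_unref(&packet);
    }
    fprintf(stderr, "decode wall time: %.3f s\n",
            (av_gettime_relative() - t_start) / 1e6);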
Next, I modified hw_decode.c as follows:
    ret = avcodec_receive_frame(avctx, frame);
    if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) {
        av_frame_free(&frame);
        av_frame_free(&sw_frame);
        return 0;
    } else if (ret < 0) {
        fprintf(stderr, "Error while decoding\n");
        goto fail;
    }

#define QUICK_RELEASE
#ifdef QUICK_RELEASE
    /* Release the decoded frame immediately and return, skipping the
     * GPU-to-CPU transfer (and file write) below. The decoder still does
     * all of its work; only the download to system memory is avoided. */
    av_frame_free(&frame);
    av_frame_free(&sw_frame);
    return 0;
#endif

    if (frame->format == hw_pix_fmt) {
        /* retrieve data from GPU to CPU */
        if ((ret = av_hwframe_transfer_data(sw_frame, frame, 0)) < 0) {
            fprintf(stderr, "Error transferring the data to system memory\n");
            goto fail;
        }
        tmp_frame = sw_frame;
    } else
        tmp_frame = frame;
Here the frame gets decoded and is immediately released, before the decoded frame is ever transferred to the host. With this change the program's runtime dropped by about 3x and matched the ffmpeg command.
So the conclusion is that the transfer from GPU memory to system (motherboard) memory is what takes the time. I feel that shared memory is the only way to overcome this. Any other suggestions?
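To pin the cost down further, one could time the transfer call in isolation. Below is a minimal sketch of the relevant lines inside decode_write() from the sample; the transfer_us accumulator is my own addition, and av_gettime_relative() again comes from libavutil/time.h:

    static int64_t transfer_us;   /* accumulated GPU-to-CPU copy time */

    int64_t t0 = av_gettime_relative();
    ret = av_hwframe_transfer_data(sw_frame, frame, 0);
    transfer_us += av_gettime_relative() - t0;
    if (ret < 0) {
        fprintf(stderr, "Error transferring the data to system memory\n");
        goto fail;
    }
    /* after the main loop:
     * fprintf(stderr, "transfer total: %.3f s\n", transfer_us / 1e6); */

If the copy turns out to dominate, the usual alternatives are to keep the frames on the GPU (for example, feed them straight to NVENC or a CUDA kernel) or to overlap transfers with decoding; downloading every raw frame over PCIe is bandwidth-bound and hard to hide otherwise.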