The plugin currently queries the pipeline upstream to determine the batch size, with 1 being the minimum. So if you use two sources in your pipeline, the yolo plugin will use batch-size 2 and build an engine for that batch size. Currently, there is no support for temporal batching, i.e., using batch sizes greater than 1 with just a single source.
Hi NvCJR, does the batch size need to equal the number of streams exactly? If I have 2 streams being processed does the engine need to be built with a batch size of 2? What are the consequences of building an engine with batch size of 4 and only processing 2 streams?
What I’m getting at is I want to have the engines prebuilt for deployment, but I cannot always be sure how many streams will be loaded. So am I best off pre-building an array of engines set for different batch sizes?
What are the consequences of building an engine with batch size of 4 and only processing 2 streams?
You can process 2 streams with an engine that has been built with a max batch size of 4. Although it depends on the network and how TRT optimizes it, you shouldn’t see a big difference in perf if you run a lower batch size than the one the engine was built for.
What I’m getting at is I want to have the engines prebuilt for deployment, but I cannot always be sure how many streams will be loaded. So am I best off pre-building an array of engines set for different batch sizes?
You can try a simple experiment to check whether this affects you. Build an engine for batch size 128 (or a number suitable for your use case) and measure the perf when you run it at the lowest batch size in your use case (or batch size 1). Then build an engine for that lowest batch size (like 1), measure its perf as well, and compare the two. From an accuracy point of view, there should be no change.
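To make the comparison concrete, here is a minimal timing sketch for the experiment above. It is only an illustration: `measure_fps`, `relative_slowdown`, and the `infer` callable are hypothetical names, not part of the yolo plugin or TensorRT, and `infer` stands in for whatever call runs one batch through your engine.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[Sequence], object],
                batch: Sequence,
                iterations: int = 50,
                warmup: int = 5) -> float:
    """Return frames per second for repeated calls of infer(batch).

    `infer` is a hypothetical stand-in for running one batch through an engine.
    """
    # Warm-up runs are excluded from timing so one-off startup costs do not skew it.
    for _ in range(warmup):
        infer(batch)

    start = time.perf_counter()
    for _ in range(iterations):
        infer(batch)
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)

    return (iterations * len(batch)) / elapsed


def relative_slowdown(fps_small_engine: float, fps_large_engine: float) -> float:
    """Fraction of throughput lost by running the large engine at the small batch size."""
    return 1.0 - fps_large_engine / fps_small_engine
```

Run `measure_fps` once with the batch-size-1 engine and once with the large engine fed the same single-frame batch, then pass both results to `relative_slowdown`. A value near 0 suggests one large prebuilt engine is good enough for deployment.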
Also, can you explain temporal batching please?
For a single-stream use case, you can set the streammux timeout property to -1 so that it fills a batch completely before pushing it downstream. So if you set batch-size to 4 when only a single stream is being used, the muxer will wait until 4 frames have arrived before pushing the batch downstream.
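As a rough mental model of that behaviour, the following sketch groups consecutive frames from one stream into full batches. It is not the muxer's actual implementation: `temporal_batches` is a hypothetical helper, and dropping a trailing partial batch is an assumption made here to mimic "wait until the batch is full", not documented muxer behaviour.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def temporal_batches(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of `batch_size` consecutive frames from a single stream."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: List[T] = []
    for frame in frames:
        batch.append(frame)
        # Only push downstream once the batch is completely filled.
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Assumption: a trailing partial batch is never emitted, since nothing
    # in this sketch times out while waiting for more frames.


if __name__ == "__main__":
    for b in temporal_batches(range(10), 4):
        print(b)
```

With 10 frames and batch-size 4, two full batches are emitted and the last two frames stay pending.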