As far as I know, and in my experience, TRT is designed to work only with a fixed input size. It uses that static information (along with the network parameters, the GPU type, and possibly other factors) to build the best execution engine it can for that particular setup. If you need a different setup, including different input dimensions, you need to build a new engine. You don't have to rebuild every time, though: you can save each engine with serialize and load it back later with deserializeCudaEngine.
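
To make the "stash them for future use" part concrete, here is a minimal sketch of a per-setup engine cache. It deliberately uses only the standard library (C++17), so it is not TensorRT code: `engineCachePath`, `loadOrBuildEngine`, and the `buildSerialized` callback are hypothetical names I made up for illustration. The callback stands in for "build the engine and call serialize on it". I'm also assuming the cache directory already exists and that the GPU name is safe to use in a file name.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

using EngineBytes = std::vector<char>;

// Hypothetical helper: one cache file per (GPU, input dimensions) pair,
// because an engine built for one setup is not reused for another.
std::string engineCachePath(const std::string& cacheDir,
                            const std::string& gpuName,
                            const std::vector<int>& inputDims) {
    assert(!inputDims.empty() && "input dimensions must be known up front");
    std::string name = gpuName;
    for (int d : inputDims) {
        name += "_" + std::to_string(d);
    }
    return (std::filesystem::path(cacheDir) / (name + ".engine")).string();
}

// Hypothetical helper: return the serialized engine for this setup,
// running the (expensive) build only when no cached file exists yet.
// `buildSerialized` is a stand-in for the TensorRT build + serialize step.
EngineBytes loadOrBuildEngine(const std::string& path,
                              const std::function<EngineBytes()>& buildSerialized) {
    std::ifstream in(path, std::ios::binary);
    if (in) {
        // Cache hit: read the whole file back as raw bytes.
        return EngineBytes(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }
    // Cache miss: build once, then write the bytes out for next time.
    EngineBytes bytes = buildSerialized();
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return bytes;
}
```

In a real program the callback would copy out the buffer that serialize returns, and the bytes coming back from `loadOrBuildEngine` (pointer plus size) are what you would hand to deserializeCudaEngine. Keying the file name on both the GPU and the input dimensions follows from the point above: change either one and you need a different engine.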