Error with Concatenate Layer in TensorRT2

Hi,

I have a network that uses a concatenate layer. Initially it concatenated 3 input blobs, but that does not seem to be supported, so I split it into two layers: the first concatenates the first and second blobs, and the second concatenates that intermediate result with the third blob. This finally stopped an error in the IBuilder::validate method called from IBuilder::buildCudaEngine.

Now I get another error, also in buildCudaEngine but in a different sub-function: IBuilder::buildSingleLayer. It occurs when the network optimizer encounters the first concatenate layer. The error is the following:

cudnnBuilder2.cpp:1528: std::unique_ptr<nvinfer1::cudnn::Layer>
    nvinfer1::builder::buildSingleLayer(
         nvinfer1::cudnn::EngineBuildContext&, 
         const nvinfer1::builder::Node&, 
         const EngineTensors&, const EngineTensors&): 

Assertion `0' failed.

The layer proto resembles the following:

layer {
  name: "concat2"
  type: "Concat"
  bottom: "conv_1"
  bottom: "conv_2"
  top: "concat_2_output"
  concat_param {
    axis: 1
  }
}

Is the Concatenation layer fully supported in TensorRT 2? It is present in the documentation (IConcatenationLayer), but it's not in the list given here: https://devtalk.nvidia.com/default/topic/997770/tensor-rt-supports-caffe-model-layers-/. Will the concat_param::axis=1 parameter be handled?

Note: I’m running this on a GTX 1070, but I’m targeting the TX1/TX2 and the P4. Would running directly on those boards make any difference?

Andrei Stoian
R&D Engineer, Thales Services SAS

Hi,

Thanks for your question.
Supported layers can be found in the document located at ‘/usr/share/doc/gie/’.

The Concat layer is not supported in TensorRT 1.0, but it is already included in TensorRT 3.0, our latest version (not available yet).

For your last question, the code will be the same, but please re-compile your model with the aarch64 TensorRT library to make it compatible with the TX1 GPU architecture.

Thanks.

Thank you for your answer.

Is there a release date for TensorRT 3?

Since I’m looking to get this working by the end of summer, I guess I could implement the concat myself in CUDA and use two contexts, one for the part of the network before the concat and one for the part after, as you show here: https://devtalk.nvidia.com/default/topic/997770/tensor-rt-supports-caffe-model-layers-/. Is there any performance penalty with this approach?

Hi,

Sorry, we can’t disclose any schedule plans.
Please watch for our announcements and updates.

For an unsupported layer, please add your own layer into the TensorRT flow:

IExecutionContext *contextA = engineA->createExecutionContext();
IExecutionContext *contextB = engineB->createExecutionContext();
<...>
contextA->enqueue(batchSize, buffersA, stream, nullptr);
myLayer(outputFromA, inputToB, stream);
contextB->enqueue(batchSize, buffersB, stream, nullptr);

If you implement myLayer with CUDA, there is no extra penalty (e.g., no CPU <-> GPU memory copy).
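
For illustration, a minimal myLayer for the channel-axis (axis = 1) concat could look like the sketch below, assuming NCHW layout and fp32 data; the function name, pointers and size arguments are hypothetical placeholders.

#include <cuda_runtime.h>
#include <cstddef>

// Concatenate two NCHW tensors along the channel axis by copying each
// sample's channels back to back into the output buffer. d_in1/d_in2 are
// device pointers to the outputs of context A; d_out feeds context B.
void myConcatLayer(const float* d_in1, size_t ch1,
                   const float* d_in2, size_t ch2,
                   float* d_out, size_t batch, size_t hw,
                   cudaStream_t stream)
{
    const size_t in1Stride = ch1 * hw;          // elements per sample, input 1
    const size_t in2Stride = ch2 * hw;          // elements per sample, input 2
    const size_t outStride = (ch1 + ch2) * hw;  // elements per sample, output

    for (size_t n = 0; n < batch; ++n)
    {
        cudaMemcpyAsync(d_out + n * outStride, d_in1 + n * in1Stride,
                        in1Stride * sizeof(float),
                        cudaMemcpyDeviceToDevice, stream);
        cudaMemcpyAsync(d_out + n * outStride + in1Stride, d_in2 + n * in2Stride,
                        in2Stride * sizeof(float),
                        cudaMemcpyDeviceToDevice, stream);
    }
}

Since the copies are enqueued on the same stream as the two enqueue() calls, execution order is preserved and no host synchronization or host <-> GPU copy is involved.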

I will try to use this method to split the network. In fp32 mode, as you say, I guess there should be no extra penalty when running a custom CUDA kernel on the output from the first context execution.

However, if I want to use fp16 or int8 mode, would there be a penalty? From what I see, TensorRT automatically converts/dequantizes the output tensors from fp16 or int8 to fp32. After my kernel runs, if I want to run the second part of the network in int8/fp16, a new conversion/quantization would be necessary. Is there a way to get the raw fp16 or int8 output data directly from TensorRT?

Additionally, could you elaborate on the int8 to fp32 conversion and on how fp32 values are quantized to int8?

Hi,

Thanks for your feedback.

For simplicity, you can set the input/output to fp32 type and execute TensorRT in fp16 mode.
The conversion will be applied automatically and you can handle the myLayer() function in the usual float format.

For better performance, you can handle the myLayer() function in fp16 directly.
We follow the standard fp16 format (see “Half-precision floating-point format” on Wikipedia).
Conversion code from float to FP16 is available in the NVIDIA® CUDA® library for GPU execution.
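
For illustration, such a conversion could be built on the __float2half intrinsic from <cuda_fp16.h>, as sketched below; the kernel and helper names are hypothetical placeholders.

#include <cuda_fp16.h>

// Convert an fp32 buffer to fp16 element-wise (round to nearest).
__global__ void floatToHalfKernel(const float* src, __half* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = __float2half(src[i]);
}

// Launch the conversion on the stream shared with the TensorRT contexts.
void convertFloatToHalf(const float* d_src, __half* d_dst, int n,
                        cudaStream_t stream)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;
    floatToHalfKernel<<<grid, block, 0, stream>>>(d_src, d_dst, n);
}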

That sounds interesting. I looked at the doc and found an ITensor::setType function; I’m guessing this is the function to enable/disable conversion on the input/output layers?

Thanks!

Hi,

In computational mode = FP16, TensorRT can accept input or output data in either FP32 or FP16 format.
You can use any of the combinations below for input and output:
• Input FP32, output FP32
• Input FP16, output FP32
• Input FP16, output FP16
• Input FP32, output FP16

setAllNetworkInputsToHalf(network);

static void setAllNetworkInputsToHalf(INetworkDefinition* network){
    for (int i = 0; i < network->getNbInputs(); i++)
        network->getInput(i)->setType(DataType::kHALF);
}
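
If you also want the fp16-output combinations, a symmetric helper can set the output tensor types the same way; the sketch below is only an example built on the same ITensor::setType call, assuming the getNbOutputs()/getOutput() accessors that sit next to getNbInputs()/getInput().

static void setAllNetworkOutputsToHalf(INetworkDefinition* network){
    // Request fp16 bindings for every marked network output as well.
    for (int i = 0; i < network->getNbOutputs(); i++)
        network->getOutput(i)->setType(DataType::kHALF);
}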

Thanks.

Excellent! I’ll use that then.

One more question about reduced precision: in the sampleGoogleNet.cpp example, the code that parses the Caffe model is:

DataType modelDataType = useFp16 ? DataType::kHALF : DataType::kFLOAT; // create a 16-bit model if it's natively supported
const IBlobNameToTensor *blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(), // caffe deploy file
                  locateFile(modelFile).c_str(),  // caffe model file
                  *network,                       // network definition that the parser will populate
                  modelDataType);

Does modelDataType reflect the data type that the parser expects to find in the caffemodel file, or does it mean that the parser will read fp32 from the caffemodel and convert it to fp16 if modelDataType=kHALF?

Since I’m training with vanilla Caffe in fp32, all my caffemodels are in fp32. Should I thus always pass DataType::kFLOAT to ICaffeParser::parse?

Hi,

Thanks for your feedback.

If you want to run in fp16 mode, you need to construct a kHALF TensorRT model:

parser->parse(locateFile(deployFile).c_str(), locateFile(modelFile).c_str(), *network, DataType::kHALF);
builder->setHalf2Mode(true);

But please note that you can still use float or fp16 for the input/output. If the float type is used, the conversion will be applied automatically.
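
For illustration, the full build phase could be arranged along the lines of sampleGoogleNet as sketched below; the helper name, logger and path arguments are placeholders, and the platformHasFastFp16() check is an optional safeguard rather than a requirement.

#include "NvInfer.h"
#include "NvCaffeParser.h"
using namespace nvinfer1;
using namespace nvcaffeparser1;

// Build an engine in fp16 mode when the GPU supports it, fp32 otherwise.
// Error handling and the logger definition are omitted for brevity.
ICudaEngine* buildEngine(ILogger& logger,
                         const char* deployFile, const char* modelFile,
                         const char* outputBlobName, int maxBatchSize)
{
    IBuilder* builder = createInferBuilder(logger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();

    // Only ask the parser for fp16 weights on hardware with native fp16.
    bool useFp16 = builder->platformHasFastFp16();
    DataType modelDataType = useFp16 ? DataType::kHALF : DataType::kFLOAT;

    const IBlobNameToTensor* blobNameToTensor =
        parser->parse(deployFile, modelFile, *network, modelDataType);
    network->markOutput(*blobNameToTensor->find(outputBlobName));

    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(16 << 20);   // 16 MB, as in the samples
    if (useFp16)
        builder->setHalf2Mode(true);          // run the engine with fp16 kernels

    ICudaEngine* engine = builder->buildCudaEngine(*network);

    // The network, parser and builder are no longer needed once the engine exists.
    network->destroy();
    parser->destroy();
    builder->destroy();
    return engine;
}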

Ok, thanks for the info!

I have a few more questions about quantization in TensorRT; I’ll post them here unless you think there’s a more suitable forum section.

  1. The TensorRT guide mentions an ‘accompanying white paper’, but I can’t find it. Where could I get it?

  2. Does the Jetson TX2 support int8 operations in hardware?

Hi,

  1. Could you share which TensorRT guide you read? Then we can find the corresponding white paper.

  2. No, int8 only works on P4/P40/TitanX/…, not on the TX1 or TX2.

Thanks.

It’s in the TensorRT User Guide.html installed by the deb package to /usr/share/doc/gie/doc/; it is provided by the libnvinfer-dev package, version 2.0.0-1+cuda8.

Thanks for the info on the int8 support!

I am confused. The TensorRT 2 user guide states that the build phase performs:

“elision of concatenation layers by directing layer outputs to the correct eventual destination”

Doesn’t this mean that the outputs are simply written to the appropriate memory locations, rather than explicit concat code being executed? That would still effectively implement the concat layer, right?

Anyway, I get a failed assertion during the build phase when the first concat layer is analyzed:

main: cudnnBuilder2.cpp:371: void nvinfer1::builder::checkSanity(const nvinfer1::builder::Graph&): Assertion `readRegions.find(t->region.get()) == readRegions.end()' failed.

Are there any workarounds, other than splitting up the pipeline with a custom concat layer?

Thanks!

Hi,

Are you using TensorRT 2.0? TensorRT 2.0 only supports desktop GPUs and can’t be used on Jetson.

Hello,

I came across this post from an online search and didn’t notice it’s in the embedded section of the forum. I am in fact using a desktop GPU. I have made a similar thread in the compute libraries forum.

Thanks.