Segmentation fault when building an ICudaEngine in TensorRT 3
Hi

Why do I get a segmentation fault when building an ICudaEngine with the following code?
// create the builder
IBuilder* builder = createInferBuilder(gLogger);

// parse the caffe model to populate the network, then set the outputs
INetworkDefinition* network = builder->createNetwork();
ICaffeParser* parser = createCaffeParser();
parser->setPluginFactory(pluginFactory);

std::cout << "Begin parsing model..." << std::endl;
const IBlobNameToTensor* blobNameToTensor = parser->parse(locateFile(deployFile).c_str(),
locateFile(modelFile).c_str(),
*network,
DataType::kFLOAT);
std::cout << "End parsing model..." << std::endl;
// specify which tensors are outputs
for (auto& s : outputs)
network->markOutput(*blobNameToTensor->find(s.c_str()));

// Build the engine
builder->setMaxBatchSize(maxBatchSize);
builder->setMaxWorkspaceSize(10 << 20); // we need about 6MB of scratch space for the plugin layer for batch size 5

std::cout << "Begin building engine..." << std::endl;
ICudaEngine* engine = builder->buildCudaEngine(*network);
assert(engine);
std::cout << "End building engine..." << std::endl;

// we don't need the network any more, and we can destroy the parser
network->destroy();
parser->destroy();

// serialize the engine, then close everything down
(*gieModelStream) = engine->serialize();

engine->destroy();
builder->destroy();
shutdownProtobufLibrary();

The build then crashes with:

Begin building engine...

Thread 1 "sample_fasterRC" received signal SIGSEGV, Segmentation fault.
0x00007fffe7994ba7 in nvinfer1::Network::validate(nvinfer1::cudnn::HardwareContext const&, bool, bool, int) const ()
from /home/TensorRT-3/TensorRT-3.0.1/lib/libnvinfer.so.4

I can correctly build a CUDA engine when running the jetson-inference samples. Does this mean that there is something wrong with the network I implemented?

#1
Posted 01/02/2018 03:11 AM   
Answer Accepted by Original Poster
Hi,

It looks like you are hitting a similar error to the one in this topic:
https://devtalk.nvidia.com/default/topic/1027521/why-received-signal-sigsegv-when-import-deploy-prototxt-with-tensor-rt-3-0/

Please check it first.
Thanks.

#2
Posted 01/02/2018 09:17 AM   
Hi.

Thank you for providing the link; it helped me find out that something was wrong with the outputs of one of the layers.

By the way, does the Concat layer in TensorRT support concatenation along axis 2? The layer parameters look like this:

layer {
  name: "mbox_priorbox"
  type: "Concat"
  bottom: "conv4_3_norm_mbox_priorbox"
  bottom: "fc7_conv_mbox_priorbox"
  bottom: "conv6_2_mbox_priorbox"
  bottom: "conv7_2_mbox_priorbox"
  bottom: "conv8_2_mbox_priorbox"
  bottom: "conv9_2_mbox_priorbox"
  top: "mbox_priorbox"
  concat_param {
    axis: 2
  }
}

And when I try to parse the definition I get the following error:
Parameter check failed at: Network.cpp::addConcatenation::152, condition: first->getDimensions().d[j] == dims.d[j] && "All non-channel dimensions must match across tensors."
error parsing layer type Concat index 96

I do not, however, get the error when I remove the layer from the network, and Concat works fine for the other Concat layers that use axis: 1. Do you think I need to implement my own Concat plugin that supports axis 2, or have I done something wrong in my code?

#3
Posted 01/02/2018 03:46 PM   
Hi,

TensorRT only supports channel-axis concatenation (axis=1).
You can find this information in our document:

> 1.1. TensorRT Layers
...
Concatenation
The Concatenation layer links together multiple tensors of the same height and width
across the channel dimension.
...

Thanks.
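
For reference, here is a minimal sketch of how that restriction shows up in the C++ API; the tensor names priorbox0/priorbox1 are placeholders, not names from your model:

ITensor* inputs[] = { priorbox0, priorbox1 };        // placeholder ITensor* outputs of earlier layers
// addConcatenation always joins its inputs along the channel axis (axis=1),
// so the height and width of all inputs must already match.
IConcatenationLayer* concat = network->addConcatenation(inputs, 2);
concat->getOutput(0)->setName("mbox_priorbox");

(Assumes #include "NvInfer.h" and using namespace nvinfer1, as in the sample code above, with "network" being the INetworkDefinition*.)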

#4
Posted 01/03/2018 09:26 AM   
Thank you, that cleared up the misunderstanding. Also, AastaLLL, what is the most efficient way to do inference: synchronously with IExecutionContext::execute or asynchronously with IExecutionContext::enqueue? And when executing with IExecutionContext::execute, will the buffers be on the GPU or the CPU?

#5
Posted 01/03/2018 09:48 PM   
Hi,

The buffers for TensorRT are on the GPU.
enqueue() may give you some advantage, but it still depends on the use case.

Check details in our document:
------------------------------------------------
In a typical production case, TensorRT will execute asynchronously. The enqueue() method will add kernels to a cuda stream specified by the application, which may then wait on that stream for completion. The fourth parameter to enqueue() is an optional cudaEvent which will be signaled when the input buffers are no longer in use and can be refilled.

In this sample we simply copy the input buffer to the GPU, run inference, then copy the result back and wait on the stream:

cudaMemcpyAsync(<...>, cudaMemcpyHostToDevice, stream);
context.enqueue(batchSize, buffers, stream, nullptr);
cudaMemcpyAsync(<...>, cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream);

Thanks.
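
A self-contained sketch of that pattern, with the elided arguments filled in by placeholder names (the binding indices, host buffers, and byte counts are assumptions that depend on how your engine's bindings are laid out), could look like this:

#include <cuda_runtime_api.h>   // cudaStream / cudaMemcpyAsync APIs
#include "NvInfer.h"            // IExecutionContext

// "context" is the IExecutionContext*, "buffers" the array of device binding pointers.
cudaStream_t stream;
cudaStreamCreate(&stream);

cudaMemcpyAsync(buffers[inputIndex], hostInput, inputBytes,
                cudaMemcpyHostToDevice, stream);          // host -> device input
context->enqueue(batchSize, buffers, stream, nullptr);    // run inference on the stream
cudaMemcpyAsync(hostOutput, buffers[outputIndex], outputBytes,
                cudaMemcpyDeviceToHost, stream);          // device -> host output
cudaStreamSynchronize(stream);                            // wait for the stream to finish

cudaStreamDestroy(stream);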

#6
Posted 01/04/2018 08:58 AM   
Hi again Aasta. Do you know if the Softmax layer in TensorRT supports 'axis: 2' as a parameter? And also, am I supposed to get different outputs from layers in TensorRT than from the same layers in Caffe?

#7
Posted 01/06/2018 05:45 PM   
Hi,

Currently, TensorRT only supports the cross-channel SoftMax layer (axis=1).
The outputs of TensorRT and Caffe should be similar.

Thanks.
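
As a minimal sketch of how this looks in the C++ network-definition API (assuming "network" is the INetworkDefinition* from the sample above and "scores" is a placeholder ITensor*), addSoftMax always normalizes across the channel axis, so a prototxt SoftMax with axis: 2 would need the data rearranged or a custom plugin instead:

// addSoftMax normalizes across the channel dimension (axis=1).
ISoftMaxLayer* softmax = network->addSoftMax(*scores);
softmax->getOutput(0)->setName("softmax_output");   // placeholder output name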

#8
Posted 01/10/2018 08:00 AM   
Does TensorRT's concatenation require all dimension sizes to be known before the graph is evaluated? I am also seeing this error when concatenating tensors of shape [None, None, 1, 4]. At evaluation time, the value of the first "None", i.e. the batch size, is the same across all tensors.

#9
Posted 02/16/2018 09:27 PM   
Hi,

You can set the batch size at runtime.
But the dimensions of the other axes must be fixed, since TensorRT does not yet support dynamic input shapes.

Thanks.
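
In other words, the maximum batch size is baked into the engine at build time, and any batch up to that maximum can be passed per call. A minimal sketch, reusing the "builder", "context", and "buffers" names from the earlier snippets:

// The engine is built for a maximum batch size; each execute()/enqueue() call
// may use any batch size up to that maximum. All other input dimensions are
// fixed when the network is defined.
builder->setMaxBatchSize(8);             // upper bound, fixed at build time
// ... buildCudaEngine(), createExecutionContext(), allocate buffers ...
int runtimeBatchSize = 3;                // any value <= 8 at execution time
context->execute(runtimeBatchSize, buffers);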

#10
Posted 02/21/2018 09:04 AM   