I have built an SSD network with TensorRT 3.0 on a TX2, using some plugin layers such as reshape, permute, and so on. When I use DataType::kHALF to create the TensorRT network, I get the following error:
ERROR: Internal error: could not find any implementation for node fc6 + relu6, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
ERROR: cudnnBuilder2.cpp (452) - OutOfMemory Error in buildSingleLayer
sample_SSD: SSD.cpp:108: void caffeToGIEModel(const string&, const string&, const std::vector<std::__cxx11::basic_string<char> >&, unsigned int, nvcaffeparser1::IPluginFactory*, nvinfer1::IHostMemory**): Assertion `engine' failed.
Aborted (core dumped)
I create the TensorRT net as follows:
void caffeToGIEModel(const std::string& deployFile,           // name for caffe prototxt
                     const std::string& modelFile,            // name for model
                     const std::vector<std::string>& outputs, // network outputs
                     unsigned int maxBatchSize,               // batch size - NB must be at least as large as the batch we want to run with
                     nvcaffeparser1::IPluginFactory* pluginFactory, // factory for plugin layers
                     IHostMemory** gieModelStream)            // output stream for the GIE model
{
    // create the builder
    IBuilder* builder = createInferBuilder(gLogger);

    // parse the caffe model to populate the network, then set the outputs
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    parser->setPluginFactory(pluginFactory);

    bool fp16 = builder->platformHasFastFp16();
    std::cout << "Begin parsing model..." << std::endl;
    const IBlobNameToTensor* blobNameToTensor = parser->parse(locateFile(deployFile).c_str(),
                                                              locateFile(modelFile).c_str(),
                                                              *network,
                                                              fp16 ? nvinfer1::DataType::kHALF : nvinfer1::DataType::kFLOAT);
    std::cout << "End parsing model..." << std::endl;

    // specify which tensors are outputs
    for (auto& s : outputs)
        network->markOutput(*blobNameToTensor->find(s.c_str()));

    // build the engine
    builder->setMaxBatchSize(maxBatchSize);
    builder->setMaxWorkspaceSize(10 << 20); // we need about 6MB of scratch space for the plugin layer for batch size 5
    builder->setHalf2Mode(fp16);
    ICudaEngine* engine = builder->buildCudaEngine(*network);
    assert(engine);
    std::cout << "End building engine..." << std::endl;

    // we don't need the network any more, and we can destroy the parser
    network->destroy();
    parser->destroy();

    // serialize the engine, then close everything down
    (*gieModelStream) = engine->serialize();
    engine->destroy();
    builder->destroy();
    shutdownProtobufLibrary();
}
I have also tried larger values for setMaxWorkspaceSize, such as 16 << 20 and even larger, but the same error occurs.
When I set fp16 = false, it runs successfully.
Could someone give me some suggestions? Thank you in advance!