TensorRT fails to build engine for a network constructed with the C++ API when setHalf2Mode(true) is enabled

Hi,
I built a network using the TensorRT 3.0 C++ API, and everything works fine in FP32 mode.
Then I wanted to try FP16 mode. When I change the DataType of the model weights from kFLOAT to kHALF, everything works as before, but I haven't observed any significant speedup.
However, when I add the line "builder->setHalf2Mode(true);", the program crashes in buildCudaEngine with the following error message:

cudnnLayerUtils.cpp:98: void* nvinfer1::cudnn::getTensorMem(const nvinfer1::cudnn::EngineTensor&, void**, void**): Assertion `start[vectorIndex]%spv == 0' failed
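
For context, the relevant part of my setup is roughly this (a stripped-down sketch: the real network and weight loading are omitted, and the single convolution here just stands in for my actual layers):

    #include "NvInfer.h"
    #include <cstdint>
    #include <vector>

    using namespace nvinfer1;

    // gLogger: any ILogger implementation, as in the TensorRT samples.
    extern ILogger& gLogger;

    ICudaEngine* buildFp16Engine()
    {
        IBuilder* builder = createInferBuilder(gLogger);
        INetworkDefinition* network = builder->createNetwork();

        ITensor* input = network->addInput("data", DataType::kFLOAT, DimsCHW(3, 224, 224));

        // FP16 weights as raw IEEE-754 half bits (loaded from the model file in my real code).
        static std::vector<uint16_t> kernelData(64 * 3 * 3 * 3), biasData(64);
        Weights kernel{DataType::kHALF, kernelData.data(), (int64_t)kernelData.size()};
        Weights bias{DataType::kHALF, biasData.data(), (int64_t)biasData.size()};

        IConvolutionLayer* conv = network->addConvolution(*input, 64, DimsHW(3, 3), kernel, bias);
        network->markOutput(*conv->getOutput(0));

        builder->setMaxBatchSize(1);
        builder->setMaxWorkspaceSize(16 << 20);
        builder->setHalf2Mode(true);       // without this line the engine builds fine

        ICudaEngine* engine = builder->buildCudaEngine(*network);  // assertion fires in here
        network->destroy();
        builder->destroy();
        return engine;
    }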

Can anyone help me with this issue? What might cause this assertion failure?

Thanks.

BTW, I am testing on TX2.

I am hitting the same error on TX2; hope someone can help answer this.

Update:
I have tried using a UffParser to parse a TensorFlow-trained model into a TensorRT engine, following the "SampleGoogleNet - Profiling and 16-bit Inference" tutorial in the TensorRT 3 User Guide: that is, setting the DataType to kHALF when parser->parse() is called, and then calling builder->setHalf2Mode(true).
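
Concretely, the sequence looks roughly like this (a stripped-down sketch; "input"/"output" and the dimensions are placeholders for my model's actual nodes):

    #include "NvInfer.h"
    #include "NvUffParser.h"

    using namespace nvinfer1;
    using namespace nvuffparser;

    extern ILogger& gLogger;

    ICudaEngine* buildFromUff(const char* uffFile)
    {
        IBuilder* builder = createInferBuilder(gLogger);
        INetworkDefinition* network = builder->createNetwork();

        IUffParser* parser = createUffParser();
        parser->registerInput("input", DimsCHW(3, 224, 224));
        parser->registerOutput("output");

        // As in SampleGoogleNet: parse the weights as kHALF, then enable half2 mode.
        parser->parse(uffFile, *network, DataType::kHALF);
        builder->setHalf2Mode(true);

        ICudaEngine* engine = builder->buildCudaEngine(*network);  // core dump happens here
        parser->destroy();
        network->destroy();
        builder->destroy();
        return engine;
    }
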
But it still core-dumps at builder->buildCudaEngine(*network) with the error message:

cudnnLayerUtils.cpp:98: void* nvinfer1::cudnn::getTensorMem(const nvinfer1::cudnn::EngineTensor&, void**, void**): Assertion `start[vectorIndex]%spv == 0' failed

Hope someone can help.

Thanks!

The backtrace from the core dump, for your reference:

(gdb) bt
#0  0x0000007fa3d42528 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x0000007fa3d439e0 in __GI_abort () at abort.c:89
#2  0x0000007fa3d3bc04 in __assert_fail_base (fmt=0x7fa3e28240 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fa4ea39f8 "start[vectorIndex]%spv == 0", file=file@entry=0x7fa4ea3940 "cudnnLayerUtils.cpp", 
    line=line@entry=98, 
    function=function@entry=0x7fa4ea35d0 <nvinfer1::cudnn::getTensorMem(nvinfer1::cudnn::EngineTensor const&, void**, void**)::__PRETTY_FUNCTION__> "void* nvinfer1::cudnn::getTensorMem(const nvinfer1::cudnn::EngineTensor&, void**, void**)") at assert.c:92
#3  0x0000007fa3d3bcac in __GI___assert_fail (assertion=0x7fa4ea39f8 "start[vectorIndex]%spv == 0", file=0x7fa4ea3940 "cudnnLayerUtils.cpp", line=98, 
    function=0x7fa4ea35d0 <nvinfer1::cudnn::getTensorMem(nvinfer1::cudnn::EngineTensor const&, void**, void**)::__PRETTY_FUNCTION__> "void* nvinfer1::cudnn::getTensorMem(const nvinfer1::cudnn::EngineTensor&, void**, void**)")
    at assert.c:101
#4  0x0000007fa4c7d61c in nvinfer1::cudnn::getTensorMem(nvinfer1::cudnn::EngineTensor const&, void**, void**) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#5  0x0000007fa4c6bc38 in nvinfer1::cudnn::WinogradConvActLayer::execute(nvinfer1::cudnn::CommonContext const&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#6  0x0000007fa4c7f2a4 in nvinfer1::cudnn::selectFastestLayerAndDeleteOthers(nvinfer1::cudnn::EngineBuildContext&, std::vector<nvinfer1::cudnn::Layer*, std::allocator<nvinfer1::cudnn::Layer*> > const&) ()
   from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#7  0x0000007fa4c23ef8 in nvinfer1::builder::buildSingleLayer(nvinfer1::cudnn::EngineBuildContext&, nvinfer1::builder::Node&, std::unordered_map<std::string, std::unique_ptr<nvinfer1::cudnn::Region, std::default_delete<nvinfer1::cudnn::Region> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::unique_ptr<nvinfer1::cudnn::Region, std::default_delete<nvinfer1::cudnn::Region> > > > > const&, nvinfer1::CpuMemoryGroup&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, bool) ()
   from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#8  0x0000007fa4c26170 in nvinfer1::builder::EngineTacticSupply::getBestTactic(nvinfer1::builder::Node&, nvinfer1::query::Ports<nvinfer1::RegionFormatL> const&, bool) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#9  0x0000007fa4c52964 in nvinfer1::builder::(anonymous namespace)::LeafCNode::computeCosts(nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#10 0x0000007fa4c56e98 in nvinfer1::builder::chooseFormatsAndTactics(nvinfer1::builder::Graph&, nvinfer1::builder::TacticSupply&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#11 0x0000007fa4c26f88 in nvinfer1::builder::makeEngineFromGraph(nvinfer1::CudaEngineBuildConfig const&, nvinfer1::cudnn::HardwareContext const&, nvinfer1::builder::Graph&, std::unordered_map<std::string, std::vector<float, std::allocator<float> >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<float, std::allocator<float> > > > >*, int) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#12 0x0000007fa4c28b40 in nvinfer1::builder::buildEngine(nvinfer1::CudaEngineBuildConfig&, nvinfer1::cudnn::HardwareContext const&, nvinfer1::Network const&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#13 0x0000007fa4c0fe3c in nvinfer1::Builder::buildCudaEngine(nvinfer1::INetworkDefinition&) () from /usr/lib/aarch64-linux-gnu/libnvinfer.so.4
#14 0x0000000000403200 in LaneDetector::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#15 0x0000000000404bd4 in main ()
(gdb)

Same here, I'm seeing the same problem! I am also parsing a UFF from a TensorFlow model on the Jetson TX2, and I get the same backtrace. I convert the TensorFlow 1.4 frozen .pb to UFF on the host computer and copy it over to the Jetson TX2.

A response from NVIDIA would be nice, considering it looks like a widespread problem.

Thanks!

I enabled debug logging, but there is no detailed log output at all; building the CUDA engine is a complete black-box operation, and I have no idea what is going wrong inside buildCudaEngine.
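
For what it's worth, this is roughly how I attach the logger (assuming the standard ILogger pattern from the samples); even forwarding every severity, including kINFO, the builder prints nothing useful before the assertion:

    #include "NvInfer.h"
    #include <iostream>

    // Forward every message, including kINFO, instead of filtering like the samples do.
    class VerboseLogger : public nvinfer1::ILogger
    {
        void log(Severity severity, const char* msg) override
        {
            std::cout << msg << std::endl;
        }
    } gLogger;

    // IBuilder* builder = nvinfer1::createInferBuilder(gLogger);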

I get the same error when building a network with TensorRT 3.0. Has this problem been solved?

Open a bug at developer.nvidia.com!