Hi,
I am currently trying to move some code from the Caffe framework to TensorRT (GIE) on a Jetson TX1.
I installed the “JetPack 2.3.1 - L4T R24.2.1 released for Jetson TX1” release and everything seems to be OK for TensorRT:
the code from https://github.com/dusty-nv/jetson-inference, which uses TensorRT, works on the card,
and the GIE sample “sampleMNISTGIE” from the TensorRT package “nv-gie-repo-ubuntu1404-6-rc-cuda8.0_1.0.3-1_amd64.deb” also works.
I am now trying to make a neural network work with TensorRT: YOLO.
The prototxt of the network is this one: https://github.com/xingwangsfu/caffe-yolo/blob/master/prototxt/yolo_small_deploy.prototxt
I think that everything in this network is compatible with TensorRT.
I also have the associated caffemodel file to perform detection on images, and everything works together with the Caffe framework.
With TensorRT I don’t get any errors, but the output of the neural network is wrong:
running the very same image through Caffe and through TensorRT gives two completely different outputs.
Here is the code I use :
[code]
IBuilder* builder = createInferBuilder(gLogger);
const char* prototxt="yolo_small_deploy.prototxt";
const char* caffemodel="yolo_small.caffemodel";
// parse the caffe model to populate the network, then set the outputs and create an engine
//ICudaEngine* engine = createMNISTEngine(maxBatchSize, builder, DataType::kFLOAT);
INetworkDefinition* network = builder->createNetwork();
ICaffeParser *parser = createCaffeParser();
const IBlobNameToTensor *blobNameToTensor = parser->parse(prototxt,   // caffe deploy file
                                                          caffemodel, // caffe model file
                                                          *network,   // network definition that the parser populates
                                                          DataType::kFLOAT);
assert(blobNameToTensor != nullptr);
// the caffe file has no notion of outputs
// so we need to manually say which tensors the engine should generate
network->markOutput(*blobNameToTensor->find(OUTPUT_BLOB_NAME));
// Build the engine
builder->setMaxBatchSize(1);
builder->setMaxWorkspaceSize(16 << 20);
// Eliminate the side-effect from the delay of GPU frequency boost
builder->setMinFindIterations(3);
builder->setAverageFindIterations(2);
//build
ICudaEngine *engine = builder->buildCudaEngine(*network);
assert(engine != nullptr);
IExecutionContext *context = engine->createExecutionContext();
// run inference
float prob[OUTPUT_SIZE];
// input and output buffer pointers that we pass to the engine - the engine requires exactly IEngine::getNbBindings(),
// of these, but in this case we know that there is exactly one input and one output.
assert(engine->getNbBindings() == 2);
void* buffers[2];
// In order to bind the buffers, we need to know the names of the input and output tensors.
// note that indices are guaranteed to be less than IEngine::getNbBindings()
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
// create GPU buffers and a stream
CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE *3* INPUT_H * INPUT_W * sizeof(float)));
CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
cudaStream_t stream;
CHECK(cudaStreamCreate(&stream));
// DMA the input to the GPU, execute the batch asynchronously, and DMA it back:
CHECK(cudaMemcpyAsync(buffers[inputIndex], mInputCPU[0], BATCH_SIZE *3* INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
context->enqueue(BATCH_SIZE, buffers, stream, nullptr);
CHECK(cudaMemcpyAsync(prob, buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE*sizeof(float), cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);
// release the stream and the buffers
cudaStreamDestroy(stream);
CHECK(cudaFree(buffers[inputIndex]));
CHECK(cudaFree(buffers[outputIndex]));
// destroy the engine
context->destroy();
engine->destroy();
[/code]
The outputs of the neural network with TensorRT are nearly the same for different images; here are the results for a cat and for a matrix of zeros:
https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/cat_detection.jpg
https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/zeros_detection.jpg
For comparison, here is the detection when the network is run with Caffe:
https://github.com/TLESORT/YOLO-TensorRT-GIE-/blob/master/Images/true_detection.jpg
The complete code is on :
Do you have any idea of what could possibly be going wrong?
It seems like there is an error in the conversion of the caffemodel which makes the result wrong.
Thank you for your help :)