NVCaffe 0.17 used in two plugins in the same pipeline crashes

When used in only one plugin, NVCaffe 0.17 works fine.
When I use it in two plugins in the same pipeline, I get:

F0129 02:34:36.384413 15692 cudnn_conv_layer.cu:55] Check failed: error == cudaSuccess (4 vs. 0) unspecified launch failure

The last line of the snippet below is line 55 of cudnn_conv_layer.cu:

} else {
  // "old" path
  for (int i = 0; i < bottom.size(); ++i) {
    const Ftype* bottom_data = bottom[i]->gpu_data<Ftype>();
    Ftype* top_data = top[i]->mutable_gpu_data<Ftype>();
    // Forward through cuDNN in parallel over groups.
    const size_t gsize = ws->size() / ws_groups();
    CHECK(is_even(gsize));
    for (int g = 0; g < groups(); ++g) {
      void* pspace = static_cast<unsigned char*>(ws->data()) + gsize * idxg(g);
      // Filters.
      CUDNN_CHECK(cudnnConvolutionForward(Caffe::cudnn_handle(idxg(g)),
          cudnn::dataType<Ftype>::one, fwd_bottom_descs_[i], bottom_data + bottom_offset_ * g,
          fwd_filter_desc_, weight + this->weight_offset_ * g,
          fwd_conv_descs_[i], fwd_algo_[i], pspace, gsize,
          cudnn::dataType<Ftype>::zero, fwd_top_descs_[i], top_data + top_offset_ * g));
    }
    // NOLINT_NEXT_LINE(whitespace/operators)
    for (int ig = 0; ig < ws_groups(); ++ig) {
      CUDA_CHECK(cudaStreamSynchronize(Caffe::thread_stream(ig)));

Some additional information:

This happens while both gstreamer plugins are executing inside Net::Forward.

I use gst-launch.

Hi,

CUDA error 4 is cudaErrorLaunchFailure:
An exception occurred on the device while executing a kernel. Common causes include dereferencing an invalid device pointer and accessing out of bounds shared memory. All existing device memory allocations are invalid. To continue using CUDA, the process must be terminated and relaunched.
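
Note that with asynchronous kernel launches the failure is usually only reported at the next synchronization point, which is why it shows up inside CUDA_CHECK(cudaStreamSynchronize(...)) rather than at the kernel that actually faulted. A generic host-side sketch of that checking pattern (illustration only, not NVCaffe code; CHECK_CUDA here is a hypothetical helper):

#include <cstdio>
#include <cuda_runtime.h>

// Generic error-reporting helper for illustration only.
#define CHECK_CUDA(call)                                                   \
  do {                                                                     \
    cudaError_t err = (call);                                              \
    if (err != cudaSuccess) {                                              \
      std::fprintf(stderr, "%s -> %s (%d) at %s:%d\n", #call,              \
                   cudaGetErrorString(err), static_cast<int>(err),         \
                   __FILE__, __LINE__);                                    \
    }                                                                      \
  } while (0)

int main() {
  void* buf = nullptr;
  CHECK_CUDA(cudaMalloc(&buf, 1 << 20));
  // Kernels launched by the two Nets run asynchronously on their streams;
  // if one of them faults, the error is typically first reported here,
  // at the next synchronization point ...
  CHECK_CUDA(cudaDeviceSynchronize());
  // ... and because the error is "sticky", every later runtime call in the
  // same process reports cudaErrorLaunchFailure as well.
  CHECK_CUDA(cudaFree(buf));
  return 0;
}

Setting the environment variable CUDA_LAUNCH_BLOCKING=1 forces launches to be synchronous, which can help narrow down which kernel actually triggered the error; running the pipeline under cuda-memcheck can also point at the faulting access.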

Could you share more information about the 'plugin'?
Is it a customized layer in your Caffe framework? Or are you using TensorRT?

Thanks.

NVCaffe is used simply: load the model and infer.

They are part of two separate gstreamer/DeepStream plugins/elements.

I run them in a gstreamer pipeline using gst-launch. The first plugin receives video frames from uridecodebin and sends them to nvtracker, which then sends them to the second plugin. The sink is a fakesink.

Each plugin is a separate *.so shared library loaded by gst-launch.

To clarify, the term “plugin” is used in the sense of gstreamer/DeepStream plugin.

I even ran simple experiments just to make sure it wasn't memory corruption on my side.
Each plugin just repeatedly feeds the same cv::Mat and calls Net::Forward.
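
For reference, the test loop in each plugin is essentially the following (a minimal sketch using the classic Caffe C++ API; NVCaffe 0.17's Net/Blob accessors may differ slightly, and the file names are placeholders):

#include <vector>
#include <opencv2/opencv.hpp>
#include <caffe/caffe.hpp>

// Minimal sketch of the per-plugin stress test: feed the same cv::Mat over
// and over and call Net::Forward(). Assumes a 3-channel BGR input network;
// "deploy.prototxt" and "model.caffemodel" are placeholders.
void stress_test(const cv::Mat& frame, int iterations) {
  caffe::Caffe::set_mode(caffe::Caffe::GPU);
  caffe::Net<float> net("deploy.prototxt", caffe::TEST);
  net.CopyTrainedLayersFrom("model.caffemodel");

  caffe::Blob<float>* input = net.input_blobs()[0];
  cv::Mat resized, f32;
  cv::resize(frame, resized, cv::Size(input->width(), input->height()));
  resized.convertTo(f32, CV_32F);

  // Wrap the NCHW input blob as per-channel cv::Mats and split the
  // HWC BGR image directly into the blob's CPU memory.
  std::vector<cv::Mat> channels;
  for (int c = 0; c < input->channels(); ++c) {
    channels.emplace_back(input->height(), input->width(), CV_32FC1,
        input->mutable_cpu_data() + c * input->height() * input->width());
  }
  cv::split(f32, channels);

  for (int i = 0; i < iterations; ++i) {
    net.Forward();  // the crash happens while both plugins are inside here
  }
}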

Same error.

Both seem to be exercising code in cudnn_conv_layer.cu at the same time.

I am also getting this:

W0130 10:43:05.240928 23861 gpu_memory.cpp:129] Lazily initializing GPU Memory Manager Scope on device 0. Note: it’s recommended to do this explicitly in your main() function.

Not sure if it is related to the crash, but how do I initialize the "GPU Memory Manager Scope" explicitly?
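
From what I can tell, NVCaffe's own tools/caffe.cpp creates a caffe::GPUMemory::Scope for the GPUs it is going to use at the start of main(), before any Net is constructed. So presumably something along these lines (treat the header path and constructor signature as my best reading of the source, not a verified recipe):

#include <vector>
#include <caffe/caffe.hpp>
#include <caffe/util/gpu_memory.hpp>  // header path as I understand it

int main(int argc, char** argv) {
  // Initialize the GPU memory manager explicitly for device 0, before any
  // Net / plugin starts allocating, instead of relying on lazy init.
  std::vector<int> gpus{0};
  caffe::GPUMemory::Scope gpu_memory_scope(gpus);

  // ... construct Nets, run the pipeline, etc. ...
  return 0;
}

Since gst-launch owns main() in my setup, I suppose the scope would have to be created once early in one of the plugins and kept alive for the lifetime of the process, though I'm not sure that fully matches what tools/caffe.cpp does.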

I did more digging and found that test_mem_req_all_grps_ is a static member of CuDNNConvolutionLayer.

So my question is:
Is NVCaffe's cudnn_conv_layer (.cu, .hpp, .cpp) safe to use in two separate Net objects inferencing in separate threads?
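
To illustrate why the static member worries me, here is a contrived sketch (not NVCaffe code) of the pattern: two objects in two threads both touching a static data member of their shared class, which is a data race unless the class synchronizes access itself.

#include <cstddef>
#include <thread>

// Contrived stand-in for a layer class with per-class (static) state.
struct ConvLikeLayer {
  static size_t shared_mem_req_;   // shared by every instance in the process
  void Reshape(size_t need) {
    if (need > shared_mem_req_) {  // unsynchronized read ...
      shared_mem_req_ = need;      // ... and write: a data race across threads
    }
  }
};
size_t ConvLikeLayer::shared_mem_req_ = 0;

int main() {
  ConvLikeLayer net1_layer, net2_layer;  // two "Nets", one layer each
  std::thread t1([&] { for (size_t i = 0; i < 1000; ++i) net1_layer.Reshape(i); });
  std::thread t2([&] { for (size_t i = 0; i < 1000; ++i) net2_layer.Reshape(i); });
  t1.join();
  t2.join();
  return 0;
}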

Also, is there a better forum for this question?

I checked the mainline version of Caffe. That version of CuDNNConvolutionLayer does not have static data members.

Discussion continued here:
https://github.com/NVIDIA/caffe/issues/555