Hi,
We are using the Jetson TX2 (L4T R28.2) platform with TensorFlow (1.7.0 with CUDA 9; it also happens with 1.7.1 with CUDA 8) to run real-time object detection.
We have two issues that we need help investigating:
- Occasionally, when our application starts, TensorFlow fails to create a session with an error like this:
2018-05-30 14:01:50.824196: E tensorflow/core/common_runtime/direct_session.cc:167] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: unknown error
Process DetectionProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/opt/unit/mp/UnitProcess.py", line 23, in run
    raise e
  File "/opt/unit/mp/UnitProcess.py", line 18, in run
    self.work()
  File "/opt/unit/vision/detection/detection_process.py", line 62, in work
    net = Factory.createObjectDetector(self._net_params, self.logger)
  File "/opt/unit/vision/detection/factory.py", line 17, in createObjectDetector
    return MobileNet(params, logger)
  File "/opt/unit/vision/detection/mobile_net.py", line 121, in __init__
    self.build(tracking_params["inference_graph"], tracking_params["label_map"])
  File "/opt/unit/vision/detection/mobile_net.py", line 172, in build
    config=config)
  File "/home/notraffic/.virtualenvs/cv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1509, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/notraffic/.virtualenvs/cv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 638, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/notraffic/.virtualenvs/cv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
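The traceback shows the session being created inside a multiprocessing child (DetectionProcess-1). One hypothesis worth ruling out, not a confirmed diagnosis: if anything in the parent initializes CUDA before the worker is forked, the forked child can inherit CUDA state it cannot use. A minimal sketch of binding the worker class to the "spawn" start method instead; SpawnedUnitProcess is a hypothetical stand-in for UnitProcess, not existing code.

```python
import multiprocessing as mp

# Hypothetical sketch: a "spawn" context starts each child in a fresh Python
# interpreter instead of fork()-ing the parent, so the child does not inherit
# any CUDA state the parent may already have created.
SPAWN_CTX = mp.get_context("spawn")


class SpawnedUnitProcess(SPAWN_CTX.Process):
    """Hypothetical variant of UnitProcess bound to the "spawn" start method.

    Subclasses implement work(). It runs only in the child, so the TensorFlow
    session (and with it CUDA initialization) is created there and nowhere else.
    """

    def run(self):
        self.work()

    def work(self):
        raise NotImplementedError("create the tf.Session here, in the child")
```

With "spawn", the subclass must be defined at module level, the program's entry point must be guarded by `if __name__ == "__main__":` (the child re-imports the main module), and everything passed to the child must be picklable. Calling `multiprocessing.set_start_method("spawn")` once at the top of the entry point is an alternative that changes the default for every process.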
There is plenty of free memory before our application starts (6 GB free), but when this issue happens, simple GPU stress testing shows that something is wrong with the memory: even a simple script fails with memory errors, and it only works if we set gpu_usage_fraction in TensorFlow to a low value:
2018-06-07 05:53:02.401007: E tensorflow/stream_executor/cuda/cuda_blas.cc:462] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2018-06-07 05:53:02.402026: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-06-07 05:53:02.402026: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:650] failed to record completion event; therefore, failed to create inter-stream dependency
2018-06-07 05:53:02.402113: E tensorflow/stream_executor/event.cc:40] could not create CUDA event: CUDA_ERROR_UNKNOWN
Segmentation fault (core dumped)
To us it looks as if something gets “stuck” in memory.
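Because the "6 GB free" figure is taken before startup, it may help to log the memory state at the moment each session is created, so failing and successful starts can be compared. A minimal sketch, assuming that since the TX2's CPU and GPU share physical memory, /proc/meminfo is a usable proxy for what the GPU can allocate; suggest_memory_fraction and its 20% default headroom are hypothetical, illustrative choices rather than tuned values.

```python
import re
from typing import Dict

# Matches lines such as "MemAvailable:    6123456 kB"; fields with
# parentheses, such as "Active(anon)", are deliberately skipped.
_MEMINFO_LINE = re.compile(r"^(\w+):\s+(\d+)(?:\s+kB)?$")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        match = _MEMINFO_LINE.match(line.strip())
        if match:
            info[match.group(1)] = int(match.group(2))
    return info


def read_meminfo(path: str = "/proc/meminfo") -> Dict[str, int]:
    with open(path) as f:
        return parse_meminfo(f.read())


def memory_summary(info: Dict[str, int]) -> str:
    """One log line with total, free and available memory in MiB."""
    fields = ("MemTotal", "MemFree", "MemAvailable")
    return ", ".join(
        "{}={:.0f} MiB".format(name, info[name] / 1024)
        for name in fields
        if name in info
    )


def suggest_memory_fraction(info: Dict[str, int], headroom: float = 0.2) -> float:
    """Hypothetical helper: currently available share of MemTotal minus a margin."""
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    available = info["MemAvailable"] if "MemAvailable" in info else info["MemFree"]
    return max(0.0, min(1.0, available / info["MemTotal"] * (1.0 - headroom)))
```

The string from `memory_summary(read_meminfo())` could be logged right before the `tf.Session(config=config)` call, and the fraction passed as `per_process_gpu_memory_fraction` in `tf.GPUOptions`, the TensorFlow 1.x setting that gpu_usage_fraction is assumed to map to.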
- We are trying to move our code into Docker with GPU access on the Jetson TX2. We managed to get it running, but GPU performance is about half of what we had before: detection throughput dropped from 23 FPS to 11 FPS.
We are following this guide: Technica-Corporation/Tegra-Docker on GitHub (instructions and key files to enable Docker support on NVIDIA Tegra devices, specifically the TX2).
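To make the native-versus-container comparison (23 FPS vs. 11 FPS) reproducible, both runs could use the same timing harness on the same input. A minimal sketch; `detect` stands for whatever callable runs one inference on one frame (a hypothetical name), and the first calls are excluded because TensorFlow's first runs include one-time setup costs.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(detect: Callable[[Any], Any], frames: Iterable[Any], warmup: int = 10) -> float:
    """Average frames per second of detect() over frames, excluding the first `warmup` calls."""
    start = None
    timed = 0
    for index, frame in enumerate(frames):
        if index == warmup:
            # Start the clock only after the warm-up calls have finished.
            start = time.perf_counter()
        detect(frame)
        if start is not None:
            timed += 1
    if start is None:
        raise ValueError("need more frames than warmup calls")
    elapsed = time.perf_counter() - start
    return timed / elapsed if elapsed > 0 else float("inf")
```

Timing only the inference call separates the model's throughput from the rest of the pipeline (for example frame capture), which shows whether the roughly 0.48 ratio (11 / 23) comes from inference itself inside the container.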
We are using JetPack 3.2.