Can TensorRT do inference in a Python thread or subprocess?

I am using the TensorRT Python API. I am able to do inference in the main process, but when I start a new process or thread and do inference in the newly created process or thread, it fails. The error is just like in this post:
[url]https://devtalk.nvidia.com/default/topic/1055083/-can-tensorrt-do-inference-in-a-child-thread-/[/url]

My development environment is: x86 + TensorRT 5.1.5 + CUDA 10.0 + Python 3.6 + Ubuntu 16.04

In that same thread it is stated:

“I used the cuda.Context.attach(), then the inference can run.”

Did you try this?

TensorRT will run in a subprocess, yes…
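
For what it's worth, here is a minimal sketch of running inference from a freshly spawned child process. The engine file name "model.engine", the single input/output binding layout, and the dummy input values are my assumptions; the key points are using the "spawn" start method and creating the CUDA context inside the child (here via pycuda.autoinit) rather than inheriting it from the parent.

import multiprocessing as mp

def infer_in_child(engine_path):
	# Import CUDA/TensorRT inside the child so the CUDA context is created
	# in this process instead of being inherited from the parent.
	import pycuda.autoinit  # creates and manages a CUDA context for this process
	import pycuda.driver as cuda
	import tensorrt as trt

	logger = trt.Logger(trt.Logger.WARNING)
	with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
		engine = runtime.deserialize_cuda_engine(f.read())

	# One input binding (index 0) and one output binding (index 1) assumed.
	h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(trt.float32))
	h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(trt.float32))
	d_input = cuda.mem_alloc(h_input.nbytes)
	d_output = cuda.mem_alloc(h_output.nbytes)
	stream = cuda.Stream()

	with engine.create_execution_context() as context:
		h_input[:] = 0.5  # dummy input data
		cuda.memcpy_htod_async(d_input, h_input, stream)
		context.execute_async(batch_size=1, bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
		cuda.memcpy_dtoh_async(h_output, d_output, stream)
		stream.synchronize()
	print("child inference done, first outputs:", h_output[:5])

if __name__ == "__main__":
	# "spawn" gives the child a clean interpreter and a clean CUDA state.
	mp.set_start_method("spawn")
	p = mp.Process(target=infer_in_child, args=("model.engine",))
	p.start()
	p.join()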

Thanks, Thomas. I had just overlooked that answer! After adding cuda.Context.attach() in allocate_buffers(), it works.

import pycuda.driver as cuda
import tensorrt as trt

# Allocate host and device buffers, and create a stream.
def allocate_buffers(engine, batch_size=1):
	# Attach to the current CUDA context; we need this to support multi-process/multi-thread use.
	cuda.Context.attach()
	# Determine dimensions and create page-locked memory buffers (i.e. won't be swapped to disk) to hold host inputs/outputs.
	h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)) * batch_size, dtype=trt.nptype(trt.float32))
	h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)) * batch_size, dtype=trt.nptype(trt.float32))

	# Allocate device memory for inputs and outputs.
	d_input = cuda.mem_alloc(h_input.nbytes)
	d_output = cuda.mem_alloc(h_output.nbytes)
	# Create a stream in which to copy inputs/outputs and run inference.
	stream = cuda.Stream()
	return h_input, d_input, h_output, d_output, stream
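
And in case it helps others, here is a rough sketch of how this allocate_buffers() might be driven from a worker thread. The engine file name, the explicit ctx.push()/ctx.pop() calls, and the inference steps are my assumptions rather than part of the code above; since PyCUDA contexts are per-thread, your setup may need the context made current in the worker thread explicitly instead of (or in addition to) cuda.Context.attach().

import threading

def worker(engine, ctx):
	# Make the context created in the main thread current in this thread.
	ctx.push()
	try:
		h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
		with engine.create_execution_context() as context:
			h_input[:] = 0.5  # dummy input data
			cuda.memcpy_htod_async(d_input, h_input, stream)
			context.execute_async(batch_size=1, bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
			cuda.memcpy_dtoh_async(h_output, d_output, stream)
			stream.synchronize()
	finally:
		ctx.pop()

# Main thread: create the context and engine, then pop the context so the
# worker thread can push it.
cuda.init()
ctx = cuda.Device(0).make_context()
logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
	engine = runtime.deserialize_cuda_engine(f.read())
ctx.pop()

t = threading.Thread(target=worker, args=(engine, ctx))
t.start()
t.join()

# Release TensorRT objects before destroying the context.
ctx.push()
del engine
ctx.pop()
ctx.detach()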