Can TensorRT do inference in a Python thread or subprocess?

I am using the TensorRT Python API. I am able to do inference in the main process, but when I start a new process or thread and do inference in the newly created process or thread, it fails. The error is just like in this post:
[url]https://devtalk.nvidia.com/default/topic/1055083/-can-tensorrt-do-inference-in-a-child-thread-/[/url]

My development environment is: x86 + TensorRT 5.1.5 + CUDA 10.0 + Python 3.6 + Ubuntu 16.04

In that same thread it is stated:

“I used the cuda.Context.attach(), then the inference can run.”

Did you try this?

TensorRT will run in a subprocess, yes…
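
For what it's worth, here is a minimal sketch of running inference from a freshly spawned child process. The engine file name "model.engine", the single input/output binding layout, and the dummy input values are my assumptions; the key points are using the "spawn" start method and creating the CUDA context inside the child (here via pycuda.autoinit) rather than inheriting it from the parent.

import multiprocessing as mp

def infer_in_child(engine_path):
	# Import CUDA/TensorRT inside the child so the CUDA context is created
	# in this process instead of being inherited from the parent.
	import pycuda.autoinit  # creates and manages a CUDA context for this process
	import pycuda.driver as cuda
	import tensorrt as trt

	logger = trt.Logger(trt.Logger.WARNING)
	with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
		engine = runtime.deserialize_cuda_engine(f.read())

	# One input binding (index 0) and one output binding (index 1) assumed.
	h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=trt.nptype(trt.float32))
	h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=trt.nptype(trt.float32))
	d_input = cuda.mem_alloc(h_input.nbytes)
	d_output = cuda.mem_alloc(h_output.nbytes)
	stream = cuda.Stream()

	with engine.create_execution_context() as context:
		h_input[:] = 0.5  # dummy input data
		cuda.memcpy_htod_async(d_input, h_input, stream)
		context.execute_async(batch_size=1, bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
		cuda.memcpy_dtoh_async(h_output, d_output, stream)
		stream.synchronize()
	print("child inference done, first outputs:", h_output[:5])

if __name__ == "__main__":
	# "spawn" gives the child a clean interpreter and a clean CUDA state.
	mp.set_start_method("spawn")
	p = mp.Process(target=infer_in_child, args=("model.engine",))
	p.start()
	p.join()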

Thanks, Thomas. I had just overlooked that answer! After adding cuda.Context.attach() in allocate_buffers(), it works.

import pycuda.driver as cuda
import tensorrt as trt

# Allocate host and device buffers, and create a stream.
def allocate_buffers(engine, batch_size=1):
	# Attach to the current CUDA context; we need this to support multi-process/multi-thread use.
	cuda.Context.attach()
	# Determine dimensions and create page-locked memory buffers (i.e. won't be swapped to disk) to hold host inputs/outputs.
	h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)) * batch_size, dtype=trt.nptype(trt.float32))
	h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)) * batch_size, dtype=trt.nptype(trt.float32))

	# Allocate device memory for inputs and outputs.
	d_input = cuda.mem_alloc(h_input.nbytes)
	d_output = cuda.mem_alloc(h_output.nbytes)
	# Create a stream in which to copy inputs/outputs and run inference.
	stream = cuda.Stream()
	return h_input, d_input, h_output, d_output, stream
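
And in case it helps others, here is a rough sketch of how this allocate_buffers() might be driven from a worker thread. The engine file name, the explicit ctx.push()/ctx.pop() calls, and the inference steps are my assumptions rather than part of the code above; since PyCUDA contexts are per-thread, your setup may need the context made current in the worker thread explicitly instead of (or in addition to) cuda.Context.attach().

import threading

def worker(engine, ctx):
	# Make the context created in the main thread current in this thread.
	ctx.push()
	try:
		h_input, d_input, h_output, d_output, stream = allocate_buffers(engine)
		with engine.create_execution_context() as context:
			h_input[:] = 0.5  # dummy input data
			cuda.memcpy_htod_async(d_input, h_input, stream)
			context.execute_async(batch_size=1, bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)
			cuda.memcpy_dtoh_async(h_output, d_output, stream)
			stream.synchronize()
	finally:
		ctx.pop()

# Main thread: create the context and engine, then pop the context so the
# worker thread can push it.
cuda.init()
ctx = cuda.Device(0).make_context()
logger = trt.Logger(trt.Logger.WARNING)
with open("model.engine", "rb") as f, trt.Runtime(logger) as runtime:
	engine = runtime.deserialize_cuda_engine(f.read())
ctx.pop()

t = threading.Thread(target=worker, args=(engine, ctx))
t.start()
t.join()

# Release TensorRT objects before destroying the context.
ctx.push()
del engine
ctx.pop()
ctx.detach()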