I don’t have the right camera for it, but I’d like to create something similar to capture a human hand on a mouse and joystick. The goal would be to create an icon display of mouse/joystick movements for inclusion on tutorial style videos or as an overlay to video capture, but without the main operating system having to take part. Basically an independent camera/Xavier edge computing/streaming icons and/or events of slight hand motions.
The main problem is that the camera probably needs to be a bit specialized for close-in, macro-style movements with fine details. What I’m wondering is: in this particular demo, what is the smallest movement your capture can identify? What happens if you do a close-up for tiny mouse movements? Imagine someone gaming and making fast, light twitches with mouse and joystick…what kind of camera would you need to replace your existing camera with to see those fine touches?
Btw, if you’ve ever seen the comedy movie “Galaxy Quest”, where the aliens create a ship with controls based on learning the movements they saw on a fictional show, then that is the basic idea. An ability to describe fine movement via computer of rapid tiny movements. Unfortunately I don’t think a regular stereo camera can capture tiny rapid movements as small as 0.1 mm (or less).
Thank you! The dance moves were the real challenge here :)
And cool! Sounds like an interesting project.
This is using a monocular camera, so the absolute precision depends largely on the resolution and the distance of the object.
It’s hard to say exactly what camera is necessary, I imagine it would require some experimentation. In addition to the image resolution, you can also experiment with different neural network architectures to trade off accuracy/speed.
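As a rough way to reason about the resolution/distance trade-off, here is a minimal sketch using plain pinhole-camera geometry. It is not part of trt_pose; the function name is hypothetical, and the 60 degree field of view, 300 mm distance, and 224 pixel width are assumed example values, not measurements from the demo. It estimates how many millimeters of the scene one pixel covers, which gives a lower bound on the movement the camera could resolve before the network even sees the image.

```python
import math

def mm_per_pixel(distance_mm, hfov_deg, width_px):
    """Approximate scene width covered by one pixel (hypothetical helper).

    Assumes an ideal pinhole camera looking straight at a flat scene.
    """
    # Width of the scene visible at this distance for the given horizontal FOV.
    scene_width_mm = 2.0 * distance_mm * math.tan(math.radians(hfov_deg) / 2.0)
    return scene_width_mm / width_px

# Assumed example values: camera 300 mm above the hand, 60 degree horizontal FOV.
print(mm_per_pixel(300, 60, 224))    # image downscaled to an assumed 224 px wide input
print(mm_per_pixel(300, 60, 1920))   # full 1920 px wide frame
```

Moving the camera closer, narrowing the field of view, or feeding the network a higher-resolution crop all reduce this number, at the cost of coverage or speed.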
Would it be correct to say that if one were pointing a monocular camera straight down on a hand with a mouse, that minor movement of a finger down to press a button would probably not show up? My thought is that stereo is needed, and perhaps close up lenses or wider than normal ocular separation would be required. From what I can tell, something like the Zed stereo camera does not have that close up high res ability…it’s designed for greater distances…probably in need of a wider separation of cameras.
Note: This would require low-latency detection of when the hand pushes a button, and also a very precise idea of lateral movement. It seems the monocular version will have no knowledge of depth.
Hi John,
When I run your live_demo.ipynb in Jupyter Notebook, it gives me an error message, "No module of trt_pose.coco", along with other errors. Could you please send me a complete, working set of files for the trt_pose project so I can learn from your wonderful project?
Many thanks,
Francis
Hi @jaybdub, this project looks amazing, best pose estimation model on the jetson series thus far!
I would like to train on my own data for this purpose. Your training script, train.py, needs a config.json file to perform training. Could you provide the config file that you used to train your model? Thank you!
I’d like to try this on our own device. The aim of our project is human pose estimation, just like yours. We also want to create a simulation that replicates the motions of the human in front of the camera. Which programs do we have to install on the Jetson Nano?
I’m trying to run this on my GeForce 1060 laptop, using the PyTorch 20.06 NGC container, but I’m receiving this error message in the first cell of live_demo.ipynb.
Thank you, I was looking for this kind of example to get started. I would like to get the keypoints and coordinates of the body joints. Is that possible? The idea is to build a system that alerts when someone falls or behaves unnaturally. If you have any input, please share it.
@pmario.silva Hmm. This looks like perhaps trt_pose was built against a version of PyTorch different from what you’re currently using. I would try uninstalling trt_pose, and re-install from scratch.
@salmanfaris Hi salmanfaris, you can get the 2D body keypoints from the output of the model. Please let me know if you have questions on how to do this. While pose is particularly useful because it offers a nice programmatic interface (point locations) and also abstracts away the visual variation of different people, sometimes an end-to-end approach, like training a classification model, may be more robust and easier to continually improve. This is particularly true if the problem is visually simple. If you want to learn how to train your own model, I’d check out the JetBot project. I encourage you to explore which fits your application best.
Please let me know if this helps or you have any questions.
This notebook demonstrates how to run the model and draw keypoints.
The 2D keypoints are parsed from the neural network output using the “ParseObjects” function. This returns:
object_counts: The number of people per image (tensor of size (Num Images)).
objects: A (Num Images)x(Number of People)x(Number of Body Part Types) tensor. The values in this tensor correspond to the keypoint index (see the next tensor), and are -1 if the keypoint doesn’t exist for that person.
normalized_peaks: A (Num Images)x(Number of Body Part Types)x(Maximum Num Possible Keypoints)x(2) tensor containing the keypoint locations in normalized image coordinates [0,1].
To get the left eye for the person with index=0, we would do:
image_idx = 0
if object_counts[image_idx] > 0:
    # there is at least one person in the first image
    person_idx = 0
    left_eye_type_idx = 1
    left_eye_idx = int(objects[image_idx, person_idx, left_eye_type_idx])
    if left_eye_idx >= 0:
        # the person has a left eye (-1 would mean it was not detected; 0 is a valid index)
        left_eye_location = normalized_peaks[image_idx, left_eye_type_idx, left_eye_idx, :]  # row, col
        y, x = float(left_eye_location[0]), float(left_eye_location[1])
        # height and width are the image size in pixels
        y_pixels, x_pixels = y * height, x * width
You may find these helpful. Apologies that there isn’t currently a helper function to do this parsing into a more intuitive format.
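In the meantime, a minimal sketch of what such a helper could look like is below. It is not part of trt_pose: the function name keypoints_to_dicts and the part_names argument are hypothetical, and it only assumes the three outputs described above can be indexed like nested arrays. The small example at the bottom uses plain Python lists as stand-ins for the tensors.

```python
def keypoints_to_dicts(object_counts, objects, normalized_peaks,
                       part_names, width, height, image_idx=0):
    """Return one {part_name: (x_pixels, y_pixels)} dict per detected person.

    Hypothetical helper, not part of trt_pose. part_names lists the body part
    names in the same order as the model's body part types.
    """
    people = []
    for person_idx in range(int(object_counts[image_idx])):
        keypoints = {}
        for type_idx, name in enumerate(part_names):
            peak_idx = int(objects[image_idx][person_idx][type_idx])
            if peak_idx < 0:
                continue  # -1 means this part was not found for this person
            peak = normalized_peaks[image_idx][type_idx][peak_idx]
            y, x = float(peak[0]), float(peak[1])  # stored as (row, col)
            keypoints[name] = (x * width, y * height)
        people.append(keypoints)
    return people

# Stand-in data: one image, one person, nose found, left eye missing.
part_names = ["nose", "left_eye"]
object_counts = [1]
objects = [[[0, -1]]]
normalized_peaks = [[[[0.25, 0.5]], [[0.0, 0.0]]]]
people = keypoints_to_dicts(object_counts, objects, normalized_peaks,
                            part_names, width=224, height=224)
print(people)
```

Missing parts are simply left out of each person's dict, so downstream code can check for a part with an ordinary "in" test.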
Please let me know if this helps or you have any questions.
Is it possible to feed trt_pose into DeepStream, overlay the pose on the video stream, and output the result? And then use the keypoints to identify different scenarios?
Hi,
When I run the demo, I’m receiving an error message in the first cell of live_demo.ipynb.
OSError: /usr/lib/aarch64-linux-gnu/libgomp.so.1: cannot allocate memory in static TLS block.
Thanks