Sasecurity Wiki

Curated list[]

https://github.com/cbsudux/awesome-human-pose-estimation

hand gesture[]

https://github.com/facebookresearch/InterHand2.6M Official PyTorch implementation of "InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image", ECCV 2020

https://github.com/otaheri/MANO

smplify-x[]

https://github.com/vchoutas/smplify-x Expressive Body Capture: 3D Hands, Face, and Body from a Single Image https://smpl-x.is.tue.mpg.de/ To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face.
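
A minimal sketch (not from the smplify-x repo itself) of posing the SMPL-X body model with the authors' smplx pip package; it assumes the package is installed (pip install smplx) and that the SMPL-X model files have been downloaded from https://smpl-x.is.tue.mpg.de/ into ./models:

    # Sketch: forward pass through a SMPL-X body model via the `smplx` package.
    # Assumes model files were downloaded to ./models (registration required).
    import torch
    import smplx

    model = smplx.create('./models', model_type='smplx', gender='neutral')

    betas = torch.zeros(1, 10)       # body shape coefficients
    body_pose = torch.zeros(1, 63)   # 21 body joints x 3 axis-angle parameters

    output = model(betas=betas, body_pose=body_pose, return_verts=True)
    print(output.vertices.shape)     # mesh vertices, roughly (1, 10475, 3)
    print(output.joints.shape)       # 3D joint locations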

General GitHub repos[]

https://github.com/carolineec/EverybodyDanceNow , https://www.youtube.com/watch?v=PCBTZh41Ris This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves.


CMU[]

https://github.com/CMU-Perceptual-Computing-Lab/MonocularTotalCapture

http://domedb.perception.cs.cmu.edu/mtc.html

https://github.com/Project-Splinter/MonoPort

https://github.com/Arthur151/CenterHMR

https://github.com/hongsukchoi/Pose2Mesh_RELEASE

https://github.com/Lotayou/densebody_pytorch

https://github.com/victordibia/handtracking


https://github.com/intel-isl/OpenBot


https://github.com/lixiny/bihand


https://github.com/vchoutas/expose

https://github.com/XingangPan/SCNN

akanazawa[]

https://github.com/akanazawa/human_dynamics Based on AlphaPose; tracks people better than Deep sort.

https://github.com/akanazawa/hmr End-to-end Recovery of Human Shape and Pose. It can indicate, for example, that a person is carefully sneaking around a corner, and can even flag the same person by their unique gait. Human Mesh Recovery (HMR): end-to-end adversarial learning of human pose and shape. A real-time framework for recovering the 3D joint angles and shape of the body from a single RGB image; the full 3D body is inferred even under occlusions and truncations, and head and limb orientations are captured. (The project page's bottom row shows results from a model trained without any coupled 2D-to-3D supervision.)

We present Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allows our model to be trained using in-the-wild images that only have ground-truth 2D annotations. However, reprojection loss alone is highly under-constrained. In this work we address this problem by introducing an adversary trained to tell whether a human body parameter is real or not, using a large database of 3D human meshes. We show that HMR can be trained with and without using any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detection and infer 3D pose and shape parameters directly from image pixels. Our model runs in real time given a bounding box containing the person. We demonstrate our approach on various in-the-wild images, outperform previous optimization-based methods that output 3D meshes, and show competitive results on tasks such as 3D joint location estimation and part segmentation. (Abstract via Papers with Code.)
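
The core training signal described above is easy to sketch: project the predicted 3D joints with a weak-perspective camera and penalize the distance to annotated 2D keypoints, masked by visibility. A minimal illustration (not the authors' code), assuming a weak-perspective camera parameterized by a scale and a 2D translation:

    # Hedged sketch of a keypoint reprojection loss, as used by HMR-style models.
    import torch

    def reprojection_loss(joints3d, cam, keypoints2d, visibility):
        """joints3d:    (B, K, 3) predicted 3D joints
           cam:         (B, 3)    weak-perspective camera [s, tx, ty]
           keypoints2d: (B, K, 2) ground-truth 2D keypoints
           visibility:  (B, K)    1 if the keypoint is annotated, else 0"""
        s = cam[:, None, 0:1]                          # (B, 1, 1)
        t = cam[:, None, 1:3]                          # (B, 1, 2)
        projected = s * joints3d[..., :2] + t          # orthographic projection
        err = (projected - keypoints2d).abs().sum(-1)  # L1 error per joint
        return (visibility * err).sum() / visibility.sum().clamp(min=1)

The adversarial prior mentioned in the abstract is a separate discriminator over body-model parameters that keeps this under-constrained 3D solution plausible.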

hybrik[]

https://github.com/Jeff-sjtu/HybrIK Hybrid analytical-neural inverse kinematics for 3D human pose and shape estimation.

HRnet[]

https://github.com/HRNet

https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation

https://jingdongwang2017.github.io/Projects/HRNet/PoseEstimation.html . Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process.
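
The "keep a high-resolution branch alive" idea can be illustrated with a toy two-branch block that exchanges features between resolutions; this is an illustrative sketch, not the official HRNet code:

    # Toy two-resolution block in the spirit of HRNet: parallel high- and
    # low-resolution branches with a fusion step between them.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoBranchFusion(nn.Module):
        def __init__(self, c_hi=32, c_lo=64):
            super().__init__()
            self.hi = nn.Conv2d(c_hi, c_hi, 3, padding=1)       # high-res branch
            self.lo = nn.Conv2d(c_lo, c_lo, 3, padding=1)       # low-res branch
            self.hi_to_lo = nn.Conv2d(c_hi, c_lo, 3, stride=2, padding=1)
            self.lo_to_hi = nn.Conv2d(c_lo, c_hi, 1)

        def forward(self, x_hi, x_lo):
            # x_hi: (B, c_hi, H, W), x_lo: (B, c_lo, H/2, W/2), H and W even
            h = F.relu(self.hi(x_hi))
            l = F.relu(self.lo(x_lo))
            # fusion: each branch receives the other branch's information
            h = h + F.interpolate(self.lo_to_hi(l), size=h.shape[-2:],
                                  mode='bilinear', align_corners=False)
            l = l + self.hi_to_lo(h)
            return h, l

    h, l = TwoBranchFusion()(torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32))

The real network stacks many such exchanges and adds further branches at lower resolutions, so the highest-resolution representation is never thrown away.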

https://jingdongwang2017.github.io/Projects/HRNet/SemanticSegmentation.html

from https://paperswithcode.com/task/pose-tracking

Facebook[]

OpenPose (from CMU's Perceptual Computing Lab) and the PyTorch-based AlphaPose; Facebook Research's own pose projects, such as DensePose and VideoPose3D, are listed below.

People tracking[]

https://github.com/Guanghan/lighttrack In this paper, we propose a novel, effective, light-weight framework called LightTrack for online human pose tracking. The proposed framework is designed to be generic for top-down pose tracking and is faster than existing online and offline methods. Single-person Pose Tracking (SPT) and Visual Object Tracking (VOT) are incorporated into one unified functioning entity, easily implemented by a replaceable single-person pose estimation module. See Deep sort for yolo3-based people tracking.
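
As a rough illustration of what "generic top-down pose tracking" means, the loop below runs a replaceable single-person pose estimator inside each detected person box and keeps identities by simple IoU matching; detect_people, estimate_pose, and iou are hypothetical placeholders, not LightTrack's actual API:

    # Hedged sketch of a generic top-down pose-tracking loop.
    def track_poses(frames, detect_people, estimate_pose, iou, iou_thresh=0.5):
        tracks, next_id = {}, 0                      # track_id -> last seen box
        results = []
        for frame in frames:
            assigned = {}
            for box in detect_people(frame):         # person bounding boxes
                # greedily reuse the id of the best-overlapping previous track
                best = max(tracks.items(), key=lambda kv: iou(kv[1], box),
                           default=(None, None))
                if best[0] is not None and iou(best[1], box) > iou_thresh:
                    tid = best[0]
                else:
                    tid, next_id = next_id, next_id + 1
                assigned[tid] = box
                results.append((tid, estimate_pose(frame, box)))  # per-person pose
            tracks = assigned                        # associations reset each frame
        return results

The actual LightTrack framework uses smarter re-identification than this naive IoU matching, but the overall detect-estimate-associate structure is the same.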

https://github.com/PJunhyuk/people-counting-pose People counting. See Activity analysis

Single RGB image[]

https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image

crowd estimation[]

https://github.com/thomasgolda/Human-Pose-Estimation-for-Real-World-Crowded-Scenarios . In this work, we explore methods to optimize pose estimation for human crowds, focusing on challenges introduced with dense crowds, such as occlusions, people in close proximity to each other, and partial visibility of people.

shape estimation[]

https://github.com/zhuhao-nju/hmd

center detection[]

https://github.com/xingyizhou/CenterNet

Gyeongsik Moon https://github.com/mks0601/3DMPPE_POSENET_RELEASE

tensorflow js[]

https://www.youtube.com/watch?v=9KqNk5keyCc

https://github.com/llSourcell/pose_estimation

SelecSLS[]

https://github.com/mehtadushy/SelecSLS-Pytorch Combine with YOLO-based Deep sort object tracking. https://arxiv.org/pdf/1907.00837.pdf We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions, both by other people and by objects.


zhec[]

https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation OpenPose is freely available for non-commercial use and may be redistributed under these conditions. Please see the license for further details. Interested in a commercial license? Check this FlintBox link. For commercial queries, use the Directly Contact Organization section of the FlintBox link and also send a copy of that message to Yaser Sheikh.

This is why https://aws.amazon.com/rekognition/ is expensive: Amazon has to pay a lot of money to license source-available code like this that is restricted to non-commercial use. In South Africa you can do whatever you want; if you have assets, set up a Fronting company.


vnect[]

http://gvv.mpi-inf.mpg.de/3dhp-dataset/

https://www.youtube.com/watch?v=m3KG_Z0P_nU We present the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera. Our method combines a new convolutional neural network (CNN) based pose regressor with kinematic skeleton fitting. Our novel fully-convolutional pose formulation regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames. A real-time kinematic skeleton fitting method uses the CNN output to yield temporally stable 3D global pose reconstructions on the basis of a coherent kinematic skeleton. This makes our approach the first monocular RGB method usable in real-time applications such as 3D character control---thus far, the only monocular methods for such applications employed specialized RGB-D cameras. Our method's accuracy is quantitatively on par with the best offline 3D monocular RGB pose estimation methods. Our results are qualitatively comparable to, and sometimes better than, results from monocular RGB-D approaches, such as the Kinect. However, we show that our approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.
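
A hedged sketch of the kind of kinematic-fitting energy the abstract describes: the CNN's per-frame 2D and 3D joint predictions are combined with a temporal-smoothness term and minimized over skeleton joint angles. Here forward_kinematics and project are hypothetical placeholders for the skeleton model and the camera projection:

    # Illustrative fitting energy in the spirit of VNect's skeleton fitting.
    import numpy as np

    def fitting_energy(theta, theta_prev, joints2d_pred, joints3d_pred,
                       forward_kinematics, project,
                       w2d=1.0, w3d=1.0, w_smooth=0.1):
        joints3d = forward_kinematics(theta)                    # (K, 3) from joint angles
        e2d = np.sum((project(joints3d) - joints2d_pred) ** 2)  # match 2D predictions
        e3d = np.sum((joints3d - joints3d_pred) ** 2)           # match CNN 3D regression
        e_smooth = np.sum((theta - theta_prev) ** 2)            # temporal consistency
        return w2d * e2d + w3d * e3d + w_smooth * e_smooth

Minimizing such an energy per frame (e.g. with a gradient-based optimizer) is what yields the temporally stable global 3D pose that raw per-frame CNN output lacks.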


https://github.com/timctho/VNect-tensorflow

https://github.com/makkunda/Real-time-3D-Human-Pose-Estimation-with-a-Single-RGB-Camera

https://github.com/XinArkh/VNect

https://github.com/hwangdonghyun/VNect-TensorFlow-Keras

https://github.com/vrm-c/UniVRM In the past, when trying to handle a 3D humanoid avatar (3D model) in Virtual Reality, Virtual YouTubing, etc., it was necessary to develop a unique system for each application and to fine-tune the 3D model data, because the output data depends on how creators make the 3D model and which modeling tools are used: the coordinate system is different, the scale is different, the initial pose is different, the expressions are different, and, needless to say, the way bones are put into the 3D model is also different.

The format for handling the 3D model data is also different. Each company has its own specifications, which are more complex than necessary, and the necessary information about their file formats is not fully provided. Whether an FBX file, which is compatible with various software, is usable by a given application, and which version of the application can process it, are the main issues users are concerned with.

There is also not enough of the information needed to use 3D model data as an avatar: for example, in the first-person view, how to obtain the right position, how to hide the head display, which parts of the body should be excluded, and so on. With the use of avatars in VR applications growing very fast, if the situation above remains unchanged, application developers and 3D model creators will have to spend double or triple the effort. To improve the current situation, based on the humanoid character and avatar, the first step is to:

effectively absorb and unify the differences in the model data, and make handling the 3D model easy on the application side. Here we propose VRM, a platform-independent 3D avatar file format with the above characteristics.

densepose[]

http://densepose.org/

https://www.youtube.com/watch?v=EMjPqgLX14A Can machine vision map humans from videos to 3D Models? Yes! DensePose is a new architecture by the team at Facebook AI research that does just that. It uses a convolutional network with some special features like region of interest pooling and cascading to make this happen. It was also trained on a newly created labeled dataset that mapped human poses to 3D models. The team open sourced the dataset but not the code, but using the details in the paper we can recreate their results. I'll explain how it works in this video. https://github.com/llSourcell/3D_Pose_Estimation

https://github.com/n1ckfg/OpenPoseRig

https://github.com/facebookresearch/VideoPose3D

https://github.com/facebookresearch/pythia


nvidia labs[]

https://github.com/NVlabs/6dof-graspnet and video https://www.youtube.com/watch?v=y5EJXeEiB1o

https://github.com/NVlabs/Deep_Object_Pose This is the official DOPE ROS package for detection and 6-DoF pose estimation of known objects from an RGB camera. The network has been trained on the following YCB objects: cracker box, sugar box, tomato soup can, mustard bottle, potted meat can, and gelatin box. For more details, see our CoRL 2018 paper and video.

https://arxiv.org/abs/1809.10790 Using synthetic data for training deep neural networks for robotic manipulation holds the promise of an almost unlimited amount of pre-labeled training data, generated safely out of harm's way. One of the key challenges of synthetic data, to date, has been to bridge the so-called reality gap, so that networks trained on synthetic data operate correctly when exposed to real-world data. We explore the reality gap in the context of 6-DoF pose estimation of known objects from a single RGB image. We show that for this problem the reality gap can be successfully spanned by a simple combination of domain randomized and photorealistic data. Using synthetic data generated in this manner, we introduce a one-shot deep neural network that is able to perform competitively against a state-of-the-art network trained on a combination of real and synthetic data. To our knowledge, this is the first deep network trained only on synthetic data that is able to achieve state-of-the-art performance on 6-DoF object pose estimation. Our network also generalizes better to novel environments including extreme lighting conditions, for which we show qualitative results. Using this network we demonstrate a real-time system estimating object poses with sufficient accuracy for real-world semantic grasping of known household objects in clutter by a real robot.
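
Detectors like DOPE recover the final 6-DoF pose from the network's predicted 2D locations of the object's 3D bounding-cuboid corners with a PnP solver. A self-contained sketch using OpenCV, where the "detected" 2D points are synthesized from a known pose so the example runs on its own (a real system takes them from the network output):

    # Sketch: 6-DoF pose from 2D cuboid-corner detections via PnP (OpenCV).
    import numpy as np
    import cv2

    # 3D corners of a 10 x 6 x 4 cm box in the object frame (metres)
    x, y, z = 0.05, 0.03, 0.02
    cuboid = np.array([[sx * x, sy * y, sz * z]
                       for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                      dtype=np.float64)

    K = np.array([[600., 0., 320.], [0., 600., 240.], [0., 0., 1.]])  # intrinsics
    rvec_true = np.array([0.1, 0.2, 0.3])       # ground-truth rotation (Rodrigues)
    tvec_true = np.array([0.0, 0.0, 0.5])       # half a metre in front of the camera

    image_points, _ = cv2.projectPoints(cuboid, rvec_true, tvec_true, K, None)
    ok, rvec, tvec = cv2.solvePnP(cuboid, image_points, K, None)
    print(ok, rvec.ravel(), tvec.ravel())       # recovers the pose used above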

timctho[]

https://github.com/timctho/VNect-tensorflow

http://gvv.mpi-inf.mpg.de/projects/VNect/

https://github.com/timctho/convolutional-pose-machines-tensorflow

dwivedi[]

Priya Dwivedi's pose estimation detects 25 keypoints per person, which can be used to indicate normal versus anomalous behavior.

links[]

https://www.computervision.zone/courses/3d-motion-capture/

Priya Dwivedi


Semantic segmentation

Activity analysis

Yolo

https://github.com/facebookresearch/DetectAndTrack CentOS 6 with Anaconda
