Tracking Smoothing and HUD Integration

An issue with the person bounding boxes provided by the detector is the variability of their sizes, which makes de-duplication and re-identification harder. Deriving the bounding boxes from pose keypoints instead greatly reduced box size variability and improved tracking.
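As a rough illustration of a pose-derived box, the sketch below builds a box around whichever keypoints clear a confidence threshold. The keypoint array layout, threshold, and padding fraction are assumptions for illustration, not the project's actual values.

```python
import numpy as np

def box_from_keypoints(keypoints, conf, min_conf=0.3, pad=0.15):
    """Derive a person bounding box from pose keypoints instead of the
    detector's raw box. Only keypoints above a confidence threshold
    contribute, and a small padding fraction is added around them."""
    pts = keypoints[conf >= min_conf]          # (K, 2) array of (x, y) points
    if len(pts) == 0:
        return None                            # no reliable keypoints this frame
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    w, h = x1 - x0, y1 - y0
    return (x0 - pad * w, y0 - pad * h, x1 + pad * w, y1 + pad * h)
```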

Creating a center-of-mass box for each tracked person directly from the neural-net pose estimator output was jittery, so a weighted smoothing scheme over the prior three positions was applied, producing a less jerky tracking box per person.
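The exact weights are not given here, but a minimal sketch of a recency-weighted average over the prior three box centers, kept per tracked person, could look like this (the weight values are illustrative):

```python
from collections import deque

class BoxSmoother:
    """Recency-weighted average of the last few box centers for one
    tracked person, to damp frame-to-frame jitter."""

    def __init__(self, weights=(0.2, 0.3, 0.5)):   # oldest -> newest
        self.weights = weights
        self.history = deque(maxlen=len(weights))

    def update(self, center):
        """Add the latest (x, y) center and return the smoothed center."""
        self.history.append(center)
        w = self.weights[-len(self.history):]       # align weights to history length
        norm = sum(w)
        x = sum(wi * c[0] for wi, c in zip(w, self.history)) / norm
        y = sum(wi * c[1] for wi, c in zip(w, self.history)) / norm
        return (x, y)
```

One smoother instance would be kept per person ID, so each track is damped independently.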

Weapon-detection highlighting graphics were added to the HUD.
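A minimal sketch of such a HUD highlight using standard OpenCV drawing calls; the color, thickness, and label are placeholders rather than the prototype's actual styling:

```python
import cv2

def draw_weapon_highlight(frame, box, label="WEAPON"):
    """Draw a highlight rectangle and label on the frame for a detected weapon."""
    x0, y0, x1, y1 = map(int, box)
    cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 0, 255), 2)   # red box (BGR)
    cv2.putText(frame, label, (x0, max(y0 - 8, 12)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
    return frame
```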

As the Deep Hi-Res Net pose estimator employed doesn't provide actual hand and finger keypoints (such as those produced by AlphaPose), a hand approximation was created by extending the elbow-to-wrist vector beyond the wrist and adding two more "fingers". This hand approximation was used to determine when the subject was holding a weapon.
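A minimal sketch of such a hand approximation, assuming 2D elbow and wrist keypoints; the extension and spread factors are illustrative guesses rather than the values actually used:

```python
import numpy as np

def approximate_hand(elbow, wrist, extend=0.35, spread=0.12):
    """Approximate a hand position by extending the elbow->wrist vector
    past the wrist, plus two offset 'finger' points perpendicular to it."""
    elbow, wrist = np.asarray(elbow, float), np.asarray(wrist, float)
    forearm = wrist - elbow
    hand = wrist + extend * forearm                # approximate palm/hand point
    perp = np.array([-forearm[1], forearm[0]])     # perpendicular to the forearm
    norm = np.linalg.norm(perp) or 1.0
    perp = perp / norm * spread * np.linalg.norm(forearm)
    fingers = (hand + perp, hand - perp)           # two rough "finger" points
    return hand, fingers
```

The resulting hand and finger points can then be compared against the weapon detector's box, for example with a simple point-in-box test, to decide whether the tracked person is holding the weapon.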

The following video shows all the detection and pose estimation data used, plus one of the hand approximations for illustration.

All the diagnostic artifacts have been removed from the following video to produce the HelVision v0.0.2 prototype.