Rudimentary Multi-Object Tracking

With person detection and pose estimation demonstrated, the next need was to track the same person from frame to frame so that body language determinations could be made over time. A rudimentary multi-object tracking (MOT) scheme was added that uses the center of each person’s bounding box. Basically, if a center in the current frame is close enough to a center in the prior frame, it is treated as the same object.
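
The center-matching idea can be sketched roughly as below. This is a minimal illustration, not the HelVision code: the function names, the (x1, y1, x2, y2) box layout, the 50-pixel threshold, and the greedy closest-pair-first matching order are all assumptions made for the example.

```python
import math

def bbox_center(box):
    """Center of an (x1, y1, x2, y2) box; the box layout is an assumption."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def match_centers(tracks, detections, max_dist=50.0):
    """Greedily pair prior-frame centers with current-frame centers.

    tracks:     dict of track id -> (x, y) center from the prior frame
    detections: list of (x, y) centers from the current frame
    max_dist:   hypothetical "close enough" threshold in pixels
    Returns (assigned, unmatched): a dict of track id -> detection index,
    and the list of detection indices that matched no existing track.
    """
    # Every track/detection pair, closest first.
    pairs = sorted(
        (math.dist(center, det), tid, i)
        for tid, center in tracks.items()
        for i, det in enumerate(detections)
    )
    assigned = {}
    used = set()
    for dist, tid, i in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if tid in assigned or i in used:
            continue  # this track or detection is already paired
        assigned[tid] = i
        used.add(i)
    unmatched = [i for i in range(len(detections)) if i not in used]
    return assigned, unmatched
```

Unmatched detections would typically start new tracks, and tracks with no match are candidates for the occlusion handling described next.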

To handle occlusion, where one person walks in front of another, the occluded person’s center was projected forward using its average velocity over the prior 10 frames. If a center in a later frame came close enough to the projected center, it was treated as the same object. Rudimentary, but good enough for now.
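
A rough sketch of the velocity projection follows. The `Track` class, its method names, and the 50-pixel threshold are hypothetical; only the 10-frame averaging window comes from the description above. Velocity here is measured in pixels per frame.

```python
import math
from collections import deque

class Track:
    """One tracked center with a short velocity history (illustrative only)."""

    def __init__(self, center, history=10):
        self.center = center
        # Keep only the most recent `history` per-frame velocities.
        self.velocities = deque(maxlen=history)
        self.missed = 0  # consecutive frames without a matching detection

    def update(self, center):
        """Record a matched detection for the current frame."""
        if self.missed == 0:
            # Only consecutive observed frames give a true per-frame velocity;
            # a jump from a projected center back to a detection is skipped.
            self.velocities.append(
                (center[0] - self.center[0], center[1] - self.center[1])
            )
        self.center = center
        self.missed = 0

    def mean_velocity(self):
        """Average velocity over the stored history, or zero if none yet."""
        if not self.velocities:
            return (0.0, 0.0)
        n = len(self.velocities)
        return (
            sum(v[0] for v in self.velocities) / n,
            sum(v[1] for v in self.velocities) / n,
        )

    def coast(self):
        """No detection this frame: advance the center by the mean velocity."""
        vx, vy = self.mean_velocity()
        self.center = (self.center[0] + vx, self.center[1] + vy)
        self.missed += 1

    def is_match(self, center, max_dist=50.0):
        """True if a detection is close enough to the (projected) center."""
        return math.dist(self.center, center) <= max_dist
```

In use, a track that finds no match in a frame would call `coast()`, and later detections would be tested against the projected center with `is_match()`. A real tracker would also drop tracks once `missed` grows too large; that limit is left out here.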

Detection of hands being held at the waist, a simple pre-assaultive cue, was added mostly as a placeholder for pose-based analysis.
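
One plausible way to express such a cue is sketched below. It assumes the pose estimator returns named 2D keypoints in pixel coordinates with COCO-style names (for example `left_wrist`, `right_hip`); the function name, the requirement that both hands qualify, and the 0.35 torso-length ratio are illustrative guesses, not the actual rule used.

```python
import math

# Keypoints the check needs; COCO-style naming is an assumption.
REQUIRED = (
    "left_wrist", "right_wrist",
    "left_hip", "right_hip",
    "left_shoulder", "right_shoulder",
)

def midpoint(a, b):
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)

def hands_at_waist(keypoints, ratio=0.35):
    """Return True if both wrists sit near their same-side hips.

    keypoints: dict of keypoint name -> (x, y) in pixels
    ratio:     hypothetical tolerance as a fraction of torso length, so the
               check scales with how large the person appears in the frame
    """
    if any(name not in keypoints for name in REQUIRED):
        return False  # cannot decide without all six keypoints
    shoulders = midpoint(keypoints["left_shoulder"], keypoints["right_shoulder"])
    hips = midpoint(keypoints["left_hip"], keypoints["right_hip"])
    torso = math.dist(shoulders, hips)
    if torso == 0:
        return False  # degenerate pose, avoid a meaningless threshold
    limit = ratio * torso
    return (
        math.dist(keypoints["left_wrist"], keypoints["left_hip"]) <= limit
        and math.dist(keypoints["right_wrist"], keypoints["right_hip"]) <= limit
    )
```

Normalizing by torso length rather than using a fixed pixel distance keeps the check from depending on how far the person is from the camera.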

Finally, a simple graphics overlay was created as a starting point for HUD development, producing HelVision v0.0.1.