Appearance-Based Body Language Detection
Pre-assaultive behavior cues are often communicated through body language, e.g., blading, pacing, hand position, and head motion. A reliable means of detecting such body language is sought.
The initial scheme uses an appearance-based model for body language detection: maintain a library of canonical target poses expressed as COCO-style keypoints, then search it for matches against the poses displayed by the scene subjects.
The match-scoring formula comes from Google's paper "PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model" (formula 6): a sum of keypoint deltas weighted by their confidence scores. Because we normalize the pose coordinates before evaluating them, we do not use the lambda factor.
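A minimal sketch of this confidence-weighted scoring, assuming keypoints are already normalized and stored as (17, 2) NumPy arrays in COCO order; the function name and array shapes are illustrative, not from the paper:

```python
import numpy as np

def match_score(query, candidate, confidences):
    """Confidence-weighted sum of per-keypoint distances.

    query, candidate: (17, 2) arrays of normalized COCO keypoints (x, y).
    confidences: (17,) per-keypoint detection confidence for the candidate.
    Lower scores indicate closer matches; a perfect match scores 0.
    """
    deltas = np.linalg.norm(query - candidate, axis=1)  # per-keypoint L2 distance
    return float(np.sum(confidences * deltas))
```

Searching the pose library then amounts to computing this score against every canonical pose and taking the minimum, optionally thresholded to reject poor matches.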
Normalization translates the keypoint bounding box to the origin, then rescales it so the maximum height is 1 while maintaining the aspect ratio.
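The normalization step can be sketched as follows; this is an assumed implementation of the translate-then-rescale procedure described above, not code from the project:

```python
import numpy as np

def normalize_pose(keypoints):
    """Translate the keypoint bounding box to the origin, then scale so the
    pose height is 1, preserving the aspect ratio (x and y share one scale)."""
    kp = np.asarray(keypoints, dtype=float)
    kp = kp - kp.min(axis=0)       # move bounding box corner to the origin
    height = kp[:, 1].max()        # pose height after translation
    if height > 0:
        kp = kp / height           # max height = 1; same factor on x keeps aspect
    return kp
```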
Another match-scoring formula, based on body-part angles and lengths, is described in the blog entry "Pose estimation and matching with TensorFlow lite PoseNet model".
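A hedged sketch of extracting such angle/length features, assuming standard COCO keypoint indexing (5/6 = shoulders, 7/8 = elbows, 9/10 = wrists, 11/12 = hips, 13/14 = knees, 15/16 = ankles); the limb list and function name are illustrative choices, not taken from the blog entry:

```python
import math

# Limbs as (start, end) pairs of COCO keypoint indices, e.g. shoulder -> elbow.
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),
         (11, 13), (13, 15), (12, 14), (14, 16)]

def limb_features(keypoints):
    """Return a (angle, length) tuple per limb for angle-based pose comparison.

    keypoints: sequence of 17 (x, y) pairs in COCO order.
    """
    feats = []
    for a, b in LIMBS:
        dx = keypoints[b][0] - keypoints[a][0]
        dy = keypoints[b][1] - keypoints[a][1]
        feats.append((math.atan2(dy, dx), math.hypot(dx, dy)))
    return feats
```

Two poses could then be compared by summing per-limb angle differences, which is insensitive to translation and uniform scale but, as noted below, remains sensitive to perspective and lens distortion.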
The following shows a successful detection of the subject blading before an attack.
Blading in other video clips was not detected, owing to differing camera distances (which change the apparent body-part angles) and camera distortions (such as slight fish-eye warping). This approach seems brittle. Heuristics over pose approximations are probably adequate for detecting some motions, such as hands above the belly button, nervous hand movement, and head nodding/shaking, but stances such as lowering and blading are more subtle. A neural-net classification approach may prove more robust.