Additional Bladed Stance Detection Approaches
Pre-assaultive behavior cues are often communicated with body language such as a bladed stance.
Legs and Feet: Primarily, what we are looking for here is stance. Often subconsciously, people tend to blade themselves toward a perceived threat. Blading refers to a combat-style stance in which the dominant foot is behind the non-dominant foot, with the feet about shoulder-width apart. Blading also serves as a dual clue – most people carrying a concealed weapon will subconsciously blade the weapon side away from the threat, both to protect the weapon and to conceal it.
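The stance description above can be sketched as a simple heuristic over 2D pose keypoints. This is illustrative only: the keypoint dictionary layout, the thresholds, and the use of vertical ankle offset in the image as a crude proxy for one foot being "behind" the other are all assumptions, not part of the approaches evaluated here.

```python
# Illustrative heuristic only: flags a possible bladed stance from 2D pose
# keypoints. Keypoint layout, thresholds, and the depth proxy are assumptions.

def looks_bladed(kp, offset_ratio=0.25):
    """kp maps joint names to (x, y) image coordinates."""
    shoulder_width = abs(kp["left_shoulder"][0] - kp["right_shoulder"][0])
    if shoulder_width == 0:
        return False
    # Horizontal ankle separation roughly at shoulder width...
    ankle_dx = abs(kp["left_ankle"][0] - kp["right_ankle"][0])
    # ...with one ankle noticeably offset vertically in the image (a crude
    # 2D stand-in for "dominant foot behind the non-dominant foot").
    ankle_dy = abs(kp["left_ankle"][1] - kp["right_ankle"][1])
    return ankle_dx >= 0.7 * shoulder_width and ankle_dy >= offset_ratio * shoulder_width
```

As the results below show, heuristics at this level of simplicity run straight into the camera-angle and distortion problems discussed later; the sketch only captures the geometric definition of the stance.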
In addition, a person in a bladed stance typically has their weight distributed over both legs.
Following the appearance-based approach to bladed stance detection, two additional schemes were investigated:
- Neural net classifier approach
- Body part angles and lengths approach
Neural Net Classifier Approach
The strategy was to fine-tune a pre-trained ResNet model. Such models are trained on square images, but standing people occupy tall rectangles, so pose estimation was performed on each training image (producing keypoints), and each image was then cropped from the hips down to produce a roughly square region of interest for training purposes.
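The hips-down crop can be sketched as follows. The joint names follow a common keypoint convention, and the padding factor is an assumption; the function returns a crop box rather than performing the crop, so it can be fed to any image library.

```python
# Sketch of the hips-down crop described above. Keypoint names and the
# padding factor are assumptions; coordinates are (x, y) pixels.

def hips_down_box(kp, img_w, img_h, pad=0.1):
    """Return a (left, top, right, bottom) crop covering hips to feet."""
    joints = ("left_hip", "right_hip", "left_ankle", "right_ankle")
    xs = [kp[j][0] for j in joints]
    ys = [kp[j][1] for j in joints]
    top = min(ys)          # hip line
    bottom = max(ys)       # feet
    height = bottom - top
    # Pad vertically and center horizontally so the crop is roughly square.
    cx = (min(xs) + max(xs)) / 2
    half = (1 + pad) * height / 2
    left = max(0, int(cx - half))
    right = min(img_w, int(cx + half))
    top = max(0, int(top - pad * height))
    bottom = min(img_h, int(bottom + pad * height))
    return left, top, right, bottom
```

Clamping to the image bounds means subjects near the frame edge produce a non-square box, which the downstream resize to the network's input size must tolerate.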
- Note: A bladed stance also requires the subject to be facing/looking at the observer. Face orientation analysis will be required after the stance component is resolved.
A dataset of 100 bladed stances and 100 relaxed/non-bladed stances was processed, producing a model with 82.9% accuracy. In an attempt to increase classification accuracy, an additional 100 bladed and 100 relaxed stances were added to the training data. Accuracy unexpectedly decreased to below 80% with the additional data.
Inspection of the incorrectly classified images revealed no particular/actionable pattern for the misclassifications.
Body Part Angles and Lengths Approach
In this approach, knee angles and foot placement in the 2D image were analyzed to determine bladed stances. Results were 89% accuracy for bladed stances and 91% for relaxed stances.
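The knee-angle measurement at the core of this approach can be sketched as the angle at the knee formed by the hip–knee and ankle–knee vectors in the 2D image. The function below is a minimal version of that computation; any angle thresholds applied downstream would be assumptions.

```python
import math

# Sketch of the joint-angle measurement used in this approach: the angle at
# the middle point b formed by points a-b-c (e.g. hip-knee-ankle) in 2D.

def joint_angle(a, b, c):
    """Angle at b, in degrees, formed by image points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate keypoints; caller should discard
    cosang = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard rounding error
    return math.degrees(math.acos(cosang))
```

For example, a straight leg (`joint_angle((0, 0), (0, 1), (0, 2))`) measures 180 degrees. Note that this is the projected 2D angle, which is exactly where the camera-angle and distortion problems described next enter: the same physical knee bend projects to different image angles at different viewpoints.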
Inspection of the incorrectly classified images revealed special cases, largely produced by camera angles, distances, and radial distortion, that would lead to an undesirable and endless rabbit hole of special-case logic in any attempt to increase detection accuracy.
The innate subtleties of bladed stance determinations plus the image variability due to camera angles, distances, and radial distortions make these classifications difficult. The appearance-based, classifier, and body part angles/lengths approaches are fragile in their current states.
In light of the working list of gross-body-analysis pre-assaultive cues summarized below, taking a case-by-case approach to each cue based only on a camera-distorted, pose-estimated 2D image seems onerous.
- Body motion (bouncing, pacing, drawing a weapon, etc.) - 9 cues
- Posture (bladed stance, hands at waist, tense shoulders, etc.) - 6 cues
- Hands (fist clenching, high hands, nervous hands) - 3 cues
A better approach might be to have a non-camera dependent, internal, human-body-mechanics-restricted model for each subject in the scene, updated with pose-estimation results, which can then be analyzed for the 18 cues.
This approach is also closer to how a person (easily) makes such determinations, relying on a very accurate internal 3D conceptual model of people and how they move and pose.
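A minimal sketch of what such a per-subject internal model might look like: a persistent store of joint angles that rejects values outside plausible human ranges and smooths noisy pose-estimation updates, with cue queries run against the stabilized state rather than raw frames. The joint names, angle limits, smoothing factor, and cue threshold are all assumptions for illustration, not a proposed design.

```python
# Minimal sketch of the proposed camera-independent subject model: fixed
# human-body-mechanics limits, updated from each pose-estimation result.
# Joint names, limits, smoothing factor, and thresholds are assumptions.

HUMAN_LIMITS_DEG = {"knee": (0, 160), "elbow": (0, 150)}  # assumed ranges

class SubjectModel:
    def __init__(self):
        self.angles = {}  # smoothed joint angles in degrees

    def update(self, observed_angles, alpha=0.3):
        """Blend a new (noisy) pose observation into the internal model,
        clamping values to plausible human joint ranges first."""
        for joint, angle in observed_angles.items():
            lo, hi = HUMAN_LIMITS_DEG.get(joint, (0, 180))
            angle = max(lo, min(hi, angle))  # enforce body mechanics
            prev = self.angles.get(joint, angle)
            self.angles[joint] = (1 - alpha) * prev + alpha * angle

    def cue_bladed(self, threshold=150):
        """Example downstream cue query against the stabilized state:
        a noticeably bent knee as one (hypothetical) stance indicator."""
        knee = self.angles.get("knee")
        return knee is not None and knee < threshold
```

The point of the sketch is the separation of concerns: per-frame pose estimates feed `update`, while the 18 cue detectors query the smoothed, mechanics-constrained state instead of reasoning directly over each camera-distorted image.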