Mimetics as a Learning Basis - Part 1
A rudimentary model of the human mind could be characterized by the following components:
- Consciousness - to have subjective experience and thought
- Self-awareness - to be aware of oneself as a separate individual, especially to be aware of one’s own thoughts
- Sentience - the ability to feel perceptions or emotions subjectively
- Sapience - the capacity for wisdom
The capacity for mimicry, such as a child's playful or experimental imitation, draws a thread through each of the above components.
- Consciousness - “my goal is to mimic you, and I have an approach to try for achieving that goal”
- Self-awareness - “I observe you, then observe myself mimicking you, and perceive the differences”
- Sentience - “Your reactions to my mimicry of you may reward/reinforce or punish/discourage my understanding of what happened”
- Sapience - “I am refining my internal model of you/getting to know you, and possibly learning some general things about others”
As we observe and interact with the world, we refine our internal model of what we perceive, which can be thought of as a mimicry of reality, biased and augmented by our own direct experiences.
To better understand how mimetics can be used to draw a thread through various components of the mind, a system is being developed that:
- Observes a person
- Tries to build an internal physical behavior model of that person by predicting their actions and correcting for disparities
- Mimics aspects of the person’s behavior and watches for reactions
- Classifies the reactions as positive, ambiguous, or negative, and refines its behavior to generally seek approval
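The approval-seeking step above could be sketched as a simple score update. This is only an illustration of the idea, not the system's actual implementation; the reaction labels and the `update_behavior_score` helper (with its learning rate) are hypothetical stand-ins.

```python
# Hypothetical reward mapping for the three reaction classes described above.
REWARD = {"positive": 1.0, "ambiguous": 0.0, "negative": -1.0}

def update_behavior_score(score, reaction, lr=0.1):
    """Nudge a mimicked behavior's score toward or away from repetition.

    Behaviors with higher scores would be selected more often, so positive
    reactions reinforce a behavior and negative reactions discourage it.
    """
    return score + lr * REWARD[reaction]
```

A behavior that draws repeated positive reactions would accumulate a higher score and be mimicked more often; the learning rate controls how quickly the system commits to that judgment.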
To start, a system was put together that observes a person, groups their typical actions into sequences, then predicts what action will play out as soon as some motion from the person is detected.
Initially, the method from "Clustering and Recognition of Spatiotemporal Features through Interpretable Embedding of Sequence to Sequence Recurrent Neural Networks" was explored, but algorithmic agglomerative clustering produced much the same results without the computational load of a neural net. (Clustering methods such as k-means and agglomerative clustering are commonly used in unsupervised learning scenarios.)
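As a rough illustration of the agglomerative approach, SciPy's hierarchical clustering can group per-segment feature vectors by a distance cutoff. The feature vectors below are made-up stand-ins; per the text, the real features are derived from joint locations across each motion segment.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy stand-ins: each row summarizes one action segment as a feature vector.
features = np.array([
    [0.0, 0.1], [0.1, 0.0],   # two similar segments (e.g. typing)
    [5.0, 5.1], [5.1, 4.9],   # two similar segments (e.g. hand to head)
])

# Average-linkage agglomerative clustering, then cut the tree at a
# distance threshold so similar segments fall into the same cluster.
Z = linkage(features, method="average")
labels = fcluster(Z, t=1.0, criterion="distance")
```

Unlike k-means, this needs no preset cluster count, though the distance threshold `t` remains a parameter to choose, which is exactly the self-parameterization concern raised below.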
Some issues with standard clustering algorithms pertaining to the system being developed are:
- How to match the same clusters across multiple clustering runs
- Pure clustering algorithms typically need external parameterization to work well, where "work well" means "as judged by a human observer" — but what's needed here is a standalone, self-parameterizing approach
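The first issue above, matching clusters across runs, can be framed as an assignment problem: pair each cluster centroid from one run with its nearest counterpart in the other. A minimal sketch using the Hungarian algorithm (not necessarily the system's approach, just one standard way to do it):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_clusters(centroids_a, centroids_b):
    """Map cluster labels from run A to run B by pairing nearest centroids.

    centroids_a, centroids_b: (k, d) arrays of cluster centroids from two
    clustering runs over comparable feature spaces.
    """
    # Pairwise distances between every centroid in run A and run B.
    cost = np.linalg.norm(
        centroids_a[:, None, :] - centroids_b[None, :, :], axis=-1
    )
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return dict(zip(rows, cols))  # label in run A -> matched label in run B
```

This only works when both runs produce comparable clusters; if a run discovers a genuinely new action, that cluster will be force-matched and would need a distance cutoff to be flagged as novel instead.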
To that end, a simple motion detection algorithm was used to split a 3D skeletal time sequence into "stillness" and "action" segments, then an algorithm that analyzes joint locations across each segment was used to cluster similar actions. This produced a set of signatures, one per segment, that can be used both to detect the start of an action and to play the action (or stillness) out as a prediction of what the person will do.
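A minimal version of the stillness/action segmentation could threshold mean joint displacement between consecutive frames. The threshold value here is an assumption that would need tuning per subject and camera setup; the real system's motion detector may differ.

```python
import numpy as np

def segment_motion(frames, threshold=0.05):
    """Split a (T, J, 3) skeletal sequence into stillness/action segments.

    frames: T timesteps of J joints in 3D. threshold is an assumed cutoff
    on mean per-joint displacement between consecutive frames.
    Returns a list of (label, start, end) tuples over frame-transition indices.
    """
    # Mean joint displacement between each pair of consecutive frames.
    speed = np.linalg.norm(np.diff(frames, axis=0), axis=-1).mean(axis=-1)
    moving = speed > threshold

    segments, start = [], 0
    for t in range(1, len(moving)):
        if moving[t] != moving[t - 1]:  # stillness/action boundary
            segments.append(("action" if moving[t - 1] else "stillness", start, t))
            start = t
    segments.append(("action" if moving[-1] else "stillness", start, len(moving)))
    return segments
```

Each resulting segment would then be summarized (per the text, from its joint locations) into a signature for clustering and later matching.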
Below is the 3D representation (derived from video processed through "Deep High-Resolution Representation Learning for Human Pose Estimation" plus VideoPose3D) of a person sitting with their legs on a pouf, working on their laptop and occasionally putting their hand to their head in contemplation.
The projection on the right is the system's internal model, constantly updated by predictions of what will happen next. As soon as motion is detected, the system tries to find a matching known action-start signature and, based on that, predicts what action will play out as time continues.
The signatures were trained on a video of the subject performing various actions while sitting on that same pouf.
The next step is to close the loop: score the predictions against what actually happens, and combine that with a self-correction/learning method.