How to Recognize Human Activity and Pose Estimation Using alwaysAI
by Andres Ulloa | Feb 14, 2020 | Models | 10 minute read
The ability to recognize human activity with computer vision allows us to create applications that can interact with and respond to a user in real time. For instance, we can make an application that gives feedback to a user in the moment so that they can learn how to recreate the perfect golf swing, or that sends an immediate alert for help when someone has fallen, or that generates an immersive augmented reality experience based on the user's position.
In computer vision we use a technique called Pose Estimation to achieve these goals. Pose Estimation maps out a person’s physical frame in an image by assigning sets of coordinates, known as key-points, to specific body parts. (Some pose estimation models return vector maps, while others return key-points; in this guide we’ll focus on a model that returns key-points.) Once we have this map of key-points, we can begin to determine a person’s activity based on where those key-points appear in a video stream.
alwaysAI provides a set of open source starter models in the Model Catalog. The following example pairs one of these starter models with a simple algorithm to recognize specific poses.
In this example we'll be using the alwaysai/human-pose model, along with a set of checks to determine if someone is performing a “Y,” “M,” “C,” or “A” pose. Then we’ll overlay the corresponding letter on the screen when the person in the image is in one of these poses.
The final code for this project can be found at this GitHub repo.
I’ll be prototyping this application on my laptop running Ubuntu, and then deploying it to an edge device. If you use Windows or MacOS, you can develop this app using remote development and your edge device.
To get started, we’ll use the real-time pose estimator app from the set of basic starter apps provided by alwaysAI.
First, I install alwaysAI and then download all of the starter apps by running the command:
aai get-starter-apps
Now let’s try running the pose estimator app to see what it produces. After navigating into the realtime_pose_estimator directory, I configure my project by running "aai app configure", choosing “Your local computer” as my deployment option and the default settings for the rest. Then I install the model and the dependencies by running "aai app install". Finally, I start the application by running "aai app start" and, once the app has started successfully, click on the http://localhost:5000 link that appears in my terminal to open the Streamer window:
cd alwaysai-starter-apps/realtime_pose_estimator
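# the rest of the commands from the steps described above
aai app configure
aai app install
aai app start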
Looking at the code in the realtime_pose_estimator app, I see that it loads the model, camera, and Streamer in order to write the key-point values to the Streamer and to display the image overlaid with the connected key-points. Each key-point is a pair of coordinates that maps to a body part (when a key-point appears as [-1, -1], that part isn’t visible in the image and so can’t be found by the model).
pose_estimator = edgeiq.PoseEstimation("alwaysai/human-pose")
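That line loads the pose estimation model. The rest of the app boils down to a loop like the one sketched below; the edgeiq calls follow the starter apps, but this is a condensed sketch rather than the exact file, so check the copy you downloaded for the precise code:

pose_estimator.load(engine=edgeiq.Engine.DNN)

with edgeiq.WebcamVideoStream(cam=0) as video_stream, \
        edgeiq.Streamer() as streamer:
    while True:
        frame = video_stream.read()
        results = pose_estimator.estimate(frame)
        # Overlay the connected key-points on the frame and send it,
        # along with the raw key-point values, to the browser-based Streamer.
        frame = results.draw_poses(frame)
        text = [str(pose.key_points) for pose in results.poses]
        streamer.send_data(frame, text)
        if streamer.check_exit():
            break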
But how do we know which key-point corresponds to which body part? Well, the documentation provides a handy little map, which I can add to the starter app as a dictionary.
BODY_PART_MAP = {"Nose": 0,
The origin of the image, (0, 0), is the top left corner, so the further up and to the left something is in the image, the closer its coordinates are to 0. In other words, x increases to the right and y increases downward.
Let's think about the problem we are trying to solve. If we want to determine whether someone is doing the YMCA poses, we need to know where a person's arms are in relation to each other and to the person's head. Can we tell whether a person's arms and head match a given pose using only the key-points available to us? Let's try to define a “Y” pose. A “Y” pose is when a person extends both arms straight, angled outward from their body and above their head. So we can define a “Y” pose as:
all([arms_overhead(pose), arms_outward(pose), arms_straight(pose)])
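Before writing those three checks, I'll add two small helpers to keep the sketches below readable. These helpers are my own additions rather than code from the starter app, and they assume that pose.key_points can be indexed by the integer IDs in BODY_PART_MAP, that the map uses names like "Right Wrist" and "Left Elbow", and that undetected parts come back as [-1, -1], as described above:

def get_point(pose, part_name):
    # Look up the (x, y) coordinate for a named body part.
    return tuple(pose.key_points[BODY_PART_MAP[part_name]])

def detected(*points):
    # The model reports [-1, -1] for any part it can't find in the frame.
    return all(point != (-1, -1) for point in points)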
We can define the arms_overhead function with the key-points given to us, for example:
def arms_overhead(pose):
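    # A sketch of one way to implement the check described below, using the
    # hypothetical get_point/detected helpers from earlier.
    nose = get_point(pose, "Nose")
    arm_points = [get_point(pose, part) for part in
                  ("Right Wrist", "Right Elbow", "Left Wrist", "Left Elbow")]
    if not detected(nose, *arm_points):
        return False
    # Smaller y means higher in the image, so "above the nose" is y < nose_y.
    return all(point[1] < nose[1] for point in arm_points)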
Here we’re saying that when a person has their arms over their head, all wrist and elbow key-points will be above the nose. Let's take a crack at the arms_outward function:
def arms_outward(pose):
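    # Again a sketch using the hypothetical helpers from earlier; the check
    # itself is explained in the paragraph below.
    nose = get_point(pose, "Nose")
    r_wrist = get_point(pose, "Right Wrist")
    r_elbow = get_point(pose, "Right Elbow")
    l_wrist = get_point(pose, "Left Wrist")
    l_elbow = get_point(pose, "Left Elbow")
    if not detected(nose, r_wrist, r_elbow, l_wrist, l_elbow):
        return False
    # The person's right arm shows up on the left of the image (smaller x),
    # and their left arm on the right (larger x).
    return (r_wrist[0] < nose[0] and r_elbow[0] < nose[0] and
            l_wrist[0] > nose[0] and l_elbow[0] > nose[0])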
Here we’re saying that the right wrist and right elbow must come before the nose in the x direction, and the left wrist and left elbow must come after (note that we’re talking about the person’s right wrist, not the wrist that appears on the right side of the image).
Now let's try arms straight. This is a tough one.
def arms_straight(pose):
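    # One rough way to express "straight enough", per the description below:
    # each wrist should be higher than its elbow and horizontally farther
    # from the nose than its elbow is (my interpretation of "farther away").
    nose = get_point(pose, "Nose")
    sides = [("Right Wrist", "Right Elbow"), ("Left Wrist", "Left Elbow")]
    for wrist_name, elbow_name in sides:
        wrist = get_point(pose, wrist_name)
        elbow = get_point(pose, elbow_name)
        if not detected(nose, wrist, elbow):
            return False
        higher = wrist[1] < elbow[1]  # smaller y is higher in the image
        farther = abs(wrist[0] - nose[0]) > abs(elbow[0] - nose[0])
        if not (higher and farther):
            return False
    return True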
We’re saying that if a wrist is higher up than the elbow, and farther away from the nose than the elbow, then the arm is basically straight (straight enough for our purposes here).
So, there you have it! We’ve created a function that determines if a pose is “Y.” We just need to repeat this process to finish building our simple algorithmic classifier.
I applied the same logic to all 4 categories:
def is_y(pose):
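    # This is exactly the combination we defined earlier; is_m, is_c, and
    # is_a follow the same pattern with their own positional checks
    # (not reproduced here; see the GitHub repo for the full set).
    return all([arms_overhead(pose), arms_outward(pose), arms_straight(pose)])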
And created the associated body position functions for the other letters, then tied everything together in main:
def main():
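    # A sketch of main(), assuming the same structure as the starter app
    # shown earlier plus the pose checks above. It assumes `import edgeiq`
    # and `import cv2` at the top of the file; is_m, is_c, and is_a are the
    # analogous checks for the other letters (not shown in this post), and
    # using OpenCV's putText to overlay the letter is my choice here, not
    # necessarily what the original app does.
    pose_estimator = edgeiq.PoseEstimation("alwaysai/human-pose")
    pose_estimator.load(engine=edgeiq.Engine.DNN)

    checks = [("Y", is_y), ("M", is_m), ("C", is_c), ("A", is_a)]

    with edgeiq.WebcamVideoStream(cam=0) as video_stream, \
            edgeiq.Streamer() as streamer:
        while True:
            frame = video_stream.read()
            results = pose_estimator.estimate(frame)
            frame = results.draw_poses(frame)
            # New part: classify each detected pose and overlay the letter.
            for pose in results.poses:
                for letter, check in checks:
                    if check(pose):
                        cv2.putText(frame, letter, (50, 100),
                                    cv2.FONT_HERSHEY_SIMPLEX, 3,
                                    (0, 255, 0), 6)
            streamer.send_data(frame, "YMCA detector")
            if streamer.check_exit():
                break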
Now that I have my app working, I want to run it on my edge device (I’ll be using a Pi 4 with a Pi camera attached) and test it. To do this, I use “aai app configure” to set up my target configuration to allow my code to be deployed and run on my device:
aai app configure
Now whenever we run “aai app install” it will deploy our updated application, and “aai app start” will run the application on the device.
aai app install
aai app start
We’ve walked through a simple example of modifying the pose estimation app so that it can detect when a person is doing the YMCA. This same process can be expanded to many use cases that involve assessing a person’s activity and actions.
The alwaysAI platform makes it easy to build, test, and deploy computer vision applications such as this YMCA detector. We can’t wait to see what you build with alwaysAI!