At the AWS re:Invent conference, we deepened our collaboration with Qualcomm® Technologies by demonstrating real-time object detection and pose estimation on the Qualcomm® Robotics RB3 platform. Based on the Qualcomm® SDA845 system-on-a-chip (SoC), the RB3 platform enables the creation of high-performing computer vision applications on robots and other IoT devices. We built an application on a demo robot that showcased the powerful combination of the Qualcomm® Robotics RB3 platform, our deep learning computer vision developer platform, and the training capabilities of Amazon SageMaker.
Our demo robot is just one example of an application built on Qualcomm® processors. By combining any sufficiently powerful processor with our developer platform, the natural language processing of Amazon Alexa, the deployment and management features of AWS IoT, and AWS RoboMaker, a developer could easily create other helpful applications, such as an elderly care robot that detects a person falling and triggers a call for help.
Let’s walk through our demo to see how the robot performs object detection and pose estimation. This simple demo illustrates the power of the RB3 robotics platform as well as our alwaysAI computer vision platform.
In our demo, the Qualcomm® RB3 board drives our TurtleBot.
And here, Vikram, Senior Director of Computer Vision at alwaysAI, shows the view the robot is getting from its camera.
We are running two networks on the RB3 platform: one for object detection, which detects the presence of the Hulk figure, and one for pose estimation, which, as you can see, determines that the Hulk is standing up.
Now, if we make the Hulk lie down ("fall down," in the elderly care scenario), you will see that the pose estimation and object detection networks identify that the Hulk is down.
Here, the robot recognizes that the Hulk is down and estimates that his pose may warrant a rescue attempt. The robot then displays the message, “HULK DOWN. LET’S RESCUE HULK.”
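To give a sense of what this decision step can look like in code, here is a minimal sketch, not the demo's actual implementation: all function names are hypothetical, and it uses a simple bounding-box heuristic (a box wider than it is tall suggests the subject is lying down), whereas a real application would more likely reason over pose-estimation keypoints such as shoulders and hips.

```python
# Hypothetical sketch of the "is the subject down?" decision.
# A real app would use pose keypoints; here we use the detection
# bounding box aspect ratio as a stand-in heuristic.

def is_down(box):
    """box = (x_min, y_min, x_max, y_max) in pixels.
    A box that is wider than it is tall suggests the subject
    is lying down rather than standing."""
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return width > height

def respond(label, box):
    """Return the message the demo robot would display, if any."""
    if label == "hulk" and is_down(box):
        return "HULK DOWN. LET'S RESCUE HULK."
    return None

# Standing Hulk: tall, narrow box -> no alert
print(respond("hulk", (100, 50, 180, 300)))   # None
# Fallen Hulk: wide, short box -> rescue message
print(respond("hulk", (60, 220, 340, 300)))   # HULK DOWN. LET'S RESCUE HULK.
```

In the elderly care version of this application, the same branch would instead trigger a voice prompt or an emergency call rather than a printed message.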
In this demo, our robot approaches the Hulk and completes a rescue. Another version of this application might alert a caregiver or dial emergency services, which could be very beneficial for an elderly person living alone. Imagine a Roomba-like robot that follows the person around the house: if the person falls, the robot could move close, ask if they need help, listen for an answer, call an ambulance if needed, and more.
Now that our rescue demo is complete, we can imagine how pose estimation combined with the power of the RB3 robotics platform can lead to meaningful real-world applications. Our goal is to enable developers and entrepreneurs to quickly prototype and deploy applications for a variety of industries and use cases, including retail and elderly care.
As a separate example, consider a retail analytics environment. A customer who takes items off a shelf, reads the product labels, and puts the items back, or who looks around for a customer service representative, may need questions answered. In that case, we could send a robot over to the customer and start a conversation to determine how best to help. With support for gesture recognition, the customer could also seek the robot's attention with a gesture such as a hand wave.
So, the power of combining object detection and pose estimation on our computer vision platform installed on an RB3 device is pretty incredible. There is huge potential for developers to use our platform to create solutions for many significant and far-reaching use cases.
Our developer platform works with many AI model frameworks and endpoint environments, and comes preloaded with a growing model catalog, starter apps, and core computer vision APIs (including object detection, object tracking and counting, image classification, and pose estimation), enabling the development of a wide range of computer vision applications.
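The two-network structure used in the demo generalizes to the application pattern described above: run a detector on each frame, then run pose estimation on each detected object. The sketch below shows only that structure; the model classes are stubs with hypothetical names, and a real application would load actual networks through a computer vision API and read frames from a camera.

```python
# Hypothetical per-frame pipeline: object detection followed by
# pose estimation on each detection. The models are stubs; a real
# app would wrap trained networks and a live camera stream.

class ObjectDetector:
    def detect(self, frame):
        # Stub: a real detector returns (label, confidence, box) tuples.
        return [("person", 0.92, (40, 30, 120, 260))]

class PoseEstimator:
    def estimate(self, frame, box):
        # Stub: a real estimator returns keypoints for the cropped region;
        # here we return only a summary pose label.
        return {"pose": "standing"}

def process_frame(frame, detector, estimator):
    """Run detection, then pose estimation on each detected object."""
    results = []
    for label, confidence, box in detector.detect(frame):
        pose = estimator.estimate(frame, box)["pose"]
        results.append((label, pose))
    return results

print(process_frame(None, ObjectDetector(), PoseEstimator()))
# [('person', 'standing')]
```

An application like the elderly care robot would sit on top of this loop, watching for a pose label that changes from "standing" to "down" and triggering its response logic.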