Recent advances in technology have greatly broadened the scope of object detection and related computer vision (CV) services. Hardware with advanced features paired with smarter neural networks has attracted developers and data scientists from numerous industries to start leveraging computer vision to solve complex business challenges. Combined with the rising popularity of embedded devices capturing data on the edge, computer vision on a grand scale has been exploding with seemingly endless potential to revolutionize the way the world collects and analyzes real-world data.
What is Object Detection?
Object detection is the process of identifying objects within images, more often than not in real time. For example, it can identify and isolate instances of cars, humans, bikes, and buses from a real-time video feed of a busy street. It isolates objects of interest from the image through recognition, localization, and classification. The primary objective of object detection is simply to identify and label the objects that are present.
Detection allows developers to employ several other core computer vision functions like object tracking, counting, image classification and more, to equip their devices with machine learning capabilities. For example, a security camera in a retail shopping mall could be trained to detect the presence of objects in a store and be further trained to classify those objects, such as a person, more narrowly by gender, age range, or other identifying characteristics. Tracking and counting of those “objects” can also be trained into models to allow for further data analysis.
Object detection rests on three core steps.
- Object Localization: Identifying the regions of the image that are likely to contain objects and drawing candidate bounding boxes around them.
- Object Classification: Once a region has been proposed, the trained model evaluates it and assigns a label for the class it belongs to.
- Non-Maximum Suppression: When several overlapping bounding boxes cover the same object, the lower-scoring boxes are discarded so that a single bounding box remains for each entity.
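The suppression step is easier to see in code. The following is a minimal sketch of greedy non-maximum suppression in plain Python, not any particular library's implementation. The box format `(x1, y1, x2, y2)`, the function names, and the 0.5 overlap threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes: List[Box], scores: List[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    # Visit boxes from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes on one object, plus one separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first with an IoU of roughly 0.68, so it is suppressed, and `kept` holds the indices of the first and third boxes. In practice the threshold is tuned per application: a lower value suppresses more aggressively, which can merge objects that genuinely sit close together.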
Working with Embedded Devices
Working with embedded devices on the “edge” means building applications for deployment straight onto system-on-chip (SoC) environments where processing happens directly on the device rather than pushing data to a cloud server. For example, a simple dual-core ARM chip lacking a GPU can successfully support machine learning applications, still with memory to spare.
Embedded devices resolve a number of issues that typically arise when relying on the cloud. Because data is processed where it is captured, the latency of a network round trip and the bandwidth needed to stream raw data are largely eliminated. However, working with edge devices must be thought through strategically, with consideration for power supply constraints, limited processing power, and limited storage on the device. These device-side considerations affect your development decisions, including how large and robust a model you can use in your application. Larger models, often the product of more training data, generally demand more processing power and more storage. If you know your intended application will run in a resource-constrained environment, take care not to deploy a bloated model that slows processing on your device. When building applications for edge computer vision, finding a balance between device and environment constraints and the model used in the application itself can make or break your results.
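A rough way to reason about that balance is to estimate how much storage a model's weights need before deploying it. The sketch below is only back-of-the-envelope arithmetic: the parameter count, the 8 MB budget, and the function names are hypothetical, and it counts weights only, ignoring activations and runtime overhead.

```python
def model_size_mb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Approximate storage needed for the weights alone, in mebibytes."""
    return num_parameters * bytes_per_parameter / (1024 * 1024)


def fits_on_device(num_parameters: int, bytes_per_parameter: int,
                   budget_mb: float) -> bool:
    """True if the weights fit within the (hypothetical) storage budget."""
    return model_size_mb(num_parameters, bytes_per_parameter) <= budget_mb


# Hypothetical model with 4 million parameters and an 8 MB budget.
PARAMS = 4_000_000
BUDGET_MB = 8.0

float32_mb = model_size_mb(PARAMS, 4)  # 32-bit floats: 4 bytes per weight
int8_mb = model_size_mb(PARAMS, 1)     # 8-bit integers: 1 byte per weight

print(f"float32: {float32_mb:.2f} MB, fits: {fits_on_device(PARAMS, 4, BUDGET_MB)}")
print(f"int8:    {int8_mb:.2f} MB, fits: {fits_on_device(PARAMS, 1, BUDGET_MB)}")
```

Under these assumed numbers, the same model exceeds the budget when its weights are stored as 32-bit floats but fits when they are stored as 8-bit integers, which is one reason smaller numeric formats are a common consideration on constrained devices.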
Advantages of Edge Computer Vision
Edge devices do not need an internet connection to function, because the computing for object detection is done entirely on the device itself. To keep the program lightweight, detection filters can be scaled down and the feed converted to grayscale. For example, when tracking an object, ML algorithms can compare the current frame only with the frame immediately before it to track movement, rather than comparing many frames at once, which could strain processing time and accuracy. Onboard sensors relay information in real time to the SoC, which processes it without a network round trip.
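The grayscale conversion and previous-frame comparison described above can be sketched with simple frame differencing. This is a minimal illustration in plain Python rather than a production tracker: frames are assumed to be nested lists of `(r, g, b)` tuples, and the function names and the change threshold of 25 are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # assumed (r, g, b), each 0..255


def to_grayscale(frame: List[List[Pixel]]) -> List[List[int]]:
    """Collapse RGB to one luminance channel (ITU-R BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in frame]


def changed_pixels(prev: List[List[int]], curr: List[List[int]],
                   threshold: int = 25) -> List[Tuple[int, int]]:
    """Return (row, col) positions whose brightness changed by more than threshold."""
    return [(y, x)
            for y, (prow, crow) in enumerate(zip(prev, curr))
            for x, (p, c) in enumerate(zip(prow, crow))
            if abs(c - p) > threshold]


def bounding_box(points: List[Tuple[int, int]]) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (x1, y1, x2, y2) box around the changed pixels, or None if nothing moved."""
    if not points:
        return None
    ys = [y for y, _ in points]
    xs = [x for _, x in points]
    return (min(xs), min(ys), max(xs), max(ys))


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
previous = [[BLACK] * 3 for _ in range(3)]
current = [[BLACK] * 3 for _ in range(3)]
current[1][2] = WHITE  # a bright object appears at row 1, column 2

moved = changed_pixels(to_grayscale(previous), to_grayscale(current))
motion_box = bounding_box(moved)
```

Only two frames are held in memory at any time, and each pixel is reduced from three channels to one before comparison, which is what keeps this approach cheap enough for a small SoC.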
Autonomous vehicles are another example. A self-driving car can operate without network latency because it does not depend on a cloud-connected data analytics process. The autonomous vehicle, drone, security camera, or nearly any other embedded device does not need to relay its data, await processing, receive the results, and only then respond, as a cloud-tethered computer vision solution would require. Added latency is not acceptable for developers or data scientists who need to act on real-time data at the source. For many of today's businesses, real-time machine learning is a hard requirement.
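To see why the round trip matters, it helps to work out how far a vehicle travels while it waits for a result. The figures below are illustrative assumptions, not measurements: a speed of 30 m/s (about 108 km/h), a 100 ms cloud round trip, and a 10 ms on-device inference time.

```python
def distance_during_latency(speed_m_per_s: float, latency_ms: float) -> float:
    """Metres travelled while waiting latency_ms for a detection result."""
    return speed_m_per_s * latency_ms / 1000.0


SPEED = 30.0              # assumed vehicle speed in metres per second
CLOUD_LATENCY_MS = 100.0  # hypothetical cloud round trip
EDGE_LATENCY_MS = 10.0    # hypothetical on-device inference time

cloud_distance = distance_during_latency(SPEED, CLOUD_LATENCY_MS)
edge_distance = distance_during_latency(SPEED, EDGE_LATENCY_MS)

print(f"cloud: {cloud_distance:.1f} m travelled before a response")
print(f"edge:  {edge_distance:.1f} m travelled before a response")
```

With these assumed values the vehicle covers about 3 metres before a cloud response arrives, compared with about 0.3 metres when the result is computed on the device.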
The Future of Object Detection
Developer-friendly, edge-enabled CV software providers have empowered developers, data scientists, and forward-thinking executives in industries ranging from retail and aerospace to medical and mechanical to embrace object detection applications in their operations. With the ability to integrate autonomous, real-time decision making based on visual feedback, business operations can be streamlined. Moreover, the cost of human resources and operational redundancy can be substantially reduced.