It's time: software and hardware technologies have advanced to the point where you can equip low-power, resource-constrained edge devices with AI computer vision (CV) capabilities. Here's a brief overview of how CV started, what it can do now, where it might be going, and how you can use it to empower your existing equipment or a new project. Developers everywhere can now build and deploy deep learning computer vision applications that make businesses more functional, productive, and profitable.
A Brief History of Computer Vision
Computer vision got its start in the 1950s. Back then, it bore almost no resemblance to the real-time object detection and tracking technology you see now. The high cost and limited capability of sensors, data storage, and processors all restricted its industrial applications.
At the time, the only places you saw computer image recognition were academic studies and science fiction. The first “seeing” robots, so to speak, were tested only on their ability to recognize basic shapes under highly controlled conditions. Facial recognition and motion tracking were relegated to comics and pulp novels.
Robots might not be thinking for themselves yet, but you can harness AI to allow your devices to see for themselves. Various advancements — computer optics, processor development, computer science, big data — have made CV what it is today.
Here's a timeline of computer vision breakthroughs for the past seventy years:
Computer Vision Timeline
- IBM’s first magnetic data-storage systems
- Frank Rosenblatt’s Perceptron
- Development of complementary metal-oxide-semiconductor (CMOS) technology
- MIT artificial intelligence lab begins computer vision projects
- Introduction of the AGM-65 Maverick missile, whose optical contrast seeker uses an onboard camera for targeting
- Optical Character Recognition (OCR) debuts via Kurzweil Computer Products
- Smart cameras, such as the Xerox optical mouse, and layered image processing
- The Internet and the dot-com boom
- Viola-Jones object detection framework developed, enabling real-time face detection.
- Pascal VOC project launched to build standardized image datasets for training and benchmarking.
- Security Show Japan 2005 features the first mobile phone with working facial recognition.
- AI’s average object-recognition error rate drops below humans’: 2.5% versus 5%.
- First ImageNet competition held, in which teams compete to build the most accurate image recognition models. ImageNet now contains nearly 15 million images in about 20,000 categories.
- A Google X Lab neural network, given access to YouTube, teaches itself to recognize cats in video frames.
- October: Tesla releases Autopilot for its Model S electric car. Autopilot worked offline and offered advanced features, including automated parking and semi-autonomous driving.
- Google introduces FaceNet for facial recognition; it represents each face with a compact embedding of only 128 bytes.
- Niantic Inc. releases Pokémon Go, an AR-based mobile game that uses phone cameras to display Pokémon in augmented reality. Players around the world travel to specific physical locations to find and catch Pokémon, “seeing” them through the in-app camera view.
- Tesla releases Hardware Version 2 for its vehicles. This far more advanced version can switch lanes automatically, apply the brakes, manage acceleration and deceleration based on road conditions, and use radar to “see” ahead.
- Vision processing units (VPUs), microprocessors designed specifically for machine vision workloads, start to gain traction.
- Online fashion and cosmetics retailer ASOS adds visual search, letting shoppers find products by uploading photos. Other apps soon follow suit.
- Candy retailer Lolli & Pops deploys facial recognition in its physical stores to identify frequent customers for a rewards program.
- Google announces Google Maps Live View, which overlays augmented reality directions on a phone’s live camera view.
- NVIDIA announces that it will make its hyper-realistic face generator StyleGAN open source.
The Current State: What Can Computer Vision Do Now?
As you can see from the timeline, in roughly seventy years science fiction has become scientific reality. Not only is the technology functional, but it has also moved from cumbersome, processing-heavy hardware to low-cost, low-power embedded chipsets. You can run your apps on standalone edge devices, with no need for cloud computing resources, high-power hardware, or constant connectivity.
With the wealth of training data and models available today, computer vision applications can identify, recognize, and track a wide range of objects. You can use CV to identify many of the things humans can see, such as customers waiting in line in a store, loiterers outside a facility, dropped objects in a public area, or assets on an assembly line. You can also use it to detect things the human eye can't, such as minuscule defects in highly refined products or cancerous cells during a medical procedure.
An Eye Towards the Future
The world is clearly heading towards more CV-enabled devices, especially at the edge. Some estimates put the worldwide total of security cameras at around 350 million. Add the cameras in smartphones, smart home devices, and so forth, and there is a major opportunity for businesses and enterprises to embed deep learning decision-making capabilities into their existing equipment.
In the next five years, you will likely see greater adoption of edge computer vision applications across multiple industries. Security enterprises will depend on drone assistants to provide bird's-eye views that humans cannot. Grocery stores will keep customers safe and avoid liability by using cameras to detect spills immediately. Retail stores will optimize the layout of products on their floors and gain valuable data about which areas of a shelf customers focus on most. Factories will use computer vision to catch defects in production-line output early and often, supporting preventive maintenance. Consumer electronics companies will make their appliances and products smarter by recognizing
Here are a few examples in various industries where deep learning CV could play a key role:
- Security: An automated, human-detecting drone flyover of a rail yard could reduce the need for dangerous foot patrols.
- Retail: Existing cameras could detect low stock and notify managers.
- Entertainment: Cameras and object tracking CV tools could place viewers of an entertainment program in the action they're watching.
- Health and Wellness: Camera-based fitness tracking embedded in workout equipment, such as treadmills, could adjust training regimens or suggest changes.
- Medicine and Medical Devices: Object classification using sensors that are more sensitive than human vision could assist doctors during diagnosis.
- Agriculture: Harvesting equipment with object counting technology could immediately establish yield numbers.
- Manufacturing: Quality-control stations could use AI pattern recognition to quickly find small circuit board defects.
- Automotive/Transportation: Computer vision tracking and counting tools could feed traffic management systems, helping them reroute autonomous vehicles faster and more effectively.
- Smart Home/Consumer Devices: Smart devices could monitor children's breathing and heart rates to enhance baby monitoring.
- Drones/Aerospace and Aviation: Machine-vision measurement tools could help mechanics increase the accuracy and speed of maintenance checks for aircraft.
Looking farther ahead, you may own a CV-powered self-driving car, and the technology could become as ubiquitous a safety feature as airbags and seatbelts are now. Car manufacturers are already building computer vision into driver-assistance features that guide parallel parking or brake a vehicle when it gets too close to an object.
Beyond the automotive industry, advances in artificial intelligence and computer vision could make CV-enabled devices standard for mission-critical business applications, such as scheduling in retail and ordering in manufacturing.
As technology continues to advance, there is a massive opportunity for CV to support everyday activities, whether that's caring for kids, trying out a new recipe in the kitchen, or buying new clothes.
Computer Vision Key Terms
The core functions of CV all have to do with how your applications treat images and handle objects. Identifying the use case for computer vision as it applies to your business needs is the first step in using deep learning.
As you enter the computer vision field and start to explore building it into your application, here are some important, high-level terms you’ll want to understand:
- Object: When looking for “things” in your image, you’re looking for objects. An object can be almost anything in your image. You’ll set out to identify that object, understand what class it belongs to, and assign it a label with a particular level of confidence.
- Model: Models analyze uploaded or input data to learn what a particular object looks like. They need both positive and negative examples to learn to tell an object apart from everything else. For a model to be accurate, it needs large datasets of quality images or videos; in general, the more high-quality, representative data a model sees during training, the more reliable it becomes. Deep learning CV platforms can provide pre-trained models or let you train your own.
- Device: Smart devices make decisions using AI tools without human input. This includes edge and IoT devices capable of off-cloud and low-power CV applications such as object detection, tracking, and counting.
- Segmentation: Computer vision applications split up images in order to generate regions for object proposals in a process known as segmentation. At a granular level, pixels are grouped according to certain characteristics which then inform how boundaries and objects are identified. The way that the machine segments images is one of the main factors in how quickly it performs other computer vision tasks.
- Detection: Object detection is a CV function that takes image data and uses a trained model to return the probability that a known object is in the frame, usually along with its location as a bounding box. In many pipelines, objects are detected first and then classified, tracked, counted, or recognized.
- Tracking: Object tracking identifies objects in live or recorded video and then follows their positions as they move throughout the field of view. One example of this would be tracking a person as he or she crosses a street at an intersection or observing the movement patterns of animals in a zoo enclosure.
- Counting: Counting functions return the number of objects detected in an image, usually including only detections above a preset probability threshold. For example, you could develop a drone application that counts the damaged plants on a farm after a storm.
- Classification: AI models are often trained to recognize specific classes of objects, such as people, vehicles or circuit patterns. For example, you might want to use image classification to scan pictures of dogs in social media posts to determine which was the most popular breed in a specific market.
- Recognition: Object recognition uses detection to find familiar objects, classification to give them a probability score, and localization to place them in the frame. Recognition is most commonly associated with facial recognition, where biometric security systems match a face against previously recorded images of the same person.
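The Object, Detection, and Counting terms fit together: a detector returns labeled objects with confidence scores, and counting keeps only the detections above a threshold. Here is a minimal Python sketch of that flow, assuming the detector's output has already been collected. The `Detection` class, `filter_detections`, `count_objects`, and the 0.5 default threshold are hypothetical illustrations, not part of any particular platform's API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object: class label, model confidence, and bounding box."""
    label: str
    confidence: float  # probability in [0, 1] reported by the (assumed) detector
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d.confidence >= threshold]


def count_objects(detections, threshold=0.5):
    """Count confident detections per class label."""
    return Counter(d.label for d in filter_detections(detections, threshold))


if __name__ == "__main__":
    # Hypothetical output of a detector for a single frame.
    frame = [
        Detection("person", 0.92, (10, 20, 60, 180)),
        Detection("person", 0.41, (200, 30, 240, 170)),
        Detection("car", 0.88, (300, 100, 520, 260)),
    ]
    print(dict(count_objects(frame)))  # {'person': 1, 'car': 1}
```

Raising or lowering the threshold trades missed objects against false positives, which is why counting results are usually reported together with the threshold used.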
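The Segmentation definition describes grouping pixels by shared characteristics and using those groups to find object boundaries. Modern CV systems usually learn segmentation with a neural network; this sketch swaps in a classic technique instead, intensity thresholding plus connected-component labeling, to show the basic idea. The function names, the 4-connectivity choice, and the threshold value are assumptions made for illustration.

```python
from collections import deque


def segment(image, threshold):
    """Group pixels at or above `threshold` into 4-connected regions.

    image: list of rows of grayscale values (0-255).
    Returns (labels, n_regions), where labels[y][x] is 0 for background
    or a region id from 1 to n_regions.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    n_regions = 0
    for y in range(height):
        for x in range(width):
            if image[y][x] < threshold or labels[y][x]:
                continue  # background pixel, or already part of a region
            n_regions += 1
            labels[y][x] = n_regions
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of this region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] >= threshold and not labels[ny][nx]):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
    return labels, n_regions


def bounding_boxes(labels):
    """Return {region_id: (x_min, y_min, x_max, y_max)} for every labeled region."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, region in enumerate(row):
            if region:
                x0, y0, x1, y1 = boxes.get(region, (x, y, x, y))
                boxes[region] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


if __name__ == "__main__":
    image = [
        [0,   0,   0,   0,   0],
        [0, 200, 210,   0,   0],
        [0, 190,   0,   0, 250],
        [0,   0,   0, 240, 255],
    ]
    labels, n = segment(image, threshold=128)
    print(n, bounding_boxes(labels))  # 2 {1: (1, 1, 2, 2), 2: (3, 2, 4, 3)}
```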
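The Tracking definition describes following objects as they move from frame to frame. One simple way to do this, swapped in here for illustration rather than taken from any specific product, is greedy matching by intersection over union (IoU): each new bounding box inherits the ID of the previous-frame box it overlaps most, and a box with no sufficient overlap starts a new track. The `IouTracker` class and its 0.3 default overlap threshold are hypothetical; production trackers typically add motion prediction and handle objects that are briefly hidden.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy tracker: match each new box to the unmatched track it overlaps most."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}  # track id -> box from the previous frame
        self.next_id = 1

    def update(self, boxes):
        """Assign a track id to each box in the current frame; return {id: box}."""
        unmatched = dict(self.tracks)
        current = {}
        for box in boxes:
            best_id, best_score = None, self.min_iou
            for track_id, prev in unmatched.items():
                score = iou(box, prev)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:  # no good overlap: start a new track
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]  # each track matches at most one box
            current[best_id] = box
        self.tracks = current  # tracks not seen in this frame are dropped
        return current


if __name__ == "__main__":
    tracker = IouTracker()
    print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))  # ids 1 and 2
    print(tracker.update([(2, 0, 12, 10), (51, 50, 61, 60)]))  # same ids, boxes moved
```

In the street-crossing example from the definition, the pedestrian would keep the same track ID while their box shifts slightly between frames.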
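The Classification definition uses the example of finding the most popular dog breed in social media photos. A common way to turn a classifier's raw scores into class probabilities is the softmax function; the sketch below applies it to each photo and tallies the top predictions. It assumes a trained model has already produced one score per breed for each photo; the breed list, the scores, and the helper names are hypothetical.

```python
import math
from collections import Counter


def softmax(logits):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, class_names):
    """Return (class_name, probability) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs[best]


def most_popular(all_logits, class_names):
    """Classify every image and return the most frequently predicted class."""
    counts = Counter(classify(logits, class_names)[0] for logits in all_logits)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    breeds = ["labrador", "poodle", "beagle"]
    # Hypothetical classifier scores for three social media photos.
    photos = [[2.0, 0.5, 0.1], [1.5, 0.2, 0.3], [0.1, 0.3, 2.2]]
    print(most_popular(photos, breeds))  # labrador
```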
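For the facial recognition case in the Recognition definition, a common approach (used by FaceNet-style models) is to turn each face into a numeric embedding and compare a new face against stored embeddings of known people. The sketch below assumes those embeddings already exist and matches by cosine similarity. The toy 3-element vectors, the names, and the 0.8 threshold are illustrative assumptions; a real system would tune the threshold on its own data.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so comparisons depend only on direction."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def recognize(query, gallery, threshold=0.8):
    """Return the name of the most similar known embedding, or None if nothing clears the threshold."""
    best_name, best_score = None, threshold
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Toy 3-D embeddings; real face embeddings have many more dimensions.
    gallery = {"alice": [0.9, 0.1, 0.0], "bob": [0.0, 0.2, 0.95]}
    print(recognize([0.85, 0.15, 0.05], gallery))  # alice
    print(recognize([0.1, 0.9, 0.1], gallery))     # None
```

Returning `None` below the threshold matters for security uses: an unknown face should be rejected rather than matched to the closest enrolled person.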
Challenges Deploying CV on Embedded Devices
A company may look to reduce data transfer and storage costs by deploying deep learning capabilities directly on the device it manufactures or sells. Whether because of hardware limitations, connectivity issues, environmental factors, or resource constraints, building deep learning applications for edge devices can be a significant challenge. Developers need to consider the size of their models, the processing requirements of their app, and several other factors when working on an embedded device. It is also important to think about how a company can scale its application from a prototype to hundreds (or thousands) of devices.
These challenges are being met by new entrants in the computer vision space that aim to keep processing requirements low and file sizes small, and that use API platforms to plug in core CV functions without a time-intensive development process.
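Model size is often the first constraint to check on an embedded device, and a rough estimate is simple arithmetic: number of parameters times bytes per parameter. The sketch below assumes weights are stored densely at a single precision and ignores activations, runtime buffers, and file-format overhead. The 4.2-million-parameter model and the 8 MB budget are hypothetical figures; reducing precision to 8-bit integers (quantization) is one widely used way to shrink a model, usually at some cost in accuracy.

```python
# Bytes needed to store one weight at each numeric precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_mb(num_params, dtype="float32"):
    """Approximate size of a model's weights in megabytes (1 MB = 1,000,000 bytes).

    Counts weights only; activations, runtime buffers, and file-format
    overhead are ignored.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


def fits_budget(num_params, dtype, budget_mb):
    """True if the weights alone fit within a memory or storage budget."""
    return weights_size_mb(num_params, dtype) <= budget_mb


if __name__ == "__main__":
    params = 4_200_000  # hypothetical small vision model
    for dtype in ("float32", "float16", "int8"):
        print(dtype, weights_size_mb(params, dtype), fits_budget(params, dtype, budget_mb=8))
```

For this hypothetical model, the weights drop from 16.8 MB at float32 to 4.2 MB at int8, which is the difference between exceeding and fitting the assumed 8 MB budget.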
How Do I Get Started?
A Look at Our Platform
We want to give developers in any enterprise or industry the tools to deploy CV simply and affordably, without needing to dive into the inner workings of deep learning. Our robust SDK, pre-trained model libraries, and open API platform make integrating CV a straightforward process.
What we promise you:
- An intuitive, easy-to-use API platform
- A platform that gives you the power to apply deep learning computer vision to your specific business needs
- A system that interfaces with and supports a broad range of embedded and IoT devices
- Pre-trained models, or the ability to use your own models built from your custom training data
- A robust SDK to get you up-and-running quickly
- Simple APIs that let you use complex, highly technical frameworks
- The ability to implement core CV services quickly, such as object counting, tracking, classification, and detection
We would love for you to test out our platform. It will work on any 32- or 64-bit ARM-based developer board running Linux.
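Before installing anything, you may want to confirm that a board matches the stated requirement of 32- or 64-bit ARM running Linux. The sketch below is a hypothetical pre-flight check, not part of the platform itself; it uses Python's standard `platform` module, and the list of ARM machine strings is an assumed, non-exhaustive set of values that Linux commonly reports.

```python
import platform

# Machine strings commonly reported by `uname -m` / platform.machine() on ARM Linux
# (an assumed, non-exhaustive list).
ARM_32_BIT = {"armv6l", "armv7l", "armv8l"}
ARM_64_BIT = {"aarch64"}


def arm_linux_bitness(system=None, machine=None):
    """Return 32 or 64 if the host is ARM Linux, otherwise None.

    Arguments default to the current machine as reported by Python's
    standard `platform` module; pass them explicitly to check another board.
    """
    system = platform.system() if system is None else system
    machine = (platform.machine() if machine is None else machine).lower()
    if system != "Linux":
        return None
    if machine in ARM_64_BIT:
        return 64
    if machine in ARM_32_BIT:
        return 32
    return None


if __name__ == "__main__":
    bits = arm_linux_bitness()
    if bits is None:
        print("This host does not look like an ARM Linux board.")
    else:
        print(f"Detected {bits}-bit ARM Linux.")
```

Note that `platform.machine()` reflects the kernel's architecture, so a 32-bit operating system running on a 64-bit kernel may still report `aarch64`.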
We can’t wait to see what you build. Start deploying your computer vision solutions as quickly and easily as possible.