Anatomy of a Successful Computer Vision Project

Can your organization afford to waste years in development cycles to build production-ready Computer Vision solutions: gathering more data, re-training, re-testing, re-configuring the system?

Building and deploying a Computer Vision (CV) application into production requires the integration of various pieces of equipment and programming. The latest advances in CNN-based machine learning techniques have opened up practical solutions to a wide variety of Computer Vision problems. However, if developers fail to follow some simple guidelines to avoid common pitfalls, they can waste many painful and superfluous hours getting the Computer Vision solution into production. For a seamless development process, it is essential to find the right tools and methodologies that help CV engineers create Computer Vision applications from scratch quickly and easily. This article defines four key measures an engineer can take to ensure the delivery of a successful Computer Vision project.

Below are four key guidelines to avoid common Computer Vision pitfalls:

1. Prototype Using a Modular Approach

2. Use Production Data for Model Training

3. Simplify with Physical Constraints

4. Develop/Test with Production Hardware

What is Computer Vision?

Computer Vision is a subset of artificial intelligence that enables computers to understand the contents of a digital image or video. A Computer Vision solution’s goal is to understand what is in a video stream and what action to take when certain elements are identified in the stream. At the core of a Computer Vision application is a machine learning model that takes video as an input and outputs predictions about what is in the video. Computer Vision applications are built from a combination of “primitives” such as classification and detection to accomplish complex tasks such as tracking elements of the video feed, recognizing human body language using pose estimation, and detecting people with object detection. Computer Vision on the edge allows you to cost-effectively address “real life” use cases, for example inventory movement on a factory floor, shopper dwell time in a retail store, or compliance with hardhat and masking requirements at a job site.
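To make the idea of composing "primitives" concrete, here is a minimal, hedged sketch in plain Python. The function and class names are illustrative only (they are not part of any real library), and the detector is a stub standing in for actual model inference:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str                       # e.g. "person"
    confidence: float                # 0.0 - 1.0
    box: Tuple[int, int, int, int]   # (x, y, w, h) in pixels

def detect_objects(frame) -> List[Detection]:
    """Stub standing in for a real object-detection model."""
    # A real model would run inference on the frame here.
    return [Detection("person", 0.91, (10, 20, 50, 120)),
            Detection("dog", 0.45, (200, 80, 40, 30))]

def people_in_frame(frame, min_conf: float = 0.5) -> List[Detection]:
    """Compose the detection primitive into a task: find people."""
    return [d for d in detect_objects(frame)
            if d.label == "person" and d.confidence >= min_conf]

people = people_in_frame(frame=None)
print(len(people))  # -> 1
```

A real application would chain further primitives (tracking, pose estimation) onto the filtered detections in the same way.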

Should you be using Computer Vision in your business?

Computer Vision will be part of any digital transformation effort for a business with a physical component (i.e., manufacturing, security, retail analytics, and robotics). CV provides businesses with unprecedented insights into customers and enables automation with actionable visual data. Using low-code solutions such as alwaysAI in conjunction with Computer Vision appliances like Eyecloud’s cameras can unlock the practical, simple Computer Vision applications which can serve as stepping stones to more ambitious Computer Vision projects. And because edge hardware is advancing rapidly, the investment in edge processing power needed to support Computer Vision applications is relatively low.

Anatomy of a successful Computer Vision project:

1. Prototype Using a Modular Approach

The complicated nature of CV prototype development is a result of the multiple steps required to build such a production application: data collection and annotation, training Computer Vision models, integrating existing models made with different frameworks into your application, integrating existing Computer Vision libraries in the application, etc. When developing such a solution, it is prudent to ensure the application is flexible and has the potential to evolve along with the production use case. This ensures that the cost and effort associated with the iterative development of your CV application are low. The critical components of the modular development of a prototype are containerization and Application Programming Interface (API) abstraction. This flexibility allows engineers to move to a different model, different model framework, or change the hardware as needed.

At the prototyping stage, an engineer should focus on developing the application’s functionality using a modular approach. In this phase, engineers can begin with pre-trained, off-the-shelf models to evaluate their CV use case, and then swap out a custom CV model.
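The "swap out" step is easiest when the application depends on an interface rather than a concrete model. A minimal sketch of that pattern (all class names here are hypothetical, not an alwaysAI API):

```python
from abc import ABC, abstractmethod

class Detector(ABC):
    """Minimal interface; any model or framework can sit behind it."""
    @abstractmethod
    def detect(self, frame):
        ...

class PretrainedDetector(Detector):
    """Off-the-shelf model used while evaluating the use case."""
    def detect(self, frame):
        return [("person", 0.9)]   # placeholder inference result

class CustomDetector(Detector):
    """Drop-in replacement trained later on production data."""
    def detect(self, frame):
        return [("hardhat", 0.8)]  # placeholder inference result

def run_app(detector: Detector, frame):
    # Application logic depends only on the interface, so the
    # model can be swapped without touching this code.
    return detector.detect(frame)

print(run_app(PretrainedDetector(), None))
print(run_app(CustomDetector(), None))
```

Because `run_app` never names a concrete model, moving from the off-the-shelf model to the custom one is a one-line change at the call site.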

During the prototyping stage, engineers can also try different CV components and APIs, and try various hardware options to test the functionality of the prototype.

The alwaysAI platform bundles software and dependencies into a single unit (containerization) using Docker images to support deployment on different devices such as the Raspberry Pi, Jetson Nano, and, most recently, complete AI camera kits such as the Eyecloud OpenNCC. alwaysAI also leverages other open-source tools such as the Computer Vision Annotation Tool (CVAT), packaging them directly into the alwaysAI distribution via containerization to make installation and usage of these tools simpler.

In addition to containerization, alwaysAI offers additional tools and APIs to accelerate the development of the Computer Vision prototype. These tools allow solution developers to manipulate Computer Vision-related objects, such as key points, bounding boxes, labels, etc. that are returned from model inference, as well as hardware features like cameras, and external entities like files (images, videos, etc.) without having to know all the details behind these requests.
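The alwaysAI API itself is not reproduced here; as an illustration of the kind of bounding-box manipulation such APIs provide, here is a small standalone sketch computing intersection-over-union (IoU), a routine operation on inference results:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Axis-aligned box in pixel coordinates (x1,y1 = top-left)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp to zero so degenerate boxes contribute no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def iou(self, other: "BoundingBox") -> float:
        """Intersection-over-union, a standard box-overlap metric."""
        ix1, iy1 = max(self.x1, other.x1), max(self.y1, other.y1)
        ix2, iy2 = min(self.x2, other.x2), min(self.y2, other.y2)
        inter = BoundingBox(ix1, iy1, ix2, iy2).area()
        union = self.area() + other.area() - inter
        return inter / union if union else 0.0

a = BoundingBox(0, 0, 10, 10)
b = BoundingBox(5, 5, 15, 15)
print(round(a.iou(b), 3))  # -> 0.143
```

Abstractions like this let an application reason about model output (overlap, de-duplication, tracking) without caring which model produced the boxes.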

alwaysAI integrates with other open-source tools such as OpenCV where an alwaysAI app can directly import the OpenCV library without the need to install additional libraries or dependencies. Users can still benefit from leveraging legacy code that relies on cv2.

Because alwaysAI’s platform enables Computer Vision prototypes to be hardware and framework agnostic, using this AI platform to develop a prototype becomes a simpler process. For instance, an engineer can begin to prototype a manufacturing application using a pre-trained object detection model on a Raspberry Pi. Then, the engineer can build and optimize a model using the training tools available on alwaysAI. Once the prototype meets the use case requirements, the engineer can simply transfer the application onto the production edge device.

2. Use Production Data for Model Training

When training a Computer Vision model, the data an engineer uses to generate the CV model will be a key factor in determining the success of the final application. At this stage of the development cycle, engineers have the option of:

  1. Finding a publicly-available dataset
  2. Purchasing a dataset
  3. Generating a custom dataset that closely matches the production environment

For optimal results in the production environment, a Computer Vision application will be most successful when data used to generate the Computer Vision algorithm closely matches the production data. This means collecting the data under the same environmental conditions that closely match the production environment and using the same appliance to generate the data as the one which will be used to make inferences. Leveraging a production-ready AI vision appliance like OpenNCC during development will improve data quality and reduce time wasted on generating data on prototype hardware. Additionally, data collection should factor in environmental variables such as the final camera position, optics, sensor, illumination, and actual production objects/subjects.

It is also important to note the limitations of many deep learning models. While deep learning models are very good at recognition with simple translations or minor scale changes, these models do not automatically handle image or object rotations, or perceive 3D structure from 2D images; these variations must be represented in the training data.
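One practical consequence is that rotations must be added to the training set explicitly, and the annotations must be rotated along with the images. A minimal sketch of the coordinate math involved (pure Python, no imaging library assumed):

```python
import math

def rotate_point(x, y, cx, cy, degrees):
    """Rotate (x, y) around center (cx, cy) by the given angle.

    Used to keep keypoint or box-corner annotations aligned with
    a training image that has been rotated for augmentation.
    """
    rad = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(rad) - dy * math.sin(rad),
            cy + dx * math.sin(rad) + dy * math.cos(rad))

# Augment one annotated keypoint with a 90-degree rotation
# about the origin: (10, 0) maps to (0, 10).
x, y = rotate_point(10, 0, 0, 0, 90)
print(round(x, 6), round(y, 6))  # -> 0.0 10.0
```

Applying the same transform to every labeled point keeps the augmented dataset consistent with its annotations.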

Here is a classic example of how imperceptible noise in the image can significantly degrade the application’s accuracy. These are synthetically generated examples to show the worst-case impact of misclassification.

The same recommendations apply when optimizing a pre-trained model. It is much simpler to use a pre-trained model, apply it in the production environment, and use tools such as alwaysAI’s image capture dashboard to generate a production-level dataset. Once an engineer has generated the custom dataset, she can then annotate the data and begin the training process.
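When capturing production frames for annotation, a common heuristic is to prioritize the frames where the pre-trained model is least certain, since those add the most value per label. A hedged sketch of that selection step (the function and thresholds are illustrative, not an alwaysAI feature):

```python
def frames_to_annotate(predictions, low=0.3, high=0.7):
    """Pick frames where the model's confidence falls in an
    uncertain band -- these production frames are usually the
    most valuable to annotate and fold back into training."""
    return [frame_id for frame_id, conf in predictions
            if low <= conf <= high]

# (frame_id, top prediction confidence) pairs from a capture run
preds = [("f1", 0.95), ("f2", 0.55), ("f3", 0.10), ("f4", 0.62)]
print(frames_to_annotate(preds))  # -> ['f2', 'f4']
```

Confidently-right frames (f1) and confidently-rejected ones (f3) are skipped; the ambiguous middle goes to the annotation queue.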

3. Simplify with Physical Constraints

For many Computer Vision problems, satisfying the robustness and accuracy requirements can be dramatically simplified by reducing environmental variability. Applying physical constraints can reduce the amount of training data and system testing required. Typical approaches include fixing the mounting position of the camera, controlling the illumination of critical aspects of the scene, or constraining the motion of objects in front of the camera. These approaches are usually fast and inexpensive to implement; the biggest challenge is negotiating acceptable limitations with customers (or marketing).

Examples of physical constraints:

Camera Mounting Constraints - Camera angles can have a massive impact on the accuracy of the system. Theoretically, one can train a model to handle additional perspectives but this often diminishes the robustness that could be achieved with a single perspective. Often, doubling the number of perspectives does not double the size of the dataset required, but it does increase training cost and can double the amount of testing required.


Controlled Illumination - Lighting variability (shadows, reflections, and contrast changes) within a dataset can have unfavorable effects on the performance of the machine learning model. Ideally, illumination should be held constant and used to emphasize the critical elements of the scene, since lighting variability can impact both training and testing.


Controlled Subject Orientation / Motion - It is often beneficial to constrain the orientation of the subject/object. One example of this is human face mask detection. It is virtually impossible to reliably detect a face mask on a human face from every orientation, because views from some directions (such as from behind the head) contain no information about the task. This can be solved operationally by requiring the person to look at the camera during the analysis, making the problem more tractable.

By applying some physical constraints to the Computer Vision problem, one can dramatically reduce the amount of data collection and testing required to deliver the needed performance. While these constraints are low-cost and simple to implement, they may reduce some of the flexibility of the system.

4. Develop/Test with Production Hardware

For the system variability that cannot be eliminated using the techniques outlined in Tip #3, it is critical that training data and testing comprehensively represent the entire expected range of operation, including very low-probability exceptions or edge cases. Often, this requires the generation of synthetic training data and test cases to ensure proper system behavior. Doing this analysis early in the development process allows you to explore a larger solution space in dealing with these issues, including more of the physical constraints described in Tip #3.

These issues vary widely, but an example that illustrates this is outdoor security cameras. These devices must handle a very wide range of environmental conditions including snow, rain, night, day, deep shadows, bright sun, unlimited object types, unconstrained object orientation/position, and wide variability in camera mounting position. The good news is that they are generally only required to correctly identify a human in the scene that is moving. The bad news is that a human can be at any orientation, any position, variable size, and is a highly deformable object. Add to this the infinite permutations of clothing, shoes, hats, masks, and other accessories to see how scoping out the edge cases can be very challenging. The use of a production AI vision appliance like OpenNCC during development enables more representative testing under production deployment conditions.
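Synthetic environmental variation can be generated cheaply once the pipeline exists. As a toy illustration of the idea (operating on a flat list of 8-bit grayscale pixel values rather than a real image, so no imaging library is assumed), brightness variants of a captured frame widen test coverage across day/night and shadow conditions:

```python
def vary_brightness(pixels, factor):
    """Synthetically brighten or darken an 8-bit grayscale image
    (a flat list of pixel values), clamping to the valid 0-255
    range, to widen test coverage of lighting conditions."""
    return [min(255, max(0, round(p * factor))) for p in pixels]

frame = [0, 100, 200, 255]
print(vary_brightness(frame, 1.5))  # -> [0, 150, 255, 255]
print(vary_brightness(frame, 0.5))  # -> [0, 50, 100, 128]
```

In practice the same frame would be swept through many factors (and analogous transforms for noise, blur, and contrast) and each variant fed through the full inference pipeline as a regression test.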

Using the methodologies above one can rapidly update the application when new edge-cases are discovered in testing/deployment.

Conclusion:

Building a Computer Vision application from prototype to production is a challenge even with the latest technologies. However, with the right tools and methodologies, your company can avoid the common pitfalls associated with CV application development. Computer Vision is truly transformative for businesses across all industries. If you would like to learn more about how to develop successful Computer Vision applications for production, visit alwaysAI.co or OpenNCC.com for webinars, blogs, and other resources.

This article was written with the input of Dale Hitt, VP of Marketing with EyeCloud.

Get started now

We are providing professional developers with a simple and easy-to-use platform to build and deploy computer vision applications on edge devices. 

Sign Up for Free