Overcoming Challenges in Bringing CV Applications to Production
Developing a Computer Vision (CV) application and bringing it to production requires integrating several pieces of hardware and software.
How can we ensure the pieces work seamlessly together? With the right methodologies, we can expedite the development and deployment of computer vision applications. One key is finding a platform designed to help developers create computer vision applications from scratch quickly and easily, with an integrated set of freely available tools. This article describes some of the challenges in developing and deploying CV applications, and how to mitigate them. You can also watch a replay of our webinar on this subject.
- Computer Vision in the World
- Prototyping and Developing CV Applications
- Deploying Computer Vision Applications
- CV on the Edge
- Challenges with Edge Deployment
- Making it All Work
Computer Vision in the World
First, some background. The term ‘computer vision’ refers to the process of a computer being able to analyze visual data as a human would, and make inferences about what that data contains. When integrated in an application, these inferences can be turned into actionable responses. For instance, take self-driving vehicles: computer vision can be used to analyze real-time video data to detect objects in the road, and then the software running the computer vision model can respond to this detection to stop the car or change its path to avoid the object.
Computer vision is being leveraged more and more to solve diverse real-world problems, in fields ranging from security and health care to manufacturing, smart cities, and robotics. CV can be used to detect cancerous cells in radiology scans, analyze body movements such as gait and posture, or track production on a manufacturing line.
Throughout the rest of this article, we'll explore the challenges of building and deploying a security camera computer vision application. A complete implementation of such an application is available on GitHub.
The application notices when a new person enters the frame and records an image of that person. Here we'll cover the challenges of building such an application, as well as those involved in deploying it, potentially across multiple cameras. If you'd like to read more about the code logic in the application itself, see this blog post.
Prototyping and Developing CV Applications
Computer vision applications involve numerous components and can be extremely complicated to develop. Along with integrating the hardware that does the processing with the application software itself, building a production-ready computer vision application involves a multitude of steps: collecting and annotating data, training computer vision models, integrating existing models built with different frameworks, incorporating existing computer vision libraries, and more. Given the cost and effort it takes to develop a computer vision solution, you also want your application to be flexible, so it can evolve as your problem space evolves. That means being able to use additional model types, move to a different model framework, or change the hardware without breaking your application.
Considering the security application, let’s say you just want to detect people. If you want a model that is guaranteed to work at different times of day, in different weather, etc., as you most likely would for a production-quality application, you may need to collect and annotate images taken from the angle of the camera you’ll use and in different environments and train your own model, iterating through this workflow a few times to achieve the desired performance. However, let’s say you just want a prototype application, in which case you could probably use an existing model that detects people, such as mobilenet_ssd or yolo_v2_tiny.
Ideally, you could try different models to see which one(s) work best, and maybe your application would benefit from using two models at once, to perhaps detect people at different distances, or in different conditions. Once you have your object detection model(s), you can build your application.
You'll need software packages that let you access a video stream from a camera, and you'll need to know how to access the predictions returned by your model, so you can determine whether a person is new to the frame and save the portion of the image corresponding to that person's bounding box.
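To make this concrete, here is a minimal sketch of the "new person" logic. It assumes boxes are simple (x1, y1, x2, y2) tuples and uses a plain NumPy array as a stand-in frame; in the real application, frames would come from a camera stream (e.g. `cv2.VideoCapture`) and boxes from a person-detection model such as mobilenet_ssd.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def is_new_person(box, seen_boxes, overlap_thresh=0.5):
    """Treat a detection as new if it overlaps no previously seen box."""
    return all(iou(box, seen) < overlap_thresh for seen in seen_boxes)

def crop_person(frame, box):
    """Extract the image region inside a detected person's bounding box."""
    x1, y1, x2, y2 = box
    return frame[y1:y2, x1:x2]

# Stand-in for one camera frame and one previously recorded person.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
seen = [(100, 100, 200, 300)]
new_box = (400, 120, 500, 320)
if is_new_person(new_box, seen):
    snapshot = crop_person(frame, new_box)   # image to save to disk
    seen.append(new_box)
```

The overlap threshold and the IoU-based "newness" test are illustrative simplifications; a production application would likely use a proper object tracker.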
Deploying Computer Vision Applications
Another challenge in developing computer vision applications is deployment. Once you have a working prototype of a computer vision application, how do you get it out in the world where it can have an impact?
While the cloud is often seen as a very flexible solution for machine learning applications, it can be prohibitively expensive, have higher latency, and the transmission of data poses a security risk. The alternative and increasingly popular option is to take your CV application to the edge.
CV on the Edge
What is the edge? In general, edge devices are small, lightweight devices on which a computer vision application can be deployed and run. Many edge devices today even have a Graphics Processing Unit (GPU) or Vision Processing Unit (VPU), which enables a greater range of models and application complexity.
In the context of this article, an edge device is a device such as a Raspberry Pi, an NVIDIA Jetson device like the Jetson Nano or TX2, or one of various Internet of Things (IoT) devices: devices that have some capability to sense or assess, and possibly interact with, the environment in which they are used.
While an internet connection can be used to deploy the application, once the application is on the edge device, there is no need for it to have cloud connectivity to function. This means that any inferencing the application does is on the edge device itself, not in the cloud, drastically reducing the time it takes for those inferences to be turned into actions by the application.
For certain use cases, such as self-driving vehicles or security cameras, this is imperative. Aside from the risk of data being lost in transit to and from the cloud, the added latency of a cloud round trip can mean failing to respond to a task in time, which can be catastrophic for applications like autonomous driving.
In addition to edge devices, there are specialized peripheral devices, in particular cameras, that don’t have internet connectivity themselves and are used by edge devices to improve performance or expand application functionality. With devices like these, this concept of processing on the edge is taken even further. While edge devices typically connect to a USB or ribbon camera, which passes the image data to the device for processing, these devices incorporate a processor into the camera itself, reducing the inference time even further.
Additionally, because the data doesn’t need to travel to the cloud with edge deployment, all data can stay in a closed circuit on the device itself, which is much more secure, and, without the need for cloud processing, much less expensive.
Challenges with Edge Deployment
Deploying a production-quality computer vision application on the edge comes with its own set of challenges. With cloud computing you can scale instances as computation demand ramps up; with edge devices, however, you are limited to the power of the individual device. Additionally, with edge deployment you may not have access to device status without integrating additional packages.
Making it All Work
We've discussed the complexity of both developing and deploying computer vision applications; how can we mitigate these challenges and get computer vision applications to production as quickly as possible? A platform like alwaysAI uses two key methodologies to reduce the complexity of this workflow: containerization and Application Programming Interface (API) abstraction.
Containerization refers to bundling software and its dependencies into one unit that can be executed in different environments. alwaysAI leverages containerization, using Docker images to enable deployment on different devices, including the Raspberry Pi and Jetson Nano. It also incorporates open-source computer vision tools that rely on containerization, such as the Computer Vision Annotation Tool (CVAT), directly into its package to make installing and using these tools simpler.
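As a rough idea of what containerizing a CV application looks like, here is an illustrative Dockerfile sketch; the base image, file names, and entry point are assumptions, not alwaysAI's actual build configuration.

```dockerfile
# Hypothetical image for a Python CV application.
FROM python:3.9-slim

WORKDIR /app

# Install the application's dependencies (e.g. opencv-python, numpy).
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and model files into the image.
COPY . .

# Run the application when the container starts.
CMD ["python", "app.py"]
```

Because the image bundles the code, libraries, and runtime together, the same build can be deployed to any device with a compatible architecture and a container runtime.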
alwaysAI has also developed an API to further accelerate development. With this API, users can interact with and manipulate computer vision objects, such as keypoints, bounding boxes, and labels returned from model inference, as well as hardware features like cameras and external entities like image and video files, without having to know all the details behind these requests. How does this work? An API serves as a kind of negotiator between a complicated backend and an end user. It defines what kinds of requests the user can make and how data is returned. This simplifies usage by hiding some of the complexity involved in these calls, and helps ensure that only appropriate requests can be made.
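The facade pattern below sketches this idea in miniature. The `ObjectDetector` class and `BoundingBox` type are hypothetical illustrations of API abstraction, not the actual alwaysAI API: the user calls one simple method, and the framework-specific inference details stay hidden behind it.

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    """Simple, framework-neutral result object handed back to the user."""
    start_x: int
    start_y: int
    end_x: int
    end_y: int

    @property
    def area(self):
        return (self.end_x - self.start_x) * (self.end_y - self.start_y)

class ObjectDetector:
    """Facade that hides backend-specific inference behind one method."""

    def __init__(self, run_inference):
        # run_inference is any backend callable (TensorFlow, OpenCV DNN, ...)
        # that returns raw (label, x1, y1, x2, y2) tuples.
        self._run_inference = run_inference

    def detect(self, frame):
        # The user makes one well-defined request; raw backend output is
        # converted into friendly objects before being returned.
        return [(label, BoundingBox(x1, y1, x2, y2))
                for label, x1, y1, x2, y2 in self._run_inference(frame)]

# Usage with a stubbed backend standing in for a real model:
fake_backend = lambda frame: [("person", 10, 20, 110, 220)]
detector = ObjectDetector(fake_backend)
label, box = detector.detect(None)[0]
```

Swapping model frameworks then only means swapping the backend callable; application code written against `detect` does not change.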
The alwaysAI API wraps other APIs to further facilitate application development. OpenCV is a popular computer vision library; alwaysAI has incorporated some of the most common OpenCV-Python API calls into its own library to simplify using these functions and ensure they work with the rest of the library. In addition, any alwaysAI app can directly import cv2, the OpenCV Python module, without needing to install the library or add any dependencies. This way, legacy code that depends on cv2 still works, and users can fall back to the OpenCV-Python API if desired, while new development can be done with a single library for simplicity.
APIs can also give developers more fine-grained access to hardware. Some specialized cameras, such as the RealSense and OpenNCC Knight cameras, have their own software development kits (SDKs) to facilitate application creation. These SDKs include APIs, which are often sufficient for integrating the cameras into existing applications. While hardware features can be accessed directly through these APIs, a camera-specific API can also be incorporated into a library already being used for application development, which is what we've done with our API. End users can then access these camera features with the same library their application already uses, with an integration that is known to work.
APIs and containerization can also help overcome the deployment challenges outlined above. Using a container-based deployment service, you can deploy to multiple edge devices and monitor their status. Kubernetes and Docker Swarm both enable monitoring and managing multiple Docker containers, as does balena, a commercial service with a freemium option. These services expose APIs of their own, which third-party applications like Prometheus (open source) or Datadog (commercial) can use to gain more insight into raw metrics. While this approach does rely on some cloud connectivity, the actual computer vision data is not transmitted across the network, so you keep the speed and low cost of deploying on the edge; security also improves, since authentication is required to connect to the edge devices and the application data remains on them.
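As one possible shape for such a deployment, here is an illustrative Docker Swarm stack file; the image name, replica count, and node label are hypothetical, and the same idea maps onto Kubernetes manifests or balena fleets.

```yaml
# Deployed with: docker stack deploy -c stack.yml security-cam
version: "3.8"
services:
  security-cam:
    image: registry.example.com/security-cam:latest   # hypothetical image
    deploy:
      replicas: 3                  # one container per edge device
      placement:
        constraints:
          - node.labels.device == jetson-nano   # hypothetical node label
      restart_policy:
        condition: on-failure      # restart the app if it crashes
```

The orchestrator handles distributing the image, restarting failed containers, and reporting per-device status, while the video data itself never leaves each device.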
Let’s return to our security camera application one final time. You’ve built your application, you can swap out different cameras easily and you can store images when objects are detected. You can now use the containerization techniques we’ve mentioned above to coordinate deployment of your application across multiple edge devices, as shown in the diagram below. With the right methodologies, platforms, libraries, and hardware can be seamlessly integrated, making CV applications easier to develop and deploy, as well as scalable for production.
Contributions to the article made by Eric VanBuhler and Stephanie Casola