Computer vision-based deep learning projects might seem far beyond anything you and your development team have tackled in the past. However, although computer vision is an emerging technology, its application development cycle remains relatively similar to that of many projects you may already be familiar with.
It’s important to start by considering the requirements for the application. Who is it for? What do they want it to do? What kind of budget are you working with? Next, it’s vital to identify what can and can’t be done with machine learning. Is it within the scope of the machine intelligence to do what you want it to? It’s also important to define who and what will generate the input data. For example, a video feed from a static camera inside an elevator will be very different from a video stream on a consumer smartphone.
Once you’ve answered the aforementioned questions, you can move on to the more familiar territory of:
- Data collection and annotation
We’ll cover each phase in detail, from prototyping through deployment and maintenance. Our goal is to provide a wider understanding of CV deployment among those who want to get underway, but perhaps still need to iron out their entry strategy.
Prototyping Your Model
There may be multiple use-cases for the application, but it's often helpful to isolate a single use-case for prototyping purposes. Note that prototyping will sometimes create an additional burden; for example, the same set of images may need to be labeled multiple times — once for prototyping, and then again for building the production-grade application. It is possible to avoid this redundant process with careful planning up front, but sometimes datasets change during production and rework is necessary. That said, it’s generally good practice to prototype first. Being able to actually work with something — albeit likely with a trimmed-down feature set — is incredibly valuable in determining an array of needs, as well as potentially exposing blind spots. Even highly experienced developers are unlikely to get everything right during the first pass.
Once a particular use-case has been isolated, we should define the input and output requirements of the application. For example, we may want to count the number of people in the queue at any given time in a retail store, and we might also want to know how many items are purchased by each customer. It is recommended, however, to prototype these use-cases individually, end-to-end, and then optimize. You may need different amounts of data, models, training, and testing for each of the use-cases. Once you have established the initial problem, it’s time to collect data. In our retail store example, you will need to detect and track people in the frame, and then determine application logic based on a person’s potential presence in the queue.
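The queue-counting logic described above can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes a detector and tracker already supply one box per person with a stable ID, and it assumes the queue can be approximated by a fixed rectangle in the camera image. The names `TrackedBox`, `QueueRegion`, and `people_in_queue` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedBox:
    """One tracked person detection in a single frame (pixel coordinates)."""
    track_id: int
    left: float
    top: float
    width: float
    height: float

    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)


@dataclass(frozen=True)
class QueueRegion:
    """Hypothetical axis-aligned rectangle marking where the queue forms."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point):
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def people_in_queue(boxes, region):
    """Return the set of track IDs whose box center lies inside the region."""
    return {box.track_id for box in boxes if region.contains(box.center())}


# Made-up boxes for one frame; a real system would get these from the tracker.
region = QueueRegion(x_min=0, y_min=0, x_max=100, y_max=50)
frame = [
    TrackedBox(track_id=1, left=10, top=10, width=20, height=20),   # center (20, 20)
    TrackedBox(track_id=2, left=90, top=40, width=40, height=40),   # center (110, 60)
    TrackedBox(track_id=3, left=200, top=10, width=20, height=20),  # center (210, 20)
]
print(len(people_in_queue(frame, region)))  # prints 1
```

Using the box center is only one possible rule. Depending on the camera angle, the bottom edge of the box (roughly where the feet are) may be a better indicator of where a person is standing.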
It is important to not go overboard in data collection during the prototyping phase. If there is a pre-trained model available that does person detection, use it! The goal is to see if you need to collect and create a dataset for your problem. If you are lucky enough to get an annotated dataset, and/or a working model for your problem, simply collect data from the deployment system. You may find that the model does not work well due to the camera angles, poor lighting, etc. By adapting the model parameters, such as those that determine the scale/size of objects that will be detected, you may be able to retrain the model for your particular use-case. You can try out different models as well. You may be able to save a lot of time and resources if you can skip the dataset-creation phase of the project.
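One inexpensive way to judge whether an off-the-shelf model is good enough is to hand-count people in a handful of frames from the deployment camera and compare those counts with the model's output. The sketch below assumes you have already produced both lists; the function name `count_agreement` and the numbers are illustrative only.

```python
def count_agreement(predicted_counts, hand_counts):
    """Compare per-frame person counts from a model against hand counts.

    Returns (mean absolute error, fraction of frames counted exactly).
    """
    if len(predicted_counts) != len(hand_counts) or not hand_counts:
        raise ValueError("need two equal-length, non-empty lists")
    errors = [abs(p - t) for p, t in zip(predicted_counts, hand_counts)]
    mean_abs_error = sum(errors) / len(errors)
    exact_fraction = sum(e == 0 for e in errors) / len(errors)
    return mean_abs_error, exact_fraction


# Illustrative numbers only, not measurements from a real deployment.
predicted = [3, 4, 2, 5, 0]
by_hand = [3, 5, 2, 5, 1]
mae, exact = count_agreement(predicted, by_hand)
print(f"mean abs error: {mae:.2f}, exact frames: {exact:.0%}")
```

If the error is already within your tolerance, you may be able to skip dataset creation entirely. If not, the frames where the counts disagree are a natural starting point for deciding what data to collect.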
Data Collection and Annotation
If you discover that it’s an absolute necessity, you should collect and annotate the data, but only for a few images. There's no need to annotate a lot of data, as you may have to redo it. This is by far the most cumbersome part of the project. It is recommended that you do this in-house for the prototyping phase, as it fosters a greater understanding of the computer vision problem, and forces you to think and rethink your annotation strategy based on many possible real-world issues with your application. For example, revisiting our earlier retail example, if multiple people are standing in the line and three of them are in a group (say a mom and two kids), how will your system handle it? Or, if a person joins the queue and then leaves it, how will you handle that? If there’s a lifelike image of a baseball player on a poster in your system’s field of vision (which gets recognized as a person), how will you handle it? Once your annotation strategy is finalized, you will feel much more confident in moving to the next step in the process.
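Writing the annotation strategy down as a small schema can make these edge cases concrete. The sketch below is one possible set of choices, not a recommendation: it assumes a group counts as a single queue unit, that posters are labeled "ignore", and that queue membership is annotated per person. The class `PersonAnnotation` and its fields are hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PersonAnnotation:
    box: tuple                      # (left, top, width, height) in pixels
    label: str = "person"           # "person", or "ignore" for posters and similar
    group_id: Optional[int] = None  # shared by people who are together
    in_queue: bool = False


def queue_units(annotations):
    """Count queue occupants, treating each group as a single unit."""
    units = set()
    for index, ann in enumerate(annotations):
        if ann.label != "person" or not ann.in_queue:
            continue
        if ann.group_id is not None:
            units.add(("group", ann.group_id))
        else:
            units.add(("solo", index))
    return len(units)


# Made-up annotations for one frame.
frame_annotations = [
    PersonAnnotation(box=(10, 10, 30, 80), group_id=1, in_queue=True),  # parent
    PersonAnnotation(box=(45, 40, 20, 50), group_id=1, in_queue=True),  # child
    PersonAnnotation(box=(70, 40, 20, 50), group_id=1, in_queue=True),  # child
    PersonAnnotation(box=(120, 10, 30, 80), in_queue=True),             # solo shopper
    PersonAnnotation(box=(300, 5, 40, 90), label="ignore"),             # poster
    PersonAnnotation(box=(400, 10, 30, 80)),                            # left the queue
]
serialized = json.dumps([asdict(a) for a in frame_annotations])
print(queue_units(frame_annotations))  # prints 2
```

Annotating a few frames against a schema like this tends to surface the ambiguous cases quickly, which is the point of keeping the first round of annotation in-house.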
Testing, Testing, Testing
Once you have accounted for the edge cases above, begin testing the models, along with the rest of the system, on the video feed from the deployment. It's important to understand the accuracy targets before proceeding to the next phase, and in particular whether accuracy is data-limited or model-limited. If it is data-limited, collecting more data should help. If it is model-limited, adding more data will not help and you will need to look at other models. As with any application, testing will uncover some unexpected results. Make sure that your team is flexible in handling problems as they arise.
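A rough way to tell the two cases apart is to train on increasing amounts of data and watch the validation accuracy. The sketch below is a simple heuristic under stated assumptions: it looks only at the gain between the two largest training sizes, and the `min_gain` threshold of 0.01 is an arbitrary illustrative value. The function name `diagnose_limit` and the accuracy figures are made up.

```python
def diagnose_limit(train_sizes, val_accuracies, min_gain=0.01):
    """Heuristic: is validation accuracy still improving as data grows?

    Returns "data-limited" if the last increase in training data still
    produced a gain of at least min_gain, otherwise "model-limited".
    """
    if len(train_sizes) != len(val_accuracies) or len(train_sizes) < 2:
        raise ValueError("need at least two matching measurements")
    if list(train_sizes) != sorted(train_sizes):
        raise ValueError("train_sizes must be in increasing order")
    last_gain = val_accuracies[-1] - val_accuracies[-2]
    return "data-limited" if last_gain >= min_gain else "model-limited"


sizes = [100, 200, 400, 800]
still_climbing = [0.60, 0.68, 0.75, 0.81]  # illustrative values
flattened_out = [0.70, 0.78, 0.80, 0.802]  # illustrative values
print(diagnose_limit(sizes, still_climbing))  # prints data-limited
print(diagnose_limit(sizes, flattened_out))   # prints model-limited
```

A single pair of measurements can be noisy, so in practice you would likely repeat each training run or look at the whole curve before committing to more annotation or to a different model.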
As with any software project, having access to debugging tools is invaluable. Debugging tools for model development are readily available and widely used, but the need for tools that help developers quickly understand and isolate failures in the deployed system is not well appreciated. Consider an application whose performance depends on a combination of models, computer vision tracking, and other pre- and post-processing components. When such a complex system returns a failure case, it can be hard to isolate the sub-system that is the root cause. It is crucial to develop and use system-level debugging tools that support testing and are well understood by the entire development team. This is similar to the system integration testing carried out in traditional software development.
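As a minimal sketch of what a system-level debugging aid might look like, the helper below runs each pipeline stage in order and records a trace entry per stage, so a failure can be attributed to a specific sub-system. The stage functions are stand-ins with a deliberately simulated bug; `run_traced` and the stage names are hypothetical, and a real pipeline would call actual detection and tracking code.

```python
import time


def run_traced(stages, frame):
    """Run (name, function) stages in order, recording one trace entry each.

    Stops at the first stage that raises, so the trace ends at the failure.
    """
    trace = []
    data = frame
    for name, stage in stages:
        started = time.perf_counter()
        try:
            data = stage(data)
        except Exception as error:
            trace.append({"stage": name, "ok": False, "error": repr(error)})
            return None, trace
        elapsed = time.perf_counter() - started
        trace.append({"stage": name, "ok": True, "seconds": elapsed, "output": data})
    return data, trace


# Stand-in stages for illustration only.
def detect(frame):
    return [{"box": (10, 10, 20, 20)}, {"box": (50, 10, 20, 20)}]


def track(detections):
    # Simulated bug: the tracker expects a "score" field the detector never set.
    return [dict(d, track_id=i, score=d["score"]) for i, d in enumerate(detections)]


def count(tracks):
    return len(tracks)


stages = [("detect", detect), ("track", track), ("count", count)]
result, trace = run_traced(stages, frame="raw frame placeholder")
failed = [entry["stage"] for entry in trace if not entry["ok"]]
print(failed)  # prints ['track']
```

Because the trace also keeps each successful stage's output and timing, the same record can be saved alongside the failing frame and replayed later by anyone on the team.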
Deployment, Optimizations, and Maintenance
As in traditional software development, computer vision-based software has deployment, maintenance, and optimization phases. During the initial deployment it is important to analyze any failures and their associated costs. For example, the cost of missing some detections can differ greatly from the cost of picking up a few false detections. Another aspect to consider is the effective frames per second (FPS) achieved on the live system. In certain cases, the detection latency and FPS performance criteria may not be met when processing times vary or when compute resources are shared among applications. These observations can be useful in optimizing the application. The maintenance phase usually consists of adding new objects to detect or incorporating the data from failure cases.
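Both observations can be turned into simple numbers. The sketch below assumes you can assign a rough cost to each kind of error and that you log a timestamp whenever a frame finishes processing. The function names, the cost values, and the timestamps are all illustrative assumptions.

```python
def error_cost(missed, false_alarms, cost_per_miss, cost_per_false_alarm):
    """Total cost of a set of errors under application-specific unit costs."""
    return missed * cost_per_miss + false_alarms * cost_per_false_alarm


def effective_fps(frame_done_times):
    """Frames per second from frame-completion timestamps (in seconds)."""
    if len(frame_done_times) < 2:
        raise ValueError("need at least two timestamps")
    elapsed = frame_done_times[-1] - frame_done_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must span a positive duration")
    return (len(frame_done_times) - 1) / elapsed


# Two hypothetical detector settings, with a miss assumed 5x as costly
# as a false detection.
sensitive = error_cost(missed=2, false_alarms=10, cost_per_miss=5.0, cost_per_false_alarm=1.0)
strict = error_cost(missed=8, false_alarms=1, cost_per_miss=5.0, cost_per_false_alarm=1.0)
print(sensitive, strict)  # prints 20.0 41.0

# One slow frame (0.2 s to 0.5 s) pulls the average well below 10 FPS.
done_times = [0.0, 0.1, 0.2, 0.5, 0.6]
print(round(effective_fps(done_times), 2))  # prints 6.67
```

With these made-up costs the more sensitive setting is cheaper despite producing more false detections. With a different cost ratio the conclusion could flip, which is why the costs need to come from the application rather than from the model.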
Get Started Building Computer Vision Applications
While there are some differences specific to CV, implementing it is still akin to the standard software development process. What matters most is to learn how the system operates, to understand what data is important and what isn’t, and to make sure you’re building the right solution. alwaysAI can get you up to speed in your CV project quickly and reliably. Why build it from scratch when we’ve done a lot of the heavy lifting for you?