Computer vision-based deep learning projects can seem like a completely different kind of animal compared to what you and your development team have tackled in the past. Even though it is an emerging technology, the software development cycle remains relatively similar to many projects you may be more used to.
Always start with the business requirements for the application. Who is it for? What do they want it to do? What kind of budget are you working with? Next, identify what can and can’t be done with machine learning. Is the task you have in mind within reach of current models? It’s also important to define who and what will generate the input data. For example, a video feed from a static camera inside an elevator will be very different from a video stream on a consumer smartphone.
Once you’ve answered the aforementioned questions, you can move on to the more familiar territory of:
- Prototyping
- Data collection and annotation
- Testing
- Deployment, optimization, and maintenance
We’ll cover each of these phases in detail. Our goal is to achieve a wider understanding of CV deployment among those who want to get underway, but perhaps still need to iron out their entry strategy.
Prototyping Your Model
There may be multiple use-cases in the application, but in many cases it is important to isolate a single use-case for prototyping purposes. Note that prototyping will sometimes create an additional burden. For example, the same set of images may need to be labeled multiple times: once for prototyping, and again for building the production-grade application. This doubling of work can be avoided with careful planning up front, but sometimes datasets change during production and rework is necessary. That said, it’s generally good practice to prototype first. Being able to actually work with something, even with a trimmed-down feature set, is incredibly valuable in determining an array of needs, as well as potentially exposing blind spots. Even highly experienced developers are unlikely to get everything right during the first pass.
Once a particular use-case has been isolated, we should define the input and output requirements of the application. For example, we may want to count the number of people in the queue at any given time at a retail store. The same store may have other use-cases, such as counting how many items each customer purchased. It is recommended, however, to prototype these use-cases individually, end-to-end, and then optimize. You may need different amounts of data, models, training, and testing for each of the use-cases. Once you have the initial problem, it’s time to collect data. In our retail store example, you will need to detect and track people in the frame, and write application logic that decides whether each person is likely to be in the queue.
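To make the queue example concrete, here is a minimal sketch of the application logic only. It assumes a detector has already produced one bounding box per person, and that the queue occupies a fixed rectangle in the camera frame. The `Box` type, the `queue_region` coordinates, and the sample detections are all hypothetical values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def foot_point(self) -> tuple[float, float]:
        # The bottom-center of a person box approximates where they stand.
        return ((self.x_min + self.x_max) / 2.0, self.y_max)


def in_region(point: tuple[float, float], region: Box) -> bool:
    """Return True if the point lies inside the region (edges included)."""
    x, y = point
    return region.x_min <= x <= region.x_max and region.y_min <= y <= region.y_max


def count_in_queue(people: list[Box], queue_region: Box) -> int:
    """Count the people whose foot point falls inside the queue region."""
    return sum(in_region(p.foot_point(), queue_region) for p in people)


# Assumed queue area and made-up detections for a single frame.
queue_region = Box(100, 200, 400, 480)
detections = [
    Box(120, 100, 180, 300),  # standing in the queue area
    Box(350, 150, 420, 470),  # standing in the queue area
    Box(500, 100, 560, 300),  # elsewhere in the store
]
queue_length = count_in_queue(detections, queue_region)
print(queue_length)
```

Using the foot point rather than the box center is one possible design choice; with an overhead or steeply angled camera, a different reference point may work better.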
It is important not to go overboard with data collection during the prototyping phase. If there is a pre-trained model available that does person detection, use it! The goal is to see whether you need to collect and create a dataset for your own problem. If you are lucky enough to get an annotated dataset and/or a working model for your problem, collect sample data from the deployment system and try the model on it. You may find that the model does not work well due to the camera angles, poor lighting, etc. By adjusting the model parameters, such as those that determine the scale/size of objects that will be detected, you may be able to adapt the model to your particular use-case. You can try different models as well. You may be able to save a lot of time and resources if you can skip the dataset creation phase of the project.
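One low-cost way to experiment with size and confidence settings is to filter a pre-trained detector's output before any retraining. The sketch below is an illustration, not a specific library's API: the `Detection` tuple, the threshold defaults, and the sample values are assumptions. Note that it filters results after the model has run, which is a simpler stand-in for tuning parameters inside the model itself.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One detector result (hypothetical structure)."""
    label: str
    confidence: float
    width: float   # box width in pixels
    height: float  # box height in pixels


def filter_detections(
    detections: List[Detection],
    label: str = "person",
    min_confidence: float = 0.5,
    min_height: float = 40.0,
    max_height: float = 400.0,
) -> List[Detection]:
    """Keep detections of one class within a confidence and size range."""
    return [
        d for d in detections
        if d.label == label
        and d.confidence >= min_confidence
        and min_height <= d.height <= max_height
    ]


# Made-up output from a pre-trained model on one deployment frame.
raw = [
    Detection("person", 0.91, 60, 180),
    Detection("person", 0.35, 55, 170),  # dropped: low confidence
    Detection("person", 0.80, 10, 25),   # dropped: too small for this camera
    Detection("chair", 0.88, 70, 90),    # dropped: wrong class
]
kept = filter_detections(raw)
print(kept)
```

If sweeping these thresholds on a handful of deployment frames gives acceptable results, that is some evidence you can postpone building your own dataset.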
Data Collection and Annotation
If you discover that it’s an absolute necessity, you should collect and annotate the data, but only for a few images. It’s best to not go overboard and annotate a lot of data as you may have to redo it. This is by far the most cumbersome part of the project. It is recommended that you do this in-house for the prototyping phase, as it fosters a greater understanding of the computer vision problem, and forces you to think and rethink your annotation strategy based on many possible real-world issues with your application. For example, revisiting our earlier retail example, if multiple people are standing in the line and three of them are in one group (say a mom and two kids), how will your system handle it? Or if a person joins the queue and leaves it, how will you handle that? If there’s a lifelike image of a baseball player on a poster in your system’s field of vision (which gets recognized as a person), how will you handle it? Once your annotation strategy is finalized, you will feel much more confident in moving to the next step in the process.
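Questions like "what if a person joins the queue and leaves it?" eventually become explicit rules in the application. As one possible rule, the sketch below counts a tracked person as queued only after they have been inside the queue region for a minimum number of consecutive frames. The `QueueMembership` class, the `min_frames` value, and the track IDs are hypothetical, and the tracker that assigns the IDs is assumed to exist elsewhere.

```python
from collections import defaultdict
from typing import Dict, Set


class QueueMembership:
    """Treat a track as queued after min_frames consecutive frames in region."""

    def __init__(self, min_frames: int = 3) -> None:
        self.min_frames = min_frames
        self._streak: Dict[int, int] = defaultdict(int)

    def update(self, ids_in_region: Set[int]) -> Set[int]:
        """Feed the track IDs seen in the queue region for one frame."""
        # A track that left the region loses its streak and starts over.
        for track_id in list(self._streak):
            if track_id not in ids_in_region:
                del self._streak[track_id]
        for track_id in ids_in_region:
            self._streak[track_id] += 1
        return {t for t, n in self._streak.items() if n >= self.min_frames}


membership = QueueMembership(min_frames=3)
# Track 1 stays put; track 2 steps in, leaves, then returns and stays.
frames = [{1}, {1, 2}, {1}, {1, 2}, {1, 2}, {1, 2}]
history = [membership.update(ids) for ids in frames]
print(history)
```

A rule like this does not solve the poster problem or the family-group problem; those need their own decisions, which is exactly why working through annotation in-house is useful.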
Testing, Testing, Testing
Assuming you have encountered and accounted for the aforementioned edge cases in your real-world application, you should begin testing the models, as well as the complete system, on the video feed from the deployment. Strive to understand the accuracy targets before proceeding to the next phase. Understand whether the accuracy is data-limited or model-limited. If it isn’t data-limited, adding more data will not help and you will need to look at other models. Similarly, if the performance is limited by the amount of data, then altering the models is unlikely to yield the desired increase in accuracy. As with any application, testing will uncover some unexpected results. Make sure that your team is flexible in handling problems as they arise.
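A rough way to tell the two cases apart is to train on increasing amounts of data and watch validation accuracy. The sketch below is a simple heuristic under stated assumptions, not a rigorous test: the `diagnose` function, the `min_gain` threshold, and every accuracy figure are invented for illustration.

```python
from typing import Dict


def diagnose(val_accuracy_by_size: Dict[int, float], min_gain: float = 0.01) -> str:
    """Classify a learning curve as data-limited or model-limited.

    If the latest increase in training-set size still raised validation
    accuracy by at least min_gain, more data is probably still helping.
    Otherwise the curve has flattened and the model is the likelier limit.
    """
    sizes = sorted(val_accuracy_by_size)
    if len(sizes) < 2:
        raise ValueError("need accuracy at two or more training-set sizes")
    gain = val_accuracy_by_size[sizes[-1]] - val_accuracy_by_size[sizes[-2]]
    return "data-limited" if gain >= min_gain else "model-limited"


# Illustrative numbers only: training-set size -> validation accuracy.
still_improving = {100: 0.62, 200: 0.70, 400: 0.77}
flattened = {100: 0.80, 200: 0.81, 400: 0.812}

print(diagnose(still_improving))
print(diagnose(flattened))
```

In practice you would repeat each training run a few times, since a single noisy run can make a flat curve look like it is still climbing.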
As with any software project, having access to debugging tools is invaluable. While debugging tools for model development are readily available and widely used, the need for tools that help developers quickly understand and isolate failures in the deployed system is not well appreciated. Consider a scenario in which application performance depends on a combination of models, computer vision tracking, and other pre- and post-processing algorithmic components. When you deploy such a complex system and it returns a failure case, it can be hard to isolate the sub-system at the root of the failure. It is crucial to develop and use system-level debugging tools that can be used for testing and are well understood by the entire development team. This is similar to the system integration testing carried out in traditional software development.
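One simple form of system-level debugging is to record what every stage of the pipeline produced for a given frame, so a failure can be traced to the stage that caused it. The sketch below assumes a pipeline made of plain functions; the `detect`, `track`, and `count` stages are toy stand-ins for real components, and `run_with_trace` is a hypothetical helper.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_with_trace(stages: List[Stage], frame: Any) -> Tuple[Any, List[tuple]]:
    """Run stages in order, recording (name, status, output or error)."""
    trace = []
    value = frame
    for name, fn in stages:
        try:
            value = fn(value)
            trace.append((name, "ok", value))
        except Exception as exc:  # record the failure instead of crashing
            trace.append((name, "error", repr(exc)))
            return None, trace
    return value, trace


# Toy stand-ins for the real sub-systems.
def detect(frame: dict) -> list:
    return frame["boxes"]


def track(boxes: list) -> dict:
    if not boxes:
        raise ValueError("no detections to track")
    return {track_id: box for track_id, box in enumerate(boxes)}


def count(tracks: dict) -> int:
    return len(tracks)


stages = [("detect", detect), ("track", track), ("count", count)]

result, trace = run_with_trace(stages, {"boxes": []})
failed_stage = next((name for name, status, _ in trace if status == "error"), None)
print(failed_stage, trace)
```

Saving traces like this alongside the offending frames gives the whole team a shared record to replay when a failure is reported from the field.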
Deployment, Optimizations, and Maintenance
Similar to traditional software development, computer-vision-based software has deployment, maintenance, and optimization phases. Typically, during the initial deployment it is advisable to keep a human in the loop and not rely completely on the output of the application. It is also important to analyze the failures and their associated costs. For example, the cost of missing some detections can differ greatly from the cost of a few spurious ones. Another aspect to consider is the effective frames per second (FPS) achieved on the live system. In certain cases, the detection latency and FPS criteria may not be met when processing times vary or when compute resources are shared among applications. These observations can be useful in optimizing the application. The maintenance phase usually consists of adding new object classes to detect, or incorporating the data from failure cases.
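Both ideas in this section can be measured with very little code. The sketch below weighs missed detections against spurious ones and computes effective FPS from the times at which frames finished processing. The function names, the per-error costs, and the timestamps are all assumed values for illustration; real costs come from your business requirements.

```python
from typing import Sequence


def expected_cost(missed: int, spurious: int,
                  cost_per_miss: float, cost_per_spurious: float) -> float:
    """Total cost of a run, weighting the two error types differently."""
    return missed * cost_per_miss + spurious * cost_per_spurious


def effective_fps(frame_timestamps: Sequence[float]) -> float:
    """Frames per second achieved, from frame completion times in seconds."""
    if len(frame_timestamps) < 2:
        return 0.0
    elapsed = frame_timestamps[-1] - frame_timestamps[0]
    if elapsed <= 0:
        return 0.0
    return (len(frame_timestamps) - 1) / elapsed


# Assume a missed person costs ten times as much as a false alarm.
cost_a = expected_cost(missed=5, spurious=20, cost_per_miss=10.0, cost_per_spurious=1.0)
cost_b = expected_cost(missed=12, spurious=3, cost_per_miss=10.0, cost_per_spurious=1.0)

# One slow frame (0.2 s to 0.5 s) pulls the effective rate well below 10 FPS.
timestamps = [0.0, 0.1, 0.2, 0.5, 0.6]
fps = effective_fps(timestamps)
print(cost_a, cost_b, round(fps, 2))
```

Under these assumed costs, the configuration with more false alarms but fewer misses is cheaper overall, which is the kind of trade-off a human in the loop can help validate during initial deployment.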
Get Started Building Computer Vision Applications
While there are some differences specific to CV, implementing it is still akin to the tried and true software development process. It’s most important to understand what’s different from what you’re used to: learning how the system operates, understanding what data is important and what isn’t, and making sure you’re building the right solution. alwaysAI can get you up to speed in your CV project quickly and reliably. Why build it from scratch when we’ve front-loaded a lot of the heavy lifting for you?