Using alwaysAI to Build Your Own License Plate Detection Model

In this tutorial, we’ll walk through the steps and decision points involved in the creation of the ‘alwaysai/vehicle_license_mobilenet_ssd’ model, an object detection model for identifying vehicles and license plates. 

Our aim with this tutorial is to provide a detailed overview of how custom models can be trained using the alwaysAI model training tool, starting with data collection and ending with testing your model’s performance. 

For more information about model training, check out this article: Introduction to Computer Vision Model Training.

Fill out this short survey to join our model training beta program and train your own model!


Introduction to the License Plate Model


This model was developed with the intent that it would be general enough to detect both American and European license plates and be applied to a few field use cases, including dashboard cameras and static traffic cameras at:

  • intersections 
  • overpasses
  • parking lots
  • store or house fronts

The model detects four-wheeled vehicles, including 

  • sedans 
  • trucks
  • buses

Data Collection

We collected data from different vantage points that support the desired use cases mentioned above. This included images from highway overpasses, high-angle street views, low-angle street views, parking garages and parking lots. We aimed to capture images that represented readable plates from as many angles and distances as possible. Below are some examples of included images.

Most of the data were originally collected as videos, from which individual frames were then sampled. 

Having a lot of very similar images won’t enhance a model’s performance, so we sampled all of the collected data to ensure that the input data was sufficiently varied. For photos, this simply meant excluding individual photos that were too similar. For videos, a little more work was involved. First, the videos were edited in an application such as QuickTime Player to remove large unusable portions, such as sections of dead feed or footage with no cars. Once the videos were cleaned up a bit, they were sampled using ffmpeg.

The command that was used was the following, where ‘input_video.mp4’ stands in for the source video file:

ffmpeg -i input_video.mp4 -r 2 -q:v 1 image_name_%4d.png

This command sampled 2 frames per second (specified with ‘-r 2’) at the highest quality (specified with ‘-q:v 1’), saving the frames as PNG, which uses lossless compression (unlike JPEG). 

Once the videos were sampled, the resulting images were manually scanned through to eliminate near-duplicates or frames that did not contain objects of interest.
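If you want to pre-filter the sampled frames programmatically before the manual pass, a small script can at least drop byte-identical frames. This is our own illustrative sketch (not part of the alwaysAI tooling), and it only catches exact duplicates; near-duplicates still require a manual scan or perceptual hashing:

```python
import hashlib
from pathlib import Path

def drop_exact_duplicates(frame_dir):
    """Delete byte-identical frames, keeping the first occurrence.

    Returns the paths that were kept. Only exact duplicates are
    detected; visually similar frames must still be reviewed by hand.
    """
    seen = set()
    kept = []
    for path in sorted(Path(frame_dir).glob("*.png")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:
            path.unlink()  # remove the duplicate frame
        else:
            seen.add(digest)
            kept.append(path)
    return kept
```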

We’ve also created a data collection starter app, available to model training beta users, which enables you to collect data using your edge device and perform simple sampling. You can use ‘ffmpeg’, as described above, to perform more advanced sampling.
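As a rough idea of what simple sampling can look like, here is a hypothetical sketch that keeps every n-th frame from a directory of images; the actual starter app’s behavior may differ:

```python
import shutil
from pathlib import Path

def sample_every_nth(frame_dir, n, dest_dir):
    """Copy every n-th frame (sorted by filename) into dest_dir.

    A simple thinning pass over an image directory; returns the
    list of frames that were kept.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    frames = sorted(Path(frame_dir).glob("*.png"))
    sampled = frames[::n]  # keep frames 0, n, 2n, ...
    for f in sampled:
        shutil.copy(f, dest / f.name)
    return sampled
```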


Data Annotation

The Computer Vision Annotation Tool (CVAT) was used to perform data annotation, via the alwaysAI CLI annotation tool, which can be accessed with ‘aai annotate’. With CVAT, there is a limit to the amount of data that can be uploaded at a time, so annotation was done in batches and the annotation sets were later merged, which we’ll cover later in this article. For each task (containing approximately 40-100 images), the labels were kept constant. 

Originally, we used the labels ‘car’ and ‘license plate’; however, we decided ‘vehicles’ would be more appropriate for a general model, and ‘license_plates’ may avoid issues with whitespace down the road. A useful feature of CVAT is that you can log in as a super user, navigate to the ‘admin’ panel, and modify the labels of an annotation task at any time!

When a series of images contained the same annotation objects, the interpolation mode was used; this mode enables annotations to persist from one frame to the next, significantly reducing the time it takes to annotate a dataset.

NOTE: CVAT doesn’t automatically save your work, so remember to save often!

CVAT supports multiple annotation formats, but alwaysAI currently uses only the PASCAL VOC format. 
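For reference, PASCAL VOC stores one XML file per image. A minimal, hypothetical annotation using the two labels from this project might look like the following (filenames and coordinates are illustrative only):

```xml
<annotation>
  <folder>JPEGImages</folder>
  <filename>image_name_0001.png</filename>
  <size>
    <width>1920</width>
    <height>1080</height>
    <depth>3</depth>
  </size>
  <object>
    <name>vehicles</name>
    <bndbox>
      <xmin>512</xmin>
      <ymin>300</ymin>
      <xmax>900</xmax>
      <ymax>620</ymax>
    </bndbox>
  </object>
  <object>
    <name>license_plates</name>
    <bndbox>
      <xmin>640</xmin>
      <ymin>560</ymin>
      <xmax>720</xmax>
      <ymax>600</ymax>
    </bndbox>
  </object>
</annotation>
```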

What to Annotate

In general, we chose to annotate any license plate or vehicle that was discernible and around which a box could be drawn. Occasionally, this meant annotating a vehicle without a license plate or annotating a partially occluded vehicle, although we tried to limit these instances as much as possible. If a portion of an image clearly represented an object that we wanted to detect, even if it was very small, we annotated it.

Merging Annotations

As we mentioned above, you may not be able to upload all your images at once, so you’ll end up with multiple exported datasets that need to be merged. The training tool will automatically merge your datasets if you pass in multiple files for training; however, you can also do this easily with the alwaysAI CLI command ‘aai dataset merge’ if you want to work with a single dataset and simplify the training command. 

aai dataset merge dataset1.zip dataset2.zip dataset3.zip

In the above command, three datasets are being merged (‘dataset1.zip’, ‘dataset2.zip’, and ‘dataset3.zip’ stand in for your exported dataset files); you can specify as many datasets to merge as necessary. 

NOTE: we had the best success with merging ‘raw’ annotation sets exported directly from CVAT. If you need to train on datasets not exported from CVAT, try zipping the individual folders (‘Annotation’ and ‘JPEGImages’) within the parent annotation directory first. Then try merging the resulting zipped files.
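If you’d rather combine exports yourself, a naive merge is just a matter of pooling the corresponding subfolders. The sketch below is our own illustration; it assumes the standard VOC layout with ‘Annotations’ and ‘JPEGImages’ subfolders and unique filenames across exports, whereas ‘aai dataset merge’ handles the general case for you:

```python
import shutil
from pathlib import Path

def merge_voc_datasets(dataset_dirs, merged_dir):
    """Naively merge several PASCAL VOC exports into one directory.

    Assumes each export contains 'Annotations' and 'JPEGImages'
    subfolders (some exports name the first folder 'Annotation')
    and that filenames are unique across exports.
    """
    merged = Path(merged_dir)
    for sub in ("Annotations", "JPEGImages"):
        (merged / sub).mkdir(parents=True, exist_ok=True)
    for ds in dataset_dirs:
        for sub in ("Annotations", "JPEGImages"):
            for f in Path(ds, sub).glob("*"):
                shutil.copy(f, merged / sub / f.name)
```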


Training the Model

We trained the model using alwaysAI’s CLI tool for model retraining. The tool is currently in closed beta and we are rapidly iterating on its features and functionality. More documentation on the training features will be made available soon.

NOTE: if you’re interested in joining the model training beta, fill out this short survey!

Currently, the tool can train an object detection model by applying transfer learning to a MobileNet SSD network pre-trained on the COCO dataset, using TensorFlow 1.14. Training is supported on either CPU or GPU, and upon completion the model is immediately compatible with the alwaysAI platform. 

We initially trained the model on CPU, using a personal laptop (a MacBook Pro). We began with a batch size of 1 and increased it to 16, at which point the laptop ran out of memory, most likely because of the large images in the dataset. We settled on a batch size of 4 for the proof-of-concept models, varying the number of epochs and training images. Once we were satisfied with the progression of the model, we decided to drastically increase the number of epochs. The specifications of this training and the various configurations are shown in the table below; training took approximately 20 hours.
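For context on how these settings relate, the total number of training steps is the number of batches needed to cover the dataset once, multiplied by the number of epochs. The numbers below are purely illustrative, not the values from our training runs:

```python
import math

def training_steps(num_epochs, num_images, batch_size):
    """Steps per epoch is the number of batches needed to cover the
    dataset once; total steps scales linearly with epochs."""
    steps_per_epoch = math.ceil(num_images / batch_size)
    return num_epochs * steps_per_epoch

# Hypothetical example: 1600 images at batch size 4 is 400 steps
# per epoch, so 100 epochs is 40,000 steps.
print(training_steps(100, 1600, 4))
```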


[Table: training configuration — Amazon EC2 p2.xlarge instance; number of cores; memory (12 GiB); number of epochs; batch size; number of steps; number of images]


NOTE: There are much more powerful EC2 instances available; we could increase the number of GPUs to 24, or use the more powerful Tesla V100 cores, but we wanted to test on more realistic machines. 


Analyzing Model Performance


[Results: final loss and final mAP for the training session]


The final statistics for this model indicate that we trained for more steps than required. In the loss graph, you can see a long, relatively flat line: the loss fluctuates up and down, but it definitely flattens. In the graph of precision and recall, we highlight the area where the trend stops angling upward. It seems we could have stopped training at about 35k steps and gotten similar results. 
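One rough way to detect such a plateau programmatically is to compare moving averages of the logged loss values. This is our own heuristic sketch, not a feature of the training tool:

```python
def loss_has_plateaued(losses, window=10, tolerance=0.01):
    """Return True when the mean loss of the most recent window is
    within `tolerance` (relative) of the mean of the window before it.

    `losses` is the sequence of loss values logged during training
    (here, one value every 200 steps).
    """
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) <= tolerance * previous
```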

Here is the graph for loss for the training session. Number of steps is on the x-axis, and loss is plotted every 200 steps.

And here are the precision and recall statistics that are output during training.

While the graphs show a flattening, the model doesn’t appear to be overfit to our dataset. This could be confirmed by plotting validation loss, a feature we plan to implement soon. The test that we rely on the most is a visual test on novel data, which is easy to perform using the alwaysAI platform and which we’ll describe next.

Models trained using the alwaysAI CLI can be published using

aai app publish <username/modelname>

Once you publish your model, you can add it to your app using

aai app model add <username/modelname>

You can also test it locally, without publishing it, by using

aai app model add --local-version <version>

We wanted to test how the model would perform in real-time, so we modified an existing starter app to use a file stream and used pre-collected video representing the desired use cases. You can see how this was done in this blog.

Some example output from testing the license plate detector model is shown below:


Next Steps

To access the license plate detector, and many other pre-trained models, sign up for a free account to get started with computer vision today. If you’re interested in participating in our closed beta for model training and training your own object detection model, fill out the survey to sign up! 

You can find a subset of the dataset used in the creation of the ‘alwaysai/vehicle_license_mobilenet_ssd’ model here and a larger version here, as well as all the code for the corresponding license plate tracker app to test your model on GitHub.

Contributions to this article were made by Todd Gleed and Jason Koo.

Get started now

We are providing professional developers with a simple and easy-to-use platform to build and deploy computer vision applications on edge devices. 

Sign Up for Free