Using Multiple Object Detection Models

In this article I will demonstrate how to easily modify existing apps offered with alwaysAI to use two object detection models simultaneously, and to display the output in side-by-side frames.

Below is an example of the expected output. In this case, both detection models identified two bottles, and you can see the output from the first detection model in green on the top image frame, and the output from the second detection model in red in the bottom frame.

[Screenshot: two stacked output frames, with green detections in the top frame and red detections in the bottom frame]

Before we get started, you’ll need:

  1. An alwaysAI account (it’s free!)
  2. alwaysAI set up on your machine
  3. A text editor such as Sublime Text or an IDE such as PyCharm, both of which offer free versions, or whatever else you prefer to code in

Please see the alwaysAI blog for more background on computer vision, developing models, how to change models, and more.

Why use multiple object detection models concurrently?

To be clear, I am not referring to ensembling, in which you combine multiple weaker models to yield a single more robust one, but rather to running two models with distinct outputs at the same time. This can be useful when you:

  • Need to detect a set of different objects, but can only find models that cover the entire desired set when combined,
  • Want to determine which model is best for detecting a specific object type, or
  • Want to detect one object type at different camera depths or angles, which a single model may not be trained to do, among other scenarios.

Especially when building a prototype, this type of model combination can save a lot of time.

All of the finished code from this tutorial is available on GitHub.

Let’s get started.

After you have your free account and have set up your developer environment, you need to download the starter apps; do so using this link before proceeding with the rest of the tutorial. 

With the starter applications downloaded, you can begin to modify an existing starter app to use two object detection models. This tutorial modifies the ‘Object Detector’ app, so cd into the starter apps folder and then into the ‘realtime_object_detector’ folder:

cd ./alwaysai-starter-apps/realtime_object_detector

The object detection model used by default in the ‘realtime_object_detector’ starter app is ‘alwaysai/mobilenet_ssd’. We are going to add a second model, ‘alwaysai/ssd_inception_v2_coco_2018_01_28’. The original model, ‘alwaysai/mobilenet_ssd’, identifies 20 object classes, including people, animals, some furniture, bikes, trains, planes, and potted plants. The new model, ‘alwaysai/ssd_inception_v2_coco_2018_01_28’, identifies many of the same objects, but not all of them; for instance, it is not trained on planes. By using two models with similar coverage, we can also see whether one is better at detecting certain types of objects than the other. For a different use case, you could pair a model that detects faces (‘alwaysai/mobilenet_ssd_face‘) with one that detects hands (‘alwaysai/hand_detection’), or find other models that better suit your specific app by browsing the model catalog.

Since we are adding a new model to the app, we need to register it with our app environment by typing the following into the command line:

aai app models add alwaysai/ssd_inception_v2_coco_2018_01_28

NOTE: You can change any model by following the ‘Changing the Computer Vision Model’ documentation.
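If you’d like to verify exactly which classes the two models share before going further, one option is to load each model and compare its ‘labels’ attribute (the same attribute the starter app prints). Below is a minimal sketch; it assumes both models have already been added to your app environment as described above:

import edgeiq

models = ["alwaysai/mobilenet_ssd", "alwaysai/ssd_inception_v2_coco_2018_01_28"]
label_sets = []

for model in models:
    # same start-up calls the starter app uses
    detector = edgeiq.ObjectDetection(model)
    detector.load(engine=edgeiq.Engine.DNN)
    # drop any empty or placeholder labels before comparing
    label_sets.append(set(label for label in detector.labels if label))

print("Shared labels: {}".format(sorted(label_sets[0] & label_sets[1])))
print("Only in {}: {}".format(models[0], sorted(label_sets[0] - label_sets[1])))
print("Only in {}: {}".format(models[1], sorted(label_sets[1] - label_sets[0])))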

Now that you’ve added your model to your app environment, you can begin to modify the starter app code. We’ll do this in the following steps:

  1. Initialize detection objects for all models being used. The code in the original app is on lines 16-18; it initializes a new object detector and loads the engine. Instead of simply duplicating this code to create a second object, which is not extensible, a more elegant solution is to track all of the models you want to use in a list and go through the same start-up process for each of them in a loop.
    1. In main(), create a list to store all of your models; in the finished code I call this list ‘models’.
    2. Put ‘alwaysai/mobilenet_ssd’ as the first element in ‘models’ and ‘alwaysai/ssd_inception_v2_coco_2018_01_28’ as the second. Your code should now look like this:
models = ["alwaysai/mobilenet_ssd", "alwaysai/ssd_inception_v2_coco_2018_01_28"]
  2. Store colors for each model you use.
    1. Initialize a list called ‘colors’.
    2. Add an entry in the format [(B, G, R)] for each model. Your code should now contain the following line:
colors = [[(66, 68, 179)], [(50, 227, 62)]]

NOTE: This is optional, but it will make the differences in detections made by each model easier to identify. If you skip it, you will also need to omit the corresponding line of new code noted later on.

  3. Initialize and maintain detection objects for all models, where each detector object is comparable to ‘obj_detect’ in the starter app code.
    1. Since we are using multiple models, we need multiple detectors. Create a list of detectors by adding the following line of code to your app:
      detectors = []
    2. Now iterate through the models using a for loop, like so:
      for model in models:
    3. Inside the loop, initialize a new object detector for each model:
      obj_detect = edgeiq.ObjectDetection(model)
      obj_detect.load(engine=edgeiq.Engine.DNN)

NOTE: This is almost identical to the original starter app’s code, but instead of hard-coding ‘alwaysai/mobilenet_ssd’, we use ‘model’ to pull the next model from the list.

    4. Now, append the newly created detector object to the ‘detectors’ list you created above, so we can keep track of all the detector objects:
      detectors.append(obj_detect)
    5. Print out the details for each object detector; you can copy lines #20-23 from the original code right into your loop to accomplish this.

Now, index i of the ‘models’ list corresponds to index i of the ‘colors’ list, as well as index i of the ‘detectors’ list, in the new code.
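Since this correspondence is only by convention, you might add a quick guard near the top of main() to catch a mismatch early. This check is my own addition, not part of the starter app; a minimal sketch:

# fail fast if a model was added without a matching color
assert len(models) == len(colors), \
    "each entry in 'models' needs a corresponding entry in 'colors'"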

Original code:

def main():
    obj_detect = edgeiq.ObjectDetection(
            "alwaysai/mobilenet_ssd")
    obj_detect.load(engine=edgeiq.Engine.DNN)

    print("Loaded model:\n{}\n".format(obj_detect.model_id))
    print("Engine: {}".format(obj_detect.engine))
    print("Accelerator: {}\n".format(obj_detect.accelerator))
    print("Labels:\n{}\n".format(obj_detect.labels))

    fps = edgeiq.FPS()

New code:

def main():

    # if you would like to test an additional model, add one to the list below:
    models = ["alwaysai/mobilenet_ssd", "alwaysai/ssd_inception_v2_coco_2018_01_28"]

    # if you've added a model, add a new color as a list of tuples in BGR format
    # to make visualization easier (e.g. [(B, G, R)])
    colors = [[(66, 68, 179)], [(50, 227, 62)]]

    detectors = []

    # load all the models (creates a new object detector for each model)
    for model in models:

        # start up an object detection model
        obj_detect = edgeiq.ObjectDetection(model)
        obj_detect.load(engine=edgeiq.Engine.DNN)

        # track the generated object detection items by storing them in detectors
        detectors.append(obj_detect)

        # print the details of each model to the console
        print("Model:\n{}\n".format(obj_detect.model_id))
        print("Engine: {}".format(obj_detect.engine))
        print("Accelerator: {}\n".format(obj_detect.accelerator))
        print("Labels:\n{}\n".format(obj_detect.labels))

    fps = edgeiq.FPS()

With the code as it is now, we can add a model to the ‘models’ list and the same initialization process will be completed for each model. The ‘detectors’ list will store the object detectors. NOTE: the ‘colors’ list is in BGR format (not RGB!).
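As an aside, if you’d rather not keep parallel lists in sync by index, the same initialization can be written with zip so each detector is stored alongside its color. This is just an alternative sketch; the tutorial code below keeps the indexed-list style:

# alternative: pair each detector with its color up front
detectors = []
for model, color in zip(models, colors):
    obj_detect = edgeiq.ObjectDetection(model)
    obj_detect.load(engine=edgeiq.Engine.DNN)
    detectors.append((obj_detect, color))

# the detection loop would then unpack the pairs, for example:
# for i, (obj_detect, color) in enumerate(detectors):
#     results = obj_detect.detect_objects(frame, confidence_level=.5)
#     object_frame = edgeiq.markup_image(
#         frame, results.predictions, show_labels=False, colors=color)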

  4. Now, add a for loop inside the while loop to iterate over every detector in ‘detectors’ by index. This will involve the following steps:
  5. Generate a new image for each model:
    • Change the variable ‘frame’ on line #38 to ‘object_frame’.
    • Change the ‘colors’ argument in the call to ‘markup_image’ on the same line, using colors[i] as the value. This overrides the existing colors option, making all detections from a given model the same color. If you don’t want to do this, or you didn’t create a ‘colors’ list, omit this change. Otherwise, your code should look like this:
      object_frame = edgeiq.markup_image(frame, results.predictions, show_labels=False, colors=colors[i])

  6. Below this, create an if/else statement to overwrite the input feed with the first model’s marked-up frame. This isn’t strictly necessary, but without this step the output would always include a plain video stream with no detections drawn on it.
    • We’ll create a new frame and call it ‘display_frame’.
    • If we are on the first iteration (i == 0), set ‘display_frame’ to be ‘object_frame’; otherwise, concatenate the new frame onto the existing ones. Add the following code (there is a short note on how numpy stacks the frames after this list):
      if i == 0:
          display_frame = object_frame
      else:
          display_frame = numpy.concatenate((object_frame, display_frame))

  7. Reformat the text for the output images:
    • Remove lines #41-45, which append the model ID and inference time to the screen; that display is too busy with more than one model.
    • Instead, add the model ID and the inference time to the text for each prediction, inside the innermost for loop that appends the prediction text to each image. Your code should look like this:
      for prediction in results.predictions:
          text.append("Model {} detects {}: {:2.2f}% (inference time: {:1.2f})".format(detectors[i].model_id, prediction.label, prediction.confidence * 100, results.duration))

This will make it easier to know which model is making which prediction on the output screen.
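For context on step 6: numpy.concatenate stacks arrays along axis 0 by default, and for image arrays of shape (height, width, 3) that means the frames are stacked vertically. This works here because both marked-up frames come from the same source frame and therefore share a width. Make sure ‘import numpy’ appears at the top of your app file if it isn’t there already. A standalone sketch of the stacking behavior:

import numpy

# two dummy 'frames' with matching widths, as both come from the same source frame
top = numpy.zeros((240, 320, 3), dtype=numpy.uint8)
bottom = numpy.zeros((240, 320, 3), dtype=numpy.uint8)

stacked = numpy.concatenate((top, bottom))  # default axis=0 stacks vertically
print(stacked.shape)  # prints (480, 320, 3)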

The final while loop code will look like this:

# loop detection
while True:
    frame = video_stream.read()

    text = [""]

    # gather data from all the detectors
    for i in range(0, len(detectors)):
        results = detectors[i].detect_objects(frame, confidence_level=.5)
        object_frame = edgeiq.markup_image(
            frame, results.predictions, show_labels=False, colors=colors[i])

        # for the first frame, overwrite the input feed
        if i == 0:
            display_frame = object_frame
        else:
            # otherwise, append the new marked-up frame to the previous one
            display_frame = numpy.concatenate((object_frame, display_frame))

        # append each prediction
        for prediction in results.predictions:
            text.append(
                "Model {} detects {}: {:2.2f}% (inference time: {:1.2f})".format(
                    detectors[i].model_id, prediction.label,
                    prediction.confidence * 100, results.duration))

    # send the image frame and the predictions for both
    # prediction models to the output stream
    streamer.send_data(display_frame, text)

    fps.update()

    if streamer.check_exit():
        break
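As a side note, if you’d prefer the two frames laid out side by side horizontally rather than stacked, concatenating along axis 1 should work equally well, since the frames also share a height. This is a variation of my own, not part of the tutorial code:

# variation: place the frames side by side instead of stacking them
display_frame = numpy.concatenate((object_frame, display_frame), axis=1)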

That’s it! Now you can build and start your app to see it in action. You may want to configure the app first, especially if you changed edge devices or created a new folder from scratch. Do this with the following command and enter the desired configuration input when prompted:

aai app configure

Now, to see your app in action, first build the app by typing into the command line:

aai app deploy

And once it is done building, start the app with the command:

aai app start

Now open any browser to ‘localhost:5000’ and you should see the output illustrated at the beginning of the article!

Get started now

We provide developers with a simple and easy-to-use platform to build and deploy computer vision applications on edge devices.
