Work with Semantic Segmentation from within the Terminal

This blog post is a follow-up to the Hacky Hour presented on 12/3/2020, Build a Virtual Green Screen with Semantic Segmentation.

This Hacky Hour demoed a couple of applications:

  1. The original virtual green screen repository
  2. An application that smooths the edges returned by running the semantic segmentation

The virtual green screen repository shows you how to use semantic segmentation results to replace pixels in one image that correspond to labels in another image. The model that was most successful in pulling out people in this application was a more coarse-grained model, resulting in blocky segmentation edges. The purpose of the second repository is to demonstrate how you can create smoother edges using numpy array manipulation and other techniques.

This blog can be used as a reference when watching the above YouTube link or working your way through the repositories. In this tutorial, we’ll cover how to interact with the alwaysAI python API, edgeiq, from the command line. This is useful for stepping through different manipulations of the segmentation masks and viewing the output.

Set Up

Note, you’ll have to have alwaysAI installed on your local machine to work through this tutorial. You can download alwaysAI from the alwaysAI dashboard.

First, let’s set up your environment. You’ll need to configure and install an application that uses a semantic segmentation model; either of the above repositories, or any of the starter apps that use semantic segmentation, will do.

Choose your application, and from within the working directory run 

aai app configure

You should select your local machine for the directory. Then install the app by running

aai app install

Next, activate the python virtual environment, which will have all the required packages installed, by running

source venv/bin/activate

Finally, start up your python terminal by running

python

You should see the terminal change to the python interpreter, which has the prompt

>>>
Now you’re all set to work through this tutorial!


  • You can always check the type of an output in python using `type(variable)`
  • You can try to view images with the following code, although I have had issues with this from within the venv and python terminal:

def display(image):
    cv2.imshow("image", image)
    cv2.waitKey(0)

This will create a function you can call with `display(image)` to show any image. I’ve had trouble closing the window; pressing any key returns you to the terminal, although the display window may remain open.

  • Another solution is to just write out images like so:
cv2.imwrite("test1.png", image)

Any time you want to see the output of a step, I suggest writing an image out, which is what this tutorial will do.

I’ll splice some images in with the code that follows, but you can still copy and paste the code text into your terminal to work through this tutorial. Most of the following code is taken from the virtual green screen repository.

First, we need to import the necessary libraries.

import edgeiq
import cv2
import numpy as np
import sys

The alwaysAI API, edgeiq, is what we use for semantic segmentation. We'll use OpenCV's library, cv2, for reading and writing images. Numpy is used for some array manipulation, and sys is used for changing the display format of the numpy arrays.

# use this if you want to see the entire numpy array (not very user friendly,
# but can be helpful for seeing what labels are used)
np.set_printoptions(threshold=sys.maxsize)

You’ll need an image in your working directory; I called mine ‘output.jpg’. I just got an image off of Unsplash.

Photo by Michael Dam on Unsplash


# read in the image
image = cv2.imread("output.jpg")

# instantiate a semantic segmentation object and load the engine
semantic_segmentation = edgeiq.SemanticSegmentation("alwaysai/fcn_alexnet_pascal_voc")

# run segmentation and get the results
segmentation_results = semantic_segmentation.segment_image(image)

# to view the class_map of the resulting segmentation_results object:
segmentation_results.class_map
The last line of code in the block above yields the following output (run before re-setting the numpy display options, for the purposes of this tutorial):

array([[ 0,  0,  0, ...,  0,  0,  0],
       [ 0,  0,  0, ...,  0,  0,  0],
       [ 0,  0,  0, ...,  0,  0,  0],
       [ 0,  0,  0, ..., 11, 11, 11],
       [ 0,  0,  0, ..., 11, 11, 11],
       [ 0,  0,  0, ..., 11, 11, 11]], dtype=int32)

We can also write out an image of what this class map looks like, using:

# to write out an image of the class map (note, the class_map is a numpy array!)
cv2.imwrite("class_map.png", segmentation_results.class_map)

The class map, written out as an image

Note, this is a bit hard to see, but there will be another example further along in the tutorial that will be a bit more obvious.
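
As an aside (this is my own sketch, not code from the repository): the reason the class map image is so dark is that the Pascal VOC model uses small class indices (roughly 0 through 20), and written directly as pixel values those are all nearly black. Scaling the indices up before writing makes the regions visible:

```python
import numpy as np

# a tiny fabricated class map with Pascal VOC-style indices (0-20)
class_map = np.array([[0, 11],
                      [15, 0]], dtype=np.int32)

# spread the 0-20 index range across the 0-255 pixel range
visible = (class_map * (255 // 20)).astype(np.uint8)
print(visible)
```

You could then write `visible` out with cv2.imwrite and actually see the labeled regions.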

# get a label map, which is the same dimensions as the class map but uses labels
label_map = np.array(semantic_segmentation.labels)[segmentation_results.class_map]

Entering `label_map` at the prompt then yields the following output:

array([['background', 'background', 'background', ..., 'background',
        'background', 'background'],
       ['background', 'background', 'background', ..., 'background',
        'background', 'background'],
       ['background', 'background', 'background', ..., 'background',
        'background', 'background'],
       ['background', 'background', 'background', ..., 'diningtable',
        'diningtable', 'diningtable'],
       ['background', 'background', 'background', ..., 'diningtable',
        'diningtable', 'diningtable'],
       ['background', 'background', 'background', ..., 'diningtable',
        'diningtable', 'diningtable']], dtype='<U11')
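
If you want a quick list of every label the model found in the frame, without scanning the full printout, `np.unique` works on the label map. A minimal sketch using a fabricated tiny class map (the real `labels` list and `class_map` come from the model, as above):

```python
import numpy as np

# fabricated stand-ins for the model's labels and class map
labels = np.array(['background', 'person', 'diningtable'])
class_map = np.array([[0, 0, 1],
                      [0, 1, 1],
                      [2, 2, 1]])

# same indexing trick as above: look up each class index's label
label_map = labels[class_map]

# every distinct label present anywhere in the frame
present = np.unique(label_map)
print(present)
```

This is handy for deciding which labels to add to `labels_to_mask` later on.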


Now we’ll start building a mask that just has our desired labels in it.

# First just make a numpy array that is the same shape as the class map but just zeros
filtered_class_map = np.zeros(segmentation_results.class_map.shape).astype(int)

This yields the following:

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

And, just to show how the 'type' command works, we can check the type of the output like so:

type(filtered_class_map)

Which yields:

<class 'numpy.ndarray'>

And here is what the current filtered_class_map looks like (don’t strain your eyes, it’s just all black for now!):

The initial filtered class map (all black)

# you can add labels here if you want to pick up more labels, check out what you get from the 
# label_map print out to see what is being picked up around the person
labels_to_mask = ['person']

# now toggle the cells that mapped to ‘person’
for label in labels_to_mask:
    filtered_class_map += segmentation_results.class_map * (label_map == label).astype(int)

# print out the map
filtered_class_map

# now, if we write out the image
cv2.imwrite("new_filtered_class_map.png", filtered_class_map)

Again, this output is dim, but it is slightly different than the class_map output:

The new filtered class map
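
To see exactly what the toggle step above does, here is the same arithmetic on a tiny fabricated class map (a sketch, not code from the repository):

```python
import numpy as np

labels = np.array(['background', 'person', 'sofa'])
class_map = np.array([[0, 1],
                      [2, 1]])
label_map = labels[class_map]

# zero everywhere, then add back class indices only where the label matches
filtered_class_map = np.zeros(class_map.shape).astype(int)
for label in ['person']:
    filtered_class_map += class_map * (label_map == label).astype(int)

print(filtered_class_map)  # the 'sofa' cell (index 2) has been zeroed out
```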

# you can also check out the sizes of all the maps we’ve used
image.shape
segmentation_results.class_map.shape
filtered_class_map.shape
The above code yields (960, 640, 3), (960, 640), (960, 640) respectively for my input image's results. Note they're all the same size! The '3' in the first tuple denotes the color array.

# Now we can make a map to pull out just the labels of interest (people)
detection_map = (filtered_class_map != 0)

Example output:

array([[False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

We can also check out the shape of the detection_map by running

detection_map.shape

Example output:

(960, 640)
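
As an aside (not part of the repository code), the boolean detection map also makes it trivial to measure how much of the frame the masked labels occupy:

```python
import numpy as np

# fabricated filtered class map: nonzero where 'person' was detected
filtered_class_map = np.array([[0, 15],
                               [0, 15]])
detection_map = (filtered_class_map != 0)

print(detection_map.sum())   # number of detected pixels
print(detection_map.mean())  # fraction of the frame covered
```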

Now we'll choose a background image. Note that you can find some examples of images in the GitHub repositories! Download one of those and specify whatever name you choose in the example command below.

# Read in an image for your background. I found a picture of a mountain on Unsplash.
background = cv2.imread("mountain_pic.jpg")

The mountain background image

Whenever we use a new image, it must be the same shape! So we can resize it with 

shape = image.shape[:2]
background = edgeiq.resize(background, shape[1], shape[0], keep_scale=False)

Note that since the background is horizontal and the input image is vertical, we'll have to use 'keep_scale=False', which will warp the image a bit. You could also choose a vertical image for your background, or keep it as is if your streamer/input image will be horizontal.

Finally, we replace the area in the background that corresponds to the ‘True’ sections in the detection map with the original image:

background[detection_map] = image[detection_map].copy()

# And finally write out the new image!
cv2.imwrite("final_background.png", background)

The final composited image

Now, just the portion of the original image that was labeled 'person' is used to replace the corresponding section in the background. This is a (rough) virtual green screen example!
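
The same compositing step can also be written without mutating the background, using `np.where`. A sketch on tiny stand-in arrays (not from the repository):

```python
import numpy as np

image = np.full((2, 2, 3), 200)              # stand-in foreground
background = np.zeros((2, 2, 3), dtype=int)  # stand-in background
detection_map = np.array([[True, False],
                          [False, True]])

# broadcast the 2-D mask over the color channels, then pick per pixel
composite = np.where(detection_map[..., None], image, background)
print(composite[0, 0], composite[0, 1])
```

The `[..., None]` adds a channel axis so the 2-D mask broadcasts against the 3-channel images.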

We'll see some more examples of segmentation masks, including some blurring, next.

Let's change label colors to make the mask easier to view.

# iterate over all the desired items to identify, labeling those white
for label in labels_to_mask:
    index = semantic_segmentation.labels.index(label)
    semantic_segmentation.colors[index] = (255,255,255)

# build a new mask, and write a new image
new_mask = semantic_segmentation.build_image_mask(segmentation_results.class_map)
cv2.imwrite("new_mask.png", new_mask)

This is the output:

The mask with the ‘person’ label colored white

Now that we have this mask that is easier to see, let's do some image manipulations. If we want to blur the background, we could do the following:

# Create a blurred mask
blurred_mask = cv2.blur(new_mask, (100, 100))
cv2.imwrite("blurred.png", blurred_mask)

This creates the following image:

The blurred mask
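
For intuition, `cv2.blur` is a normalized box filter: each output pixel becomes the mean of its neighborhood, which is what turns the mask's hard edges into gradients. A 1-D numpy sketch of the idea (my own illustration, not the repository code; edges use a truncated window here, while OpenCV replicates the border):

```python
import numpy as np

mask = np.array([0, 0, 255, 255, 255], dtype=float)

# mean of each pixel's 3-wide neighborhood
blurred = np.array([mask[max(i - 1, 0):i + 2].mean() for i in range(mask.size)])
print(blurred)  # the hard 0 -> 255 jump becomes a gradual ramp
```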

If we want to make ONLY the labels we want white, and ALL others black, execute the following code and re-run the for loop above:

# this makes all the colors in the label map black
semantic_segmentation.colors = [ (0,0,0) for i in semantic_segmentation.colors]

After running the for loop, re-building the mask and writing out the output, the result is:

The rebuilt mask: masked labels in white, everything else black

Now that we have an idea of how the masks are used and manipulated, let's use these techniques to smooth out those edges and create a more visually appealing virtual green screen. The next section of the tutorial uses code from the smoothing edges repository mentioned at the start of this post.

The image we're working with is the same as the one shown just above. However, if you want to make sure you have the proper set up, you can re-run the following code:

# read in the initial image again
image = cv2.imread("output.jpg")

# build the color mask, making all colors the same except for background
semantic_segmentation.colors = [ (0,0,0) for i in semantic_segmentation.colors]

# iterate over all the desired items to identify, labeling those white
for label in labels_to_mask:
    index = semantic_segmentation.labels.index(label)
    semantic_segmentation.colors[index] = (255,255,255)

# build the color mask
mask = semantic_segmentation.build_image_mask(segmentation_results.class_map)

First, we’ll start off by enlarging the mask using dilation (line 95 in the smoothing edges repo).

dilatation_size = 15
dilatation_type = cv2.MORPH_CROSS
element = cv2.getStructuringElement(
                        dilatation_type,
                        (2*dilatation_size + 1, 2*dilatation_size + 1),
                        (dilatation_size, dilatation_size))
mask = cv2.dilate(mask, element)

Now the mask looks like:

The dilated mask
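
Dilation itself is just a moving maximum: each pixel takes the largest value in its neighborhood, so the white region grows outward by roughly `dilatation_size` pixels in each direction. A 1-D numpy sketch (my own illustration, not from the repository):

```python
import numpy as np

mask = np.array([0, 0, 255, 0, 0])

# each pixel becomes the max of its 3-wide neighborhood
dilated = np.array([mask[max(i - 1, 0):i + 2].max() for i in range(mask.size)])
print(dilated)  # the single white pixel has spread to both neighbors
```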

Next, we'll apply smoothing to the mask:

mask = cv2.blur(mask, (50, 50))

Now the mask looks like this:

The blurred, dilated mask

Next, we'll set up the background image. To be safe, we'll just re-import it and run the resizing code.

# also read in the background image again
background = cv2.imread("mountain_pic.jpg")
background = edgeiq.resize(background, image.shape[1],
                           image.shape[0], keep_scale=False)

Finally, we'll use the 'overlay_image' function built into edgeiq. This takes the foreground image, the background image, and the blurred mask, and interweaves them to give nice, smoothed edges.

# now overlay the background and foreground using the overlay function
frame = edgeiq.overlay_image(image, background, mask)
cv2.imwrite("overlayed_frame.png", frame)

This is the resulting output:

The overlaid frame

Now, notice the center portion of the woman where trees show through? Those areas look like they correspond to the red portion of the original class map we created. If we look at the 'label_map' output, we see that ‘diningtable’ and ‘sofa’ are other labels that are printed out. Let’s add those to ‘labels_to_mask’ and re-run all of the above code for the smoothing edges section.
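
Concretely, the extended list is `labels_to_mask = ['person', 'diningtable', 'sofa']`. A tiny numpy sketch (fabricated maps, not repository code) shows how the extra labels widen the mask:

```python
import numpy as np

labels = np.array(['background', 'person', 'diningtable', 'sofa'])
class_map = np.array([[0, 1, 2],
                      [0, 3, 1]])
label_map = labels[class_map]

labels_to_mask = ['person', 'diningtable', 'sofa']
filtered_class_map = np.zeros(class_map.shape).astype(int)
for label in labels_to_mask:
    filtered_class_map += class_map * (label_map == label).astype(int)

# with all three labels, 4 of 6 cells are masked; 'person' alone gives only 2
print((filtered_class_map != 0).sum())
```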

Now the output looks like this:

The final smoothed output

Much better!

In this tutorial, we covered how to interact with edgeiq from within the terminal, how to work with segmentation masks, how to replace specific portions of images using these masks, and how to smooth the output from more coarse-grained segmentation models. These skills will help you improve your applications when coarse-grained segmentation models are the only ones available, or are the models that best fit your specific use case.

Get started now

We are providing professional developers with a simple and easy-to-use platform to build and deploy computer vision applications on edge devices. 

Sign Up for Free