Create Your Own Virtual Green Screen

Virtual green screen


In this tutorial, we will use semantic segmentation to create a virtual green screen: we will separate a person from their background, then either blur the background or replace it with a different image. You may have seen this feature in video conferences, or in ‘behind-the-scenes’ footage from movies. More people are now working from home, learning from home, and using video chat to connect with friends instead of meeting in person. Adding a green screen feature to a video streaming application gives users more privacy, or more interesting backdrops, while they remain visually present for online meetings, parties, or game nights.

This tutorial utilizes code found in two other blogs: the first demonstrates how to separate configuration information for your code using a configuration file, and the second shows how to remove unwanted items from images using semantic segmentation.

To complete the tutorial, you must have:

  1. An alwaysAI account (it’s free!)
  2. alwaysAI set up on your machine (also free)
  3. A text editor such as Sublime Text or an IDE such as PyCharm, both of which offer free versions, or whatever else you prefer to code in

Please see the alwaysAI blog for more background on computer vision, training models, how to change models, and more.

All of the code from this tutorial is available on GitHub.

Let’s get started!

After you have your free account and have set up your developer environment, you need to download the starter apps; do so using this link before proceeding with the rest of the tutorial.

We will use an existing starter app as a foundation, so cd into the starter apps folder and into the ‘semantic_segmentation_cityscape’ folder. There will be two components to this tutorial:

  1. Putting all settings in a configuration file.
  2. Processing the video stream.

First, we will set up our configuration file. Having your settings in a separate file enables you to change them easily and without fear of unintentionally altering your existing code. This portion of the tutorial was adapted from this original blog post.

1a. Create the configuration file using your file explorer or by going to the terminal. If you are on a Mac, enter

touch config.json

Otherwise, on Windows, enter

type nul > config.json

1b. Then, copy-paste the following contents into your config.json file:

{
    "model_id" : "alwaysai/fcn_alexnet_pascal_voc",
    "target_labels" : ["person"],
    "background_images" : "images/",
    "image" : "beach_pic.jpg",
    "blur" : true,
    "blur_level" : 35,
    "use_background_image" : false
}

Here, we specify the segmentation model to use (alwaysai/fcn_alexnet_pascal_voc); the labels we want to segment out, which for this application is just ‘person’; the folder that holds alternative background images; and the image in that folder to use as the background. The last three settings control whether to blur the background, how strongly to blur it, and whether to replace the background with the chosen image.
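As a rough illustration of how these settings fit together, the sketch below checks a loaded configuration before it is used. The helper name ‘validate_config’ and the checks it performs are my own assumptions, not part of the alwaysAI starter app. Requiring ‘blur_level’ to be at least 1 is a guard I added because the value is later used as a blur kernel size.

```python
import json

# Hypothetical helper: the key names match config.json above, but the
# checks themselves are assumptions made for illustration.
EXPECTED_TYPES = {
    "model_id": str,
    "target_labels": list,
    "background_images": str,
    "image": str,
    "blur": bool,
    "blur_level": int,
    "use_background_image": bool,
}


def validate_config(config):
    """Return a list of human-readable problems found in the config."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append("missing key: {}".format(key))
            continue
        value = config[key]
        # bool is a subclass of int, so reject True/False where a number is expected
        is_bool_for_int = expected is int and isinstance(value, bool)
        if is_bool_for_int or not isinstance(value, expected):
            problems.append("{} should be of type {}".format(key, expected.__name__))
    # the blur level is used as a kernel size, so it has to be positive
    level = config.get("blur_level")
    if isinstance(level, int) and not isinstance(level, bool) and level < 1:
        problems.append("blur_level should be at least 1")
    return problems


SAMPLE = json.loads("""
{
    "model_id": "alwaysai/fcn_alexnet_pascal_voc",
    "target_labels": ["person"],
    "background_images": "images/",
    "image": "beach_pic.jpg",
    "blur": true,
    "blur_level": 35,
    "use_background_image": false
}
""")

print(validate_config(SAMPLE))
```

An empty list means no problems were found. You could call a check like this right after loading the file, and stop the app early if the list is not empty.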

NOTE: the higher the blur_level, the more distinct the segmentation edges will appear around the person. You can try various levels, deciding on whether it is more important to disguise your backdrop or to avoid hard edges in your frame.

1c. Add the following import statements to the top of your file:

import os
import json
import numpy as np
import cv2 as cv

1d. Add the following static variables underneath the import statements:

CONFIG_FILE = "config.json"
SEGMENTER = "segmenter"
MODEL_ID = "model_id"
BACKGROUND_IMAGES = "background_images"
IMAGE = "image"
TARGETS = "target_labels"
BLUR = "blur"
BLUR_LEVEL = "blur_level"
USE_BACKGROUND_IMAGE = "use_background_image"

1e. Create a method that handles opening the configuration file and returning the JSON data to main; copy the following underneath the static variable declaration:

def load_json(filepath):
    # check that the file exists and return the loaded json data
    if not os.path.exists(filepath):
        raise Exception('File at {} does not exist'.format(filepath))

    with open(filepath) as data:
        return json.load(data)

This code checks that there is a configuration file, and if so, converts the JSON data into Python data and returns it to the calling method.
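To see both paths of this function, here is a small self-contained sketch. It repeats ‘load_json’ so it can run on its own, writes a throwaway configuration file to a temporary folder (an assumption made so the example does not depend on your app folder), and then asks for a file that does not exist.

```python
import json
import os
import tempfile


def load_json(filepath):
    # same behavior as the load_json function in step 1e
    if not os.path.exists(filepath):
        raise Exception("File at {} does not exist".format(filepath))
    with open(filepath) as data:
        return json.load(data)


with tempfile.TemporaryDirectory() as folder:
    # write a small config file, then read it back
    path = os.path.join(folder, "config.json")
    with open(path, "w") as handle:
        json.dump({"blur": True, "blur_level": 35}, handle)
    config = load_json(path)

    # a missing file raises an exception instead of returning data
    missing = os.path.join(folder, "no_such_file.json")
    try:
        load_json(missing)
        error_message = None
    except Exception as error:
        error_message = str(error)

print(config)
print(error_message)
```

JSON ‘true’ comes back as Python ‘True’, and the missing file produces the ‘does not exist’ message rather than a harder-to-read error from ‘open’.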

1f. Now, call the load_json method from main, using the static variable we declared in step 1d as input, and store the resulting data in a local variable:

def main():
    config = load_json(CONFIG_FILE)

1g. Finally, set local variables inside main that pull the remaining settings out of the configuration file:

    labels_to_mask = config.get(TARGETS)
    model_id = config.get(MODEL_ID)
    background_image = config.get(BACKGROUND_IMAGES) + config.get(IMAGE)
    blur = config.get(BLUR)
    blur_level = config.get(BLUR_LEVEL)
    use_background_image = config.get(USE_BACKGROUND_IMAGE)

Now the configuration is all set up!

NOTE: if you’re deploying on an edge device and working off of the starter app, you are all done. If instead you deploy locally, you may need to create a ‘requirements.txt’ file and add ‘opencv-python’ to that file, in order to fulfill this dependency. For more information on dependencies, see this link.

This should be the same for Windows and Mac.

Now, we will edit the starter app to segment a person out of a video stream and return the altered image.

2a. Replace ‘semantic_segmentation = edgeiq.SemanticSegmentation("alwaysai/enet")’ with the following code:

    semantic_segmentation = edgeiq.SemanticSegmentation(model_id)

This will use the configuration variable for the model instead of hard-coding it.

2b. Remove the ‘image_paths’ and the corresponding print statement below it. We won’t be streaming in images, and instead will use a video stream, which we will configure in the next step.

2c. Where the ‘image_paths’ declaration was, instead add the following code:

    fps = edgeiq.FPS()

This tracks the frames per second of a video input stream.

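2d. Now change the ‘with edgeiq.Streamer(… as streamer:’ line to the following, wrapping it in a ‘try’ block:

    try:
        with edgeiq.WebcamVideoStream(cam=0) as video_stream, edgeiq.Streamer() as streamer:
            # Allow Webcam to warm up (assumes the starter app already has 'import time')
            time.sleep(2.0)
            fps.start()

            # loop detection
            while True:
                # read in the video stream
                frame = video_stream.read()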

This will use a video stream instead of individual static images.

2e. As we are now reading in video frames and have deleted the ‘image’ variable, change the ‘results = semantic_segmentation.segment_image(image)’ to instead be:

                results = semantic_segmentation.segment_image(frame)

2f. Delete the following lines from the original starter app:

mask = semantic_segmentation.build_image_mask(results.class_map)
blended = edgeiq.blend_images(image, mask, alpha=0.5)

If your copy of the starter app also appends a legend to the streamer text, delete those lines as well: the legend is bulky on the output screen and, as we are only tracking people, its labels will not be very descriptive for us. The calls to build_image_mask and blend_images are not needed either, because instead of masking the image, we will just cut the person out of the original frame.

Now, we’re going to build a restricted class map, which maps only to the objects we want to identify, namely people. The full tutorial of how to do this can be seen in this blog. We’re going to slightly alter this methodology, and only track the part of the class map that matches people, then cut out the corresponding portion of the input video frame to keep as it is, blurring the rest.

First, we’ll build the ‘filtered’ class map. Add the following lines to the program:

                # map each pixel's class index to its label name
                label_map = np.array(semantic_segmentation.labels)[results.class_map]

                filtered_class_map = np.zeros(results.class_map.shape).astype(int)

                for label in labels_to_mask:
                    filtered_class_map += results.class_map * (label_map == label).astype(int)

                # just the part of the map that is people
                detection_map = (filtered_class_map != 0)
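To make the array bookkeeping concrete, here is the same filtering logic on a tiny hand-written class map. The label list and the class map values are stand-ins I made up; in the app they come from the model and from the segmentation results.

```python
import numpy as np

# Stand-in values: a real model supplies its own label list and class map.
labels = ["background", "person", "dog"]
class_map = np.array([
    [0, 1, 1],
    [2, 1, 0],
])
labels_to_mask = ["person"]

# look up the label name for every pixel
label_map = np.array(labels)[class_map]

# keep the class index only where the label is one we asked for
filtered_class_map = np.zeros(class_map.shape).astype(int)
for label in labels_to_mask:
    filtered_class_map += class_map * (label_map == label).astype(int)

# True wherever a requested label was found
detection_map = (filtered_class_map != 0)
print(detection_map)
```

One consequence of multiplying by the class index is that a label whose index is 0 can never show up in ‘detection_map’. That is harmless when index 0 is the background class, as assumed in this toy example.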

Now we will replace the background with a new image, blur it, or both, depending on the ‘use_background_image’ and ‘blur’ settings in the configuration file.

2g. Replace the background with the new background image by adding the following lines of code:

                # the background defaults to just the original frame
                background = frame

                if use_background_image:
                    # read in the image
                    img = cv.imread(background_image)

                    # get the 2D dimensions of the frame (need to reverse for compatibility with cv2)
                    shape = frame.shape[:2]

                    # resize the image
                    background = cv.resize(img, (shape[1], shape[0]), interpolation=cv.INTER_NEAREST)
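The reversal in the comment above is needed because a NumPy image’s shape is (height, width, channels), while cv.resize takes its target size as (width, height). The NumPy-only sketch below shows the same bookkeeping. The function ‘nearest_resize’ is a hypothetical stand-in, not OpenCV’s implementation, and the exact pixels it picks may differ from what cv.resize would choose.

```python
import numpy as np


def nearest_resize(img, width, height):
    """Resize img to (height, width) by copying the nearest source pixel."""
    src_h, src_w = img.shape[:2]
    # for every output row and column, choose the source row and column to copy
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return img[rows[:, None], cols]


frame = np.zeros((4, 6, 3), dtype=np.uint8)                   # height 4, width 6
img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)   # height 2, width 3

shape = frame.shape[:2]                                       # (height, width)
background = nearest_resize(img, shape[1], shape[0])          # pass width first
print(background.shape)
```

After the call, the background has the same height and width as the frame, which is what the masking step later relies on.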

2h. To blur the image, add the following lines of code:

                if blur:
                    # blur the background:
                    background = cv.blur(background, (blur_level, blur_level))
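cv.blur replaces each pixel with the average of the pixels in a window around it, and ‘blur_level’ sets the width and height of that window. A one-dimensional sketch in NumPy shows how a larger window spreads a hard edge over more samples. The helper ‘box_blur_1d’ is hypothetical, and its zero-padded borders are a simplification that differs from how cv.blur treats image borders.

```python
import numpy as np


def box_blur_1d(signal, size):
    """Average each sample with its neighbors in a window of `size` samples."""
    kernel = np.ones(size) / size
    # mode="same" keeps the length; the borders are zero-padded here
    return np.convolve(signal, kernel, mode="same")


edge = np.array([0, 0, 0, 0, 90, 90, 90, 90], dtype=float)  # a hard edge

print(box_blur_1d(edge, 1))   # window of 1: unchanged
print(box_blur_1d(edge, 3))   # window of 3: the edge becomes a ramp
```

With a window of 3, the jump from 0 to 90 turns into a ramp through intermediate values. A window of 35, as in the sample configuration, averages over far more pixels and hides much more detail.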

2i. Replace the section of the new frame that corresponds to the detected person in the original frame with the original image:

                background[detection_map] = frame[detection_map].copy()
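This single assignment does the actual ‘green screen’ compositing, so it is worth seeing on a toy example. The 2x2 images and the mask below are made-up values, chosen only so the result is easy to read.

```python
import numpy as np

# Toy 2x2 "images" with 3 color channels; the values are arbitrary.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)       # original camera frame
background = np.full((2, 2, 3), 10, dtype=np.uint8)   # blurred or replaced background

# True where the model found a person
detection_map = np.array([
    [True, False],
    [False, True],
])

# a 2D boolean mask selects whole pixels (all channels) of a 3D image
background[detection_map] = frame[detection_map]
print(background[:, :, 0])
```

Pixels where the mask is True now hold the original frame values, and everything else keeps the background. When neither ‘blur’ nor ‘use_background_image’ is true, ‘background’ is the same array as ‘frame’, so the assignment changes nothing.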

2j. Finally, replace the following lines of code:

            streamer.send_data(blended, text)
        print("Program Ending")

with:

                streamer.send_data(background, text)
                fps.update()

                if streamer.check_exit():
                    break

    finally:
        fps.stop()
        print("elapsed time: {:.2f}".format(fps.get_elapsed_seconds()))
        print("approx. FPS: {:.2f}".format(fps.compute_fps()))
        print("Program Ending")

This sends the new frame to the streamer, updates the video output stream, and also checks if the user closed the program. The ‘finally’ statement is executed whenever the user closes the program or if there was an issue in our ‘try’ block.
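The control flow described here can be tried without a camera. In the sketch below, ‘SimpleFPS’ is a hypothetical stand-in for a frames-per-second counter (it is not the edgeiq class), the list of strings stands in for video frames, and the ‘exit_after’ parameter stands in for the user closing the program.

```python
import time


class SimpleFPS:
    """Hypothetical stand-in for a frames-per-second counter."""

    def __init__(self):
        self.frames = 0
        self.started = None
        self.stopped = None

    def start(self):
        self.started = time.perf_counter()

    def update(self):
        self.frames += 1

    def stop(self):
        self.stopped = time.perf_counter()

    def get_elapsed_seconds(self):
        return self.stopped - self.started

    def compute_fps(self):
        elapsed = self.get_elapsed_seconds()
        return self.frames / elapsed if elapsed > 0 else 0.0


def run(frames, exit_after):
    """Process frames until exit is requested; always report timing."""
    fps = SimpleFPS()
    processed = []
    try:
        fps.start()
        for frame in frames:
            processed.append(frame)            # stand-in for segmenting and sending
            fps.update()
            if len(processed) >= exit_after:   # stand-in for the exit check
                break
    finally:
        # runs on a normal exit and also if the loop raised an error
        fps.stop()
        print("elapsed time: {:.2f}".format(fps.get_elapsed_seconds()))
        print("approx. FPS: {:.2f}".format(fps.compute_fps()))
    return processed, fps.frames


processed, counted = run(["f0", "f1", "f2", "f3"], exit_after=2)
```

Because the timing report lives in the ‘finally’ block, it is printed whether the loop ends through ‘break’ or through an exception.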

That’s it!

Now, to see your app in action, first build the app by typing into the command line:

aai app install

And once it is done building, type the following command to start the app:

aai app start

Now open any browser to ‘localhost:5000’ to see your virtual green screen in action. Below, I’ve shown an example of the output when ‘blur’ is set to true and ‘use_background_image’ is set to false.

Screen Shot using semantic segmentation

As you most likely noticed, the edges generated by this model are fairly large. In a subsequent tutorial, I’ll cover how to smooth these edges in your virtual green screen for a less blocky look.
