Build Your Own Video Streaming Server with Flask-SocketIO

The code for this tutorial is available here:

This tutorial will show how to build and run an alwaysAI computer vision app that sends its video feed to a web page. There are many ways to approach this problem, but for this tutorial we’ll stay in the Python ecosystem to build our solution. From the user’s perspective, we’d like to have our alwaysAI app perform inference on a video feed, and then have that video feed and the results of the inference displayed on a web page.

The alwaysAI Streamer debugging tool does exactly what has been described above, but runs on-device and isn’t customizable. This guide will show how to build an off-device server to free up processing in your application!

From a technical perspective, we’ll need three main components:

  • A computer vision app with a client to send the video feed

    • written in Python and running on the edge device.

  • A server to host the web page

    • written in Python and running on our laptop.

  • A web client to receive and display the video feed

    • written in Javascript and running in a web browser.

Send a data stream from a Python app to a web page

In Python, Flask is a commonly used framework for hosting an HTTP server. HTTP works great for serving our web page but is not well suited to a data stream, so we’ll need another solution for our video and data stream. If we only cared about streaming the video to another Python app, we might simply open a TCP connection from one app to the other. However, in our case we need to get the video stream to a web client, which will most likely be written in Javascript. WebSocket is a popular protocol for sending a data stream over the web, and SocketIO is a great cross-platform implementation built on WebSockets that we can use to send our video stream from our computer vision app to our web client. We’ll use Flask-SocketIO for our server, which combines an HTTP server and a SocketIO server, and we’ll use the SocketIO Javascript library for the web client.

Make a new directory for your application, and create two directories inside it: cv for the computer vision app and server for the server app.

$ tree
├── cv
└── server
Build the server

Let’s start with a very simple server that hosts our web page and passes data from our computer vision client to our web client. First, make a Python virtual environment in the server directory to isolate the server’s dependencies, and install them:

video-streamer$ cd server
server$ virtualenv venv
server$ source venv/bin/activate
(venv) server$ pip install flask flask-socketio eventlet

(The eventlet package is a WSGI server that Flask will automatically use when installed. To use WebSockets, a WebSocket-capable server package must be installed, and eventlet works well in practice. It is also used by the alwaysAI Streamer.)

Next, create a Python file in the server/ directory and make a basic Flask-SocketIO app:

from flask_socketio import SocketIO
from flask import Flask, render_template

app = Flask(__name__)
socketio = SocketIO(app)

@app.route('/')
def index():
    """Home page."""
    return render_template('index.html')

if __name__ == "__main__":
    print('[INFO] Starting server at http://localhost:5001')
    socketio.run(app, host='0.0.0.0', port=5001)

Setting the host to '0.0.0.0' ensures that the server listens on all network interfaces, enabling us to connect from other devices. Now let’s add some logging to let us know when SocketIO clients connect and disconnect. We’ll create two namespaces, one for the computer vision client and one for the web client.

from flask import Flask, render_template, request
...

@socketio.on('connect', namespace='/web')
def connect_web():
    print('[INFO] Web client connected: {}'.format(request.sid))

@socketio.on('disconnect', namespace='/web')
def disconnect_web():
    print('[INFO] Web client disconnected: {}'.format(request.sid))

@socketio.on('connect', namespace='/cv')
def connect_cv():
    print('[INFO] CV client connected: {}'.format(request.sid))

@socketio.on('disconnect', namespace='/cv')
def disconnect_cv():
    print('[INFO] CV client disconnected: {}'.format(request.sid))

These logs will help us to understand how the server and clients are interacting. Next we’ll add a message handler to pass messages from the computer vision app to the web client:

...
@socketio.on('cv2server')
def handle_cv_message(message):
    socketio.emit('server2web', message, namespace='/web')
...
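To see why this single handler is enough, here is a toy, in-process model of the relay. ToyRelay is purely illustrative (the real dispatch and broadcast are done by Flask-SocketIO): a message arriving on the cv2server event is re-emitted as server2web to every subscriber of the /web namespace.

```python
class ToyRelay:
    """Toy stand-in for the Flask-SocketIO relay, for illustration only."""

    def __init__(self):
        self._subscribers = {}  # namespace -> list of callbacks

    def subscribe(self, namespace, callback):
        self._subscribers.setdefault(namespace, []).append(callback)

    def emit(self, event, message, namespace):
        # Broadcast to every subscriber registered on the namespace
        for callback in self._subscribers.get(namespace, []):
            callback(event, message)

    def handle_cv_message(self, message):
        # Mirrors the server's handle_cv_message() handler
        self.emit('server2web', message, namespace='/web')

relay = ToyRelay()
received = []
relay.subscribe('/web', lambda event, msg: received.append((event, msg)))
relay.handle_cv_message({'image': '...', 'text': 'person'})
print(received)  # [('server2web', {'image': '...', 'text': 'person'})]
```

The key point the model captures: the server never interprets the payload, it only forwards it from one namespace to the other.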

Finally, let’s pin our dependencies in a requirements.txt file so that we can reproduce our Python virtual environment if needed:

(venv) server$ pip freeze > requirements.txt
Build the alwaysAI computer vision app

Our computer vision app will simply be the alwaysAI realtime_object_detector starter app with the Streamer portion replaced with a SocketIO client. Begin by copying all the files from the realtime_object_detector directory to your cv/ directory. Your file tree now looks like this:

video-streamer$ tree -L 3
├── cv
│   ├──
│   ├──
│   └── Dockerfile
└── server
    └── venv

Let’s begin by parameterizing the Streamer. Use argparse to add a --use-streamer flag to the app:

import argparse
...

def main(use_streamer):
    ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='alwaysAI Video Streamer')
    parser.add_argument(
            '--use-streamer', action='store_true',
            help='Use the embedded streamer instead of connecting to the server.')
    args = parser.parse_args()
    main(args.use_streamer)

Next, start the Streamer only if the flag is set:

def main(use_streamer):
    ...
    try:
        streamer = None
        if use_streamer:
            streamer = edgeiq.Streamer().setup()
        else:
            # Add SocketIO client here
            pass

        with edgeiq.WebcamVideoStream(cam=0) as video_stream:
            ...
    finally:
        if streamer is not None:
            streamer.close()
        ...

Build the Python SocketIO client

Now let’s build a SocketIO client that looks like the Streamer. We need to define a class with setup(), send_data(), check_exit(), and close() member functions:

import time

import socketio
...

sio = socketio.Client()

@sio.event
def connect():
    print('[INFO] Successfully connected to server.')

@sio.event
def connect_error():
    print('[INFO] Failed to connect to server.')

@sio.event
def disconnect():
    print('[INFO] Disconnected from server.')

class CVClient(object):
    def __init__(self, server_addr):
        self.server_addr = server_addr
        self.server_port = 5001

    def setup(self):
        print('[INFO] Connecting to server http://{}:{}...'.format(
            self.server_addr, self.server_port))
        sio.connect(
                'http://{}:{}'.format(self.server_addr, self.server_port),
                transports=['websocket'],
                namespaces=['/cv'])
        time.sleep(1)
        return self

    def send_data(self, frame, text):
        # Process and send frame to web client
        pass

    def check_exit(self):
        pass

    def close(self):
        sio.disconnect()

For the computer vision client connection, we set “websocket” as the only transport to skip the HTTP long-polling connection. Beginning with long-polling is useful when a browser may not support WebSockets, but in this case we know both the client and server support them.

Update the main() function to instantiate the new class:

def main(camera, use_streamer, server_addr):
    ...
    try:
        if use_streamer:
            streamer = edgeiq.Streamer().setup()
        else:
            streamer = CVClient(server_addr).setup()
        ...

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='alwaysAI Video Streamer')
    parser.add_argument(
            '--camera', type=int, default=0,
            help='The camera index to stream from.')
    parser.add_argument(
            '--use-streamer', action='store_true',
            help='Use the streamer instead of connecting to the server.')
    parser.add_argument(
            '--server-addr', type=str, default='localhost',
            help='The IP address or hostname of the SocketIO server.')
    args = parser.parse_args()
    main(args.camera, args.use_streamer, args.server_addr)

We’ll need the requests and websocket-client packages to initiate a SocketIO connection, so add both to a requirements.txt file.
We can run a simple test to make sure the computer vision client can connect to the server. Start the server in your Python virtual environment. Then open the cv/ directory in another terminal and use the alwaysAI CLI to deploy and start your computer vision app:

cv$ aai app install
✔ Target configuration not found. Do you want to create it now? … yes
✔ What is the destination? › Your local computer
✔ Check docker executable
✔ Check docker permissions
✔ Found Dockerfile
✔ Write
✔ Build docker image
✔ Install model alwaysai/mobilenet_ssd
✔ Install python virtual environment
cv$ aai app start -- --server-addr localhost
Loaded model:

Engine: Engine.DNN
Accelerator: Accelerator.GPU

['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

[INFO] Connecting to server...
[INFO] Successfully connected to server.

Since I ran the computer vision app locally on my laptop, I was able to use “localhost” as the server address. Looks like our connection was successful!

Convert images to JPEG for the web client

In our Python app, we’re perfectly happy using numpy arrays as images. However, our web app won’t understand that format, so we need to get the image into a format it will understand. Luckily, it’s easy to convert numpy arrays to JPEG images in Python. We can add a conversion function to our CVClient class:

import cv2
import base64
...

class CVClient(object):
    ...
    def _convert_image_to_jpeg(self, image):
        # Encode frame as JPEG
        frame = cv2.imencode('.jpg', image)[1].tobytes()
        # Base64-encode the JPEG bytes and decode them to a utf-8 string
        frame = base64.b64encode(frame).decode('utf-8')
        return "data:image/jpeg;base64,{}".format(frame)
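If you want to convince yourself that this encoding round-trips without firing up a camera, the data-URI wrapping can be exercised on its own. Here, to_data_uri is a standalone stand-in for the method above, operating on bytes that are already JPEG-encoded:

```python
import base64

def to_data_uri(jpeg_bytes):
    # Wrap already-encoded JPEG bytes in a base64 data URI,
    # mirroring what _convert_image_to_jpeg() produces
    b64 = base64.b64encode(jpeg_bytes).decode('utf-8')
    return "data:image/jpeg;base64,{}".format(b64)

# JPEG data always begins with the marker bytes FF D8
fake_jpeg = b'\xff\xd8' + b'not-really-pixels'
uri = to_data_uri(fake_jpeg)

# Decoding the base64 payload recovers the original bytes exactly
payload = uri.split(',', 1)[1]
assert base64.b64decode(payload) == fake_jpeg
```

The data: URI prefix is what lets the browser assign the string directly to an img element’s src attribute.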

Now we can update our send_data() function. The web client expects images no larger than 640 x 480, so first we resize the image to fit within those dimensions. The text needs to be converted from a list of strings into a single string, with the entries separated by line breaks. emit() then sends the image and text to the server.

class CVClient(object):
    ...
    def send_data(self, frame, text):
        frame = edgeiq.resize(
                frame, width=640, height=480, keep_scale=True)
        sio.emit(
                'cv2server',
                {
                    'image': self._convert_image_to_jpeg(frame),
                    'text': '<br />'.join(text)
                })
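As an aside, the arithmetic behind an aspect-ratio-preserving resize is easy to sketch in plain Python. fit_within is a hypothetical helper written here for illustration only; we assume edgeiq.resize with keep_scale=True performs an equivalent computation internally.

```python
def fit_within(w, h, max_w=640, max_h=480):
    # Scale (w, h) down to fit inside (max_w, max_h) while preserving
    # the aspect ratio; never scale up a frame that already fits
    scale = min(max_w / w, max_h / h, 1.0)
    return round(w * scale), round(h * scale)

# A 1920x1080 frame is limited by width: 640x360
assert fit_within(1920, 1080) == (640, 360)
# A portrait 480x960 frame is limited by height: 240x480
assert fit_within(480, 960) == (240, 480)
# A small frame is left untouched
assert fit_within(320, 240) == (320, 240)
```

Picking the smaller of the two scale factors guarantees both dimensions end up within the limit.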

With that addition, we’ve wrapped up the changes to our computer vision app for now!

Build the web client

Now we’ll add our web client, which will be written in Javascript. In the server directory, make a directory called templates. This is where Flask will look for the web pages to load. Also, make a directory called static under server. This is where Flask will look for files used by our web pages. Copy index.html from the source repo into the templates directory, and copy favicon.ico into the static directory.

Let’s dig into index.html. Much of the file is taken up by the styles of the page. We’ll focus on the relevant HTML section at the bottom, and the Javascript SocketIO client. Here’s the HTML portion:

...
<div class="container" min-width="1024px" width="100%">
  <div class="dash-sub-title">
    <h1>alwaysAI Video Streamer</h1>
  </div>
  <div class="card-deck">
    <div class="card">
      <div class="card-body">
        <h5 class="card-title">Output</h5>
        <div class="card-scroller">
          <p id="streamer-text"></p>
        </div>
      </div>
    </div>
    <div style="width: 70%">
      <img id="streamer-image" src="">
    </div>
  </div>
...

The most important parts are the “streamer-text” card scroller and the “streamer-image” with an empty source field. Our SocketIO client will update those elements upon receiving updates from the server.

The Javascript SocketIO client gets a reference to both elements, and initiates a connection to the server. I’ve also added callbacks with logs to help us understand how the client is interacting with the server. The last block handles receiving the server2web message from the server and updating the HTML elements.

<script>
  document.addEventListener("DOMContentLoaded", function(event) {
    const image_elem = document.getElementById("streamer-image");
    const text_elem = document.getElementById("streamer-text");

    // Connect to the server's "/web" namespace
    var socket = io.connect('http://' + document.domain + ':' + location.port + '/web', {
      reconnection: false
    });

    socket.on('connect', () => {
      console.log('Connected');
    });

    socket.on('disconnect', () => {
      console.log('Disconnected');
    });

    socket.on('connect_error', (error) => {
      console.log('Connect error! ' + error);
    });

    socket.on('connect_timeout', (error) => {
      console.log('Connect timeout! ' + error);
    });

    socket.on('error', (error) => {
      console.log('Error! ' + error);
    });

    // Update image and text data based on incoming data messages
    socket.on('server2web', (msg) => {
      image_elem.src = msg.image;
      text_elem.innerHTML = msg.text;
    });
  });
</script>

Run the server and CV app

In one terminal, make sure your Python virtual environment is activated, and start the server:

server$ source venv/bin/activate
(venv) server$ python 
[INFO] Starting server at http://localhost:5001

Open http://localhost:5001 in your browser:

alwaysAI video streaming output, using computer vision app Flask-SocketIO

In another terminal, start your CV app using the alwaysAI CLI:
cv$ aai app install
✔ Build docker image
✔ Install model alwaysai/mobilenet_ssd
✔ Found python virtual environment
✔ Install python dependencies
cv$ aai app start -- --server-addr localhost
Loaded model:

Engine: Engine.DNN
Accelerator: Accelerator.GPU

['background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor']

[INFO] Connecting to server...
[INFO] Successfully connected to server.

You should see the video and text content being updated on your browser page!

alwaysAI Video Streamer output working with the computer vision app, using Flask-SocketIO

Dealing with slow network connections

This app might run well over a wired Ethernet connection, but it’s a different story over WiFi. Our computer vision app is continually queuing frames to be sent, but over a slow connection the device may not be able to keep up. If that happens, you may notice significant lag (30 seconds or more), or the server and web client may stop receiving messages altogether! One solution would be to add a callback to our CV app’s send call and only send the next frame once the callback fires. However, the latency would still be significant and the implementation is not trivial. A simpler solution is to reduce the frame rate of the SocketIO client’s output. Let’s start by adding a command-line parameter to our application:

if __name__ == "__main__":
    ...
    parser.add_argument(
            '--stream-fps', type=float, default=20.0,
            help='The rate to send frames to the server.')
    args = parser.parse_args()
    main(args.camera, args.use_streamer, args.server_addr, args.stream_fps)

Now, update the main() function to pass the value to the SocketIO client:

def main(camera, use_streamer, server_addr, stream_fps):
    ...
    if use_streamer:
        streamer = edgeiq.Streamer().setup()
    else:
        streamer = CVClient(server_addr, stream_fps).setup()

The client computes the wait time between frames and drops all frames that come in before the wait time has elapsed:

class CVClient(object):
    def __init__(self, server_addr, stream_fps):
        self.server_addr = server_addr
        self.server_port = 5001
        self._stream_fps = stream_fps
        self._last_update_t = time.time()
        self._wait_t = (1 / self._stream_fps)
    ...
    def send_data(self, frame, text):
        cur_t = time.time()
        if cur_t - self._last_update_t > self._wait_t:
            self._last_update_t = cur_t
            frame = edgeiq.resize(
                    frame, width=640, height=480, keep_scale=True)
            sio.emit(
                    'cv2server',
                    {
                        'image': self._convert_image_to_jpeg(frame),
                        'text': '<br />'.join(text)
                    })

With some trial and error you can figure out the streaming rate that works best on your network!
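The drop-early-frames logic above can also be pulled out and exercised on its own with a fake clock, which makes the behavior easy to reason about. FrameThrottle is a name invented here for illustration; it is not part of the tutorial code.

```python
import time

class FrameThrottle:
    """Drop frames that arrive before the inter-frame wait time elapses."""

    def __init__(self, stream_fps, clock=time.time):
        self._wait_t = 1.0 / stream_fps
        self._clock = clock
        self._last_update_t = self._clock()

    def ready(self):
        # Return True (and reset the timer) only when enough time has
        # passed since the last accepted frame; otherwise the caller
        # should drop the frame
        cur_t = self._clock()
        if cur_t - self._last_update_t > self._wait_t:
            self._last_update_t = cur_t
            return True
        return False

# Simulate one frame per second against a 0.5 fps limit (2 s between sends)
now = [0]
throttle = FrameThrottle(stream_fps=0.5, clock=lambda: now[0])
sent = []
for t in range(1, 10):
    now[0] = t
    if throttle.ready():
        sent.append(t)
print(sent)  # [3, 6, 9]
```

Injecting the clock as a parameter is what makes the rate limiter testable without real sleeps; the production code simply uses time.time.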


In this tutorial we built a simple HTTP and SocketIO server using Flask-SocketIO, which can run on a laptop and receive a data and video stream from an alwaysAI app running on an edge device. This is just the starting point for a multitude of applications that require some form of data streaming, such as security, robotics, and healthcare. Let us know what you build with alwaysAI!

Get started now

We are providing professional developers with a simple and easy-to-use platform to build and deploy computer vision applications on edge devices.
