Build a Vaccination Center Assistant with Computer Vision

Overview

The COVID-19 pandemic has impacted our lives for over a year in numerous ways and has presented us with a multitude of challenges. With vaccines now available, we have the ability to expedite the end of the pandemic, as long as the vaccine can be distributed safely and efficiently. Any effort that can assist in this process should be taken. Enter computer vision! 

It is always best to develop an application with a specific use case and environment in mind. In order to determine how computer vision might help in the vaccination process, the team at alwaysAI consulted a volunteer at a vaccination center to learn more about the process and find opportunities for computer vision to be applied.

At the volunteer's vaccination center, there were three main phases: waiting for your vaccine (and/or waiting to register), getting your vaccination, and post-vaccination monitoring. The main issues our volunteer identified that we thought could easily be helped with computer vision were: reducing prolonged wait times, helping individuals practice social distancing, and monitoring the post-vaccination room. It would also be useful to have a means of verifying how many vaccinations had taken place, as the current method was a manual clicker counter.

Some other obstacles to the vaccination process that were described, such as hard-to-navigate websites, are not suitable to be solved with computer vision. Other challenges, such as redundant form entering, could potentially be solved with a combination of computer vision solutions for license verification combined with a more efficient registration process online! 

For our solution, we chose to develop three modular applications, which could be used individually or combined with a dashboard to help solve the issues outlined above. The first application would monitor the waiting room, noting the distance between individuals, the proportion of individuals wearing masks, and how many chairs were unoccupied in the waiting room. The second application would monitor how many vaccination events had taken place. The third application would monitor when a patient raises their hand, indicating they need assistance from a volunteer. With these solutions integrated, and some general input data about the type of vaccine being used and the number of vaccination appointments that day, we could also help provide a means of verification for how many vaccine vials had been opened, how many doses were left in an opened vial, and whether the last appointment of the day had been reached. This data could help manage the wait times, by automatically texting upcoming appointments if the line was moving slower or faster than expected, as well as dialing the waiting list if there are vaccine doses remaining in an opened vial at the end of the day.

You can watch the full recording of the webinar here!

In this blog, I’ll describe the general approach and provide the code for the individual computer vision applications, as well as the components of the alwaysAI pipeline that were used for the application development. The main code for these applications can be found here. As this is a larger application, I won't go through the code line by line; for more detailed descriptions of the application logic, you can check out the additional GitHub repositories and blog posts linked throughout this post. Because these applications were developed to approximate a real scenario rather than for an actual site, stock video data was used, and this data drove the implementation of the applications. If you build your own vaccination application, the details will most likely differ: the process may vary, or the camera angles will be different. Still, the general approach described in this blog should be applicable to many scenarios, including non-vaccination center use cases.

The Waiting Room Application

Object Detection for People with Masks

This app analyzed social distancing and mask usage, and monitored when waiting room chairs were unoccupied. People who were at least (approximately) 6 feet away from every other person and were wearing a mask were given green bounding boxes; anyone who was less than 6 feet from another person, or not wearing a mask, was given a red box.

To determine how full the waiting room was, we detected how many people were sitting in chairs. Person detection was done using a stock alwaysAI model, yolov3, and the results filtering function was used to limit results to those with the label ‘person’, ensuring that only predictions for people were kept. Chair occupancy was then determined by defining a bounding box around each chair and noting when a person’s bounding box overlapped the closest chair bounding box above a certain threshold -- say around 70%. A threshold was used because some people leaned forward in a chair or stuck a foot out to the side, so requiring less than 100% overlap gave a more generalizable way of detecting that a person was sitting in a chair. Reports on how many masks were detected, the distance between each pair of people (or the average distance), and how many chairs were occupied were sent to the server (see the last section).
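
As a rough sketch of that logic (the chair coordinates, threshold, and helper functions here are illustrative, not the exact code from the repository), the occupancy check could look like this:

import edgeiq

# Hypothetical, hard-coded chair regions as (start_x, start_y, end_x, end_y)
CHAIR_BOXES = [(50, 200, 150, 330), (200, 200, 300, 330)]
OVERLAP_THRESHOLD = 0.7  # a person box covering ~70% of a chair box counts as "seated"


def overlap_fraction(person, chair):
    """Return the fraction of the chair box covered by the person box."""
    px1, py1, px2, py2 = person
    cx1, cy1, cx2, cy2 = chair
    inter_w = max(0, min(px2, cx2) - max(px1, cx1))
    inter_h = max(0, min(py2, cy2) - max(py1, cy1))
    chair_area = (cx2 - cx1) * (cy2 - cy1)
    return (inter_w * inter_h) / chair_area if chair_area else 0.0


def occupied_chairs(frame, obj_detect):
    """Count chairs whose region is sufficiently covered by a detected person."""
    results = obj_detect.detect_objects(frame, confidence_level=0.5)
    people = edgeiq.filter_predictions_by_label(results.predictions, ["person"])
    person_boxes = [
        (p.box.start_x, p.box.start_y, p.box.end_x, p.box.end_y) for p in people
    ]
    occupied = 0
    for chair in CHAIR_BOXES:
        if any(overlap_fraction(pb, chair) >= OVERLAP_THRESHOLD for pb in person_boxes):
            occupied += 1
    return occupied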

The first step was mask detection. 

Todd Gleed, Product Manager at alwaysAI, trained a new model using the labels ‘mask’ (representing any face wearing a face covering) and ‘no-mask’ (representing a face without any covering). By sharing the model with me on a project, he made it possible for me to test out different versions. Testing the model in the application showed that it worked sporadically: it had difficulty on the stock video data, which contained smaller instances of masks (the people were farther from the camera than they were in the training data), and the stock footage happened to only include masks in more muted colors. I also noticed that when patients sat directly across from the check-in staff, the model sometimes could not make any predictions at all (either as ‘mask’ or as ‘no-mask’), so images from that angle were likely not well represented in the dataset. We certainly could have trained the model further using data that more closely matched the use case; instead, I modified the application code. Because it is useful to know which people are wearing masks, either the bounding boxes of the people had to be compared to the bounding boxes of the mask detections, or the image corresponding to each person detection box could simply be used as input to the mask detection model, which is what I chose to do. For each ‘person’ detection box, the portion of the original frame corresponding to that person was cut out and run through the mask detection model. A similar method was used in this application. This also gave us the ability to say ‘there is a person here, but I can’t tell if they are wearing a mask’ when the model made no prediction, which is useful if someone has their back turned to the camera or mask detection is difficult for another reason. If a person was detected but it wasn’t possible to determine whether they were wearing a mask, the person’s bounding box was red and had a label of ‘no mask detected’. The application still had trouble with the angle directly across from the check-in staff.
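
As a sketch of that per-person check (the function and parameter names here are mine, and the mask model is assumed to be loaded as another edgeIQ ObjectDetection instance):

def mask_status(frame, person_prediction, mask_detect):
    """Classify a single detected person as 'mask', 'no-mask', or 'no mask detected'.

    The crop is taken directly from the frame using the person box; if the mask
    model returns no predictions for that crop, we report that no determination
    could be made rather than guessing.
    """
    box = person_prediction.box
    # Cut out just the pixels belonging to this person detection
    person_crop = frame[box.start_y:box.end_y, box.start_x:box.end_x]
    results = mask_detect.detect_objects(person_crop, confidence_level=0.5)
    if not results.predictions:
        return "no mask detected"
    # Take the highest-confidence mask/no-mask prediction for this person
    best = max(results.predictions, key=lambda p: p.confidence)
    return best.label  # 'mask' or 'no-mask'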

This portion of the application did use tracking, but it probably wasn’t necessary. Tracking is useful for seeing how patients move from chair to chair, but for the waiting room we mostly care whether all chairs are full, or how many chairs are empty, which can be done with object detection alone. Any time someone entered or left a chair, this information was sent to the dashboard application.

The distancing was done simply by using a hard-coded standard measurement for a person as a pixel scale, as in this sample application. If you want to see a demo and explanation of this logic, you can check out our Hacky Hour on distance measurement. Three and a half feet was used as the standard measurement for a person in this case because people were generally sitting down in the camera frame. Using a hard-coded scale was possible because there was a single static camera, and whenever a person was detected and sitting, the entire person was in the frame. Depending on the camera angle and environment, more robust methods of distance measuring may be needed, such as using other objects with known dimensions in the frame, checking for key points to see if a whole person was found (and using a different scale if only a partial person was found), or using a depth camera for more precise measurements.
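
A minimal sketch of the scale-based distance check, under those assumptions (a single static camera, fully visible seated people, a 3.5-foot reference height), might look like this:

import itertools
import math

PERSON_HEIGHT_FT = 3.5   # assumed real-world height of a seated person in the frame
MIN_DISTANCE_FT = 6.0    # social distancing threshold


def box_center(box):
    """Center of an edgeIQ bounding box as (x, y)."""
    return ((box.start_x + box.end_x) / 2, (box.start_y + box.end_y) / 2)


def pairwise_distances_ft(person_predictions):
    """Estimate the distance in feet between every pair of detected people.

    Each person's bounding-box height provides a local pixels-per-foot scale;
    the scales of the two people in a pair are averaged.
    """
    distances = []
    for a, b in itertools.combinations(person_predictions, 2):
        scale_a = (a.box.end_y - a.box.start_y) / PERSON_HEIGHT_FT
        scale_b = (b.box.end_y - b.box.start_y) / PERSON_HEIGHT_FT
        scale = (scale_a + scale_b) / 2
        (ax, ay), (bx, by) = box_center(a.box), box_center(b.box)
        distances.append((a, b, math.hypot(ax - bx, ay - by) / scale))
    return distances


def anyone_too_close(person_predictions):
    """True if any pair of people is estimated to be closer than 6 feet."""
    return any(d < MIN_DISTANCE_FT for _, _, d in pairwise_distances_ft(person_predictions))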

The Vaccination Application

Vaccination Station Object Detection

For the vaccination application, we found some stock videos that had a static view of a table, with a chair on either side. The medical professional would sit at the table, and periodically, a patient would come sit down at the table. After some time, the medical professional would walk around the table, stand next to the patient, and give them their vaccination. So, we can capture a ‘vaccination event’ by noting that two people are now in a portion of an image, and they stayed there for an acceptable duration of time.

This just requires tracking the detected bounding boxes and noting when they stay within a specific portion of the image for a set amount of time, much like the chair-occupancy check described in the section above. Depending on the camera angle and the number of cameras per vaccination station, you may need to take an approach similar to the waiting room chairs, or use key points to note when someone is sitting or standing. Every time a vaccination event occurred, a POST request was sent to the dashboard server.
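
A minimal sketch of that dwell-time check (the station region coordinates, dwell time, and server URL below are placeholders, not values from the repository):

import time
import requests

STATION_BOX = (400, 100, 900, 600)      # hypothetical region covering the vaccination table
DWELL_SECONDS = 30                      # how long two people must stay there to count
SERVER_URL = "http://192.168.1.10:5000" # placeholder dashboard server address


class VaccinationEventDetector:
    """Logs a vaccination event when two people occupy the station long enough."""

    def __init__(self):
        self.two_people_since = None
        self.event_logged = False

    def _in_station(self, box):
        cx = (box.start_x + box.end_x) / 2
        cy = (box.start_y + box.end_y) / 2
        x1, y1, x2, y2 = STATION_BOX
        return x1 <= cx <= x2 and y1 <= cy <= y2

    def update(self, person_predictions):
        count = sum(1 for p in person_predictions if self._in_station(p.box))
        if count >= 2:
            if self.two_people_since is None:
                self.two_people_since = time.time()
            elif not self.event_logged and time.time() - self.two_people_since >= DWELL_SECONDS:
                requests.post(SERVER_URL + "/event",
                              json={"type": "vaccination", "timestamp": time.time()})
                self.event_logged = True
        else:
            # Station cleared; reset so the next patient can trigger a new event
            self.two_people_since = None
            self.event_logged = False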

The Post-Vaccination Monitoring Application

Hand Raise Object Detection

This is a pretty simple concept -- capturing a hand raise. We did something similar in this application. In this case, instead of defining what constitutes poor posture, I just captured different notions of a ‘hand raise’. In the stock video used for this application, a hand raise could be captured by noting when a person’s elbow was higher than the corresponding shoulder (note: this means the y-coordinate of the elbow is less than the y-coordinate of the corresponding shoulder). You can watch the Hacky Hour where we covered the posture corrector app for a more detailed explanation of this logic.
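
As a sketch, assuming pose estimation results whose key_points dictionary maps part names such as 'Right Shoulder' to (x, y) coordinates and reports undetected points with negative values (the format used by the alwaysAI human pose model), the elbow-above-shoulder check might look like this:

def hand_raised(key_points):
    """Return True if either elbow is above its corresponding shoulder.

    In image coordinates the y-axis points down, so "above" means a smaller
    y value. Points that were not detected are skipped.
    """
    for side in ("Right", "Left"):
        shoulder = key_points.get("{} Shoulder".format(side))
        elbow = key_points.get("{} Elbow".format(side))
        if shoulder is None or elbow is None:
            continue
        # Assumed convention: undetected key points have negative coordinates
        if min(shoulder) < 0 or min(elbow) < 0:
            continue
        if elbow[1] < shoulder[1]:
            return True
    return False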

This is an overly simplistic way to capture a hand raise, though. Think about how your arm looks in relation to your head, shoulders, and other key points when you raise it. Maybe your arm goes straight up, so your wrist is right above your elbow, which is right above your shoulder. Maybe your friend just brings their hand up by their face, so their wrist is over their shoulder, but their elbow is off to the side, in between the wrist and shoulder in terms of vertical placement. Maybe you can’t see a wrist, an elbow, or a shoulder in the camera view. You can test with real-world hand raises and see which key point relationships reflect the most common hand raises given the angle of the camera! To make sure we don’t capture when someone just brushes hair off their face or adjusts their glasses, motions that can produce very similar key point relationships to a hand raise, we can use ‘consensus’ logic in the application. This starts a timer when a hand raise is first detected, and then counts how many times the hand raise is detected within a certain timeout period. For instance, if there are three people in view, we run the loop 10 times, and one person is raising their hand, we should get 30 total results: 10 ‘hand raise’ results and 20 ‘no hand raise’ results. As long as the number of ‘hand raise’ detections is greater than or equal to the total number of results divided by the number of people (30 / 3 = 10 in this example), we log it as a true hand raise. Every time a hand raise was confirmed, it was sent to the dashboard server. A similar approach was used in our gesture-control application. You can also watch the Hacky Hour we did on this topic.
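
A sketch of that consensus logic (the window length and class name here are illustrative):

import time

CONSENSUS_WINDOW_SECONDS = 5.0   # how long to collect results after the first detection


class HandRaiseConsensus:
    """Confirms a hand raise only if it is detected often enough within a window."""

    def __init__(self):
        self.window_start = None
        self.raise_count = 0
        self.total_count = 0

    def update(self, num_people, num_hands_raised):
        """Feed one loop iteration's results; return True when a raise is confirmed."""
        if self.window_start is None:
            if num_hands_raised == 0:
                return False
            self.window_start = time.time()

        self.total_count += num_people
        self.raise_count += num_hands_raised

        if time.time() - self.window_start < CONSENSUS_WINDOW_SECONDS:
            return False

        # e.g. 3 people over 10 loops -> 30 total results; a real raise by one
        # person should contribute roughly 10 of them
        confirmed = (
            num_people > 0
            and self.raise_count >= self.total_count / num_people
        )
        # Reset for the next window
        self.window_start = None
        self.raise_count = 0
        self.total_count = 0
        return confirmed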

Bringing it all Together

These three applications were tied together using a dashboard generated with Plotly’s Dash. Because alwaysAI is Python-based, you can integrate edgeIQ with existing Python packages. This GitHub repository provides a basic Dash dashboard as an alwaysAI application; you can send all of your data (using the ‘server_url’ attribute in each of the individual applications) to this dashboard, process the event log, and use this data to populate various graphs and tables in your dashboard! Note that in the individual applications, we sent data to a ‘route’ as well as the URL. In order to process the incoming data, you will need to create the routes that the individual applications send data to in dashboard_server/app.py, like so:

@app.route("/event", methods=["POST"])
def event():
body = request.json
# process body further, send to managing object 

You can make additional routes (like one for /setup, or different endpoints for each of your individual applications). It may be useful to make a specific class to parse and manage the incoming data: you can instantiate one ‘manager’ object (an instance of a class you define yourself) inside dashboard_server/app.py and pass all the incoming data to it for processing. For instance, the Flask route could look like this:

manager = DataManager()

@app.route("/event", methods=["POST"])
def event():
    body = request.json
    manager.process_data(body)
    return "", 200

In fact, if you just want to aggregate data and send out alerts, or write data files based on the results, you could do this without Dash visualizations -- just use these routes to process the incoming data and have some application logic use the data management instance to determine what action to take.
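
For example, a bare-bones manager that aggregates incoming events and fires a simple alert might look like the following sketch (the event fields, capacity threshold, and alert action are made up for illustration):

class DataManager:
    """Collects events posted by the edge applications and reacts to them."""

    def __init__(self, waiting_room_capacity=20):
        self.events = []
        self.waiting_room_capacity = waiting_room_capacity
        self.vaccination_count = 0

    def process_data(self, body):
        """Store an incoming event and take any follow-up action."""
        self.events.append(body)
        event_type = body.get("type")
        if event_type == "vaccination":
            self.vaccination_count += 1
        elif event_type == "waiting_room":
            occupied = body.get("chairs_occupied", 0)
            if occupied >= self.waiting_room_capacity:
                self.send_alert("Waiting room is full")

    def send_alert(self, message):
        # Placeholder: hook up texting, email, or any other notification here
        print("ALERT: {}".format(message))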

If you are using Dash, the callbacks in your dashboard_server/app.py can invoke methods on this managing object to get data to populate your visualizations. For instance, the callback in the above GitHub repository could be changed to look like this:

# Dash Callbacks
@dash_app.callback(
    output=[Output("logs", "data"), Output("logs", "columns")],
    inputs=[Input('interval-component', 'n_intervals')]
)
def render_log_table(n_intervals):
    df = manager.get_log_table_df()
    return df.to_dict('records'), [{"name": i, "id": i} for i in df.columns]

Inside the ‘DataManager’ class you would define the function get_log_table_df() and implement it to return the proper data.

It may be useful to store reformatted data in pandas DataFrames.
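
Continuing the hypothetical DataManager sketched above, get_log_table_df() could simply reshape the stored event dictionaries into a DataFrame (this trimmed-down version shows only the table helper):

import pandas as pd


class DataManager:
    """Trimmed-down version showing only the table helper."""

    def __init__(self):
        self.events = []   # filled by process_data(), as in the earlier sketch

    def get_log_table_df(self):
        """Return the stored events as a DataFrame for the Dash log table.

        Each event dict becomes one row; missing fields show up as NaN.
        """
        if not self.events:
            return pd.DataFrame(columns=["timestamp", "type"])
        return pd.DataFrame(self.events)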

While the individual applications were run using the Streamer functionality, this was really just to show how the applications were working. If you need more control over the streamer, you can build your own; you can find various examples of that here. You can also use this template for your central server if you don't need to incorporate visualizations. You can also run any alwaysAI application without a streamer and simply send requests to your server with information generated using computer vision services, such as ‘three people in the waiting room’ or ‘one person with a hand raised’. If you want to use the Streamer but want to hide faces, you can always use a face detection model (such as alwaysai/res10_300x300_ssd_iter_140000) and then use the returned predictions in a call to edgeiq.markup_image(). Of course, if you were ever developing a production application using health-related data, you would need to adhere to HIPAA and all requirements of the health organization.
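
If you do want to hide faces, one simple variation on the markup approach mentioned above is to blur each detected face region with OpenCV before handing the frame to the Streamer; this is a sketch of my own, not code from the repository:

import cv2
import edgeiq

face_detect = edgeiq.ObjectDetection("alwaysai/res10_300x300_ssd_iter_140000")
face_detect.load(engine=edgeiq.Engine.DNN)


def blur_faces(frame):
    """Blur every detected face region in the frame in place."""
    results = face_detect.detect_objects(frame, confidence_level=0.5)
    for prediction in results.predictions:
        box = prediction.box
        face = frame[box.start_y:box.end_y, box.start_x:box.end_x]
        if face.size:  # skip degenerate boxes
            frame[box.start_y:box.end_y, box.start_x:box.end_x] = cv2.GaussianBlur(
                face, (51, 51), 0
            )
    return frame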

All of these applications were run on a single closed network in a modular fashion; you can run just one application with the server if you don’t need all three, or you could add a whole new application and incorporate more details in your dashboard. You can easily deploy the individual applications on different edge devices and communicate with the server using the server machine’s IP address.

As you saw, a lot of the underlying logic in this vaccination application suite was previously developed in our past blog posts and Hacky Hours, which goes to show that the logic in these applications can be reused in many other use cases! Let us know on Discord what projects you're working on and how you're using computer vision to solve real-world problems.

Get started now

We are providing professional developers and enterprises with a simple and easy-to-use platform to build and deploy computer vision applications on edge devices. 

Sign Up for Free