This document discusses automating video streaming analysis using Microsoft Azure and Amazon Web Services. It explores using .NET Core, OpenCV, Face and Computer Vision APIs from Azure Cognitive Services, and Amazon Rekognition from AWS. Experiments were conducted using the Extended Cohn-Kanade Dataset to compare the APIs from Azure and AWS for tasks like face detection, recognition, and emotion analysis. The document concludes that Azure provided more accurate and user-friendly experiences compared to AWS.
Biometric Systems - Automate Video Streaming Analysis with Azure and AWS
Sapienza – University of Rome
MSc in Engineering in Computer Science, Academic Year 2019/20
Prof. M. De Marsico, Course of Biometric Systems
C. Navarra – F. Guidi – R. Falconi – S. Clinciu
Biometric Systems - Automate Video Streaming Analysis
With Microsoft Azure and Amazon Web Services
1. Summary
2. Introduction and Ideas
3. Technologies
.NET Core
OpenCV
Microsoft Azure
Amazon Web Services
Visual Studio
4. Microsoft Azure
Azure Dashboard
Cognitive Services
Face and Computer Vision APIs
Face Detection
Face Recognition
Azure Pricing Tier
Custom Vision
5. Amazon Web Services (AWS)
AWS Management Console
AWS Lambda and Amazon S3
Amazon Rekognition and Amazon Kinesis
AWS Pricing Tier
6. Software implementation in .NET Core and C#
.NET Core CLI
Used Libraries
From a simple approach to parallelizing API calls
A producer-consumer design
AWS Lambda Function constructors
AWS Lambda FunctionHandler
AWS Toolkit and AWS explorer
Upload AWS Lambda function
OpenCV sharp
7. Experiments
Datasets
APIs Comparison
8. Conclusions
9. Future Work
10. How to run the code
11. Useful Links
GitHub
LinkedIn
SlideShare
12. References
2. Introduction and Ideas
The goals of our project are to:
• Perform near-real-time analysis on faces (emotion, gender, age, etc.) taken from a live
video stream with the OpenCV .NET SDK
• Acquire frames from a video source
• Select which frames to analyse
• Submit these frames to the Microsoft Azure Face, Computer Vision and Emotion
APIs
• Consume each analysis result that is returned from the API call
• Return a positive or negative result for each test, based on Face IDs
• Compare Microsoft Azure and AWS experimentally
We want to use the biometric services offered by Microsoft Azure and AWS and compare
them, using the Face ID as a unique identifier string for each detected and analysed face.
With both platforms we aim to distil actionable information from images of the real world
and to detect, identify, analyse, organize, and tag faces in photos.
To develop our software, to write the code and to build the final project, we have used the
following tools and technologies:
• Microsoft .NET Core framework and the C# programming language
• Visual Studio Enterprise 2019 integrated development environment (IDE)
• OpenCV (Open Source Computer Vision Library) to provide an infrastructure for
biometric systems and apps
• Microsoft Azure with Cognitive Services, Computer Vision, Emotion and Face APIs
• AWS with Amazon Rekognition and Amazon Kinesis
3. Technologies
.NET Core
We developed our software in the C# programming language using the Microsoft
.NET Core runtime and libraries. .NET Core is a free and open-source managed
software framework for Windows, Linux, and macOS.
It is a cross-platform successor to .NET Framework.
The project is primarily developed by Microsoft and released under the MIT License, but it
is widely supported by developers and competitors (such as Amazon).
OpenCV
OpenCV (Open Source Computer Vision Library) is a library of programming functions
mainly aimed at real-time computer vision.
The library is cross-platform and free for use under the open-source BSD license.
We used it to capture a near-real-time video stream and feed its frames to Azure
Cognitive Services.
Microsoft Azure
Microsoft Azure is a cloud computing service created by Microsoft for building, testing,
deploying, and managing applications and services through Microsoft-managed data
centres.
It provides software as a service (SaaS), platform as a service (PaaS) and infrastructure as a
service (IaaS) and supports many different programming languages.
In our project, we used Cognitive Services as described later.
Amazon Web Services
Amazon Web Services (AWS) provides on-demand cloud computing platforms and APIs to
individuals, companies, and governments, on a metered pay-as-you-go basis.
Cloud computing web services provide a set of primitive abstract technical infrastructure
and distributed computing building blocks and tools.
In our project, we compared AWS with Microsoft Azure, one of its major
competitors.
Visual Studio
Microsoft Visual Studio is an integrated development environment (IDE) from Microsoft.
Team Explorer is used to integrate the capabilities of Azure DevOps (either Azure DevOps
Services or Azure DevOps Server) into the IDE.
4. Microsoft Azure
Azure Dashboard
Microsoft Azure Dashboard is a focused and organized view of cloud resources in the
Azure portal. We used the dashboard as a workspace from which to quickly launch tasks
for day-to-day operations and to monitor resources.
Cognitive Services
Cognitive Services is a comprehensive family of AI services and cognitive APIs for
building intelligent apps.
It brings AI within reach of every developer without requiring machine-learning
expertise: a single API call is enough to embed the ability to see, hear, speak, search,
understand, and accelerate decision-making into an app.
Face and Computer Vision APIs
Explore the Azure services to get started with Computer Vision and Face.
Get the API key and endpoint to authenticate your applications and start sending calls to
the service. All Computer Vision calls and Docker container activations require a key.
Specify the key in the request header (Web API), in the Computer Vision client (SDK),
or on the command line (Docker container).
Trying the service in the API console requires an API key and the selection of your
location (West Europe in our case).
Making a web API call requires your API key and endpoint.
Face Detection
Face detection is the action of locating human faces in an image and optionally returning
different kinds of face-related data.
At a minimum, each detected face corresponds to a face-rectangle field in the response. This
set of pixel coordinates (left, top, width, and height) marks the located face.
Using these coordinates, you can get the location of the face and its size. In the API response,
faces are listed in size order from largest to smallest.
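As a minimal sketch of how these coordinates can be used, the snippet below models a face rectangle with a hypothetical FaceRect class (our own illustration, not the SDK's type) and orders a set of detections from largest to smallest area, mirroring the ordering described above. The sample values are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the face-rectangle field of a detection response.
public sealed class FaceRect
{
    public int Left { get; }
    public int Top { get; }
    public int Width { get; }
    public int Height { get; }

    public FaceRect(int left, int top, int width, int height)
    {
        Left = left; Top = top; Width = width; Height = height;
    }

    // Size of the face in square pixels.
    public int Area => Width * Height;

    // Centre of the face in pixel coordinates.
    public (double X, double Y) Center => (Left + Width / 2.0, Top + Height / 2.0);
}

public static class FaceRectDemo
{
    // Orders rectangles from largest to smallest area.
    public static List<FaceRect> LargestFirst(IEnumerable<FaceRect> faces) =>
        faces.OrderByDescending(f => f.Area).ToList();

    public static void Main()
    {
        var faces = new[]
        {
            new FaceRect(10, 20, 50, 50),     // invented sample values
            new FaceRect(200, 40, 120, 130),
        };
        foreach (var f in LargestFirst(faces))
            Console.WriteLine($"left={f.Left} top={f.Top} area={f.Area} centre={f.Center}");
    }
}
```

In a video pipeline, the first element of such a list is a natural choice when only the most prominent face in a frame should be analysed further.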
If you're detecting faces from a video feed, you may be able to improve performance by
adjusting certain settings on your video camera.
Face Recognition
This section describes the Verify and Identify face recognition operations and the underlying data structures.
Recognition describes the work of comparing two different faces to determine if they're
similar or belong to the same person.
According to Microsoft, face recognition in Azure consists of two different operations:
• The Verify operation takes two face IDs and determines whether they belong to the same
person.
• The Identify operation takes one or several face IDs and returns the persons that each
face might belong to.
The face IDs come from a preceding detection call, which can also return the locations of
various face landmarks (such as pupils, nose, and mouth) in an image and an estimate of the
gender, age, emotion, and other attributes of a detected face.
The Verify operation can be called from C# code through the Azure Cognitive Services Face
API client library. Every call to the Face API requires a subscription key, which can be
either passed as a query string parameter or specified in the request header.
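The two ways of passing the key can be sketched as follows. This is an illustration rather than the project's code: it only builds the requests and does not send them. It assumes the REST conventions of the Face API v1.0 (the Ocp-Apim-Subscription-Key header, the subscription-key query parameter and the /face/v1.0/detect path); the endpoint and key values are placeholders.

```csharp
using System;
using System.Net.Http;

public static class FaceRequestDemo
{
    // Option 1: the key travels in the request header.
    public static HttpRequestMessage WithHeaderKey(string endpoint, string key)
    {
        var request = new HttpRequestMessage(
            HttpMethod.Post, endpoint.TrimEnd('/') + "/face/v1.0/detect");
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        return request;
    }

    // Option 2: the key travels as a query string parameter.
    public static HttpRequestMessage WithQueryKey(string endpoint, string key)
    {
        var url = endpoint.TrimEnd('/') + "/face/v1.0/detect?subscription-key="
                  + Uri.EscapeDataString(key);
        return new HttpRequestMessage(HttpMethod.Post, url);
    }

    public static void Main()
    {
        // Placeholder values: replace them with your own resource endpoint and key.
        const string endpoint = "https://westeurope.api.cognitive.microsoft.com";
        const string key = "YOUR_SUBSCRIPTION_KEY";

        Console.WriteLine(WithHeaderKey(endpoint, key).RequestUri);
        Console.WriteLine(WithQueryKey(endpoint, key).RequestUri);
    }
}
```

The header variant is usually preferable, because a key placed in the URL can end up in logs and browser histories.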
The subscription keys are obtained from the Azure Marketplace, which is reachable from the
Azure portal.
Azure Pricing Tier
The cost of your cognitive services depends on the actual usage and the options you choose.
At the time of writing, both the Face API and the Computer Vision API in Azure Cognitive
Services cost 0.84 EUR per 1,000 calls in Europe.
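To give a feeling for what this rate means for video analysis, here is a small worked example. It assumes a flat rate of 0.84 EUR per 1,000 calls, as stated above, and a hypothetical workload of one analysed frame per second for one hour; both the workload and the helper names are ours.

```csharp
using System;
using System.Globalization;

public static class PricingDemo
{
    // Rate quoted in this document; real pricing may be tiered or may change.
    public const decimal EurPerThousandCalls = 0.84m;

    // Cost in EUR of a given number of API calls at the flat rate.
    public static decimal Cost(long calls) => calls / 1000m * EurPerThousandCalls;

    public static void Main()
    {
        // Assumed workload: 1 frame per second, 60 seconds, 60 minutes.
        long calls = 1 * 60 * 60;
        string cost = Cost(calls).ToString("F3", CultureInfo.InvariantCulture);
        Console.WriteLine($"{calls} calls cost {cost} EUR");
    }
}
```

Under these assumptions, 3,600 calls cost 3.6 × 0.84 = 3.024 EUR, so selecting which frames to analyse has a direct effect on the bill.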
Custom Vision
Custom Vision is a web app by Azure for training your own models, analysing their
performance, and quickly making predictions with the Computer Vision API in a
user-friendly way, without writing code in any programming language.
It lets you analyse content in images and customize image recognition to fit your
business needs.
In our project we instead used the C# programming language and the .NET Core
libraries, as shown in the following sections.
5. Amazon Web Services (AWS)
AWS Management Console
The AWS Management Console lets you access and manage Amazon Web Services through a
simple and intuitive web-based user interface.
Administering your AWS account: the Console facilitates cloud management for all aspects of
your AWS account.
Finding services: the Console offers several ways to locate and navigate to the services
you need.
AWS Lambda and Amazon S3
AWS Lambda is an event-driven, serverless computing platform provided by Amazon as a
part of the Amazon Web Services. It is a computing service that runs code in response to
events and automatically manages the computing resources required by that code.
Amazon Simple Storage Service (S3) is a service offered by Amazon Web Services (AWS)
that provides object storage through a web service interface.
We discuss Lambda and S3 together because, in our case, the former is hosted by the latter: the Lambda function itself is stored in S3.
Amazon Rekognition and Amazon Kinesis
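Amazon Rekognition is a cloud-based software as a service (SaaS) computer vision
platform. It provides several computer vision capabilities, which can be divided into two
categories: algorithms that are pre-trained on data collected by Amazon or its partners, and
algorithms that a user can train on a custom dataset.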
Amazon Kinesis makes it easy to collect, process, and analyse real-time, streaming data so
you can get timely insights and react quickly to new information.
We discuss Rekognition and Kinesis together because they work closely with each other.
AWS Pricing Tier
AWS Lambda prices are clearly declared, but they are not as easy to understand as
Azure's.
While Azure declares a cost of 0.84 EUR per 1,000 API calls for Cognitive Services, AWS
explains its costs in a less accessible way, as the following image shows.
In addition to AWS Lambda, you must also pay for what you store in S3 (the Lambda function
itself and the video stream) and for any request or data retrieval.
It follows that the total cost is the sum of the Lambda and S3 costs, each of which is the
result of a complex calculation.
6. Software implementation in .NET Core and C#
.NET Core CLI
In a console window (such as cmd, PowerShell, or Bash), the dotnet new command lets you
create a new console app; for example, dotnet new console -n face-quickstart creates one
with the name face-quickstart.
This command creates a simple "Hello World" C# project with a single source file,
Program.cs.
Used Libraries
From the project directory, open the Program.cs file in your preferred editor or IDE and add
the using directives required by the Face client library.
In the application's Main method, create variables for your resource's Azure endpoint and
key.
Within the application directory, install the Face client library for .NET.
From a simple approach to parallelizing API calls
The simplest design for a near-real-time analysis system is an infinite loop, where each
iteration grabs a frame, analyses it, and then consumes the result.
If our analysis consisted of a lightweight client-side algorithm, this approach would be
suitable. However, when the analysis happens in the cloud, the latency involved means that an
API call might take several seconds. During this time we are not capturing images, and our
thread is essentially doing nothing, so the maximum frame rate is limited by the latency of
the API calls.
The solution to this problem is to allow the long-running API calls to execute in parallel
with the frame-grabbing. In C#, we can achieve this using Task-based parallelism.
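The Task-based idea can be sketched as follows. This is a simplified illustration, not the project's code: the cloud call is replaced by a hypothetical AnalyzeAsync method that only waits to simulate latency, and frames are represented by their index.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelAnalysisDemo
{
    // Hypothetical stand-in for a cloud API call; the delay simulates network latency.
    public static async Task<string> AnalyzeAsync(int frameId, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return $"frame {frameId} analysed";
    }

    // Starts one analysis per frame without waiting for the previous one to finish.
    public static async Task<string[]> AnalyzeAllAsync(int frameCount, int latencyMs)
    {
        var pending = new List<Task<string>>();
        for (int frameId = 0; frameId < frameCount; frameId++)
        {
            // A real loop would grab a frame from the camera here.
            pending.Add(AnalyzeAsync(frameId, latencyMs));
        }
        // All calls are now in flight at once; wait for them together.
        return await Task.WhenAll(pending);
    }

    public static async Task Main()
    {
        foreach (var result in await AnalyzeAllAsync(5, 200))
            Console.WriteLine(result);
    }
}
```

Because the calls overlap, the total waiting time is close to the latency of a single call rather than the sum of all of them. Unlike a fire-and-forget loop, this sketch keeps every Task in a list so that failures are not lost.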
A producer-consumer design
Our code launches each analysis in a separate Task, which can run in the background while
we continue grabbing new frames. With this method we avoid blocking the main thread
while waiting for an API call to return, but we have lost some of the guarantees that the
simple version provided. Multiple API calls might occur in parallel, and the results might
get returned in the wrong order.
This could also cause multiple threads to enter the ConsumeResult() function
simultaneously, which could be dangerous if the function is not thread-safe. Finally, this
simple code does not keep track of the Tasks that get created, so exceptions will silently
disappear. Therefore, the final step is to add a "consumer" thread that will track the analysis
tasks, raise exceptions, kill long-running tasks, and ensure that the results get consumed in
the correct order.
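A minimal sketch of such a producer-consumer arrangement is shown below. It is an assumption-laden illustration rather than our actual implementation: the API call is simulated by a hypothetical AnalyzeAsync method, a BlockingCollection from the .NET base class library serves as the queue, and the timeout logic for killing long-running tasks is omitted.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Hypothetical stand-in for a cloud API call. Later frames finish sooner,
    // so completion order differs from submission order.
    private static async Task<string> AnalyzeAsync(int frameId, int frameCount)
    {
        await Task.Delay(20 * (frameCount - frameId));
        return $"result for frame {frameId}";
    }

    public static List<string> Run(int frameCount)
    {
        var queue = new BlockingCollection<Task<string>>();
        var consumed = new List<string>();

        // Consumer: takes tasks in submission order and waits for each in turn,
        // so results are consumed in order and by a single thread.
        var consumer = Task.Run(() =>
        {
            foreach (var task in queue.GetConsumingEnumerable())
            {
                try
                {
                    // Blocks until the task ends; rethrows its exception if it failed.
                    consumed.Add(task.GetAwaiter().GetResult());
                }
                catch (Exception ex)
                {
                    consumed.Add("analysis failed: " + ex.Message);
                }
            }
        });

        // Producer: starts an analysis per frame and queues the running task.
        for (int frameId = 0; frameId < frameCount; frameId++)
            queue.Add(AnalyzeAsync(frameId, frameCount));
        queue.CompleteAdding();

        consumer.Wait();
        return consumed;
    }

    public static void Main()
    {
        foreach (var line in Run(4))
            Console.WriteLine(line);
    }
}
```

The queue holds running tasks rather than finished results, which is what lets the consumer restore the original frame order even when a later call returns first.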
AWS Lambda Function constructors
The class has two constructors. The first is a constructor that is used when Lambda invokes
your function. This constructor creates the S3 and Rekognition service clients and gets the
AWS credentials for these clients from the IAM role you assign to the function when you
deploy it. The AWS Region for the clients is set to the region your Lambda function is
running in. This constructor also checks the environment variable MinConfidence to
determine the acceptable confidence level.
You can use the second constructor (the smaller one) for testing.
AWS Lambda FunctionHandler
FunctionHandler is the method Lambda calls after it constructs the instance. Notice that the
input parameter is of type S3Event and not a Stream. You can do this because of the
registered Lambda JSON serializer. The S3Event contains all the information about the
event triggered in Amazon S3. The function loops through all the S3 objects that were part
of the event and tells Rekognition to detect labels. After the labels are detected, they are
added as tags to the S3 object.
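The structure described above (two constructors, a confidence threshold read from MinConfidence, and a handler that tags each object with its detected labels) can be sketched without the AWS SDK. In this illustration, ILabelDetector and ITagStore are hypothetical stand-ins for the Rekognition and S3 clients, the event is reduced to a list of bucket/key pairs, and the default threshold of 70 is an assumption. Because the stand-ins must be supplied by the caller, both constructors take them as parameters, unlike the real parameterless constructor that Lambda invokes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-ins for the Rekognition and S3 clients (not AWS SDK types).
public interface ILabelDetector
{
    IReadOnlyList<(string Name, float Confidence)> DetectLabels(string bucket, string key);
}

public interface ITagStore
{
    void PutTags(string bucket, string key, IReadOnlyList<string> tags);
}

public class Function
{
    // Assumed fallback when MinConfidence is unset or not a number.
    public const float DefaultMinConfidence = 70f;

    private readonly ILabelDetector _detector;
    private readonly ITagStore _tags;
    public float MinConfidence { get; }

    // Mirrors the constructor used by Lambda: the threshold comes from the environment.
    public Function(ILabelDetector detector, ITagStore tags)
        : this(detector, tags, ReadMinConfidence()) { }

    // Mirrors the smaller constructor used for testing: everything is passed in.
    public Function(ILabelDetector detector, ITagStore tags, float minConfidence)
    {
        _detector = detector;
        _tags = tags;
        MinConfidence = minConfidence;
    }

    public static float ReadMinConfidence()
    {
        var raw = Environment.GetEnvironmentVariable("MinConfidence");
        return float.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
            ? value
            : DefaultMinConfidence;
    }

    // Simplified handler: the (bucket, key) pairs stand in for the records of an S3 event.
    public void FunctionHandler(IEnumerable<(string Bucket, string Key)> records)
    {
        foreach (var (bucket, key) in records)
        {
            var labels = _detector.DetectLabels(bucket, key)
                .Where(label => label.Confidence >= MinConfidence)
                .Select(label => label.Name)
                .ToList();
            _tags.PutTags(bucket, key, labels);
        }
    }
}

// In-memory fakes so that the sketch runs without any cloud access.
public class FixedLabelDetector : ILabelDetector
{
    public IReadOnlyList<(string Name, float Confidence)> DetectLabels(string bucket, string key) =>
        new (string Name, float Confidence)[] { ("Face", 99.1f), ("Person", 95.0f), ("Hat", 42.5f) };
}

public class InMemoryTagStore : ITagStore
{
    public Dictionary<string, IReadOnlyList<string>> Tags { get; } =
        new Dictionary<string, IReadOnlyList<string>>();

    public void PutTags(string bucket, string key, IReadOnlyList<string> tags) =>
        Tags[bucket + "/" + key] = tags;
}

public static class LambdaDemo
{
    public static void Main()
    {
        var store = new InMemoryTagStore();
        var function = new Function(new FixedLabelDetector(), store, 90f);
        function.FunctionHandler(new[] { ("my-bucket", "frame-001.png") });
        Console.WriteLine(string.Join(", ", store.Tags["my-bucket/frame-001.png"]));
    }
}
```

Passing the dependencies in through a constructor is what makes the handler testable locally: the same logic runs against the in-memory fakes here and against the real service clients once deployed.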
AWS Toolkit and AWS explorer
The AWS Toolkit for Visual Studio is an extension for Microsoft Visual Studio running on
Microsoft Windows that makes it easier for developers to develop, debug, and deploy .NET
applications using Amazon Web Services.
With the AWS Toolkit for Visual Studio, you'll be able to get started faster and be more
productive when building AWS applications using AWS Explorer.
Upload AWS Lambda function
Uploading the function launches the deployment process, which builds and packages the Lambda project and
then creates the Lambda function. Once publishing is complete, the Function view in the
AWS Explorer window is displayed. From here, you can invoke a test function.
OpenCV sharp
NuGet is a free and open-source package manager, distributed as a Visual Studio extension.
We used it to install OpenCvSharp, a .NET wrapper for OpenCV. It is an essential library in
our project, used on both the Azure and the AWS side to connect the video source to the
cloud providers and their APIs.
7. Experiments
AWS lets you upload an image to S3 and analyse it with Rekognition: a Lambda function
triggers the analysis, and Rekognition sends back its results.
We submitted different sets of images from the CK+ dataset, as well as video streams, to
both our AWS bucket and Azure Cognitive Services.
Datasets
• A facial expression database is a collection of images or video clips with facial
expressions of a range of emotions.
• Well-annotated (emotion-tagged) media content of facial behavior is essential for
training, testing, and validation of algorithms for the development of expression
recognition systems.
• The emotion annotation can be done in discrete emotion labels or on a continuous
scale.
• We used the Extended Cohn-Kanade Dataset (CK+), which is released under a Creative
Commons Attribution license. It contains 123 subjects and 593 image sequences (327
of which have discrete emotion labels) of both posed and spontaneous smiles, at
a resolution of 640×490 pixels.
APIs Comparison
ROC and CMC Curves
The receiver operating characteristic (ROC) curve plots the true positive rate against the
false positive rate of a model as its decision threshold varies.
The Cumulative Match Characteristic (CMC) curve plots, for each rank k, the probability that
the true match appears among the top k candidates (the identification rate at rank k).
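As an illustration of how the points of these curves are obtained, the sketch below computes a CMC curve from the rank at which each probe found its true match, and a single ROC point from genuine and impostor similarity scores at a given threshold. The helper names and the sample numbers are invented for the example; they are not our experimental data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CurvesDemo
{
    // CMC: entry k-1 is the fraction of probes whose true match has rank <= k (ranks start at 1).
    public static double[] Cmc(IReadOnlyList<int> ranks, int maxRank)
    {
        var curve = new double[maxRank];
        for (int k = 1; k <= maxRank; k++)
            curve[k - 1] = ranks.Count(r => r <= k) / (double)ranks.Count;
        return curve;
    }

    // ROC point at one threshold: a comparison is accepted when its score >= threshold.
    public static (double Fpr, double Tpr) RocPoint(
        IReadOnlyList<double> genuine, IReadOnlyList<double> impostor, double threshold)
    {
        double tpr = genuine.Count(s => s >= threshold) / (double)genuine.Count;
        double fpr = impostor.Count(s => s >= threshold) / (double)impostor.Count;
        return (fpr, tpr);
    }

    public static void Main()
    {
        // Invented ranks for four probes.
        var cmc = Cmc(new[] { 1, 2, 1, 3 }, 3);
        Console.WriteLine("CMC: " + string.Join(" ", cmc));

        // Invented similarity scores.
        var point = RocPoint(new[] { 0.9, 0.8, 0.4 }, new[] { 0.5, 0.2, 0.1, 0.3 }, 0.45);
        Console.WriteLine($"ROC point: FPR={point.Fpr} TPR={point.Tpr}");
    }
}
```

Sweeping the threshold over the whole score range and collecting the resulting points yields the full ROC curve; the CMC curve always reaches 1 at the highest rank if every probe has a match in the gallery.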
Both can be appreciated in the project’s slides.
8. Conclusions
By the numbers, Microsoft Azure Cognitive Services is rather more accurate than Amazon
Rekognition:
• Azure has clear overfitting and overconfidence problems.
• Overall, AWS does not suffer from these problems but gives rather less accurate results.
In our experience, Microsoft Azure provides:
• A more user-friendly experience (the Dashboard is clearer than the Management Console,
and almost all services share the same UI).
• Clearer names (Cognitive Services, Speech Services, Face, Emotion APIs, etc. are
self-explanatory names and are easier to remember than Amazon Rekognition,
Polly, and Kinesis).
• Clearer prices (Azure simply states EUR per number of calls, while AWS provides very
complex tables and calculations).
[Figure: bar chart comparing Azure, AWS and CK+ across the emotion labels Anger, Disgust, Contempt, Fear, Sadness, Happiness, Surprise and Neutral; axis scale from 0 to 200.]
• A hybrid cloud approach that mixes local and web infrastructure, which is fundamental
for governments and public administrations that need to keep citizens' data local while
using computational power in the cloud.
9. Future Work
It would be very interesting to make more comparisons and discover new services:
• Compare more services of AWS and Microsoft Azure, such as Amazon Polly and
Speech Services respectively.
• Compare more cloud providers with AWS and Microsoft Azure, such as Google Cloud
and IBM Cloud.
10. How to run the code
We have tried to make it easy to run the project. Get your own Cognitive Services API keys
on microsoft.com/cognitive; for video frame analysis, the applicable APIs are the Computer
Vision API and the Face API. Open the sample in Visual Studio, insert the API keys in the
settings, then build and run the application using IIS (Internet Information Services).