Senior Solution Architect Shares Insights on Anomaly Detection and Machine Learning

Senior Solution Architect in beanTech
Microsoft Azure MVP
Community Lead 1nn0va // Pordenone
1nn0va After Hour
weekly, every Thuesday evening 9PM CET
https://bit.ly/1nn0va-video
LinkedIn: https://www.linkedin.com/in/marcoparenzan/
Marco Parenzan
Nov 20, 2022

This work is the product of
prolonged inexperience
Questo lavoro è frutto di una
lunga inesperienza
Beppe Severgnini, «Un italiano in America»

• Industry 4.0
• Asset Management/Monitoring is not enough!
[Diagram labels: Devices, Events, Ingest, Azure IoT Central, Storage Account, (Web) Control]
The starting point...

How can we implement processing?
Ingest ProcessValue
Storage
Account
Azure
IoT Hub-Related
Services
Devices
Events
?

Anomaly Detection
Anomaly detection is the process of identifying unexpected items or events in data
sets (called OUTLIERS) that differ from the norm.
We make decisions based on anomalies.
We make decisions based on data.

• In time series data, an anomaly or outlier is a data point that does not follow
the common collective trend or the seasonal or cyclic pattern of the entire data
set and is significantly distinct from the rest of the data. By significant, most
data scientists mean statistically significant; in other words, the statistical
properties of the data point are not in alignment with the rest of the series.
• Anomaly detection has two basic assumptions:
• Anomalies only occur very rarely in the data.
• Their features differ from the normal instances significantly.
Anomaly Detection in Time Series

• Measuring = associating quantities
• But what about anything we can’t “measure”?
• Defects
Anomaly Detection in Time Series → Telemetry

• How does a computer «see»?
• (Convolutional) Neural Networks → Deep Learning
• Transforming images into (very) big arrays (they are arrays already)
• Arrays → Vectors
• Apply big matrices to vectors (Linear Algebra)
• Apply big matrices again and again (hidden layers)
• Extracting Features
• On the feature vectors → apply «classic» Machine Learning (regression,
classification, clustering)
• regardless of what these mean
• It is the topic with the most supervised (labelled) data
Vision
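
As a rough illustration of the pipeline on this slide (image → vector → big matrices applied layer after layer → features), here is a minimal pure-Python sketch. The weights are random placeholders rather than trained values, the function names (`to_vector`, `hidden_layer`, `extract_features`) are hypothetical, and a real CNN would use learned convolution kernels instead of dense random matrices.

```python
import random

def to_vector(image):
    """Flatten a 2-D grayscale image (rows of 0-255 pixels) into a 1-D vector in [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def random_matrix(rows, cols, rng):
    """Placeholder for learned weights: a rows x cols matrix of random values."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def hidden_layer(vector, weights):
    """Apply one matrix to the vector, then ReLU (negative values become 0)."""
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]

def extract_features(image, layer_sizes, seed=0):
    """Apply one matrix per hidden layer; the final vector is the feature vector."""
    rng = random.Random(seed)
    vector = to_vector(image)
    for size in layer_sizes:
        vector = hidden_layer(vector, random_matrix(size, len(vector), rng))
    return vector

# A tiny 4x4 "image"; real images are just much bigger arrays.
image = [
    [0, 0, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
    [0, 0, 255, 0],
]
features = extract_features(image, layer_sizes=[8, 3])
print(len(to_vector(image)), "pixels ->", len(features), "features")
```

The resulting feature vector is what a «classic» classifier or regressor would then consume.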

• Object Detection
• Identify objects in an image
• Classify an image
• Enrich an image with tags
On images you can

How to build an Object Detector or Image Classifier

Model Class | Reference | Description
Tiny YOLOv2 | Redmon et al. | A real-time CNN for object detection that detects 20 different classes. A smaller version of the more complex full YOLOv2 network.
SSD | Liu et al. | Single Shot Detector: real-time CNN for object detection that detects 80 different classes.
SSD-MobileNetV1 | Howard et al. | A variant of MobileNet that uses the Single Shot Detector (SSD) model framework. The model detects 80 different object classes and locates up to 10 objects in an image.
Faster-RCNN | Ren et al. | Increases efficiency from R-CNN by connecting an RPN with a CNN to create a single, unified network for object detection that detects 80 different classes.
Mask-RCNN | He et al. | A real-time neural network for object instance segmentation that detects 80 different classes. Extends Faster R-CNN as each of the 300 elected ROIs go through 3 parallel branches of the network: label prediction, bounding box prediction and mask prediction.
RetinaNet | Lin et al. | A real-time dense detector network for object detection that addresses class imbalance through Focal Loss. RetinaNet is able to match the speed of previous one-stage detectors and defines the state-of-the-art in two-stage detectors (surpassing R-CNN).
YOLO v2-coco | Redmon et al. | A CNN model for real-time object detection system that can detect over 9000 object categories. It uses a single network evaluation, enabling it to be more than 1000x faster than R-CNN and 100x faster than Faster R-CNN. This model is trained with COCO dataset and contains 80 classes.
YOLO v3 | Redmon et al. | A deep CNN model for real-time object detection that detects 80 different classes. A little bigger than YOLOv2 but still very fast. As accurate as SSD but 3 times faster.
Tiny YOLOv3 | Redmon et al. | A smaller version of YOLOv3 model.
YOLOv4 | Bochkovskiy et al. | Optimizes the speed and accuracy of object detection. Two times faster than EfficientDet. It improves YOLOv3's AP and FPS by 10% and 12%, respectively, with mAP50 of 52.32 on the COCO 2017 dataset and FPS of 41.7 on a Tesla V100.
DUC | Wang et al. | Deep CNN based pixel-wise semantic segmentation model with >80% mIOU (mean Intersection Over Union). Trained on cityscapes dataset, which can be effectively implemented in self-driving vehicle systems.
FCN | Long et al. | Deep CNN based segmentation model trained end-to-end, pixel-to-pixel that produces efficient inference and learning. Built off of AlexNet, VGG net, GoogLeNet classification methods.
Object Detection with CNNs

Model Class | Reference | Description
MobileNet | Sandler et al. | Light-weight deep neural network best suited for mobile and embedded vision applications. Top-5 error from paper: ~10%
ResNet | He et al. | A CNN model (up to 152 layers). Uses shortcut connections to achieve higher accuracy when classifying images. Top-5 error from paper: ~3.6%
SqueezeNet | Iandola et al. | A light-weight CNN model providing AlexNet level accuracy with 50x fewer parameters.
VGG | Simonyan et al. | Deep CNN model (up to 19 layers). Similar to AlexNet but uses multiple smaller kernel-sized filters that provide more accuracy when classifying images.
AlexNet | Krizhevsky et al. | A Deep CNN model (up to 8 layers) where the input is an image and the output is a vector of 1000 numbers.
GoogleNet | Szegedy et al. | Deep CNN model (up to 22 layers). Comparatively smaller and faster than VGG and more accurate in detailing than AlexNet.
CaffeNet | Krizhevsky et al. | Deep CNN variation of AlexNet for Image Classification in Caffe where the max pooling precedes the local response normalization (LRN) so that the LRN takes less compute and memory.
RCNN_ILSVRC13 | Girshick et al. | Pure Caffe implementation of R-CNN for image classification. This model uses localization of regions to classify and extract features from images.
DenseNet-121 | Huang et al. | Model that has every layer connected to every other layer and passes on its own feature providing strong gradient flow and more diversified features.
Inception_V1 | Szegedy et al. | This model is same as GoogLeNet, implemented through Caffe2 that has improved utilization of the computing resources inside the network and helps with the vanishing gradient problem.
Inception_V2 | Szegedy et al. | Deep CNN model for Image Classification as an adaptation to Inception v1 with batch normalization. This model has reduced computational cost and improved image resolution compared to Inception v1. Top-5 error from paper: ~4.82%
ShuffleNet_V1 | Zhang et al. | Extremely computation efficient CNN model that is designed specifically for mobile devices. This model greatly reduces the computational cost and provides a ~13x speedup over AlexNet on ARM-based mobile devices. Compared to MobileNet, ShuffleNet achieves superior performance by a significant margin due to its efficient structure.
ShuffleNet_V2 | Zhang et al. | Extremely computation efficient CNN model that is designed specifically for mobile devices. This network architecture design considers direct metrics such as speed, instead of indirect metrics like FLOP.
ZFNet-512 | Zeiler et al. | Deep CNN model (up to 8 layers) that increased the number of features that the network is capable of detecting that helps to pick image features at a finer level of resolution.
EfficientNet-Lite4 | Tan et al. | CNN model with an order of magnitude fewer computations and parameters, while still achieving state-of-the-art accuracy and better efficiency than previous ConvNets.
Image Classification with CNNs

• ImageArray = arrayFrom(Image)
• Result = AnomalyDetect(ImageArray)
• Result = true/false
• True/false → classes with associated probabilities
• Probability: results are NEVER exact (P is never 100%)
Anomaly Detection in Images
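
The pseudocode on this slide can be sketched as below, assuming a two-class model ("normal" vs "anomaly") whose raw scores are turned into probabilities with a softmax. The scoring function here (`toy_score_fn`) is a purely hypothetical stand-in for a trained network, and the 0.5 threshold is an assumption; the point is only that the true/false answer comes from class probabilities that never reach exactly 100%.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def anomaly_detect(image_array, score_fn, threshold=0.5):
    """Return (is_anomaly, class probabilities) for a flattened image."""
    normal_score, anomaly_score = score_fn(image_array)
    p_normal, p_anomaly = softmax([normal_score, anomaly_score])
    return p_anomaly >= threshold, {"normal": p_normal, "anomaly": p_anomaly}

def toy_score_fn(image_array):
    """Hypothetical stand-in for a trained model: brighter images look more anomalous."""
    brightness = sum(image_array) / (255.0 * len(image_array))
    return 1.0 - brightness, brightness

image_array = [10, 20, 250, 240, 255, 230]  # ImageArray = arrayFrom(Image)
is_anomaly, probabilities = anomaly_detect(image_array, toy_score_fn)
print(is_anomaly, probabilities)
```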

• “…The most important thing to realize about TensorFlow is that, for the most
part, the core is not written in Python: It's written in a combination of highly-
optimized C++ and CUDA (Nvidia's language for programming GPUs). Much of
that happens, in turn, by using Eigen (a high-performance C++ and CUDA
numerical library) and NVidia's cuDNN (a very optimized DNN library for NVidia
GPUs, for functions such as convolutions). The model for TensorFlow is that the
programmer uses "some language" (most likely Python!) to express the
model…”
• “…PyTorch backend is written in C++ which provides APIs to access highly
optimized libraries such as: Tensor libraries for efficient matrix operations, CUDA
libraries to perform GPU operations and Automatic differentiation for gradient
calculations etc…”
The truth

• TensorFlow, PyTorch and scikit-learn are frameworks to define and train ML and DL
models and to score data with them
• They have:
• A native C++ core
• A binding in Python (compose/parametrize the model)
Frameworks

• ONNX is an open source format for representing machine learning models.
• ONNX establishes a standard set of operators - the building blocks of machine
learning and deep learning models - as well as a standard file format, allowing AI
developers to use models with a range of frameworks, tools, runtimes, and
compilers.
• TensorFlow and PyTorch can export their neural networks to ONNX
ONNX

• ML.NET is first and foremost a framework that you can use to
create your own custom ML models. This custom approach
contrasts with “pre-built AI,” where you use pre-designed general
AI services from the cloud (like many of the offerings from Azure
Cognitive Services). This can work great for many scenarios, but
it might not always fit your specific business needs due to the
nature of the machine learning problem or to the deployment
context (cloud vs. on-premises).
• ML.NET enables developers to use their existing .NET skills to
easily integrate machine learning into almost any .NET
application. This means that if C# (or F# or VB) is your
programming language of choice, you no longer have to learn a
new programming language, like Python or R, in order to
develop your own ML models and infuse custom machine
learning into your .NET apps.
Data Science and AI for the .NET developer

• Evolution and generalization of the seminal role of Mathematica
• In a web-standards way
• Web (HTTP + Markdown)
• Python adoption (ipynb)
• Server written in Python, front end in JavaScript
• Languages plug in as kernels running in separate processes, not in-process (if that is ever important)
Jupyter (just to know)

• .NET bindings (C# and F#) to Spark
• Written on the Spark interop layer, designed to provide high performance bindings to multiple
languages
• Re-use knowledge, skills, code you have as a .NET developer
• Compliant with .NET Standard
• You can use .NET for Apache Spark anywhere you write .NET code
• Original project: Mobius
• https://github.com/microsoft/Mobius
Data Science with Notebooks and .NET (and Spark)...just to know

Microsoft Computer Vision
• A pre-trained service capable of recognizing
images and objects within images
• A great standard service to use if you need a
generic service that can recognize all kinds of
things, and you do not want to go through the
trouble of creating a custom service
• Less accurate on details and specifics than a custom model
• Various APIs as well as a test-bench-page are available
• https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

Custom Vision API Overview
• Taking things a step further…
• Instead of using a pre-trained model, we can create our own model purpose-built
for a specific example
• Leads to better results in a more specific/narrow scenario
• It is a complete service to manage the lifecycle of a Vision-based solution

Custom Vision Steps
1. Train a Custom Model based on your own data/images
• Can be done through a UI
• Can also be done through programmatic service calls
2. Publish the new Model as your private/personal service
• Often done as part of Azure Cloud
• Can be deployed as local services
3. Call the service from your application

Exporting models
1. Trained models can be exported to ONNX format

• Statistical Profiling Approach
• Compute statistics of the historical data, such as a moving mean or median, and use the
standard deviation to build a band around them. The band defines an upper and a lower
bound, and anything falling outside it can be an anomaly.
• Predictive Confidence Level Approach
• Build a predictive model from the historical data to estimate the overall common trend and
the seasonal or cyclic pattern of the time series. Points that fall outside the confidence
interval of the prediction can be treated as anomalies.
• Clustering Based Unsupervised Approach
• Unsupervised approaches are extremely useful for anomaly detection because they do not
require labelled data stating that a particular data point is an anomaly.
How to do Time Series Anomaly Detection?
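
The statistical profiling approach can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the band is built from the mean and sample standard deviation of the preceding `window` points, and `window=5`, `k=3.0` and the sample readings are illustrative choices, not values from the talk.

```python
from statistics import mean, stdev

def rolling_band_anomalies(series, window=5, k=3.0):
    """Return indexes of points outside mean ± k * stdev of the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        center = mean(history)
        spread = stdev(history)
        lower, upper = center - k * spread, center + k * spread
        if not lower <= series[i] <= upper:
            anomalies.append(i)
    return anomalies

# Hypothetical temperature telemetry with one obvious spike at index 6.
readings = [20.1, 19.9, 20.0, 20.2, 19.8, 20.1, 35.0, 20.0, 19.9, 20.1]
print(rolling_band_anomalies(readings))  # → [6]
```

Note that the spike itself widens the band for the next few points, which is one reason a median-based profile is sometimes preferred.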

• To monitor the time-series continuously and alert for potential incidents on time
• The algorithm first computes the Fourier Transform of the original data. Then it computes
the spectral residual of the log amplitude of the transformed signal before applying the Inverse
Fourier Transform to map the sequence back from the frequency to the time domain. This
sequence is called the saliency map. The anomaly score is then computed as the relative
difference between the saliency map values and their moving averages. If the score is above a
threshold, the value at a specific timestep is flagged as an outlier.
• The SR algorithm has several parameters. To obtain a model with good performance, we
suggest tuning windowSize and threshold first, as these are the most important parameters of
SR. Then you can search for an appropriate judgementWindowSize, which is no larger than
windowSize. For the remaining parameters, you can use the default values directly.
• Time-Series Anomaly Detection Service at Microsoft [https://arxiv.org/pdf/1906.03821.pdf]
Spectral Residual CNN (SrCnn)
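
The spectral residual steps described on this slide can be sketched in pure Python as follows. This is a hedged illustration, not the ML.NET implementation: it covers only the SR part (no CNN), uses a naive O(n²) DFT so that it needs no external libraries, and its parameters (`filter_size`, `score_window`, `threshold`) are hypothetical names and values that do not map one-to-one onto `windowSize` or `judgementWindowSize`.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform (fine for short series)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(total / n if inverse else total)
    return out

def moving_average(values, window):
    """Trailing mean over up to `window` values ending at each index."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def spectral_residual_scores(series, filter_size=3, score_window=5):
    """Anomaly score per point: relative difference between saliency and its moving average."""
    spectrum = dft(series)
    log_amplitude = [math.log(abs(c) + 1e-8) for c in spectrum]
    phase = [cmath.phase(c) for c in spectrum]
    # Spectral residual: log amplitude minus its smoothed version.
    smoothed = moving_average(log_amplitude, filter_size)
    residual = [la - sm for la, sm in zip(log_amplitude, smoothed)]
    # Back to the time domain: the magnitude is the saliency map.
    rebuilt = [cmath.rect(math.exp(r), p) for r, p in zip(residual, phase)]
    saliency = [abs(c) for c in dft(rebuilt, inverse=True)]
    baseline = moving_average(saliency, score_window)
    return [(s - b) / (b + 1e-8) for s, b in zip(saliency, baseline)]

def detect(series, threshold=2.0, **kwargs):
    """Indexes whose score is above the threshold."""
    scores = spectral_residual_scores(series, **kwargs)
    return [i for i, score in enumerate(scores) if score > threshold]

# Hypothetical flat telemetry with one spike at index 20.
series = [10.0] * 32
series[20] = 15.0
print(detect(series))
```

In practice an FFT routine would replace the naive transform, and the threshold would be tuned on real telemetry.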

• .NET Interactive gives C# and F# kernels to Jupyter
• .NET Interactive gives all tools to create your hosting application independently
from Jupyter
• In Visual Studio Code, you have two different notebooks (looking similar but
developed in parallel by different teams)
• .NET Interactive Notebook (by the .NET Interactive Team) that can also run Python
• Jupyter Notebook (by the Azure Data Studio Team, probably) that can also run C# and F#
• There is a little confusion about that
• .NET Interactive has a strong C#/F# Kernel...
• ...a less mature infrastructure (compared to Jupyter)
.NET Interactive and Jupyter and Visual Studio Code

• Don’t think that Data Scientists are «superhuman»
• No modeling from scratch, but tuning, training and testing existing models
• Python has won (?!??!!) but only because it is a lazy world
• And .NET is evolving (consistently)
• Azure (cloud) is on-demand superpower (for training)
Conclusions

Marco Parenzan
Senior Solution Architect @ beanTech
Microsoft Azure MVP
1nn0va Community Lead
