Here is the anomalow-down!

Why should we care about anomalies? They demand our attention because they tell a different story from the norm. An anomaly might signify a patient's failing heart rate, fraudulent credit card activity, or an early indication of a tsunami, so detecting anomalies is extremely important.

What are the challenges in anomaly detection? As with many machine learning and statistical learning tasks, high-dimensional data poses a problem. Another challenge is selecting appropriate parameters, and yet another is high false positive rates. In this talk we introduce two R packages, dobin and lookout, that address different challenges in anomaly detection.

Dobin is a dimension reduction technique designed specifically for anomaly detection. It is somewhat similar to PCA, but it puts anomalies in the forefront. We can use dobin as a pre-processing step and then find anomalies using fewer dimensions.

Lookout, on the other hand, is an anomaly detection method that uses kernel density estimates and extreme value theory. Generally, anomaly detection methods that use kernel density estimates require a user-defined bandwidth parameter, but does the user know how to specify this elusive parameter? Lookout addresses this challenge by constructing a bandwidth appropriate for anomaly detection using topological data analysis, so the user doesn't need to specify one. Furthermore, lookout has a low false positive rate because it uses extreme value theory.

We also introduce the concept of anomaly persistence, which explores the birth and death of anomalies as the bandwidth changes. If a data point is identified as an anomaly over a large range of bandwidth values, then its significance as an anomaly is high.

Sevvandi Kandanaarachchi
RMIT University
Joint work with Rob Hyndman

Why anomalies?
• They tell a different story
• Fraudulent credit card transactions amongst billions of legitimate transactions
• Computer network intrusions
• Astronomical anomalies – solar flares
• Weather anomalies – tsunamis
• Stock market anomalies – heralding a crash?

Anomaly detection – why?
• Take fraud and network intrusions for example
• Training a model on known types of fraud, intrusions, or cyber attacks is not optimal, because new types of fraud and attacks always appear
• You want to be alerted when weird things happen.
• Anomaly detection is used in these applications.

Is everything rosy?

Some Current Challenges
High dimensionality of data
• Finding anomalies in high dimensional data is hard
• Anomalies and normal points look similar
High false positives
• Do not want an “alarm factory” – confidence in the system goes down
Parameters need to be defined by the user
• But expert knowledge is needed

Overview
lookout – an anomaly detection method
Low false positives
User does not need to specify parameters
lookout – on CRAN
dobin – a dimension reduction method for anomaly detection
Addresses the high dimensionality challenge
dobin – on CRAN

dobin – dimension reduction for outlier detection
Sevvandi Kandanaarachchi, Rob Hyndman
JCGS (2021), 30(1), 204-219

What is it?
Original anomalies are still anomalies in the reduced dimensional space
It is a preprocessing technique
Not an anomaly detection method

What does it do?
Finds a set of new axes (basis vectors) that preserve anomalies
The first basis vector points in the direction of most anomalousness (largest kNN distances), the second basis vector in the direction of the second largest kNN distances
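As a rough illustration of the kNN distances mentioned above, the sketch below scores each point by the distance to its k-th nearest neighbour, so isolated points get large scores. It is a minimal Python sketch on hypothetical toy data (the `knn_distances` helper, the data, and `k = 2` are assumptions made for illustration) and is not dobin's implementation, which, per the slide, uses such distances to choose its basis vectors.

```python
import math

def knn_distances(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbour."""
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

# Hypothetical toy data: a tight cluster plus one distant point (index 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (2.0, 2.0)]
scores = knn_distances(points, k=2)

# The point with the largest kNN distance is the most isolated one.
most_isolated = max(range(len(points)), key=scores.__getitem__)
print(most_isolated)
```

Here the distant point at index 5 receives by far the largest score, while the cluster members all have small kNN distances.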

Example
• Uniform distribution in 20 dimensions
• One point at (0.9, 0.9, 0.9, . . .)
• This is the outlier
• In R:
  > dobin(X)
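To see why a good choice of axis matters, the example data can be simulated directly. The Python sketch below assumes 500 uniform points (the slide does not give a sample size), an arbitrary seed, and a hand-picked direction along the cube's diagonal. dobin derives its directions from the data instead, so this only illustrates the idea that one well-chosen axis can expose the outlier.

```python
import math
import random

random.seed(1)  # arbitrary seed, chosen only for reproducibility

n, d = 500, 20  # n is an assumption; the slide only fixes d = 20
# n points drawn uniformly from the unit cube in d dimensions...
X = [[random.random() for _ in range(d)] for _ in range(n)]
# ...plus the single outlier from the slide at (0.9, 0.9, ..., 0.9).
X.append([0.9] * d)
outlier_index = n

# Hand-picked unit vector along the cube's diagonal (an assumption for this demo).
direction = [1 / math.sqrt(d)] * d

def project(x, v):
    """Dot product of the point x with the unit vector v."""
    return sum(a * b for a, b in zip(x, v))

proj = [project(x, direction) for x in X]
top = max(range(len(X)), key=proj.__getitem__)
print(top == outlier_index)
```

In any single coordinate the outlier's value of 0.9 is unremarkable. Its projection onto the diagonal is 0.9·√20 ≈ 4.02, whereas the projections of the uniform points are centred near 2.24, so the outlier stands apart on this one axis.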

lookout – leave-one-out KDE for outlier detection
Sevvandi Kandanaarachchi, Rob Hyndman
Preprint: https://bit.ly/lookoutliers

lookout
Outlier detection method
Low false positives
• Because of Extreme Value Theory (EVT)
• EVT is used to model 100-year floods
• Use a Generalized Pareto Distribution
Not an “alarm factory”
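The role of the Generalized Pareto Distribution (GPD) can be sketched as follows: fit a GPD to the exceedances over a threshold, then flag a value only when its tail probability under the fitted model is very small. This Python sketch makes several assumptions that are not in the slides: it uses a method-of-moments fit for simplicity, synthetic exponential exceedances, and a hypothetical significance level `alpha = 0.05`. It does not reproduce how lookout fits or applies the GPD.

```python
import math

def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (sigma, xi) for a GPD with location 0."""
    n = len(exceedances)
    mean = sum(exceedances) / n
    var = sum((x - mean) ** 2 for x in exceedances) / (n - 1)
    ratio = mean * mean / var
    xi = 0.5 * (1.0 - ratio)            # shape parameter
    sigma = 0.5 * mean * (ratio + 1.0)  # scale parameter
    return sigma, xi

def gpd_tail_prob(x, sigma, xi):
    """P(X > x) for a GPD with location 0, scale sigma and shape xi."""
    if abs(xi) < 1e-12:
        return math.exp(-x / sigma)     # exponential limit as xi -> 0
    base = 1.0 + xi * x / sigma
    if base <= 0.0:                     # beyond the upper end point (xi < 0 only)
        return 0.0
    return base ** (-1.0 / xi)

# Hypothetical exceedances: exact quantiles of a unit exponential, for which
# the true GPD parameters are sigma = 1 and xi = 0.
n = 1000
exceedances = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]
sigma, xi = fit_gpd_moments(exceedances)

alpha = 0.05  # hypothetical significance level
for x in (1.0, 8.0):
    p = gpd_tail_prob(x, sigma, xi)
    print(x, p, "flag" if p < alpha else "ok")
```

Because a value is flagged only when the fitted tail model says it is genuinely extreme, ordinary large values such as 1.0 are left alone, which is the intuition behind the low false positive rate.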

lookout
User does not need to specify parameters
• Uses kernel density estimates (KDE), which need a bandwidth parameter
• But a general-purpose bandwidth is not appropriate for anomaly detection
• Selects the bandwidth using topological data analysis (TDA)
• bw(TDA) → KDE → EVT → outliers
Anomaly persistence
• Which anomalies are consistently identified as the bandwidth changes?
• Visual representation of anomaly persistence
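The KDE step of this pipeline can be sketched with a leave-one-out estimate, where the density at each point is computed from all the other points. Leaving the point out matters because otherwise every point contributes a kernel peak to its own density, which can mask isolated points at small bandwidths. The Python sketch below is an illustration only: the 1-D data, the Gaussian kernel, and the hand-set bandwidth of 0.3 are assumptions, whereas the slide says lookout selects its bandwidth using TDA.

```python
import math

def loo_kde(xs, bandwidth):
    """Leave-one-out Gaussian KDE: density at each point estimated from all the others."""
    n = len(xs)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2 * math.pi))
    densities = []
    for i, xi in enumerate(xs):
        # Sum kernel contributions from every point except xi itself.
        total = sum(math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
                    for j, xj in enumerate(xs) if j != i)
        densities.append(norm * total)
    return densities

# Hypothetical 1-D data: ten evenly spaced inliers and one distant point (index 10).
xs = [0.1 * i for i in range(10)] + [3.0]
bandwidth = 0.3  # fixed by hand here, not selected by TDA
dens = loo_kde(xs, bandwidth)

# The point with the lowest leave-one-out density is the outlier candidate.
lowest = min(range(len(xs)), key=dens.__getitem__)
print(lowest)
```

In the pipeline on the slide, these density values would then be passed to the EVT step rather than compared directly.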

Example 1
2D normal distribution, with outliers at the far end. The outlying indices are 501 - 505.
The persistence diagram. The outliers get identified for a large range of bandwidth values.
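The idea behind a persistence diagram can be sketched by sweeping the bandwidth and recording how often each point is flagged. The Python sketch below is a simplified stand-in, not lookout's procedure: the flagging rule (leave-one-out density below 5% of the median density), the bandwidth grid, and the 1-D data are all hypothetical choices, whereas lookout applies an EVT-based test.

```python
import math
import statistics

def loo_kde(xs, bandwidth):
    """Leave-one-out Gaussian KDE: density at each point estimated from all the others."""
    n = len(xs)
    norm = 1.0 / ((n - 1) * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((xi - xj) / bandwidth) ** 2)
                       for j, xj in enumerate(xs) if j != i)
            for i, xi in enumerate(xs)]

def persistence(xs, bandwidths, fraction=0.05):
    """Fraction of bandwidths at which each point is flagged as an anomaly."""
    counts = [0] * len(xs)
    for h in bandwidths:
        dens = loo_kde(xs, h)
        cutoff = fraction * statistics.median(dens)  # hypothetical flagging rule
        for i, d in enumerate(dens):
            if d < cutoff:
                counts[i] += 1
    return [c / len(bandwidths) for c in counts]

# Hypothetical 1-D data: ten evenly spaced inliers and one distant point (index 10).
xs = [0.1 * i for i in range(10)] + [3.0]
bandwidths = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]  # hypothetical grid
scores = persistence(xs, bandwidths)
print(scores[10], max(scores[:10]))
```

With these choices the isolated point is flagged at five of the six bandwidths and stops being flagged only at the widest one, while no inlier is flagged at any bandwidth. A point that stays flagged across most of the range is the kind of anomaly the persistence diagram highlights.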

Example 2
2D bimodal distribution, with outliers in the trough. The outliers have indices 1001 - 1005.
The persistence diagram. Again, the outliers get identified for a large range of bandwidth values.

Example 3
Points in 3 normally distributed clusters, with anomalies away from them. Anomalies have indices 701 - 703.
The persistence diagram. Anomalies get identified for a broad range of bandwidth values.

Example 4
Points in an annulus with anomalies in the middle. Anomalies have indices 1001 - 1010.
The persistence diagram.

Summary
• dobin - a dimension reduction method for anomaly detection
• lookout - an EVT-based method to find anomalies
• Paper and preprint available:
• https://doi.org/10.1080/10618600.2020.1807353 (dobin paper)
• https://bit.ly/lookoutliers (lookout preprint)
• Both packages on CRAN
