Anomaly Detection (10.1 ~ 10.3)
Khalid Elshafie (abolkog@dblab.cbnu.ac.kr)
Database / Bioinformatics Lab., Chungbuk National University
Contents
  1. Introduction
  2. Statistical Approach
  3. Proximity-based Approach
Introduction (1/4)
Anomaly Detection
  • Find objects that are different from most other objects.
  • Anomalous objects are often known as outliers: on a scatter plot of the data, they lie far away from the other data points.
Also known as
  • Deviation detection: anomalous objects have attribute values that deviate significantly from the expected or typical attribute values.
  • Exception mining: anomalies are exceptional in some sense.
Introduction (2/4)
Applications
  • Fraud detection: the purchasing behavior of someone who steals a credit card is probably different from that of the original owner.
  • Intrusion detection: attacks on computer systems and computer networks.
  • Ecosystem disturbances: hurricanes, floods, heat waves, etc.
  • Medicine: unusual symptoms or test results may indicate a potential health problem.
  • ……
Introduction (3/4)
What causes anomalies?
Data from different sources
  • Someone who commits credit card fraud belongs to a different class than the people who use credit cards legitimately.
  • Such anomalies are often of considerable interest and are the focus of anomaly detection in the field of data mining.
  • An outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism (Hawkins' definition of an outlier).
Natural variation
  • Many data sets can be modeled by statistical distributions in which the probability of a data object decreases rapidly as the distance of the object from the center of the distribution increases.
  • Most objects are near a center (an average object), and the likelihood that an object differs significantly from this average is small.
  • Anomalies that represent extreme or unlikely variations are often interesting.
Data measurement and collection errors
  • Errors in the data collection or measurement process are another source of anomalies.
  • The goal is to eliminate such anomalies, since they provide no interesting information and only reduce the quality of the data and of the subsequent analysis.
Introduction (4/4)
Approaches to anomaly detection
  • Model-based techniques: build a model of the data. Anomalies are objects that do not fit the model very well.
  • Proximity-based techniques: many of the techniques in this area are based on distances and are referred to as distance-based outlier detection techniques. Anomalous objects are those that are distant from most of the other objects.
  • Density-based techniques: objects that are in regions of low density are relatively distant from their neighbors and can be considered anomalous.
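To make the density-based idea concrete, here is a minimal sketch. The slides do not define a density estimate, so the one used here (the inverse of the average distance to the k nearest neighbors) is an assumption chosen for illustration, as are the function name `knn_density` and the sample points.

```python
import math

def knn_density(points, index, k):
    """Assumed density estimate: inverse of the mean distance to the k nearest neighbors.

    Raises ZeroDivisionError if the k nearest neighbors are all exact duplicates
    of the target point; a real implementation would need to handle that case.
    """
    target = points[index]
    nearest = sorted(
        math.dist(target, other)
        for j, other in enumerate(points)
        if j != index
    )[:k]
    return k / sum(nearest)

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
densities = [knn_density(points, i, k=2) for i in range(len(points))]

# The point in the lowest-density region is the candidate anomaly.
least_dense = min(range(len(points)), key=lambda i: densities[i])
```

With this data the distant point has by far the lowest density, so it would be the one considered anomalous.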
Statistical Approach (1/2)
  • Statistical approaches are model-based approaches: a model is created for the data, and objects are evaluated with respect to how well they fit the model.
  • Most statistical approaches to outlier detection are based on building a probability distribution model and considering how likely objects are under that model.
  • An outlier is an object that has a low probability with respect to a probability distribution model of the data (probabilistic definition of an outlier).
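The probabilistic definition can be sketched for a single attribute as follows. This assumes the data are modeled by a Gaussian distribution fitted with the sample mean and standard deviation; the sample readings, the threshold of 0.01, and the function names are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, stdev

def gaussian_density(x, mu, sigma):
    """Probability density of x under a normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def low_probability_outliers(values, threshold):
    """Return the values whose density under the fitted Gaussian is below threshold."""
    mu = mean(values)
    sigma = stdev(values)
    return [x for x in values if gaussian_density(x, mu, sigma) < threshold]

# Hypothetical sample: most readings are close to 10, one is far away.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
flagged = low_probability_outliers(readings, threshold=0.01)
```

Note that the extreme reading is included when the mean and standard deviation are estimated, so it inflates both. A suitable threshold therefore depends on the data, and more robust estimates may be preferable in practice.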
Statistical Approach (2/2)
Strengths and weaknesses
  • Statistical approaches have a firm foundation and build on standard statistical techniques.
  • When there is sufficient knowledge of the data and of the type of test that should be applied, these tests can be very effective.
  • There is a wide variety of statistical outlier tests for single attributes, but fewer options are available for multivariate data.
  • They can perform poorly for high-dimensional data.
Proximity-based Approach (1/3)
  • The basic notion of this approach is straightforward: an object is an anomaly if it is distant from most points.
  • It is more general and more easily applied than statistical approaches: it is easier to determine a meaningful proximity measure for a data set than to determine its statistical distribution.
  • One of the simplest ways to measure whether an object is distant from most points is to use the distance to its k-th nearest neighbor.
  • The outlier score of an object is given by the distance to its k-th nearest neighbor.
  • The lowest value of the outlier score is 0.
  • The highest value is the maximum possible value of the distance function (usually infinity).
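The outlier score described above can be sketched in a few lines. The sketch assumes Euclidean distance as the proximity measure and uses a brute-force search; the function name `knn_outlier_score` and the sample points are illustrative only.

```python
import math

def knn_outlier_score(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor (the point itself is excluded)."""
    target = points[index]
    distances = sorted(
        math.dist(target, other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = [knn_outlier_score(points, i, k=2) for i in range(len(points))]

# The object with the highest score is the strongest outlier candidate.
most_anomalous = max(range(len(points)), key=lambda i: scores[i])
```

Each corner of the square has two neighbors at distance 1, so its score for k = 2 is 1.0, while the distant point receives a score above 10.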
Proximity-based Approach (2/4)
Approach: compute the distance between every pair of data points.
There are various ways to define outliers:
  • Data points for which there are fewer than p neighboring points within a distance D.
  • The top n data points whose distance to the k-th nearest neighbor is greatest.
  • The top n data points whose average distance to the k nearest neighbors is greatest.
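The three definitions above can be sketched on top of a precomputed pairwise distance matrix. Euclidean distance, the function names, and the sample data are assumptions made for this illustration; "within a distance D" is read here as a distance less than or equal to D.

```python
import math

def pairwise_distances(points):
    """Full matrix of Euclidean distances between every pair of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def neighbor_distances(dist, i):
    """Sorted distances from point i to all other points."""
    return sorted(d for j, d in enumerate(dist[i]) if j != i)

def sparse_neighborhood_outliers(dist, p, D):
    """Indices of points with fewer than p other points within distance D."""
    return [
        i for i in range(len(dist))
        if sum(1 for d in neighbor_distances(dist, i) if d <= D) < p
    ]

def top_n_by_kth_distance(dist, n, k):
    """Indices of the n points whose distance to the k-th nearest neighbor is greatest."""
    def score(i):
        return neighbor_distances(dist, i)[k - 1]
    return sorted(range(len(dist)), key=score, reverse=True)[:n]

def top_n_by_average_knn_distance(dist, n, k):
    """Indices of the n points whose average distance to the k nearest neighbors is greatest."""
    def score(i):
        return sum(neighbor_distances(dist, i)[:k]) / k
    return sorted(range(len(dist)), key=score, reverse=True)[:n]

# Hypothetical data: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
dist = pairwise_distances(points)

by_count = sparse_neighborhood_outliers(dist, p=2, D=1.5)
by_kth = top_n_by_kth_distance(dist, n=1, k=2)
by_average = top_n_by_average_knn_distance(dist, n=1, k=2)
```

On this small data set all three definitions single out the distant point; on real data they can disagree, which is why the choice of definition and of p, D, n, and k matters.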
Proximity-based Approach (3/4)
  • The outlier score can be highly sensitive to the value of k.
  • If k is too small (e.g., 1), then a small number of nearby outliers can cause a low outlier score.
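The sensitivity to k can be seen in a small example. The sketch below assumes Euclidean distance and hypothetical data in which two outliers happen to lie close to each other; the function name is illustrative.

```python
import math

def kth_neighbor_distance(points, index, k):
    """Outlier score: distance from points[index] to its k-th nearest neighbor."""
    target = points[index]
    distances = sorted(
        math.dist(target, other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]

# Hypothetical data: a tight cluster and a pair of outliers close to each other.
cluster = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
outlier_pair = [(8.0, 8.0), (8.2, 8.0)]
points = cluster + outlier_pair

# Scores of the first outlier (index 4) and of a cluster point (index 0).
outlier_k1 = kth_neighbor_distance(points, 4, k=1)
outlier_k2 = kth_neighbor_distance(points, 4, k=2)
cluster_k1 = kth_neighbor_distance(points, 0, k=1)
cluster_k2 = kth_neighbor_distance(points, 0, k=2)
```

With k = 1 the outlier's nearest neighbor is the other outlier, so its score is lower than that of a cluster point. With k = 2 the score is the distance to the cluster, and the outlier stands out again.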

Chapter 10 Anomaly Detection
