Outlier Detection
Contents
Outlier Detection
Types of outliers
Common causes of outliers
Methods for outlier detection
Outlier Detection
• An outlier is an observation that deviates so much from other
observations as to arouse suspicion that it was generated by a
different mechanism.
• An outlier may be defined as a piece of data or an observation
that deviates drastically from the norm or average of the
data set.
• An outlier is a data point that differs
significantly from other observations.
Types of outliers
Outliers can be of two kinds:
univariate and multivariate.
Univariate outliers can be found by looking at the distribution
of values in a single feature space.
Multivariate outliers can be found in an n-dimensional space (of
n features). Distributions in n-dimensional spaces are very
difficult for a person to inspect directly, which is why a model
is usually trained to detect them.
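The difference between the two kinds can be illustrated with a minimal sketch. The data below is hypothetical, and the Mahalanobis distance is used here as one common way to score multivariate outliers; the slides do not name a specific measure. The point (3, 8) looks ordinary on each feature separately but breaks the relationship between the two features.

```python
import math
import statistics

# Hypothetical 2-D data: ten points near the line y = x, plus one point (3, 8)
# whose coordinates are each unremarkable on their own.
points = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.2),
          (6, 5.9), (7, 7.1), (8, 7.8), (9, 9.2), (10, 9.9),
          (3, 8.0)]

xs = [p[0] for p in points]
ys = [p[1] for p in points]
mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)

# Sample covariance between the two features.
n = len(points)
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
var_x, var_y = sd_x ** 2, sd_y ** 2
det = var_x * var_y - cov_xy ** 2  # determinant of the 2x2 covariance matrix


def univariate_z(point):
    """Per-feature z-scores: each feature is judged in isolation."""
    x, y = point
    return (x - mean_x) / sd_x, (y - mean_y) / sd_y


def mahalanobis(point):
    """Distance from the mean that accounts for the correlation of x and y."""
    dx, dy = point[0] - mean_x, point[1] - mean_y
    squared = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(squared)


distances = [mahalanobis(p) for p in points]
most_unusual = max(range(n), key=lambda i: distances[i])

print("most unusual point:", points[most_unusual])
print("its per-feature z-scores:", univariate_z(points[most_unusual]))
print("its Mahalanobis distance:", round(distances[most_unusual], 2))
```

In this sketch the per-feature z-scores of (3, 8) stay below 1 in absolute value, so a univariate check would not flag it, while its Mahalanobis distance is the largest in the sample.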
Common causes of outliers
• Data entry errors (human errors)
• Measurement errors (instrument errors)
• Experimental errors (data extraction or experiment planning/execution errors)
• Intentional (dummy outliers made to test detection methods)
• Data processing errors (data manipulation or unintended mutations of the data set)
• Sampling errors (extracting or mixing data from wrong or various sources)
• Natural (not an error, novelties in data)
Methods for outlier detection
• Z-Score or Extreme Value Analysis (parametric)
• Probabilistic and Statistical Modeling (parametric)
• Linear Regression Models (PCA, LMS)
• Proximity Based Models (non-parametric)
• Information Theory Models
• High Dimensional Outlier Detection Methods (high-dimensional sparse data)
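Two of the methods listed above can be sketched in a few lines. This is an illustrative example under stated assumptions, not a reference implementation: the sample values, the z-score threshold of 3.0, the neighbour count `k`, and the `max_distance` cutoff are all hypothetical choices made for the demonstration.

```python
import statistics

# Hypothetical 1-D sample with one suspicious reading (40).
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 40]


def zscore_outliers(data, threshold=3.0):
    """Parametric: flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [v for v in data if abs(v - mean) / sd > threshold]


def knn_outliers(data, k=2, max_distance=5.0):
    """Proximity based: flag values whose k-th nearest neighbour is farther than `max_distance`."""
    flagged = []
    for i, v in enumerate(data):
        # Distances to every other value, nearest first.
        neighbour_distances = sorted(
            abs(v - other) for j, other in enumerate(data) if j != i
        )
        if neighbour_distances[k - 1] > max_distance:
            flagged.append(v)
    return flagged


print(zscore_outliers(values))
print(knn_outliers(values))
```

The z-score version assumes the data is roughly normally distributed and depends on a mean and standard deviation that the outlier itself inflates, so it can miss outliers in small samples. The proximity-based version makes no distributional assumption but needs a sensible distance cutoff for the data at hand.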
References
https://towardsdatascience.com/
https://www.techopedia.com/
