Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Outliers
Parameter Tuning & Use cases
Introduction with
example
What Are Outliers in Data
An outlier is an element of a data set that distinctly stands out from the rest of the data.
In other words, outliers are the observations lying outside the overall pattern of the distribution, as shown in the figure.
Outliers
Example Outliers
An outlier in the list 212, 361, 201, 203, 227, 221, 188, 192, 198 is 361.
An outlier in the list 14, 9, 17, 19, 42, 22, 35, 99, 32, 2 is 99.
In these examples, 361 and 99 lie far from the remaining values, which makes them outliers.
How to detect outliers in a large dataset
How to Detect Outliers
The easiest way to detect outliers is by creating a graph. Plots such as box plots, scatterplots and histograms can easily help us detect outliers.
Alternatively, we can use the mean and standard deviation to list the outliers.
The interquartile range and quartiles can also be used to detect outliers.
Detecting Outliers
With Mean + Standard Deviation or Interquartile Range
• We can use the following formula to identify outliers; the analyst may change this criterion: a point Xi is an outlier if |Xi - mean| > 3*𝝈
Where Xi = Observation
𝝈 = Standard Deviation
• This classifies as outliers those data points whose distance from the mean is beyond 3 standard deviations
• Alternatively, we can use the formulas Q1-1.5*IQR and Q3+1.5*IQR to detect lower and upper outliers, where IQR is the interquartile range, which is the 3rd quartile minus the 1st quartile, i.e. the 75th percentile minus the 25th percentile
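The 3 standard deviation rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `three_sigma_outliers`, the `readings` data and the choice of the population standard deviation (`pstdev`) are assumptions made for the example.

```python
from statistics import mean, pstdev

def three_sigma_outliers(values, k=3.0):
    """Return the values whose distance from the mean exceeds k standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [x for x in values if abs(x - mu) > k * sigma]

# Hypothetical readings: twenty values near 200 plus one extreme value.
readings = [200, 202, 198, 201, 199, 203, 197, 200, 202, 198,
            201, 199, 200, 203, 197, 201, 199, 200, 202, 198, 361]
print(three_sigma_outliers(readings))  # [361]

# The nine-value list from the earlier example slide.
small = [212, 361, 201, 203, 227, 221, 188, 192, 198]
print(three_sigma_outliers(small))  # []
```

On the nine-value list the rule flags nothing, because the single extreme value also inflates the standard deviation. In a sample this small, the quartile-based formula is the more sensitive of the two.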
Detecting Outliers
With Histogram
A univariate outlier is a data point that has an extreme value on one variable.
If you look at the histogram, you can notice that there is one value that lies far to the left of all the other data. This data point is an outlier.
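As a rough sketch of the same idea without a plotting library, the values can be counted into fixed-width bins and printed as a text histogram. The helper name `bin_counts` and the bin width of 25 are assumptions chosen for illustration. The data is the first example list from earlier, so the isolated bar appears on the right rather than on the left as in the slide's figure.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count values per fixed-width bin; each key is the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

data = [212, 361, 201, 203, 227, 221, 188, 192, 198]
width = 25  # assumption: chosen by eye for this small list
hist = bin_counts(data, width)

# Print every bin between the lowest and highest, including empty ones,
# so the gap before the isolated value is visible.
for edge in range(min(hist), max(hist) + 1, width):
    print(f"{edge:>3}-{edge + width - 1:<3} | {'#' * hist.get(edge, 0)}")
```

The run of empty bins between 250 and 349 separates the single value in the 350-374 bin from the rest of the data.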
Detecting Outliers
With Box plot
A data point is an outlier if it is more than 1.5*IQR above the third quartile or more than 1.5*IQR below the first quartile.
In other words, low outliers are below Q1-1.5*IQR and high outliers are above Q3+1.5*IQR, where IQR is the interquartile range.
If you look at the box plot below, you can notice the outliers easily; these are the points lying above Q3+1.5*IQR and below Q1-1.5*IQR.
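The box plot fences can be computed directly. The sketch below applies them to the two example lists from earlier. The function name `iqr_outliers` is hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since different conventions give slightly different quartiles.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (low, high) outliers lying outside Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    low = [x for x in values if x < low_fence]
    high = [x for x in values if x > high_fence]
    return low, high

print(iqr_outliers([212, 361, 201, 203, 227, 221, 188, 192, 198]))  # ([], [361])
print(iqr_outliers([14, 9, 17, 19, 42, 22, 35, 99, 32, 2]))         # ([], [99])
```

For the first list, Q1 = 198 and Q3 = 221, so IQR = 23 and the fences are 163.5 and 255.5. Only 361 falls outside them.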
Detecting bivariate
outliers
With Scatterplot
When we are working with two quantitative variables, we can look at a scatterplot to identify bivariate (two-variable) outliers.
A bivariate outlier is an observation that does not fit the pattern of the other observations. In the plot below, an arrow points out the outlier.
How to handle outliers
How to Handle Outliers
1. We can either remove the outliers altogether from the selected dataset or replace them with a recommended statistical measure, namely percentiles.
2. It is general practice to replace lower and upper outliers with the 5th and 95th percentile values respectively.
3. But in domains demanding high accuracy and no loss of data, we can use the 1st and 99th percentile values to replace lower and upper outliers respectively.
4. Thus, it is at the sole discretion of the analyst which approach to select and apply.
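The replacement approach can be sketched as follows. This is an illustration under stated assumptions: outliers are detected with the Q1-1.5*IQR and Q3+1.5*IQR fences, percentiles use the `inclusive` interpolation method, and the function name `cap_outliers` and its parameters are hypothetical.

```python
from statistics import quantiles

def cap_outliers(values, lower_pct=5, upper_pct=95, k=1.5):
    """Replace low/high outliers with the given lower/upper percentile values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    # quantiles(..., n=100) returns 99 cut points; pct[i - 1] is the i-th percentile.
    pct = quantiles(values, n=100, method="inclusive")
    low_value, high_value = pct[lower_pct - 1], pct[upper_pct - 1]

    capped = []
    for x in values:
        if x < low_fence:
            capped.append(low_value)   # lower outlier
        elif x > high_fence:
            capped.append(high_value)  # upper outlier
        else:
            capped.append(x)           # ordinary value, left unchanged
    return capped

data = [14, 9, 17, 19, 42, 22, 35, 99, 32, 2]
capped_5_95 = cap_outliers(data)                             # 5th / 95th percentiles
capped_1_99 = cap_outliers(data, lower_pct=1, upper_pct=99)  # 1st / 99th percentiles
print(capped_5_95)
print(capped_1_99)
```

With this list, the outlier 99 is replaced by roughly 73.35 under the 5th/95th choice and by roughly 93.87 under the 1st/99th choice, so the stricter percentiles keep the replaced value closer to the original observation.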
Want to Learn More?
Get in touch with us at support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

What is Outlier Analysis and How Can It Improve Analysis?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series for Citizen Data Scientists. Journey Towards Augmented Analytics
  • 4. What Are Outliers in Data? An outlier is an element of a data set that distinctly stands out from the rest of the data. In other words, outliers are observations that lie outside the overall pattern of the distribution, as shown in the figure.
  • 5. Example Outliers: The outlier in the list 212, 361, 201, 203, 227, 221, 188, 192, 198 is 361. The outlier in the list 14, 9, 17, 19, 42, 22, 35, 99, 32, 2 is 99. In both examples, 361 and 99 lie far from the remaining values, which makes them outliers.
  • 6. How to detect outliers in a large dataset
  • 7. How to Detect Outliers: The easiest way to detect outliers is to create a graph. Plots such as box plots, scatterplots, and histograms make outliers easy to spot. Alternatively, we can use the mean and standard deviation to list the outliers. The interquartile range and quartiles can also be used to detect outliers.
  • 8. Detecting Outliers with Mean + Standard Deviation or Interquartile Range: We can use the following rule to identify outliers; the analyst may change the threshold if needed: Xi is an outlier if |Xi - mean| > 3σ, where Xi = observation and σ = standard deviation. This classifies as outliers those data points whose distance from the mean exceeds 3 standard deviations. Alternatively, we can use Q1 - 1.5*IQR and Q3 + 1.5*IQR as lower and upper limits: values below the first are lower outliers and values above the second are upper outliers. IQR is the interquartile range, Q3 - Q1, i.e. the 75th percentile minus the 25th percentile.
  • 9. Detecting Outliers with a Histogram: A univariate outlier is a data point that has an extreme value on one variable. In the histogram, one value lies far to the left of all the other data. This data point is an outlier.
  • 10. Detecting Outliers with a Box Plot: A data point is an outlier if it is more than 1.5*IQR above the third quartile or below the first quartile. In other words, low outliers are below Q1-1.5*IQR and high outliers are above Q3+1.5*IQR, where IQR is the interquartile range. In the box plot below, the outliers are easy to spot: they are the points lying above Q3+1.5*IQR or below Q1-1.5*IQR.
  • 11. Detecting Bivariate Outliers with a Scatterplot: When we are working with two quantitative variables, we can look at a scatterplot to identify bivariate (two-variable) outliers. A bivariate outlier is an observation that does not fit the pattern of the other observations. In the plot below, an arrow points to the outlier.
  • 12. How to handle outliers
  • 13. How to Handle Outliers: We can either remove the outliers from the selected dataset altogether or replace them with a recommended statistical measure, namely percentiles. It is general practice to replace lower and upper outliers with the 5th and 95th percentile values respectively. In domains that demand high accuracy and no loss of data, we can instead use the 1st and 99th percentile values to replace lower and upper outliers respectively. Which approach to select and apply is at the sole discretion of the analyst.
  • 14. Want to Learn More? Get in touch with us at support@Smarten.com, and check out the Learning section on Smarten.com. June 2018
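
The detection rules on slides 8 and 10 and the replacement rule on slide 13 can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular tool: the function names, the use of the sample standard deviation, and the "inclusive" quantile method are all choices made for the example, and only the standard library `statistics` module is used. The two lists are the ones from slide 5.

```python
import statistics


def sigma_outliers(values, k=3.0):
    """Values whose distance from the mean exceeds k standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [x for x in values if abs(x - mean) > k * sd]


def iqr_fences(values, factor=1.5):
    """Lower and upper limits: Q1 - factor*IQR and Q3 + factor*IQR."""
    # assumption: "inclusive" quantiles; other methods give slightly different cut points
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def iqr_outliers(values, factor=1.5):
    """Values lying below the lower limit or above the upper limit."""
    lower, upper = iqr_fences(values, factor)
    return [x for x in values if x < lower or x > upper]


def cap_outliers(values, low_pct=5, high_pct=95):
    """Replace lower/upper IQR outliers with the given percentile values."""
    lower, upper = iqr_fences(values)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low_value, high_value = cuts[low_pct - 1], cuts[high_pct - 1]
    return [
        low_value if x < lower else high_value if x > upper else x
        for x in values
    ]


list_a = [212, 361, 201, 203, 227, 221, 188, 192, 198]
list_b = [14, 9, 17, 19, 42, 22, 35, 99, 32, 2]

print(iqr_outliers(list_a))         # the IQR rule flags 361
print(iqr_outliers(list_b))         # the IQR rule flags 99
print(sigma_outliers(list_a))       # empty: see the note below
print(sigma_outliers(list_a, k=2))  # a looser threshold flags 361
print(cap_outliers(list_a))         # 361 replaced by the 95th percentile
```

On lists this short the 3-standard-deviation rule flags nothing, because a single extreme value also inflates the standard deviation; this is one reason the threshold is left to the analyst. Passing `low_pct=1, high_pct=99` to `cap_outliers` gives the 1st and 99th percentile variant. Note that the percentiles are computed from the same data, so in the first list the replacement value still falls between 227 and 361.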