Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Descriptive Statistics
Parameter Tuning & Use cases
List of Statistical Summary Measures and Plots
Measures:
• Mean, Median, Mode
• Percentile, Quartile, Interquartile Range
• Skewness
• Variance, Standard deviation
Plots:
• Box plot and Histogram plot
Introduction With Example
• Descriptive statistics help describe and understand the features of a specific dataset by giving short summaries of the data. The most widely recognized types of descriptive statistics are listed and explained below
• Mean, median, and mode are different ways to measure the average (central value) of the data
Mean :
• It is the sum of all the data values divided by the number of values
• This measure can be heavily biased when the data contains a significant number of outliers (extreme values)
Median :
• It is the middle value when the data items are arranged in ascending order (for an even number of items, it is the average of the two middle values)
• This measure is relatively robust to outliers, which makes it a more appropriate measure of the average when the data contains a significant number of outliers
• For instance, when profiling customers based on attributes such as age, income or account balance, the median age/income/balance can be used instead of the mean to avoid bias due to outliers
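A minimal Python sketch of the outlier effect described above, using the standard-library `statistics` module. The income figures are hypothetical and chosen only to show that a single extreme value moves the mean far more than the median; `mean_and_median` is an illustrative helper name.

```python
from statistics import mean, median

def mean_and_median(values):
    """Return the mean and the median of a list of numbers."""
    return mean(values), median(values)

# Hypothetical customer incomes (in thousands).
incomes = [42, 45, 48, 50, 52, 55]
with_outlier = incomes + [900]   # one customer with an extreme income

for label, data in [("without outlier", incomes), ("with outlier", with_outlier)]:
    avg, mid = mean_and_median(data)
    print(f"{label}: mean = {avg:.1f}, median = {mid:.1f}")
# without outlier: mean = 48.7, median = 49.0
# with outlier: mean = 170.3, median = 50.0
```

Adding one extreme income raises the mean by more than 120k, while the median moves only from 49 to 50.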
Mode :
• It is the most frequently occurring value in a series of data
• If no value repeats, there is no mode; if several values tie for the highest frequency, the data has more than one mode
• For example, in satisfaction survey analysis, the mode can be used to find the most common rating given by respondents to a particular service/product
• Another common use of the mode is imputing missing values of a character (categorical) variable: when, say, the region variable has a number of missing values, the usual practice is to replace them with the most frequently occurring region, i.e. the mode of region
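The mode and the mode-imputation practice above can be sketched with `statistics.multimode` (Python 3.8+), which returns every most-frequent value and therefore also exposes ties. The ratings and regions are hypothetical, `impute_with_mode` is an illustrative helper, and using the first-seen value on a tie is our own assumption.

```python
from statistics import multimode

def impute_with_mode(values):
    """Replace None entries with the most frequent non-missing value.

    On a tie, the value seen first is used (an assumption; other
    tie-breaking rules are equally reasonable).
    """
    observed = [v for v in values if v is not None]
    modes = multimode(observed)      # all most-frequent values, in first-seen order
    if not modes:
        return list(values)          # nothing observed, nothing to impute from
    fill = modes[0]
    return [fill if v is None else v for v in values]

# Hypothetical survey ratings (1-5) and a region column with missing entries.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
regions = ["North", None, "South", "North", None, "East", "North"]

print(multimode(ratings))        # [4]
print(impute_with_mode(regions))
# ['North', 'North', 'South', 'North', 'North', 'East', 'North']
```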
Percentile :
• It indicates the value below which a given percentage of the observations in a dataset fall
• For example, the 20th percentile is the value below which 20% of the observations may be found
• Let's compute the 25th percentile for the 8 numbers in Table 1. The numbers are sorted in ascending order and ranked from 1 for the lowest number to 8 for the highest number
• Step 1: Compute the rank (R) of the 25th percentile using the following formula:
• R = P/100 × (N + 1)
• where P is the desired percentile (25 in this case) and N is the number of values (8 in this case). Therefore,
• R = 25/100 × (8 + 1) = 9/4 = 2.25
Table 1
Number   Rank
3        1
5        2
7        3
8        4
9        5
11       6
13       7
15       8
Percentile :
• Step 2: If R is an integer, the Pth percentile is the number with rank R. When R is not an integer, the Pth percentile is computed by interpolation as follows:
• Define IR as the integer portion of R (the number to the left of the decimal point). For this example, IR = 2
• Define FR as the fractional portion of R. For this example, FR = 0.25
• Find the scores with rank IR and rank IR + 1. For this example, these are the scores with rank 2 and rank 3, which are 5 and 7
• Multiply the difference between the two scores by FR and add the result to the lower score. For these data, this is (0.25)(7 - 5) + 5 = 5.5
• Therefore, the 25th percentile is 5.5
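The two-step rank method above can be sketched in Python as follows. This is a minimal sketch assuming 1-based ranks as in Table 1; the function name `percentile_rank_method` is hypothetical, and clamping to the smallest or largest value when R falls outside 1..N is our own assumption, since the steps do not cover that case. NumPy's `np.percentile(..., method="weibull")` (NumPy 1.22 or later) is documented as using the same (N + 1)-based definition, so it should agree.

```python
def percentile_rank_method(values, p):
    """Pth percentile using R = P/100 x (N + 1) with linear interpolation.

    Ranks are 1-based, as in Table 1. If R falls below 1 or above N, the
    result is clamped to the smallest or largest value (an assumption).
    """
    data = sorted(values)
    n = len(data)
    r = p / 100 * (n + 1)      # Step 1: rank of the Pth percentile
    ir = int(r)                # integer portion of R
    fr = r - ir                # fractional portion of R
    if ir < 1:
        return data[0]
    if ir >= n:
        return data[-1]
    lower = data[ir - 1]       # score with rank IR
    upper = data[ir]           # score with rank IR + 1
    return lower + fr * (upper - lower)   # Step 2: interpolate


table_1 = [3, 5, 7, 8, 9, 11, 13, 15]
print(percentile_rank_method(table_1, 25))  # 5.5
```

When R is an integer, FR is 0 and the function simply returns the score with rank R, matching Step 2.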
Quartile :
• Quartiles are specific percentiles that divide the sorted dataset into four equal parts
• First Quartile = Q1 = 25th Percentile
• Second Quartile = Q2 = 50th Percentile = Median
• Third Quartile = Q3 = 75th Percentile
• The Interquartile Range (IQR) is Q3 - Q1, the spread of the middle 50% of the data
• For instance, if the lower quartile of income is 100k, then the bottom 25% of the population has an income <= 100k. If the median, i.e. the second quartile, is 200k, then the bottom 50% of the population has an income <= 200k, and so on
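Quartiles and the IQR can be computed with the standard library's `statistics.quantiles`. Its `"exclusive"` method (the default) ranks with N + 1, so on the Table 1 data it reproduces the 25th percentile of 5.5 from the worked example. The helper name `five_number_summary` is illustrative.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) and the IQR."""
    data = sorted(values)
    # method="exclusive" uses the R = P/100 x (N + 1) rank convention.
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return (data[0], q1, q2, q3, data[-1]), q3 - q1

summary, iqr = five_number_summary([3, 5, 7, 8, 9, 11, 13, 15])
print(summary)  # (3, 5.5, 8.5, 12.5, 15)
print(iqr)      # 7.0
```

Other tools may default to a different quantile convention (for example, `method="inclusive"` here), so small differences between packages are expected.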
Standard deviation and Variance:
• Both are popular measures of how spread out the data points are around the mean
• For example, let's find the standard deviation of the following data: 1, 2, 2, 4, 6
1. Calculate the mean of the data: 15/5 = 3
2. Subtract the mean from each data value: -2, -1, -1, 1, 3
3. Square each of these new values: 4, 1, 1, 1, 9
4. Sum the squared values: 16
5. Divide this sum by (number of observations - 1): 16 / (5 - 1) = 4
6. This number is the (sample) variance, and its square root is the standard deviation: sqrt(4) = 2. Dividing by the number of observations instead of (number of observations - 1) gives the population variance
• For instance, the standard deviation of price data is frequently used as a measure of volatility. Similarly, when monitoring an industrial process, process indicators that go beyond design standards may signal a problem, so the variance/standard deviation of those indicators can be tracked
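The six steps above translate directly into a short Python sketch, cross-checked against the standard library. The `sample` flag switches between the (n - 1) divisor used in the example and the n divisor of the population variance; the function name is illustrative.

```python
import math
import statistics

def variance_step_by_step(values, sample=True):
    """Variance via the steps above: n - 1 divisor if sample, else n."""
    n = len(values)
    avg = sum(values) / n                       # 1. mean of the data
    deviations = [x - avg for x in values]      # 2. subtract the mean
    squared = [d * d for d in deviations]       # 3. square each value
    total = sum(squared)                        # 4. sum the squares
    return total / (n - 1 if sample else n)     # 5. divide

data = [1, 2, 2, 4, 6]
var = variance_step_by_step(data)
sd = math.sqrt(var)                             # 6. square root -> standard deviation
print(var, sd)                                  # 4.0 2.0

# Cross-check against the standard library (sample versions).
print(statistics.variance(data), statistics.stdev(data))
```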
Skewness:
• It is a measure of the asymmetry of a distribution. A dataset is symmetric if it looks the same to the left and right of the center point
• As a common rule of thumb:
• If skewness is less than -1 or greater than +1, the distribution is highly skewed
• If skewness is between -1 and -0.5 or between +0.5 and +1, the distribution is moderately skewed
• If skewness is between -0.5 and +0.5, the distribution is approximately symmetric
Skewness Calculation Formula:
[Formula image]
• Where:
n = Number of observations
s = Standard deviation
S = Skewness
Xi = ith observation
X avg = Mean of observations
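Because the formula itself is not reproduced here, the sketch below assumes one common form built from the same symbols: the adjusted Fisher-Pearson coefficient S = n / ((n - 1)(n - 2)) × Σ((Xi - X avg) / s)³, which is the form used by Excel's SKEW function. It may differ from the exact formula on the original slide. `describe_skew` applies the rule-of-thumb thresholds listed above; both function names are illustrative.

```python
import statistics

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (an assumed form, see lead-in).

    S = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), with s the
    sample standard deviation. Needs at least 3 values and s > 0.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    avg = statistics.mean(values)
    s = statistics.stdev(values)        # sample standard deviation
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    cubed = sum(((x - avg) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def describe_skew(skew):
    """Apply the rule-of-thumb thresholds for skewness."""
    if skew < -1 or skew > 1:
        return "highly skewed"
    if abs(skew) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

data = [1, 2, 2, 4, 6]
skew = sample_skewness(data)
print(round(skew, 4), describe_skew(skew))  # 0.9375 moderately skewed
```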
Histogram:
• It is a graphical display in which the data is grouped into buckets (bins) and the amount in each bucket is plotted as a bar
• For example, the price-by-volume chart shown below is a common histogram that shows how many shares of a stock traded within each price range
• Here the share price is divided into 12 bins, and the number of shares traded in each price range is plotted as a bar
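A price-by-volume histogram like the one described above can be sketched without any plotting library: split the price range into equal-width bins (12 by default, as in the example) and sum the traded volume in each. The trades are hypothetical, `price_by_volume` is an illustrative name, and placing the highest price in the last bin is a design choice of this sketch.

```python
def price_by_volume(trades, n_bins=12):
    """Sum traded volume into n_bins equal-width price ranges.

    trades is a list of (price, volume) pairs. Returns (bin_edges, volumes).
    """
    prices = [price for price, _ in trades]
    lo, hi = min(prices), max(prices)
    width = (hi - lo) / n_bins
    volumes = [0] * n_bins
    for price, volume in trades:
        idx = int((price - lo) / width) if width > 0 else 0
        idx = min(idx, n_bins - 1)   # the highest price belongs to the last bin
        volumes[idx] += volume
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, volumes

# Hypothetical trades: (share price, number of shares traded).
trades = [(10.0, 100), (10.5, 50), (12.0, 200), (13.1, 75), (16.0, 25)]
edges, volumes = price_by_volume(trades, n_bins=3)
for left, right, vol in zip(edges, edges[1:], volumes):
    print(f"{left:6.2f} - {right:6.2f}: {'#' * (vol // 25)} {vol}")
```

Each text bar stands in for one histogram bar; with a plotting library the same `edges` and `volumes` would feed a bar chart.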
• The histogram is an effective graphical technique for showing the skewness and kurtosis of a dataset
• For example, before applying a predictive algorithm that requires normally distributed data, the skewness and kurtosis of the histogram can be checked to quickly see whether the data follows a normal distribution, and a transformation can be applied to the data, if necessary, to achieve normality
• If the bulk of the data is at the left and the right tail is longer, the distribution is said to be skewed right, or positively skewed
• If the peak is toward the right and the left tail is longer, the distribution is said to be skewed left, or negatively skewed
• In negatively (left) skewed data, the mean is typically less than the median and mode, whereas in positively (right) skewed data, the mean is typically greater than the median and mode. This is a rule of thumb rather than a guarantee; it can fail, for example, for multimodal or discrete data
• For example, casual equity investors look at the chart of a stock's price and try to invest in companies that show a positive skew. The idea is to invest in a company with a long right tail, which in the equity markets means a stock price that is strongly positively skewed
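The mean/median/mode ordering described above can be checked on a small hypothetical sample with a long right tail and on its mirror image. This is an illustration of the typical pattern, not a proof that it always holds; `center_measures` is an illustrative helper.

```python
from statistics import mean, median, mode

def center_measures(values):
    """Return (mean, median, mode) so their order can be compared."""
    return mean(values), median(values), mode(values)

# Hypothetical right-skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10, 20]
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail

for label, data in [("right", right_skewed), ("left", left_skewed)]:
    avg, mid, most = center_measures(data)
    print(f"{label}-skewed: mean={avg:.2f}, median={mid}, mode={most}")
# right-skewed: mean=5.33, median=3, mode=3
# left-skewed: mean=-5.33, median=-3, mode=-3
```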
Box plot:
• It is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum
• The central rectangle spans the first quartile to the third quartile (the interquartile range, or IQR). A segment inside the rectangle shows the median, and "whiskers" above and below the box show the locations of the minimum and maximum. Many tools instead end the whiskers at the most extreme points lying within 1.5 × IQR of the box and plot any points beyond them individually as outliers
• For example, for quartiles Q1, Q2 and Q3 with values 4.5, 7 and 11.5 respectively, and minimum = 0.5 and maximum = 22, the box plot can be drawn as shown on the right
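The 1.5 × IQR whisker convention mentioned above can be applied to the example's five-number summary with a few lines of Python. The multiplier k = 1.5 is the common Tukey convention, not something the example specifies, and `tukey_fences` is an illustrative name. With these numbers the maximum of 22 sits exactly on the upper fence, so both whisker conventions draw the same plot here.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Five-number summary from the box plot example.
minimum, q1, median, q3, maximum = 0.5, 4.5, 7, 11.5, 22
lower, upper = tukey_fences(q1, q3)
print(f"IQR = {q3 - q1}, fences = ({lower}, {upper})")  # IQR = 7.0, fences = (-6.0, 22.0)

outliers = [v for v in (minimum, maximum) if v < lower or v > upper]
print("min/max beyond the fences:", outliers)  # min/max beyond the fences: []
```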
Use cases
Business use cases – In general
Descriptive statistics, as the name implies, describe or summarize raw data and make it interpretable by humans
Common examples of descriptive analytics are reports that provide historical insights into a company's production, financials, operations, sales, inventory and customers
Thus, descriptive statistics should be used when you need to understand, at an aggregate level, what is going on in your company, and when you want to summarize and describe different aspects of your business
Business use cases – Mode
Business benefit :
• By identifying the mode of the dishes purchased, a restaurant owner learns which dish is the most popular and can price that dish accordingly
• By identifying the most frequently rated movie or restaurant of the month, news publishers or other researchers can broadcast this information, or a market researcher can provide it to prospective restaurant owners who are surveying market conditions before launching
• By knowing the most frequently bought product category/size, stock inventory can be managed better
Business problem : Identify the most popular dish served in a restaurant, the most frequent rating given by customers to a given movie/restaurant, or the most frequent size or category of a sold product
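For the dish example, `collections.Counter.most_common` gives the mode (the first entry) together with the runners-up, which is often what a restaurant owner or inventory planner wants. The order log and the `top_items` helper are hypothetical.

```python
from collections import Counter

def top_items(orders, k=3):
    """Return the k most frequently ordered items with their counts."""
    return Counter(orders).most_common(k)

# Hypothetical order log: one entry per dish sold.
orders = ["Biryani", "Pasta", "Biryani", "Tacos", "Pasta", "Biryani", "Salad"]
print(top_items(orders, k=2))  # [('Biryani', 3), ('Pasta', 2)]
```

On ties, `most_common` keeps items in the order they were first seen, so tie-breaking should be handled explicitly if it matters for the decision.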
Business use cases – Mean/Median
Business benefit :
• By identifying the mean/median income of this customer segment, marketing can be targeted at the segment to improve ROI and sales revenue
• However, the median is often a better measure than the mean for getting an accurate picture: for instance, if a couple of customers have extreme income values, they will distort the overall average when the mean is used
Business problem : Find out the average age and income of customers who purchase a particular product category
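A minimal sketch of this use case: group hypothetical purchase records by product category and report both the mean and the median income, so the effect of an extreme customer is visible side by side. The records and the `income_summary_by_category` helper are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, median

def income_summary_by_category(purchases):
    """Group (category, income) pairs and return {category: (mean, median)}."""
    groups = defaultdict(list)
    for category, income in purchases:
        groups[category].append(income)
    return {cat: (mean(vals), median(vals)) for cat, vals in groups.items()}

# Hypothetical purchases: (product category, customer income in thousands).
purchases = [
    ("Electronics", 60), ("Electronics", 65), ("Electronics", 70),
    ("Electronics", 1200),  # one customer with an extreme income
    ("Groceries", 30), ("Groceries", 35), ("Groceries", 40),
]
summary = income_summary_by_category(purchases)
for cat, (avg, mid) in summary.items():
    print(f"{cat}: mean = {avg:.1f}, median = {mid:.1f}")
```

For Electronics the single extreme income pushes the mean to 348.75 while the median stays at 67.5; for Groceries the two measures agree.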
Business use cases – Percentile
Business benefit :
• By checking the credit score distribution, the loan manager can find the score cutoff for the top 10 percent of applicants (the 90th percentile) and can estimate the total number of eligible loan applicants based on the bank's credit score criteria for loan eligibility
• A high number of delinquencies and defaults can be reduced by making informed decisions about whom to lend to using such statistical measures
Business problem : A bank's loan manager needs to find out the percentile distribution of the credit scores of loan applicants
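The loan manager's checks can be sketched with `statistics.quantiles`: the last of the ten-way cut points is the 90th percentile, which marks the top 10 percent of applicants. The scores are hypothetical, and `MIN_SCORE = 700` is an assumed stand-in for the bank's own eligibility rule.

```python
from statistics import quantiles

MIN_SCORE = 700   # hypothetical eligibility rule set by the bank

def credit_score_profile(scores, min_score=MIN_SCORE):
    """Return the 90th-percentile cutoff, the applicants at or above it,
    and the number of applicants meeting the minimum score."""
    # n=10 returns the 10th, 20th, ..., 90th percentiles; the last is the 90th.
    cutoff = quantiles(scores, n=10, method="exclusive")[-1]
    top_decile = [s for s in scores if s >= cutoff]
    eligible = sum(1 for s in scores if s >= min_score)
    return cutoff, top_decile, eligible

# Hypothetical credit scores for 20 applicants: 600, 610, ..., 790.
scores = list(range(600, 800, 10))
cutoff, top_decile, eligible = credit_score_profile(scores)
print(cutoff, top_decile, eligible)   # 779.0 [780, 790] 10
```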
Business use cases – Quartiles, Interquartile Range + Boxplot
Business benefit :
• By checking the Q1, Q3 and interquartile range (Q3 - Q1) values for each step of the process, we can find out which steps have scope for time reduction
Business problem : A business owner wants to reduce the business process cycle time
For instance, in the box plot above, we can observe that the preliminary analysis, database research, evaluation, record keeping and follow-up steps have a high interquartile range (indicated by the box height), making them candidates for further inspection, with the follow-up step being the primary concern because it has the highest interquartile range
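Ranking process steps by IQR, as in the box plot reading above, can be sketched as follows. The step names come from the example, but the cycle times are hypothetical, and `rank_steps_by_iqr` is an illustrative helper using the standard library's `"exclusive"` quartile method.

```python
from statistics import quantiles

def rank_steps_by_iqr(step_times):
    """Return (step, IQR) pairs sorted from the widest to the narrowest IQR."""
    result = []
    for step, times in step_times.items():
        q1, _, q3 = quantiles(times, n=4, method="exclusive")
        result.append((step, q3 - q1))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

# Hypothetical cycle times (hours) observed for three of the process steps.
step_times = {
    "preliminary analysis": [1, 2, 3, 4, 5, 6, 7],
    "database research": [2, 2, 3, 3, 3, 4, 4],
    "follow up": [1, 5, 9, 13, 17, 21, 25],
}
for step, iqr in rank_steps_by_iqr(step_times):
    print(f"{step}: IQR = {iqr}")
```

The step at the top of the list has the most variable cycle time and is the first candidate for inspection.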
Business use cases – Standard deviation/variance
Business benefit :
• By analyzing the standard deviation or variance of a stock's price, one can measure the risk associated with that stock in terms of price fluctuations
• If these measures are relatively low, future prices and the expected volatility can be estimated with more confidence
• Thus, the standard deviation/variance tells us how much a stock price or fund's return deviates from its expected (average) price or return, and investors therefore use it as a gauge of the amount of expected volatility
• However, to make a valid comparison of the volatility of two stocks trading at different price levels, one has to normalize the standard deviation, for example by dividing it by the closing price (dividing it by the mean price gives the coefficient of variation)
Business problem : A stock broker wants to analyze the price volatility of a stock as a measure of risk
Business use cases – Standard deviation/variance
For instance, let's compute the standard deviation of the 10 days of stock prices shown in the table:
• Calculate the average (mean) price over the number of periods or observations
• Determine each period's deviation (price - mean)
• Square each period's deviation
• Sum the squared deviations
• Divide this sum by the number of observations; this is the variance (the population variance; dividing by the number of observations - 1 instead gives the sample variance)
• The standard deviation is then equal to the square root of this number
The lower this number, the lower the price volatility and the easier it is to estimate the future price of the stock
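The steps above can be sketched in Python, together with the normalization mentioned in the business-benefit bullets. Since the original price table is not reproduced, the 10 closing prices are hypothetical; the divisor is n, as in the steps (population variance). `stock_b` repeats the same relative moves at ten times the price, so its standard deviation is ten times larger while its coefficient of variation (standard deviation divided by the mean price) is unchanged.

```python
import math
import statistics

def price_volatility(prices):
    """Population standard deviation of prices, following the steps above."""
    n = len(prices)
    avg = sum(prices) / n                          # mean price
    squared = [(p - avg) ** 2 for p in prices]     # squared deviations
    variance = sum(squared) / n                    # divide by n
    return math.sqrt(variance)

def coefficient_of_variation(prices):
    """Standard deviation relative to the mean price (unit-free)."""
    return price_volatility(prices) / (sum(prices) / len(prices))

# Hypothetical closing prices for 10 trading days.
stock_a = [20, 22, 21, 23, 24, 22, 21, 23, 22, 22]
stock_b = [10 * p for p in stock_a]   # same relative moves at 10x the price

print(round(price_volatility(stock_a), 4), round(price_volatility(stock_b), 4))
print(round(coefficient_of_variation(stock_a), 4), round(coefficient_of_variation(stock_b), 4))
print(statistics.pstdev(stock_a))     # cross-check with the standard library
```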
Business use cases – Skewness + Histogram
• Business problem : A quality control manager at a company producing elevator rails needs to know which machine is best suited to produce the rails
• Business benefit :
• If the required diameter of an elevator rail is 3 inches, you can conclude from the image on the right that machine A is producing elevator rails that are too narrow, whereas machine B is producing elevator rails that are too wide
• Hence, both machines are failing to produce elevator rails of the required diameter
• However, machine C produces the right diameter most of the time and is centered on 3 inches on average, making it the ideal choice for producing elevator rails with a 3-inch diameter
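The visual comparison above can be made quantitative with a short sketch: for each machine, compute the mean diameter and the share of rails within a tolerance of the 3-inch target. The diameter samples are hypothetical, and `TOLERANCE = 0.05` inch is an assumed acceptance band that the use case does not specify.

```python
from statistics import mean

TARGET = 3.0      # required rail diameter in inches (from the use case)
TOLERANCE = 0.05  # hypothetical acceptable deviation in inches

def machine_report(diameters, target=TARGET, tol=TOLERANCE):
    """Return (mean diameter, share of rails within target +/- tol)."""
    within = sum(1 for d in diameters if abs(d - target) <= tol)
    return mean(diameters), within / len(diameters)

# Hypothetical diameter samples (inches) for machines A, B and C.
samples = {
    "A": [2.90, 2.92, 2.93, 2.94, 2.97, 2.98],
    "B": [3.02, 3.04, 3.06, 3.07, 3.08, 3.10],
    "C": [2.97, 2.99, 3.00, 3.00, 3.01, 3.03],
}
reports = {machine: machine_report(d) for machine, d in samples.items()}
for machine, (avg, share) in reports.items():
    print(f"Machine {machine}: mean = {avg:.3f} in, within tolerance = {share:.0%}")

best = max(reports, key=lambda machine: reports[machine][1])
print("Best machine:", best)   # Best machine: C
```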
Want to Learn More?
Get in touch with us at support@Smarten.com
and check out the Learning section on Smarten.com
June 2018

Machine Maintenance Management Predictive Analytics Use Case - SmartenMachine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - Smarten
 
Human Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenHuman Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - Smarten
 
Customer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenCustomer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - Smarten
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
 
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
 
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...
What is Karl Pearson Correlation Analysis and How Can it be Used for Enterpri...
 
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?What is SVM Classification Analysis and How Can It Benefit Business Analytics?
What is SVM Classification Analysis and How Can It Benefit Business Analytics?
 
What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?What is Outlier Analysis and How Can It Improve Analysis?
What is Outlier Analysis and How Can It Improve Analysis?
 

What is Descriptive Statistics and How Do You Choose the Right One for Enterprise Analysis?

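The mean, median, and mode defined at the start of this explainer can be computed with Python's standard library. The sketch below is illustrative: the income, rating, and region values are made up, and `mode_or_none` is a hypothetical helper that returns `None` when no value repeats, matching the explainer's "no mode" convention. (By contrast, `statistics.mode` in Python 3.8+ returns the first value it sees when every value is unique.)

```python
import statistics
from collections import Counter

# Hypothetical customer incomes; the last value is an extreme outlier.
incomes = [42_000, 45_000, 47_000, 50_000, 52_000, 1_000_000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # middle of the sorted values


def mode_or_none(values):
    """Return the most frequent value, or None if no value repeats.

    Ties are broken by first occurrence (Counter keeps insertion order).
    """
    value, freq = Counter(values).most_common(1)[0]
    return value if freq > 1 else None


# Hypothetical satisfaction ratings (1-5): the most common rating is the mode.
ratings = [4, 5, 3, 4, 4, 2, 5]
most_common_rating = mode_or_none(ratings)

# Impute missing values of a categorical "region" variable with its mode.
regions = ["North", None, "South", "North", None, "East"]
region_mode = mode_or_none([r for r in regions if r is not None])
filled_regions = [r if r is not None else region_mode for r in regions]

print(f"mean={mean_income}, median={median_income}")
print(f"most common rating={most_common_rating}")
print(filled_regions)
```

The single outlier pulls the mean up to 206,000 while the median stays at 48,500, which is why the median is the safer summary for outlier-prone attributes such as income.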
  • 7. Introduction with example: Percentile
    • A percentile represents a relative position in a sorted list of data.
    • For example, the 20th percentile is the value below which 20% of the observations may be found.
    • Let's consider the 25th percentile of the 8 numbers in Table 1. The numbers are sorted in ascending order and given ranks from 1 for the lowest number to 8 for the highest.

      Table 1
      Number:  3  5  7  8  9  11  13  15
      Rank:    1  2  3  4  5   6   7   8

    • Step 1: Compute the rank (R) of the 25th percentile using the following formula:
      R = P/100 × (N + 1)
      where P is the desired percentile (25 in this case) and N is the count of numbers (8 in this case). Therefore,
      R = 25/100 × (8 + 1) = 9/4 = 2.25
  • 8. Introduction with example  Percentile :  Step 2: If R is an integer, the Pth percentile is the number with rank R; But when R is not an integer, we compute the Pth percentile as follows :  Define IR as the integer portion of R (the number to the left of the decimal point). For this example, IR = 2  Define FR as the fractional portion of R. For this example, FR = 0.25  Find the scores with Rank IR and with Rank IR + 1. For this example, this means the score with Rank 2 and the score with Rank 3. The scores are 5 and 7  Multiply the difference between the scores by FR and add the result to the lower score. For these data, this is (0.25)(7 - 5) + 5 = 5.5  Therefore, the 25th percentile is 5.5
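The two steps above can be sketched in Python as follows. This is a minimal illustration of the (N + 1) rank rule from the slides, not a general-purpose implementation; clamping ranks that fall below 1 or above N is an assumption the slides do not cover. The standard library's `statistics.quantiles` with `method="exclusive"` uses the same positioning, so it serves as a cross-check.

```python
import statistics


def percentile(values, p):
    """Pth percentile using the rank rule R = P/100 * (N + 1).

    Assumption: ranks below 1 or above N are clamped to the smallest or
    largest value; the slides do not say how to handle these edge cases.
    """
    data = sorted(values)
    n = len(data)
    r = p / 100 * (n + 1)                  # Step 1: rank R
    if r <= 1:
        return float(data[0])
    if r >= n:
        return float(data[-1])
    ir = int(r)                            # integer portion IR
    fr = r - ir                            # fractional portion FR
    lower, upper = data[ir - 1], data[ir]  # scores with rank IR and IR + 1
    return lower + fr * (upper - lower)    # Step 2: interpolate


table_1 = [3, 5, 7, 8, 9, 11, 13, 15]
p25 = percentile(table_1, 25)

# Cross-check: the "exclusive" method also positions cut points on N + 1.
quartiles = statistics.quantiles(table_1, n=4, method="exclusive")

print(p25, quartiles)
```

Different tools use different percentile rules. For example, NumPy's default linear method would put the 25th percentile of Table 1 at 6.5 rather than 5.5, so it is worth stating which rule a report uses.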
  • 9. Introduction with example  Quartile :  Quartiles are specific percentiles which divide the dataset into four equal parts  First Quartile = Q1 = 25th Percentile ;  Second Quartile = Q2 = 50th Percentile = Median  Third Quartile = Q3 = 75th Percentile  For instance, if lower quartile is at Income = 100k then bottom 25% have income <=100k. If median i.e. second quartile has Income = 200k then bottom 50% population has income <=200k and so on
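As a sketch of quartiles in practice, the snippet below splits a made-up list of 11 incomes (in thousands) into quartiles with `statistics.quantiles`. The values are chosen so that Q1 and Q2 match the 100k and 200k figures in the example above; the snippet also computes the interquartile range (Q3 − Q1).

```python
import statistics

# Hypothetical incomes in thousands (11 people), sorted for readability.
incomes_k = [60, 80, 100, 120, 150, 200, 220, 260, 300, 400, 500]

q1, q2, q3 = statistics.quantiles(incomes_k, n=4, method="exclusive")
iqr = q3 - q1  # interquartile range: spread of the middle 50%

share_at_or_below_q1 = sum(x <= q1 for x in incomes_k) / len(incomes_k)

print(f"Q1={q1}k, median={q2}k, Q3={q3}k, IQR={iqr}k")
print(f"share at or below Q1: {share_at_or_below_q1:.0%}")
```

With only 11 people, 3 of 11 (about 27%) have an income at or below Q1 rather than exactly 25%. The "bottom 25%" reading becomes more precise as the dataset grows.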
  • 10. Introduction with example  Standard deviation and Variance:  Both are the popular measures of how spread out the data points are from a center value mean  For example, let’s find the standard deviation of the following data: 1,2,2,4,6 1. Calculate the mean of data: 15/5 = 3 2. Subtract the mean from each data value: -2, -1, -1, 1, 3 3. Square each of the new data value: 4,1,1,1,9 4. Sum these squared data values: 16 5. Divide this sum by (number of observations -1): 16 / (5-1) = 4 6. This number is Variance and Square root of this number is standard deviation: Sqrt (4) = 2  For instance, standard deviations of price data are frequently used as a measure of volatility; While monitoring some industrial process , if process indicators go beyond design standards then it my be troublesome hence variance/standard deviation can be used in such cases
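The six steps of the worked example translate directly into Python. The sketch below follows them by hand and then cross-checks against the standard library, where `statistics.variance` and `statistics.stdev` divide by n − 1 (as in step 5) and `statistics.pvariance` divides by n.

```python
import math
import statistics

data = [1, 2, 2, 4, 6]

# Manual computation following the six steps on the slide.
mean = sum(data) / len(data)            # step 1: 3.0
deviations = [x - mean for x in data]   # step 2
squares = [d ** 2 for d in deviations]  # step 3
total = sum(squares)                    # step 4: 16.0
variance = total / (len(data) - 1)      # step 5: sample variance, 4.0
std_dev = math.sqrt(variance)           # step 6: 2.0

print(variance, std_dev)
# Standard-library cross-check; pvariance divides by n instead of n - 1.
print(statistics.variance(data), statistics.stdev(data), statistics.pvariance(data))
```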
  • 11. Introduction with example Skewness: It is a measure of symmetry. A dataset is symmetric if it looks the same to the left and right of the center point If skewness < −1 or greater than > 1, the distribution is highly skewed If skewness is between −1 and − 0.5 or between 0.5 and +1, the distribution is moderately skewed If skewness is between −0.5 and + 0.5, the distribution is approximately symmetric
  • 12. Introduction with example: Skewness calculation formula
    Where:
      • n = number of observations
      • s = standard deviation
      • S = skewness
      • Xi = ith observation
      • X avg = mean of the observations
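The skewness formula itself was an image on the slide and is not reproduced in the text. One commonly used sample-skewness formula built from the same symbols is the adjusted Fisher-Pearson coefficient, S = n / ((n − 1)(n − 2)) × Σ((Xi − X avg) / s)³. The sketch below assumes that form, although the original slide may have used a different variant (for example, dividing the sum of cubed deviations by (n − 1)s³). It also applies the thresholds from slide 11. `sample_skewness` and `describe_skew` are illustrative names.

```python
import statistics


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness (an assumed formula, see lead-in).

    S = n / ((n - 1)(n - 2)) * sum(((x - x_avg) / s) ** 3)
    where s is the sample standard deviation.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    x_avg = statistics.mean(xs)
    s = statistics.stdev(xs)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    total = sum(((x - x_avg) / s) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * total


def describe_skew(skew):
    """Classify a skewness value using the slide's thresholds."""
    if abs(skew) > 1:
        return "highly skewed"
    if abs(skew) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"


skewness = sample_skewness([1, 2, 2, 4, 6])
print(round(skewness, 4), describe_skew(skewness))
print(describe_skew(sample_skewness([1, 2, 3, 4, 5])))
```

For the variance example data 1, 2, 2, 4, 6, this gives S ≈ 0.94, a moderate right skew driven by the value 6; a perfectly symmetric list such as 1 to 5 gives 0.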
  • 13. Introduction with example: Histogram
    • A histogram is a graphical display in which the data are grouped into buckets (bins) and then plotted as bars.
    • For example, the price-by-volume chart on this slide is a common histogram that shows how many shares of a stock traded within each price range.
    • Here, the share-price range is divided into 12 bins, and the number of shares traded in each price range is plotted as a bar.
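The bucketing behind a price-by-volume histogram can be sketched in plain Python. The trades below are hypothetical (price, shares traded) pairs, and `volume_by_price` is an illustrative helper that uses equal-width bins. The chart on the slide used 12 bins; this toy example uses 4 so the output stays short. Plotting libraries can do the same binning, for example matplotlib's `hist` with its `weights` argument set to the volumes.

```python
def volume_by_price(trades, bins):
    """Group (price, volume) trades into equal-width price bins.

    Returns the bin edges and the total volume traded in each bin.
    """
    prices = [price for price, _ in trades]
    lo, hi = min(prices), max(prices)
    width = (hi - lo) / bins
    totals = [0] * bins
    for price, volume in trades:
        idx = int((price - lo) / width) if width else 0
        idx = min(idx, bins - 1)  # the highest price belongs to the last bin
        totals[idx] += volume
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, totals


# Hypothetical trades: (share price, number of shares traded).
trades = [(10.0, 100), (10.5, 200), (11.0, 50), (12.0, 300), (14.0, 10)]
edges, totals = volume_by_price(trades, bins=4)

# Text "bars": one '#' per 50 shares.
for left, right, total in zip(edges, edges[1:], totals):
    print(f"{left:6.2f}-{right:6.2f} | {'#' * (total // 50)} {total}")
```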
  • 14. Introduction with example  The histogram is an effective graphical technique for showing the Skewness and kurtosis of dataset  For example, to quickly check whether the data follows normal distribution or not, before applying any predictive algorithm, which requires data to follow normality, Kurtosis can be looked at and transformation can be applied on data if necessary, to achieve normality  If the bulk of the data is at the left and the right tail is longer, we say that the distribution is skewed right or positively skewed  If the peak is toward the right and the left tail is longer, we say that the distribution is skewed left or negatively skewed  In negatively (left) skewed data, mean will always be < median and mode, whereas in positively (right) skewed data, mean will always be > median and mode  For example, casual equity investors look at the chart of a stock's price and try to make investments in companies that have a positive skew. The idea is to invest in a company with a long tail, which in the equity markets is a stock price that is greatly skewed positively
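The mean-versus-median comparison above can be checked quickly in code. The sketch uses made-up right-skewed wait times; `skew_direction_hint` is a hypothetical helper that only applies the rule of thumb, so treat its output as a hint to confirm with a histogram or a skewness coefficient.

```python
import statistics

# Hypothetical right-skewed data, e.g. customer wait times in minutes:
# most values are small, with a long tail to the right.
wait_times = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]

mean = statistics.mean(wait_times)
median = statistics.median(wait_times)
mode = statistics.mode(wait_times)


def skew_direction_hint(values):
    """Rule-of-thumb hint from comparing the mean and the median.

    This is a heuristic, not a proof: it can disagree with the skewness
    coefficient, especially for small or multimodal samples.
    """
    m, med = statistics.mean(values), statistics.median(values)
    if m > med:
        return "right (positive) skew likely"
    if m < med:
        return "left (negative) skew likely"
    return "no skew indicated"


print(mean, median, mode, skew_direction_hint(wait_times))
```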
  • 15. Introduction with example  Box plot:  It is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum  The central rectangle spans the first quartile to the third quartile (the interquartile range or IQR). A segment inside the rectangle shows the median and "whiskers" above and below the box show the locations of the minimum and maximum  For example, for the quartiles Q1, Q2 and Q3 with values 4.5, 7 and 11.5 respectively and minimum=0.5, maximum=22, Box plot can be drawn as shown in right
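The five numbers a basic box plot draws can be computed directly. In the sketch below, the dataset is hypothetical and was chosen so its summary matches the example values on this slide; `five_number_summary` is an illustrative helper that uses the exclusive quartile method of `statistics.quantiles`, and other quartile rules can give slightly different boxes.

```python
import statistics


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum, the values a basic box plot draws."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return data[0], q1, q2, q3, data[-1]


# Hypothetical data chosen so its summary matches the example on the slide.
sample = [0.5, 2, 4.5, 5, 6, 7, 8, 10, 11.5, 15, 22]
low, q1, median, q3, high = five_number_summary(sample)
iqr = q3 - q1  # height of the box

print(f"min={low}, Q1={q1}, median={median}, Q3={q3}, max={high}, IQR={iqr}")
```

Note that matplotlib's `boxplot` draws whiskers at 1.5 × IQR by default; passing `whis=(0, 100)` extends them to the minimum and maximum as described on this slide.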
  • 17. Business use cases – In general
    • Descriptive statistics, as the name implies, describe or summarize raw data and make it interpretable by humans.
    • Common examples of descriptive analytics are reports that provide historical insights into a company's production, finances, operations, sales, inventory, and customers.
    • Thus, descriptive statistics should be used when you need to understand at an aggregate level what is going on in your company, and when you want to summarize and describe different aspects of your business.
  • 18. Business use cases – Mode
    • Business problem: Identify the most popular dish served in a restaurant, the most frequent rating customers give a particular movie or restaurant, or the most frequently sold size or category of a product.
    • Business benefit:
      • By identifying the mode of the dishes purchased, a restaurant owner learns which dish is most popular and can decide the pricing of that dish accordingly.
      • By identifying the most frequently rated movie or restaurant of the month, news publishers or other researchers can publish that information, or a market researcher can provide it to prospective restaurant owners who are surveying market conditions before launching.
      • By knowing the most frequently bought product category or size, stock inventory can be managed better.
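A minimal Python sketch of two of these mode use cases, using made-up data: the most popular dish is the mode of the dish names, and the most frequent rating per movie is a mode computed separately within each movie's ratings. Ties are resolved by first occurrence, which `Counter.most_common` guarantees.

```python
from collections import Counter, defaultdict

# Hypothetical order log (dish names) and ratings (movie, rating 1-5).
orders = ["Pizza", "Pasta", "Pizza", "Salad", "Pizza", "Salad"]
ratings = [("Movie A", 4), ("Movie A", 5), ("Movie A", 4),
           ("Movie B", 3), ("Movie B", 2), ("Movie B", 3)]

# Most popular dish = mode of the dish names.
most_popular_dish, dish_count = Counter(orders).most_common(1)[0]

# Most frequent rating per movie = mode computed within each group.
by_movie = defaultdict(list)
for movie, rating in ratings:
    by_movie[movie].append(rating)
modal_rating = {movie: Counter(rs).most_common(1)[0][0] for movie, rs in by_movie.items()}

print(most_popular_dish, dish_count)
print(modal_rating)
```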
  • 19. Business use cases – Mean/Median
    • Business problem: Find the average age and income of customers who purchase a particular product category.
    • Business benefit:
      • By identifying the mean or median income of this segment, targeted marketing can be directed at it to improve ROI and sales revenue.
      • However, the median usually gives a more accurate picture than the mean: if a couple of customers have extreme income values, they distort the mean but barely affect the median.
  • 20. Business use cases – Percentile
    • Business problem: A bank's loan manager needs to find the percentile distribution of the credit scores of loan applicants.
    • Business benefit:
      • By checking the credit score distribution, the manager can see how many applicants fall within the top 10 percentiles and can estimate the total number of eligible loan applicants based on the bank's credit score criteria for loan eligibility.
      • Making informed decisions about whom to lend to, using such statistical measures, can help avoid a high number of delinquencies and defaults.
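As an illustration, the loan manager's percentile check might look like this. The 20 credit scores and the 700-point eligibility cutoff (`MIN_ELIGIBLE_SCORE`) are hypothetical; the 90th percentile is taken as the last decile cut point from `statistics.quantiles`.

```python
import statistics

# Hypothetical credit scores for 20 loan applicants.
scores = [580, 600, 615, 630, 640, 650, 660, 670, 680, 690,
          700, 705, 710, 720, 730, 745, 760, 780, 800, 820]

# Deciles: nine cut points; the last one is the 90th percentile.
deciles = statistics.quantiles(scores, n=10, method="exclusive")
p90 = deciles[-1]
top_10_percent = [s for s in scores if s >= p90]

# Hypothetical eligibility rule set by the bank: a score of at least 700.
MIN_ELIGIBLE_SCORE = 700
eligible = sum(s >= MIN_ELIGIBLE_SCORE for s in scores)

print(f"90th percentile: {p90}")
print(f"applicants in the top 10%: {len(top_10_percent)}")
print(f"eligible applicants: {eligible} of {len(scores)}")
```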
  • 21. Business use cases – Quartiles, Interquartile Range + Box plot
    • Business problem: A business owner wants to reduce the business process cycle time.
    • Business benefit:
      • By checking the Q1, Q3, and interquartile range (Q3 − Q1) values of each step of the process, we can find which steps have scope for time reduction.
      • For instance, in the box plot on this slide, the preliminary analysis, database research, evaluation, record keeping, and follow-up steps have a high interquartile range (indicated by box height), making them the steps for further inspection, with the follow-up step being the primary concern owing to its highest interquartile range.
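The comparison of process steps can be sketched by computing each step's IQR and ranking the steps from most to least variable. The cycle times are made up, and "Data entry" is a hypothetical low-variability step added for contrast; quartiles use the exclusive method of `statistics.quantiles`.

```python
import statistics

# Hypothetical cycle times (hours) observed for each process step.
step_times = {
    "Preliminary analysis": [2, 3, 5, 8, 9, 12, 14],
    "Evaluation":           [1, 2, 4, 6, 8, 10, 11],
    "Data entry":           [1, 1, 2, 2, 2, 3, 3],
    "Follow up":            [1, 3, 6, 10, 15, 20, 24],
}


def iqr(values):
    """Interquartile range Q3 - Q1 (exclusive quartile method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


# Rank steps from most to least variable; high-IQR steps are inspected first.
ranked = sorted(step_times, key=lambda step: iqr(step_times[step]), reverse=True)
for step in ranked:
    print(f"{step:22s} IQR = {iqr(step_times[step])}")
```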
  • 22. Business use cases – Standard deviation/variance
    • Business problem: A stockbroker wants to analyze the price volatility of a stock as a measure of risk.
    • Business benefit:
      • By analyzing the standard deviation or variance, one can measure the risk associated with a particular stock in terms of price fluctuations.
      • If these measures are relatively low, future prices and expected volatility can be estimated more reliably.
      • Thus, standard deviation/variance tells us how much the stock price or fund's return deviates from the expected normal price or return, and investors therefore use it as a gauge of expected volatility.
      • However, to compare the volatility of two stocks directly, one has to divide each standard deviation by the stock's closing price; otherwise the comparison is not valid.
  • 23. Business use cases – Standard deviation/variance (continued)
    • For instance, let's compute the standard deviation of the 10 days of stock prices shown in the table on this slide:
      1. Calculate the average (mean) price over the number of periods or observations.
      2. Determine each period's deviation (price − mean).
      3. Square each period's deviation.
      4. Sum the squared deviations.
      5. Divide this sum by the number of observations; this is the variance. (Dividing by n gives the population variance; slide 10 divides by n − 1, which gives the sample variance.)
      6. The standard deviation is the square root of this number.
    • The lower this number, the lower the price volatility and the easier it is to estimate the stock's future price.
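The steps on this slide compute a population standard deviation (dividing by n). The sketch below follows them for made-up 10-day closing prices, since the slide's table is not reproduced here, and then applies the previous slide's advice: dividing by the closing price makes two stocks at very different price levels comparable. `population_std` is an illustrative helper; `statistics.pstdev` gives the same result.

```python
import math
import statistics


def population_std(values):
    """Standard deviation following the slide's steps (divide by n)."""
    mean = sum(values) / len(values)             # step 1
    squared = [(v - mean) ** 2 for v in values]  # steps 2-3
    variance = sum(squared) / len(values)        # steps 4-5
    return math.sqrt(variance)                   # step 6


# Hypothetical closing prices for 10 trading days.
stock_a = [50, 52, 51, 53, 55, 54, 56, 55, 57, 57]
stock_b = [10 * p for p in stock_a]  # same moves at ten times the price level

std_a, std_b = population_std(stock_a), population_std(stock_b)

# Relative volatility: standard deviation divided by the latest closing price.
rel_a = std_a / stock_a[-1]
rel_b = std_b / stock_b[-1]

print(f"std A = {std_a:.4f}, std B = {std_b:.4f}")
print(f"relative A = {rel_a:.4f}, relative B = {rel_b:.4f}")
print(f"standard-library check: {statistics.pstdev(stock_a):.4f}")
```

Stock B's raw standard deviation is ten times stock A's, yet after dividing by the closing price both show the same relative volatility.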
  • 24. Business use cases – Skewness + Histogram
    • Business problem: A quality control manager at a company producing elevator rails needs to know which machine is best suited to produce the rails.
    • Business benefit:
      • If the required diameter for an elevator rail is 3 inches, you can conclude from the image on the right that machine A is producing rails that are too narrow, whereas machine B is producing rails that are too wide.
      • Hence, both machines are failing to produce the required diameter.
      • Machine C, however, produces the right diameter most of the time, making it the ideal choice for producing elevator rails with a 3-inch diameter.
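The machine comparison can be sketched by summarizing each machine's output against the 3-inch target. The measurements are made up, and picking the machine whose mean diameter is closest to the target is an assumed selection rule; in practice one would also look at the spread and shape of each histogram, as the slide does.

```python
import statistics

TARGET_DIAMETER = 3.0  # inches, from the slide

# Hypothetical diameter measurements (inches) from each machine.
machines = {
    "A": [2.80, 2.82, 2.85, 2.86, 2.90, 2.95],
    "B": [3.05, 3.10, 3.14, 3.15, 3.18, 3.20],
    "C": [2.96, 2.98, 3.00, 3.00, 3.02, 3.04],
}

for name, diameters in machines.items():
    mean = statistics.mean(diameters)
    spread = statistics.stdev(diameters)
    print(f"Machine {name}: mean = {mean:.3f} in, std = {spread:.3f} in, "
          f"off target by {mean - TARGET_DIAMETER:+.3f} in")

# Assumed rule: choose the machine whose average is closest to the target.
best = min(machines, key=lambda m: abs(statistics.mean(machines[m]) - TARGET_DIAMETER))
print("Best machine:", best)
```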
  • 25. Want to learn more? Get in touch with us at support@Smarten.com and check out the Learning section on Smarten.com. June 2018