Statistics
Mean, Median, Mode, Standard
Deviation, Normal and Sampling
Distribution, and Z-Score
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017
Mean
• The mean is the average of a set of samples or a
population distribution.
Symbol for mean: µ (mu)
$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sum (add) up all the samples
Divide the summation by the number of samples
Example:
Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
$\mu = \frac{1 + 2 + 2.5 + 2.5 + 3 + 3 + 3.5}{7} = \frac{17.5}{7} = 2.5$
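A minimal Python sketch of the mean calculation, using the seven samples from this slide. The function name `mean` is chosen here for illustration and is not part of the original deck.

```python
def mean(samples):
    """Arithmetic mean: sum the samples, then divide by the sample count."""
    if not samples:
        raise ValueError("mean requires at least one sample")
    return sum(samples) / len(samples)


samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]
mu = mean(samples)
print(mu)  # 2.5
```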
Median
• The median is the mid-point in a sorted (frequency) distribution of
samples (population).
• Odd Number of Samples – is the sample at the midpoint (center)
• Even Number of Samples – is the average of the two samples at
the midpoint (center)
Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
Median = 2.5 (the midpoint sample)
Eight Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5, 4 }
Median = ( 2.5 + 3 ) / 2 = 2.75 (average of the two midpoint samples)
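The odd and even cases can be sketched in Python as follows. The function name `median` is an illustrative choice; the two sample sets are the ones from this slide.

```python
def median(samples):
    """Midpoint of the sorted samples; averages the two middle values when the count is even."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one sample")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


seven = [1, 2, 2.5, 2.5, 3, 3, 3.5]
eight = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4]
print(median(seven))  # 2.5
print(median(eight))  # 2.75
```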
Discrete vs. Continuous
• The values of a population can be classified as either discrete or
continuous values.
• Discrete – the values in a sample (population) are discrete if the
selected values are from a finite set of values. Examples: a fixed set
of values for a categorical variable (US States), or a finite set of
numbers (person’s age in years as whole numbers).
• Continuous – the values in a sample (population) are continuous
if the selected values are from an infinite set of values. Examples:
an infinite number of real values (dollar value in a checking account,
or a person’s age as a real number [not rounded]).
Ex., Age = 0, 1, 2 … 99
Checking = { $1, $10, $1046.37, $2,000,300.12, etc … }
Mode
• The mode is the value that occurs most frequently in a set of
samples (population distribution).
On a bar chart, it is the tallest bar.
• For discrete samples, it is the value that occurs most frequently.
• For continuous samples, it is the range that occurs most frequently,
where the values are grouped into ranges.
Samples = { 1, 2, 2, 2, 3, 3, 4, 5, 7 }
Mode = 2 (the discrete value that occurs most frequently)
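For discrete samples, the mode can be found by counting occurrences. The sketch below uses `collections.Counter` from the Python standard library. The helper name `modes` is an assumption for illustration, and it returns a list so that ties are visible.

```python
from collections import Counter


def modes(samples):
    """Return every value that shares the highest occurrence count."""
    if not samples:
        raise ValueError("modes requires at least one sample")
    counts = Counter(samples)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


samples = [1, 2, 2, 2, 3, 3, 4, 5, 7]
print(modes(samples))  # [2]
```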
Mode
Steps:
1. Select a Range Size (e.g., 10)
2. Partition the samples into sequential steps of the range (e.g., 10, 20, 30)
3. Assign each sample to a range that it is within.
4. Select the range with the largest number of samples.
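The four steps above can be sketched in Python as shown below. The data values and the name `binned_mode` are hypothetical. Ranges are assumed to be half-open, so a range size of 10 gives 0 up to (not including) 10, then 10 up to 20, and so on.

```python
from collections import Counter


def binned_mode(samples, range_size):
    """Return (low, high) for the range holding the most samples; high is exclusive."""
    if not samples:
        raise ValueError("binned_mode requires at least one sample")
    # Steps 2 and 3: assign each sample to the index of the range it falls in.
    counts = Counter(int(x // range_size) for x in samples)
    # Step 4: pick the range with the largest count (first one found wins a tie).
    index, _ = max(counts.items(), key=lambda item: item[1])
    low = index * range_size
    return (low, low + range_size)


ages = [3.5, 12.1, 14.8, 17.0, 18.9, 25.2, 31.7]  # hypothetical continuous samples
print(binned_mode(ages, 10))  # (10, 20)
```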
Standard Deviation
• The standard deviation is a measure that is used to quantify the
amount of variation or dispersion of a set of samples (population).
Symbol for standard deviation: σ (sigma)
$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\mu - x_i)^2}$
Sum (add) up the squared difference between the mean and each sample
Divide the summation by the number of samples, then take the square root
Example:
Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 } , µ = 2.5
$\sigma = \sqrt{\frac{1}{7}\left[(2.5 - 1)^2 + (2.5 - 2)^2 + (2.5 - 2.5)^2 + (2.5 - 2.5)^2 + (2.5 - 3)^2 + (2.5 - 3)^2 + (2.5 - 3.5)^2\right]}$
$= \sqrt{\frac{1}{7}(2.25 + 0.25 + 0 + 0 + 0.25 + 0.25 + 1)}$
$= \sqrt{\frac{4}{7}} \approx 0.76$
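As a check on the arithmetic, here is a short Python sketch of the population standard deviation (dividing by n, as on this slide). The function name is illustrative. Evaluating the formula directly for the seven samples gives the square root of 4/7, which is about 0.76.

```python
import math


def population_std_dev(samples):
    """Square root of the mean squared difference between the mean and each sample."""
    n = len(samples)
    if n == 0:
        raise ValueError("population_std_dev requires at least one sample")
    mu = sum(samples) / n
    squared_diffs = [(mu - x) ** 2 for x in samples]
    return math.sqrt(sum(squared_diffs) / n)


samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]
sigma = population_std_dev(samples)
print(round(sigma, 2))  # 0.76
```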
Normal Distribution
• The normal (Gaussian) distribution is a distribution that is
used in probability for the expected random distribution of samples
in a population.
• Based on the distributions of naturally occurring things.
• 68% of the samples should be within 1 standard deviation of the mean.
• 95% of the samples should be within 2 standard deviations of the mean.
• 99.7% of the samples should be within 3 standard deviations of the mean.
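The coverage percentages can be checked numerically. For a normal distribution, the fraction of samples within k standard deviations of the mean equals erf(k / √2); the sketch below uses `math.erf` from the Python standard library. The function name `coverage_within` is an illustrative choice.

```python
import math


def coverage_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4 and 99.7 percent.
    print(k, round(100 * coverage_within(k), 1))
```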
Population vs. Sample
Population (Distribution): can be any distribution
Parameters: µ (mean), σ (std. dev), N (size)
Random Sample
Statistic: x̅ (mean), s (std. dev), n (size)
Probability: can calculate the probability that a sample is in the population, when the population is known.
Sampling Distribution
Population → Random Samples ( x̅, x̅, x̅, … ) → Sampling Distribution (Plot of Sample Means)
Each sample has a mean x̅.
The distribution of the means of randomly chosen samples from a
population is called a sampling distribution.
$\mu_{\bar{x}} = \mu$ (mean)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (std. dev)
Central Limit Theorem
As the number of samples increases, the plot of the sample means will
approach a normal distribution.
The mean of a sampling distribution will approach the mean of the
population.
The central limit theorem only specifies that the central part of a distribution of
averages will approach a normal distribution as the number of trials goes to infinity.
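A small simulation can illustrate this behavior. The sketch below is an assumption-laden example, not part of the original deck: it draws from a uniform distribution on [0, 1) (mean 0.5, standard deviation √(1/12)), takes 5000 samples of size 30, and compares the mean and standard deviation of the sample means with µ and σ/√n. The seed and sizes are arbitrary choices.

```python
import math
import random


def sample_means(draw, n, num_samples, rng):
    """Mean of each of num_samples random samples, every sample holding n draws."""
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(num_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 30
means = sample_means(lambda r: r.random(), n, 5000, rng)

mean_of_means = sum(means) / len(means)
std_of_means = math.sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))
expected_std = math.sqrt(1 / 12) / math.sqrt(n)  # sigma / sqrt(n) for uniform [0, 1)

print(mean_of_means)               # close to the population mean 0.5
print(std_of_means, expected_std)  # the two values should be close
```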
Z-Score
• The Z-Score is the number of standard deviations a value lies from the mean
in a normal distribution.
$Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
Figure: normal curve centered on µ, marked at Z-Score = -2, Z-Score = 2, and an arbitrary Z-score (e.g., 1.5)
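A minimal Python sketch of the Z-Score of a sample mean, assuming the standard deviation of the sample mean is σ/√n as on the Sampling Distribution slide. The function name and the numbers in the usage line are hypothetical.

```python
import math


def z_score(sample_mean, population_mean, population_std, n):
    """Number of standard errors between a sample mean and the population mean."""
    standard_error = population_std / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error


# Hypothetical values: a sample of 25 with mean 53, from a population with mean 50 and std. dev 10.
print(z_score(53, 50, 10, 25))  # 1.5
```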
Standard Normal Probabilities
• The Probability that a Z-Score for a sample will fall within the area
of a normal distribution can be looked up in the Standard Normal
Probabilities Table - http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf
Figure: with the shaded area ending at µ, there is a 50% probability that a sample falls into the area of the distribution.
The probability of a sample falling within the shaded area increases as the Z-Score (number of standard deviations) increases.
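Instead of a table lookup, the same cumulative probability can be computed from the error function, since the standard normal cumulative probability is 0.5 · (1 + erf(z / √2)). This is a sketch using `math.erf` from the Python standard library; the function name is an illustrative choice.

```python
import math


def standard_normal_cdf(z):
    """Probability that a standard normal value falls at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


print(standard_normal_cdf(0))              # 0.5
print(round(standard_normal_cdf(1.9), 4))  # 0.9713
```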
Robot Example
• Warehouse of Boxes: Mean Weight of 50 lbs, Standard Deviation of 10 lbs.
• Pallet of Boxes: Need to move pallet of 10 boxes of unknown weight.
• Robot: Has lift limit of 560 lbs.
• Question: What is the probability the Robot can lift this pallet?
Population: Weight Distribution of Boxes
µ (mean) = 50 lbs
σ (std. dev) = 10 lbs
Pallet of 10 Boxes: Weight of Boxes Unknown
Calculate Std. Dev. of Pallet:
$\mu_{\bar{x}} = \mu = 50$ (mean)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{10}} = 3.16$ (std. dev)
Maximum mean weight of 10 boxes robot can lift:
$\bar{x}_{max}$ = 560 lbs / 10 boxes = 56 lbs
$Z = \frac{\bar{x}_{max} - \mu}{\sigma_{\bar{x}}} = \frac{56 - 50}{3.16} = \frac{6}{3.16} = 1.9$
Standard Normal Probability of 1.9 = 97.13 %
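The robot calculation can be reproduced end to end in Python. This sketch assumes the mean box weight on a pallet is approximately normally distributed, and it replaces the table lookup with `math.erf`. Variable names are illustrative. Because Z is not rounded to 1.9 before the lookup, the computed probability comes out slightly below the table value.

```python
import math


def standard_normal_cdf(z):
    """Probability that a standard normal value falls at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu = 50.0           # mean box weight (lbs)
sigma = 10.0        # std. dev of box weight (lbs)
n = 10              # boxes on the pallet
lift_limit = 560.0  # robot lift limit (lbs)

sigma_xbar = sigma / math.sqrt(n)  # std. dev of the pallet's mean box weight, about 3.16
xbar_max = lift_limit / n          # largest mean box weight the robot can lift: 56
z = (xbar_max - mu) / sigma_xbar   # about 1.90
p = standard_normal_cdf(z)         # probability the pallet is within the limit

print(round(z, 2), round(100 * p, 1))  # 1.9 97.1
```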
Null Hypothesis
• The Null Hypothesis H0 is the opposite of what one is trying to prove (i.e., nothing changed).
H0 = The mean price of a transaction has not increased (e.g., µ ≤ $25)
H1 = The mean price of a transaction has increased (e.g., µ > $25)
• To Prove the Alternate Hypothesis H1 :
• Disprove the Null Hypothesis
• Within a Level of Statistical Significance
• Example: Transaction History has µ = $25 with σ = $5
Transaction Sample has x̅ = $26.50
Transaction Sample Size = 10
Calculate Std. Dev. of Transaction: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{10}} = 1.58$
Z-Score of Transaction: $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{1.58} = 0.95$
Standard Normal Probability of 0.95 = 82.89 % (Confidence Level)
Transaction Sample Size = 100
Calculate Std. Dev. of Transaction: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5$
Z-Score of Transaction: $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{0.5} = 3$
Standard Normal Probability of 3 = 99.87 % (Confidence Level)
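The two sample sizes can be compared with a short Python sketch. It assumes the sample mean is approximately normal and uses `math.erf` in place of the table lookup. The function name `one_sided_test` is hypothetical. Since Z is not rounded before the lookup, the printed percentages can differ slightly from table values.

```python
import math


def standard_normal_cdf(z):
    """Probability that a standard normal value falls at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def one_sided_test(sample_mean, mu0, sigma, n):
    """Return (Z, confidence level) for a sample mean measured against the historical mean mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    return z, standard_normal_cdf(z)


for n in (10, 100):
    z, confidence = one_sided_test(26.5, 25.0, 5.0, n)
    print(n, round(z, 2), round(100 * confidence, 2))
```

The same $1.50 difference yields a much higher confidence level with 100 transactions than with 10, because the standard deviation of the sample mean shrinks as n grows.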
Box (and Whisker) Plot
• A method used to visualize the spread of data.
• Split the data into quartiles (quarters).
• A box is drawn around the middle two quarters of the data (from the 1st to the 3rd quartile).
• The whiskers are drawn at the end points.
Figure: box plot along the Data Values (x) axis, with whiskers at the lowest and highest values and the box (IQR) marked at:
1st quartile (median of lower half)
2nd quartile (median)
3rd quartile (median of upper half)
1. Calculate the median of the entire dataset, splitting the dataset into halves.
2. Calculate the median of the top and lower half of the dataset, splitting them into quarters.
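The two steps can be sketched in Python as follows. One assumption is labeled in the code: when the sample count is odd, the overall median is left out of both halves (a common convention; other quartile conventions exist). The function names are illustrative, and the data is the eight-sample set from the Median slide.

```python
def median(values):
    """Midpoint of the sorted values; averages the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(samples):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(samples)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles requires at least two samples")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the overall median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


eight = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4]
q1, q2, q3 = quartiles(eight)
print(q1, q2, q3)  # 2.25 2.75 3.25
print(q3 - q1)     # IQR: 1.0
```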
Box (and Whisker) Plot - Outliers
• A variation of a box plot to show outliers.
• The whiskers are replaced with an inner and outer fence at
1.5 x IQR (inner) and 3 x IQR (outer) beyond the edges of the box.
• Values between 1.5 and 3 IQR from the box are suspected outliers (white).
• Values more than 3 IQR from the box are outliers (black).
Figure: box (IQR) along the Data Values (x) axis, with an inner fence (1.5 IQR) on each side, an outer fence (3 IQR), suspected outliers between the fences, and outliers beyond the outer fence.
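The fence rule can be sketched in Python as shown below, measuring the fences from the box edges (Q1 and Q3). The function name `classify` and the quartile values in the usage lines are hypothetical.

```python
def classify(value, q1, q3):
    """Label a value by where it falls relative to the inner and outer fences."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    if value < outer_low or value > outer_high:
        return "outlier"
    if value < inner_low or value > inner_high:
        return "suspected outlier"
    return "inside fences"


# Hypothetical quartiles: Q1 = 2.25, Q3 = 3.25, so IQR = 1.
# Inner fences fall at 0.75 and 4.75, outer fences at -0.75 and 6.25.
q1, q3 = 2.25, 3.25
print(classify(3, q1, q3))  # inside fences
print(classify(5, q1, q3))  # suspected outlier
print(classify(7, q1, q3))  # outlier
```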

Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Statistics - Basics

  • 1. Statistics: Mean, Median, Mode, Standard Deviation, Normal and Sampling Distribution, and Z-Score. Portland Data Science Group. Created by Andrew Ferlitsch, Community Outreach Officer. July 2017.
  • 2. Mean
      • The mean is the average of a set of samples or a population distribution.
      Formula: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $\mu$ (mu) is the symbol for the mean.
      Sum (add) up all the samples, then divide the summation by the number of samples.
      Example: Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
      $\mu = \frac{1 + 2 + 2.5 + 2.5 + 3 + 3 + 3.5}{7} = \frac{17.5}{7} = 2.5$
  • 3. Median
      • The median is the midpoint in a sorted (frequency) distribution of samples (population).
      • Odd number of samples: the median is the sample at the midpoint (center).
      • Even number of samples: the median is the average of the two samples at the midpoint (center).
      Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }: the midpoint is the fourth sample, so the median = 2.5.
      Eight Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5, 4 }: the midpoint falls between 2.5 and 3, so the median = ( 2.5 + 3 ) / 2 = 2.75.
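The mean and median examples above can be checked with a minimal sketch using Python's standard `statistics` module. The variable names (`seven`, `eight`, `mu`, `mid_odd`, `mid_even`) are illustrative and not part of the original slides.

```python
from statistics import mean, median

# Sample sets taken from the Mean and Median slides.
seven = [1, 2, 2.5, 2.5, 3, 3, 3.5]
eight = seven + [4]

mu = mean(seven)            # sum of the samples divided by their count
mid_odd = median(seven)     # odd count: the sample at the midpoint
mid_even = median(eight)    # even count: average of the two middle samples

print(mu, mid_odd, mid_even)
```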
  • 4. Discrete vs. Continuous
      • The values of a population can be classified as either discrete or continuous.
      • Discrete: the values in a sample (population) are discrete if they are selected from a finite set of values. Examples: a fixed set of values for a categorical variable (US states), or a finite set of numbers (a person's age in years as whole numbers). E.g., Age = 0, 1, 2 … 99
      • Continuous: the values in a sample (population) are continuous if they are selected from an infinite set of values. Examples: an infinite number of real values (the dollar value in a checking account, or a person's age as a real number [not rounded]). E.g., Checking = { $1, $10, $1046.37, $2,000,300.12, etc … }
  • 5. Mode
      • The mode is the value that occurs most frequently in a set of samples (population distribution). On a bar chart, it is the tallest bar.
      • For discrete samples, it is the value that occurs most frequently.
      • For continuous samples, it is the range that occurs most frequently, where the values are grouped into ranges.
      Samples = { 1, 2, 2, 2, 3, 3, 4, 5, 7 }: the discrete value that occurs most frequently is 2 (three times).
      Mode steps for continuous samples:
      1. Select a range size (e.g., 10).
      2. Partition the samples into sequential steps of the range (e.g., 10, 20, 30).
      3. Assign each sample to the range that it falls within.
      4. Select the range with the largest number of samples.
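The mode steps above can be sketched as follows. This is one possible reading of the slide, assuming half-open ranges of the form [k × size, (k + 1) × size). The function names and the `ages` data are hypothetical.

```python
from collections import Counter

def discrete_mode(samples):
    """Return the value that occurs most frequently."""
    return Counter(samples).most_common(1)[0][0]

def binned_mode(samples, range_size):
    """Return the (low, high) range that holds the most samples."""
    # Assumption: each sample x belongs to the range [k*size, (k+1)*size).
    counts = Counter(int(x // range_size) for x in samples)
    k = counts.most_common(1)[0][0]
    return (k * range_size, (k + 1) * range_size)

samples = [1, 2, 2, 2, 3, 3, 4, 5, 7]       # from the slide
ages = [3.2, 11.5, 14.0, 18.9, 22.4, 37.1]  # hypothetical continuous data

print(discrete_mode(samples))
print(binned_mode(ages, 10))
```

On ties, `Counter.most_common` returns the first value or range it encountered, so a dataset with several modes reports only one of them.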
  • 6. Standard Deviation
      • The standard deviation is a measure used to quantify the amount of variation or dispersion of a set of samples (population).
      Formula: $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\mu - x_i)^2}$, where $\sigma$ (sigma) is the symbol for the standard deviation.
      Sum (add) up the squared difference between the mean and each sample, divide the summation by the number of samples, then take the square root.
      Example: Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }, µ = 2.5
      $\sigma = \sqrt{\frac{1}{7} \left[ (2.5 - 1)^2 + (2.5 - 2)^2 + (2.5 - 2.5)^2 + (2.5 - 2.5)^2 + (2.5 - 3)^2 + (2.5 - 3)^2 + (2.5 - 3.5)^2 \right]}$
      $= \sqrt{\frac{1}{7} \left( 2.25 + 0.25 + 0 + 0 + 0.25 + 0.25 + 1 \right)} = \sqrt{\frac{4}{7}} \approx 0.76$
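The population standard deviation of the seven samples can be recomputed step by step and compared with `statistics.pstdev` from the standard library. This is a minimal sketch and the variable names are illustrative. The square root of 4/7 comes out at about 0.756.

```python
from math import sqrt
from statistics import pstdev

samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]

mu = sum(samples) / len(samples)                  # mean = 2.5
squared_diffs = [(mu - x) ** 2 for x in samples]  # squared difference per sample
variance = sum(squared_diffs) / len(samples)      # divide by the number of samples
sigma = sqrt(variance)                            # population standard deviation

print(sum(squared_diffs), sigma, pstdev(samples))
```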
  • 7. Normal Distribution
      • The normal (Gaussian) distribution is a distribution used in probability for the expected random distribution of samples in a population.
      • It is based on the distributions of naturally occurring things.
      • About 68% of the samples should be within 1 standard deviation of the mean.
      • About 95% of the samples should be within 2 standard deviations of the mean.
      • About 99.7% of the samples should be within 3 standard deviations of the mean.
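The coverage percentages above follow from the normal distribution itself: the probability of landing within k standard deviations of the mean is erf(k / √2). A small sketch using `math.erf` follows. The helper name `within` is illustrative.

```python
from math import erf, sqrt

def within(k):
    """Probability that a normal sample lies within k std. devs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within(k), 2))
```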
  • 8. Population vs. Sample
      Population (parameters): µ (mean), σ (std. dev), N (size). Can be any distribution.
      Random sample (statistics): x̅ (mean), s (std. dev), n (size).
      Probability: when the population is known, the probability that a sample is in the population can be calculated.
  • 9. Sampling Distribution
      The distribution of the means of randomly chosen samples from a population is called a sampling distribution. Each sample has a mean x̅, and the sample means ( x̅, x̅, x̅, … ) can be plotted.
      Sampling distribution parameters: $\mu_{\bar{x}} = \mu$ (mean), $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (std. dev), where n is the size of each sample.
      Central Limit Theorem: as the number of samples increases, the plot of the sample means will approach a normal distribution, and the mean of the sampling distribution will approach the mean of the population.
      The central limit theorem only specifies that the central part of a distribution of averages will approach a normal distribution as the number of trials goes to infinity.
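The two sampling-distribution relations above can be checked with a short simulation. This is only a sketch: the uniform population, its size, the sample size `n = 25`, and the number of samples are all assumptions chosen for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical non-normal population: uniform values between 0 and 100.
population = [random.uniform(0, 100) for _ in range(20_000)]
mu, sigma = mean(population), pstdev(population)

n = 25  # size of each random sample
sample_means = [mean(random.sample(population, n)) for _ in range(2_000)]

# Mean of the sample means vs. population mean.
print(mu, mean(sample_means))
# Std. dev of the sample means vs. sigma / sqrt(n).
print(sigma / n ** 0.5, pstdev(sample_means))
```

Even though the population here is uniform rather than normal, a histogram of `sample_means` should look roughly bell shaped, as the central limit theorem suggests.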
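  • 10. Z-Score
      • The Z-score is the number of standard deviations a value lies from the mean in a normal distribution.
      Formula for a sample mean: $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
      Figure: a normal curve centered on µ, marked at Z-score = -2, Z-score = 2, and an arbitrary Z-score (e.g., 1.5).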
  • 11. Standard Normal Probabilities
      • The probability that a sample with a given Z-score falls within the corresponding area of a normal distribution can be looked up in the Standard Normal Probabilities Table - http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf
      Figure: at the mean µ, there is a 50% probability that a sample falls into the area of the distribution. The probability of a sample falling within the area increases with the number of standard deviations.
  • 12. Robot Example
      • Warehouse of Boxes: mean weight of 50 lbs, standard deviation of 10 lbs.
      • Pallet of Boxes: need to move a pallet of 10 boxes of unknown weight.
      • Robot: has a lift limit of 560 lbs.
      • Question: what is the probability that the robot can lift this pallet?
      Population (weight distribution of boxes): µ (mean) = 50 lbs, σ (std. dev) = 10 lbs.
      Pallet of 10 boxes (weight of boxes unknown): $\mu_{\bar{x}} = \mu = 50$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{10}} = 3.16$ (std. dev of the pallet).
      Maximum mean weight of 10 boxes the robot can lift: $\bar{x}_{max} = 560 \text{ lbs} / 10 \text{ boxes} = 56$ lbs.
      Z-score: $Z = \frac{\bar{x}_{max} - \mu}{\sigma_{\bar{x}}} = \frac{6}{3.16} = 1.9$
      Standard normal probability of 1.9 = 97.13%
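The robot calculation can be reproduced without a lookup table, since the standard normal cumulative probability can be written with `math.erf`. This is a minimal sketch. The helper `standard_normal_cdf` and the variable names are illustrative, not from the slides.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Probability that a standard normal value is at most z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 50.0, 10.0   # population mean and std. dev of box weight (lbs)
n = 10                   # boxes on the pallet
lift_limit = 560.0       # robot lift limit (lbs)

sigma_xbar = sigma / sqrt(n)      # std. dev of the mean weight of 10 boxes
xbar_max = lift_limit / n         # largest mean box weight the robot can lift
z = (xbar_max - mu) / sigma_xbar  # Z-score of that maximum mean
p = standard_normal_cdf(z)        # probability the pallet is liftable

print(round(sigma_xbar, 2), round(z, 2), round(p, 4))
```

Because the code keeps the unrounded Z-score, the probability differs slightly in the fourth decimal place from a table lookup at exactly 1.9.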
  • 13. Null Hypothesis
      • The null hypothesis H0 is the opposite of what one is trying to prove.
      H0 = The mean price of a transaction has not increased (e.g., µ ≤ $25), i.e., nothing changed.
      H1 = The mean price of a transaction has increased (e.g., µ > $25).
      • To prove the alternate hypothesis H1:
      • Disprove the null hypothesis
      • Within a level of statistical significance
      • Example: the transaction history has µ = $25 with σ = $5. The transaction sample has x̅ = $26.50.
      Transaction sample size = 10:
      Std. dev of the sample mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{10}} = 1.58$
      Z-score of the transaction sample: $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{1.58} = 0.95$
      Standard normal probability of 0.95 = 82.89% (confidence level)
      Transaction sample size = 100:
      $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5$
      $Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{0.5} = 3$
      Standard normal probability of 3 = 99.87% (confidence level)
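The two transaction cases above differ only in sample size, which a small helper makes explicit. This is a sketch under the slide's assumptions (known population σ, normal sampling distribution). The function names `z_score` and `standard_normal_cdf` are illustrative.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Probability that a standard normal value is at most z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_score(xbar, mu, sigma, n):
    """Z-score of a sample mean xbar for samples of size n."""
    return (xbar - mu) / (sigma / sqrt(n))

# Transaction history: mu = $25, sigma = $5; sample mean = $26.50.
for n in (10, 100):
    z = z_score(26.50, 25.0, 5.0, n)
    print(n, round(z, 2), round(100 * standard_normal_cdf(z), 2))
```

The same $1.50 difference in means gives a much larger Z-score with 100 transactions than with 10, because the standard deviation of the sample mean shrinks as the sample size grows.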
  • 14. Box (and Whisker) Plot
      • A method used to visualize the spread of data.
      • Split the data into quartiles (quarters).
      • A box is drawn around the middle two quarters, from the 1st quartile to the 3rd quartile. The width of the box is the interquartile range (IQR).
      • The whiskers are drawn out to the end points (lowest value and highest value).
      Steps:
      1. Calculate the median of the entire dataset (the 2nd quartile), splitting the dataset into halves.
      2. Calculate the median of the lower half (the 1st quartile) and of the upper half (the 3rd quartile), splitting the halves into quarters.
      Figure: data values (x) on the axis, with the lowest value, 1st quartile, 2nd quartile (median), 3rd quartile, and highest value marked; the box spans the IQR and the whiskers extend to each end.
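The two quartile steps above can be sketched with the median-of-halves method. One detail the slide leaves open is what to do with the middle value when the count is odd. The sketch below assumes it is excluded from both halves, which is only one of several common conventions. The function name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (q1, q2, q3) using the median-of-halves method."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # Assumption: for an odd count, the middle value is left out of both halves.
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)

data = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4]  # the eight samples from the Median slide
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                          # width of the box

print(q1, q2, q3, iqr)
```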
  • 15. Box (and Whisker) Plot - Outliers
      • A variation of a box plot that shows outliers.
      • The whiskers are replaced with an inner fence at 1.5 × IQR and an outer fence at 3 × IQR.
      • Values between 1.5 × IQR and 3 × IQR are suspected outliers (white).
      • Values outside of 3 × IQR are outliers (black).
      Figure: data values (x) on the axis, with the box (IQR), an inner fence (1.5 IQR) on each side, the outer fence (3 IQR), suspected outliers between the fences, and outliers beyond the outer fence.
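The fence rules above can be sketched as a small classifier. It assumes the fences are measured outward from the edges of the box (Q1 and Q3), which is the usual convention but is not stated on the slide. The quartiles use the median-of-halves method, and the function name and `data` values are hypothetical.

```python
from statistics import median

def classify_outliers(data):
    """Return (suspected, outliers) using 1.5*IQR and 3*IQR fences."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if len(xs) % 2 else xs[half:])
    iqr = q3 - q1
    # Assumption: fences are measured outward from the box edges.
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr
    outliers = [x for x in xs if x < outer_lo or x > outer_hi]
    suspected = [x for x in xs
                 if (x < inner_lo or x > inner_hi) and outer_lo <= x <= outer_hi]
    return suspected, outliers

data = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4, 7, 9]  # hypothetical values
suspected, outliers = classify_outliers(data)
print(suspected, outliers)
```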