Data Science
Statistical Analysis : Estimation and Testing
By
Kumar P
Managerial Decisions
How many programmers should I staff for?
What is the right level of inventory for our new product manufacturing?
Where should we open our new retail store?
What will next year's revenue be?
Are we on the right or the wrong track?
How much should I invest in advertising?
Flow Diagram
Acknowledge uncertainty
Characterize uncertainty
Make inferences under uncertainty
Make predictions under uncertainty
Make optimal decisions under uncertainty
Type of Statistics
Statistics
Descriptive
Inferential
Descriptive statistics
Descriptive statistics utilizes numerical and graphical methods to look for patterns
in a data set, to summarize the information revealed in a data set and to present
that information in a convenient form.
• Average
• Spread
• Range
• Frequency
• Histogram
• Mode
• Scatter Plot
• Interquartile Range
Inferential statistics
• Hypothesis Test
• Z-score
• ANOVA
• Confidence Interval
• Margin of error
• Ordinary Least Squares
• t-test
• F-test
Types of Data
Type of data | Definition | Example
Nominal | Categories are in no logical order and have no particular relationship | Your previous degree
Ordinal | Can be ranked/ordered but not measured | College rankings
Interval scale | Set of numerical measurements in which the distance between numbers is of a known, constant size | Temperature in Celsius
Ratio scale | Ratios are meaningful | Sales of a new product

Source of data | Definition | Example
Observational | Analyst does not control the data generation process | Stock returns on BSE
Experimental | Analyst has good control over data generation | Clinical trials for drug efficacy
Few Examples
1. The length of time until a pain reliever begins to work.
2. Ranking of racers in MotoGP.
3. The number of colors used in a statistics textbook.
4. The brand of refrigerator in a home.
5. The overall satisfaction rating of a new car.
6. The number of files on a computer’s hard disk.
7. The pH level of the water in a swimming pool.
8. The number of staples in a stapler.
Population & Sample
Population: A collection, or set, of individuals or objects or events whose
properties are to be analyzed.
Typically, there are too many experimental units in a population to consider
every one.
Sample: A subset of the population
Measure of Central Tendency
Mode: The value in the data that occurs most frequently
Mean: The average of a given set of numbers
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Percentiles: The pth percentile of a group of numbers is the value below which p% of the numbers in the group lie.
Position of the pth percentile in the sorted data = (n+1)p/100, where n is the number of data points
Median: The 50th percentile
Quartiles: Percentiles that break the distribution of the data into quarters: 1st quartile (25th percentile), 3rd quartile (75th percentile)
Interquartile Range (IQR): Difference between the 3rd and 1st quartiles
value Frequency
18 4
19 1
20 3
21 1
22 2
23 2
24 1
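The definitions above can be checked on this frequency table. The sketch below is written in Python as an assumption of convenience (the deck itself only mentions R), and the helper `percentile` is a hypothetical name that applies the (n+1)p/100 position rule with linear interpolation between neighbouring values.

```python
from statistics import mean, mode

# Frequency table from the slide: value -> frequency
freq = {18: 4, 19: 1, 20: 3, 21: 1, 22: 2, 23: 2, 24: 1}

# Expand the table into a sorted list of individual data points
data = sorted(v for v, f in freq.items() for _ in range(f))

def percentile(sorted_data, p):
    """Value at 1-based position (n + 1) * p / 100, interpolating linearly."""
    n = len(sorted_data)
    pos = (n + 1) * p / 100
    pos = min(max(pos, 1), n)          # keep the position inside the data
    lower = int(pos)                   # index of the lower neighbour (1-based)
    frac = pos - lower                 # how far we are towards the next value
    if lower >= n:
        return float(sorted_data[-1])
    lo_val, hi_val = sorted_data[lower - 1], sorted_data[lower]
    return lo_val + frac * (hi_val - lo_val)

data_mean = mean(data)
data_mode = mode(data)
median = percentile(data, 50)
q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1

print(data_mean, data_mode, median, q1, q3, iqr)
```

Other software may use a different interpolation rule for quartiles, so small differences in Q1 and Q3 are expected between tools.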
Quick Exercise
Data: 33, 26, 24, 21, 18, 52, 19
Mean?
Mode?
Median?
IQR?
Measure of Variability
Range: Difference between the largest and smallest numbers in a data set
Variance: The average squared deviation of the data points from their mean
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Standard Deviation: Square root of the variance of the data set
Sample SD: $s = \sqrt{s^2}$
Population SD: $\sigma = \sqrt{\sigma^2}$
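As a rough illustration of the sample and population versions, the sketch below uses Python's standard `statistics` module. The data set is made up for the example and is not from the slides; `variance`/`stdev` divide by n − 1, while `pvariance`/`pstdev` divide by N.

```python
from statistics import pstdev, pvariance, stdev, variance

# Illustrative data (an assumption, not from the deck); its mean is 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)     # sum of squared deviations / N
pop_sd = pstdev(data)         # square root of the population variance
sample_var = variance(data)   # sum of squared deviations / (n - 1)
sample_sd = stdev(data)       # square root of the sample variance

print(pop_var, pop_sd, sample_var, sample_sd)
```

The sample variance is always slightly larger than the population variance on the same numbers, because the same sum of squares is divided by a smaller denominator.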
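Spare some thought
Why both SD and variance?
Why different denominators (n − 1 for the sample, N for the population)?
Why not use the modulus (absolute value) of the deviations?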
Histogram
• A histogram is a chart made of bars, where the height of each bar represents the frequency of values
• Frequencies can be absolute (counts) or relative
• Relative frequency is the count of data points in a bar divided by the total number of data points
Boxplot
A boxplot is a graphical display of the five-number summary (minimum, first quartile, median, third quartile, maximum) of the distribution of the data
Skewness
Skewness is a measure of the degree of asymmetry of a frequency distribution
Kurtosis
Kurtosis is a measure of the peakedness (and tail heaviness) of a distribution
Kurtosis of a normal distribution is 3
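One common way to compute skewness and kurtosis is from central moments. The sketch below assumes the simple population (moment-based) definitions; statistical packages often apply small-sample corrections, so their output can differ. The function names and both data sets are illustrative assumptions.

```python
def central_moment(data, k):
    """k-th central moment: average of (x - mean)^k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def skewness(data):
    """Third central moment scaled by the variance to the power 3/2."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Fourth central moment scaled by the squared variance (normal = 3)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # symmetric around 3, so skewness is 0
right_skewed = [1, 1, 1, 2, 10]   # one large value pulls the tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```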
What is a random variable?
How do we summarize a random variable?
How do we represent a probability distribution pictorially?
Random Variable
A random variable describes the probabilities for an uncertain future numerical outcome of a random process
It is a variable that can take on several possible values
It is random because there is some chance associated with each possible value
Random variables are of 2 types
• Discrete
• Continuous
Probability Distribution
• Probability
o Long-run relative frequency of a random event occurring
o Different from subjective beliefs
• A probability distribution is a rule that identifies the possible outcomes of a random variable and assigns a probability to each
• A discrete distribution has a finite (or countable) number of values
o E.g. face value of a card, number of students in a class
• A continuous distribution has all possible values in some range
o E.g. salaries per month, temperature in a month, height of students in a class
PDF & CDF of Random Variable
The PDF (probability distribution function) of a discrete random variable x is the relative frequency distribution of x. It is a graph, table or formula that gives the possible values of x and the probability p(x) associated with each value.
For all x_i, the PDF must satisfy
$0 \le p(x_i) \le 1$ and $\sum_i p(x_i) = 1$
The CDF (cumulative distribution function) F(x) of a discrete random variable is
$F(x) = P(X \le x) = \sum_{i \le x} p(i)$
X p(X=x) F(x)
0 0.1 0.1
1 0.2 0.3
2 0.3 0.6
3 0.2 0.8
4 0.1 0.9
5 0.1 1.00
Total 1.00
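The F(x) column of this table is just a running total of the p(X=x) column. A minimal Python sketch (variable names are illustrative) that rebuilds it and checks the two conditions a discrete distribution must satisfy:

```python
from itertools import accumulate

# p(X = x) column from the table above
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.2, 4: 0.1, 5: 0.1}

# Each probability lies in [0, 1] and the probabilities sum to 1
valid_range = all(0 <= p <= 1 for p in pmf.values())
total = sum(pmf.values())

# F(x) = P(X <= x): cumulative sum of the probabilities in increasing x
cdf = dict(zip(pmf, accumulate(pmf.values())))

for x in pmf:
    print(x, pmf[x], round(cdf[x], 2))
```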
Example
Toss a fair coin three times and define x = number of heads.
Outcome Probability x
HHH 1/8 3
HHT 1/8 2
HTH 1/8 2
THH 1/8 2
HTT 1/8 1
THT 1/8 1
TTH 1/8 1
TTT 1/8 0
x p(x)
0 1/8
1 3/8
2 3/8
3 1/8
[Figure: Probability histogram for x]
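The coin-toss distribution can be reproduced by listing all eight equally likely outcomes and counting heads in each. A short Python sketch, using exact fractions so the results match the 1/8 and 3/8 values in the example:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of three tosses, e.g. "HHT"
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# x = number of heads in each outcome; count how often each x occurs
counts = Counter(o.count("H") for o in outcomes)

# p(x) = (number of outcomes with x heads) / (total number of outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```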
Quick exercise
A card is chosen at random from a deck of cards.
What is the probability of getting an ace?
What is the probability of getting a card less than 3?
What is the probability of getting 1 head if I toss 2 unbiased coins?
What is the probability of getting 2 heads if I toss 3 unbiased coins?
An Example
X p(X=x)
0 0.4
1 0.25
2 0.2
3 0.05
4 0.1
• Daily sales of TVs at a store
• What is the probability of a sale?
• What is the probability of selling at least three TVs?
Expected Value or Mean
• The expected value or mean (µ) of a random variable is the weighted average of its values
‒ The probabilities serve as weights
‒ $E(X) = \sum_{i=1}^{n} x_i \, p(X = x_i)$
• What is the mean number of TVs sold per day?
• What does this imply?
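The TV sales questions can be worked through directly from the probability table. The Python sketch below is one way to do the arithmetic; the variable names are illustrative, and "a sale" is taken to mean selling at least one TV.

```python
# Daily TV sales distribution from the example: x -> p(X = x)
pmf = {0: 0.4, 1: 0.25, 2: 0.2, 3: 0.05, 4: 0.1}

# P(at least one sale) = 1 - P(X = 0)
p_sale = 1 - pmf[0]

# P(X >= 3) = p(3) + p(4)
p_at_least_three = sum(p for x, p in pmf.items() if x >= 3)

# E(X) = sum of x * p(X = x): probabilities act as weights
expected_sales = sum(x * p for x, p in pmf.items())

print(round(p_sale, 2), round(p_at_least_three, 2), round(expected_sales, 2))
```

The mean is a long-run average over many days, so it does not have to be a value the store can actually sell on any single day.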
Variance and Standard Deviation
• Both are measures of variation or uncertainty in a random variable
• Variance (σ²): The weighted average of the squared deviations from the mean
‒ Probabilities serve as weights
‒ $\sigma^2(X) = \sum_{i=1}^{n} (x_i - \mu)^2 \, p(X = x_i) = E[(X - \mu)^2]$
‒ Units are the square of the units of the variable
‒ Equivalently, $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
• Standard Deviation (σ): Square root of the variance
‒ Has the same units as the variable
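Both variance formulas give the same number, which can be checked on the TV sales distribution from the earlier example. A minimal Python sketch with illustrative variable names:

```python
from math import sqrt

# Daily TV sales distribution from the earlier example: x -> p(X = x)
pmf = {0: 0.4, 1: 0.25, 2: 0.2, 3: 0.05, 4: 0.1}

mu = sum(x * p for x, p in pmf.items())  # E(X)

# Definition: weighted average of squared deviations from the mean
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: E(X^2) - [E(X)]^2
e_x2 = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = e_x2 - mu ** 2

sd = sqrt(var_definition)  # same units as X (TVs per day)

print(round(var_definition, 4), round(var_shortcut, 4), round(sd, 4))
```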
Sum of Random Variables
Let X1 and X2 be 2 random variables with means µ1 and µ2 and standard deviations σ1 and σ2. Suppose Y = aX1 + bX2
‒ What is the mean of Y?
E[Y] = aE[X1] + bE[X2]
‒ What is the standard deviation of Y?
If X1 and X2 are independent, Var(Y) = a²Var(X1) + b²Var(X2), and SD(Y) = √Var(Y)
• Independent: When the value taken by one random variable does not affect the value taken by the other random variable
‒ E.g. Rolls of 2 dice
• Dependent: When the value of one random variable gives us more information about the other random variable
‒ E.g. Height and weight of students
Example
Let X1 and X2 be the outcomes associated with a toss of a pair of dice
E(X1)=E(X2)=3.5
SD(X1)=SD(X2)=1.708
Compute the following:
E(X1+X2)=
SD(X1+X2)=
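One way to check the answer is to enumerate all 36 equally likely outcomes of the two dice and compare the result with the sum rules for independent variables (a = b = 1). A Python sketch with illustrative variable names:

```python
from itertools import product
from math import sqrt

faces = range(1, 7)

# Mean and variance of a single fair die
mean_one = sum(faces) / 6
var_one = sum((f - mean_one) ** 2 for f in faces) / 6
sd_one = sqrt(var_one)

# All 36 equally likely sums of two independent dice
sums = [a + b for a, b in product(faces, repeat=2)]
mean_sum = sum(sums) / len(sums)
var_sum = sum((s - mean_sum) ** 2 for s in sums) / len(sums)
sd_sum = sqrt(var_sum)

# Rule for independent variables: variances add, standard deviations do not
sd_rule = sqrt(var_one + var_one)

print(round(sd_one, 3), mean_sum, round(sd_sum, 3), round(sd_rule, 3))
```

Note that SD(X1+X2) is not SD(X1) + SD(X2); it is the square root of the summed variances.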
The Empirical Rule
For data with an approximately bell-shaped (normal) distribution:
• Approximately 68% of the data points will be within 1 standard deviation of the mean
• Approximately 95% of the data points will be within 2 standard deviations of the mean
• The vast majority (almost all, about 99.7%) will lie within 3 standard deviations of the mean
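These percentages come from the normal distribution and can be recomputed from its CDF. The sketch below uses `statistics.NormalDist` from the Python standard library and assumes the data are exactly normal, which real data only approximate.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# P(-k <= Z <= k) for k = 1, 2, 3 standard deviations
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```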
Normal distribution
• The graph of the PDF is a bell-shaped curve
• The normal random variable takes values from -∞ to +∞
• It is symmetric and centered around the mean (which is also the median and the mode)
• Any normal distribution can be specified with just 2 parameters: the mean (µ) and the standard deviation (σ)
• We write this as X ~ N(µ, σ²)
Comparing multiple normal distributions
Probability Calculation for Continuous Distribution
• The probability associated with any single value of the random variable is always zero
• Probability of values being in a range = Area under the PDF curve in that range
• Area under the entire curve always equals 1
Z-scores, Standard Normal Distribution
For every value (x) of the random variable X, we calculate its z-score: $z = \frac{x - \mu}{\sigma}$
Interpretation: How many standard deviations away is the value from the mean?
If X ~ N(µ, σ²), then
‒ Z-scores have a normal distribution with µ = 0 and σ = 1
‒ i.e. Z ~ N(0, 1)
‒ This is the standard normal distribution
• Inverse transformation
‒ X = µ + zσ
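The z-score and its inverse are one-line computations. In the Python sketch below the function names and the values µ = 100, σ = 15 are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Inverse transformation: recover x from its z-score."""
    return mu + z * sigma

mu, sigma = 100.0, 15.0   # hypothetical mean and standard deviation

z = z_score(130.0, mu, sigma)    # 130 is 2 standard deviations above 100
x_back = from_z(z, mu, sigma)    # maps the z-score back to the original value

print(z, x_back)
```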
Probability calculation for normal distribution
• Consider a normal distribution X ~ N(µ, σ²)
• Methods to calculate P(X ≤ x)
‒ Use R: pnorm(x, µ, σ)
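For readers without R, roughly the same calculation can be sketched with Python's standard library. The wrapper below is a hypothetical helper named after the R function, not part of any library, and the values µ = 100, σ = 15 are illustrative assumptions.

```python
from statistics import NormalDist

def pnorm(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), mirroring R's pnorm(x, mu, sigma)."""
    return NormalDist(mu, sigma).cdf(x)

# Hypothetical example: X ~ N(100, 15^2)
p_below = pnorm(130, 100, 15)                         # P(X <= 130)
p_between = pnorm(115, 100, 15) - pnorm(85, 100, 15)  # P(85 <= X <= 115)

print(round(p_below, 4), round(p_between, 4))
```

A range probability is the difference of two CDF values, which is the area under the PDF curve between the two endpoints.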

Basic statistics 1

  • 1. Data Science. Statistical Analysis: Estimation and Testing. By Kumar P
  • 2. Managerial Decisions
      ‒ How many programmers should I staff for?
      ‒ What is the right level of inventory for manufacturing our new product?
      ‒ Where should we open our new retail store?
      ‒ What will next year's revenue be?
      ‒ Are we on the right or the wrong track?
      ‒ How much should I invest in advertising?
  • 3. Flow Diagram Acknowledge Uncertainty Characterize uncertainty Make Inferences under uncertainty Make predictions under uncertainty Make optimal decisions under uncertainty
  • 5. Descriptive statistics Descriptive statistics uses numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form. • Average • Spread • Range • Frequency • Histogram • Mode • Scatter Plot • Interquartile Range
  • 6. Inferential statistics • Hypothesis Test • Z score • ANOVA • Confidence Interval • Margin of error • Ordinary least squares • T test • F test
  • 7. Types of Data
      Type of data: Definition (Example)
      ‒ Nominal: The categories are in no logical order and have no particular relationship (Your previous degree)
      ‒ Ordinal: Can be ranked/ordered but not measured (College rankings)
      ‒ Interval scale: Set of numerical measurements in which the distance between numbers is known (Temperature in Celsius)
      ‒ Ratio scale: Ratios are meaningful (Sales of a new product)
      Source of data: Definition (Example)
      ‒ Observational: Analyst does not control the data generation process (Stock returns on BSE)
      ‒ Experimental: Analyst has good control over data generation (Clinical trials for drug efficacy)
  • 8. Few Examples
      1. The length of time until a pain reliever begins to work.
      2. Ranking of racers in MotoGP.
      3. The number of colors used in a statistics textbook.
      4. The brand of refrigerator in a home.
      5. The overall satisfaction rating of a new car.
      6. The number of files on a computer’s hard disk.
      7. The pH level of the water in a swimming pool.
      8. The number of staples in a stapler.
  • 9. Population & Sample Population: A collection, or set, of individuals, objects, or events whose properties are to be analyzed. Typically, there are too many experimental units in a population to consider every one. Sample: A subset of the population.
  • 10. Measure of Central Tendency
      ‒ Mode: The value in the data that occurs most frequently
      ‒ Mean: The average of a given set of numbers
          Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
          Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
      ‒ Percentiles: The pth percentile of a group of numbers is the value below which p% of the numbers in the group lie. Its position in the sorted data is (n+1)p/100, where n is the number of data points
      ‒ Median: The 50th percentile
      ‒ Quartiles: Percentiles that break the distribution of the data into quarters: 1st quartile (25th percentile), 3rd quartile (75th percentile)
      ‒ Interquartile Range (IQR): Difference between the 3rd and 1st quartile values
      Frequency table (value: frequency): 18: 4, 19: 1, 20: 3, 21: 1, 22: 2, 23: 2, 24: 1
  • 11. Quick Exercise Data: 33, 26, 24, 21, 18, 52, 19. Mean? Mode? Median? IQR?
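One way to check the quick exercise is with Python's standard `statistics` module. This is a minimal sketch, not part of the original slides. It assumes the quartiles follow the (n+1)p/100 position rule from slide 10, which corresponds to `method="exclusive"` in `statistics.quantiles`.

```python
import statistics

# Data from the quick exercise on slide 11.
data = [33, 26, 24, 21, 18, 52, 19]

mean = statistics.mean(data)
median = statistics.median(data)

# multimode returns every value tied for the highest count.
# Here each value occurs once, so there is no single mode.
modes = statistics.multimode(data)

# method="exclusive" places the pth percentile at position (n + 1) * p / 100.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} Q1={q1} Q3={q3} IQR={iqr}")
print(f"number of tied modes: {len(modes)}")
```

Other quartile conventions exist (for example `method="inclusive"`), and they can give slightly different quartiles on small samples.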
  • 12. Measure of Variability
      ‒ Range: Difference between the largest and the smallest number in a given data set
      ‒ Variance: The average squared deviation of the data points from their mean
          Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
          Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
      ‒ Standard Deviation: Square root of the variance of the data set
          Sample sd: $s = \sqrt{s^2}$
          Population sd: $\sigma = \sqrt{\sigma^2}$
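The difference between the sample and population formulas can be seen numerically. The sketch below is an illustration, not part of the original slides. It reuses the quick-exercise data and the standard `statistics` module, where `variance`/`stdev` divide by n - 1 and `pvariance`/`pstdev` divide by n.

```python
import statistics

data = [33, 26, 24, 21, 18, 52, 19]
n = len(data)

data_range = max(data) - min(data)

sample_var = statistics.variance(data)        # denominator n - 1
population_var = statistics.pvariance(data)   # denominator n
sample_sd = statistics.stdev(data)            # square root of sample_var
population_sd = statistics.pstdev(data)       # square root of population_var

print(f"range={data_range}")
print(f"sample variance={sample_var:.3f}, sample sd={sample_sd:.3f}")
print(f"population variance={population_var:.3f}, population sd={population_sd:.3f}")
```

Both variances share the same sum of squared deviations, so `sample_var * (n - 1)` equals `population_var * n`.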
  • 13. Spare some thoughts Why do we need both SD and variance? Why do the sample and population formulas use different denominators? Why not use the modulus (absolute deviation) instead of squaring?
  • 14. Histogram • A histogram is a chart made of bars, where the height of each bar represents the frequency of values • The frequencies can be absolute (counts) or relative • Relative frequency is the count of data points in a bar divided by the total number of data points
  • 15. Boxplot A boxplot is a graphical display of the five-number summary of the distribution of the data
  • 16. Skewness Skewness is a measure of the degree of asymmetry of a frequency distribution
  • 17. Kurtosis Kurtosis is a measure of the peakedness of a distribution. Kurtosis for a normal distribution is 3
  • 18. Random Variable What is a random variable? How do we summarize a random variable? How do we pictorially represent a probability distribution?
  • 19. Random Variable A random variable describes the probabilities for an uncertain future numerical outcome of a random process. It is a variable because it can take on several possible values. It is random because there is some chance associated with each possible value. Random variables are of 2 types: • Discrete • Continuous
  • 20. Probability Distribution
      • Probability
          o Long-run average of a random event occurring
          o Different from subjective beliefs
      • A probability distribution is a rule that identifies the possible outcomes of a random variable and assigns a probability to each
      • A discrete distribution has a finite number of values
          o E.g. face value of a card, number of students in a class
      • A continuous distribution has all possible values in some range
          o E.g. salaries per month, temperature in a month
  • 21. PDF & CDF of Random Variable
      The PDF (probability distribution function) for a discrete random variable x is the relative frequency distribution of x. It is a graph, table, or formula that gives the possible values of x and the probability p(x) associated with each value.
      For all $x_i$ the PDF must satisfy $0 \le p(x_i) \le 1$ and $\sum_i p(x_i) = 1$.
      The CDF (cumulative distribution function) F(x) of a discrete random variable is $F(x) = P(X \le x) = \sum_{i \le x} P(i)$.
      X: p(X=x), F(x)
      0: 0.1, 0.1
      1: 0.2, 0.3
      2: 0.3, 0.6
      3: 0.2, 0.8
      4: 0.1, 0.9
      5: 0.1, 1.00
      Total: 1.00
  • 22. Example Toss a fair coin three times and define x = number of heads.
      Outcome: probability, x
      HHH: 1/8, 3
      HHT: 1/8, 2
      HTH: 1/8, 2
      THH: 1/8, 2
      HTT: 1/8, 1
      THT: 1/8, 1
      TTH: 1/8, 1
      TTT: 1/8, 0
      P(x = 0) = 1/8, P(x = 1) = 3/8, P(x = 2) = 3/8, P(x = 3) = 1/8
      [Figure: Probability histogram for x]
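The coin-toss distribution can be reproduced by enumerating all eight equally likely outcomes. The sketch below is illustrative rather than taken from the slides. It uses `fractions.Fraction` so the probabilities stay exact, and it accumulates the PDF into the CDF as defined on slide 21.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of three tosses: HHH, HHT, ..., TTT (8 equally likely outcomes).
outcomes = list(product("HT", repeat=3))

# x = number of heads in each outcome.
counts = Counter(outcome.count("H") for outcome in outcomes)

# p(x) = (number of outcomes with x heads) / (total outcomes).
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

# F(x) = P(X <= x): running sum of p(i) for all i <= x.
cdf = {}
running = Fraction(0)
for x, p in pmf.items():
    running += p
    cdf[x] = running

for x in pmf:
    print(f"x={x}  p(x)={pmf[x]}  F(x)={cdf[x]}")
```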
  • 23. Quick exercise A card is chosen at random from a deck of cards. What is the probability of getting an ace? What is the probability of getting a card less than 3? What is the probability of getting 1 head if I toss 2 unbiased coins? What is the probability of getting 2 heads if I toss 3 unbiased coins?
  • 24. An Example
      • Daily sales of TVs at a store
      X: p(X=x)
      0: 0.4
      1: 0.25
      2: 0.2
      3: 0.05
      4: 0.1
      • What is the probability of a sale?
      • What is the probability of selling at least three TVs?
  • 25. Expected Value or Mean
      • The expected value or mean (µ) of a random variable is the weighted average of its values
          ‒ The probabilities serve as weights
          ‒ $E(X) = \sum_{i=1}^{n} x_i \, p(X = x_i)$
      • What is the mean number of TVs sold per day?
      • What does this imply?
  • 26. Variance and Standard Deviation
      • Both are measures of variation or uncertainty in a random variable
      • Variance (σ²): The weighted average of the squared deviations from the mean
          ‒ Probabilities serve as weights
          ‒ $\sigma^2(X) = \sum_{i=1}^{n} (x_i - \mu)^2 \, p(X = x_i) = E[(X - \mu)^2]$
          ‒ Units are the square of the units of the variable
          ‒ Another way: $Var(X) = E(X^2) - [E(X)]^2$
      • Standard Deviation (σ): Square root of the variance
          ‒ Has the same units as the variable
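The TV-sales example can be worked through with the expected value and variance formulas from slides 24 to 26. The following is a minimal sketch, not the author's own solution. It reads "a sale" as "at least one TV sold", which is an assumption about the question's intent.

```python
import math

# Daily TV sales distribution from slide 24: value -> probability.
pmf = {0: 0.40, 1: 0.25, 2: 0.20, 3: 0.05, 4: 0.10}
assert math.isclose(sum(pmf.values()), 1.0)  # a valid PDF sums to 1

# Assumption: "a sale" means at least one TV is sold.
p_sale = 1 - pmf[0]
p_at_least_three = sum(p for x, p in pmf.items() if x >= 3)

# E(X): probability-weighted average of the values.
mean = sum(x * p for x, p in pmf.items())

# Variance by definition, and by the shortcut E(X^2) - [E(X)]^2.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
variance_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
sd = math.sqrt(variance)

print(f"P(sale)={p_sale:.2f}  P(X>=3)={p_at_least_three:.2f}")
print(f"E(X)={mean:.2f}  Var(X)={variance:.2f}  SD(X)={sd:.3f}")
```

A mean of 1.2 TVs per day is a long-run average: no single day has 1.2 sales, but over many days total sales should be close to 1.2 times the number of days.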
  • 27. Sum of Random Variables Let X1 and X2 be 2 random variables with means µ1 and µ2 and standard deviations σ1 and σ2, and suppose Y = aX1 + bX2
      ‒ What is the mean of Y? E[Y] = aE[X1] + bE[X2]
      ‒ What is the standard deviation of Y? When X1 and X2 are independent, Var(Y) = a²Var(X1) + b²Var(X2), and SD(Y) is the square root of Var(Y)
      • Independent: When the value taken by one random variable does not affect the value taken by the other random variable
          ‒ E.g. rolls of 2 dice
      • Dependent: When the value of one random variable gives us more information about the other random variable
          ‒ E.g. height and weight of students
  • 28. Example Let X1 and X2 be the outcomes associated with a toss of a pair of dice. E(X1) = E(X2) = 3.5, SD(X1) = SD(X2) = 1.708. Compute the following: E(X1+X2) = ? SD(X1+X2) = ?
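The dice example can be checked by brute-force enumeration. The sketch below is an illustration under the assumption that the two dice are fair and independent. It computes the mean and variance of one die, then enumerates all 36 pairs and compares the result with the sum rules from slide 27 (with a = b = 1).

```python
import math
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_face = Fraction(1, 6)  # fair die: each face equally likely

# Mean and variance of a single die.
mean_one = sum(x * p_face for x in faces)
var_one = sum((x - mean_one) ** 2 * p_face for x in faces)

# Enumerate all 36 equally likely (X1, X2) pairs; independence is assumed.
pairs = list(product(faces, repeat=2))
p_pair = Fraction(1, len(pairs))

mean_sum = sum((a + b) * p_pair for a, b in pairs)
var_sum = sum((a + b - mean_sum) ** 2 * p_pair for a, b in pairs)
sd_sum = math.sqrt(var_sum)

print(f"one die: mean={float(mean_one)}  sd={math.sqrt(var_one):.3f}")
print(f"sum:     mean={float(mean_sum)}  sd={sd_sum:.3f}")
```

Note that the standard deviations do not add: SD(X1+X2) is the square root of the summed variances, which is smaller than SD(X1) + SD(X2).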
  • 29. The Empirical Rule
      • Approximately 68% of the data points will be within 1 standard deviation of the mean
      • Approximately 95% of the data points will be within 2 standard deviations of the mean
      • A vast majority (almost all) will lie within 3 standard deviations of the mean
  • 30. Normal distribution
      • The graph of the PDF is a bell-shaped curve
      • The normal random variable takes values from -∞ to +∞
      • It is symmetric and centered around the mean (which is also the median and the mode)
      • Any normal distribution can be specified with just 2 parameters: the mean (µ) and the standard deviation (σ)
      • We write this as X~N(µ,σ²)
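For normally distributed data, the empirical-rule percentages can be computed from the normal CDF. This is a small sketch, not part of the slides. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+) with the standard normal as a representative case.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# P(-k <= Z <= k) = F(k) - F(-k): probability of lying within k standard deviations.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} sd of the mean: {prob:.4f}")
```

The same proportions hold for any N(µ,σ²), because distances are measured in units of σ.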
  • 32. Probability Calculation for Continuous Distribution
      • The probability associated with any single value of the random variable is always zero
      • Probability of values being in a range = area under the PDF curve in that range
      • The area under the entire curve always equals 1
  • 33. Z-scores, Standard Normal Distribution For every value (x) of the random variable X, we calculate its z-score: $z = \frac{x - \mu}{\sigma}$. Interpretation: how many standard deviations away is the value from the mean? If X~N(µ,σ²) then
      ‒ Z-scores have a normal distribution with µ=0 and σ=1
      ‒ i.e. Z~N(0,1)
      ‒ This is the standard normal distribution
      • Inverse transformation
          ‒ X = µ + zσ
  • 34. Probability calculation for normal distribution
      • Consider a normal distribution X~N(µ,σ²)
      • Methods to calculate P(X ≤ x)
          ‒ Use R: pnorm(x, µ, σ)
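The slide names R's `pnorm(x, µ, σ)` for computing P(X ≤ x). As a rough Python counterpart (a swapped-in technique, not the deck's own code), `statistics.NormalDist(mu, sigma).cdf(x)` returns the same cumulative probability. The values of `mu`, `sigma`, and `x` below are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameters, not taken from the slides.
mu, sigma = 100.0, 15.0
x = 120.0

# Direct calculation: P(X <= x) for X ~ N(mu, sigma^2).
p_direct = NormalDist(mu, sigma).cdf(x)

# Same probability via the z-score and the standard normal N(0, 1).
z = (x - mu) / sigma
p_via_z = NormalDist().cdf(z)

# Inverse transformation recovers the original value: x = mu + z * sigma.
x_back = mu + z * sigma

print(f"z={z:.4f}  P(X<=x)={p_direct:.4f}  via z-score={p_via_z:.4f}")
```

Both routes give the same probability, which is why tables of the standard normal distribution suffice for any normal distribution.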