An introduction to statistical inference
Dr. Abhay Pratap Pandey
University of Delhi
What is inference?
Inference defined:
• An everyday meaning…
We infer a conclusion based on evidence and reasoning
• A statistical meaning…
We infer a property of a population from a sample
Why inference?
The aim of inference is to determine the characteristics of a population
from a sample.
Population
Sample
Population and sample
In statistical analysis, a population is a collection of all the
people, items, or events about which one wants to make
inferences. OR
Any well-defined group of subjects, which could be
individuals, firms, cities, or many other possibilities
(For example university students in India.)
In statistical analysis, a sample is a subset of the population
(i.e. the people, items, or events) that one collects and
analyzes to make inferences. (For example, 200 randomly
chosen university students.)
Statistical sample - a subset of the population chosen to represent the
population in a statistical analysis; denoted as (X1, X2, ..., Xn).
Random sample - a sample of individuals chosen at random from the
population.
In the case of random sampling, the following techniques can be used:
Independent sampling (draw with replacement) - after each draw the
unit returns to the population.
Dependent sampling (draw without replacement) - after each draw the
unit does not return to the population (it no longer participates in the
drawing).
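The two drawing schemes above can be sketched with Python's standard library. This is only an illustration: the population of ten unit IDs and the seed are assumptions, not part of the slides.

```python
import random

# Hypothetical population of 10 unit IDs (illustrative only).
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Draw with replacement: each unit returns to the population,
# so the same unit may be drawn more than once.
with_replacement = rng.choices(population, k=5)

# Draw without replacement: a drawn unit leaves the population,
# so every unit appears at most once in the sample.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

With replacement, the draws are independent of each other; without replacement, each draw changes what remains to be drawn.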
In statistical analysis, an observation is an element of the sample. (For
example, Helena, a student at Central University.)
Statistical inference
• Sampling
• Estimation
• Testing of hypothesis
Aim of statistical inference
The aim of statistical inference is to learn about the population using the observed
data.
This involves:
• computing something with the data
  - a statistic: a function of the data
• interpreting the result
  - in probabilistic terms: the sampling distribution of the statistic
Estimation
• Determination of the population parameter by the calculation of a
sample statistic…
Characteristic    Population parameter    Sample statistic
Mean              μ                       x̅
A sampling distribution is a probability distribution of a statistic obtained
through a large number of samples drawn from a specific population.
Population parameter: μ
Sample 1 statistic: x̅1
Sample 2 statistic: x̅2
Sample 3 statistic: x̅3
Uncertainty
Estimates are not perfect
Sampling distribution
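The idea of a sampling distribution can be illustrated by simulation. The sketch below draws many samples from one population and collects the sample mean of each; the population (normal, mean 50, standard deviation 10), the sample size, and the number of samples are all assumed values chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

# Hypothetical population: 10,000 values with mean near 50 and
# standard deviation near 10 (illustrative numbers only).
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw many samples of size n and record the sample mean of each.
n, n_samples = 25, 2_000
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(n_samples)
]

# The collection of sample means approximates the sampling distribution
# of the statistic: it is centred near mu and much narrower than the
# population itself.
print(f"population mean:           {mu:.2f}")
print(f"mean of sample means:      {statistics.fmean(sample_means):.2f}")
print(f"std. dev. of sample means: {statistics.stdev(sample_means):.2f}")
```

Each individual sample mean differs from μ, which is the uncertainty the slide refers to; the spread of the sample means quantifies it.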
Types of estimators in statistics
Estimator
An estimator is a statistic (a function of the data) that produces a guess at the
value of an unknown population parameter.
By the “best” estimator we usually mean one whose sampling distribution is more
concentrated about the population parameter value than that of other
estimators.
The two main types of estimators in statistics are
• Point estimators
• Interval estimators
Point estimation: Point estimators are functions used to find an
approximate value of a population parameter from random samples of the
population. They use the sample data to calculate a point
estimate, a single value of a statistic that serves as the best estimate of an
unknown parameter of a population. We want to estimate a population
parameter using the observed data.
Ex. some measure of variation, an average, min, max, quantile, etc.
• Interval estimation
Interval estimation uses sample data to calculate an interval
of plausible values for an unknown parameter of a
population. The interval is constructed so that, over repeated
sampling, it contains the parameter with a chosen probability,
typically 95% or higher; such an interval is known as a
confidence interval. The confidence interval is
used to indicate how reliable an estimate is, and it is
calculated from the observed data. The endpoints of the
interval are referred to as the upper and lower confidence
limits.
Properties of Point Estimators
• Unbiasedness
• Consistency
• Sufficiency
• Efficiency
Unbiasedness
An estimator of a given parameter is said to be unbiased if its expected
value is equal to the true value of the parameter.
The bias of a point estimator is defined as the difference between
the expected value of the estimator and the value of the parameter being
estimated: Bias(θ̂) = E(θ̂) − θ. When the bias is zero, the estimator is unbiased.
Also, the closer the expected value of the estimator is to the value of the
parameter being estimated, the smaller the bias is.
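A small exact calculation can make the definition of bias concrete. The sketch below enumerates every possible sample of size 2, drawn with replacement from a tiny made-up population, and compares two variance estimators: one dividing by n (`statistics.pvariance`) and one dividing by n − 1 (`statistics.variance`). The population values are an assumption chosen for illustration.

```python
import itertools
import statistics

# Hypothetical tiny population; its true variance is 5.
population = [2, 4, 6, 8]
sigma2 = statistics.pvariance(population)

# Every equally likely ordered sample of size n drawn with replacement.
n = 2
samples = list(itertools.product(population, repeat=n))

# Expected value of each estimator = average over all possible samples.
expected_div_n = statistics.fmean([statistics.pvariance(s) for s in samples])
expected_div_n_minus_1 = statistics.fmean([statistics.variance(s) for s in samples])

# Bias = expected value of the estimator minus the true parameter value.
bias_div_n = expected_div_n - sigma2
bias_div_n_minus_1 = expected_div_n_minus_1 - sigma2

print("true variance:           ", sigma2)
print("bias when dividing by n: ", bias_div_n)
print("bias when dividing by n-1:", bias_div_n_minus_1)
```

In this setting the estimator that divides by n − 1 has zero bias, while the one that divides by n underestimates the variance on average.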
Consistency
Consistency tells us how close the point estimator stays to the value of
the parameter as the sample size increases. A consistent point estimator
becomes more accurate as the sample grows. You can also
check if a point estimator is consistent by looking at its
expected value and variance. For the point estimator to be consistent,
it is sufficient that its expected value moves toward the true value of the
parameter and its variance shrinks toward zero as the sample size grows.
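Consistency can be seen in a simulation: as the sample size grows, the sample mean clusters more tightly around the true mean. The true mean, standard deviation, sample sizes, and number of repetitions below are assumed values for illustration only.

```python
import random
import statistics

rng = random.Random(1)          # fixed seed for reproducibility
TRUE_MEAN, TRUE_SD = 5.0, 2.0   # hypothetical population values

def spread_of_sample_mean(n, reps=500):
    """Standard deviation of the sample mean across `reps` samples of size n."""
    means = [
        statistics.fmean([rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

# The spread of the estimator shrinks as the sample size increases.
spreads = {n: spread_of_sample_mean(n) for n in (10, 100, 1000)}
for n, spread in spreads.items():
    print(f"n = {n:4d}: spread of sample mean = {spread:.3f}")
```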
Maximum likelihood estimator
The maximum likelihood method of point estimation
finds the values of the unknown parameters that maximize the likelihood
function. It takes an assumed model for the data and chooses the parameter
values under which the observed data are most probable.
For example, a researcher may be interested in knowing the average
weight of babies born prematurely. Since it would be impossible to
measure all babies born prematurely in the population, the researcher
can take a sample from one location. Assuming that the weight of pre-term
babies follows a normal distribution, the researcher can use the
maximum likelihood estimator to estimate the average weight of the entire
population of pre-term babies based on the sample data.
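For a normal model, the maximum likelihood estimate of the mean has a closed form: it is the sample mean. The sketch below checks this numerically with a crude grid search over candidate means. The weights are made-up numbers for illustration, not data from the slides, and the helper name `normal_log_likelihood` is hypothetical.

```python
import math
import statistics

# Hypothetical birth weights in kg (made-up data for illustration).
weights = [1.8, 2.1, 1.6, 2.4, 1.9, 2.2, 2.0, 1.8]

def normal_log_likelihood(mu, sigma, data):
    """Log-likelihood of `data` under a Normal(mu, sigma^2) model."""
    n = len(data)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2))

# Closed-form maximum likelihood estimates for the normal model.
mu_hat = statistics.fmean(weights)
sigma_hat = statistics.pstdev(weights)  # the MLE divides by n, not n - 1

# Grid search over candidate means from 1.5 to 2.5 in steps of 0.001;
# the candidate with the highest log-likelihood should match mu_hat.
grid = [1.5 + 0.001 * i for i in range(1001)]
mu_grid = max(grid, key=lambda m: normal_log_likelihood(m, sigma_hat, weights))

print(f"closed-form estimate: {mu_hat:.3f}")
print(f"grid-search estimate: {mu_grid:.3f}")
```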
Method of moments
The method of moments of estimating parameters was introduced in
1887 by Russian mathematician Pafnuty Chebyshev. It starts by taking
known facts about a population and then applying the facts to a sample
of the population. The first step is to derive equations that relate the
population moments to the unknown parameters.
The next step is to draw a sample of the population to be used to
estimate the population moments. The equations derived in step one
are then solved using the sample moments in place of the population moments.
This produces estimates of the unknown population
parameters.
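As a minimal sketch of the two steps, assume a model with unknown mean μ and variance σ². The population moments satisfy E[X] = μ and E[X²] = σ² + μ², so solving with the sample moments m1 and m2 gives μ̂ = m1 and σ̂² = m2 − m1². The data below are made up for illustration.

```python
# Hypothetical sample (made-up data for illustration).
sample = [4, 7, 5, 9, 5]
n = len(sample)

# Step 2: sample moments stand in for the population moments.
m1 = sum(x for x in sample) / n        # first sample moment
m2 = sum(x ** 2 for x in sample) / n   # second sample moment

# Step 1 equations, E[X] = mu and E[X^2] = sigma^2 + mu^2, solved for
# the unknown parameters.
mu_hat = m1
sigma2_hat = m2 - m1 ** 2

print(f"mu_hat = {mu_hat}, sigma2_hat = {sigma2_hat:.2f}")
```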
What is Confidence Interval?
A confidence interval is an interval estimate in statistics that is likely to
contain a population parameter. The unknown population parameter is
estimated through a sample statistic calculated from the sampled data.
For example, the population mean μ is estimated using the sample mean x̅.
The interval is generally defined by its lower and upper bounds. A
confidence interval is quoted with a percentage (the most frequently
quoted percentages are 90%, 95%, and 99%). The percentage reflects
the confidence level.
The concept of the confidence interval is very important in statistics
(hypothesis testing) since it is used as a measure of uncertainty. The
concept was introduced by Polish mathematician and statistician, Jerzy
Neyman in 1937.
Confidence Interval
We can also quantify the uncertainty (sampling distribution) of our
point estimate.
One way of doing this is by constructing an interval that is likely to
contain the population parameter.
One such interval, which is computed on the basis of the data, is
called a confidence interval.
The sampling probability that the confidence interval will indeed
contain the parameter value is called the confidence level.
We construct confidence intervals for a given confidence level.
Interpretation of Confidence Interval
The proper interpretation of a confidence interval is probably the most
challenging aspect of this statistical concept. The interpretation rests on
repeated sampling:
If the sampling were repeated many times and an interval computed from
each sample, about 95% of those intervals would contain the true value of the
population parameter (e.g., mean).
In addition, we may interpret the confidence interval using the statement
below:
We are 95% confident that the interval between X [lower bound] and Y
[upper bound] contains the true value of the population parameter.
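However, it would be inappropriate to state the following:
There is a 95% probability that the interval between X [lower bound] and
Y [upper bound] contains the true value of the population parameter.
Once the interval has been computed, it either contains the parameter or it
does not; the 95% describes the procedure, not one particular interval.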
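The confidence level describes how often the interval-building procedure succeeds over repeated sampling, and a simulation can show this. The sketch below assumes a normal population with known standard deviation; the values of μ, σ, n, and the number of repetitions are made up for illustration.

```python
import random
import statistics

rng = random.Random(7)             # fixed seed for reproducibility
mu, sigma, n = 100.0, 15.0, 30     # hypothetical population and sample size
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% level

reps, covered = 2_000, 0
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    margin = z * sigma / n ** 0.5  # known-sigma margin of error
    # Each repetition yields a different interval; mu itself never moves.
    if x_bar - margin <= mu <= x_bar + margin:
        covered += 1

coverage = covered / reps
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share of intervals that contain μ should land close to 0.95, while any single interval either contains μ or does not.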
How to Calculate the Confidence Interval?
The interval is calculated using the following steps:
• Gather the sample data.
• Calculate the sample mean x̅.
• Determine whether a population’s standard deviation is known or
unknown.
• If a population’s standard deviation is known, we can use a z-score for
the corresponding confidence level.
• If a population’s standard deviation is unknown, we can use a t-
statistic for the corresponding confidence level.
• Find the lower and upper bounds of the confidence interval using the
following formulas:
a. Known population standard deviation: x̅ ± Zα/2 · σ/√n
b. Unknown population standard deviation: x̅ ± tα/2 · s/√n (t with n − 1 degrees of freedom)
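The known-standard-deviation case can be sketched with the standard library, assuming the usual interval x̅ ± z·σ/√n. The function name `z_confidence_interval` and the example numbers are hypothetical. The unknown-standard-deviation case has the same shape, with s in place of σ and a t quantile in place of z (available, for example, from SciPy's `scipy.stats.t.ppf`).

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, level=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    margin = z * sigma / math.sqrt(n)        # margin of error
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 50, known sigma 10, sample size 100.
lower, upper = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```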
Examples
• Suppose we conduct a poll to try and get a sense of the outcome of an
upcoming election with two candidates. We poll 1000 people, and 550 of
them respond that they will vote for candidate A .
What range of values is likely to contain the true share of voters who
will vote for candidate A?
Sol.
1. Select our desired levels of confidence. We’re going to use the 90%,
95%, and 99% levels.
2. Calculate α and α/2. Our α values are 0.1, 0.05, and 0.01 respectively.
Our α/2 values are 0.05, 0.025, and 0.005.
3. Look up the corresponding z-scores. Our Zα/2 values are 1.645, 1.96,
and 2.58.
4. Multiply the z-score by the standard error to find the margin of error.
First we need to calculate the standard error.
5. Find the interval by adding and subtracting this product from the mean.
In this case, we are working with a distribution we have not previously
discussed, a binomial distribution (i.e. a voter can choose Candidate
A or B), which for a sample this large is well approximated by a normal
distribution.
We have a probability estimate from our sample, where the probability of
an individual in our sample voting for candidate A was found to be 550/1000
or 0.55.
We can use this information in a formula to estimate the standard error for
such a distribution:
4. Multiply the z-score by the standard error (cont.)
• For a binomial proportion, the standard error can be estimated
using:
S.E. = √(p(1 − p)/n) = √(0.55 × 0.45 / 1000) = 0.0157
• We can now multiply this value by the z-scores to calculate the
margins of error for each confidence level
Multiply the z-score by the standard error (cont.)
• We calculate the margin of error and add and subtract that value
from the sample proportion (0.55 in this case) to find the bounds of our
confidence intervals at each level of confidence:
Confidence level   Zα/2    Margin of error   Lower bound   Upper bound
90%                1.645   0.026             0.524         0.576
95%                1.96    0.031             0.519         0.581
99%                2.58    0.041             0.509         0.591
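The poll calculation can be reproduced in a few lines. This sketch takes the z-scores from `statistics.NormalDist` instead of a table, so the 99% value is about 2.576 rather than the rounded 2.58; the bounds agree with the table to three decimals. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n, votes_for_a = 1000, 550
p_hat = votes_for_a / n                               # 0.55
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)   # binomial proportion S.E.

intervals = {}
for level in (90, 95, 99):
    alpha = 1 - level / 100
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided z-score
    margin = z * standard_error               # margin of error
    intervals[level] = (p_hat - margin, p_hat + margin)
    print(f"{level}%  z = {z:.3f}  margin = {margin:.3f}  "
          f"[{p_hat - margin:.3f}, {p_hat + margin:.3f}]")
```

Even the widest interval here, at the 99% level, lies entirely above 0.5.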
What is Hypothesis Testing?
Hypothesis testing is a method of statistical inference. It is used to test
whether a statement regarding a population parameter is supported by
the sample data. Hypothesis testing is a powerful tool for checking
predictions against evidence.
For example: A business owner might want to make a prediction of the
mean value a customer would pay for his firm’s product. He can then
formulate a hypothesis, for example, “The average value that
customers will pay for my product is larger than $5”. To statistically test
this claim, the business owner could use hypothesis testing.
Hypothesis testing is formulated in terms of two hypotheses:
• H0: the null hypothesis;
• H1: the alternative hypothesis.
What we want to establish is whether H1 is likely to be true.
So, there are two possible outcomes:
• Reject H0 and accept H1 because of sufficient evidence in the sample
in favor of H1;
• Do not reject H0 because of insufficient evidence to support H1.
Null Hypothesis and Alternative Hypothesis
• Null Hypothesis
• Alternative Hypothesis
The Null Hypothesis is usually set as what we don’t want to be true. It is
the hypothesis to be tested. Therefore, the Null Hypothesis is considered
to be true, until we have sufficient evidence to reject it. If we reject the
null hypothesis, we are led to the alternative hypothesis.
Consider the business owner who is looking for some customer insight.
His null hypothesis would be:
H0: The average value customers are willing to pay for my product is
smaller than or equal to $5, or H0: µ ≤ 5 (µ = the population mean)
The alternative hypothesis would then be what we are evaluating, so, in
this case, it would be:
Ha: The average value customers are willing to pay for the product is
greater than $5, or Ha: µ > 5
Type I and Type II Errors
A Type I Error arises when a true Null Hypothesis is rejected. The
probability of making a Type I Error is also known as the level of
significance of the test, which is commonly referred to as alpha (α). So,
for example, if a test has its alpha set at 0.01, there is a 1%
probability of rejecting a true null hypothesis, that is, a 1% probability of
making a Type I Error.
A Type II Error arises when you fail to reject a False Null Hypothesis.
The probability of making a Type II Error is commonly denoted by the
Greek letter beta (β). β is used to define the Power of a Test, which is
the probability of correctly rejecting a false null hypothesis.
The Power of a Test is defined as 1-β. A test with more Power is more
desirable, as there is a lower probability of making a Type II Error.
However, there is a tradeoff between the probability of making a Type I
Error and the probability of making a Type II Error.
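The relationship between α, β, and power can be worked through for a right-tailed z-test of H0: µ ≤ 5. The sketch below is a calculation under assumed values: the population standard deviation, the sample size, and the "true" mean of 6 are all hypothetical.

```python
from statistics import NormalDist

mu0, mu_true = 5.0, 6.0          # H0 boundary and an assumed true mean
sigma, n, alpha = 2.0, 25, 0.05  # hypothetical sigma, sample size, alpha
se = sigma / n ** 0.5            # standard error of the sample mean

# Reject H0 when the sample mean exceeds this critical value; under
# mu = mu0 that happens with probability alpha (the Type I error rate).
critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# beta: probability of NOT rejecting H0 when the true mean is mu_true.
beta = NormalDist(mu_true, se).cdf(critical_mean)
power = 1 - beta

print(f"critical sample mean: {critical_mean:.3f}")
print(f"beta (Type II error): {beta:.3f}")
print(f"power (1 - beta):     {power:.3f}")
```

Lowering α moves the critical value further from µ0, which raises β and lowers the power; this is the tradeoff described above.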
Properties of hypothesis testing
• Significance level - is the maximum probability of committing a Type I
error. This probability is symbolized by α.
P(Type I error|H0 is true)=α.
• Critical or Rejection Region – the range of values for the test value
that indicate a significant difference and that the null hypothesis
should be rejected.
• Non-critical or Non-rejection Region – the range of values for the test
value that indicates that the difference was probably due to chance
and that the null hypothesis should not be rejected.
One-tail test (right tail)
One-tail test (left tail)
Two-tail test
Steps in hypothesis testing
Testing a hypothesis about the mean of a population
We have the following steps:
1. Data: determine the variable, sample size (n), sample mean (x̅),
and population standard deviation (σ), or the sample standard deviation (s) if σ is
unknown.
2. Assumptions: We have two cases:
Case 1: The population is normally or approximately normally distributed
with known or unknown variance (sample size n may be small or large).
Case 2: The population is not normal, with known or unknown variance (n is
large, i.e. n ≥ 30).
3. Hypotheses: we have three cases
Case I: H0: μ = μ0 vs HA: μ ≠ μ0
e.g. we want to test that the population mean is different from 50
Case II: H0: μ = μ0 vs HA: μ > μ0
e.g. we want to test that the population mean is greater than 50
Case III: H0: μ = μ0 vs HA: μ < μ0
e.g. we want to test that the population mean is less than 50
Example
• Researchers are interested in the mean age of a certain population.
• A random sample of 10 individuals drawn from the population of
interest has a mean of 27.
• Assuming that the population is approximately normally distributed
with variance 20, can we conclude that the mean is different from 30
years? (α = 0.05)
• If the p-value is 0.0340, how can we use it in making a decision?
Solution
1. Data: variable is age, n = 10, x̅ = 27, σ² = 20, α = 0.05
2. Assumptions: the population is approximately normally distributed with
variance 20
3. Hypotheses:
• H0: μ = 30
• HA: μ ≠ 30
4. Test statistic:
• Z = (x̅ − μ0) / (σ/√n) = (27 − 30) / √(20/10) = −2.12
5. Decision rule:
The alternative hypothesis is HA: μ ≠ 30
Hence we reject H0 if Z > Z(1 − 0.025) = Z(0.975)
• or Z < −Z(1 − 0.025) = −Z(0.975)
• Z(0.975) = 1.96 (from table D)
6. Decision:
• We reject H0, since −2.12 is in the rejection region.
• We can conclude that μ is not equal to 30.
• Using the p-value, we note that p-value = 0.0340 < 0.05; therefore we
reject H0.
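The worked example can be checked with a short two-sided z-test, assuming the usual statistic Z = (x̅ − μ0)/(σ/√n). The variable names are illustrative. The p-value computed from the unrounded Z comes out at roughly 0.034, in line with the value quoted in the example.

```python
import math
from statistics import NormalDist

# Values from the example: n = 10, sample mean 27, H0 mean 30,
# population variance 20, significance level 0.05.
n, x_bar, mu0, variance, alpha = 10, 27.0, 30.0, 20.0, 0.05

# Test statistic for a mean with known population variance.
z = (x_bar - mu0) / math.sqrt(variance / n)

# Two-sided decision rule: reject H0 when |Z| exceeds Z(1 - alpha/2).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_by_region = abs(z) > z_crit

# Equivalent decision via the two-sided p-value.
p_value = 2 * NormalDist().cdf(-abs(z))
reject_by_p_value = p_value < alpha

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0:", reject_by_region and reject_by_p_value)
```

Both routes, the rejection region and the p-value, lead to the same decision because they are two views of the same test.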
Thank you

Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
 

An Introduction to Statistical Inference: Population, Sampling, Estimation, and Confidence Intervals

  • 1. An introduction to statistical inference Dr. Abhay Pratap Pandey University of Delhi
  • 2. What is inference? Inference defined: • An everyday meaning… We infer a conclusion based on evidence and reasoning • A statistical meaning… We infer a property of a population from a sample
  • 3. Why inference? The aim of inference is to determine the characteristics of a population from a sample. Population Sample
  • 5. Population and sample In statistical analysis, a population is the collection of all the people, items, or events about which one wants to make inferences, OR any well-defined group of subjects, which could be individuals, firms, cities, or many other possibilities. (For example, university students in India.) In statistical analysis, a sample is a subset of the population (i.e. of the people, items, or events) that one collects and analyzes to make inferences. (For example, 200 randomly chosen university students.)
  • 6. Statistical sample - Subset of the population chosen to represent the population in a statistical analysis; denoted as (X1, X2, ..., Xn). Random sample - a sample of individuals chosen at random from the population. In the case of random sampling, the following techniques can be used: Independent sampling (draw with replacement) - after each draw the unit returns to the population. Dependent sampling (draw without replacement) - after each draw the unit does not return to the population (it no longer participates in the drawing). In statistical analysis, an observation is an element of the sample. (For example, Helena, a student at Central University.)
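The two drawing schemes can be sketched in a few lines of Python. This is only an illustration: the population of 1000 student IDs, the function names, and the fixed seed are assumptions made for the example, not part of the slides.

```python
import random

def draw_with_replacement(population, n, rng):
    """Independent sampling: each unit returns to the population after a draw."""
    return [rng.choice(population) for _ in range(n)]

def draw_without_replacement(population, n, rng):
    """Dependent sampling: a drawn unit cannot be drawn again."""
    return rng.sample(population, n)

rng = random.Random(0)            # fixed seed so the sketch is repeatable
students = list(range(1, 1001))   # hypothetical population of 1000 student IDs

independent = draw_with_replacement(students, 200, rng)
dependent = draw_without_replacement(students, 200, rng)

# With replacement, repeats are possible; without replacement, every unit is distinct.
print(len(set(independent)), len(set(dependent)))
```

Drawing without replacement always gives 200 distinct students, while drawing with replacement may return the same student more than once.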
  • 8. Aim of statistical inference The aim of statistical inference is to learn about the population using the observed data. This involves: • computing something with the data • a statistic: a function of the data • interpreting the result • in probabilistic terms: the sampling distribution of the statistic
  • 9. Estimation • Determination of the population parameter by the calculation of a sample statistic… Characteristic: mean; Population Parameter: μ; Sample Statistic: x̅
  • 11. A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. Each sample gives its own value of the sample statistic (x̅1, x̅2, x̅3, ...) as an estimate of the population parameter μ. Uncertainty: estimates are not perfect, and the sampling distribution describes how they vary from sample to sample.
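A small simulation can make the idea concrete. The sketch below assumes a hypothetical normal population (mean 50, standard deviation 10) and draws many samples of size 25; the names and numbers are illustrative only.

```python
import random
import statistics

def sample_means(population, n, repeats, rng):
    """Draw `repeats` samples of size n and return the mean of each one."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repeats)]

rng = random.Random(1)  # fixed seed so the sketch is repeatable
population = [rng.gauss(50, 10) for _ in range(10_000)]  # hypothetical population

means = sample_means(population, n=25, repeats=2_000, rng=rng)
mu = statistics.fmean(population)

# The sample means cluster around mu; their spread is the standard error.
print(round(mu, 2), round(statistics.fmean(means), 2), round(statistics.stdev(means), 2))
```

The collection `means` is an empirical approximation of the sampling distribution of the sample mean: centred near μ, with a spread close to σ/√n.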
  • 13. Types of estimators in statistics Estimator An estimator is a statistic (a function of the data) that produces a guess of the value of a population parameter. By the “best” estimator we usually mean one whose sampling distribution is more concentrated about the population parameter value than that of other estimators. The two main types of estimators in statistics are • Point estimators • Interval estimators Point estimation: Point estimators are functions used to find an approximate value of a population parameter from random samples of the population. They use the sample data to calculate a point estimate, a single value of a statistic that serves as the best estimate of an unknown population parameter. We want to estimate a population parameter using the observed data. Ex. some measure of variation, an average, min, max, quantile, etc.
  • 14. • Interval estimation Interval estimation uses sample data to calculate an interval of plausible values for an unknown parameter of a population. The interval is constructed so that it contains the parameter with a stated probability, typically 95% or higher; such an interval is known as a confidence interval. The confidence interval is used to indicate how reliable an estimate is, and it is calculated from the observed data. The endpoints of the interval are referred to as the lower and upper confidence limits.
  • 15. Properties of Point Estimators • Unbiasedness • Consistency • Sufficiency • Efficiency Unbiasedness An estimator of a given parameter is said to be unbiased if its expected value is equal to the true value of the parameter. The bias of a point estimator is defined as the difference between the expected value of the estimator and the value of the parameter being estimated: writing θ̂ for the estimator and θ for the parameter, Bias(θ̂) = E(θ̂) − θ. When the bias is zero, the estimator is unbiased. Also, the closer the expected value of the estimator is to the value of the parameter being estimated, the smaller the bias.
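Bias can be illustrated by simulation. The sketch below is an assumed example, not taken from the slides: it compares two estimators of a population variance, one dividing by n and one dividing by n − 1, on many small samples from a standard normal population whose true variance is 1.

```python
import random
import statistics

rng = random.Random(2)      # fixed seed so the sketch is repeatable
n, repeats = 5, 10_000

biased, unbiased = [], []
for _ in range(repeats):
    sample = [rng.gauss(0, 1) for _ in range(n)]   # true variance is 1
    biased.append(statistics.pvariance(sample))    # divides by n
    unbiased.append(statistics.variance(sample))   # divides by n - 1

# Averaging over many samples approximates each estimator's expected value.
avg_biased = statistics.fmean(biased)
avg_unbiased = statistics.fmean(unbiased)
print(round(avg_biased, 2), round(avg_unbiased, 2))
```

The divide-by-n estimator averages close to (n − 1)/n = 0.8 rather than 1, so its bias is about −0.2, while the divide-by-(n − 1) estimator averages close to the true value.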
  • 17. Consistency Consistency tells us how close the point estimator stays to the value of the parameter as the sample increases in size. A consistent point estimator becomes more accurate as the sample size grows. You can also check whether a point estimator is consistent by looking at its expected value and variance: it is sufficient that, as the sample size increases, the expected value moves toward the true value of the parameter and the variance shrinks toward zero.
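As a rough illustration of consistency, the following sketch (an assumed setup using a standard normal population) measures how widely the sample mean varies for three sample sizes.

```python
import random
import statistics

def spread_of_sample_mean(n, repeats, rng):
    """Standard deviation of the sample mean over `repeats` samples of size n."""
    means = [statistics.fmean(rng.gauss(0, 1) for _ in range(n)) for _ in range(repeats)]
    return statistics.stdev(means)

rng = random.Random(4)  # fixed seed so the sketch is repeatable
spreads = {n: spread_of_sample_mean(n, 500, rng) for n in (10, 100, 1000)}

for n, s in spreads.items():
    print(n, round(s, 3))
```

The spread falls roughly like 1/√n, which is the behaviour expected of a consistent estimator such as the sample mean.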
  • 22. Maximum likelihood estimator The maximum likelihood method of point estimation finds the values of the unknown parameters that maximize the likelihood function. It takes an assumed model and chooses the parameter values under which the observed data are most probable. For example, a researcher may be interested in knowing the average weight of babies born prematurely. Since it would be impossible to measure all babies born prematurely in the population, the researcher can take a sample from one location. If the weight of pre-term babies is assumed to follow a normal distribution, the researcher can use the maximum likelihood estimator to estimate the average weight of the entire population of pre-term babies from the sample data.
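For a normal model the maximum likelihood estimates have a well-known closed form: the sample mean, and the standard deviation computed with divisor n. The sketch below uses a small set of hypothetical birth weights (in kg) invented for illustration and checks that the likelihood is indeed lower at nearby parameter values.

```python
import math
import statistics

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. normal data for the given mu and sigma."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -n * math.log(sigma * math.sqrt(2 * math.pi)) - ss / (2 * sigma ** 2)

def normal_mle(data):
    """Closed-form maximum likelihood estimates for a normal model."""
    mu_hat = statistics.fmean(data)
    sigma_hat = statistics.pstdev(data)  # divides by n, not n - 1
    return mu_hat, sigma_hat

weights_kg = [1.9, 2.3, 2.1, 1.7, 2.6, 2.0, 2.4, 2.2]  # hypothetical sample
mu_hat, sigma_hat = normal_mle(weights_kg)

best = normal_log_likelihood(weights_kg, mu_hat, sigma_hat)
nearby = normal_log_likelihood(weights_kg, mu_hat + 0.1, sigma_hat)
print(round(mu_hat, 3), round(sigma_hat, 3), best > nearby)
```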
  • 26. Method of moments The method of moments of estimating parameters was introduced in 1887 by Russian mathematician Pafnuty Chebyshev. It starts from known facts about a population and then applies them to a sample of the population. The first step is to derive equations that relate the population moments to the unknown parameters. The next step is to draw a sample from the population and compute the sample moments, which estimate the population moments. The equations derived in step one are then solved with the sample moments substituted for the population moments. This produces estimates of the unknown population parameters.
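The steps above can be sketched for one concrete model. The choice of a gamma distribution and the data values are assumptions made for illustration; for a gamma distribution with shape k and scale θ, the mean is kθ and the variance is kθ², so solving those two equations with sample moments gives the estimates.

```python
import statistics

def gamma_method_of_moments(data):
    """Method-of-moments estimates (shape, scale) for a gamma model.

    Population moments: mean = shape * scale, variance = shape * scale**2.
    """
    m1 = statistics.fmean(data)        # first sample moment
    var = statistics.pvariance(data)   # second central sample moment
    scale = var / m1
    shape = m1 ** 2 / var
    return shape, scale

waits = [2.1, 3.4, 1.2, 5.6, 2.8, 4.1, 0.9, 3.3]  # hypothetical waiting times
shape, scale = gamma_method_of_moments(waits)
print(round(shape, 3), round(scale, 3))
```

By construction, the fitted distribution reproduces the sample mean and variance exactly; this does not by itself make the estimates optimal.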
  • 38. What is a Confidence Interval? A confidence interval is an interval estimate in statistics that may contain a population parameter. The unknown population parameter is estimated through a sample statistic calculated from the sampled data. For example, the population mean μ is estimated using the sample mean x̅. The interval is generally defined by its lower and upper bounds. The confidence level is expressed as a percentage (the most frequently quoted percentages are 90%, 95%, and 99%). The concept of the confidence interval is very important in statistics (hypothesis testing) since it is used as a measure of uncertainty. The concept was introduced by Polish mathematician and statistician Jerzy Neyman in 1937.
  • 39. Confidence Interval We can also quantify the uncertainty (sampling distribution) of our point estimate. One way of doing this is by constructing an interval that is likely to contain the population parameter. Such an interval, which is computed on the basis of the data, is called a confidence interval. The sampling probability that the confidence interval will indeed contain the parameter value is called the confidence level. We construct confidence intervals for a given confidence level.
  • 40. Interpretation of Confidence Interval The proper interpretation of a confidence interval is probably the most challenging aspect of this statistical concept. The confidence level describes the procedure rather than a single interval: if the sampling were repeated many times and a 95% confidence interval were computed from each sample, about 95% of those intervals would contain the true value of the population parameter (e.g., the mean). In this sense, we may interpret a computed interval using the statement below: We are 95% confident that the interval between X [lower bound] and Y [upper bound] contains the true value of the population parameter. However, it would be inappropriate to state the following: There is a 95% probability that the interval between X [lower bound] and Y [upper bound] contains the true value of the population parameter. Once the interval has been computed, the parameter is a fixed value that either lies inside it or does not.
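One way to see what the 95% refers to is to repeat the sampling many times and count how often the computed interval covers the true mean. The sketch below assumes a hypothetical normal population with known standard deviation; all numbers are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

def coverage(mu, sigma, n, level, repeats, rng):
    """Fraction of repeated-sample intervals that contain the true mean mu."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / math.sqrt(n)  # sigma is treated as known
    hits = 0
    for _ in range(repeats):
        x_bar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repeats

rng = random.Random(3)  # fixed seed so the sketch is repeatable
rate = coverage(mu=50, sigma=10, n=25, level=0.95, repeats=2_000, rng=rng)
print(rate)
```

The printed rate should land near 0.95: the confidence level is a long-run property of the interval-building procedure, not a probability attached to any one interval.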
  • 41. How to Calculate the Confidence Interval? The interval is calculated using the following steps: • Gather the sample data. • Calculate the sample mean x̅. • Determine whether the population’s standard deviation is known or unknown. • If the population’s standard deviation is known, we can use a z-score for the corresponding confidence level. • If the population’s standard deviation is unknown, we can use a t-statistic for the corresponding confidence level.
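  • 42. • Find the lower and upper bounds of the confidence interval using the following formulas: a. Known population standard deviation σ: x̅ ± Zα/2 · σ/√n
  • 43. b. Unknown population standard deviation (use the sample standard deviation s): x̅ ± tα/2 · s/√n, where t has n − 1 degrees of freedom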
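The known-standard-deviation case can be sketched with Python's standard library. The function name is hypothetical, and the input values (sample mean 27, variance 20, n = 10) are borrowed from the mean-age example later in the deck.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, level=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-score for the chosen level
    margin = z * sigma / math.sqrt(n)         # margin of error
    return mean - margin, mean + margin

low, high = z_confidence_interval(mean=27, sigma=math.sqrt(20), n=10)
print(round(low, 2), round(high, 2))
```

The unknown-standard-deviation case has the same shape but replaces the z-score with a t critical value on n − 1 degrees of freedom; the standard library has no t distribution, so that value would have to come from a table or a statistics package.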
  • 44. Examples • Suppose we conduct a poll to get a sense of the outcome of an upcoming election with two candidates. We poll 1000 people, and 550 of them respond that they will vote for candidate A. How confident can we be about the proportion of voters who will cast their vote for candidate A? Sol. 1. Select our desired levels of confidence: we’re going to use the 90%, 95%, and 99% levels. 2. Calculate α and α/2: our α values are 0.1, 0.05, and 0.01 respectively, and our α/2 values are 0.05, 0.025, and 0.005. 3. Look up the corresponding z-scores: our Zα/2 values are 1.645, 1.96, and 2.58. 4. Multiply the z-score by the standard error to find the margin of error. First we need to calculate the standard error.
  • 45. 5. Find the interval by adding and subtracting this product from the mean. In this case, we are working with a distribution we have not previously discussed, a binomial distribution approximated by a normal distribution (i.e. a voter can choose Candidate A or B, a binomial outcome). We have a probability estimate from our sample, where the proportion of individuals in our sample voting for candidate A was found to be 550/1000 or 0.55. We can use this information in a formula to estimate the standard error for such a distribution: 4. Multiply the z-score by the standard error cont. • For a binomial proportion under the normal approximation, the standard error can be estimated using: S.E. = √(p(1 − p)/n) = √(0.55 × 0.45/1000) = 0.0157
  • 46. • We can now multiply this value by the z-scores to calculate the margin of error for each confidence level. Multiply the z-score by the standard error cont. • We calculate the margin of error and add and subtract that value from the mean (0.55 in this case) to find the bounds of our confidence intervals at each level of confidence:
      CI     Zα/2     Margin of error     Lower bound     Upper bound
      90%    1.645    0.026               0.524           0.576
      95%    1.96     0.031               0.519           0.581
      99%    2.58     0.041               0.509           0.591
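The poll example can be reproduced in a few lines. This is a sketch under the same normal approximation used in the slides; the function name is an assumption, and the z-scores are computed rather than read from a table.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, level):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)          # standard error
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)    # z-score for this level
    margin = z * se
    return p_hat - margin, p_hat + margin

for level in (0.90, 0.95, 0.99):
    low, high = proportion_interval(550, 1000, level)
    print(f"{level:.0%}: {low:.3f} to {high:.3f}")
```

Rounded to three decimals, the bounds agree with the table in the example. Even the 99% interval stays above 0.5, so at all three levels the poll points to candidate A leading.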
  • 47. What is Hypothesis Testing? Hypothesis Testing is a method of statistical inference. It is used to test whether a statement regarding a population parameter is supported by the sample data. Hypothesis testing is a powerful tool for checking predictions. For example: A business owner might want to make a prediction of the mean value a customer would pay for his firm’s product. He can then formulate a hypothesis, for example, “The average value that customers will pay for my product is larger than $5”. To statistically test this claim, the business owner could use hypothesis testing.
  • 48. Hypothesis testing is formulated in terms of two hypotheses: • H0: the null hypothesis; • H1: the alternative hypothesis. What we want to test is whether H1 is “likely” true. So, there are two possible outcomes: • Reject H0 and accept H1 because of sufficient evidence in the sample in favor of H1; • Do not reject H0 because of insufficient evidence to support H1.
  • 49. Null Hypothesis and Alternative Hypothesis • Null Hypothesis • Alternative Hypothesis The Null Hypothesis is usually set as what we don’t want to be true. It is the hypothesis to be tested. Therefore, the Null Hypothesis is considered to be true until we have sufficient evidence to reject it. If we reject the null hypothesis, we are led to the alternative hypothesis. Take the example of the business owner who is looking for some customer insight. His null hypothesis would be: H0: The average value customers are willing to pay for my product is smaller than or equal to $5, or H0: µ ≤ 5 (µ = the population mean). The alternative hypothesis would then be what we are evaluating, so, in this case, it would be: Ha: The average value customers are willing to pay for the product is greater than $5, or Ha: µ > 5
  • 51. Type I and Type II Errors A Type I Error arises when a true Null Hypothesis is rejected. The probability of making a Type I Error is also known as the level of significance of the test, which is commonly referred to as alpha (α). So, for example, if a test has its alpha set to 0.01, there is a 1% probability of rejecting a true null hypothesis, that is, a 1% probability of making a Type I Error. A Type II Error arises when you fail to reject a false Null Hypothesis. The probability of making a Type II Error is commonly denoted by the Greek letter beta (β). β is used to define the Power of a Test, which is the probability of correctly rejecting a false null hypothesis.
  • 52. The Power of a Test is defined as 1-β. A test with more Power is more desirable, as there is a lower probability of making a Type II Error. However, there is a tradeoff between the probability of making a Type I Error and the probability of making a Type II Error: for a fixed sample size, lowering α raises β and therefore lowers the power.
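The tradeoff can be made concrete for an upper-tailed z-test. The numbers below (null value 5, true mean 6, known σ = 2, n = 25) are hypothetical values chosen for illustration, loosely following the product-price example.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of the upper-tailed z-test of H0: mu <= mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)           # rejection threshold
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))     # true mean in standard-error units
    beta = std_normal.cdf(z_crit - shift)            # probability of a Type II error
    return 1 - beta

power_05 = power_one_sided_z(mu0=5, mu1=6, sigma=2, n=25, alpha=0.05)
power_01 = power_one_sided_z(mu0=5, mu1=6, sigma=2, n=25, alpha=0.01)
print(round(power_05, 3), round(power_01, 3))
```

Tightening α from 0.05 to 0.01 makes a Type I Error less likely but lowers the power, which means a larger β for the same sample size.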
  • 54. • Significance level – the maximum probability of committing a Type I error. This probability is symbolized by α: P(Type I error | H0 is true) = α. • Critical or Rejection Region – the range of values of the test statistic that indicates a significant difference, so that the null hypothesis should be rejected. • Non-critical or Non-rejection Region – the range of values of the test statistic that indicates that the difference was probably due to chance, so that the null hypothesis should not be rejected.
  • 66. Testing a hypothesis about the mean of a population We have the following steps: 1. Data: determine the variable, the sample size (n), the sample mean (x̅), and the population standard deviation (σ), or the sample standard deviation (s) if σ is unknown. 2. Assumptions: We have two cases: Case 1: The population is normally or approximately normally distributed with known or unknown variance (the sample size n may be small or large). Case 2: The population is not normal, with known or unknown variance (n is large, i.e. n ≥ 30).
  • 67. 3. Hypotheses: we have three cases. Case I: H0: μ = μ0 vs HA: μ ≠ μ0, e.g. we want to test that the population mean is different from 50. Case II: H0: μ = μ0 vs HA: μ > μ0, e.g. we want to test that the population mean is greater than 50. Case III: H0: μ = μ0 vs HA: μ < μ0, e.g. we want to test that the population mean is less than 50.
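Each of the three cases leads to a different rejection region. The sketch below assumes the test statistic is a z-score (known variance or large n); the function name and the labels for the alternatives are hypothetical.

```python
from statistics import NormalDist

def in_rejection_region(z, alpha, alternative):
    """Decide whether a z statistic falls in the rejection region.

    alternative: 'two-sided' (Case I), 'greater' (Case II) or 'less' (Case III).
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative}")

# The same statistic can be significant one-sided but not two-sided.
print(in_rejection_region(1.7, 0.05, "two-sided"),
      in_rejection_region(1.7, 0.05, "greater"))
```

A two-sided test splits α between both tails, so it needs a larger |Z| to reject than a one-sided test at the same level.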
  • 74. Example • Researchers are interested in the mean age of a certain population. • A random sample of 10 individuals drawn from the population of interest has a mean of 27. • Assuming that the population is approximately normally distributed with variance 20, can we conclude that the mean is different from 30 years? (α=0.05) • If the p-value is 0.0340, how can we use it in making a decision?
  • 75. Solution 1-Data: variable is age, n=10, x̅=27, σ²=20, α=0.05 2-Assumptions: the population is approximately normally distributed with variance 20 3-Hypotheses: • H0: μ = 30 • HA: μ ≠ 30 4-Test Statistic: • Z = (x̅ − μ0)/(σ/√n) = (27 − 30)/(√20/√10) = -2.12 5-Decision Rule: The alternative hypothesis is HA: μ ≠ 30. Hence we reject H0 if Z > Z(1-0.025) = Z(0.975) • or Z < -Z(1-0.025) = -Z(0.975) • Z(0.975) = 1.96 (from table D)
  • 76. 6-Decision: • We reject H0, since -2.12 is in the rejection region (-2.12 < -1.96). • We can conclude that μ is not equal to 30. • Using the p-value, we note that p-value = 0.0340 < 0.05, therefore we reject H0.
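The worked example can be checked numerically. The sketch below uses the example's own numbers; the function name is an assumption, and the p-value is computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, variance, n):
    """Z statistic and two-sided p-value for a mean with known population variance."""
    z = (sample_mean - mu0) / math.sqrt(variance / n)
    p_value = 2 * NormalDist().cdf(-abs(z))   # both tails
    return z, p_value

z, p_value = z_test_two_sided(sample_mean=27, mu0=30, variance=20, n=10)
reject = p_value < 0.05
print(round(z, 2), round(p_value, 4), reject)
```

The unrounded p-value is about 0.0339; the 0.0340 quoted in the example comes from rounding Z to -2.12 before looking up the table. Either way the p-value is below 0.05 and H0 is rejected.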