Population & samples
Dr. Lloyd C. Bautista
What is statistics?
• We collect data from the real world to test our hypotheses about a phenomenon.
• To test these hypotheses, we build ‘statistical models’, scaled-down versions of the phenomenon, and check how well they fit the situation of interest.
• In short, we want to know if the statistical model is an accurate representation of the real world, that is, of the data we collected or observed.
Concept of sampling
• We cannot taste each and every lanzones in the basket.
• We get a sample, and hope that it is a good representative of the population.
• We infer things about the general population from that sample.
In statistics, we are interested in finding results through a ‘sample’ that can be generalized to the entire ‘population’.
We take a small subset of the population, known as the SAMPLE, and use its data to infer things about the population as a whole.
Inferential statistics
• The process of drawing conclusions about the properties of a population based on the information obtained from a sample.
• The difficulty is determining which statistical model is the most appropriate.
Slovin’s formula
n = N / (1 + N x e^2)
where:
n = no. of samples
N = total population
e = error margin / margin of error
There are 1000 employees in the organization. You want to conduct a satisfaction survey with a margin of error of 0.05 (5%). Using Slovin’s formula, you need to survey:
n = 1000 / (1 + 1000 x 0.05^2) = 1000 / 3.5 = 285.7, or about 286 employees.
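The employee survey can be checked with a short Python sketch of Slovin's formula, n = N / (1 + N x e^2). The function name `slovin_sample_size` is made up for this illustration, and rounding up to the next whole respondent is an assumption, not something the slide states.

```python
import math

def slovin_sample_size(population: int, margin_of_error: float) -> int:
    """Slovin's formula n = N / (1 + N * e**2), rounded up to a whole respondent."""
    n = population / (1 + population * margin_of_error ** 2)
    return math.ceil(n)  # assumption: round up so the margin of error is not exceeded

# 1000 employees, 5% margin of error: 1000 / 3.5 = 285.71..., rounded up
print(slovin_sample_size(1000, 0.05))
```

Tightening the margin of error raises the required sample quickly, because e enters the formula squared.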
Sampling methods: Random/probability
1. Simple random
2. Stratified random
   a) Proportionate
   b) Disproportionate
3. Cluster
   a) Single
   b) Double
   c) Multiple
Sampling methods: Non-random
1. Quota
2. Judgmental
3. Accidental
4. Snowball
5. Expert
**Mixed random/non-random: Systematic
Parameters & statistics
Parameters are characteristics of the “Population.”
μ is the population mean
σ is the population standard deviation
Statistics are characteristics of the “Sample.”
X̄ is the sample mean
s is the sample standard deviation
Concept of sampling error
• The sample mean is usually close to, but rarely exactly equal to, the population mean. The gap is the sampling error, and it shrinks as the sample size increases.
• If we take the mean of ALL possible SAMPLE MEANS, it will be equal to the POPULATION MEAN.
Standard error (SE) = standard deviation of the SAMPLE MEANS around the POPULATION MEAN
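A small simulation can make the two claims above concrete. This is only a sketch under stated assumptions: the population of 10,000 values is made up, 2,000 repeated samples stand in for "all" possible samples, and the comparison value σ / sqrt(n) is the usual textbook form of the standard error.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 made-up measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 36  # size of each sample
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
standard_error = statistics.pstdev(sample_means)  # SD of the sample means

print(f"population mean      = {mu:.2f}")
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"SD of sample means   = {standard_error:.2f}")
print(f"sigma / sqrt(n)      = {sigma / n ** 0.5:.2f}")
```

The first two printed values should nearly coincide, and so should the last two.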
The sampling distribution of all
possible ‘sample means’ can
help generalize what the
‘population mean’ would likely
be.
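[Figure: normal curve of sample means X̄ with standard deviation s. 68.26% of cases fall within 1 standard deviation of the mean, 95.44% within 2 and 99.74% within 3.]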
The Sampling Distribution of Means is a NORMAL CURVE.
• A normal curve is a Probability Distribution, which shows how the likelihood of cases falls as we travel away from the mean of means (the true population mean).
• It is hard... but wait.
Sampling distribution of means
As long as the sample size is reasonably large (n > 30), the sampling distribution of means approximates a normal curve.
In short, if each sample is drawn with a reasonably large number of cases, the SAMPLING DISTRIBUTION of means will be approximately normal regardless of how the raw scores themselves are distributed. If the raw scores are NORMALLY DISTRIBUTED, it is normal for any sample size.
The mean of the sampling distribution (the mean of means) is equal to the true population mean μ.
The standard deviation of a sampling distribution of means (the standard error) is smaller than the standard deviation of the population σ.
Two theories
• Central Limit Theorem: if you draw repeated samples of size n from a population with mean (µ) and SD (σ), the SAMPLING DISTRIBUTION OF MEANS is approximately NORMALLY DISTRIBUTED once n is large enough, whatever the shape of the population. If the population is itself normally distributed, this holds for any n.
• Law of Large Numbers: as the sample size n grows, the SAMPLE MEAN gets closer and closer to the population mean (µ). It says nothing about the shape of the distribution; that is what the Central Limit Theorem covers.
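The effect of sample size on the shape of the sampling distribution can be sketched with a deliberately non-normal population. The exponential population (mean 1, SD 1) and the helper names `skewness` and `sample_means` are assumptions chosen for illustration. Skewness near 0 indicates a symmetric, bell-like shape.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: about 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def sample_means(n, repeats=5_000):
    """Means of `repeats` samples of size n from a right-skewed exponential population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

results = {}
for n in (1, 30, 200):
    means = sample_means(n)
    # (mean of means, SD of means, skewness of means)
    results[n] = (statistics.fmean(means), statistics.pstdev(means), skewness(means))
    print(n, [round(x, 3) for x in results[n]])
```

As n grows, the mean of means stays near 1, the spread shrinks toward 1 / sqrt(n), and the skewness drifts toward 0.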
Confidence interval
• Through the SE, we can find the Range of values within which the
POPULATION MEAN is likely to fall.
• We can use the SAMPLE MEAN as an estimate of the POPULATION MEAN
and find the Range within which there is either 99% or 95% probability
(chance) that the population mean will fall.
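[Figure: two normal curves centered on μ. About 95% of sample means fall between Z = -1.96 and Z = +1.96, with 2.5% in each tail (outliers). About 99% fall between Z = -2.58 and Z = +2.58, with 0.5% in each tail (outliers).]
99% CI = sample mean +/- (2.58 x SE)
95% CI = sample mean +/- (1.96 x SE)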
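The two intervals can be computed with Python's standard library. This is a sketch with illustrative numbers (sample mean 55, SD 12, n = 64); the function name `confidence_interval` is hypothetical and the standard error is assumed to be SD / sqrt(n).

```python
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, level=0.95):
    """Return (low, high) for sample_mean +/- z * SE, with SE = sd / sqrt(n)."""
    se = sd / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    return sample_mean - z * se, sample_mean + z * se

for level in (0.95, 0.99):
    low, high = confidence_interval(55, 12, 64, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f}")
```

The 99% interval is wider than the 95% interval: more confidence costs precision.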
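[Figure: three baskets sampled from a population with mean 6.9. Basket 1: 1.58 and 6.45; basket 2: 1.62 and 6.70; basket 3: 1.83 and 6.80. The slide labels the second value of each basket as SD. Axis marks at 6.0 and 6.5.]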
Sampling distribution
• These are the different sample means plotted as a symmetrical distribution.
• It tells us how representative a sample is
of the population.
• Hence, the Standard Error is the standard
deviation of the sample means.
• We obtain this by…
• If the SE is large, then there is a large
variability between the sample means
and so the sample might not represent
the population.
μ = 6.9
Single sample test of the MEAN
• Evaluate the performance of a given population against a ‘standard’, using the sample results.
• Z-test for a Single Sample Test of the Mean (n > 30)
• T-test for a Single Sample Test of the Mean (n < 30)
• Z-test for a Single Sample Test of the Proportion
Confidence interval / Standard Error
[Figure: two normal curves centered on μ. For the 95% CI = +/- (1.96 x SE), 47.5% of the area lies on each side of the mean and 2.5% in each tail. For the 97% CI = +/- (2.17 x SE), 48.5% lies on each side of the mean and 1.5% in each tail.]
A random sample of 64 Local Government Units (LGUs) was selected. A standard has been set that an LGU's contractual personnel paid through its MOOE should not be more than 60 percent. In your sample, the mean is 55 percent and the standard deviation (σ) is 12. DO YOU THINK THAT the LGUs are exceeding the standard percentage for hiring contractual employees?
Ho : µ = 60
Ha : µ < 60
Z-test for single sample test of Mean (n > 30)
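[Figure: normal curve centered on the standard (60), with 2.5% rejection regions beyond Z = +/- 1.96. The sample result falls at z = -3.33, with 49.95% of the area between the mean and z and .05% beyond it.]
Table A. Percentage of Area under the Normal Curve
z       Area between mean & z    Area beyond z
3.30    49.95                    0.05
DECISION CRITERIA : With α = .05, we REJECT the Ho if Z falls beyond +/- 1.96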
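The LGU example can be reproduced with a minimal one-sample z-test sketch. The function name `z_test_mean` is made up for this illustration, and the standard error is assumed to be SD / sqrt(n), which matches the z = -3.33 shown on the slide.

```python
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sd, n):
    """z = (sample mean - hypothesized mean) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sd / n ** 0.5)

z = z_test_mean(sample_mean=55, mu0=60, sd=12, n=64)
critical = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed cut-off for alpha = .05

print(f"z = {z:.2f}, critical = +/-{critical:.2f}")
print("reject Ho" if abs(z) > critical else "retain Ho")
```

Because the z score falls beyond the cut-off, the sketch rejects Ho; the negative sign shows the sample mean sits below the 60 percent standard.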
You drew a sample of 1,500 informal settlers in Novaliches. The PSA claims that those in the bottom 30 percent of the population spend 59.7% of their income on food. In your sample, the mean is 60.7 percent and the standard deviation (σ) is 12.4. DO YOU THINK THAT the informal settlers are exceeding the standard share of income spent on food?
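Table A. Percentage of Area under the Normal Curve
z       Area between mean & z    Area beyond z
3.12    49.91                    0.09
[Figure: normal curve with the sample result at z = 3.12; 49.91% of the area lies between the mean and z and .09% (about .1%) beyond it. Cut-offs are marked at +/- 1.96 (2.50% in each tail) and +/- 1.64 (5.00% in each tail).]
DECISION CRITERIA : With α = 5%, we REJECT the Null Hypothesis if Z falls beyond +/- 1.96
DECISION CRITERIA : With α = 10%, we REJECT the Ho if Z falls beyond +/- 1.64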
In an IQ test conducted in class, a sample of 24 students was selected. The average IQ in the sample was 94, but the admission test set a standard IQ of not less than 100. The standard deviation (σ) of the sample is 12. Do you think that the IQ of the sample represents that of the entire batch?
Ho : µ = 100
Ha : µ < 100
T-test for single sample test of Mean (n < 30)
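Table A. Percentage of Area under the Normal Curve
z       Area between mean & z    Area beyond z
2.40    49.18                    0.82
[Figure: normal curve with 2.50% rejection regions beyond +/- 1.96. The sample result falls at z = -2.40, with 49.18% of the area between the mean and z and .82% beyond it.]
DECISION CRITERIA : With α = .05 or 5%, we REJECT the Ho if Z falls beyond +/- 1.96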
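The reported value of -2.40 can be reproduced if the standard error is taken as SD / sqrt(n - 1). That divisor is an inference from the slide's numbers (it also appears in the income worksheet, as 11,607 / sqrt(65 - 1)), not something the slide states for this example. The function name `t_statistic` is hypothetical.

```python
def t_statistic(sample_mean, mu0, sd, n):
    """t = (sample mean - hypothesized mean) / (sd / sqrt(n - 1))."""
    return (sample_mean - mu0) / (sd / (n - 1) ** 0.5)

# IQ example: n = 24, sample mean 94, standard 100, SD 12
t_iq = t_statistic(sample_mean=94, mu0=100, sd=12, n=24)
print(f"IQ:  t = {t_iq:.2f} with {24 - 1} degrees of freedom")

# BMI example: n = 20, sample mean 23.7, standard 18.1, SD 8
t_bmi = t_statistic(sample_mean=23.7, mu0=18.1, sd=8, n=20)
print(f"BMI: t = {t_bmi:.2f} with {20 - 1} degrees of freedom")
```

One caution: the slide compares the result against the normal-curve cut-off of 1.96. A t table with n - 1 degrees of freedom gives a somewhat larger cut-off for small samples (about 2.07 for 23 degrees of freedom at α = .05, two-tailed), so borderline results should be checked against the t table.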
In measuring the Body Mass Index (BMI), you drew a sample of 20 students. The average BMI in the sample was 23.7. However, the school’s health officials claim that the standard is 18.1 (normal). The standard deviation (σ) of the sample is 8. Do you think that the BMI of the sample represents that of the entire student population?
Ho : µ = 18.1
Ha : µ > 18.1
T-test for single sample test of Mean (n < 30)
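Table A. Percentage of Area under the Normal Curve
z       Area between mean & z    Area beyond z
3.05    49.89                    0.11
[Figure: normal curve centered on the standard (18.1) with the sample result at z = 3.05; 49.89% of the area lies between the mean and z and .11% beyond it. Cut-offs are marked at +/- 1.96 (2.50% in each tail) and +/- 1.67 (5.00% in each tail).]
DECISION CRITERIA : With α = 5%, we REJECT the Null Hypothesis if Z falls beyond +/- 1.96
DECISION CRITERIA : With α = 10%, we REJECT the Ho if Z falls beyond +/- 1.67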
You took a poll of 2,000 voters in Manila and found that one of the two candidates for office obtained 54 percent of the votes sampled. DO YOU THINK HE WILL WIN?
Ho : Pµ = .50
Ha : Pµ > .50
Z-test for single sample test of PROPORTION (n > 30)
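Table A. Percentage of Area under the Normal Curve
z       Area between mean & z    Area beyond z
3.60    49.98                    0.02
[Figure: normal curve with the sample result at z = 3.60; 49.98% of the area lies between the mean and z and .02% beyond it. Cut-offs are marked at +/- 1.96 (2.50% in each tail) and +/- 1.64 (5.00% in each tail).]
DECISION CRITERIA : With α = 5%, we REJECT the Null Hypothesis if Z falls beyond +/- 1.96
DECISION CRITERIA : With α = 10%, we REJECT the Ho if Z falls beyond +/- 1.64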
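The poll example can be sketched with the usual z-test for a proportion, where the standard error is sqrt(P x (1 - P) / n) under Ho. The slide does not show its own formula, so this form and the function name `z_test_proportion` are assumptions.

```python
def z_test_proportion(p_hat, p0, n):
    """z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    se = (p0 * (1 - p0) / n) ** 0.5
    return (p_hat - p0) / se

z = z_test_proportion(p_hat=0.54, p0=0.50, n=2000)
print(f"z = {z:.2f}")
```

Under these assumptions the result is about 3.58, close to the 3.60 row used on the slide. The small gap may come from rounding or from picking the nearest table row. Either way, the z score is well beyond 1.96.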
In the City of San Ignacio, the population mean income (μ) is P18,229. To know if this is true, we drew 3 sample sets, with sample means of P24,611, P23,667 and P23,056. Now we want to know the ‘probability’, or likelihood, that our sample means approximate the population mean. Your σ is P11,607.
Go to 101_Sampling & Population_workshop_Sheet_Income_2.
n                  65
POPULATION MEAN    18,229
POPULATION SD      11,607

Set    Sample mean    Sample mean - population mean    SE = SD / sqrt(n - 1)            Z score
A      24,611         24,611 - 18,229 = 6,382          11,607 / sqrt(65 - 1) = 1,451    4.40
B      23,667         23,667 - 18,229 = 5,438          11,607 / sqrt(65 - 1) = 1,451    3.75
C      23,056         23,056 - 18,229 = 4,827          11,607 / sqrt(65 - 1) = 1,451    3.33
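The worksheet arithmetic can be replayed in a few lines. This sketch uses the population mean of 18,229, which is the value the reported differences (6,382, 5,438 and 4,827) imply, and the worksheet's own standard error of 11,607 / sqrt(65 - 1). The variable names are made up for illustration.

```python
n = 65
population_mean = 18_229
population_sd = 11_607

se = population_sd / (n - 1) ** 0.5  # 11,607 / sqrt(64), as on the worksheet

z_scores = []
for sample_mean in (24_611, 23_667, 23_056):
    difference = sample_mean - population_mean
    z = difference / se
    z_scores.append(z)
    print(f"{sample_mean:,}: difference = {difference:,}, z = {z:.2f}")
```

All three z scores land far beyond 1.96, so sample means this high would be very unlikely if the population mean really were 18,229.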
Table A. Percentage of Area under the Normal Curve
(a) Z score    (b) Area between mean & Z    (c) Area beyond Z
4.40           49.977                       0.003
3.75           49.990                       0.010
3.33           49.950                       0.05
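[Figure: normal curve centered on P18,229 with the three sample results at Z = 3.33, Z = 3.75 and Z = 4.40, all beyond the +/- 1.96 cut-off (2.50% tail). For Z = 3.33, 49.95% of the area lies between the mean and Z and 0.05% beyond it.]
Define the decision criteria if the margin of error is 5%, 1% and 10%.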
Rule of thumb
The sample mean is usually close to the population mean, and the gap shrinks as the sample size increases.
(1) If the DIFFERENCE BETWEEN MEANS lies close to 0 (the mean of the differences), it has a large probability of occurrence, thus we RETAIN the NULL HYPOTHESIS. Why? The difference between means might be the result of sampling error.
(2) If it lies far from 0, its probability is small, ergo we REJECT the NULL HYPOTHESIS and ACCEPT the ALTERNATIVE HYPOTHESIS. The difference is too statistically significant to ignore.
Level of significance
When do we reject someone after giving him so many chances to prove his worth?
Oftentimes, we reject a loved one when the person crosses a line that is critically significant to us.
This is the concept of the Level of Significance, denoted as α = .05 or 5%.
α = .05 is the probability threshold at which the Null Hypothesis is rejected (when the result crosses the line) and the ALTERNATIVE HYPOTHESIS is accepted.
Null vs Alternative Hypothesis
Null hypothesis (H0) means that (μA = μB): “THERE IS NO DIFFERENCE BETWEEN THE POPULATION MEAN & SAMPLE MEAN. If there are any differences, discrepancies, or suspiciously outlying results, they are purely due to sampling error.”
Alternative hypothesis (Ha) means that (μA ≠ μB): “THERE IS A BIG DIFFERENCE”, that is, the difference between population and sample is too large to ignore and is statistically significant.
• If the probability is < .05, we reject the NULL, since the probability is too small (less than 5 chances out of 100) that the observed difference is a result of sampling error.
• P value < .05: WE REJECT.
• In the same way, a .05 level of significance is associated with z = 1.96 in either tail of the normal curve. In other words, 95% of the differences between means fall between -1.96 σ(X̄1 - X̄2) and +1.96 σ(X̄1 - X̄2), where σ(X̄1 - X̄2) is the standard error of the difference. Only 5% fall outside the cut-off. These shaded regions are the CRITICAL OR REJECTION REGIONS.
• This tells us that 95% (47.50% + 47.50%) of the differences between sample means lie within 1.96 standard deviations of the mean. Only 5% fall at or beyond this point (2.5% + 2.5% = 5%), in the two tails.
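The link between a z score, its two-tailed probability and the .05 cut-off can be sketched with Python's `statistics.NormalDist`. The function name `two_tailed_p` and the three example z scores are illustrative choices, not values taken from a specific slide.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Probability of a z at least this far from 0, in either tail, if Ho is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (1.50, 1.96, 3.33):
    p = two_tailed_p(z)
    decision = "reject Ho" if p < alpha else "retain Ho"
    print(f"z = {z:.2f}  p = {p:.4f}  {decision}")
```

A z of 1.96 sits right at the boundary: its two-tailed probability is almost exactly .05, which is why 1.96 serves as the cut-off for α = .05.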
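[Figure: normal curve centered on 0 with cut-offs at Z = +/- 1.96. 47.5% of the area lies on each side of the mean (95% in total) and 2.50% in each tail.]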
We can even set a more conservative or stringent level of significance, whereby we reject the NULL HYPOTHESIS only if the probability is less than 1 chance out of 100. This is α = .01, which is very conservative. The corresponding z = 2.58.
TABLE A. Percentage of Area Under the Normal Curve
(a) z score    (b) Area between mean and z    (c) Area beyond z
1.96           47.50                          2.50
2.58           49.51                          0.49
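[Figure: normal curve centered on 0 with cut-offs at Z = +/- 1.96 (47.5% on each side of the mean, 2.50% in each tail, 95% in total) and Z = +/- 2.58 (49.5% on each side of the mean, 0.5% in each tail).]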
Types of errors
• By the way, whenever we test the NULL HYPOTHESIS, we open ourselves to two kinds of errors.
• A Type I error is when we reject the null and it is actually true. If our level of significance (α) is .01, there is 1 chance out of 100 of making this wrong decision.
• A Type II error is when we retain the null and it is actually false. To reduce this risk, we increase the sample size so that the population is better represented.
• Again, let us go back to the level of significance (α = .05). The P (probability) is computed from the actual cases drawn from the data. As researchers, we set the level of significance as a threshold below which the NULL HYPOTHESIS is rejected, since the probability is so small.
• The alpha value α is the size of the tail region under the curve that makes us reject the Null Hypothesis.
• In this case, the observed probability is 1.70% or .017 (z = 2.12), and we chose α = 5% or .05. Since P < .05, we reject the Null Hypothesis. In doing so, we may commit a Type I error.
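What α means as a Type I error rate can be checked by simulation. This is a sketch under made-up assumptions: a normal population in which Ho is true by construction (µ = 60, σ = 12), samples of 64, and a two-tailed cut-off of 1.96. The function name `rejects_true_null` is hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def rejects_true_null(n=64, mu=60, sigma=12, critical=1.96):
    """Draw one sample from a population where Ho is TRUE and run the z-test."""
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu) / (sigma / n ** 0.5)
    return abs(z) > critical  # True means we wrongly rejected a true Ho

trials = 10_000
type_1_rate = sum(rejects_true_null() for _ in range(trials)) / trials
print(f"Type I error rate: {type_1_rate:.3f}")  # should sit near alpha = .05
```

Raising the cut-off to 2.58 (α = .01) would lower this rate, at the cost of making real differences harder to detect, which raises the risk of a Type II error.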
TABLE A. Percentage of Area Under the Normal Curve
(a) z score    (b) Area between mean and z    (c) Area beyond z
1.96           47.50                          2.50
2.12           48.30                          1.70
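[Figure: normal curve centered on 0 with the 95.0% region between Z = +/- 1.96 (47.5% on each side of the mean, 2.5% in each tail) and a result at Z = +/- 2.12 (48.3% between the mean and Z, 1.7% beyond it).]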
101_sampling__population_Sept_2020.ppt

  • 1. Population & samples Dr. Lloyd C. Bautista
  • 2. What is statistics? • We collect data from the real world to test our hypothesis on a phenomenon. • To test these hypothesis, we build ‘statistical models’ or scaled-down version to determine its fitness to the situation of interest. • In short, we want to know if the statistical model is an accurate representation of the real world or data we collected or observed.
  • 3. Concept of sampling  We cannot taste each and every lanzones in the basket.  We get a sample, and hope that it is a good representative of the population.  We infer things from a general population.
  • 4. In statistics, we are interested in finding results thru a ‘sample’ that can generalize in the entire ‘population’. We get a small subset of the population known as the SAMPLE and use these data to infer things about the population as a whole.
  • 5. Inferential statistics  Process of drawing conclusions about the properties of a population based on the information obtained from the sample.  Difficulty is to determine which statistical model is the most appropriate
  • 6. Slovin’s formula whereas: n = no. of samples N = total population e = error margin / margin of error There are 1000 employees in the organization. You want to conduct a satisfaction survey with the margin of error of 0.05 (5%). Using Slovin’s formula, you need to survey:
  • 7. Sampling methods: Random/probability 1. Simple random 2. Stratified random a) Proportionate b) Disproportionate 3. Cluster a) Single b) Double c) Multiple
  • 8. Sampling methods: Non-random 1. Quota 2. Judgmental 3. Accidental 4. Snowball 5. Expert **Mixed random/non-random systematic
  • 9. Parameters & statistics Parameters are characteristics of “Population.” μis the population mean σis the population standard deviation sis the sample standard deviation Statistics are characteristics of “Sample.”
  • 11. • The sample mean is very close to the population mean. The accuracy increases when the samples taken increase as well. • If we take the mean of ALL SAMPLE MEANS, it will be equal to the POPULATION MEAN Standard error (SE) = standard deviation of the SAMPLE MEANS from the POPULATION MEAN
  • 12. The sampling distribution of all possible ‘sample means’ can help generalize what the ‘population mean’ would likely be.
  • 13. s X - 68.26% 95.44% 99.74% Sampling Distribution of Means is a NORMAL CURVE.  A normal curve is a Probability Distribution, which shows the likelihood of cases as we travel away from the means of means (true population mean).  It is hard….but wait.
  • 14. Sampling distribution of means As long as the sample size is reasonable large (N >30), the sampling distribution of means approximates that of a normal curve. In short, if the sample is drawn from a reasonable large number of cases that is NORMALLY DISTRIBUTED, the SAMPLING DISTRIBUTION shall be normally distributed regardless if the raw score in the mean. The mean of the sampling distribution (the means of means) becomes closer to the true population mean μ. The standard deviation s of a sampling distribution of means is smaller than the standard deviation of the population σ.
  • 15. Two theories • Central Limit Theorem: if you obtain repeated samples (n) from a population (N) that is normally distributed, with “population” mean (µ) and SD (σ), the SAMPLING DISTRIBUTION OF MEANS (s) SHALL BE NORMALLY DISTRIBUTED. • Law of Large Numbers: if you obtain repeated samples (n) from a population of whatever distribution, with mean (µ) and SD (σ), if the sample is large enough even if not normally distributed, then the SAMPLING DISTRIBUTION OF MEAN COULD BE ASSUMED AS NORMALLY DISTRIBUTED
  • 16. Confidence interval • Through the SE, we can find the Range of values within which the POPULATION MEAN is likely to fall. • We can use the SAMPLE MEAN as an estimate of the POPULATION MEAN and find the Range within which there is either 99% or 95% probability (chance) that the population mean will fall. About 99% outliers outliers About 95% outliers outliers Z = +2.58 Z = - 2.58 Z = - 1.96 Z = +1.96 μ μ 99% CI = +/- (2.58 x SE) 95% CI = +/- (1.96 x SE) 2.5% 2.5% 0.5% 0.5%
  • 17. Mean pop = 6.9 Basket 1 2 3 1.58 1.62 1.83 6.45 6.70 6.80 6.0 6.5 SD = 6.45 SD = 6.70 SD = 6.80
  • 18. Sampling distribution • This is the different sample means plotted as a symmetrical distribution. • It tells us how representative a sample is of the population. • Hence, the Standard Error is the standard deviation of the sample means. • We obtain this by… • If the SE is large, then there is a large variability between the sample means and so the sample might not represent the population. μ = 6.9
  • 19. Single sample test of the MEAN  Evaluate the performance of a given population against a ‘standard’ from the sample results.  Z-test for a Single Sample Test of the Mean (N>30)  T-test for a Single Sample Test of the Mean (N<30)  Z-test for Single Sample Test of the Proportion
  • 20. Z = +1.96 Z = - 1.96 μ 95% CI = +/- (1.96 x SE) 2.5% Confidence interval / Standard Error 47.5% 95% 2.5% 47.5% Z = +1.96 Z = - 1.96 μ 97% CI = +/- (2.17 x SE) 2.5% 47.5% 97% 2.5% 48.50% Z = - 2.17 1.5%
  • 21.
  • 22. There was random sample of 64 Local Government Units (LGUs) selected. A standard has been set that LGUs contractual personnel paid thru their MOOE should not be more than 60 percent. In your sample, the Mean is 55 percent, and standard deviation (σ) is 12. DO YOU THINK THAT the LGUs are exceeding the standard percentage of hiring contractual employees. Ho : µ = 60 Ha : µ < 60 Z-test for single sample test of Mean (n > 30)
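  • 23. Table A. Percentage of Area under the Normal Curve: z = 3.30 | Area between Mean & z = 49.95 | Area beyond z = 0.05 • Computed z = -3.33, so only about .05% of the area lies beyond it. • DECISION CRITERIA: With α = .05 (95% in the middle, 2.5% in each tail), we REJECT the Ho if Z falls beyond +/- 1.96. • Here z = -3.33 lies beyond -1.96, so we reject the Ho.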
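The z value for the LGU example can be reproduced with a short calculation. This is a minimal sketch: the function name `one_sample_z` is hypothetical, and it assumes the standard error is σ/√n, which is what yields the slide's z = -3.33.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, standard, sd, n):
    """z = (sample mean - standard) / (sd / sqrt(n)); SE convention is an assumption."""
    return (sample_mean - standard) / (sd / sqrt(n))


# LGU example: n = 64, sample mean = 55, standard = 60, SD = 12
z = one_sample_z(55, 60, 12, 64)

reject_h0 = abs(z) > 1.96            # decision criterion at alpha = .05
area_beyond = NormalDist().cdf(z)    # area in the lower tail beyond z

print(f"z = {z:.2f}, reject Ho: {reject_h0}, area beyond z = {area_beyond:.4%}")
```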
  • 25. You drew a sample of 1,500 informal settlers in Novaliches. The PSA claims that those in the bottom 30 percent of the population spend 59.7% of their income on food. In your sample, the mean is 60.7 percent and the standard deviation (σ) is 12.4. Do you think that the informal settlers are exceeding the standard share of expenses for food?
  • 26. Table A. Percentage of Area under the Normal Curve: z = 3.12 | Area between Mean & z = 49.91 | Area beyond z = 0.09 • Computed z = 3.12, so only about .09% (or .1%) of the area lies beyond it. • DECISION CRITERIA: With α = 5% (2.5% in each tail), we REJECT the Null Hypothesis if Z falls beyond +/- 1.96. • DECISION CRITERIA: With α = 10% (5% in each tail), we REJECT the Ho if Z falls beyond +/- 1.64. • z = 3.12 lies beyond both cut-offs, so we reject the Ho at either level.
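For reference, the z value of 3.12 follows from the figures given for the Novaliches sample, assuming the usual single-sample formula with a standard error of σ/√n:

```latex
\[
z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
  = \frac{60.7 - 59.7}{12.4 / \sqrt{1500}}
  \approx \frac{1.0}{0.320}
  \approx 3.12
\]
```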
  • 28. In an IQ test conducted in class, a sample of 24 students was selected. The average IQ in the sample was 94, but the standard in the admission test was an IQ of not less than 100. The standard deviation (σ) in the sample is 12. Do you think that the IQ of the sample represents that of the entire batch? Ho : µ = 100 Ha : µ < 100 T-test for single sample test of Mean (n < 30)
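  • 29. Table A. Percentage of Area under the Normal Curve: z = 2.40 | Area between Mean & z = 49.18 | Area beyond z = 0.82 • Computed z = -2.40, so about .82% of the area lies beyond it. • DECISION CRITERIA: With α = .05 or 5% (2.5% in each tail), we REJECT the Ho if Z falls beyond +/- 1.96. • Here z = -2.40 lies beyond -1.96, so we reject the Ho. (With the t distribution and df = n - 1 = 23, the two-tailed cut-off is +/- 2.069, and the decision is the same.)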
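The value of -2.40 for the IQ example is reproduced when the standard error uses n − 1 in the denominator. That this is the convention behind the slide is an inference from the numbers, not something the slide states:

```latex
\[
t = \frac{\bar{X} - \mu}{\sigma / \sqrt{n - 1}}
  = \frac{94 - 100}{12 / \sqrt{24 - 1}}
  \approx \frac{-6}{2.502}
  \approx -2.40
\]
```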
  • 31. In measuring the Body Mass Index (BMI), you drew a sample of 20 students. The average BMI in the sample was 23.7. However, the school’s health officials claim that the standard is 18.1 (normal). The standard deviation (σ) in the sample is 8. Do you think that the BMI of the sample represents that of the entire student population? Ho : µ = 18.1 Ha : µ > 18.1 T-test for single sample test of Mean (n < 30)
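  • 32. Table A. Percentage of Area under the Normal Curve: z = 3.05 | Area between Mean & z = 49.89 | Area beyond z = 0.11 • Computed z = 3.05 (centered on the standard of 18.1), so only about .11% of the area lies beyond it. • DECISION CRITERIA: With α = 5% (2.5% in each tail), we REJECT the Null Hypothesis if Z falls beyond +/- 1.96. • DECISION CRITERIA: With α = 10% (5% in each tail), we REJECT the Ho if Z falls beyond +/- 1.67. • z = 3.05 lies beyond both cut-offs, so we reject the Ho at either level. (With the t distribution and df = n - 1 = 19, the two-tailed cut-offs are +/- 2.093 at α = 5% and +/- 1.729 at α = 10%, and the decision is the same.)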
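The BMI statistic can be recomputed as a check. The sketch below assumes the standard error is SD/√(n − 1), which is the convention that reproduces the slide's 3.05; the function name `one_sample_t` is hypothetical, and 2.093 is the standard two-tailed t cut-off for df = 19 at α = .05.

```python
from math import sqrt


def one_sample_t(sample_mean, standard, sd, n):
    """t = (sample mean - standard) / (sd / sqrt(n - 1)); the n - 1 is an assumed convention."""
    return (sample_mean - standard) / (sd / sqrt(n - 1))


# BMI example: n = 20, sample mean = 23.7, standard = 18.1, SD = 8
t_bmi = one_sample_t(23.7, 18.1, 8, 20)

T_CRITICAL_DF19 = 2.093  # two-tailed, alpha = .05, df = 19
reject_h0 = abs(t_bmi) > T_CRITICAL_DF19

print(f"t = {t_bmi:.2f}, reject Ho: {reject_h0}")
```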
  • 34. You polled 2,000 voters in Manila and found that, of two candidates for office, one will obtain 54 percent of the votes sampled. Do you think he will win? Ho : Pµ = .50 Ha : Pµ > .50 Z-test for single sample test of PROPORTION (n > 30)
  • 35. Ho : Pµ = .50 Ha : Pµ > .50 • Table A. Percentage of Area under the Normal Curve: z = 3.60 | Area between Mean & z = 49.98 | Area beyond z = 0.02 • Computed z = 3.60, so only about .02% of the area lies beyond it. • DECISION CRITERIA: With α = 5% (2.5% in each tail), we REJECT the Null Hypothesis if Z falls beyond +/- 1.96. • DECISION CRITERIA: With α = 10% (5% in each tail), we REJECT the Ho if Z falls beyond +/- 1.64. • z = 3.60 lies beyond both cut-offs, so we reject the Ho at either level.
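A single-sample test of a proportion can be sketched as follows. The function name `one_sample_proportion_z` is hypothetical, and the sketch assumes the standard error is computed from the hypothesised proportion, sqrt(P(1 − P)/n). Without intermediate rounding this gives about 3.58; the slide's 3.60 is what results if the standard error is first rounded to .0111, which is presumably how it was obtained.

```python
from math import sqrt


def one_sample_proportion_z(sample_p, hypothesised_p, n):
    """z = (p - P) / sqrt(P * (1 - P) / n), with SE based on the hypothesised P."""
    se = sqrt(hypothesised_p * (1 - hypothesised_p) / n)
    return (sample_p - hypothesised_p) / se


# Poll example: 2,000 voters, 54 percent for the candidate, Ho: P = .50
z = one_sample_proportion_z(0.54, 0.50, 2000)
reject_h0 = z > 1.96

print(f"z = {z:.2f}, reject Ho: {reject_h0}")
```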
  • 37. In the City of San Ignacio, the population mean income (μ) is P18,229. To check this, we drew 3 sample sets, with sample means of P24,611, P23,667 and P23,056. We now want to know the ‘probability’, or likelihood, that sample means like ours would occur if the population mean is as claimed. The σ is P11,607. Go to 101_Sampling & Population_workshop_Sheet_Income_2.
  • 38. n = 65 | POPULATION MEAN = 18,229 | POPULATION SD = 11,607 | SE = 11,607 / sqrt (65 - 1) = 1,451
    Z SCORE SET A (sample mean 24,611): (24,611 - 18,229) / 1,451 = 6,382 / 1,451 = 4.40
    Z SCORE SET B (sample mean 23,667): (23,667 - 18,229) / 1,451 = 5,438 / 1,451 = 3.75
    Z SCORE SET C (sample mean 23,056): (23,056 - 18,229) / 1,451 = 4,827 / 1,451 = 3.33
    Table A. Percentage of Area under the Normal Curve: (a) Z score | (b) Area between mean & Z | (c) Area beyond Z
    4.40 | 49.997 | 0.003
    3.75 | 49.990 | 0.010
    3.33 | 49.950 | 0.050
    For Z = 3.33, 49.95% of the area lies between the mean (P18,229) and Z, and only 0.05% lies beyond it; Z = 3.75 and Z = 4.40 lie even farther out. At the 5% level the cut-off is +/- 1.96 (2.50% in each tail).
    Define the decision criteria if the margin of error is 5%, 1% and 10%.
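The three z scores on the workshop sheet can be recomputed in a few lines. This sketch follows the sheet's own convention of dividing the population SD by sqrt(n - 1) and uses the population mean of 18,229 shown on the sheet; the variable names and the set labels A to C are illustrative.

```python
from math import sqrt

POPULATION_MEAN = 18_229
POPULATION_SD = 11_607
N = 65

# Standard error as computed on the sheet: SD / sqrt(n - 1)
standard_error = POPULATION_SD / sqrt(N - 1)

sample_means = {"A": 24_611, "B": 23_667, "C": 23_056}
z_scores = {
    label: round((mean - POPULATION_MEAN) / standard_error, 2)
    for label, mean in sample_means.items()
}

print(f"SE = {standard_error:,.0f}")
for label, z in z_scores.items():
    print(f"Set {label}: z = {z:.2f}")
```

All three z scores lie well beyond 1.96, so each sample mean would be very unlikely if the population mean really were P18,229.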
  • 39. Rule of thumb The sample mean tends to be close to the population mean, and the accuracy increases as the sample size increases. (1) If the DIFFERENCE BETWEEN MEANS lies far from 0 (the mean of the differences), it has a small probability of occurring by chance, thus we REJECT the NULL HYPOTHESIS and ACCEPT the ALTERNATIVE HYPOTHESIS. The difference is statistically significant and too large to ignore. (2) If it is close to 0, the probability is large, ergo we RETAIN the NULL HYPOTHESIS. Why? The difference between means might be the result of sampling error.
  • 40. Level of significance When do we reject someone after giving him so many chances to prove his worth? Oftentimes, we reject a loved one when the person crosses a line that is critically significant to us. This is the concept of the Level of Significance, denoted as α = .05 or 5%. α = .05 is the probability level below which the Null Hypothesis is rejected (when the result crosses the line) and the ALTERNATIVE HYPOTHESIS is accepted.
  • 41. Null vs Alternative Hypothesis • Null hypothesis (H0) means that (μA = μB): “THERE IS NO DIFFERENCE BETWEEN THE POPULATION MEAN & SAMPLE MEAN. If there are any differences, discrepancies, or suspiciously outlying results, they are purely due to sampling errors.” • Alternative hypothesis (Ha) means that (μA ≠ μB): “THERE IS A BIG DIFFERENCE”, that is, the difference between the population and the sample is too large to ignore and is statistically significant.
  • 42. • If the Probability < .05, we reject the NULL, since the probability is too small (less than 5 chances out of 100) that the difference is a result of sampling error. • P value < .05, WE REJECT • In the same way, a .05 level of significance is associated with a z score of 1.96 in either tail of the normal curve. In other words, 95% of the differences between means fall between -1.96 σ(X̅1 - X̅2) and +1.96 σ(X̅1 - X̅2). Only 5% fall outside the cut-off. These shaded regions are the CRITICAL OR REJECTION REGIONS. • This tells us that 95% (47.50% + 47.50%) of the differences between sample means lie within 1.96 standard deviations of the mean (0). Only 5% fall at or beyond this point (2.5% + 2.5% = 5%), split between the two tails at Z = +/- 1.96.
  • 43. We can set an even more conservative or stringent level of significance, whereby we reject the NULL HYPOTHESIS only if the probability is less than 1 chance out of 100. This is α = .01, with z = 2.58. TABLE A. Percentage of Area Under the Normal Curve: (a) z score | (b) Area between Mean and z | (c) Area beyond z • 1.96 | 47.50 | 2.50 • 2.58 | 49.51 | 0.49 • At Z = +/- 1.96, 47.5% of the area lies on each side of the mean (95% in total), with 2.50% in each tail; at Z = +/- 2.58, about 49.5% lies on each side, with about 0.5% in each tail.
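The link between a level of significance and its z cut-off can be sketched with Python's standard library. The function name `two_tailed_critical_z` is hypothetical; `NormalDist().inv_cdf` returns the z score below which a given share of the standard normal curve lies.

```python
from statistics import NormalDist


def two_tailed_critical_z(alpha):
    """z cut-off that leaves alpha / 2 of the area in each tail of the normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f} -> reject Ho beyond +/- {two_tailed_critical_z(alpha):.2f}")
```

A smaller α pushes the cut-off farther from the mean, so a larger difference is needed before the Null Hypothesis is rejected.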
  • 44. Types of errors • Whenever we decide whether to reject the NULL HYPOTHESIS, we open ourselves to two kinds of errors. • A Type 1 error is when we reject the null although it is true. If our level of significance (α) is .01, there is 1 chance out of 100 of making this wrong decision. • A Type 2 error is when we retain the null although it is actually false. To reduce this risk, we increase the sample size so that the population is better represented. • Let us go back to the level of significance (α = .05). The P (probability) is computed from the data actually drawn: it is the probability of getting a result at least as extreme as ours if the NULL HYPOTHESIS were true. As researchers, we set the level of significance as a threshold below which the NULL HYPOTHESIS is rejected, since the probability is so small. • The alpha value (α) is the size of the tail region under the curve that makes us reject the Null Hypothesis.
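The meaning of a Type 1 error rate can be illustrated by simulation: when the Null Hypothesis is true, a test at α = .05 should still reject it in roughly 5 out of 100 samples. Everything in this sketch is an assumption made for illustration (a standard normal population, samples of 30, 4,000 trials, the helper name `type1_error_rate`).

```python
import random
from math import sqrt


def type1_error_rate(critical_z=1.96, n=30, trials=4000, seed=1):
    """Share of samples that reject a TRUE null (population mean 0, SD 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n - 0.0) / (1.0 / sqrt(n))
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials


rate = type1_error_rate()
print(f"share of true nulls rejected at alpha = .05: {rate:.3f}")
```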
  • 45. • In this case, although the area beyond the obtained z = 2.12 is 1.70% or .017, we chose a level of significance of 5% or .05. Thus, since P < .05, we reject the Null Hypothesis, accepting that we may commit a Type I Error. TABLE A. Percentage of Area Under the Normal Curve: (a) z score | (b) Area between Mean and z | (c) Area beyond z • 1.96 | 47.50 | 2.50 • 2.12 | 48.30 | 1.70 • At Z = +/- 1.96, 47.5% of the area lies on each side of the mean (95.0% in total), with 2.5% in each tail; at Z = +/- 2.12, 48.3% lies between the mean and Z, with 1.7% beyond it.
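The rows of Table A quoted in these slides can be regenerated from the standard normal curve. This is a minimal sketch using Python's standard library; the function name `table_a_row` is hypothetical.

```python
from statistics import NormalDist


def table_a_row(z):
    """Return (area between mean and z, area beyond z), both as percentages."""
    cumulative = NormalDist().cdf(z)
    between = (cumulative - 0.5) * 100
    beyond = (1 - cumulative) * 100
    return round(between, 2), round(beyond, 2)


for z in (1.96, 2.12, 2.58):
    between, beyond = table_a_row(z)
    print(f"z = {z:.2f} | between mean and z = {between:.2f} | beyond z = {beyond:.2f}")
```

Comparing the "area beyond z" for the obtained z with the chosen tail area (2.5% per tail at α = .05) gives the same decision as comparing z with the cut-off of 1.96.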