Section 2. Mathematics as a Tool
(Part 1 – Data Management)
A Review
GenEd Second Generation Training
Mathematics in the Modern World
Some Statistical Terminologies…
Statistics involves the collection, organization, summarization,
presentation, and interpretation of data. (Aufmann et al, 2013)
Population – refers to an entire group that is being studied.
Parameter – is a value calculated using all the data from a
population.
Census – is a survey of an entire population.
Sample – is a smaller subset of the population, ideally one that is
fairly representative of the population.
Statistic – is a value calculated using the data from the sample.
Some Statistical Terminologies…
Classifying variables
Variable – a particular characteristic or trait of the units of the
population that can take on different values.
Qualitative – when a characteristic can be placed into well-defined
groups or categories.
Quantitative – when a characteristic is expressed in numerical
value.
Discrete – the domain is at most countable.
Continuous – can take all possible values within a range; that
is, a measurement.
Some Statistical Terminologies…
Levels of Measurement – were first proposed by the American
psychologist Stanley Smith Stevens in 1946.
Nominal – is one in which the values of the variables are names
or labels.
Ordinal – uses numerical categories that convey a meaningful
order.
Interval – measurement shows order and the spaces between the
values also have significant meaning.
Ratio – the ratio between any two values has meaning, because
the data include an absolute zero value.
Measures of Center
Once the data are collected, it is useful to summarize the data
set by identifying a value around which the data are centered.
Mode – is the most frequently occurring number in a data set.
Median – is the middle number or the mean of the two middle
numbers in an ordered set of data.
Mean – is the numerical balancing point of the data set.
Measures of Center
Which measure of center is most useful?
• A teacher wants to know about her students’ family situations.
She asks for the number of children in their families:
6 3 2 3 4 1 2 2 4 3 1 2 2 4
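As a quick check on the teacher's data, the three measures of center can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median, mode

# Number of children reported by each of the 14 students (from the slide).
children = [6, 3, 2, 3, 4, 1, 2, 2, 4, 3, 1, 2, 2, 4]

children_mean = mean(children)      # sum of the values divided by how many there are
children_median = median(children)  # n is even, so this is the mean of the two middle values
children_mode = mode(children)      # the most frequently occurring value

print(f"mean   = {children_mean:.2f}")
print(f"median = {children_median}")
print(f"mode   = {children_mode}")
```

Since family size is a count, the mode or the median may describe a "typical" family here more naturally than a fractional mean.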
Measures of Center
Mean – Median – Mode
• The mean is easy to compute from a single formula; the median
requires ordering the data first.
• The mean is affected by outliers while the median is
resistant. In a sense, the median is able to resist the pull of a
faraway value, but the mean is drawn to such values.
• A change in any of the numbers changes the mean, and the
mean can be changed drastically by changing an extreme
value.
• In contrast, the median and the mode of a set of data are
usually not changed by changing an extreme value.
• The mean, the median, and the mode are all averages;
however, they are generally not equal.
Measures of Center
Compare the mean, the median, and the mode for the
salaries of 5 employees of a small company.
Salaries: P370,000 P60,000 P36,000 P20,000 P20,000
Mean = P101,200
Median = P 36,000
Mode = P 20,000
Most of the employees of this company would probably
agree that the median of P36,000 better represents the average of
the salaries than does either the mean or the mode.
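The salary figures can be verified with a short sketch. The second list is a hypothetical variation, not from the slides: the top salary is lowered to P70,000 to illustrate how an extreme value pulls the mean but leaves the median and mode alone.

```python
from statistics import mean, median, mode

# Salaries of the five employees, in pesos (from the slide).
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

salary_mean = mean(salaries)
salary_median = median(salaries)
salary_mode = mode(salaries)
print(salary_mean, salary_median, salary_mode)

# Hypothetical variation (an assumption for illustration only):
# the extreme salary of P370,000 is replaced by P70,000.
adjusted = [70_000, 60_000, 36_000, 20_000, 20_000]

adjusted_mean = mean(adjusted)      # drops sharply
adjusted_median = median(adjusted)  # unchanged
adjusted_mode = mode(adjusted)      # unchanged
print(adjusted_mean, adjusted_median, adjusted_mode)
```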
Measures of Center
Consider the data in the following table.
• In the first game, Barry has the best average.
• In the second game, Barry has the best average.
• If the statistics for the games are combined, Warren has the best
average.
In statistics, an example such as this is known as Simpson’s
paradox.
Measures of Center
Simpson-Yule paradox means that sometimes when you
divide data in groups, it looks different from viewing it as a
whole.
Consider the following data on the test scores for two
students.
Is this an example of Simpson’s paradox? Explain.
Student | English | History | English and History combined
Maria | 84, 65, 70, 90, 99, 84 | 89, 75, 85 | Average: ?
Sarah | 66, 84, 75, 77, 94, 96, 81 | 72, 78, 98, 81, 68, 92, 88, 86 | Average: ?
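One way to explore the question is to compute each student's average per subject and for the combined scores. The sketch below assumes the scores are grouped as follows: Maria has six English and three History scores, and Sarah has seven English and eight History scores. The dictionary layout and names are illustrative only.

```python
from statistics import mean

# Test scores as read from the slide (the grouping is an assumption).
scores = {
    "Maria": {"English": [84, 65, 70, 90, 99, 84],
              "History": [89, 75, 85]},
    "Sarah": {"English": [66, 84, 75, 77, 94, 96, 81],
              "History": [72, 78, 98, 81, 68, 92, 88, 86]},
}

def averages(student):
    """Return (per-subject averages, combined average) for one student."""
    by_subject = {subj: mean(vals) for subj, vals in scores[student].items()}
    combined = mean([v for vals in scores[student].values() for v in vals])
    return by_subject, combined

maria_subj, maria_all = averages("Maria")
sarah_subj, sarah_all = averages("Sarah")

# Simpson's paradox: one student leads in every subject,
# yet trails once the subjects are combined.
maria_leads_each = all(maria_subj[s] > sarah_subj[s] for s in maria_subj)
is_paradox = maria_leads_each and maria_all < sarah_all

print(maria_subj, round(maria_all, 2))
print(sarah_subj, round(sarah_all, 2))
print("Simpson's paradox:", is_paradox)
```

The reversal is possible because the two students have different numbers of scores in each subject, so the combined averages weight the subjects differently.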
Measures of Dispersion
Another important feature that can help us understand more
about a data set is the manner in which the data are distributed.
• Range is the difference between the largest value (maximum)
and the smallest value (minimum) in the data.
• Standard deviation is an extremely important measure of
spread that is based on the mean. It is a measure of the
average deviation of all the data points from the mean.
• Variance is the square of the standard deviation of the data. It
is expressed in squared units, not in the units of the original data.
Measure of Dispersion
Consider the following data sets, with the mean (x̄), median (x̃), range (R), and standard deviation (s) of each:
Data set | x̄ | x̃ | R | s
1. 5 5 5 5 5 5 5 5 5 5 | 5 | 5 | 0 | 0
2. 0 0 0 0 0 10 10 10 10 10 | 5 | 5 | 10 | 5.27
3. 4 4 4 5 5 5 6 6 6 | 5 | 5 | 2 | 0.87
4. 0 5 5 5 5 5 5 10 | 5 | 5 | 10 | 2.67
Measure of Dispersion
Properties that determine the usefulness of the standard
deviation:
• It is used to describe the variability of the distribution only
when the mean is used to describe the center.
• It is equal to zero when there is no variability. This happens
only when all observations are of the same value.
• It has the same units of measurement as the original
observations.
• Like the mean, it can be influenced by outliers.
for population: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{n}}$
for sample: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
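The two formulas can be tried on the four data sets listed earlier in this section. In the sketch below, `stdev` from Python's `statistics` module uses the sample formula (divisor n − 1) and `pstdev` uses the population formula (divisor n). Treating each data set as a sample is an assumption, but it reproduces the s values on the slide.

```python
from statistics import mean, median, pstdev, stdev

# The four data sets from the "Measure of Dispersion" slide.
data_sets = [
    [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    [4, 4, 4, 5, 5, 5, 6, 6, 6],
    [0, 5, 5, 5, 5, 5, 5, 10],
]

summary = []
for data in data_sets:
    summary.append({
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "s": round(stdev(data), 2),        # sample formula, divisor n - 1
        "sigma": round(pstdev(data), 2),   # population formula, divisor n
    })

for number, row in enumerate(summary, start=1):
    print(number, row)
```

All four sets share the same mean and median, so only the range and standard deviation reveal how differently the values are spread.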
Measures of Relative Position
z-score
The z-score for a given data value x is the number of standard
deviations that x is above or below the mean of the data.
z-score of x_i in a population: $z_{x_i} = \dfrac{x_i - \mu}{\sigma}$
z-score of x_i in a sample: $z_{x_i} = \dfrac{x_i - \bar{x}}{s}$
Measure of Relative Position
Percentiles and Quartiles
are useful when you want to know where a score is located
in reference to the other scores.
• A percentile is a data value for which the specified percentage
of the data is below that value.
• The median is the 50th percentile.
• The 25th, 50th, and 75th percentiles divide the data into lower
quartile Q1, middle quartile Q2, and upper quartile Q3,
respectively.
• In using quartiles, there are five numbers to be used
altogether: min value, Q1, median, Q3, and max value.
• Quartiles are useful for box plots.
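A five-number summary can be sketched with `statistics.quantiles`. Textbooks and software use slightly different conventions for locating quartiles, so the default "exclusive" method here is an assumption and may not match a hand method exactly. The data are the family sizes from the measures-of-center example.

```python
from statistics import quantiles

# Number of children per family, reused from the measures-of-center example.
children = [6, 3, 2, 3, 4, 1, 2, 2, 4, 3, 1, 2, 2, 4]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = quantiles(children, n=4)

five_number_summary = (min(children), q1, q2, q3, max(children))
print(five_number_summary)
```

These five values are what a box plot draws: the box spans Q1 to Q3, with a line at the median and whiskers toward the minimum and maximum.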
Problem. (Task: Discuss your solutions to each of the 3 problems)
1. The mean time to download a file is 12 minutes with a std.
deviation of 4 minutes. Your download time is 20 minutes.
Your friend’s download time is 6 minutes. How can you
compare your download time with your friend’s?
2. Raul takes 2 tests in Chemistry. He scored 72 in long test 1 for
which the class mean score was 65 with std. deviation of 8. He
received a score of 60 in long test 2 for which the mean was 45
and the std. deviation was 12. In comparison to other students,
did Raul do better in LT 1 or LT 2?
3. A consumer group tested a sample of 100 light bulbs. The
mean life expectancy of the light bulbs was 842 hours with std.
deviation of 90 hours. One particular light bulb from the
company has a z-score life expectancy of 1.2. What was the
life span of the bulb?
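The three problems all rest on the z-score formula, so a small helper is enough to check a worked solution. This is one possible approach, not the only one; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Problem 1: download times, mean 12 minutes, standard deviation 4 minutes.
z_you = z_score(20, 12, 4)
z_friend = z_score(6, 12, 4)

# Problem 2: Raul's two long tests, each measured against its own class.
z_lt1 = z_score(72, 65, 8)
z_lt2 = z_score(60, 45, 12)
better_test = "LT 2" if z_lt2 > z_lt1 else "LT 1"

# Problem 3: solve z = (x - mean) / sd for x, given z = 1.2.
bulb_life = 842 + 1.2 * 90

print(z_you, z_friend)
print(z_lt1, z_lt2, better_test)
print(bulb_life)
```

A larger z-score means a position further above the mean, which is why scores from different tests become comparable once they are standardized.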
Normal Distribution and Probability
Normal Distribution
is an extremely important concept, because it occurs so often
in the data we collect from the natural world, as well as in many
of the more theoretical ideas that are the foundation of statistics.
Normal Distribution and Probability
Characteristics of a Normal Distribution
Shape
A normal distribution is a perfectly symmetric, mound-shaped
distribution. It is commonly referred to as a normal curve, or
bell curve.
Normal Distribution and Probability
Characteristics of a Normal Distribution
Center
Due to the exact symmetry of a normal curve, the center of a
normal distribution is located at the highest point of the
distribution, and all the statistical measures of center are equal.
Normal Distribution and Probability
Characteristics of a Normal Distribution
Center
It is also important to realize that this center peak divides the
data into two equal parts.
Normal Distribution and Probability
Characteristics of a Normal Distribution
Spread
In an idealized normal distribution of a continuous random
variable, the distribution continues infinitely in both directions.
Normal Distribution and Probability
Characteristics of a Normal Distribution
Area under the Curve
• Areas under the curve that are symmetric about the mean are
equal.
• The total area under the curve is 1.
Normal Distribution and Probability
Empirical Rule for a Normal Distribution
In a normal distribution, approximately
• 68% of the data lie within 1 standard deviation of the mean.
• 95% of the data lie within 2 standard deviations of the mean.
• 99.7% of the data lie within 3 standard deviations of the mean.
Normal Distribution and Probability
Empirical Rule for a Normal Distribution
Example. The heights of a large group of people are assumed to be
normally distributed. Their mean height is 66.5 inches, and the
standard deviation is 2.4 inches. Find and interpret the
intervals within one, two, and three standard deviations
of the mean.
Within one standard deviation of the mean: 66.5 − 2.4 = 64.1 and 66.5 + 2.4 = 68.9.
Approximately 68% of the people are between 64.1 and 68.9 inches tall.
Within two standard deviations of the mean: 66.5 − 2(2.4) = 61.7 and 66.5 + 2(2.4) = 71.3.
Approximately 95% of the people are between 61.7 and 71.3 inches tall.
Within three standard deviations of the mean: 66.5 − 3(2.4) = 59.3 and 66.5 + 3(2.4) = 73.7.
Nearly all of the people (99.7%) are between 59.3 and 73.7 inches tall.
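The interval endpoints in this example follow directly from mean ± k standard deviations, which the following sketch reproduces. The variable names are illustrative.

```python
mean_height = 66.5  # inches
sd_height = 2.4     # inches

# Empirical-rule percentages for k = 1, 2, 3 standard deviations.
rule = {1: 68, 2: 95, 3: 99.7}

intervals = []
for k, percent in rule.items():
    low = round(mean_height - k * sd_height, 1)
    high = round(mean_height + k * sd_height, 1)
    intervals.append((low, high))
    print(f"about {percent}% of heights lie between {low} and {high} inches")
```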
Problem. (Use the Empirical rule)
A vegetable distributor knows that during the month of August, the
weights of its tomatoes are normally distributed with a mean of 0.61 kg and a
standard deviation of 0.15 kg.
a. What percent of the tomatoes weigh less than 0.76 kg?
b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to
weigh more than 0.31 kg?
c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to
weigh from 0.31 kg to 0.91 kg?
a. 0.76 kg is 1 standard deviation above the mean of 0.61 kg. In a normal distribution, 34% of all data
lie between the mean and 1 standard deviation above the mean, and 50% of all data lie below the
mean. Thus, 34% + 50% = 84% of the tomatoes weigh less than 0.76 kg.
b. 0.31 kg is 2 standard deviations below the mean of 0.61 kg. In a normal distribution, 47.5% of all
data lie between the mean and 2 standard deviations below the mean, and 50% of all data lie above
the mean. This gives a total of 47.5% + 50% = 97.5% of the tomatoes that weigh more than 0.31 kg.
Therefore 97.5% of 6000 = 5850 of the tomatoes can be expected to weigh more than 0.31 kg.
c. 0.31 kg is 2 standard deviations below the mean of 0.61 kg and 0.91 kg is 2 standard deviations
above the mean of 0.61 kg. In a normal distribution, 95% of all data lie within 2 standard deviations
of the mean. Therefore 95% of 4500 = 4275 of the tomatoes can be expected to weigh from 0.31 kg
to 0.91 kg.
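The Empirical Rule answers can be compared with the exact normal-curve areas using `statistics.NormalDist` (Python 3.8 or later). This comparison is an addition for illustration; the slides use only the rounded percentages.

```python
from statistics import NormalDist

weights = NormalDist(mu=0.61, sigma=0.15)  # tomato weights in kg

# Empirical Rule (rounded percentages, as in the worked solution).
empirical_a = 0.50 + 0.34            # below mean + 1 sd
empirical_b = 0.50 + 0.475           # above mean - 2 sd
empirical_c = 0.95                   # within 2 sd of the mean
expected_b = round(empirical_b * 6000)
expected_c = round(empirical_c * 4500)

# Exact areas under the normal curve for the same three questions.
exact_a = weights.cdf(0.76)
exact_b = 1 - weights.cdf(0.31)
exact_c = weights.cdf(0.91) - weights.cdf(0.31)

print(f"a. empirical {empirical_a:.3f}  exact {exact_a:.4f}")
print(f"b. empirical {empirical_b:.3f}  exact {exact_b:.4f}  tomatoes: {expected_b}")
print(f"c. empirical {empirical_c:.3f}  exact {exact_c:.4f}  tomatoes: {expected_c}")
```

The exact areas differ slightly from the rule's percentages because 68%, 95%, and 99.7% are themselves rounded values.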
Normal Distribution and Probability
If the original distribution of x values is a normal distribution, then the
corresponding distribution of z-scores will also be a normal distribution.
This normal distribution of z-scores is called the standard normal
distribution.
Standard Normal Distribution
The standard normal distribution is the normal distribution
that has a mean of 0 and a standard deviation of 1.
Normal Distribution and Probability
Standard Normal Distribution
In the standard normal distribution, the area of the distribution
from z = a to z = b represents
• the percentage of z-values that lie in the interval from a to b.
• the probability that z lies in the interval from a to b.
Problem
1. A soda machine dispenses soda into 12-ounce cups. Tests show
that the actual amount of soda dispensed is normally distributed,
with a mean of 11.5 oz and a standard deviation of 0.2 oz.
a. What percent of cups will receive less than 11.25 oz of soda?
b. What percent of cups will receive between 11.2 oz and 11.55 oz
of soda?
c. If a cup is chosen at random, what is the probability that the
machine will overflow the cup?
2. The OnTheGo company manufactures laptop computers. A study
indicates that the life spans of their computers are normally
distributed, with a mean of 4.0 years and a standard deviation of
1.2 years. How long should the company warrant its computers if
the company wishes less than 4% of its computers to fail during
the warranty period?
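These problems call for areas that do not fall on whole standard deviations, so a table of the standard normal distribution or software is needed. One possible sketch uses `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Problem 1: amount of soda dispensed, in ounces.
soda = NormalDist(mu=11.5, sigma=0.2)
p_less_than_11_25 = soda.cdf(11.25)                # part a
p_between = soda.cdf(11.55) - soda.cdf(11.2)       # part b
p_overflow = 1 - soda.cdf(12)                      # part c: more than the 12-oz cup holds

# Problem 2: laptop life spans, in years.
laptops = NormalDist(mu=4.0, sigma=1.2)
# Life span below which 4% of laptops fail; the warranty must be
# shorter than this for fewer than 4% to fail under warranty.
warranty_cutoff = laptops.inv_cdf(0.04)

print(f"a. {p_less_than_11_25:.4f}")
print(f"b. {p_between:.4f}")
print(f"c. {p_overflow:.4f}")
print(f"warranty cutoff: {warranty_cutoff:.2f} years")
```

`cdf` gives the area to the left of a value, while `inv_cdf` works backward from an area to the value, which is what the warranty question requires.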
Statistical Hypotheses
A hypothesis is simply a conjecture about a characteristic or
set of facts.
When performing statistical analyses, our hypotheses provide
the general framework of what we are testing and how to perform
the test.
Hypothesis testing involves testing the difference between a
hypothesized value of a population parameter and the estimate of
that parameter which is calculated from a sample.
Statistical Hypotheses
Overview of the Process
The hypothesis to be tested is called the null hypothesis and is
given the symbol H0. The alternative hypothesis is given the
symbol H1.
Statistical Hypotheses
Sample null and alternative hypotheses
If H1 contains > or <, the test is referred to as a one-sided test.
If H1 contains ≠, it is a two-sided test.
1. H0 : μ = μ0
(Mean is equal to a reference value)
H1 : μ ≠ μ0 or
H1 : μ > μ0 or H1 : μ < μ0
2. H0 : μ1 = μ2
(Two population means are equal)
H1 : μ1 ≠ μ2 or
H1 : μ1 > μ2 or H1 : μ1 < μ2
3. H0 : μ1 = μ2 = . . . = μk
(The k population means are equal)
H1 : at least two means are not equal
4. H0 : π1 = π2
(Two population proportions are
equal)
H1 : π1 ≠ π2 or
H1 : π1 > π2 or H1 : π1 < π2
5. H0 : ρ = 0
(There is no linear correlation
between the two variables)
H1 : ρ ≠ 0 or
H1 : ρ > 0 or H1 : ρ < 0
Statistical Hypotheses
Tests Concerning the Mean
To test whether an observed difference between a population
mean and a reference value, or between two means, is
significant or can be attributed to chance, the following
statistical tests are used.
The z–test is used if the population standard deviation is
known; if it is not, the sample standard deviation can be used as an
estimate of the population standard deviation provided that the
sample size is large; that is, n ≥ 30.
The t–test is used if the sample size is less than 30 and only the
sample standard deviation is known.
Statistical Hypotheses
Tests Concerning the Mean
The purpose of analysis of variance (ANOVA) is much the
same as that of the t–tests; however, if a series of several t–tests is
used to evaluate several mean differences, the risk of a Type I error
increases; that is, the α-levels accumulate over a series of tests so
that the final experimentwise α-level can be quite large.
ANOVA is necessary to protect researchers from
excessive risk of a Type I error.
ANOVA allows researchers to evaluate all of the mean
differences in a single hypothesis test using a single α-level and,
thereby, keeps the risk of a Type I error under control no matter
how many different means are being compared.
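To see how quickly the α-levels accumulate, the sketch below estimates the experimentwise Type I error rate when every pair of group means is compared with its own t-test. It assumes the tests are independent, which is a simplification (pairwise tests on the same groups are not), so the figures are rough indications only. The function name is illustrative.

```python
from math import comb

def experimentwise_alpha(alpha, n_tests):
    """Chance of at least one Type I error in n_tests independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

alpha = 0.05
rates = {}
for k_groups in (3, 4, 5):
    n_tests = comb(k_groups, 2)  # number of pairwise comparisons among k groups
    rates[k_groups] = experimentwise_alpha(alpha, n_tests)
    print(f"{k_groups} groups -> {n_tests} t-tests -> "
          f"experimentwise alpha about {rates[k_groups]:.3f}")
```

Under this assumption, even three groups push the overall risk well above the nominal 0.05, which is the motivation for a single ANOVA test.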
Statistical Hypotheses
Tests Concerning the Mean
ANOVA tests the homogeneity of a set of means, but if
the null hypothesis is rejected in favor of the alternative
hypothesis that the means are not all equal, further (post hoc)
tests should be done to determine which pairs of means are
significantly different.
The following Post Hoc Tests are available in most statistical
software:
1. Duncan’s multiple range test
2. Tukey’s procedure
3. Scheffe test
4. Fisher’s least significant difference
Linear Regression and Correlation
Correlation measures the relationship between bivariate data.
Bivariate data are data sets in which each subject has two
observations associated with it.
A response variable measures an outcome or result of a study.
An explanatory variable is a variable that we think explains
or causes changes in the response variables.
Linear regression is an approach for modeling the relationship
between a dependent variable (outcome) and one or more
explanatory variables. The case of one explanatory variable is
called simple linear regression.
Linear Regression and Correlation
Scatterplot is a graph of plotted points showing the relationship
between two numerical variables.
Linear Regression and Correlation
Examining a Scatterplot
1. Describe the overall pattern of a scatterplot by the form,
direction, and strength of the relationship.
2. Then look for any striking deviations from the pattern. Identify
each occurrence of an outlier.
Linear Regression and Correlation
Linear Regression
– involves using data to calculate a line that best fits that data
and then using that line to predict scores.
Least-Squares Regression Line
– is the line that minimizes the sum of the squares of the
vertical deviations from each data point to the line.
The equation of the least-squares line is $\hat{y} = ax + b$, where
$a = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$ and $b = \bar{y} - a\bar{x}$
Linear Regression and Correlation
Linear Correlation Coefficient
– determines the strength of a linear relationship between two
variables and is denoted by the variable r.
If the linear correlation coefficient r is positive, the
relationship between the variables has a positive correlation. In
this case, if one variable increases, the other variable also tends to
increase.
If r is negative, the linear relationship between the
variables has a negative correlation. In this case, if one variable
increases, the other variable tends to decrease.
$r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$
Linear Regression and Correlation
Happiness vs Life Expectancy
Source: CHED GenEd 1st Generation Training
What is the equation of the least-squares regression line?
Country Happiness Life Expectancy
Japan 6.8 80.80
South Korea 6.2 74.20
China 6.3 70.40
Taiwan 6.2 76.40
Indonesia 6.6 78.00
Philippines 6.4 69.00
Singapore 6.8 77.60
Vietnam 6.1 69.40
India 6.2 63.00
Bangladesh 5.7 59.50
Linear Regression and Correlation
Happiness vs Life Expectancy
a = 16.661
b = −33.635
ŷ = 16.661x − 33.635
Will the line give accurate predictions?
Correlation Coefficient r = 0.82
Predict the life expectancy for the following countries:
Country Happiness Actual LE
a) Zimbabwe 4.2 35.40
b) Ghana 5.4 57.90
c) Belarus 6.1 68.60
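The slope, intercept, and correlation coefficient quoted here can be reproduced from the ten-country table with the summation formulas for a, b, and r. The sketch below is one way to do so; the function name `least_squares` is illustrative.

```python
from math import sqrt

# Happiness (x) and life expectancy (y) for the ten countries in the table.
happiness = [6.8, 6.2, 6.3, 6.2, 6.6, 6.4, 6.8, 6.1, 6.2, 5.7]
life_exp = [80.80, 74.20, 70.40, 76.40, 78.00, 69.00, 77.60, 69.40, 63.00, 59.50]

def least_squares(xs, ys):
    """Return slope a, intercept b, and correlation r of the least-squares line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - a * sx / n
    r = (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))
    return a, b, r

a, b, r = least_squares(happiness, life_exp)
print(f"y-hat = {a:.3f}x + ({b:.3f}),  r = {r:.2f}")

# Predicted versus actual life expectancy for the three test countries.
test_countries = {"Zimbabwe": (4.2, 35.40), "Ghana": (5.4, 57.90), "Belarus": (6.1, 68.60)}
predictions = {}
for name, (x, actual) in test_countries.items():
    predictions[name] = a * x + b
    print(f"{name}: predicted {predictions[name]:.2f}, actual {actual:.2f}")
```

Note that the happiness scores for Zimbabwe and Ghana fall below the range of the fitted data (5.7 to 6.8), so those predictions are extrapolations and deserve extra caution.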
Linear Regression and Correlation
Example. Unemployment and family income are undoubtedly related; we
would assume that as the annual unemployment rate increases,
average annual family income would decrease. The table that follows gives
the annual unemployment rate and the average annual family income for
the Philippines according to regions from the Philippine Statistics
Authority.
a. Use linear regression to predict the average annual family income of
the Philippines if the annual unemployment rate is 6.3%.
b. Use linear regression to predict the annual unemployment rate if the
average annual family income of the Philippines is P267,000.
c. Are the predictions in parts (a) and (b) reliable? Why or why not?
Linear Regression and Correlation
Region | Annual Unemployment Rate (%) | Ave. Annual Family Income (in P100,000)
NCR 8.5 4.25
Cordillera 4.8 2.82
I - Ilocos Region 8.4 2.38
II - Cagayan Valley 3.2 2.37
III - Central Luzon 7.8 2.99
IVA - CALABARZON 8.0 3.12
IVB - MIMAROPA 3.3 2.22
V - Bicol Region 5.6 1.87
VI - Western Visayas 5.4 2.26
VII - Central Visayas 5.9 2.39
VIII- Eastern Visayas 5.4 1.97
IX - Zamboanga Peninsula 3.5 1.90
X - Northern Mindanao 5.6 2.21
XI - Davao Region 5.8 2.47
XII - SOCCSKSARGEN 3.5 1.88
Caraga 5.7 1.98
ARMM 3.5 1.39
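A possible starting point for part (a) is sketched below. It assumes that the income column is in units of P100,000 (so 4.25 means P425,000) and fits income on unemployment rate with the least-squares summation formulas. The names are illustrative. For part (b), one option is to swap the roles of the two variables and fit unemployment rate on income.

```python
from math import sqrt

# (unemployment rate in %, average annual family income in P100,000) per region.
regions = [
    (8.5, 4.25), (4.8, 2.82), (8.4, 2.38), (3.2, 2.37), (7.8, 2.99), (8.0, 3.12),
    (3.3, 2.22), (5.6, 1.87), (5.4, 2.26), (5.9, 2.39), (5.4, 1.97), (3.5, 1.90),
    (5.6, 2.21), (5.8, 2.47), (3.5, 1.88), (5.7, 1.98), (3.5, 1.39),
]
rates = [rate for rate, _ in regions]
incomes = [income for _, income in regions]

n = len(regions)
sx, sy = sum(rates), sum(incomes)
sxy = sum(x * y for x, y in regions)
sxx = sum(x * x for x in rates)
syy = sum(y * y for y in incomes)

slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
intercept = sy / n - slope * sx / n
r = (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

# Part (a): predicted income when the unemployment rate is 6.3%.
predicted_income = slope * 6.3 + intercept

print(f"y-hat = {slope:.4f}x + {intercept:.4f},  r = {r:.2f}")
print(f"predicted income at 6.3%: about P{predicted_income * 100_000:,.0f}")
```

The sign of the slope and the size of r are worth examining when answering part (c), since they bear on whether the assumed relationship actually holds in these data.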
References:
Aufmann et al. (2013). Mathematical Excursions, 3ed. Brooks/Cole, Cengage
Learning.
Bluman, A. G. (2012). Elementary statistics: a step by step approach, 8ed. New
York: McGraw-Hill.
COMAP, Inc. (2013). For all practical purposes: mathematical literacy in
today’s world. New York: W.H. Freeman and Company.
Johnson & Mowry (2012). Mathematics: a practical odyssey. Brooks/Cole,
Cengage Learning.
Lawsky et al. (2014). CK-12 advanced probability and statistics, 2ed. CK-12
Foundation.
Nocon, R. & Nocon, E. (2016). Essential mathematics for the modern world.
QC: C & E Publishing, Inc.
Vistru-Yu, C. and Gozon, A. (2016). Statistics a review ppt. CHED’s GE First
Generation Training.

GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

Statistics (GE 4 CLASS).pptx

  • 1. Section 2. Mathematics as a Tool (Part 1 – Data Management) A Review GenEd Second Generation Training Mathematics in the Modern World
  • 2. Some Statistical Terminologies… Statistics involves the collection, organization, summarization, presentation, and interpretation of data. (Aufmann et al, 2013) Population – refers to an entire group that is being studied. Parameter – is a value calculated using all the data from a population. Census – is a survey of an entire population. Sample – is a smaller subset of the population, ideally one that is fairly representative of the population. Statistic – is a value calculated using the data from the sample.
  • 3. Some Statistical Terminologies… Classifying variables Variable – a particular characteristic or trait of the units of the population that can take on different values. Qualitative – when a characteristic can be placed into well-defined groups or categories. Quantitative – when a characteristic is expressed as a numerical value. Discrete – the set of possible values is at most countable. Continuous – can take all possible values within a range; that is, a measurement.
  • 4. Some Statistical Terminologies… Levels of Measurement – was first proposed by the American psychologist Stanley Smith Stevens in 1946. Nominal – is one in which the values of the variables are names or labels. Ordinal – uses numerical categories that convey a meaningful order. Interval – measurement shows order and the spaces between the values also have significant meaning. Ratio – the ratio between any two values has meaning, because the data include an absolute zero value.
  • 5. Measures of Center Once the data are collected, it is useful to summarize the data set by identifying a value around which the data are centered. Mode – is the most frequently occurring number in a data set. Median – is the middle number or the mean of the two middle numbers in an ordered set of data. Mean – is the numerical balancing point of the data set.
  • 6. Measures of Center Which measure of center is most useful? A teacher wants to know about her students' family situations. She asks for the number of children in each of their families: 6 3 2 3 4 1 2 2 4 3 1 2 2 4
  • 7. Measures of Center Mean – Median – Mode
      - The mean is easy to compute. You only deal with one number. It is not so with the median.
      - The mean is affected by outliers while the median is resistant. In a sense, the median is able to resist the pull of a far away value, but the mean is drawn to such values.
      - A change in any of the numbers changes the mean, and the mean can be changed drastically by changing an extreme value.
      - In contrast, the median and the mode of a set of data are usually not changed by changing an extreme value.
      - The mean, the median, and the mode are all averages; however, they are generally not equal.
  • 8. Measures of Center Compare the mean, the median, and the mode for the salaries of 5 employees of a small company. Salaries: P370,000 P60,000 P36,000 P20,000 P20,000 Mean = P101,200 Median = P 36,000 Mode = P 20,000 Most of the employees of this company would probably agree that the median of P36,000 better represents the average of the salaries than does either the mean or the mode.
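The salary comparison can be checked with a minimal Python sketch using only the standard library. The variable names are illustrative, and the "without the top salary" comparison is an added illustration of the outlier effect, not part of the slide.

```python
from statistics import mean, median, mode

# Salaries of the 5 employees, in pesos (values from the slide).
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]

print("mean  :", mean(salaries))
print("median:", median(salaries))
print("mode  :", mode(salaries))

# Illustration only: drop the single extreme salary to see which
# measures of center are pulled by it.
without_top = [s for s in salaries if s != max(salaries)]
print("mean without top salary  :", mean(without_top))
print("median without top salary:", median(without_top))
```

Removing the one large salary moves the mean far more than it moves the median, which is the sense in which the median is resistant to outliers.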
  • 9. Measures of Center Consider the data in the following table.
      - In the first game, Barry has the best average.
      - In the second game, Barry has the best average.
      - If the statistics for the games are combined, Warren has the best average.
      In statistics, an example such as this is known as Simpson's paradox.
  • 10. Measures of Center The Simpson-Yule paradox means that data divided into groups can sometimes look different from the same data viewed as a whole. Consider the following data on the test scores for two students. Is this an example of Simpson's paradox? Explain.
      Student | English                    | History                        | English and History combined
      Maria   | 84, 65, 70, 90, 99, 84     | 89, 75, 85                     | Average: ?
      Sarah   | 66, 84, 75, 77, 94, 96, 81 | 72, 78, 98, 81, 68, 92, 88, 86 | Average: ?
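One way to explore the question is to compute the three averages for each student. The sketch below assumes the scores split into English and History columns as read from the slide (six and three scores for Maria, seven and eight for Sarah); the dictionary layout and the `averages` helper are illustrative names.

```python
from statistics import mean

# Test scores as read from the slide (column split is an assumption).
scores = {
    "Maria": {"English": [84, 65, 70, 90, 99, 84], "History": [89, 75, 85]},
    "Sarah": {"English": [66, 84, 75, 77, 94, 96, 81],
              "History": [72, 78, 98, 81, 68, 92, 88, 86]},
}

def averages(student):
    """Return (English average, History average, combined average)."""
    english = scores[student]["English"]
    history = scores[student]["History"]
    return mean(english), mean(history), mean(english + history)

for student in scores:
    eng, hist, combined = averages(student)
    print(f"{student}: English {eng:.2f}, History {hist:.2f}, combined {combined:.2f}")
```

Under this reading, Maria has the higher average in each subject while Sarah has the higher combined average, because the two students took different numbers of tests in each subject.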
  • 11. Measures of Dispersion Another important feature that can help us understand more about a data set is the manner in which the data are distributed.
      - Range is the difference between the largest value (maximum) and the smallest value (minimum) in the data.
      - Standard deviation is an extremely important measure of spread that is based on the mean. It is a measure of the average deviation of the data points from the mean.
      - Variance is the square of the standard deviation of the data. It does not use the same unit of measure as the original data.
  • 12. Measure of Dispersion Consider the following data sets (x̄ = mean, x̃ = median, R = range, s = sample standard deviation):
      1. 5 5 5 5 5 5 5 5 5 5: x̄ = 5, x̃ = 5, R = 0, s = 0
      2. 0 0 0 0 0 10 10 10 10 10: x̄ = 5, x̃ = 5, R = 10, s = 5.27
      3. 4 4 4 5 5 5 6 6 6: x̄ = 5, x̃ = 5, R = 2, s = 0.87
      4. 0 5 5 5 5 5 5 10: x̄ = 5, x̃ = 5, R = 10, s = 2.67
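The range and standard deviation of these data sets can be recomputed with a short Python sketch. It assumes the fourth data set is the eight values 0 5 5 5 5 5 5 10 (this reading reproduces s = 2.67) and that s is the sample standard deviation (divisor n − 1); `summarize` is an illustrative helper name.

```python
from statistics import mean, stdev

# Data sets as read from the slide; set 4 is assumed to have 8 values.
data_sets = {
    1: [5] * 10,
    2: [0] * 5 + [10] * 5,
    3: [4, 4, 4, 5, 5, 5, 6, 6, 6],
    4: [0, 5, 5, 5, 5, 5, 5, 10],
}

def summarize(values):
    """Return (mean, range, sample standard deviation), rounded to 2 places."""
    value_range = max(values) - min(values)
    return round(mean(values), 2), value_range, round(stdev(values), 2)

for label, values in data_sets.items():
    print(label, summarize(values))
```

All four sets share the same mean, so only the range and standard deviation distinguish how the values are spread.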
  • 13. Measure of Dispersion Properties that determine the usefulness of the standard deviation:
      - It is used to describe the variability of the distribution only when the mean is used to describe the center.
      - It is equal to zero when there is no variability. This happens only when all observations have the same value.
      - It has the same units of measurement as the original observations.
      - Like the mean, it can be influenced by outliers.
      For a population: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{n}}$
      For a sample: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
  • 14. Measures of Relative Position z-score The z-score for a given data value x is the number of standard deviations that x is above or below the mean of the data.
      z-score of $x_i$ in a population: $z_{x_i} = \dfrac{x_i - \mu}{\sigma}$
      z-score of $x_i$ in a sample: $z_{x_i} = \dfrac{x_i - \bar{x}}{s}$
  • 15. Measure of Relative Position Percentiles and quartiles are useful when you want to know where a score is located in reference to the other scores.
      - A percentile is a data value for which the specified percentage of the data is below that value.
      - The median is the 50th percentile.
      - The 25th, 50th, and 75th percentiles divide the data into the lower quartile Q1, middle quartile Q2, and upper quartile Q3, respectively.
      - In using quartiles, there are five numbers to be used altogether: min value, Q1, median, Q3, and max value.
      - Quartiles are useful for box plots.
  • 18. Problem. (Task: Discuss your solutions to each of the 3 problems)
      1. The mean time to download a file is 12 minutes with a standard deviation of 4 minutes. Your download time is 20 minutes. Your friend's download time is 6 minutes. How can you compare your download time with your friend's?
      2. Raul takes 2 tests in Chemistry. He scored 72 in long test 1, for which the class mean score was 65 with a standard deviation of 8. He received a score of 60 in long test 2, for which the mean was 45 and the standard deviation was 12. In comparison to other students, did Raul do better in LT 1 or LT 2?
      3. A consumer group tested a sample of 100 light bulbs. The mean life expectancy of the light bulbs was 842 hours with a standard deviation of 90 hours. One particular light bulb from the company has a z-score life expectancy of 1.2. What was the life span of the bulb?
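The three problems all reduce to the z-score formula or its inverse. The following is a minimal sketch, with illustrative function and variable names, of how the quantities involved could be computed; interpreting them is left to the discussion.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

def value_from_z(z, mean, std_dev):
    """Invert the z-score formula to recover the data value."""
    return mean + z * std_dev

# Problem 1: download times, mean 12 minutes, standard deviation 4 minutes.
z_you = z_score(20, 12, 4)
z_friend = z_score(6, 12, 4)

# Problem 2: Raul's two long tests, each with its own mean and standard deviation.
z_lt1 = z_score(72, 65, 8)
z_lt2 = z_score(60, 45, 12)

# Problem 3: light bulb with z-score 1.2, mean 842 hours, standard deviation 90 hours.
bulb_life = value_from_z(1.2, 842, 90)

print(z_you, z_friend, z_lt1, z_lt2, bulb_life)
```

Because z-scores are unit-free, scores from tests with different means and spreads can be compared on the same scale.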
  • 19. Normal Distribution and Probability Normal Distribution is an extremely important concept, because it occurs so often in the data we collect from the natural world, as well as in many of the more theoretical ideas that are the foundation of statistics.
  • 20. Normal Distribution and Probability Characteristics of a Normal Distribution Shape A normal distribution is a perfectly symmetric, mound-shaped distribution. It is commonly referred to as a normal curve, or bell curve.
  • 21. Normal Distribution and Probability Characteristics of a Normal Distribution Center Due to the exact symmetry of a normal curve, the center of a normal distribution is located at the highest point of the distribution, and all the statistical measures of center are equal.
  • 22. Normal Distribution and Probability Characteristics of a Normal Distribution Center It is also important to realize that this center peak divides the data into two equal parts.
  • 23. Normal Distribution and Probability Characteristics of a Normal Distribution Spread In an idealized normal distribution of a continuous random variable, the distribution continues infinitely in both directions.
  • 24. Normal Distribution and Probability Characteristics of a Normal Distribution Area under the Curve
      - Areas under the curve that are symmetric about the mean are equal.
      - The total area under the curve is 1.
  • 25. Normal Distribution and Probability Empirical Rule for a Normal Distribution In a normal distribution, approximately
      - 68% of the data lie within 1 standard deviation of the mean.
      - 95% of the data lie within 2 standard deviations of the mean.
      - 99.7% of the data lie within 3 standard deviations of the mean.
  • 26. Normal Distribution and Probability Empirical Rule for a Normal Distribution Example. The heights of a large group of people are assumed to be normally distributed. Their mean height is 66.5 inches, and the standard deviation is 2.4 inches. Find and interpret the intervals representing one, two, and three standard deviations of the mean. One standard deviation of the mean: Approximately 68% of the people are between 64.1 and 68.9 inches tall. Two standard deviations of the mean: Therefore, approximately 95% of the people are between 61.7 and 71.3 inches tall. Three standard deviations of the mean: Nearly all of the people (99.74%) are between 59.3 and 73.7 inches tall.
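The three intervals in this example follow from adding and subtracting multiples of the standard deviation. A minimal sketch of that arithmetic, with illustrative names:

```python
mean_height = 66.5   # inches (from the example)
std_dev = 2.4        # inches (from the example)
coverage = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}

def interval(k):
    """Interval within k standard deviations of the mean, rounded to 1 decimal."""
    return round(mean_height - k * std_dev, 1), round(mean_height + k * std_dev, 1)

for k, share in coverage.items():
    low, high = interval(k)
    print(f"{k} SD: {low} to {high} inches ({share} of people)")
```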
  • 27. Problem. (Use the Empirical rule) A vegetable distributor knows that during the month of August, the weights of its tomatoes are normally distributed with a mean of 0.61 kg and a standard deviation of 0.15 kg.
      a. What percent of the tomatoes weigh less than 0.76 kg?
      b. In a shipment of 6000 tomatoes, how many tomatoes can be expected to weigh more than 0.31 kg?
      c. In a shipment of 4500 tomatoes, how many tomatoes can be expected to weigh from 0.31 kg to 0.91 kg?
      Solution:
      a. 0.76 kg is 1 standard deviation above the mean of 0.61 kg. In a normal distribution, 34% of all data lie between the mean and 1 standard deviation above the mean, and 50% of all data lie below the mean. Thus, 34% + 50% = 84% of the tomatoes weigh less than 0.76 kg.
      b. 0.31 kg is 2 standard deviations below the mean of 0.61 kg. In a normal distribution, 47.5% of all data lie between the mean and 2 standard deviations below the mean, and 50% of all data lie above the mean. This gives a total of 47.5% + 50% = 97.5% of the tomatoes that weigh more than 0.31 kg. Therefore 97.5% of 6000 = 5850 of the tomatoes can be expected to weigh more than 0.31 kg.
      c. 0.31 kg is 2 standard deviations below the mean of 0.61 kg and 0.91 kg is 2 standard deviations above the mean of 0.61 kg. In a normal distribution, 95% of all data lie within 2 standard deviations of the mean. Therefore 95% of 4500 = 4275 of the tomatoes can be expected to weigh from 0.31 kg to 0.91 kg.
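The Empirical Rule answers can be compared against the normal model itself. The sketch below uses `statistics.NormalDist` from the Python standard library; it is a cross-check under the problem's normality assumption, not the method the slide uses, and the variable names are illustrative.

```python
from statistics import NormalDist

tomato = NormalDist(mu=0.61, sigma=0.15)  # weights in kg

# a. Share of tomatoes lighter than 0.76 kg (1 SD above the mean).
share_below_076 = tomato.cdf(0.76)

# b. Expected count heavier than 0.31 kg (2 SD below the mean) out of 6000.
count_above_031 = 6000 * (1 - tomato.cdf(0.31))

# c. Expected count between 0.31 kg and 0.91 kg (within 2 SD) out of 4500.
count_within_2sd = 4500 * (tomato.cdf(0.91) - tomato.cdf(0.31))

print(round(share_below_076, 4), round(count_above_031), round(count_within_2sd))
```

The results differ slightly from 84%, 5850, and 4275 because the Empirical Rule rounds the underlying areas to 68% and 95%.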
  • 28. Normal Distribution and Probability If the original distribution of x values is a normal distribution, then the corresponding distribution of z-scores will also be a normal distribution. This normal distribution of z-scores is called the standard normal distribution. Standard Normal Distribution The standard normal distribution is the normal distribution that has a mean of 0 and a standard deviation of 1.
  • 29. Normal Distribution and Probability Standard Normal Distribution In the standard normal distribution, the area of the distribution from z = a to z = b represents
      - the percentage of z-values that lie in the interval from a to b.
      - the probability that z lies in the interval from a to b.
  • 30. Problem 1. A soda machine dispenses soda into 12-ounce cups. Tests show that the actual amount of soda dispensed is normally distributed, with a mean of 11.5 oz and a standard deviation of 0.2 oz. a. What percent of cups will receive less than 11.25 oz of soda? b. What percent of cups will receive between 11.2 oz and 11.55 oz of soda? c. If a cup is chosen at random, what is the probability that the machine will overflow the cup? 2. The OnTheGo company manufactures laptop computers. A study indicates that the life spans of their computers are normally distributed, with a mean of 4.0 years and a standard deviation of 1.2 years. How long should the company warrant its computers if the company wishes less than 4% of its computers to fail during the warranty period?
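These problems need areas that do not fall on whole standard deviations, so a table of the standard normal distribution or software is required. A minimal sketch using `statistics.NormalDist` (Python 3.8 or later) follows; it assumes "overflow" means more than 12 oz is dispensed, and the variable names are illustrative.

```python
from statistics import NormalDist

# Problem 1: amount of soda dispensed, in ounces.
soda = NormalDist(mu=11.5, sigma=0.2)

p_less_1125 = soda.cdf(11.25)                 # a. less than 11.25 oz
p_between = soda.cdf(11.55) - soda.cdf(11.2)  # b. between 11.2 oz and 11.55 oz
p_overflow = 1 - soda.cdf(12.0)               # c. more than the 12 oz cup holds

# Problem 2: laptop life span, in years. The warranty length is the
# life span below which 4% of the computers fall.
laptop = NormalDist(mu=4.0, sigma=1.2)
warranty_years = laptop.inv_cdf(0.04)

print(round(p_less_1125, 4), round(p_between, 4), round(p_overflow, 4))
print(round(warranty_years, 2))
```

Since the company wants fewer than 4% of failures inside the warranty period, the warranty should be no longer than the computed value.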
  • 31. Statistical Hypotheses A hypothesis is simply a conjecture about a characteristic or set of facts. When performing statistical analyses, our hypotheses provide the general framework of what we are testing and how to perform the test. Hypothesis testing involves testing the difference between a hypothesized value of a population parameter and the estimate of that parameter which is calculated from a sample.
  • 32. Statistical Hypotheses Overview of the Process The hypothesis to be tested is called the null hypothesis and is given the symbol H0. The alternative hypothesis is given the symbol H1.
  • 33. Statistical Hypotheses Sample null and alternative hypotheses If H1 contains either > or <, the test is referred to as a one-sided test. If H1 contains ≠, it is a two-sided test.
      1. H0: μ = μ0 (The mean is equal to a reference value) H1: μ ≠ μ0 or H1: μ > μ0 or H1: μ < μ0
      2. H0: μ1 = μ2 (Two population means are equal) H1: μ1 ≠ μ2 or H1: μ1 > μ2 or H1: μ1 < μ2
      3. H0: μ1 = μ2 = . . . = μk (The k population means are equal) H1: at least two means are not equal
      4. H0: π1 = π2 (Two population proportions are equal) H1: π1 ≠ π2 or H1: π1 > π2 or H1: π1 < π2
      5. H0: ρ = 0 (There is no linear correlation between the two variables) H1: ρ ≠ 0 or H1: ρ > 0 or H1: ρ < 0
  • 34. Statistical Hypotheses Tests Concerning the Mean To test whether an observed difference between a population mean and a reference value, or between two means, is significant or can be attributed to chance, the following statistical tests are used. The z-test is used if the population standard deviation is known; if it is not, the sample standard deviation can be used as an estimate of the population standard deviation provided that the sample size is large, that is, n ≥ 30. The t-test is used if the sample size is less than 30 and only the sample standard deviation is known.
  • 35. Statistical Hypotheses Tests Concerning the Mean The purpose of analysis of variance (ANOVA) is much the same as that of the t-tests; however, if a series of several t-tests is used to evaluate several mean differences, the risk of a Type I error increases. That is, the α-levels accumulate over the series of tests, so the final experimentwise α-level can be quite large. ANOVA is necessary to protect researchers from excessive risk of a Type I error. It allows researchers to evaluate all of the mean differences in a single hypothesis test using a single α-level and thereby keeps the risk of a Type I error under control no matter how many different means are being compared.
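How quickly the α-levels accumulate can be illustrated with a small calculation. The sketch below assumes the tests are independent, in which case the chance of at least one Type I error in k tests is 1 − (1 − α)^k; pairwise t-tests on the same data are not fully independent, so this is only an approximation, and `familywise_alpha` is an illustrative name.

```python
def familywise_alpha(alpha, num_tests):
    """Chance of at least one Type I error across independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests

# Comparing 3, 4, or 5 means pairwise takes 3, 6, or 10 t-tests.
for num_tests in (1, 3, 6, 10):
    print(num_tests, round(familywise_alpha(0.05, num_tests), 3))
```

Under this assumption, ten tests at α = 0.05 already carry roughly a 40% chance of at least one false rejection.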
  • 36. Statistical Hypotheses Tests Concerning the Mean ANOVA tests the homogeneity of a set of means, but if the null hypothesis is rejected in favor of the alternative hypothesis that the means are not all equal, further (post hoc) tests should be done to determine which pairs of means are significantly different. The following post hoc tests are available in most statistical software:
      1. Duncan's multiple range test
      2. Tukey's procedure
      3. Scheffe test
      4. Fisher's least significant difference
  • 37. Linear Regression and Correlation Correlation measures the relationship between bivariate data. Bivariate data are data sets in which each subject has two observations associated with it. A response variable measures an outcome or result of a study. An explanatory variable is a variable that we think explains or causes changes in the response variables. Linear regression is an approach for modeling the relationship between a dependent variable (outcome) and one or more explanatory variables. The case of one explanatory variable is called simple linear regression.
  • 38. Linear Regression and Correlation Scatterplot is a graph of plotted points showing the relationship between two numerical variables.
  • 39. Linear Regression and Correlation Examining a Scatterplot 1. Describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship. 2. Then look for any striking deviations from the pattern. Identify each occurrence of an outlier.
  • 40. Linear Regression and Correlation Linear Regression – involves using data to calculate a line that best fits that data and then using that line to predict scores. Least-Squares Regression Line – is the line that minimizes the sum of the squares of the vertical deviations from each data point to the line. The equation of the least-squares line is $\hat{y} = ax + b$, where
      $a = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$ and $b = \bar{y} - a\bar{x}$
  • 41. Linear Regression and Correlation Linear Correlation Coefficient – measures the strength of a linear relationship between two variables and is denoted by r. If the linear correlation coefficient r is positive, the relationship between the variables has a positive correlation. In this case, if one variable increases, the other variable also tends to increase. If r is negative, the linear relationship between the variables has a negative correlation. In this case, if one variable increases, the other variable tends to decrease.
      $r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$
  • 42. Linear Regression and Correlation Happiness vs Life Expectancy Source: CHED GenEd 1st Generation Training What is the equation of the least-squares regression line?
      Country     | Happiness | Life Expectancy
      Japan       | 6.8       | 80.80
      South Korea | 6.2       | 74.20
      China       | 6.3       | 70.40
      Taiwan      | 6.2       | 76.40
      Indonesia   | 6.6       | 78.00
      Philippines | 6.4       | 69.00
      Singapore   | 6.8       | 77.60
      Vietnam     | 6.1       | 69.40
      India       | 6.2       | 63.00
      Bangladesh  | 5.7       | 59.50
  • 43. Linear Regression and Correlation Happiness vs Life Expectancy Least-squares line: $\hat{y} = 16.661x - 33.635$ (a = 16.661, b = −33.635) Correlation coefficient: r = 0.82 Will the line give accurate predictions? Predict the life expectancy for the following countries:
      Country     | Happiness | Actual LE
      a) Zimbabwe | 4.2       | 35.40
      b) Ghana    | 5.4       | 57.90
      c) Belarus  | 6.1       | 68.60
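The slope, intercept, and correlation coefficient for the happiness data can be recomputed from the sum formulas for a, b, and r. The following is a minimal sketch; `least_squares` and the list names are illustrative, and the data are the ten countries from the table.

```python
from math import sqrt

happiness = [6.8, 6.2, 6.3, 6.2, 6.6, 6.4, 6.8, 6.1, 6.2, 5.7]
life_expectancy = [80.80, 74.20, 70.40, 76.40, 78.00,
                   69.00, 77.60, 69.40, 63.00, 59.50]

def least_squares(xs, ys):
    """Return slope a, intercept b, and correlation r for y-hat = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    a = numerator / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - a * sum_x / n
    r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return a, b, r

a, b, r = least_squares(happiness, life_expectancy)
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.2f}")

# Predictions for countries outside the fitted data.
for country, x in [("Zimbabwe", 4.2), ("Ghana", 5.4), ("Belarus", 6.1)]:
    print(f"{country}: predicted life expectancy {a * x + b:.1f}")
```

Zimbabwe's happiness score of 4.2 lies well below the smallest score in the fitted data (5.7), so that prediction is an extrapolation and deserves less trust than the one for Belarus.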
  • 44. Linear Regression and Correlation Example. Unemployment and family income are undoubtedly related; we would assume that as the national annual unemployment rate increases, average annual family income would decrease. Table on next slide gives the annual unemployment rate and the average annual family income for the Philippines according to regions from the Philippine Statistics Authority. a. Use linear regression to predict the average annual family income of the Philippines if the annual unemployment rate is 6.3%. b. Use linear regression to predict the annual unemployment rate if the average annual family income of the Philippines is P267,000. c. Are the predictions in parts (a) and (b) reliable? Why or why not?
  • 45. Linear Regression and Correlation
      Region                   | Annual Unemployment Rate | Ave. Annual Family Income (000,000)
      NCR                      | 8.5                      | 4.25
      Cordillera               | 4.8                      | 2.82
      I - Ilocos Region        | 8.4                      | 2.38
      II - Cagayan Valley      | 3.2                      | 2.37
      III - Central Luzon      | 7.8                      | 2.99
      IVA - CALABARZON         | 8.0                      | 3.12
      IVB - MIMAROPA           | 3.3                      | 2.22
      V - Bicol Region         | 5.6                      | 1.87
      VI - Western Visayas     | 5.4                      | 2.26
      VII - Central Visayas    | 5.9                      | 2.39
      VIII - Eastern Visayas   | 5.4                      | 1.97
      IX - Zamboanga Peninsula | 3.5                      | 1.90
      X - Northern Mindanao    | 5.6                      | 2.21
      XI - Davao Region        | 5.8                      | 2.47
      XII - SOCCSKSARGEN       | 3.5                      | 1.88
      Caraga                   | 5.7                      | 1.98
      ARMM                     | 3.5                      | 1.39
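Part (a) of the example can be sketched in Python from the regional table. The code below uses the deviation form of the least-squares formulas, which is algebraically equivalent to the sum form; the income values are kept in the table's own units, and all variable names are illustrative.

```python
from statistics import mean

# Regional data from the table, NCR first and ARMM last.
unemployment = [8.5, 4.8, 8.4, 3.2, 7.8, 8.0, 3.3, 5.6, 5.4,
                5.9, 5.4, 3.5, 5.6, 5.8, 3.5, 5.7, 3.5]
income = [4.25, 2.82, 2.38, 2.37, 2.99, 3.12, 2.22, 1.87, 2.26,
          2.39, 1.97, 1.90, 2.21, 2.47, 1.88, 1.98, 1.39]

x_bar, y_bar = mean(unemployment), mean(income)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, income))
s_xx = sum((x - x_bar) ** 2 for x in unemployment)
s_yy = sum((y - y_bar) ** 2 for y in income)

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r = s_xy / (s_xx * s_yy) ** 0.5

# Part (a): predicted income (table units) at a 6.3% unemployment rate.
predicted_income = slope * 6.3 + intercept
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.2f}")
print(f"predicted income at 6.3%: {predicted_income:.2f}")
```

On these figures the fitted slope comes out positive, which runs against the expectation that income falls as unemployment rises. That is worth weighing when judging the reliability asked about in part (c).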
  • 46. References:
      Aufmann et al. (2013). Mathematical Excursions, 3ed. Brooks/Cole, Cengage Learning.
      Bluman, A. G. (2012). Elementary statistics: a step by step approach, 8ed. New York: McGraw-Hill.
      COMAP, Inc. (2013). For all practical purposes: mathematical literacy in today's world. New York: W.H. Freeman and Company.
      Johnson & Mowry (2012). Mathematics: a practical odyssey. Brooks/Cole, Cengage Learning.
      Lawsky et al. (2014). CK-12 advanced probability and statistics, 2ed. CK-12 Foundation.
      Nocon, R. & Nocon, E. (2016). Essential mathematics for the modern world. QC: C & E Publishing, Inc.
      Vistru-Yu, C. and Gozon, A. (2016). Statistics a review ppt. CHED's GE First Generation Training.