Data Description
CHAPTER 3
UNIVERSITY OF ANTIQUE - 2 March 2021
Introduction
In the last chapter, you gained useful information from raw data by organizing them and presenting them in charts. This chapter will show you statistical methods that can be used to summarize data. The most familiar of these methods is finding the average. Measures of average are also called measures of central tendency. In addition to knowing the average, you must know how the data values are dispersed. The measures that determine the spread of the data values are called measures of variation, or measures of dispersion. Finally, another set of measures is necessary to describe data. These are called measures of position. They tell where a specific data value falls within the data set, or its relative position in comparison with other data values.
At the end of this lesson, you should be able to:
1. Describe the uses of the measures of central tendency;
2. Compute and interpret the mean, median, and mode;
3. Discuss the properties of the mean, median, and mode;
4. Define and interpret the results of the measures of variability;
5. Determine the properties of the normal curve, the areas under the normal curve, and the corresponding z-scores; and
6. Interpret results using the normal distribution, skewness, and kurtosis.
Lesson 1. Measures of Central Tendency
Statistical methods are needed for summarizing and describing gathered numerical data. By looking only at tables and graphs, one may have difficulty describing the entire set of data. It is better to choose a single score that represents the entire data set. This value is helpful in making decisions based on the data collected. The measures of central tendency are used to select the central value that best represents the entire set of data, thus helping the investigator in decision making.
WEEK 3 - DR. M. U, MAGBANUA 1
The Mean
The mean is the arithmetic average of all the scores in the data set. It is the most frequently used measure of central tendency and is used when the data are interval or ratio.
The symbol $\bar{X}$, read as "X bar", is used to represent the sample mean, while $\mu$, read as "mu", represents the population mean.
Properties of Mean
• Used when the data are interval or ratio.
• It is the layman's concept of the average.
• Used when the distribution is normal or is not badly skewed; in that case it is the most reliable measure of central tendency.
• The mean is found by using all the values of the data.
• The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.
• The mean is used in computing other statistics, such as the variance.
• The mean for a data set is unique and is not necessarily one of the data values.
• The mean cannot be computed for data in a frequency distribution that has an open-ended class.
• The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.
The Mean of Ungrouped Data
The mean of ungrouped data is found by adding all the scores and dividing the sum by the number of scores. In symbols,
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}.$$
For example, the mean of 5, 7, 9, 10, 12, and 15 is
$$\bar{X} = \frac{5 + 7 + 9 + 10 + 12 + 15}{6} = \frac{58}{6} \approx 9.67.$$
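The computation above can be checked with a short Python sketch. It simply adds the scores and divides by their count, then compares the result with the standard library's `statistics.mean`; the variable names are illustrative only.

```python
import math
from statistics import mean

scores = [5, 7, 9, 10, 12, 15]   # the six scores from the example above

# Add all the scores, then divide by the number of scores
x_bar = sum(scores) / len(scores)

print(sum(scores), len(scores))   # 58 6
print(round(x_bar, 2))            # 9.67
assert math.isclose(x_bar, mean(scores))   # statistics.mean agrees
```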
The Weighted Mean
Sometimes each value carries a certain weight. For example, suppose you are asked to determine your mean grade for the first semester. Since every course has a weight (its number of units), the mean is found by taking the sum of the products of each value and its weight, and dividing by the total weight.
Suppose $X_1, X_2, X_3, \dots, X_n$ are the scores and their respective weights are $w_1, w_2, w_3, \dots, w_n$. The weighted mean of the scores is defined as
$$\bar{X} = \frac{X_1 w_1 + X_2 w_2 + \dots + X_n w_n}{w_1 + w_2 + w_3 + \dots + w_n}.$$
Let us take John's grades last semester:
| Subject | Grade (X) | Units (w) |
|---|---|---|
| Calculus | 1.5 | 5 |
| Filipino | 2.0 | 3 |
| Statistics | 1.8 | 3 |
| P.E. | 1.3 | 2 |
| NSTP | 1.0 | 0 |
In this data, the grades are the scores and the units are the weights. The weighted mean of John's grades is computed as follows:
$$\bar{X} = \frac{(1.5)(5) + (2.0)(3) + (1.8)(3) + (1.3)(2) + (1.0)(0)}{5 + 3 + 3 + 2 + 0} = \frac{21.5}{13} \approx 1.65.$$
The Median
The median is the middlemost score in the distribution. It divides the distribution into the upper 50% and the lower 50%. Finding the median requires arranging the scores in either ascending or descending order. If the number of scores (n) is odd, the median is the middle value. If n is even, the median is the average of the two middle scores in the ordered list. There are various symbols for the median, among them MD, Mdn, Med, and $\tilde{X}$. In this module we use Mdn for one simple reason: it is the symbol suggested by the American Psychological Association (APA).
Properties of the Median
• The median is used to find the centre or middle value of a data set.
• It is not amenable to algebraic manipulation.
• It is used when the distribution is grossly asymmetrical or skewed.
• In an open-ended distribution, the median is the most reliable measure of central tendency.
• It is used when the data are ordinal.
• The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.
• The median is affected less than the mean by extremely high or extremely low values.
Median of Ungrouped Data
To determine the median of ungrouped data, take these steps:
Step 1. Arrange the data set in ascending or descending order.
Step 2. If n is (a) odd, the median is the middlemost value in the ordered values; (b) even, the median is the average of the two middlemost values.
To make it easier to find the position of the median in an ordered set of values, the following formula is used:
$$\text{Position of median} = \frac{n + 1}{2}$$
where n is the number of scores.
Let us find the median of the following distributions:
Example 1: 4, 6, 2, 8, 10, 7, 8, 9, 9, 3, 5
Solution:
Step 1. Arrange the scores: 2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10
Step 2. Select the middlemost score. Since there are 11 scores, the position of the median is
$$\text{Position of median} = \frac{11 + 1}{2} = \frac{12}{2} = 6,$$
which means the median is the score in the 6th rank.
Step 3. Identify the median in the data set: Mdn = 7.
Example 2: 4, 10, 13, 16, 19, 20, 25, 35, 40, 40
Solution:
Step 1. Arrange the scores: 4, 10, 13, 16, 19, 20, 25, 35, 40, 40
Step 2. Select the middlemost score. Since there are 10 scores, the position of the median is
$$\text{Position of median} = \frac{10 + 1}{2} = \frac{11}{2} = 5.5,$$
which means the median lies at the 5.5th rank, halfway between the 5th score (19) and the 6th score (20).
Step 3. Identify the median in the data set:
$$\text{Mdn} = \frac{19 + 20}{2} = \frac{39}{2} = 19.5.$$
The Mode
The mode is the most frequent score appearing in the distribution. It is used when the data are nominal. If the data set is not too large, one can determine the modal score by mere inspection. As with the mean and median, the mode has a variety of symbols. The most common are $\hat{x}$ and Mo. In this module we use Mo as the symbol for mode.
If there is only one mode, the distribution is unimodal. If there are two modes, the distribution is bimodal. If there are three modes, the distribution is trimodal. If there are four or more modes, the distribution is multimodal or polymodal. If there is no mode, the distribution is called a rectangular distribution.
Properties of Mode
• The mode is used when the most typical case is desired.
• The mode is the easiest average to compute.
• The mode can be used when the data are nominal or categorical.
• The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
• Always located at the peak of the distribution.
• Not unduly affected by extreme values.
• Very unstable value.
Mode of Ungrouped Data
The mode of ungrouped data is the value or values that occur most frequently. This can be found by mere inspection. For example, consider the following scores: (a) 3, 4, 6, 7, 7, 7, 8, 8, 9, 10: the mode is 7. (b) 10, 9, 15, 10, 8, 11, 7, 12, 11, 5, 10: the mode is 10, since 10 occurs three times while 11 occurs only twice.
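The Lesson 1 computations (the weighted mean of John's grades, the two median examples, and the two mode examples) can be reproduced with the following Python sketch. `median_by_position` is a hypothetical helper written to mirror the (n + 1)/2 position rule; `statistics.median` and `statistics.multimode` (Python 3.8+) are used only as cross-checks.

```python
from statistics import median, multimode

# Weighted mean: John's grades (X) and units (w) from the table earlier in this lesson
grades = [1.5, 2.0, 1.8, 1.3, 1.0]
units = [5, 3, 3, 2, 0]
weighted_mean = sum(x * w for x, w in zip(grades, units)) / sum(units)
print(round(weighted_mean, 2))                 # 1.65

def median_by_position(data):
    """Median via the (n + 1) / 2 position rule."""
    ordered = sorted(data)                     # Step 1: arrange the scores
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]       # odd n: the score at rank (n + 1) / 2
    # even n: rank (n + 1) / 2 falls halfway between ranks n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

ex1 = [4, 6, 2, 8, 10, 7, 8, 9, 9, 3, 5]
ex2 = [4, 10, 13, 16, 19, 20, 25, 35, 40, 40]
print(median_by_position(ex1), median(ex1))   # 7 7
print(median_by_position(ex2), median(ex2))   # 19.5 19.5

# Mode: multimode lists every value tied for the highest frequency
print(multimode([3, 4, 6, 7, 7, 7, 8, 8, 9, 10]))           # [7]
print(multimode([10, 9, 15, 10, 8, 11, 7, 12, 11, 5, 10]))  # [10]
```

One design caveat: when no value repeats, `multimode` returns every value, whereas this module describes such data as having no mode (a rectangular distribution).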
Lesson 2. Measures of Variability
In the previous lesson, you learned how to compute the mean, median, and mode. These measures of central tendency only tell us which score could best represent the entire set of data. If you want to determine the spread of the scores, the measures of variability address that question. Thus, in this lesson you will learn the different measures of variability, sometimes called measures of dispersion.
Closely grouped data have relatively small measures of dispersion, and more widely spread-out data have larger ones. The closest possible grouping occurs when the data have no dispersion (all data values are the same); in this situation, the measure of dispersion is zero. There is no limit to how widely spread out the data can be.
Range
The range is the crudest measure of dispersion. It is the difference between the highest and the lowest scores in the data set. Because the range considers only two scores, it is the most unstable measure of dispersion. For ungrouped data, the range is
$$R = H - L$$
where R is the range, H is the highest score, and L is the lowest score.
Variance and Standard Deviation
The variance is defined as the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n - 1 instead of n), while the standard deviation is the square root of the variance.
The population variance and the sample variance are
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}, \qquad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where $\sigma^2$ is the population variance, $s^2$ is the sample variance, $X$ is an individual score, $\mu$ is the population mean, $\bar{X}$ is the sample mean, $N$ is the number of scores in the population, and $n$ is the number of scores in the sample.
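A minimal Python sketch of the range and of both variance formulas, reusing the scores from the mean example purely for illustration. It computes the sum of squared deviations once, divides by N for the population variance and by n - 1 for the sample variance, and cross-checks against `statistics.pvariance` and `statistics.variance`.

```python
from statistics import pvariance, variance

scores = [5, 7, 9, 10, 12, 15]        # data from the mean example, reused for illustration

R = max(scores) - min(scores)          # R = H - L

x_bar = sum(scores) / len(scores)
ss = sum((x - x_bar) ** 2 for x in scores)   # sum of squared deviations from the mean

pop_var = ss / len(scores)             # sigma^2: divide by N (data treated as a population)
samp_var = ss / (len(scores) - 1)      # s^2: divide by n - 1 (data treated as a sample)

print(R)                                                 # 10
print(round(pop_var, 2), round(pvariance(scores), 2))    # 10.56 10.56
print(round(samp_var, 2), round(variance(scores), 2))    # 12.67 12.67
```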
Standard Deviation of Ungrouped Data
Since the standard deviation is the square root of the variance, the population and sample standard deviations are
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}, \qquad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}.$$
Because the variance and standard deviation measure variability or spread, they are interpreted as follows: the lower the value, the more clustered the scores; the higher the value, the more spread out the scores.
Uses of the Variance and Standard Deviation
1. As previously stated, variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable.
2. The variance and standard deviation are used to determine the consistency of a variable. For example, in the manufacture of fittings such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
3. The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution.
4. Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of Variation
Whenever two samples have the same units of measure, the variance and standard deviation for each can be compared directly. A statistic that allows us to compare standard deviations when the units are different is called the coefficient of variation.
The standard deviation or variance is not a reliable measure for comparing the spread of two data sets when the sets are in different units, or when they have the same units but widely dissimilar means. The coefficient of variation was developed for this kind of problem. Its formula is
$$CV = \frac{s}{\bar{X}}$$
where CV is the coefficient of variation, s is the standard deviation, and $\bar{X}$ is the mean. The result is often multiplied by 100 and expressed as a percentage.
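The sample standard deviation and the coefficient of variation can be sketched in Python as follows. The two data sets are the median examples from Lesson 1, reused only to show a comparison; `sample_sd` and `coefficient_of_variation` are hypothetical helper names, and the CV is reported as a percentage by assumption.

```python
import math
from statistics import stdev

def sample_sd(data):
    """s = sqrt( sum((X - X_bar)^2) / (n - 1) )"""
    n = len(data)
    x_bar = sum(data) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

def coefficient_of_variation(data):
    """CV = s / X_bar, multiplied by 100 to express it as a percentage."""
    x_bar = sum(data) / len(data)
    return sample_sd(data) / x_bar * 100

# Median examples from Lesson 1, reused only for illustration
ex1 = [4, 6, 2, 8, 10, 7, 8, 9, 9, 3, 5]
ex2 = [4, 10, 13, 16, 19, 20, 25, 35, 40, 40]

for name, data in [("Example 1", ex1), ("Example 2", ex2)]:
    s = sample_sd(data)
    assert math.isclose(s, stdev(data))   # agrees with the standard library
    print(f"{name}: s = {s:.2f}, CV = {coefficient_of_variation(data):.1f}%")
```

Because the CV divides out the mean, the larger CV identifies the relatively more variable set even when the means differ widely.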
Lesson 3. Measures of Position
In addition to measures of central tendency and measures of variation, there are measures of position or location. These measures include standard scores, percentiles, deciles, and quartiles. They are used to locate the relative position of a data value in the data set. For example, if a value is located at the 80th percentile, it means that 80% of the values fall below it in the distribution and 20% of the values fall above it. The median is the value that corresponds to the 50th percentile, since one-half of the values fall below it and one-half of the values fall above it.
Standard Scores
A standard score or z score tells how many standard deviations a data value is above or below the mean for a specific distribution of values. If a standard score is zero, then the data value is the same as the mean.
A z score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z. The formula is
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
For samples, the formula is
$$z = \frac{X - \bar{X}}{s}$$
For populations, the formula is
$$z = \frac{X - \mu}{\sigma}$$
The z score represents the number of standard deviations that a data value falls above or below the mean.
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.
Percentiles divide the data set into 100 equal groups. They are used, for example, to compare an individual's test score with the national norm.
Percentiles are not the same as percentages. That is, if a student gets 72 correct answers out of a possible 100, she obtains a percentage score of 72. This gives no indication of her position with respect to the rest of the class. On the other hand, if a raw score of 72 corresponds to the 64th percentile, then she did better than 64% of the students in her class.
Percentiles are symbolised by $P_1, P_2, P_3, \dots, P_{99}$ and divide the distribution into 100 groups.
Percentile Formula
The percentile corresponding to a given value X is computed by using the following formula:
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$$
Finding a Data Value Corresponding to a Given Percentile
Step 1. Arrange the data in order from lowest to highest.
Step 2. Substitute into the formula
$$c = \frac{n \cdot p}{100}$$
where n is the total number of values and p is the percentile.
Step 3A. If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.
Step 3B. If c is a whole number, use the value halfway between the cth and (c + 1)st values when counting up from the lowest value.
Quartiles and Deciles
Quartiles divide the distribution into four groups, separated by $Q_1$, $Q_2$, and $Q_3$. Note that $Q_1$ is the same as the 25th percentile; $Q_2$ is the same as the 50th percentile, or median; and $Q_3$ corresponds to the 75th percentile.
Finding Data Values Corresponding to $Q_1$, $Q_2$, and $Q_3$
Step 1. Arrange the data in order from lowest to highest.
Step 2. Find the median of the data values. This is the value for $Q_2$.
Step 3. Find the median of the data values that fall below $Q_2$. This is the value for $Q_1$.
Step 4. Find the median of the data values that fall above $Q_2$. This is the value for $Q_3$.
In addition to dividing the data set into four groups, quartiles can be used as a rough measurement of variability. The interquartile range (IQR) is defined as the difference $Q_3 - Q_1$ and is the range of the middle 50% of the data.
The interquartile range is used to identify outliers, and it is also used as a measurement of variability in exploratory data analysis.
Deciles divide the distribution into 10 groups. They are denoted by $D_1, D_2$, etc.
Note that $D_1$ corresponds to $P_{10}$; $D_2$ corresponds to $P_{20}$; etc. Deciles can be found by using the formulas given for percentiles.
Taken together, these are the relationships among percentiles, deciles, and quartiles:
Deciles are denoted by $D_1, D_2, D_3, \dots, D_9$, and they correspond to $P_{10}, P_{20}, P_{30}, \dots, P_{90}$.
Quartiles are denoted by $Q_1, Q_2, Q_3$, and they correspond to $P_{25}, P_{50}, P_{75}$.
The median is the same as $P_{50}$, $Q_2$, or $D_5$.
Summary of Position Measures
| Measure | Definition | Symbol(s) |
|---|---|---|
| Standard score or z score | Number of standard deviations that a data value is above or below the mean | $z$ |
| Percentile | Position in hundredths that a data value holds in the distribution | $P_n$ |
| Decile | Position in tenths that a data value holds in the distribution | $D_n$ |
| Quartile | Position in fourths that a data value holds in the distribution | $Q_n$ |
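The position measures in this lesson can be sketched together in Python, using Example 2 from Lesson 1 as illustrative data. All function names are hypothetical. `quartiles` assumes the median-of-halves procedure above, with the middle value excluded from both halves when n is odd; `outlier_fences` applies the 1.5(IQR) rule that the Outliers subsection describes. The extra score of 120 is a hypothetical value added only to produce an outlier.

```python
import math
from statistics import mean, median, stdev

def z_score(x, center, spread):
    """z = (value - mean) / standard deviation (sample or population)."""
    return (x - center) / spread

def percentile_rank(data, x):
    """Percentile = ((number of values below X) + 0.5) / (total number of values) * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)

def value_at_percentile(data, p):
    """Data value for the p-th percentile, following Steps 1 to 3B."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    ordered = sorted(data)                      # Step 1
    c = len(ordered) * p / 100                  # Step 2: c = n * p / 100
    if not c.is_integer():                      # Step 3A: round up, count from lowest
        return ordered[math.ceil(c) - 1]
    c = int(c)                                  # Step 3B: halfway between cth and (c+1)st
    return (ordered[c - 1] + ordered[c]) / 2

def quartiles(data):
    """Q1, Q2, Q3 as medians of the lower half, the whole set, and the upper half."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]                   # values below Q2
    upper = ordered[(n + 1) // 2 :]             # values above Q2
    return median(lower), median(ordered), median(upper)

def outlier_fences(data):
    """Q1 - 1.5(IQR) and Q3 + 1.5(IQR), where IQR = Q3 - Q1."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

ex2 = [4, 10, 13, 16, 19, 20, 25, 35, 40, 40]   # Example 2 from Lesson 1
print(round(z_score(35, mean(ex2), stdev(ex2)), 2))   # sample z score of X = 35
print(percentile_rank(ex2, 25))             # 65.0
print(value_at_percentile(ex2, 25))         # 13
print(value_at_percentile(ex2, 50))         # 19.5 (P50 = Q2 = median)
print(value_at_percentile(ex2, 10))         # 7.0 (D1 = P10)
print(quartiles(ex2))                       # (13, 19.5, 35)

hypothetical = ex2 + [120]   # hypothetical extra score, added only to show an outlier
lo, hi = outlier_fences(hypothetical)
print([x for x in hypothetical if x < lo or x > hi])   # [120]
```

Other textbooks and software interpolate percentiles and quartiles differently, so their values may not match this sketch exactly.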
Outliers
A data set should be checked for extremely high or extremely low values. These values are called outliers.
An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. It can strongly affect the mean and standard deviation of a variable and can have an effect on other statistics as well.
There are several ways to check a data set for outliers. One method is as follows:
Step 1. Arrange the data in order and find $Q_1$ and $Q_3$.
Step 2. Find the interquartile range: $IQR = Q_3 - Q_1$.
Step 3. Multiply the IQR by 1.5.
Step 4. Subtract the value obtained in Step 3 from $Q_1$, and add the same value to $Q_3$.
Step 5. Check the data set for any data value that is smaller than $Q_1 - 1.5(IQR)$ or larger than $Q_3 + 1.5(IQR)$.
Lesson 4. Distribution Shapes
A continuous variable can assume all values between any two given values of the variable. Many continuous variables have distributions that are bell-shaped, and these are called approximately normally distributed variables. The distribution is also called a bell curve or a Gaussian distribution, named for the German mathematician Carl Friedrich Gauss (1777-1855), who derived its equation.
Skewness
No variable fits a normal distribution perfectly, since a normal distribution is a theoretical distribution. However, a normal distribution can be used to describe many variables, because for those variables the deviations from a normal distribution are very small.
When the data values are evenly distributed about the mean, the distribution is said to be symmetric. When the majority of the data values fall to the left or right of the mean, the distribution is said to be skewed.
When the majority of the data values fall to the right of the mean, the distribution is said to be negatively or left-skewed. The mean is to the left of the median, and the mean and the median are to the left of the mode.
When the majority of the data values fall to the left of the mean, the distribution is said to be positively or right-skewed. The mean falls to the right of the median, and both the mean and the median fall to the right of the mode.
There are several formulas for finding the coefficient of skewness. The coefficient of skewness is then converted to its standard score (z-score). If the alpha level ($\alpha$ level) is set to .05, the calculated z-score must fall between $-1.96$ and $+1.96$ for the data to be considered approximately normal. If the z-score goes beyond these values, the data have unacceptable skewness at $\alpha = .05$.
The "tail" of the curve indicates the direction of skewness (right is positive, left is negative).
Kurtosis
Kurtosis is commonly associated with the tallness or flatness of a distribution. More precisely, it is a measure that describes the tails of the distribution in relation to its overall shape. There are three types of kurtosis. The mesokurtic distribution has a kurtosis similar to that of the normal distribution. This means that the extreme-value characteristics of the distribution are the same as those of the normal distribution.
The leptokurtic distribution has kurtosis greater than that of the normal distribution. Lepto means thin or skinny. Generally, the leptokurtic curve is characterized by a narrow or thin curve that is taller than the normal curve. However, its thin shape is only a consequence of the tails of the distribution, which stretch along the horizontal axis. This happens when occasional extreme outliers appear in the distribution.
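The module does not say which coefficient of skewness to use, so the following Python sketch is only one possible choice, stated as an assumption: the moment coefficient of skewness $g_1 = m_3 / m_2^{3/2}$ and the excess kurtosis $g_2 = m_4 / m_2^2 - 3$, each divided by a rough large-sample standard error ($\sqrt{6/n}$ and $\sqrt{24/n}$) to obtain a z-score that is compared with $\pm 1.96$. Many statistical packages use bias-adjusted coefficients and exact small-sample standard errors, so their values will differ. The function names are hypothetical.

```python
import math

def moment(data, k):
    """k-th central moment: the mean of (X - X_bar)^k."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n

def skewness_z(data):
    n = len(data)
    g1 = moment(data, 3) / moment(data, 2) ** 1.5     # moment coefficient of skewness
    se = math.sqrt(6 / n)                              # rough large-sample SE (assumption)
    return g1, g1 / se

def kurtosis_z(data):
    n = len(data)
    g2 = moment(data, 4) / moment(data, 2) ** 2 - 3   # excess kurtosis: 0 for a normal curve
    se = math.sqrt(24 / n)                             # rough large-sample SE (assumption)
    return g2, g2 / se

def within_alpha_05(z):
    """True when z lies between -1.96 and +1.96."""
    return -1.96 <= z <= 1.96

data = [4, 10, 13, 16, 19, 20, 25, 35, 40, 40]   # Example 2 from Lesson 1
g1, z_skew = skewness_z(data)
g2, z_kurt = kurtosis_z(data)
print(f"skewness = {g1:.3f}, z = {z_skew:.2f}, acceptable: {within_alpha_05(z_skew)}")
print(f"excess kurtosis = {g2:.3f}, z = {z_kurt:.2f}, acceptable: {within_alpha_05(z_kurt)}")
```

A positive $g_2$ points toward a leptokurtic shape and a negative one toward a platykurtic shape; with samples this small the approximate standard errors are crude.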
The platykurtic distribution is characterized by short tails. Platy means broad or flat. This characteristic of the curve is due to the fact that the middle scores have almost the same or similar frequencies. Furthermore, there are fewer extreme values than in the normal distribution.
Normal Distribution
A normal distribution is a continuous, symmetric, bell-shaped distribution of a variable.
Properties of the Theoretical Normal Distribution
1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located at the centre of the distribution.
3. A normal distribution curve is unimodal.
4. The curve is symmetric about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the centre.
5. The curve is continuous; that is, there are no gaps or holes. For each value of X, there is a corresponding value of Y.
6. The curve never touches the x axis, but it gets increasingly closer to it.
7. The total area under the normal distribution curve is equal to 1.00, or 100%.
8. The area under the part of a normal curve that lies within 1 standard deviation of the mean is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or 95%; and within 3 standard deviations, about 0.997, or 99.7%.
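Property 8 can be checked numerically. For a standard normal variable Z, the area between $-k$ and $+k$ equals $\operatorname{erf}(k/\sqrt{2})$, so a short Python sketch with `math.erf` reproduces the 68%, 95%, and 99.7% figures. The helper name `area_within` is hypothetical.

```python
import math

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean.

    For the standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```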
[Figure: The areas under a normal distribution curve]
The Standard Normal Distribution
Since each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary. To simplify this situation, statisticians use what is called the standard normal distribution.
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
The formula for the standard normal distribution is
$$y = \frac{e^{-z^2/2}}{\sqrt{2\pi}}.$$
All normally distributed variables can be transformed into the standard normally distributed variable by using the formula for the standard score:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{X - \mu}{\sigma}.$$
[Figure: The areas under a standard normal distribution]
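The standard normal formula and the z transformation can be sketched in Python. `std_normal_height` is a hypothetical helper that evaluates $y = e^{-z^2/2}/\sqrt{2\pi}$ directly and is cross-checked against `statistics.NormalDist` (Python 3.8+), whose `cdf` method gives the area to the left of a z value. The values X = 115, μ = 100, σ = 15 are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def std_normal_height(z):
    """y = e^(-z^2 / 2) / sqrt(2 * pi): height of the standard normal curve at z."""
    return math.exp(-z ** 2 / 2) / math.sqrt(2 * math.pi)

std = NormalDist(mu=0, sigma=1)          # mean 0, standard deviation 1
assert math.isclose(std_normal_height(0.5), std.pdf(0.5))

# Hypothetical values, for illustration only
X, mu, sigma = 115, 100, 15
z = (X - mu) / sigma                     # z = (X - mu) / sigma
print(z)                                 # 1.0
print(round(std.cdf(z), 4))              # area to the left of z: 0.8413
print(round(1 - std.cdf(z), 4))          # area to the right of z: 0.1587
```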
Determining Normality
A normally shaped or bell-shaped distribution is only one of many shapes that a distribution can assume; however, it is very important, since many statistical methods require that the distribution of values be normally or approximately normally shaped.
There are several ways statisticians check for normality. The easiest way is to draw a histogram of the data and check its shape. If the histogram is not approximately bell-shaped, then the data are not normally distributed.
Skewness can be checked by using the Pearson coefficient of skewness (PC), also called Pearson's index of skewness. The formula is
$$PC = \frac{3(\bar{X} - \text{median})}{s}.$$
If the index is greater than or equal to $+1$ or less than or equal to $-1$, it can be concluded that the data are significantly skewed.
Another method used to check normality is to draw a normal quantile plot. Quantiles, sometimes called fractiles, are values that separate the data set into approximately equal groups.
There are several other methods for checking normality. In SPSS, the available tests include the Kolmogorov-Smirnov test, the Lilliefors test, and the Shapiro-Wilk test. The test should show no significant result (that is, the hypothesis of normality is not rejected) before you conclude that the data are approximately normal.
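A minimal Python sketch of the Pearson coefficient of skewness, using Example 2 from Lesson 1 as illustrative data. It assumes the sample standard deviation (`statistics.stdev`) for s and applies the ±1 rule stated above; `pearson_skewness` is a hypothetical helper name.

```python
from statistics import mean, median, stdev

def pearson_skewness(data):
    """PC = 3 * (X_bar - median) / s, with s the sample standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)

data = [4, 10, 13, 16, 19, 20, 25, 35, 40, 40]   # Example 2 from Lesson 1
pc = pearson_skewness(data)
verdict = "significantly skewed" if abs(pc) >= 1 else "not significantly skewed"
print(round(pc, 2), verdict)
```

Here the mean (22.2) lies above the median (19.5), so PC is positive, pointing toward a slight right skew, though not a significant one by the ±1 rule.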

Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 

Recently uploaded (20)

How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 

3. measures of central tendency

Data Description
CHAPTER 3
UNIVERSITY OF ANTIQUE - 2 March 2021
Introduction
In the last chapter, you gained useful information from raw data by organizing them and presenting them in charts. This chapter shows you statistical methods that can be used to summarize data. The most familiar of these methods is finding averages. Measures of average are also called measures of central tendency. In addition to knowing the average, you must know how the data values are dispersed. The measures that determine the spread of data values are called measures of variation, or measures of dispersion. Finally, another set of measures is necessary to describe data. These measures are called measures of position. They tell where a specific data value falls within the data set, or its relative position in comparison with other data values.
At the end of this lesson, you should be able to:
1. Describe the uses of the measures of central tendency;
2. Compute and interpret the mean, median and mode;
3. Discuss the properties of the mean, median and mode;
4. Define and interpret the results of the measures of variability;
5. Determine the properties of the normal curve, the areas under the normal curve, and their corresponding z-scores; and
6. Interpret results using the normal distribution, skewness and kurtosis.
Lesson 1. Measures of Central Tendency
Statistical methods are needed for summarizing and describing gathered numerical data. However, by looking only at tables and graphs, one may have difficulty describing the entire set of data. It is better if we can choose a single score that represents the entire data set. This value is helpful in making decisions based on the data collected. The measures of central tendency are used to select the central value that could represent the entire set of data, thus helping the investigator in decision making.
WEEK 3 - DR. M. U, MAGBANUA
The Mean
The mean is the arithmetic average of all the scores in the data set. It is the most frequently used measure of central tendency and is used when the data are interval or ratio. The symbol $\bar{X}$, read as “X bar”, represents the sample mean, while $\mu$, read as “mu”, represents the population mean.
Properties of Mean
• It is used when the data are interval or ratio.
• It is the layman’s concept of the average.
• It is used when the distribution is normal or is not badly skewed.
• It is the most reliable measure of central tendency.
• The mean is found by using all the values of the data.
• The mean varies less than the median or mode when samples are taken from the same population and all three measures are computed for these samples.
• The mean is used in computing other statistics, such as the variance.
• The mean for the data set is unique and not necessarily one of the data values.
• The mean cannot be computed for the data in a frequency distribution that has an open-ended class.
• The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations.
The Mean of Ungrouped Data
The mean of ungrouped data is found by adding all the scores and dividing the sum by the number of scores. In symbols,
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
For example, the mean of 5, 7, 9, 10, 12, and 15 is
$$\bar{X} = \frac{5 + 7 + 9 + 10 + 12 + 15}{6} = \frac{58}{6} \approx 9.67$$
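The computation above can be checked with a short Python sketch. The helper name `mean` is our own and is hypothetical. The data are the six scores from the example. The extra value 100 is a made-up outlier, added only to illustrate the last property in the list.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("mean() needs at least one score")
    return sum(scores) / len(scores)


scores = [5, 7, 9, 10, 12, 15]
x_bar = mean(scores)
print(round(x_bar, 2))  # 9.67

# Illustrating the last property: one extreme value (a hypothetical outlier)
# pulls the mean far away from the bulk of the scores.
with_outlier = scores + [100]
print(round(mean(with_outlier), 2))  # 22.57
```

A single added value moves the mean from about 9.67 to about 22.57, even though six of the seven scores lie between 5 and 15.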
The Weighted Mean
Sometimes each number carries a certain weight. For example, suppose you are asked to determine your mean grade in the first semester. Since every course has a weight (its number of units), the mean grade is found by taking the sum of the products of each number and its weight and dividing it by the total weight. Suppose $X_1, X_2, X_3, \ldots, X_n$ are the scores and their respective weights are $w_1, w_2, w_3, \ldots, w_n$. The weighted mean of the scores is then defined as
$$\bar{X} = \frac{X_1 w_1 + X_2 w_2 + \cdots + X_n w_n}{w_1 + w_2 + w_3 + \cdots + w_n}$$
Let us take John’s grades last semester:
Subject | Grade (X) | Units (w)
Calculus | 1.5 | 5
Filipino | 2.0 | 3
Statistics | 1.8 | 3
P.E. | 1.3 | 2
NSTP | 1.0 | 0
In this data, the grades are the scores and the units are the weights. The weighted mean of John’s grades is computed as follows:
$$\bar{X} = \frac{(1.5)(5) + (2.0)(3) + (1.8)(3) + (1.3)(2) + (1.0)(0)}{5 + 3 + 3 + 2 + 0} = \frac{21.5}{13} \approx 1.65$$
The Median
The median is the middlemost score in the distribution. It divides the distribution into the upper 50% and the lower 50%. Determining the median requires arranging the scores in either ascending or descending order. If the number of scores (n) is odd, the median is the middle value. If n is even, the median is the average of the two middle scores in the ordered list. There are various symbols for the median, among them MD, Mdn, Med and $\tilde{X}$. In this module we use Mdn for one simple reason: it is the symbol suggested by the American Psychological Association (APA).
Properties of the Median
• The median is used to find the centre or middle value of a data set.
• It is not amenable to algebraic manipulation.
• It is used when the distribution is grossly asymmetrical or skewed.
• In an open-ended distribution, the median is the most reliable measure of central tendency.
• It is used when the data are ordinal.
• The median is used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.
• The median is affected less than the mean by extremely high or extremely low values.
Median of Ungrouped Data
To determine the median of ungrouped data, take these steps:
Step 1. Arrange the data set in ascending or descending order.
Step 2. If n is (a) odd, the median is the middlemost value of the ordered values; (b) even, the median is the average of the two middlemost values.
To make this easier, the position of the median in an ordered set of values can be found with the following formula, where n is the number of scores:
$$\text{Position of median} = \frac{n + 1}{2}$$
Let us find the median of the following distributions.
Example 1: 4, 6, 2, 8, 10, 7, 8, 9, 9, 3, 5
Solution:
Step 1. Arrange the scores: 2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10
Step 2. Select the middlemost score. Since there are 11 scores, the position of the median is
$$\text{Position of median} = \frac{11 + 1}{2} = \frac{12}{2} = 6$$
This means the median is the 6th score in the ordered list: 2, 3, 4, 5, 6, 7, 8, 8, 9, 9, 10
Step 3. Identify the median in the data set: Mdn = 7
Example 2: 4, 10, 13, 16, 19, 20, 25, 35, 40, 40
Solution:
Step 1. Arrange the scores: 4, 10, 13, 16, 19, 20, 25, 35, 40, 40
Step 2. Select the middlemost score. Since there are 10 scores, the position of the median is
$$\text{Position of median} = \frac{10 + 1}{2} = \frac{11}{2} = 5.5$$
This means the median lies halfway between the 5th and 6th scores: 4, 10, 13, 16, 19, 20, 25, 35, 40, 40
Step 3. Identify the median in the data set:
$$\text{Mdn} = \frac{19 + 20}{2} = \frac{39}{2} = 19.5$$
The Mode
The mode is the most frequent score in the distribution. It is used when the data are nominal. If the data set is not too large, one can determine the modal score by mere inspection. As with the mean and median, the mode has a variety of symbols; the most common are $\hat{x}$ and Mo. In this module we use Mo as the symbol for the mode.
If there is only one mode, the distribution is unimodal. If there are two modes, the distribution is bimodal. If there are three modes, the distribution is trimodal. If there are four or more modes, the distribution is multimodal or polymodal. If there is no mode, the distribution is called a rectangular distribution.
Properties of Mode
• The mode is used when the most typical case is desired.
• The mode is the easiest average to compute.
• The mode can be used when the data are nominal or categorical.
• The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a data set.
• It is always located at the peak of the distribution.
• It is not unduly affected by extreme values.
• It is a very unstable value.
Mode of Ungrouped Data
The mode of ungrouped data is the value or values that occur most frequently. It can be found by mere inspection. For example, consider the modes of the following scores:
(a) 3, 4, 6, 7, 7, 7, 8, 8, 9, 10: the mode is 7.
(b) 10, 9, 15, 10, 8, 11, 7, 12, 11, 5, 10: the mode is 10, which occurs three times (11 occurs only twice).
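The weighted mean, median and mode computations in this lesson can be sketched in Python as follows. The examples use John’s grades and the example data above. The function names are hypothetical helpers and do not come from any particular library. `modes` follows the common convention that a data set in which no value repeats has no mode; treat that as an assumption.

```python
from collections import Counter


def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the total weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


def median(scores):
    """Middle value of the ordered scores (1-based position (n + 1) / 2)."""
    s = sorted(scores)                 # Step 1: arrange the scores
    n = len(s)
    if n == 0:
        raise ValueError("median() needs at least one score")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the single middle value
    return (s[mid - 1] + s[mid]) / 2   # even n: average of the two middle values


def modes(scores):
    """All values tied for the highest frequency; [] when no value repeats."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:
        return []                      # assumption: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == top)


grades = [1.5, 2.0, 1.8, 1.3, 1.0]     # Calculus, Filipino, Statistics, P.E., NSTP
units = [5, 3, 3, 2, 0]
print(round(weighted_mean(grades, units), 2))             # 1.65
print(median([4, 6, 2, 8, 10, 7, 8, 9, 9, 3, 5]))         # 7
print(median([4, 10, 13, 16, 19, 20, 25, 35, 40, 40]))    # 19.5
print(modes([3, 4, 6, 7, 7, 7, 8, 8, 9, 10]))             # [7]
print(modes([10, 9, 15, 10, 8, 11, 7, 12, 11, 5, 10]))    # [10]
```

`modes` returns a list rather than a single value because, as the properties above note, a data set can have more than one mode.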
Lesson 2. Measures of Variability
In the previous lesson, you learned how to compute the mean, median, and mode. These measures of central tendency focus only on which score could best represent the entire set of data. If you want to determine the spread of the scores, the measures of variability address that question. Thus, in this lesson you will learn the different kinds of measures of variability, sometimes called measures of dispersion.
Closely grouped data have relatively small values of dispersion, and more widely spread-out data have larger values. The closest possible grouping occurs when the data have no dispersion (all data are the same value); in this situation, the measure of dispersion is zero. There is no limit to how widely spread out the data can be.
Range
The range is the crudest measure of dispersion. It is the difference between the highest and the lowest scores in the data set. Because the range considers only two scores, it is the most unstable measure of dispersion. For ungrouped data, the range is
$$R = H - L$$
where R is the range, H is the highest score, and L is the lowest score.
Variance and Standard Deviation
The variance is defined as the average squared deviation from the mean, while the standard deviation is the square root of the variance.
Population variance:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Sample variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Here $\sigma^2$ is the population variance, $s^2$ is the sample variance, X is an individual score, $\mu$ is the population mean, $\bar{X}$ is the sample mean, N is the number of scores (population), and n is the number of scores (sample).
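A minimal Python sketch of the range and the two variance formulas follows, reusing the six scores from the mean example. The function names are hypothetical. The only library calls are `statistics.pvariance` and `statistics.variance` from Python’s standard library, which serve as a cross-check.

```python
import math
import statistics


def data_range(scores):
    """R = H - L: highest score minus lowest score."""
    return max(scores) - min(scores)


def population_variance(scores):
    """sigma^2: squared deviations from the population mean, divided by N."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n


def sample_variance(scores):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    x_bar = sum(scores) / n
    return sum((x - x_bar) ** 2 for x in scores) / (n - 1)


scores = [5, 7, 9, 10, 12, 15]
print(data_range(scores))                     # 10
print(round(population_variance(scores), 4))  # 10.5556
print(round(sample_variance(scores), 4))      # 12.6667

# Cross-check against Python's standard library.
assert math.isclose(population_variance(scores), statistics.pvariance(scores))
assert math.isclose(sample_variance(scores), statistics.variance(scores))
```

The only difference between the two variances is the denominator: N for a population and n - 1 for a sample. For this reason, the sample variance is always the larger of the two for the same scores.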
Standard Deviation of Ungrouped Data
Since the standard deviation is the square root of the variance, the formulas for the population and sample standard deviations are:
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Since the variance and standard deviation are measures of variability or spread, they are interpreted as follows: the lower the value, the more clustered the scores; the higher the value, the more spread out the scores.
Uses of the Variance and Standard Deviation
1. As previously stated, variances and standard deviations can be used to determine the spread of the data. If the variance or standard deviation is large, the data are more dispersed. This information is useful in comparing two (or more) data sets to determine which is more (most) variable.
2. The variance and standard deviation are used to determine the consistency of a variable. For example, in the manufacture of fittings such as nuts and bolts, the variation in the diameters must be small, or the parts will not fit together.
3. The variance and standard deviation are used to determine the number of data values that fall within a specified interval in a distribution.
4. Finally, the variance and standard deviation are used quite often in inferential statistics.
Coefficient of Variation
When two samples have the same units of measure, their variances and standard deviations can be compared directly. A statistic that allows you to compare standard deviations when the units are different is called the coefficient of variation. The standard deviation or variance is not a reliable measure for comparing the spread of two data sets when the two sets have different units, or have the same units but widely dissimilar means. The coefficient of variation was developed to answer this kind of problem. Its formula is
$$CV = \frac{s}{\bar{X}}$$
where CV is the coefficient of variation, s is the standard deviation, and $\bar{X}$ is the mean. The result is often multiplied by 100 and reported as a percentage.
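The sample standard deviation and the coefficient of variation can be sketched in Python as follows. The function names are hypothetical. `heights_cm` and `weights_kg` are invented illustrative data, not measurements from the module. The standard library’s `statistics.stdev` is used only in the check.

```python
import math
import statistics


def sample_sd(scores):
    """s: square root of the sample variance (n - 1 in the denominator)."""
    n = len(scores)
    x_bar = sum(scores) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in scores) / (n - 1))


def coefficient_of_variation(scores):
    """CV = s / mean (a unitless ratio; multiply by 100 for a percentage)."""
    x_bar = sum(scores) / len(scores)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sample_sd(scores) / x_bar


# Hypothetical data: the same five people measured in two different units.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 55, 60, 70, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(f"height CV = {cv_height:.1%}")  # height CV = 6.8%
print(f"weight CV = {cv_weight:.1%}")  # weight CV = 26.7%

# Check the hand-written standard deviation against the standard library.
assert math.isclose(sample_sd(heights_cm), statistics.stdev(heights_cm))
```

The two standard deviations (about 11.2 cm and 17.1 kg) cannot be compared directly because their units differ. The CV is unitless, so it shows that the weights are relatively more variable than the heights.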
Lesson 3. Measures of Position
In addition to measures of central tendency and measures of variation, there are measures of position or location. These measures include standard scores, percentiles, deciles, and quartiles. They are used to locate the relative position of a data value in the data set. For example, if a value is located at the 80th percentile, 80% of the values fall below it in the distribution and 20% of the values fall above it. The median is the value that corresponds to the 50th percentile, since one-half of the values fall below it and one-half fall above it.
Standard Scores
A standard score or z score tells how many standard deviations a data value is above or below the mean for a specific distribution of values. If a standard score is zero, the data value is the same as the mean. A z score is obtained by subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z. The formula is
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
For samples, the formula is
$$z = \frac{X - \bar{X}}{s}$$
For populations, the formula is
$$z = \frac{X - \mu}{\sigma}$$
The z score represents the number of standard deviations that a data value falls above or below the mean.
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group. Percentiles divide the data set into 100 equal groups. They are used, for example, to compare an individual’s test score with the national norm. Percentiles are not the same as percentages. For example, if a student gets 72 correct answers out of a possible 100, she obtained a percentage score of 72. This says nothing about her position with respect to the rest of the class. On the other hand, if a raw score of 72 corresponds to the 64th percentile, then she did better than 64% of the students in her class. Percentiles are symbolized by $P_1, P_2, P_3, \ldots, P_{99}$ and divide the distribution into 100 groups.
Percentile Formula
The percentile corresponding to a given value X is computed by using the following formula:
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100$$
Finding a Data Value Corresponding to a Given Percentile
Step 1. Arrange the data in order from lowest to highest.
Step 2. Substitute into the formula
$$c = \frac{n \cdot p}{100}$$
where n is the total number of values and p is the percentile.
Step 3A. If c is not a whole number, round up to the next whole number. Starting at the lowest value, count over to the number that corresponds to the rounded-up value.
Step 3B. If c is a whole number, use the value halfway between the cth and (c+1)st values when counting up from the lowest value.
Quartiles and Deciles
Quartiles divide the distribution into four groups, separated by $Q_1$, $Q_2$ and $Q_3$. Note that $Q_1$ is the same as the 25th percentile; $Q_2$ is the same as the 50th percentile, or median; and $Q_3$ corresponds to the 75th percentile.
Finding Data Values Corresponding to $Q_1$, $Q_2$ and $Q_3$
Step 1. Arrange the data in order from lowest to highest.
Step 2. Find the median of the data values. This is the value for $Q_2$.
Step 3. Find the median of the data values that fall below $Q_2$. This is the value for $Q_1$.
Step 4. Find the median of the data values that fall above $Q_2$. This is the value for $Q_3$.
In addition to dividing the data set into four groups, quartiles can be used as a rough measure of variability. The interquartile range (IQR) is defined as the difference between $Q_3$ and $Q_1$ ($IQR = Q_3 - Q_1$) and is the range of the middle 50% of the data. The interquartile range is used to identify outliers, and it is also used as a measure of variability in exploratory data analysis.
Deciles divide the distribution into 10 groups. They are denoted by $D_1, D_2$, etc. Note that $D_1$ corresponds to $P_{10}$, $D_2$ corresponds to $P_{20}$, and so on. Deciles can be found by using the formulas given for percentiles. Taken together, the relationships among percentiles, deciles, and quartiles are as follows. Deciles are denoted by $D_1, D_2, D_3, \ldots, D_9$, and they correspond to $P_{10}, P_{20}, P_{30}, \ldots, P_{90}$. Quartiles are denoted by $Q_1, Q_2, Q_3$, and they correspond to $P_{25}, P_{50}, P_{75}$. The median is the same as $P_{50}$, $Q_2$ or $D_5$.
Summary of Position Measures
Measure | Definition | Symbol(s)
Standard score or z score | Number of standard deviations that a data value is above or below the mean | z
Percentile | Position in hundredths that a data value holds in the distribution | $P_n$
Decile | Position in tenths that a data value holds in the distribution | $D_n$
Quartile | Position in fourths that a data value holds in the distribution | $Q_n$
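The percentile formula, the Step 1 to 3B procedure, and the quartile steps can be sketched in Python as follows, using the data from median Example 2. All function names are hypothetical. `quartiles` follows Steps 1 to 4 by splitting the ordered data into lower and upper halves by position, excluding the middle value when n is odd. Spreadsheets and statistical packages may use other interpolation rules and can give slightly different quartiles.

```python
def percentile_rank(data, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) * 100 / len(data)


def value_at_percentile(data, p):
    """Data value at the p-th percentile (p a whole number from 1 to 99)."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    s = sorted(data)                      # Step 1: order lowest to highest
    n = len(s)
    if (n * p) % 100 != 0:                # Step 3A: c = n*p/100 is not whole
        c = -(-(n * p) // 100)            # round up (integer ceiling division)
        return s[c - 1]                   # c-th value, counting from 1
    c = (n * p) // 100                    # Step 3B: c is whole
    return (s[c - 1] + s[c]) / 2          # halfway between the c-th and (c+1)st


def _median_of_sorted(values):
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(data):
    """Q1, Q2, Q3 via the median-of-halves steps (needs at least 2 values)."""
    s = sorted(data)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower = s[: n // 2]            # values below the median position
    upper = s[(n + 1) // 2 :]      # values above the median position
    return _median_of_sorted(lower), _median_of_sorted(s), _median_of_sorted(upper)


data = [4, 10, 13, 16, 19, 20, 25, 35, 40, 40]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)            # 13 19.5 35 22
print(percentile_rank(data, 25))      # 65.0
print(value_at_percentile(data, 25))  # 13
print(value_at_percentile(data, 10))  # D1 = P10 -> 7.0
```

In this example the percentile procedure and the quartile steps agree: $P_{25} = Q_1 = 13$ and $P_{50} = Q_2 = 19.5$.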
Outliers
A data set should be checked for extremely high or extremely low values. These values are called outliers. An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. It can strongly affect the mean and standard deviation of a variable and can have an effect on other statistics as well. There are several ways to check a data set for outliers. One method is as follows:
Step 1. Arrange the data in order and find $Q_1$ and $Q_3$.
Step 2. Find the interquartile range: $IQR = Q_3 - Q_1$.
Step 3. Multiply the IQR by 1.5.
Step 4. Subtract the value obtained in Step 3 from $Q_1$, and add it to $Q_3$.
Step 5. Check the data set for any data value that is smaller than $Q_1 - 1.5(IQR)$ or larger than $Q_3 + 1.5(IQR)$.
Lesson 4. Distribution Shapes
A continuous variable can assume all values between any two given values of the variable. Many continuous variables have distributions that are bell-shaped, and these are called approximately normally distributed variables. The distribution is also called a bell curve or a Gaussian distribution, named for the German mathematician Carl Friedrich Gauss (1777-1855), who derived its equation.
Skewness
No variable fits a normal distribution perfectly, since a normal distribution is a theoretical distribution. However, a normal distribution can be used to describe many variables, because the deviations from a normal distribution are often very small.
When the data values are evenly distributed about the mean, the distribution is said to be symmetric. When the majority of the data values fall to the left or right of the mean, the distribution is said to be skewed.
When the majority of the data values fall to the right of the mean, the distribution is said to be negatively or left-skewed. The mean is to the left of the median, and the mean and the median are to the left of the mode.
When the majority of the data values fall to the left of the mean, the distribution is said to be positively or right-skewed. The mean falls to the right of the median, and both the mean and the median fall to the right of the mode.
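The five-step outlier check described under Outliers can be sketched in Python as follows. The function names are hypothetical, and the data set is made up for illustration. Quartiles are found with the median-of-halves steps from the quartile procedure in Lesson 3.

```python
def _median_of_sorted(values):
    mid = len(values) // 2
    if len(values) % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def find_outliers(data):
    """Return (outliers, (lower_fence, upper_fence)) using the 1.5 * IQR rule."""
    s = sorted(data)                                   # Step 1: order the data
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    q1 = _median_of_sorted(s[: n // 2])                # median of the lower half
    q3 = _median_of_sorted(s[(n + 1) // 2 :])          # median of the upper half
    iqr = q3 - q1                                      # Step 2
    step = 1.5 * iqr                                   # Step 3
    lower_fence, upper_fence = q1 - step, q3 + step    # Step 4
    outliers = [x for x in s if x < lower_fence or x > upper_fence]  # Step 5
    return outliers, (lower_fence, upper_fence)


# Made-up data set for illustration.
data = [5, 6, 12, 13, 15, 18, 22, 50]
outliers, fences = find_outliers(data)
print(fences)    # (-7.5, 36.5)
print(outliers)  # [50]
```

Here $Q_1 = 9$ and $Q_3 = 20$, so $IQR = 11$ and $1.5(IQR) = 16.5$. Only 50 lies outside the fences. The rule flags values for inspection; it does not prove that they are errors.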
There are several formulas for finding the coefficient of skewness. The coefficient of skewness is then converted to a standard score (z-score) by dividing it by its standard error. If the alpha level ($\alpha$ level) is set to .05, the calculated z-score must fall between $-1.96$ and $+1.96$ for the data to approximate the normal distribution. If the z-score goes beyond these values, the data have an unacceptable skewness at $\alpha = .05$. The “tail” of the curve indicates the direction of skewness (right is positive, left is negative).
Kurtosis
Kurtosis is associated with the tailedness of the distribution rather than its peakedness or flatness. It is a measure that describes the tails of the distribution in relation to its overall shape. There are three types of kurtosis.
The Mesokurtic Distribution has a kurtosis similar to that of the normal distribution. This means that the extreme-value characteristics of the distribution are the same as those of the normal distribution.
The Leptokurtic Distribution has a kurtosis greater than that of the normal distribution. Lepto means thin or skinny. Generally, the leptokurtic curve is characterized by a narrow or thin curve that is taller than the normal. However, its thin shape is only a consequence of the tails of the distribution, which stretch along the horizontal axis. This happens when occasional extreme outliers appear in the distribution.
The Platykurtic Distribution is characterized by short tails. Platy means broad or flat. This characteristic of the curve arises because the middle scores have almost the same frequency. Furthermore, there are fewer extreme values than in the normal distribution.
Normal Distribution
A normal distribution is a continuous, symmetric, bell-shaped distribution of a variable.
Properties of the Theoretical Normal Distribution
1. A normal distribution curve is bell-shaped.
2. The mean, median, and mode are equal and are located at the centre of the distribution.
3. A normal distribution curve is unimodal.
4. The curve is symmetric about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the centre.
5. The curve is continuous; that is, there are no gaps or holes. For each value of X, there is a corresponding value of Y.
6. The curve never touches the x axis, but it gets increasingly closer.
7. The total area under the normal distribution curve is equal to 1.00, or 100%.
8. The area under the part of a normal curve that lies within 1 standard deviation of the mean is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or 95%; and within 3 standard deviations, about 0.997, or 99.7%.
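The module notes that there are several formulas for the coefficient of skewness but does not name one. The Python sketch below assumes two commonly used choices. Skewness is measured with the adjusted Fisher-Pearson coefficient (G1), and kurtosis with the corresponding excess kurtosis (G2), which is 0 for a normal curve. Each is divided by its usual standard error, and the ±1.96 rule for α = .05 is then applied to the resulting z-score. Our understanding is that these are the statistics SPSS reports, but treat that as an assumption and check your software’s documentation. The function names are hypothetical. Labelling a non-significant kurtosis z-score as "mesokurtic" is also our own simplification.

```python
import math


def _mean_sd(data):
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean, sd


def _se_skew(n):
    # Standard error of skewness (assumed formula, see lead-in).
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def skewness_z(data):
    """Adjusted Fisher-Pearson skewness G1, its standard error, and z = G1 / SES."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean, sd = _mean_sd(data)
    g1 = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)
    ses = _se_skew(n)
    return g1, ses, g1 / ses


def kurtosis_z(data):
    """Excess kurtosis G2 (0 for a normal curve), its standard error, and z = G2 / SEK."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean, sd = _mean_sd(data)
    sum4 = sum(((x - mean) / sd) ** 4 for x in data)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum4
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    sek = 2 * _se_skew(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return g2, sek, g2 / sek


def describe_shape(data, critical=1.96):
    """Apply the +/-1.96 rule (alpha = .05) to the skewness and kurtosis z-scores."""
    _, _, z_skew = skewness_z(data)
    g2, _, z_kurt = kurtosis_z(data)
    if abs(z_kurt) <= critical:
        kurt_type = "mesokurtic"      # not significantly different from normal
    elif g2 > 0:
        kurt_type = "leptokurtic"     # heavier tails than the normal curve
    else:
        kurt_type = "platykurtic"     # lighter tails than the normal curve
    return {"skewness acceptable": abs(z_skew) <= critical, "kurtosis": kurt_type}


g2, sek, z_kurt = kurtosis_z([1, 2, 3, 4, 5])
print(round(g2, 2), round(sek, 2))     # -1.2 2.0
print(describe_shape([1, 2, 3, 4, 5]))
```

With very small samples the standard errors are large, so almost any data set passes the ±1.96 check. The z-scores are more informative with moderate or large n.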
Figure: The areas under a normal distribution curve.
The Standard Normal Distribution
Since each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary. To simplify this situation, statisticians use what is called the standard normal distribution. The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. The formula for the standard normal distribution is
$$y = \frac{e^{-z^2/2}}{\sqrt{2\pi}}$$
All normally distributed variables can be transformed into the standard normally distributed variable by using the formula for the standard score:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} \quad \text{or} \quad z = \frac{X - \mu}{\sigma}$$
Figure: The areas under a standard normal distribution.
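The z-score transformation and the standard normal formula can be sketched in Python as follows. The sketch also reproduces the approximate 68%, 95% and 99.7% areas from property 8. The function names and the test score (85, with mean 75 and standard deviation 5) are hypothetical. The area between $-k$ and $+k$ uses the identity that this area equals $\operatorname{erf}(k/\sqrt{2})$, available as `math.erf`. Python’s `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist


def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def standard_normal_pdf(z):
    """Height of the standard normal curve: y = e^(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def area_within(k):
    """Area under the standard normal curve between z = -k and z = +k."""
    return math.erf(k / math.sqrt(2))


# Hypothetical score: 85 on a test with mean 75 and standard deviation 5.
print(z_score(85, 75, 5))  # 2.0

# Property 8: about 68%, 95% and 99.7% of the area.
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Cross-check against the standard library's normal distribution.
std = NormalDist(mu=0, sigma=1)
assert math.isclose(area_within(1), std.cdf(1) - std.cdf(-1))
```

A z-score of 2.0 means the hypothetical score lies two standard deviations above the mean. By the areas above, roughly 95% of a normal population falls within that distance of the mean.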
Determining Normality
A normally shaped or bell-shaped distribution is only one of many shapes that a distribution can assume. However, it is very important, since many statistical methods require that the distribution of values be normally or approximately normally shaped.
There are several ways statisticians check for normality. The easiest way is to draw a histogram of the data and check its shape. If the histogram is not approximately bell-shaped, then the data are not normally distributed.
Skewness can be checked by using the Pearson coefficient of skewness (PC), also called Pearson’s index of skewness. The formula is
$$PC = \frac{3(\bar{X} - \text{median})}{s}$$
If the index is greater than or equal to $+1$ or less than or equal to $-1$, it can be concluded that the data are significantly skewed.
Another method used to check normality is to draw a normal quantile plot. Quantiles, sometimes called fractiles, are values that separate the data set into approximately equal groups. There are several other methods used to check for normality. In SPSS, the tests we may use include the Kolmogorov-Smirnov test, the Lilliefors test, and the Shapiro-Wilk test. The test should show no significant result for you to conclude that the data are approximately normal.
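The Pearson coefficient of skewness and the ±1 rule of thumb above can be sketched in Python. It uses the standard library’s `statistics.mean`, `statistics.median` and `statistics.stdev` (sample standard deviation). The function names and both data sets are hypothetical and serve only to illustrate the formula.

```python
import statistics


def pearson_skewness(data):
    """PC = 3 * (mean - median) / s, with s the sample standard deviation."""
    s = statistics.stdev(data)
    return 3 * (statistics.mean(data) - statistics.median(data)) / s


def significantly_skewed(data):
    """Rule of thumb from the text: |PC| >= 1 suggests significant skewness."""
    return abs(pearson_skewness(data)) >= 1


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 3, 4, 20]  # hypothetical data with one large value
print(pearson_skewness(symmetric))             # 0.0
print(round(pearson_skewness(right_tail), 2))  # 1.14
print(significantly_skewed(right_tail))        # True
```

In the second data set, the single large value pulls the mean (6) above the median (3), so the coefficient is positive, consistent with a right-skewed distribution.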