SECTION 1
Statistics: the science of collecting, organizing, summarizing, presenting,
analyzing and interpreting data, and of drawing conclusions from the data in
order to reach the best decision.
Statistics is divided into two distinct parts:
1- Descriptive Statistics: concerned only with the collection, organization,
summarization, analysis and presentation of a set of qualitative or
quantitative data. Descriptive statistics include the mean, median, mode, standard
deviation, range, etc.
2- Inferential Statistics: consists of methods for drawing conclusions based on
the data in order to reach the best decision. It is also divided into two parts:
A- Estimation
B- Hypothesis Testing
Population: the complete collection of all elements to be studied.
Finite (countable) Population: a population is called finite if it is possible to
count its individuals, for example the number of students in Shaqlawa technical
institute or the number of computers in a laboratory.
Infinite (uncountable) Population: a population is called infinite if it is
impossible to count its individuals, for example the number of bacteria in a
garden or the number of fish in a sea.
Census: the collection of data from every element of the population.
Sample: a sub-collection of elements drawn from a population.
Sampling: the process of selecting a subset of elements from the population.
Sources of collecting the data:
1- Historical Sources
2- Field Sources
Probability (random) samples are drawn from populations through several
different sampling methods:
1- Simple Random Sampling
Every member of the population (N) has an equal chance of being selected for
the sample (n). This is arguably the best sampling method, because the sample is
very likely to be representative of the population. However, it is rarely
used in practice because it is often impractical to carry out.
2- Systematic Sampling
In this method, every k-th individual from the population (N) is placed in the
sample (n). For example, if you add every 7th individual who walks out of a
supermarket to your sample, you are performing systematic sampling.
3- Stratified Sampling
A general problem with random sampling is that you could, by chance, miss a
particular group in the sample. However, if you divide the population into groups
and sample from each group, you can make sure the sample is representative. In
stratified sampling, the population is divided into groups called strata. A sample is
then drawn from within each of these strata. Some examples of strata commonly used by
the ABS are state, age and sex. Other strata may be religion, academic ability
or marital status.
[Diagram: Methods of collecting the data: (1) census; (2) samples, which are either
probability (random) or non-probability samples.]
4- Multi-Stage Sampling
Multi-stage sampling is like cluster sampling, but involves selecting a sample
within each chosen cluster, rather than including all units in the cluster. Thus,
multi-stage sampling involves selecting a sample in at least two stages. In the first
stage, large groups or clusters are selected. These clusters are designed to contain
more population units than are required for the final sample. In the second stage,
population units are chosen from selected clusters to derive a final sample. If
more than two stages are used, the process of choosing population units within
clusters continues until the final sample is achieved.
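The first three sampling methods above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 numbered individuals, the two strata "A" and "B", the 10% sampling fraction and all function names are assumptions made for the example, not part of the notes.

```python
import random

def simple_random_sample(population, n, rng):
    # Every member has the same chance of being selected.
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    # Take every k-th individual, beginning at index `start`.
    return population[start::k]

def stratified_sample(strata, fraction, rng):
    # Draw the same fraction from every stratum so no group is missed.
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)                      # fixed seed for repeatability
population = list(range(1, 101))            # hypothetical individuals 1..100
srs = simple_random_sample(population, 10, rng)
every_7th = systematic_sample(population, 7, start=6)
strata = {"A": list(range(1, 61)), "B": list(range(61, 101))}
strat = stratified_sample(strata, 0.10, rng)
print(srs, every_7th, strat, sep="\n")
```

With a 10% fraction the stratified sample contains 6 individuals from stratum A and 4 from stratum B, so both groups are always represented.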
Variable: a characteristic or property of the elements in the population. The
name is derived from the fact that any particular characteristic may
vary among the elements in a population.
Types of variables:
1- Quantitative variables
A- Discrete variables (e.g. number of students)
B- Continuous variables (e.g. height, weight)
2- Qualitative (descriptive) variables
Section 2
Frequency Distribution (Table):
After a researcher has obtained raw data from any source, the raw (ungrouped)
data need to be arranged and organized in a meaningful way so that they can be
described and useful inferences can be drawn. The method used for this
organization and arrangement is called a frequency distribution. Frequency
means the number of times something happens. A frequency distribution
organizes raw data in table form using classes and frequencies.
1- Frequency Distribution for Qualitative variables:
Frequency Distribution for Qualitative variables lists all classes and the number of
elements that belong to each of the classes.
Example1: the following list gives the rank of a sample that consists of 25 clerks
in Soran institute:
Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher,
Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant
Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher,
Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher,
Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher.
Create a frequency distribution for the above data.
Solution:
Classes (rank)          Frequency
Lecturer                4
Assistant lecturer      5
Researcher              6
Assistant Researcher    10
Total                   25
Relative Frequency of a Class:
The relative frequency of a class is obtained by dividing the frequency of the
class by the sum of all the frequencies.
Example 2: using the data of the previous example, calculate the relative frequencies.
Solution:
Classes (rank)          Frequency   Relative Frequency
Lecturer                4           4/25 = 0.16
Assistant lecturer      5           5/25 = 0.2
Researcher              6           6/25 = 0.24
Assistant Researcher    10          10/25 = 0.4
Total                   25          1
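As a quick check of Examples 1 and 2, the frequencies and relative frequencies can be computed with Python's `collections.Counter`. The sketch below rebuilds the 25 ranks from the class counts instead of retyping the list in its original order; the variable names are illustrative only.

```python
from collections import Counter

# The 25 ranks of Example 1, rebuilt from the class counts.
ranks = (["Lecturer"] * 4 + ["Assistant lecturer"] * 5
         + ["Researcher"] * 6 + ["Assistant Researcher"] * 10)

freq = Counter(ranks)                  # frequency of each class
total = sum(freq.values())             # sum of all frequencies
rel_freq = {rank: f / total for rank, f in freq.items()}

for rank, f in freq.items():
    print(f"{rank:22s} {f:3d} {rel_freq[rank]:.2f}")
print(f"{'Total':22s} {total:3d} {sum(rel_freq.values()):.2f}")
```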
2- Frequency Distribution for Quantitative variables
Total Range (T.R): the highest value minus the lowest value in the data set.
Number of classes (M): the appropriate number of classes may be decided by Yule's
formula, which is as follows:
$$M = 2.5\sqrt[4]{n}$$
where n is the total number of observations.
Class Width (W), also called class length, is the difference between two consecutive
lower class limits or two consecutive lower class boundaries. The class width can be
found by the following formula:
$$W = \frac{T.R}{M}$$
Frequency (f): the number of values in a specific class of the distribution.
A- Frequency Distribution for Discrete variables:
The lower and upper limits of the classes in the frequency distribution of a
discrete variable are as below:
Lower limit     Upper limit     Frequency
Xs              Xs+W-1          f1
Xs+W            Xs+2W-1         f2
Xs+2W           Xs+3W-1         f3
...             ...             ...
Xs+(M-1)W       Xs+M.W-1        fM
Where:
Xs: the lowest value
W: class width
M: number of classes
Example3: Construct the frequency distribution for the following data:
60 76 80 120 132 82 90 65 68 142 157 164 88
90 98 101 103 110 119 116 120 126 109 114 120 122
111 116 90 78 93 95 98 104 120 113 121 119 125
126 130 131 136 118 120 142 150 154 122 123 139 125
106 154 136 137 110 137 72 150
Solution:
Total Range (T.R) = 164 - 60 = 104
Number of Classes (M) = 2.5 × (60)^(1/4) = 2.5(2.783) = 6.958 ≈ 7
Class Width (W) = 104/7 = 14.86 ≈ 15
Class        Frequency   Midpoint             Relative Frequency
60 - 74      4           (60+74)/2 = 67       4/60 = 0.067
75 – 89      5           (75+89)/2 = 82       5/60 = 0.083
90 – 104     10          97                   10/60 = 0.167
105 – 119    12          112                  12/60 = 0.200
120 – 134    16          127                  16/60 = 0.267
135 – 149    7           142                  7/60 = 0.117
150 – 164    6           157                  6/60 = 0.100
Total        60                               1
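The steps of Example 3 (total range, Yule's formula, class width, class limits, frequencies) can be reproduced with a short Python sketch. Rounding the number of classes to the nearest integer and rounding the width up are assumptions made here to match the worked solution; the variable names are illustrative.

```python
import math

data = [60, 76, 80, 120, 132, 82, 90, 65, 68, 142, 157, 164, 88,
        90, 98, 101, 103, 110, 119, 116, 120, 126, 109, 114, 120, 122,
        111, 116, 90, 78, 93, 95, 98, 104, 120, 113, 121, 119, 125,
        126, 130, 131, 136, 118, 120, 142, 150, 154, 122, 123, 139, 125,
        106, 154, 136, 137, 110, 137, 72, 150]

n = len(data)
total_range = max(data) - min(data)         # T.R = 164 - 60
m = round(2.5 * n ** 0.25)                  # Yule's formula, rounded
width = math.ceil(total_range / m)          # class width, rounded up
lowest = min(data)

# Discrete classes: [Xs + i*W, Xs + (i+1)*W - 1]
classes = [(lowest + i * width, lowest + (i + 1) * width - 1) for i in range(m)]
freqs = [sum(lo <= x <= hi for x in data) for lo, hi in classes]

for (lo, hi), f in zip(classes, freqs):
    print(f"{lo:3d} - {hi:3d}  f={f:2d}  midpoint={(lo + hi) / 2:5.1f}  rel={f / n:.3f}")
```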
B- Frequency Distribution for continuous variables:
The lower and upper limits of the classes in the frequency distribution of a
continuous variable are as below:
Lower limit     Upper limit     Frequency
Xs              Xs+W            f1
Xs+W            Xs+2W           f2
Xs+2W           Xs+3W           f3
...             ...             ...
Xs+(M-1)W       Xs+M.W          fM
Example4: construct a frequency distribution for below data:
1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8
5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3
5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5
3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9
Cumulative Frequency Distribution
A- Ascending Cumulative Frequency Distribution
The ascending cumulative frequency of a class is the total frequency of all values
less than the upper class boundary of that class (for the discrete classes below,
less than or equal to its upper limit).
Example5: Construct an ascending cumulative frequency distribution for the
data in Example 3.
Classes      Frequency   Upper Limit of Class   Ascending Cumulative Frequency
60 - 74      4           74                     Less than or equal to 74 = 4
75 – 89      5           89                     Less than or equal to 89 = 9
90 – 104     10          104                    Less than or equal to 104 = 19
105 – 119    12          119                    Less than or equal to 119 = 31
120 – 134    16          134                    Less than or equal to 134 = 47
135 – 149    7           149                    Less than or equal to 149 = 54
150 – 164    6           164                    Less than or equal to 164 = 60
Total        60
B- Descending Cumulative Frequency Distribution
The descending cumulative frequency of a class is the total frequency of all values
greater than or equal to the lower class boundary of that class.
Example6: Construct a descending cumulative frequency distribution for the
data in Example 4.
Classes   Frequency   Lower Limit of Class   Descending Cumulative Frequency
0 - 2     1           0                      Greater than or equal to 0 = 50
2 - 4     7           2                      Greater than or equal to 2 = 49
4 - 6     15          4                      Greater than or equal to 4 = 42
6 - 8     15          6                      Greater than or equal to 6 = 27
8 - 10    8           8                      Greater than or equal to 8 = 12
10 - 12   4           10                     Greater than or equal to 10 = 4
Total     50
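Both cumulative distributions are running totals of the class frequencies, taken from the top of the table for the ascending case and from the bottom for the descending case. A minimal sketch using `itertools.accumulate`, with the frequencies of Examples 5 and 6 (variable names are illustrative):

```python
from itertools import accumulate

freq_example3 = [4, 5, 10, 12, 16, 7, 6]      # classes 60-74 ... 150-164
freq_example4 = [1, 7, 15, 15, 8, 4]          # classes 0-2 ... 10-12

# Ascending: running total from the first class downwards.
ascending = list(accumulate(freq_example3))

# Descending: running total from the last class upwards.
descending = list(accumulate(reversed(freq_example4)))[::-1]

print(ascending)    # [4, 9, 19, 31, 47, 54, 60]
print(descending)   # [50, 49, 42, 27, 12, 4]
```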
Charts
Statistical data are presented graphically using statistical charts. There are
several kinds of charts for representing a set of data, such as:
Bar Charts
A bar chart is a chart composed of bars whose heights are the frequencies of the
different classes. It is used for qualitative variables.
Example7: Display the below data as a bar chart.
Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green,
Red, Red
Solution:
In the first step we will create a frequency table for this data:
Color Frequency
red 6
green 6
blue 4
Then we use this table for creating a bar chart
[Figure: bar chart of Frequency (vertical axis, 0 to 7) against Color (red, green, blue).]
Histogram
A histogram is similar to bar charts, but it is used for representing the quantitative
variable rather than qualitative variables.
Example8: Draw a histogram for the following frequency distribution.
Classes Frequency
60 - 74 4
75 – 89 5
90 – 104 10
105 – 119 12
120 – 134 16
135 – 149 7
150 – 164 6
Total 60
Solution:
[Figure: histogram of Frequency (vertical axis, 0 to 18) against Classes (60 - 74 to 150 – 164).]
Pie Chart
A pie chart is a circle divided into sectors, where each sector represents a category
(class) of the data and the size of the sector is proportional to the relative
frequency of that class.
We can calculate the angle size of each class by the following rule:
Angle size of class = relative frequency of the class × 360°
Example9: Draw a pie chart for the data in example 1.
Class                   Frequency   Relative Frequency   Angle Size
Lecturer                4           4/25 = 0.16          0.16*360 = 57.6
Assistant lecturer      5           5/25 = 0.2           0.2*360 = 72
Researcher              6           6/25 = 0.24          0.24*360 = 86.4
Assistant Researcher    10          10/25 = 0.4          0.4*360 = 144
Total                   25          1                    360
[Figure: pie chart with sectors Lecturer (57.6°), Assistant lecturer (72°),
Researcher (86.4°) and Assistant Researcher (144°).]
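The pie chart rule (angle = relative frequency × 360°) can be checked for the rank data of Example 1 with a few lines of Python. This is only a sketch of the arithmetic, not of the drawing itself; the dictionary and variable names are illustrative.

```python
freq = {"Lecturer": 4, "Assistant lecturer": 5,
        "Researcher": 6, "Assistant Researcher": 10}
total = sum(freq.values())

# Angle of each sector = relative frequency * 360 degrees.
angles = {rank: round(f / total * 360, 1) for rank, f in freq.items()}

for rank, angle in angles.items():
    print(f"{rank:22s} {angle:6.1f} degrees")
print("sum of angles:", round(sum(angles.values()), 1))
```

The four angles add up to 360°, which is a useful check before drawing the chart.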
Frequency Polygon
It is a chart that displays the data by using lines that connect points plotted for the
frequencies at the midpoints of the classes.
Example10: draw a frequency polygon for the frequency distribution in
example 3.
[Figure: frequency polygon of Frequency (vertical axis, 0 to 18) against Midpoints
(67, 82, 97, 112, 127, 142, 157).]
Frequency Curve
A frequency curve is like a frequency polygon, with one difference: instead of
straight lines, a smooth curve is used to connect the points at the midpoints.
Example 11: draw a frequency curve for the data in example 4.
[Figure: frequency curve of Frequency (vertical axis, 0 to 18) against Midpoints
(1, 3, 5, 7, 9, 11), with frequencies 1, 7, 15, 15, 8, 4.]
Cumulative Frequency Chart
It is a chart that represents the cumulative frequencies of the classes in a frequency
distribution.
Example 12: Construct an ascending cumulative frequency chart for the data in
example 4.
[Figure: ascending cumulative frequency chart of Cumulative frequency (vertical axis,
0 to 60) against Upper Limit of classes (2, 4, 6, 8, 10, 12).]
Example13: Construct a descending cumulative frequency chart for the data in
example 4.
[Figure: descending cumulative frequency chart of Cumulative frequency (vertical axis,
0 to 60) against Lower Limit of Classes.]
Exercise 1: complete the following frequency distributions if the widths of
classes are equal.
Class Midpoint Class Midpoint
3 8 6
18
Class Midpoint
14
26
Exercise2: the heights of 35 students were recorded as follows:
170 180 175 165 160 155 180 190 185 170 174 178
165 169 186 179 161 171 159 168 177 164 191 140
173 181 177 173 166 162 168 184 168 158 155
Find the following:
1- Frequency distribution
2- Midpoints
3- Descending cumulative frequency
4- Relative frequency
And draw:
a) Histogram b) frequency polygon
SECTION 3
Notations
In this section we present some useful notation before explaining the
topics related to measures of central tendency and measures of dispersion
(variation).
1- Summation Notation ($\sum$)
The symbol $\sum_{i=1}^{n} X_i$ is read as "the summation of X", where n is the number of
observations and i is the subscript giving the order of the values.
Let X be a variable that takes 4 values: 2, 3, 5, and 10. Then the sum of X
is written as follows:
$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 2 + 3 + 5 + 10 = 20$$
Symbol                                                                    Operation
$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$                            Sum of observations
$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \dots + X_n^2$                    Sum of squares of observations
$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + \dots + X_n)^2$         Square of sum of observations
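Let X and Y be variables and let a and b be constants. Then:
$$\sum_{i=1}^{n} a = n \cdot a$$
$$\sum_{i=1}^{n} aX_i = a\sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} (X_i \pm a) = \sum_{i=1}^{n} X_i \pm n \cdot a$$
$$\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} (aX_i \pm bY_i) = a\sum_{i=1}^{n} X_i \pm b\sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} X_i Y_i = X_1Y_1 + X_2Y_2 + \dots + X_nY_n$$
$$\sum_{i=1}^{n} X_i^2 \neq \left(\sum_{i=1}^{n} X_i\right)^2$$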
Example 1: If Xi represents the values 4, 3, 5 and 1, find the following:
a- $\sum_{i=1}^{n} X_i$    b- $\sum_{i=1}^{n} X_i^2$    c- $\sum_{i=1}^{n} 2X_i$    d- $\sum_{i=1}^{n} (X_i - 3)$
Solution:
a) $\sum_{i=1}^{n} X_i = \sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 4 + 3 + 5 + 1 = 13$
b) $\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 = 4^2 + 3^2 + 5^2 + 1^2 = 51$
c) $\sum_{i=1}^{n} 2X_i = 2\sum_{i=1}^{4} X_i = 2(13) = 26$
d) $\sum_{i=1}^{n} (X_i - 3) = \sum_{i=1}^{4} X_i - \sum_{i=1}^{4} 3 = 13 - 3(4) = 13 - 12 = 1$
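The four sums of Example 1 can be verified directly with Python's built-in `sum`. The sketch also shows that the sum of squares differs from the square of the sum; the variable names are illustrative.

```python
x = [4, 3, 5, 1]
n = len(x)

sum_x = sum(x)                          # a) sum of observations
sum_sq = sum(v ** 2 for v in x)         # b) sum of squares
sum_2x = sum(2 * v for v in x)          # c) equals 2 * sum_x
sum_shift = sum(v - 3 for v in x)       # d) equals sum_x - 3 * n
square_of_sum = sum_x ** 2              # not the same as the sum of squares

print(sum_x, sum_sq, sum_2x, sum_shift, square_of_sum)   # 13 51 26 1 169
```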
2- Pi (Product) Notation ($\prod$)
The symbol $\prod_{i=1}^{n} X_i$ denotes the product of all the values Xi, or:
$$\prod_{i=1}^{n} X_i = X_1 \cdot X_2 \cdots X_n$$
If a is a constant, then:
$$\prod_{i=1}^{n} a = a^n$$
$$\prod_{i=1}^{n} aX_i = a^n \prod_{i=1}^{n} X_i$$
Example 2: If Xi represents the values 4, 2, 5 and 3, find the following:
a- $\prod_{i=1}^{n} X_i$    b- $\prod_{i=1}^{n} 5X_i$
Solution:
a) $\prod_{i=1}^{n} X_i = \prod_{i=1}^{4} X_i = X_1 \cdot X_2 \cdot X_3 \cdot X_4 = 4 * 2 * 5 * 3 = 120$
b) $\prod_{i=1}^{n} 5X_i = 5^n \prod_{i=1}^{n} X_i = 5^4 (120) = 75000$
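Example 2 can be checked with `math.prod` (available in Python 3.8 and later). The sketch also confirms the rule that a constant factor comes out of the product raised to the power n; the variable names are illustrative.

```python
import math

x = [4, 2, 5, 3]
n = len(x)

prod_x = math.prod(x)                        # a) product of the values
prod_5x = math.prod(5 * v for v in x)        # b) product of 5 * X_i

print(prod_x)     # 120
print(prod_5x)    # 75000
print(prod_5x == 5 ** n * prod_x)            # True
```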
Exercise: If
Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following:
a- 
4
1
2
i
iX b- 
4
1
3
i
iY c- 2
4
1
. i
i
i YX
d-  

n
i
ii YX
1
e- 4
4
1
66 i
f- 
4
1i
iX g- 
n
i
iY
1
4 h- i
n
i
i YX .2
1

j- 
4
1i i
i
Y
X
k-   2.3
4
1

i
i
i YX
SECTION 4: MEASURES OF CENTRAL TENDENCY
In the previous sections, we have studied how to collect raw data and how to classify
and tabulate them in a useful form, which helps in solving many statistical
problems. Yet this is not sufficient: for practical purposes there is a
need for further condensation, particularly when we want to compare two or more
different distributions. We may reduce the entire distribution to one number
which represents the distribution.
A single value which can be considered typical or representative of a set of
observations, and around which the observations can be considered to be centered, is
called an average (or average value) or a center of location. Since such typical
values tend to lie centrally within a set of observations when arranged according
to magnitude, averages are called measures of central tendency.
So a measure of central tendency is a value at the center or middle of a data set.
This value represents all the data of the group.
The fundamental measures of central tendency are:
(1) Arithmetic Mean
(2) Weighted Mean
(3) Harmonic Mean
(4) Quadratic Mean
(5) Mode
(6) Median
However, the most common measures of central tendency (location) are the
arithmetic mean, the median and the mode.
1) Arithmetic Mean
The arithmetic mean (generally called the mean) is obtained by adding all the
observations (values of all items) together and dividing this sum by the number of
observations (or items). The symbol $\bar{X}$ (pronounced "X bar") represents the sample
mean and $\mu$ represents the population mean.
Arithmetic mean for ungrouped data
Suppose we have n observations (or measures) X1, X2, X3, ..., Xn; then the
arithmetic mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$
Where: Xi = the i-th observation.
n = the size of the data.
The mean for a population consisting of N observations is:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
Example: Calculate the arithmetic mean of the given values:
98 96 95 98 100 92 96 69
Solution:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{98 + 96 + 95 + 98 + 100 + 92 + 96 + 69}{8} = \frac{744}{8} = 93$$
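The ungrouped arithmetic mean of the example can be computed either from the definition or with `statistics.mean` from the Python standard library. A minimal sketch (variable names are illustrative):

```python
from statistics import mean

marks = [98, 96, 95, 98, 100, 92, 96, 69]

by_definition = sum(marks) / len(marks)   # sum of observations / n
by_library = mean(marks)                  # same result from the standard library

print(sum(marks), by_definition, by_library)   # 744 93.0 93
```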
Arithmetic mean for grouped data:
The arithmetic mean of grouped data is found by multiplying every class midpoint
(xi) by its corresponding frequency (fi), summing these products to get
$\sum f_i x_i$, and then dividing this sum by $\sum f_i$:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
The above formula is for sample data. A similar formula is used for population data.
Example: Determine the mean for the following set of data.
Classes   Frequency
8 -       2
10 -      3
12 -      5
14 -      4
16 -      1
Solution:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
Classes   Frequency (fi)   Midpoint (xi)   fi . xi
8 -       2                9               18
10 -      3                11              33
12 -      5                13              65
14 -      4                15              60
16 -      1                17              17
Total     15                               193
$$\bar{X} = \frac{193}{15} = 12.87$$
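The grouped mean of the example can be reproduced by building the midpoints from the lower class limits and the common class width of 2. A minimal sketch, with illustrative variable names:

```python
lower_limits = [8, 10, 12, 14, 16]
width = 2
freqs = [2, 3, 5, 4, 1]

midpoints = [lo + width / 2 for lo in lower_limits]        # 9, 11, 13, 15, 17
weighted_sum = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = weighted_sum / sum(freqs)

print(weighted_sum, sum(freqs), round(grouped_mean, 2))    # 193.0 15 12.87
```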
The Properties of the Arithmetic Mean:
1- The sum of the deviations of all the values of X from their mean is zero:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
Proof:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}$$
We have $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, so $\sum_{i=1}^{n} X_i = n\bar{X}$; then:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$$
2- If $(\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k)$ represent the means of k groups based on $(n_1, n_2, \dots, n_k)$
observations respectively, the mean of the groups combined is:
$$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$$
3- The sum of squares of the deviations from the mean is smaller than from
any other value. (prove this property)
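The three properties above can be checked numerically. The sketch below reuses the eight marks from the ungrouped mean example for properties 1 and 3; the two groups used for property 2 (sizes 10 and 30, means 60 and 80) are hypothetical values chosen only for illustration.

```python
x = [98, 96, 95, 98, 100, 92, 96, 69]
x_bar = sum(x) / len(x)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(v - x_bar for v in x)

# Property 2: combined mean of k groups (hypothetical sizes and means).
sizes = [10, 30]
means = [60.0, 80.0]
combined_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Property 3: the sum of squared deviations is smallest about the mean.
def sum_sq_dev(a):
    return sum((v - a) ** 2 for v in x)

print(deviation_sum, combined_mean)
print(sum_sq_dev(x_bar), sum_sq_dev(x_bar - 1), sum_sq_dev(x_bar + 1))
```

Shifting the reference point by 1 in either direction increases the sum of squares by exactly n = 8, which illustrates property 3 without proving it.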
Advantage (merits) of Arithmetic mean
1- It is easy to calculate and simple to understand.
2- It is very popular (most widely used).
3- It is based on all the observations; so that it becomes a good representative.
Disadvantages (demerits) of the arithmetic mean
1- It is affected by outliers or extreme values.
2- It cannot be obtained if a single observation is missing or lost.
3- It cannot be calculated for frequency distributions with open-end classes.
4- It cannot be computed for qualitative data.
2) Weighted Arithmetic Mean:
One of the limitations of the arithmetic mean is that it gives equal importance
to all the items. But there are cases where the relative importance of the different
items is not the same. When this is so, we compute the weighted arithmetic mean.
The formula for computing the weighted arithmetic mean in the case of ungrouped data
is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1X_1 + W_2X_2 + \dots + W_nX_n}{W_1 + W_2 + \dots + W_n}$$
Where Wi is the weight of the i-th observation.
The formula for computing the weighted arithmetic mean in the case of grouped data is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i f_i X_i}{\sum_{i=1}^{n} W_i f_i} = \frac{W_1f_1X_1 + W_2f_2X_2 + \dots + W_nf_nX_n}{W_1f_1 + W_2f_2 + \dots + W_nf_n}$$
Example: The marks of a student in the final examination of the Statistics department
are as follows:
Subjects (Xi): 98 96 95 98 100 92 96 69
Units (Wi): 2 3 3 1 3 3 2 2
Calculate the weighted mean.
Solution:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{(98*2) + (96*3) + (95*3) + (98*1) + (100*3) + (92*3) + (96*2) + (69*2)}{2 + 3 + 3 + 1 + 3 + 3 + 2 + 2} = \frac{1773}{19} = 93.3158$$
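The weighted mean of the example follows directly from the ungrouped formula. A minimal sketch with illustrative variable names; the last line checks the remark that equal weights give back the ordinary arithmetic mean.

```python
marks = [98, 96, 95, 98, 100, 92, 96, 69]     # Xi
units = [2, 3, 3, 1, 3, 3, 2, 2]              # Wi

numerator = sum(w * x for w, x in zip(units, marks))
weighted_mean = numerator / sum(units)

# With equal weights the weighted mean equals the arithmetic mean.
equal_weights_mean = sum(1 * x for x in marks) / len(marks)

print(numerator, sum(units), round(weighted_mean, 4))   # 1773 19 93.3158
print(equal_weights_mean)                               # 93.0
```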
Remark: If all the weights are equal, then the weighted mean is the same as the
arithmetic mean.
Exercise1: The average marks of three groups of students having 70, 50 and 30
students respectively are 50, 55 and 45. Find the average marks of all the 150
students, taken together.
Exercise2: The following frequency distribution shows the marks obtained by 50
students in statistics at Soran institute. Find the arithmetic mean.
Classes Frequency (fi)
20 - 29 1
30 - 39 5
40 - 49 12
50 - 59 15
60 - 69 9
70 - 79 6
80 - 89 2
Exercise3: The mean of a certain number of observations is 40. If two items with
values 50 and 64 are added to this data, the mean rises to 42. Find the number of
items in the original data.
Exercise4: If $\sum_{i=1}^{n} (X_i - 4) = 72$ and $\sum_{i=1}^{n} (X_i - 7) = 3$, then find the number of
observations (n).
3) Harmonic Mean
The harmonic mean is one of the measures of central tendency; it is used less
often than the other measures (mean, median and mode).
The formula for computing the harmonic mean in the case of ungrouped data
is:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
And for grouped data it is:
$$\bar{X}_h = \frac{\sum f_i}{\sum_{i=1}^{n} \frac{f_i}{X_i}}$$
Example: calculate the harmonic mean for the following data:
Xi: 8 2 5 3 4 7 8
Solution:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
1/Xi: 0.13 0.5 0.2 0.33 0.25 0.14 0.13
$$\sum \frac{1}{X_i} = 1.68, \qquad \bar{X}_h = \frac{7}{1.68} = 4.167$$
(With unrounded reciprocals the sum is 1.676 and the harmonic mean is about 4.176.)
4) Quadratic mean
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n}}$$ for ungrouped data
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i}}$$ for grouped data
5) MODE
The mode (Mo) is the value that occurs most often in a data set.
Mode for ungrouped data:
The mode of the following data set: 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5
because it is repeated more often than the other numbers (6 times).
Remark: When 2 numbers occur with the same greatest frequency, each one is a
mode and the data set is bimodal. When more than 2 numbers occur with the same
greatest frequency, each is a mode and the data set is said to be multimodal. When
no number is repeated, we say that there is no mode.
Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5.
Solution: The numbers 5 and 7 are both modes. The data set is bimodal.
Mode for grouped data:
Let (X1, X2, …, Xn) represent the class marks of the class intervals and (f1, f2, …,
fn) the corresponding frequencies. The modal class is the class which has the highest
frequency. The formula for obtaining the mode is as follows:
$$M_o = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
Where:
Lk: lower limit of the modal class.
fk: frequency of the modal class.
fk-1: frequency of the previous class.
fk+1: frequency of the next class.
Wk: size of the modal class interval (class width).
Example: Find the mode for the following frequency distribution:
Class      frequency
10 – 19    5
20 – 29    7
30 – 39    10
40 – 49    8
50 – 59    4
60 – 69    3
70 – 79    1
Solution:
The modal class is 30 – 39 because it has the highest frequency (10).
Lk=30, fk=10, fk-1=7, fk+1=8, Wk=10
$$M_o = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
$$M_o = 30 + \frac{(10 - 7)}{(10 - 7) + (10 - 8)} \times 10 = 30 + \frac{3}{5} \times 10 = 36$$
Remark1: If there are 2 or more classes with the highest frequency, we must use the
assembly method to find the modal class.
Remark2: When we use the assembly method, the formula for the mode will be:
$$M_o = L_k + \frac{(f_k - f_{k-1})}{|f_k - f_{k-1}| + |f_k - f_{k+1}|} \times W_k$$
Remark3: If the widths of the classes are not equal, the adjusted
frequency must be used instead of the real frequency, where the adjusted frequency of
each class is equal to $\frac{f_i}{W_i}$.
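The grouped-data mode formula can be applied to the distribution with classes 10 – 19 to 70 – 79 and frequencies 5, 7, 10, 8, 4, 3, 1 as sketched below. The function name `grouped_mode` is hypothetical, and the sketch assumes a single modal class that is neither the first nor the last class, so that both neighbouring frequencies exist.

```python
def grouped_mode(lower_limit, width, f_prev, f_modal, f_next):
    # Mo = Lk + (fk - fk-1) / ((fk - fk-1) + (fk - fk+1)) * Wk
    d_prev = f_modal - f_prev
    d_next = f_modal - f_next
    return lower_limit + d_prev / (d_prev + d_next) * width

lower_limits = [10, 20, 30, 40, 50, 60, 70]
freqs = [5, 7, 10, 8, 4, 3, 1]
width = 10

k = freqs.index(max(freqs))          # index of the modal class (assumed unique)
mode = grouped_mode(lower_limits[k], width, freqs[k - 1], freqs[k], freqs[k + 1])

print(lower_limits[k], round(mode, 2))   # 30 36.0
```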
Example: Find the mode for the following frequency distribution:
Class      frequency
10 – 19    4
20 – 29    4
30 – 39    3
40 – 49    2
50 – 59    3
60 – 69    3
70 – 79    1
Solution:
There are 2 classes with the highest frequency; therefore, to find the modal class we
must use the assembly method, as follows:
Class      frequency   1st assembly   2nd assembly   3rd assembly   4th assembly
10 – 19    4           8                             11
20 – 29    4                          7                             9
30 – 39    3           5
40 – 49    2                          5              8
50 – 59    3           6                                            7
60 – 69    3                          4
70 – 79    1
(Each assembly value is written on the row of the first class of the group it sums.)
From the previous table we can extract the following table:
Serial No. of column   Greatest frequency in the column   Contributing classes
1                      4                                  1, 2
2                      8                                  1, 2
3                      7                                  2, 3
4                      11                                 1, 2, 3
5                      9                                  2, 3, 4
The 2nd class contributes most often (5 times), so the 2nd class is the modal class.
Lk =20, fk =4, fk-1 =4, fk+1 =3, Wk =10
$$M_o = L_k + \frac{(f_k - f_{k-1})}{|f_k - f_{k-1}| + |f_k - f_{k+1}|} \times W_k$$
$$M_o = 20 + \frac{(4 - 4)}{|4 - 4| + |4 - 3|} \times 10 = 20$$
Advantage of Mode
1- It is easy to calculate.
2- It is not affected by extreme values.
3- It can be used for qualitative data.
4- It can be located graphically (Histogram).
5- It can be calculated for distributions with open end classes.
Disadvantage of Mode
1- It is not based upon all the observations.
2- It is not always possible to find a clearly defined mode (2 modes or 3
modes).
3- It is not capable of further mathematical treatment.
Exercise: Find the mode for the following frequency distributions:
Class      frequency        Class      frequency
5 –        2                10 –       30
10 –       6                20 –       12
15 –       10               30 –       16
25 –       22               40 –       28
35 –       27               50 –       26
50 – 60    11               60 –       14
6) MEDIAN
The median (Me) is the value of the middle item in a data set. It divides the
data set into two equal parts, one part comprising all values greater and the other
all values smaller than the median.
Median for ungrouped data:
In the first step we arrange the data in ascending (increasing) order.
If the number of observations (n) is odd, the median is the observation whose
order is $\frac{n+1}{2}$.
If the number of observations (n) is even, the median is the average of the
observations whose orders are $\frac{n}{2}$ and $\frac{n}{2} + 1$.
Example: Find the median of the following data set:
55, 62, 53, 70, 68, 65, 63, 79, and 80.
Solution:
Arrange the data in increasing order: 53, 55, 62, 63, 65, 68, 70, 79, 80.
Since n=9 is odd, the order of the median is:
$$\frac{n+1}{2} = \frac{9+1}{2} = 5$$
Then the 5th observation is the value of the median, or Me=65.
Example: Find the median of the following data set:
20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25.
Solution:
Arranging the data in increasing order:
18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30
Since n=12 is even:
the $\frac{n}{2} = \frac{12}{2} = 6$th value is 23
the $\frac{n}{2} + 1 = \frac{12}{2} + 1 = 7$th value is 25
Then:
$$M_e = \frac{23 + 25}{2} = 24$$
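The odd and even rules for the ungrouped median can be written as a small function and compared with `statistics.median`. The function name `median_by_rule` is hypothetical; the two data sets are the ones used in the examples.

```python
from statistics import median

def median_by_rule(values):
    ordered = sorted(values)              # step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # the ((n+1)/2)-th observation
    # average of the (n/2)-th and (n/2 + 1)-th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_data = [55, 62, 53, 70, 68, 65, 63, 79, 80]
even_data = [20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25]

print(median_by_rule(odd_data), median(odd_data))     # 65 65
print(median_by_rule(even_data), median(even_data))   # 24.0 24.0
```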
Median for grouped data:
To find the median of a frequency distribution, follow these steps:
Step1: Find the cumulative frequency (ascending or descending).
Step2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
Step3: If $F_{k-1} \le \frac{\sum f_i}{2} \le F_k$, then the median class is the class whose order is k.
Step4: Compute the value of the median:
$$M_e = L_k + \left(\frac{\sum f_i}{2} - F_{k-1}\right) \cdot \frac{W_k}{f_k}$$ for ascending cumulative frequency.
$$M_e = L_k + \left(F_k^{*} - \frac{\sum f_i}{2}\right) \cdot \frac{W_k}{f_k}$$ for descending cumulative frequency.
Where:
Lk: lower limit of the median class.
fk: frequency of the median class.
Wk: width of the median class.
$\sum f_i$: sum of the frequencies.
Fk–1: ascending cumulative frequency of the class preceding the median class.
$F_k^{*}$: descending cumulative frequency of the median class.
Example: Find the median for the following frequency distribution:
Classes           100 -   120 -   140 -   160 -   180 -   200 -   220 -
no. of families   3       7       14      20      18      12      6
Solution:
In the first step we find the ascending cumulative frequency:
Class     frequency   Ascending Cumulative frequency
100 -     3           3
120 -     7           10
140 -     14          24
160 -     20          44
180 -     18          62
200 -     12          74
220 -     6           80
Total     80
Then we find the median order, which is equal to:
$$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$
Comparing the median order with the ascending cumulative frequencies:
$$F_{k-1} \le \frac{\sum f_i}{2} \le F_k \;\Rightarrow\; 24 \le 40 \le 44$$
Then the median class is the 4th class.
Then:
Lk=160, Wk=20, fk=20
$$M_e = L_4 + \left(\frac{\sum f_i}{2} - F_3\right) \cdot \frac{W_4}{f_4}$$
$$M_e = 160 + \left(\frac{80}{2} - 24\right) \cdot \frac{20}{20} = 176$$
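The four steps for the grouped median can be sketched in Python for the distribution of 80 families. The sketch evaluates both the ascending and the descending form of the formula; choosing the median class as the first class whose ascending cumulative frequency reaches the median order is an assumption about how ties are resolved, and the variable names are illustrative.

```python
from itertools import accumulate

lower_limits = [100, 120, 140, 160, 180, 200, 220]
freqs = [3, 7, 14, 20, 18, 12, 6]
width = 20

ascending = list(accumulate(freqs))                  # step 1
order = sum(freqs) / 2                               # step 2: median order
k = next(i for i, F in enumerate(ascending) if F >= order)   # step 3

F_prev = ascending[k - 1] if k > 0 else 0            # ascending freq. before class k
F_star = sum(freqs[k:])                              # descending freq. of class k

# step 4: both forms of the formula
me_ascending = lower_limits[k] + (order - F_prev) * width / freqs[k]
me_descending = lower_limits[k] + (F_star - order) * width / freqs[k]

print(k + 1, me_ascending, me_descending)            # 4 176.0 176.0
```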
Merits of Median
1. It is easy to calculate and understand.
2. Unlike the arithmetic mean, it is not affected by extreme values.
3. It can be found by mere inspection.
4. It can be used for qualitative studies.
5. It can be calculated for distributions with open-end classes.
6. It can be obtained graphically.
Demerits of Median
1. It is not capable of further algebraic treatment.
2. It is not based on all observations.
Exercise: find the median for the following frequency distribution by using
ascending and descending cumulative frequency.
Class        frequency
18 -         10
28 -         15
36 -         18
50 -         22
70 -         20
100 -        18
130 - 150    13
Total        116
The relationship between Arithmetic Mean, Median and Mode
If the frequency distribution is symmetric, the mean, median and mode coincide. If it
is moderately asymmetric, the following approximate relationship between
these measures holds:
$$\bar{X} - M_o = 3(\bar{X} - M_e)$$
SECTION 5) Measures of Dispersion (Variation)
Measures that describe the spread of a data set are called measures of dispersion.
Their main purpose is to assess the homogeneity of the values in a data set, or to
compare the variability of two or more data sets.
1- Range
The simplest measure of absolute variation is the range, which is calculated by
subtracting the smallest value from the largest value of a data set.
R = Largest value – Smallest value
Example: find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
Solution:
R = Largest value – Smallest value = 15 - 2 = 13
Remark: in the case of grouped data we calculate the range by subtracting
the lower limit of the first class from the upper limit of the last class.
2- Mean Deviation
It is the sum of the absolute deviations of the observations from a point (A) divided by
the number of observations.
$$M.D = \frac{\sum_{i=1}^{n} |X_i - A|}{n}$$ for ungrouped data
$$M.D = \frac{\sum_{i=1}^{n} f_i |X_i - A|}{n}$$ for grouped data, where $n = \sum f_i$
Where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$) or the mode ($M_o$).
Example: find the value of the mean deviation for the following data by using the mean,
median and mode.
Xi: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
Solution:
First we find the values of $\bar{X}$, $M_e$ and $M_o$:
$\bar{X} = 8$, $M_e = 6$, $M_o = 5$
Xi      |Xi - X̄|    |Xi - Mo|    |Xi - Me|
2       6            3            4
3       5            2            3
4       4            1            2
5       3            0            1
5       3            0            1
6       2            1            0
7       1            2            1
10      2            5            4
13      5            8            7
14      6            9            8
19      11           14           13
Total   48           45           44
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$
$$M.D(M_e) = \frac{\sum_{i=1}^{n} |X_i - M_e|}{n} = \frac{44}{11} = 4$$
$$M.D(M_o) = \frac{\sum_{i=1}^{n} |X_i - M_o|}{n} = \frac{45}{11} = 4.0909$$
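The mean deviation about the mean, the median and the mode can be computed with one small helper. The function name `mean_deviation` is hypothetical; the data are the 11 values of the example. Computed this way, the absolute deviations sum to 48 about the mean, 44 about the median and 45 about the mode.

```python
from statistics import mean, median, mode

x = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]

def mean_deviation(values, a):
    # Sum of absolute deviations from the point a, divided by n.
    return sum(abs(v - a) for v in values) / len(values)

x_bar, me, mo = mean(x), median(x), mode(x)       # 8, 6, 5

print(round(mean_deviation(x, x_bar), 4))   # 4.3636  (48 / 11)
print(round(mean_deviation(x, me), 4))      # 4.0     (44 / 11)
print(round(mean_deviation(x, mo), 4))      # 4.0909  (45 / 11)
```

The deviation about the median is the smallest of the three, which is a general property of the median.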
3- Variance
It is one of the most important measures of absolute variation. The variance is
calculated by taking the average of the squared distances (deviations) of
each observation from the mean of the data set.
The formula for the population variance ($\sigma^2$) for raw data is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where:
Xi: individual value
µ: population mean
N: population size (number of observations).
The formula for the sample variance ($S^2$) for raw data is as follows:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
The formula for the sample variance for grouped data is:
$$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}$$
Where $n = \sum f_i$
Example: find the variance for the following dataset:
56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
Solution:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{650}{10} = 65$$
Xi      (Xi - X̄)    (Xi - X̄)²
56      -9           81
68      3            9
72      7            49
63      -2           4
65      0            0
68      3            9
71      6            36
69      4            16
62      -3           9
56      -9           81
Total                294
then
$$S^2 = \frac{294}{10 - 1} = 32.667$$
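The sample variance of the example can be computed from the definition and checked against `statistics.variance`, which also divides by n − 1. The population version, `statistics.pvariance`, divides by N instead and is shown only for contrast. Variable names are illustrative.

```python
from statistics import pvariance, variance

x = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]
n = len(x)
x_bar = sum(x) / n                                   # 65.0

sum_sq_dev = sum((v - x_bar) ** 2 for v in x)        # 294.0
s2_definition = sum_sq_dev / (n - 1)                 # sample variance
s2_library = variance(x)                             # same divisor, n - 1
sigma2 = pvariance(x)                                # divisor N

print(sum_sq_dev, round(s2_definition, 3), round(s2_library, 3), sigma2)
```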
Properties of variance:
1) $S^2 \ge 0$
2) If $Y_i = aX_i$, then $S_Y^2 = a^2 S_X^2$, where a is a constant. (Prove that)
3) If $Y_i = X_i \pm b$, then $S_Y^2 = S_X^2$, where b is a constant. (Prove that)
4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z
is:
$$S_Z^2 = S_X^2 + S_Y^2$$
5) If $(S_1^2, S_2^2, \dots, S_k^2)$ represent the variances of k groups based on $(n_1, n_2, \dots, n_k)$
observations respectively, then the pooled variance of the groups is as follows:
$$S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{\sum_{i=1}^{k} (n_i - 1)}$$ where $n_i < 30$
$$S_p^2 = \frac{\sum_{i=1}^{k} n_i S_i^2}{\sum_{i=1}^{k} n_i}$$ where $n_i \ge 30$
4- Standard deviation (S)
The standard deviation is the most important and most widely used measure of
absolute variation. The standard deviation is the square root of the variance.
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Example: Find the standard deviation of the following frequency distribution.
Class     fi    Xi     fi.Xi    (Xi - X̄)    (Xi - X̄)²    fi.(Xi - X̄)²
100 -     3     110    330      -65.75       4323.063      12969.19
120 -     7     130    910      -45.75       2093.063      14651.44
140 -     14    150    2100     -25.75       663.0625      9282.875
160 -     20    170    3400     -5.75        33.0625       661.25
180 -     18    190    3420     14.25        203.0625      3655.125
200 -     12    210    2520     34.25        1173.063      14076.75
220 -     6     230    1380     54.25        2943.063      17658.38
Total     80           14060                               72955
Solution:
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{14060}{80} = 175.75$$
Using the grouped sample formula with divisor n − 1, where $n = \sum f_i = 80$:
$$S = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{72955}{79}} = 30.389$$
(Dividing by n = 80 instead gives $\sqrt{72955/80} = 30.198$.)
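The grouped standard deviation can be computed from the class midpoints and frequencies in the table. The sketch reports the result with both divisors, n − 1 (the sample formula given for grouped data) and n, so the two values can be compared; variable names are illustrative.

```python
import math

midpoints = [110, 130, 150, 170, 190, 210, 230]
freqs = [3, 7, 14, 20, 18, 12, 6]

n = sum(freqs)                                                   # 80
x_bar = sum(f * x for f, x in zip(freqs, midpoints)) / n         # 175.75
sum_sq = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, midpoints))

s_sample = math.sqrt(sum_sq / (n - 1))     # divisor n - 1
s_divisor_n = math.sqrt(sum_sq / n)        # divisor n

print(x_bar, sum_sq)
print(round(s_sample, 3), round(s_divisor_n, 3))
```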
38
Coefficient of Variation
A disadvantage of the standard deviation as a comparative measure of variation is
that it depends on the units of measurement. This means that it is difficult to use
the standard deviation to compare measurements from different populations. For
this reason, statisticians have defined the coefficient of variation, which expresses
the standard deviation as a percentage of the sample or population mean.
If X and S represents the sample mean and the sample standard deviation, then
the coefficient of variation (C.V.) is defined to be:
100*..
X
S
VC 
If μ and σ represent the population mean and standard deviation, then the
coefficient of variation CV is defined to be:
100*..


VC
Notice that the numerator and denominator in the definition of CV have the same
units, so CV itself has no units of measurement. This gives us the advantage of
being able to directly compare the variability of two different populations using
the coefficient of variation.
Example1: A company has two sections (A and B) with 40 and 65 employees
respectively. Their average weekly wages are $450 and $350. The standard
deviations are 7 and 9. Which section has larger variability in wages?
Solution:
$$C.V._{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$
$$C.V._{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$
Because the C.V. for section A is smaller than the C.V. for section B, section B
has larger variability in wages. So section A is more homogeneous than section B.
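The comparison of the two sections can be sketched in Python. The function name `coefficient_of_variation` is illustrative. The means and standard deviations are the ones given in the example.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std / mean * 100


cv_a = coefficient_of_variation(7, 450)  # section A
cv_b = coefficient_of_variation(9, 350)  # section B

print(round(cv_a, 2))  # 1.56
print(round(cv_b, 2))  # 2.57
print("B" if cv_b > cv_a else "A")  # section with the larger relative variability: B
```

Because the coefficient of variation has no units, the same function could be used to compare variables measured on different scales, such as weights and heights.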
Example2: Suppose the mean and standard deviation of the heights and weights of
40 students are as below:

           Mean      Standard Deviation
Weights    68.34     3.02
Heights    172.55    26.33

Find the coefficient of variation of the heights and of the weights and compare the
results.
Solution:
$$C.V._{(Weights)} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$
$$C.V._{(Heights)} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$
So the weights (with C.V. = 4.42) have less variation than the heights (with
C.V. = 15.26).

Principlles of statistics

  • 1. 1 SECTION 1 Statistics: is the science of obtaining data, organizing, summarizing, and presenting, analyzing, interpreting and drawing conclusions based on the data to give the best decision. Statistics divided in to two distinct parts: 1- Descriptive Statistics: It is concerned only with the collection, organization, summarizing, analysis and presentation of an array of numerical qualitative or quantitative data. Descriptive statistics include the mean, median, mode, standard deviation, range, etc. 2- Inferential Statistics: it is consist of methods for drawing conclusion based on the data to give the best decision. Its divide in to two parts also: A- Estimation B- Testing Hypothesis Population: Is the complete collection of all elements to be studied. Finite (countable) Population: A population is called finite if it is possible to count its individuals. For example, the number of students in Shaqlawa technical institute or number of computers in a libratory. Infinite (uncountable) Population: A population is called infinite if it is impossible to count its individuals, for example the number of bacteria's in a garden, number of fishes in a sea. Census: is the collection of data from every elements of population. Sample: is a sub- collection of elements drawn from a population. Sampling: the process of selecting a subset of data from the population is called Sampling.
  • 2. 2 Sources of collecting the data: 1- Historical Sources 2- Field Sources Probability (Random) Samples are drawn from populations through several different sampling methods: 1- Simple Random Sampling Every member of the population (N) has an equal chance of being selected for your sample (n). This is arguably the best sampling method, as your samples almost guaranteed to be representative of your population. However, it is rarely ever used due to being too impractical. 2- Systematic Sampling In this method, every nth individual from the population (N) is placed in the sample (n). For example, if you add every 7th individual to walk out of a supermarket to your sample, you are performing systematic sampling. 3- Stratified Sampling A general problem with random sampling is that you could, by chance, miss out a particular group in the sample. However, if you form the population into groups, and sample from each group, you can make sure the sample is representative. In METHODS OF COLLECTING THE DATA SAMPLES PROBABILITY (RANDOM) NON PROBABLITY CENSUS
  • 3. 3 stratified sampling, the population is divided into groups called strata. A sample is then drawn from within these strata. Some examples of strata commonly used by the ABS are States, Age and Sex. Other strata may be religion, academic ability or marital status. 4- MULTI-STAGE SAMPLING Multi-stage sampling is like cluster sampling, but involves selecting a sample within each chosen cluster, rather than including all units in the cluster. Thus, multi-stage sampling involves selecting a sample in at least two stages. In the first stage, large groups or clusters are selected. These clusters are designed to contain more population units than are required for the final sample. In the second stage, population units are chosen from selected clusters to derive a final sample. If more than two stages are used, the process of choosing population units within clusters continues until the final sample is achieved. Variable: is a characteristic or property of the elements in the population. The name of variable is derived from the fact that any particular characteristic may vary among the elements in a population. Variables Quantitative variables Descrete variables (Number of students) Continuous variables (Hieght, Weight) Qualitative (descriptive) variables
  • 4. 4 Section 2 Frequency Distribution (Table): After a researcher might have gotten a raw data from any source, there is a need for the raw data (ungrouped) to be arranged and organized in a meaningful way in order to be able to describe and come up with a useful inference. The method that is being used for such organization and arrangement is called frequency distribution. Frequency means the number of times something happens. Frequency distribution simply means organizing of raw data in table from using classes and frequencies. 1- Frequency Distribution for Qualitative variables: Frequency Distribution for Qualitative variables lists all classes and the number of elements that belong to each of the classes. Example1: the following list gives the rank of a sample that consists of 25 clerks in Soran institute: Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher, Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher. Create a frequency distribution for the above data. Solution: FrequencyClasses (rank) 4Lecturer 5Assistant lecturer 6Researcher 10Assistant Researcher 25Total
  • 5. 5 Relative Frequency of a Class: The relative frequency of a class is obtained by dividing the frequency of class by the sum of the all frequencies. Example 2: depending on the previous example, calculate the relative frequency. Solution: Relative FrequencyFrequencyClasses (rank) 4/25=0.164Lecturer 5/25=0.25Assistant lecturer 6/25=0.246Researcher 10/25=0.410Assistant Researcher 125Total 2- Frequency Distribution for Quantitative variables Total Range (T.R): is equal to highest value minus lowest value in the data set. Number of classes: the appropriate number of classes may be decided by Yules formula which is as follows: Number of classes= where n is the total number of observation. Class Width= T.R/ No. of classes Class Width (Length) is the difference between two consecutive lower class limit or two consecutive lower class boundaries. The class width can be found by the following formula: Frequency (F): is the number of values in a specific class of the distribution. 4 n2.5
  • 6. 6 A- Frequency Distribution for Discrete variables: The lower and upper limits of the frequency distribution of discrete variables are as below: frequency Class Upper limitLower limit f1Xs+W-1Xs f2Xs+2W-1Xs+W f3Xs+3W-1Xs+2W . . . . . . fmXs+M.W-1Xs+(M-1)W Where: Xs: the lowest value W: class width M: number of classes Example3: Construct the frequency distribution for the following data: 60 76 80 120 132 82 90 65 68 142 157 164 88 90 98 101 103 110 119 116 120 126 109 114 120 122 111 116 90 78 93 95 98 104 120 113 121 119 125 126 130 131 136 118 120 142 150 154 122 123 139 125 106 154 136 137 110 137 72 150 Total Range (T.R) = 164-60=104 Number of Classes (M) = 2.5(2.783) = 6.958 = 7 Length of Classes (L) = 104/7=14.86 = 15
  • 7. 7 Class Frequency Midpoint Relative Frequency 60 - 74 4 =(60+74)/2= 67 =4/60 = 0.067 75 – 89 5 =(75+89)/2= 82 =5/60 = 0.083 90 – 104 10 97 =10/60 = 0.167 105 – 119 12 112 =12/60 = 0.200 120 – 134 16 127 =16/60 = 0.267 135 – 149 7 142 =7/60 = 0.117 150 – 164 6 157 =6/60 = 0.100 Total 60 1 B- Frequency Distribution for continuous variables: The lower and upper limits of the frequency distribution of continuous variables are as below: frequency Class Upper limitLower limit f1Xs+WXs f2Xs+2WXs+W f3Xs+3WXs+2W . . . . . . fmXs+M.WXs+(M-1)W Example4: construct a frequency distribution for below data: 1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8 5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3 5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5 3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9
  • 8. 8 Cumulative Frequency Distribution A- Ascending Cumulative Frequency Distribution Ascending Cumulative Frequency Distribution is the total frequency of all values less than the upper class boundary of a given class interval. Example5: Construct an Ascending Cumulative Frequency Distribution depending on the example 3. Classes Frequency Upper Limit of Class Ascending Cumulative Frequency 60 - 74 4 74 Less than or equal to 74= 4 75 – 89 5 89 Less than or equal to 89= 9 90 – 104 10 104 Less than or equal to 104= 19 105 – 119 12 119 Less than or equal to 119= 31 120 – 134 16 134 Less than or equal to 134= 47 135 – 149 7 149 Less than or equal to 149= 54 150 – 164 6 164 Less than or equal to 164= 60 Total 60 B- Descending Cumulative Frequency Distribution Descending Cumulative Frequency Distribution is the total frequency of all values Greater than the lower class boundary of a given class interval. Example6: Construct a descending cumulative frequency distribution depending on the example 4. Classes Frequency Lower Limit of Class Descending Cumulative Freq. 0 - 2 1 0 Greater than or equal to 0= 50 2 - 4 7 2 Greater than or equal to 2= 49 4 - 6 15 4 Greater than or equal to 4= 42 6 - 8 15 6 Greater than or equal to 6= 27 8 - 10 8 8 Greater than or equal to 8= 12 10 - 12 4 10 Greater than or equal to 10= 4 Total 50
  • 9. 9 Charts The graphical presentation of statistical data is using statistical charts. There are several kinds of charts for representing set of data, such as: Bar- Charts A bar chart is a chart composed of bars whose heights are the frequencies of the different classes. (Qualitative Variables) Example7: Display the below data as a bar chart. Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green, Red, Red Solution: In the first step we will create a frequency table for this data: Color Frequency red 6 green 6 blue 4 Then we use this table for creating a bar chart 0 1 2 3 4 5 6 7 red green blue Frequency Color
  • 10. 11 Histogram A histogram is similar to bar charts, but it is used for representing the quantitative variable rather than qualitative variables. Example8: Draw a histogram for the following frequency distribution. Classes Frequency 60 - 74 4 75 – 89 5 90 – 104 10 105 – 119 12 120 – 134 16 135 – 149 7 150 – 164 6 Total 60 Solution: 0 2 4 6 8 10 12 14 16 18 60 - 74 75 – 89 90 – 104 105 – 119 120 – 134 135 – 149 150 – 164 Frequency Classes
  • 11. 11 Pie Chart A pie chart is a circle divided into sectors, where each sector represents a category (relative frequency of each class) of data that is proportional to the total amount of data collected. We can calculate the angle size of each class by the following rule: Angle size of class= relative of the class X 360o Example9: Draw a pie chart for the data in example 1. Angle SizeRelative FrequencyFrequencyClass 0.16*360=57.64/25=0.164Lecturer 0.2*360=725/25=0.25Assistant lecturer 0.24*360=86.46/25=0.246Researcher 0.4*360=14410/25=0.411Assistant Researcher 360125Total Lecturer 57.6o Assistant lecturer 72o Researcher 86.4 Assistant Researcher 1440
  • 12. 12 Frequency Polygon It is a chart that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. Example10: draw a frequency distribution for the frequency distribution in example3. Frequency Curve Frequency curve is like a frequency polygon, but there is one difference between them, instead of using lines to connect midpoints a smooth curve will be used. Example 11: draw a frequency curve for the data in example 4. 0 2 4 6 8 10 12 14 16 18 67 82 97 112 127 142 157 Frequency Midpoints 1 7 15 15 8 4 0 2 4 6 8 10 12 14 16 18 0 2 4 6 8 10 12 Frequency Midpoints 1 3 5 7 9 11
  • 13. 13 Cumulative Frequency Chart It is a chart that represents the cumulative frequencies of classes in frequency distribution. Example 12: Construct an ascending cumulative frequency chart for the data in example 4. Example13: Construct a descending cumulative frequency chart for the data in example 4. 0 10 20 30 40 50 60 1 2 3 4 5 6 Cumulativefrequency Upper Limit of classes 2 4 6 8 10 12 0 10 20 30 40 50 60 1 2 3 4 5 6 Cumulativefrequency Lower Limit of Classes 2 4 6 8 10 12
  • 14. 14 Exercise 1: complete the following frequency distributions if the widths of classes are equal. Class Midpoint Class Midpoint 3 8 6 18 Class Midpoint 14 26 Exercise2: the height of 35 students were noted and shown as follows: 170 180 175 165 160 155 180 190 185 170 174 178 165 169 186 179 161 171 159 168 177 164 191 140 173 181 177 173 166 162 168 184 168 158 155 Find the following: 1- Frequency distribution 2- Midpoints 3- Descending cumulative frequency 4- Relative frequency And draw: a) Histogram b) frequency polygon
  • 15. SECTION 3: Notations
    In this section we introduce some useful notation before explaining the measures of central tendency and the measures of dispersion (variation).
    1- Summation Notation ($\sum$)
    The symbol $\sum_{i=1}^{n} X_i$ is read as "the summation of X", where n is the number of observations and i is the subscript that indexes the values.
    Let X be a variable that takes 4 values: 2, 3, 5, and 10. Then the sum of X is written as follows:
    $$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 2 + 3 + 5 + 10 = 20$$
    Symbol                                                                  Operation
    $\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$                          Sum of observations
    $\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \dots + X_n^2$                  Sum of squares of observations
    $\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + \dots + X_n)^2$       Square of the sum of observations
    Let X and Y be variables and let a be a constant. Then:
    $$\sum_{i=1}^{n} a = n \cdot a$$
    $$\sum_{i=1}^{n} a X_i = a \sum_{i=1}^{n} X_i$$
    $$\sum_{i=1}^{n} (X_i \pm a) = \sum_{i=1}^{n} X_i \pm n \cdot a$$
    $$\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$$
  • 16. Further rules, where a and b are constants:
    $$\sum_{i=1}^{n} (a X_i \pm b Y_i) = a \sum_{i=1}^{n} X_i \pm b \sum_{i=1}^{n} Y_i$$
    $$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_n Y_n$$
    $$\sum_{i=1}^{n} X_i^2 \ne \left(\sum_{i=1}^{n} X_i\right)^2$$
    Example 1: If $X_i$ represents the values 4, 3, 5 and 1, find the following:
    a- $\sum_{i=1}^{n} X_i$
    b- $\sum_{i=1}^{n} X_i^2$
    c- $\sum_{i=1}^{n} 2 X_i$
    d- $\sum_{i=1}^{n} (X_i - 3)$
    Solution:
    a) $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 4 + 3 + 5 + 1 = 13$
    b) $\sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 = 4^2 + 3^2 + 5^2 + 1^2 = 51$
    c) $\sum_{i=1}^{4} 2 X_i = 2 \sum_{i=1}^{4} X_i = 2(13) = 26$
    d) $\sum_{i=1}^{4} (X_i - 3) = \sum_{i=1}^{4} X_i - \sum_{i=1}^{4} 3 = 13 - 4(3) = 13 - 12 = 1$
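The summation rules can be verified numerically on the values of Example 1. The following is a small illustrative sketch (the variable names are chosen here and do not come from the notes):

```python
X = [4, 3, 5, 1]  # values from Example 1
n = len(X)

sum_x = sum(X)                          # a) sum of observations
sum_of_squares = sum(x**2 for x in X)   # b) sum of squares
sum_2x = sum(2 * x for x in X)          # c) sum of 2*X_i
sum_shifted = sum(x - 3 for x in X)     # d) sum of (X_i - 3)
square_of_sum = sum_x**2                # square of the sum

print(sum_x, sum_of_squares, sum_2x, sum_shifted)  # 13 51 26 1

# The rules used in the solution:
assert sum_2x == 2 * sum_x               # a constant factor moves outside the sum
assert sum_shifted == sum_x - n * 3      # a constant summed n times gives n*a
assert sum_of_squares != square_of_sum   # 51 is not 169
```

The last assertion illustrates why the sum of squares and the square of the sum must not be confused.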
  • 17. 2- Pi (Product) Notation ($\prod$)
    The symbol $\prod_{i=1}^{n} X_i$ denotes the product of all the values $X_i$:
    $$\prod_{i=1}^{n} X_i = X_1 \cdot X_2 \cdots X_n$$
    If a is a constant, then:
    $$\prod_{i=1}^{n} a = a^n$$
    $$\prod_{i=1}^{n} a X_i = a^n \prod_{i=1}^{n} X_i$$
    Example 2: If $X_i$ represents the values 4, 2, 5 and 3, find the following:
    a- $\prod_{i=1}^{n} X_i$
    b- $\prod_{i=1}^{n} 5 X_i$
    Solution:
    a) $\prod_{i=1}^{4} X_i = X_1 \cdot X_2 \cdot X_3 \cdot X_4 = 4 \times 2 \times 5 \times 3 = 120$
    b) $\prod_{i=1}^{4} 5 X_i = 5^4 \prod_{i=1}^{4} X_i = 625 \times 120 = 75000$
    Exercise: If Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following: a-  4 1 2 i iX b-  4 1 3 i iY c- 2 4 1 . i i i YX d-    n i ii YX 1 e- 4 4 1 66 i f-  4 1i iX g-  n i iY 1 4 h- i n i i YX .2 1  j-  4 1i i i Y X k-   2.3 4 1  i i i YX
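The product rules can be checked the same way. This sketch assumes Python 3.8 or later, where `math.prod` is available; the variable names are illustrative.

```python
import math

X = [4, 2, 5, 3]  # values from Example 2
n = len(X)
a = 5             # the constant used in part b)

product_x = math.prod(X)                   # a) product of all X_i
product_ax = math.prod(a * x for x in X)   # b) product of 5*X_i

print(product_x, product_ax)  # 120 75000

# The constant comes out of the product raised to the power n.
assert product_ax == a**n * product_x
```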
  • 18. SECTION 4: MEASURES OF CENTRAL TENDENCY
    In the previous sections we studied how to collect raw data and how to classify and tabulate it in a useful form, which helps in solving many statistical problems. This is not sufficient, however: for practical purposes the data need further condensation, particularly when we want to compare two or more distributions. We may reduce an entire distribution to one number that represents it.
    A single value that can be considered typical or representative of a set of observations, and around which the observations can be considered centered, is called an "average" (or average value) or a center of location. Since such typical values tend to lie centrally within a set of observations arranged by magnitude, averages are called measures of central tendency. A measure of central tendency is therefore a value at the center or middle of a data set that represents all the data of the group.
    The fundamental measures of central tendency are:
    (1) Arithmetic Mean
    (2) Weighted Mean
    (3) Harmonic Mean
    (4) Quadratic Mean
    (5) Mode
    (6) Median
    However, the most common measures of central tendency (location) are the arithmetic mean, the median and the mode.
  • 19. 1) Arithmetic Mean
    The arithmetic mean (generally called the mean) is obtained by adding all the observations (the values of all items) together and dividing this sum by the number of observations (or items). The symbol $\bar{X}$ (pronounced "X bar") represents the sample mean and $\mu$ represents the population mean.
    Arithmetic mean for ungrouped data
    Suppose we have n observations (or measurements) $X_1, X_2, X_3, \dots, X_n$. Then the arithmetic mean is:
    $$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$
    where $X_i$ is the i-th observation and n is the size of the data.
    The mean of a population consisting of N observations is:
    $$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
    Example: Calculate the arithmetic mean of the given values: 98, 96, 95, 98, 100, 92, 96, 69.
    Solution:
    $$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{98 + 96 + 95 + 98 + 100 + 92 + 96 + 69}{8} = \frac{744}{8} = 93$$
  • 20. Arithmetic mean for grouped data:
    The arithmetic mean of grouped data is found by multiplying every class midpoint ($x_i$) by its corresponding frequency ($f_i$), summing these products to get $\sum f_i x_i$, and then dividing this sum by $\sum f_i$:
    $$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
    The above formula is for sample data. A similar formula is used for population data.
    Example: Determine the mean for the following set of data.
    Classes   Frequency
    8 –       2
    10 –      3
    12 –      5
    14 –      4
    16 –      1
    Solution:
    Classes   Frequency (fi)   Midpoint (xi)   fi . xi
    8 –       2                9               18
    10 –      3                11              33
    12 –      5                13              65
    14 –      4                15              60
    16 –      1                17              17
    Total     15                               193
    $$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{193}{15} = 12.87$$
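The grouped-mean calculation can be reproduced as follows. This is a sketch under the assumption that every class has width 2 (so the last class runs from 16 to 18), which is what the midpoints in the solution table imply; the function name `grouped_mean` is illustrative.

```python
lower_limits = [8, 10, 12, 14, 16]  # lower class limits from the example
frequencies = [2, 3, 5, 4, 1]
class_width = 2                     # assumed equal width for all classes

# Midpoint of each class = lower limit + half the class width.
midpoints = [low + class_width / 2 for low in lower_limits]


def grouped_mean(mids, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


mean = grouped_mean(midpoints, frequencies)
print(midpoints)       # [9.0, 11.0, 13.0, 15.0, 17.0]
print(round(mean, 2))  # 12.87
```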
  • 21. The Properties of the Arithmetic Mean:
    1- The sum of the deviations of all the values of X from their mean is zero.
    Proof: 
    $$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} \bar{X} = \sum_{i=1}^{n} X_i - n\bar{X}$$
    We have $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, so $\sum_{i=1}^{n} X_i = n\bar{X}$. Then:
    $$\sum_{i=1}^{n} (X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$$
    2- If $(\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k)$ are the means of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, the mean of the combined groups is:
    $$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$$
    3- The sum of squares of the deviations from the mean is smaller than the sum of squares of the deviations from any other value. (Prove this property.)
    Advantages (merits) of the arithmetic mean
    1- It is easy to calculate and simple to understand.
    2- It is very popular (the most widely used measure).
    3- It is based on all the observations, so it is a good representative of the data.
    Disadvantages (demerits) of the arithmetic mean
    1- It is affected by outliers (extreme values).
    2- It cannot be obtained if a single observation is missing or lost.
    3- It cannot be calculated for frequency distributions with open-ended classes.
    4- It cannot be computed for qualitative data.
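The three properties can be illustrated numerically. The sketch below reuses the eight values from the ungrouped-mean example; the split into two groups and the comparison values 90 and 95 are arbitrary choices made here for illustration only, and a numerical check is of course not a proof.

```python
data = [98, 96, 95, 98, 100, 92, 96, 69]
mean = sum(data) / len(data)

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in data)


# Property 3: the sum of squared deviations is smallest about the mean.
def sum_of_squares_about(a):
    return sum((x - a) ** 2 for x in data)


# Property 2: combined mean of groups (an arbitrary split into two groups).
group1, group2 = data[:3], data[3:]
mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)
combined_mean = (len(group1) * mean1 + len(group2) * mean2) / (
    len(group1) + len(group2)
)

print(mean, deviation_sum, combined_mean)
print(sum_of_squares_about(mean), sum_of_squares_about(90), sum_of_squares_about(95))
```

The combined mean weights each group mean by its group size, which is why it reproduces the mean of the full data set.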
  • 22. 2) Weighted Arithmetic Mean:
    One limitation of the arithmetic mean is that it gives equal importance to all the items. There are cases, however, where the relative importance of the different items is not the same. In such cases we compute the weighted arithmetic mean.
    The formula for the weighted arithmetic mean of ungrouped data is:
    $$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n}$$
    where $W_i$ is the weight of the i-th observation.
    The formula for the weighted arithmetic mean of grouped data is:
    $$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i f_i X_i}{\sum_{i=1}^{n} W_i f_i} = \frac{W_1 f_1 X_1 + W_2 f_2 X_2 + \dots + W_n f_n X_n}{W_1 f_1 + W_2 f_2 + \dots + W_n f_n}$$
    Example: The marks of a student in the final examinations of the Statistics department are as follows:
    Marks (Xi):   98   96   95   98   100   92   96   69
    Units (Wi):   2    3    3    1    3     3    2    2
    Calculate the weighted mean.
    Solution:
    $$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{(98 \times 2) + (96 \times 3) + (95 \times 3) + (98 \times 1) + (100 \times 3) + (92 \times 3) + (96 \times 2) + (69 \times 2)}{2 + 3 + 3 + 1 + 3 + 3 + 2 + 2} = \frac{1773}{19} = 93.3158$$
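A short sketch of the ungrouped weighted mean, using the marks and units from the example (the function name `weighted_mean` is illustrative):

```python
marks = [98, 96, 95, 98, 100, 92, 96, 69]  # X_i
units = [2, 3, 3, 1, 3, 3, 2, 2]           # W_i


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


w_mean = weighted_mean(marks, units)
print(round(w_mean, 4))  # 93.3158

# With equal weights the weighted mean reduces to the ordinary mean.
equal_weights_mean = weighted_mean(marks, [1] * len(marks))
print(equal_weights_mean)  # 93.0
```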
  • 23. Remark: If all the weights are equal, then the weighted mean is the same as the arithmetic mean.
    Exercise 1: The average marks of three groups of students, having 70, 50 and 30 students respectively, are 50, 55 and 45. Find the average mark of all 150 students taken together.
    Exercise 2: The following frequency distribution shows the marks obtained by 50 students in statistics at Soran institute. Find the arithmetic mean.
    Classes   Frequency (fi)
    20 – 29   1
    30 – 39   5
    40 – 49   12
    50 – 59   15
    60 – 69   9
    70 – 79   6
    80 – 89   2
    Exercise 3: The mean of a certain number of observations is 40. If two items with values 50 and 64 are added to this data, the mean rises to 42. Find the number of items in the original data.
    Exercise 4: If $\sum_{i=1}^{n} (X_i - 4) = 72$ and $\sum_{i=1}^{n} (X_i - 7) = 3$, find the number of observations (n).
  • 24. 3) Harmonic Mean
    The harmonic mean is a measure of central tendency that is used less often than the other measures (mean, median and mode). The formula for the harmonic mean of ungrouped data is:
    $$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
    And for grouped data:
    $$\bar{X}_h = \frac{\sum f_i}{\sum_{i=1}^{n} \frac{f_i}{X_i}}$$
    Example: Calculate the harmonic mean for the following data:
    Xi: 8 2 5 3 4 7 8
    Solution:
    Xi:     8      2     5     3      4      7      8
    1/Xi:   0.13   0.5   0.2   0.33   0.25   0.14   0.13
    $$\sum_{i=1}^{n} \frac{1}{X_i} = 1.68, \qquad \bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{7}{1.68} = 4.167$$
    4) Quadratic mean
    $$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n}} \quad \text{for ungrouped data}$$
    $$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i}} \quad \text{for grouped data}$$
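Both means for ungrouped data can be computed directly from their definitions. This is an illustrative sketch with function names chosen here. Note that the script keeps the reciprocals at full precision, so its harmonic mean (about 4.176) differs slightly from the 4.167 obtained above with reciprocals rounded to two decimals.

```python
import math

X = [8, 2, 5, 3, 4, 7, 8]  # data from the harmonic mean example


def harmonic_mean(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)


def quadratic_mean(values):
    """Square root of the mean of the squares."""
    return math.sqrt(sum(x**2 for x in values) / len(values))


h_mean = harmonic_mean(X)
q_mean = quadratic_mean(X)
a_mean = sum(X) / len(X)

print(round(h_mean, 3), round(a_mean, 3), round(q_mean, 3))
```

For positive data the harmonic mean never exceeds the arithmetic mean, which in turn never exceeds the quadratic mean; the printed values follow that order.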
  • 25. 5) MODE
    The mode (Mo) is the value that occurs most often in a data set.
    Mode for ungrouped data:
    The mode of the data set 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5, because it is repeated more often than any other number (6 times).
    Remark: When 2 numbers occur with the same greatest frequency, each one is a mode and the data set is bimodal. When more than 2 numbers occur with the same greatest frequency, each is a mode and the data set is said to be multimodal. When no number is repeated, we say that there is no mode.
    Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5.
    Solution: The numbers 5 and 7 are both modes (each occurs 4 times). The data set is bimodal.
    Mode for grouped data:
    Let $(X_1, X_2, \dots, X_n)$ be the class marks of the class intervals and $(f_1, f_2, \dots, f_n)$ the corresponding frequencies. The modal class is the class with the highest frequency. The mode is obtained from the following formula:
    $$Mo = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
    where:
    $L_k$: lower limit of the modal class
    $f_k$: frequency of the modal class
    $f_{k-1}$: frequency of the previous class
    $f_{k+1}$: frequency of the next class
    $W_k$: size of the modal class interval (class width)
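The remark about unimodal, bimodal and missing modes can be expressed as a small helper. This is a sketch with an illustrative function name, `find_modes`; it assumes a non-empty data set and returns an empty list when no value is repeated.

```python
from collections import Counter


def find_modes(values):
    """Return all values that share the greatest frequency.
    An empty list means no value is repeated (no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(find_modes([5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5]))  # [5]
print(find_modes([5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5]))  # [5, 7]
print(find_modes([1, 2, 3]))                              # []
```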
  • 26. Example: Find the mode for the following frequency distribution:
    Class     Frequency
    10 – 19   5
    20 – 29   7
    30 – 39   10
    40 – 49   8
    50 – 59   4
    60 – 69   3
    70 – 79   1
    Solution: The modal class is 30 – 39 because it has the highest frequency (10).
    $L_k = 30$, $f_k = 10$, $f_{k-1} = 7$, $f_{k+1} = 8$, $W_k = 10$
    $$Mo = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k = 30 + \frac{(10 - 7)}{(10 - 7) + (10 - 8)} \times 10 = 30 + \frac{3}{5} \times 10 = 36$$
    Remark 1: If there are 2 or more classes with the highest frequency, the assembly method must be used to find the modal class.
    Remark 2: When the assembly method is used, the formula for the mode becomes:
    $$Mo = L_k + \frac{(f_k - f_{k-1})}{|f_k - f_{k-1}| + |f_k - f_{k+1}|} \times W_k$$
    Remark 3: If the class widths are not equal, the adjusted frequency must be used instead of the actual frequency, where the adjusted frequency of each class is equal to $\frac{f_i}{W_i}$.
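The basic grouped-mode formula can be sketched as below. The function name `grouped_mode` is illustrative. The sketch assumes equal class widths and a single modal class, and it treats a missing neighbour (when the modal class is the first or last class) as having frequency 0, which is an assumption made here rather than something stated in the notes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data:
    Mo = L_k + (f_k - f_prev) / ((f_k - f_prev) + (f_k - f_next)) * W_k
    Assumes one modal class; missing neighbours count as frequency 0."""
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    f_k = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = f_k - f_prev
    d2 = f_k - f_next
    return lower_limits[k] + d1 / (d1 + d2) * width


lower_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [5, 7, 10, 8, 4, 3, 1]
mode = grouped_mode(lower_limits, frequencies, width=10)
print(mode)
```

For the distribution in the example this reproduces the value 36 found by hand.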
  • 27. Example: Find the mode for the following frequency distribution:
    Class     Frequency
    10 – 19   4
    20 – 29   4
    30 – 39   3
    40 – 49   2
    50 – 59   3
    60 – 69   3
    70 – 79   1
    Solution: Two classes share the highest frequency (4); therefore, to find the modal class we must use the assembly method, as follows:
    Class     Frequency   1st assembly   2nd assembly   3rd assembly   4th assembly
    10 – 19   4           8                             11
    20 – 29   4                          7                             9
    30 – 39   3           5
    40 – 49   2                          5              8
    50 – 59   3           6                                            7
    60 – 69   3                          4
    70 – 79   1
    Each assembly entry is the sum of the frequencies of two (1st and 2nd assembly) or three (3rd and 4th assembly) consecutive classes, starting from the row in which it appears.
    From the previous table we can extract the following table:
    Serial no. of column   Greatest frequency in the column   Contributing classes
    1                      4                                  1, 2
    2                      8                                  1, 2
    3                      7                                  2, 3
    4                      11                                 1, 2, 3
    5                      9                                  2, 3, 4
    Class 2 appears most often among the contributing classes, so the 2nd class (20 – 29) is the modal class.
  • 28. $L_k = 20$, $f_k = 4$, $f_{k-1} = 4$, $f_{k+1} = 3$, $W_k = 10$
    $$Mo = L_k + \frac{(f_k - f_{k-1})}{|f_k - f_{k-1}| + |f_k - f_{k+1}|} \times W_k = 20 + \frac{(4 - 4)}{|4 - 4| + |4 - 3|} \times 10 = 20$$
    Advantages of the mode
    1- It is easy to calculate.
    2- It is not affected by extreme values.
    3- It can be used for qualitative data.
    4- It can be located graphically (from a histogram).
    5- It can be calculated for distributions with open-ended classes.
    Disadvantages of the mode
    1- It is not based on all the observations.
    2- It is not always possible to find a clearly defined mode (there may be 2 or 3 modes).
    3- It is not capable of further mathematical treatment.
    Exercise: Find the mode for the following frequency distributions:
    (a)                     (b)
    Class     Frequency     Class   Frequency
    5 –       2             10 –    30
    10 –      6             20 –    12
    15 –      10            30 –    16
    25 –      22            40 –    28
    35 –      27            50 –    26
    50 – 60   11            60 –    14
  • 29. 6) MEDIAN
    The median (Me) is the value of the middle item in a data set. It divides the data set into two equal parts, one containing all values greater than the median and the other all values smaller than the median.
    Median for ungrouped data:
    First, arrange the data in ascending (increasing) order.
    If the number of observations (n) is odd, the median is the observation of order $\frac{n+1}{2}$.
    If the number of observations (n) is even, the median is the average of the observations of order $\frac{n}{2}$ and $\frac{n}{2} + 1$.
    Example: Find the median of the following data set: 55, 62, 53, 70, 68, 65, 63, 79, and 80.
    Solution: Arrange the data in increasing order: 53, 55, 62, 63, 65, 68, 70, 79, 80.
    Since n = 9 is odd, the order of the median is:
    $$\frac{n+1}{2} = \frac{9+1}{2} = 5$$
    So the 5th observation is the median: Me = 65.
    Example: Find the median of the following data set: 20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25.
    Solution: Arrange the data in increasing order: 18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30.
    $$\frac{n}{2} = \frac{12}{2} = 6, \text{ and the 6th value is } 23$$
  • 30. $$\frac{n}{2} + 1 = \frac{12}{2} + 1 = 7, \text{ and the 7th value is } 25$$
    Then:
    $$Me = \frac{23 + 25}{2} = 24$$
    Median for grouped data:
    To find the median of a frequency distribution, follow these steps:
    Step 1: Find the cumulative frequency (ascending or descending).
    Step 2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
    Step 3: If $F_{k-1} \le \frac{\sum f_i}{2} \le F_k$, then the median class is the class of order k.
    Step 4: Compute the value of the median:
    $$Me = L_k + \left(\frac{\sum f_i}{2} - F_{k-1}\right) \cdot \frac{W_k}{f_k} \quad \text{for ascending cumulative frequency}$$
    $$Me = L_k + \left(F_k^* - \frac{\sum f_i}{2}\right) \cdot \frac{W_k}{f_k} \quad \text{for descending cumulative frequency}$$
    where:
    $L_k$: lower limit of the median class
    $f_k$: frequency of the median class
    $W_k$: width of the median class
    $\sum f_i$: sum of the frequencies
    $F_{k-1}$: ascending cumulative frequency of the class preceding the median class
    $F_k^*$: descending cumulative frequency of the median class
  • 31. Example: Find the median for the following frequency distribution:
    Classes           100 –   120 –   140 –   160 –   180 –   200 –   220 –
    No. of families   3       7       14      20      18      12      6
    Solution: First we find the ascending cumulative frequency:
    Class   Frequency   Ascending cumulative frequency
    100 –   3           3
    120 –   7           10
    140 –   14          24
    160 –   20          44
    180 –   18          62
    200 –   12          74
    220 –   6           80
    Total   80
    Then we find the median order:
    $$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$
    Comparing the median order with the ascending cumulative frequency:
    $$F_{k-1} = 24 < \frac{\sum f_i}{2} = 40 < F_k = 44$$
    So the median class is the 4th class, and $L_k = 160$, $W_k = 20$, $f_k = 20$. Then:
    $$Me = L_4 + \left(\frac{\sum f_i}{2} - F_3\right) \cdot \frac{W_4}{f_4} = 160 + \left(\frac{80}{2} - 24\right) \cdot \frac{20}{20} = 176$$
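The four steps can be sketched in code for both cumulative-frequency directions. The function names are illustrative, equal class widths are assumed, and the descending cumulative frequency of a class is taken here to be the number of observations in that class and all later classes.

```python
from itertools import accumulate


def median_class_index(freqs):
    """Index of the first class whose ascending cumulative
    frequency reaches the median order sum(f)/2."""
    order = sum(freqs) / 2
    ascending = list(accumulate(freqs))
    return next(i for i, F in enumerate(ascending) if F >= order)


def grouped_median_ascending(lower_limits, freqs, width):
    order = sum(freqs) / 2
    ascending = list(accumulate(freqs))
    k = median_class_index(freqs)
    F_prev = ascending[k - 1] if k > 0 else 0
    return lower_limits[k] + (order - F_prev) * width / freqs[k]


def grouped_median_descending(lower_limits, freqs, width):
    total = sum(freqs)
    order = total / 2
    ascending = list(accumulate(freqs))
    # Observations in class i and every class after it.
    descending = [total - F + f for F, f in zip(ascending, freqs)]
    k = median_class_index(freqs)
    return lower_limits[k] + (descending[k] - order) * width / freqs[k]


lower_limits = [100, 120, 140, 160, 180, 200, 220]
frequencies = [3, 7, 14, 20, 18, 12, 6]
me_asc = grouped_median_ascending(lower_limits, frequencies, width=20)
me_desc = grouped_median_descending(lower_limits, frequencies, width=20)
print(me_asc, me_desc)  # 176.0 176.0
```

Both formulas locate the same point inside the median class, so they agree.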
  • 32. Merits of the median
    1. It is easy to calculate and understand.
    2. Unlike the arithmetic mean, it is not affected by extreme values.
    3. It can be found by mere inspection.
    4. It can be used for qualitative studies.
    5. It can be calculated for distributions with open-ended classes.
    6. It can be obtained graphically.
    Demerits of the median
    1. It is not capable of further algebraic treatment.
    2. It is not based on all observations.
    Exercise: Find the median for the following frequency distribution using both the ascending and the descending cumulative frequency.
    Class       Frequency
    18 –        10
    28 –        15
    36 –        18
    50 –        22
    70 –        20
    100 –       18
    130 – 150   13
    Total
    The relationship between the arithmetic mean, median and mode
    If the frequency distribution is moderately skewed, the following empirical relationship between these measures holds approximately:
    $$\bar{X} - M_o \approx 3(\bar{X} - M_e)$$
    For a perfectly symmetric distribution with a single mode, the three measures coincide.
  • 33. SECTION 5: Measures of Dispersion (Variation)
    Measures that describe the spread of a data set are called measures of dispersion. Their main purpose is to assess the homogeneity of the values in a data set, or to compare the spread of two or more data sets.
    1- Range
    The simplest measure of absolute variation is the range, which is calculated by subtracting the smallest value from the largest value of a data set.
    R = Largest value – Smallest value
    Example: Find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
    Solution: R = Largest value – Smallest value = 15 – 2 = 13
    Remark: For grouped data, the range is calculated by subtracting the lower limit of the first class from the upper limit of the last class.
    2- Mean Deviation
    The mean deviation is the sum of the absolute deviations of the observations from a point (A), divided by the number of observations.
    $$M.D = \frac{\sum_{i=1}^{n} |X_i - A|}{n} \quad \text{for ungrouped data}$$
    $$M.D = \frac{\sum_{i=1}^{n} f_i |X_i - A|}{n} \quad \text{for grouped data, where } n = \sum f_i$$
    where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$) or the mode ($M_o$).
    Example: Find the mean deviation for the following data about the mean, the median and the mode.
    Xi: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
    Solution:
  • 34. First we find the values of $\bar{X}$, $M_e$ and $M_o$:
    $\bar{X} = 8$, $M_e = 6$, $M_o = 5$
    Xi      |Xi − X̄|   |Xi − Mo|   |Xi − Me|
    2       6          3           4
    3       5          2           3
    4       4          1           2
    5       3          0           1
    5       3          0           1
    6       2          1           0
    7       1          2           1
    10      2          5           4
    13      5          8           7
    14      6          9           8
    19      11         14          13
    Total   48         45          44
    $$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$
    $$M.D(M_e) = \frac{\sum_{i=1}^{n} |X_i - M_e|}{n} = \frac{44}{11} = 4$$
    $$M.D(M_o) = \frac{\sum_{i=1}^{n} |X_i - M_o|}{n} = \frac{45}{11} = 4.0909$$
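The three mean deviations can be recomputed with the standard library. This is a sketch; `mean_deviation` is an illustrative helper name, and the centre values are obtained from Python's `statistics` module rather than by hand.

```python
import statistics

X = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]


def mean_deviation(values, centre):
    """Average absolute deviation of the values from a chosen centre A."""
    return sum(abs(x - centre) for x in values) / len(values)


mean = statistics.mean(X)      # 8
median = statistics.median(X)  # 6
mode = statistics.mode(X)      # 5

md_mean = mean_deviation(X, mean)      # 48/11
md_median = mean_deviation(X, median)  # 44/11
md_mode = mean_deviation(X, mode)      # 45/11

print(round(md_mean, 4), round(md_median, 4), round(md_mode, 4))
```

The mean deviation about the median is the smallest of the three, which is a general property of the median.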
  • 35. 3- Variance
    The variance is one of the most important measures of absolute variation. It is calculated by averaging the squared distances (deviations) of the observations from the mean of the data set. The formula for the population variance ($\sigma^2$) for raw data is:
    $$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
    where:
    $X_i$: individual value
    $\mu$: population mean
    N: population size (number of observations)
    The formula for the sample variance ($S^2$) for raw data is:
    $$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
    The formula for the sample variance for grouped data is:
    $$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}, \quad \text{where } n = \sum f_i$$
    Example: Find the variance for the following data set: 56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
    Solution:
    $$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
    $$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{650}{10} = 65$$
  • 36. Xi      (Xi − X̄)   (Xi − X̄)²
    56      -9         81
    68      3          9
    72      7          49
    63      -2         4
    65      0          0
    68      3          9
    71      6          36
    69      4          16
    62      -3         9
    56      -9         81
    Total              294
    Then:
    $$S^2 = \frac{294}{10 - 1} = 32.667$$
    Properties of variance:
    1) $S^2 \ge 0$
    2) If $Y_i = aX_i$, then $S_Y^2 = a^2 S_X^2$, where a is a constant. (Prove that.)
    3) If $Y_i = X_i \pm b$, then $S_Y^2 = S_X^2$, where b is a constant. (Prove that.)
    4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z is:
    $$S_Z^2 = S_X^2 + S_Y^2$$
    5) If $(S_1^2, S_2^2, \dots, S_k^2)$ are the variances of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, then the pooled variance of the groups is:
    $$S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{\sum_{i=1}^{k} (n_i - 1)} \quad \text{where } n_i < 30$$
    $$S_p^2 = \frac{\sum_{i=1}^{k} n_i S_i^2}{\sum_{i=1}^{k} n_i} \quad \text{where } n_i \ge 30$$
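The sample variance of the example and properties 2 and 3 can be checked numerically. This is a sketch; the constants a = 2 and b = 5 are arbitrary values chosen here for illustration, and a numerical check does not replace the requested proofs.

```python
X = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]


def sample_variance(values):
    """S^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


var_x = sample_variance(X)
print(round(var_x, 3))  # 32.667

a, b = 2, 5  # arbitrary illustrative constants
var_scaled = sample_variance([a * x for x in X])   # property 2
var_shifted = sample_variance([x + b for x in X])  # property 3

print(var_scaled, a**2 * var_x)  # scaling multiplies the variance by a^2
print(var_shifted, var_x)        # shifting leaves the variance unchanged
```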
  • 37. 4- Standard deviation (S)
    The standard deviation is the most important and most widely used measure of absolute variation. It is the square root of the variance.
    $$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
    Example: Find the standard deviation of the following frequency distribution.
    Solution:
    Class   fi   Xi    fi.Xi   (Xi − X̄)   (Xi − X̄)²   fi.(Xi − X̄)²
    100 –   3    110   330     -65.75     4323.063    12969.19
    120 –   7    130   910     -45.75     2093.063    14651.44
    140 –   14   150   2100    -25.75     663.0625    9282.875
    160 –   20   170   3400    -5.75      33.0625     661.25
    180 –   18   190   3420    14.25      203.0625    3655.125
    200 –   12   210   2520    34.25      1173.063    14076.75
    220 –   6    230   1380    54.25      2943.063    17658.38
    Total   80         14060                          72955
    $$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{14060}{80} = 175.75$$
    $$S = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n}} = \sqrt{\frac{72955}{80}} = 30.198$$
    (The divisor used here is $n = \sum f_i = 80$; dividing by $n - 1 = 79$, as in the sample variance formula, gives $S \approx 30.389$.)
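The grouped standard deviation can be recomputed from the midpoints and frequencies in the table. The sketch below reports the result for both divisors, n and n − 1, since the two appear in different places in these notes; the variable names are illustrative.

```python
import math

midpoints = [110, 130, 150, 170, 190, 210, 230]  # X_i (class midpoints)
frequencies = [3, 7, 14, 20, 18, 12, 6]          # f_i

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Weighted sum of squared deviations: sum(f_i * (X_i - mean)^2).
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))

sd_divisor_n = math.sqrt(sum_sq / n)               # divisor n
sd_divisor_n_minus_1 = math.sqrt(sum_sq / (n - 1)) # divisor n - 1

print(mean, sum_sq)
print(round(sd_divisor_n, 3), round(sd_divisor_n_minus_1, 3))
```

For a sample of this size the two divisors give results that differ only in the first decimal place.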
  • 38. Coefficient of Variation
    A disadvantage of the standard deviation as a comparative measure of variation is that it depends on the units of measurement. This makes it difficult to use the standard deviation to compare measurements from different populations. For this reason, statisticians have defined the coefficient of variation, which expresses the standard deviation as a percentage of the sample or population mean.
    If $\bar{X}$ and S represent the sample mean and the sample standard deviation, then the coefficient of variation (C.V.) is defined as:
    $$C.V. = \frac{S}{\bar{X}} \times 100$$
    If $\mu$ and $\sigma$ represent the population mean and standard deviation, then the coefficient of variation is defined as:
    $$C.V. = \frac{\sigma}{\mu} \times 100$$
    Notice that the numerator and denominator in the definition of C.V. have the same units, so C.V. itself has no units of measurement. This lets us directly compare the variability of two different populations using the coefficient of variation.
    Example 1: A company has two sections (A and B) with 40 and 65 employees respectively. Their average weekly wages are $450 and $350, and the standard deviations are 7 and 9. Which section has the larger variability in wages?
    Solution:
    $$C.V._{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$
    $$C.V._{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$
    Because the C.V. for section A is smaller than the C.V. for section B, section B has the larger variability. Section A is therefore more homogeneous than section B.
  • 39. Example 2: The mean and standard deviation of the heights and weights of 40 students are as follows:
              Mean     Standard Deviation
    Weights   68.34    3.02
    Heights   172.55   26.33
    Find the coefficient of variation of the heights and of the weights, and compare the results.
    Solution:
    $$C.V._{(Weights)} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$
    $$C.V._{(Heights)} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$
    So the weights (C.V. = 4.42) have less variation than the heights (C.V. = 15.26).
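Both coefficient-of-variation examples can be reproduced with one small function. This is a sketch; the function name `coefficient_of_variation` and the dictionary layout are illustrative choices.

```python
def coefficient_of_variation(mean, std_dev):
    """C.V. = standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# (mean, standard deviation) pairs taken from Examples 1 and 2.
groups = {
    "Section A wages": (450, 7),
    "Section B wages": (350, 9),
    "Weights": (68.34, 3.02),
    "Heights": (172.55, 26.33),
}

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in groups.items()}
for name, value in cv.items():
    print(f"{name}: C.V. = {value:.2f}")
```

Because the C.V. is unit-free, the wages, weights and heights can all be compared on the same percentage scale.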