1. Unit IV: STATISTICS
The word statistics is often used to refer to data.
However, statistics has a second meaning:
Statistics is a science that deals with methods for the
collection (gathering), presentation, analysis, and
interpretation of data.
Data gathering involves getting information through
interviews, questionnaires, objective observations,
experimentation, psychological tests, and other
methods.
2. Data presentation in tabular form is of two
types: the text table and the summary table.
Data analysis is the resolution of information
into simpler elements by the application of
statistical principles. The results of data analysis
are then explained and interpreted. This includes
relating the findings to existing theories and
earlier studies in the area.
3. Population as Differentiated from Sample
The word population refers to groups or aggregates of
people, animals, objects, materials, happenings, or
things of any form. This means that there are
populations of students, teachers, supervisors,
laboratory animals, trees, manufactured articles,
birds, insects, and many others. Your concern is with the
properties that describe the group or aggregate rather
than the properties of particular members.
4. If your interest is in a few members of the
population that represent its characteristics or traits,
these members constitute a sample.
The measures of the population are called
parameters, while
those of the sample are called estimates or
statistics.
5. Descriptive Statistics
- is concerned with collecting,
organizing, summarizing, and presenting
data.
Inferential Statistics
- has to do with making generalizations
and drawing conclusions from the data.
6. Variable
The term variable refers to a characteristic or
property whereby the members of a group or set
vary or differ from one another. It is a characteristic
that has two or more mutually exclusive values or
properties.
For instance:
The members of a group may vary in sex, age,
eye color, intelligence, attitude, and other traits.
Labels or numerals may be used to name a variable,
and its particular values are referred to as values or
levels.
7. Variables according to Functional
Relationship
Variables are classified as dependent or
independent with respect to their functional
relationship.
For example:
If you treat variable y as a function of
variable z, then z is your independent
variable and y is your dependent variable.
8. This means that the value of y, say
academic achievement, depends on the value
of z, say mental ability.
Dependent variables are sometimes called
criterion variables, while independent
variables are sometimes called predictor
variables or variates.
9. Variables according to Continuity of Values
1. Continuous variables - these are variables
whose levels can take continuous values.
Examples are height, weight, length, and width.
With these variables, you can make measurements
of varying degrees of precision. The size and
accuracy of the measurements that you can make
along the line depend on the way that the
measurements are made.
10. Suppose that you measure the height of a
person and say that he is 125 cm in
height. Does it mean that he is exactly 125
cm tall? Of course not.
In reading your scale, you merely read the
number of cm to which the man's height was
closest. The 125 stands for a segment of
your line. One point of view states that the
segment extends from 124.5 to 125.4999...;
any height within that segment is rounded off to 125 cm.
11. 2. Discrete or Discontinuous variables
These are variables whose values or
levels cannot take the form of decimals.
Example:
The size of a particular family, since it
can only take specific values such as 1, 2, 3, 4,
and so on. Values between them, like 1.5
or 3.2, are not possible. We cannot have a
family with 5.5 members.
12. 3. Variables according to Scale of
Measurement
a. Nominal Variable
- this variable refers to a property of the
members of a group defined by an operation
which allows making statements only of
equality or difference.
13. Example:
Individuals may be classified according to the
color of their skin. Skin color is an example of a nominal
variable. In dealing with nominal variables, you
may assign numerals to represent classes, but such
numerals are labels only, whose purpose is to
identify the members of a given class.
Example:
How many students are light brown, dark brown,
and the like? The answer is the frequency of students
that belong to that particular class of skin color.
14. b. Ordinal Variable
- this is a property defined by an operation
whereby the members of a particular group are ranked.
In this operation, we can state that one member is
greater or less than the others on a criterion, rather than
saying only that they are equal to or different from
each other.
If you judge individuals according to
aggressiveness, cooperation, and some other
qualities by ranking them, the resulting variable
is an ordinal variable.
15. c. Interval Variable
- this refers to a property defined by an operation
which permits making statements of equality of
intervals rather than just statements of sameness
or difference and greater than or less than.
The interval variable does not have a "true" zero
point, although for convenience a zero point may
be arbitrarily assigned.
Measurements of temperature in Fahrenheit and centigrade
are interval variables.
16. d. Ratio Variables
- this refers to a property defined by an
operation which permits making statements
of equality of ratios in addition to statements
of sameness or difference, greater than or
less than, and equality or inequality of
differences.
17. - This means that one level or value may be
thought of or described as double, triple, or five
times another, and so on.
- Here, the numbers used represent
distance from a natural origin, as with length,
weight, and the numerosity of aggregates.
- One object can be four times as long, 10
times as heavy, or two times as many as
another.
18. USES OF STATISTICS
1. It can give a precise description of data.
- This is the function of statistics which enables us
to make accurate statements or judgements about
averages, variability, and relationships.
Example:
You describe the academic performance of a
group of students by its computed mean,
standard deviation, and correlation with another
factor.
19. 2. It can predict the behavior of individuals.
In school, the grades of students can be
predicted through a scholastic aptitude test.
A teacher's performance may also be predicted by
his or her performance on an instrument like a teacher
aptitude test.
To measure the success of a student, a teacher,
or a worker, we may have to compute measures
like the mean, standard scores, percentiles, and other
statistical measures.
20. 3. It can be used to test hypotheses.
We can determine whether or not a variable is related
to another variable through a test of inference,
such as in correlation.
Other statistical measures we can apply for
inferential purposes are the
t-test, chi-square test, F-test, and others. The
choice depends on the scale of measurement used
(nominal, ordinal, interval, or ratio), on how the data are
distributed, and on other considerations such as
your purpose.
21. MEASURES OF CENTRAL TENDENCY
Arithmetic Mean
One of the most basic statistical concepts involves
finding measures of central tendency of a set of
numerical data. It is often helpful to find numerical
values that locate, in some sense, the center of a set
of data.
The arithmetic mean is the most commonly used
measure of central tendency. The arithmetic mean of a
set of numbers is often referred to as simply the mean. To
find the mean of a data set, find the sum of the data
values and divide by the number of data values.
22. For instance, to find the mean of the five salaries
$43,750, $39,500, $38,000, $41,250, $44,000:
mean = ($43,750 + $39,500 + $38,000 + $41,250 + $44,000) / 5
mean = $206,500 / 5
mean = $41,300
The mean suggests that an employee can
reasonably expect a job offer at a salary of about
$41,300.
23. 1. Mean
The mean of n numbers is the sum of
the numbers ( \sum x ) divided by n.
\[ \text{Mean} = \bar{x} = \frac{\sum x}{n} \]
24. Example (ungrouped data):
1. Six friends in a Biology class of 20 students
received test grades of
92 84 65 76 88 90
Find the mean of the scores.
Solution:
\[ \bar{x} = \frac{\sum x}{n} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5 \]
The mean of these test scores is 82.5.
25. 2. A doctor ordered 4 separate blood tests to
measure a patient's total blood cholesterol level.
The test results were
245 235 220 210
Find the mean of the blood cholesterol levels.
\[ \bar{x} = \frac{\sum x}{n} = \frac{245 + 235 + 220 + 210}{4} = \frac{910}{4} = 227.5 \]
The mean is 227.5.
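The mean examples above can be checked with a short Python sketch. The function name `mean` and the variable names are illustrative choices, not part of the original slides; the data are the test grades, cholesterol readings, and salaries given in the examples.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Data taken from the worked examples (variable names are illustrative).
test_grades = [92, 84, 65, 76, 88, 90]
cholesterol = [245, 235, 220, 210]
salaries = [43750, 39500, 38000, 41250, 44000]

print(mean(test_grades))  # 82.5
print(mean(cholesterol))  # 227.5
print(mean(salaries))     # 41300.0
```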
26. 2. Median
Another type of average is the median.
Essentially, the median is the middle
number, or the mean of the two middle
numbers, in a list of numbers that have been
arranged in numerical order from
smallest to largest or from largest to
smallest. Any list of numbers that is
arranged in numerical order in this way is a
ranked list.
27. Median
The median of a ranked list of n numbers is:
- the middle number if n is odd.
- the mean of the two middle numbers if n is even.
Example:
Find the median of the data in the following
lists.
a. 4, 8, 1, 14, 9, 21, 12
b. 46, 23, 92, 89, 77, 108
28. Solution:
a. The list contains 7 numbers. The median of a
list with an odd number of entries is found by
ranking the numbers and finding the middle
number.
1 4 8 9 12 14 21
The middle number is 9.
Thus, 9 is the median.
29. b. The list contains 6 numbers. The median of a
list of data with an even number of entries is found
by ranking the numbers and computing the mean of the two
middle numbers.
23 46 77 89 92 108
The two middle numbers are 77 and 89.
Their mean is
\[ \frac{77 + 89}{2} = 83 \]
Thus, 83 is the median.
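The ranking procedure for the median can be sketched in Python as follows. The function name `median` is an illustrative choice; the two lists are the ones from the example.

```python
def median(values):
    """Median of a list: rank the values, then take the middle one
    (odd count) or the mean of the two middle ones (even count)."""
    if not values:
        raise ValueError("the median of an empty list is undefined")
    ranked = sorted(values)          # ranked list, smallest to largest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # odd number of entries
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2  # even number of entries


print(median([4, 8, 1, 14, 9, 21, 12]))     # 9
print(median([46, 23, 92, 89, 77, 108]))    # 83.0
```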
30. 3. Mode
The mode of a list of numbers is the number
that occurs most frequently.
Some lists of numbers do not have a mode.
For instance, in the list
4 6 10 32 15 49
no number occurs more often than the
other numbers; therefore, there is no mode.
31. A list of numerical data can have more than one
mode.
For instance, in the list
4 2 6 2 7 9 2 4 9 8 7 9
the numbers 2 and 9 each occur three times.
Thus, 2 and 9 are both modes of the data.
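A minimal Python sketch of the mode, following the convention used above that a list has no mode when no number occurs more often than the others. The function name `modes` is an illustrative choice, and it returns a list so that more than one mode can be reported.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list when every
    distinct value occurs equally often (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    # No value occurs more often than the others: there is no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([4, 6, 10, 32, 15, 49]))                  # []
print(modes([4, 2, 6, 2, 7, 9, 2, 4, 9, 8, 7, 9]))    # [2, 9]
```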
32. The Weighted Mean
The value called the weighted mean is often
used when some data values are more important
than others.
For instance:
Many teachers determine a student's course
grade from the student's tests and final exam. To
find the weighted mean of the student's scores, the
teacher first assigns a weight to each score. In this
case, the teacher could assign each test score a
weight of 1 and the final exam score a weight of 2.
33. A student with test scores of 65, 70, and 75 and a
final exam score of 90 has a weighted mean of
\[ W_m = \frac{(65 \times 1) + (70 \times 1) + (75 \times 1) + (90 \times 2)}{5} = \frac{390}{5} = 78 \]
Note that the numerator of the weighted mean
above is the sum of the products of each score
and its corresponding weight. The number 5 in the
denominator is the sum of all the weights.
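The weighted mean calculation can be sketched in Python as below. The function name `weighted_mean` is an illustrative choice; the scores and weights are those of the example (three tests with weight 1, a final exam with weight 2).

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


scores = [65, 70, 75, 90]   # three test scores, then the final exam
weights = [1, 1, 1, 2]      # the final exam counts twice

print(weighted_mean(scores, weights))  # 78.0
```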
34. Data that have not been organized or manipulated
are called raw data.
A large collection of raw data may not provide
much readily observable information.
35. A frequency distribution, which is a table that
lists observed events and the frequency of
occurrence of each observed event, is often used
to organize data.
For instance, consider the following raw data (number of
laptops owned by each of 40 families):
2 0 3 1 2 1 0 4
2 1 1 7 2 0 1 1
0 2 2 1 3 2 2 1
1 4 2 5 2 3 1 2
2 1 2 1 5 0 2 5
36. A Frequency Distribution Table
Number of laptops, x    Tally                   Frequency, f
(observed event)                                (number of households with x laptops)
0                       IIIII                   5
1                       IIIII IIIII II          12
2                       IIIII IIIII IIII        14
3                       III                     3
4                       II                      2
5                       III                     3
6                                               0
7                       I                       1
Total                                           40
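As a sketch, the tallying that produces the frequency distribution can be done in Python with `collections.Counter`. The variable names are illustrative; the list holds the 40 raw data values from the laptop example, row by row.

```python
from collections import Counter

# Raw data: number of laptops owned by each of the 40 families.
laptops = [
    2, 0, 3, 1, 2, 1, 0, 4,
    2, 1, 1, 7, 2, 0, 1, 1,
    0, 2, 2, 1, 3, 2, 2, 1,
    1, 4, 2, 5, 2, 3, 1, 2,
    2, 1, 2, 1, 5, 0, 2, 5,
]

counts = Counter(laptops)

# Include every value from the smallest to the largest observed event,
# so that an event never observed (6 laptops) appears with frequency 0.
frequency_table = {x: counts[x] for x in range(min(laptops), max(laptops) + 1)}

for x, f in frequency_table.items():
    print(x, f)
print("Total", sum(frequency_table.values()))
```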
37. Example:
Find the mean of the data displayed in the Frequency
Distribution Table.
Solution:
The numbers in the right-hand column of the
table are the frequencies f for the numbers x in the first column.
\[ \bar{x} = \frac{\sum (f \cdot x)}{\sum f} = \frac{(0 \times 5) + (1 \times 12) + (2 \times 14) + (3 \times 3) + (4 \times 2) + (5 \times 3) + (6 \times 0) + (7 \times 1)}{40} = \frac{79}{40} = 1.975 \]
The mean number of laptop computers per household in the
subdivision is 1.975, or about 2.
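The mean of a frequency distribution can be sketched in Python as below. The dictionary maps each observed number of laptops x to its frequency f, as in the table; the function name `mean_from_frequencies` is an illustrative choice.

```python
# Frequency distribution from the table: x (laptops) -> f (households).
frequency_table = {0: 5, 1: 12, 2: 14, 3: 3, 4: 2, 5: 3, 6: 0, 7: 1}


def mean_from_frequencies(table):
    """Sum of f * x over all rows, divided by the sum of the frequencies."""
    total_frequency = sum(table.values())
    if total_frequency == 0:
        raise ValueError("the table contains no observations")
    return sum(x * f for x, f in table.items()) / total_frequency


print(mean_from_frequencies(frequency_table))  # 1.975
```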
38. MEASURES OF DISPERSION
Some characteristics of a set of data may not be
evident from an examination of averages.
In some cases, the average values do not reflect
how widely the data are spread, so a measure of the
spread or dispersion of the data is used.
To measure the spread or dispersion of data,
the statistical values known as the range and the
standard deviation were introduced.
39. Range
The range of a set of data values is the
difference between the greatest data value and the
least data value.
Example :
If the greatest number is 100 and the least is 21 ,
find the range of the given data.
Range = 100 - 21 = 79
40. Example:
The numbers of workers (in millions) for the
five countries with the largest labor forces are given below. Find the
range.
China - 778 India - 472 USA - 147
Indonesia - 106 Brazil - 82
Range = highest data value - lowest data value
Range = 778 - 82 = 696
The range is 696 million workers.
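A short Python sketch of the range, using the labor force figures (in millions) from the example. The function name `data_range` is an illustrative choice.

```python
def data_range(values):
    """Range: greatest data value minus least data value."""
    values = list(values)
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# Labor force in millions of workers, from the example.
labor_force = {"China": 778, "India": 472, "USA": 147, "Indonesia": 106, "Brazil": 82}

print(data_range(labor_force.values()))  # 696
print(data_range([100, 21]))             # 79
```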
41. Example:
Find the deviations from the mean for the
five data items
778 472 147 106 82
Solution:
First calculate the mean.
\[ \bar{x} = \frac{\sum x}{n} = \frac{778 + 472 + 147 + 106 + 82}{5} = \frac{1585}{5} = 317 \]
42. DEVIATION FROM THE MEAN
Data item    Deviation (data item - mean)
778          778 - 317 = 461
472          472 - 317 = 155
147          147 - 317 = -170
106          106 - 317 = -211
82           82 - 317 = -235
Sum of deviations = 0
43. This shows that we cannot find a
measure of dispersion by finding the
mean of the deviations, because the sum of the
deviations, and therefore their mean, is always
equal to zero. However, a
kind of average of the deviations from the
mean, called the standard deviation,
can be computed.
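The fact that the deviations from the mean add up to zero can be checked with a short Python sketch on the five data items used above. The variable names are illustrative. With data that are not whole numbers, floating-point rounding can leave a sum that is very close to, rather than exactly, zero.

```python
data = [778, 472, 147, 106, 82]

mean = sum(data) / len(data)              # 317.0
deviations = [x - mean for x in data]     # data item - mean

print(deviations)       # [461.0, 155.0, -170.0, -211.0, -235.0]
print(sum(deviations))  # 0.0
```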
44. STANDARD DEVIATION
The range of a set of data is easy to compute,
but it can be deceiving. The range is a measure that
depends only on the two most extreme values, and as
such it is very sensitive to them.
A measure of dispersion that is less sensitive to
extreme values is the standard deviation.
The standard deviation of a set of numerical data
makes use of the amount by which each individual
data value deviates from the mean.
45. STANDARD DEVIATION
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
where:
s = standard deviation of the sample
\bar{x} = mean of the sample
n = number of data values in the sample
x = a data value
46. Procedure for Computing a Standard Deviation
1. Determine the mean of the n numbers.
2. For each number, calculate the deviation
(difference) between the number and the mean
of the numbers (data item - mean).
3. Calculate the square of each deviation and find the
sum of these squared deviations:
\[ \sum (\text{data item} - \text{mean})^2 \]
4. Divide the sum by n - 1:
\[ \frac{\sum (\text{data item} - \text{mean})^2}{n - 1} \]
5. Find the square root of the quotient in step 4:
\[ s = \sqrt{\frac{\sum (\text{data item} - \text{mean})^2}{n - 1}} \]
47. Example:
Find the standard deviation of the following
numbers, which were obtained by sampling.
2 4 7 12 15
Solution:
Step 1. Determine the mean of the numbers.
\[ \bar{x} = \frac{2 + 4 + 7 + 12 + 15}{5} = \frac{40}{5} = 8 \]
48. Step 2. For each number, calculate the deviation
between the number and the mean.
Step 3. Calculate the square of each deviation and find the
sum of these squared deviations.
x     x - \bar{x}      (x - \bar{x})^2
2     2 - 8 = -6       (-6)^2 = 36
4     4 - 8 = -4       (-4)^2 = 16
7     7 - 8 = -1       (-1)^2 = 1
12    12 - 8 = 4       (4)^2 = 16
15    15 - 8 = 7       (7)^2 = 49
Total                  118
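49. Step 4. Because we have a sample of n = 5 values,
divide the sum, 118, by n - 1.
\[ \frac{118}{5 - 1} = \frac{118}{4} = 29.5 \]
Step 5. Find the square root of the quotient in step 4.
\[ s = \sqrt{29.5} \approx 5.43 \]
The standard deviation is approximately 5.43.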
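The five-step procedure for the sample standard deviation can be sketched in Python as follows. The function name `sample_std` is an illustrative choice; the standard library function `statistics.stdev`, which also divides by n - 1, is used only as a cross-check.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation, following the five steps above."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data values are needed")
    mean = sum(values) / n                             # step 1
    deviations = [x - mean for x in values]            # step 2
    sum_of_squares = sum(d ** 2 for d in deviations)   # step 3
    quotient = sum_of_squares / (n - 1)                # step 4
    return math.sqrt(quotient)                         # step 5


sample = [2, 4, 7, 12, 15]
print(round(sample_std(sample), 2))        # 5.43
print(round(statistics.stdev(sample), 2))  # 5.43
```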
50. VARIANCE
A statistic known as the variance is also used as a
measure of dispersion.
The variance of a given set of data is the square of
the standard deviation of the data.
Example:
Find the variance for the previous example.
\[ s = \sqrt{29.5} \]
\[ s^2 = \left( \sqrt{29.5} \right)^2 = 29.5 \]
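As a sketch, the sample variance can be computed directly in Python as the sum of squared deviations divided by n - 1, which is the quantity under the square root in the standard deviation. The function name `sample_variance` is an illustrative choice; the data are the sample 2, 4, 7, 12, 15 from the previous example.

```python
import math


def sample_variance(values):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data values are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


sample = [2, 4, 7, 12, 15]
variance = sample_variance(sample)
standard_deviation = math.sqrt(variance)

print(variance)                      # 29.5
print(round(standard_deviation, 2))  # 5.43
```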
51. MEASURES OF RELATIVE POSITION
The number of standard deviations between a data value
and the mean is known as the data value's z-score or
standard score.
The z-score for a given data value x is the number of
standard deviations that x is above or below the mean of
the data.
The following formula shows how to calculate the z-score
for a data value in a sample.
\[ \text{z-score} = z_x = \frac{x - \bar{x}}{s} \]
52. where:
z_x = z-score of the data value x
x = data item
\bar{x} = mean of the sample
s = standard deviation of the sample
53. Example:
Aggu Utang has taken two tests in his Chemistry
class. He scored 72 on the first test, for which the
mean of all scores was 65 and the standard
deviation was 8. He scored 60 on the second test,
for which the mean of all scores was 45 and the
standard deviation was 12. In comparison to the
other students, did Aggu do better on the first test
or the second test?
54. Solution:
Find the z-score for each test.
\[ z_{72} = \frac{72 - 65}{8} = 0.875 \]
\[ z_{60} = \frac{60 - 45}{12} = 1.25 \]
Aggu Utang scored 0.875 standard deviations above the mean
on the first test and 1.25 standard deviations above the mean
on the second test. The z-scores indicate that, in
comparison to his classmates, Aggu Utang scored better on
the second test than he did on the first.
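The comparison of the two test scores can be sketched in Python as below. The function name `z_score` and the variable names are illustrative; the scores, means, and standard deviations are those given in the example.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if std <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / std


z_first = z_score(72, mean=65, std=8)     # first test
z_second = z_score(60, mean=45, std=12)   # second test

print(z_first)   # 0.875
print(z_second)  # 1.25

# The larger z-score marks the better performance relative to the class.
better = "second" if z_second > z_first else "first"
print(better)    # second
```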
55. PERCENTILES
Most standardized examinations provide scores in
terms of percentiles.
pth Percentile
A value x is called the pth percentile of a data set
provided p% of the data values are less than x.
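A minimal Python sketch of this definition: given a data set and a value x, it computes the percent p of data values that are less than x. The function name `percent_below` is an illustrative choice, and the data reuse the 40 laptop counts from the frequency distribution example rather than any new figures.

```python
def percent_below(data, x):
    """Percent of the data values that are strictly less than x."""
    if not data:
        raise ValueError("the data set is empty")
    below = sum(1 for value in data if value < x)
    return 100 * below / len(data)


# Number of laptops owned by each of the 40 families (from the earlier example).
laptops = [
    2, 0, 3, 1, 2, 1, 0, 4,
    2, 1, 1, 7, 2, 0, 1, 1,
    0, 2, 2, 1, 3, 2, 2, 1,
    1, 4, 2, 5, 2, 3, 1, 2,
    2, 1, 2, 1, 5, 0, 2, 5,
]

# 17 of the 40 families own fewer than 2 laptops.
print(percent_below(laptops, 2))  # 42.5
```

Under the definition above, owning 2 laptops would therefore sit at about the 42.5th percentile of this data set.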
56. Example (using percentiles):
In a recent year, the median annual salary for a medical
technologist was $76,500. If the 90th percentile for the
annual salary of a medical technologist was $110,000,
find the percent of medical technologists whose annual
salary was
a. more than $76,500.
b. less than $110,000.
c. between $76,500 and $110,000.
57. Solution:
a. By definition, the median is the 50th percentile.
Therefore, 50% of the medical technologists earned more
than $76,500.
b. Because $110,000 is the 90th percentile, 90% of all
medical technologists made less than $110,000.
c. From parts a and b, 90% - 50% = 40% of the medical
technologists earned between $76,500 and $110,000.