data_management_review_descriptive_statistics.ppt

Introduction
• Statistics is the science of conducting
studies to collect, organize,
summarize, analyze, and draw
conclusions from data.

We study statistics for several
reasons:
•Like professional people, you
must be able to read and
understand the various
statistical studies performed in
your field.

•You may be called on to
conduct research in your
field, since statistical
procedures are basic to
research.

•You can also use the
knowledge gained from
studying statistics to become
better consumers and citizens.

A variable is a characteristic or
attribute that can assume different
values.
• Data are the values (measurements or
observations) that the variables can
assume.
• Variables whose values are determined
by chance are called random variables.

•A collection of data values forms
a data set.
•Each value in the data set is
called a data value or a datum.

Descriptive and Inferential Statistics
• Descriptive statistics consists of the
collection, organization,
summarization, and presentation of
data.
• Inferential statistics consists of
generalizing from samples to populations,
performing hypothesis testing,
determining relationships among
variables, and making predictions.

• A population consists of all subjects
(human or otherwise) that are being
studied.
• A sample is a subgroup of the
population.

Variables and Types of Data
• Qualitative variables are variables that
can be placed into distinct categories,
according to some characteristic or
attribute.
• Quantitative variables are numerical in
nature and can be ordered or ranked.

•Discrete variables assume values that
can be counted.
•Continuous variables can assume all
values between any two specific
values. They are obtained by
measuring.

Classify each of the following as a
discrete or a continuous variable.
• Intelligence
quotient of
college students
• Speed of a car
• Score in algebra
• Population in
certain barangay
• Number of sacks
of rice
• Body temperature
• Salary of
government
employee
• Score in basketball
• Time
• Weight of baby in
ounces

Level of Measurement
• The nominal level of measurement
classifies data into mutually exclusive
(nonoverlapping), exhaustive categories
in which no order or ranking can be
imposed on the data.
• The ordinal level of measurement
classifies data into categories that can
be ranked; precise differences between
the ranks do not exist.

• The interval level of measurement ranks
data; precise differences between units of
measure do exist; there is no meaningful
zero.
• The ratio level of measurement
possesses all the characteristics of
interval measurement, and there exists a
true zero. In addition, true ratios exist for
the same variable.

Data Collection and
Sampling Techniques
• Data can be collected in a variety of
ways.
• One of the most common methods is
through the use of surveys.
• Surveys can be done by using a variety of
methods - telephone, mail questionnaires,
personal interviews, surveying records
and direct observations.

To obtain samples that are unbiased,
statisticians use four methods of sampling:
• Random samples are selected by using
chance methods or random numbers.
• Systematic samples are obtained by
numbering each value in the population
and then selecting every kth value.

• Stratified samples are selected by
dividing the population into groups
(strata) according to some characteristic
and then taking samples from each
group.
• Cluster samples are selected by dividing
the population into groups and then
taking all members of the selected
clusters as subjects of the samples.
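The four sampling methods above can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the population of 100 numbered subjects, the odd/even strata, the clusters of 10, and all function names are assumptions made for the example. The systematic sample also assumes a random starting point, which the slide does not specify.

```python
import random

def simple_random_sample(population, n, rng):
    # Random sample: every subject has the same chance of selection.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Systematic sample: pick a random start (an assumption), then every kth value.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    # Stratified sample: group by a characteristic, then sample from each group.
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Cluster sample: pick whole clusters and keep all of their members.
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

rng = random.Random(0)                    # fixed seed so the run is repeatable
population = list(range(1, 101))          # hypothetical subjects numbered 1..100

random_s = simple_random_sample(population, 10, rng)
systematic_s = systematic_sample(population, 10, rng)
stratified_s = stratified_sample(population, lambda x: x % 2, 5, rng)  # strata: odd / even
clusters = [population[i:i + 10] for i in range(0, 100, 10)]           # 10 clusters of 10
cluster_s = cluster_sample(clusters, 2, rng)

print(random_s, systematic_s, stratified_s, cluster_s, sep="\n")
```

Note the difference between the last two methods: a stratified sample takes some members from every group, while a cluster sample takes every member from some groups.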

Uses and Misuses of Statistics
“There are three types of lies – lies,
damn lies and statistics.”
“Figures don’t lie, but liars figure”

1. Suspect Samples
“Three out of four doctors surveyed
recommend brand such and such”

2. Ambiguous Averages
3. Changing the subject

4. Detached Statistics
5. Implied connections

6. Misleading Graphs
7. Faulty Survey Questions

What can you say
about the following
graphs?

Measures of Central Tendency
•Mean
•Median
•Mode

Mean
• The most reliable and the most
sensitive measure of position.
• It is commonly known as the
“average”

Mean:
•It comes in two different
forms:
1) Simple Mean
2) Weighted Mean

Example 1:
A study was done on 5 typical fast-food
meals in Metro Manila. The following table
shows the amount of fat, in number of
teaspoons, present in each meal. Calculate
the mean amount of fat for these 5 fast-food
meals.
Fast-food meal    A    B    C    D    E
Fat (in tsp)      14   18   22   10   16

•To obtain the simple mean amount
of fat for the 5 fast-food meals
•Mean = (14+18+22+10+16)/5
•Mean = 80/5 = 16
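The same computation can be checked with a few lines of Python. The variable names are chosen for this sketch only.

```python
# Fat content (in teaspoons) of the five fast-food meals from Example 1.
fat_tsp = {"A": 14, "B": 18, "C": 22, "D": 10, "E": 16}

# Simple mean: sum of the values divided by the number of values.
total_fat = sum(fat_tsp.values())
mean_fat = total_fat / len(fat_tsp)

print(total_fat, mean_fat)  # 80 16.0
```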

Example 2:
• The following represents the final grades obtained by a nursing
student one summer term:
• Anatomy (5 units) - - - 93
• Chemistry (3 units) - - - 88
• SOT 2 (2 units) - - - 89
– Find the weighted average of the student.

To solve for the weighted
average of the student, we have:
$$\text{Mean} = \frac{\sum w_i x_i}{\sum w_i}$$
$$\text{Mean} = \frac{93(5) + 88(3) + 89(2)}{10}$$
$$\text{Mean} = \frac{465 + 264 + 178}{10} = \frac{907}{10} = 90.7 \text{ (Excellent)}$$
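As a quick check of the weighted mean, the sketch below weights each grade by its number of units. The tuple layout and variable names are assumptions made for the example.

```python
# (subject, units, grade) for the nursing student in Example 2.
grades = [("Anatomy", 5, 93), ("Chemistry", 3, 88), ("SOT 2", 2, 89)]

# Weighted mean = sum(w_i * x_i) / sum(w_i), with units as the weights.
total_units = sum(units for _, units, _ in grades)
weighted_sum = sum(units * grade for _, units, grade in grades)
weighted_mean = weighted_sum / total_units

print(total_units, weighted_sum, weighted_mean)  # 10 907 90.7
```

The simple mean of the three grades is 90, so giving more weight to the 5-unit subject raises the average slightly.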

The Median
What is
the
Median?

The median is . . .
•A positional measure that divides
the set of data exactly into two
parts.
•Determined by rearranging the
data into an array.

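Median for Odd Sample:
$$\tilde{X} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item}$$
Median for Even Sample:
$$\tilde{X} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ item} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ item}}{2}$$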

The array for the data in Example 1 is:
10, 14, 16, 18, 22
•To obtain the median fat content
of the 5 meals, we use the
median formula for an odd sample,
since n = 5.
•Median = [(n + 1)/2]th score
•Median = [(5 + 1)/2]th score
•Median = 3rd score = 16

Median for Even Sample
What is
even?

The following are sample scores
obtained from a 75-item summative test:
(n = 12) 48, 53, 63, 65, 45, 47, 52, 48, 63,
54, 63, 55
Array: 45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65
•Since n = 12 (even):
•Median = (6th score + 7th score)/2
•Median = (53 + 54)/2 = 53.5
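Both median rules (odd and even sample) can be written as one small function. This is a sketch with names chosen for illustration. It uses the five fat values from Example 1 and the 12-score array shown on the slide.

```python
def median(values):
    """Return the median: the middle item of the sorted array (odd n),
    or the average of the two middle items (even n)."""
    ordered = sorted(values)          # arrange the data into an array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n + 1) / 2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2  # (n/2)th and (n/2 + 1)th items

fat = [14, 18, 22, 10, 16]                                     # n = 5 (odd)
array = [45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65]       # n = 12 (even)

print(median(fat))    # 16
print(median(array))  # 53.5
```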

The mode is …
The most favorite score.
The score having the highest
frequency.
Determined by way of inspection.

A set of data is said to be …
•Unimodal or monomodal if it
has only one mode.
•Example: 33, 35, 35, 38, 40,
46
•Its mode is 35.

•Bimodal if it has two modes.
•Example: 33, 35, 35, 38, 40,
40, 46
•Its modes are 35 and 40.

•Multimodal if it has more than
two modes.
•Example: 33, 35, 35, 38, 40, 40,
46, 46, 51, 58, 58, 60
•Its modes are 35, 40, 46 and 58.
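Finding the mode "by inspection" amounts to counting how often each score occurs. A minimal sketch using Python's `collections.Counter` follows. The function name `modes` is chosen for this example.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

unimodal = [33, 35, 35, 38, 40, 46]
bimodal = [33, 35, 35, 38, 40, 40, 46]
multimodal = [33, 35, 35, 38, 40, 40, 46, 46, 51, 58, 58, 60]

print(modes(unimodal))    # [35]
print(modes(bimodal))     # [35, 40]
print(modes(multimodal))  # [35, 40, 46, 58]
```

One caveat: if every value occurs exactly once, this function returns all of them, whereas such a data set is usually described as having no mode.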

Find the mean, median and
the mode of the following:
1. 85, 82, 83, 88, 85, 87, 89, 90
2. 12, 14, 20, 19, 23, 22, 28
3. 24, 34, 27, 27, 34, 24
4. 102, 100, 111, 100, 106, 102
5. 75, 86, 78, 84, 88, 86, 84, 85, 81,
84, 80

Distribution Shapes
• Frequency distributions can assume
many shapes.
• The three most important shapes are
positively skewed, symmetrical, and
negatively skewed.

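Positively Skewed
[Figure: positively skewed frequency curve, axes X and Y]
Mode < Median < Mean

Symmetrical
[Figure: symmetrical frequency curve, axes X and Y]
Mean = Median = Mode

Negatively Skewed
[Figure: negatively skewed frequency curve, axes X and Y]
Mean < Median < Mode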
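The orderings of mean, median, and mode for the three shapes can be tried out on small data sets. The three data sets below are made up for illustration and do not come from the slides. The comparison is a rule of thumb, so the function falls back to "undetermined" when none of the three patterns holds.

```python
import statistics

def shape(data):
    """Classify a data set by comparing its mean, median, and mode."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    if mode < median < mean:
        return "positively skewed"
    if mean < median < mode:
        return "negatively skewed"
    if mean == median == mode:
        return "symmetrical"
    return "undetermined"

# Hypothetical data sets chosen to show each pattern.
right_tail = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]         # mode 2, median 3, mean 4.5
balanced = [1, 2, 2, 3, 3, 3, 4, 4, 5]               # mean = median = mode = 3
left_tail = [14, 13, 13, 13, 12, 12, 11, 10, 6, 1]   # mean 10.5, median 12, mode 13

print(shape(right_tail))  # positively skewed
print(shape(balanced))    # symmetrical
print(shape(left_tail))   # negatively skewed
```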

Measures of Variability
•The statistical tool used to
describe the degree to
which scores/ observations
are scattered/dispersed.

Measures of Variability
Range
Standard Deviation
Variance

3-3 Measures of Variation -
Range
• The range is defined to be the highest
value minus the lowest value. The symbol
R is used for the range.
• R = highest value – lowest value.
• Extremely large or extremely small data
values can drastically affect the range.
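A short sketch of the range, reusing the 12 test scores from the median example. The extra score of 5 is a made-up outlier added only to show how one extreme value changes R.

```python
scores = [45, 47, 48, 48, 52, 53, 54, 55, 63, 63, 63, 65]

# R = highest value - lowest value
value_range = max(scores) - min(scores)

# Hypothetical outlier: one extremely small score.
with_outlier = scores + [5]
range_with_outlier = max(with_outlier) - min(with_outlier)

print(value_range)         # 20
print(range_with_outlier)  # 60
```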

Population Variance
The variance is the average of the squares of the
distance each value is from the mean.
The symbol for the population variance is $\sigma^2$
($\sigma$ is the Greek lowercase letter sigma).
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N},$$
where
$X$ = individual value
$\mu$ = population mean
$N$ = population size

Population Standard Deviation
The standard deviation is the square
root of the variance.
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}.$$
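The population variance and standard deviation can be computed step by step as below. For illustration only, the sketch treats the five fat values from Example 1 as if they were an entire population. That is an assumption, since the slides present them as a study of typical meals.

```python
import math

def population_variance(values):
    """Average of the squared distances from the population mean."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

fat = [14, 18, 22, 10, 16]   # mean = 16; squared deviations: 4, 4, 36, 36, 0

print(population_variance(fat))  # 16.0
print(population_std(fat))       # 4.0
```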

3-3 Measures of Variation - Sample
Variance
The sample variance, the unbiased estimator of the
population variance, is a statistic whose value
approximates the population variance.
It is denoted by $s^2$, where
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1},$$
and
$\bar{X}$ = sample mean
$n$ = sample size

3-3 Measures of Variation - Sample
Standard Deviation
The sample standard deviation is the square
root of the sample variance.
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}.$$
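The only change from the population formulas is the divisor, n - 1 instead of N. The sketch below applies the sample formulas to the five fat values from Example 1, now treated as a sample, and cross-checks the result against `statistics.variance` from Python's standard library. The function names are chosen for this example.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared distances from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

fat = [14, 18, 22, 10, 16]   # sum of squared deviations = 80, n - 1 = 4

print(sample_variance(fat))      # 20.0
print(sample_std(fat))           # square root of 20
print(statistics.variance(fat))  # library result for comparison
```

Dividing by n - 1 gives a slightly larger value than dividing by N (20 versus 16 for this data).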

Problem:
• Two seemingly equally excellent BSE
students are vying for an academic
honor that only one of them can
receive. The following are their
grades, used as the basis
for the award:
Franzen: 91, 90, 94, 93, 92
Rico: 92, 92, 90, 94, 92
Who do you think deserves to get
the award?

Guiding Principle
• The smaller the value of the
measure of variability, the more
consistent, the more
homogeneous, and the less
scattered the
observations in the data
set are.
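The guiding principle can be applied to the two students' grades with a short Python sketch. It assumes the grades are treated as samples, so `statistics.stdev` (divisor n - 1) is used. The slides do not say which formula is intended, but the comparison comes out the same way with the population formula.

```python
import statistics

grades = {
    "Franzen": [91, 90, 94, 93, 92],
    "Rico": [92, 92, 90, 94, 92],
}

# Mean and sample standard deviation for each student.
summary = {
    name: (statistics.mean(g), statistics.stdev(g))
    for name, g in grades.items()
}

for name, (mean, sd) in summary.items():
    print(f"{name}: mean = {mean}, s = {sd:.3f}")

# Under the guiding principle, the smaller spread marks the more consistent student.
most_consistent = min(summary, key=lambda name: summary[name][1])
print(most_consistent)  # Rico
```

Both students have a mean of 92, so the mean alone cannot separate them. Rico's grades have the smaller standard deviation (about 1.414 versus about 1.581), so by this principle his grades are the more consistent.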
