3. What is statistics?
Statistics is a branch of applied mathematics concerned
with collecting, organizing, and interpreting data. It
attempts to infer the properties of a large collection of
data from inspection of a sample of the collection,
thereby allowing educated guesses to be made with a
minimum of expense.
4. MAIN BRANCHES OF STATISTICS
Descriptive Statistics refers to the
collection, presentation, and summary of
data (either using charts and graphs or
using a numerical summary).
Inferential Statistics refers to generalizing
from a sample to a population, estimating
unknown population parameters, drawing
conclusions, and making decisions.
5. Population and Sample
Population refers to all the items (infinite or finite) that we
are interested in. It consists of the totality of the observations,
individuals, or objects in which the investigator/researcher is
interested.
Sample is a subset or portion of the population. It involves
looking only at some items selected from a population.
Situations where the sample may be
preferred
1. Infinite Population
2. Destructive Testing
3. Timely Results
4. Accuracy
5. Cost
6. Sensitive Information
Situations where a population
may be preferred
1. Small Population
2. Large Sample Size
3. Database Exists
4. Legal Requirements
6. PARAMETER and STATISTIC
Parameter – is a value calculated using all the
data from a population.
Statistic – is a value calculated using the data
from the sample
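The distinction can be sketched with a small example. This is only an illustration: it treats the five salaries from slide 20 as the whole population, and the three-person sample is an arbitrary choice made here, not something taken from the slides.

```python
from statistics import mean

# Population: salaries of all 5 employees of the small company (slide 20).
population = [370_000, 60_000, 36_000, 20_000, 20_000]

# Sample: an arbitrary subset of 3 employees, picked only for illustration.
sample = [60_000, 36_000, 20_000]

parameter = mean(population)  # uses every value in the population
statistic = mean(sample)      # uses only the sampled values

print(f"Parameter (population mean): {parameter}")
print(f"Statistic (sample mean): {statistic:.2f}")
```

A different sample would generally give a different statistic, while the parameter stays fixed for a given population.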
7. What is a variable?
A VARIABLE is a characteristic of interest about an object under
investigation that can take on different possible outcomes, such as age,
hair color, height, weight, and religious preference.
Two kinds of Variables
QUALITATIVE VARIABLES – These are variables that can be placed into
distinct categories, according to some characteristics or attributes.
QUANTITATIVE VARIABLES – These are numerical and can be ordered or
ranked. Also, these consist of two types: Discrete and Continuous.
Discrete variables take values obtained by counting.
Continuous variables take values obtained by measurement.
8. DATA
Data is a set of values collected from the variable from
each of the subjects that belong to the sample. It refers
to a collection of natural phenomena descriptors such as
results from experiences, observations or experiments, or
a set of premises. It may consist of numbers, words, or
images.
Data can be classified according to the type of variable
for which it was drawn. There are two general types of
data according to how the data vary across cases:
9. TYPES OF DATA
Categorical data have values that are described by words rather
than numbers. They are of limited statistical use. On occasion, the
values of these variables might be represented using numbers. This
is called coding. Example: 1 = cash; 2 = check; 3 = credit/debit; 4 =
gift card
Coding a category as a number does not make the data numerical, and
the numbers do not typically imply a rank, although on occasion a
ranking does exist. Example: 1 = Bachelor’s; 2 = Master’s; 3 = Doctorate
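A minimal sketch of coding, using the payment codes from the example above. The list of coded responses is hypothetical and exists only to show that counting the categories is meaningful while arithmetic on the codes is not.

```python
from collections import Counter

# Codes from the payment example: the numbers are labels, not quantities.
PAYMENT_CODES = {1: "cash", 2: "check", 3: "credit/debit", 4: "gift card"}

# Hypothetical coded responses from six customers (illustration only).
coded = [1, 3, 3, 2, 4, 1]

# Decode back to category names before summarizing.
labels = [PAYMENT_CODES[c] for c in coded]

# Counting categories is a valid summary of categorical data.
counts = Counter(labels)
for label, count in counts.items():
    print(f"{label}: {count}")

# By contrast, sum(coded) / len(coded) would run without error,
# but the result would not describe any payment type.
```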
Numerical Data arise from counting, measuring something, or some
kind of mathematical operation. Example: number of insurance
claims; sales for the last quarter; accounting data; economic
indicators; financial ratios.
Two types of Numerical data: discrete (distinct number or integer)
and continuous (any value within an interval).
10. LEVELS OF MEASUREMENT
NOMINAL LEVEL
*From the Latin nomen, meaning “name”; it is the weakest level of
measurement
*It merely identifies a category
*These data are the same as “qualitative”, “categorical”, or “classification” data
*These data may be coded numerically. The codes are arbitrary
placeholders with no numerical meaning.
*With these data, the only permissible mathematical operation is
counting (e.g., frequencies)
11. ORDINAL LEVEL
*Ordinal data codes connote a ranking of data values.
*It can be treated as nominal but not vice versa.
*There is no clear meaning to the distance between 1 & 2, or
between 2 & 3, or between 3 & 4 (no clear meaning between
“rarely” and “never”).
INTERVAL LEVEL
*It is ranked data and has meaningful intervals between scale
points.
*Since intervals between numbers represent distances,
mathematical operations can be done such as taking the average.
*The absence of a true zero is a key characteristic of interval data:
zero is an arbitrary point on the scale, so ratios are not meaningful.
RATIO LEVEL
*It has all the properties of the other three data types and is
considered the strongest level of measurement.
*It possesses a meaningful zero that represents the absence of the
quantity being measured.
12. Data: (male, female, male, male, female)
Table 1: Respondents in terms of sex, n = 5
Sex      Frequency   %
Male     3           60
Female   2           40
TOTAL    5           100
R = (f/N) × 100%
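The frequency and percentage columns of Table 1 can be reproduced with a short sketch. The variable names are chosen here for illustration; the data and the formula (f divided by N, times 100) come from the slide.

```python
from collections import Counter

# Data from the slide: sex of the five respondents.
data = ["male", "female", "male", "male", "female"]
n = len(data)

# Frequency (f) of each category.
freq = Counter(data)

# Relative frequency in percent: (f / N) * 100.
table = {category: (f, 100 * f / n) for category, f in freq.items()}

for category, (f, pct) in table.items():
    print(f"{category.capitalize():<8}{f:>3}{pct:>6.0f}")
print(f"{'TOTAL':<8}{n:>3}{100:>6}")
```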
17. Types of Measures for Center
Once the data are collected, it is useful to summarize the
data set by identifying a value around which the data are
centered.
Mode – is the most frequently occurring number in a
data set.
Median – is the middle number or the mean of the
two middle numbers in an ordered set of data.
Mean – is the numerical balancing point of the data
set.
18. The mean is easy to compute. You only deal with one
number. It is not so with the median.
The mean is affected by outliers while the median is
resistant. In a sense, the median is able to resist the
pull of a far away value, but the mean is drawn to such
values.
A change in any of the numbers changes the mean,
and the mean can be changed drastically by changing
an extreme value.
In contrast, the median and the mode of a set of data
are usually not changed by changing an extreme value.
The mean, the median, and the mode are all averages;
however, they are generally not equal.
19. Example
Which measure of center is most useful?
A teacher wants to know about her students' family
situation. She asks for the number of children in their
families:
6 3 2 3 4 1 2 2 4 3 1 2 2 4
A shoe manufacturer wants to know the average shoe size
of women.
Another teacher wants to know how well her class
performed in a long test.
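For the first scenario, the three measures of center can be computed directly from the listed family sizes. This is a sketch using Python's standard `statistics` module; it only reports the values and leaves the choice of the most useful measure to the discussion.

```python
from statistics import mean, median, mode

# Number of children in each student's family (from the slide).
children = [6, 3, 2, 3, 4, 1, 2, 2, 4, 3, 1, 2, 2, 4]

center = {
    "mean": mean(children),      # sum of the values divided by their count
    "median": median(children),  # mean of the two middle values (n is even)
    "mode": mode(children),      # most frequently occurring value
}

for name, value in center.items():
    print(f"{name}: {value:.2f}")
```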
20. Compare the mean, the median, and the mode for the salaries of
5 employees of a small company.
Salaries: P370,000 P60,000 P36,000 P20,000 P20,000
Mean = P101,200
Median = P 36,000
Mode = P 20,000
Most of the employees of this company would probably agree
that the median of P36,000 better represents the average of the
salaries than does either the mean or the mode.
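The salary figures can be checked with a short sketch, which also shows how the mean reacts to the extreme value. The adjusted list, where P370,000 is replaced by P70,000, is a hypothetical change made here for illustration only.

```python
from statistics import mean, median, mode

# Salaries of the 5 employees (from the slide).
salaries = [370_000, 60_000, 36_000, 20_000, 20_000]
original = (mean(salaries), median(salaries), mode(salaries))

# Hypothetical change: lower only the extreme salary.
adjusted_salaries = [70_000, 60_000, 36_000, 20_000, 20_000]
adjusted = (mean(adjusted_salaries), median(adjusted_salaries), mode(adjusted_salaries))

print("original mean, median, mode:", original)
print("adjusted mean, median, mode:", adjusted)
```

Only the mean moves when the extreme value changes; the median and the mode stay where they were.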
22. Types of Measures of Dispersion or Variability
Another important feature that can help us understand more about a
data set is the manner in which the data are distributed.
Range is the difference between the largest value (maximum) and
the smallest value (minimum) in the data.
Standard deviation is an extremely important measure of spread
that is based on the mean. It indicates, roughly, the typical distance
of the data points from the mean.
Variance is the square of the standard deviation of the data. It is
expressed in squared units, not the units of the original data.
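The three measures can be sketched on the family-size data from slide 19. One assumption to note: the slides do not say whether the sample or the population formula is intended, so this sketch uses the sample versions (`statistics.variance` and `statistics.stdev`, which divide by n - 1).

```python
from statistics import stdev, variance

# Number of children per family (data from slide 19).
children = [6, 3, 2, 3, 4, 1, 2, 2, 4, 3, 1, 2, 2, 4]

# Range: largest value minus smallest value.
data_range = max(children) - min(children)

# Sample variance and sample standard deviation (assumed, see above).
var = variance(children)
sd = stdev(children)

print(f"range: {data_range}")
print(f"variance: {var:.3f} (squared units)")
print(f"standard deviation: {sd:.3f} (original units)")
```

For the population versions, `statistics.pvariance` and `statistics.pstdev` divide by n instead.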
23. Illustration in Computing
EXAMPLE: A consumer group has tested a sample of 8
size-AAA batteries from each of 3 companies. The
results of the tests are shown in the following table.
According to these tests, which company produces
batteries for which the values representing hours of
constant use have the smallest standard deviation?
Ans. Company Dependable