Unit I
Review of Basic Statistics &
Introduction to Scientific Method
Statistical Terms
Statistics
The word statistics is derived from the
Latin word status, meaning “state”. The
word statistics has two basic meanings:
• We use this word when referring to
actual numbers derived from data
• We use statistics as a method of
analysis
• Statistics - a collection of quantitative
data such as statistics of crimes,
statistics on enrolment, statistics on
unemployment, etc.
• Statistics - a science which deals with the
collection, presentation, analysis and
interpretation of quantitative
data/information.
Uses of Statistics
Statistics involves much more than the
simple collection, tabulation and
summarizing of data. It is also a tool that
helps us develop meaningful conclusions
that go beyond the original data.
Uses of Statistics – some examples
• Forecasting
  - weather – analyzing weather data
  - election outcomes – surveys
  - condition of the economy in the next
    period – observation of the various
    indices of the economy
• Predicting product preferences (consumers
are taken as samples to provide information)
• Reporting enrolment, fish/crop production,
among others
Uses of Statistics – some examples ... Cont’d
• Product quality - sampling
• Interpret research results
• Evaluate statistics used every day
• Presentation of data to audiences
• Formulate and test hypotheses
• Evaluation of technologies (e.g.
fishing gears, varieties, anti-pollution
controls, teaching techniques)
• Analysis of students’ performance
and behaviors
Uses of Statistics… summary
• Provides tools that you need in order to
react intelligently to information you
hear or read.
• Thus, statistics is one of the most
important subjects that you will ever
study.
Basic Statistical Terms . . . Cont’d
1. Population
Types:
• Finite population – can be counted with
relative ease and the number obtained is
finite.
• Infinite population – cannot be counted
easily because of the large number involved.
Basic Statistical Terms . . . Cont’d
2. Sample
Is it always possible to gather data from
the entire population?
Not always. Gathering data from a large
population is often impossible or
impractical, so researchers resort to
studying a part of the population, called a
sample.
Basic Statistical Terms . . . Cont’d
2. Sample ... Cont’d
Sample – a part (representative) of the
population, or a sub-collection of
elements drawn from a population.
To make it representative of the population,
a sample is usually drawn using the
process of randomization, i.e., every
member of the population is given an equal
chance of being drawn into the sample.
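The randomization described above can be sketched in Python. This is only a minimal illustration: the population of 100 student ID numbers, the sample size of 10, the seed and the function name draw_simple_random_sample are all assumptions made for the example.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Draw a sample without replacement.

    random.Random.sample gives every member of the population the same
    chance of being selected.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical population: student ID numbers 1 to 100
population = list(range(1, 101))
sample = draw_simple_random_sample(population, 10, seed=1)
print(sample)
```

Each run with the same seed returns the same 10 members; changing or omitting the seed gives a different random sample.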
Basic Statistical Terms . . . Cont’d
3. Parameter and Statistic
Parameter – a numerical measurement
describing some characteristic of a
population (data obtained from a
population). Designated by:
µ (mu) – population mean
σ (sigma) – population standard deviation
Basic Statistical Terms . . . Cont’d
3. Parameter and Statistic
Statistic – a numerical measurement
describing some characteristic of a
sample. Designated by:
x̄ (x-bar) – sample mean
SD (or s) – sample standard deviation
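The difference between a parameter and a statistic can be sketched as follows. The eight ages treated as a whole population, the sample size of 4 and the seed are hypothetical; the variable names mu, sigma, x_bar and s simply mirror the symbols used on these slides.

```python
import random
import statistics

# Hypothetical population: the ages of all 8 students in a small class
population_ages = [16, 17, 17, 18, 18, 19, 19, 20]

# Parameters describe the whole population
mu = statistics.mean(population_ages)       # population mean
sigma = statistics.pstdev(population_ages)  # population SD (divides by N)

# Statistics describe a sample drawn from that population
rng = random.Random(0)
sample_ages = rng.sample(population_ages, 4)
x_bar = statistics.mean(sample_ages)  # sample mean
s = statistics.stdev(sample_ages)     # sample SD (divides by n - 1)

print("parameters:", mu, sigma)
print("statistics:", x_bar, s)
```

The parameters stay fixed for a given population, while the statistics change from one sample to another.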
Basic Statistical Terms . . . Cont’d
4. Variable
Variable – a characteristic of interest
measurable on each and every individual
in the universe (population), denoted by a
capital letter of the English alphabet.
Types: Qualitative and Quantitative
Basic Statistical Terms . . . Cont’d
4. Variable ... Cont’d
Qualitative variable – consists of
categories or attributes, which have
non-numerical characteristics
Examples: classification, year level,
subjects enrolled, emotional condition
(happy or sad), attitude towards the
instructor (like or dislike), sex (male
or female)
Basic Statistical Terms . . . Cont’d
4. Variable ... Cont’d
Quantitative variable – consists of
numbers representing counts or
measurements (e.g. age, height, weight,
scores, number of books, family size)
Types:
• Discrete
• Continuous
Basic Statistical Terms . . . Cont’d
4. Variable ... Cont’d
Discrete quantitative variable – data
obtained through counting; cannot be
subdivided. These are expressed as
whole numbers and are always exact.
Basic Statistical Terms . . . Cont’d
4. Variable ... Cont’d
Examples: number of students,
number of books, number of patients,
number of children in a party, number
of equipment, number of trainings
attended
Measures of Central Tendency
• Mean (arithmetic) – the sum of the
individual observed values divided by
the total number of observations.
• It is the point within the range about which
the rest of the data is considered balanced
(balance point or fulcrum in a
distribution of observations), and it is
affected by the value of each observation
Example: What is the mean age of 5 students?
Given: 16, 19.5, 18, 17, 18.5
Sol’n: Mean = (16 + 19.5 + 18 + 17 + 18.5) / 5
= 89 / 5 = 17.8 years
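The same computation can be checked with a few lines of Python. This is a sketch using the five ages from the example; the variable names are illustrative. It also shows the "balance point" property: the deviations from the mean add up to zero (apart from floating-point rounding).

```python
ages = [16, 19.5, 18, 17, 18.5]

# Mean = sum of the observed values / number of observations
mean_age = sum(ages) / len(ages)
print(mean_age)  # 17.8

# Balance point: deviations above and below the mean cancel out
deviations = [age - mean_age for age in ages]
print(round(sum(deviations), 10))
```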
• Median – the midpoint of the
distribution.
Example: Find the median of the following
observations.
Given: 8, 4, 1, 3 and 7
To find the median, first array the set of
observations - 1, 3, 4, 7, 8
The median is the middlemost item: Me = 4
(with an even number of observations, the
median is the mean of the two middle items)
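A short Python sketch of the procedure: array (sort) the observations, then take the middlemost item. The function name median is illustrative. The first call uses the arrayed observations 1, 3, 4, 7, 8 from the example; the second list is a hypothetical six-value case added only to show the usual convention for an even number of observations, where the two middle items are averaged.

```python
def median(values):
    """Return the middle value of the arrayed (sorted) observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middlemost item
        return ordered[mid]
    # Even number of observations: mean of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([8, 4, 1, 3, 7])       # arrayed: 1, 3, 4, 7, 8
even_case = median([8, 4, 1, 3, 5, 7])   # arrayed: 1, 3, 4, 5, 7, 8
print(odd_case)   # 4
print(even_case)  # 4.5
```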
• Mode – the value that appears with the
highest (greatest) frequency. That is, the
value that appears most often.
Example: Find the mode of the following
distribution.
Given: 5, 8, 10, 5, 3, 5, 2, 5, 7
To find the mode, first array the distribution: 2,
3, 5, 5, 5, 5, 7, 8, 10
The mode is 5
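Counting frequencies by hand can be mirrored in Python with collections.Counter. This is a sketch using the nine observations from the example; returning a list of modes is a design choice made here so that ties (more than one most frequent value) are not hidden.

```python
from collections import Counter

observations = [5, 8, 10, 5, 3, 5, 2, 5, 7]

# Tally how many times each value appears
counts = Counter(observations)
highest_frequency = max(counts.values())

# Keep every value that reaches the highest frequency
modes = sorted(value for value, count in counts.items()
               if count == highest_frequency)

print(modes)              # [5]
print(highest_frequency)  # 4
```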
Which average should you use?
With a bit of experience, you can readily determine
which measure of central tendency is appropriate
to a given situation
• The arithmetic mean, or simply the mean, is the
most commonly used measure. It gives, for
example, the average amount of product
consumed per user, which makes it indispensable
in business and commerce.
• Knowing the mean of a distribution also permits
one to compare different frequency distributions
Which average should you use? … cont’d
Median – If you want to know a typical
observation in a distribution, particularly if it is
skewed, the median proves to be a better
measure than the mean.
• Income is the most common example of a
distribution that is typically skewed.
• For income, the arithmetic mean is nearly always
inflated by a few very high values.
• Thus the median is a good choice because it is not
affected by extreme values.
Which average should you use?... cont’d
• Example 1 – annual salary of employees: 30,000;
45,000; 45,000; and 200,000.
Median - 45,000
Mode - 45,000
Mean - 80,000 (does not match any of the
salaries)
• Example 2 – A stock clerk who handles
different sizes of crutches wants to know which is
the most popular size (mode) so that he can order
enough to meet the demand
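The three averages for the four salaries in Example 1 can be compared with Python's standard statistics module. This is a sketch; the variable names are illustrative.

```python
import statistics

salaries = [30_000, 45_000, 45_000, 200_000]

mean_salary = statistics.mean(salaries)      # pulled upward by 200,000
median_salary = statistics.median(salaries)  # mean of the two middle values
mode_salary = statistics.mode(salaries)      # most frequent value

print("mean:  ", mean_salary)    # 80000
print("median:", median_salary)  # 45000.0
print("mode:  ", mode_salary)    # 45000
```

The single extreme salary raises the mean well above what three of the four employees earn, while the median and mode stay at the typical value.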
Which average should you use? … cont’d
General rule:
Mode – nominal scores (numbers which do
not represent an amount or quantity; names
or identifiers of a category or attribute), e.g. zip
code, size of shoes, license number
Median – ordinal scores (numbers which
represent an ordered series of relationships),
e.g. 1st, 2nd, 3rd, etc. An ordinal score indicates
position in an ordered series but does not say
anything about the magnitude of difference
between any two consecutive entries
Which average should you use?
General rule:
Mean – interval scale and ratio scale
Interval scale has equal units but an arbitrary
zero point; values may be added or subtracted but
may not be meaningfully multiplied or divided,
e.g., temperature in °C or °F.
Ratio scale has a true zero point, so values, e.g.,
height or weight, can be compared meaningfully
with one another as ratios (e.g. 50 kg is twice 25 kg)
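The difference between the two scales can be illustrated with a small Python sketch. The temperatures (20 °C and 10 °C) are hypothetical values chosen for the example, and the kilogram-to-pound factor is an approximation. A ratio on an interval scale changes when the unit changes, so it carries no meaning; a ratio on a ratio scale stays the same.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval scale: the zero point is arbitrary, so ratios depend on the unit
warm_c, cool_c = 20.0, 10.0
ratio_in_celsius = warm_c / cool_c
ratio_in_fahrenheit = (celsius_to_fahrenheit(warm_c)
                       / celsius_to_fahrenheit(cool_c))
print(ratio_in_celsius, ratio_in_fahrenheit)  # 2.0 versus 1.36

# Ratio scale: a true zero point, so ratios survive a change of unit
KG_TO_LB = 2.20462  # approximate conversion factor (assumption)
heavy_kg, light_kg = 50.0, 25.0
ratio_in_kg = heavy_kg / light_kg
ratio_in_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)
print(ratio_in_kg, ratio_in_lb)
```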
Measures of Variation
• Knowing a distribution’s central tendency is
helpful, but it is not enough.
• It is also important to know whether the
observations tend to be very similar
(homogeneous) or vary considerably
(heterogeneous).
• Common measures of variation – range, mean
deviation, standard deviation and coefficient of
variation
Measures of Variation … cont’d
• Range = Xmax – Xmin (difference between the
highest and lowest value or observation)
• Can be computed quickly but is not very useful
because it considers only the extremes and
does not take into consideration the bulk of the
observations
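The limitation of the range can be seen in a short Python sketch. The two data sets are hypothetical and the function name value_range is illustrative: both sets have the same extremes, and therefore the same range, even though the bulk of the observations is distributed very differently.

```python
def value_range(values):
    """Range = highest observation minus lowest observation."""
    return max(values) - min(values)


clustered = [1, 5, 5, 5, 9]   # most observations sit at the centre
spread_out = [1, 1, 5, 9, 9]  # most observations sit at the extremes

print(value_range(clustered))   # 8
print(value_range(spread_out))  # 8
```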
Measures of Variation . . . Cont’d
• Mean Deviation – a bit more sophisticated than
the range
• Defined as the average absolute deviation of all
observations from the mean (signs are ignored,
since signed deviations from the mean sum to zero)
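A minimal Python sketch of the mean deviation, reusing the five ages from the mean example. It assumes the usual convention that absolute deviations are averaged, because signed deviations from the mean always cancel out to zero. The function name mean_deviation is illustrative.

```python
def mean_deviation(values):
    """Average of the absolute deviations of the observations from the mean."""
    mean = sum(values) / len(values)
    absolute_deviations = [abs(value - mean) for value in values]
    return sum(absolute_deviations) / len(values)


ages = [16, 19.5, 18, 17, 18.5]
md = mean_deviation(ages)
print(round(md, 2))  # 1.04
```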
Measures of Variation… cont’d
• Standard deviation – represented by the
symbol s. It is the square root of the variance of
the observations
• Variance is computed by squaring each
deviation from the mean, adding them up and
dividing their sum by n-1, where n is the
sample size
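The steps above can be sketched in Python, again using the five ages from the mean example as sample data. The function name sample_variance is illustrative; the result is cross-checked against statistics.stdev from the standard library, which also divides by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(value - mean) ** 2 for value in values]
    return sum(squared_deviations) / (n - 1)


ages = [16, 19.5, 18, 17, 18.5]
variance = sample_variance(ages)
s = math.sqrt(variance)  # standard deviation = square root of the variance

print(round(variance, 3))  # 1.825
print(round(s, 3))
print(round(statistics.stdev(ages), 3))  # library cross-check
```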
Measures of Variation … cont’d
• Coefficient of variation – one application of the
mean and the standard deviation
• Defined as the ratio of the standard deviation
to the absolute value of the mean, expressed as a
percentage
• Depicts the size of the standard deviation
relative to its mean.
• Both the standard deviation and the mean
are expressed in the same units, hence CV is free
of the measurement units of the original data.
Therefore, it is possible to compare the relative
variation of even unrelated quantities.
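A Python sketch of the coefficient of variation follows. The heights and weights are hypothetical data invented for the example, and the function name coefficient_of_variation is illustrative. The sketch shows that CV does not depend on the unit of measurement, which is what allows the variation in height to be compared with the variation in weight.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / |mean|, expressed as a percentage."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return s / abs(mean) * 100


# Hypothetical measurements on the same four students
heights_cm = [150, 160, 170, 180]
weights_kg = [45, 55, 60, 80]

cv_height_cm = coefficient_of_variation(heights_cm)
cv_height_m = coefficient_of_variation([h / 100 for h in heights_cm])
cv_weight = coefficient_of_variation(weights_kg)

# Same CV whether heights are recorded in centimetres or metres
print(round(cv_height_cm, 2), round(cv_height_m, 2))
# Unrelated quantities can now be compared on a common, unit-free basis
print(round(cv_weight, 2))
```

In this made-up data set the weights vary more, relative to their mean, than the heights do.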
Concluding Statements
• In describing data by using a summary measure,
it is important to select the measure of central
tendency that most accurately represents the
data
• A better way of summarizing data is to use two
summary measures – one to indicate the
central tendency and one to indicate the
variation
• The most commonly used pair is arithmetic
mean and the standard deviation
