Introduction to Statistics
By Dr. Amira Talic
What is “Statistics”?
•Statistics is the science of data. It involves:
•Collecting
•Classifying
•Summarizing
•Organizing and
•Interpreting
numerical information.
•Examples:
•Cricket batting averages
•Stock price
•Climatology data such as rainfall amounts, average temperatures
•Marketing information
•Gambling?
Key Terms
• What is data?
•Facts or information relevant or appropriate to a decision maker
• Population?
•The totality of objects under consideration
• Sample?
•A portion of the population that is selected for analysis
Key Terms
• Parameter?
a summary measure (e.g., mean) that is
computed to describe a characteristic of
the population
• Statistic?
a summary measure (e.g., mean) that is
computed to describe a characteristic of
the sample
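A minimal Python sketch of the parameter/statistic distinction. The population used here (the whole numbers 1 to 100) is a made-up illustration, not data from these slides.

```python
import random
from statistics import mean

# Hypothetical population: the whole numbers 1..100 (illustration only).
population = list(range(1, 101))

# Parameter: a summary measure computed from every member of the population.
population_mean = mean(population)

# Sample: a randomly selected portion of the population.
rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(population, 10)

# Statistic: the same summary measure, computed from the sample only.
sample_mean = mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually be close to, but not exactly equal to, the population mean.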
Variables
• Traits or characteristics that can change
values from case to case.
• A variable is what is measured or
manipulated in an experiment
•Examples:
•Age
•Gender
•Income
•Social class
Types Of Variables
• In causal relationships:
• CAUSE => EFFECT
(independent variable => dependent variable)
•Independent variable: the variable that the experimenter controls or manipulates (e.g., the dose of a drug)
•Dependent variable: the variable that is measured as the outcome and is not manipulated directly. Its values are predicted from the independent variable (e.g., the effect on the condition)
Types Of Variables
•Discrete variables are measured in units
that cannot be subdivided. Example:
Number of children
•Continuous variables are measured in a
unit that can be subdivided infinitely.
Example: Height
Statistical analysis
• Descriptive Statistics
• Inferential statistics
• Predictive modeling
Descriptive Statistics
•Gives us the overall picture about data
•Presents data in the form of tables, charts and
graphs
•Includes summary data
•Avoids inferences
Examples:
•Measures of central location: mean, median, mode and midrange
•Measures of variation: variance, standard deviation, z-scores
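As a rough illustration of two of the measures named above, the sketch below computes the midrange and z-scores in Python. The data values are hypothetical, and the population formula (dividing by n) is an assumption made to keep the numbers round.

```python
from statistics import mean, pstdev

# Hypothetical data set (illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Midrange: halfway between the smallest and the largest value.
midrange = (min(data) + max(data)) / 2

# z-score: how many standard deviations a value lies from the mean.
mu = mean(data)
sigma = pstdev(data)  # population standard deviation (divides by n)
z_scores = [(x - mu) / sigma for x in data]

print("midrange:", midrange)
print("mean:", mu, "standard deviation:", sigma)
print("z-scores:", z_scores)
```

A z-score of 0 marks a value equal to the mean; positive and negative scores mark values above and below it.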
Inferential Statistics
•Draws conclusions about the overall population from a sample
• “Sampled” data are incomplete but can still be representative of the population
•Permits generalizations (inferences) from the sample to the population
• Probability theory is a major tool used to analyze sampled data
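One common form of inference is an interval estimate for a population mean. The sketch below is only an illustration: the heights are hypothetical, and it uses the normal approximation, whereas a t-distribution is usually preferred for a sample this small.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 heights in cm (illustration only).
heights = [162.0, 158.5, 171.2, 165.4, 169.8, 160.1, 174.3, 167.0, 163.7, 166.5]

n = len(heights)
sample_mean = mean(heights)
standard_error = stdev(heights) / sqrt(n)  # stdev uses the n - 1 formula

# 97.5th percentile of the standard normal distribution, about 1.96.
z = NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"sample mean = {sample_mean:.1f} cm")
print(f"approximate 95% interval for the population mean: {lower:.1f} to {upper:.1f} cm")
```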
Predictive Modeling
• The science of predicting future outcomes
based on historical events.
• Model Building: “Developing set of
equations or mathematical formulation to
forecast future behaviors based on current
or historical data.”
• Examples: regression, logistic regression, time series analysis, etc.
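A minimal sketch of the simplest case, a straight-line regression fitted by least squares and then used to forecast. The data points and the helper names (fit_line, predict) are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def predict(intercept, slope, x):
    """Forecast y for a new x using the fitted line."""
    return intercept + slope * x

# Hypothetical historical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

intercept, slope = fit_line(xs, ys)
forecast = predict(intercept, slope, 6)
print("intercept:", intercept, "slope:", slope, "forecast for x=6:", forecast)
```

Real data will not lie exactly on a line, so the fitted line then gives an approximate forecast rather than an exact one.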
Calculation of the probability
• Based on the characteristics of the
population for the observed parameter
• (e.g., duration of the pregnancy, duration of the first stage of labor, height, et cetera)
• To describe the population, a “distribution” is used
Distribution
• A statistical distribution describes the
numbers of times each possible outcome
occurs in a sample
• Distributions for continuous variables (e.g., height) are called continuous distributions
• A continuous distribution is described by the fancier-sounding probability density
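For a discrete outcome, counting how often each value occurs can be sketched in a few lines of Python. The die rolls below are hypothetical.

```python
from collections import Counter

# Hypothetical sample: ten rolls of a six-sided die (illustration only).
rolls = [1, 3, 3, 6, 2, 3, 6, 1, 5, 3]

# Frequency distribution: number of times each outcome occurs.
counts = Counter(rolls)

# Relative frequencies: each count divided by the sample size.
n = len(rolls)
relative = {face: counts[face] / n for face in range(1, 7)}

for face in range(1, 7):
    print(face, counts[face], relative[face])
```

An outcome that never occurs in the sample (here, 4) simply has a count of zero.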
Distribution
• Some probability densities have particular importance in statistics. A very important one is shaped like a bell and is called the normal (Gaussian) distribution.
• Many naturally-occurring phenomena can
be approximated surprisingly well by this
distribution. It will serve to illustrate some
features of all continuous distributions.
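The bell shape can be explored with Python's standard library. This sketch uses the standard normal distribution (mean 0, standard deviation 1) as an assumed example; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# The density is highest at the mean and symmetric around it.
peak = standard_normal.pdf(0)
print(f"density at the mean: {peak:.4f}")

def share_within(k):
    """Probability of landing within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

Roughly 68%, 95% and 99.7% of the probability lies within 1, 2 and 3 standard deviations of the mean.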
Gaussian distribution
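What are the Components of a Distribution?
• Measures of central tendency
• Suppose we have a sample with 4 observations: 4, 1, 4, 3
• Mean = the sum of a set of numbers divided by the number of observations: (4 + 1 + 4 + 3) / 4 = 12 / 4 = 3
• Median - the middle point of a set of numbers. Sorted, the sample is 1, 3, 4, 4; with an even number of observations the median is the average of the two middle values: (3 + 4) / 2 = 3.5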
Components of distribution
• Mode - the most frequently occurring number. Mode = 4
• Median - the middle point of a set of numbers. Median = 3.5
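The three measures of central tendency for the sample 4, 1, 4, 3 can be checked with Python's standard statistics module. This is a sketch for verification, not part of the original slides.

```python
from statistics import mean, median, mode

observations = [4, 1, 4, 3]

sample_mean = mean(observations)      # (4 + 1 + 4 + 3) / 4
sample_median = median(observations)  # average of the two middle values of 1, 3, 4, 4
sample_mode = mode(observations)      # most frequently occurring value

print("mean:", sample_mean)
print("median:", sample_median)
print("mode:", sample_mode)
```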
Components of distribution
Measures of variation
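• Range - the maximum value minus the minimum value in a set of numbers. Range = 4 - 1 = 3
• Standard deviation - a measure of the typical distance of a data point from the mean, computed as the square root of the average squared deviation.
Deviations from the mean (3): 1, -2, 1, 0. Squared: 1, 4, 1, 0 (sum = 6)
Dividing by n = 4: variance = 1.5, standard deviation = sqrt(1.5) ≈ 1.22
Dividing by n - 1 = 3 (the usual sample formula): variance = 2, standard deviation = sqrt(2) ≈ 1.41
• Averaging the absolute deviations instead, (1 + 2 + 1 + 0) / 4 = 1, gives the mean absolute deviation, not the standard deviation.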
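A short Python check of the measures of variation for the sample 4, 1, 4, 3. The standard deviation is conventionally based on squared deviations; averaging the absolute distances from the mean gives a different quantity, the mean absolute deviation, which equals 1 for this sample.

```python
from statistics import mean, pstdev, stdev

observations = [4, 1, 4, 3]
center = mean(observations)

# Range: maximum value minus minimum value.
value_range = max(observations) - min(observations)

# Mean absolute deviation: average of the absolute distances from the mean.
mean_abs_dev = sum(abs(x - center) for x in observations) / len(observations)

# Standard deviation: square root of the average squared distance.
population_sd = pstdev(observations)  # divides by n
sample_sd = stdev(observations)       # divides by n - 1

print("range:", value_range)
print("mean absolute deviation:", mean_abs_dev)
print(f"population standard deviation: {population_sd:.4f}")
print(f"sample standard deviation: {sample_sd:.4f}")
```

Which divisor to use depends on whether the four values are treated as the whole population (n) or as a sample from a larger one (n - 1).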
Standard deviation
Why know about it?
• Mean, median, mode, range, and standard deviation are measurements made on a sample (statistics), and they can also be used to make inferences about a population.
What do we expect from the
statistical analysis?
• To find out whether there is a statistically significant difference between our sample (e.g., pregnancy loss among Al Ain Hospital patients) and the general population
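One way such a comparison is often made is a one-sample z-test for a proportion. The sketch below only shows the mechanics: every number in it is hypothetical and is not data from Al Ain Hospital or from any real population.

```python
from math import sqrt
from statistics import NormalDist

# All numbers below are hypothetical, chosen only to show the mechanics.
population_rate = 0.15   # assumed rate in the general population
n = 200                  # assumed number of patients in the sample
events = 42              # assumed number of events observed in the sample

sample_rate = events / n

# Standard error of the sample proportion if the population rate were true.
standard_error = sqrt(population_rate * (1 - population_rate) / n)

# z: how many standard errors the sample rate lies from the population rate.
z = (sample_rate - population_rate) / standard_error

# Two-sided p-value from the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant at the 5% level")
```

A small p-value suggests the difference between the sample and the population is unlikely to be due to chance alone; the 5% threshold is a convention, not a rule from the slides.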
How to perform the statistical
analysis?
• Statistics can take us on a beautiful journey of understanding, but
Festina lente! (Make haste slowly)