Introduction to Probability and Statistics
9th Week (5/10)

1. Descriptive Statistics
2. Sampling Theory
Probability is a science of (   ).


Statistics is a science of (    ).
Probability is the Science of Uncertainty.

• It is used by physicists to predict the behaviour of
  elementary particles.
• It is used by engineers to build computers.
• It is used by economists to predict the behaviour of the
  economy.
• It is used by stockbrokers to make money on the
  stock market.
• It is used by psychologists to determine if you should get
  that job.
What about Statistics?

• Statistics is the Science of Data.

• There are two kinds of statistics:

    - Descriptive Statistics: the discipline of quantitatively
      describing the main features of a collection of data

    - Inferential Statistics: the discipline of estimating unknown
      quantities from measurements made on a sample. Using these
      estimates, we can then make predictions and forecasts.
Descriptive Statistics

• Describing data with tables and graphs
   (quantitative or categorical variables)

• Numerical descriptions of center, variability,
  position (quantitative variables)

• Bivariate descriptions (in practice, most studies
  have several variables)
1. Tables and Graphs

Frequency distribution: Lists possible values of
  variable and number of times each occurs

Example: Student survey (n = 60)

“political ideology” measured as ordinal variable
  with 1 = very liberal, …, 4 = moderate, …, 7 =
  very conservative
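A frequency distribution can be tallied directly from raw observations. The sketch below assumes Python; because the n = 60 survey responses are not reproduced on these slides, it borrows the nine ideology scores from the religious-attendance example later in this section. The helper name `frequency_distribution` is hypothetical.

```python
from collections import Counter

# Ideology scores (1 = very liberal, ..., 7 = very conservative) for the
# n = 9 weekly attenders listed later in this section (illustrative only).
ideology = [2, 3, 7, 5, 6, 7, 5, 6, 4]

def frequency_distribution(values, categories):
    """Return (value, count, percent) rows, including categories with zero count."""
    counts = Counter(values)
    n = len(values)
    return [(c, counts[c], 100 * counts[c] / n) for c in categories]

table = frequency_distribution(ideology, range(1, 8))
for value, count, pct in table:
    print(f"{value}  {count:2d}  {pct:5.1f}%")
```

Passing the full list of categories keeps empty ones (here, 1 = very liberal) visible in the table, which matters for ordinal scales.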
Histogram: Bar graph of
frequencies or percentages
Shapes of histograms
          (for quantitative variables)

•   Bell-shaped (IQ, SAT, political ideology in the entire U.S.)
•   Skewed right (annual income, no. times arrested)
•   Skewed left (score on easy exam)
•   Bimodal (polarized opinions)
Stem-and-leaf plot            (John Tukey, 1977)


Example: Exam scores (n = 40 students)

Stem   Leaf
3       6
4
5       37
6       235899
7       011346778999
8       00111233568889
9       02238
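A stem-and-leaf plot splits each value into a stem (here the tens digit) and a leaf (the units digit). As a sketch assuming Python and non-negative two-digit integer scores, the code below recovers the 40 exam scores from the plot above and rebuilds the same display; `plot` and `stem_and_leaf` are illustrative names.

```python
# Exam scores (n = 40) read back from the stem-and-leaf plot above.
plot = {3: "6", 4: "", 5: "37", 6: "235899", 7: "011346778999",
        8: "00111233568889", 9: "02238"}
scores = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]

def stem_and_leaf(values):
    """Return one line per stem (tens digit) with its sorted leaves (units digits)."""
    values = sorted(values)
    lines = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        leaves = "".join(str(v % 10) for v in values if v // 10 == stem)
        lines.append(f"{stem} | {leaves}")   # empty stems (like 4) are kept
    return lines

for line in stem_and_leaf(scores):
    print(line)
```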
2. Numerical descriptions
Let y denote a quantitative variable, with
    observations y1, y2, y3, …, yn


a. Describing the center

Median: Middle measurement of ordered sample (the average of the
  two middle measurements when n is even)

Mean:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{\sum y_i}{n}$$
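Both measures of center can be computed from the exam scores in the stem-and-leaf plot. This Python sketch applies the mean formula directly and takes the median of the ordered sample, cross-checking against the standard library's `statistics.median`.

```python
import statistics

# Exam scores (n = 40) from the stem-and-leaf plot.
plot = {3: "6", 4: "", 5: "37", 6: "235899", 7: "011346778999",
        8: "00111233568889", 9: "02238"}
scores = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]

n = len(scores)
mean = sum(scores) / n                    # y-bar = (sum of y_i) / n
ordered = sorted(scores)
if n % 2:                                 # odd n: the single middle value
    median = ordered[n // 2]
else:                                     # even n: average of the two middle values
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

assert median == statistics.median(scores)
print(f"mean = {mean:.2f}, median = {median}")
```

Here the mean (77.15) falls below the median (79.0): the single low score of 36 pulls the mean toward the left tail.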
Properties of mean and median
• For symmetric distributions, mean = median
• For skewed distributions, mean is drawn in
  direction of longer tail, relative to median
• Mean valid for interval scales, median for
  interval or ordinal scales
• Mean sensitive to “outliers” (median often
  preferred for highly skewed distributions)
• When the distribution is symmetric, mildly skewed, or
  discrete with few values, the mean is preferred
  because it uses the numerical values of the observations
Examples:

• New York Yankees baseball team, 2006
     mean salary = $7.0 million
    median salary = $2.9 million

     How possible? Direction of skew?

• Give an example for which you would expect

           mean < median
b. Describing variability

Range: Difference between largest and smallest
  observations
(but highly sensitive to outliers, insensitive to shape)

Standard deviation: A “typical” distance from the mean

The deviation of observation i from the mean is

$$y_i - \bar{y}$$

The variance of the n observations is

$$s^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n-1}$$

The standard deviation s is the square root of the variance,

$$s = \sqrt{s^2}$$
Example: Political ideology
 • For those in the student sample who attend religious
   services at least once a week (n = 9 of the 60),
 • y = 2, 3, 7, 5, 6, 7, 5, 6, 4

$$\bar{y} = 5.0, \qquad s^2 = \frac{(2-5)^2 + (3-5)^2 + \cdots + (4-5)^2}{9-1} = \frac{24}{8} = 3.0, \qquad s = \sqrt{3.0} = 1.7$$

For the entire sample (n = 60), mean = 3.0 and standard deviation = 1.6:
the entire sample has similar variability but is more liberal on average.
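The worked example can be reproduced step by step. This Python sketch computes the deviations, the range, the n − 1 variance and its square root for the nine weekly attenders, then cross-checks against `statistics.variance` and `statistics.stdev`, which also use the n − 1 divisor.

```python
import math
import statistics

y = [2, 3, 7, 5, 6, 7, 5, 6, 4]   # ideology scores, weekly attenders (n = 9)

n = len(y)
ybar = sum(y) / n
deviations = [yi - ybar for yi in y]            # y_i - y-bar; these sum to 0
s2 = sum(d ** 2 for d in deviations) / (n - 1)  # sample variance, divisor n - 1
s = math.sqrt(s2)                               # standard deviation
value_range = max(y) - min(y)                   # largest minus smallest

# Cross-check against the standard library's sample (n - 1) formulas.
assert math.isclose(s2, statistics.variance(y))
assert math.isclose(s, statistics.stdev(y))
print(f"mean = {ybar}, range = {value_range}, s^2 = {s2}, s = {s:.2f}")
```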
c. Measures of position
pth percentile: p percent of observations
   below it, (100 - p)% above it.

 p = 50: median
 p = 25: lower quartile (LQ)
 p = 75: upper quartile (UQ)

 Interquartile range IQR = UQ - LQ
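Quartiles and the IQR can be computed with Python's `statistics.quantiles` (Python 3.8+), shown here on the exam scores. Note the assumption: its default `method="exclusive"` interpolates at position (n + 1)p, and textbooks and software packages use several slightly different quartile conventions, so hand-computed values may differ a little.

```python
import statistics

# Exam scores (n = 40) from the stem-and-leaf plot.
plot = {3: "6", 4: "", 5: "37", 6: "235899", 7: "011346778999",
        8: "00111233568889", 9: "02238"}
scores = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]

# n=4 returns the three cut points: LQ (p = 25), median (p = 50), UQ (p = 75).
lq, med, uq = statistics.quantiles(scores, n=4)
iqr = uq - lq
print(f"LQ = {lq}, median = {med}, UQ = {uq}, IQR = {iqr}")
```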
Quartiles portrayed graphically by box plots
    (John Tukey)
 Example: weekly TV watching for n=60 from
 student survey data file, 3 outliers
Box plots have box from LQ to UQ, with
  median marked. They portray a five-
  number summary of the data:
   Minimum, LQ, Median, UQ, Maximum
except for outliers identified separately

Outlier = observation falling
          below LQ – 1.5(IQR)
or       above UQ + 1.5(IQR)

Ex. If LQ = 2, UQ = 10, then IQR = 8 and
 outliers above 10 + 1.5(8) = 22
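The 1.5(IQR) rule and the box-plot summary can be sketched in Python as follows. `fences` and `box_plot_summary` are hypothetical helper names; quartiles again come from `statistics.quantiles` with its default exclusive method (an assumption), and the reported min and max are the most extreme non-outlying values, since box plots show outliers separately.

```python
import statistics

def fences(lq, uq, k=1.5):
    """Return the outlier fences (LQ - k*IQR, UQ + k*IQR)."""
    iqr = uq - lq
    return lq - k * iqr, uq + k * iqr

def box_plot_summary(data):
    """Five-number summary with outliers flagged separately."""
    lq, med, uq = statistics.quantiles(data, n=4)
    low, high = fences(lq, uq)
    outliers = sorted(v for v in data if v < low or v > high)
    inside = [v for v in data if low <= v <= high]
    return {"min": min(inside), "LQ": lq, "median": med, "UQ": uq,
            "max": max(inside), "outliers": outliers}

print(fences(2, 10))   # the slide's example: (-10.0, 22.0)

# Exam scores (n = 40) from the stem-and-leaf plot.
plot = {3: "6", 4: "", 5: "37", 6: "235899", 7: "011346778999",
        8: "00111233568889", 9: "02238"}
scores = [10 * stem + int(leaf) for stem, leaves in plot.items() for leaf in leaves]
summary = box_plot_summary(scores)
print(summary)
```

With LQ = 70.25 and UQ = 85.75, the lower fence is 47.0, so the score of 36 is flagged as an outlier.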
3. Bivariate description
• Usually we want to study associations between two or
  more variables (e.g., how does number of close
  friends depend on gender, income, education, age,
  working status, rural/urban, religiosity…)
• Response variable: the outcome variable
• Explanatory variable(s): defines groups to compare

Ex.: number of close friends is a response variable,
  while gender, income, … are explanatory variables

Response var. also called “dependent variable”
Explanatory var. also called “independent variable”
Summarizing associations:
• Categorical variables: show data using contingency tables
• Quantitative variables: show data using scatterplots
• Mixture of a categorical and a quantitative variable (e.g.,
  number of close friends and gender): can give
  numerical summaries (mean, standard deviation) or
  side-by-side box plots for the groups

• Ex. General Social Survey (GSS) data
  Men:    mean = 7.0, s = 8.4
  Women: mean = 5.9, s = 6.0
Shape? Inference questions for later chapters?
Example: Income by highest degree
Contingency Tables

• Cross classifications of categorical variables in
  which rows (typically) represent categories of
  explanatory variable and columns represent
  categories of response variable.

• Counts in “cells” of the table give the numbers of
  individuals at the corresponding combination of
  levels of the two variables
Happiness and Family Income
(GSS 2008 data: “happy,” “finrela”)

                              Happiness
Income            Very    Pretty    Not too    Total
-----------------------------------------------------
Above average      164       233         26      423
Average            293       473        117      883
Below average      132       383        172      687
-----------------------------------------------------
Total              589      1089        315     1993
Can summarize by percentages on the response
 variable (happiness)

Example: Percentage “very happy” is

39% for above average income (164/423 = 0.39)
33% for average income (293/883 = 0.33)
19% for below average income (132/687 = 0.19)
                              Happiness
Income            Very         Pretty       Not too      Total
----------------------------------------------------------------
Above average     164 (39%)    233 (55%)     26 (6%)      423
Average           293 (33%)    473 (54%)    117 (13%)     883
Below average     132 (19%)    383 (56%)    172 (25%)     687
----------------------------------------------------------------
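The percentages in the table are row percentages: rows hold the explanatory variable (income), so each row's counts are divided by that row's total to describe the response (happiness) within each income group. A minimal Python sketch using the GSS counts above:

```python
# Counts from the GSS 2008 happiness-by-income table.
counts = {
    "Above average": [164, 233, 26],
    "Average": [293, 473, 117],
    "Below average": [132, 383, 172],
}
happiness = ["Very", "Pretty", "Not too"]

def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    return {row: [100 * c / sum(cells) for c in cells] for row, cells in table.items()}

column_totals = [sum(col) for col in zip(*counts.values())]
pcts = row_percentages(counts)
for row, cells in pcts.items():
    print(f"{row:14s}" + "".join(f"{p:6.0f}%" for p in cells))
```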

Inference questions for later chapters? (i.e., what can
  we conclude about the corresponding population?)
Scatterplots (for quantitative variables)
  plot response variable on vertical axis,
       explanatory variable on horizontal axis

Example: Table 9.13 (p. 294) shows UN data for several
  nations on many variables, including fertility (births per
  woman), contraceptive use, literacy, female economic
  activity, per capita gross domestic product (GDP), cell-
  phone use, CO2 emissions

Data available at
 http://www.stat.ufl.edu/~aa/social/data.html
Example: Survey in Alachua County, Florida,
 on predictors of mental health
(data for n = 40 on p. 327 of text and at
  www.stat.ufl.edu/~aa/social/data.html)

y = measure of mental impairment (incorporates various
   dimensions of psychiatric symptoms, including aspects of
   depression and anxiety)
  (min = 17, max = 41, mean = 27, s = 5)

x = life events score (events range from severe personal
   disruptions such as death in family, extramarital affair, to
   less severe events such as new job, birth of child, moving)
  (min = 3, max = 97, mean = 44, s = 23)
Bivariate data from 2000 Presidential election
Butterfly ballot, Palm Beach County, FL, text p.290
Example: The Massachusetts Lottery
           (data for 37 communities)

[Scatterplot: % of income spent on lottery (vertical axis) vs. per capita income (horizontal axis)]
Correlation describes strength of association
• Falls between -1 and +1, with sign indicating direction
  of association (formula later in Chapter 9)

The larger the correlation in absolute value, the stronger
  the association (in terms of a straight line trend)

Examples: (positive or negative, how strong?)
Mental impairment and life events, correlation =
GDP and fertility, correlation =
GDP and percent using Internet, correlation =
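The correlation formula itself is deferred to Chapter 9; as a preview, here is a sketch assuming the standard Pearson correlation, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² Σ(y − ȳ)²). The function name `pearson_r` and the small data sets are illustrative, not from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))    # perfect increasing line: 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))    # perfect decreasing line: -1.0
print(pearson_r([1, 2, 3], [1, 3, 2]))   # weaker positive association: 0.5
```

The sign gives the direction, and |r| near 1 means the points lie close to a straight line.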
Inferential Statistics: Fortune Teller

• How can she read the future?
    - Analysis of data from her previous victims (clients)
    - Make hypotheses
    - Test them
    - Fool you!
Population and Sample
- Often in practice we are interested in drawing valid conclusions
  about a large group of individuals or objects.

- Instead of examining the entire group, called the population, which
  may be difficult or impossible to do, we may examine only a small
  part of this population, which is called a sample.

- The process of obtaining samples is called sampling.

[Figure: a sample is drawn from the population by sampling]
Statistical Inference
- We do this with the aim of inferring certain facts about the
  population from results found in the sample, a process known as
  statistical inference.
Sampling With and Without Replacement
- Population may be finite or infinite.

- If the population is finite, the sampling method matters.

- If we draw an object from an urn, we have the choice of replacing
  or not replacing the object into the urn before we draw again.

- Sampling with replacement: sampling where each member of a
  population may be chosen more than once

- Sampling without replacement: sampling where each member
  cannot be chosen more than once

- A finite population that is sampled with replacement can
  theoretically be considered infinite since samples of any size can
  be drawn without exhausting the population.
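The two sampling schemes map onto two functions in Python's `random` module: `choices` draws with replacement and `sample` draws without replacement. This sketch uses a hypothetical urn of five numbered balls and a fixed seed for reproducibility.

```python
import random

urn = [1, 2, 3, 4, 5]            # hypothetical urn of five numbered balls
rng = random.Random(2024)        # seeded so the draws are reproducible

with_repl = rng.choices(urn, k=8)     # with replacement: k may exceed len(urn)
without_repl = rng.sample(urn, k=3)   # without replacement: no ball repeats

try:
    rng.sample(urn, k=6)
    too_many_ok = True
except ValueError:
    too_many_ok = False          # a 5-ball urn is exhausted after 5 draws

print(with_repl, without_repl, too_many_ok)
```

Drawing eight balls from five is only possible with replacement, which is why a finite population sampled with replacement behaves like an infinite one.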
Random Samples, Random Numbers
- Clearly, the reliability of conclusions drawn concerning a population
  depends on whether the sample is properly chosen so as to
  represent the population sufficiently well, and one of the important
  problems of statistical inference is just how to choose a sample.

- One way to do this for finite populations is to make sure that each
  member of the population has the same chance of being in the
  sample, which is then often called a random sample.

- Random sampling can be accomplished for relatively small
  populations by drawing lots or, equivalently, by using a table of
  random numbers specially constructed for such purposes.

- Because inference from sample to population cannot be
  certain, we must use the language of probability in any
  statement of conclusions.
Population Parameters
- A population is considered to be known when we know the
  probability distribution f (x) (probability function or density function)
  of the associated random variable X.

- If X is a random variable whose values are the heights (or weights)
  of a population of 12,000 students, then X has a probability distribution f(x).

- If, for example, X is normally distributed, we say that the population
  is normally distributed or that we have a normal population.
  Similarly, if X is binomially distributed, we say that the population is
  binomially distributed or that we have a binomial population.

- There will be certain quantities that appear in f(x), such as µ and σ
  in the case of the normal distribution or p in the case of the
  binomial distribution.

- Other quantities such as the median, moments, and skewness can
  then be determined in terms of these.
- All such quantities are often called population parameters.

- When we are given the population so that we know f(x), then the
  population parameters are also known.

- An important problem arises when the probability distribution
  f(x) of the population is not known precisely, although we may
  have some idea of, or at least be able to make some hypothesis
  concerning, the general behavior of f(x).

- For example, we may have some reason to suppose that a
  particular population is normally distributed.

- In that case we may not know one or both of the parameters µ and σ,
  and so we might wish to draw statistical inferences about them.
Sample Statistics
- We can take random samples from the population and then use
  these samples to obtain values that serve to estimate and test
  hypotheses about the population parameters.
Sample statistics /
          Population parameters

• We distinguish between summaries of samples
  (statistics) and summaries of populations
  (parameters).

• Common to denote statistics by Roman letters,
  parameters by Greek letters:
Sample Statistics
- In general, corresponding to each population parameter there will
  be a statistic to be computed from the sample.

- Usually the method for obtaining this statistic from the sample is
  similar to that for obtaining the parameter from a finite population,
  since a sample consists of a finite set of values.

- As we shall see, however, this may not always produce the “best
  estimate,” and one of the important problems of sampling theory is
  to decide how to form the proper sample statistic that will best
  estimate a given population parameter.

- Where possible we shall use Greek letters, such as µ and σ,
  for values of population parameters, and Roman letters, m, s, etc.,
  for values of corresponding sample statistics.
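One concrete case of the "best estimate" issue is the variance: copying the population formula (divide by the number of values) into the sample underestimates σ² on average, while dividing by n − 1 does not, which is one sense of "best" (unbiasedness). The Python sketch below checks this exactly, using fractions, over every ordered sample of size 2 drawn with replacement from a small hypothetical population.

```python
from fractions import Fraction
from itertools import product

population = [2, 3, 6, 8, 11]     # hypothetical small population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # parameter: divide by N

def var_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all N**n equally likely samples
avg_biased = sum(var_divisor(s, n) for s in samples) / len(samples)
avg_unbiased = sum(var_divisor(s, n - 1) for s in samples) / len(samples)
print(sigma2, avg_biased, avg_unbiased)   # 54/5 27/5 54/5
```

The divisor-n version averages (n − 1)/n · σ², while the n − 1 version averages exactly σ².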
Sampling Distribution
- As we have seen, a sample statistic that is computed from X1, …, Xn
  is a function of these random variables and is therefore itself a
  random variable.

- The probability distribution of a sample statistic is often called the
  sampling distribution of the statistic.

- Alternatively we can consider all possible samples of size n that
  can be drawn from the population, and for each sample we
  compute the statistic.

- In this manner we obtain the distribution of the statistic, which is its
  sampling distribution.

- For a sampling distribution, we can of course compute a mean,
  variance, standard deviation, moments, etc.

- The standard deviation of a sampling distribution is also called the
  standard error of the statistic.
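For a tiny population, "all possible samples of size n" can be listed outright, which makes the sampling distribution of the mean concrete. This Python sketch (hypothetical population, exact fractions) shows that the sample means average to μ, that their variance is σ²/n when sampling with replacement, and that without replacement the variance shrinks by the finite-population factor (N − n)/(N − 1).

```python
import math
from fractions import Fraction
from itertools import combinations, product

population = [2, 3, 6, 8, 11]   # hypothetical population
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# With replacement: one sample mean for each of the N**n ordered samples.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
standard_error = math.sqrt(var_of_means)

# Without replacement: one sample mean for each of the C(N, n) samples.
means_wo = [Fraction(sum(s), n) for s in combinations(population, n)]
var_wo = sum((m - mu) ** 2 for m in means_wo) / len(means_wo)

print(mean_of_means, var_of_means, round(standard_error, 4), var_wo)
```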
The Sample Mean
Sampling Distribution of Means
Sampling Distribution of Proportions
 - Suppose that a population is infinite and binomially distributed, with
   p and q = 1- p being the respective probabilities that any given
   member exhibits or does not exhibit a certain property.

 - Consider all possible samples of size n drawn from this population,
   and for each sample determine the statistic that is the proportion P
   of successes.

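 - In the case of the coin, P would be the proportion of heads turning
   up in n tosses. Then we obtain a sampling distribution of
   proportions whose mean µP and standard deviation σP are given by

$$\mu_P = p, \qquad \sigma_P = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}} \qquad (9)$$

- For finite populations of size N in which sampling is without replacement, the
  second equation in (9) is replaced by σP as given by (6) with σ = √(pq), that is,

$$\sigma_P = \sqrt{\frac{pq}{n}}\,\sqrt{\frac{N-n}{N-1}}$$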
Example 6. Find the probability that in 120 tosses of a fair coin (a) between 40%
and 60% will be heads, (b) or more will be heads.
Example 6. (Continued)
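Part (a) can be approximated with the normal sampling distribution of proportions, μP = 0.5 and σP = √(pq/n). The sketch covers part (a) only, since the threshold for part (b) is missing from the slide. It assumes Python 3.8+ (for `math.comb`), applies a continuity correction of 1/(2n) on the proportion scale, and compares with the exact binomial sum.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p = 120, 0.5
mu_P = p
sigma_P = math.sqrt(p * (1 - p) / n)

# (a) 40% to 60% heads means 48 to 72 heads; widen by 1/(2n) for continuity.
lo = 0.40 - 1 / (2 * n)
hi = 0.60 + 1 / (2 * n)
approx = normal_cdf((hi - mu_P) / sigma_P) - normal_cdf((lo - mu_P) / sigma_P)

# Exact binomial probability P(48 <= heads <= 72) for comparison.
exact = sum(math.comb(n, k) for k in range(48, 73)) / 2 ** n
print(f"normal approx = {approx:.4f}, exact = {exact:.4f}")
```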
Sampling Distribution of Differences and Sums
Example 7. The electric light bulbs of manufacturer A have a mean lifetime of
1400 hours with a standard deviation of 200 hours, while those of manufacturer
B have a mean lifetime of 1200 hours with a standard deviation of 100 hours. If
random samples of 125 bulbs of each brand are tested, what is the probability
that the brand A bulbs will have a mean lifetime that is at least (a) 160 hours, (b)
250 hours more than the brand B bulbs?
Example 7. (Answer)
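A sketch of the computation, assuming the difference of sample means is approximately normal (reasonable with n = 125 per brand) with mean μA − μB = 200 hours and standard error √(σA²/n + σB²/n) = √(320 + 80) = 20 hours. The helper name `prob_A_exceeds_B_by` is hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_A, sd_A, mean_B, sd_B, n = 1400, 200, 1200, 100, 125
mu_diff = mean_A - mean_B                          # 200 hours
se_diff = math.sqrt(sd_A**2 / n + sd_B**2 / n)     # sqrt(320 + 80) = 20 hours

def prob_A_exceeds_B_by(d):
    """P(mean_A - mean_B >= d) under the normal approximation."""
    return 1 - normal_cdf((d - mu_diff) / se_diff)

p_a = prob_A_exceeds_B_by(160)   # z = -2
p_b = prob_A_exceeds_B_by(250)   # z = 2.5
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}")
```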
Example 8. Ball bearings of a given brand have a mean weight of 0.50 oz with a standard deviation of
0.02 oz. What is the probability that two lots, of 1000 ball bearings each, will differ in
weight by more than 2 oz?
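A sketch of one way to set this up, assuming the bearings' weights are independent and the lot totals are approximately normal: each lot's total has mean 1000 × 0.50 oz and variance 1000 × 0.02², so the difference of two lots has mean 0 and standard deviation √(2 × 1000 × 0.02²) ≈ 0.894 oz.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 0.50, 0.02, 1000
# Difference of two independent lot totals: mean 0, variance 2 * n * sigma^2.
sd_diff = math.sqrt(2 * n * sigma ** 2)    # about 0.894 oz
z = 2 / sd_diff                            # 2 oz in standard units
p = 2 * (1 - normal_cdf(z))                # two-sided: |difference| > 2 oz
print(f"z = {z:.3f}, P = {p:.4f}")
```

Working with the difference of mean weights instead (threshold 0.002 oz, standard error √(2 × 0.02²/1000)) gives the same z.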
The Sample Variance
Sampling Distribution of Variances
(Continuous) Chi Square Distribution
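A special gamma distribution with α = r/2, β = 2 (r = degrees of freedom)

 PDF: $$f(x) = \frac{1}{\Gamma(r/2)\,2^{r/2}}\, x^{r/2-1} e^{-x/2}, \qquad x > 0$$

 E(X) = r

 Var(X) = 2r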
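The moments E(X) = r and Var(X) = 2r can be checked numerically from the density. This Python sketch codes the Gamma(α = r/2, β = 2) density with `math.gamma` and integrates x·f(x) and x²·f(x) with the trapezoidal rule on a truncated range; the truncation point and step count are assumptions chosen so the neglected tail is negligible for small r.

```python
import math

def chi2_pdf(x: float, r: int) -> float:
    """Chi-square density with r degrees of freedom (Gamma with alpha=r/2, beta=2)."""
    if x <= 0:
        return 0.0
    return x ** (r / 2 - 1) * math.exp(-x / 2) / (math.gamma(r / 2) * 2 ** (r / 2))

def moment(r: int, k: int, upper: float = 120.0, steps: int = 60_000) -> float:
    """Approximate E[X^k] by the trapezoidal rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x ** k * chi2_pdf(x, r)
    return total * h

results = {}
for r in (3, 4, 6):
    mean = moment(r, 1)
    var = moment(r, 2) - mean ** 2
    results[r] = (mean, var)
    print(f"r = {r}: E(X) = {mean:.3f}, Var(X) = {var:.3f}")   # expect r and 2r
```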
Case Where Population Variance Is Unknown
(Continuous) Student t Distribution
(Continuous) F Distribution
⊙ Effect of the degrees of freedom

 Notation: for 0 < α < 1, fα(m, n) denotes the upper 100α% point
 (the 100(1 − α)th percentile) of F(m, n). For X ∼ F(m, n):

          (1) P(X ≥ fα(m, n)) = α

          (2) P(f1−α/2(m, n) ≤ X ≤ fα/2(m, n)) = 1 − α

          (3) F ∼ F(m, n) ⇒ 1/F ∼ F(n, m)
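Property (3) implies f1−α(m, n) = 1/fα(n, m), so lower percentage points can be read from upper-point tables. This Python sketch checks that numerically by simulation, building F = (X1/m)/(X2/n) from chi-square draws generated as `random.gammavariate(r/2, 2)` (shape, scale); the degrees of freedom, seed and sample size are arbitrary choices, and the estimates carry Monte Carlo error.

```python
import random

rng = random.Random(7)   # seeded for reproducibility

def f_sample(m, n):
    """One draw of (X1/m)/(X2/n) with X1 ~ chi2(m), X2 ~ chi2(n) independent."""
    x1 = rng.gammavariate(m / 2, 2)   # chi-square(m) = Gamma(alpha=m/2, beta=2)
    x2 = rng.gammavariate(n / 2, 2)
    return (x1 / m) / (x2 / n)

def upper_point(draws, alpha):
    """Empirical f_alpha: the value exceeded by a fraction alpha of the draws."""
    ordered = sorted(draws)
    return ordered[int((1 - alpha) * len(ordered))]

m, n, alpha, size = 5, 10, 0.05, 100_000
f_mn = [f_sample(m, n) for _ in range(size)]
f_nm = [f_sample(n, m) for _ in range(size)]

lower = upper_point(f_mn, 1 - alpha)        # f_{1-alpha}(m, n)
reciprocal = 1 / upper_point(f_nm, alpha)   # 1 / f_alpha(n, m)
print(f"{lower:.3f} vs {reciprocal:.3f}")
```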
Sampling Distribution of Ratios of Variances
Other Statistics
Introduction to Probability and Statistics: Descriptive Statistics and Sampling Theory

  • 1. Introduction to Probability and Statistics, 9th Week (5/10) • Topics: 1. Descriptive Statistics; 2. Sampling Theory
  • 2. Probability is a science of ( ). Statistics is a science of ( ).
  • 3. Probability is the Science of Uncertainty. • It is used by physicists to predict the behaviour of elementary particles. • It is used by engineers to build computers. • It is used by economists to predict the behaviour of the economy. • It is used by stockbrokers to make money on the stock market. • It is used by psychologists to determine whether you should get that job.
  • 4. What about Statistics? • Statistics is the Science of Data. • There are two kinds of statistics: • Descriptive statistics: the discipline of quantitatively describing the main features of a collection of data. • Inferential statistics: the discipline of estimating unknown quantities by making some elementary measurements. Using these estimates, we can then make predictions and forecast the future.
  • 5. Descriptive Statistics • Describing data with tables and graphs (quantitative or categorical variables) • Numerical descriptions of center, variability, position (quantitative variables) • Bivariate descriptions (In practice, most studies have several variables)
  • 6. 1. Tables and Graphs Frequency distribution: Lists possible values of variable and number of times each occurs Example: Student survey (n = 60) “political ideology” measured as ordinal variable with 1 = very liberal, …, 4 = moderate, …, 7 = very conservative
  • 7.
  • 8. Histogram: Bar graph of frequencies or percentages
  • 9. Shapes of histograms (for quantitative variables) • Bell-shaped (IQ, SAT, political ideology in the entire U.S.) • Skewed right (annual income, no. of times arrested) • Skewed left (score on an easy exam) • Bimodal (polarized opinions)
  • 10. Stem-and-leaf plot (John Tukey, 1977) Example: Exam scores (n = 40 students)
      Stem | Leaf
         3 | 6
         4 |
         5 | 37
         6 | 235899
         7 | 011346778999
         8 | 00111233568889
         9 | 02238
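A stem-and-leaf display can be rebuilt mechanically: split each score into a stem (tens digit) and a leaf (units digit), then list the leaves in order. The sketch below assumes two-digit integer scores; the score list is simply read back off the plot above, and the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves} for two-digit integer scores (stem = tens digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(str(v % 10))
    lo, hi = min(stems), max(stems)
    # Keep empty stems (such as 4 here) so gaps in the data stay visible
    return {s: "".join(stems.get(s, [])) for s in range(lo, hi + 1)}


# Scores read back off the slide's plot (n = 40)
scores = [36, 53, 57, 62, 63, 65, 68, 69, 69,
          70, 71, 71, 73, 74, 76, 77, 77, 78, 79, 79, 79,
          80, 80, 81, 81, 81, 82, 83, 83, 85, 86, 88, 88, 88, 89,
          90, 92, 92, 93, 98]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```

Keeping the empty stem 4 is a deliberate choice: it shows the gap between the single low score (36) and the rest of the class.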
  • 11. 2. Numerical descriptions Let y denote a quantitative variable, with observations $y_1, y_2, y_3, \ldots, y_n$. a. Describing the center • Median: middle measurement of the ordered sample • Mean: $\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} = \frac{\sum y_i}{n}$
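The two definitions translate directly into code. This is a minimal sketch (function names are illustrative); the median averages the two middle values when n is even, which is the usual convention.

```python
def mean(ys):
    # ybar = (y1 + y2 + ... + yn) / n
    return sum(ys) / len(ys)


def median(ys):
    # Middle value of the ordered sample; average of the two middle values when n is even
    s = sorted(ys)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


print(mean([2, 3, 7, 5, 6, 7, 5, 6, 4]), median([2, 3, 7, 5, 6, 7, 5, 6, 4]))
# One extreme value pulls the mean but not the median (hypothetical data)
print(mean([1, 2, 3, 4, 100]), median([1, 2, 3, 4, 100]))
```

The second line illustrates why the median is often preferred for highly skewed data: the value 100 moves the mean to 22 while the median stays at 3.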
  • 12. Properties of mean and median • For symmetric distributions, mean = median • For skewed distributions, the mean is drawn in the direction of the longer tail, relative to the median • Mean valid for interval scales, median for interval or ordinal scales • Mean sensitive to “outliers” (median often preferred for highly skewed distributions) • When the distribution is symmetric, mildly skewed, or discrete with few values, the mean is preferred because it uses the numerical values of the observations
  • 13. Examples: • New York Yankees baseball team, 2006 mean salary = $7.0 million median salary = $2.9 million How possible? Direction of skew? • Give an example for which you would expect mean < median
  • 14. b. Describing variability Range: difference between the largest and smallest observations (but highly sensitive to outliers, insensitive to shape) Standard deviation: a “typical” distance from the mean. The deviation of observation i from the mean is $y_i - \bar{y}$.
  • 15. The variance of the n observations is $s^2 = \frac{\sum (y_i - \bar{y})^2}{n-1} = \frac{(y_1 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n-1}$. The standard deviation s is the square root of the variance, $s = \sqrt{s^2}$.
  • 16. Example: Political ideology • For those in the student sample who attend religious services at least once a week (n = 9 of the 60), • y = 2, 3, 7, 5, 6, 7, 5, 6, 4; $\bar{y} = 5.0$, $s^2 = \frac{(2-5)^2 + (3-5)^2 + \cdots + (4-5)^2}{9-1} = \frac{24}{8} = 3.0$, $s = \sqrt{3.0} = 1.7$ • For the entire sample (n = 60), mean = 3.0 and standard deviation = 1.6: the entire sample has similar variability but tends to be more liberal (a lower mean on the ideology scale).
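The worked example can be checked numerically. The sketch below implements the sample variance with the n − 1 divisor from the formula above (function names are illustrative) and reproduces s² = 3.0 and s ≈ 1.7 for the nine weekly attenders.

```python
import math


def sample_variance(ys):
    # s^2 = sum((y_i - ybar)^2) / (n - 1)
    ybar = sum(ys) / len(ys)
    return sum((y - ybar) ** 2 for y in ys) / (len(ys) - 1)


def sample_sd(ys):
    # s = sqrt(s^2)
    return math.sqrt(sample_variance(ys))


ideology = [2, 3, 7, 5, 6, 7, 5, 6, 4]
print(sample_variance(ideology))        # 3.0
print(round(sample_sd(ideology), 1))    # 1.7
```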
  • 17. c. Measures of position pth percentile: p percent of observations below it, (100 − p)% above it. • p = 50: median • p = 25: lower quartile (LQ) • p = 75: upper quartile (UQ) • Interquartile range IQR = UQ − LQ
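Quartiles and the IQR can be computed with Python's standard `statistics.quantiles`. Note that several percentile conventions exist and they can differ slightly for small samples; the choice of `method="inclusive"` here is an assumption, not necessarily the convention used in the course text.

```python
import statistics


def quartiles(ys):
    # Returns (LQ, median, UQ); "inclusive" is one of several interpolation conventions
    lq, med, uq = statistics.quantiles(ys, n=4, method="inclusive")
    return lq, med, uq


def iqr(ys):
    lq, _, uq = quartiles(ys)
    return uq - lq


ideology = [2, 3, 7, 5, 6, 7, 5, 6, 4]
print(quartiles(ideology), iqr(ideology))
```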
  • 18. Quartiles portrayed graphically by box plots (John Tukey) Example: weekly TV watching for n=60 from student survey data file, 3 outliers
  • 19. Box plots have a box from LQ to UQ, with the median marked. They portray a five-number summary of the data: Minimum, LQ, Median, UQ, Maximum, except for outliers, which are identified separately. Outlier = observation falling below LQ − 1.5(IQR) or above UQ + 1.5(IQR) Ex. If LQ = 2 and UQ = 10, then IQR = 8, so outliers are observations above 10 + 1.5(8) = 22 (or below 2 − 1.5(8) = −10).
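The 1.5 × IQR outlier rule is a one-line computation once LQ and UQ are known. In this sketch the function names and the short data list passed to `flag_outliers` are hypothetical; the fences reproduce the slide's example with LQ = 2 and UQ = 10.

```python
def outlier_fences(lq, uq):
    # Observations beyond LQ - 1.5*IQR or UQ + 1.5*IQR are flagged as outliers
    spread = uq - lq
    return lq - 1.5 * spread, uq + 1.5 * spread


def flag_outliers(ys, lq, uq):
    low, high = outlier_fences(lq, uq)
    return [y for y in ys if y < low or y > high]


print(outlier_fences(2, 10))                    # (-10.0, 22.0)
print(flag_outliers([0, 5, 8, 25, 30], 2, 10))  # [25, 30] (hypothetical data)
```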
  • 20. 3. Bivariate description • Usually we want to study associations between two or more variables (e.g., how does number of close friends depend on gender, income, education, age, working status, rural/urban, religiosity…) • Response variable: the outcome variable • Explanatory variable(s): defines groups to compare Ex.: number of close friends is a response variable, while gender, income, … are explanatory variables Response var. also called “dependent variable” Explanatory var. also called “independent variable”
  • 21. Summarizing associations: • Categorical variables: show data using contingency tables • Quantitative variables: show data using scatterplots • Mixture of a categorical and a quantitative variable (e.g., number of close friends and gender): give numerical summaries (mean, standard deviation) or side-by-side box plots for the groups • Ex. General Social Survey (GSS) data, number of close friends: Men: mean = 7.0, s = 8.4; Women: mean = 5.9, s = 6.0 • Shape? Inference questions for later chapters?
  • 22. Example: Income by highest degree
  • 23. Contingency Tables • Cross classifications of categorical variables in which rows (typically) represent categories of explanatory variable and columns represent categories of response variable. • Counts in “cells” of the table give the numbers of individuals at the corresponding combination of levels of the two variables
  • 24. Happiness and Family Income (GSS 2008 data: “happy,” “finrela”)
                            Happiness
      Income            Very   Pretty   Not too   Total
      -------------------------------------------------
      Above average      164      233        26     423
      Average            293      473       117     883
      Below average      132      383       172     687
      -------------------------------------------------
      Total              589     1089       315    1993
  • 25. We can summarize by percentages on the response variable (happiness). Example: the percentage “very happy” is 39% for above-average income (164/423 = 0.39), 33% for average income (293/883 = 0.33), and 19% for below-average income (132/687 = 0.19).
  • 26.
                     Happiness
      Income         Very        Pretty      Not too      Total
      ---------------------------------------------------------
      Above          164 (39%)   233 (55%)    26 (6%)       423
      Average        293 (33%)   473 (54%)   117 (13%)      883
      Below          132 (19%)   383 (56%)   172 (25%)      687
      ---------------------------------------------------------
      Inference questions for later chapters? (i.e., what can we conclude about the corresponding population?)
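The row percentages in the table can be reproduced from the counts: divide each cell by its row total, so that each income group's percentages sum to 100. A minimal sketch, with the counts copied from the GSS table above and a hypothetical helper name:

```python
counts = {
    "Above average": {"Very": 164, "Pretty": 233, "Not too": 26},
    "Average":       {"Very": 293, "Pretty": 473, "Not too": 117},
    "Below average": {"Very": 132, "Pretty": 383, "Not too": 172},
}


def row_percentages(table):
    # Condition on the explanatory variable (income): percentages within each row
    out = {}
    for row, cells in table.items():
        total = sum(cells.values())
        out[row] = {col: round(100 * n / total) for col, n in cells.items()}
    return out


for row, pcts in row_percentages(counts).items():
    print(row, pcts)
```

Percentages are taken within rows (not columns) because income is the explanatory variable; comparing the rows then shows how the happiness distribution shifts across income groups.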
  • 27. Scatterplots (for quantitative variables) plot the response variable on the vertical axis and the explanatory variable on the horizontal axis. Example: Table 9.13 (p. 294) shows UN data for several nations on many variables, including fertility (births per woman), contraceptive use, literacy, female economic activity, per capita gross domestic product (GDP), cell-phone use, and CO2 emissions. Data available at http://www.stat.ufl.edu/~aa/social/data.html
  • 28.
  • 29. Example: Survey in Alachua County, Florida, on predictors of mental health (data for n = 40 on p. 327 of text and at www.stat.ufl.edu/~aa/social/data.html) y = measure of mental impairment (incorporates various dimensions of psychiatric symptoms, including aspects of depression and anxiety) (min = 17, max = 41, mean = 27, s = 5) x = life events score (events range from severe personal disruptions such as death in family, extramarital affair, to less severe events such as new job, birth of child, moving) (min = 3, max = 97, mean = 44, s = 23)
  • 30.
  • 31. Bivariate data from 2000 Presidential election Butterfly ballot, Palm Beach County, FL, text p.290
  • 32. Example: The Massachusetts Lottery (data for 37 communities) [Scatterplot: % of income spent on lottery vs. per capita income]
  • 33. Correlation describes the strength of association • It falls between −1 and +1, with the sign indicating the direction of association (formula later in Chapter 9). • The larger the correlation in absolute value, the stronger the association (in terms of a straight-line trend). • Examples (positive or negative? how strong?): Mental impairment and life events, correlation = ; GDP and fertility, correlation = ; GDP and percent using Internet, correlation =
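The slide defers the formula to Chapter 9; as a preview, here is the standard Pearson correlation, r = S_xy / √(S_xx · S_yy), sketched in plain Python. The function name and test data are illustrative only, not the textbook's notation.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy); always between -1 and +1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))       # perfect positive line
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))       # perfect negative line
print(correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # positive but not perfect
```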
  • 34.
  • 35. Inferential Statistics: Fortune Teller • How can she read the future? • Analysis of data from her previous victims (clients) • Make hypotheses • Test them • Fool you!
  • 36. Population and Sample - Often in practice we are interested in drawing valid conclusions about a large group of individuals or objects. - Instead of examining the entire group, called the population, which may be difficult or impossible to do, we may examine only a small part of this population, which is called a sample. - The process of obtaining samples is called sampling. [Figure: sampling from a population to obtain a sample]
  • 37. Statistical Inference - We do this with the aim of inferring certain facts about the population from results found in the sample, a process known as statistical inference.
  • 38. Sampling With and Without Replacement - A population may be finite or infinite. - If it is finite, the sampling method is important. - If we draw an object from an urn, we can choose whether or not to put it back into the urn before drawing again. - Sampling with replacement: sampling in which each member of the population may be chosen more than once. - Sampling without replacement: sampling in which each member can be chosen at most once. - A finite population that is sampled with replacement can theoretically be considered infinite, since samples of any size can be drawn without exhausting the population.
  • 39. Random Samples, Random Numbers - Clearly, the reliability of conclusions drawn concerning a population depends on whether the sample is properly chosen so as to represent the population sufficiently well, and one of the important problems of statistical inference is just how to choose a sample. - One way to do this for finite populations is to make sure that each member of the population has the same chance of being in the sample, which is then often called a random sample. - Random sampling can be accomplished for relatively small populations by drawing lots or, equivalently, by using a table of random numbers specially constructed for such purposes. - Because inference from sample to population cannot be certain, we must use the language of probability in any statement of conclusions.
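A pseudo-random number generator plays the role of the random-number table. The sketch below uses Python's standard `random` module on a hypothetical population of 100 labelled members: `sample` draws without replacement and `choices` draws with replacement. The seed is arbitrary and only makes the draw reproducible.

```python
import random

population = list(range(1, 101))   # hypothetical finite population of 100 labelled members
rng = random.Random(2024)          # seeded so the draw is reproducible

without_repl = rng.sample(population, k=10)   # each member appears at most once
with_repl = rng.choices(population, k=10)     # members may repeat

# Without replacement, k cannot exceed the population size (sample() raises ValueError);
# with replacement any k works, which is why such a population behaves as if infinite.
big_with_repl = rng.choices(population, k=500)

print(without_repl)
print(with_repl)
```

In both calls every member has the same chance of selection on each draw, which is the defining property of a random sample described above.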
  • 41. Population Parameters - A population is considered to be known when we know the probability distribution f (x) (probability function or density function) of the associated random variable X. - If X is a random variable whose values are the heights (or weights) of the 12,000 students, then X has a probability distribution f (x). - If, for example, X is normally distributed, we say that the population is normally distributed or that we have a normal population. Similarly, if X is binomially distributed, we say that the population is binomially distributed or that we have a binomial population. - There will be certain quantities that appear in f(x), such as µ and σ in the case of the normal distribution or p in the case of the binomial distribution. - Other quantities such as the median, moments, and skewness can then be determined in terms of these.
  • 42. Population Parameters - All such quantities are often called population parameters. - When we are given the population so that we know f(x), then the population parameters are also known. - An important problem arises when the probability distribution f(x) of the population is not known precisely, although we may have some idea of, or at least be able to make some hypothesis concerning, the general behavior of f(x). - For example, we may have some reason to suppose that a particular population is normally distributed. - In that case we may not know one or both of the values µ and σ, and so we might wish to draw statistical inferences about them.
  • 44. Sample Statistics - We can take random samples from the population and then use these samples to obtain values that serve to estimate and test hypotheses about the population parameters.
  • 46. Sample statistics / Population parameters • We distinguish between summaries of samples (statistics) and summaries of populations (parameters). • It is common to denote statistics by Roman letters and parameters by Greek letters:
  • 47. Sample Statistics - In general, corresponding to each population parameter there will be a statistic to be computed from the sample. - Usually the method for obtaining this statistic from the sample is similar to that for obtaining the parameter from a finite population, since a sample consists of a finite set of values. - As we shall see, however, this may not always produce the “best estimate,” and one of the important problems of sampling theory is to decide how to form the proper sample statistic that will best estimate a given population parameter. - Where possible we shall use Greek letters, such as µ and σ, for values of population parameters, and Roman letters, such as m and s, for values of the corresponding sample statistics.
  • 48. Sampling Distribution - As we have seen, a sample statistic that is computed from X1, . . . , Xn is a function of these random variables and is therefore itself a random variable. - The probability distribution of a sample statistic is often called the sampling distribution of the statistic. - Alternatively we can consider all possible samples of size n that can be drawn from the population, and for each sample we compute the statistic. - In this manner we obtain the distribution of the statistic, which is its sampling distribution. - For a sampling distribution, we can of course compute a mean, variance, standard deviation, moments, etc. - The standard deviation of a sampling distribution is often called the standard error.
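The "all possible samples" idea can be approximated by simulation: draw many samples of size n, compute the statistic for each, and study the resulting values. The sketch below does this for the sample mean of a hypothetical normal population (µ = 50, σ = 10, n = 25). It compares the spread of the simulated means with σ/√n, the standard error of the mean for an infinite population or sampling with replacement (a standard result, stated here as background rather than taken from this slide).

```python
import random
import statistics


def sampling_distribution_of_mean(mu, sigma, n, reps, seed=0):
    # Draw `reps` samples of size n from N(mu, sigma) and return their sample means
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


means = sampling_distribution_of_mean(mu=50, sigma=10, n=25, reps=20000)
print(statistics.fmean(means))   # close to mu = 50
print(statistics.stdev(means))   # close to the standard error sigma / sqrt(n) = 2.0
```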
  • 53. Sampling Distribution of Proportions - Suppose that a population is infinite and binomially distributed, with p and q = 1 − p being the respective probabilities that any given member exhibits or does not exhibit a certain property. - Consider all possible samples of size n drawn from this population, and for each sample determine the statistic that is the proportion P of successes. - In the case of the coin, P would be the proportion of heads turning up in n tosses. Then we obtain a sampling distribution of proportions whose mean $\mu_P$ and standard deviation $\sigma_P$ are given by $\mu_P = p, \quad \sigma_P = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$ (9) - For finite populations of size N in which sampling is without replacement, the second equation in (9) is replaced by $\sigma_P = \sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}}$, as given by (6) with $\sigma = \sqrt{pq}$.
  • 55. Example 6. Find the probability that in 120 tosses of a fair coin (a) between 40% and 60% will be heads, (b) or more will be heads.
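Part (a) can be sketched with the normal approximation to the sampling distribution of proportions, using µ_P = p and σ_P = √(pq/n) from (9) and a continuity correction of 1/(2n). The exact binomial sum over 48 to 72 heads (40% and 60% of 120) is included as a cross-check. Part (b) is not attempted because its threshold did not survive in the text. Helper names are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_sd(p, n, N=None):
    # sigma_P = sqrt(p q / n); multiply by sqrt((N - n)/(N - 1)) for a finite population without replacement
    sd = math.sqrt(p * (1 - p) / n)
    if N is not None:
        sd *= math.sqrt((N - n) / (N - 1))
    return sd


n, p = 120, 0.5
sd = proportion_sd(p, n)
half = 1 / (2 * n)   # continuity correction: half a toss, expressed as a proportion
dist = NormalDist(p, sd)
approx = dist.cdf(0.60 + half) - dist.cdf(0.40 - half)

# Exact binomial check: 40%..60% of 120 tosses is 48..72 heads inclusive
exact = sum(math.comb(n, k) for k in range(48, 73)) / 2 ** n
print(round(approx, 4), round(exact, 4))
```

Both values come out close to 0.98, so between 40% and 60% heads is very likely in 120 fair tosses.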
  • 57. Sampling Distribution of Differences and Sums
  • 58. Sampling Distribution of Differences and Sums
  • 59. Example 7. The electric light bulbs of manufacturer A have a mean lifetime of 1400 hours with a standard deviation of 200 hours, while those of manufacturer B have a mean lifetime of 1200 hours with a standard deviation of 100 hours. If random samples of 125 bulbs of each brand are tested, what is the probability that the brand A bulbs will have a mean lifetime that is at least (a) 160 hours, (b) 250 hours more than the brand B bulbs?
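A sketch of Example 7, assuming the standard result for the difference of two independent sample means: mean µ_A − µ_B and standard deviation √(σ_A²/n_A + σ_B²/n_B). Here that gives 200 hours and √(320 + 80) = 20 hours, and the probabilities follow from the normal approximation.

```python
import math
from statistics import NormalDist

mu_a, sd_a, mu_b, sd_b, n = 1400, 200, 1200, 100, 125
mu_diff = mu_a - mu_b                                # 200 hours
se_diff = math.sqrt(sd_a ** 2 / n + sd_b ** 2 / n)   # sqrt(320 + 80) = 20 hours
diff = NormalDist(mu_diff, se_diff)

p_a = 1 - diff.cdf(160)   # (a) A exceeds B by at least 160 h: z = (160 - 200)/20 = -2
p_b = 1 - diff.cdf(250)   # (b) A exceeds B by at least 250 h: z = (250 - 200)/20 = 2.5
print(round(p_a, 4), round(p_b, 4))
```

So a difference of at least 160 hours is very likely (about 0.977), while one of at least 250 hours is rare (about 0.006).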
  • 61. Example 8. Ball bearings of a given brand weigh 0.50 oz with a standard deviation of 0.02 oz. What is the probability that two lots, of 1000 ball bearings each, will differ in weight by more than 2 oz?
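One way to set up Example 8 is to work with the two lot means. A 2 oz difference in lot totals is a 2/1000 = 0.002 oz difference in means, and the difference of two independent means has mean 0 and standard deviation √(2σ²/n). The sketch below uses the normal approximation without a continuity correction, since weights are continuous.

```python
import math
from statistics import NormalDist

sigma, n = 0.02, 1000
# Difference of the two lot means: mean 0, SD sqrt(sigma^2/n + sigma^2/n)
se_diff = math.sqrt(2 * sigma ** 2 / n)
# A 2 oz difference in lot totals corresponds to 2 / n = 0.002 oz in lot means
threshold = 2 / n
z = threshold / se_diff
p = 2 * (1 - NormalDist().cdf(z))   # two-sided: either lot may be the heavier
print(round(z, 2), round(p, 4))
```

This gives z ≈ 2.24 and a probability of roughly 0.025.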
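  • 65. (Continuous) Chi Square Distribution: a special gamma distribution with α = r/2, β = 2 (r degrees of freedom) • PDF: $f(x) = \frac{1}{\Gamma(r/2)\,2^{r/2}}\, x^{r/2 - 1} e^{-x/2}, \quad x > 0$ • $E(X) = \alpha\beta = r$ • $\mathrm{Var}(X) = \alpha\beta^2 = 2r$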
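The "special gamma" description can be checked by simulation. Python's `random.gammavariate(alpha, beta)` uses β as a scale parameter, so `gammavariate(r/2, 2)` should behave like χ²(r). A sum of r squared standard normal draws should match as well; that is the standard construction of the chi-square distribution, included here as an assumption-labelled cross-check. r = 5 is an arbitrary choice.

```python
import math
import random
import statistics


def chi2_pdf(x, r):
    # Gamma density with alpha = r/2, beta = 2; defined here for x > 0
    if x <= 0:
        return 0.0
    return x ** (r / 2 - 1) * math.exp(-x / 2) / (2 ** (r / 2) * math.gamma(r / 2))


rng = random.Random(1)
r, reps = 5, 20000
via_gamma = [rng.gammavariate(r / 2, 2) for _ in range(reps)]
via_normals = [sum(rng.gauss(0, 1) ** 2 for _ in range(r)) for _ in range(reps)]
for draws in (via_gamma, via_normals):
    # Sample mean and variance should be near E(X) = r = 5 and Var(X) = 2r = 10
    print(round(statistics.fmean(draws), 2), round(statistics.variance(draws), 2))
```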
  • 66. (Continuous) Chi Square Distribution
  • 67. (Continuous) Chi Square Distribution
  • 68. Case Where Population Variance Is Unknown
  • 69. (Continuous) Student t Distribution
  • 70. (Continuous) Student t Distribution
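The formulas on these slides did not survive extraction. When σ is unknown, the usual statistic is T = (X̄ − µ)/(S/√n), which follows a Student t distribution with n − 1 degrees of freedom for a normal population. The sketch below computes it for the ideology data from slide 16, with a hypothesized µ0 = 4 chosen purely for illustration.

```python
import math


def t_statistic(ys, mu0):
    # T = (ybar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
    n = len(ys)
    ybar = sum(ys) / n
    s = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return (ybar - mu0) / (s / math.sqrt(n)), n - 1


t, df = t_statistic([2, 3, 7, 5, 6, 7, 5, 6, 4], mu0=4)   # mu0 = 4 is hypothetical
print(round(t, 3), df)
```

The only change from the z statistic is that the sample standard deviation s replaces σ, which is why the t distribution has heavier tails than the normal.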
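  • 73. (Continuous) F Distribution ⊙ Effect of the degrees of freedom. For α, let $f_\alpha(m, n)$ denote the 100(1 − α)th percentile of F(m, n): (1) $P(X \ge f_\alpha(m, n)) = \alpha$ (2) $P(f_{1-\alpha/2}(m, n) \le X \le f_{\alpha/2}(m, n)) = 1 - \alpha$ (3) $X \sim F(m, n) \Rightarrow 1/X \sim F(n, m)$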
  • 75. Sampling Distribution of Ratios of Variances
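The slide body is missing. The standard result is that for independent samples of sizes n1 and n2 from normal populations, F = (S1²/σ1²)/(S2²/σ2²) follows F(n1 − 1, n2 − 1). The simulation below checks this loosely against the known mean of F(m, n), which is n/(n − 2) for n > 2. The sample sizes and σ values are hypothetical.

```python
import random
import statistics


def _sample_var(xs):
    # Unbiased sample variance (n - 1 divisor)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio_draws(n1, n2, sigma1, sigma2, reps, seed=0):
    # F = (S1^2 / sigma1^2) / (S2^2 / sigma2^2) ~ F(n1 - 1, n2 - 1) for normal populations
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        s1 = _sample_var([rng.gauss(0, sigma1) for _ in range(n1)])
        s2 = _sample_var([rng.gauss(0, sigma2) for _ in range(n2)])
        out.append((s1 / sigma1 ** 2) / (s2 / sigma2 ** 2))
    return out


draws = variance_ratio_draws(n1=10, n2=12, sigma1=3.0, sigma2=5.0, reps=20000)
d2 = 12 - 1
print(statistics.fmean(draws), d2 / (d2 - 2))   # simulated vs. theoretical mean of F(9, 11)
```

Dividing each sample variance by its own population variance is what makes the ratio's distribution free of σ1 and σ2, so the same F table applies whatever the populations' spreads are.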

Editor's Notes

  1. Draw on board, perhaps add U-shaped
  2. Due to John Tukey, EDA, 1977
  3. Mean = center of gravity of data. Mode another measure, appropriate for all scales
  4. Draw picture showing insensitivity of range to shape
  5. Draw picture