Chi-Square
Dr Mahmoud Alhussami
Types of Statistical Tests
When running a t test or ANOVA
 We compare:
   Mean differences between groups
 We assume:
   random sampling
   the groups are homogeneous
   the distribution is normal
   samples are large enough to represent the population (>30)
   DV data are measured on an interval or ratio scale
These are Parametric tests!
Types of Tests
When the assumptions are violated:
 Subjects were not randomly sampled
 DV Data:
   Ordinal (ranked; e.g. Likert-scale responses)
   Nominal (categorized: types of car, levels of education, learning styles)
   The scores are greatly skewed or we have no knowledge of the distribution
We use tests that are equivalent to the t test and ANOVA:
Non-Parametric Tests!
Requirements for Chi-Square Test
 Must be a random sample from the population
 Data must be in raw frequencies (actual counts, not percentages)
 Variables must be independent
 Observations must be independent
 A sufficiently large sample size is required (at least 20)
 Does not prove causality
Different Scales, Different Measures of Association

   Scale of Both Variables   Measure of Association
   Nominal Scale             Pearson Chi-Square: χ²
   Ordinal Scale             Spearman's rho
   Interval or Ratio Scale   Pearson r
Chi Square
 Used when data are nominal (both IV and DV)
   Compares frequencies of distributions occurring in different categories or groups
   Tests whether group distributions are different
     • Shoppers' preference for the taste of 3 brands of candy
   Determines the association between IV and DV by counting the frequencies of distribution
     • Gender relative to study preference (alone or in a group)
Important
 The chi square test can only be used on data that have the following characteristics:
   The data must be in the form of frequencies
   The frequency data must have a precise numerical value and must be organised into categories or groups
   The expected frequency in any one cell of the table must be greater than 5
   The total number of observations must be greater than 20
Formula

   χ² = Σ (O – E)² / E

χ² = the value of chi square
O = the observed frequency
E = the expected frequency
Σ = the value of (O – E)²/E is computed for every cell, then all the values are added together
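As a minimal sketch of the formula (the observed and expected values below are made up for illustration, not data from this lecture):

```python
# Illustrative sketch of the chi-square formula; the observed and
# expected values are made-up numbers, not data from the slides.
observed = [10, 20, 30]
expected = [15, 15, 30]

# Each cell contributes (O - E)^2 / E; chi-square is the sum over cells.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 3))  # -> 3.333
```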
What is it?
 A test of proportions
 A non-parametric test
 Dichotomous variables are used
 Tests the association between two factors
   e.g. treatment and disease
        gender and mortality
Types of Chi-Square Analysis Techniques
 Test of Independence: a chi-square technique used to determine whether two characteristics (such as food spoilage and refrigeration temperature) are related or independent.
 Goodness-of-fit test: a chi-square technique used to compare observed proportions or frequencies across groupings (or classifications) of categorical data with a distribution whose expected frequencies are known.
Chi Square Test of Independence
 Purpose
   To determine whether two variables of interest are independent (not related) or related (dependent).
   When the variables are independent, we are saying that knowledge of one gives us no information about the other variable. When they are dependent, we are saying that knowledge of one variable is helpful in predicting the value of the other variable.
   The chi-square test of independence is a test of the influence or impact that a subject's value on one variable has on the same subject's value for a second variable.
   Some examples where one might use the chi-square test of independence:
     • Is level of education related to level of income?
     • Is the level of price related to the level of quality in production?
 Hypotheses
   The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts.
     • H0: X and Y are independent
     • H1: X and Y are dependent
Chi Square Test of Independence
 Wording of Research Questions
   Are X and Y independent?
   Are X and Y related?
   The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.
 Level of Measurement
   Both X and Y are categorical
Assumptions: Chi Square Test of Independence
 Each subject contributes data to only one cell
 Finite values
   Observations must be grouped in categories. No assumption is made about level of data; nominal, ordinal, or interval data may be used with chi-square tests.
 A sufficiently large sample size
   In general N > 20.
   There is no one accepted cutoff; the general rules are:
     • No cells with observed frequency = 0
     • No cells with expected frequency < 5
     • Applying chi-square to small samples exposes the researcher to an unacceptable rate of Type II errors.
   Note: chi-square must be calculated on actual count data, not substituted percentages, which would have the effect of pretending the sample size is 100.
Chi Square Test of Goodness of Fit
 Purpose
   To determine whether an observed frequency distribution departs significantly from a hypothesized frequency distribution.
   This test is sometimes called a One-sample Chi Square Test.
 Hypotheses
   The null hypothesis is that the variable follows the hypothesized distribution. This will be true if the observed counts in the sample are similar to the expected counts.
     • H0: X follows the hypothesized distribution
     • H1: X deviates from the hypothesized distribution
Chi Square Test of Goodness of Fit
 Sample Research Questions
   Do students buy more Coke, Gatorade, or Coffee?
   Does my sample contain a disproportionate number of Hispanics as compared to the population of the county from which they were sampled?
   Has the ethnic composition of the city of Amman changed since 1990?
 Level of Measurement
   X is categorical
Assumptions: Chi Square Test of Goodness of Fit
 The research question involves the comparison of the observed frequency of one categorical variable within a sample to the expected frequency of that variable.
 The observed and theoretical distributions must contain the same divisions (i.e. 'levels' or 'classes').
 The expected frequency in each division must be > 5.
 There must be a sufficient sample (in general N > 20).
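A one-sample (goodness-of-fit) chi-square can be sketched in plain Python. The counts below are hypothetical (e.g. 60 students' drink choices), and the p-value uses the fact that for df = 2 the chi-square survival function is exactly exp(−χ²/2); in practice scipy.stats.chisquare does the same in one call:

```python
from math import exp

# Hypothetical goodness-of-fit example: 60 students' drink choices,
# tested against a null hypothesis of equal preference (20 each).
observed = [30, 18, 12]            # Coke, Gatorade, Coffee
expected = [sum(observed) / len(observed)] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # number of categories minus 1

# For df = 2 the chi-square survival function is exactly exp(-x / 2).
p_value = exp(-chi_square / 2)
print(round(chi_square, 2), round(p_value, 3))  # -> 8.4 0.015
```

Since p < .05, this hypothetical sample would lead us to reject equal preference.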
Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the computed test statistic against a tabled/critical value
1. Determine the Appropriate Test
 Chi Square is used when both variables are measured on a nominal scale.
 It can be applied to interval or ratio data that have been categorized into a small number of groups.
 It assumes that the observations are randomly sampled from the population.
 All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
 It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
2. Establish the Level of Significance
 α is a predetermined value
 The convention:
   • α = .05
   • α = .01
   • α = .001
3. Determine the Hypothesis: Whether There Is an Association or Not
 Ho: The two variables are independent
 Ha: The two variables are associated
4. Calculate the Test Statistic
 Contrast the observed frequencies in each cell of a contingency table with the expected frequencies.
 The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated).
 The expected frequency of two unrelated events is the product of the row and column frequency divided by the number of cases:

   Fe = (Fr × Fc) / N

   Expected frequency = (row total × column total) / grand total
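A minimal sketch of the Fe formula, applied to the 2×3 gun-control table used in the worked example of this lecture:

```python
# Expected cell frequencies Fe = (row total x column total) / N,
# using the 2x3 gun-control table from the lecture's worked example.
observed = [
    [10, 10, 30],   # Democrat:   Favor, Neutral, Oppose
    [15, 15, 10],   # Republican: Favor, Neutral, Oppose
]
row_totals = [sum(row) for row in observed]        # [50, 40]
col_totals = [sum(col) for col in zip(*observed)]  # [25, 25, 40]
n = sum(row_totals)                                # 90

expected = [[r * c / n for c in col_totals] for r in row_totals]
for row in expected:
    print([round(fe, 2) for fe in row])
# -> [13.89, 13.89, 22.22] then [11.11, 11.11, 17.78]
```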
4. Calculate the Test Statistic

   χ² = Σ (Fo − Fe)² / Fe

where Fo = the observed frequencies and Fe = the expected frequencies.
5. Determine the Degrees of Freedom

   df = (R − 1)(C − 1)

where R = the number of levels in the row variable and C = the number of levels in the column variable.
6. Compare the Computed Test Statistic Against a Tabled/Critical Value
 The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable.
 The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
 If the calculated χ² is greater than the χ² table value, reject Ho.
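Step 6 can be sketched with a small excerpt of the standard chi-square table (the α = .05 critical values below are standard tabled values; the test statistic and df come from the gun-control example later in this lecture):

```python
# Excerpt of the standard chi-square table at alpha = .05.
critical_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Values from the gun-control worked example in this lecture.
chi_square, df = 11.03, 2

reject_h0 = chi_square > critical_05[df]
print(reject_h0)  # -> True
```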
Decision and Interpretation
   If the probability of the test statistic is less than
    or equal to the probability of the alpha error rate,
    we reject the null hypothesis and conclude that
    our data supports the research hypothesis. We
    conclude that there is a relationship between the
    variables.
   If the probability of the test statistic is greater
    than the probability of the alpha error rate, we
    fail to reject the null hypothesis. We conclude
    that there is no relationship between the
    variables, i.e. they are independent.
Example
 Suppose a researcher is interested in
  voting preferences on gun control issues.
 A questionnaire was developed and sent
  to a random sample of 90 voters.
 The researcher also collects information
  about the political party membership of the
  sample of 90 respondents.
Bivariate Frequency Table or Contingency Table
(cell entries are observed frequencies; the right margin gives row totals and the bottom row gives column totals)

                Favor   Neutral   Oppose    f row
 Democrat         10       10       30        50
 Republican       15       15       10        40
 f column         25       25       40     n = 90
1. Determine the Appropriate Test
 1. Party Membership (2 levels), nominal
 2. Voting Preference (3 levels), nominal
2. Establish the Level of Significance
 Alpha of .05
3. Determine the Hypothesis
 • Ho: There is no difference between D & R in their opinion on the gun control issue.
 • Ha: There is an association between responses to the gun control survey and party membership in the population.
4. Calculate the Test Statistic

                 Favor       Neutral     Oppose      f row
 Democrat        fo = 10     fo = 10     fo = 30       50
                 fe = 13.9   fe = 13.9   fe = 22.2
 Republican      fo = 15     fo = 15     fo = 10       40
                 fe = 11.1   fe = 11.1   fe = 17.8
 f column          25          25          40       n = 90

Each expected frequency is (row total × column total) / grand total, e.g. the Democrat/Favor cell: 50 × 25 / 90 = 13.9, and the Republican/Favor cell: 40 × 25 / 90 = 11.1.
4. Calculate the Test Statistic

 χ² = (10 − 13.89)²/13.89 + (10 − 13.89)²/13.89 + (30 − 22.22)²/22.22
    + (15 − 11.11)²/11.11 + (15 − 11.11)²/11.11 + (10 − 17.78)²/17.78

    = 11.03
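The whole calculation can be reproduced end to end in plain Python (a stdlib-only sketch; scipy.stats.chi2_contingency would give the same answer in one call). For df = 2 the p-value has the closed form exp(−χ²/2), which matches the SPSS output shown below:

```python
from math import exp

observed = [[10, 10, 30], [15, 15, 10]]
row = [sum(r) for r in observed]            # [50, 40]
col = [sum(c) for c in zip(*observed)]      # [25, 25, 40]
n = sum(row)                                # 90

# Sum (Fo - Fe)^2 / Fe over all six cells, with Fe = row x col / n.
chi_square = sum(
    (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for i in range(len(row))
    for j in range(len(col))
)
df = (len(row) - 1) * (len(col) - 1)        # (2-1)(3-1) = 2
p_value = exp(-chi_square / 2)              # exact survival function for df = 2

print(round(chi_square, 3), df, round(p_value, 3))  # -> 11.025 2 0.004
```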
5. Determine the Degrees of Freedom

   df = (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2
6. Compare the Computed Test Statistic Against a Tabled/Critical Value
 α = 0.05
 df = 2
 Critical tabled value = 5.991
 The test statistic, 11.03, exceeds the critical value
 The null hypothesis is rejected
 Democrats & Republicans differ significantly in their opinions on gun control issues
SPSS Output for Gun Control Example

Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)
 Pearson Chi-Square           11.025a    2          .004
 Likelihood Ratio             11.365     2          .003
 Linear-by-Linear Association  8.722     1          .003
 N of Valid Cases                 90
 a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
Additional Information in SPSS Output
 Exceptions that might distort χ² (assumption violations):
   Associations in some but not all categories
   Low expected frequency per cell
 Extent of association is not the same as statistical significance

Demonstrated through an example
Another Example: Heparin Lock Placement

Complication Incidence * Heparin Lock Placement Time Group Crosstabulation
(Time group: 1 = 72 hrs, 2 = 96 hrs)

                                                       Group 1   Group 2    Total
 Had Complication      Count                                9        11        20
                       Expected Count                    10.0      10.0      20.0
                       % within Placement Time Group    18.0%     22.0%     20.0%
 Had NO Complication   Count                               41        39        80
                       Expected Count                    40.0      40.0      80.0
                       % within Placement Time Group    82.0%     78.0%     80.0%
 Total                 Count                               50        50       100
                       Expected Count                    50.0      50.0     100.0
                       % within Placement Time Group   100.0%    100.0%    100.0%

from Polit text: Table 8-1
Hypotheses in Heparin Lock Placement

 Ho: There is no association between
  complication incidence and length of
  heparin lock placement. (The variables are
  independent).
 Ha: There is an association between
  complication incidence and length of
  heparin lock placement. (The variables are
  related).
More of SPSS Output

Chi-Square Tests
                               Value   df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
 Pearson Chi-Square            .250b    1      .617
 Continuity Correctiona        .063     1      .803
 Likelihood Ratio              .250     1      .617
 Fisher's Exact Test                                         .803         .402
 Linear-by-Linear Association  .248     1      .619
 N of Valid Cases               100
 a. Computed only for a 2x2 table
 b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.
Pearson Chi-Square
 Pearson Chi-Square = .250, p = .617
 Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.
 A continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
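Both SPSS values can be reproduced for this 2×2 table in plain Python (a sketch: the continuity correction subtracts 0.5 from each |Fo − Fe| before squaring, and for df = 1 the chi-square p-value is erfc(√(χ²/2))):

```python
from math import erfc, sqrt

observed = [[9, 11], [41, 39]]              # heparin-lock crosstab counts
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
n = sum(row)

def chi2_2x2(correction):
    """Pearson chi-square for a 2x2 table, optionally Yates-corrected."""
    total = 0.0
    for i in range(2):
        for j in range(2):
            fe = row[i] * col[j] / n
            diff = abs(observed[i][j] - fe)
            if correction:                  # Yates continuity correction
                diff -= 0.5
            total += diff ** 2 / fe
    return total

for corrected in (False, True):
    chi_square = chi2_2x2(corrected)
    p_value = erfc(sqrt(chi_square / 2))    # survival function for df = 1
    print(round(chi_square, 4), round(p_value, 3))
# -> 0.25 0.617 (Pearson) then 0.0625 0.803 (continuity-corrected)
```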
More SPSS Output

Symmetric Measures
                                              Value   Asymp. Std. Errora   Approx. Tb   Approx. Sig.
 Nominal by Nominal     Phi                   -.050                                         .617
                        Cramer's V             .050                                         .617
 Interval by Interval   Pearson's R           -.050          .100            -.496          .621c
 Ordinal by Ordinal     Spearman Correlation  -.050          .100            -.496          .621c
 N of Valid Cases                               100
 a. Not assuming the null hypothesis.
 b. Using the asymptotic standard error assuming the null hypothesis.
 c. Based on normal approximation.
Phi Coefficient
 The Pearson Chi-Square provides information about the existence of a relationship between 2 nominal variables, but not about the magnitude of the relationship.
 The phi coefficient is a measure of the strength of the association:

   φ = √(χ² / N)
Cramer's V
 When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramer's V.
 If Cramer's V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable:

   V = √(χ² / (N(k − 1)))

where N = the number of cases and k = the smaller of the number of rows or columns.
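Both effect-size formulas can be sketched with the heparin-lock values from the SPSS output above (χ² = .250, N = 100; for a 2×2 table k = 2, so V reduces to |φ|):

```python
from math import sqrt

chi_square, n = 0.250, 100   # from the heparin-lock SPSS output above
k = 2                        # smaller of the number of rows and columns

phi = sqrt(chi_square / n)
cramers_v = sqrt(chi_square / (n * (k - 1)))
print(round(phi, 3), round(cramers_v, 3))  # -> 0.05 0.05
```

SPSS reports phi as −.050 rather than +.050 because, for a 2×2 table, it attaches the sign (direction) of the association; the magnitude is the same.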
Interpreting Cell Differences in
     a Chi-square Test - 1


                      A chi-square test of
                      independence of the
                      relationship between sex
                      and marital status finds a
                      statistically significant
                      relationship between the
                      variables.
Interpreting Cell Differences in
     a Chi-square Test - 2




Researchers often try to identify which cell or cells are
      the major contributors to the significant chi-square test by examining
      the pattern of column percentages.

     Based on the column percentages, we would identify cells on the
     married row and the widowed row as the ones producing the
     significant result because they show the largest differences: 8.2% on
     the married row (50.9%-42.7%) and 9.0% on the widowed row
     (13.1%-4.1%)
Interpreting Cell Differences in
     a Chi-square Test - 3




      Using a level of significance of 0.05, the critical value for a
      standardized residual would be -1.96 and +1.96. Using
      standardized residuals, we would find that only the cells on the
      widowed row are the significant contributors to the chi-square
      relationship between sex and marital status.

      If we interpreted the contribution of the married marital status,
      we would be mistaken. Basing the interpretation on column
      percentages can be misleading.
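The marital-status counts behind the slide's table are not reproduced in this text, so as a hedged sketch the standardized residuals (O − E)/√E are computed for the gun-control table from earlier in this lecture, flagged against the ±1.96 cutoff:

```python
from math import sqrt

observed = [[10, 10, 30], [15, 15, 10]]     # gun-control table from earlier
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
n = sum(row)

for i, cells in enumerate(observed):
    for j, o in enumerate(cells):
        fe = row[i] * col[j] / n
        std_res = (o - fe) / sqrt(fe)       # standardized residual per cell
        flag = "significant" if abs(std_res) > 1.96 else ""
        print(f"cell ({i},{j}): {std_res:+.2f} {flag}")
```

In this particular table no single cell exceeds ±1.96 even though the overall test is significant, which illustrates the slide's point: cell-level conclusions need the residuals, not just the omnibus χ², and standardized residuals are a conservative follow-up.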
Chi-Square Test of Independence: Post Hoc Test in SPSS (1)

 You can conduct a chi-square test of independence in crosstabulation in SPSS by selecting:

 Analyze > Descriptive Statistics > Crosstabs…
Chi-Square Test of Independence: Post Hoc Test in SPSS (2)

 First, select and move the variables for the question to the "Row(s):" and "Column(s):" list boxes.

 The variable mentioned first in the problem, sex, is used as the independent variable and is moved to the "Column(s):" list box.

 The variable mentioned second in the problem, [fund], is used as the dependent variable and is moved to the "Row(s):" list box.

 Second, click on the "Statistics…" button to request the test statistic.
Chi-Square Test of Independence: Post Hoc Test in SPSS (3)

 First, click on "Chi-square" to request the chi-square test of independence.
 Second, click on the "Continue" button to close the Statistics dialog box.
Chi-Square Test of Independence: Post Hoc Test in SPSS (6)

 In the Chi-Square Tests results table, SPSS also tells us that "0 cells have expected count less than 5 and the minimum expected count is 70.63".

 The sample size requirement for the chi-square test of independence is satisfied.
Chi-Square Test of Independence: Post Hoc Test in SPSS (7)

 The probability of the chi-square test statistic (chi-square = 2.821) was p = 0.244, greater than the alpha level of significance of 0.05. The null hypothesis that differences in "degree of religious fundamentalism" are independent of differences in "sex" is not rejected.

 The research hypothesis that differences in "degree of religious fundamentalism" are related to differences in "sex" is not supported by this analysis.

 Thus, the answer to this question is False. We do not interpret cell differences unless the chi-square test statistic supports the research hypothesis.
SPSS Analysis
       Chi Square Test of Goodness
                  of Fit
   Analyze –
    Nonparametric Tests –
    Chi Square
   The X variable goes in the test variable list.
   If all expected frequencies are the same,
    select the option for equal expected
    frequencies.
   If the expected frequencies are not all the
    same, enter the expected value for each
    division.
Examples
     (using the Montana.sav data)

   One example where all expected
    frequencies are the same, and one where
    they are not.
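These menu steps can be mirrored in code. A minimal scipy sketch of the goodness-of-fit test, covering both the equal-expected and specified-expected cases (the counts are hypothetical, not the Montana.sav data):

```python
from scipy.stats import chisquare

observed = [20, 30, 25, 25]  # hypothetical counts across 4 categories, n = 100

# Case 1: all expected frequencies the same (default is n/k per category)
stat_eq, p_eq = chisquare(observed)
print(f"equal expected:     chi-square = {stat_eq:.2f}, p = {p_eq:.3f}")

# Case 2: expected frequencies entered explicitly (must sum to n)
expected = [10, 40, 30, 20]
stat_ne, p_ne = chisquare(observed, f_exp=expected)
print(f"specified expected: chi-square = {stat_ne:.2f}, p = {p_ne:.3f}")
```

As in SPSS, each test compares the observed distribution against its hypothesized distribution; a small p-value indicates the sample deviates from that distribution.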
Types of Statistical Inference
   Parameter estimation:
    
        It is used to estimate a population value, such as a
        mean, a relative risk index, or a mean difference
        between two groups.
       Estimation can take two forms:
         • Point estimation: involves calculating a single statistic to
           estimate the parameter, e.g. the mean or median.
                Disadvantage: a point estimate offers no context for
                 interpreting its accuracy and gives no information regarding
                 the probability that it is correct or close to the population value.
         • Interval estimation: involves estimating a range of values that
           has a high probability of containing the population value.
Interval Estimation
   For example, it is more likely the population
     height mean lies between 165 and 175 cm.
   Interval estimation involves constructing a
    confidence interval (CI) around the point
    estimate.
   The upper and lower limits of the CI are called
    confidence limits.
   A CI around a sample mean communicates a
    range of values for the population value, and the
    probability of being right. That is, the estimate is
    made with a certain degree of confidence of
    capturing the parameter.
Confidence Intervals around a
                   Mean
 95% CI = mean ± (1.96 x SEM)
 This statement indicates that we can be 95% confident that the
  population mean lies between the confidence limits, and that these
  limits are 1.96 times the true standard error above and below the
  sample mean.
 E.g. if the mean = 61 inches and SEM = 1, what is the 95% CI?
       Solution: 95% CI = 61 ± (1.96 x 1)
                  95% CI = 61 ± 1.96
                  95% CI: 59.04 < μ < 62.96
 E.g. if the mean = 61 inches and SEM = 1, what is the 99% CI?
       Solution: 99% CI = 61 ± (2.58 x 1)
                  99% CI = 61 ± 2.58
                  99% CI: 58.42 < μ < 63.58
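The two worked examples above can be verified with a few lines of code; a minimal sketch (1.96 and 2.58 are the conventional z multipliers for 95% and 99% confidence):

```python
def confidence_interval(mean, sem, z):
    """Return the (lower, upper) confidence limits: mean +/- z * SEM."""
    margin = z * sem
    return mean - margin, mean + margin

# 95% CI for mean = 61 inches, SEM = 1
lo, hi = confidence_interval(61, 1, 1.96)
print(f"95% CI: {lo:.2f} < mu < {hi:.2f}")      # 95% CI: 59.04 < mu < 62.96

# 99% CI: a larger multiplier buys more confidence but a wider interval
lo99, hi99 = confidence_interval(61, 1, 2.58)
print(f"99% CI: {lo99:.2f} < mu < {hi99:.2f}")  # 99% CI: 58.42 < mu < 63.58
```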
Data Interpretation
         Considerations:
1.        Accuracy
     1.     take a critical view of the data
     2.     investigate the evidence for the results
     3.     consider other studies’ results
     4.     peripheral data analysis
     5.     conduct power analysis: Type I & Type II errors

                            H0 is True       H0 is False
          Accept H0         Correct          Type-II error
          Reject H0         Type-I error     Correct
Types of Errors
If You……          When the Null Then You
                  Hypothesis is… Have…….
Reject the null   True (there really    Made a Type I
hypothesis        is no difference)     Error
Reject the null   False (there really   ☻
hypothesis        is a difference)

Accept the null   False (there really   Made a Type II
hypothesis        is a difference)      Error
Accept the null   True (there really    ☻
hypothesis        is no difference)
 Alpha (α): the level of significance used for
  establishing the Type-I error rate
 β: the probability of a Type-II error
 1 – β: the probability of obtaining
  significant results (power)
 Effect size: how much of a difference the
  intervention made
2. Meaning of the results:
    - translating the results and making them
     understandable
3. Importance:
     - translating the significant findings into
     practical findings
4. Generalizability:
      - how we can make the findings useful for the
      whole population
5. Implication:
      - what we have learned relative to what was
      done during the study
Needed Parameters
 Alpha--chance of a Type I error
 Beta--chance of a Type II error
 Power = 1 - beta
 Effect size--difference between groups, or
  amount of variance explained, or how
  much relationship there is between the DV
  and the IVs
Remember this in English?
 Type I error is when you say there is a
  difference or relationship and there is not
 Type II error is when you say there is no
  difference or relationship and there really
  is
Which is more important?
 Type I error more important if possibility of
  harm or lethal effect
 Type II error more important in relatively
  unexplored areas of research
 In some studies, Type I and Type II errors
  may be equally important
How to Increase Power
1. Increase the n
2. Decrease the unexplained variance--control by design or statistics
    (e.g. ANCOVA)
3. Increase alpha (controversial)
4. Use a one tailed test (directional hypothesis)--puts the zone of
    rejection all in one tail; same effect as increasing alpha
5. Use parametric statistics as long as you meet the assumptions; if
    not, you must use nonparametric statistics, which are LESS powerful
6. Decrease measurement error (decrease unexplained variance)--use
    more reliable instruments, standardize measurement protocol,
    frequent calibration of physiologic instruments, improve inter-rater
    reliability
What is good power?
By tradition, “good” power is 80%

The correct answer is it depends on the nature of
  the phenomenon and which kind of error is most
  important in your study. This is a theoretical
  argument that you have to make.
Using convention (alpha = .05 and power = .80,
  beta = .20) you are saying that Type I error is
  _________ as serious as a Type II error
Effect Size
How large an effect do I expect exists in the
 population if the null is false?
OR
How much of a difference do I want to be
 able to detect?
The larger the effect, the fewer the cases
 needed to see it. (The difference is so big
 you can trip on it.)
The World According to Power
                    Kraemer & Thiemann
   The more stringent the significance level, the greater the
    necessary sample size. More subjects are needed for a
    1% level than a 5% level
   Two tailed tests require larger sample sizes than one
    tailed tests. Assessing two directions at the same time
    requires a greater investment.
   The smaller the effect size, the larger the necessary
    sample size. Subtle effects require greater efforts.
   The larger the power required, the larger the necessary
    sample size. Greater protection from failure requires
    greater effort.
   The smaller the sample size, the smaller the power, ie
    the greater the chance of failure
The World According to Power
              Kraemer & Thiemann

 If you propose to go with a sample size
  of 20 or fewer, you have to be willing to
  accept a high risk of failure or assume a
  huge effect size
 To achieve 99% power for an effect size of
  .01, you need > 150,000 subjects
Power for each test
 You do a power analysis for each statistic
  you are going to use.
 Choose the sample size based on the
  highest number of subjects from the power
  analyses.
 Use the most conservative power
  analysis--it guarantees you the most
  subjects
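To make the interplay of alpha, power, and effect size concrete, here is a sketch of the standard normal-approximation formula for per-group sample size in a two-sided, two-sample comparison of means, n = 2(z(1-alpha/2) + z(power))^2 / d^2. This is a textbook approximation, not the output of any particular power-analysis package:

```python
import math
from scipy.stats import norm

def n_per_group(alpha, power, d):
    """Approximate per-group n for a two-sided, two-sample comparison
    of means with standardized effect size d (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)  # critical z for the Type I error rate
    z_beta = norm.ppf(power)           # z corresponding to power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# Conventional alpha = .05 and power = .80, medium effect (d = 0.5)
print(n_per_group(0.05, 0.80, 0.5))  # 63 per group

# Subtle effects require much greater effort (d = 0.2)
print(n_per_group(0.05, 0.80, 0.2))  # 393 per group
```

Running the most demanding such calculation for each planned statistic, then recruiting to the largest resulting n, is exactly the "most conservative power analysis" advice above.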

Chi square mahmoud

  • 2. Types of Statistical Tests When running a t test and ANOVA  We compare:  Mean differences between groups  We assume  random sampling  the groups are homogeneous  distribution is normal  samples are large enough to represent population (>30)  DV Data: represented on an interval or ratio scale  These are Parametric tests!
  • 3. Types of Tests When the assumptions are violated:  Subjects were not randomly sampled  DV Data:  Ordinal (ranked)  Nominal (categorized: types of car, levels of education, learning styles, Likert Scale)  The scores are greatly skewed or we have no knowledge of the distribution We use tests that are equivalent to t test and ANOVA Non-Parametric Test!
  • 4. Requirements for Chi- Square test 4  Must be a random sample from population  Data must be in raw frequencies  Variables must be independent  A sufficiently large sample size is required (at least 20)  Actual count data (not percentages)  Observations must be independent.  Does not prove causality.
  • 5. Different Scales, Different Measures of Association Scale of Both Measures of Variables Association Nominal Scale Pearson Chi- Square: χ2 Ordinal Scale Spearman’s rho Interval or Ratio Pearson r Scale
  • 6. Chi Square  Used when data are nominal (both IV and DV)  Comparing frequencies of distributions occurring in different categories or groups  Tests whether group distributions are different • Shoppers’ preference for the taste of 3 brands of candy  determines the association between IV and DV by counting the frequencies of distribution • Gender relative to study preference (alone or in group)
  • 7. Important  The chi square test can only be used on data that has the following characteristics: The data must be in the form of frequencies The frequency data must have a precise numerical value and must be organised into categories or groups. The expected frequency in any one cell of the table must be greater than 5. The total number of observations must be greater than 20.
  • 8. Formula χ 2 = ∑ (O – E)2 E χ2 = The value of chi square O = The observed value E = The expected value ∑ (O – E)2 = all the values of (O – E) squared then added together
  • 9. ?What is it  Test of proportions  Non parametric test  Dichotomous variables are used  Tests the association between two factors e.g. treatment and disease gender and mortality
  • 10. types of chi-square analysis techniques  Tests of Independence is a chi-square technique used to determine whether two characteristics (such as food spoilage and refrigeration temperature) are related or independent.  Goodness-of-fit test is a chi-square test technique used to study similarities between proportions or frequencies between groupings (or classification) of categorical data (comparing a distribution of data with another distribution of data where the expected frequencies are known).
  • 11. Chi Square Test of Independence  Purpose  To determine if two variables of interest independent (not related) or are related (dependent)?  When the variables are independent, we are saying that knowledge of one gives us no information about the other variable. When they are dependent, we are saying that knowledge of one variable is helpful in predicting the value of the other variable.  The chi-square test of independence is a test of the influence or impact that a subject’s value on one variable has on the same subject’s value for a second variable.  Some examples where one might use the chi-squared test of independence are: • Is level of education related to level of income? • Is the level of price related to the level of quality in production?  Hypotheses  The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts. • H0: X and Y are independent • H1: X and Y are dependent
  • 12. Chi Square Test of Independence  Wording of Research questions  Are X and Y independent?  Are X and Y related?  The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.  Level of Measurement  Both X and Y are categorical
  • 13. Assumptions Chi Square Test of Independence  Each subject contributes data to only one cell  Finite values  Observations must be grouped in categories. No assumption is made about level of data. Nominal, ordinal, or interval data may be used with chi-square tests.  A sufficiently large sample size  In general N > 20.  No one accepted cutoff – the general rules are • No cells with observed frequency = 0 • No cells with the expected frequency < 5 • Applying chi-square to small samples exposes the researcher to an unacceptable rate of Type II errors. Note: chi-square must be calculated on actual count data, not substituting percentages, which would have the effect of pretending the sample size is 100.
  • 14. Chi Square Test of Goodness of Fit  Purpose  To determine whether an observed frequency distribution departs significantly from a hypothesized frequency distribution.  This test is sometimes called a One-sample Chi Square Test.  Hypotheses  The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are similar to the expected counts. • H0: X follows the hypothesized distribution • H1: X deviates from the hypothesized distribution
  • 15. Chi Square Test of Goodness of Fit  Sample Research Questions  Do students buy more Coke, Gatorade or Coffee?  Does my sample contain a disproportionate amount of Hispanics as compared to the population of the county from which they were sampled?  Has the ethnic composition of the city of Amman changed since 1990?  Level of Measurement  X is categorical
  • 16. Assumptions Chi Square Test of Goodness of Fit  The research question involves the comparison of the observed frequency of one categorical variable within a sample to the expected frequency of that variable.  The observed and theoretical distributions must contain the same divisions (i.e. ‘levels’ or ‘classes’)  The expected frequency in each division must be >5  There must be a sufficient sample (in general N>20)
  • 17. Steps in Test of Hypothesis 1. Determine the appropriate test 2. Establish the level of significance:α 3. Formulate the statistical hypothesis 4. Calculate the test statistic 5. Determine the degree of freedom 6. Compare computed test statistic against a tabled/critical value
  • 18. Determine Appropriate Test. 1  Chi Square is used when both variables are measured on a nominal scale.  It can be applied to interval or ratio data that have been categorized into a small number of groups.  It assumes that the observations are randomly sampled from the population.  All observations are independent (an individual can appear only once in a table and there are no overlapping categories).  It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
  • 19. Establish Level of. 2 Significance  α is a predetermined value  The convention • α = .05 • α = .01 • α = .001
  • 20. Determine The Hypothesis:. 3 Whether There is an Association or Not  Ho : The two variables are independent  Ha : The two variables are associated
  • 21. Calculating Test Statistics. 4  Contrasts observed frequencies in each cell of a contingency table with expected frequencies.  The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true ( i.e. the nominal variables are unrelated).  Expected frequency of two unrelated events is product of the row and column frequency divided by number of cases. Fe= Fr Fc / N Expected frequency = row total x column total Grand total
  • 22. Calculating Test Statistics. 4  ( Fo − Fe )  2 χ = ∑ 2   Fe 
  • 23. Calculating Test Statistics. 4 O fre bse qu r ve en d c ie s  ( Fo − Fe )  2 χ = ∑ 2   Fe  Ex que fre pe ncy cte d qu ed cy fre pect en Ex
  • 24. Determine Degrees of. 5 f ber o Num column ) df = (R-1)(C-1 Freedom ls in leve ariable v Numb er o le vels in f ro variab w le
  • 25. Compare computed test statistic. 6 against a tabled/critical value  The computed value of the Pearson chi- square statistic is compared with the critical value to determine if the computed value is improbable  The critical tabled values are based on sampling distributions of the Pearson chi-square statistic  If calculated χ2 is greater than χ2 table value, reject Ho
  • 26. Decision and Interpretation  If the probability of the test statistic is less than or equal to the probability of the alpha error rate, we reject the null hypothesis and conclude that our data supports the research hypothesis. We conclude that there is a relationship between the variables.  If the probability of the test statistic is greater than the probability of the alpha error rate, we fail to reject the null hypothesis. We conclude that there is no relationship between the variables, i.e. they are independent.
  • 27. Example  Suppose a researcher is interested in voting preferences on gun control issues.  A questionnaire was developed and sent to a random sample of 90 voters.  The researcher also collects information about the political party membership of the sample of 90 respondents.
  • 28. Bivariate Frequency Table or Contingency Table Favor Neutral Oppose f row Democrat 10 10 30 50 Republican 15 15 10 40 f column 25 25 40 n = 90
  • 29. Bivariate Frequency Table or Contingency Table Favor Neutral Oppose f row Democrat 10 10 30 50 Republican 15 15 10 40 f dcolumn 25 25 40 n = 90 e e r v cie s b s en O qu fre
  • 30. Row frequency Bivariate Frequency Table or Contingency Table Favor Neutral Oppose f row Democrat 10 10 30 50 Republican 15 15 10 40 f column 25 25 40 n = 90
  • 31. Bivariate Frequency Table or Contingency Table Favor Neutral Oppose f row Democrat 10 10 30 50 Republican 15 15 10 40 f column 25 25 40 n = 90 Column frequency
  • 32. Determine Appropriate Test. 1 1. Party Membership ( 2 levels) and Nominal 2. Voting Preference ( 3 levels) and Nominal
  • 33. Establish Level of. 2 Significance Alpha of .05
  • 34. Determine The Hypothesis. 3 • Ho : There is no difference between D & R in their opinion on gun control issue. • Ha : There is an association between responses to the gun control survey and the party membership in the population.
  • 35. Calculating Test Statistics. 4 Favor Neutral Oppose f row Democrat fo =10 fo =10 fo =30 50 fe =13.9 fe =13.9 fe=22.2 Republican fo =15 fo =15 fo =10 40 fe =11.1 fe =11.1 fe =17.8 f column 25 25 40 n = 90
  • 36. Calculating Test Statistics. 4 Favor Neutral Oppose f row = 50*25/90 Democrat fo =10 fo =10 fo =30 50 fe =13.9 fe =13.9 fe=22.2 Republica fo =15 fo =15 fo =10 40 n f =11.1 f =11.1 fe =17.8 e e f column 25 25 40 n = 90
  • 37. Calculating Test Statistics. 4 Favor Neutral Oppose f row Democrat fo =10 fo =10 fo =30 50 fe =13.9 fe =13.9 fe=22.2 = 40* 25/90 Republica fo =15 fo =15 fo =10 40 n f =11.1 f =11.1 fe =17.8 e e f column 25 25 40 n = 90
  • 38. Calculating Test Statistics. 4 (10 − 13.89) 2 (10 − 13.89) 2 (30 − 22.2) 2 χ2 = + + + 13.89 13.89 22.2 (15 − 11.11) 2 (15 − 11.11) 2 (10 − 17.8) 2 + + 11.11 11.11 17.8 = 11.03
  • 39. Determine Degrees of. 5 Freedom df = (R-1)(C-1) = (2-1)(3-1) = 2
  • 40. Compare computed test statistic. 6 against a tabled/critical value  α = 0.05  df = 2  Critical tabled value = 5.991  Test statistic, 11.03, exceeds critical value  Null hypothesis is rejected  Democrats & Republicans differ significantly in their opinions on gun control issues
  • 41. SPSS Output for Gun Control Example Chi-Square Tests Asymp. Sig. Value df (2-sided) Pearson Chi-Square 11.025a 2 .004 Likelihood Ratio 11.365 2 .003 Linear-by-Linear 8.722 1 .003 Association N of Valid Cases 90 a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
• 42. Additional Information in SPSS Output.  Exceptions that might distort the χ² assumptions: associations in some but not all categories; low expected frequency per cell.  Extent of association is not the same as statistical significance.  Demonstrated through an example.
• 43. Another Example: Heparin Lock Placement (from Polit text, Table 8-1). Complication Incidence × Heparin Lock Placement Time Group crosstabulation (Group 1 = 72 hrs, Group 2 = 96 hrs):

                                               Group 1   Group 2   Total
      Had complication      Count                  9        11       20
                            Expected Count       10.0      10.0     20.0
                            % within Time Group  18.0%     22.0%    20.0%
      Had NO complication   Count                 41        39       80
                            Expected Count       40.0      40.0     80.0
                            % within Time Group  82.0%     78.0%    80.0%
      Total                 Count                 50        50      100
                            Expected Count       50.0      50.0    100.0
                            % within Time Group 100.0%    100.0%   100.0%
  • 44. Hypotheses in Heparin Lock Placement  Ho: There is no association between complication incidence and length of heparin lock placement. (The variables are independent).  Ha: There is an association between complication incidence and length of heparin lock placement. (The variables are related).
• 45. More of SPSS Output: Chi-Square Tests

                                   Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                                 (2-sided)     (2-sided)    (1-sided)
      Pearson Chi-Square           .250b     1      .617
      Continuity Correctiona       .063      1      .803
      Likelihood Ratio             .250      1      .617
      Fisher's Exact Test                                         .803         .402
      Linear-by-Linear Association .248      1      .619
      N of Valid Cases              100
      a. Computed only for a 2x2 table
      b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.00.
• 46. Pearson Chi-Square.  Pearson Chi-Square = .250, p = .617. Since p > .05, we fail to reject the null hypothesis that the complication rate is unrelated to heparin lock placement time.  Continuity correction is used in situations in which the expected frequency for any cell in a 2 by 2 table is less than 10.
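Both lines of the SPSS table above can be reproduced with `scipy`; a sketch for the 2×2 heparin data, where the `correction` flag toggles the Yates continuity correction:

```python
from scipy.stats import chi2_contingency

# 2x2 heparin lock table from the slides
# (rows: had complication / no complication; columns: 72 hrs, 96 hrs).
observed = [[9, 11],
            [41, 39]]

# Uncorrected Pearson chi-square (SPSS: .250, p = .617).
chi2_unc, p_unc, dof, expected = chi2_contingency(observed, correction=False)

# Yates continuity correction for 2x2 tables (SPSS: .063, p = .803).
chi2_cor, p_cor, _, _ = chi2_contingency(observed, correction=True)
```

Either way p > .05, so the conclusion of failing to reject the null hypothesis is unchanged.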
• 47. More SPSS Output: Symmetric Measures

                                               Value   Asymp. Std.   Approx. T(b)   Approx. Sig.
                                                       Error(a)
      Nominal by Nominal   Phi                 -.050                                    .617
                           Cramer's V           .050                                    .617
      Interval by Interval Pearson's R         -.050      .100          -.496           .621c
      Ordinal by Ordinal   Spearman Correlation -.050     .100          -.496           .621c
      N of Valid Cases       100
      a. Not assuming the null hypothesis.
      b. Using the asymptotic standard error assuming the null hypothesis.
      c. Based on normal approximation.
• 48. Phi Coefficient.  The Pearson chi-square provides information about the existence of a relationship between two nominal variables, but not about the magnitude of the relationship.  The phi coefficient is a measure of the strength of the association: φ = √(χ²/N).
• 49. Cramér's V.  When the table is larger than 2 by 2, a different index must be used to measure the strength of the relationship between the variables. One such index is Cramér's V.  If Cramér's V is large, it means that there is a tendency for particular categories of the first variable to be associated with particular categories of the second variable.
• 50. Cramér's V (continued). V = √(χ² / (N(k − 1))), where N is the number of cases and k is the smaller of the number of rows or columns.
  • 51. Interpreting Cell Differences in a Chi-square Test - 1 A chi-square test of independence of the relationship between sex and marital status finds a statistically significant relationship between the variables.
• 52. Interpreting Cell Differences in a Chi-square Test - 2. Researchers often try to identify which cell or cells are the major contributors to the significant chi-square test by examining the pattern of column percentages. Based on the column percentages, we would identify cells on the married row and the widowed row as the ones producing the significant result because they show the largest differences: 8.2% on the married row (50.9% − 42.7%) and 9.0% on the widowed row (13.1% − 4.1%).
  • 53. Interpreting Cell Differences in a Chi-square Test - 3 Using a level of significance of 0.05, the critical value for a standardized residual would be -1.96 and +1.96. Using standardized residuals, we would find that only the cells on the widowed row are the significant contributors to the chi-square relationship between sex and marital status. If we interpreted the contribution of the married marital status, we would be mistaken. Basing the interpretation on column percentages can be misleading.
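The sex/marital-status data behind slides 51-53 are not reproduced here, so as an illustration the sketch below computes adjusted standardized residuals for the gun-control table from the earlier slides instead; the formula (fo − fe) / √(fe(1 − row/n)(1 − col/n)) is the usual adjusted residual, compared against ±1.96:

```python
import math

# Gun-control table (rows: Democrat, Republican; cols: Favor, Neutral, Oppose).
obs = [[10, 10, 30],
       [15, 15, 10]]
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
n = sum(row_tot)

adj = []
for i, row in enumerate(obs):
    adj_row = []
    for j, fo in enumerate(row):
        fe = row_tot[i] * col_tot[j] / n
        # Adjusted standardized residual: divides by the cell's standard error.
        se = math.sqrt(fe * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
        adj_row.append((fo - fe) / se)
    adj.append(adj_row)
```

Only the two "Oppose" cells exceed ±1.96 (about ±3.3), so they — not the Favor or Neutral cells — are the significant contributors, exactly the kind of conclusion the slide says column percentages alone can get wrong.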
• 54. Chi-Square Test of Independence (post hoc test) in SPSS (1). You can conduct a chi-square test of independence in crosstabulation in SPSS by selecting: Analyze > Descriptive Statistics > Crosstabs…
• 55. Chi-Square Test of Independence (post hoc test) in SPSS (2). First, select and move the variables for the question to the “Row(s):” and “Column(s):” list boxes. The variable mentioned first in the problem, sex, is used as the independent variable and is moved to the “Column(s):” list box. The variable mentioned second in the problem, [fund], is used as the dependent variable and is moved to the “Row(s):” list box. Second, click on the “Statistics…” button to request the test statistic.
• 56. Chi-Square Test of Independence (post hoc test) in SPSS (3). First, click on “Chi-square” to request the chi-square test of independence. Second, click on the “Continue” button to close the Statistics dialog box.
• 57. Chi-Square Test of Independence (post hoc test) in SPSS (6). In the Chi-Square Tests result table, SPSS also tells us that “0 cells have expected count less than 5 and the minimum expected count is 70.63”. The sample size requirement for the chi-square test of independence is satisfied.
• 58. Chi-Square Test of Independence (post hoc test) in SPSS (7). The probability of the chi-square test statistic (chi-square = 2.821) was p = 0.244, greater than the alpha level of significance of 0.05. The null hypothesis that differences in "degree of religious fundamentalism" are independent of differences in "sex" is not rejected. The research hypothesis that differences in "degree of religious fundamentalism" are related to differences in "sex" is not supported by this analysis. Thus, the answer for this question is False. We do not interpret cell differences unless the chi-square test statistic supports the research hypothesis.
• 59. SPSS Analysis: Chi-Square Test of Goodness of Fit.  Analyze – Nonparametric Tests – Chi Square; the X variable goes in the test variable list. If all expected frequencies are the same, click the “All categories equal” option; if all expected frequencies are not the same, enter the expected value for each division under “Values”.
• 60. Examples (using the Montana.sav data): one where all expected frequencies are the same, and one where all expected frequencies are not the same.
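The same two goodness-of-fit cases can be sketched with `scipy.stats.chisquare` (the counts below are made-up illustrations, not the Montana.sav data): omitting `f_exp` assumes all expected frequencies are equal, and `f_exp` supplies unequal ones.

```python
from scipy.stats import chisquare

# Case 1: all expected frequencies the same (default: equal split of the total).
# 60 observations over 3 categories -> expected 20 per category.
stat_equal, p_equal = chisquare([18, 22, 20])

# Case 2: unequal expected frequencies supplied explicitly via f_exp
# (f_exp must sum to the same total as the observed counts).
stat_unequal, p_unequal = chisquare([30, 20], f_exp=[35, 15])
```

For case 1 the statistic is (18−20)²/20 + (22−20)²/20 + (20−20)²/20 = 0.4; for case 2 it is (30−35)²/35 + (20−15)²/15 ≈ 2.38.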
• 61. Types of Statistical Inference  Parameter estimation: used to estimate a population value, such as a mean, relative risk index, or a mean difference between two groups.  Estimation can take two forms: • Point estimation: involves calculating a single statistic to estimate the parameter, e.g. the mean or median.  Disadvantages: point estimates offer no context for interpreting their accuracy, and a point estimate gives no information regarding the probability that it is correct or close to the population value. • Interval estimation: estimates a range of values that has a high probability of containing the population value.
  • 62. Interval Estimation  For example, it is more likely the population height mean lies between 165-175cm.  Interval estimation involves constructing a confidence interval (CI) around the point estimate.  The upper and lower limits of the CI are called confidence limits.  A CI around a sample mean communicates a range of values for the population value, and the probability of being right. That is, the estimate is made with a certain degree of confidence of capturing the parameter.
• 63. Confidence Intervals around a Mean  95% CI = mean ± (1.96 × SEM)  This statement indicates that we can be 95% confident that the population mean lies between the confidence limits, and that these limits are equal to 1.96 times the standard error above and below the sample mean.  E.g. if the mean = 61 inches and SEM = 1, what is the 95% CI?  Solution: 95% CI = 61 ± (1.96 × 1) = 61 ± 1.96, so 59.04 < μ < 62.96.  E.g. if the mean = 61 inches and SEM = 1, what is the 99% CI?  Solution: 99% CI = 61 ± (2.58 × 1) = 61 ± 2.58, so 58.42 < μ < 63.58.
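The two worked examples above reduce to one formula; a minimal sketch (the function name is illustrative):

```python
def confidence_interval(mean, sem, z):
    # CI = mean +/- z * SEM; z = 1.96 for 95%, 2.58 for 99%.
    return mean - z * sem, mean + z * sem

lo95, hi95 = confidence_interval(61, 1, 1.96)   # (59.04, 62.96)
lo99, hi99 = confidence_interval(61, 1, 2.58)   # (58.42, 63.58)
```

As the slide's examples show, demanding more confidence (99% vs 95%) widens the interval, since a larger z multiplies the same SEM.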
• 64. Data Interpretation  Considerations: 1. Accuracy: a. critical view of the data; b. investigating evidence of the results; c. considering other studies’ results; d. peripheral data analysis; e. conducting a power analysis (Type I & Type II errors).

                               Null is true     Null is false
      Accept the null          Correct          Type II error
      Reject the null          Type I error     Correct
• 65. Types of Errors

      If you…                      When the null hypothesis is…             Then you have…
      Reject the null hypothesis   True (there really is no difference)     Made a Type I error
      Reject the null hypothesis   False (there really is a difference)     ☻ (a correct decision)
      Accept the null hypothesis   False (there really is a difference)     Made a Type II error
      Accept the null hypothesis   True (there really is no difference)     ☻ (a correct decision)
• 66.  alpha: the level of significance, i.e., the probability of a Type I error  β: the probability of a Type II error  1 − β: the probability of obtaining significant results (power)  Effect size: how much we can say the intervention made a significant difference
• 67. 2. Meaning of the results: translating the results and making them understandable. 3. Importance: translating the significant findings into practical findings. 4. Generalizability: how we can make the findings useful for the whole population. 5. Implication: what we have learned related to what was used during the study.
  • 68. Needed Parameters  Alpha--chance of a Type I error  Beta--chance of a Type II error  Power = 1 - beta  Effect size--difference between groups or amount of variance explained or how much relationship there is between the DV and the IVs
• 69. Remember this in English?  Type I error is when you say there is a difference or relationship and there is not  Type II error is when you say there is no difference or relationship and there really is
• 70. Which is more important?  Type I error more important if possibility of harm or lethal effect  Type II error more important in relatively unexplored areas of research  In some studies, Type I and Type II errors may be equally important
• 71. How to Increase Power 1. Increase the n 2. Decrease the unexplained variance: control by design or statistics (e.g. ANCOVA) 3. Increase alpha (controversial) 4. Use a one-tailed test (directional hypothesis): puts the zone of rejection all in one tail; same effect as increasing alpha 5. Use parametric statistics as long as you meet the assumptions; nonparametric statistics are LESS powerful 6. Decrease measurement error (decrease unexplained variance): use more reliable instruments, standardize the measurement protocol, frequently calibrate physiologic instruments, improve inter-rater reliability
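Point 1 of the list above (increase n) can be seen numerically. A sketch of power for a chi-square test using the noncentral chi-square distribution, where the effect size w is Cohen's w (an assumption: the slides do not define an effect-size index for chi-square) and the noncentrality parameter is n·w²:

```python
from scipy.stats import chi2, ncx2

def chi_square_power(n, w, df, alpha=0.05):
    # Power = P(statistic exceeds the critical value | a true effect of size w);
    # under the alternative the statistic follows a noncentral chi-square
    # with noncentrality parameter n * w**2.
    crit = chi2.ppf(1 - alpha, df)
    return 1 - ncx2.cdf(crit, df, n * w ** 2)

# Power grows with sample size for the same medium effect (w = .3, df = 1):
for n in (50, 100, 200):
    print(n, round(chi_square_power(n, 0.3, df=1), 2))
```

The loop shows the trade-off behind points 1 and 3 of the list: holding alpha and the effect fixed, only a larger n raises power.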
• 72. What is good power? By tradition, “good” power is 80%. The correct answer is it depends on the nature of the phenomenon and which kind of error is most important in your study. This is a theoretical argument that you have to make. Using convention (alpha = .05 and power = .80, beta = .20) you are saying that a Type I error is _________ as serious as a Type II error.
  • 73. Effect Size How large an effect do I expect exists in the population if the null is false? OR How much of a difference do I want to be able to detect? The larger the effect, the fewer the cases needed to see it. (The difference is so big you can trip on it.)
• 74. The World According to Power (Kraemer & Thiemann)  The more stringent the significance level, the greater the necessary sample size; more subjects are needed for a 1% level than a 5% level  Two-tailed tests require larger sample sizes than one-tailed tests; assessing two directions at the same time requires a greater investment  The smaller the effect size, the larger the necessary sample size; subtle effects require greater efforts  The larger the power required, the larger the necessary sample size; greater protection from failure requires greater effort  The smaller the sample size, the smaller the power, i.e., the greater the chance of failure
• 75. The World According to Power (Kraemer & Thiemann)  If one proposes to go with a sample size of 20 or fewer, you have to be willing to have a high risk of failure or a huge effect size  To achieve 99% power for an effect size of .01, you need > 150,000 subjects
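The ">150,000 subjects" figure can be sanity-checked numerically. A sketch that finds the smallest n reaching a target power for a chi-square test by bisection (power is monotone increasing in n), again treating the effect size as Cohen's w with noncentrality n·w² — an assumption about which effect-size index the slide means:

```python
from scipy.stats import chi2, ncx2

def required_n(w, df, power=0.99, alpha=0.05):
    # Smallest n whose chi-square test reaches the target power,
    # found by bisection over n (power increases monotonically with n).
    crit = chi2.ppf(1 - alpha, df)
    lo, hi = 1, 10_000_000
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if 1 - ncx2.cdf(crit, df, mid * w ** 2) >= power:
            hi = mid
        else:
            lo = mid
    return hi

n_needed = required_n(0.01, df=1)  # on the order of 180,000, consistent with the slide
```

With w = .01 and df = 1 the answer lands near 184,000, comfortably above the slide's 150,000 threshold — tiny effects really do demand enormous samples.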
  • 76. Power for each test  You do a power analysis for each statistic you are going to use.  Choose the sample size based on the highest number of subjects from the power analysis.  Use the most conservative power analysis--guarantees you the most subjects
