Introduction to applied statistics
& applied statistical methods
Prof. Dr. Chang Zhu
Overview
• Chi-square test
• Discriminant analysis
• Logistic regression
• Nominal data/categorical data
• Dichotomous variable
Only 2 values, yes or no, male or
female
• Binary variable
Assign 1 (yes) or 0 (no) to indicate the
presence or absence of something
Chi-square analysis
• Level of measurement is nominal
• The chi square test is non-parametric. It can
be used when normality is not assumed.
Chi-square analysis
Association between categorical variables
• Suppose both the response and explanatory variables
are categorical.
• There is an association if the population conditional
distribution of the response variable differs among
the categories of the explanatory variable.
Example: Contingency table on happiness cross-classified
by family income (data from 2006 GSS)
Chi-square analysis
                      Happiness
Income      Very         Pretty       Not too      Total
---------------------------------------------------------
Above       272 (44%)    294 (48%)     49 (8%)      615
Average     454 (32%)    835 (59%)    131 (9%)     1420
Below       185 (20%)    527 (57%)    208 (23%)     920
---------------------------------------------------------
Response: Happiness; Explanatory: Income
Relationship between income and happiness?
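The percentages in the table are the conditional distributions of happiness within each income group. As a minimal sketch (the variable and function names are illustrative and not part of the lecture), they can be recomputed from the raw counts like this:

```python
# Observed counts from the happiness-by-income table (2006 GSS, as given above).
# Column order: very happy, pretty happy, not too happy.
counts = {
    "Above": [272, 294, 49],
    "Average": [454, 835, 131],
    "Below": [185, 527, 208],
}


def row_percentages(row):
    """Return the conditional distribution of one row, in percent."""
    total = sum(row)
    return [100 * cell / total for cell in row]


# One conditional distribution per income category.
conditional = {income: row_percentages(row) for income, row in counts.items()}

for income, percents in conditional.items():
    print(income, [round(p) for p in percents])
```

If the three printed rows were identical, the sample would show no association; here they differ, most visibly in the "not too happy" column.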
Chi-Squared Test of Independence
(Karl Pearson, 1900)
• Tests H0: The variables are statistically independent
• Ha: The variables are statistically dependent
• Intuition behind test statistic: Summarize differences
between observed cell counts and expected cell
counts (what is expected if H0 true)
• Notation: fo = observed frequency (cell count)
fe = expected frequency
r = number of rows in table, c = number of columns
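With this notation, the standard Pearson statistic is the sum over all cells of (fo - fe)² / fe, where fe = (row total × column total) / N under H0, and the test has df = (r - 1)(c - 1). The sketch below applies these formulas to the happiness-by-income counts; the function and variable names are illustrative assumptions, not SPSS output.

```python
# Observed counts (rows: above, average, below average income;
# columns: very, pretty, not too happy).
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]


def chi_square(table):
    """Return (Pearson chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


statistic, df, expected = chi_square(observed)
print(f"chi-square = {statistic:.1f}, df = {df}")
```

The statistic is then compared with the chi-squared distribution with df degrees of freedom; for df = 4 the critical value at alpha = .05 is 9.488.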
Chi-square analysis
• Chi-squared test answers “Is there an association?”
• Standardized residuals answer “How do data
differ from what independence predicts?”
• “How strong is the association?” using a measure
of the strength of association, such as the difference
of proportions
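Both follow-up questions can be illustrated on the happiness-by-income counts. The sketch below is an assumption-laden example rather than the lecture's own computation: it uses the usual standardized (adjusted) residual, (fo - fe) / sqrt(fe × (1 - row proportion) × (1 - column proportion)), and the difference of proportions "very happy" between the above-average and below-average income groups.

```python
import math

# Observed counts (rows: above, average, below average income;
# columns: very, pretty, not too happy).
observed = [
    [272, 294, 49],
    [454, 835, 131],
    [185, 527, 208],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def standardized_residual(i, j):
    """Standardized residual for the cell in row i, column j."""
    fe = row_totals[i] * col_totals[j] / n
    row_prop = row_totals[i] / n
    col_prop = col_totals[j] / n
    return (observed[i][j] - fe) / math.sqrt(fe * (1 - row_prop) * (1 - col_prop))


# Strength of association: difference in the proportion "very happy"
# between the above-average and below-average income groups.
diff_very_happy = observed[0][0] / row_totals[0] - observed[2][0] / row_totals[2]

print(f"difference of proportions = {diff_very_happy:.3f}")
print(f"residual (below income, not too happy) = {standardized_residual(2, 2):.1f}")
```

A residual far from zero (roughly beyond ±3) marks a cell whose count departs from what independence predicts; the sign shows the direction.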
Chi-square analysis
• Like all tests of hypothesis, chi square is
sensitive to sample size.
– For a given pattern of cell proportions, the obtained
chi square increases in proportion to N.
– With large samples, trivial relationships may be
statistically significant. As a rule of thumb, when
N > 1000, set your alpha = .01.
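The sample-size effect can be seen by keeping the cell proportions fixed and multiplying every count by 10. The 2 × 2 counts below are hypothetical, chosen only for illustration:

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )


# Hypothetical table with a weak association (55% vs 45%), N = 200.
small_sample = [[55, 45], [45, 55]]
# Same proportions, every cell multiplied by 10, N = 2000.
large_sample = [[10 * cell for cell in row] for row in small_sample]

chi_small = chi_square_statistic(small_sample)
chi_large = chi_square_statistic(large_sample)
print(chi_small, chi_large)
```

With df = 1 the critical value at alpha = .05 is 3.841, so the same weak pattern is not significant at N = 200 but is significant at N = 2000.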
Practice 1
• CHI-SQUARE TEST (CROSS-TAB)
• A group of students was classified in terms of
personality (introvert or extrovert) and in
terms of colour preference (red, yellow, green
or blue). Personality and colour preference are
categorical variables. We want to answer
this question:
• Is there an association between personality and
colour preference?
Practice 1
• In SPSS, Analyze > Descriptive Statistics >
Crosstabs
Practice 1 (output)
Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             71.200a   3    .000
Likelihood Ratio               70.066    3    .000
Linear-by-Linear Association   69.124    1    .000
N of Valid Cases               400
a. 0 cells (0.0%) have expected count less than 5. The
minimum expected count is 10.00.
There is a relationship between students’ personality and preferences
for colours: χ² (3, N = 400) = 71.20, p < .0001.
Discriminant analysis
• Similar to regression, except that the criterion (or
dependent variable) is categorical rather than
continuous.
• Used to identify boundaries between groups of
objects.
For example:
(a) Does a person have the disease or not?
(b) Is someone a good credit risk or not?
(c) Should a student be admitted to college or not?
Discriminant analysis
• We wish to predict group membership for
a number of subjects from a set of
predictor variables.
• The criterion variable (also called grouping
variable) is the object of classification. This
is ALWAYS a categorical variable.
• Simple case: two groups and p predictor
variables.
Discriminant analysis
• Similar to regression:
– What predictor variables are related to the
criterion (dependent variable)?
– Predict values on the criterion variable when
given new values on the predictor variables
Discriminant analysis
• Can we classify new (unclassified) subjects into
groups?
– Given the classification functions, how accurate are
we? And when we are inaccurate, is there some
pattern to the misclassification?
D = (.024 × age) + (.080 × self-concept) + (-.100 × anxiety) + (-.012 × days absent)
+ (.134 × anti-smoking score) - 4.543
• What is the strength of association between
group membership and the predictors?
Discriminant analysis
Questions?
•Which predictors are most important in
predicting group membership?
Practice 2
A study is set up to determine if the following variables
help to discriminate between those who smoke and those
who don’t:
•age
•absence (days of absence last year)
•selfcon (self-concept score)
•anxiety (anxiety score)
•anti_smoking (attitude towards anti-smoking policies)
Practice 2
• In SPSS, Analyze > Classify > Discriminant
Practice 2
Canonical Discriminant Function Coefficients (unstandardized coefficients)
                                 Function 1
age                                .024
self concept score                 .080
anxiety score                     -.100
days absent last year             -.012
total anti-smoking test score      .134
(Constant)                       -4.543

Functions at Group Centroids (means of group calculated by the D function)
                                 Function 1
non-smoker                        1.125
smoker                           -1.598

D = (.024 × age) + (.080 × self-concept) + (-.100 × anxiety) + (-.012 × days absent) +
(.134 × anti-smoking score) - 4.543
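The discriminant function can be applied to a new subject by plugging in their values. The sketch below uses the coefficients and group centroids reported in the output above. The subject's values are hypothetical, and assigning the subject to the nearest group centroid is a simplifying assumption for illustration; SPSS's own classification also takes prior probabilities into account.

```python
# Unstandardized coefficients and constant from the SPSS output above.
COEFFICIENTS = {
    "age": 0.024,
    "selfcon": 0.080,
    "anxiety": -0.100,
    "absence": -0.012,
    "anti_smoking": 0.134,
}
CONSTANT = -4.543

# Group means of D (functions at group centroids).
CENTROIDS = {"non-smoker": 1.125, "smoker": -1.598}


def discriminant_score(subject):
    """Compute D for one subject given as a dict of predictor values."""
    return CONSTANT + sum(COEFFICIENTS[name] * subject[name] for name in COEFFICIENTS)


def nearest_centroid(score):
    """Assign the group whose centroid is closest to the score (simplified rule)."""
    return min(CENTROIDS, key=lambda group: abs(score - CENTROIDS[group]))


# Hypothetical subject, for illustration only.
subject = {"age": 40, "selfcon": 40, "anxiety": 20, "absence": 5, "anti_smoking": 20}
score = discriminant_score(subject)
group = nearest_centroid(score)
print(f"D = {score:.3f} -> {group}")
```

Positive scores lie toward the non-smoker centroid and negative scores toward the smoker centroid.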
Practice 2
Classification Results (a, c)
                                          Predicted Group Membership
                         smoke or not     non-smoker   smoker   Total
Original             Count   non-smoker      238          19      257
                             smoker           17         164      181
                     %       non-smoker     92.6         7.4    100.0
                             smoker          9.4        90.6    100.0
Cross-validated (b)  Count   non-smoker      238          19      257
                             smoker           17         164      181
                     %       non-smoker     92.6         7.4    100.0
                             smoker          9.4        90.6    100.0
a. 91.8% of original grouped cases correctly classified.
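The percentages in the classification table follow directly from the counts. A minimal sketch (the data structure and names are illustrative) that reproduces the overall and per-group hit rates:

```python
# Classification counts from the table above: outer key = actual group,
# inner key = predicted group.
confusion = {
    "non-smoker": {"non-smoker": 238, "smoker": 19},
    "smoker": {"non-smoker": 17, "smoker": 164},
}

# Cases on the diagonal are correctly classified.
correct = sum(confusion[group][group] for group in confusion)
total = sum(sum(row.values()) for row in confusion.values())
overall = 100 * correct / total

# Percentage of each actual group that was classified correctly.
per_group = {
    group: 100 * confusion[group][group] / sum(confusion[group].values())
    for group in confusion
}

print(f"overall: {overall:.1f}% of {total} cases")
for group, pct in per_group.items():
    print(f"{group}: {pct:.1f}% correct")
```

The off-diagonal cells (19 and 17) show the pattern of misclassification: non-smokers predicted as smokers and smokers predicted as non-smokers.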
Practice 2
When reporting the result, we should include the following:
• Name of the predictors and sample size
• Results of the Univariate ANOVAs and the Box’s M test
• The significance of the discriminant function
• The variance explained (Canonical correlation coefficient)
• Significant predictors and their contribution to the
model (discriminant function)
• Result from the cross-validation process
(page 9)
Logistic regression
• In logistic regression the response (Y) is a
dichotomous categorical variable.
For example: voting, mortality, and
participation data are neither continuous nor
normally distributed.
Binary logistic regression is a type of
regression analysis where the dependent
variable is a dummy variable: coded 0 (did not
vote) or 1 (did vote)
Logistic regression
• Models the relationship between a set of
predictor variables xi, which may be
– dichotomous (eat: yes/no)
– categorical (social class, ...)
– continuous (age, ...)
and a dichotomous response variable Y
Binary Logistic regression
• Binary logistic regression is a type of
regression analysis where the dependent
variable is a dummy variable (coded 0, 1)
Binary Logistic regression
Binary Dependent Variables: Other Examples
A few examples:
Consumer chooses brand (1) or not (0);
A quality defect occurs (1) or not (0);
A person is hired (1) or not (0).
Binary Logistic regression
• The logistic regression model is simply a non-linear
transformation of the linear regression.
• The logistic distribution function (cumulative
distribution function) is S-shaped, similar to that of
the standard normal distribution, and constrains the
estimated probabilities to lie between 0 and 1.
Binary Logistic regression
• p: the probability of success/event (ranges from 0 to 1)
• 1-p: the probability of failure/non-event
If the probability of success is .8 (80%), the
probability of failure is ???
• The odds of success: the ratio of the probability
of success to the probability of failure
• What are the odds of success for the above situation?
• What can we conclude about the probabilities of success
and failure in a situation when the odds equal 1?
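The relationship between probability and odds can be worked through numerically. The sketch below (function names are illustrative) converts in both directions and applies it to the two situations in the questions above:

```python
def odds(p):
    """Odds of success for a probability of success p, with 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


def probability(odds_value):
    """Probability of success corresponding to given odds (odds >= 0)."""
    if odds_value < 0:
        raise ValueError("odds must be non-negative")
    return odds_value / (1 + odds_value)


p_success = 0.8
p_failure = 1 - p_success          # probability of failure
odds_success = odds(p_success)     # odds of success when p = .8
p_when_odds_one = probability(1.0) # probability of success when odds = 1

print(f"p(failure) = {p_failure:.1f}")
print(f"odds of success = {odds_success:.1f}")
print(f"p(success) when odds = 1: {p_when_odds_one:.1f}")
```

A probability of .8 corresponds to odds of 4 (success is four times as likely as failure), and odds of 1 mean success and failure are equally likely (p = .5).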
Binary Logistic regression
• The odds of success: the ratio of the probability
of success to the probability of failure
• Logistic regression: model the logit-transformed probability as
a linear function of the predictor variables.
• logit(p) = log(p/(1-p)) = log(odds) = b0 + b1*x1 + ... + bk*xk
• Transforming the logit back to a probability:
p = exp(b0 + b1*x1 + ... + bk*xk)/(1 + exp(b0 + b1*x1 + ... + bk*xk))
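The logit and its inverse can be written as two small functions. In this sketch the function names and the example coefficients are hypothetical, chosen only to show the mechanics; the inverse uses 1 / (1 + exp(-z)), which is algebraically the same as exp(z) / (1 + exp(z)).

```python
import math


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inverse_logit(z):
    """Map a linear predictor z = b0 + b1*x1 + ... + bk*xk back to a probability."""
    return 1 / (1 + math.exp(-z))


def predicted_probability(intercept, coefficients, values):
    """Predicted probability of the event for one case."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return inverse_logit(z)


# Hypothetical model with two predictors, for illustration only.
b0 = -1.0
b = [0.5, 0.25]
x = [2.0, 4.0]
p = predicted_probability(b0, b, x)  # z = -1 + 1 + 1 = 1
print(f"predicted probability = {p:.3f}")
```

Whatever value the linear predictor takes, the result stays between 0 and 1, which is what makes the transformation suitable for a dichotomous response.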
Binary Logistic regression
(SPSS output)
Variables in the Equation
                      B (log odds)   S.E.   Wald   df   Sig.   Exp(B) (odds)
Step 1a   gender(1)   -.005          .202   .001   1    .981   .995
• If the odds ratio > 1: when the predictor increases, the odds
of the event occurring increase.
• If the odds ratio < 1: when the predictor increases, the odds
of the event occurring decrease.
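Exp(B) in the SPSS output is simply the exponential of the coefficient B. A minimal sketch using the gender coefficient from the output above (the helper function is an illustrative assumption):

```python
import math

# B (log odds) for gender(1) from the SPSS output above.
b_gender = -0.005

# Exp(B): the odds ratio associated with the predictor.
odds_ratio = math.exp(b_gender)


def direction(odds_ratio_value):
    """Describe how the odds of the event change as the predictor increases."""
    if odds_ratio_value > 1:
        return "increase"
    if odds_ratio_value < 1:
        return "decrease"
    return "no change"


print(f"Exp(B) = {odds_ratio:.3f} -> odds {direction(odds_ratio)}")
```

Here the odds ratio is very close to 1 and the reported significance is .981, so gender is not a significant predictor in this output.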
Practice 3
• Conduct logistic regression to see if gender is a
significant predictor of whether someone is a
smoker or non-smoker.
• In SPSS, Analyze > Regression > Binary
Logistic
• The data file is smoker_DA.sav.
Practice 3
• Conduct logistic regression to see if anti-
smoking attitude is a significant predictor
of whether someone is a smoker or non-
smoker.
• In SPSS, Analyze > Regression > Binary
Logistic
• The data file is smoker_DA.sav.
Practice 3
Conduct logistic regression to see whether the following are
significant predictors of whether someone is a smoker or
non-smoker:
•age
•gender
•absence (days of absence last year)
•selfcon (self-concept score)
•anxiety (anxiety score)
•anti_smoking (attitude towards anti-smoking policies)
When we have no idea about the importance of the
predictors, we choose Stepwise (Forward: LR).
Practice 3
                          B          S.E.    Odds Ratio    95% C.I. for Odds Ratio
                                                           Lower      Upper
constant                  9.257**    2.050   10480.856
self-concept              -.260**    .033    .771          .724       .822
anxiety                   .236**     .036    1.266         1.181      1.357
absence                   .075*      .030    1.078         1.016      1.144
anti-smoking test score   -.303**    .075    .739          .638       .856
Notes. R2 = .607 (Cox & Snell), .818 (Nagelkerke). Model χ² (8) = 42.0, p < .001.
* p < .05. ** p < .01.
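The odds ratios and confidence limits in this table can be approximately recovered from B and S.E. The sketch below assumes the usual Wald interval, exp(B ± 1.96 × S.E.); because the published coefficients are rounded to three decimals, the results agree with the table only to within a few thousandths.

```python
import math

# (B, S.E.) for each predictor, taken from the table above.
estimates = {
    "self-concept": (-0.260, 0.033),
    "anxiety": (0.236, 0.036),
    "absence": (0.075, 0.030),
    "anti-smoking test score": (-0.303, 0.075),
}

Z_95 = 1.96  # standard normal quantile for a 95% interval


def odds_ratio_with_ci(b, se, z=Z_95):
    """Return (odds ratio, lower limit, upper limit) using a Wald interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


results = {name: odds_ratio_with_ci(b, se) for name, (b, se) in estimates.items()}

for name, (ratio, lower, upper) in results.items():
    print(f"{name}: OR = {ratio:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes 1 corresponds to a predictor that is significant at the .05 level, which is the case for all four predictors retained here.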
Practice 3
• Report:
• A logistic regression analysis was conducted with age, gender,
number of days absent from work in the previous year, self-concept
score, anxiety score, and attitude to anti-smoking workplace policy
as predictors. A total of 438 cases were analyzed. The full
model significantly predicted whether an employee is a
smoker or non-smoker (χ² = 42.04, df = 8, p < .001),
accounting for between 60.7% and 81.8% of the variance in
group membership, with 92.6% of non-smokers and 90.6% of
smokers successfully predicted.

Applied statistics lecture_7

  • 1. Introduction to applied statistics & applied statistical methods 1 Prof. Dr. Chang Zhu Overview • Chi-square test • Discriminant analysis • Logistic regression • Nominal data/categorical data 1
  • 2. 3 • Dichotomous variable Only 2 values, yes or no, male or female • Binary variable Assign a 0 (yes) or 1 (no) to indicate presence or absence of something Chi-square analysis • Level of measurement is nominal • The chi square test is non-parametric. It can be used when normality is not assumed. 2
  • 3. Chi-square analysis Association between categorical variables •Suppose both response and explanatory variables are categorical. •There is association if the population conditional distribution for the response variable differs among the categories of the explanatory variable Example: Contingency table on happiness cross-classified by family income (data from 2006 GSS) Chi-square analysis Happiness Income Very Pretty Not too Total --------------------------------------------- Above 272 (44%) 294 (48%) 49 (8%) 615 Average 454 (32%) 835 (59%) 131 (9%) 1420 Below 185 (20%) 527 (57%) 208 (23%) 920 ---------------------------------------------- Response: Happiness,Explanatory: Income Relationship between income and happiness? 3
  • 4. Chi-Squared Test of Independence (Karl Pearson, 1900) • Tests H0: The variables are statistically independent • Ha: The variables are statistically dependent • Intuition behind test statistic: Summarize differences between observed cell counts and expected cell counts (what is expected if H0 true) • Notation: fo = observed frequency (cell count) fe = expected frequency r = number of rows in table, c = number of columns Chi-square analysis • Chi-squared test answers “Is there an association?” • Standardized residuals answer “How do data differ from what independence predicts?” • “How strong is the association?” using a measure of the strength of association, such as the difference of proportions 4
  • 5. Chi-square analysis • Like all tests of hypothesis, chi square is sensitive to sample size. – As N increases, obtained chi square increases. – With large samples, trivial relationships may be significant. To correct for this, when N>1000, set your alpha = .01. Practice 1 • CHI-SQUARE TEST (CROSS-TAB) • A group of students were classified in terms of personality (introvert or extrovert) and in terms of colour preference (red, yellow, green or blue). Personality and colour preference are categorical variables. We want to find answer to this question: • Is there an association between personality and colour preference? 5
  • 6. Practice 1 • In SPSS, Analyze > Descriptive Statistics > Crosstab Practice 1 (output) Chi-Square Tests Asymp. Sig. (2- Value df sided) 71.200a Pearson Chi-Square 3 .000 Likelihood Ratio 70.066 3 .000 Linear-by-Linear Association 69.124 1 .000 N of Valid Cases 400 a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.00. There is a relationship between students’ personality and preferences for colours: χ² (3, N = 400) = 71.20, p < .0001. 6
  • 7. Discriminant analysis • Similar to Regression, except that criterion (or dependent variable) is categorical rather than continuous. • used to identify boundaries between groups of objects For example: (a) does a person have the disease or not (b) Is someone a good credit risk or not? (c) Should a student be admitted to college or not? Discriminant analysis • We wish to predict group membership for a number of subjects from a set of predictor variables. • The criterion variable (also called grouping variable) is the object of classification. This is ALWAYS a categorical variable. • Simple case: two groups and p predictor variables. 14 7
  • 8. Discriminant analysis • Similar to regression: – What predictor variables are related to the criterion (dependent variable) – Predict values on the criterion variable when given new values on the predictor variable Discriminant analysis • Can we classify new (unclassified) subjects into groups? – Given the classification functions how accurate are we? And when we are inaccurate is there some pattern to the misclassification? D = (.024 × age) + (.080 × self-concept) + (-.100 × anxiety) + (-.012 days absent) + (.134 anti-smoking score) - 4.543 • What is the strength of association between group membership and the predictors? 8
  • 9. Discriminant analysis Questions? •Which predictors are most important in predicting group membership? Practice 2 A study is set up to determine if the following variables help to discriminate between those who smoke and those whose don’t: •age •absence (days of absence last year) •selfcon (self-concept score) •anxiety (anxiety score) •anti_smoking (attitude towards anti-smoking policies) 9
  • 10. Practice 2 • In SPSS, Analyze > Classify > Discriminant Practice 2 • In SPSS, Analyze > Classify > Discriminant 10
  • 11. Practice 2 Functions at Group CentroidsCanonical Discriminant Function Coefficients (means of group calculated by the D function)Function 1 Functionage .024 self concept score .080 anxiety score -.100 1 days absent last year -.012 non-smoker 1.125total anti-smoking test score .134 (Constant) -4.543 smoker -1.598Unstandardized coefficients D = (.024 × age) + (.080 × self-concept) + (-.100 × anxiety) + (-.012 days absent) + (.134 anti-smoking score) - 4.543 Practice 2 Classification Resultsa,c Predicted Group Membership smoke or not non- smoker smoker T otal Original Count non-smoker 19238 257 smoker 17 164 181 % non-smoker 92.6 7.4 100.0 smoker 9.4 90.6 100.0 Cross- Count non-smoker 238 19 257 validatedb smoker 17 164 181 % non-smoker 92.6 7.4 100.0 smoker 9.4 90.6 100.0 a. 91.8% of original grouped cases correctly classified. 11
  • 12. Practice 2
    When reporting the result, we should include the following:
    • Names of the predictors and sample size
    • Results of the univariate ANOVAs and the Box's M test
    • The significance of the discriminant function
    • The variance explained (canonical correlation coefficient)
    • Significant predictors and their contribution to the model (discriminant function)
    • Result from the cross-validation process (page 9)
    Logistic regression
    • In logistic regression the response (Y) is a dichotomous categorical variable. For example, voting, mortality, and participation data are not continuous or normally distributed.
    • Binary logistic regression is a type of regression analysis where the dependent variable is a dummy variable: coded 0 (did not vote) or 1 (did vote).
  • 13. Logistic regression
    • Models the relationship between a set of variables xi, which may be
      – dichotomous (eat: yes/no)
      – categorical (social class, ...)
      – continuous (age, ...)
      and a dichotomous variable Y
    Binary Logistic regression
    • Binary logistic regression is a type of regression analysis where the dependent variable is a dummy variable (coded 0, 1)
  • 14. Binary Dependent Variables: Other Examples
    A few examples:
    • A consumer chooses a brand (1) or not (0)
    • A quality defect occurs (1) or not (0)
    • A person is hired (1) or not (0)
    Binary Logistic regression
    • The logistic regression model is simply a non-linear transformation of the linear regression.
    • The logistic distribution function (cumulative distribution function) is S-shaped, similar to that of the standard normal distribution, and constrains the estimated probabilities to lie between 0 and 1.
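To make the S-shape concrete, the sketch below evaluates the logistic function at a few points and prints the standard normal cumulative distribution function next to it for comparison. The function names and the chosen z values are assumptions for illustration; only the standard library is used.

```python
import math


def logistic(z):
    """Logistic function: maps any real z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Both curves rise from near 0 to near 1 and pass through 0.5 at z = 0.
for z in (-6, -2, 0, 2, 6):
    print(f"z = {z:+d}  logistic = {logistic(z):.4f}  normal CDF = {normal_cdf(z):.4f}")
```

Whatever value the linear part of the model takes, the transformed result stays inside the 0 to 1 range, which is why it can be read as a probability.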
  • 15. Binary Logistic regression
    • p: the probability of success/event (ranges from 0 to 1)
    • 1-p: the probability of failure/non-event
      If the probability of success is .8 (80%), the probability of failure is ???
    • The odds of success: the ratio of the probability of success to the probability of failure
    • What are the odds of success for the above situation?
    • What can we conclude about the probabilities of success and failure when the odds equal 1?
    Binary Logistic regression
    • Logistic regression: model the logit-transformed probability as a linear function of the predictor variables.
    • logit(p) = log(p/(1-p)) = log(odds) = b0 + b1*x1 + ... + bk*xk
    • Transformed back to a probability: p = exp(b0 + b1*x1 + ... + bk*xk)/(1 + exp(b0 + b1*x1 + ... + bk*xk))
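The conversions between probability, odds, and log odds can be sketched in a few lines. The helper names `odds`, `logit`, and `inv_logit` are assumptions for this example; the formulas are the ones given on the slide.

```python
import math


def odds(p):
    """Odds of success: p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Log odds: log(p / (1 - p))."""
    return math.log(odds(p))


def inv_logit(x):
    """Back-transform a log-odds value x to a probability: exp(x) / (1 + exp(x))."""
    return math.exp(x) / (1.0 + math.exp(x))


p = 0.8
print("probability of failure:", round(1 - p, 3))
print("odds of success:", round(odds(p), 3))
print("log odds:", round(logit(p), 3))
print("back to probability:", round(inv_logit(logit(p)), 3))
print("odds when p = 0.5:", odds(0.5))
```

With p = .8 the odds work out to 4 to 1, and odds of exactly 1 correspond to success and failure being equally likely.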
  • 16. Binary Logistic regression (SPSS output)
    Variables in the Equation
                              B                                      Exp(B)
                          (log odds)    S.E.    Wald    df    Sig.   (odds)
    -----------------------------------------------------------------------
    Step 1a   gender(1)     -.005       .202    .001     1    .981    .995
    -----------------------------------------------------------------------
    • If the odds ratio > 1: when the predictor increases, the odds of the event occurring increase.
    • If the odds ratio < 1: when the predictor increases, the odds of the event occurring decrease.
    Practice 3
    • Conduct logistic regression to see if gender is a significant predictor of whether someone is a smoker or non-smoker.
    • In SPSS, Analyze > Regression > Binary Logistic
    • The data file is smoker_DA.sav.
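The columns of the SPSS output are linked by simple formulas, which the sketch below reproduces from the B and S.E. values on the slide. It assumes the usual definitions: Exp(B) = exp(B), Wald = (B / S.E.)², and a chi-square p-value with 1 degree of freedom. Because the slide values are rounded, the results match the printed output only approximately.

```python
import math

# Values for gender(1), taken from the slide.
B = -0.005
SE = 0.202

exp_b = math.exp(B)           # odds ratio
wald = (B / SE) ** 2          # Wald chi-square statistic, df = 1
# p-value for a chi-square statistic with 1 df, via the complementary error function.
p_value = math.erfc(math.sqrt(wald / 2.0))

print(f"Exp(B) = {exp_b:.3f}")
print(f"Wald   = {wald:.3f}")
print(f"Sig.   = {p_value:.3f}")
```

An odds ratio this close to 1 together with a p-value near 1 indicates that gender on its own does not predict smoking status in these data.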
  • 17. Practice 3
    • Conduct logistic regression to see if anti-smoking attitude is a significant predictor of whether someone is a smoker or non-smoker.
    • In SPSS, Analyze > Regression > Binary Logistic
    • The data file is smoker_DA.sav.
  • 18. Practice 3
    Conduct logistic regression to see whether the following are significant predictors of whether someone is a smoker or non-smoker:
    • age
    • gender
    • absence (days of absence last year)
    • selfcon (self-concept score)
    • anxiety (anxiety score)
    • anti_smoking (attitude towards anti-smoking policies)
    (When we have no idea about the importance of the predictors, we choose Stepwise: Forward LR.)
    Practice 3
                                                            95% C.I. for Odds Ratio
                                   B        S.E.    Odds Ratio     Lower     Upper
    ------------------------------------------------------------------------------
    constant                     9.257**   2.050    10480.856
    self-concept                 -.260**    .033         .771       .724      .822
    anxiety                       .236**    .036        1.266      1.181     1.357
    absence                       .075*     .030        1.078      1.016     1.144
    anti-smoking test score      -.303**    .075         .739       .638      .856
    ------------------------------------------------------------------------------
    Notes. R² = .607 (Cox & Snell), .818 (Nagelkerke). Model χ²(8) = 42.0, p < .001. * p < .05. ** p < .01
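The odds ratios and confidence limits in the table follow from B and S.E. The sketch below recomputes them, assuming the standard Wald interval exp(B ± 1.96 × S.E.); the dictionary layout and function name are assumptions for this example. Since the slide's B and S.E. are rounded to three decimals, the recomputed limits can differ from the printed ones in the third decimal.

```python
import math

# (B, S.E.) pairs for each predictor, taken from the slide.
ESTIMATES = {
    "self-concept": (-0.260, 0.033),
    "anxiety": (0.236, 0.036),
    "absence": (0.075, 0.030),
    "anti-smoking test score": (-0.303, 0.075),
}

Z_95 = 1.96  # normal critical value for a 95% interval


def odds_ratio_with_ci(b, se):
    """Return (odds ratio, lower limit, upper limit) using exp(B +/- 1.96 * S.E.)."""
    return math.exp(b), math.exp(b - Z_95 * se), math.exp(b + Z_95 * se)


results = {name: odds_ratio_with_ci(b, se) for name, (b, se) in ESTIMATES.items()}

for name, (ratio, lower, upper) in results.items():
    print(f"{name:<25} OR = {ratio:.3f}  95% CI [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes 1 corresponds to a predictor that is significant at the .05 level, which is the case for all four predictors retained in the model.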
  • 19. Practice 3
    • Report: A logistic regression analysis was conducted with age, gender, number of days absent from work in the previous year, self-concept score, anxiety score, and attitude to anti-smoking workplace policy as predictors. A total of 438 cases were analyzed. The full model significantly predicted whether an employee is a smoker or non-smoker (χ² = 42.04, df = 8, p < .001), accounting for between 60.7% and 81.8% of the variance in group membership, with 92.6% of non-smokers and 90.6% of smokers successfully predicted.