Analysis Of A Binary Outcome Variable


Government & Healthcare Apps, SESUG 2011


Transcript

• 1. Analysis of a Binary Outcome Variable Using the FREQ and the LOGISTIC Procedures (Arthur Li)
• 2. INTRODUCTION
• A common application in the health care industry: relating an outcome (Y), such as cancer, to an exposure (X), such as smoking, while also accounting for other exposures such as age (X1) and gender (X2)
• PROC FREQ
• PROC LOGISTIC
• 3. CONTINGENCY TABLE
• One starting point: create a contingency table
• Data: Forthofer & Lehnen (1981), as cited in Agresti (1990)
• Subjects: Caucasians who work in certain industrial plants in Houston
• Response (Y): breathing test result
• Explanatory variable (X): smoking status
SMOKING STATUS | BREATHING TEST: ABNORMAL | NORMAL
CURRENT | 131 | 927
NEVER | 38 | 741
• 4. STUDY DESIGN
• Three types of observational study design:
• Cross-sectional: X and Y are collected at the same time; Prevalence Ratio = $P_1 / P_0$
• Cohort: X is collected first; Relative Risk (RR) = $P_1 / P_0$
• Case-control: Y is collected first, so the proportion of cases is fixed by the design and RR cannot be estimated
With the 2×2 layout below, $P_1 = \frac{A}{A+B}$ and $P_0 = \frac{C}{C+D}$:
Exposure (X) | Outcome Y = 1 | Outcome Y = 0
X = 1 | A | B
X = 0 | C | D
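The two proportions can be checked with a few lines of arithmetic. The Python sketch below is illustrative only (the talk itself uses SAS); it plugs the breathing-test counts into $P_1$ and $P_0$, treating current smokers as the exposed group. The ratio should agree with the "Cohort (Col1 Risk)" estimate of 2.5383 that PROC FREQ reports on slide 11.

```python
# Cell counts from the breathing-test table (X = 1: current smokers; Y = 1: abnormal test).
A, B = 131, 927   # current smokers: abnormal, normal
C, D = 38, 741    # never smokers:   abnormal, normal

p1 = A / (A + B)  # P1: proportion with an abnormal test among current smokers
p0 = C / (C + D)  # P0: proportion with an abnormal test among never smokers
ratio = p1 / p0   # prevalence ratio (cross-sectional) or relative risk (cohort)

print(f"P1 = {p1:.4f}, P0 = {p0:.4f}, P1/P0 = {ratio:.4f}")
```

Whether the ratio is called a prevalence ratio or a relative risk depends only on the study design, not on the arithmetic.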
• 5. ODDS RATIO: $\text{Odds}_1 = \frac{A}{B}$, $\text{Odds}_0 = \frac{C}{D}$, and $\text{Odds Ratio} = \frac{\text{Odds}_1}{\text{Odds}_0} = \frac{AD}{BC}$ (same 2×2 layout: rows X = 1, 0; columns Y = 1, 0)
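The following minimal Python check applies the odds-ratio formula to the same counts. It shows that the ratio of the two odds equals the cross-product ratio AD/BC.

```python
A, B = 131, 927   # current smokers (X = 1): abnormal (Y = 1), normal (Y = 0)
C, D = 38, 741    # never smokers   (X = 0): abnormal (Y = 1), normal (Y = 0)

odds1 = A / B              # odds of an abnormal test among current smokers
odds0 = C / D              # odds of an abnormal test among never smokers
odds_ratio = odds1 / odds0

# The ratio of the two odds simplifies to the cross-product ratio AD / BC.
cross_product = (A * D) / (B * C)
assert abs(odds_ratio - cross_product) < 1e-12

print(f"Odds1 = {odds1:.4f}, Odds0 = {odds0:.4f}, OR = {odds_ratio:.4f}")
```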
• 6. ODDS RATIO
• OR measures the strength of the association between X and Y
• OR ranges from 0 to infinity: OR = 1 means no association; OR > 1 means the exposed group (X = 1) has higher odds; OR < 1 means the non-exposed group (X = 0) has higher odds
• 7. ODDS RATIO
• To test the association between X and Y:
• Use the chi-square statistics
• Use the 95% CI for the OR: check whether it includes 1
• OR measures the strength of the association between X and Y
• 8. PROC FREQ
data breathTest;
   input test $ neversmk $ count;
   datalines;
abnormal current 131
normal current 927
abnormal never 38
normal never 741
;
SMOKING STATUS (X) | BREATHING TEST (Y): ABNORMAL (1) | NORMAL (0)
CURRENT (1) | 131 (A) | 927 (B)
NEVER (0) | 38 (C) | 741 (D)
• 9. PROC FREQ
proc freq data=breathTest;
   weight count;
   tables neversmk*test;
run;
• The data are entered directly as the cell counts of the table, so the WEIGHT statement supplies the count for each cell
The FREQ Procedure: Table of neversmk by test (each cell: Frequency / Percent / Row Pct / Col Pct)
neversmk | abnormal | normal | Total
current | 131 / 7.13 / 12.38 / 77.51 | 927 / 50.46 / 87.62 / 55.58 | 1058 / 57.59
never | 38 / 2.07 / 4.88 / 22.49 | 741 / 40.34 / 95.12 / 44.42 | 779 / 42.41
Total | 169 / 9.20 | 1668 / 90.80 | 1837 / 100.00
• 10. PROC FREQ - RELRISK
proc freq data=breathTest;
   weight count;
   tables neversmk*test / relrisk;
run;
• The RELRISK option computes:
• RR for column 1 (col1 = ABNORMAL)
• RR for column 2 (col2 = NORMAL)
• OR
SMOKING STATUS (X) | col1: ABNORMAL (1) | col2: NORMAL (0)
CURRENT (1) | 131 (A) | 927 (B)
NEVER (0) | 38 (C) | 741 (D)
• 11. PROC FREQ - RELRISK (same program as slide 10)
Estimates of the Relative Risk (Row1/Row2)
Type of Study | Value | 95% Confidence Limits
Case-Control (Odds Ratio) | 2.7557 | 1.8962 – 4.0047
Cohort (Col1 Risk) | 2.5383 | 1.7904 – 3.5987
Cohort (Col2 Risk) | 0.9211 | 0.8960 – 0.9470
Sample Size = 1837
The odds of an abnormal test result are about 2.8 times higher for current smokers than for those who have never smoked (95% CI: 1.9 – 4.0).
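The limits quoted above agree with the usual large-sample (Woolf, or logit) interval. That interval is symmetric on the log scale: log(OR) ± 1.96 × sqrt(1/A + 1/B + 1/C + 1/D), then exponentiated. Here is a Python sketch that assumes that formula:

```python
import math

A, B, C, D = 131, 927, 38, 741
z = 1.959964  # 97.5th percentile of the standard normal distribution

log_or = math.log((A * D) / (B * C))
se = math.sqrt(1 / A + 1 / B + 1 / C + 1 / D)   # large-sample SE of log(OR)
lower = math.exp(log_or - z * se)
upper = math.exp(log_or + z * se)

print(f"OR = {math.exp(log_or):.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```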
• 12. PROC FREQ - CHISQ
proc freq data=breathTest;
   weight count;
   tables neversmk*test / relrisk chisq;
run;
Statistics for Table of neversmk by test
Statistic | DF | Value | Prob
Chi-Square | 1 | 30.2421 | <.0001
Likelihood Ratio Chi-Square | 1 | 32.3820 | <.0001
Continuity Adj. Chi-Square | 1 | 29.3505 | <.0001
Mantel-Haenszel Chi-Square | 1 | 30.2257 | <.0001
Phi Coefficient | | 0.1283 |
Contingency Coefficient | | 0.1273 |
Cramer's V | | 0.1283 |
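For a 2×2 table, the Pearson chi-square, the Mantel-Haenszel chi-square and the phi coefficient all have closed forms. The sketch below reproduces the Chi-Square, Mantel-Haenszel Chi-Square and Phi Coefficient rows of the table above. The p-value uses the identity P(χ² > x) = erfc(√(x/2)), which holds only for 1 df.

```python
import math

A, B, C, D = 131, 927, 38, 741
n = A + B + C + D
row1, row0 = A + B, C + D
col1, col0 = A + C, B + D

# Pearson chi-square for a 2x2 table (shortcut formula).
pearson = n * (A * D - B * C) ** 2 / (row1 * row0 * col1 * col0)
# The Mantel-Haenszel chi-square differs only by the factor (n - 1) / n.
mantel_haenszel = (n - 1) / n * pearson
# Signed phi coefficient for a 2x2 table.
phi = (A * D - B * C) / math.sqrt(row1 * row0 * col1 * col0)
# Upper-tail probability of a chi-square with 1 df.
p_value = math.erfc(math.sqrt(pearson / 2))

print(f"Pearson = {pearson:.4f}, MH = {mantel_haenszel:.4f}, phi = {phi:.4f}, p = {p_value:.2e}")
```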
• 13. LOGISTIC REGRESSION MODEL
• Use logistic regression to study the association between “Breathing Test” and “Smoking”
• Logistic regression parameters are estimated by maximum likelihood (MLE), not ordinary least squares (OLS)
• Why not use a linear probability model?
• The probability is bounded between 0 and 1, but a linear model can produce fitted values outside that range
• The relationship between p and X can be nonlinear
• 14. LOGISTIC REGRESSION MODEL
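• Logistic regression predicts the probability of occurrence of an event by fitting the data to a logistic function; the log odds (logit) is modeled as linear in X: $\text{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \beta_1 X$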
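To see why the logit link keeps predictions in range, the sketch below defines the logit and its inverse. It then applies them to the estimates PROC LOGISTIC reports for this model on slide 25 (intercept −2.9704, current-smoker coefficient 1.0136). The fitted probabilities reproduce the row percentages from slide 9 (4.88% and 12.38%).

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def inv_logit(eta: float) -> float:
    """Logistic function: maps any real linear predictor into (0, 1)."""
    return 1 / (1 + math.exp(-eta))


# Estimates reported by PROC LOGISTIC for the breathing-test model.
b0, b1 = -2.9704, 1.0136

p_never = inv_logit(b0)          # X = 0: never smoked
p_current = inv_logit(b0 + b1)   # X = 1: current smoker
print(f"P(abnormal | never) = {p_never:.4f}, P(abnormal | current) = {p_current:.4f}")

# Even for large linear predictors, the fitted probability stays between 0 and 1.
for eta in (-10.0, 0.0, 10.0):
    print(eta, round(inv_logit(eta), 6))
```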
• 15. LOGISTIC REGRESSION MODEL
• Reference cell coding (current = 1, never = 0); β: the increment in the log odds of an abnormal test for current smokers compared with those who never smoked
(Data: CURRENT 131 abnormal, 927 normal; NEVER 38 abnormal, 741 normal)
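With a single binary predictor the model is saturated, so the maximum likelihood estimates have a closed form. β0 is the observed log odds for never smokers, and β is the difference between the two observed log odds. The sketch below is ours, not part of the talk. Its results agree with the PROC LOGISTIC estimates on slide 25 (−2.9704 and 1.0136) to about three decimals; the iterative fit and rounding account for the last digit.

```python
import math

A, B = 131, 927   # current smokers (X = 1): abnormal, normal
C, D = 38, 741    # never smokers   (X = 0): abnormal, normal

beta0 = math.log(C / D)          # log odds of an abnormal test at X = 0
beta1 = math.log(A / B) - beta0  # increment in log odds for current smokers

print(f"beta0 = {beta0:.4f}, beta1 = {beta1:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```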
• 16. LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test = neversmk;
run;
The LOGISTIC Procedure: Model Information
Data Set: WORK.BREATHTEST
Response Variable: test
Number of Response Levels: 2
Weight Variable: count
Model: binary logit
Optimization Technique: Fisher's scoring
Number of Observations Read: 4; Number of Observations Used: 4
Sum of Weights Read: 1837; Sum of Weights Used: 1837
• 17. LOGISTIC REGRESSION MODEL (same program as slide 16)
Response Profile
Ordered Value | test | Total Frequency | Total Weight
1 | abnormal | 2 | 169.0000
2 | normal | 2 | 1668.0000
Probability modeled is test='abnormal'.
• By default, PROC LOGISTIC models the probability of the response level with the lower ordered value (here 'abnormal', which sorts before 'normal')
• 18. LOGISTIC REGRESSION MODEL
• To model the probability of being “normal”, use any one of the following:
proc logistic data=breathTest descending;
   class neversmk / param=ref;
   weight count;
   model test = neversmk;
run;
proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test(descending) = neversmk;
run;
proc logistic data=breathTest;
   class neversmk / param=ref;
   weight count;
   model test(event="normal") = neversmk;
run;
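Changing which level is modeled does not change the fit. It reverses the sign of the coefficient, so the odds ratio becomes its reciprocal. Here is a quick check with the 2×2 counts, assuming the single-predictor model used here:

```python
import math

A, B = 131, 927   # current smokers: abnormal, normal
C, D = 38, 741    # never smokers:   abnormal, normal

# Event = 'abnormal' (the default here): log OR for current vs never.
beta_abnormal = math.log((A / B) / (C / D))
# Event = 'normal': the roles of the two response columns are swapped.
beta_normal = math.log((B / A) / (D / C))

print(f"beta(abnormal) = {beta_abnormal:.4f}, OR = {math.exp(beta_abnormal):.4f}")
print(f"beta(normal)   = {beta_normal:.4f}, OR = {math.exp(beta_normal):.4f}")
```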
• 19. LOGISTIC REGRESSION MODEL (same program as slide 16)
Class Level Information
Class | Value | Design Variables
neversmk | current | 1
neversmk | never | 0
Reference Cell Coding
• Reference cell coding estimates the difference between the effect of each level and the effect of the reference level (by default, the last level)
• Results are easy to interpret
• 20. LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
   class neversmk;
   weight count;
   model test = neversmk;
run;
Class Level Information
Class | Value | Design Variables
neversmk | current | 1
neversmk | never | -1
• Effect coding estimates the difference between the effect of each level and the average effect over all levels
Effect Coding (the default when PARAM= is not specified)
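For a two-level CLASS variable, both parameterizations describe the same fitted model. The sketch below solves for each set of coefficients from the two observed log odds. Under effect coding the slope is half the log odds ratio, so exp(2 × slope) gives back the OR. The variable names are illustrative only.

```python
import math

# Observed log odds of an abnormal test in each smoking group.
lo_current = math.log(131 / 927)
lo_never = math.log(38 / 741)

# Reference cell coding (current = 1, never = 0):
#   lo_never = b0,  lo_current = b0 + b1
ref_b0 = lo_never
ref_b1 = lo_current - lo_never          # log OR, current vs never

# Effect coding (current = 1, never = -1):
#   lo_current = b0 + b1,  lo_never = b0 - b1
eff_b0 = (lo_current + lo_never) / 2    # average log odds over the two levels
eff_b1 = (lo_current - lo_never) / 2    # current vs the average: half the log OR

print(f"reference coding: b0 = {ref_b0:.4f}, b1 = {ref_b1:.4f}, OR = {math.exp(ref_b1):.4f}")
print(f"effect coding:    b0 = {eff_b0:.4f}, b1 = {eff_b1:.4f}, OR = {math.exp(2 * eff_b1):.4f}")
```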
• 21. LOGISTIC REGRESSION MODEL (same program as slide 16)
Class Level Information
Class | Value | Design Variables
neversmk | current | 1
neversmk | never | 0
• By default, the last ordered value of the classification variable is used as the reference level
• 22. LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
   class neversmk(ref="never") / param=ref;
   weight count;
   model test = neversmk;
run;
Model Fit Statistics
Criterion | Intercept Only | Intercept and Covariates
AIC | 1130.417 | 1100.035
SC | 1129.803 | 1098.808
-2 Log L | 1128.417 | 1096.035
Testing Global Null Hypothesis: BETA=0
Test | Chi-Square | DF | Pr > ChiSq
Likelihood Ratio | 32.3820 | 1 | <.0001
Score | 30.2421 | 1 | <.0001
Wald | 28.2434 | 1 | <.0001
• Information for model selection
• These goodness-of-fit measures are used to compare one model with another; smaller values indicate a better fit
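Both criteria add a penalty to −2 Log L based on the number of parameters k: AIC = −2 Log L + 2k and SC = −2 Log L + k·ln(n). In the sketch below, n = 4 is an inference from the output, not a documented rule. It is the number of observations read (the four weighted rows), and it is the value that reproduces the printed SC. The sum of weights (1837) does not reproduce it.

```python
import math


def aic(neg2_log_l: float, k: int) -> float:
    """Akaike information criterion: -2 Log L + 2k (k = number of parameters)."""
    return neg2_log_l + 2 * k


def sc(neg2_log_l: float, k: int, n: int) -> float:
    """Schwarz criterion: -2 Log L + k * ln(n)."""
    return neg2_log_l + k * math.log(n)


n_obs = 4  # assumption: observations read (weighted rows), inferred from the printed SC
print(f"Intercept only:           AIC = {aic(1128.417, 1):.3f}, SC = {sc(1128.417, 1, n_obs):.3f}")
print(f"Intercept and covariates: AIC = {aic(1096.035, 2):.3f}, SC = {sc(1096.035, 2, n_obs):.3f}")
```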
• 23. LOGISTIC REGRESSION MODEL (same program and output as slide 22)
• H0: all regression coefficients (other than the intercept) = 0
• Similar to the overall F test in linear regression
• 24. LOGISTIC REGRESSION MODEL (same program and output as slide 22)
• H0: all regression coefficients (other than the intercept) = 0
• The likelihood ratio test (LRT) is more reliable than the Wald test, especially for small samples
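Both statistics can be rebuilt from the printed output. The likelihood ratio statistic is the drop in −2 Log L, and the Wald statistic is (estimate / standard error)². This sketch uses the rounded estimate and standard error from slide 25, so its Wald value differs slightly from 28.2434.

```python
import math

# -2 Log L for the intercept-only model and for the model with NEVERSMK.
neg2_log_l_null = 1128.417
neg2_log_l_full = 1096.035
lr = neg2_log_l_null - neg2_log_l_full   # likelihood ratio chi-square, 1 df

# Wald chi-square from the rounded estimate and standard error.
beta, se = 1.0136, 0.1907
wald = (beta / se) ** 2

for name, stat in (("Likelihood ratio", lr), ("Wald", wald)):
    p = math.erfc(math.sqrt(stat / 2))   # upper tail of a chi-square with 1 df
    print(f"{name}: chi-square = {stat:.3f}, p = {p:.2e}")
```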
• 25. LOGISTIC REGRESSION MODEL (same program as slide 22)
Type 3 Analysis of Effects
Effect | DF | Wald Chi-Square | Pr > ChiSq
neversmk | 1 | 28.2434 | <.0001
Analysis of Maximum Likelihood Estimates
Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
Intercept | 1 | -2.9704 | 0.1663 | 318.9365 | <.0001
neversmk current | 1 | 1.0136 | 0.1907 | 28.2434 | <.0001
• Because NEVERSMK has only 1 df, the Type 3 test and the test of its single parameter give identical results
• 26. LOGISTIC REGRESSION MODEL (same program and output as slide 25)
• For current smokers, the log odds of an abnormal test is 1.01 higher than for people who never smoked: OR = exp(1.0136) = 2.756
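The odds ratio and its Wald limits follow directly from the estimate and its standard error: exp(β) and exp(β ± 1.96·SE). Here is a Python sketch with the rounded values from slide 25:

```python
import math

beta, se = 1.0136, 0.1907   # estimate and standard error for "neversmk current"
z = 1.959964                # 97.5th percentile of the standard normal distribution

odds_ratio = math.exp(beta)
lower = math.exp(beta - z * se)
upper = math.exp(beta + z * se)
print(f"OR = {odds_ratio:.3f}, 95% Wald CI = ({lower:.3f}, {upper:.3f})")
```

These values match the PROC LOGISTIC Odds Ratio Estimates (2.756, 1.896 – 4.004) to within rounding.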
• 27. LOGISTIC REGRESSION MODEL (same program as slide 22)
Odds Ratio Estimates
Effect | Point Estimate | 95% Wald Confidence Limits
neversmk current vs never | 2.756 | 1.896 – 4.004
Result from PROC FREQ (slide 11): Case-Control (Odds Ratio) 2.7557, 95% Confidence Limits 1.8962 – 4.0047
• 28. LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
   class neversmk(ref="never") / param=ref;
   weight count;
   model test = neversmk;
   oddsratio 'smoking' neversmk;
run;
• Syntax: ODDSRATIO <'label'> variable </ options>; (new in SAS 9.2)
Wald Confidence Interval for Odds Ratios
Label | Estimate | 95% Confidence Limits
smoking: neversmk current vs never | 2.756 | 1.896 – 4.004
• 29. LOGISTIC REGRESSION MODEL
proc logistic data=breathTest;
   class neversmk(ref="never") / param=ref;
   weight count;
   model test = neversmk;
   oddsratio 'smoking' neversmk / cl=pl;
run;
Profile Likelihood Confidence Interval for Odds Ratios
Label | Estimate | 95% Confidence Limits
smoking: neversmk current vs never | 2.756 | 1.916 – 4.054
• The Wald CI is based on a normal approximation to the distribution of the estimate
• The PL CI is based on the profile log-likelihood function
• The PL CI is generally preferred for small samples
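The profile-likelihood interval can be reproduced outside SAS by profiling out the intercept. For each candidate β, maximize the log-likelihood over the intercept. Keep the β values whose likelihood ratio statistic stays below the 95th percentile of χ² with 1 df. The sketch below is a minimal pure-Python version of that idea for this one-predictor model, with bisection for both searches. It is not necessarily how PROC LOGISTIC implements CL=PL internally.

```python
import math

# Grouped breathing-test data: (x, number abnormal, group size); x = 1 for current smokers.
GROUPS = [(1, 131, 1058), (0, 38, 779)]
TOTAL_ABNORMAL = sum(y for _, y, _ in GROUPS)
CRIT = 3.841459  # 95th percentile of the chi-square distribution with 1 df


def log_lik(b0: float, b1: float) -> float:
    """Binomial log-likelihood of the logistic model (constant terms dropped)."""
    total = 0.0
    for x, y, n in GROUPS:
        eta = b0 + b1 * x
        total += y * eta - n * math.log1p(math.exp(eta))
    return total


def best_b0(b1: float) -> float:
    """Intercept maximizing the likelihood for fixed b1 (bisection on its score equation)."""
    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        fitted = sum(n / (1 + math.exp(-(mid + b1 * x))) for x, _, n in GROUPS)
        if fitted < TOTAL_ABNORMAL:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


b1_hat = math.log((131 / 927) / (38 / 741))   # MLE of the log odds ratio
max_ll = log_lik(best_b0(b1_hat), b1_hat)


def lr_stat(b1: float) -> float:
    """Profile likelihood ratio statistic for H0: beta = b1."""
    return 2 * (max_ll - log_lik(best_b0(b1), b1))


def crossing(inside: float, outside: float) -> float:
    """Bisection for the b1 at which lr_stat rises through CRIT."""
    for _ in range(100):
        mid = (inside + outside) / 2
        if lr_stat(mid) < CRIT:
            inside = mid
        else:
            outside = mid
    return (inside + outside) / 2


lower = math.exp(crossing(b1_hat, b1_hat - 3))
upper = math.exp(crossing(b1_hat, b1_hat + 3))
print(f"OR = {math.exp(b1_hat):.3f}, 95% profile-likelihood CI = ({lower:.3f}, {upper:.3f})")
```

Compared with the Wald limits (1.896 – 4.004), both PL limits shift upward, which reflects the asymmetry of the likelihood.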
• 30. CONFOUNDING
[Diagram: Smoking, Test, Age]
• Omitting Age can lead to either over- or underestimation of the relationship between Smoking and Test
• 31. CONFOUNDING
[Figure: log(odds) of Test for non-smokers and smokers, by age group (< 40, ≥ 40); diagram: Smoking, Test, Age]
• When you adjust for age, you compare smokers and non-smokers at common values of age
• 32. INTERACTION
• Interaction: the relationship between “Smoking” and “Test” differs depending on the level of Age (< 40 vs. ≥ 40)
• Age is then referred to as an effect modifier
[Figure: log(odds) for non-smokers and smokers, by age group (< 40, ≥ 40)]
• 33. INTERACTION & CONFOUNDING
• PROC FREQ can analyze the association of interest when there is only one confounder or one effect modifier
• To control for multiple confounders or include multiple effect modifiers in your model, you need PROC LOGISTIC
• 34. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
• The method for fitting a regression model depends on your research purpose
• Two purposes:
• Estimating the association between an outcome variable and a set of explanatory variables (typical in epidemiology)
• Predicting the outcome variable from a set of explanatory variables
• 35. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
• Situations for building a prediction model:
• Statistical decision making
• Generating (not testing) hypotheses for a future study
• A prediction model needs to be validated in an independent sample to evaluate its usefulness
• When building a prediction model, only interaction effects (not confounding) need to be considered
• Techniques for building a prediction model:
• Forward selection
• Backward elimination
• Stepwise selection, etc.
• The focus of this talk is not on building a prediction model but on estimating the relationship between a main explanatory variable and an outcome variable
• 36. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
• For estimating association, interaction and confounding issues must be considered
• Which should be evaluated first? Confounding effect or interaction effect?
• 37. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
• Step 1: Is the association between “Smoking” and “Test” different in the 2 age groups?
• Yes: there is an interaction. Report the age-specific ORs
• No: there is no interaction. Go to step 2
• Step 2: Is “Age” a confounder?
• Yes: report the age-adjusted OR
• No: report the crude OR
• 38. THE PURPOSES AND STRATEGIES FOR MODEL BUILDING
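• Effect modification (interaction) can be detected by statistical testing
• A confounding effect cannot be tested statistically. It is judged by how much the OR for the main variable changes when the covariate is added, not by the covariate's p-value
Outcome | Main Var | Covariate | OR | P | Include?
Y | X | (none) | 2.3 | <0.05 |
Y | X | Z | 4.2 | 0.2 | YES
Y | X | Z | 2.4 | 0.01 | MAYBE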
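The flowchart from slide 37 and the change-in-estimate idea can be written as a small decision function. This is a hypothetical helper. The 0.05 significance level for the interaction test and the 10% change-in-OR cutoff for confounding are common rules of thumb assumed here, not values given in this talk.

```python
def report(p_interaction: float, crude_or: float, adjusted_or: float,
           alpha: float = 0.05, change_cutoff: float = 0.10) -> str:
    """Hypothetical helper mirroring the flowchart: check interaction first, then confounding.

    alpha and change_cutoff are illustrative assumptions, not rules stated in the talk.
    """
    if p_interaction < alpha:
        return "interaction: report stratum-specific ORs"
    relative_change = abs(adjusted_or - crude_or) / crude_or
    if relative_change >= change_cutoff:
        return "confounding: report the adjusted OR"
    return "no confounding: report the crude OR"


# Relative change in the OR for X when Z is added (values from the table above).
crude = 2.3
for adjusted in (4.2, 2.4):
    print(f"OR {crude} -> {adjusted}: change = {abs(adjusted - crude) / crude:.1%}")
```

Here are two examples with numbers from later slides. The breathing-test data (Breslow-Day p < .0001) give the "interaction" branch. The nurse data (Breslow-Day p = 0.6966; crude OR 0.6944 vs. adjusted 0.9419) give the "confounding" branch.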
• 39. PROC FREQ: INTERACTION EFFECT
data breathTestAge;
   input test $ neversmk $ over40 $ count;
   datalines;
normal never no 577
abnormal never no 34
normal current no 682
abnormal current no 57
normal never yes 164
abnormal never yes 4
normal current yes 245
abnormal current yes 74
;
• 40. PROC FREQ: INTERACTION EFFECT
proc freq data=breathTestAge;
   weight count;
   tables over40*neversmk*test / chisq relrisk cmh;
run;
The CMH option produces:
• Cochran-Mantel-Haenszel statistics (tests for association between the row and column variables after adjusting for the third variable)
• The adjusted Mantel-Haenszel and logit estimates of the odds ratio and relative risks
• The Breslow-Day test for homogeneity of the odds ratios
• 41. PROC FREQ: INTERACTION EFFECT (same program as slide 40)
Breslow-Day Test for Homogeneity of the Odds Ratios
Chi-Square 18.0829; DF 1; Pr > ChiSq <.0001
Total Sample Size = 1837
• The association between smoking status and the breathing test is not the same across the age groups
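The Breslow-Day statistic compares each stratum's observed count in cell A with the count expected if every stratum shared the Mantel-Haenszel odds ratio, with the table margins held fixed. The sketch below is a minimal implementation of the unadjusted version (no Tarone correction), written for this example. The bisection solver is our choice, not necessarily what PROC FREQ does internally.

```python
# Breathing-test counts by age stratum, each as (A, B, C, D) =
# (current & abnormal, current & normal, never & abnormal, never & normal).
STRATA = [(57, 682, 34, 577),   # over40 = no
          (74, 245, 4, 164)]    # over40 = yes


def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def fitted_a(a, b, c, d, odds_ratio):
    """Expected count in cell A when the margins are fixed and the table's OR equals odds_ratio."""
    m1, m0, t1 = a + b, c + d, a + c          # exposed row, unexposed row, case column totals
    lo, hi = max(0, t1 - m0), min(m1, t1)     # feasible range for cell A
    for _ in range(200):
        e = (lo + hi) / 2
        # e * D(e) - OR * B(e) * C(e) increases with e on (lo, hi); find its root.
        if e * (m0 - t1 + e) < odds_ratio * (m1 - e) * (t1 - e):
            lo = e
        else:
            hi = e
    return (lo + hi) / 2


def breslow_day(tables):
    """Breslow-Day chi-square for homogeneity of odds ratios (no Tarone correction)."""
    psi = mantel_haenszel_or(tables)
    stat = 0.0
    for a, b, c, d in tables:
        m1, m0, t1 = a + b, c + d, a + c
        e = fitted_a(a, b, c, d, psi)
        var = 1 / (1 / e + 1 / (m1 - e) + 1 / (t1 - e) + 1 / (m0 - t1 + e))
        stat += (a - e) ** 2 / var
    return stat


print(f"MH OR = {mantel_haenszel_or(STRATA):.4f}, Breslow-Day = {breslow_day(STRATA):.4f}")
```

With two strata the statistic has 1 df, as in the output above.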
• 42. PROC FREQ: INTERACTION EFFECT (same program as slide 40)
Statistics for Table 1 of neversmk by test, Controlling for over40=no
Statistic | DF | Value | Prob
Chi-Square | 1 | 2.4559 | 0.1171
Likelihood Ratio Chi-Square | 1 | 2.4893 | 0.1146
Continuity Adj. Chi-Square | 1 | 2.1260 | 0.1448
Mantel-Haenszel Chi-Square | 1 | 2.4541 | 0.1172
Phi Coefficient | | 0.0427 |
Contingency Coefficient | | 0.0426 |
Cramer's V | | 0.0427 |
Estimates of the Relative Risk (Row1/Row2)
Type of Study | Value | 95% Confidence Limits
Case-Control (Odds Ratio) | 1.4184 | 0.9144 – 2.2000
Cohort (Col1 Risk) | 1.3861 | 0.9190 – 2.0906
Cohort (Col2 Risk) | 0.9772 | 0.9499 – 1.0054
Sample Size = 1350
• 43. PROC FREQ: INTERACTION EFFECT (same program as slide 40)
Statistics for Table 2 of neversmk by test, Controlling for over40=yes
Statistic | DF | Value | Prob
Chi-Square | 1 | 35.4510 | <.0001
Likelihood Ratio Chi-Square | 1 | 45.1246 | <.0001
Continuity Adj. Chi-Square | 1 | 33.9203 | <.0001
Mantel-Haenszel Chi-Square | 1 | 35.3782 | <.0001
Phi Coefficient | | 0.2698 |
Contingency Coefficient | | 0.2605 |
Cramer's V | | 0.2698 |
Estimates of the Relative Risk (Row1/Row2)
Type of Study | Value | 95% Confidence Limits
Case-Control (Odds Ratio) | 12.3837 | 4.4416 – 34.5272
Cohort (Col1 Risk) | 9.7429 | 3.6253 – 26.1844
Cohort (Col2 Risk) | 0.7868 | 0.7374 – 0.8394
• 44. PROC FREQ: INTERACTION EFFECT (same program as slide 40)
Summary Statistics for neversmk by test, Controlling for over40
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic | Alternative Hypothesis | DF | Value | Prob
1 | Nonzero Correlation | 1 | 25.2444 | <.0001
2 | Row Mean Scores Differ | 1 | 25.2444 | <.0001
3 | General Association | 1 | 25.2444 | <.0001
Estimates of the Common Relative Risk (Row1/Row2)
Type of Study | Method | Value | 95% Confidence Limits
Case-Control (Odds Ratio) | Mantel-Haenszel | 2.5683 | 1.7618 – 3.7441
Case-Control (Odds Ratio) | Logit | 1.9840 | 1.3252 – 2.9702
Cohort (Col1 Risk) | Mantel-Haenszel | 2.4174 | 1.6754 – 3.4879
Cohort (Col1 Risk) | Logit | 1.8475 | 1.2641 – 2.7001
Cohort (Col2 Risk) | Mantel-Haenszel | 0.9289 | 0.9046 – 0.9538
Cohort (Col2 Risk) | Logit | 0.9437 | 0.9195 – 0.9686
• These statistics and the adjusted OR are meaningful only if the OR is homogeneous across the categories of the adjusting variable
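The two summary odds ratios above use different weights. Mantel-Haenszel pools the AD/n and BC/n terms, while the logit estimator averages the stratum log odds ratios with inverse-variance weights. The sketch below reproduces both, plus the logit limits. The noticeable gap between the two estimates is consistent with the heterogeneity flagged by the Breslow-Day test.

```python
import math

strata = [(57, 682, 34, 577),   # over40 = no:  (A, B, C, D)
          (74, 245, 4, 164)]    # over40 = yes

# Mantel-Haenszel estimator: ratio of the summed AD/n and BC/n terms.
mh = (sum(a * d / (a + b + c + d) for a, b, c, d in strata)
      / sum(b * c / (a + b + c + d) for a, b, c, d in strata))

# Logit estimator: inverse-variance weighted mean of the stratum log ORs.
weights, log_ors = [], []
for a, b, c, d in strata:
    log_ors.append(math.log(a * d / (b * c)))
    weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))
log_common = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
se_common = 1 / math.sqrt(sum(weights))
z = 1.959964

logit_or = math.exp(log_common)
logit_lower = math.exp(log_common - z * se_common)
logit_upper = math.exp(log_common + z * se_common)

print(f"Mantel-Haenszel OR = {mh:.4f}")
print(f"Logit OR = {logit_or:.4f}, 95% CI = ({logit_lower:.4f}, {logit_upper:.4f})")
```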
• 45. PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
   class neversmk(ref="never") over40(ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
run;
• 46. PROC LOGISTIC: INTERACTION EFFECT (same program as slide 45)
Analysis of Maximum Likelihood Estimates
Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
Intercept | 1 | -2.8315 | 0.1765 | 257.4193 | <.0001
neversmk current | 1 | 0.3495 | 0.2240 | 2.4355 | 0.1186
over40 yes | 1 | -0.8820 | 0.5359 | 2.7086 | 0.0998
neversmk*over40 current yes | 1 | 2.1668 | 0.5691 | 14.4985 | 0.0001
• Wald test for the interaction term: chi-square = 14.4985, p = 0.0001
• 47. PROC LOGISTIC: INTERACTION EFFECT (same program as slide 45)
• Likelihood ratio test: compare -2 Log L for the full model (with neversmk*over40) with -2 Log L for the reduced model (without it)
• 48. PROC LOGISTIC: INTERACTION EFFECT (same program as slide 45: full model)
Model Fit Statistics
Criterion | Intercept Only | Intercept and Covariates
AIC | 1130.417 | 1055.467
SC | 1130.497 | 1055.785
-2 Log L | 1128.417 | 1047.467
Testing Global Null Hypothesis: BETA=0
Test | Chi-Square | DF | Pr > ChiSq
Likelihood Ratio | 80.9500 | 3 | <.0001
Score | 95.7956 | 3 | <.0001
Wald | 81.3305 | 3 | <.0001
• 49. PROC LOGISTIC: INTERACTION EFFECT (reduced model)
proc logistic data=breathTestAge;
   class neversmk(ref="never") over40(ref="no") / param=ref;
   weight count;
   model test = neversmk over40;
run;
Model Fit Statistics
Criterion | Intercept Only | Intercept and Covariates
AIC | 1130.417 | 1074.123
SC | 1130.497 | 1074.361
-2 Log L | 1128.417 | 1068.123
Testing Global Null Hypothesis: BETA=0
Test | Chi-Square | DF | Pr > ChiSq
Likelihood Ratio | 60.2942 | 2 | <.0001
Score | 61.2515 | 2 | <.0001
Wald | 56.4737 | 2 | <.0001
• 50. PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
   class neversmk(ref="never") over40(ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
   ods output FitStatistics=log2Ratio_full GlobalTests=df_full;
run;
data _null_;
   set log2Ratio_full;
   if Criterion = '-2 Log L';
   call symputx('neg2L_full', InterceptAndCovariates);
run;
data _null_;
   set df_full;
   if Test = 'Likelihood Ratio';
   call symputx('df_full', DF);
run;
• 51. PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
   class neversmk(ref="never") over40(ref="no") / param=ref;
   weight count;
   model test = neversmk over40;
   ods output FitStatistics=log2Ratio_reduce GlobalTests=df_reduce;
run;
data _null_;
   set log2Ratio_reduce;
   if Criterion = '-2 Log L';
   call symputx('neg2L_reduce', InterceptAndCovariates);
run;
data _null_;
   set df_reduce;
   if Test = 'Likelihood Ratio';
   call symputx('df_reduce', DF);
run;
• 52. PROC LOGISTIC: INTERACTION EFFECT
data result;
   LR = &neg2L_reduce - &neg2L_full;
   df = &df_full - &df_reduce;
   p = 1 - probchi(LR, df);
   label LR = 'Likelihood Ratio';
run;
proc print data=result label noobs;
   title "Likelihood ratio test";
run;
Likelihood ratio test
Likelihood Ratio | df | p
20.6558 | 1 | .000005497
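For readers outside SAS, the DATA step above amounts to the following Python arithmetic, using the −2 Log L values from slides 48 and 49. The erfc shortcut for the χ² tail is valid only because df = 1.

```python
import math

neg2l_full = 1047.467     # model with neversmk, over40, and neversmk*over40
neg2l_reduced = 1068.123  # model without the interaction term
df_full, df_reduced = 3, 2

lr = neg2l_reduced - neg2l_full
df = df_full - df_reduced
assert df == 1            # the erfc identity below is only valid for 1 df
p = math.erfc(math.sqrt(lr / 2))   # same role as 1 - probchi(LR, df) in SAS
print(f"LR = {lr:.4f}, df = {df}, p = {p:.4g}")
```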
• 53. PROC LOGISTIC: INTERACTION EFFECT
proc logistic data=breathTestAge;
   class neversmk(ref="never") over40(ref="no") / param=ref;
   weight count;
   model test = neversmk over40 neversmk*over40;
   oddsratio neversmk / at(over40='no');
   oddsratio neversmk / at(over40='yes');
run;
Wald Confidence Interval for Odds Ratios
Label | Estimate | 95% Confidence Limits
neversmk current vs never at over40=no | 1.418 | 0.914 – 2.200
neversmk current vs never at over40=yes | 12.383 | 4.441 – 34.525
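With reference cell coding, both stratum-specific odds ratios can also be read off the coefficients on slide 46. The OR is exp(β_current) when over40 = no and exp(β_current + β_interaction) when over40 = yes. Here is a sketch with the rounded estimates:

```python
import math

b_smoke = 0.3495        # neversmk current
b_interaction = 2.1668  # neversmk*over40 current yes

or_over40_no = math.exp(b_smoke)                    # reference age group
or_over40_yes = math.exp(b_smoke + b_interaction)   # interaction term added
print(f"OR at over40=no: {or_over40_no:.3f}; OR at over40=yes: {or_over40_yes:.3f}")
```

These values agree with the ODDSRATIO output (1.418 and 12.383) and with the stratum-specific PROC FREQ estimates.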
• 54. NURSE HEALTH STUDY
• NHS (Nurses' Health Study): nurses aged 30 to 55 who were enrolled in 1976
• Part of the study investigated the association between oral contraceptive (OC) use and breast cancer (BC)
• 55. NURSE HEALTH STUDY
data nurse_study;
   input bc age oc count;
   datalines;
1 0 1 71
0 0 1 28418
1 0 0 35
0 0 0 12267
1 1 1 143
0 1 1 20661
1 1 0 321
0 1 0 44424
;
AGE 30 – 39 (0): OC USE | BREAST CANCER: CASE (1) | CONTROL (0)
YES (1) | 71 | 28418
NO (0) | 35 | 12267
AGE 40 – 55 (1): OC USE | BREAST CANCER: CASE (1) | CONTROL (0)
YES (1) | 143 | 20661
NO (0) | 321 | 44424
• 56. NURSE HEALTH STUDY
proc freq data=nurse_study order=data;
   weight count;
   tables age*oc*bc / chisq relrisk cmh;
run;
Breslow-Day Test for Homogeneity of the Odds Ratios
Chi-Square 0.1521; DF 1; Pr > ChiSq 0.6966
• There is no evidence of interaction, so check for confounding
• 57. NURSE HEALTH STUDY (same program as slide 56)
Summary Statistics for oc by bc, Controlling for age
Cochran-Mantel-Haenszel Statistics (Based on Table Scores)
Statistic | Alternative Hypothesis | DF | Value | Prob
1 | Nonzero Correlation | 1 | 0.4361 | 0.5090
2 | Row Mean Scores Differ | 1 | 0.4361 | 0.5090
3 | General Association | 1 | 0.4361 | 0.5090
Estimates of the Common Relative Risk (Row1/Row2)
Type of Study | Method | Value | 95% Confidence Limits
Case-Control (Odds Ratio) | Mantel-Haenszel | 0.9419 | 0.7882 – 1.1256
Case-Control (Odds Ratio) | Logit | 0.9415 | 0.7882 – 1.1246
Cohort (Col1 Risk) | Mantel-Haenszel | 0.9422 | 0.7897 – 1.1243
Cohort (Col1 Risk) | Logit | 0.9419 | 0.7894 – 1.1238
Cohort (Col2 Risk) | Mantel-Haenszel | 1.0003 | 0.9994 – 1.0013
Cohort (Col2 Risk) | Logit | 1.0003 | 0.9995 – 1.0012
• 58. NURSE HEALTH STUDY
proc freq data=nurse_study order=data;
   weight count;
   tables oc*bc / chisq relrisk;
run;
Statistics for Table of oc by bc
Statistic | DF | Value | Prob
Chi-Square | 1 | 17.8881 | <.0001
Likelihood Ratio Chi-Square | 1 | 18.1401 | <.0001
Continuity Adj. Chi-Square | 1 | 17.5337 | <.0001
Mantel-Haenszel Chi-Square | 1 | 17.8879 | <.0001
Phi Coefficient | | -0.0130 |
Contingency Coefficient | | 0.0130 |
Cramer's V | | -0.0130 |
Estimates of the Relative Risk (Row1/Row2)
Type of Study | Value | 95% Confidence Limits
Case-Control (Odds Ratio) | 0.6944 | 0.5858 – 0.8230
Cohort (Col1 Risk) | 0.6957 | 0.5874 – 0.8239
Cohort (Col2 Risk) | 1.0019 | 1.0010 – 1.0028
• 59. NURSE HEALTH STUDY
• Unadjusted OR = 0.69, adjusted OR = 0.94: adjusting for age changes the OR substantially, so age is a confounder
• In this situation, the age-adjusted statistics and odds ratio should be reported
• After adjusting for age, there is no association between using OC and having BC (p = 0.51; age-adjusted OR = 0.94, 95% CI = 0.79 – 1.13)
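The crude and age-adjusted odds ratios can be recomputed from the counts in the DATA step on slide 55. The crude OR collapses the table over age; the adjusted OR is the Mantel-Haenszel estimate across the two age strata.

```python
# Nurses' Health Study counts by age stratum, each as (A, B, C, D) =
# (OC & case, OC & control, no OC & case, no OC & control).
strata = [(71, 28418, 35, 12267),    # age 30-39
          (143, 20661, 321, 44424)]  # age 40-55

# Crude OR: collapse over age, then take the cross-product ratio.
a, b, c, d = (sum(cells) for cells in zip(*strata))
crude = (a * d) / (b * c)

# Age-adjusted Mantel-Haenszel OR.
adjusted = (sum(a * d / (a + b + c + d) for a, b, c, d in strata)
            / sum(b * c / (a + b + c + d) for a, b, c, d in strata))

change = (adjusted - crude) / crude
print(f"crude OR = {crude:.4f}, MH OR = {adjusted:.4f}, relative change = {change:.1%}")
```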
• 60. NURSE HEALTH STUDY
proc logistic data=nurse_study descending;
   weight count;
   model bc = oc age;
run;
Analysis of Maximum Likelihood Estimates
Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
Intercept | 1 | -5.9083 | 0.1156 | 2612.5788 | <.0001
oc | 1 | -0.0602 | 0.0911 | 0.4360 | 0.5090
age | 1 | 0.9835 | 0.1133 | 75.3707 | <.0001
Odds Ratio Estimates
Effect | Point Estimate | 95% Wald Confidence Limits
oc | 0.942 | 0.788 – 1.126
age | 2.674 | 2.141 – 3.338
• 61. NURSE HEALTH STUDY
proc logistic data=nurse_study descending;
   weight count;
   model bc = oc;
run;
Analysis of Maximum Likelihood Estimates
Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq
Intercept | 1 | -5.0704 | 0.0532 | 9095.8096 | <.0001
oc | 1 | -0.3646 | 0.0867 | 17.6834 | <.0001
Odds Ratio Estimates
Effect | Point Estimate | 95% Wald Confidence Limits
oc | 0.694 | 0.586 – 0.823
• 62. CONCLUSION
• Analyzing binary outcome variables with the FREQ and LOGISTIC procedures is a common task for statisticians in the health care industry
• Simply knowing how to use the procedures is not sufficient
• Understanding the goal of model building and following correct model-building steps are essential for obtaining accurate and unbiased results