3. SAS Codes to Create a Binary Outcome
SAS file name: SAS demo TLC genmod
data tlc ; set ala.tlc ;
y=y0 ; time=0 ; week=0 ; output ;
y=y1 ; time=1 ; week=1 ; output ;
y=y4 ; time=2 ; week=4 ; output ;
y=y6 ; time=3 ; week=6 ; output ;
run ;
data tlc ; set tlc ;
if week=0 then delete ;
if y>=20 then lead_normal=0 ;
if y ne . and y < 20 then lead_normal=1 ;
run ;
proc print ; run ;
Note: the event (success) is a normal blood lead level, i.e. y < 20 (lead_normal=1).
4.
id  trt  y0    y1    y4    y6    y     time  week  lead_normal
1   P    30.8  26.9  25.8  23.8  26.9  1     1     0
1   P    30.8  26.9  25.8  23.8  25.8  2     4     0
1   P    30.8  26.9  25.8  23.8  23.8  3     6     0
2   A    26.5  14.8  19.5  21.0  14.8  1     1     1
2   A    26.5  14.8  19.5  21.0  19.5  2     4     1
2   A    26.5  14.8  19.5  21.0  21.0  3     6     0
3   A    25.8  23.0  19.1  23.2  23.0  1     1     0
3   A    25.8  23.0  19.1  23.2  19.1  2     4     1
3   A    25.8  23.0  19.1  23.2  23.2  3     6     0
5. TLC Data
Blood lead levels were measured repeatedly in the
TLC trial.
Binary outcome: blood lead level < 20 μg/dL (no lead
poisoning)
Proportion with no lead poisoning in the two groups:
Days  Group A  Group P
7     0.78     0.16
28    0.76     0.26
42    0.54     0.26
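The observed proportions in this table are the group means of the 0/1 outcome at each follow-up time (days 7, 28 and 42 are weeks 1, 4 and 6). A minimal sketch of how they could be reproduced, assuming the long-format data set tlc with the variable lead_normal created by the data steps shown earlier:

```sas
/* Sketch: proportion with lead_normal=1 by week and treatment group.
   The mean of a 0/1 variable is the proportion of 1s.
   Assumes tlc is the long-format data set (weeks 1, 4, 6). */
proc means data=tlc n mean maxdec=2 ;
  class week trt ;
  var lead_normal ;
run ;
```

PROC MEANS is used here only for convenience; a PROC FREQ cross-tabulation would give the same proportions.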
10. data tlc ; set ala.tlc ;
y=y0 ; time=0 ; week=0 ; output ;
y=y1 ; time=1 ; week=1 ; output ;
y=y4 ; time=2 ; week=4 ; output ;
y=y6 ; time=3 ; week=6 ; output ;
run ;
data tlc ; set tlc ;
if week=0 then delete ;
if y>=20 then lead_normal=0 ;
if y ne . and y < 20 then lead_normal=1 ;
run ;
proc genmod data=tlc descending ;
class id trt ;
model lead_normal =trt week / d=bin link=logit ;
repeated subject=id / type=exch corrw modelse ;
output out=pprobs p=pred xbeta=xbeta ;
run ;
Note: Genmod default is to use empirical (i.e. robust) standard error estimates. I used
the “modelse” option to show the difference between empirical and model-based
results.
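For reference, the two kinds of standard errors come from two different covariance estimators for the regression estimates. The symbols below are not on the slides and are introduced only for illustration: $K$ is the number of subjects, $D_i = \partial\mu_i/\partial\beta$, and $V_i$ is the working covariance matrix of subject $i$'s outcomes.

```latex
% Model-based ("naive") covariance: trusts the working correlation
\[
\widehat{\Sigma}_{\text{model}} = I_0^{-1},
\qquad
I_0 = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i
\]

% Empirical ("robust" or sandwich) covariance: uses observed residuals
\[
\widehat{\Sigma}_{\text{empirical}} = I_0^{-1}\, I_1\, I_0^{-1},
\qquad
I_1 = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}
      (Y_i - \hat{\mu}_i)(Y_i - \hat{\mu}_i)^{\top}
      V_i^{-1} D_i
\]
```

If the working covariance is close to the truth, the two estimators should give similar standard errors; large differences suggest the working structure fits poorly.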
11. GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (100 levels)
Number of Clusters 100
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 3
Algorithm converged.
Working Correlation Matrix
Col1 Col2 Col3
Row1 1.0000 0.4622 0.4622
Row2 0.4622 1.0000 0.4622
Row3 0.4622 0.4622 1.0000
Exchangeable Working Correlation
Correlation 0.4621656646
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter   Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept   -1.0402   0.2839          -1.5966  -0.4838       -3.66  0.0002
trt A        2.0654   0.3706           1.3391   2.7918        5.57  <.0001
trt P        0.0000   0.0000           0.0000   0.0000        .     .
week        -0.0613   0.0522          -0.1635   0.0409       -1.18  0.2399
12. Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter   Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept   -1.0402   0.2839          -1.5966  -0.4838       -3.66  0.0002
trt A        2.0654   0.3706           1.3391   2.7918        5.57  <.0001
trt P        0.0000   0.0000           0.0000   0.0000        .     .
week        -0.0613   0.0522          -0.1635   0.0409       -1.18  0.2399
Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
Parameter   Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept   -1.0402   0.3150          -1.6575  -0.4229       -3.30  0.0010
trt A        2.0654   0.3677           1.3447   2.7862        5.62  <.0001
trt P        0.0000   0.0000           0.0000   0.0000        .     .
week        -0.0613   0.0471          -0.1536   0.0310       -1.30  0.1930
Scale        1.0000   .                .        .             .     .
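The trt A estimate is a log odds ratio, so it is easier to read after exponentiating. A minimal sketch of one way to have GENMOD report this, assuming the same tlc data set and main-effects model as above; the label 'trt A vs P' is an arbitrary choice:

```sas
/* Sketch: main-effects GEE plus an ESTIMATE statement.
   Class levels of trt are ordered A, P, so "trt 1 -1" is A minus P.
   The EXP option also prints the exponentiated contrast,
   i.e. the odds ratio of a normal lead level for A versus P. */
proc genmod data=tlc descending ;
  class id trt ;
  model lead_normal = trt week / d=bin link=logit ;
  repeated subject=id / type=exch ;
  estimate 'trt A vs P' trt 1 -1 / exp ;
run ;
```

By hand, from the empirical table: OR = exp(2.0654) ≈ 7.89, with 95% limits exp(1.3391) ≈ 3.82 and exp(2.7918) ≈ 16.31.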
13. TLC Data
Observed and predicted proportions of normal lead
level in the two groups (predicted in parentheses)
Days  Group A      Group P
7     0.78 (0.72)  0.16 (0.25)
28    0.76 (0.69)  0.26 (0.22)
42    0.54 (0.66)  0.26 (0.20)
Note the differences between observed and predicted
proportions, especially in the treatment group (group A). This is
because the model we fit was “main effects” only, which assumes
that the treatment effect (on the logit scale) is the same at every
time point.
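The predicted proportions follow from the fitted coefficients. For example, for group A at week 1 the linear predictor is -1.0402 + 2.0654 - 0.0613 = 0.9639, so the predicted probability is 1/(1 + exp(-0.9639)) ≈ 0.72. A minimal sketch of getting observed and predicted values side by side, assuming pprobs is the OUTPUT data set from the main-effects PROC GENMOD step and that pred holds the predicted probability of lead_normal=1:

```sas
/* Sketch: observed vs. predicted proportion by week and group.
   pred is constant within each week-by-trt cell, so its mean
   is just the model prediction for that cell. */
proc means data=pprobs mean maxdec=2 ;
  class week trt ;
  var lead_normal pred ;
run ;
```

Note that later steps write to a data set with the same name, so this step would need to run right after the model whose predictions are wanted.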
14. TLC Data: Adding an Interaction
proc genmod data=tlc descending ;
class id trt ;
model lead_normal =trt week trt*week / d=bin link=logit ;
repeated subject=id / type=exch corrw ;
output out=pprobs p=pred xbeta=xbeta ;
run ;
15. GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (100 levels)
Number of Clusters 100
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 3
Algorithm converged.
Working Correlation Matrix
Col1 Col2 Col3
Row1 1.0000 0.4784 0.4784
Row2 0.4784 1.0000 0.4784
Row3 0.4784 0.4784 1.0000
Exchangeable Working Correlation
Correlation 0.4783943345
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter    Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept    -1.6952   0.3935          -2.4665  -0.9239       -4.31  <.0001
week          0.1233   0.0770          -0.0276   0.2742        1.60  0.1091
trt A         3.3776   0.5711           2.2583   4.4970        5.91  <.0001
trt P         0.0000   0.0000           0.0000   0.0000        .     .
week*trt A   -0.3452   0.1045          -0.5500  -0.1404       -3.30  0.0010
week*trt P    0.0000   0.0000           0.0000   0.0000        .     .
16. Group P:
$\operatorname{logit}(\mu_{ij}) = -1.6952 + 0.1233 \cdot \text{week}$
Group A:
$\operatorname{logit}(\mu_{ij}) = -1.6952 + 3.3776 + 0.1233 \cdot \text{week} - 0.3452 \cdot \text{week}$
$= 1.6824 - 0.2219 \cdot \text{week}$
Thus, in the placebo group (group P), the odds of having a normal lead level go up over time
(although the increase is not significant at the 0.05 level, p=0.1091):
OR per week = exp(0.1233) = 1.13
But in the treatment group (group A), the odds of having a normal lead level go down over
time:
OR per week = exp(-0.2219) = 0.80
The change in the odds over time differs significantly between the two groups (interaction p=0.0010).
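Here the odds are defined as
$\text{odds} = \dfrac{\Pr(\text{blood lead} < 20)}{\Pr(\text{blood lead} \ge 20)}$
and the OR per week is the ratio of these odds for time points one week apart.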
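The fitted interaction model also implies a different treatment odds ratio (group A versus group P) at each week. A worked calculation from the estimates above, where $w$ denotes week; the notation $\mathrm{OR}_{A\,\text{vs}\,P}(w)$ is introduced here only for illustration:

```latex
\[
\mathrm{OR}_{A\,\text{vs}\,P}(w) = \exp(3.3776 - 0.3452\,w)
\]
\[
\begin{aligned}
w = 1:&\quad \exp(3.0324) \approx 20.7 \\
w = 4:&\quad \exp(1.9968) \approx 7.37 \\
w = 6:&\quad \exp(1.3064) \approx 3.69
\end{aligned}
\]
```

The treatment advantage is therefore largest at week 1 and shrinks by a factor of exp(-0.3452) ≈ 0.71 per week.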
17. TLC Data
Comparison of observed and predicted probabilities (in
parentheses) from the GEE model with trt and week as main
effects plus the trt by week interaction:
Days  Group A      Group P
7     0.78 (0.81)  0.16 (0.17)
28    0.76 (0.69)  0.26 (0.23)
42    0.54 (0.59)  0.26 (0.28)
For comparison, predicted results from the main-effects-only
model (in parentheses):
Days  Group A      Group P
7     0.78 (0.72)  0.16 (0.25)
28    0.76 (0.69)  0.26 (0.22)
42    0.54 (0.66)  0.26 (0.20)
18. GEE2
R(α) is the working correlation matrix containing unknown
parameter α. If we can write V=Wα, then we can include a second
set of estimating equations for α.
Second-order generalized estimating equation (GEE2)
23. Alternating Logistic Regressions (ALR) using GEE2
Let $\gamma_{ijk}$ be the log OR between a pair of binary
outcomes $Y_{ij}$ and $Y_{ik}$ from the same subject $i$.
The ALR algorithm models the log OR with:
$\gamma_{ijk} = Z_{ijk}' \alpha$
The vector α is now also estimated in the GEE iterative
algorithm, in addition to the regression parameter β.
24. Respiratory Disease Example
• Clinical trial data comparing two treatments for a
respiratory disorder.
• Patients in each of two centers are randomly assigned
to groups receiving the active treatment or a placebo.
• The same ID values are re-used in both centers
• During treatment, respiratory status, represented by
the variable outcome (coded as 0=poor, 1=good), is
determined at each of four visits.
25. Respiratory Disease Data
SAS file name: SAS demo GEE binary
center id treatment sex age baseline visit outcome
1 1 P M 46 0 1 0
1 1 P M 46 0 2 0
1 1 P M 46 0 3 0
1 1 P M 46 0 4 0
1 2 P M 28 0 1 0
1 2 P M 28 0 2 0
1 2 P M 28 0 3 0
1 2 P M 28 0 4 0
1 3 A M 23 1 1 1
1 3 A M 23 1 2 1
1 3 A M 23 1 3 1
1 3 A M 23 1 4 1
26. SAS Codes
proc genmod data=resp descend;
class id treatment(ref="P") center(ref="1") sex(ref="M")
baseline(ref="0") / param=ref;
model outcome=treatment center sex age baseline / dist=bin;
repeated subject=id(center) / corr=unstr corrw;
run;
proc genmod data=resp descend;
class id treatment(ref="P") center(ref="1") sex(ref="M")
baseline(ref="0") / param=ref;
model outcome=treatment center sex age baseline / dist=bin;
repeated subject=id(center) / logor=fullclust;
run;
In this study, the same ID values are re-used in each of the two centers.
The code subject=id(center) tells SAS that subjects with the same ID but different
centers are different subjects. This saves us from creating new unique
IDs.
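For comparison, the alternative that the nested syntax avoids could look like the sketch below. The names resp2 and uid are hypothetical and not part of the original demo; the sketch assumes the resp data set shown earlier.

```sas
/* Sketch: build an explicit unique subject ID from center and id.
   resp2 and uid are hypothetical names used only for illustration. */
data resp2 ; set resp ;
  length uid $ 20 ;
  uid = catx('_', center, id) ;   /* center 1, id 3 becomes "1_3" */
run ;

/* Same GEE as before, with uid in place of id(center) */
proc genmod data=resp2 descend ;
  class uid treatment(ref="P") center(ref="1") sex(ref="M")
        baseline(ref="0") / param=ref ;
  model outcome=treatment center sex age baseline / dist=bin ;
  repeated subject=uid / corr=unstr corrw ;
run ;
```

Both versions should define the same 111 clusters, so the choice is mainly one of convenience.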
27. SAS demo
• GEE with unstructured correlation
• GEE2 with alternating logistic regressions
28. The GENMOD Procedure
Model Information
Data Set WORK.RESP
Distribution Binomial
Link Function Logit
Dependent Variable outcome
Number of Observations Read 444
Number of Observations Used 444
Number of Events 248
Number of Trials 444
Class Level Information
Class      Value  Design Variables
treatment  A      1
           P      0
center     1      0
           2      1
sex        F      1
           M      0
baseline   0      0
           1      1
Response Profile
Ordered Value  outcome  Total Frequency
1              1        248
2              0        196
PROC GENMOD is modeling the probability that outcome='1'.
29. Parameter Information
Parameter Effect treatment center sex baseline
Prm1 Intercept
Prm2 treatment A
Prm3 center 2
Prm4 sex F
Prm5 age
Prm6 baseline 1
Algorithm converged.
GEE Model Information
Correlation Structure Unstructured
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4
Algorithm converged.
30. Working Correlation Matrix
Col1 Col2 Col3 Col4
Row1 1.0000 0.3351 0.2140 0.2953
Row2 0.3351 1.0000 0.4429 0.3581
Row3 0.2140 0.4429 1.0000 0.3964
Row4 0.2953 0.3581 0.3964 1.0000
GEE Fit Criteria
QIC 512.3416
QICu 499.6081
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter     Estimate  Standard Error  95% Confidence Limits  Z      Pr > |Z|
Intercept     -0.8882   0.4568          -1.7835   0.0071       -1.94  0.0519
treatment A    1.2442   0.3455           0.5669   1.9214        3.60  0.0003
center 2       0.6558   0.3512          -0.0326   1.3442        1.87  0.0619
sex F          0.1128   0.4408          -0.7512   0.9768        0.26  0.7981
age           -0.0175   0.0129          -0.0427   0.0077       -1.36  0.1728
baseline 1     1.8981   0.3441           1.2237   2.5725        5.52  <.0001
QIC: quasi-likelihood under the independence model criterion
Smaller is better
31. GEE Model Information
Log Odds Ratio Structure Fully Parameterized Clusters
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4
Log Odds Ratio Parameter
Information
Parameter Group
Alpha1 (1, 2)
Alpha2 (1, 3)
Alpha3 (1, 4)
Alpha4 (2, 3)
Alpha5 (2, 4)
Alpha6 (3, 4)
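With the fully parameterized structure (logor=fullclust), every pair of visits within a subject gets its own log odds ratio parameter. Writing $\gamma_{ijk}$ for the log OR between visits $j$ and $k$ of subject $i$ (notation assumed here to match the ALR slide), the table above corresponds to:

```latex
\[
\gamma_{ijk} = \log \mathrm{OR}(Y_{ij}, Y_{ik}), \qquad 1 \le j < k \le 4
\]
\[
\gamma_{i12} = \alpha_1,\quad
\gamma_{i13} = \alpha_2,\quad
\gamma_{i14} = \alpha_3,\quad
\gamma_{i23} = \alpha_4,\quad
\gamma_{i24} = \alpha_5,\quad
\gamma_{i34} = \alpha_6
\]
% Number of pairs for clusters of size 4
\[
\binom{4}{2} = \frac{4 \cdot 3}{2} = 6
\]
```

This is the log odds ratio analogue of an unstructured correlation matrix, which also has six free off-diagonal parameters for four visits.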
33. Results depend on the correlation structure assumed
              Unstructured (Un)                   Log odds ratio (logor)
Parameter     Estimate  Std Error  Pr > |Z|      Estimate  Std Error  Pr > |Z|
Intercept     -0.8882   0.4568     0.0519        -0.9266   0.4513     0.0400
treatment A    1.2442   0.3455     0.0003         1.2611   0.3406     0.0002
center 2       0.6558   0.3512     0.0619         0.6287   0.3486     0.0713
sex F          0.1128   0.4408     0.7981         0.1024   0.4362     0.8144
age           -0.0175   0.0129     0.1728        -0.0162   0.0125     0.1977
baseline 1     1.8981   0.3441     <.0001         1.8980   0.3404     <.0001
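34. Log odds ratio structure
$\mathrm{OR}(Y_j, Y_k) = \dfrac{\Pr(Y_j=1, Y_k=1) \,/\, \Pr(Y_j=0, Y_k=1)}{\Pr(Y_j=1, Y_k=0) \,/\, \Pr(Y_j=0, Y_k=0)}$
Alpha1 = $\log \mathrm{OR}(Y_1, Y_2)$ = 1.6109, i.e. OR = exp(1.6109) ≈ 5.0
=> having a good outcome at visit 1 ($Y_1=1$) is positively
associated with having a good outcome at visit 2.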
35. Log linear model for epileptic seizure episodes
• The data consist of the number of epileptic seizures
in an eight-week baseline period, before any
treatment;
• and in each of four two-week treatment periods, in
which patients received either a placebo or the drug
Progabide in addition to other therapy.
Trt=0 placebo
Trt=1 Progabide
SAS file name: SAS demo GEE Poisson
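37. /*** exclude an outlier (ID 207) and
create the offset variable Ltime = log(period length in weeks) ***/
data Seizure;
set Seizure;
if ID ne 207;
if Visit = 0 then do;
X1=0; /* baseline period */
Ltime = log(8); /* 8-week baseline */
end;
else do;
X1=1; /* treatment period */
Ltime=log(2); /* each treatment period lasts 2 weeks */
end;
run;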
proc print ; run ;
proc genmod data=Seizure;
class id;
model count=x1 | trt / d=poisson offset=ltime;
repeated subject=id / corrw covb type=exch;
run;
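Written out, the model fitted by this step is a log-linear model for the seizure rate per week. The symbols $T_{ij}$ (period length in weeks, 8 at baseline and 2 afterwards) and $\beta_0, \dots, \beta_3$ are notation assumed here for illustration; they do not appear in the SAS code.

```latex
\[
\log E(\text{count}_{ij}) = \log(T_{ij}) + \beta_0 + \beta_1\,\text{x1}_{ij}
  + \beta_2\,\text{trt}_i + \beta_3\,\text{x1}_{ij}\cdot\text{trt}_i
\]
% Moving the offset to the left-hand side gives the weekly rate
\[
\log\!\left(\frac{E(\text{count}_{ij})}{T_{ij}}\right) = \beta_0 + \beta_1\,\text{x1}_{ij}
  + \beta_2\,\text{trt}_i + \beta_3\,\text{x1}_{ij}\cdot\text{trt}_i
\]
```

Under this parameterization, $\exp(\beta_1)$ is the ratio of the weekly seizure rate during treatment periods to the baseline rate in the placebo group, $\exp(\beta_1 + \beta_3)$ is the same ratio in the Progabide group, and $\exp(\beta_3)$ compares the two.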