The document provides examples of using SAS to analyze longitudinal binary outcome data from clinical trials with generalized estimating equations (GEE). Specifically, it:
1) converts the repeated blood lead measurements from the TLC trial into a binary outcome and fits GEE logistic models with and without a treatment-by-week interaction;
2) summarizes a clinical trial comparing two treatments for a respiratory disorder, with a binary outcome measured at four visits, and shows SAS code to fit a GEE model with unstructured working correlation;
3) fits an alternating logistic regression (GEE2) model that additionally models the log odds ratios between within-subject outcomes;
4) closes with a Poisson GEE (log-linear) model for epileptic seizure counts, using log observation time as an offset.
3. SAS Codes to Create a Binary Outcome
SAS file name: SAS demo TLC genmod
data tlc ; set ala.tlc ;
/* reshape wide to long: one output record per visit */
y=y0 ; time=0 ; week=0 ; output ;
y=y1 ; time=1 ; week=1 ; output ;
y=y4 ; time=2 ; week=4 ; output ;
y=y6 ; time=3 ; week=6 ; output ;
run ;
data tlc ; set tlc ;
/* drop the baseline visit */
if week=0 then delete ;
/* binary outcome: 1 = normal blood lead level (< 20) */
if y>=20 then lead_normal=0 ;
if y ne . and y < 20 then lead_normal=1 ;
run ;
proc print ; run ;
Note: the event/success is a normal blood lead level (lead_normal = 1).
4. id trt y0 y1 y4 y6 y time week lead_normal
1 P 30.8 26.9 25.8 23.8 26.9 1 1 0
1 P 30.8 26.9 25.8 23.8 25.8 2 4 0
1 P 30.8 26.9 25.8 23.8 23.8 3 6 0
2 A 26.5 14.8 19.5 21.0 14.8 1 1 1
2 A 26.5 14.8 19.5 21.0 19.5 2 4 1
2 A 26.5 14.8 19.5 21.0 21.0 3 6 0
3 A 25.8 23.0 19.1 23.2 23.0 1 1 0
3 A 25.8 23.0 19.1 23.2 19.1 2 4 1
3 A 25.8 23.0 19.1 23.2 23.2 3 6 0
5. TLC Data
Blood lead levels were repeatedly measured in the TLC trial.
Binary outcome: blood lead level < 20 μg/dL (no lead poisoning).
Proportion with no lead poisoning in the two groups:
Days Group A Group P
7 0.78 0.16
28 0.76 0.26
42 0.54 0.26
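These observed proportions can be reproduced from the long-format data set built above. A minimal sketch using PROC MEANS (averaging the 0/1 indicator lead_normal gives the proportion for each group at each week):

proc means data=tlc mean maxdec=2 ;
class trt week ;
var lead_normal ;
run ;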
10. data tlc ; set ala.tlc ;
y=y0 ; time=0 ; week=0 ; output ;
y=y1 ; time=1 ; week=1 ; output ;
y=y4 ; time=2 ; week=4 ; output ;
y=y6 ; time=3 ; week=6 ; output ;
run ;
data tlc ; set tlc ;
if week=0 then delete ;
if y>=20 then lead_normal=0 ;
if y ne . and y < 20 then lead_normal=1 ;
run ;
proc genmod data=tlc descending ;
class id trt ;
model lead_normal =trt week / d=bin link=logit ;
repeated subject=id / type=exch corrw modelse ;
output out=pprobs p=pred xbeta=xbeta ;
run ;
Note: GENMOD's default is to use empirical (i.e., robust) standard error estimates. I used the MODELSE option to show the difference between empirical and model-based results.
11. GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (100 levels)
Number of Clusters 100
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 3
Algorithm converged.
Working Correlation Matrix
Col1 Col2 Col3
Row1 1.0000 0.4622 0.4622
Row2 0.4622 1.0000 0.4622
Row3 0.4622 0.4622 1.0000
Exchangeable Working Correlation
Correlation 0.4621656646
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.0402 0.2839 -1.5966 -0.4838 -3.66 0.0002
trt A 2.0654 0.3706 1.3391 2.7918 5.57 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week -0.0613 0.0522 -0.1635 0.0409 -1.18 0.2399
12. Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.0402 0.2839 -1.5966 -0.4838 -3.66 0.0002
trt A 2.0654 0.3706 1.3391 2.7918 5.57 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week -0.0613 0.0522 -0.1635 0.0409 -1.18 0.2399
Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.0402 0.3150 -1.6575 -0.4229 -3.30 0.0010
trt A 2.0654 0.3677 1.3447 2.7862 5.62 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week -0.0613 0.0471 -0.1536 0.0310 -1.30 0.1930
Scale 1.0000 . . . . .
13. TLC Data
Observed and predicted proportions of normal lead level in the two groups (predicted in parentheses):
Days Group A Group P
7 0.78 (0.72) 0.16 (0.25)
28 0.76 (0.69) 0.26 (0.22)
42 0.54 (0.66) 0.26 (0.20)
Note the differences between observed and predicted proportions in the treatment group. This is because the model we fit was "main effects" only, which assumes the treatment effect is constant over time.
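The predicted proportions in parentheses can be obtained by averaging the fitted probabilities saved by the OUTPUT statement. A minimal sketch using the pprobs data set created by the earlier PROC GENMOD step:

proc means data=pprobs mean maxdec=2 ;
class trt week ;
var pred ;
run ;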
14. TLC Data: Adding an Interaction
proc genmod data=tlc descending ;
class id trt ;
model lead_normal =trt week trt*week / d=bin link=logit ;
repeated subject=id / type=exch corrw ;
output out=pprobs p=pred xbeta=xbeta ;
run ;
15. GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (100 levels)
Number of Clusters 100
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 3
Algorithm converged.
Working Correlation Matrix
Col1 Col2 Col3
Row1 1.0000 0.4784 0.4784
Row2 0.4784 1.0000 0.4784
Row3 0.4784 0.4784 1.0000
Exchangeable Working Correlation
Correlation 0.4783943345
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.6952 0.3935 -2.4665 -0.9239 -4.31 <.0001
week 0.1233 0.0770 -0.0276 0.2742 1.60 0.1091
trt A 3.3776 0.5711 2.2583 4.4970 5.91 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week*trt A -0.3452 0.1045 -0.5500 -0.1404 -3.30 0.0010
week*trt P 0.0000 0.0000 0.0000 0.0000 . .
16. Group P:
logit(μ_ij) = −1.6952 + 0.1233 · week
Group A:
logit(μ_ij) = (−1.6952 + 3.3776) + (0.1233 − 0.3452) · week
= 1.6824 − 0.2219 · week
Thus, in the placebo group (group P), the odds of having a normal lead level go up over time (although not reaching significance at the 0.05 level):
OR per week = exp(0.1233) = 1.13
But in the treatment group (group A), the odds of having a normal lead level go down over time:
OR per week = exp(−0.2219) = 0.80
The change in OR over time between the two groups is significantly different (p = 0.0010). Here the odds are
odds = Prob(blood lead < 20) / Prob(blood lead ≥ 20)
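The group A slope and its per-week OR can also be requested directly with an ESTIMATE statement in PROC GENMOD. A sketch appended to the interaction model (the coefficient order 1 0 assumes the trt class levels sort as A, P):

proc genmod data=tlc descending ;
class id trt ;
model lead_normal = trt week trt*week / d=bin link=logit ;
repeated subject=id / type=exch ;
/* week slope in group A = week + week-by-trt(A); EXP reports exp(estimate), the per-week OR */
estimate 'week slope, group A' week 1 trt*week 1 0 / exp ;
run ;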
17. TLC Data
Comparisons of observed and predicted probabilities (in parentheses) from the GEE model with trt and week main effects and the trt-by-week interaction:
Days Group A Group P
7 0.78 (0.81) 0.16 (0.17)
28 0.76 (0.69) 0.26 (0.23)
42 0.54 (0.59) 0.26 (0.28)
For comparison, the predicted results from the main-effects-only model in parentheses:
Days Group A Group P
7 0.78 (0.72) 0.16 (0.25)
28 0.76 (0.69) 0.26 (0.22)
42 0.54 (0.66) 0.26 (0.20)
18. GEE2
R(α) is the working correlation matrix containing the unknown parameter α. If we can write the working covariance as V = W(α), then we can include a second set of estimating equations for α. This gives the second-order generalized estimating equations (GEE2).
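For reference, a schematic of the paired estimating equations in standard GEE notation (this display is an addition to the slides, not from them). The first set, for the regression parameter β, is

\[ \sum_{i=1}^{n} D_i^{\top} V_i^{-1} \{ y_i - \mu_i(\beta) \} = 0, \qquad D_i = \partial \mu_i / \partial \beta, \]

and GEE2 solves it jointly with an analogous set for α, in which the "responses" are pairwise cross-products (or, in ALR, pairwise log odds ratios) of the Y_ij and the "means" are their modeled values under α.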
23. Alternating Logistic Regression using GEE2
Let γ_ijk be the log OR between pairs of within-subject binary outcomes Y_ij and Y_ik.
The ALR algorithm models the log OR with:
γ_ijk = Z′_ijk α
The vector α is now also included in the GEE iterative algorithm, in addition to the regression parameter β.
24. Respiratory Disease Example
• Clinical trial data comparing two treatments for a respiratory disorder.
• Patients in each of two centers are randomly assigned to groups receiving the active treatment or a placebo.
• IDs are re-used within each center.
• During treatment, respiratory status, represented by the variable outcome (coded as 0 = poor, 1 = good), is determined at each of four visits.
25. Respiratory Disease Data
SAS file name: SAS demo GEE binary
center id treatment sex age baseline visit outcome
1 1 P M 46 0 1 0
1 1 P M 46 0 2 0
1 1 P M 46 0 3 0
1 1 P M 46 0 4 0
1 2 P M 28 0 1 0
1 2 P M 28 0 2 0
1 2 P M 28 0 3 0
1 2 P M 28 0 4 0
1 3 A M 23 1 1 1
1 3 A M 23 1 2 1
1 3 A M 23 1 3 1
1 3 A M 23 1 4 1
26. SAS Codes
proc genmod data=resp descend;
class id treatment(ref="P") center(ref="1") sex(ref="M")
baseline(ref="0") / param=ref;
model outcome=treatment center sex age baseline / dist=bin;
repeated subject=id(center) / corr=unstr corrw;
run;
proc genmod data=resp descend;
class id treatment(ref="P") center(ref="1") sex(ref="M")
baseline(ref="0") / param=ref;
model outcome=treatment center sex age baseline / dist=bin;
repeated subject=id(center) / logor=fullclust;
run;
In this study, IDs are re-used within each of the two centers. The code subject=id(center) tells SAS that subjects with the same ID but different centers are still different subjects. This saves us from creating new unique IDs.
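Equivalently, one could build a unique subject key by hand and use it as the SUBJECT= effect. A minimal sketch (uid is a hypothetical variable name, not from the slides):

data resp ; set resp ;
/* combine center and id into one unique subject key */
uid = catx('-', center, id) ;
run ;
proc genmod data=resp descend ;
class uid treatment(ref="P") center(ref="1") sex(ref="M") baseline(ref="0") / param=ref ;
model outcome = treatment center sex age baseline / dist=bin ;
repeated subject=uid / corr=unstr corrw ;
run ;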
27. SAS demo
• GEE with unstructured correlation
• GEE2 with alternating logistic regression
28. The GENMOD Procedure
Model Information
Data Set WORK.RESP
Distribution Binomial
Link Function Logit
Dependent Variable outcome
Number of Observations Read 444
Number of Observations Used 444
Number of Events 248
Number of Trials 444
Class Level Information
Class Value Design Variables
treatment A 1
P 0
center 1 0
2 1
sex F 1
M 0
baseline 0 0
1 1
Response Profile
Ordered Value outcome Total Frequency
1 1 248
2 0 196
PROC GENMOD is modeling the probability that outcome='1'.
29. Parameter Information
Parameter Effect treatment center sex baseline
Prm1 Intercept
Prm2 treatment A
Prm3 center 2
Prm4 sex F
Prm5 age
Prm6 baseline 1
Algorithm converged.
GEE Model Information
Correlation Structure Unstructured
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4
Algorithm converged.
30. Working Correlation Matrix
Col1 Col2 Col3 Col4
Row1 1.0000 0.3351 0.2140 0.2953
Row2 0.3351 1.0000 0.4429 0.3581
Row3 0.2140 0.4429 1.0000 0.3964
Row4 0.2953 0.3581 0.3964 1.0000
GEE Fit Criteria
QIC 512.3416
QICu 499.6081
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -0.8882 0.4568 -1.7835 0.0071 -1.94 0.0519
treatment A 1.2442 0.3455 0.5669 1.9214 3.60 0.0003
center 2 0.6558 0.3512 -0.0326 1.3442 1.87 0.0619
sex F 0.1128 0.4408 -0.7512 0.9768 0.26 0.7981
age -0.0175 0.0129 -0.0427 0.0077 -1.36 0.1728
baseline 1 1.8981 0.3441 1.2237 2.5725 5.52 <.0001
QIC: quasi-likelihood under the independence model criterion. Smaller is better.
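Because QIC can be compared across working correlation structures for the same mean model, one could refit with a different TYPE= and compare the QIC values. A sketch (exchangeable is chosen only as an illustration):

proc genmod data=resp descend ;
class id treatment(ref="P") center(ref="1") sex(ref="M") baseline(ref="0") / param=ref ;
model outcome = treatment center sex age baseline / dist=bin ;
repeated subject=id(center) / corr=exch corrw ;
run ;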
31. GEE Model Information
Log Odds Ratio Structure Fully Parameterized Clusters
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4
Log Odds Ratio Parameter Information
Parameter Group
Alpha1 (1, 2)
Alpha2 (1, 3)
Alpha3 (1, 4)
Alpha4 (2, 3)
Alpha5 (2, 4)
Alpha6 (3, 4)
33. Results depend on the correlation structure assumed
Unstructured working correlation (Un) vs. fully parameterized log OR (logor):
Parameter     Un: Estimate  StdErr  Pr>|Z|    logor: Estimate  StdErr  Pr>|Z|
Intercept        -0.8882    0.4568  0.0519       -0.9266    0.4513  0.0400
treatment A       1.2442    0.3455  0.0003        1.2611    0.3406  0.0002
center 2          0.6558    0.3512  0.0619        0.6287    0.3486  0.0713
sex F             0.1128    0.4408  0.7981        0.1024    0.4362  0.8144
age              -0.0175    0.0129  0.1728       -0.0162    0.0125  0.1977
baseline 1        1.8981    0.3441  <.0001        1.8980    0.3404  <.0001
34. Log odds ratio structure
OR(Y_j, Y_k) = [Pr(Y_j=1, Y_k=1) / Pr(Y_j=0, Y_k=1)] ÷ [Pr(Y_j=1, Y_k=0) / Pr(Y_j=0, Y_k=0)]
Alpha1 = log OR(Y_1, Y_2) = 1.6109 (OR ≈ 5.0)
=> having a good outcome at visit 1 (Y_1 = 1) is associated with having a good outcome at visit 2.
35. Log-linear model for epileptic seizure episodes
• The data consist of the number of epileptic seizures in an eight-week baseline period, before any treatment;
• and in each of four two-week treatment periods, in which patients received either a placebo or the drug Progabide in addition to other therapy.
Trt = 0: placebo
Trt = 1: Progabide
SAS file name: SAS demo GEE Poisson
37. /*** exclude an outlier (ID 207)
and create the offset variable ***/
data Seizure;
set Seizure;
if ID ne 207;
if Visit = 0 then do;
X1=0;
Ltime = log(8);
end;
else do;
X1=1;
Ltime=log(2);
end;
run;
proc print ; run ;
proc genmod data=Seizure;
class id;
model count=x1 | trt / d=poisson offset=ltime;
repeated subject=id / corrw covb type=exch;
run;
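In the MODEL statement, x1 | trt expands to the main effects x1 and trt plus their interaction x1*trt. The offset ltime = log(observation time) turns the Poisson model into a model for seizure rates; schematically (this display is an addition, not from the slides):

\[ \log E(Y_{ij}) = \log t_{ij} + \beta_0 + \beta_1 x_{1ij} + \beta_2 \,\text{trt}_i + \beta_3 \, x_{1ij}\,\text{trt}_i, \]

so that log(E(Y_ij)/t_ij), the log seizure rate, is linear in the covariates, with t_ij = 8 weeks for the baseline period and 2 weeks for each treatment period.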