The document provides examples of using SAS to analyze longitudinal binary outcome data from clinical trials with generalized estimating equations (GEE). Specifically, it:
1) converts the repeated blood lead measurements from the TLC trial into a binary outcome and fits GEE logistic models with and without a treatment-by-week interaction;
2) summarizes a clinical trial comparing two treatments for a respiratory disorder, with a binary outcome measured at four visits, and shows SAS code to fit a GEE model with unstructured working correlation;
3) fits an alternating logistic regression (GEE2) model that additionally models the log odds ratios between within-subject outcomes;
4) closes with a Poisson GEE (log-linear) model for epileptic seizure counts, using log observation time as an offset.
3. SAS Codes to Create a Binary Outcome
SAS file name: SAS demo TLC genmod
data tlc ; set ala.tlc ;
/* reshape wide to long: one output record per visit */
y=y0 ; time=0 ; week=0 ; output ;
y=y1 ; time=1 ; week=1 ; output ;
y=y4 ; time=2 ; week=4 ; output ;
y=y6 ; time=3 ; week=6 ; output ;
run ;
data tlc ; set tlc ;
/* drop the baseline visit */
if week=0 then delete ;
/* binary outcome: 1 = normal blood lead level (< 20) */
if y>=20 then lead_normal=0 ;
if y ne . and y < 20 then lead_normal=1 ;
run ;
proc print ; run ;
Note: the event/success is a normal blood lead level (lead_normal = 1).
4. id trt y0 y1 y4 y6 y time week lead_normal
1 P 30.8 26.9 25.8 23.8 26.9 1 1 0
1 P 30.8 26.9 25.8 23.8 25.8 2 4 0
1 P 30.8 26.9 25.8 23.8 23.8 3 6 0
2 A 26.5 14.8 19.5 21.0 14.8 1 1 1
2 A 26.5 14.8 19.5 21.0 19.5 2 4 1
2 A 26.5 14.8 19.5 21.0 21.0 3 6 0
3 A 25.8 23.0 19.1 23.2 23.0 1 1 0
3 A 25.8 23.0 19.1 23.2 19.1 2 4 1
3 A 25.8 23.0 19.1 23.2 23.2 3 6 0
5. TLC Data
Blood lead levels were repeatedly measured in the TLC trial.
Binary outcome: blood lead level < 20 μg/dL (no lead poisoning).
Proportion with no lead poisoning in the two groups:
Days Group A Group P
7 0.78 0.16
28 0.76 0.26
42 0.54 0.26
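These observed proportions can be reproduced from the long-format data set built above. A minimal sketch using PROC MEANS (averaging the 0/1 indicator lead_normal gives the proportion for each group at each week):

proc means data=tlc mean maxdec=2 ;
class trt week ;
var lead_normal ;
run ;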
10. data tlc ; set ala.tlc ;
y=y0 ; time=0 ; week=0 ; output ;
y=y1 ; time=1 ; week=1 ; output ;
y=y4 ; time=2 ; week=4 ; output ;
y=y6 ; time=3 ; week=6 ; output ;
run ;
data tlc ; set tlc ;
if week=0 then delete ;
if y>=20 then lead_normal=0 ;
if y ne . and y < 20 then lead_normal=1 ;
run ;
proc genmod data=tlc descending ;
class id trt ;
model lead_normal =trt week / d=bin link=logit ;
repeated subject=id / type=exch corrw modelse ;
output out=pprobs p=pred xbeta=xbeta ;
run ;
Note: GENMOD's default is to use empirical (i.e., robust) standard error estimates. I used the MODELSE option to show the difference between empirical and model-based results.
11. GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (100 levels)
Number of Clusters 100
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 3
Algorithm converged.
Working Correlation Matrix
Col1 Col2 Col3
Row1 1.0000 0.4622 0.4622
Row2 0.4622 1.0000 0.4622
Row3 0.4622 0.4622 1.0000
Exchangeable Working Correlation
Correlation 0.4621656646
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.0402 0.2839 -1.5966 -0.4838 -3.66 0.0002
trt A 2.0654 0.3706 1.3391 2.7918 5.57 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week -0.0613 0.0522 -0.1635 0.0409 -1.18 0.2399
12. Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.0402 0.2839 -1.5966 -0.4838 -3.66 0.0002
trt A 2.0654 0.3706 1.3391 2.7918 5.57 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week -0.0613 0.0522 -0.1635 0.0409 -1.18 0.2399
Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.0402 0.3150 -1.6575 -0.4229 -3.30 0.0010
trt A 2.0654 0.3677 1.3447 2.7862 5.62 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week -0.0613 0.0471 -0.1536 0.0310 -1.30 0.1930
Scale 1.0000 . . . . .
13. TLC Data
Observed and predicted proportions of normal lead level in the two groups (predicted in parentheses):
Days Group A Group P
7 0.78 (0.72) 0.16 (0.25)
28 0.76 (0.69) 0.26 (0.22)
42 0.54 (0.66) 0.26 (0.20)
Note the differences between observed and predicted proportions in the treatment group. This is because the model we fit was "main effects" only, which assumes the treatment effect is constant over time.
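The predicted proportions in parentheses can be obtained by averaging the fitted probabilities saved by the OUTPUT statement. A minimal sketch using the pprobs data set created by the earlier PROC GENMOD step:

proc means data=pprobs mean maxdec=2 ;
class trt week ;
var pred ;
run ;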
14. TLC Data: Adding an Interaction
proc genmod data=tlc descending ;
class id trt ;
model lead_normal =trt week trt*week / d=bin link=logit ;
repeated subject=id / type=exch corrw ;
output out=pprobs p=pred xbeta=xbeta ;
run ;
15. GEE Model Information
Correlation Structure Exchangeable
Subject Effect id (100 levels)
Number of Clusters 100
Correlation Matrix Dimension 3
Maximum Cluster Size 3
Minimum Cluster Size 3
Algorithm converged.
Working Correlation Matrix
Col1 Col2 Col3
Row1 1.0000 0.4784 0.4784
Row2 0.4784 1.0000 0.4784
Row3 0.4784 0.4784 1.0000
Exchangeable Working Correlation
Correlation 0.4783943345
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -1.6952 0.3935 -2.4665 -0.9239 -4.31 <.0001
week 0.1233 0.0770 -0.0276 0.2742 1.60 0.1091
trt A 3.3776 0.5711 2.2583 4.4970 5.91 <.0001
trt P 0.0000 0.0000 0.0000 0.0000 . .
week*trt A -0.3452 0.1045 -0.5500 -0.1404 -3.30 0.0010
week*trt P 0.0000 0.0000 0.0000 0.0000 . .
16. Group P:
logit(μ_ij) = −1.6952 + 0.1233 · week
Group A:
logit(μ_ij) = (−1.6952 + 3.3776) + (0.1233 − 0.3452) · week
= 1.6824 − 0.2219 · week
Thus, in the placebo group (group P), the odds of having a normal lead level go up over time (although not reaching significance at the 0.05 level):
OR per week = exp(0.1233) = 1.13
But in the treatment group (group A), the odds of having a normal lead level go down over time:
OR per week = exp(−0.2219) = 0.80
The change in OR over time between the two groups is significantly different (p = 0.0010). Here the odds are
odds = Prob(blood lead < 20) / Prob(blood lead ≥ 20)
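The group A slope and its per-week OR can also be requested directly with an ESTIMATE statement in PROC GENMOD. A sketch appended to the interaction model (the coefficient order 1 0 assumes the trt class levels sort as A, P):

proc genmod data=tlc descending ;
class id trt ;
model lead_normal = trt week trt*week / d=bin link=logit ;
repeated subject=id / type=exch ;
/* week slope in group A = week + week-by-trt(A); EXP reports exp(estimate), the per-week OR */
estimate 'week slope, group A' week 1 trt*week 1 0 / exp ;
run ;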
17. TLC Data
Comparisons of observed and predicted probabilities (in parentheses) from the GEE model with trt and week main effects and the trt-by-week interaction:
Days Group A Group P
7 0.78 (0.81) 0.16 (0.17)
28 0.76 (0.69) 0.26 (0.23)
42 0.54 (0.59) 0.26 (0.28)
For comparison, the predicted results from the main-effects-only model in parentheses:
Days Group A Group P
7 0.78 (0.72) 0.16 (0.25)
28 0.76 (0.69) 0.26 (0.22)
42 0.54 (0.66) 0.26 (0.20)
18. GEE2
R(α) is the working correlation matrix containing the unknown parameter α. If we can write the working covariance as V = W(α), then we can include a second set of estimating equations for α. This gives the second-order generalized estimating equations (GEE2).
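For reference, a schematic of the paired estimating equations in standard GEE notation (this display is an addition to the slides, not from them). The first set, for the regression parameter β, is

\[ \sum_{i=1}^{n} D_i^{\top} V_i^{-1} \{ y_i - \mu_i(\beta) \} = 0, \qquad D_i = \partial \mu_i / \partial \beta, \]

and GEE2 solves it jointly with an analogous set for α, in which the "responses" are pairwise cross-products (or, in ALR, pairwise log odds ratios) of the Y_ij and the "means" are their modeled values under α.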
23. Alternating Logistic Regression using GEE2
Let γ_ijk be the log OR between pairs of within-subject binary outcomes Y_ij and Y_ik.
The ALR algorithm models the log OR with:
γ_ijk = Z′_ijk α
The vector α is now also included in the GEE iterative algorithm, in addition to the regression parameter β.
24. Respiratory Disease Example
• Clinical trial data comparing two treatments for a respiratory disorder.
• Patients in each of two centers are randomly assigned to groups receiving the active treatment or a placebo.
• IDs are re-used within each center.
• During treatment, respiratory status, represented by the variable outcome (coded as 0 = poor, 1 = good), is determined at each of four visits.
25. Respiratory Disease Data
SAS file name: SAS demo GEE binary
center id treatment sex age baseline visit outcome
1 1 P M 46 0 1 0
1 1 P M 46 0 2 0
1 1 P M 46 0 3 0
1 1 P M 46 0 4 0
1 2 P M 28 0 1 0
1 2 P M 28 0 2 0
1 2 P M 28 0 3 0
1 2 P M 28 0 4 0
1 3 A M 23 1 1 1
1 3 A M 23 1 2 1
1 3 A M 23 1 3 1
1 3 A M 23 1 4 1
26. SAS Codes
proc genmod data=resp descend;
class id treatment(ref="P") center(ref="1") sex(ref="M")
baseline(ref="0") / param=ref;
model outcome=treatment center sex age baseline / dist=bin;
repeated subject=id(center) / corr=unstr corrw;
run;
proc genmod data=resp descend;
class id treatment(ref="P") center(ref="1") sex(ref="M")
baseline(ref="0") / param=ref;
model outcome=treatment center sex age baseline / dist=bin;
repeated subject=id(center) / logor=fullclust;
run;
In this study, IDs are re-used within each of the two centers. The code subject=id(center) tells SAS that subjects with the same ID but different centers are still different subjects. This saves us from creating new unique IDs.
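Equivalently, one could build a unique subject key by hand and use it as the SUBJECT= effect. A minimal sketch (uid is a hypothetical variable name, not from the slides):

data resp ; set resp ;
/* combine center and id into one unique subject key */
uid = catx('-', center, id) ;
run ;
proc genmod data=resp descend ;
class uid treatment(ref="P") center(ref="1") sex(ref="M") baseline(ref="0") / param=ref ;
model outcome = treatment center sex age baseline / dist=bin ;
repeated subject=uid / corr=unstr corrw ;
run ;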
27. SAS demo
• GEE with unstructured correlation
• GEE2 with alternating logistic regression
28. The GENMOD Procedure
Model Information
Data Set WORK.RESP
Distribution Binomial
Link Function Logit
Dependent Variable outcome
Number of Observations Read 444
Number of Observations Used 444
Number of Events 248
Number of Trials 444
Class Level Information
Class Value Design Variables
treatment A 1
P 0
center 1 0
2 1
sex F 1
M 0
baseline 0 0
1 1
Response Profile
Ordered Value outcome Total Frequency
1 1 248
2 0 196
PROC GENMOD is modeling the probability that outcome='1'.
29. Parameter Information
Parameter Effect treatment center sex baseline
Prm1 Intercept
Prm2 treatment A
Prm3 center 2
Prm4 sex F
Prm5 age
Prm6 baseline 1
Algorithm converged.
GEE Model Information
Correlation Structure Unstructured
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4
Algorithm converged.
30. Working Correlation Matrix
Col1 Col2 Col3 Col4
Row1 1.0000 0.3351 0.2140 0.2953
Row2 0.3351 1.0000 0.4429 0.3581
Row3 0.2140 0.4429 1.0000 0.3964
Row4 0.2953 0.3581 0.3964 1.0000
GEE Fit Criteria
QIC 512.3416
QICu 499.6081
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Parameter Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept -0.8882 0.4568 -1.7835 0.0071 -1.94 0.0519
treatment A 1.2442 0.3455 0.5669 1.9214 3.60 0.0003
center 2 0.6558 0.3512 -0.0326 1.3442 1.87 0.0619
sex F 0.1128 0.4408 -0.7512 0.9768 0.26 0.7981
age -0.0175 0.0129 -0.0427 0.0077 -1.36 0.1728
baseline 1 1.8981 0.3441 1.2237 2.5725 5.52 <.0001
QIC: quasi-likelihood under the independence model criterion. Smaller is better.
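Because QIC can be compared across working correlation structures for the same mean model, one could refit with a different TYPE= and compare the QIC values. A sketch (exchangeable is chosen only as an illustration):

proc genmod data=resp descend ;
class id treatment(ref="P") center(ref="1") sex(ref="M") baseline(ref="0") / param=ref ;
model outcome = treatment center sex age baseline / dist=bin ;
repeated subject=id(center) / corr=exch corrw ;
run ;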
31. GEE Model Information
Log Odds Ratio Structure Fully Parameterized Clusters
Subject Effect id(center) (111 levels)
Number of Clusters 111
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4
Log Odds Ratio Parameter Information
Parameter Group
Alpha1 (1, 2)
Alpha2 (1, 3)
Alpha3 (1, 4)
Alpha4 (2, 3)
Alpha5 (2, 4)
Alpha6 (3, 4)
33. Results depend on the correlation structure assumed
Unstructured working correlation (Un) vs. fully parameterized log OR (logor):
Parameter     Un: Estimate  StdErr  Pr>|Z|    logor: Estimate  StdErr  Pr>|Z|
Intercept        -0.8882    0.4568  0.0519       -0.9266    0.4513  0.0400
treatment A       1.2442    0.3455  0.0003        1.2611    0.3406  0.0002
center 2          0.6558    0.3512  0.0619        0.6287    0.3486  0.0713
sex F             0.1128    0.4408  0.7981        0.1024    0.4362  0.8144
age              -0.0175    0.0129  0.1728       -0.0162    0.0125  0.1977
baseline 1        1.8981    0.3441  <.0001        1.8980    0.3404  <.0001
34. Log odds ratio structure
OR(Y_j, Y_k) = [Pr(Y_j=1, Y_k=1) / Pr(Y_j=0, Y_k=1)] ÷ [Pr(Y_j=1, Y_k=0) / Pr(Y_j=0, Y_k=0)]
Alpha1 = log OR(Y_1, Y_2) = 1.6109 (OR ≈ 5.0)
=> having a good outcome at visit 1 (Y_1 = 1) is associated with having a good outcome at visit 2.
35. Log-linear model for epileptic seizure episodes
• The data consist of the number of epileptic seizures in an eight-week baseline period, before any treatment;
• and in each of four two-week treatment periods, in which patients received either a placebo or the drug Progabide in addition to other therapy.
Trt = 0: placebo
Trt = 1: Progabide
SAS file name: SAS demo GEE Poisson
37. /*** exclude an outlier (ID 207)
and create the offset variable ***/
data Seizure;
set Seizure;
if ID ne 207;
if Visit = 0 then do;
X1=0;
Ltime = log(8);
end;
else do;
X1=1;
Ltime=log(2);
end;
run;
proc print ; run ;
proc genmod data=Seizure;
class id;
model count=x1 | trt / d=poisson offset=ltime;
repeated subject=id / corrw covb type=exch;
run;
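In the MODEL statement, x1 | trt expands to the main effects x1 and trt plus their interaction x1*trt. The offset ltime = log(observation time) turns the Poisson model into a model for seizure rates; schematically (this display is an addition, not from the slides):

\[ \log E(Y_{ij}) = \log t_{ij} + \beta_0 + \beta_1 x_{1ij} + \beta_2 \,\text{trt}_i + \beta_3 \, x_{1ij}\,\text{trt}_i, \]

so that log(E(Y_ij)/t_ij), the log seizure rate, is linear in the covariates, with t_ij = 8 weeks for the baseline period and 2 weeks for each treatment period.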