Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterprise Miner by Patricia B. Cerrito (Presentation Transcript)
Clinical Trials Versus Health Outcomes Research: SAS/STAT Versus SAS Enterprise Miner Patricia B. Cerrito [email_address] University of Louisville
To examine some issues with traditional statistical models and their basic assumptions
To examine the Central Limit Theorem and its necessity in statistical models
To look at the differences and similarities between clinical trials and health outcomes research
Surrogate Versus Real Endpoints
Because clinical trials tend to be short term, they use high-risk patients and surrogate endpoints
Use of statins reduces cholesterol levels, but does it increase longevity and disease-free survival?
Health outcomes data can examine real endpoints from the general population
One Versus Many Endpoints
Clinical trials generally have one survival endpoint: time to recurrence, time to death, or time to disease progression
Health outcomes can examine multiple endpoints simultaneously using survival data mining
Homogeneous Versus Heterogeneous Data
Clinical trials generally use inclusion/exclusion criteria to define a homogeneous sample
Health outcomes have to rely upon heterogeneous data
Population distributions are closer to gamma than to normal, and this must be taken into consideration
Large Versus Small Samples
Clinical trials tend to use the smallest sample possible to achieve the desired power
Database designed for analysis and data are very clean
Health outcomes have an abundance of data and variables
Power not an issue
Data are very messy and require considerable preprocessing
Clinical trials not large enough to find all potential rare occurrences
Health outcomes have enough data to find rare occurrences and to predict the probability of occurrence
Requires modifications to standard linear models
Predictive modeling much better at actual prediction
Ottenbacher KJ, Ottenbacher HR, Tooth L, Ostir GV. A review of two journals found that articles using multivariable logistic regression frequently did not report commonly recommended assumptions. Journal of Clinical Epidemiology. 2004 Nov;57(11):1147-52.
Statistical significance testing or confidence intervals were reported in all articles. Methods for selecting independent variables were described in 82%, and specific procedures used to generate the models were discussed in 65%.
Fewer than 50% of the articles indicated if interactions were tested or met the recommended events per independent variable ratio of 10:1.
Fewer than 20% of the articles described conformity to a linear gradient, examined collinearity, reported information on validation procedures, goodness-of-fit, discrimination statistics, or provided complete information on variable coding.
Brown JM, O'Brien SM, Wu C, Sikora JA, Griffith BP, Gammie JS. Isolated aortic valve replacement in North America comprising 108,687 patients in 10 years: changes in risks, valve types, and outcomes in the Society of Thoracic Surgeons National Database. Journal of Thoracic & Cardiovascular Surgery. 2009 Jan;137(1):82-90.
108,687 isolated aortic valve replacements were analyzed. Time-related trends were assessed by comparing distributions of risk factors, valve types, and outcomes in 1997 versus 2006.
Differences in case mix were summarized by comparing average predicted mortality risks with a logistic regression model.
Differences across subgroups and time were assessed.
RESULTS: There was a dramatic shift toward use of bioprosthetic valves.
Aortic valve replacement recipients in 2006 were older (mean age 65.9 vs 67.9 years, P < .001) with higher predicted operative mortality risk (2.75 vs 3.25, P < .001)
Observed mortality and permanent stroke rate fell (by 24% and 27%, respectively).
Female sex, age older than 70 years, and ejection fraction less than 30% were all related to higher mortality, higher stroke rate and longer postoperative stay.
There was a 39% reduction in mortality among patients with preoperative renal failure.
Central Limit Theorem
As the sample size increases to infinity, the distribution of the sample average approaches a normal distribution with mean μ and variance σ²/n.
As n approaches infinity, the variance approaches zero.
Therefore, if n is very large, the distribution of the sample average collapses toward a vertical spike at the point μ.
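In standard notation (a restatement for reference, not from the original slides), the result being used is:

\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\bar{X}_n \;\approx\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right) \ \text{for large } n,
\qquad
\operatorname{Var}\!\left(\bar{X}_n\right) = \frac{\sigma^2}{n} \;\to\; 0 \ \text{as } n \to \infty .
\]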
Central Limit Theorem
In addition, the sample mean is very susceptible to the influence of outliers.
Moreover, the confidence limits are defined based upon the assumption of normality and symmetry. Therefore, the existence of many outliers will skew the confidence interval.
Nonparametric models still require symmetry.
Many populations are highly skewed, so these models also have problems
We use data from the National Inpatient Sample from 2005
A stratified sample from 1,000 hospitals in 37 states
Approximately 8 million inpatient stays
Distribution of Patient Stays
Kernel Density Estimation
Instead of assuming that the population follows a known distribution, we can estimate it.
Kernel density estimation is an excellent method to use to do this
Given that the National Inpatient Sample has 8 million records, we can consider it to be an infinite population. Therefore, we can sample from this population to see whether it can be adequately estimated using the Central Limit Theorem
We start by extracting 100 different samples of size N=5
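A minimal sketch of how these repeated samples might be drawn in SAS; the data set name work.nis and the length-of-stay variable los are assumptions, not from the original slides.

proc surveyselect data=work.nis out=work.samples
                  method=srs n=5 reps=100 seed=20050;  /* 100 replicates of size 5 */
run;

/* One mean per replicate; the spread of these 100 sample means is what the
   Central Limit Theorem describes */
proc means data=work.samples noprint;
  by replicate;
  var los;
  output out=work.sample_means mean=mean_los;
run;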
Confidence Limit
The confidence limit excludes much of the actual population distribution
Confidence Limit With Larger n
An over-reliance on the Central Limit Theorem can give a very misleading picture of the population distribution.
Kernel density estimation (PROC KDE) allows an examination of the entire population distribution instead of just using the mean to represent the population.
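A minimal PROC KDE sketch under the same assumptions (work.nis and los are hypothetical names):

proc kde data=work.nis;
  univar los / out=work.kde_los;   /* kernel density estimate of length of stay */
run;

proc sgplot data=work.kde_los;
  series x=value y=density;        /* plot the full estimated distribution */
run;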
Without the assumption of normality, we need to use predictive modeling.
This is true for both logistic and linear regression where the assumption of normality is required.
The two regression techniques do not work well with skewed populations.
We first look at logistic regression for rare occurrences
Problems With Regression
Logistic regression is not designed to predict rare occurrences
With a rare occurrence, logistic regression will predict virtually all observations as non-occurrences
The accuracy will be high but the predictive ability of the model will be virtually nil.
For logistic regression, a threshold value is defined, and predicted values above the threshold are classified as 1
Predicted values below the threshold are classified as 0
The threshold value is chosen to optimize the error rate
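A minimal sketch of this thresholding with PROC LOGISTIC; the data set, the outcome indicator died, the predictors, and the 0.5 cutoff are assumptions for illustration only.

proc logistic data=work.nis descending;
  model died = age los / ctable pprob=0.5;  /* classification table at threshold 0.5 */
  output out=work.preds p=phat;             /* predicted probability of mortality   */
run;

data work.preds;
  set work.preds;
  predicted = (phat > 0.5);                 /* above the threshold -> classified as 1 */
run;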
Classification With 3 Variables
Classification With 3 Variables continued...
Linear regression: Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₖXₖ
Logistic regression: logₑ(p/(1 − p)) = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₖXₖ
Poisson (log-linear) regression: logₑ(Y) = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₖXₖ
The parameter of the Poisson distribution, λ, will represent the average mortality rate, say 2%.
Then the sample size times 2% will give the estimate for the number of deaths, say 1,000,000 × 0.02 = 20,000
However, the problem still persists.
For example, septicemia has a 26% mortality rate and pneumonia has a 7.5% rate
The three conditions include approximately 25% of total hospitalizations, leaving 75% not accounted for.
The Poisson distribution can be accurate on those patients but cannot determine anything about the remaining 75%
If more patient conditions are added, the 25% will increase but not to the point that the model will have good predictability
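One way to fit the log-linear (Poisson) model shown earlier is PROC GENMOD; this is a sketch only, with hypothetical hospital-level counts and variable names.

proc genmod data=work.hospital_counts;
  model deaths = mean_age pct_septicemia pct_pneumonia
        / dist=poisson link=log offset=log_discharges;  /* log_discharges = log of the
                                                           number of discharges, so the
                                                           model describes a mortality rate */
run;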
Enterprise Miner takes a different approach
Uses equal group sizes
100% of the rarest level
Equal sample size of other level
Randomizes the selection of the sample
Uses prior probabilities to choose the optimal model
50/50 Split in the Data
Filter the data to the mortality outcome
Filter the data to the non-mortality outcome
Use PROC SURVEYSELECT to extract a subsample of the non-mortality outcome
Append the mortality outcome data to the subsample
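A minimal Base SAS sketch of these steps; work.nis and the mortality indicator died are assumed names.

data work.deaths work.survivors;
  set work.nis;
  if died = 1 then output work.deaths;      /* mortality outcome     */
  else output work.survivors;               /* non-mortality outcome */
run;

/* Count the deaths, then draw the same number of survivors */
proc sql noprint;
  select count(*) into :n_deaths from work.deaths;
quit;

proc surveyselect data=work.survivors out=work.survivor_sample
                  method=srs n=&n_deaths seed=20051;
run;

/* Append the mortality cases to the subsample for a 50/50 split */
data work.split5050;
  set work.deaths work.survivor_sample;
run;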
75/25 Split in the Data
90/10 Split in the Data
The reduced sample is partitioned into training/validation/testing sets
Only need training/testing sets for regression models
Model is validated on the testing set
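In Enterprise Miner, partitioning is handled by the Data Partition node; a rough Base SAS equivalent is sketched below (the 60/20/20 proportions are an assumption, not from the original slides).

data work.train work.validate work.test;
  set work.split5050;
  u = ranuni(20052);
  if u < 0.6 then output work.train;          /* 60% training   */
  else if u < 0.8 then output work.validate;  /* 20% validation */
  else output work.test;                      /* 20% testing    */
run;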
Misclassification in Regression
Rule Induction Results
Data are sorted and divided into deciles
True positive patients with highest confidence come first
Next, positive patients with lower confidence.
True negative cases with lowest confidence come next
Next, negative cases with highest confidence.
Target density = (number of actually positive instances in the decile) / (total number of instances in the decile)
Lift = (target density for the decile) / (target density over all the test data)
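A sketch of the decile and lift calculation on scored data; work.preds, phat, and died are the hypothetical names used in the earlier logistic regression sketch.

/* Assign each scored patient to a decile of predicted probability */
proc rank data=work.preds out=work.deciles groups=10 descending;
  var phat;
  ranks decile;
run;

/* Overall event rate, then target density and lift per decile */
proc sql noprint;
  select mean(died) into :overall_rate from work.deciles;
quit;

proc sql;
  create table work.lift as
  select decile,
         mean(died)                                as target_density,
         calculated target_density / &overall_rate as lift
  from work.deciles
  group by decile
  order by decile;
quit;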
Way to find patients most at risk for mortality (or infection)
Predictive modeling in Enterprise Miner has some capabilities that are possible, but extremely difficult, to achieve in SAS/STAT
Sampling a rare occurrence to a 50/50 split
Partitioning to validate the results
Comparing multiple models to find the one that is optimal
Clinical trials do differ from health outcomes research, and the statistical techniques must be adapted to outcomes research
Model assumptions are important, but too often ignored
We need to look at results in detail
Superficial consideration of results can lead to very erroneous conclusions