Survival Analysis
Dimension Reduction Techniques
Claressa Ullmayer and Iván Rodríguez
The University of Alaska, Fairbanks
The University of Arizona
30 July 2015
Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
Background
Given a dataset, we want to estimate the true survival function.
Complications:
Data Dimensionality
Data Censoring
Unknown True Survival Curve
We want to minimize bias and mean-squared error (MSE)
Applications
Our running example: microarray gene expression datasets
with n patients and p genes such that n ≪ p
However, there exist many other implementations:
Engineering
Business
Public Health
Security
Biostatistics
The Survival Function
A survival function, S(t), describes the probability of an object
experiencing an explicit event after a particular time:
S(t) := P(T > t) = ∫_t^∞ f(τ) dτ = 1 − F(t),
where t is the specific time, T is a random variable, f(τ) is the
PDF of T, and F(t) is the CDF of T
In our running example:
event of interest = death
S(t) = probability that a cancer patient survives—death
not observed—after a particular time
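As an illustrative aside (not part of the slides), the identity S(t) = 1 − F(t) can be checked numerically for an exponentially distributed survival time:

```python
import math

def exp_survival(t, lam):
    """S(t) = P(T > t) for T ~ Exponential(rate lam)."""
    return math.exp(-lam * t)

def exp_cdf(t, lam):
    """F(t) = P(T <= t) for T ~ Exponential(rate lam)."""
    return 1.0 - math.exp(-lam * t)

lam = 0.5
for t in [0.0, 1.0, 2.0]:
    # The survival function is one minus the CDF at every t
    assert abs(exp_survival(t, lam) - (1.0 - exp_cdf(t, lam))) < 1e-12
```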
An Example of Survival Curves
Below are four survival arms demonstrating the efficacy of
different drug choices for a particular cancer
The Accelerated Failure Time (AFT) Model
Two classic models are used to estimate the survival function:
Cox Proportional Hazards (CPH)
Accelerated Failure Time (AFT)
Chief differences:
Ease of interpretation—survivorship vis-à-vis hazard
AFT directly models survival times
AFT assumes covariates accelerate or decelerate the
‘disease’ life course by a constant factor
CPH posits no assumption about the baseline hazard function
The Accelerated Failure Time (AFT) Model Cont.
Underlying formula:
ln(Tᵢ) = μ + zᵢβ + eᵢ
i = 1, . . . , n total observations
Ti is the ith observation’s survival time
parameter µ is the theoretical mean
vector zi denotes the data covariates
vector β indicates the covariate or ‘regression’ coefficients
ei designates the random error for the ith observation
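A minimal simulation of this log-linear relationship (a Python sketch with illustrative sizes, not the study's R code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5                      # illustrative sizes, not the study's 100 x 1000
mu = 1.0                           # theoretical mean
beta = rng.normal(size=p)          # 'regression' coefficients
z = rng.normal(size=(n, p))        # covariates
e = rng.normal(scale=0.1, size=n)  # random errors

log_T = mu + z @ beta + e          # ln(T_i) = mu + z_i beta + e_i
T = np.exp(log_T)                  # survival times are strictly positive
assert np.all(T > 0)
```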
Dimension Reduction Techniques
Three dimension reduction techniques are compared given
predictors in X and responses in Y:
Principal Component Analysis (PCA)
Partial Least Squares (PLS)
Johnson-Lindenstrauss inspired Random Matrices (RM)
Principal Component Analysis (PCA)
PCA obtains orthogonal variance-maximized components in X
PCA is used when
X is highly collinear
covariates outnumber observations
Model: T = XW
Xn×p now related to Wp×p ‘loadings’ and Tn×p ‘scores’
Columns of W are eigenvectors of XᵀX
Desired ‘principal’ components are retained
These have maximal variability in their respective directions
Note: response variable Y disregarded
Thus, known as an ‘unsupervised’ method
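The construction above can be sketched directly from its definition (illustrative data; in practice a library routine would be used):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
Xc = X - X.mean(axis=0)              # center the columns

# Columns of W are eigenvectors of Xc^T Xc, ordered by decreasing eigenvalue
eigvals, W = np.linalg.eigh(Xc.T @ Xc)
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

T_scores = Xc @ W                    # model: T = XW
# Each score column's variance is its eigenvalue divided by (n - 1)
var_scores = T_scores.var(axis=0, ddof=1)
assert np.allclose(var_scores, eigvals / (Xc.shape[0] - 1))
```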
Partial Least Squares (PLS)
PLS analyzes linear combinations of X and Y
PLS is used when
X is highly collinear
covariates vastly outnumber observations
Y is multidimensional
Model: X = TPᵀ + E and Y = UQᵀ + F
X now related to ‘scores’ T, ‘loadings’ P, and error E
Y now related to ‘scores’ U, ‘loadings’ Q, and error F
PLS is iterative
covariance maximized between T and U
resulting ‘latent vectors’ retained—subtracted from X and Y
process repeated until X is a null matrix
Note: PLS performs singular value decompositions of XᵀY
Hence, known as a ‘supervised’ method
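One iteration of this supervised scheme can be sketched as follows: the leading singular vectors of XᵀY give the first pair of latent directions, after which X is deflated (illustrative data, not the study's):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 60, 10, 2
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
Xc, Yc = X - X.mean(0), Y - Y.mean(0)

# Leading singular vectors of Xc^T Yc maximize cov(Xc w, Yc c)
U, s, Vt = np.linalg.svd(Xc.T @ Yc)
w, c = U[:, 0], Vt[0, :]

t = Xc @ w                            # X scores
u = Yc @ c                            # Y scores

# Deflation: subtract the rank-one part of X explained by t
p_load = Xc.T @ t / (t @ t)
X_deflated = Xc - np.outer(t, p_load)
# The deflated X carries no component along t
assert np.allclose(X_deflated.T @ t, 0.0, atol=1e-8)
```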
Johnson-Lindenstrauss Lemma
Random Matrices inspired by the Johnson-Lindenstrauss
Lemma are the third dimension reduction technique
The Johnson-Lindenstrauss Lemma
For any ε ∈ (0, 1) and any positive integer n, let k be a positive integer with
k ≥ 4 ln(n) / (ε²/2 − ε³/3).
Then, for any set S of n points in Rᵈ, there exists a mapping
f : Rᵈ → Rᵏ such that, for all points u, v ∈ S,
(1 − ε)‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε)‖u − v‖².
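The lower bound on k in the lemma is easy to compute; the helper below (a hypothetical name, not from the slides) evaluates it for given n and ε:

```python
import math

def jl_min_dim(n, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    assert 0.0 < eps < 1.0
    return math.ceil(4.0 * math.log(n) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))

# A larger allowed distortion eps permits a smaller target dimension k
assert jl_min_dim(100, 0.9) < jl_min_dim(100, 0.5)
```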
Generating Random Matrices
Three Random Matrices were generated according to the
papers of Achlioptas and Dasgupta-Gupta
Properties of the Achlioptas matrices:
Rᵢⱼ = (1/√k) × { +1 with probability 1/2, −1 with probability 1/2 }
Rᵢⱼ = √(3/k) × { +1 with probability 1/6, 0 with probability 2/3, −1 with probability 1/6 }
Properties of the Dasgupta-Gupta matrix:
entries from the N(0, 1) distribution with normalized rows
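The three constructions can be sketched in a few lines (a Python sketch; the study used R, and the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 1000, 37   # project from d dimensions down to k

# Achlioptas matrix 1: +/-1 entries with probability 1/2 each, scaled by 1/sqrt(k)
R1 = rng.choice([1.0, -1.0], size=(d, k)) / np.sqrt(k)

# Achlioptas matrix 2: sparse entries {+1, 0, -1} with probabilities
# {1/6, 2/3, 1/6}, scaled by sqrt(3/k)
R2 = rng.choice([1.0, 0.0, -1.0], size=(d, k), p=[1/6, 2/3, 1/6]) * np.sqrt(3.0 / k)

# Dasgupta-Gupta matrix: N(0, 1) entries with rows normalized to unit length
G = rng.normal(size=(d, k))
R3 = G / np.linalg.norm(G, axis=1, keepdims=True)

assert R1.shape == R2.shape == R3.shape == (d, k)
```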
Johnson-Lindenstrauss Success Simulations
Accuracy of the Johnson-Lindenstrauss Lemma was tested
with the three matrices across varying values of ε and k
The Johnson-Lindenstrauss Lemma passes 100% of the time
under the constraints for k and ε
To reduce X to 100 × 37, ε ≈ 0.65 is needed to satisfy the
Johnson-Lindenstrauss Lemma
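An empirical version of this check can be sketched as follows (a Gaussian projection stands in for the three matrices; seed and bounds are illustrative): project 100 points from R¹⁰⁰⁰ to R³⁷ and inspect the squared-distance distortions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, k = 100, 1000, 37
S = rng.normal(size=(n, d))                 # n points in R^d
R = rng.normal(size=(d, k)) / np.sqrt(k)    # Gaussian random projection
fS = S @ R

def sq_pdist(A):
    """Squared pairwise distances between the rows of A."""
    sq = (A ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)

iu = np.triu_indices(n, k=1)
ratios = sq_pdist(fS)[iu] / sq_pdist(S)[iu]
# With k = 37 the distortion is large but stays within loose bounds here
assert 0.2 < ratios.min() and ratios.max() < 3.0
```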
Simulating Data
Data was simulated in order to test which method best
minimized bias and MSE
data matrix of dimension 100 × 1000 (observations × covariates)
β1×1000 random regression coefficients from
U(−1.0 × 10⁻⁷, 1.0 × 10⁻⁷)
z1×1000 covariates for each of the 100 observations
z ∼ N(0, 1)
z was exponentiated to make all the values log-normally
distributed
Tᵢ, the survival times, are exponentially distributed with
λᵢ = e^(−zᵢβ)
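A Python sketch of this simulation setup (the study used R; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 1000

# Fixed regression coefficients from a narrow uniform range
beta = rng.uniform(-1e-7, 1e-7, size=p)

# Covariates: standard normals, exponentiated to be log-normally distributed
z = np.exp(rng.normal(size=(n, p)))

# Survival times: exponential with rate lambda_i = exp(-z_i beta)
lam = np.exp(-z @ beta)
T = rng.exponential(scale=1.0 / lam)
assert T.shape == (n,) and np.all(T > 0)
```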
Applying PCA
Now that we had our data simulated, our next goal was to apply
our dimension reduction techniques.
First, we implemented PCA and obtained 99 components,
whose variances are the eigenvalues of the
variance-covariance matrix
The components are linear combinations of the original
covariates (genes)
Below are the first ten components:
> z_star_PCA$eig
         eigenvalue  % of variance  cumulative % of variance
comp 1       16.825          1.683                     1.683
comp 2       16.494          1.649                     3.332
comp 3       16.343          1.634                     4.966
comp 4       16.152          1.615                     6.581
comp 5       15.615          1.561                     8.143
comp 6       15.434          1.543                     9.686
comp 7       15.154          1.515                    11.202
comp 8       15.019          1.502                    12.704
comp 9       14.958          1.496                    14.199
comp 10      14.872          1.487                    15.686
We decided to retain components accounting for 50% of the
overall variance
Hence, we chose to reduce the data from 99 to 37 components
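The component-selection rule can be sketched as a small helper (hypothetical name; the toy eigenvalue spectrum is not the study's):

```python
import numpy as np

def n_components_for_variance(eigvals, target=0.5):
    """Smallest number of leading eigenvalues whose cumulative share
    of the total variance reaches `target`."""
    frac = np.cumsum(np.sort(eigvals)[::-1]) / np.sum(eigvals)
    return int(np.searchsorted(frac, target) + 1)

# Toy spectrum: the two largest of four eigenvalues carry 70% of the variance
assert n_components_for_variance(np.array([4.0, 3.0, 2.0, 1.0]), 0.5) == 2
```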
Applying PLS and AFT
Next, we implemented PLS using the same number of
components that we chose for PCA
From both PCA and PLS, we obtained the weights on all
the genes for each component (open PDF)
We then multiplied our original 100 × 1000 matrix by the
resulting 1000 × 37 matrix of weights to get a 100 × 37
reduced matrix for both PLS and PCA
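The projection step is a single matrix product (random stand-ins below for the data and weight matrices):

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, k = 100, 1000, 37
Z = rng.normal(size=(n, p))   # original data matrix
W = rng.normal(size=(p, k))   # stand-in for the PCA/PLS weight matrix

Z_reduced = Z @ W             # (100 x 1000) @ (1000 x 37) -> (100 x 37)
assert Z_reduced.shape == (n, k)
```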
Estimating the Survival Function
We took these new matrices and fed them into the AFT
model to get our estimated regression coefficients
We can then estimate the survival function, defined as
Ŝ₀(t) = e^(−t e^(−z̄* β̂*))
z̄* is the column-centered original matrix of observations
and covariates
β̂* is a matrix produced by multiplying our original simulated
regression coefficients by our matrix of obtained weights
−z̄* β̂* becomes a scalar
We know our real survival function is S₀(t) = e^(−λ̄t)
We repeated this procedure for 5000 iterations
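The two survival-curve formulas can be evaluated side by side (the scalar below is a hypothetical stand-in for z̄*β̂*, not the study's estimate):

```python
import numpy as np

t = np.linspace(0.0, 5.0, 50)
zb = 0.1                            # hypothetical scalar value of zbar* betahat*
lam_bar = np.exp(-zb)               # matching rate for the 'true' curve

S_hat = np.exp(-t * np.exp(-zb))    # estimated: S0_hat(t) = e^{-t e^{-zbar* betahat*}}
S_true = np.exp(-lam_bar * t)       # true:      S0(t) = e^{-lambda_bar t}

# Both are proper survival curves: they start at 1 and are non-increasing
assert S_hat[0] == 1.0 and np.all(np.diff(S_hat) <= 0)
```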
Results
Since we have the estimated and the real survival function,
we can estimate the bias and MSE for PCA, PLS, and the
three random matrices
To compare the performance of the dimension reduction
techniques, we first partitioned the y-axis of the survival
curve into equally spaced sections, uᵢ for i = 1, . . . , 20
Then, we found the corresponding ti on the x-axis of the
survival curve
For each of the 20 tᵢ, we summed the bias and MSE at
each point to obtain the distribution of the errors after 5000
iterations
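The error summary at the grid points can be sketched as follows (synthetic noisy estimates stand in for the 5000 fitted curves):

```python
import numpy as np

rng = np.random.default_rng(8)
lam = 1.0
u = np.linspace(0.05, 0.95, 20)   # equally spaced survival levels u_i
t_grid = -np.log(u) / lam         # t_i solving S0(t_i) = u_i

n_iter = 200                      # 5000 in the study
S_true = np.exp(-lam * t_grid)
# Noisy estimated curves: perturb the rate and re-evaluate at each t_i
est = np.exp(-(lam + rng.normal(scale=0.1, size=(n_iter, 1))) * t_grid)

bias = (est - S_true).mean(axis=0)
mse = ((est - S_true) ** 2).mean(axis=0)
# MSE decomposes as variance plus squared bias, so MSE >= bias^2
assert np.all(mse >= bias ** 2 - 1e-12)
```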
Bias Plot, PCA and PLS
Mean-Squared Error Plot, PCA and PLS
Bias Plot, Random Matrices
Mean-Squared Error Plot, Random Matrices
Bias Plot, All Methods
Mean-Squared Error Plot, All Methods
Discussion
Censoring occurs when the event of interest for a given subject
is not observed for some extraneous reason
Naturally, censoring is a problem in real-life investigations
and studies. Unfortunately, we did not have the time to
incorporate the effect of censoring into our data simulations.
Furthermore, a complication arose in the generation of the
fixed β coefficients; essentially, the R software necessitated
generating grossly smaller βs due to the double exponent in
the survival curve estimate Ŝ₀(t) = e^(−t e^(−z̄* β̂*)).
An initial goal was to apply our findings to real microarray
gene datasets—due to time constraints, this objective was
not fulfilled
References
Cox, D.R. Regression models and life tables (with discussion).
Journal of the Royal Statistical Society, Series B 34: 187-220, 1972.
Johnson, W.B. and J. Lindenstrauss. Extensions of Lipschitz
maps into a Hilbert space. Contemp Math 26: 189-206, 1984.
Pearson, K. On lines and planes of closest fit to systems of
points in space. Philosophical Magazine 2: 559-572, 1901.
Wold, H. Estimation of principal components and related
models by iterative least squares. In P.R. Krishnaiah (ed.): 391-420,
1966.
References Cont.
Achlioptas, D. Database-friendly random projections:
Johnson-Lindenstrauss with binary coins. Journal of Computer
and System Sciences 66(4): 671-687, 2003.
Dasgupta, S. and A. Gupta. An elementary proof of a theorem
of Johnson and Lindenstrauss. Random Structures and
Algorithms 22(1): 60-65, 2003.
Nguyen, D.V. Partial least squares dimension reduction for
microarray gene expression data with a censored response.
Math Biosci 193: 119-137, 2005.
References Cont.
Nguyen, D.V., and D.M. Rocke. On partial least squares
dimension reduction for microarray-based classification: A
simulation study. Comput Stat Data Analysis 46: 407-425,
2004.
Nguyen, Tuan S. and Javier Rojo. Dimension Reduction of
Microarray Gene Expression Data: The Accelerated Failure
Time Model. Journal of Bioinformatics and Computational
Biology 7(6): 939-954, 2009.
Nguyen, Tuan S. and Javier Rojo. Dimension Reduction of
Microarray Data in the Presence of a Censored Survival
Response: A Simulation Study. Statistical Applications in
Genetics and Molecular Biology 8(1): 2009.
Thank You
This research was supported by the National Security Agency
through REU Grant H98230-15-1-0048 to The University of
Nevada at Reno, Javier Rojo PI.
We would like to sincerely thank the NSA for funding our
research this summer
Thank you all for taking the time to be here and listen to
our presentation

More Related Content

What's hot

Eigen values and eigen vectors engineering
Eigen values and eigen vectors engineeringEigen values and eigen vectors engineering
Eigen values and eigen vectors engineering
shubham211
 
Power method
Power methodPower method
Power method
nashaat algrara
 
Eigen values and eigen vectors
Eigen values and eigen vectorsEigen values and eigen vectors
Eigen values and eigen vectors
Riddhi Patel
 
Eigen value and eigen vector
Eigen value and eigen vectorEigen value and eigen vector
Eigen value and eigen vector
Rutvij Patel
 
Maths-->>Eigenvalues and eigenvectors
Maths-->>Eigenvalues and eigenvectorsMaths-->>Eigenvalues and eigenvectors
Maths-->>Eigenvalues and eigenvectors
Jaydev Kishnani
 
Eigenvalues and eigenvectors
Eigenvalues and eigenvectorsEigenvalues and eigenvectors
Eigenvalues and eigenvectors
iraq
 
Eigen values and eigenvectors
Eigen values and eigenvectorsEigen values and eigenvectors
Eigen values and eigenvectorsAmit Singh
 
Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2
SamsonAjibola
 
Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)
Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)
Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)Prasanth George
 
MASSS_Presentation_20160209
MASSS_Presentation_20160209MASSS_Presentation_20160209
MASSS_Presentation_20160209Yimin Wu
 
Eigen value and vectors
Eigen value and vectorsEigen value and vectors
Eigen value and vectors
Praveen Prashant
 
Ridge regression
Ridge regressionRidge regression
Ridge regression
Ananda Swarup
 
Eigenvectors & Eigenvalues: The Road to Diagonalisation
Eigenvectors & Eigenvalues: The Road to DiagonalisationEigenvectors & Eigenvalues: The Road to Diagonalisation
Eigenvectors & Eigenvalues: The Road to DiagonalisationChristopher Gratton
 
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
Ceni Babaoglu, PhD
 
Eigenvalue problems .ppt
Eigenvalue problems .pptEigenvalue problems .ppt
Eigenvalue problems .ppt
Self-employed
 
Eigen value , eigen vectors, caley hamilton theorem
Eigen value , eigen vectors, caley hamilton theoremEigen value , eigen vectors, caley hamilton theorem
Eigen value , eigen vectors, caley hamilton theorem
gidc engineering college
 
Jacobi iterative method
Jacobi iterative methodJacobi iterative method
Jacobi iterative method
Luckshay Batra
 

What's hot (20)

Eigen values and eigen vectors engineering
Eigen values and eigen vectors engineeringEigen values and eigen vectors engineering
Eigen values and eigen vectors engineering
 
Power method
Power methodPower method
Power method
 
Eigen values and eigen vectors
Eigen values and eigen vectorsEigen values and eigen vectors
Eigen values and eigen vectors
 
Eigen value and eigen vector
Eigen value and eigen vectorEigen value and eigen vector
Eigen value and eigen vector
 
Maths-->>Eigenvalues and eigenvectors
Maths-->>Eigenvalues and eigenvectorsMaths-->>Eigenvalues and eigenvectors
Maths-->>Eigenvalues and eigenvectors
 
Eigenvalues and eigenvectors
Eigenvalues and eigenvectorsEigenvalues and eigenvectors
Eigenvalues and eigenvectors
 
Eigen values and eigenvectors
Eigen values and eigenvectorsEigen values and eigenvectors
Eigen values and eigenvectors
 
Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2
 
Rankmatrix
RankmatrixRankmatrix
Rankmatrix
 
Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)
Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)
Eigenvalues and Eigenvectors (Tacoma Narrows Bridge video included)
 
MASSS_Presentation_20160209
MASSS_Presentation_20160209MASSS_Presentation_20160209
MASSS_Presentation_20160209
 
Eigen value and vectors
Eigen value and vectorsEigen value and vectors
Eigen value and vectors
 
Ridge regression
Ridge regressionRidge regression
Ridge regression
 
Eigenvectors & Eigenvalues: The Road to Diagonalisation
Eigenvectors & Eigenvalues: The Road to DiagonalisationEigenvectors & Eigenvalues: The Road to Diagonalisation
Eigenvectors & Eigenvalues: The Road to Diagonalisation
 
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
3. Linear Algebra for Machine Learning: Factorization and Linear Transformations
 
Maths
MathsMaths
Maths
 
Eigenvalue problems .ppt
Eigenvalue problems .pptEigenvalue problems .ppt
Eigenvalue problems .ppt
 
JISA_Paper
JISA_PaperJISA_Paper
JISA_Paper
 
Eigen value , eigen vectors, caley hamilton theorem
Eigen value , eigen vectors, caley hamilton theoremEigen value , eigen vectors, caley hamilton theorem
Eigen value , eigen vectors, caley hamilton theorem
 
Jacobi iterative method
Jacobi iterative methodJacobi iterative method
Jacobi iterative method
 

Similar to Ullmayer_Rodriguez_Presentation

9_Poisson_printable.pdf
9_Poisson_printable.pdf9_Poisson_printable.pdf
9_Poisson_printable.pdf
Elio Laureano
 
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
inventionjournals
 
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
inventionjournals
 
PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...
PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...
PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...
The Statistical and Applied Mathematical Sciences Institute
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
Arthur Charpentier
 
Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
Federico Cerutti
 
Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...
Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...
Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...Gang Cui
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
The Statistical and Applied Mathematical Sciences Institute
 
Formulation of model likelihood functions
Formulation of model likelihood functionsFormulation of model likelihood functions
Formulation of model likelihood functions
Andreas Scheidegger
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
Arthur Charpentier
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
Chandan Reddy
 
A lambda calculus for density matrices with classical and probabilistic controls
A lambda calculus for density matrices with classical and probabilistic controlsA lambda calculus for density matrices with classical and probabilistic controls
A lambda calculus for density matrices with classical and probabilistic controls
Alejandro Díaz-Caro
 
Lesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing dataLesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing data
mjlobetos
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptx
nikshaikh786
 
ch3.ppt
ch3.pptch3.ppt
Regression
RegressionRegression
20070823
2007082320070823
20070823neostar
 

Similar to Ullmayer_Rodriguez_Presentation (20)

9_Poisson_printable.pdf
9_Poisson_printable.pdf9_Poisson_printable.pdf
9_Poisson_printable.pdf
 
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
 
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
 
PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...
PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...
PMED Transition Workshop - Non-parametric Techniques for Estimating Tumor Het...
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 
Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
 
Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...
Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...
Simulation Study for Extended AUC In Disease Risk Prediction in survival anal...
 
overviewPCA
overviewPCAoverviewPCA
overviewPCA
 
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
Program on Mathematical and Statistical Methods for Climate and the Earth Sys...
 
Formulation of model likelihood functions
Formulation of model likelihood functionsFormulation of model likelihood functions
Formulation of model likelihood functions
 
Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Paper 7 (s.k. ashour)
Paper 7 (s.k. ashour)Paper 7 (s.k. ashour)
Paper 7 (s.k. ashour)
 
Big Data Analytics for Healthcare
Big Data Analytics for HealthcareBig Data Analytics for Healthcare
Big Data Analytics for Healthcare
 
A lambda calculus for density matrices with classical and probabilistic controls
A lambda calculus for density matrices with classical and probabilistic controlsA lambda calculus for density matrices with classical and probabilistic controls
A lambda calculus for density matrices with classical and probabilistic controls
 
MT2
MT2MT2
MT2
 
Lesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing dataLesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing data
 
Module 2_ Regression Models..pptx
Module 2_ Regression Models..pptxModule 2_ Regression Models..pptx
Module 2_ Regression Models..pptx
 
ch3.ppt
ch3.pptch3.ppt
ch3.ppt
 
Regression
RegressionRegression
Regression
 
20070823
2007082320070823
20070823
 

More from ​Iván Rodríguez

Rodriguez_THINK_TANK_Testimonial
Rodriguez_THINK_TANK_TestimonialRodriguez_THINK_TANK_Testimonial
Rodriguez_THINK_TANK_Testimonial​Iván Rodríguez
 
Rodriguez_UROC_Final_Presentation
Rodriguez_UROC_Final_PresentationRodriguez_UROC_Final_Presentation
Rodriguez_UROC_Final_Presentation​Iván Rodríguez
 
Rodriguez_THINK_TANK_Difficult_Problem_12
Rodriguez_THINK_TANK_Difficult_Problem_12Rodriguez_THINK_TANK_Difficult_Problem_12
Rodriguez_THINK_TANK_Difficult_Problem_12​Iván Rodríguez
 
Rodriguez_THINK_TANK_Difficult_Problem_9
Rodriguez_THINK_TANK_Difficult_Problem_9Rodriguez_THINK_TANK_Difficult_Problem_9
Rodriguez_THINK_TANK_Difficult_Problem_9​Iván Rodríguez
 
Rodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
Rodriguez_THINK_TANK_Mathematics_Tutoring_PhilosophyRodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
Rodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy​Iván Rodríguez
 
Rodriguez_Survival_Abstract_Beamer
Rodriguez_Survival_Abstract_BeamerRodriguez_Survival_Abstract_Beamer
Rodriguez_Survival_Abstract_Beamer​Iván Rodríguez
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNASRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS​Iván Rodríguez
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_ReportRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report​Iván Rodríguez
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMMRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM​Iván Rodríguez
 

More from ​Iván Rodríguez (12)

Rodriguez_NRMC_Presentation
Rodriguez_NRMC_PresentationRodriguez_NRMC_Presentation
Rodriguez_NRMC_Presentation
 
Rodriguez_THINK_TANK_Testimonial
Rodriguez_THINK_TANK_TestimonialRodriguez_THINK_TANK_Testimonial
Rodriguez_THINK_TANK_Testimonial
 
Rodriguez_UROC_Final_Poster
Rodriguez_UROC_Final_PosterRodriguez_UROC_Final_Poster
Rodriguez_UROC_Final_Poster
 
Rodriguez_UROC_Final_Presentation
Rodriguez_UROC_Final_PresentationRodriguez_UROC_Final_Presentation
Rodriguez_UROC_Final_Presentation
 
Rodriguez_THINK_TANK_Difficult_Problem_12
Rodriguez_THINK_TANK_Difficult_Problem_12Rodriguez_THINK_TANK_Difficult_Problem_12
Rodriguez_THINK_TANK_Difficult_Problem_12
 
Rodriguez_THINK_TANK_Difficult_Problem_9
Rodriguez_THINK_TANK_Difficult_Problem_9Rodriguez_THINK_TANK_Difficult_Problem_9
Rodriguez_THINK_TANK_Difficult_Problem_9
 
Rodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
Rodriguez_THINK_TANK_Mathematics_Tutoring_PhilosophyRodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
Rodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
 
Rodriguez_DRT_Abstract_Beamer
Rodriguez_DRT_Abstract_BeamerRodriguez_DRT_Abstract_Beamer
Rodriguez_DRT_Abstract_Beamer
 
Rodriguez_Survival_Abstract_Beamer
Rodriguez_Survival_Abstract_BeamerRodriguez_Survival_Abstract_Beamer
Rodriguez_Survival_Abstract_Beamer
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNASRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_ReportRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMMRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
 

Ullmayer_Rodriguez_Presentation

  • 1. Survival Analysis Dimension Reduction Techniques Claressa Ullmayer and Iván Rodríguez The University of Alaska, Fairbanks The University of Arizona 30 July 2015 Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 2. Background Given a dataset, we want to estimate the true survival function. Complications: Data Dimensionality Data Censoring Unknown True Survival Curve We want to minimize bias and mean-squared error (MSE) Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 3. Applications Our running example: microarray gene expression datasets with n patients and p genes such that n p However, there exist many other implementations: Engineering Business Public Health Security Biostatistics Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 4. The Survival Function A survival function, S(t) describes the probability of an object experiencing an explicit event after a particular time: S(t) := P(T > t) = ∞ t f(τ) dτ = 1 − F(t), where t is the specific time, T is a random variable, f(τ) is the PDF of T, and F(t) is the CDF of T In our running example: event of interest = death S(t) = probability that a cancer patient survives—death not observed—after a particular time Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 5. An Example of Survival Curves Below are four survival arms demonstrating efficacy of different drug choices for a particular cancer Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 6. The Accelerated Failure Time (AFT) Model Two classic models are used to estimate the survival function: Cox Proportional Hazards (CPH) Accelerated Failure Time (AFT) Chief differences: Ease of interpretation—survivorship vis-à-vis hazard AFT directly models survival times AFT assumes covariates affect a constant acceleration/deceleration of ‘disease’ life course CPH posits no assumption about baseline hazard function Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 7. The Accelerated Failure Time (AFT) Model Cont. Underlying formula: ln(Ti) = µ + zi β + ei i = 1, . . . , n total observations Ti is the ith observation’s survival time parameter µ is the theoretical mean vector zi denotes the data covariates vector β indicates the covariate or ‘regression’ coefficients ei designates the random error for the ith observation Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
  • 8. Dimension Reduction Techniques Three dimension reduction techniques are compared given predictors in X and responses in Y: Principal Component Analysis (PCA) Partial Least Squares (PLS) Johnson-Lindenstrauss inspired Random Matrices (RM) Claressa Ullmayer and Iván Rodríguez Survival Analysis Dimension Reduction Techniques
Principal Component Analysis (PCA)
PCA obtains orthogonal, variance-maximized components in X
PCA is used when
X is highly collinear
covariates outnumber observations
Model: T = XW
X (n × p) now related to W (p × p) 'loadings' and T (n × p) 'scores'
Columns of W are eigenvectors of X^T X
Desired 'principal' components are retained
These have maximal variability in their respective directions
Note: the response variable Y is disregarded
Thus, PCA is known as an 'unsupervised' method
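As a minimal numerical sketch of the model T = XW above (NumPy here, though the project itself used R; the sizes and seed are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 1000                    # n observations, p covariates (n << p)
X = rng.standard_normal((n, p))
Xc = X - X.mean(axis=0)             # center columns first

# Columns of W are eigenvectors of Xc^T Xc; the SVD of Xc yields them
# directly (right singular vectors) and is numerically stable.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt.T                            # loadings
T = Xc @ W                          # scores

# Scores are orthogonal; their squared norms are the eigenvalues of Xc^T Xc.
print(np.allclose(T.T @ T, np.diag(s**2), atol=1e-6))   # prints True
```

With n = 100 < p = 1000 the centered matrix has at most 99 nonzero eigenvalues, which is why the talk later obtains 99 components.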
Partial Least Squares (PLS)
PLS analyzes linear combinations of X and Y
PLS is used when
X is highly collinear
covariates vastly outnumber observations
Y is multidimensional
Model: X = TP^T + E and Y = UQ^T + F
X now related to 'scores' T, 'loadings' P, and error E
Y now related to 'scores' U, 'loadings' Q, and error F
PLS is iterative:
covariance is maximized between T and U
the resulting 'latent vectors' are retained, then subtracted from X and Y
the process is repeated until X is a null matrix
Note: PLS performs singular value decompositions of X^T Y
Hence, PLS is known as a 'supervised' method
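The iterative scheme above can be sketched as follows (a simplified NIPALS-style loop for a one-dimensional Y, where the SVD of X^T Y collapses to a single weight vector; NumPy illustration, not the project's R code):

```python
import numpy as np

def pls_scores(X, Y, k):
    """Extract k PLS latent vectors: take the weight direction from X^T Y,
    keep the latent score, deflate X and Y, and repeat."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean()
    scores = []
    for _ in range(k):
        w = X.T @ Y                      # direction of maximal covariance
        w /= np.linalg.norm(w)
        t = X @ w                        # latent vector (X-score)
        p_load = X.T @ t / (t @ t)       # X-loadings
        X = X - np.outer(t, p_load)      # deflate X
        Y = Y - t * (t @ Y) / (t @ t)    # deflate Y
        scores.append(t)
    return np.column_stack(scores)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 1000))
Y = X[:, 0] + 0.5 * rng.standard_normal(100)
T = pls_scores(X, Y, 5)
print(T.shape)                           # prints (100, 5)
```

Unlike PCA, the response Y steers every weight vector, which is what makes the method 'supervised'.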
Johnson-Lindenstrauss Lemma
Random Matrices inspired by the Johnson-Lindenstrauss Lemma are the third dimension reduction technique
The Johnson-Lindenstrauss Lemma:
For any ε ∈ (0, 1) and any n ∈ Z, let k ∈ Z be positive with
k ≥ 4 ln(n) / (ε²/2 − ε³/3).
Then, for any set S of n points in R^d, there exists a mapping f : R^d → R^k such that, for all points u, v ∈ S,
(1 − ε) ‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε) ‖u − v‖².
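A quick calculation with the lemma's bound (plain Python; note that the constant 4 makes the guaranteed k quite conservative, so smaller target dimensions often work well in practice):

```python
import math

def jl_min_dim(n, eps):
    """Smallest k allowed by the lemma: k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n) / (eps**2 / 2 - eps**3 / 3))

for eps in (0.1, 0.3, 0.65):
    print(eps, jl_min_dim(100, eps))     # smaller eps demands far larger k
```

Note that the bound depends on the number of points n but not on the original dimension d, which is what makes the projection attractive when p is huge.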
Generating Random Matrices
Three random matrices were generated according to the papers of Achlioptas and Dasgupta-Gupta
Properties of the Achlioptas matrices:
R_ij = (1/√k) × { +1 with probability 1/2, −1 with probability 1/2 }
R_ij = √(3/k) × { +1 with probability 1/6, 0 with probability 2/3, −1 with probability 1/6 }
Properties of the Dasgupta-Gupta matrix:
entries from the N(0, 1) distribution, with normalized rows
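The three constructions might be generated along these lines (NumPy sketch; the original work used R, and the function names here are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def achlioptas_signs(rows, cols):
    """Entries +-1/sqrt(k), each sign with probability 1/2 (k = cols)."""
    return rng.choice([-1.0, 1.0], size=(rows, cols)) / np.sqrt(cols)

def achlioptas_sparse(rows, cols):
    """Entries sqrt(3/k) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6}."""
    vals = rng.choice([-1.0, 0.0, 1.0], size=(rows, cols), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / cols) * vals

def dasgupta_gupta(rows, cols):
    """N(0, 1) entries with each row scaled to unit length."""
    R = rng.standard_normal((rows, cols))
    return R / np.linalg.norm(R, axis=1, keepdims=True)

X = rng.standard_normal((100, 1000))
R = achlioptas_signs(1000, 37)
print((X @ R).shape)                     # prints (100, 37): the random projection
```

The sparse variant zeroes two thirds of the entries, so the projection can skip most multiplications, which was Achlioptas's 'database-friendly' point.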
Johnson-Lindenstrauss Success Simulations
Accuracy of the Johnson-Lindenstrauss Lemma was tested with the three matrices across varying values of ε and k
The Johnson-Lindenstrauss Lemma passes 100% of the time under the constraints for k and ε
To reduce X to 100 × 37, ε ≈ 0.65 is needed to satisfy the Johnson-Lindenstrauss Lemma
Show Simulations
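A toy version of such a success check (the sizes and seed here are arbitrary stand-ins, not the ones used in the talk):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k, eps = 20, 1000, 200, 0.65

S = rng.standard_normal((n, d))                       # n points in R^d
R = rng.choice([-1.0, 1.0], size=(d, k)) / np.sqrt(k)
fS = S @ R                                            # projected points

within = 0
pairs = 0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((S[i] - S[j]) ** 2)
        proj = np.sum((fS[i] - fS[j]) ** 2)
        pairs += 1
        within += (1 - eps) * orig <= proj <= (1 + eps) * orig
print(within / pairs)                    # fraction of pairwise distances preserved
```

With k this large relative to ε the distortion concentrates tightly, so all pairwise distances fall inside the bound.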
Simulating Data
Data were simulated in order to test which method best minimizes bias and MSE
dimension of 100 × 1000: observations and covariates
β (1 × 1000): random regression coefficients from U(−1.0 × 10⁻⁷, 1.0 × 10⁻⁷)
z (1 × 1000): covariates for each of 100 observations, drawn from N(0, 1)
z was exponentiated to make all the values log-normally distributed
T_i, the survival times, are exponentially distributed with λ_i = e^{−z_i β}
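The simulation recipe above, sketched in NumPy (the study itself used R; the seed is ours):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 1000

# Fixed regression coefficients on a deliberately tiny scale
# (large betas overflow the exponentials in the survival estimate).
beta = rng.uniform(-1e-7, 1e-7, size=p)

# Covariates: draw N(0, 1), then exponentiate -> log-normal values.
Z = np.exp(rng.standard_normal((n, p)))

# Exponential survival times with rate lambda_i = exp(-z_i . beta).
lam = np.exp(-Z @ beta)
T = rng.exponential(scale=1.0 / lam)     # NumPy's scale parameter is 1 / rate
print(T.shape)                           # prints (100,)
```

Because the betas are so small, every rate λ_i sits very close to 1, which keeps the exponentials numerically tame.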
Applying PCA
Now that we had our data simulated, our next goal was to apply our dimension reduction techniques.
First, we implemented PCA and obtained 99 components, representing the eigenvalues of the variance-covariance matrix
The components are linear combinations of the original covariates (genes)
Below are the first ten components:

> z_star_PCA$eig
          eigenvalue   % of variance   cumulative %
comp 1    16.825       1.683            1.683
comp 2    16.494       1.649            3.332
comp 3    16.343       1.634            4.966
comp 4    16.152       1.615            6.581
comp 5    15.615       1.561            8.143
comp 6    15.434       1.543            9.686
comp 7    15.154       1.515           11.202
comp 8    15.019       1.502           12.704
comp 9    14.958       1.496           14.199
comp 10   14.872       1.487           15.686

We decided to incorporate 50% of the overall variance in picking our components
Hence, we chose to reduce the data from 99 components to 37
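The 50%-of-variance cutoff can be sketched like this (a NumPy stand-in for the R output above; the exact component count depends on the random draw, so it will not necessarily be 37 here):

```python
import numpy as np

rng = np.random.default_rng(5)
Z = np.exp(rng.standard_normal((100, 1000)))   # simulated covariate matrix
Zc = Z - Z.mean(axis=0)

# Eigenvalues of the variance-covariance matrix, via the SVD of the
# centered data (at most 99 are nonzero when n = 100).
s = np.linalg.svd(Zc, compute_uv=False)
eigvals = s ** 2 / (Zc.shape[0] - 1)
cum_frac = np.cumsum(eigvals) / eigvals.sum()

# Smallest number of components whose cumulative variance reaches 50%.
n_comp = int(np.searchsorted(cum_frac, 0.50)) + 1
print(n_comp)
```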
Applying PLS and AFT
Next, we implemented PLS using the same number of components that we chose for PCA
From both PCA and PLS, we obtained the weights on all the genes for each component (open PDF)
We then multiply our original 100 × 1000 matrix by the resulting 1000 × 37 matrix of weights to get a 100 × 37 reduced matrix for both PLS and PCA
Estimating the Survival Function
We took these new matrices and fed them into the AFT model to get our estimated regression coefficients
We can then estimate the survival function, defined as
Ŝ₀(t) = exp(−t exp(−z̄* β̂*))
z̄* is the column-centered original matrix of observations and covariates
β̂* is a matrix produced by multiplying our original simulated regression coefficients by our matrix of obtained weights
−z̄* β̂* becomes a scalar
We know our real survival function is S₀(t) = e^{−λ̄ t}
We repeated this procedure for 5000 iterations
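A sketch of the two curves being compared (the reduced matrix, fitted coefficients, and rate below are hypothetical stand-ins, not values from the study):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical stand-ins: a 100 x 37 reduced covariate matrix and
# AFT coefficients fitted on it.
z_star = rng.standard_normal((100, 37))
beta_star = rng.uniform(-0.1, 0.1, size=37)

z_bar = z_star.mean(axis=0)            # column means of the reduced matrix
rate = np.exp(-z_bar @ beta_star)      # the scalar exp(-z_bar . beta_star)

def S0_hat(t):
    """Estimated baseline survival: exp(-t * exp(-z_bar . beta_star))."""
    return np.exp(-t * rate)

def S0_true(t, lam_bar):
    """True baseline survival for an exponential model: exp(-lam_bar * t)."""
    return np.exp(-lam_bar * t)

print(S0_hat(0.0))                     # prints 1.0: survival starts at 1
```

Both curves are exponentials, so comparing them pointwise over a grid of times (as in the next slide) fully characterizes the estimation error.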
Results
Since we have both the estimated and the real survival function, we can estimate the bias and MSE for PCA, PLS, and the three random matrices
To compare the performance of the dimension reduction techniques, we first partitioned the y-axis of the survival curve into equally spaced sections, u_i for i = 1, . . . , 20
Then, we found the corresponding t_i on the x-axis of the survival curve
For each of the 20 t_i, we summed the bias and MSE at each point to get the distribution of the errors after 5000 iterations
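The grid-and-error bookkeeping can be sketched as follows (the estimator values below are toy noise around the truth; only the partitioning logic mirrors the slide):

```python
import numpy as np

lam_bar = 0.5                           # assumed true exponential rate
u = np.linspace(0.05, 0.95, 20)         # 20 equally spaced survival levels
t = -np.log(u) / lam_bar                # times solving S0(t_i) = u_i

true_curve = np.exp(-lam_bar * t)       # recovers u by construction

# Bias and MSE across 5000 iterations at each grid time.
rng = np.random.default_rng(7)
est_curves = true_curve + 0.01 * rng.standard_normal((5000, 20))
err = est_curves - true_curve
bias = err.mean(axis=0)
mse = (err ** 2).mean(axis=0)
print(bias.shape, mse.shape)            # prints (20,) (20,)
```

Inverting the survival curve (partitioning the y-axis rather than the x-axis) spreads the grid evenly over probability levels, so early and late survival are weighted equally.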
Bias Plot, PCA and PLS
Mean-Squared Error Plot, PCA and PLS
Bias Plot, Random Matrices
Mean-Squared Error Plot, Random Matrices
Bias Plot, All Methods
Mean-Squared Error Plot, All Methods
Discussion
Censoring occurs when the event of interest for a given subject is not observed for some extraneous reason
Naturally, censoring is a problem in real-life investigations and studies.
Unfortunately, we did not have the time to incorporate the effect of censoring in our data simulations.
Furthermore, a complication arose in the generation of the fixed β coefficients; essentially, the R software necessitated generating grossly smaller βs due to the exponent in the survival-curve estimate Ŝ₀(t) = exp(−t exp(−z̄* β̂*)).
An initial goal was to apply our findings to real microarray gene datasets; due to time constraints, this objective was not fulfilled
References
Cox, D.R. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34: 187-220, 1972.
Johnson, W.B. and J. Lindenstrauss. Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics 26: 189-206, 1984.
Pearson, K. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2: 559-572, 1901.
Wold, H. Estimation of principal components and related models by iterative least squares. In P.R. Krishnaiah (ed.), Multivariate Analysis: 391-420, 1966.
References Cont.
Achlioptas, D. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences 66(4): 671-687, 2003.
Dasgupta, S. and A. Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms 22(1): 60-65, 2003.
Nguyen, D.V. Partial least squares dimension reduction for microarray gene expression data with a censored response. Mathematical Biosciences 193: 119-137, 2005.
References Cont.
Nguyen, D.V. and D.M. Rocke. On partial least squares dimension reduction for microarray-based classification: A simulation study. Computational Statistics & Data Analysis 46: 407-425, 2004.
Nguyen, T.S. and J. Rojo. Dimension reduction of microarray gene expression data: The accelerated failure time model. Journal of Bioinformatics and Computational Biology 7(6): 939-954, 2009.
Nguyen, T.S. and J. Rojo. Dimension reduction of microarray data in the presence of a censored survival response: A simulation study. Statistical Applications in Genetics and Molecular Biology 8(1), 2009.
Thank You
This research was supported by the National Security Agency through REU Grant H98230-15-1-0048 to The University of Nevada at Reno, Javier Rojo, PI.
We would like to greatly thank the NSA for funding our research this summer.
Thank you all for taking the time to be here and listen to our presentation.