Dimension Reduction Techniques
Ivan Rodriguez

Outline
  Background
    Big Data
    Reducing Dimensionality
    Previous Insights
  Objectives
  Methods
    Survival Analysis
    Accelerated Failure Time (AFT) Model
    Principal Component Analysis (PCA)
    Partial Least Squares (PLS)
    Random Matrices (RMs)
  Results
    Simulation Example
    PCA Outperforms PLS
    RMs Are Comparable
    RMs Outdo PCA and PLS
  Analysis and Discussion
  Acknowledgments
  References
As Random As It Gets:
Dimension Reduction Techniques à la Survival Analysis
Under the Accelerated Failure Time Model
Ivan Rodriguez†
Claressa Ullmayer‡
Dr. Rojo§
†The University of Arizona
‡The University of Alaska, Fairbanks
§The University of Nevada, Reno
20 February 2016
Background: Big Data
Examples:
  Social media and businesses.
    Delta Air Lines and Track My Bag.
  Education and digital learning efficacy.
    Knewton and personalized learning.
  Intelligent consumer recommendations.
    Netflix and viewer suggestions.
Our running example: a 100 individuals × 1,000 genes matrix of DNA microarray expression data from a cancer study.
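To make the running example concrete, the matrix can be sketched in code (the values below are simulated placeholders; real data would be measured microarray intensities):

```python
import numpy as np

# Running example: 100 individuals x 1,000 genes of DNA microarray
# expression. Synthetic stand-in values, not real measurements.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 1000))

# Far more covariates (genes) than observations (individuals):
# exactly the setting where dimension reduction is needed.
```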
Background: Reducing Dimensionality
Ideal scenario: researchers obtain copious data.
  Variable(s) and response(s) likely lie beyond ℝ².
Glaring complication: even sophisticated software and hardware can struggle, or fail outright, to analyze such data.
Classical workaround: employ mathematical methods that reduce the number of dimensions while retaining the data's general structure.
Background: Previous Insights
Johnson and Lindenstrauss (1984) proved a lemma on extensions of Lipschitz mappings into Hilbert space, the basis of random projection.
Achlioptas (2003) and Dasgupta and Gupta (2003) devised concrete examples of these mappings.
Nguyen and Rojo (2009a; 2009b) compared various dimension-reduction techniques.
  To wit: variants of principal component analysis (PCA) and partial least squares (PLS).
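The flavor of these results can be sketched with a Gaussian random projection in NumPy (an illustrative construction in the Johnson-Lindenstrauss spirit, not the specific mappings of the cited papers): pairwise distances are approximately preserved when 1,000-dimensional points are projected down to 200 dimensions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n, p, k = 100, 1000, 200                  # points, original dim, reduced dim

X = rng.normal(size=(n, p))               # original high-dimensional points
R = rng.normal(size=(p, k)) / np.sqrt(k)  # Gaussian random projection matrix
Y = X @ R                                 # projected points, n x k

# Distance between the first two points, before and after projection.
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(Y[0] - Y[1])
ratio = d_proj / d_orig                   # concentrates near 1 as k grows
```

With k = 200 the ratio typically lands within roughly ten percent of 1; the JL lemma quantifies exactly this distance-preservation guarantee.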
Objectives
Simulate datasets to compare PCA, PLS, and three flavors of JL-inspired random matrices (RMs).
  Criteria: bias and mean-squared error (MSE).
Use the accelerated failure time (AFT) model to perform the regressions and output survival curves.
  An alternative to the celebrated Cox proportional hazards (CPH) model.
  Easier to interpret.
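The two criteria can be stated operationally: across Monte Carlo replicates, bias is the mean error of an estimator and MSE its mean squared error. A generic sketch with a toy estimator follows (the study itself applies these criteria to estimated survival curves; all numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
theta = 5.0                        # true target (illustrative value)
reps, n = 10_000, 25               # Monte Carlo replicates, sample size

# Toy estimator: the sample mean of n noisy observations of theta.
estimates = rng.normal(loc=theta, scale=2.0, size=(reps, n)).mean(axis=1)

bias = estimates.mean() - theta            # E[theta_hat] - theta
mse = np.mean((estimates - theta) ** 2)    # E[(theta_hat - theta)^2]

# Sanity check: MSE decomposes as variance + bias^2, exactly.
decomposed = estimates.var() + bias ** 2
```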
Methods: Survival Analysis
Overcomes standard regression limitations.
  Can incorporate censored data.
Survival curves express the probability that a well-defined event of interest has not yet occurred by time t.
Model:
S(t) = P(T > t) = ∫_t^∞ f(τ) dτ = 1 − F(t), where
  t ∈ [0, ∞) is a specific time,
  T is the random survival time,
  f(τ) is the density of T, and
  F(t) is the distribution function of T.
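The identity S(t) = 1 − F(t) is easy to verify numerically for a concrete lifetime distribution; here an exponential with rate λ = 0.5, chosen purely for illustration:

```python
import math

lam = 0.5  # exponential rate (illustrative choice)

def f(tau):
    """Density of T ~ Exponential(lam)."""
    return lam * math.exp(-lam * tau)

def F(t):
    """Distribution function of T."""
    return 1.0 - math.exp(-lam * t)

def S(t):
    """Survival function P(T > t); exp(-lam * t) in closed form."""
    return math.exp(-lam * t)

t = 2.0
# Check S(t) against the integral of f from t to infinity via a crude
# Riemann sum, truncated where the density is negligible.
step = 0.001
integral = sum(f(t + i * step) * step for i in range(int(50 / step)))
```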
Methods: Accelerated Failure Time (AFT) Model
Provides survival curves.
Directly incorporates survival times.
Model:
ln(Tᵢ) = µ + zᵢᵀβ + eᵢ, where
  i indexes the n observations,
  Tᵢ is the ith observation's survival time,
  µ is the theoretical mean,
  zᵢ is the ith observation's covariate vector,
  β is the vector of regression coefficients, and
  eᵢ is the ith observation's random error.
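The log-linear form can be simulated directly. A sketch with arbitrary illustrative values for µ and β, and normally distributed errors (which makes the Tᵢ log-normal):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n, p = 100, 5                       # observations, covariates (illustrative)

mu = 2.0                            # theoretical mean of log survival time
beta = rng.normal(size=p)           # regression coefficient vector
Z = rng.normal(size=(n, p))         # covariates; row i is z_i

e = rng.normal(scale=0.5, size=n)   # random errors e_i
log_T = mu + Z @ beta + e           # ln(T_i) = mu + z_i^T beta + e_i
T = np.exp(log_T)                   # survival times are strictly positive
```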
Methods: Principal Component Analysis (PCA)
Consider the predictor matrix X.
Components and eigenvalues are obtained from the data's variance-covariance matrix.
The variance of linear combinations of X is maximized.
Less-correlated variables are obtained through an orthogonal transformation of the covariates.
Model:
T = XW, where
  X (n × p) is the predictor matrix,
  W (p × p) is a matrix of loadings, and
  T (n × p) is a matrix of scores.
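A minimal NumPy sketch of T = XW: eigendecompose the variance-covariance matrix of the centered data, order components by eigenvalue, and project (the dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=4)
n, p = 100, 6
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)             # center each column first

# Eigenvectors of the variance-covariance matrix are the loadings W.
cov = np.cov(Xc, rowvar=False)
eigvals, W = np.linalg.eigh(cov)

# Sort components by descending eigenvalue (explained variance).
order = np.argsort(eigvals)[::-1]
eigvals, W = eigvals[order], W[:, order]

T_scores = Xc @ W                   # scores: T = XW
# The scores are uncorrelated, with variances equal to the eigenvalues.
```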
Methods: Partial Least Squares (PLS)
Largely similar to PCA.
Additionally considers a response matrix Y.
Chief difference: maximizes the covariance and correlation of linear combinations of X and Y.
Model: X = T Pᵀ and Y = U Qᵀ, where
X (n × m) is the predictor matrix,
Y (n × p) is the response matrix,
T (n × l) and U (n × l) are score matrices, and
P (m × l) and Q (p × l) are loading matrices.
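One standard way to compute these score and loading matrices is the NIPALS algorithm, which extracts one component at a time so that each X-score has maximal covariance with a Y-score, deflating both matrices between components. A minimal NumPy sketch under that assumption (function name, toy data, and shapes are invented for illustration):

```python
import numpy as np

def pls_nipals(X, Y, l):
    """Extract l components so that X ~ T P^T and Y ~ U Q^T (NIPALS)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n, m = X.shape
    T, U = np.zeros((n, l)), np.zeros((n, l))
    P, Q = np.zeros((m, l)), np.zeros((Y.shape[1], l))
    for a in range(l):
        u = Y[:, 0].copy()                            # initial Y scores
        for _ in range(500):
            w = X.T @ u / np.linalg.norm(X.T @ u)     # X weights
            t = X @ w                                 # X scores
            q = Y.T @ t / np.linalg.norm(Y.T @ t)     # Y loadings
            u_new = Y @ q                             # Y scores
            if np.linalg.norm(u_new - u) < 1e-10:
                u = u_new
                break
            u = u_new
        p = X.T @ t / (t @ t)                         # X loadings
        c = Y.T @ t / (t @ t)                         # regression of Y on t
        X = X - np.outer(t, p)                        # deflate X
        Y = Y - np.outer(t, c)                        # deflate Y
        T[:, a], U[:, a] = t, u
        P[:, a], Q[:, a] = p, q
    return T, U, P, Q

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Y = X @ rng.normal(size=(8, 2))
T, U, P, Q = pls_nipals(X, Y, 2)
```

Because each deflation step removes the current component from X, successive X-scores come out orthogonal.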
Methods: Random Matrices (RMs)
The generated matrix is applied directly to X.
Entries of RM1: (1/√k) × { +1 with probability 1/2, −1 with probability 1/2 }.
Entries of RM2: √(3/k) × { +1 with probability 1/6, 0 with probability 4/6, −1 with probability 1/6 }.
Entries of RM3: N(0, 1); rows are then normalized.
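These three constructions are easy to sketch with NumPy: draw a k × p matrix and apply it to X to map the p covariates down to k dimensions. The sizes below (and k = 50) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 100, 1000, 50            # 100 individuals, 1,000 genes -> k dimensions
X = rng.normal(size=(n, p))

# RM1: entries +1 or -1 with probability 1/2 each, scaled by 1/sqrt(k).
R1 = rng.choice([1.0, -1.0], size=(k, p)) / np.sqrt(k)

# RM2: sparse entries +1 (prob. 1/6), 0 (prob. 4/6), -1 (prob. 1/6),
# scaled by sqrt(3/k).
R2 = np.sqrt(3.0 / k) * rng.choice([1.0, 0.0, -1.0], size=(k, p),
                                   p=[1 / 6, 4 / 6, 1 / 6])

# RM3: standard normal entries, with each row then normalized to unit length.
R3 = rng.normal(size=(k, p))
R3 /= np.linalg.norm(R3, axis=1, keepdims=True)

# Reduced predictor matrices, each n x k.
X1, X2, X3 = X @ R1.T, X @ R2.T, X @ R3.T
```

The scalings of RM1 and RM2 make the projected squared norms match the originals in expectation, which is the Johnson-Lindenstrauss property motivating these matrices.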
Results: Simulation Example
Results: PCA Outperforms PLS
Results: RMs Are Comparable
Results: RMs Outdo PCA and PLS
Analysis and Discussion
Unexpected results:
The RMs outdid PCA and PLS, which could be due to censoring not being incorporated.
PCA bested PLS.
Further inquiry:
Apply the findings to real datasets.
Integrate censored data.
Use the Cox proportional hazards (CPH) model.
Acknowledgments
Dr. Rojo.
University of Nevada, Reno personnel:
Dr. Bradford.
Nathan Wiseman.
Dr. Cruz-Cano.
Rashidul Hasan.
National Security Agency.
References
Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26, 189–206.
Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687.
Dasgupta, S., & Gupta, A. (2003). An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1), 60–65.
Nguyen, T. S., & Rojo, J. (2009a). Dimension reduction of microarray gene expression data: The accelerated failure time model. Journal of Bioinformatics and Computational Biology, 7(6), 939–954.
Nguyen, T. S., & Rojo, J. (2009b). Dimension reduction of microarray data in the presence of a censored survival response: A simulation study. Statistical Applications in Genetics and Molecular Biology, 8(1), Article 4.
In Conclusion
Problem: reducing dataset dimensionality while maintaining sufficient generality.
Takeaway: in this simulation study, the standard methods of principal component analysis and partial least squares were outperformed by the random matrices.
Ivan Rodriguez: ivanrodriguez@email.arizona.edu
Claressa Ullmayer: clullmayer@alaska.edu
Dr. Rojo: jrojo@unr.edu