As Random As It Gets:
Dimension Reduction Techniques à la Survival Analysis
Under the Accelerated Failure Time Model

Ivan Rodriguez†, Claressa Ullmayer‡, Dr. Rojo§
†The University of Arizona
‡The University of Alaska, Fairbanks
§The University of Nevada, Reno
20 February 2016

Outline
Background: Big Data; Reducing Dimensionality; Previous Insights
Objectives
Methods: Survival Analysis; Accelerated Failure Time (AFT) Model; Principal Component Analysis (PCA); Partial Least Squares (PLS); Random Matrices (RMs)
Results: Simulation Example; PCA Outperforms PLS; RMs Are Comparable; RMs Outdo PCA and PLS
Analysis and Discussion
Acknowledgments
References
Background: Big Data

Examples:
Social media and businesses: Delta Air Lines and Track My Bag.
Education and digital learning efficacy: Knewton and personalized learning.
Intelligent consumer recommendations: Netflix and viewer suggestions.
Our running example: a 100 × 1,000 matrix of DNA microarray expression data (100 individuals by 1,000 genes) from a cancer study.
Background: Reducing Dimensionality

Ideal scenario: researchers obtain copious data.
Variables and responses likely live well beyond ℝ².
Glaring complication: even sophisticated software and hardware can struggle, or fail outright, to analyze such data.
Classical workaround: employ mathematical methods that reduce dimensions while retaining the data's general structure.
Background: Previous Insights

Johnson and Lindenstrauss (1984) proved a lemma on extensions of Lipschitz mappings into a Hilbert space, the foundation of random-projection dimension reduction.
Achlioptas (2003) and Dasgupta and Gupta (2003) devised concrete, simple examples of such mappings.
Nguyen and Rojo (2009a; 2009b) compared various dimension-reduction techniques, namely variants of principal component analysis (PCA) and partial least squares (PLS).
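The random-projection idea behind these mappings can be sketched in a few lines (a hypothetical illustration, not the authors' code): project p-dimensional rows onto k random directions scaled by 1/√k, and check that a pairwise distance is roughly preserved. The dimensions and seed below are arbitrary choices.

```python
import numpy as np

# Johnson-Lindenstrauss-style random projection: p = 1,000 features
# reduced to k dimensions; pairwise distances are approximately kept.
rng = np.random.default_rng(0)
n, p, k = 100, 1000, 200

X = rng.normal(size=(n, p))               # toy data: n individuals x p genes
R = rng.normal(size=(p, k)) / np.sqrt(k)  # random projection matrix
Z = X @ R                                 # reduced n x k representation

# Compare the distance between the first two rows before and after.
d_orig = np.linalg.norm(X[0] - X[1])
d_proj = np.linalg.norm(Z[0] - Z[1])
ratio = d_proj / d_orig                   # close to 1 for moderate k
```

Larger k tightens the distortion at the cost of less reduction, which is the trade-off the cited papers quantify.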
Objectives

Simulate datasets to compare PCA, PLS, and three flavors of Johnson-Lindenstrauss-inspired random matrices (RMs).
Criteria: bias and mean-squared error (MSE).
Use the accelerated failure time (AFT) model to regress on the reduced data and output survival curves.
The AFT model is an alternative to the celebrated Cox proportional hazards (CPH) model, with easier interpretation.
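The two comparison criteria can be made concrete with a toy simulation (an illustration under assumed values, not the study's actual estimator):

```python
import numpy as np

# Toy illustration of the comparison criteria: bias and mean-squared
# error (MSE) of an estimator, computed over simulated replicates.
# The true parameter and noise scale are arbitrary assumptions.
rng = np.random.default_rng(5)
theta = 2.0                                       # true parameter value
est = theta + rng.normal(scale=0.3, size=10_000)  # replicate estimates

bias = est.mean() - theta                         # average error
mse = ((est - theta) ** 2).mean()                 # MSE = bias^2 + variance
```

Since this estimator is unbiased by construction, its MSE is essentially its variance (0.3² = 0.09); a biased dimension-reduction pipeline would instead show the bias² term inflating the MSE.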
Methods: Survival Analysis

Overcomes standard regression limitations.
Can incorporate censored data.
Survival curves express the probability of surviving past a given time an unambiguous event of interest.

Model:

S(t) = P(T > t) = ∫_t^∞ f(τ) dτ = 1 − F(t), where

t ∈ [0, ∞) is a specific time,
T is a random variable (the time to the event),
f(τ) is the density of T, and
F(t) is the distribution function of T.
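For a concrete instance, an exponential lifetime has the closed-form survival function S(t) = exp(−λt); the sketch below (the exponential distribution and rate λ = 0.5 are illustrative assumptions) checks the closed form against simulated lifetimes.

```python
import numpy as np

# Survival function S(t) = P(T > t) = 1 - F(t) for an exponential
# lifetime T with rate lam (an illustrative distributional assumption).
lam = 0.5
t = np.linspace(0.0, 10.0, 101)
S = np.exp(-lam * t)              # closed form for the exponential case
F = 1.0 - S                       # the distribution function

# Empirical check: fraction of simulated lifetimes exceeding t = 2.
rng = np.random.default_rng(1)
T = rng.exponential(scale=1.0 / lam, size=100_000)
S_hat = (T > 2.0).mean()          # should be close to exp(-lam * 2)
```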
Methods: Accelerated Failure Time (AFT) Model

Provides survival curves.
Directly incorporates survival times.

Model:

ln(T_i) = μ + z_iᵀ β + e_i, where

i = 1, …, n indexes the observations,
T_i is the i-th observation's survival time,
μ is the theoretical mean (intercept),
z_i is the i-th observation's covariate vector,
β is the vector of regression coefficients, and
e_i is the i-th observation's random error.
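Simulating from this model is direct (a sketch; normal errors, and hence log-normal survival times, are an assumed error distribution, since the slides leave it unspecified):

```python
import numpy as np

# Simulating from the AFT model ln(T_i) = mu + z_i' beta + e_i.
# Normal errors (log-normal survival times) are an assumption here.
rng = np.random.default_rng(2)
n, p = 100, 5

mu = 1.0                               # theoretical mean
beta = rng.normal(size=p)              # regression coefficients
Z = rng.normal(size=(n, p))            # covariates, one row per subject
e = rng.normal(scale=0.5, size=n)      # random errors

log_T = mu + Z @ beta + e              # log survival times
T = np.exp(log_T)                      # survival times, always positive
```

Exponentiating guarantees positive survival times, which is why the AFT model regresses on ln(T) rather than T itself.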
Methods: Principal Component Analysis (PCA)

Consider a predictor matrix X.
Components and their eigenvalues are obtained from the data's variance-covariance matrix.
The variance of linear combinations of X is maximized.
Less-correlated variables are obtained through an orthogonal transformation of the covariates.

Model:

T = XW, where

X (n × p) is the predictor matrix,
W (p × p) is a matrix of loadings, and
T (n × p) is a matrix of scores.
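A minimal sketch of the decomposition above, on synthetic stand-in data: the loadings W are the eigenvectors of the variance-covariance matrix, and the resulting scores T = XW are uncorrelated with variances equal to the eigenvalues.

```python
import numpy as np

# PCA scores T = XW: the loadings W are the eigenvectors of the
# variance-covariance matrix of X, ordered by decreasing eigenvalue.
rng = np.random.default_rng(3)
n, p = 100, 6
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)                  # center each column

C = np.cov(Xc, rowvar=False)             # p x p variance-covariance matrix
eigvals, W = np.linalg.eigh(C)           # symmetric eigendecomposition
order = np.argsort(eigvals)[::-1]        # largest variance first
eigvals, W = eigvals[order], W[:, order]

T = Xc @ W                               # n x p matrix of scores
```

Dimension reduction then amounts to keeping only the first few columns of T.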
Methods: Partial Least Squares (PLS)
Largely similar to PCA, but it also considers a response matrix Y.
Chief difference: it maximizes the covariance and correlation of linear combinations of X and Y.
Model: X = TPᵀ and Y = UQᵀ, where
X (n × m) is the predictor matrix,
Y (n × p) is the response matrix,
T (n × l) and U (n × l) are score matrices, and
P (m × l) and Q (p × l) are loading matrices.
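The chief difference from PCA can be seen in a single NIPALS-style round, sketched below under simulated, hypothetical data (this is an illustration of the standard algorithm, not the authors' implementation). The weight vector is proportional to the covariances between the predictors and the response, so the first score direction is driven by Y rather than by the variance of X alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 1000                      # individuals, genes
X = rng.standard_normal((n, m))
# Hypothetical response depending on the first three genes.
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
yc = (y - y.mean()).reshape(-1, 1)

# One PLS component: the weight vector maximizes cov(Xw, y).
w = Xc.T @ yc                         # m x 1, proportional to cov(X_j, y)
w /= np.linalg.norm(w)
t = Xc @ w                            # n x 1 score vector (a column of T)
p_load = Xc.T @ t / (t.T @ t)         # X loadings (a column of P)
q_load = yc.T @ t / (t.T @ t)         # Y loading (an entry of Q)
```

Further components would be extracted the same way after deflating Xc by t @ p_load.T; with deflation, X = TPᵀ is recovered as in the model above.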
Methods: Random Matrices (RMs)
A generated random matrix is applied directly to X.
Entries of RM1: (1/√k) × { +1 with probability 1/2, −1 with probability 1/2 }.
Entries of RM2: √(3/k) × { +1 with probability 1/6, 0 with probability 4/6, −1 with probability 1/6 }.
Entries of RM3: drawn from N(0, 1); rows are then normalized.
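A minimal sketch of the three constructions with NumPy follows. The shapes and orientation (a p × k matrix multiplied on the right of X) and the target dimension k = 50 are assumptions chosen for illustration; RM1 and RM2 follow the Achlioptas (2003) entry distributions cited in the references.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 100, 1000, 50              # project p = 1000 genes down to k = 50
X = rng.standard_normal((n, p))      # stand-in expression matrix

# RM1: Rademacher entries, scaled by 1/sqrt(k).
R1 = rng.choice([1.0, -1.0], size=(p, k)) / np.sqrt(k)

# RM2: sparse entries +1/0/-1 with probabilities 1/6, 4/6, 1/6.
R2 = rng.choice([1.0, 0.0, -1.0], size=(p, k), p=[1/6, 4/6, 1/6]) * np.sqrt(3 / k)

# RM3: standard normal entries; rows then normalized to unit length.
R3 = rng.standard_normal((p, k))
R3 /= np.linalg.norm(R3, axis=1, keepdims=True)

# Applied directly to X: the reduced design is n x k instead of n x p.
Z1, Z2, Z3 = X @ R1, X @ R2, X @ R3
print(Z1.shape)                      # (100, 50)
```

Unlike PCA and PLS, the projection is data-independent; by the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved with high probability.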
Results: Simulation Example
Results: PCA Outperforms PLS
Results: RMs Are Comparable
Results: RMs Outdo PCA and PLS
Analysis and Discussion
Unexpected results:
RMs outdid PCA and PLS; this could be due to censoring not being incorporated.
PCA bested PLS.
Further inquiry:
Apply the findings to real datasets.
Integrate censored data.
Utilize the Cox proportional hazards (CPH) model.
Acknowledgements
Dr. Rojo.
University of Nevada, Reno personnel:
Dr. Bradford.
Nathan Wiseman.
Dr. Cruz-Cano.
Rashidul Hasan.
National Security Agency.
References
Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26, 189–206.
Achlioptas, D. (2003). Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687.
Dasgupta, S., & Gupta, A. (2003). An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1), 60–65.
Nguyen, T. S., & Rojo, J. (2009a). Dimension reduction of microarray gene expression data: The accelerated failure time model. Journal of Bioinformatics and Computational Biology, 7(6), 939–954.
Nguyen, T. S., & Rojo, J. (2009b). Dimension reduction of microarray data in the presence of a censored survival response: A simulation study. Statistical Applications in Genetics and Molecular Biology, 8(1), Article 4.
In Conclusion
Problem: reducing dataset dimensionality while maintaining sufficient generality.
Takeaway: in this simulation study, the standard methods of principal component analysis and partial least squares were inferior to random matrices.
Ivan Rodriguez: ivanrodriguez@email.arizona.edu
Claressa Ullmayer: clullmayer@alaska.edu
Dr. Rojo: jrojo@unr.edu
Rodriguez_THINK_TANK_Difficult_Problem_9
 
Rodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
Rodriguez_THINK_TANK_Mathematics_Tutoring_PhilosophyRodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
Rodriguez_THINK_TANK_Mathematics_Tutoring_Philosophy
 
Ullmayer_Rodriguez_Presentation
Ullmayer_Rodriguez_PresentationUllmayer_Rodriguez_Presentation
Ullmayer_Rodriguez_Presentation
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNASRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_SACNAS
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_ReportRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Technical_Report
 
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMMRodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
Rodriguez_Ullmayer_Rojo_RUSIS@UNR_REU_Poster_Presentation_JMM
 

Rodriguez_DRT_Abstract_Beamer

  • 1. As Random As It Gets: Dimension Reduction Techniques à la Survival Analysis Under the Accelerated Failure Time Model. Ivan Rodriguez (The University of Arizona), Claressa Ullmayer (The University of Alaska, Fairbanks), Dr. Rojo (The University of Nevada, Reno). 20 February 2016.
  • 2. Background: Big Data
  • 3. Background: Big Data. Examples: social media and businesses (Delta Air Lines and Track My Bag); education and digital-learning efficacy (Knewton and personalized learning); intelligent consumer recommendations (Netflix and viewer suggestions). Our running example: a matrix of DNA microarray expression data in cancer, with 100 individuals × 1,000 genes.
  • 12. Background: Reducing Dimensionality. Ideal scenario: researchers obtain copious data, with variables and responses likely beyond R². Glaring complication: even sophisticated software and hardware can struggle, or fail outright, to analyze such data. Classical workaround: employ mathematical methods that reduce the dimensions while retaining the data's general structure.
  • 17. Background: Previous Insights. Johnson and Lindenstrauss (1984) proved a result on extending Lipschitz mappings into Hilbert spaces; Achlioptas (2003) and Dasgupta and Gupta (2003) devised explicit examples of such mappings. Nguyen and Rojo (2009a; 2009b) compared various dimension-reduction techniques, namely variants of principal component analysis (PCA) and partial least squares (PLS).
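The slides do not spell out the random-matrix constructions at this point; as a hedged sketch, one well-known Achlioptas (2003) mapping draws sparse entries from {+√3, 0, −√3} with probabilities 1/6, 2/3, 1/6, scaled by 1/√k so that squared lengths are preserved in expectation (the function names and parameter values below are illustrative, not from the talk):

```python
import random

def achlioptas_matrix(p, k, seed=0):
    """Draw a p x k matrix with entries sqrt(3)*{+1, 0, -1} having
    probabilities 1/6, 2/3, 1/6 (Achlioptas, 2003), scaled by 1/sqrt(k)."""
    rng = random.Random(seed)
    root3 = 3 ** 0.5
    return [[root3 * rng.choices([-1, 0, 1], weights=[1, 4, 1])[0] / k ** 0.5
             for _ in range(k)] for _ in range(p)]

def project(x, R):
    """Map a length-p vector x to length k via the product x R."""
    p, k = len(R), len(R[0])
    return [sum(x[i] * R[i][j] for i in range(p)) for j in range(k)]

# Reduce a hypothetical 1,000-gene expression profile to 50 coordinates;
# JL-type results suggest pairwise distances are roughly preserved.
x = [1.0] * 1000
R = achlioptas_matrix(1000, 50)
z = project(x, R)
```

The sparsity (two-thirds of the entries are zero) is what makes this family cheap to apply compared with dense Gaussian projections.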
  • 22. Objectives. Simulate datasets to compare PCA, PLS, and three flavors of JL-inspired random matrices (RMs); the criteria are bias and mean-squared error (MSE). Use the accelerated failure time (AFT) model to perform the regressions and output survival curves; the AFT model is an easier-to-interpret alternative to the celebrated Cox proportional hazards (CPH) model.
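The two comparison criteria can be estimated by Monte Carlo over simulated replicates; a minimal sketch (the exponential-mean toy estimator is illustrative only, not the talk's simulation design):

```python
import random
import statistics

def bias_and_mse(estimates, truth):
    """Monte Carlo bias and mean-squared error of an estimator:
    bias = mean(estimates) - truth, MSE = mean((estimate - truth)^2)."""
    bias = statistics.fmean(estimates) - truth
    mse = statistics.fmean((e - truth) ** 2 for e in estimates)
    return bias, mse

# Toy check: the sample mean of n = 25 exponential(rate 1) draws
# estimates the true mean 1.0; MSE should be near its variance, 1/25.
rng = random.Random(1)
reps = [statistics.fmean(rng.expovariate(1.0) for _ in range(25))
        for _ in range(2000)]
bias, mse = bias_and_mse(reps, 1.0)
```

Recall the decomposition MSE = bias² + variance, which is why the two criteria together summarize an estimator's accuracy.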
  • 28. Methods: Survival Analysis. Overcomes standard regression limitations; in particular, it can incorporate censored data. Survival curves express the probability of surviving past an unambiguous event of interest. Model: S(t) = P(T > t) = ∫_t^∞ f(τ) dτ = 1 − F(t), where t ∈ [0, ∞) is a specific time, T is a random survival time, f(τ) is the density of T, and F(t) is the distribution function of T.
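The identity S(t) = ∫_t^∞ f(τ) dτ = 1 − F(t) can be checked numerically; a small sketch assuming an exponential density with rate λ = 0.5 (an illustrative choice, not from the slides), where S(t) = exp(−λt) in closed form:

```python
import math

def exp_density(t, lam=0.5):
    """Exponential density f(t) = lam * exp(-lam * t)."""
    return lam * math.exp(-lam * t)

def survival(t, lam=0.5, upper=200.0, n=100000):
    """S(t) = integral of f from t to infinity, approximated by the
    midpoint rule on [t, upper]; the tail beyond upper is negligible."""
    h = (upper - t) / n
    return h * sum(exp_density(t + (i + 0.5) * h, lam) for i in range(n))

# Numerical S(2) versus the closed form exp(-0.5 * 2) = 1 - F(2).
s2 = survival(2.0)
print(round(s2, 4), round(math.exp(-0.5 * 2.0), 4))
```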
  • 33. Methods: Accelerated Failure Time (AFT) Model. Provides survival curves and directly incorporates survival times. Model: ln(Tᵢ) = µ + zᵢᵀβ + eᵢ, where i indexes the n observations, Tᵢ is the ith observation's survival time, µ is the theoretical mean, zᵢ is the ith observation's covariate vector, β is the vector of regression coefficients, and eᵢ is the ith observation's random error.
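Survival times obeying the AFT equation can be simulated by exponentiating the log-linear predictor; a minimal sketch assuming standard-normal covariates and normal errors (a log-normal AFT specification; the error law, β, and other parameters are illustrative assumptions, not the talk's design):

```python
import math
import random

def simulate_aft(n, beta, mu=0.0, sigma=0.3, seed=2):
    """Draw times from ln(T_i) = mu + z_i' beta + e_i with
    standard-normal covariates z_i and N(0, sigma^2) errors e_i."""
    rng = random.Random(seed)
    covs, times = [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in beta]
        e = rng.gauss(0.0, sigma)
        lin = mu + sum(zj * bj for zj, bj in zip(z, beta))
        covs.append(z)
        times.append(math.exp(lin + e))  # T_i > 0 by construction
    return covs, times

# 500 observations with two covariates accelerating/decelerating failure.
Z, T = simulate_aft(500, beta=[0.8, -0.5])
```

Exponentiating guarantees positive survival times, which is the sense in which covariates "accelerate" or "decelerate" time to failure under this model.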
  • 37. Dimension Reduction Techniques Ivan Rodriguez Background Big Data Reducing Dimensionality Previous Insights Objectives Methods Survival Analysis Accelerated Failure Time (AFT) Model Principal Component Analysis (PCA) Partial Least Squares (PLS) Random Matrices (RMs) Results Simulation Example PCA Outperforms PLS RMs Are Comparable RMs Outdo PCA and PLS Analysis and Discussion Acknowledgments References Methods: Principal Component Analysis (PCA) Consider predictor matrix X. Components/eigenvalues obtained from data’s variance-covariance matrix. Covariance and correlation of linear combinations of X maximized. Less-correlated variables obtained through orthogonal transformation of covariates. Model: T = XW, where Xn×p is the predictor matrix, Wp×p is a matrix of loadings, and Tn×p is a matrix of scores.
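The model T = XW can be computed directly from the eigendecomposition of the variance-covariance matrix. A minimal numpy sketch (dimensions chosen for illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated predictors
Xc = X - X.mean(axis=0)                                # center the columns

# Eigendecomposition of the data's variance-covariance matrix.
S = np.cov(Xc, rowvar=False)
eigvals, W = np.linalg.eigh(S)           # W: p x p matrix of loadings
order = np.argsort(eigvals)[::-1]        # sort components by explained variance
eigvals, W = eigvals[order], W[:, order]

T = Xc @ W                               # matrix of scores: T = XW
# Columns of T are uncorrelated; their variances equal the eigenvalues.
print(np.allclose(np.cov(T, rowvar=False), np.diag(eigvals), atol=1e-8))
```

Keeping only the first few columns of T (the leading components) yields the reduced predictor matrix used in place of X.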
Methods: Partial Least Squares (PLS)
Largely similar to PCA.
Additionally considers the response matrix Y.
Chief difference: maximizes the covariance (and correlation) of linear combinations of X and Y.
Model: X = TPᵀ and Y = UQᵀ,
where X (n × m) is the predictor matrix, Y (n × p) is the response matrix, T (n × l) and U (n × l) are score matrices, and P (m × l) and Q (p × l) are loading matrices.
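The "chief difference" can be made concrete for a single response: the first PLS weight vector maximizes Cov(Xw, y) over unit vectors w, which gives w proportional to Xᵀy. A rough numpy sketch of that first component (illustrative data, not the slides' simulation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
Xc, yc = X - X.mean(axis=0), y - y.mean()

# First PLS weight vector: maximizes Cov(Xw, y) subject to ||w|| = 1,
# i.e., w is proportional to X^T y (unlike PCA, which ignores y).
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w                       # first score vector (a column of T)

# Its covariance with y dominates that of any single original covariate.
covs = [abs(np.cov(Xc @ v, yc)[0, 1]) for v in np.eye(p)]
print(abs(np.cov(t, yc)[0, 1]) >= max(covs))
```

Subsequent components are extracted the same way after deflating X (and y), which is what full NIPALS-style PLS algorithms do.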
Methods: Random Matrices (RMs)
The generated matrix is applied directly to X.
Entries of RM1: (1/√k) × {+1 with probability 1/2, −1 with probability 1/2}.
Entries of RM2: √(3/k) × {+1 with probability 1/6, 0 with probability 4/6, −1 with probability 1/6}.
Entries of RM3: drawn from N(0, 1); rows are then normalized.
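The three random matrices are simple to generate. The sketch below assumes numpy and illustrative sizes (p = 1,000 genes projected down to k = 50 dimensions, echoing the running microarray example); it is a sketch of the constructions above, not the authors' simulation code:

```python
import numpy as np

rng = np.random.default_rng(3)
p, k = 1000, 50                  # original and reduced dimensions

# RM1: entries (1/sqrt(k)) * {+1 w.p. 1/2, -1 w.p. 1/2}.
R1 = rng.choice([1, -1], size=(k, p)) / np.sqrt(k)

# RM2 (Achlioptas-style sparse matrix):
# sqrt(3/k) * {+1 w.p. 1/6, 0 w.p. 4/6, -1 w.p. 1/6}.
R2 = np.sqrt(3 / k) * rng.choice([1, 0, -1], size=(k, p), p=[1/6, 4/6, 1/6])

# RM3: standard normal entries, rows then normalized to unit length.
R3 = rng.normal(size=(k, p))
R3 /= np.linalg.norm(R3, axis=1, keepdims=True)

# Applied directly to the predictor matrix X: each row is mapped
# from p dimensions down to k dimensions.
X = rng.normal(size=(100, p))
X_red = X @ R1.T
print(X_red.shape)               # (100, 50)
```

By the Johnson-Lindenstrauss lemma (see the references), such projections approximately preserve pairwise distances with high probability, which is why no data-dependent fitting is needed.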
Results: Simulation Example
Results: PCA Outperforms PLS
Results: RMs Are Comparable
Results: RMs Outdo PCA and PLS
Analysis and Discussion
Unexpected result: the RMs outdid PCA and PLS.
This could be due to not incorporating censoring.
PCA bested PLS.
Further inquiry:
Apply the findings to real datasets.
Integrate censored data.
Utilize the Cox proportional hazards (CPH) model.
Acknowledgments
Dr. Rojo.
University of Nevada, Reno personnel:
Dr. Bradford.
Nathan Wiseman.
Dr. Cruz-Cano.
Rashidul Hasan.
National Security Agency.
References
Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz maps into a Hilbert space. Contemporary Mathematics, 26, 189–206.
Achlioptas, D. (2003). Database-friendly random projections: Johnson–Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4), 671–687.
Dasgupta, S., & Gupta, A. (2003). An elementary proof of a theorem of Johnson and Lindenstrauss. Random Structures and Algorithms, 22(1), 60–65.
Nguyen, T. S., & Rojo, J. (2009a). Dimension reduction of microarray gene expression data: The accelerated failure time model. Journal of Bioinformatics and Computational Biology, 7(6), 939–954.
Nguyen, T. S., & Rojo, J. (2009b). Dimension reduction of microarray data in the presence of a censored survival response: A simulation study. Statistical Applications in Genetics and Molecular Biology, 8(1), Article 4.
In Conclusion
Problem: reducing dataset dimensionality while maintaining sufficient generality.
Takeaway: in this simulation study, the standard methods of principal component analysis and partial least squares were inferior to random matrices.
Ivan Rodriguez: ivanrodriguez@email.arizona.edu
Claressa Ullmayer: clullmayer@alaska.edu
Dr. Rojo: jrojo@unr.edu