This presentation discusses how logistic regression can be improved by incorporating geometric and topological information. It covers several algorithms that do this, including differential geometric LARS (DGLARS) and homotopy-based LASSO. Simulation results show these methods outperform traditional logistic regression and other algorithms on data with noise, overlap between groups, and nonlinear relationships. When applied to a breast cancer dataset, DGLARS and homotopy LASSO performed comparably to complex machine learning models while producing interpretable linear models.
2. PROBLEM OVERVIEW
• Logistic regression is ubiquitous in medical research today.
  • Recovery from a traumatic brain injury
  • Psychiatric outcomes/relapse
  • Development of resistance to HIV medications
• Logistic regression struggles under several types of data conditions:
  • Small sample size relative to number of predictors (n<p)
  • Sparsity (few predictors are related to outcome)
  • Collinearity (variables share lots of variance)
• Machine learning offers several promising solutions:
  • Penalized regression
  • Tree-based methods
  • Boosted regression
  • Neural networks
• Some return an interpretable model with regression coefficients (odds ratios); some do not.
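To ground these failure modes, here is a minimal sketch (with purely illustrative data and penalty settings, not taken from the talk) showing how an unpenalized logistic fit degrades when n is small relative to p with collinear predictors, while an L1-penalized fit stays sparse:

```python
# Minimal sketch of the failure modes above: n close to p, sparse truth, collinearity.
# Assumptions: illustrative data and penalty settings, not taken from the study.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 60, 50
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)      # near-duplicate column (collinearity)
beta = np.zeros(p)
beta[[0, 2, 4]] = [2.0, -2.0, 1.5]                 # sparse truth: only 3 real effects
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ beta))))

# Effectively unpenalized fit (very weak L2 penalty): coefficients inflate badly
# when the sample size is small relative to the number of predictors.
mle = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
# L1-penalized (LASSO-style) fit: shrinks most noise coefficients exactly to zero.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)

print("max |coef|, (almost) unpenalized:", round(float(np.abs(mle.coef_).max()), 1))
print("nonzero coefficients, L1-penalized:", int((lasso.coef_ != 0).sum()))
```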
3. MACHINE LEARNING EXTENSIONS
• Despite the success of machine learning in recent years, algorithm extensions utilizing topological or geometric information have outperformed their base algorithm counterparts.
  • Network analytics (Forman curvature) and network matching via topologically-based metrics
  • Hodge theory extensions of graph-based ranking algorithms
  • Psychometric measure validation via Hausdorff statistics
  • Morse-Smale-based regression
  • Mapper algorithm for clustering/subgroup mining
  • Persistent homology to explore group differences/shape matching problems
  • Conformal mapping for image analytics
• This suggests that machine learning algorithms, such as penalized regression, can benefit from the addition of geometric and topological information.
4. LASSO, RIDGE REGRESSION, AND ELASTIC NET
• Penalized extensions of generalized linear models such as logistic regression:
  • LASSO imposes sparsity by adding a penalty (L1 norm) that shrinks near-zero coefficients to 0.
    • Similar to a cowboy at the origin roping coefficients that get too close
    • Can handle p>n situations well
  • Ridge regression adds a different penalty (L2 norm) to create a robust model.
    • Handles messy geometry (local minima/maxima...) to yield consistent estimators
    • May not impose sparsity on solutions
  • Elastic net combines these penalties to impose sparsity on solutions and yield a robust model.
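For reference (not shown on the slide), the three penalties can be written as penalized logistic log-likelihood objectives; the elastic-net mixing below follows the common glmnet-style parameterization, which is an assumption rather than something stated in the talk:

\begin{align}
\hat{\beta}_{\text{lasso}} &= \arg\min_{\beta}\; \Big\{ -\ell(\beta) + \lambda \lVert \beta \rVert_1 \Big\}, \\
\hat{\beta}_{\text{ridge}} &= \arg\min_{\beta}\; \Big\{ -\ell(\beta) + \lambda \lVert \beta \rVert_2^2 \Big\}, \\
\hat{\beta}_{\text{enet}} &= \arg\min_{\beta}\; \Big\{ -\ell(\beta) + \lambda \big[ \alpha \lVert \beta \rVert_1 + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \big] \Big\},
\end{align}

where \(\ell(\beta) = \sum_i \big[ y_i x_i^{\top}\beta - \log(1 + e^{x_i^{\top}\beta}) \big]\) is the logistic log-likelihood, \(\lambda \ge 0\) controls shrinkage strength, and \(\alpha \in [0,1]\) mixes the L1 and L2 penalties.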
5. HOMOTOPY-BASED LASSO
• Homotopy arrow example:
  • The red and blue arrows can be deformed into each other by wiggling and stretching the line path, with anchors at the start and finish of the line.
  • The yellow arrow crosses holes and would need to backtrack or break through the surface to freely wiggle into the blue or red line.
• The homotopy method in LASSO wiggles an easy regression path into an optimal regression path.
  • Avoids obstacles that can trap other regression estimators (peaks, valleys, saddles...)
  • Akin to removing obstacles that might hinder the cowboy's ability to rope variables near the origin
• Homotopy as path equivalence
  • An intrinsic property of topological spaces (such as data manifolds)
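To make the path idea concrete, here is a minimal sketch using scikit-learn's lars_path, which traces the Gaussian-response LASSO solution path by homotopy/LARS continuation. This is a stand-in under stated assumptions: the talk's homotopy LASSO operates on logistic regression and was presumably fit with a dedicated solver, and the data below are purely illustrative.

```python
# Minimal sketch: trace a LASSO solution path by homotopy/LARS continuation.
# Assumptions: Gaussian-response LASSO via scikit-learn (the talk's method is a
# homotopy solver for logistic LASSO); data are illustrative, not from the study.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 200, 13
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:4] = [1.5, -2.0, 1.0, 0.5]                 # 4 true predictors, 9 noise predictors
y = X @ beta + rng.normal(scale=0.5, size=n)

# The path starts at the fully penalized model (all coefficients zero) and is
# continuously deformed toward the unpenalized fit, with variables entering or
# leaving the active set at each knot of the path.
alphas, active, coefs = lars_path(X, y, method="lasso")
print("active set at the end of the path:", active)
print("coefficients at the least-penalized end:", np.round(coefs[:, -1], 2))
```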
6. DIFFERENTIAL GEOMETRY LARS EXTENSIONS (DGLARS)
• Instead of fitting the model to the data space, fit the model to the model-error tangent space:
  • Deals with collinearity, as parallel vectors share a tangent space (only one of a collinear group is selected)
  • Separates predictors into 3 groups:
    • Selected predictors (small angles relative to the error tangent space)
    • Redundant predictors (share a tangent space with selected predictors)
    • Non-selected predictors (large angles with the tangent space)
• Leverages Rao scoring and Fisher information
  • Important in assessing generalized linear models
  • Yields model fit statistics (BIC, AIC...)
  • Forward selection over a series of all possible models
  • Choose the best model by model fit statistics
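The reference implementation is the R dglars package cited in the editor's notes; as a rough illustration only, the sketch below computes the per-predictor Rao score statistic at the intercept-only logistic fit, which is the angle-like signal dgLARS uses to decide which predictor enters the path first. The function name and the simplified Fisher information are assumptions made for this sketch, not the package's internals.

```python
# Illustrative sketch (assumption: not the dglars R package itself, just the
# per-predictor Rao score statistic that dgLARS uses as its "angle" signal
# under the intercept-only logistic model).
import numpy as np

def rao_score_signals(X, y):
    """Signed Rao score statistic per predictor at the intercept-only logistic fit."""
    p0 = y.mean()                       # fitted probability under the null model
    score = X.T @ (y - p0)              # d(log-likelihood)/d(beta_j) at the null
    w = p0 * (1.0 - p0)                 # logistic variance weight (constant at the null)
    info = w * np.sum(X**2, axis=0)     # diagonal Fisher information (intercept
                                        # cross-terms ignored for simplicity)
    return score / np.sqrt(info)

rng = np.random.default_rng(1)
n, p = 500, 13
X = rng.normal(size=(n, p))
eta = 1.2 * X[:, 0] - 1.0 * X[:, 1]                    # two true effects
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

r = rao_score_signals(X, y)
print("first predictor to enter (largest |Rao score|):", int(np.argmax(np.abs(r))))
```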
7. TESTING SET-UP: ALGORITHMS
• Set 1: main effects models that yield regression coefficients for predictors
  • Logistic regression
  • Elastic net regression
  • Homotopy LASSO
  • DGLARS
  • Boosted regression with linear base learners
  • Multivariate adaptive regression splines (MARS)
  • Bayesian model averaging (BMA)
• Set 2: main effects plus interaction term models that yield regression coefficients
  • Logistic regression
  • Homotopy LASSO
  • Boosted regression with linear base learners
  • MARS
• Set 3: other machine learning models that do not yield regression coefficients or an interpretable model
  • Random forest
  • Extreme gradient boosting (XGBoost)
  • Conditional inference tree
  • Neural network
  • K-nearest neighbors regression
8. SIMULATIONS AND REAL DATA
• Simulations:
  • Comparison of sets 1 and 2
  • 13 predictors (4 true predictors) and binary outcome (0, 1)
    • Linear relationships (4 main effects terms)
    • Nonlinear relationships (2 interaction terms)
    • Mixed relationships (2 main effects, 1 interaction term)
  • Added Gaussian noise and group overlap levels:
    • Low noise (0, 0.25) and no group overlap
    • Medium noise (0, 0.5) and 5-10% group overlap
    • High noise (0, 0.75) and 15-20% group overlap
  • Yielded a total of 9 simulation conditions, each replicated 10 times across sample sizes of 500, 1000, 2500, 5000, and 10000
  • Train/test splits of 70/30 for each trial
• Real dataset:
  • UCI Machine Learning Repository Wisconsin Breast Cancer Dataset (WBCD)
    • 569 individuals with 30 tumor attributes and a binary indicator of malignancy (outcome)
  • Sets 1, 2, and 3 compared
  • 70/30 train/test split
  • Comparison of selected model coefficients across set 1:
    • Reduction of model size
    • Odds ratio comparison of selected terms
    • Overlap of selected terms between models
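As a rough reconstruction of the simulation design, one "main effects, low noise" condition might be generated as follows; the slide does not give the data-generating coefficients, the exact noise convention, or the group-overlap mechanism, so every specific number beyond those stated above is an illustrative assumption.

```python
# Rough sketch of one "main effects, low noise" simulation condition.
# Assumptions: coefficient values, treating (0, 0.25) as N(mean=0, sd=0.25) noise on
# the linear predictor, and omitting an explicit group-overlap mechanism are all
# illustrative choices not specified on the slide.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n, p = 1000, 13
X = rng.normal(size=(n, p))

beta = np.zeros(p)
beta[:4] = [1.0, -1.0, 0.8, -0.8]                  # 4 true main-effects predictors
eta = X @ beta + rng.normal(scale=0.25, size=n)    # low-noise condition
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))    # binary outcome (0, 1)

# 70/30 train/test split, as used for every trial in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape, round(float(y_train.mean()), 2))
```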
9. SIMULATION RESULTS
• Main effects trials (left column)
  • Most algorithms perform well.
  • Set 2 homotopy LASSO (stars) is optimal among the algorithms tested.
  • DGLARS (triangles) performs well with low noise/overlap or high noise/overlap.
• Interaction trials (middle column)
  • Set 2 homotopy LASSO performs well across conditions (especially n>2500).
  • DGLARS performs well with low noise/overlap.
  • DGLARS retains an advantage over many main effects models with added noise/overlap.
• Mixed trials (right column)
  • DGLARS outperforms all other set 1/set 2 methods at low noise/overlap conditions.
  • DGLARS retains an advantage over set 1/set 2 algorithms, except set 2 homotopy LASSO.
  • Set 2 homotopy LASSO emerges as the best algorithm with increasing noise/overlap.
• This suggests that incorporating geometry/topology into machine learning algorithms can improve performance on data with:
  • Group overlap
  • Noisy measurements
10. WBCD RESULTS: OVERVIEW OF PERFORMANCE
• Set 1 main effects models perform well.
  • Machine learning methods improve on logistic regression.
  • Elastic net and homotopy LASSO show the best performance overall.
• Set 2 models suggest logistic regression struggles with the large number of predictors relative to sample size.
• Set 3 models demonstrate that DGLARS and homotopy LASSO perform comparably well to nonparametric machine learning models, yielding lower and more balanced error.
11. WBCD ODDS RATIO COMPARISON: SET 1 ALGORITHMS
• Most algorithms reduced the predictor set by more than half.
• Many algorithms struggled with the data geometry (singularities, local optima...), yielding odds ratios of >1000 (set to 10 in the graph).
• Homotopy LASSO offered a solution with finite odds ratio estimates/coefficients.
  • Suggests its potential for solving multivariate regression on messy datasets
  • Yields interpretable, bounded odds ratios when other algorithms fail
12. CONCLUSIONS
• This study suggests the potential for new logistic regression algorithms that incorporate geometric and topological information.
  • DGLARS and homotopy LASSO perform well on simulated data, particularly on messier problems with main effects and interaction terms plus some noise/group overlap.
  • Homotopy LASSO and DGLARS perform well on the WBCD compared to nonparametric machine learning algorithms and produce interpretable linear models.
  • Homotopy LASSO yields finite odds ratios where other regression algorithms fail.
• More work should be done to incorporate geometric/topological methods into existing machine learning algorithms (particularly those based on generalized linear regression with interpretable models).
• Further empirical testing could include:
  • Multinomial regression (3+ category outcomes)
  • Tweedie/Poisson regression (count outcomes)
Editor's Notes
Grant BF, Dawson DA. Age at onset of alcohol use and its association with DSM-IV alcohol abuse and dependence: results from the National Longitudinal Alcohol Epidemiologic Survey. Journal of substance abuse. 1997 Dec 31;9:103-10.
Andrews PJ, Sleeman DH, Statham PF, McQuatt A, Corruble V, Jones PA, Howells TP, Macmillan CS. Predicting recovery in patients suffering from traumatic brain injury by using admission variables and physiological data: a comparison between decision tree analysis and logistic regression. Journal of neurosurgery. 2002 Aug;97(2):326-36.
Pflueger MO, Franke I, Graf M, Hachtel H. Predicting general criminal recidivism in mentally disordered offenders using a random forest approach. BMC psychiatry. 2015 Mar 29;15(1):62.
Sinisi SE, Polley EC, Petersen ML, Rhee SY, van der Laan MJ. Super learning: an application to the prediction of HIV-1 drug resistance. Statistical applications in genetics and molecular biology. 2007 Jan 1;6(1).
Heidema AG, Boer JM, Nagelkerke N, Mariman EC, Feskens EJ. The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases. BMC genetics. 2006 Apr 21;7(1):23.
Fan J, Lv J. A selective overview of variable selection in high dimensional feature space. Statistica Sinica. 2010 Jan;20(1):101.
Weber M, Saucan E, Jost J. Characterizing Complex Networks with Forman-Ricci curvature and associated geometric flows. arXiv preprint arXiv:1607.08654. 2016 Jul 28.
Lee H, Ma Z, Wang Y, Chung MK. Topological Distances between Networks and Its Application to Brain Imaging. arXiv preprint arXiv:1701.04171. 2017 Jan 16.
Xu Q, Jiang T, Yao Y, Huang Q, Yan B, Lin W. Random partial paired comparison for subjective video quality assessment via HodgeRank. InProceedings of the 19th ACM international conference on Multimedia 2011 Nov 28 (pp. 393-402). ACM.
Wang Y, Shi J, Yin X, Gu X, Chan TF, Yau ST, Toga AW, Thompson PM. Brain surface conformal parameterization with the Ricci flow. IEEE transactions on medical imaging. 2012 Feb;31(2):251-64.
Gerber S, Rübel O, Bremer PT, Pascucci V, Whitaker RT. Morse–smale regression. Journal of Computational and Graphical Statistics. 2013 Jan 1;22(1):193-214.
Lum PY, Singh G, Lehman A, Ishkanov T, Vejdemo-Johansson M, Alagappan M, Carlsson J, Carlsson G. Extracting insights from the shape of complex data using topology. Scientific reports. 2013 Feb 7;3:1236.
Moon C, Giansiracusa N, Lazar N. Persistence Terrace for Topological Inference of Point Cloud Data. arXiv preprint arXiv:1705.02037. 2017 May 4.
Lee H, Ma Z, Wang Y, Chung MK. Topological Distances between Networks and Its Application to Brain Imaging. arXiv preprint arXiv:1701.04171. 2017 Jan 16.
Bendich P, Gasparovic E, Tralie CJ, Harer J. Scaffoldings and Spines: Organizing High-Dimensional Data Using Cover Trees, Local Principal Component Analysis, and Persistent Homology. arXiv preprint arXiv:1602.06245. 2016 Feb 19.
Farrelly CM, Schwartz SJ, Amodeo AL, Feaster DJ, Steinley DL, Meca A, Picariello S. The Analysis of Bridging Constructs with Hierarchical Clustering Methods: An application to identity. Journal of Research in Personality. 2017 Jun 29.
Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2005;67(2):301-20.
Osborne MR, Presnell B, Turlach BA. A new approach to variable selection in least squares problems. IMA journal of numerical analysis. 2000 Jul 1;20(3):389-403.
Augugliaro L, Mineo AM, Wit EC. Differential geometric least angle regression: a differential geometric approach to sparse generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2013;75(3):471-98.
Augugliaro L, Mineo AM. Using the dglars package to estimate a sparse generalized linear model. In: Advances in Statistical Models for Data Analysis. Springer International Publishing; 2015. p. 1-8.
Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. Journal of the American Statistical Association. 1997 Mar 1;92(437):179-91.
Breiman L. Random forests. Machine learning. 2001 Oct 1;45(1):5-32.
Hothorn T, Hornik K, Zeileis A. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical statistics. 2006 Sep 1;15(3):651-74.
Altman NS. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician. 1992 Aug 1;46(3):175-85.
Bebis G, Georgiopoulos M. Feed-forward neural networks. IEEE Potentials. 1994 Oct;13(4):27-31.
Chen T, Guestrin C. Xgboost: A scalable tree boosting system. InProceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016 Aug 13 (pp. 785-794). ACM.
Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The annals of statistics. 2000;28(2):337-407.
Friedman JH. Multivariate adaptive regression splines. The annals of statistics. 1991 Mar 1:1-67.