WiDS Alexandria, Egypt workshop on topological data analysis (Python and R code available on request), covering persistent homology, the Mapper algorithm, and discrete Ricci curvature. Examples include text data and social network data.
Topological Data Analysis: visual presentation of multidimensional data sets (DataRefiner)
Topological data analysis (TDA) is an unsupervised approach that may revolutionise the way data can be mined and eventually drive the next generation of analytical tools. The idea behind TDA is to "measure" the shape of data and find a compressed combinatorial representation of that shape. As in ordinary topology, these combinatorial representations provide a compressed representation of high-dimensional data sets that retains information about the geometric relationships between data points. TDA can also be used as a very powerful clustering technique. Edward will present a comparison between TDA and other dimension-reduction algorithms such as PCA, LLE, Isomap, MDS, and Spectral Embedding.
Introduction to Topological Data Analysis (Mason Porter)
Here are slides for my 3/14/21 talk on an introduction to topological data analysis.
This is the first talk in our Short Course on topological data analysis at the 2021 American Physical Society (APS) March Meeting: https://march.aps.org/program/dsoft/gsnp-short-course-introduction-to-topological-data-analysis/
CCS2019: Topological time-series analysis with delay-variant embedding (Ha Phuong)
Q. H. Tran and Y. Hasegawa, Topological time-series analysis with delay-variant embedding, Oral Presentation at Conference on Complex Systems, Singapore, Singapore, Oct. 2019.
SIAM-AG21: Topological Persistence Machine of Phase Transition (Ha Phuong)
Presentation at SIAM Conference on Applied Algebraic Geometry (AG21), Aug. 2021.
Abstract. The study of phase transitions using data-driven approaches is challenging, especially when little prior knowledge of the system is available. Topological data analysis is an emerging framework for characterizing the shape of data and has recently achieved success in detecting structural transitions in materials science, such as the glass-liquid transition. However, data obtained from physical states may not have explicit shapes as structural materials do. We thus propose a general framework, termed the "topological persistence machine," to construct the shape of data from correlations in states, so that we can subsequently decipher phase transitions via qualitative changes in that shape. Our framework enables an effective and unified approach to phase transition analysis without prior knowledge of the phases and without requiring the investigation of large system sizes. We demonstrate the efficacy of the approach in detecting the Berezinskii-Kosterlitz-Thouless phase transition in the classical XY model and quantum phase transitions in the transverse Ising and Bose-Hubbard models. Interestingly, while these phase transitions have proven notoriously difficult to analyze with traditional methods, they can be characterized through our framework without prior knowledge of the phases. Our approach is thus expected to be widely applicable and to offer practical value for exploring the phases of experimental physical systems.
Extending the superlearner framework to survival analysis. Includes boosted regression, random forests, decision trees, Bayesian model averaging, and Morse-Smale regression.
Slides for the 2016/2017 edition of the Data Mining and Text Mining Course at the Politecnico di Milano. The course is also part of the joint program with the University of Illinois at Chicago.
UMAP is a technique for dimensionality reduction that was proposed two years ago and quickly gained widespread usage.
In this presentation I will try to demystify UMAP by comparing it to t-SNE. I also sketch its theoretical background in topology and fuzzy sets.
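The comparison can be made concrete in a few lines. The sketch below is an illustration, not from the talk: it embeds a toy 10-dimensional dataset with scikit-learn's t-SNE. umap-learn's `UMAP` class exposes the same `fit_transform` interface, so swapping the two methods is a one-line change.

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy data: two well-separated Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(25, 10)),
    rng.normal(8.0, 1.0, size=(25, 10)),
])

# t-SNE embedding; perplexity must be smaller than the number of samples.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (50, 2)
```

With umap-learn installed, `umap.UMAP(n_neighbors=10).fit_transform(X)` would slot into the same place.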
Residuals represent variation in the data that cannot be explained by the model.
Residual plots are useful for discovering patterns, outliers, or misspecifications of the model. Systematic patterns may suggest how to reformulate the model.
If the residuals exhibit no pattern, then this is a good indication that the model is appropriate for the particular data.
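As a minimal illustration (not from the original slides), the NumPy sketch below fits a straight line to data generated from a quadratic relationship; the residuals then show the U-shaped systematic pattern that signals a misspecified model.

```python
import numpy as np

# Fit a straight line to data that is actually quadratic, then inspect residuals.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 0.5 * x**2 + rng.normal(0, 1, size=x.size)  # true relationship is quadratic

slope, intercept = np.polyfit(x, y, deg=1)      # misspecified linear model
residuals = y - (slope * x + intercept)

# A systematic pattern signals model misspecification: here the residuals
# are positive at both ends of x and negative in the middle (U-shaped).
```

Plotting `residuals` against `x` would make the curvature obvious; for a correctly specified model the plot would be structureless noise.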
Slide show for the webinar on "Spatial Data Science with R" organized for the GeoDevelopers.org community. The video of the webinar and all the related materials including source code and sample data can be downloaded from this link: http://amsantac.co/blog/en/2016/08/07/spatial-data-science-r.html
In this webinar I talked about Data Science in the context of its application to spatial data and explained how we can use the R language for the analysis of geographic information within the different stages of a data science workflow, from the import and processing of spatial data to visualization and publication of results.
Tim Maudlin: New Foundations for Physical Geometry (Arun Gupta)
New Foundations for Physical Geometry
Original URL: http://www.unil.ch/webdav/site/philo/shared/summer_school_2013/NYU.ppt
Tim Maudlin
NYU
Physics & Philosophy of Time
July 25, 2013
High-Dimensional Data Visualization, Geometry, and Stock Market Crashes (Colleen Farrelly)
Miami Data Science Salon (Nov 2018) talk regarding geometric methods for dimensionality reduction, data visualization, and stock market analysis (India's NSE).
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress... (Databricks)
Given the resurgence of neural network-based techniques in recent years, it is important for data science practitioners to understand how to apply these techniques and the tradeoffs between neural network-based and traditional statistical methods.
This lecture discusses two specific techniques: Vector Autoregressive (VAR) models and Recurrent Neural Networks (RNNs). The former is one of the most important classes of multivariate time series statistical models applied in finance, while the latter is a neural network architecture that is suitable for time series forecasting. I'll demonstrate how they are implemented in practice and compare their advantages and disadvantages. Real-world applications, demonstrated using Python and Spark, are used to illustrate these techniques. While not the focus of this lecture, exploratory time series data analysis using time-series plots, autocorrelation plots (i.e. correlograms), partial autocorrelation plots, cross-correlation plots, histograms, and kernel density plots will also be included in the demo.
The attendees will learn: the formulation of a time series forecasting problem in the context of VAR and RNN; the application of RNN-based techniques to time series forecasting; the application of VAR models to multivariate time series forecasting; the pros and cons of VAR and RNN-based techniques in the context of financial time series forecasting; and when to use VAR versus RNN-based techniques.
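To make the VAR side concrete, here is a hedged NumPy-only sketch (not the lecture's Spark code) of the core idea: a VAR(1) model regresses today's vector of observations on yesterday's, and the coefficient matrix can be recovered by least squares.

```python
import numpy as np

# Simulate a 2-variable VAR(1) process: x_t = A @ x_{t-1} + noise.
rng = np.random.default_rng(0)
A_true = np.array([[0.6, 0.2],
                   [0.1, 0.5]])   # stationary: eigenvalues 0.7 and 0.4
T = 2000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + rng.normal(0, 0.1, size=2)

# Estimate A by least squares: regress x_t on x_{t-1}.
Y, Z = x[1:], x[:-1]
A_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T

# One-step-ahead forecast from the last observation.
forecast = A_hat @ x[-1]
print(np.round(A_hat, 2))
```

Real libraries (e.g. statsmodels) add lag selection, intercepts, and inference on top of exactly this regression.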
Exploratory data analysis using xgboost package in RSatoshi Kato
Explains a how-to procedure for exploratory data analysis using xgboost (EDAXGB), covering feature importance, sensitivity analysis, feature contribution, and feature interaction. It is based entirely on the built-in predict() function in the R package.
All of the sample codes are available at: https://github.com/katokohaku/EDAxgboost
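The sample code in the repository is in R; as a language-neutral illustration of the same predict()-driven idea, the NumPy sketch below implements permutation importance (a form of sensitivity analysis) against a hand-fitted linear model. Any model exposing a predict function, including an xgboost booster, could be dropped in. All names here are illustrative, not from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends strongly on feature 0, weakly on feature 1, not on feature 2.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=500)

# Any fitted model with a predict() works; here, ordinary least squares.
beta = np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda M: M @ beta

def permutation_importance(X, y, predict, rng):
    """Importance of feature j = increase in MSE after shuffling column j."""
    base_mse = np.mean((y - predict(X)) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])
        scores.append(np.mean((y - predict(Xp)) ** 2) - base_mse)
    return np.array(scores)

imp = permutation_importance(X, y, predict, rng)
print(imp.round(2))  # feature 0 dominates, feature 2 near zero
```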
"Number Crunching in Python": slides presented at EuroPython 2012, Florence, Italy
Slides have been authored by me and by Dr. Enrico Franchi.
Scientific and engineering computing, the NumPy ndarray implementation, and some working case studies are covered.
Statistical Programming with JavaScript (David Simons)
Almost every application needs data to function - and if you don't know how to be nice to your data, then things will start to go wrong. This talk aims to convince JavaScript developers that they do need to care about statistics, and then talk about how to do so. We look at some theory and lots of case studies and real-world advice to deal with a range of scenarios.
The talk aims to touch on the entire data life cycle: We'll dive into data modelling and how the shape and size of your data affects your architecture, and how to build these architectures using JavaScript. Once the data is in the front-end, we'll touch on the wide range of libraries that allows your code to react based on the data, and the wrappers on top that aid visualisation and readability.
Talk given at neo4j conference "Graph Connect" - discussing some graph theory (old and new), and why knowing your stuff can come in handy on a software project.
Matrix Factorizations at Scale: a Comparison of Scientific Data Analytics on ... (Databricks)
Explore the trade-offs of performing linear algebra for data analysis and machine learning using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Apache Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks.
This session will examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). Learn how these methods are applied to terabyte-sized problems in particle physics, climate modeling and bioimaging, as use cases where interpretable analytics is of interest. The data matrices are tall-and-skinny, which enable the algorithms to map conveniently into Spark’s data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns and provide tuning guidance to obtain high performance. Based on joint work with Alex Gittens and many others.
Forecasting time series: powerful and simple (Ivo Andreev)
Time series are a sequence of data points ordered in time. Time series forecasting has two main purposes: to understand the mechanisms that lead to rises or falls, and to predict future values. It often analyses trends, cyclical events, and seasonality, and has unique importance in economics and business. The quality of predictions can only be evaluated in the future due to temporal dependencies on previous data points, and there are many model types for approximation. In this session we are going to talk about challenges, ways of improvement, and a technology stack including ML.NET, ARIMA, Python, Azure ML, regression, and FB Prophet.
Data modelling is an important tool in the toolbox of a developer. By building and communicating a shared understanding of the domain they're working with, their applications and APIs become more usable and maintainable. However, as you scale up your technical teams, how do you keep these benefits whilst avoiding time-consuming meetings every time something new comes along? This talk reminds us of key data modelling techniques and how our use of Kafka changes and informs them. It then examines how these patterns change as more teams join your organisation and how Kafka comes into its own in this world.
Spatially resolved pair correlation functions for point cloud data (Tony Fast)
Presentation on computing spatial correlation functions for point cloud materials science information. This presentation uses tree algorithms and Fourier methods to compute the statistics. The analysis is performed on Al-Cu interface information provided by John Gibbs and Peter Voorhees at Northwestern University as funded by the Mosaic of Microstructure MURI program.
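A hedged, brute-force sketch of the first step (illustrative NumPy, not the talk's tree or Fourier implementations): the pair correlation function starts from a histogram of all pairwise separations, normalized by the area of each radial shell. At scale, a k-d tree such as `scipy.spatial.cKDTree` replaces the all-pairs distance computation.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(300, 2))   # synthetic 2-D point cloud

# All pairwise distances (brute force; a k-d tree replaces this at scale).
diff = points[:, None, :] - points[None, :, :]
d = np.sqrt((diff ** 2).sum(-1))
d = d[np.triu_indices_from(d, k=1)]         # keep each unique pair once

# Histogram of pair separations; dividing by the shell area 2*pi*r*dr
# turns the counts into an (unnormalized) pair correlation function g(r).
bins = np.linspace(0, 0.5, 26)
counts, edges = np.histogram(d, bins=bins)
r = 0.5 * (edges[:-1] + edges[1:])
shell_area = 2 * np.pi * r * np.diff(edges)
g = counts / (shell_area * len(d))          # up to a density normalization
```

For uniform points, g flattens out at large r; peaks at short range would indicate spatial structure such as the interfaces studied in the talk.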
These slides describe the basic concepts of industrial-strength compiler design, including static single-assignment form (SSA) and various optimizations such as dead code elimination, global value numbering, and constant propagation. This is intended for a 150-minute undergraduate compiler class.
Generative AI for Social Good at Open Data Science East 2024 (Colleen Farrelly)
A brief overview of generative AI technologies and their use for social good initiatives, including cultural training, medical image generation, drug design, and public health.
PyData Global 2023 talk overviewing case studies in network science, including stock market crash prediction, food price pattern mining, and stopping the spread of epidemics.
Overview of mathematical and machine learning models related to climate risk modeling, climate change simulations, and change point detection. Includes a hands-on session with geometry-based systems analysis of food prices related to climate change and geopolitical factors.
WiDS Workshop on natural language processing and generative AI. Details common methods that tie into coding examples. Ends with ethics discussion regarding these technologies and potential for misuse.
Link to talk YouTube: https://www.youtube.com/watch?v=byGzKm0H1-8&list=PLHAk3jHXWpxI7fHw8m5PhrpSRpR3NIjQo&index=3
ODSC-East 2023 presentation covering topics related to my book, The Shape of Data, including how geometry plays a role in text/image embeddings, network science problems, survey data analytics, image analytics, and epidemic wrangling.
This talk overviews my background as a female data scientist, introduces many types of generative AI, discusses potential use cases, highlights the need for representation in generative AI, and showcases a few tools that currently exist.
Emerging Technologies for Public Health in Remote Locations (Colleen Farrelly)
The tools available to leverage for public health interventions have changed significantly in the past decades. Tools from geometry, natural language processing, and generative AI allow for quick design and implementation of interventions, even in very rural parts of the world. Case studies involve HIV, Ebola, and COVID interventions.
WoComToQC workshop lecture on Forman-Ricci curvature for applications in industry (social networks, disaster logistics, spatial data, and spatiotemporal goods pricing data).
PyData Global talk covering tools from geometry/topology and their uses in public health, public policy, and social good initiatives. Examples include food price prediction, COVID policies, public health interventions, and fair AI.
Data Science Dojo Talk on comparing time series using persistent homology. Short overview of time series data. A bit of topology. Code available. Example includes stock exchange data.
Statistical and topological algorithm piece of an Applied Machine Learning Days Morocco talk. Covers ARIMA models, SSA models, GEE models, and persistent homology. Applications include pricing data, stock data, development data, and healthcare data. Datasets and full presentation can be found on GitHub: https://github.com/gabayae/Time-Series-Applications_AMLD2022
An introduction to quantum machine learning (Colleen Farrelly)
Very basic introduction to quantum computing given at Indaba Malawi 2022. Overviews some basic hardware in classical and quantum computing, as well as a few quantum machine learning algorithms in use today. Resources for self-study provided.
Indaba Malawi workshop on basic approaches to time series data, including ARIMA models and SSA models. Example in R includes an agricultural example from historical Malawi data with Rssa package and base ARIMA models.
NLP: Challenges and Opportunities in Underserved Areas (Colleen Farrelly)
This talk highlights the challenges and opportunities that exist in linguistically underserved areas. It covers NLP initiatives in Sub-Saharan Africa, as well as financial opportunities in technology if linguistically neglected areas can produce tools in their local languages. Ethics, ownership, and other concerns are highlighted to guide development initiatives.
Geometry, Data, and One Path Into Data Science (Colleen Farrelly)
Women in Data Science (Alexandria, Egypt) keynote address. Topics cover my journey into data science/machine learning, an overview of data science as a profession, and some case studies on topology/geometry in analytics. Example case studies include insurance, natural language processing, social network analysis, and psychometrics.
First part of a workshop looking at industry case studies in natural language processing for From Theory to Practice Workshop (AIMS, Kigali, March 2022).
SAS Global 2021 Introduction to Natural Language Processing (Colleen Farrelly)
Overview of text data, processing of text data, integration of text data with structured databases, and uses of text data in analytics across a variety of fields. Here's the talk link: https://www.youtube.com/watch?v=wS0X1bSsuUU
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Opendatabay - Open Data Marketplace (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex: Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay also breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
1. Topological Data Analysis (Colleen M. Farrelly, Datasembly)
2. Why Topological Data Analysis?
• Autocorrelations/dynamic systems (time series, spatiotemporal data)
• Wide data (-omics data)
• Small data (pilot studies, rare diseases…)
• Visualization-heavy needs for comparisons/groups (especially high-dimensional data)
• Data that breaks assumptions of machine learning algorithms/statistical models
3. Examples of TDA Tools
• Persistent homology
• Mapper algorithm
• Homotopy continuation
• Morse functions/clustering/regression
• Euler calculus
• Discrete exterior calculus
• Ricci curvature
• Mappings to Teichmüller space
4. Persistent Homology: Comparing Groups and Extending Hierarchical Clustering
5. Point Clouds and Distance Metrics
7. Homology Overview: Betti Numbers: (1,0,0…) (1,1,0…) (1,0,1…)
8. Filtrations and Persistence
• Filter distances or objects to obtain a series of topological objects (graphs, simplicial complexes…)
• Compute a series of metrics or summary statistics over filtrations
• Track how metrics/statistics change across the filtration
9. Algorithm Details
• Rips filtration: pairwise intersections of ε-balls centered at a given point in the point cloud or distance matrix
• Dimension parameter: number of Betti numbers to compute (usually set to a dimension of 0 or 1)
• Diagram parameters/distance computation parameters: optional visualization or statistical testing functions after using ripser()
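To ground the filtration idea, here is an illustrative pure-Python sketch (not from the slides) of the 0-dimensional part of a Rips filtration: sorting edges by length and merging components with union-find is exactly how H0 death scales arise, and it is essentially Kruskal's algorithm. Libraries such as ripser also compute the higher-dimensional classes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two clusters in the plane: expect two long-lived connected components.
pts = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])

# Pairwise distances, sorted: these are the epsilon values at which
# edges enter the Rips filtration.
n = len(pts)
edges = sorted(
    (float(np.linalg.norm(pts[i] - pts[j])), i, j)
    for i in range(n) for j in range(i + 1, n)
)

# Union-find: each edge that merges two components kills one H0 class;
# its epsilon is the "death" value of that class.
parent = list(range(n))

def find(a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

deaths = []
for eps, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:
        parent[ri] = rj
        deaths.append(eps)  # one component dies at this scale

# n points start as n components; n-1 merges leave one immortal component.
# The largest death value marks the scale at which the two clusters join.
print(len(deaths), round(max(deaths), 2))
```

In a persistence diagram these deaths appear as H0 points: many short-lived ones from within-cluster merges and one long-lived point for the gap between the two clusters.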
10. Implementation in Python or R
• R packages: TDAstats, TDAverse
• Python packages: Scikit-TDA, Ripser/persim, Giotto-TDA
11. Example Analysis: Problem/Data
• Small set of BERT-embedded poems that are either humorous or serious in tone
• Want to understand if there are significant differences in BERT features between the two sets of poems
12. Mapper: Clustering and Data Mining
13. Morse Functions: Height Functions and Critical Points
15. Algorithm Details
• Project data: takes input data and projects it to custom embeddings (3-dimensional space, knn distances…)
• Create cover: percent of overlap across covers and number of covers (different results with different parameters)
• Cluster: DBSCAN or other clusterers available in scikit-learn
• Save model: save output and details to a webpage (path_html)
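The project/cover/cluster steps above can be sketched end to end in plain NumPy. This is an illustration under simplified assumptions, not Kepler-Mapper's actual implementation: the lens is just the x-coordinate, and clustering is single linkage at a fixed distance scale rather than DBSCAN.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
# Noisy circle: Mapper on a 1-D lens should recover a loop of clusters.
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (200, 2))

# 1) Lens/filter: project each point to its x-coordinate.
lens = X[:, 0]

# 2) Cover: overlapping intervals over the lens range.
n_intervals, overlap = 6, 0.3
lo, hi = lens.min(), lens.max()
width = (hi - lo) / (n_intervals * (1 - overlap) + overlap)
step = width * (1 - overlap)

# 3) Cluster the points inside each interval (single linkage at a fixed scale).
clusters = []
for k in range(n_intervals):
    a = lo + k * step
    idx = np.where((lens >= a) & (lens <= a + width))[0]
    remaining = set(idx.tolist())
    while remaining:
        comp = {remaining.pop()}
        grew = True
        while grew:
            grew = False
            for p in list(remaining):
                if any(np.linalg.norm(X[p] - X[q]) < 0.3 for q in comp):
                    comp.add(p)
                    remaining.discard(p)
                    grew = True
        clusters.append(comp)

# 4) Nerve: connect clusters that share points (this graph is the Mapper output).
edges = [(i, j) for (i, ci), (j, cj) in combinations(list(enumerate(clusters)), 2)
         if ci & cj]
print(len(clusters), len(edges))
```

On this data the nerve comes out as a cycle of clusters, reflecting the circular shape of the point cloud.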
16. Implementation in Python or R
• R packages: TDAmapper
• Python packages: Kepler-Mapper (part of Scikit-TDA), Giotto-TDA, tmap
17. Example Analysis: Problem/Data
• Small set of BERT-embedded poems that are either humorous or serious in tone
• Want to cluster poems to understand the existence of subgroups
18. Ricci Curvature: Finding Key Pieces of a Social Network
19. Ricci Curvature: negative, zero, positive
20. Power/Disease Network Backbones
21. Algorithm Details
• Calculate curvature on edges: examine vertices and their adjacent edges to see how much "pull" there is on an edge
• Calculate curvature on vertices: sum up edge weights around a vertex to find out how much "stuff" is weighing it down
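These two steps can be sketched with the simplest combinatorial version of Forman-Ricci curvature on an unweighted graph, F(u, v) = 4 - deg(u) - deg(v), ignoring higher-order (triangle) terms. The toy network below is hypothetical and uses plain dictionaries rather than igraph or networkx.

```python
from collections import defaultdict

# Toy "supply chain" network: a hub connected to two small communities.
edges = [("hub", "a1"), ("hub", "a2"), ("hub", "b1"), ("hub", "b2"),
         ("a1", "a2"), ("b1", "b2")]

adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Forman-Ricci curvature of an edge in an unweighted graph (no triangle
# terms): F(u, v) = 4 - deg(u) - deg(v). Strongly negative edges behave
# like load-bearing bridges; positive edges sit inside tight clusters.
def forman(u, v):
    return 4 - len(adj[u]) - len(adj[v])

edge_curv = {(u, v): forman(u, v) for u, v in edges}

# Vertex curvature: aggregate the curvature of the incident edges.
node_curv = {u: sum(forman(u, v) for v in adj[u]) for u in adj}
print(edge_curv)
```

The hub's edges come out most negative, matching the intuition from the slides that bridge-like connections are the vulnerable, load-bearing pieces of the network.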
22. Implementation in Python or R
• R packages: custom in igraph
• Python packages: custom in igraph, custom in networkx
23. Example Analysis: Problem/Data
• Town network representing a supply chain (medical, food, electricity…)
• Want to understand vulnerabilities that exist within the network