TOPOLOGICAL DATA ANALYSIS
COLLEEN M. FARRELLY, DATASEMBLY
WHY TOPOLOGICAL DATA ANALYSIS?
• Autocorrelations/dynamic systems (time series, spatiotemporal data)
• Wide data (-omics data)
• Small data (pilot studies, rare diseases…)
• Visualization-heavy needs for comparisons/groups (especially high-dimensional data)
• Data that breaks assumptions of machine learning algorithms/statistical models
EXAMPLES OF TDA TOOLS
Persistent homology
Mapper algorithm
Homotopy continuation
Morse functions/clustering/regression
Euler calculus
Discrete exterior calculus
Ricci curvature
Mappings to Teichmüller space
PERSISTENT HOMOLOGY
COMPARING GROUPS AND EXTENDING HIERARCHICAL CLUSTERING
POINT CLOUDS AND DISTANCE METRICS
SIMPLICIAL COMPLEXES
HOMOLOGY OVERVIEW: BETTI NUMBERS
(1,0,0…) (1,1,0…) (1,0,1…)
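
The Betti tuples above can be made concrete with a small sketch. The function below is a hypothetical helper (not from any TDA package) that computes the first two Betti numbers of a graph treated as a 1-dimensional complex: b0 counts connected components and b1 counts independent loops, using b1 = edges - vertices + b0. It assumes vertices are labeled 0..n-1. A tuple such as (1,0,1…) needs filled 2-dimensional faces, which this sketch does not handle.

```python
def _find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def betti_numbers_of_graph(num_vertices, edges):
    """Return (b0, b1) for an undirected graph with vertices 0..num_vertices-1."""
    parent = list(range(num_vertices))
    components = num_vertices
    for u, v in edges:
        root_u, root_v = _find(parent, u), _find(parent, v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    b0 = components
    # Euler characteristic of a graph: V - E = b0 - b1
    b1 = len(edges) - num_vertices + b0
    return b0, b1


single_point = betti_numbers_of_graph(1, [])
triangle_loop = betti_numbers_of_graph(3, [(0, 1), (1, 2), (0, 2)])
print(single_point)   # one component, no loop
print(triangle_loop)  # one component, one loop
```

The single point gives one component and no loop, and the hollow triangle gives one component and one loop, which is the pattern of the first two tuples on the slide.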
FILTRATIONS AND PERSISTENCE
• Filter distances or objects to obtain a series of topological objects (graphs, simplicial complexes…)
• Compute a series of metrics or summary statistics over filtrations
• Track how metrics/statistics
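
As a rough illustration of filtering distances and tracking a summary statistic, the sketch below builds a neighborhood graph at each distance threshold and records the number of connected components. The function name, the toy points, and the threshold values are all assumptions chosen for illustration, not part of the original analysis.

```python
import math


def _find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def component_counts(points, thresholds):
    """Count connected components of the distance-threshold graph at each threshold."""
    counts = []
    n = len(points)
    for eps in thresholds:
        parent = list(range(n))
        components = n
        for i in range(n):
            for j in range(i + 1, n):
                # Connect two points when their distance is at most eps
                if math.dist(points[i], points[j]) <= eps:
                    root_i, root_j = _find(parent, i), _find(parent, j)
                    if root_i != root_j:
                        parent[root_i] = root_j
                        components -= 1
        counts.append(components)
    return counts


# Toy point cloud: one group of three points and one group of two points
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
thresholds = [0.5, 1.0, 1.5, 10.0]
counts = component_counts(points, thresholds)
print(list(zip(thresholds, counts)))
```

The count that stays constant over a wide range of thresholds (here, two groups) is the kind of feature that persistence is meant to highlight.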
ALGORITHM DETAILS
Rips filtration
• Pairwise intersections of ɛ-balls centered at each point in the point cloud (or derived from a distance matrix)
Dimension parameter
• Highest dimension of Betti numbers to compute (usually set to a dimension of 0 or 1)
Diagram parameters/distance computation parameters
• Optional visualization or statistical testing functions after using ripser()
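
To show what a dimension-0 persistence diagram contains, here is a minimal pure-Python sketch. It is not the ripser() implementation; it is a stand-in that processes pairwise distances in increasing order and records a (birth, death) pair each time two components merge. The function name h0_persistence is hypothetical. Deaths are recorded at the pairwise distance; if ɛ is read as a ball radius, the matching radius is half that distance.

```python
import math
from itertools import combinations


def _find(parent, x):
    """Union-find root lookup with path compression."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def h0_persistence(points):
    """Return dimension-0 (birth, death) pairs for a Rips-style filtration."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))
    # Every pairwise distance is a candidate merge event, processed smallest first
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        root_i, root_j = _find(parent, i), _find(parent, j)
        if root_i != root_j:
            parent[root_i] = root_j
            # All points are born at 0; one component dies at this merge
            pairs.append((0.0, length))
    # The last surviving component never dies
    pairs.append((0.0, math.inf))
    return pairs


diagram = h0_persistence([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
print(diagram)
```

Long-lived pairs (large death values) correspond to well-separated groups, which is how this view extends hierarchical clustering.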
IMPLEMENTATION IN PYTHON OR R
R packages
• TDAstats
• TDAverse
Python packages
• Scikit-TDA
• Ripser/persim
• Giotto-TDA
EXAMPLE ANALYSIS: PROBLEM/DATA
Small set of BERT-embedded poems that are either humorous or serious in tone
Want to understand if there are significant differences in BERT features between the two sets of poems
MAPPER
CLUSTERING AND DATA MINING
MORSE FUNCTIONS: HEIGHT FUNCTIONS AND CRITICAL POINTS
NERVES: OPEN COVERINGS
ALGORITHM DETAILS
Project Data
• Takes input data and projects to custom embeddings (3-dimensional space, knn distances…)
Create Cover
• Percent of overlap across covers and number of covers (different results with different parameters)
Cluster
• DBSCAN or other clusterers available in scikit-learn
Save Model
• Save output and details to a webpage (path_html)
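
The project, cover, and cluster steps above can be sketched in plain Python. This is not the Kepler-Mapper API: every name here (mapper_graph, lens, n_intervals, overlap, link_distance) is hypothetical, the cover is a simple set of equal-width intervals widened by a fraction of their width, and clustering is single-linkage at a fixed distance as a stand-in for DBSCAN. The save-to-webpage step is not reproduced.

```python
import math


def single_linkage_clusters(indices, points, link_distance):
    """Group indices into connected components of the 'within link_distance' graph."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[current], points[j]) <= link_distance}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def mapper_graph(points, lens, n_intervals, overlap, link_distance):
    """Return (nodes, edges): nodes are sets of point indices, edges join overlapping nodes."""
    values = [lens(p) for p in points]          # Project Data
    low, high = min(values), max(values)
    width = (high - low) / n_intervals
    nodes = []
    for k in range(n_intervals):                # Create Cover
        start = low + k * width - overlap * width
        end = low + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage_clusters(members, points, link_distance))  # Cluster
    # Nerve: connect two clusters whenever they share at least one point
    edges = [(a, b) for a in range(len(nodes)) for b in range(a + 1, len(nodes))
             if nodes[a] & nodes[b]]
    return nodes, edges


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
nodes, edges = mapper_graph(points, lens=lambda p: p[0],
                            n_intervals=2, overlap=0.25, link_distance=1.0)
print(nodes, edges)
```

Changing n_intervals or overlap changes which points are shared between cover elements, which is why different parameters give different graphs.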
IMPLEMENTATION IN PYTHON OR R
R packages
• TDAmapper
Python packages
• Kepler-Mapper (part of Scikit-TDA)
• Giotto-TDA
• tmap
EXAMPLE ANALYSIS: PROBLEM/DATA
Small set of BERT-embedded poems that
are either humorous or serious in tone
Want to cluster poems to understand the
existence of subgroups
RICCI CURVATURE
FINDING KEY PIECES OF A SOCIAL NETWORK
RICCI CURVATURE
Negative
Zero
Positive
POWER/DISEASE NETWORK BACKBONES
ALGORITHM DETAILS
Calculate Curvature on Edges
• Examine vertices and their adjacent edges to see how much “pull” there is on an edge
Calculate Curvature on Vertices
• Sum up edge weights around a vertex to find out how much “stuff” is weighing it down
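
The slides do not say which discrete Ricci curvature is used. As one possible reading, the sketch below uses the simplest Forman-Ricci curvature for an unweighted graph, 4 - deg(u) - deg(v) per edge (ignoring triangles), and sums the incident edge curvatures at each vertex. The function names and the toy network are assumptions; vertices are assumed to be integers.

```python
def forman_edge_curvature(adjacency):
    """Forman-Ricci curvature of each edge in an unweighted, undirected graph.

    Assumes the simplest form, F(u, v) = 4 - deg(u) - deg(v), with no triangle term.
    """
    curvature = {}
    for u, neighbors in adjacency.items():
        for v in neighbors:
            if u < v:  # visit each undirected edge once
                curvature[(u, v)] = 4 - len(adjacency[u]) - len(adjacency[v])
    return curvature


def vertex_curvature(adjacency, edge_curvature):
    """Sum the curvature of the edges incident to each vertex."""
    totals = {u: 0 for u in adjacency}
    for (u, v), value in edge_curvature.items():
        totals[u] += value
        totals[v] += value
    return totals


# Toy network: two triangles (0-1-2 and 3-4-5) joined by a single bridge 2-3
adjacency = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
edge_curvature = forman_edge_curvature(adjacency)
vertex_totals = vertex_curvature(adjacency, edge_curvature)
most_negative_edge = min(edge_curvature, key=edge_curvature.get)
print(edge_curvature)
print(vertex_totals)
print(most_negative_edge)
```

In this toy network the bridge between the two triangles has the most negative curvature, which matches the idea of negatively curved edges acting as a backbone.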
IMPLEMENTATION IN PYTHON OR R
R packages
• Custom in igraph
Python packages
• Custom in igraph
• Custom in networkx
EXAMPLE ANALYSIS: PROBLEM/DATA
Town network representing a supply
chain (medical, food, electricity…)
Want to understand vulnerabilities
that exist within the network
