Human Factors of XR: Using Human Factors to Design XR Systems
Tutorial of topological data analysis part 3(Mapper algorithm)
1. Tutorial of
Topological Data Analysis
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
Paper Alert 2016-04-15, Hasegawa lab., Tokyo
The University of Tokyo
Part III - Mapper Algorithm
2. My TDA = Topology Data Analysis ’s road
TDA Road 2
Part I - Basic concepts &
applications
Part II - Advanced TDA
computation
Part III - Mapper Algorithm
Part V - Applications in…
Part VI - Applications in…
Part IV - Software Roadmap
He is following me
4. Basic motivation
Mapper Algorithm 4
Basic idea
Perform clustering at different “scales”, track how
clusters change as scale varies
Motivation
• Coarser than manifold learning, but
still works in nonlinear situation
• Extract meaningful geometric
information about dataset
• Efficiently computable (for large
dataset) Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition.
G Singh, F Mémoli, GE Carlsson - SPBG, 2007
5. Morse theory
Mapper Algorithm 5
Basic idea
Describe topology of a smooth manifold M using level
sets of a suitable function h : M -> R
• Recover M by looking at h-1((∞, t]), as t scans over the
range of h
• Topology of M changes at critical points of h
6. Reeb graphs
Mapper Algorithm 6
• For each t in R, contract each
component of f-1(t) to a point
• Resulting structure is a graph
7. Mapper
Mapper Algorithm 7
The mapper algorithm is a generalization of this procedure (Singh-
Memoli-Carlsson)
Input
✤ Filter (continuous) function f: X -> R
✤ Cover L of im(f) by open intervals:
Method
✤ Cluster each inverse image f-1(Lα) into various connected components
✤ The Mapper is the nerve of V
• Clusters are vertices
• 1 k-simplex per (k+1)-fold intersection
connected cover V
✤ Color vertices according to average value of f in the cluster
k
i=0Vi 6= ;, V0, ..., Vk 2 V
8. Workflow - Illustration
Mapper Algorithm 8Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
f could be in n-dimension
9. Workflow - Illustration
Mapper Algorithm 9Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
f could be in n-dimension
10. Workflow - Illustration
Mapper Algorithm 10Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
f could be in n-dimension
11. Mapper in practice
Mapper Algorithm 11
Input
✤ Filter (continuous) function f: P -> R
✤ Cover L of im(f) by open intervals:
Method
✤ Cluster each inverse image f-1(Lα) into various connected components
in G
✤ The Mapper is the nerve of V
connected cover V
✤ Color vertices according to average value of f in the cluster
- Point cloud P with metric dP
- Compute neighborhood graph G = (P, E)
• Clusters are vertices
• 1 k-simplex per (k+1)-fold intersection
k
i=0Vi 6= ;, V0, ..., Vk 2 V
(intersections materialized
by data points)
12. Mapper in practice
Mapper Algorithm 12Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
13. Mapper in practice
Mapper Algorithm 13Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
14. Mapper in practice
Mapper Algorithm 14Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/
15. Mapper in practice
Mapper Algorithm 15
Parameters
✤ Filter (continuous) function f: P -> R
✤ Cover L of im(f) by open intervals:
✤ Neighborhood size δ
Example: uniform cover L
• Resolution / granularity: r (diameter of intervals)
• Gain: g (percentage of overlap)
range scale
geometric scale
16. Filter functions
Mapper Algorithm 16
Choice of filter function is essential
• Some kind of density measure
• A score measure difference (distance) from some baseline
• An eccentricity measure
Statistics
Mean/Max/Min
Variance
n-Moment
Density
…
Machine Learning
PCA/SVD
Auto encoders
Isomap/MDS/TSNE
SVM Distance
Error/Debugging Info
…
Geometry
Centrality
Curvature
Harmonic Cycles
…
17. Filter functions
Mapper Algorithm 17
Eccentricity
Density
- How close the point lies to the “center” of the point cloud.
- How close the point to the surrounding points
18. Mapper in applications
Mapper Algorithm 18
Extracting insights from the shape of complex data using topology,
Lum et al., Nature, 2013
Topological Data Analysis for Discovery in Preclinical Spinal Cord
Injury and Traumatic Brain Injury, Nielson et al., Nature, 2015
Using Topological Data Analysis for Diagnosis Pulmonary Embolism,
Rucco et al., arXiv preprint, 2014
Topological Methods for Exploring Low-density States in
Biomolecular Folding Pathways, Yao et al., J. Chemical Physics, 2009
CD8 T-cell reactivity to islet antigens is unique to type 1 while
CD4 T-cell reactivity exists in both type 1 and type 2 diabetes,
Sarikonda et al., J. Autoimmunity, 2013
Innate and adaptive T cells in asthmatic patients: Relationship
to severity and disease mechanisms, Hinks et al., J. Allergy Clinical
Immunology, 2015
✤
✤
✤
✤
✤
✤
20. Mapper in clustering
Mapper Algorithm 20
(1) Compute the Mapper
(2) Detect interesting topological substructures
(“loops”, “flares”)
(3) Use substructure to
cluster data
select parameters
Not easy (Tutorial part 1 + 2)
21. Mapper Algorithm 21
Extracting insights from the shape of complex data using topology,
Lum et al., Nature, 2013
f: 1st and 2nd SVD r = 120, g = 22%
PCA can show the
Republican/
Democrat cluster
but TDA gives
more information
House Party representative grouping
Point: member of
the House
PCA
22. Mapper Algorithm 22
Extracting insights from the shape of complex data using topology,
Lum et al., Nature, 2013
Detect new clusters for NBA players
23. Mapper Algorithm 23
Innate and adaptive T cells in asthmatic patients: Relationship
to severity and disease mechanisms, Hinks et al., J. Allergy Clinical Immunology, 2015
The TDA used 62 subjects
with most complete data.
f: 1st and 2nd SVD
r = 120, g = 14%, equalized
24. Mapper in feature selection
Mapper Algorithm 24
(1) Compute the Mapper
(2) Detect interesting topological substructures
(“loops”, “flares”)
(3) Select features that best
discriminate data in substructure
select parameters Kolmogorov-Smirnov test on (substructure)
feature vs. (whole dataset) feature,
select features with low p-val
25. Mapper Algorithm 25
Extracting insights from the shape of complex data using topology,
Lum et al., Nature, 2013
Goal: detect factors that influence survival after therapy in breast cancer patients
Points: breast cancer patients that went through specific therapy
PCA/Single-linkage clustering cannot see this
f: eccentricity
r = 1/30, g = 33%
26. Mapper Algorithm 26
Topological Data Analysis for Discovery in Preclinical Spinal Cord
Injury and Traumatic Brain Injury, Nielson et al., Nature, 2015
27. Select Parameters
Mapper Algorithm 27
parameter r
parameter g
parameter δ
parameter f
• Small r -> fine cover
(close to Reeb) (sensitive to δ)
• Large r -> rough cover
(less sensitive to δ)
• g ≈ 1 -> more points inside
intersections , less sensitive to
δ but far from Reeb
• g ≈ 0 -> controlled Mapper
dimension, close to Reeb
• Large δ -> fewer nodes, clean
Mapper but far from Reeb
(more straight lines)
• Small δ -> distinct
topological structure but lots
of nodes (noisy)
• Depend mostly on the
dataset
coordinate, density estimation,
eccentricity, eigenvector
28. Select Parameters
Mapper Algorithm 28
Example: P in R2 sampled from known distribution
f = density estimator, r = 1/30, g = 20%
δ = percentage of the diameter of X
Image source: http://www.enseignement.polytechnique.fr/informatique/INF563/