Multiscale Mapper Networks
By Colleen M. Farrelly
Problem
▪ Data contains many underlying structures and relationships.
▪ Current methods (such as k-means clustering):
  ◦ Don’t capture all of these structures
  ◦ Struggle with certain data properties (e.g., high dimensionality)
  ◦ Provide little information about connectedness between clusters/individuals
  ◦ Are unstable
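The instability point can be made concrete with a short sketch: plain Lloyd's k-means (Python, standard library only) run twice on the same six hypothetical scores with k = 2 but different starting centers. The scores, the starting centers, and the helper name `kmeans_1d` are illustrative assumptions, not part of the demo dataset.

```python
# A minimal 1-D Lloyd's k-means, written only to illustrate sensitivity to initialization.
# The scores and starting centers below are hypothetical, not from the demo dataset.

def kmeans_1d(xs, centers, max_iter=100):
    """Return (labels, centers) after running Lloyd's algorithm from the given centers."""
    centers = [float(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (ties -> lowest index).
        labels = [min(range(len(centers)), key=lambda j: abs(x - centers[j])) for x in xs]
        # Update step: each center moves to the mean of its members (kept if empty).
        new_centers = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else c)
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    return labels, centers

scores = [0, 1, 10, 11, 20, 21]
labels_a, _ = kmeans_1d(scores, centers=[0, 1])
labels_b, _ = kmeans_1d(scores, centers=[20, 21])
print(labels_a)  # [0, 0, 1, 1, 1, 1]
print(labels_b)  # [0, 0, 0, 0, 1, 1]
```

Both runs converge, yet they place the middle scores (10 and 11) in different groups, which is the kind of initialization-dependent result the slide refers to.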
Recent Solutions
▪ Nonlinear distance metrics
  ◦ Random forest-based
  ◦ Manifold learning-based
▪ Hierarchical clustering
  ◦ Nested clustering approach
▪ Multiscale K-Nearest Neighbors
  ◦ Adjust number of neighbors to slice data
▪ Still don’t provide a comprehensive view of data structure
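One way to see how changing the number of neighbors slices the data is to count the connected components of a symmetrized k-nearest-neighbor graph for several k. This is a simplified, swapped-in illustration under stated assumptions (hypothetical 1-D points, absolute difference as the distance, hypothetical helper `knn_components`), not the published multiscale nearest-neighbor classifier.

```python
# Hypothetical illustration: connected components of a symmetric k-nearest-neighbor graph
# for several values of k. This is NOT a published multiscale kNN classifier; it only shows
# how changing the number of neighbors changes how the data are sliced into groups.

def knn_components(xs, k):
    """Count connected components of the undirected kNN graph on 1-D points xs."""
    n = len(xs)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Sort the other points by distance (ties broken by index) and keep the k nearest.
        others = sorted((abs(xs[i] - xs[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrize: an edge exists if either point picks the other
    seen, n_components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        n_components += 1
        stack = [start]
        seen.add(start)
        while stack:  # depth-first walk over one component
            u = stack.pop()
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
    return n_components

points = [0, 1, 3, 10, 11, 13]
print({k: knn_components(points, k) for k in (1, 2, 3)})  # {1: 2, 2: 2, 3: 1}
```

With k = 1 or 2 the two groups stay separate; at k = 3 each point must reach outside its own three-point group, so the graph merges into one component.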
Topology Overview
▪ Branch of mainly pure mathematics
▪ Study of changes in function behavior on different shapes (called manifolds)
▪ Can examine locally variant and globally invariant properties
▪ Classify similarities/differences between shapes based on these characteristics
Algebra can be used to build more complex structures from basic building blocks
Topology and Data
▪ Data clouds can be turned into combinations of discrete shapes (simplices)
▪ Identify key topological features across different slices of the data (circles, holes…)
  ◦ Counted by Betti numbers, one per dimension (β0: connected components, β1: loops, β2: voids)
▪ Find connected components of similar topological structure
[Figure source: doi.ieeecomputersociety.org]
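The simplest Betti number, β0, can be computed directly. The sketch below assumes a Vietoris-Rips construction (link every pair of points within distance r) on a hypothetical six-point cloud and counts connected components with union-find; β0 depends only on these edges, so no triangles are needed. Higher Betti numbers (β1 loops, β2 voids) require homology of the full complex and are usually left to a dedicated TDA library such as GUDHI or Ripser.

```python
import math

# Hypothetical 2-D point cloud: a tight triple, a pair, and a lone point.
cloud = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 0)]

def beta0(points, r):
    """Number of connected components (beta_0) of the Vietoris-Rips complex at scale r."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= r:  # edge of the complex at scale r
                parent[find(i)] = find(j)             # merge the two components
    return len({find(i) for i in range(len(points))})

for r in (0.5, 1.0, 6.5):
    print(r, beta0(cloud, r))  # prints: 0.5 6, then 1.0 3, then 6.5 1
```

Reading β0 across scales is a small version of the "different slices" idea: the cloud looks like six points, then three groups, then one connected shape.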
Mapper Algorithm
▪ Topological clustering
  ◦ Define distance metric
    ▪ Linear or nonlinear
  ◦ Define filtration function
    ▪ Linear, density-based…
  ◦ Slice multidimensional dataset with Morse function
    ▪ Type of function associated with gradient flow and critical point identification on smooth manifolds
  ◦ Examine function behavior across slice (level set)
  ◦ Cluster function behavior
  ◦ Graph cluster connections
    ▪ Type of extended Reeb graph
[Figure: Mapper graph with annotations “Response gradations” and “Outliers”]
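The Mapper steps listed above can be sketched in a few dozen lines of standard-library Python. Every concrete choice here is an assumption for illustration: Euclidean distance as the metric, the x-coordinate as the filter (lens) function, equal-length overlapping intervals as the slices, and single-linkage clusters cut at a fixed threshold `eps`. The function names and the "fork" data set are hypothetical.

```python
import math
from itertools import combinations

# Minimal Mapper sketch under stated assumptions (not the original author's implementation):
#   distance metric -> Euclidean distance (math.dist)
#   filter function -> the x-coordinate of each point
#   cover           -> n equal-length intervals with fractional overlap
#   clustering      -> single linkage cut at a fixed threshold eps

def interval_cover(lo, hi, n, overlap):
    """n overlapping intervals of equal length spanning [lo, hi]."""
    length = (hi - lo) / (n - (n - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n)]
    cover[-1] = (cover[-1][0], hi)  # guard against floating-point shortfall at the top
    return cover

def single_linkage(idx, points, eps):
    """Connected components of the graph joining points of idx within distance eps."""
    remaining, clusters = set(idx), []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters

def mapper(points, filt, n_intervals, overlap, eps):
    values = [filt(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        level_set = [i for i, v in enumerate(values) if a <= v <= b]  # one slice
        nodes.extend(single_linkage(level_set, points, eps))          # clusters in slice
    # Edge weight = number of individuals shared by two clusters.
    edges = {(u, v): len(nodes[u] & nodes[v])
             for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges

# Hypothetical "fork": one track for x = 0..4 that splits into two tracks for x = 5..9.
fork = ([(x, 0.0) for x in range(5)] + [(x, 2.0) for x in range(5, 10)]
        + [(x, -2.0) for x in range(5, 10)])
nodes, edges = mapper(fork, filt=lambda p: p[0], n_intervals=3, overlap=0.5, eps=2.5)
print(len(nodes), sorted(edges.values()))  # 4 [2, 2, 2]
```

The slice covering x from 2.25 to 6.75 contains the branch point, so its single cluster shares two individuals with the stem and two with each arm, and the graph takes the Y shape of the data.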
Multiscale Extension of Mapper
▪ Instability of single-scale Mapper algorithm
  ◦ Clusters may change with scale
  ◦ Connections may change with scale
▪ Filtrations at multiple resolution settings
▪ Connections change as lens zooms in or out
  ◦ Contains information about underlying data structure and relationships
  ◦ Hierarchy of Reeb graphs
  ◦ Topological summary
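The scale dependence can be demonstrated by rerunning a Mapper-style construction at several resolutions. To keep the block self-contained it restates a compact single-scale Mapper; the x-coordinate filter, 50% interval overlap, single-linkage threshold `eps = 2.5`, helper name `mapper_graph`, and fork-shaped data are all hypothetical choices.

```python
import math
from itertools import combinations

# Sketch of the multiscale idea: rerun a deliberately simple Mapper at several resolutions
# and watch clusters and connections change. Hypothetical choices: Euclidean distance,
# x-coordinate filter, equal-length intervals with 50% overlap, single linkage cut at eps.

def mapper_graph(points, n_intervals, overlap=0.5, eps=2.5):
    values = [p[0] for p in points]  # filter: x-coordinate
    lo, hi = min(values), max(values)
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = hi if k == n_intervals - 1 else a + length
        remaining = {i for i, v in enumerate(values) if a <= v <= b}
        while remaining:  # single-linkage components inside this slice
            stack = [remaining.pop()]
            cluster = set(stack)
            while stack:
                i = stack.pop()
                near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
                remaining -= near
                cluster |= near
                stack.extend(near)
            nodes.append(cluster)
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges

fork = ([(x, 0.0) for x in range(5)] + [(x, 2.0) for x in range(5, 10)]
        + [(x, -2.0) for x in range(5, 10)])
for n in (1, 2, 3):
    nodes, edges = mapper_graph(fork, n)
    print(n, len(nodes), len(edges))  # prints: 1 1 0, then 2 2 1, then 3 4 3
```

With one or two intervals the branch is invisible (one node, then two connected nodes); with three intervals the fork appears as four nodes and three edges. The formal multiscale Mapper of Dey, Mémoli and Wang is stated in terms of towers of covers rather than this simple loop.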
Graph Theory Extensions of Mapper
▪ Cluster relationships from Mapper give an adjacency matrix and distance metric
  ◦ Clusters as vertices
  ◦ Nested hierarchy as edges
  ◦ Connected/unconnected components
  ◦ Centrality of certain points
  ◦ Bridges linking disparate clusters
  ◦ Path lengths between clusters
▪ Can apply network analytics to assess cluster relationships and individual connections across clusters
This is a weighted, undirected graph!
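Turning Mapper output into a network is mostly bookkeeping. This sketch assumes hypothetical cluster memberships and uses the number of shared individuals as the edge weight (one reasonable choice; overlap ratios would also work), then reads off connected components and path lengths with breadth-first search.

```python
from collections import deque
from itertools import combinations

# Hypothetical Mapper output: each vertex is a cluster, stored as a set of individual IDs.
clusters = [
    {0, 1, 2, 3, 4},
    {3, 4, 5, 6, 10, 11},
    {5, 6, 7, 8, 9},
    {10, 11, 12, 13, 14},
    {15, 16},  # shares no individuals with any other cluster
]

def adjacency(clusters):
    """Weighted, undirected adjacency matrix: W[u][v] = number of shared individuals."""
    n = len(clusters)
    W = [[0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        W[u][v] = W[v][u] = len(clusters[u] & clusters[v])
    return W

def hop_distances(W, source):
    """Path length (number of edges) from source to every reachable vertex, via BFS."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v, w in enumerate(W[u]):
            if w and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def components(W):
    """Connected components: groups of vertices reachable from one another."""
    seen, comps = set(), []
    for s in range(len(W)):
        if s not in seen:
            comp = set(hop_distances(W, s))
            seen |= comp
            comps.append(sorted(comp))
    return comps

W = adjacency(clusters)
print(W[1])                 # [2, 0, 2, 2, 0]
print(components(W))        # [[0, 1, 2, 3], [4]]
print(hop_distances(W, 0))  # {0: 0, 1: 1, 2: 2, 3: 2}
```

Cluster 4 forms its own component, and clusters 2 and 3 are each two steps from cluster 0.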
Network Extensions of Multiscale Mapper
▪ Graph theory algorithms applied to Mapper results dig deeper into:
  ◦ Data topology/structure
  ◦ Nature of individuals’ similarities across multivariate distribution
▪ Examine across different lenses
  ◦ Hierarchy of networks connected through individuals common to multiple networks
  ◦ Analyze across slices to gain deeper insight into network and underlying data structures
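Networks at different resolutions can be tied together through the individuals they share. In this sketch the coarse and fine memberships are hypothetical, a link's weight is the number of shared individuals, and assigning each fine node a "parent" by largest overlap is an illustrative assumption rather than a rule from the slides.

```python
# Hypothetical Mapper memberships (sets of individual IDs) at two resolutions.
coarse = [{0, 1, 2, 3, 4, 5}, {6, 7, 8, 9}]
fine = [{0, 1, 2}, {2, 3, 4, 5}, {6, 7}, {8, 9}]

def cross_links(coarse, fine):
    """Weighted links between two networks: (coarse node, fine node) -> shared individuals."""
    return {(i, j): len(a & b)
            for i, a in enumerate(coarse)
            for j, b in enumerate(fine) if a & b}

def parents(coarse, fine):
    """Assumed rule: each fine node's parent is the coarse node sharing the most individuals."""
    return [max(range(len(coarse)), key=lambda i: len(coarse[i] & b)) for b in fine]

print(cross_links(coarse, fine))  # {(0, 0): 3, (0, 1): 4, (1, 2): 2, (1, 3): 2}
print(parents(coarse, fine))      # [0, 0, 1, 1]
```

The parent list is a small hierarchy: fine nodes 0 and 1 refine coarse node 0, and fine nodes 2 and 3 refine coarse node 1.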
Information from Each Network
▪ Hubs
  ◦ Direct connection to many other clusters
▪ Betweenness
  ◦ Non-extremity measure: how often a cluster lies on shortest paths between other clusters
▪ Diversity
  ◦ Information contained
▪ Bridges
  ◦ Connection between less-related components
▪ Graph Laplacian
  ◦ Eigenvectors with connection/bridge weights
▪ Centrality
  ◦ Weight direct connections and bridges for importance to network
▪ Vertices
  ◦ Clusters at a particular resolution
▪ Edges
  ◦ Connections between clusters
  ◦ Individuals common between clusters
▪ Levels
  ◦ Level sets (height slices) containing one or more vertices
  ◦ Individuals bridging levels
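Several of the per-network measures above can be computed on a small example. The graph (two triangles of clusters joined by a single edge, with shared-individual counts as weights) and the helper names are hypothetical; hubs are taken as maximum-degree vertices, betweenness uses Brandes' algorithm on the unweighted graph, bridges use Tarjan's low-link test, and the Laplacian is the weighted L = D - W. Its eigenvectors would come from a linear-algebra routine such as `numpy.linalg.eigh`, which is left out to keep the block dependency-free.

```python
from collections import deque
from itertools import count

# Hypothetical weighted Mapper graph: two triangles of clusters joined by edge (2, 3).
# Weights are assumed to be numbers of shared individuals.
N = 6
W_EDGES = {(0, 1): 2, (0, 2): 1, (1, 2): 3, (2, 3): 1, (3, 4): 2, (3, 5): 2, (4, 5): 1}
adj = [[] for _ in range(N)]
for u, v in W_EDGES:
    adj[u].append(v)
    adj[v].append(u)

def hubs(adj):
    """Vertices with the most direct connections (highest degree)."""
    top = max(len(nbrs) for nbrs in adj)
    return [u for u, nbrs in enumerate(adj) if len(nbrs) == top]

def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS: count shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):  # accumulate pair dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return [b / 2 for b in bc]  # undirected: each pair was counted from both ends

def bridges(adj):
    """Edges whose removal disconnects the graph (Tarjan's low-link test, simple graphs)."""
    disc, low, found, clock = [-1] * len(adj), [0] * len(adj), [], count()

    def dfs(u, parent):
        disc[u] = low[u] = next(clock)
        for v in adj[u]:
            if disc[v] == -1:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    found.append((min(u, v), max(u, v)))
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for u in range(len(adj)):
        if disc[u] == -1:
            dfs(u, -1)
    return found

def laplacian(n, weighted_edges):
    """Weighted graph Laplacian L = D - W as a nested list."""
    L = [[0] * n for _ in range(n)]
    for (u, v), w in weighted_edges.items():
        L[u][v] -= w
        L[v][u] -= w
        L[u][u] += w
        L[v][v] += w
    return L

print(hubs(adj))         # [2, 3]
print(betweenness(adj))  # [0.0, 0.0, 6.0, 6.0, 0.0, 0.0]
print(bridges(adj))      # [(2, 3)]
L = laplacian(N, W_EDGES)
print([L[i][i] for i in range(N)])  # [3, 5, 5, 5, 3, 3]
```

Clusters 2 and 3 are at once the hubs, the only vertices with nonzero betweenness, and the endpoints of the single bridge, which is the pattern the slide describes for clusters linking less-related components.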
Combined Insight of Extensions
▪ Multiple resolutions
  ◦ Cluster hierarchy
    ▪ Evolving cluster structure
    ▪ More complete picture of individual classifications
  ◦ Network hierarchy
    ▪ Evolving network structure
    ▪ More complete picture of cluster relationships and structure
    ▪ More complete picture of individual connections
Example Demonstration
▪ Demo dataset of 7th-grade SAT scores
▪ Group-level data mining of results
[Figure: Mapper network results with annotations “Transition” (twice), “Emergence of subgroups”, and “Split into two distinct groups”]
Individual Mining Results
▪ Map back to individuals
  ◦ Bridging individuals
    ▪ Transition between clusters
    ▪ Multivariate cut-off score determination
  ◦ Isolated individuals
    ▪ Outliers and outlier groups
    ▪ Unique response or predictor subsets
  ◦ Consistently clustered individuals
    ▪ Cohesive subgroups in data
    ▪ Underlying similarity of predictors or response
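The three kinds of individuals above can be pulled out of Mapper memberships with set operations. The definitions are assumptions made for illustration: bridging individuals belong to two or more clusters, isolated individuals belong only to clusters that share no one with any other cluster, and consistently clustered pairs share a cluster at every resolution examined. The data are hypothetical.

```python
from itertools import combinations

# Hypothetical Mapper nodes (sets of individual IDs) from the same data at two resolutions.
res_a = [{0, 1, 2, 3}, {3, 4, 5}, {6, 7, 8}, {9}]
res_b = [{0, 1, 2}, {2, 3, 4, 5}, {6, 7, 8}, {9}]

def bridging_individuals(nodes):
    """Assumed definition: individuals who belong to two or more clusters."""
    seen, shared = set(), set()
    for node in nodes:
        shared |= seen & node
        seen |= node
    return shared

def isolated_individuals(nodes):
    """Assumed definition: members of clusters sharing nobody with any other cluster."""
    return {i for k, node in enumerate(nodes)
            if not any(node & other for j, other in enumerate(nodes) if j != k)
            for i in node}

def co_clustered_pairs(nodes):
    """All pairs of individuals that appear together in at least one cluster."""
    return {pair for node in nodes for pair in combinations(sorted(node), 2)}

def consistent_pairs(*resolutions):
    """Pairs of individuals placed in a common cluster at every resolution."""
    return set.intersection(*(co_clustered_pairs(r) for r in resolutions))

print(bridging_individuals(res_a), bridging_individuals(res_b))  # {3} {2}
print(sorted(isolated_individuals(res_a)))                       # [6, 7, 8, 9]
print(sorted(consistent_pairs(res_a, res_b)))
```

Individual 9 and the group {6, 7, 8} come out as an isolated individual and an outlier group, while individuals 3 and 2 act as bridges at the two resolutions.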
Conclusion
New method ameliorates some of the issues with clustering methods
  ◦ Robust
  ◦ Works in high dimensions
  ◦ Captures connectedness
  ◦ Stable
  ◦ Provides hierarchy
  ◦ Quantifies relationships

Editor's Notes

  • #3 Dey, T. K., Memoli, F., & Wang, Y. (2015). Mutiscale Mapper: A Framework for Topological Summarization of Data and Maps. arXiv preprint arXiv:1504.03763. Singh, G., Mémoli, F., & Carlsson, G. E. (2007, September). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In SPBG (pp. 91-100).
  • #4 Ghosh, A. K., Chaudhuri, P., & Murthy, C. A. (2006). Multiscale classification using nearest neighbor density estimates. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36(5), 1139-1148. Shi, T., Seligson, D., Belldegrun, A. S., Palotie, A., & Horvath, S. (2005). Tumor classification by tissue microarray profiling: random forest clustering applied to renal cell carcinoma. Modern Pathology, 18(4), 547-557. Navarro, J. F., Frenk, C. S., & White, S. D. (1997). A universal density profile from hierarchical clustering. The Astrophysical Journal, 490(2), 493.
  • #5 Spanier, E. H. (1994). Algebraic topology (Vol. 55, No. 1). Springer Science & Business Media. Aspinwall, P. S., Greene, B. R., & Morrison, D. R. (1994). Calabi-Yau moduli space, mirror manifolds and spacetime topology change in string theory. Nuclear Physics B, 416(2), 414-480. Schwarz, M. (1993). Morse homology. In Progress in Mathematics. Palis, J. (1969). On Morse-Smale dynamical systems. Topology, 8(4), 385-404. Devaney, R. L. (1989). An introduction to chaotic dynamical systems (Vol. 13046). Reading: Addison-Wesley.
  • #6 Epstein, C., Carlsson, G., & Edelsbrunner, H. (2011). Topological data analysis. Inverse Problems, 27(12), 120201. Zomorodian, A. (2007). Topological data analysis. Advances in Applied and Computational Topology, 70, 1-39.
  • #7 Singh, G., Mémoli, F., & Carlsson, G. E. (2007, September). Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In SPBG (pp. 91-100). Lum, P. Y., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., ... & Carlsson, G. (2013). Extracting insights from the shape of complex data using topology. Scientific Reports, 3. Carlsson, G., Jardine, R., Feichtner-Kozlov, D., Morozov, D., Chazal, F., de Silva, V., ... & Wang, Y. (2012). Topological Data Analysis and Machine Learning Theory.
  • #8 Dey, T. K., Memoli, F., & Wang, Y. (2015). Mutiscale Mapper: A Framework for Topological Summarization of Data and Maps. arXiv preprint arXiv:1504.03763.
  • #10 Opens the door to extremely deep unsupervised learning and a new way to look at data structure and underlying relationships.
  • #11 Scott, J. (2012). Social network analysis. Sage. Carrington, P. J., Scott, J., & Wasserman, S. (Eds.). (2005). Models and methods in social network analysis (Vol. 28). Cambridge University Press. Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications (Vol. 8). Cambridge University Press.