TOPOLOGY FOR DATA SCIENCE: MORSE THEORY AND APPLICATION
Colleen M. Farrelly
Level Sets in Everyday Life
• Front maps partition weather patterns by areas
of the same pressure (isobars).
• Elevation maps partition land areas by height
above/below sea level.
Level Sets of Functions
• Continuous functions have defined
local and global peaks, valleys, and
passes.
• Define height “slices” to partition the function.
• Akin to a cheese grater scraping off
layers of a cheese block.
• In the example, the blue lines slice a
sine wave into pieces of similar height.
• Functions on discrete data (points) can be partitioned into level sets, too.
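The slicing idea can be sketched in a few lines. This is a minimal illustration, assuming NumPy and equal-height bands; the helper name `level_set_slices` is hypothetical. Each sample of a sine wave is assigned to a horizontal band, so the bands play the role of the blue slices (strictly, a level set is the set of points at exactly one height; these bands are thickened versions of it).

```python
import numpy as np

def level_set_slices(values, n_slices):
    """Assign each sample to one of n_slices equal-height bands (hypothetical helper)."""
    lo, hi = values.min(), values.max()
    # Rescale heights to [0, n_slices] and put the maximum into the top band.
    bands = np.floor((values - lo) / (hi - lo) * n_slices).astype(int)
    return np.minimum(bands, n_slices - 1)

x = np.linspace(0, 2 * np.pi, 200)   # discrete samples of a sine wave
y = np.sin(x)
bands = level_set_slices(y, n_slices=4)
for b in range(4):
    in_band = y[bands == b]
    print(f"band {b}: {in_band.size} points, heights {in_band.min():.2f} to {in_band.max():.2f}")
```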
Level Sets to Critical Points
• Continuous functions:
• Can be decomposed with level sets.
• Contain critical points (where the slope or gradient is zero), including local optima:
• Maxima (peaks)
• Minima (valleys)
• Saddle points (passes, where height rises in some directions and falls in others)
• Continuous functions can live in
higher-dimensional spaces with more
complicated critical points.
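For sampled data, critical points can be located approximately where the discrete slope changes sign. The sketch below assumes evenly spaced samples of a 1-D function and uses a hypothetical helper `discrete_critical_points`; in one dimension it can only report maxima and minima, and it ignores flat runs where consecutive samples are exactly equal.

```python
import numpy as np

def discrete_critical_points(y):
    """Return (index, kind) for interior samples where the discrete slope changes sign."""
    slope = np.sign(np.diff(y))   # +1 where rising to the next sample, -1 where falling
    found = []
    for i in range(1, len(y) - 1):
        if slope[i - 1] > 0 and slope[i] < 0:
            found.append((i, "maximum"))   # rise then fall: a peak
        elif slope[i - 1] < 0 and slope[i] > 0:
            found.append((i, "minimum"))   # fall then rise: a valley
    return found

x = np.linspace(0, 4 * np.pi, 401)   # two full periods, step pi/100
print(discrete_critical_points(np.sin(x)))
# [(50, 'maximum'), (150, 'minimum'), (250, 'maximum'), (350, 'minimum')]
```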
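Degenerate and Non-Degenerate Critical Points
• Morse functions have stable and isolated critical points (non-degenerate critical points).
• Related to the 1st and 2nd derivatives of the function.
• They persist, moving only slightly, under small shifts to the function.
• Technically, a critical point is non-degenerate when the Hessian (the matrix of 2nd derivatives) is non-singular there, i.e. has a nonzero determinant.
• Reflects neighborhood behavior around the critical point.
1. Non-degenerate critical points have well-defined behavior in the critical point’s neighborhood: the function looks locally like a bowl, a dome, or a saddle.
2. Degenerate critical points have undetermined behavior nearby: 2nd derivatives alone cannot classify them, and small perturbations can split or remove them.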
[Figure: Critical points with f′(x) = 0, labeled f″(x) < 0 (maximum), f″(x) > 0 (minimum), and f″(x) = 0 (degenerate).]
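The Hessian criterion can be checked numerically. This sketch, assuming NumPy, classifies a critical point from the eigenvalues of its Hessian: a (near-)zero eigenvalue makes the Hessian singular, so the point is degenerate; otherwise the number of negative eigenvalues (the Morse index) separates minima, saddles, and maxima. The helper name `classify_critical_point` and the tolerance `tol` are illustrative assumptions, and the four Hessians are worked out by hand at the origin.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-9):
    """Classify a critical point from its symmetric Hessian matrix."""
    eigvals = np.linalg.eigvalsh(hessian)
    if np.any(np.abs(eigvals) < tol):
        return "degenerate"                  # singular Hessian: 2nd derivatives do not decide
    n_negative = int(np.sum(eigvals < 0))    # the Morse index
    if n_negative == 0:
        return "minimum"
    if n_negative == len(eigvals):
        return "maximum"
    return "saddle"

# Hessians of four functions at the critical point (0, 0):
examples = {
    "x^2 + y^2": np.array([[2.0, 0.0], [0.0, 2.0]]),
    "x^2 - y^2": np.array([[2.0, 0.0], [0.0, -2.0]]),
    "-x^2 - y^2": np.array([[-2.0, 0.0], [0.0, -2.0]]),
    "x^2 + y^4": np.array([[2.0, 0.0], [0.0, 0.0]]),   # y^4 has zero 2nd derivative at 0
}
for name, H in examples.items():
    print(name, "->", classify_critical_point(H))
```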
Morse Function Definition
1. None of the function’s critical points
are degenerate.
2. None of the critical points share the same value. (Strictly, the classical definition requires only condition 1; distinct critical values are an extra genericity condition common in applications.)
• These properties allow a map from a function’s critical values to a space of level sets (left).
• All critical values map to values in the level set collection.
• The function can then be plotted compactly to summarize its peaks, valleys, and in-between spaces.
[Figure: Level set / critical point map, with level values 1, 0, and -1.]
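Both conditions can be tested for a one-variable polynomial, whose critical points are the real roots of its derivative. A minimal sketch, assuming NumPy's `numpy.polynomial.Polynomial` class and an illustrative tolerance `tol`; `is_morse_polynomial` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import Polynomial

def is_morse_polynomial(coeffs, tol=1e-8):
    """Check both conditions for a 1-D polynomial (coefficients lowest degree first)."""
    p = Polynomial(coeffs)
    dp, d2p = p.deriv(), p.deriv(2)
    roots = dp.roots()
    crit = np.sort(roots.real[np.abs(roots.imag) < tol])   # real critical points
    if np.any(np.abs(d2p(crit)) < tol):
        return False                                      # condition 1 fails: degenerate point
    values = np.sort(p(crit))
    return bool(np.all(np.diff(values) > tol))            # condition 2: distinct critical values

print(is_morse_polynomial([0, 0.5, -2, 0, 1]))   # x^4 - 2x^2 + 0.5x -> True
print(is_morse_polynomial([0, 0, -2, 0, 1]))     # x^4 - 2x^2: two minima at -1 -> False
print(is_morse_polynomial([0, 0, 0, 1]))         # x^3: degenerate at 0 -> False
```

By the same check, a sine wave spanning several periods fails condition 2, because all of its peaks share the value 1.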
Discrete Extensions to Data Analysis
• Morse functions can be extended to
discrete spaces.
• Data lives in a discrete point cloud.
• Topological spaces, called simplicial
complexes, can be built from these.
• Several algorithms exist to connect
points to each other via shared
neighborhoods.
• Vietoris-Rips complexes are built by connecting points that lie within distance d of each other.
• Any metric distance can be used.
• Process turns data into a topological space
upon which a Morse function can be
defined.
[Figure: Example simplicial complex. 2-d neighborhoods are defined by Euclidean distance; points within a given circle are mutually connected, forming a simplex.]
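A brute-force construction makes the Vietoris-Rips rule concrete. This sketch, assuming NumPy and Euclidean distance, includes a set of points as a simplex when every pair of them is within distance `d`; it enumerates all subsets up to `max_dim`, so it only suits small examples. Replacing the distance matrix with another metric follows the note that any metric distance can be used.

```python
import itertools
import numpy as np

def vietoris_rips(points, d, max_dim=2):
    """Simplices (as index tuples) of the Vietoris-Rips complex at scale d."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    n = len(points)
    simplices = [(i,) for i in range(n)]           # every point is a 0-simplex
    for k in range(2, max_dim + 2):                 # k vertices span a (k-1)-simplex
        for combo in itertools.combinations(range(n), k):
            # Include the simplex only if all pairwise distances are within d.
            if all(dist[i, j] <= d for i, j in itertools.combinations(combo, 2)):
                simplices.append(combo)
    return simplices

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
for d in (1.0, 1.5):
    cx = vietoris_rips(square, d)
    n_edges = sum(len(s) == 2 for s in cx)
    n_triangles = sum(len(s) == 3 for s in cx)
    print(f"d={d}: {n_edges} edges, {n_triangles} triangles")
```

At d = 1.0 only the four sides connect; at d = 1.5 the diagonals (length about 1.41) join as well, and all four triangles fill in.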
Morse-Smale Clustering
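• Partition the space between a function’s minima and maxima by gradient flow: points whose flow paths start at the same minimum and end at the same maximum form one region.
• Example:
• The truncated sine wave shown has 2 minima and 2 maxima (dots).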
• Pieces between local minima and maxima
define regions of the function.
1. Yellow
2. Blue
3. Red
• Higher-dimensional spaces can be
simplified by this partitioning.
• Can be used to cluster data.
• Subgroups can then be compared across characteristics using statistical tests (t-test, chi-square test…).
[Figure: Clusters 1, 2, and 3.]
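Morse-Smale clustering of a point cloud is often approximated on a neighborhood graph. The sketch below is one such approximation, not necessarily the algorithm any particular package uses: it assumes NumPy, a k-nearest-neighbor graph, and steepest ascent/descent along graph edges, and labels each point by the (minimum, maximum) pair its paths reach. The helpers `knn`, `follow`, and `morse_smale_labels` are hypothetical. On a truncated sine wave with 2 maxima and 2 minima it yields three clusters, in the spirit of the three colored regions.

```python
import numpy as np

def knn(X, k):
    """Indices of the k nearest neighbours of each row of X (excluding itself)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.argsort(dist, axis=1, kind="stable")[:, 1:k + 1]

def follow(i, step):
    """Follow pointers until reaching a point that points to itself."""
    while step[i] != i:
        i = step[i]
    return int(i)

def morse_smale_labels(X, f, k=2):
    """Label each point by the (minimum, maximum) that its steepest paths reach."""
    nbrs = knn(X, k)
    up = np.arange(len(f))     # steepest-ascent pointer (self = local maximum)
    down = np.arange(len(f))   # steepest-descent pointer (self = local minimum)
    for i, nb in enumerate(nbrs):
        j_up, j_down = nb[np.argmax(f[nb])], nb[np.argmin(f[nb])]
        if f[j_up] > f[i]:
            up[i] = j_up
        if f[j_down] < f[i]:
            down[i] = j_down
    return [(follow(i, down), follow(i, up)) for i in range(len(f))]

x = np.linspace(np.pi / 2, 7 * np.pi / 2, 301)   # maxima at both ends of each rise
labels = morse_smale_labels(x[:, None], np.sin(x))
print(sorted(set(labels)))   # [(100, 0), (100, 200), (300, 200)]
```

Each distinct (minimum, maximum) pair is one cluster; the resulting labels can then feed the group comparisons described above.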
Intuitive 2-Dimensional Example
• Imagine a soccer player kicking a ball on the ground of a hilly field.
• The high and low points determine where the ball will come to rest.
• These paths of the ball define which parts of the field share common hills and
valleys.
• These paths approximate gradient paths of the height function defined on the field’s topological space.
• The spaces they define are the Morse-Smale complex of the field, partitioning it
into different regions (clusters).
Algorithms that compute
Morse-Smale complexes
typically follow this intuition.
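The soccer-ball intuition corresponds to steepest descent on a height grid. This sketch assumes NumPy, an idealized ball with no momentum that always rolls to its lowest neighboring grid cell, and a hypothetical two-valley field f(x, y) = (x² - 1)² + y². Every starting cell ends in one of the two valleys, and the cells sharing a valley form one region; only the descending half of the Morse-Smale complex is computed here.

```python
import numpy as np

def descend_to_sink(Z, start):
    """Follow steepest descent over the 8 grid neighbours until no neighbour is lower."""
    i, j = start
    while True:
        best = (i, j)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if 0 <= a < Z.shape[0] and 0 <= b < Z.shape[1] and Z[a, b] < Z[best]:
                    best = (a, b)
        if best == (i, j):
            return best          # local minimum: the ball comes to rest here
        i, j = best

g = np.linspace(-2, 2, 41)
X, Y = np.meshgrid(g, g, indexing="ij")
Z = (X**2 - 1) ** 2 + Y**2       # hypothetical field with valleys at (-1, 0) and (1, 0)
sinks = {descend_to_sink(Z, (i, j)) for i in range(41) for j in range(41)}
print(sinks)
```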
Morse-Smale Regression
• A type of piecewise regression.
• Fit a regression model to each partition (cell) found by the Morse-Smale decomposition of a space under a given Morse function.
• Regression models include:
• Linear and generalized linear models
• Machine learning models
• Random forest
• Elastic net
• Boosted regression
• Neural/deep networks
• Can examine group-wise differences
in regression models.
[Figure: Example with 2 groups and 3 predictors.]
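Once each point carries a cell label, fitting a separate model per cell is straightforward. A minimal sketch, assuming NumPy, ordinary least squares, and hard cell labels supplied by an earlier Morse-Smale clustering step (here the labels are simply given); published Morse-Smale regression may differ in details, and any of the models listed above could replace the least-squares fit. `fit_per_cell` and `predict` are hypothetical helpers.

```python
import numpy as np

def fit_per_cell(X, y, labels):
    """Fit an ordinary least-squares model (with intercept) separately in each cell."""
    models = {}
    for cell in np.unique(labels):
        mask = labels == cell
        A = np.column_stack([np.ones(mask.sum()), X[mask]])   # intercept + predictors
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[cell] = coef
    return models

def predict(models, X, labels):
    """Predict each row with the model of the cell it belongs to."""
    out = np.empty(len(X))
    for cell, coef in models.items():
        mask = labels == cell
        out[mask] = coef[0] + X[mask] @ coef[1:]
    return out

# Toy data: two cells with different linear trends (labels assumed given).
x = np.linspace(0, 2, 21)[:, None]
labels = (x[:, 0] > 1).astype(int)
y = np.where(labels == 0, 2 * x[:, 0], 3 - x[:, 0])
models = fit_per_cell(x, y, labels)
print({int(c): np.round(m, 3).tolist() for c, m in models.items()})
```

Comparing the per-cell coefficients is one simple way to examine group-wise differences between regression models.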
Reeb Graphs
• Track evolution of level sets
through critical points of a
Morse function.
• Partition space according to a function (left: by height).
• Plot critical points as nodes where level-set components appear, split, or merge.
• Track each component until it disappears or is subsumed into another partition.
• Useful in image analytics and
shape comparison.
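Level-set tracking can be approximated on a point cloud by counting connected components inside each height slice. This sketch assumes NumPy, a circle sampled at 200 points, height (the y coordinate) as the function, and components defined by linking points within `eps`; it counts the nodes of a discretized Reeb graph rather than building the full graph, and its helper names are hypothetical. The pattern 1, 2, 2, 2, 1 reflects one component splitting into two and merging again, which is why the Reeb graph of a circle under height is a loop.

```python
import numpy as np

def components(P, eps):
    """Label connected components of the graph linking points within distance eps."""
    adj = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1) <= eps
    labels = -np.ones(len(P), dtype=int)
    n_comp = 0
    for s in range(len(P)):
        if labels[s] >= 0:
            continue
        labels[s] = n_comp
        stack = [s]
        while stack:                                  # depth-first search
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & (labels < 0)):
                labels[j] = n_comp
                stack.append(j)
        n_comp += 1
    return labels, n_comp

def level_set_component_counts(X, f, n_slices, eps):
    """Count connected components inside each equal-height slice of f."""
    lo, hi = f.min(), f.max()
    bands = np.minimum(((f - lo) / (hi - lo) * n_slices).astype(int), n_slices - 1)
    return [components(X[bands == b], eps)[1] for b in range(n_slices)]

theta = 2 * np.pi * np.arange(200) / 200
circle = np.column_stack([np.cos(theta), np.sin(theta)])
print(level_set_component_counts(circle, circle[:, 1], n_slices=5, eps=0.1))   # [1, 2, 2, 2, 1]
```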
Persistent Homology
• Filtration of simplicial complexes built from data.
• Iteratively changes the lens used to examine the data (neighborhood size…).
• Topological features (critical points) appear and disappear as the lens changes.
• Creates a nested sequence of complexes whose homology groups, linked by maps with underlying algebraic structure, form a homology sequence:
K₁ ⊆ K₂ ⊆ K₃ ⊆ K₄, inducing H(K₁) → H(K₂) → H(K₃) → H(K₄)
• Persistence is the length of a feature’s existence in the homology sequence (death minus birth).
• Many plots (left) exist to summarize this information, and special statistical tools can compare datasets/topological spaces.
• The filtration acts like an MRI scan of the data’s topological characteristics, tracking the evolution of critical points.
[Figure: Persistence summaries: a Birth vs. Death plot and a plot against time, both with axes from 0 to 10.]
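The simplest persistence computation, for 0-dimensional features (connected components) of a Vietoris-Rips filtration, needs only a union-find structure: every point is born at scale 0, and a component dies when the growing scale first joins it to another. A minimal sketch assuming NumPy; `h0_persistence` is a hypothetical helper. Higher-dimensional features such as loops require a boundary-matrix reduction, which dedicated libraries such as GUDHI or Ripser provide.

```python
import itertools
import numpy as np

def h0_persistence(points):
    """0-dimensional (birth, death) pairs of a Vietoris-Rips filtration."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:              # grow the scale edge by edge
        ri, rj = find(i), find(j)
        if ri != rj:                         # two components merge: one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))      # every point is born at scale 0
    pairs.append((0.0, float("inf")))        # the last component never dies
    return pairs

pts = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)
print(h0_persistence(pts))   # [(0.0, 1.0), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]
```

The two short-lived features reflect noise-scale merges within each pair, while the long-lived feature (dying at 10) signals two well-separated clusters.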
Mapper Algorithm
• Generalizes Reeb graphs: tracks connected components through an overlapping cover of a space (and its nerve) defined by a Morse-like filter function.
• Basic steps:
• Define a distance metric on the data
• Define a filter function (playing the role of the Morse function)
• Linear, density-based, curvature-based…
• Slice the multidimensional dataset into overlapping pieces with that function
• Examine function behavior across each slice (level set)
• Cluster the points in each slice into connected components
• Plot the clusters as nodes, linking clusters that share points across overlapping slices
[Figure: Example Mapper output, annotated with response gradations and outliers.]
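The Mapper steps can be sketched end to end on a small example. This is a minimal sketch, assuming NumPy, height as the filter function, equal-width overlapping intervals, and epsilon-neighborhood connectivity as the clustering step (practical implementations often use other clustering methods); the helper names and all parameter values are illustrative. On a sampled circle, the output graph has as many edges as nodes and every node has degree 2, i.e. Mapper recovers the loop.

```python
import itertools
import numpy as np

def components(P, eps):
    """Connected-component labels of the graph linking points within distance eps."""
    adj = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1) <= eps
    labels = -np.ones(len(P), dtype=int)
    n_comp = 0
    for s in range(len(P)):
        if labels[s] >= 0:
            continue
        labels[s] = n_comp
        stack = [s]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & (labels < 0)):
                labels[j] = n_comp
                stack.append(j)
        n_comp += 1
    return labels

def mapper(X, f, n_intervals=4, pad=0.1, eps=0.1):
    """Nodes (sets of point indices) and edges (pairs of node ids) of a Mapper graph."""
    lo, hi = f.min(), f.max()
    step = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Overlapping slice of the filter: neighbouring slices share a band of width 2*pad.
        idx = np.flatnonzero((f >= lo + k * step - pad) & (f <= lo + (k + 1) * step + pad))
        labels = components(X[idx], eps)              # cluster within the slice
        nodes += [set(idx[labels == c].tolist()) for c in np.unique(labels)]
    # Link clusters that share at least one point.
    edges = [(u, v) for u, v in itertools.combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

theta = 2 * np.pi * np.arange(200) / 200
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper(circle, circle[:, 1])           # filter: height (y coordinate)
print(len(nodes), "nodes,", len(edges), "edges")      # 6 nodes, 6 edges
```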
Multiscale Mapper Methods
• Mapper clusters change as scale parameters change (unstable solutions).
• Run filtrations at multiple resolution settings to create stability (see example figure).
• Creates a hierarchy of Reeb graphs (Mapper clusters) from each slice.
• Analyze across slices to gain deeper insight into underlying data structures.
[Figure: Mapper results at a 1st and 2nd scale, showing the scale change. Psychometric test example: verbal vs. math ability.]
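Parameter sensitivity is easy to see by rerunning a simple Mapper at several resolutions. This sketch, assuming NumPy, repeats a basic construction for 2, 3, and 4 intervals on a sampled circle; it is a simplified illustration of varying scale, not the multiscale Mapper construction itself, and all helper names and parameter values are hypothetical. With 2 intervals the loop collapses to a single edge, while 3 and 4 intervals recover it.

```python
import itertools
import numpy as np

def components(P, eps):
    """Connected-component labels of the graph linking points within distance eps."""
    adj = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1) <= eps
    labels = -np.ones(len(P), dtype=int)
    n_comp = 0
    for s in range(len(P)):
        if labels[s] >= 0:
            continue
        labels[s] = n_comp
        stack = [s]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & (labels < 0)):
                labels[j] = n_comp
                stack.append(j)
        n_comp += 1
    return labels

def mapper(X, f, n_intervals, pad=0.1, eps=0.1):
    """Nodes (sets of point indices) and edges of a simple Mapper graph."""
    lo, hi = f.min(), f.max()
    step = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        idx = np.flatnonzero((f >= lo + k * step - pad) & (f <= lo + (k + 1) * step + pad))
        labels = components(X[idx], eps)
        nodes += [set(idx[labels == c].tolist()) for c in np.unique(labels)]
    edges = [(u, v) for u, v in itertools.combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

theta = 2 * np.pi * np.arange(200) / 200
circle = np.column_stack([np.cos(theta), np.sin(theta)])
summary = {}
for n in (2, 3, 4):                                   # coarse to fine resolutions
    nodes, edges = mapper(circle, circle[:, 1], n_intervals=n)
    summary[n] = (len(nodes), len(edges))
print(summary)   # {2: (2, 1), 3: (4, 4), 4: (6, 6)}
```

Structure that survives across resolutions (here, the loop at 3 and 4 intervals) is more trustworthy than structure seen at a single setting.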
Conclusion
• Morse functions underlie several methods used in modern data analysis.
• Understanding the theory and application can facilitate use on new data
problems, as well as development of new tools based on these methods.
• Combined with statistics and machine learning, these methods can create powerful analytics pipelines yielding more insight than individual methods alone.
Good References
• Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2), 255-308.
• Gerber, S., Rübel, O., Bremer, P.-T., Pascucci, V., & Whitaker, R. T. (2013). Morse–Smale regression. Journal of Computational and Graphical Statistics, 22(1), 193-214.
• Edelsbrunner, H., & Harer, J. (2008). Persistent homology: a survey. Contemporary Mathematics, 453, 257-282.
• Forman, R. (2002). A user’s guide to discrete Morse theory. Séminaire Lotharingien de Combinatoire, 48, 35 pp.
• Carr, H., Garth, C., & Weinkauf, T. (Eds.). (2017). Topological Methods in Data Analysis and Visualization IV: Theory, Algorithms, and Applications. Springer.
• Di Fabio, B., & Landi, C. (2016). The edit distance for Reeb graphs of surfaces. Discrete & Computational Geometry, 55(2), 423-461.
