TOPOLOGICAL DATA ANALYSIS
COLLEEN M. FARRELLY, DATASEMBLY
WHY TOPOLOGICAL DATA ANALYSIS?
• Autocorrelations/dynamic systems (time series, spatiotemporal data)
• Wide data (-omics data)
• Small data (pilot studies, rare diseases…)
• Visualization-heavy needs for comparisons/groups (especially high-dimensional data)
• Data that breaks assumptions of machine learning algorithms/statistical models
EXAMPLES OF TDA TOOLS
Persistent homology
Mapper algorithm
Homotopy continuation
Morse functions/clustering/regression
Euler calculus
Discrete exterior calculus
Ricci curvature
Mappings to Teichmüller space
PERSISTENT HOMOLOGY
COMPARING GROUPS AND EXTENDING HIERARCHICAL CLUSTERING
POINT CLOUDS AND DISTANCE METRICS
SIMPLICIAL COMPLEXES
HOMOLOGY OVERVIEW: BETTI NUMBERS
(1,0,0…) (1,1,0…) (1,0,1…)
FILTRATIONS AND PERSISTENCE
• Filter distances or objects to obtain a series of topological objects (graphs, simplicial complexes…)
• Compute a series of metrics or summary statistics over filtrations
• Track how metrics/statistics
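As a rough illustration of the filtration idea, the sketch below thresholds pairwise distances at increasing scales, builds a graph at each scale, and records one summary statistic: the number of connected components (Betti number 0). The toy points, the threshold values, and the function names are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations
from math import dist


def count_components(points, threshold):
    """Number of connected components of the graph that joins
    every pair of points at distance <= threshold."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})


def betti0_curve(points, thresholds):
    """One summary statistic tracked across the whole filtration."""
    return [(t, count_components(points, t)) for t in thresholds]


# Hypothetical toy data: two tight pairs of points far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
curve = betti0_curve(points, thresholds=[0.5, 1.0, 5.0, 6.0])
print(curve)  # [(0.5, 4), (1.0, 2), (5.0, 1), (6.0, 1)]
```

The component count drops from 4 to 2 to 1 as the scale grows. The two-cluster structure survives over a long range of scales, which is the kind of persistent feature these methods look for.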
ALGORITHM DETAILS
Rips filtration
• Pairwise intersections of ɛ-balls centered at a given point in the point cloud or distance matrix
Dimension parameter
• Number of Betti numbers to compute (usually set to a dimension of 0 or 1)
Diagram parameters/distance computation parameters
• Optional visualization or statistical testing functions after using ripser()
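To make the Rips construction and the dimension parameter concrete, here is a minimal standard-library sketch that builds the Rips complex at one fixed scale and computes Betti numbers 0 and 1 from boundary-matrix ranks over GF(2). It does not call ripser() and does not track persistence across scales. The edge convention (distance <= scale), the function names, and the toy square are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def rips_complex(points, scale):
    """Vertices, edges and triangles of the Rips complex at one scale.

    Assumed convention: an edge joins two points whose distance is <= scale,
    i.e. balls of radius scale / 2 around the two points intersect.
    """
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= scale]
    edge_set = set(edges)
    # A triangle is included as soon as all three of its edges are present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_set and (i, k) in edge_set and (j, k) in edge_set]
    return list(range(n)), edges, triangles


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def betti_numbers(points, scale):
    """Betti numbers (b0, b1) of the Rips complex at the given scale."""
    vertices, edges, triangles = rips_complex(points, scale)
    edge_index = {e: n for n, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints (one bit per vertex).
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges (one bit per edge).
    d2 = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)]) | (1 << edge_index[(j, k)])
          for i, j, k in triangles]
    rank_d1, rank_d2 = rank_gf2(d1), rank_gf2(d2)
    return len(vertices) - rank_d1, len(edges) - rank_d1 - rank_d2


# Hypothetical toy data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(betti_numbers(square, 1.0))  # (1, 1): one component, one loop
print(betti_numbers(square, 1.5))  # (1, 0): the diagonals fill the loop in
```

The loop exists at scale 1.0 and is gone by scale 1.5. A persistence computation records exactly this kind of birth and death across the full range of scales.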
IMPLEMENTATION IN PYTHON OR R
R packages
• TDAstats
• TDAverse
Python packages
• Scikit-TDA
• Ripser/persim
• Giotto-TDA
EXAMPLE ANALYSIS: PROBLEM/DATA
Small set of BERT-embedded poems that are either humorous or serious in tone
Want to understand if there are significant differences in BERT features between the two sets of poems
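One possible way to frame such a group comparison is a permutation test on a topological summary. This is a sketch under stated assumptions, not the analysis from the talk. It uses the fact that dimension-0 death times equal the merge heights of single-linkage hierarchical clustering (the edge lengths of a minimum spanning tree). The test statistic, the function names, and the toy two-dimensional "embeddings" are hypothetical stand-ins, and no real BERT vectors are used.

```python
import random
from itertools import combinations
from math import dist


def h0_death_times(points):
    """Scales at which connected components merge (H0 death times).

    These are the edge lengths of a minimum spanning tree, found here with
    Kruskal's algorithm. Needs at least two points to return anything.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda p: dist(points[p[0]], points[p[1]]))
    deaths = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            deaths.append(dist(points[i], points[j]))
    return deaths


def mean_death_gap(group_a, group_b):
    """Hypothetical test statistic: gap between mean H0 death times."""
    a, b = h0_death_times(group_a), h0_death_times(group_b)
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_p_value(group_a, group_b, n_permutations=200, seed=0):
    """Share of random relabelings whose gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = mean_death_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:len(group_a)], pooled[len(group_a):]
        if mean_death_gap(shuffled_a, shuffled_b) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical stand-ins for two small sets of embedded poems.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
spread = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
p_value = permutation_p_value(tight, spread)
print(p_value)
```

With groups this small the p-value is coarse, so it is best read as a rough signal. The same scheme works with other diagram summaries or diagram distances in place of the statistic used here.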
MAPPER
CLUSTERING AND DATA MINING
MORSE FUNCTIONS: HEIGHT FUNCTIONS AND CRITICAL POINTS
NERVES: OPEN COVERINGS
ALGORITHM DETAILS
Project Data
• Takes input data and projects to custom embeddings (3-dimensional space, knn distances…)
Create Cover
• Percent of overlap across covers and number of covers (different results with different parameters)
Cluster
• DBSCAN or other clusterers available in scikit-learn
Save Model
• Save output and details to a webpage (path_html)
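The project, cover, and cluster steps can be sketched in a few lines of standard-library Python. This is a simplified stand-in, not Kepler-Mapper. It assumes a one-dimensional projection (lens), a cover of equal-length overlapping intervals, and a plain distance-threshold clusterer in place of DBSCAN. The save-to-webpage step is omitted. All function names, parameter values, and the toy data are illustrative assumptions.

```python
from itertools import combinations
from math import dist


def interval_cover(values, n_intervals, overlap):
    """Equal-length intervals covering the lens range; `overlap` is the
    fraction of each interval shared with its neighbor."""
    lo, hi = min(values), max(values)
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + k * step, lo + k * step + length) for k in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against rounding at the top end
    return cover


def threshold_clusters(indices, points, eps):
    """Stand-in clusterer: chains together points within distance eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        frontier = [unvisited.pop()]
        cluster = set(frontier)
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.5, eps=1.5):
    """Nodes are clusters found inside each cover element; an edge joins
    two nodes whose clusters share at least one point (the nerve)."""
    values = [lens(p) for p in points]                         # project data
    nodes = []
    for a, b in interval_cover(values, n_intervals, overlap):  # create cover
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(threshold_clusters(members, points, eps))  # cluster
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# Hypothetical toy data: ten points along a line, with the x coordinate as lens.
points = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper_graph(points, lens=lambda p: p[0])
print([sorted(n) for n in nodes])  # [[0, 1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8, 9]]
print(edges)                       # [(0, 1), (1, 2)]
```

The output graph is a chain of three nodes, mirroring the line the points were sampled from. Changing `n_intervals` or `overlap` changes the graph, which is why the cover parameters need tuning.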
IMPLEMENTATION IN PYTHON OR R
R packages
• TDAmapper
Python packages
• Kepler-Mapper (part of Scikit-TDA)
• Giotto-TDA
• tmap
EXAMPLE ANALYSIS: PROBLEM/DATA
Small set of BERT-embedded poems that are either humorous or serious in tone
Want to cluster poems to understand the existence of subgroups
RICCI CURVATURE
FINDING KEY PIECES OF A SOCIAL NETWORK
RICCI CURVATURE
Negative
Zero
Positive
POWER/DISEASE NETWORK BACKBONES
ALGORITHM DETAILS
Calculate Curvature on Edges
• Examine vertices and their adjacent edges to see how much “pull” there is on an edge
Calculate Curvature on Vertices
• Sum up edge weights around a vertex to find out how much “stuff” is weighing it down
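The slides do not say which discretization of Ricci curvature is used. As one possibility, the sketch below assumes the simplest combinatorial form of Forman-Ricci curvature on an unweighted graph, where an edge (u, v) has curvature 4 - deg(u) - deg(v), and takes a vertex's curvature as the sum over its incident edges. The adjacency-dictionary representation, the function names, and the toy network are illustrative assumptions.

```python
def edge_curvature(adjacency, u, v):
    """Forman-Ricci curvature of edge (u, v) in an unweighted graph,
    in its simplest combinatorial form: 4 - deg(u) - deg(v)."""
    return 4 - len(adjacency[u]) - len(adjacency[v])


def vertex_curvature(adjacency, u):
    """Sum of the curvatures of the edges incident to vertex u."""
    return sum(edge_curvature(adjacency, u, v) for v in adjacency[u])


# Hypothetical toy network: two triangles joined by a single bridge c-d.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

# Each undirected edge listed once, as a sorted pair of endpoints.
edges = sorted({tuple(sorted((u, v))) for u in adjacency for v in adjacency[u]})
edge_scores = {e: edge_curvature(adjacency, *e) for e in edges}
most_negative = min(edge_scores, key=edge_scores.get)

print(edge_scores)
print(most_negative)                     # ('c', 'd')
print(vertex_curvature(adjacency, "c"))  # -4
```

The bridge between the two triangles gets the most negative curvature. Edges like this hold separate parts of a network together, so they are natural candidates when looking for a backbone or a point of vulnerability.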
IMPLEMENTATION IN PYTHON OR R
R packages
• Custom in igraph
Python packages
• Custom in igraph
• Custom in networkx
EXAMPLE ANALYSIS: PROBLEM/DATA
Town network representing a supply chain (medical, food, electricity…)
Want to understand vulnerabilities that exist within the network
