Topological Time-Series Data Analysis with
Delay-Variant Embedding
The University of Tokyo
CCS2019@NTU
zoro@biom.t.u-tokyo.ac.jp
Graduate School of Information Science and Technology
Tran Quoc Hoan (Ph.D. Candidate)
Motivation
Diagram of the hierarchical organization of biology at different scales and examples of data that can be collected at these different levels.
(Image source https://researcher.watson.ibm.com/researcher/view_group.php?id=5372)
E.g., variant scales of biology
◼ Open the black box (in contrast to deep-learning machines) to understand the nature of complex data
◼ Reveal the variant scales in complex data
2019/9/30 Topological Time-Series Analysis 2/16
Variant Topological Features
◼ Our research focuses on the shape of data to provide insights into dynamics and variant scales, via new and general features that are robust under perturbations applied to the data.
◼ The “shape” of data
→ Appearance of holes in the high-dimensional space
◼ Dynamics & variant scales
→ Oscillations, variant timescales
◼ Perturbation
→ Noise added to the data
We focus on time-series data in this talk
Persistent Homology
◼ An algebraic method to encode the topological structures of data (i.e., holes) into quantitative features
Data: finite sets of points, networks, etc.
We need to:
➢ Define the “hole” mathematically
➢ Compute the “hole” quantitatively
What is a hole?
◼ 0-dimensional holes: connected components
◼ 1-dimensional holes: rings, loops, tunnels
1-dimensional graphs (in K) without boundary that are not the boundary of any 2-dimensional graph in K
◼ 2-dimensional holes: cavities, voids
2-dimensional graphs (in K) without boundary that are not the boundary of any 3-dimensional graph in K
[Figure: examples in the complex K. A filled (solid) region is not a hole; an empty region is a hole.]
Represent the “hole”
◼ Idea: connect nearby points, fill in complete geometrical shapes
1. Choose a distance ε.
2. Connect pairs of points that are no further apart than ε.
3. Fill in complete geometrical shapes (triangle, tetrahedron, etc.).
4. Homology detects the hole.
Problem: How do we choose the distance ε?
This figure is modeled on the slides “Introduction to Persistent Homology” by Dr. Matthew L. Wright.
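Steps 1 to 4 can be sketched in Python for one fixed distance. This is a minimal illustration, not the speaker's implementation: the function name `rips_betti`, the eight-point circle, and the choice to compute matrix ranks over the reals with NumPy are assumptions made here, and only holes of dimension 0 and 1 are counted.

```python
from itertools import combinations
import numpy as np

def rips_betti(points, eps):
    """Return (beta0, beta1): numbers of 0- and 1-dimensional holes at scale eps."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Step 2: connect pairs of points that are no further apart than eps.
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i, j] <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Step 3: fill in a triangle whenever all three of its edges are present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(e in edge_index for e in combinations(t, 2))]
    # Step 4: homology from the ranks of the oriented boundary matrices.
    d1 = np.zeros((n, len(edges)))
    for k, (i, j) in enumerate(edges):
        d1[i, k], d1[j, k] = -1.0, 1.0
    d2 = np.zeros((len(edges), len(triangles)))
    for k, (a, b, c) in enumerate(triangles):
        d2[edge_index[(b, c)], k] = 1.0
        d2[edge_index[(a, c)], k] = -1.0
        d2[edge_index[(a, b)], k] = 1.0
    rank1 = int(np.linalg.matrix_rank(d1)) if edges else 0
    rank2 = int(np.linalg.matrix_rank(d2)) if triangles else 0
    return n - rank1, len(edges) - rank1 - rank2

# Eight points evenly spaced on the unit circle (hypothetical test data).
angles = 2 * np.pi * np.arange(8) / 8
circle = np.column_stack([np.cos(angles), np.sin(angles)])
for eps in (0.5, 0.8, 2.1):
    print(eps, rips_betti(circle, eps))
```

On this example the three distances give (8, 0), (1, 1), and (1, 0): too small an ε leaves isolated points, a moderate ε shows the loop, and a large ε fills it in, which is why the choice of ε matters.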
šœ€
How to choose distance šœ€?
This šœ€
looks
good.
Innovation Idea
Consider all
distances šœ€.
How to
distinguish
this hole with
other ?
šœ€: 0 1 2 3
Barcodes: Monitor the change of topological structures
Topological features = Persistence Diagram
A persistence diagram is a two-dimensional representation of a barcode
◼ A multiset of points with coordinates (b, d), where b is the birth scale and d is the death scale of a hole.
[Figure axes: birth scale, death scale]
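For 0-dimensional holes (connected components), the (b, d) pairs can be computed with a short union-find pass over all pairwise distances. The sketch below is an illustration under that restriction; the name `h0_persistence` and the four-point example are hypothetical, and higher-dimensional holes would need a dedicated persistent-homology library.

```python
from itertools import combinations
import numpy as np

def h0_persistence(points):
    """(birth, death) pairs of connected components as the scale grows from 0."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep over all scales at once by visiting edges from shortest to longest.
    edges = sorted((float(np.linalg.norm(pts[i] - pts[j])), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them dies at this scale.
            parent[ri] = rj
            pairs.append((0.0, length))
    # The last surviving component never dies.
    pairs.append((0.0, float("inf")))
    return pairs

diagram = h0_persistence([[0.0], [1.0], [3.0], [7.0]])
print(diagram)  # [(0.0, 1.0), (0.0, 2.0), (0.0, 4.0), (0.0, inf)]
```

Every component is born at scale 0, so the diagram points lie on the vertical line b = 0; points far from the diagonal (large d) correspond to well-separated clusters.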
Time-series reconstruction
◼ Situation: dynamics and variant scales of time-series data
◼ Oscillations, chaos
◼ Variant timescales
š‘‹(š‘š)
šœ
š‘” = š‘„ š‘” , š‘„ š‘” āˆ’ šœ , … , š‘„ š‘” āˆ’ š‘š āˆ’ 1 šœ
š‘‹(š‘š)
šœ
1
š‘„ š‘”
š‘„ š‘” āˆ’ 2šœ š‘„ š‘” āˆ’ šœ
š‘‹(š‘š)
šœ
2
š‘‹(š‘š)
šœ
3
šœ
šœ
šœ šœ
šœ
šœ
š‘„ š‘”
š‘‹(š‘š)
šœ
1
š‘‹(š‘š)
šœ
2
š‘‹ š‘š
šœ
(3)
Delay embedding (Takens' theorem)
Transform a time series into a finite set of points in a high-dimensional space
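The delay embedding maps each time t to the vector (x(t), x(t − τ), …, x(t − (m − 1)τ)). A minimal sketch, assuming a uniformly sampled series and an integer delay measured in samples (the function name `delay_embed` is hypothetical):

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return an array whose rows are (x(t), x(t - tau), ..., x(t - (m - 1) * tau))."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau  # first t for which every delayed sample exists
    if m < 1 or tau < 1 or start >= len(x):
        raise ValueError("need m >= 1, tau >= 1 and (m - 1) * tau < len(x)")
    # Column k holds x(t - k * tau) for t = start, ..., len(x) - 1.
    return np.stack([x[start - k * tau: len(x) - k * tau] for k in range(m)], axis=1)

points = delay_embed(np.arange(10), m=3, tau=2)
print(points.shape)  # (6, 3)
print(points[0])     # [4. 2. 0.]
print(points[-1])    # [9. 7. 5.]
```

A series of length N yields N − (m − 1)τ embedded points, so large delays or dimensions shorten the point cloud.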
Patterns from delay-embedding
Attractors of dynamical systems: fixed point, limit cycle, limit torus, strange attractor
(Topological) patterns from delay embedding represent the behavior of attractors, which provides insight into the dynamical system
Problem in delay-embedding
Determining the time delay is sensitive and problem-dependent
◼ Well-known methods: mutual information, autocorrelation, etc.
◼ A real time series is noisy and has a finite length
It is not well defined to evaluate the shape of the embedded points in the embedding space
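As an illustration of the autocorrelation approach mentioned above, one common heuristic takes the first lag at which the sample autocorrelation falls below 1/e. The sketch below assumes that heuristic; the function name, the threshold, and the sine-wave input are choices made here, not taken from the talk.

```python
import numpy as np

def delay_from_autocorrelation(x, threshold=1.0 / np.e):
    """First lag where the normalized autocorrelation drops below `threshold`."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Entry n - 1 of the full correlation is lag 0; keep the non-negative lags.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    if acf[0] == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    acf = acf / acf[0]
    below = np.nonzero(acf < threshold)[0]
    return int(below[0]) if below.size else None

t = np.arange(1000)
tau = delay_from_autocorrelation(np.sin(2 * np.pi * t / 100))
print(tau)
```

For a clean sine of period 100 the result is roughly a fifth of the period. With noise or a short record the estimate shifts, which is the sensitivity this slide points out.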
(Proposed) Delay-variant embedding
◼ Treat the time delay τ as a variable parameter
◼ Monitor the variation of topological structures in the embedded space
◼ Construct the topological features for each τ, then integrate these features with τ serving as an additional dimension
→ Three-dimensional persistence diagram (3PD)
[Figure: the time series x(t) and its embedding on the axes x(t), x(t − τ), x(t − 2τ)]
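The construction can be sketched as a loop over τ that stacks (birth, death, τ) triples. To stay free of external dependencies, this illustration uses only 0-dimensional holes computed by union-find; the talk's features cover holes more generally, and all function names here are hypothetical.

```python
from itertools import combinations
import numpy as np

def delay_embed(x, m, tau):
    """Rows are (x(t), x(t - tau), ..., x(t - (m - 1) * tau))."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    return np.stack([x[start - k * tau: len(x) - k * tau] for k in range(m)], axis=1)

def h0_pairs(points):
    """Finite (birth, death) pairs of connected components of a point cloud."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((float(np.linalg.norm(points[i] - points[j])), i, j)
                   for i, j in combinations(range(n), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    return pairs

def three_dim_diagram(x, m, taus):
    """Stack the diagrams of every delay, with tau as the third coordinate."""
    rows = [(b, d, float(tau))
            for tau in taus
            for b, d in h0_pairs(delay_embed(x, m, tau))]
    return np.array(rows)

t = np.arange(60)
diagram3 = three_dim_diagram(np.sin(2 * np.pi * t / 20), m=3, taus=[1, 2, 3])
print(diagram3.shape)  # (165, 3)
```

Each row is one point of the three-dimensional diagram, so no single "best" delay has to be chosen in advance.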
Stability theorem
š‘‘šµ,šœ‰
(3)
š·(š‘™,š‘š)
(3)
š‘„ , š·(š‘™,š‘š)
(3)
š‘¦ ≤ 2 š‘š max
š‘”āˆˆš•‹
š‘„ š‘” āˆ’ š‘¦(š‘”)
Theorem (stability)
◼ If x(t) is perturbed by noise to y(t) = x(t) + ϵ(t), then the distance between the diagrams is bounded from above by the magnitude of ϵ(t)
◼ The 3PDs are robust with respect to noise perturbing the time-series data
◼ The 3PDs can be used as discriminating features for characterizing the time series
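One elementary ingredient behind bounds of this kind can be checked numerically: with the Euclidean norm in an m-dimensional embedding space, no embedded point moves by more than √m times the largest perturbation of the series. The sketch below checks only this pointwise inequality, not the theorem itself; the function names and the uniform-noise example are assumptions.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Rows are (x(t), x(t - tau), ..., x(t - (m - 1) * tau))."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau
    return np.stack([x[start - k * tau: len(x) - k * tau] for k in range(m)], axis=1)

def max_embedded_shift(x, y, m, tau):
    """Largest Euclidean distance between matching embedded points of x and y."""
    diff = delay_embed(x, m, tau) - delay_embed(y, m, tau)
    return float(np.linalg.norm(diff, axis=1).max())

rng = np.random.default_rng(0)
t = np.arange(300)
x = np.sin(2 * np.pi * t / 50)
y = x + rng.uniform(-0.1, 0.1, size=t.size)  # perturbed series y(t) = x(t) + noise

m = 3
sup_noise = float(np.max(np.abs(x - y)))
for tau in (1, 5, 10):
    shift = max_embedded_shift(x, y, m, tau)
    print(tau, shift <= np.sqrt(m) * sup_noise)  # True for every tau
```

Because the displacement of the point cloud is controlled for every τ at once, features built from it across delays inherit a bound that depends only on the noise magnitude.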
Kernel method
The space of diagrams Ω (diagrams E, F)
◼ Not a vector space
◼ Cannot define an inner product
◼ Difficult to use in (linear) statistical-learning tasks (e.g., classification)
Feature mapping Φ: sends the diagrams E, F in Ω to Φ_E, Φ_F in the Hilbert space H_b
Feature-mapped space H_b (Hilbert space)
◼ Can define an inner product
◼ Use in (linear) statistical-learning tasks (e.g., SVM)
◼ Use in unsupervised learning tasks (e.g., kernel PCA, kernel change point detection)
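To make the idea concrete, the sketch below uses a simple stand-in kernel: the sum of Gaussian similarities over all pairs of diagram points, which is the inner product of the diagrams' Gaussian feature maps. It is not necessarily the kernel used in the talk; `diagram_kernel`, `gram_matrix`, the bandwidth `sigma`, and the toy diagrams are all assumptions.

```python
import numpy as np

def diagram_kernel(E, F, sigma=1.0):
    """Sum of Gaussian similarities between every point of E and every point of F."""
    E = np.asarray(E, dtype=float)
    F = np.asarray(F, dtype=float)
    sq_dist = ((E[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-sq_dist / (2.0 * sigma ** 2)).sum())

def gram_matrix(diagrams, sigma=1.0):
    """Symmetric matrix of kernel values between all pairs of diagrams."""
    n = len(diagrams)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = diagram_kernel(diagrams[i], diagrams[j], sigma)
    return K

# Toy diagrams with rows (birth, death, tau); sizes may differ between diagrams.
toy = [
    np.array([[0.0, 1.0, 1.0], [0.0, 2.0, 2.0]]),
    np.array([[0.0, 1.1, 1.0]]),
    np.array([[0.5, 3.0, 1.0], [0.2, 0.4, 2.0], [0.0, 2.5, 3.0]]),
]
K = gram_matrix(toy)
print(np.round(K, 3))
```

A Gram matrix like `K` can then be handed to kernel methods that accept precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.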
Scenarios for Applications
◼ Identify the dynamics of a biological model from observed noisy biological time series
➢ E.g., stochastic oscillations in single-cell live-imaging time series
◼ Classify real time-series data
➢ E.g., ECG data, sensor data
(Image source https://slideplayer.com/slide/7612898/ )
Stochastic model of the Hes1 genetic oscillator (N.A. Monk, Curr. Biol. 13, 2003)
Topological time-series analysis with delay-variant embedding, Physical Review E 100, 032308, 2019
Thanks for listening!
