This document discusses using persistent homology to analyze the topological structure of proteins and relate it to protein compressibility. It summarizes that researchers modeled protein molecules as alpha filtrations to obtain multi-scale insight into their tunnel and cavity structures. The persistence diagrams of the alpha filtrations capture the sizes and robustness of these features in a compact way. The researchers found a clear linear correlation between their topological measure and experimentally determined protein compressibility values.
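The 0-dimensional part of such a persistence diagram can be illustrated with nothing more than a union-find sweep over pairwise distances. The sketch below is not the paper's alpha-filtration pipeline (libraries such as GUDHI provide alpha complexes); it is a minimal numpy-only illustration, under simplified assumptions, of how birth/death scales of connected components arise from a distance filtration:

```python
import numpy as np

def zeroth_persistence(points):
    """0-dimensional persistence of a distance filtration:
    every component is born at scale 0 and dies when it merges
    with another (the deaths are the MST edge lengths)."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # sweep all pairwise edges in order of length (Kruskal-style)
    edges = sorted((d[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)  # one component dies at this scale
    return deaths  # n-1 finite deaths; one component persists forever

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
print(zeroth_persistence(pts))  # two short-lived merges, then one at ~4.9
```

Long-lived points in the diagram (here the 4.9 merge) correspond to robust features; short-lived ones are noise at small scales.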
WiDS Alexandria, Egypt workshop on topological data analysis (Python and R code available on request), covering persistent homology, the Mapper algorithm, and discrete Ricci curvature. Examples include text data and social network data.
Topological Data Analysis: visual presentation of multidimensional data sets (DataRefiner)
Topological data analysis (TDA) is an unsupervised approach that may revolutionise the way data are mined and eventually drive a new generation of analytical tools. The idea behind TDA is to "measure" the shape of data and find a compressed combinatorial representation of that shape. As in ordinary topology, the combinatorial representation serves as a compressed description of a high-dimensional data set that retains information about the geometric relationships between data points. TDA can also be used as a very powerful clustering technique. Edward will present a comparison between TDA and other dimension-reduction algorithms such as PCA, LLE, Isomap, MDS, and Spectral Embedding.
UMAP is a dimensionality-reduction technique proposed two years ago that quickly gained widespread adoption.
In this presentation I will try to demystify UMAP by comparing it to t-SNE. I also sketch its theoretical background in topology and fuzzy sets.
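A core ingredient of UMAP's fuzzy-set view is the conversion of neighbor distances into fuzzy membership strengths. A minimal numpy sketch of that step, simplified from the actual umap-learn implementation (the function name and binary-search bounds are my own):

```python
import numpy as np

def fuzzy_memberships(dists, n_neighbors):
    """Simplified version of UMAP's local fuzzy set construction:
    distances to neighbors become membership strengths
    exp(-(d - rho) / sigma), where rho is the distance to the nearest
    neighbor and sigma is tuned so memberships sum to log2(k)."""
    rho = dists.min()
    target = np.log2(n_neighbors)
    lo, hi = 1e-6, 1e3
    for _ in range(64):              # binary search for sigma
        sigma = 0.5 * (lo + hi)
        total = np.exp(-(dists - rho) / sigma).sum()
        if total > target:
            hi = sigma               # too much total membership: shrink sigma
        else:
            lo = sigma
    return np.exp(-(dists - rho) / sigma)

d = np.array([0.5, 0.7, 1.2, 3.0])  # distances to the 4 nearest neighbors
m = fuzzy_memberships(d, n_neighbors=4)
print(m)  # nearest neighbor always gets membership 1.0
```

UMAP then symmetrizes these local memberships into one fuzzy graph and lays it out in low dimension; t-SNE's conditional probabilities play a loosely analogous role.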
Unit 1: Topological spaces (definition of a topology and of open sets) (nasserfuzt)
Learning Objectives:
1. To understand the definition of topology with examples
2. To know the intersection and union of topologies
3. To understand the comparison of topologies
Tensor Train (TT) decomposition [3] is a generalization of SVD decomposition from matrices to tensors (=multidimensional arrays).
It represents a tensor compactly in terms of factors and allows one to work with the tensor via its factors without materializing the tensor itself.
For example, we can find the elementwise product of two TT-tensors of size 2^100 and get the result in the TT-format as well.
In the talk, we will show how Tensor Train decomposition can be used to represent parameters of neural networks [1] and polynomial models [2].
This parametrization allows exponentially many 'virtual' parameters while working only with small factors of the TT-format.
To train the model, i.e. optimize the objective subject to the constraint that the parameters are in the TT-format, [2] uses stochastic Riemannian optimization.
[1] Novikov, A., Podoprikhin, D., Osokin, A., & Vetrov, D. P. (2015). Tensorizing neural networks. In Advances in Neural Information Processing Systems.
[2] Novikov, A., Trofimov, M., & Oseledets, I. (2016). Tensor Train polynomial models via Riemannian optimization. arXiv:1605.03795.
[3] Oseledets, I. (2011). Tensor-train decomposition. SIAM Journal on Scientific Computing.
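The basic TT-SVD procedure of [3] can be sketched in a few lines of numpy. This is illustrative only: ranks are truncated to a single max_rank rather than to an error tolerance, and the function names are my own.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """TT-SVD sketch: sequentially reshape and truncate SVDs
    to obtain the TT cores of a dense tensor."""
    shape = tensor.shape
    cores, r = [], 1
    mat = tensor.reshape(r * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(s))
        cores.append(u[:, :rank].reshape(r, shape[k], rank))
        r = rank
        mat = (s[:rank, None] * vt[:rank]).reshape(r * shape[k + 1], -1)
    cores.append(mat.reshape(r, shape[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract the cores back into a dense tensor (for checking)."""
    out = cores[0]
    for c in cores[1:]:
        out = np.tensordot(out, c, axes=([-1], [0]))
    return out[0, ..., 0]

np.random.seed(0)
t = np.random.rand(3, 4, 5, 6)
cores = tt_svd(t, max_rank=30)  # rank large enough for exact recovery
print(np.allclose(tt_to_full(cores), t))  # True
```

With small ranks the same routine gives a compressed approximation; elementwise products and other operations can then act on the cores without ever forming the full tensor.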
High Dimensional Data Visualization using t-SNE (Kai-Wen Zhao)
A review of the t-SNE algorithm, which helps visualize high-dimensional data lying on a manifold by projecting it onto a 2D or 3D space while preserving the metric structure.
Rough sets and fuzzy rough sets in Decision Making (DrATAMILARASIMCA)
Rough sets, Fuzzy rough sets, lower approximation, upper approximation, positive region and reduct, Equivalence relation, dependency coefficient, Information system for road accident system
Discrete mathematics counting and logic relation (Self-Employed)
In mathematics, logic is the fundamental key to dealing with problems. If a problem is approached using sound logic and the information provided in its statement, the required result can be obtained effectively and efficiently, without loss of time.
017_20160826 Thermodynamics Of Stochastic Turing Machines (Ha Phuong)
We show how to construct stochastic models which mimic the behavior of a general-purpose computer (a Turing machine): discrete-state systems obeying a Markovian master equation, which are logically reversible and have a well-defined and consistent thermodynamic interpretation.
018 20160902 Machine Learning Framework for Analysis of Transport through Com... (Ha Phuong)
• Proposes a data-driven framework to study the relationship between fluid flow at the macro scale and the internal pore structure, across the micro and meso scales, in porous, granular media.
• Quantifies a hypothesized link between high permeability and efficient shortest paths that thread through relatively large pore bodies connected to each other by high-conductance pore throats, embodying connectivity and pore structure.
The variational Gaussian process (VGP) is a Bayesian nonparametric model which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity.
Using Topological Data Analysis on your Big Data (AnalyticsWeek)
Synopsis:
Topological Data Analysis (TDA) is a framework for data analysis and machine learning and represents a breakthrough in how to effectively use geometric and topological information to solve 'Big Data' problems. TDA provides meaningful summaries (in a technical sense to be described) and insights into complex data problems. In this talk, Anthony will begin with an overview of TDA and describe the core algorithm that is utilized. This talk will include both the theory and real world problems that have been solved using TDA. After this talk, attendees will understand how the underlying TDA algorithm works and how it improves on existing “classical” data analysis techniques as well as how it provides a framework for many machine learning algorithms and tasks.
Speaker:
Anthony Bak, Senior Data Scientist, Ayasdi
Prior to coming to Ayasdi, Anthony was at Stanford University, where he did a postdoc with Ayasdi co-founder Gunnar Carlsson, working on new methods and applications of Topological Data Analysis. He completed his Ph.D. in algebraic geometry with applications to string theory at the University of Pennsylvania and, along the way, worked at the Max Planck Institute in Germany, Mount Holyoke College in Massachusetts, and the American Institute of Mathematics in California.
Existence results for fractional q-differential equations with integral and m... (IJRTEMJOURNAL)
This paper concerns a new kind of fractional q-differential equation of arbitrary order, combining a multi-point boundary condition with an integral boundary condition. By solving the equation which is equivalent to the problem under investigation, the Green's functions are obtained. By defining a continuous operator on a Banach space and taking advantage of cone theory and some fixed-point theorems, the existence of multiple positive solutions for the BVPs is proved, based on some properties of the Green's functions and under the circumstance that the continuous functions f satisfy certain hypotheses. Finally, examples are provided to illustrate the results.
Relative superior Mandelbrot and Julia sets for integer and non-integer values (eSAT Journals)
Abstract
The fractals generated from the self-squared function z → z² + c, where z and c are complex quantities, have been studied extensively in the literature. This paper studies the transformation of the function z → zⁿ + c and analyzes the z-plane and c-plane fractal images generated from the iteration of these functions using Ishikawa iteration for integer and non-integer values of n. We also explore the drastic changes that occur in the visual characteristics of the images as n moves from integer to non-integer values.
Keywords: Complex dynamics, Relative Superior Julia set, Relative Superior Mandelbrot set.
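For readers unfamiliar with the two-step Ishikawa iteration used above, here is a minimal escape-time sketch for z → zⁿ + c. The step sizes s, s′ and the escape bound are arbitrary illustrative choices of mine, not those of the paper:

```python
import numpy as np

def ishikawa_escape(c, n=2, s=0.9, sp=0.9, max_iter=100, bound=4.0):
    """Escape time of z -> z**n + c under Ishikawa iteration:
    y_k = (1-sp) z_k + sp f(z_k),  z_{k+1} = (1-s) z_k + s f(y_k).
    Setting s = sp = 1 recovers the ordinary (Picard) iteration."""
    z = 0j
    f = lambda w: w ** n + c
    for k in range(max_iter):
        y = (1 - sp) * z + sp * f(z)
        z = (1 - s) * z + s * f(y)
        if abs(z) > bound:
            return k        # escaped at step k
    return max_iter         # treated as inside the set

print(ishikawa_escape(0.0 + 0j))  # stays bounded
print(ishikawa_escape(2.0 + 0j))  # escapes almost immediately
```

Coloring each pixel of the c plane by this escape time produces the relative superior Mandelbrot images; fixing c and sweeping the starting z gives the Julia-set pictures.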
Talk presented at the workshop "Imaging With Uncertainty Quantification (IUQ)", September 2022:
https://people.compute.dtu.dk/pcha/CUQI/IUQworkshop.html
We consider a weakly supervised classification problem. It
is a classification problem where the target variable can be unknown
or uncertain for some subset of samples. This problem appears when
the labeling is impossible, time-consuming, or expensive. Noisy measurements
and lack of data may prevent accurate labeling. Our task
is to build an optimal classification function. For this, we construct and
minimize a specific objective function, which includes the fitting error on
labeled data and a smoothness term. Next, we use covariance and radial
basis functions to define the degree of similarity between points. The further
process involves the repeated solution of an extensive linear system
with the graph Laplacian operator. To speed up this solution process,
we introduce low-rank approximation techniques. We call the resulting
algorithm WSC-LR. Then we use the WSC-LR algorithm for the analysis of
CT brain scans to recognize ischemic stroke disease. We also compare
WSC-LR with other well-known machine learning algorithms.
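The objective described above (a fitting term on labeled samples plus a graph-Laplacian smoothness term) leads to a linear system. Below is a dense numpy sketch of that system, without the paper's low-rank speedup; the function and parameter names are hypothetical:

```python
import numpy as np

def laplacian_classify(X, y, labeled, gamma=1.0, sigma=1.0):
    """Minimize sum over labeled i of (f_i - y_i)^2 + gamma * f^T L f,
    where L is the graph Laplacian of an RBF similarity graph.
    The minimizer solves (M + gamma L) f = M y, with M masking labels."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))      # RBF similarity between points
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W               # graph Laplacian
    M = np.diag(labeled.astype(float))      # fit only where labels exist
    f = np.linalg.solve(M + gamma * L, M @ y)
    return np.sign(f)

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([1.0, 0.0, 0.0, -1.0, 0.0, 0.0])   # only two labels known
labeled = np.array([True, False, False, True, False, False])
print(laplacian_classify(X, y, labeled))  # labels propagate to neighbors
```

The dense solve costs O(n³); replacing W with a low-rank approximation, as the talk proposes, is what makes the repeated solves tractable for large scans.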
In this note we first recall that the sets of all representatives of some special ordinary residue classes become (m, n)-rings. Second, we introduce a possible p-adic analog of the residue class modulo a p-adic integer. Then we find the relations which determine when the representatives form an (m, n)-ring. At very short spacetime scales, such rings could lead to new symmetries of modern particle models.
Information-theoretic clustering with applications (Frank Nielsen)
Abstract: Clustering is a fundamental and key primitive to discover structural groups of homogeneous data in data sets, called clusters. The most famous clustering technique is the celebrated k-means clustering that seeks to minimize the sum of intra-cluster variances. k-Means is NP-hard as soon as the dimension and the number of clusters are both greater than 1. In the first part of the talk, we first present a generic dynamic programming method to compute the optimal clustering of n scalar elements into k pairwise disjoint intervals. This case includes 1D Euclidean k-means but also other kinds of clustering algorithms like the k-medoids, the k-medians, the k-centers, etc.
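The 1D dynamic program mentioned here can be sketched compactly; the version below (function name and the decision to return only the optimal k-means cost are mine) uses prefix sums so each interval cost is O(1):

```python
import numpy as np

def kmeans_1d(xs, k):
    """Optimal 1D k-means by dynamic programming: sorted points are
    clustered into k contiguous runs. D[m][j] = best cost of the
    first j+1 points using m clusters."""
    xs = np.sort(np.asarray(xs, float))
    n = len(xs)
    ps = np.concatenate([[0.0], np.cumsum(xs)])        # prefix sums
    ps2 = np.concatenate([[0.0], np.cumsum(xs ** 2)])  # prefix sums of squares

    def sse(i, j):  # sum of squared deviations of xs[i..j], inclusive
        s, s2, m = ps[j + 1] - ps[i], ps2[j + 1] - ps2[i], j - i + 1
        return s2 - s * s / m

    D = np.full((k + 1, n), np.inf)
    for j in range(n):
        D[1][j] = sse(0, j)
    for m in range(2, k + 1):
        for j in range(m - 1, n):
            # t = start index of the last cluster
            D[m][j] = min(D[m - 1][t - 1] + sse(t, j) for t in range(m - 1, j + 1))
    return D[k][n - 1]

data = [1.0, 1.1, 0.9, 10.0, 10.2, 20.0]
print(kmeans_1d(data, 3))  # ≈ 0.04: three tight groups
```

Swapping sse for another interval cost (absolute deviations, Bregman divergences, max radius) yields the k-medians, Bregman, and k-center variants the talk describes.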
We extend the method to incorporate cluster size constraints and show how to choose the appropriate number of clusters using model selection. We then illustrate and refine the method on two case studies: 1D Bregman clustering and univariate statistical mixture learning maximizing the complete likelihood. In the second part of the talk, we introduce a generalization of k-means to cluster sets of histograms that has become an important ingredient of modern information processing due to the success of the bag-of-word modelling paradigm.
Clustering histograms can be performed using the celebrated k-means centroid-based algorithm. We consider the Jeffreys divergence that symmetrizes the Kullback-Leibler divergence, and investigate the computation of Jeffreys centroids. We prove that the Jeffreys centroid can be expressed analytically using the Lambert W function for positive histograms. We then show how to obtain a fast guaranteed approximation when dealing with frequency histograms and conclude with some remarks on the k-means histogram clustering.
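The closed-form Jeffreys centroid can be sketched from the bin-wise arithmetic and geometric means; the sketch below uses a hand-rolled Newton iteration for Lambert's W to stay dependency-free (the helper names are mine):

```python
import numpy as np

def lambert_w(z, iters=50):
    """Principal branch of Lambert's W for z > 0, by Newton iteration."""
    w = np.log1p(z)
    for _ in range(iters):
        ew = np.exp(w)
        w -= (w * ew - z) / (ew * (w + 1))
    return w

def jeffreys_centroid(hists):
    """Closed-form Jeffreys centroid of positive histograms:
    c_i = a_i / W(e * a_i / g_i), where a_i and g_i are the bin-wise
    arithmetic and geometric means (Nielsen 2013)."""
    a = hists.mean(axis=0)
    g = np.exp(np.log(hists).mean(axis=0))
    return a / lambert_w(np.e * a / g)

H = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2]])
print(jeffreys_centroid(H))  # bin-wise between the geometric and arithmetic means
```

Setting the per-bin derivative of the summed Jeffreys divergence to zero gives log(c_i/g_i) + 1 = a_i/c_i, which rearranges to the Lambert W form above; for frequency (normalized) histograms only an approximation is guaranteed, as the abstract notes.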
References:
- Optimal interval clustering: Application to Bregman clustering and statistical mixture learning. IEEE ISIT 2014 (recent result poster). http://arxiv.org/abs/1403.2485
- Jeffreys Centroids: A Closed-Form Expression for Positive Histograms and a Guaranteed Tight Approximation for Frequency Histograms. IEEE Signal Process. Lett. 20(7): 657-660 (2013). http://arxiv.org/abs/1303.7286
http://www.i.kyoto-u.ac.jp/informatics-seminar/
On the Odd Gracefulness of Cyclic Snakes With Pendant Edges (GiselleginaGloria)
Graceful and odd graceful labelings of a graph are two entirely different concepts; a graph may possess one, both, or neither. We present four new families of odd graceful graphs. In particular, we show an odd graceful labeling of the linear kC_4-snake ⊙ mK_1 − e, and hence introduce the odd graceful labeling of the kC_4-snake ⊙ mK_1 − e (for the general case). We prove that the subdivision of the linear kC_3-snake is odd graceful. We also prove that the subdivision of the linear kC_3-snake with m pendant edges is odd graceful. Finally, we present an odd graceful labeling of the crown graph P_n ⊙ mK_1.
Universal Approximation Property via Quantum Feature Maps
----
The quantum Hilbert space can be used as a quantum-enhanced feature space in machine learning (ML) via the quantum feature map, which encodes classical data into quantum states. We prove that quantum ML models built on typical quantum feature maps can approximate any continuous function, with an optimal approximation rate.
---
Contributed talk at Quantum Techniques in Machine Learning 2021, Tokyo, November 8-12 2021.
By Quoc Hoan Tran, Takahiro Goto and Kohei Nakajima
CCS2019 - Topological time-series analysis with delay-variant embedding (Ha Phuong)
Q. H. Tran and Y. Hasegawa, Topological time-series analysis with delay-variant embedding, Oral Presentation at Conference on Complex Systems, Singapore, Singapore, Oct. 2019.
SIAM-AG21 - Topological Persistence Machine of Phase Transition (Ha Phuong)
Presentation at SIAM Conference on Applied Algebraic Geometry (AG21), Aug. 2021.
Abstract. The study of phase transitions using data-driven approaches is challenging, especially when little prior knowledge of the system is available. Topological data analysis is an emerging framework for characterizing the shape of data and has recently achieved success in detecting structural transitions in materials science, such as the glass–liquid transition. However, data obtained from physical states may not have explicit shapes as structural materials do. We thus propose a general framework, termed the "topological persistence machine", to construct the shape of data from correlations in states, so that we can subsequently decipher phase transitions via qualitative changes in that shape. Our framework enables an effective and unified approach to phase-transition analysis without prior knowledge of the phases and without requiring the investigation of large system sizes. We demonstrate the efficacy of the approach in detecting the Berezinskii–Kosterlitz–Thouless phase transition in the classical XY model and quantum phase transitions in the transverse Ising and Bose–Hubbard models. Interestingly, while these phase transitions have proven notoriously difficult to analyze using traditional methods, they can be characterized through our framework without prior knowledge of the phases. Our approach is thus expected to be widely applicable and of practical interest for exploring the phases of experimental physical systems.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... (Ana Luísa Pinho)
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R_1/2 ∼ 50 − 200 pc, stellar masses of M⋆ ∼ 10^7 − 10^8 M⊙, and star-formation rates of SFR ∼ 0.1 − 1 M⊙ yr^−1. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep... (University of Maribor)
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
Cancer cell metabolism: special Reference to Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cell utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two molecules of a smaller chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to "burn" the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Kreb's - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
Introduction to the WARBURG PHENOMENON:
WARBURG EFFECT: Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose from outside than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the Nobel Prize in Physiology or Medicine in 1931 for his "discovery of the nature and mode of action of the respiratory enzyme."
WARBURG EFFECT: the tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
1. A topological measurement
of protein compressibility
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
28 March 2016, Paper Alert, Hasegawa lab., Tokyo
The University of Tokyo
Marcio Gameiro et al. (Japan J. Indust. Appl. Math. (2015) 32:1–17)
2. Abstract
Topological Measurement of Protein Compressibility 2
…we partially clarify the relation between the compressibility of
a protein and its molecular geometric structure. To identify and
understand the relevant topological features within a given
protein, we model its molecule as an alpha filtration and hence
obtain multi-scale insight into the structure of its tunnels and
cavities. The persistence diagrams of this alpha filtration
capture the sizes and robustness of such tunnels and cavities in a
compact and meaningful manner…
Our main result establishes a clear linear correlation between
the topological measure and the experimentally determined
compressibility of most proteins for which both PDB information
and experimental compressibility data are available…
3. Tutorial of
Topological Data Analysis
Tran Quoc Hoan
@k09hthaduonght.wordpress.com/
Hasegawa lab., Tokyo
The University of Tokyo
Part I - Basic Concepts
4. Outline
TDA - Basic Concepts 4
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
5. Outline
TDA - Basic Concepts 5
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
6. Topology
I - Topology and Holes 6
The properties of space that are preserved under continuous
deformations, such as stretching and bending, but not tearing or
gluing
[Figure: examples of shapes related by homeomorphism (≅)]
7. Invariant
7
Question: what are invariant things in topology?
[Figure: shapes related by homeomorphism (≅)]
Number of:
Connected components | Rings | Cavities
1 | 0 | 0
2 | 0 | 0
1 | 1 | 0
1 | 0 | 1
I - Topology and Holes
8. Holes and dimension
8
Topology: continuous deformations preserve the holes of each dimension
✤ Holes capture the forming of shape: connected components, rings, cavities
• 0-dimensional "hole" = connected component
• 1-dimensional "hole" = ring
• 2-dimensional "hole" = cavity
How to define a "hole"?
Use the "algebraic" homology group
I - Topology and Holes
9. Homology group
9
✤ For a geometric object X, the homology group Hq has rank kq, the Betti numbers:
k0 : number of connected components
k1 : number of rings
k2 : number of cavities
kq : number of q-dimensional holes
I - Topology and Holes
Image source: http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
10. Outline
TDA - Basic Concepts 10
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
11. Simplicial complexes
11
Simplicial complex:
A set of vertices, edges, triangles, tetrahedra, … that is closed
under taking faces and has no improper intersections
vertex
(0-dimension)
edge
(1-dimension)
triangle
(2-dimension)
tetrahedron
(3-dimension)
simplicial
complex
not simplicial
complex
2 - Simplicial complexes
k-simplex
12. Simplex
12
n-simplex:
The "smallest" convex hull of n+1 affinely independent points
vertex
(0-dimension)
edge
(1-dimension)
triangle
(2-dimension)
tetrahedron
(3-dimension)
n-simplex:
σ = |v0v1…vn| = { λ0v0 + λ1v1 + … + λnvn | λ0 + … + λn = 1, λi ≥ 0 }
An m-face of σ is the convex hull τ = |vi0…vim| of a non-empty subset
of {v0, v1, …, vn} (and it is proper if the subset is not the entire set)
2 - Simplicial complexes
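As a quick illustration of simplices and their faces, the following sketch enumerates every non-empty face of a simplex and counts them by dimension (`faces` is a hypothetical helper, not from the slides; an n-simplex has C(n+1, m+1) m-faces):

```python
from itertools import combinations

def faces(simplex):
    """All non-empty faces of a simplex, given as a tuple of vertex labels."""
    return [tuple(c)
            for m in range(1, len(simplex) + 1)
            for c in combinations(simplex, m)]

tet = ("v0", "v1", "v2", "v3")  # a 3-simplex (tetrahedron)
counts = [sum(1 for f in faces(tet) if len(f) == k) for k in (1, 2, 3, 4)]
print(counts)  # [4, 6, 4, 1] -> 4 vertices, 6 edges, 4 triangles, 1 tetrahedron
```

Note that the closure condition (1) in the simplicial-complex definition below says exactly that a complex containing `tet` must also contain all 14 of its proper faces.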
14. Simplicial complex
14
Definition:
A simplicial complex K is a finite collection of simplices such that
(1) if σ ∈ K and τ is a face of σ, then τ ∈ K
(2) if σ, τ ∈ K and σ ∩ τ ≠ ∅, then σ ∩ τ is a face of both σ and τ
The maximum dimension of simplex in K is the dimension of K
K2 = {|v0v1v2|, |v0v1|, |v0v2|, |v1v2|, |v0|, |v1|, |v2|}
K = K2 [ {|v3v4|, |v3|, |v4|}
[Figure: a collection that is NOT a simplicial complex, and one that is (YES)]
2 - Simplicial complexes
16. Nerve
16
✤ Let 𝒰 = {Bi | i = 1, ..., m} be a covering of X = ∪ᵢ₌₁ᵐ Bi
✤ The nerve of 𝒰 is the simplicial complex N(𝒰) = (V, Σ) with one vertex per Bi and a simplex for every subset of the Bi with non-empty common intersection
2 - Simplicial complexes
17. Nerve theorem
17
✤ If X ⊂ R^N is covered by a collection 𝒰 = {Bi | i = 1, ..., m} of convex closed sets, then X and N(𝒰) are homotopy equivalent
2 - Simplicial complexes
18. Cech complex
18
P = {xi ∈ R^N | i = 1, ..., m}
Br(xi) = {x ∈ R^N | ||x − xi|| ≤ r} : ball with radius r
✤ The Čech complex C(P, r) is the nerve of 𝒰 = {Br(xi) | xi ∈ P}
✤ From the nerve theorem: Xr = ∪ᵢ₌₁ᵐ Br(xi) ≃ C(P, r)
✤ Filtration: C(P, r) ⊆ C(P, r′) for r ≤ r′
2 - Simplicial complexes
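A minimal sketch of the Čech construction for points in the plane, under two simplifying assumptions: equal radii and simplices of dimension ≤ 2 (the helper names are my own). The key fact used is that balls of radius r around a set of points share a common point exactly when the smallest enclosing ball of those points has radius ≤ r:

```python
from itertools import combinations
from math import dist

def min_enclosing_radius(*pts):
    """Radius of the smallest ball containing 2 or 3 points in the plane."""
    if len(pts) == 2:
        return dist(pts[0], pts[1]) / 2
    a, b, c = dist(pts[0], pts[1]), dist(pts[1], pts[2]), dist(pts[0], pts[2])
    s = sorted((a, b, c))
    if s[2] ** 2 >= s[0] ** 2 + s[1] ** 2:
        return s[2] / 2  # obtuse/right triangle: ball sits on the longest side
    per = (a + b + c) / 2
    area = (per * (per - a) * (per - b) * (per - c)) ** 0.5  # Heron's formula
    return a * b * c / (4 * area)  # acute triangle: circumradius

def cech(points, r):
    """Čech complex up to dimension 2: a simplex enters when the balls of
    radius r around its vertices have a common point, i.e. when the
    smallest enclosing ball of the vertices has radius <= r."""
    simplices = [(i,) for i in range(len(points))]
    for k in (2, 3):
        for idx in combinations(range(len(points)), k):
            if min_enclosing_radius(*(points[i] for i in idx)) <= r:
                simplices.append(idx)
    return simplices

pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)]
# At r = 1.0 all three edges are present but the triangle is not
# (the smallest enclosing ball has radius ~1.083), so C(P, 1.0) has a ring.
print((0, 1, 2) in cech(pts, 1.0), (0, 1, 2) in cech(pts, 1.1))  # False True
```

The triple-intersection test is exactly what makes Čech complexes expensive in general, which motivates the alpha complex introduced next.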
19. Cech complex
19
✤ The weighted Čech complex C(P, R) is the nerve of 𝒰 = {Bri(xi) | xi ∈ P} : balls with different radii
✤ Computations to check the intersections of balls are not easy → use the alpha complex
2 - Simplicial complexes
21. General position
21
✤ x1, ..., xN+2 ∈ R^N are in a general position if there is no x ∈ R^N s.t. ||x − x1|| = ... = ||x − xN+2||
✤ If every combination of N+2 points in P is in a general position, then P is in a general position
✤ If P is in a general position then:
• the dimensions of the Delaunay simplexes are ≤ N
• the geometric representation of D(P) can be embedded in R^N
2 - Simplicial complexes
22. Alpha complex
22
✤ Vi : Voronoi cell of xi
✤ The alpha complex is the nerve of 𝒰 = {Br(xi) ∩ Vi | xi ∈ P}
✤ α(P, r) = N(𝒰)
✤ From the nerve theorem: Xr ≃ α(P, r)
2 - Simplicial complexes
23. Alpha complex
23
✤ The weighted alpha complex is defined in the same way with different radii ri
✤ If P is in a general position, increasing the radii gives a filtration of alpha complexes
2 - Simplicial complexes
24. Alpha complex
24
✤ Computations are much easier than for Čech complexes
✤ Software: CGAL
• Constructs alpha complexes of point cloud data in R^N with N ≤ 3
Filtration of alpha complex
2 - Simplicial complexes
25. Outline
TDA - Basic Concepts 25
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
27. What is a hole?
27
✤ 1-dimensional hole: ring
[Figure: graphs without a ring vs. graphs with a ring; a boundary without a ring; a ring without boundary]
Ring = 1-dimensional graph without boundary?
However, NOT quite: some 1-dimensional graphs without boundary are themselves the boundary of a 2-dimensional graph
Ring = 1-dimensional graph without boundary that is not the boundary of a 2-dimensional graph
3 - Definition of Holes
28. What is a hole?
28
✤ 2-dimensional hole: cavity
[Figure: surfaces without a cavity vs. surfaces with a cavity; a boundary without a cavity; a cavity without boundary]
Cavity = 2-dimensional graph without boundary?
However, NOT quite: some 2-dimensional graphs without boundary are themselves the boundary of a 3-dimensional graph
Cavity = 2-dimensional graph without boundary that is not the boundary of a 3-dimensional graph
3 - Definition of Holes
29. Hole and boundary
29
q-dimensional hole = q-dimensional graph without boundary that is
not the boundary of a (q+1)-dimensional graph
We make this precise in "algebraic" language
3 - Definition of Holes
30. Chain complexes
30
Definition:
Let K be a simplicial complex of dimension n. The group of q-chains is defined as
Cq(K) := { Σi αi⟨vi0 … viq⟩ | αi ∈ R, ⟨vi0 … viq⟩ : q-simplex in K }  if 0 ≤ q ≤ n
Cq(K) := 0  if q < 0 or q > n
An element of Cq(K) is called a q-chain.
3 - Definition of Holes
31. Boundary
31
Definition:
The boundary of a q-simplex is the sum of its (q−1)-dimensional faces:
∂q⟨v0 … vq⟩ := Σl ⟨v0 … v̂l … vq⟩  (vl is omitted)
Example: ∂|v0v1v2| := |v0v1| + |v1v2| + |v0v2|
3 - Definition of Holes
32. Boundary
32
Fundamental lemma: ∂q−1 ∘ ∂q = 0
For q = 2: ∂1 ∘ ∂2 = 0
In general:
• For a q-simplex τ, the boundary ∂qτ consists of all (q−1)-faces of τ
• Every (q−2)-face of τ belongs to exactly two (q−1)-faces, with opposite orientation
⇒ ∂q−1∂qτ = 0
3 - Definition of Holes
33. Hole and boundary
33
q-dimensional hole = q-dimensional graph that is (1) without boundary and
(2) not the boundary of a (q+1)-dimensional graph
(1) Zq(K) := ker ∂q (cycle group)
(2) Bq(K) := im ∂q+1 (boundary group)
Since ∂q ∘ ∂q+1 = 0: Bq(K) ⊂ Zq(K) ⊂ Cq(K)
3 - Definition of Holes
34. Hole and boundary
34
q-dimensional hole = q-dimensional graph that is (1) without boundary and
(2) not the boundary of a (q+1)-dimensional graph
(1) ker ∂q  (2) im ∂q+1
Elements of Zq(K) := ker ∂q remain after making Bq(K) := im ∂q+1 become zero; call this operator Q
For z′ = z + b with z, z′ ∈ ker ∂q and b ∈ im ∂q+1:
Q(z′) = Q(z) + Q(b) = Q(z)
(z and z′ are equivalent in ker ∂q with respect to im ∂q+1)
q-dimensional hole = an equivalence class of vectors
3 - Definition of Holes
35. Homology group
35
Homology groups
The q-th homology group Hq is defined as Hq := Ker ∂q / Im ∂q+1
= { z + Im ∂q+1 | z ∈ Ker ∂q } = { [z] | z ∈ Ker ∂q }
with addition given by [z] + [z′] = [z + z′]
Betti numbers
The q-th Betti number is defined as the dimension of Hq: bq = dim(Hq)
H0(K): connected components, H1(K): rings, H2(K): cavities
3 - Definition of Holes
36. Computing Homology
36
[Figure: a hollow square with vertices v0, v1, v2, v3]
All vectors in Ker ∂0 are equivalent with respect to Im ∂1 ⇒ b0 = dim(H0) = 1
Im ∂2 contains only the zero vector ⇒ b1 = dim(H1) = 1
H1 = { λ(|v0v1| + |v1v2| + |v2v3| + |v3v0|) }
3 - Definition of Holes
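The Betti numbers of the square can be checked mechanically: working over Z2, represent each column of the boundary map ∂1 as a bitmask over the vertices and compute ranks by Gaussian elimination. This is a self-contained sketch (not the CHomP/Perseus implementation mentioned later), using the identities b0 = dim ker ∂0 − rank ∂1 and b1 = dim ker ∂1 − rank ∂2:

```python
def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as int bitmasks."""
    pivots = {}  # lowest set bit -> stored row
    for row in rows:
        while row:
            low = row & -row          # lowest set bit of the current row
            if low in pivots:
                row ^= pivots[low]    # eliminate that bit and continue
            else:
                pivots[low] = row
                break
    return len(pivots)

# Hollow square: vertices v0..v3, edges |v0v1|, |v1v2|, |v2v3|, |v3v0|.
# Each column of the boundary map d1 is a bitmask over the vertices.
d1 = [0b0011, 0b0110, 0b1100, 0b1001]
rank_d1 = gf2_rank(d1)    # 3: the four columns sum to zero over GF(2)
b0 = 4 - rank_d1          # dim ker d0 (= all of C0) - rank d1
b1 = (4 - rank_d1) - 0    # dim ker d1 - rank d2 (no triangles, so d2 = 0)
print(b0, b1)  # 1 1: one connected component, one ring
```

Filling in the square with the triangles |v0v1v2| and |v0v2v3| would make rank ∂2 = 2 and kill the ring, giving b1 = 0, matching the intuition from slide 33.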
37. Computing Homology
37
[Figure: the same square with oriented edges]
H1 = { λ(⟨v0v1⟩ + ⟨v1v2⟩ + ⟨v2v3⟩ − ⟨v0v3⟩) }
All vectors in Ker ∂0 are equivalent with respect to Im ∂1 ⇒ b0 = dim(H0) = 1
Im ∂2 contains only the zero vector ⇒ b1 = dim(H1) = 1
3 - Definition of Holes
38. Outline
TDA - Basic Concepts 38
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
39. Persistent Homology
Persistent homology 39
✤ Consider a filtration of finite type
K : K0 ⊂ K1 ⊂ … ⊂ Kt ⊂ …
∃ Θ s.t. Kj = KΘ for all j ≥ Θ
✤ K = ∪t≥0 Kt : the total simplicial complex
K^k : all k-simplexes in K
Kt^k : all k-simplexes in K at time t
T(σ) = t for σ ∈ Kt \ Kt−1 : birth time of the simplex σ
40. Persistent Homology
40
✤ Z2 - vector space
✤ Z2[x] - graded module
✤ Inclusion map
✤ Ck(K) is a free Z2[x]-module with the k-simplexes as its basis
41. Persistent Homology
41
✤ Boundary map
✤ From the graded structure
✤ Persistent homology
(graded homomorphism)
face of σ
42. Persistent Homology
42
✤ From the structure theorem of Z2[x] (PID)
✤ Persistence intervals Ii
✤ Persistence diagram: the multiset of points (Ii(b), Ii(d)), where Ii(b): inf of Ii, Ii(d): sup of Ii
43. Persistent Homology
43
[Figure: persistence diagram; axes: birth time vs. death time]
✤ A "hole" that appears close to the diagonal may be "noise"
✤ A "hole" that appears far from the diagonal is likely a robust structure
✤ Detect the "structural holes"
44. Outline
TDA - Basic Concepts 44
1. Topology and holes
2. Simplicial complexes
3. Definition of holes
4. Persistent homology
5. Some applications
see more in part 2 of the tutorial
45. Applications
5 - Some applications 45
• Persistence to Protein compressibility
Marcio Gameiro et al. (Japan J. Indust. Appl. Math. (2015) 32:1–17)
46. Protein Structure
Persistence to protein compressibility 46
amino acid 1 amino acid 2
3-dim structure of hemoglobin
1-dim structure of protein
folding
peptide bond
47. Protein Structure
Persistence to protein compressibility 47
✤ Van der Waals radii of atoms (Å):
H: 1.2, C: 1.7, N: 1.55, O: 1.52, S: 1.8, P: 1.8
Van der Waals ball model of hemoglobin
48. Alpha Complex for Protein Modeling
Persistence to protein compressibility 48
✤ zi : position of atoms
✤ ri : radius of the i-th atom
✤ Bi : ball with radius ri
✤ Weighted Voronoi decomposition, using the power distance ||x − zi||² − ri²
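The power distance that drives the weighted Voronoi decomposition is easy to state in code. This is a sketch only: `cell_owner` is a hypothetical helper, and the radii below are the van der Waals values quoted earlier for carbon and hydrogen:

```python
from math import dist

def power_distance(x, z, r):
    """Power distance from point x to the ball with center z and radius r:
    pi(x) = ||x - z||^2 - r^2 (negative inside the ball, zero on its surface)."""
    return dist(x, z) ** 2 - r ** 2

def cell_owner(x, centers, radii):
    """Index of the weighted Voronoi (power) cell containing x:
    the atom minimizing the power distance to x."""
    return min(range(len(centers)),
               key=lambda i: power_distance(x, centers[i], radii[i]))

# A carbon (r = 1.7 A) and a hydrogen (r = 1.2 A) placed 2 A apart:
centers, radii = [(0.0, 0.0), (2.0, 0.0)], [1.7, 1.2]
# The point (1.2, 0) is closer to H in Euclidean distance (0.8 vs 1.2),
# but the larger carbon ball wins under the power distance.
print(cell_owner((1.2, 0.0), centers, radii))  # 0
```

This is why the weighted decomposition is the right one for proteins: bigger atoms claim proportionally bigger cells, so the restricted balls Bi ∩ Vi never overlap improperly.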
49. Alpha Complex for Protein Modeling
Persistence to protein compressibility 49
✤ The alpha complex is the nerve of the restricted balls Bi ∩ Vi (a k-simplex for each (k+1)-fold intersection)
✤ Nerve lemma: the union of balls and the alpha complex are homotopy equivalent
✤ Changing the radii (parameterized by w) forms a filtration
50. Topology of Ovalbumin
Persistence to protein compressibility 50
[Figure: persistence diagrams PD1 (1st Betti plot) and PD2 (2nd Betti plot) of ovalbumin; axes: birth time vs. death time]
51. Compressibility
Persistence to protein compressibility 51
[Diagram: functionality ← 3-dim structure; softness ↔ compressibility; compressibility obtained from experiments (difficult) or quantified from the holes in persistence diagrams]
Select generators and fitting parameters with experimental compressibility
52. Denoising
Persistence to protein compressibility 52
[Figure: persistence diagram; axes: birth time vs. death time]
✤ Topological noise
✤ Non-robust topological features depend on the state of fluctuations
✤ The quantification should not depend on the state of fluctuations
53. Holes with Sparse or Dense Boundary
Persistence to protein compressibility 53
✤ A sparse hole structure is deformable to a much larger extent than a dense hole → greater compressibility
✤ Effective sparse holes
[Figure: van der Waals balls vs. enlarged balls; persistence diagram axes: birth time vs. death time]
54. # of generators vs. compressibility
Persistence to protein compressibility 54
[Plot: topological measurement Cp vs. experimentally measured compressibility]
56. Protein Phylogenetic Tree
Persistence to Phylogenetic Trees 56
✤ Phylogenetic tree is defined by a distance matrix for a
set of species (human, dog, frog, fish,…)
✤ The distance matrix is calculated by a score function
based on similarity of amino acid sequences
[Figure: amino acid sequences of fish, frog, and human hemoglobin; distance matrix of hemoglobin; phylogenetic tree over fish, frog, human, dog]
57. Persistence Distance and Classification of Proteins
Persistence to Phylogenetic Trees 57
✤ The score function based on amino acid sequences does not
contain information about the 3-dim structure of proteins
✤ The Wasserstein distance (of degree p) on persistence diagrams
[Cohen-Steiner, Edelsbrunner, Harer, and Mileyko, FCM, 2010]
reflects the similarity of the persistence diagrams (3-dim structures) of proteins
58. Persistence Distance and Classification of Proteins
Persistence to Phylogenetic Trees 58
[Figure: Wasserstein distance as a cost-minimizing bijection between two persistence diagrams; axes: birth time vs. death time]
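For tiny diagrams, the Wasserstein distance can be computed by brute force over matchings, letting unmatched points pair with their diagonal projections. This O(n!) sketch is for illustration only (real implementations use optimal-transport or assignment solvers), and the helper names are my own:

```python
from itertools import permutations

def wasserstein(D1, D2, p=2):
    """p-Wasserstein distance between two small persistence diagrams by
    brute force over matchings; unmatched points pair with their nearest
    diagonal point ((b+d)/2, (b+d)/2)."""
    def diag(pt):
        m = (pt[0] + pt[1]) / 2
        return (m, m)

    A = list(D1) + [diag(q) for q in D2]  # pad with diagonal projections
    B = list(D2) + [diag(q) for q in D1]

    def cost(a, b):
        if a[0] == a[1] and b[0] == b[1]:
            return 0.0  # diagonal-to-diagonal matches are free
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** p  # l_inf ground metric

    best = min(sum(cost(A[i], B[j]) for i, j in enumerate(perm))
               for perm in permutations(range(len(B))))
    return best ** (1 / p)

# A small perturbation of one point moves the distance only slightly,
# in line with the stability theorem on the next slide:
print(wasserstein([(0.0, 4.0)], [(0.0, 4.5)]))  # 0.5
```

Allowing matches to the diagonal is what makes the distance robust: a noisy near-diagonal point can be absorbed at low cost instead of being forced onto a distant structural point.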
59. Distance between persistence diagrams
Persistence to Phylogenetic Trees 59
Persistence of sublevel sets
Stability theorem (Cohen-Steiner et al., 2010)
[Figure: persistence diagram; axes: birth time vs. death time]
60. Phylogenetic Tree by Persistence
Persistence to Phylogenetic Trees 60
✤ Apply the distance on persistence diagrams to classify
proteins
The persistence diagrams use the same noise band as in the compressibility computations
[Figure: phylogenetic tree computed from persistence distances for the proteins 3DHT, 3D1A, 1QPW, 3LQD, 1FAW, 1C40, 2FZB]
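Given any distance matrix on proteins (for instance, Wasserstein distances between their persistence diagrams), a phylogenetic-style tree can be sketched by single-linkage clustering. The distances and helper names below are made up for illustration; this is not the paper's pipeline:

```python
def single_linkage(labels, d):
    """Single-linkage clustering from a distance matrix; returns a nested
    tuple (left, right, merge_height) as a crude dendrogram."""
    clusters = {i: (labels[i],) for i in range(len(labels))}
    dist = {(i, j): d[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}
    nxt = len(labels)
    while len(clusters) > 1:
        (a, b), h = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        merged = (clusters[a], clusters[b], h)
        del clusters[a], clusters[b]
        new_dist = {}
        for (i, j), v in dist.items():
            if i in (a, b) or j in (a, b):
                other = j if i in (a, b) else i
                if other in (a, b):
                    continue  # the (a, b) pair itself disappears
                key = (min(other, nxt), max(other, nxt))
                # single linkage: distance to the merged cluster is the min
                new_dist[key] = min(new_dist.get(key, float("inf")), v)
            else:
                new_dist[(i, j)] = v
        dist, clusters[nxt] = new_dist, merged
        nxt += 1
    return next(iter(clusters.values()))

# Made-up pairwise distances between four species (smaller = more similar):
names = ["human", "dog", "frog", "fish"]
d = [[0, 1, 4, 6], [1, 0, 4, 6], [4, 4, 0, 6], [6, 6, 6, 0]]
tree = single_linkage(names, d)
print(tree)  # human and dog merge first (height 1); fish splits off last
```

Swapping sequence-based scores for persistence distances in `d` is exactly the substitution the slides propose: the resulting tree then reflects 3-dim structural similarity rather than sequence similarity.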
61. Future work
TDA - Basic Concepts 61
✤ Principle to de-noise fluctuations in persistence diagrams (NMR
experiments)
✤ Finding minimum generators to identify specific regions in a
protein (e.g., a region inducing high compressibility, hereditarily
important regions)
✤ Zigzag persistence for robust topological features among a
specific group of proteins (quiver representation)
✤ Multi-dimensional persistence (PID → Gröbner bases)
62. Applications more in part … of tutorials
5 - Some applications 62
✤ Robotics
✤ Computer Vision
✤ Sensor network
✤ Concurrency & database
✤ Visualization
Prof. Robert Ghrist
Department of Mathematics
University of Pennsylvania
One of the pioneers in applications
Michael Farber, Edelsbrunner, Mischaikow, Gaucher, Bubenik, Zomorodian, Carlsson
63. Software
TDA - Basic Concepts 63
• Alpha complex by CGAL
http://www.cgal.org/
• Persistence diagrams by Perseus (coded by Vidit Nanda)
http://www.sas.upenn.edu/~vnanda/perseus/index.html
• CHomP project
http://chomp.rutgers.edu/Project.html
64. Reference link
Topological Measurement of Protein Compressibility 64
✤ Original paper
http://www.sas.upenn.edu/~vnanda/source/compressibility-final.pdf
✤ Author slides
http://www2.math.kyushu-u.ac.jp/~hiraoka/protein_homology.pdf
✤ Books (very good)
- (Japanese) Protein Structure and Topology: An Introduction to Persistent Homology Groups (タンパク質構造とトポロジー パーシステントホモロジー群入門), Yasuaki Hiraoka
- (English) Computational Topology: An Introduction, Herbert Edelsbrunner and John L. Harer