A family of global protein shape descriptors using gauss integrals, christian laing

The Florida State University
College of Arts and Sciences

A Family of Global Protein Shape
Descriptors Using Gauss Integrals

By

Christian Edgar Laing Celestino

A proposal submitted to the Department
of Mathematics in partial fulfillment of
the doctoral preliminary examination

April 30, 2004

Table of Contents

Abstract

1 Background and Significance

1.1 CATH Protein Structure Classification
1.2 Current Methods and Importance of a New Approach
1.3 The Writhing Number
1.3.1 Directional Writhing Number
1.3.2 Natural Notion of the Writhing Number for Polygonal Curves
1.4 Representing Proteins in $\mathbb{R}^{20}$
1.4.1 Results of the SGM when Tested for CATH 2.4

2 The Experimental Plan

2.1 Purpose and Objectives
2.2 Procedures

References

Abstract

Within the field of biology, the comparison, description and prediction of biological
structures are important tasks. In the case of proteins, it is of great interest to characterize,
and thereby classify, their three-dimensional structures. Protein structures can be
classified in a variety of interrelated ways, such as by functional similarity, evolutionary
similarity, and fold similarity. Two proteins with similar structures can have quite different
sequences, and comparing their structures can reveal distant evolutionary relationships
that would not be evident from sequence information alone. The three-dimensional
structure of a protein also provides clues to its function in living organisms.

Protein classification focuses on identifying proteins that have similar chemical
architecture and topology. Because it is not practical to study every protein structure in
every genome in detail, the functional role of a new protein in the cell can often be
inferred from an already classified protein with a similar structure. This is why it is
important to develop new methods for classifying the three-dimensional structures of
proteins.

Today, a great deal of protein structure data is obtained from experimental
methods such as X-ray crystallography (1) and NMR spectroscopy (2). The data are
deposited in public-domain resources such as the Protein Data Bank (3). Structural
classifications of proteins, such as CATH (Class, Architecture, Topology and
Homologous Superfamily, see 4-5) and SCOP (Structural Classification of Proteins, see
6), are also available as databases. However, some classification steps are still performed
by manual inspection. Because the number of known protein structures is increasing
rapidly (as of April 2004, the Protein Data Bank held 25,004 structures and was growing
by more than 450 per month (3)), a fully automatic method (one using solely computer
algorithms) is required.

Currently there are several computer methods for the structural comparison of
proteins (7). Examples are CE (8), DALI (9), KENOBI (10), and STRUCTAL (11). These
methods are also in the public domain, and in some cases the program itself is available
for download. They are based on computing pairwise distances between the alpha carbon
atoms of the proteins, but they present several complications. First, they are
computationally expensive, because they require an alignment of the two molecules in
order to compare the proteins. Second, the measures they use violate the triangle
inequality, $d(x,z) \le d(x,y) + d(y,z)$. Consequently, these measures have little meaning
for proteins that are far apart, that is, whose structures are very dissimilar. Because of
these complications, a better and different approach is needed.

Peter Rogen and Boris Fain, in the group of Michael Levitt at Stanford University,
have developed a new automatic classification of proteins using Gauss integrals. Each
protein is represented by a vector of 20 numbers, inspired by Vassiliev knot invariants,
that captures the topology of its backbone (12), (13). These 20 numbers are obtained
from combinations of a geometric tool called the "writhing number".

This work is still in progress, but it has already shown good results: when tested
on the protein database CATH 2.4, it correctly classified 98.6% of the protein crystal
structure data used.

The authors leave an interesting point open (12): “While we have geometric
interpretation of the writhing number we would like to understand the other generalized
Gauss integrals used in this work”. We intend to investigate and answer this question.

1. Background and Significance

Proteomics is the study of the full set of proteins encoded by a genome, and
structural proteomics is the sub-area of proteomics that studies the structure of proteins.
So far, many genomes have been fully sequenced, including those of yeast, Drosophila
melanogaster and Homo sapiens. The full value of the sequence data will be realized only
when we assign a role in the cell to each protein, and this requires a full set of tools for
classifying proteins, such as classification databases like CATH and structural comparison
methods like DALI.

1.1 CATH Protein Structure Classification

CATH is a hierarchical classification of protein domain structures in the Protein
Data Bank (3) which clusters proteins at four levels: Class (C), Architecture (A),
Topology (T) and Homologous Superfamily (H).

This classification operates at the level of structural domains, as these domains
are likely to be the fundamental evolutionary building blocks (5). When a new protein is
similar to a protein already in the database, it inherits the domain boundaries of the
existing entry. If the new protein has no relative in the CATH database, three different
algorithms (DETECTIVE, PUU and DOMAK) are used to identify its structural domains
automatically. If all the programs agree, the domain boundaries are assigned; if not, the
domain boundaries are assigned manually (see also 14). The four levels of CATH are
described below, and figure 1 shows the hierarchy for the C, A, and T levels. References
for CATH can be found in (4), (5) and (14).

• Class C level is assigned by considering the secondary structure content and
packing within the structure. Four classes are recognized: mainly alpha, mainly
beta, alpha-beta, and a fourth class containing protein domains with low
secondary structure content. The class is assigned automatically for more than
90% of protein structures; the rest are assigned by hand.

• Architecture A level describes the overall shape formed by the secondary
structures in three-dimensional space, ignoring their connectivity. Although an
automatic procedure is being developed, this level is currently assigned manually
using a simple description of the secondary structure arrangement (e.g. roll,
sandwich).

• Topology T level groups structures into fold families according to both the shape
and the connectivity of their secondary structures. A fold group may contain
protein domains that are similar in structure but have no sequence similarity. The
assignments are made by sequence and structure comparison (an SSAP score
greater than 70 is required) (5).

• Homologous Superfamily H level groups together domains that are thought to
share a common ancestor (homologous families), either because they have high
sequence identity (at least 35%) or because they have high structural similarity
together with lower sequence identity (at least 20%). Structural similarity is
measured by an automatic method (SSAP > 80).

Figure 1. Hierarchy of CATH at C, A and T levels. From reference (4).

1.2 Current Methods and Importance of a New Approach

In order to find similarities between 3D protein structures in the crystal state,
scientists have built a wide variety of protein structure alignment methods and techniques,
such as distance matrix alignment (9), genetic algorithms (10) and double dynamic
programming (11). The general idea is to consider the backbones of two proteins as two
chains A and B in three-dimensional space, and to find subchains α and β of A and B
respectively such that α and β have equal length, are similar, and are as long as possible
with this property (see figure 2).

The most common measure of the difference between two protein structures is
the RMSD, or root mean square deviation. RMSD is usually computed from the positions
of the alpha carbon atoms of the protein backbones: it is the square root of the mean
squared distance between atoms in one structure and the corresponding atoms in the other
structure, after the two structures have been superimposed.
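As a minimal illustration of the RMSD just described, the Python sketch below assumes that the two coordinate lists are already paired residue by residue and already superimposed. Real comparison tools also search for the optimal rotation and translation (for example with the Kabsch algorithm), which this sketch omits. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root mean square deviation between paired alpha carbon coordinates.

    Assumes a[i] and b[i] are corresponding atoms and that the two structures
    are already superimposed: no optimal rotation or translation is applied.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))

# Toy example: the second "structure" is the first one shifted by 1 unit along x.
ca_1 = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
ca_2 = [(x + 1.0, y, z) for (x, y, z) in ca_1]
print(rmsd(ca_1, ca_2))  # 1.0
```

A rigid shift changes every pair of corresponding atoms by the same amount, which is why the superposition step matters in practice: without it, this RMSD reports a difference between two identical shapes.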

Because of the nature of these methods, we encounter some complications:

• A protein backbone can contain several hundred alpha carbon atoms, so finding
such alignments may be computationally expensive, whereas a structural
comparison method needs to be fast.

• As discussed in the Abstract, these methods fail to satisfy the triangle
inequality. Indeed, consider three proteins made of the following sequences:
protein A = DEF-LMN, protein B = GHI-LMN and protein C = GHI-OPQ. Then A
and B are similar in the LMN region, and B and C are similar in the GHI region.
However, we cannot infer any similarity between A and C (see figure 3). Small
distances between A and B and between B and C therefore do not bound the
distance between A and C, so the triangle inequality $d(A,C) \le d(A,B) + d(B,C)$
is violated. When this occurs, we are unable to judge dissimilarity, and the
problem worsens with increasing distance.

• In order to compute such measures, the methods require a series of adjustable
parameters, such as gap and insertion penalties and weights.
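The failure of the triangle inequality can be reproduced with a toy score. The sketch below is a hypothetical illustration, not any of the methods cited above: it scores two sequences by the length L of their longest shared contiguous segment and turns that similarity into a distance d = 1/(1 + L). With the three sequences of the example, A and B share LMN and B and C share GHI, but A and C share nothing, so d(A,C) exceeds d(A,B) + d(B,C).

```python
def longest_shared_segment(x: str, y: str) -> int:
    """Length of the longest contiguous run of letters common to x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the run ending at x[i-1], y[j-1]
                best = max(best, cur[j])
        prev = cur
    return best

def toy_distance(x: str, y: str) -> float:
    """Hypothetical similarity-derived distance: 1 / (1 + shared segment length)."""
    return 1.0 / (1 + longest_shared_segment(x, y))

A, B, C = "DEFLMN", "GHILMN", "GHIOPQ"
d_ab, d_bc, d_ac = toy_distance(A, B), toy_distance(B, C), toy_distance(A, C)
print(d_ab, d_bc, d_ac)          # 0.25 0.25 1.0
print(d_ac <= d_ab + d_bc)       # False: the triangle inequality fails
```

Any distance built from a local similarity in this way inherits the problem: two pairs can each match well on different regions while the third pair matches nowhere.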

Figure 2. Two chains in three-dimensional space.
Figure 3. Failure of the triangle inequality.
(From reference (12).)

These complications motivate the search for a better, more efficient and fully
automated method. The protein backbone is a space curve, and mathematicians study
such curves in areas such as Knot Theory and Differential Geometry; we wish to apply
these mathematical techniques to the protein classification problem.

1.3 The Writhing Number

We start with the concepts of linking number and twist. These two numbers,
together with the writhing number, are related by the simple formula Lk = Tw + Wr (see
Definition 3). The definitions below follow (15) and (16).

A strip (C,U) is a smooth¹ curve C together with a smoothly varying unit vector
U(t) perpendicular to C at each point.

Definition 1. If $C_1(t_1)$ and $C_2(t_2)$ are two disjoint oriented closed curves in space
parametrized by [0,1], the linking number is defined by the integral

$$Lk(C_1, C_2) = \frac{1}{4\pi}\int_{C_1}\int_{C_2} \frac{\big(C_1(t_1) - C_2(t_2)\big)\cdot\big(\partial C_1/\partial t_1 \times \partial C_2/\partial t_2\big)}{|C_1(t_1) - C_2(t_2)|^3}\, dt_1\, dt_2.$$

The linking number is an integer that measures the entanglement between two
curves. Examples are shown in figure 4 below; notice that figure 4c shows two curves
that are entangled even though their linking number is zero.

Figure 4. From reference (16).
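The Gauss integral in Definition 1 can be checked numerically. The following Python sketch is an illustration under stated assumptions, not code from the cited works: both curves are given as hypothetical functions of t that are periodic with period 1, derivatives are approximated by central differences, and the double integral is approximated by the midpoint rule, which converges quickly for smooth periodic integrands. For two round circles forming a Hopf link the result should be close to ±1 (the sign depends on the orientations), and for two separated circles it should be close to 0.

```python
import math

def gauss_linking_number(curve1, curve2, n=200, h=1e-6):
    """Approximate Lk(C1, C2) by the midpoint rule on the Gauss double integral.

    curve1, curve2: functions t -> (x, y, z) describing closed curves that are
    periodic in t with period 1. Derivatives use central differences.
    """
    dt = 1.0 / n
    ts = [(k + 0.5) * dt for k in range(n)]

    def sample(curve):
        points, tangents = [], []
        for t in ts:
            p, q = curve(t - h), curve(t + h)
            points.append(curve(t))
            tangents.append(tuple((b - a) / (2 * h) for a, b in zip(p, q)))
        return points, tangents

    p1, d1 = sample(curve1)
    p2, d2 = sample(curve2)
    total = 0.0
    for a, da in zip(p1, d1):
        for b, db in zip(p2, d2):
            r = (a[0] - b[0], a[1] - b[1], a[2] - b[2])          # C1(t1) - C2(t2)
            c = (da[1] * db[2] - da[2] * db[1],                  # C1' x C2'
                 da[2] * db[0] - da[0] * db[2],
                 da[0] * db[1] - da[1] * db[0])
            dist3 = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2) ** 3
            total += (r[0] * c[0] + r[1] * c[1] + r[2] * c[2]) / dist3
    return total * dt * dt / (4 * math.pi)

def ring_a(t):
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t), 0.0)

def ring_b(t):      # passes through the center of ring_a: a Hopf link
    return (1.0 + math.cos(2 * math.pi * t), 0.0, math.sin(2 * math.pi * t))

def ring_far(t):    # the same circle moved away: unlinked
    return (5.0 + math.cos(2 * math.pi * t), 0.0, math.sin(2 * math.pi * t))

lk_linked = gauss_linking_number(ring_a, ring_b)
lk_unlinked = gauss_linking_number(ring_a, ring_far)
print(round(lk_linked, 6), round(lk_unlinked, 6))
```

Reversing the orientation of either ring (replacing t by -t) flips the sign of the linked result, in line with the linking number being defined for oriented curves.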

For any simple closed strip, the curves $C + \varepsilon U$, given parametrically by
$C(t) + \varepsilon U(t)$, are simple closed curves disjoint from C for sufficiently small
$\varepsilon > 0$, and the linking number $Lk(C, C + \varepsilon U)$ is defined and independent of
$\varepsilon$. The vectors $C'(t)$, $U(t)$ and $V(t) = C'(t) \times U(t)$ define a moving frame
$(C', U, V)$ along C. Let $\Omega$ denote the angular velocity vector describing the rate of
rotation of the frame with respect to the arclength t, so that $C'' = \Omega \times C'$,
$U' = \Omega \times U$ and $V' = \Omega \times V$. Let $\omega_1$, $\omega_2$ and $\omega_3$ be the components of
$\Omega$ referred to the moving frame, i.e., $\Omega = \omega_1 C' + \omega_2 U + \omega_3 V$. Then $\omega_1$
represents the angular rate at which U revolves around C; it is called the twist of the
strip at each point of the curve.

¹ A curve C is smooth if it is infinitely differentiable.

Definition 2. The total twist number Tw(C,U) is defined as the integral of $\omega_1$ with
respect to the arclength t over the curve C, divided by $2\pi$. That is,

$$Tw(C,U) = \frac{1}{2\pi}\int_C \omega_1\, dt.$$

The total twist number need not be an integer, and if the curve C is a simple plane curve,
then the linking number $Lk(C, C + \varepsilon U)$ and the total twist number Tw(C,U) are equal.
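Definition 2 can also be evaluated numerically. Because $U' = \Omega \times U$ and $U \times V = C'$ for the orthonormal frame, we have $U' \cdot V = \Omega \cdot (U \times V) = \Omega \cdot C' = \omega_1$, so the twist density can be computed from U and V alone. The Python sketch below uses this identity; the example strip (a unit circle in the xy-plane with U turning three times around it) and all function names are hypothetical choices for illustration. Since the circle is a plane curve, the remark above predicts that Tw equals Lk(C, C + εU), and the computed twist should be 3.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def derivative(f, t, h=1e-6):
    """Central-difference derivative of a vector-valued function of t."""
    p, q = f(t - h), f(t + h)
    return tuple((b - a) / (2 * h) for a, b in zip(p, q))

def total_twist(curve, strip, period, n=2000):
    """Tw(C,U) = (1/2pi) * integral of omega_1, with omega_1 = U' . V and V = T x U.

    T is the unit tangent, so U'(t) . V dt equals omega_1 ds for any
    parametrization. The strip is assumed closed: curve and strip are periodic.
    """
    dt = period / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        d = derivative(curve, t)
        speed = math.sqrt(dot(d, d))
        tangent = tuple(c / speed for c in d)
        v = cross(tangent, strip(t))                 # V = T x U
        total += dot(derivative(strip, t), v) * dt   # omega_1 ds = U' . V dt
    return total / (2 * math.pi)

def circle(t):
    return (math.cos(t), math.sin(t), 0.0)

def strip_3(t):
    # U = cos(3t) N + sin(3t) B with N = -(cos t, sin t, 0) and B = (0, 0, 1):
    # a unit vector field normal to the circle that turns three times around it.
    return (-math.cos(3 * t) * math.cos(t), -math.cos(3 * t) * math.sin(t), math.sin(3 * t))

tw = total_twist(circle, strip_3, 2 * math.pi)
print(round(tw, 6))   # 3.0
```

For a non-planar curve the same routine still returns Tw, but it is then no longer equal to the linking number; the difference is the writhing number of Definition 3.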

Definition 3. The difference $Wr(C) = Lk(C, C + \varepsilon U) - Tw(C,U)$ is a geometric
invariant of the curve C, called the writhing number; it does not depend on the choice of
U. Equivalently, Lk = Tw + Wr.

1.3.1 Directional Writhing Number

Definition 4. A smooth simple closed curve C and a fixed unit vector σ are said to be in
general position if the tangents to C are never parallel to σ. In this case the curves
$C + \varepsilon\sigma$ are disjoint from C for all sufficiently small $\varepsilon > 0$, hence for such $\varepsilon$ we
may define the directional writhing number of C in the direction of σ by
$Wr(C, \sigma) = Lk(C, C + \varepsilon\sigma)$.

If C and σ are in general position, the orthogonal projection of C onto a plane
with normal σ defines a smooth closed plane curve $C_\sigma$ for which undercrossings and
overcrossings can be distinguished at each crossing point (see figure 5 below). At a
crossing point c of an oriented regular diagram of a curve, there are two possible
configurations: either sign(c) = +1 or sign(c) = −1, as shown in figure 5. The sign of a
crossing is determined by the right-hand rule convention.

Figure 5.
If one adds the signs of all the crossings in a fixed regular projection of a curve in the
direction σ, one obtains the directional writhing number Wr(C, σ). The writhing number
Wr of a curve C is equal to the average of the directional writhing number over all
projections, where the average is taken with respect to area on the unit sphere.
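These two statements translate directly into a small program. The Python sketch below (function names hypothetical) computes the directional writhing number of a closed polygon by projecting it along a unit vector σ, finding the crossings between non-adjacent edges, and adding their signs; it then estimates Wr by averaging over uniformly random directions, a Monte Carlo version of the average over the unit sphere. It assumes the viewer sits on the +σ side, uses the convention that a crossing is positive when $(t_{over} \times t_{under}) \cdot \sigma > 0$, and does not handle degenerate projections.

```python
import math
import random

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def directional_writhe(vertices, sigma):
    """Sum of crossing signs of a closed polygon projected along the unit vector sigma.

    The 'over' strand is the one with the larger component along sigma.
    Assumes general position (no tangencies or triple points in the projection).
    """
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    helper = (1.0, 0.0, 0.0) if abs(sigma[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _cross(sigma, helper)
    e1 = tuple(c / math.sqrt(_dot(e1, e1)) for c in e1)
    e2 = _cross(sigma, e1)                     # (e1, e2) span the projection plane

    def proj(p):
        return (_dot(p, e1), _dot(p, e2))

    total = 0
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue                       # these edges share the closing vertex
            (a0, a1), (b0, b1) = edges[i], edges[j]
            p, q = proj(a0), proj(b0)
            r, s = _sub(proj(a1), p), _sub(proj(b1), q)
            denom = r[0] * s[1] - r[1] * s[0]
            if denom == 0:
                continue                       # parallel projected edges
            qp = _sub(q, p)
            u = (qp[0] * s[1] - qp[1] * s[0]) / denom
            v = (qp[0] * r[1] - qp[1] * r[0]) / denom
            if not (0 < u < 1 and 0 < v < 1):
                continue                       # projected edges do not cross
            ta, tb = _sub(a1, a0), _sub(b1, b0)
            ha = _dot(sigma, a0) + u * _dot(sigma, ta)   # heights along sigma
            hb = _dot(sigma, b0) + v * _dot(sigma, tb)
            over, under = (ta, tb) if ha > hb else (tb, ta)
            total += 1 if _dot(_cross(over, under), sigma) > 0 else -1
    return total

def writhe_monte_carlo(vertices, samples=2000, seed=0):
    """Estimate Wr as the average directional writhe over random directions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(3)]   # normalized: uniform on sphere
        norm = math.sqrt(sum(c * c for c in g))
        total += directional_writhe(vertices, tuple(c / norm for c in g))
    return total / samples

loop = [(0, 0, 0), (2, 2, 1), (2, 0, 1), (0, 2, 1)]   # hypothetical test polygon
print(directional_writhe(loop, (0.0, 0.0, 1.0)))      # -1: one negative crossing seen from +z
print(writhe_monte_carlo(loop))                       # Monte Carlo estimate of Wr
```

Viewing the same loop from -z gives the same value, as it should, since Wr(C, σ) = Lk(C, C + εσ) does not care which side of the projection plane the viewer is on.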

Figure 6 shows regular projections of two knots: for the oriented projection of the
trefoil knot (left) the projected writhing number is 3, while for the oriented projection of
the figure-eight knot (right) it is 0.

Figure 6

The writhing number Wr of a closed space curve γ can be calculated using
generalized Gauss integrals:

$$Wr(\gamma) = \frac{1}{4\pi}\iint_{\gamma\times\gamma\setminus D} w(t_1,t_2)\,dt_1\,dt_2,$$

where

$$w(t_1,t_2) = \frac{[\gamma'(t_1),\ \gamma(t_1)-\gamma(t_2),\ \gamma'(t_2)]}{|\gamma(t_1)-\gamma(t_2)|^3}$$

and D is the diagonal of γ × γ. The numerator of $w(t_1,t_2)$ is the triple scalar product
$[\gamma'(t_1), \gamma(t_1)-\gamma(t_2), \gamma'(t_2)] = \gamma'(t_1)\cdot\{[\gamma(t_1)-\gamma(t_2)]\times\gamma'(t_2)\}$, which is also
equal to the oriented volume of the parallelepiped spanned by $\gamma'(t_1)$, $\gamma(t_1)-\gamma(t_2)$
and $\gamma'(t_2)$. Exchanging $t_1$ and $t_2$ swaps the first and last vectors (which changes the
sign of the volume) and reverses the middle vector (which changes it back), so
$w(t_1,t_2) = w(t_2,t_1)$. Assuming that γ is parametrized by [0,1], it therefore suffices to
calculate the integral on the domain $\Delta^2 = \{(t_1,t_2) : 0 < t_1 < t_2 < 1\}$. If

$$I_{(1,2)} = \int_{\Delta^2} w(t_1,t_2)\,dt_1\,dt_2,$$

then

$$Wr(\gamma) = \frac{1}{2\pi}\, I_{(1,2)}.$$

Another measure for curves is the average crossing number, defined by taking the
absolute value of the integrand:

$$I_{|1,2|}(\gamma) = \int_{\Delta^2} |w(t_1,t_2)|\,dt_1\,dt_2.$$

The main difference between the knots considered above and the space curves
representing protein backbones is that knots are simple closed curves, while protein
backbones are polygonal curves that are not closed.

1.3.2 Natural Notion of the Writhing Number for Polygonal Curves

For a polygonal curve the natural definition of the writhing number is

$$I_{(1,2)}(\gamma) = Wr(\gamma) = \sum_{0<i_1<i_2<N} W(i_1,i_2),$$

with

$$W(i_1,i_2) = \frac{1}{2\pi}\int_{t_1=i_1}^{i_1+1}\int_{t_2=i_2}^{i_2+1} w(t_1,t_2)\,dt_1\,dt_2$$

and $w(t_1,t_2) = [\gamma'(t_1), \gamma(t_1)-\gamma(t_2), \gamma'(t_2)] / |\gamma(t_1)-\gamma(t_2)|^3$.

Here $W(i_1,i_2)$ is the contribution to the writhing number coming from the $i_1$-th
and the $i_2$-th line segments. It equals the probability that, seen from a random
direction, the $i_1$-th and the $i_2$-th line segments cross, multiplied by the sign of this
crossing. Thus, geometrically, this notion of writhing number is still the projected
writhing number averaged over all projections.
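One way to evaluate $W(i_1,i_2)$ exactly is through its interpretation as a signed probability of crossing: the set of directions from which two straight segments appear to cross has a solid angle that can be written in closed form. The Python sketch below uses the closed-form expression published by Klenin and Langowski; this is our choice of method, not necessarily the one used in (12), (13) or (17). Signs follow the linking integral of Definition 1, so a positive crossing contributes positively. The function names are hypothetical, and the curve is treated as an open polygon with segments numbered from 0.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _unit(v):
    n = math.sqrt(_dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)

def segment_pair_w(p1, p2, p3, p4):
    """W for the segments p1->p2 and p3->p4: signed solid angle divided by 2*pi.

    The solid angle uses the Klenin-Langowski closed form; the sign is the common
    sign of the crossings seen from the directions in which the segments overlap.
    """
    r12, r34 = _sub(p2, p1), _sub(p4, p3)
    r13, r14 = _sub(p3, p1), _sub(p4, p1)
    r23, r24 = _sub(p3, p2), _sub(p4, p2)
    orientation = _dot(_cross(r34, r12), r13)
    if abs(orientation) < 1e-12:
        return 0.0               # coplanar or parallel segments: the integrand vanishes
    n1 = _unit(_cross(r13, r14))
    n2 = _unit(_cross(r14, r24))
    n3 = _unit(_cross(r24, r23))
    n4 = _unit(_cross(r23, r13))
    omega = sum(math.asin(max(-1.0, min(1.0, _dot(a, b))))
                for a, b in ((n1, n2), (n2, n3), (n3, n4), (n4, n1)))
    return math.copysign(omega, orientation) / (2 * math.pi)

def writhe_matrix(vertices):
    """Symmetric table W[i][j] = W(i, j) for the segments of an open polygon."""
    segs = list(zip(vertices[:-1], vertices[1:]))
    m = len(segs)
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 2, m):     # adjacent segments are coplanar, so W = 0
            W[i][j] = W[j][i] = segment_pair_w(*segs[i], *segs[j])
    return W

def writhe_and_acn(vertices):
    """Return (I_(1,2), I_|1,2|): the sums of W and of |W| over pairs i < j."""
    W = writhe_matrix(vertices)
    pairs = [W[i][j] for i in range(len(W)) for j in range(i + 1, len(W))]
    return sum(pairs), sum(abs(w) for w in pairs)

path = [(0, 0, 0), (2, 2, 1), (2, 0, 1), (0, 2, 1)]   # hypothetical 3-segment path
wr, acn = writhe_and_acn(path)
print(wr, acn)   # one non-adjacent pair, crossing negatively: wr = -acn < 0
```

When two segments almost intersect, their W approaches ±1, matching the statement that they are then seen to cross from almost every direction.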

By combining these numbers we can form a whole family of structural measures, e.g.

$$I_{|1,2|}(\gamma) = \sum_{0<i_1<i_2<N} |W(i_1,i_2)|,$$

$$I_{|1,3|(2,4)}(\gamma) = \sum_{0<i_1<i_2<i_3<i_4<N} |W(i_1,i_3)|\,W(i_2,i_4),$$

$$I_{|1,5|(2,4)(3,6)}(\gamma) = \sum_{0<i_1<i_2<i_3<i_4<i_5<i_6<N} |W(i_1,i_5)|\,W(i_2,i_4)\,W(i_3,i_6),$$

where N is the number of vertices of the polygonal curve.

Numbers like these will constitute the building blocks for our protein domain
descriptors, which are described in the next section.
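The sums above follow a common pattern: choose increasing segment indices, pair them up, and multiply the corresponding W values, taking absolute values for some pairs. The Python sketch below evaluates any such measure by brute-force enumeration from a precomputed table W (with W[i][j] = W(i, j) and segments numbered from 0). It is meant only to make the index pattern concrete: its cost grows like m^(2k) for m segments and k pairs, so it is practical only for short curves. The function name and the toy table are hypothetical.

```python
from itertools import combinations

def gauss_measure(W, pairing, absolute=frozenset()):
    """Brute-force value of a generalized Gauss measure.

    W        -- square table with W[i][j] = W(i, j) for segments i < j (0-based).
    pairing  -- index pairs such as ((1, 3), (2, 4)) for I_(1,3)(2,4).
    absolute -- the pairs whose factor enters as |W|, e.g. {(1, 3)} for I_|1,3|(2,4).
    """
    k = 2 * len(pairing)
    total = 0.0
    # combinations() yields index tuples in increasing order: i1 < i2 < ... < ik.
    for idx in combinations(range(len(W)), k):
        term = 1.0
        for a, b in pairing:
            w = W[idx[a - 1]][idx[b - 1]]
            term *= abs(w) if (a, b) in absolute else w
        total += term
    return total

# Hypothetical 4-segment table (symmetric, zero on and next to the diagonal).
W = [[0.0, 0.0, -0.5, 0.1],
     [0.0, 0.0, 0.0, -0.2],
     [-0.5, 0.0, 0.0, 0.0],
     [0.1, -0.2, 0.0, 0.0]]
print(gauss_measure(W, ((1, 2),)))                   # I_(1,2)
print(gauss_measure(W, ((1, 3), (2, 4))))            # I_(1,3)(2,4): 0.1
print(gauss_measure(W, ((1, 3), (2, 4)), {(1, 3)}))  # I_|1,3|(2,4): -0.1
```

With four segments there is only one increasing 4-tuple, so each fourth-order measure reduces to a single product, which makes the pairing pattern easy to check by hand.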

1.4 Representing Proteins in $\mathbb{R}^{20}$

As mentioned before, the protein backbone is a space curve (see figure 7 below).
We are interested in absolute measures of the geometry of these curves, obtained by
studying the self-crossings seen in planar projections. These measures are inspired by
the generalized Gauss integrals that appear in formulas for the Vassiliev knot invariants.

Figure 7. Backbone curve of Lysozyme from Gallus gallus, from (3).

For each protein domain in CATH 2.4, we consider the polygonal curve connecting its
α-carbon atoms and compute geometric invariants of this curve. Each domain is assigned
a 20-dimensional vector containing the following measures:

$I_{(1,2)}$, $I_{|1,2|}$, $I_{(1,3)(2,4)}$, $I_{(1,2)(3,4)}$, $I_{(1,4)(2,3)}$, $I_{(1,2)(3,4)(5,6)}$, $I_{(1,2)(3,5)(4,6)}$, $I_{(1,2)(3,6)(4,5)}$,
$I_{(1,3)(2,4)(5,6)}$, $I_{(1,3)(2,5)(4,6)}$, $I_{(1,3)(2,6)(4,5)}$, $I_{(1,4)(2,3)(5,6)}$, $I_{(1,4)(2,5)(3,6)}$, $I_{(1,4)(2,6)(3,5)}$,
$I_{(1,5)(2,3)(4,6)}$, $I_{(1,5)(2,4)(3,6)}$, $I_{(1,5)(2,6)(3,4)}$, $I_{(1,6)(2,3)(4,5)}$, $I_{(1,6)(2,4)(3,5)}$, and $I_{(1,6)(2,5)(3,4)}$.

The measures are normalized such that each value is between –1 and 1. The
normalization factors are one over 146, 1277, 119, 101 023, 1206, 477 989, 6612, 23 946,
6448, 203, 1884, 54 581, 172, 258, 1246, 293, 1396, 36 143, 442, and 2468 respectively
for the measures in the order above.

Once each protein chain is mapped onto a point in the 20-dimensional space, the
usual Euclidean metric is used to compare the protein chains:

$$d(x,y) = \sqrt{\sum_{i=1}^{20} (x_i - y_i)^2}.$$

Because it is computed on the scaled measures described above, this metric is called the
Scaled Gauss Metric (SGM).
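A minimal sketch of the scaling and of the SGM in Python follows. The 20 normalization factors are the ones quoted above, in the same order; the raw measure values in the example are hypothetical placeholders rather than data for a real protein domain, and the function names are our own.

```python
import math

# Normalization factors quoted above, in the order in which the 20 measures are listed.
SCALE = [146, 1277, 119, 101023, 1206, 477989, 6612, 23946, 6448, 203,
         1884, 54581, 172, 258, 1246, 293, 1396, 36143, 442, 2468]

def scaled_vector(raw):
    """Multiply each raw Gauss measure by one over its normalization factor."""
    if len(raw) != len(SCALE):
        raise ValueError("expected 20 raw measures")
    return [value / factor for value, factor in zip(raw, SCALE)]

def sgm(x, y):
    """Scaled Gauss Metric: Euclidean distance between two scaled vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical raw measures: the two "domains" differ only in the first measure.
raw_1 = [146.0] + [0.0] * 19
raw_2 = [0.0] * 20
print(sgm(scaled_vector(raw_1), scaled_vector(raw_2)))   # 1.0
```

Because the comparison is a plain Euclidean distance between fixed-length vectors, no alignment or gap penalty enters, and the triangle inequality holds automatically.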

1.4.1 Results of the SGM when Tested for CATH 2.4

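Let x, y and z be points in $\mathbb{R}^{20}$. Then the Scaled Gauss Metric satisfies the three
properties of a pseudometric (on protein structures it is only a pseudometric, since
distinct curves, such as any two planar curves, can map to the same point):

i) d(x,y) = 0 if x = y
ii) d(x,y) = d(y,x) (symmetry)
iii) d(x,z) ≤ d(x,y) + d(y,z) (triangle inequality).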

The fact that SGM satisfies the triangle inequality is important because it allows
us to judge dissimilarity between proteins.

A computer algorithm (12, 13, 17) based on this metric was used to classify all
20,937 domains of CATH 2.4 (as of September 2002). The total success rate was 98.6%.
The remaining 1.4% of the chains were flagged as unknown; of these, 0.9% are actually
new folds. The method made no mistakes, since unknown structures were flagged instead
of being misclassified. In addition, proteins of different sizes can be compared directly,
without alignment or gap penalties. Figure 8 shows a projection map from $\mathbb{R}^{20}$ to $\mathbb{R}^{2}$
that displays the CATH hierarchy; every point represents a protein domain in CATH.

As described by the authors (12), the rectangle in the upper left contains all the
chains in CATH, colored according to their class (α, β, αβ and few secondary
structures); notice that the αβ group lies between the α and the β groups. This
observation shows the agreement between the automatic classification produced by the
SGM and the class assignments currently given in the CATH database.

Figure 9 shows the usefulness of the second-order invariants. In this example the
curves A and B possess the same crossing number and average crossing number;
however, the second-order invariants can distinguish between the two curves.

2. The Experimental Plan

2.1 Purpose and Objectives

The excellent results of the SGM shown in the previous section, obtained with an
elegant, fast and computationally viable method, motivate one to understand the true
geometric meaning of these measures.

As mentioned before, the geometric meaning of all these measures is still not
fully understood (12-13). While there is a geometric interpretation of the writhing
number ($I_{(1,2)}$) and the average crossing number ($I_{|1,2|}$), the meaning of the higher
order measures is still a mystery. Another important question is whether it is possible to
classify protein structure domains with fewer Gauss measures (described in 1.4): if some
of the measures are strongly correlated, or some provide more information than others, it
may be possible to improve the combinations used. Finally, it might be possible to apply
this method to the classification of RNA secondary structures.

In this research I intend, with the support of my advisor, De Witt Sumners, to
complete the following objectives:

I) Determine the geometric meaning of the higher order invariants obtained
from the Gauss integral measures. Such work will validate the importance
of the role of these numbers and corroborate the excellent results obtained
from experimental evidence.

II) Optimize the choice of the invariants used to classify the protein
structures. Selecting the best shape descriptors, and the minimum number
of such descriptors needed, will increase the speed and efficiency of the
computer algorithms that classify protein structures.

III) Study the mathematical ideas involved in these numbers and their possible
applications to branches of mathematics such as Knot Theory and
Differential Geometry.

IV) Explore the possibility of applying these methods to the classification
of RNA secondary structures. Since an RNA secondary structure can be
seen as a chain or a polygonal curve, an approach to this unexplored topic
could lead to promising new applications of mathematics in biology.

The research questions are as follows:

Are the numbers obtained from the higher order writhe calculations truly shape
descriptors of space curves? Or are they just numbers chosen by chance that work only
for very particular curves?

The answer to these questions will unveil the true geometric meaning of these higher
order invariants. This is fundamental to validate the automatic classification computer
method for novel protein structure domains.

2.2 Procedures

The research will be based on mathematics and on biology as described below.

To begin with, we will review the older literature on the writhing number, such as
the work of J. H. White (18), G. Călugăreanu (19), and F. Brock Fuller (15-16), as well
as the recent literature on the writhing number for open and closed curves (20-28). A
study of the proofs and methods used for the primary cases should provide clues for
solving the general case of the higher order invariants.

Another fundamental step is to review the current computer algorithms designed to
calculate the writhing number, particularly those applied in fields such as biology and
physics (27). Some of these computer algorithms are in the public domain and can be
downloaded (28).

An algorithm to compute the writhing number is essential to understand and verify
the geometric ideas. Using Monte Carlo simulations, we intend to estimate the writhing
number of polygonal curves with n edges in the simple cubic lattice. The advantage of
the simple cubic lattice is that, for a closed curve, the writhing number computation
reduces to averaging the linking numbers of the curve with four of its pushoffs (24). The
next step would be to study the higher order invariants on the simple cubic lattice.

To verify the simulation results we will consider some examples. We will first
consider simple cases where we know the answer, and then apply these methods to
polygonal curves describing the backbones of some protein crystal structures. Such data
can be obtained from the Protein Data Bank (3).

Finally, we would like to apply this method to RNA secondary structures. A
ribonucleic acid (RNA) molecule consists of a chain of ribonucleotides linked together by
covalent chemical bonds (29). Figure 10 shows a model of an RNA structure obtained
from the Protein Data Bank. RNA structures like the one in figure 10 can be seen as
chains that bend and twine about themselves. Such self-crossings are of particular interest
because the Gauss measures, designed to describe the shape of proteins, can also be
applied to these chains.

With these approaches we expect to understand the geometric meaning of these higher
order invariants.

Figure 10. Pseudoknot within the gene 32 messenger RNA
of Bacteriophage T2. Image obtained from the Protein Data Bank (3).

References

(1) Gale Rhodes. Crystallography Made Crystal Clear. Academic Press, 2000,
Second Edition.

(2) Joseph P. Hornak. The Basics of NMR. <http://www.cis.rit.edu/htbooks/nmr/>.

(3) Protein Databank, Available from <http://beta.rcsb.org/pdb/>.

(4) CATH Protein Structure Classification <http://www.biochem.ucl.ac.uk/bsm/cath/>.

(5) Pearl, F. M. G., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P.,
Thornton, J. M. and Orengo, C. A. Assigning Genomic Sequences to CATH.
Nucleic Acids Research. 2000, Vol 28. No 1. 277-282.

(6) Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: A Structural
Classification of Proteins Database for the Investigation of Sequences and
Structures. J. Mol. Biol. 1995, 247:536-540.

(7) Patrice Koehl, Protein Structure Similarities. Curr. Opin. Struct. Biol. 2001,
11:348-353.

(8) CE Combinatorial Extension <http://cl.sdsc.edu>, available to download from
<ftp://ftp.sdsc.edu/pub/sdsc/biology/CE/src>.

(9) DALI Distance Matrix Alignment <http://www2.ebi.ac.uk/dali>, available to
download from <http://jura.ebi.ac.uk:8765/~holm/DaliLite>.

(10) KENOBI Alignment Using a Genetic Algorithm
<http://sullivan.bu.edu/kenobi>, available to download from
<http://www.columbia.edu/~ay1>.

(11) STRUCTAL Double Dynamic Programming
<http://bioinfo.mbb.yale.edu/align/server.cgi>.

(12) Peter Rogen, Boris Fain. Automatic Classification of Protein Structure by Using
Gauss Integrals. PNAS, Vol 100 (2003), no.1, 119-124.

(13) Peter Rogen, Henrik Bohr. A New Family of Global Protein Shape Descriptors.
Math Biosc 182 (2003), 167-181.

(14) Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B., and
Thornton, J. M. CATH: A Hierarchic Classification of Protein Domain Structures.
Structure. Vol 5 (1997), No 8. 1093-1108.

(15) F. Brock Fuller, The Writhing Number of a Space Curve. Proc. Nat. Acad. Sci.
USA, Vol. 68, No. 4 (1971), 815-819.

(16) F. Brock Fuller, Mathematical Problems in the Biological Sciences, Proceedings of
Symposia in Applied Mathematics, ed. R. E. Bellman (American Mathematical
Society, Providence) Vol. 14 (1962), 64-68.

(17) Peter Rogen, Robert Sinclair. Computing a New Family of Shape Descriptors for
Protein Structures. J. Chem. Inf. Comput. Sci. 43 (2003), 1740-1747.

(18) J. H. White, Self-Linking and the Gauss Integral in Higher Dimensions. Am. J.
Math. 91 (1969), 693-727.

(19) G. Călugăreanu, Sur les Classes d'Isotopie des Noeuds Tridimensionnels et Leurs
Invariants, Czechoslovak Mathematical Journal 11 (1961), 588-625.

(20) Lin, X-S, Wang, Z. Integral Geometry of Plane Curves and Knot Invariants. J.
Differ. Geom. 44 (1996), 74-95.

(21) Yu. Aminov, Differential Geometry and Topology of Curves, Gordon and Breach
Science Publishers (2000).

(22) Eric S. Lander, Michael Waterman, Calculating the Secrets of Life, National
Research Council (1995).

(23) Levitt group Server, <http://www.stanford.edu/~bfain/>.

(24) E. Orlandini, M. C. Tesi, E. J. Janse van Rensburg, D. W. Sumners, S. G.
Whittington, The Writhe of a Self-avoiding Polygon, J. Phys. A: Math. Gen. 26
(1993), 981-986.

(25) E. Orlandini, S. G. Whittington, D. W. Sumners, M. C. Tesi, E. J. Janse van
Rensburg, The Writhe of a Self-avoiding Path, J. Phys. A: Math. Gen. 27 (1994),
333-338.

(26) Meivys Garcia, Emmanuel Ilangko, Stuart G. Whittington, The Writhe of Polygons
on the Face-centered Cubic Lattice, J. Phys. A: Math. Gen. 32 (1999), 4593-4600.

(27) Corinne Cerf, Andrzej Stasiak, A Topological Invariant to Predict the three-
dimensional Writhe of Ideal Configurations of Knots and Links, PNAS Vol. 97
(2000), 3795-3798.

(28) Pankaj K. Agarwal, Herbert Edelsbrunner, Yusu Wang, Computing the Writhing
Number of a Polygonal Knot, SODA, (2002), 791-799.

(29) RNA World at IMB Jena: <http://www.imb-jena.de/RNA.html>.
