A family of global protein shape descriptors using gauss integrals, christian laing
  • 1. The Florida State University, College of Arts and Sciences. A Family of Global Protein Shape Descriptors Using Gauss Integrals. By Christian Edgar Laing Celestino. A proposal submitted to the Department of Mathematics in partial fulfillment of the doctoral preliminary examination. April 30, 2004.
  • 2. Table of Contents
Abstract
1 Background and Significance
    1.1 CATH Protein Structure Classification
    1.2 Current Methods and Importance of a New Approach
    1.3 The Writhing Number
        1.3.1 Directional Writhing Number
        1.3.2 Natural Notion of the Writhing Number for Polygonal Curves
    1.4 Representing Proteins in $\mathbb{R}^{20}$
        1.4.1 Results of the SGM when Tested for CATH 2.4
2 The Experimental Plan
    2.1 Purpose and Objectives
    2.2 Procedures
References
  • 3. Abstract
Within the field of biology, the comparison, description and prediction of biological structures is an important task. In the case of proteins, it is of great interest to characterize, and therefore classify, their three-dimensional structures. Protein structures can be classified in a variety of interrelated ways, such as by functional similarity, evolutionary similarity and fold similarity. Two similar proteins can have different sequences, but comparing their structures can reveal distant evolutionary relationships that would not be evident from sequence information alone. A protein's three-dimensional structure also provides clues to its function in living organisms.
Protein classification focuses on identifying proteins that have similar chemical architectures and topology. Because it is not practical to study in detail every protein structure in every genome, the functional role of a new protein in the cell can be inferred from an already classified protein with a similar structure. This is why it is important to develop new methods for classifying 3D protein structures.
Today, a great amount of protein information is obtained from experimental methods such as X-ray crystallography (1) and NMR spectroscopy (2). The data are deposited in public-domain resources such as the Protein Data Bank (3). Structural classifications of proteins, such as CATH (Class, Architecture, Topology and Homologous Superfamily; see 4-5) and SCOP (Structural Classification of Proteins; see 6), are also available as databases. However, some of these classifications are made by manual inspection. Because of the rapid increase in the number of known protein structures (as of April 2004, 25,004, growing by more than 450 per month (3)), a fully automatic method (using solely computer algorithms) is required.
Currently there are several computer methods for the structural comparison of proteins (7). Examples are CE (8), DALI (9), KENOBI (10) and STRUCTAL (11).
Such methods are also in the public domain, and in some cases the program itself is available for download. These structural comparison methods are based on computing pairwise distances between the alpha-carbon atoms of the proteins, but they present several complications. First, they are computationally expensive, because they require an alignment between two molecules in order to compare them. Additionally, the measures they use violate the triangle inequality, $d(x, z) \le d(x, y) + d(y, z)$. Consequently, these computations have little meaning for proteins that are far apart, that is, for proteins whose structures are very dissimilar. Because of these complications, a different approach is needed.
Peter Rogen and Boris Fain, in the group of Michael Levitt at Stanford University, have developed a new automatic classification of proteins using Gauss integrals: a vector of 20 numbers, inspired by Vassiliev knot invariants, that captures the topology of a protein (12), (13).
  • 4. These 20 numbers are built from the writhing number, a geometric tool from the theory of space curves, and from higher order generalizations of it (generalized Gauss integrals). This work is still in progress, and it has shown good results when tested on the CATH 2.4 protein database, correctly classifying 98.6% of the protein crystal structure data used. The authors leave an interesting point open (12): "While we have geometric interpretation of the writhing number we would like to understand the other generalized Gauss integrals used in this work". We intend to investigate and answer this question.
  • 5. 1. Background and Significance
Proteomics is the study of the full set of proteins encoded by a genome, and Structural Proteomics is the sub-area of Proteomics that studies the structure of proteins. So far, many genomes have been fully sequenced, including those of yeast, Drosophila melanogaster and Homo sapiens. The full value of the sequence data will be realized when we assign the role of each protein in the cell, and this requires a full set of tools for classifying proteins: computer databases such as CATH and structure comparison methods such as DALI, for example.
1.1 CATH Protein Structure Classification
CATH is a hierarchical classification of protein domain structures in the Protein Data Bank (3) that clusters proteins at four levels: Class (C), Architecture (A), Topology (T) and Homologous Superfamily (H).
This classification operates at the level of structural domains, as these domains are likely to be the fundamental evolutionary building blocks (5). When a new protein is similar to a protein already in the database, it inherits the domain boundaries of the existing entry. If the new protein has no relative in the CATH database, three different algorithms (DETECTIVE, PUU and DOMAK) are used to identify its structural domains automatically. If all the programs agree, the domain boundaries are assigned; if not, the domain boundaries are assigned manually based on the rules below (see also 14). The four levels of CATH are described below, and Figure 1 shows the hierarchy for the C, A and T levels. References for CATH can be found in (4), (5) and (14).
    • Class (C) level: assigned by considering the secondary structure content and packing within the structure. Four classes are recognized: mainly alpha, mainly beta, alpha-beta, and a fourth class containing protein domains with low secondary structure content. More than 90% of protein structures are assigned to a class automatically; the rest are assigned by hand.
    • Architecture (A) level: describes the overall arrangement of secondary structures in three-dimensional space but ignores their connectivity. Although an automatic procedure is being developed, this level is currently assigned manually using a simple description of the secondary structure arrangement (e.g. roll, sandwich).
    • Topology (T) level: groups structures into fold families depending on both the overall shape and the connectivity of the secondary structures. A fold family can contain protein domains that are structurally similar but have no sequence similarity. The assignments are made by sequence and structure comparison (an SSAP score greater than 70 is required) (5).
  • 6.
    • Homologous Superfamily (H) level: groups together domains that are thought to share a common ancestor (homologous families), either because of high sequence similarity (at least 35% sequence identity) or because of high structural similarity combined with at least 20% sequence identity. Structural similarity is assessed by an automatic method (SSAP score above 80).
Figure 1. Hierarchy of CATH at the C, A and T levels. From reference (4).
1.2 Current Methods and Importance of a New Approach
To find similarities between 3D protein structures in the crystal state, scientists have built a wide variety of protein structure alignment methods, using techniques such as distance matrix alignment (9), genetic algorithms (10) and double dynamic programming (11). The general idea is to consider the backbones of two proteins as two chains, A and B, in three-dimensional space, and to find a sub-chain $\alpha$ of A and a sub-chain $\beta$ of B such that $\alpha$ and $\beta$ have equal length, are similar, and are as long as possible with this property (see Figure 2).
  • 7. The most common parameter expressing the difference between two protein structures is the root mean square deviation (RMSD). RMSD is computed from the positions of the alpha-carbon atoms of the protein backbones: it is the square root of the mean squared distance between the atoms of one structure and the corresponding atoms of the other, $\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{k=1}^{n} |x_k - y_k|^2}$, after the two structures have been optimally superposed.
Because of the nature of these methods, we encounter some complications:
    • A protein structure can contain hundreds of atoms, so finding such alignments can be computationally expensive, whereas a structural comparison method needs to be fast.
    • As discussed in the introduction, these methods fail to satisfy the triangle inequality. Consider three proteins with the following sequences: protein A = DEF-LMN, protein B = GHI-LMN and protein C = GHI-OPQ. Proteins A and B are similar in the LMN region, and proteins B and C are similar in the GHI region. However, we cannot infer any similarity between A and C (see Figure 3). Since $d(A, B)$ and $d(B, C)$ are both small while $d(A, C)$ is large, the triangle inequality $d(A, C) \le d(A, B) + d(B, C)$ is violated. When this occurs we are unable to judge dissimilarity, and the problem worsens with increasing distance.
    • To compute such measures, the methods require a series of adjustable parameters, such as gap and insertion penalties and weights.
Figure 2. Two chains in three-dimensional space. Figure 3. Failure of the triangle inequality. From reference (12).
These complications motivate the search for a better, more efficient and fully automated method. The protein backbone is a space curve, and mathematicians study such curves in areas such as Knot Theory and Differential Geometry; we wish to apply these mathematical techniques to the protein classification problem.
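As a minimal sketch of the two complications above (Python with NumPy, which the proposal itself does not use), the block below makes both concrete. `kabsch_rmsd` computes RMSD after optimal superposition with the Kabsch algorithm, assuming the atom-to-atom correspondence is already known; finding that correspondence is the expensive alignment step and is not shown. `block_distance` is a hypothetical toy distance, not one used by any of the cited methods, that treats two sequences as identical whenever they share one aligned block; on the A, B, C example it reproduces the triangle-inequality failure.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (n, 3) arrays of corresponding atoms after the
    optimal rigid superposition of P onto Q (Kabsch algorithm)."""
    P = P - P.mean(axis=0)                     # remove translations
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # -1 would mean a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal proper rotation
    diff = P @ R.T - Q                         # rotated P minus Q, row by row
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def block_distance(x: str, y: str) -> int:
    """Hypothetical toy distance: 0 if the sequences share any block at the
    same position (a local match), otherwise 1."""
    shared = any(a == b for a, b in zip(x.split("-"), y.split("-")))
    return 0 if shared else 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(10, 3))
    c, s = np.cos(0.7), np.sin(0.7)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])   # rigid motion of P
    print(kabsch_rmsd(P, Q))                   # ~0 up to rounding

    A, B, C = "DEF-LMN", "GHI-LMN", "GHI-OPQ"
    dAB, dBC, dAC = block_distance(A, B), block_distance(B, C), block_distance(A, C)
    print(dAB, dBC, dAC, dAC <= dAB + dBC)     # 0 0 1 False
```

Any distance that rewards the best local match behaves like `block_distance` here: two small distances do not bound the third.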
  • 8. 1.3 The Writhing Number
We start with the concepts of linking number and twist. These two numbers, together with the writhing number, are related by a simple formula. These concepts are taken from (15) and (16). A strip $(C, U)$ is a smooth¹ curve $C$ together with a smoothly varying unit vector $U(t)$ perpendicular to $C$ at each point.
Definition 1. If $C_1(t_1)$ and $C_2(t_2)$ are two disjoint oriented closed curves in space parametrized by $[0,1]$, the linking number is defined by the integral
$$Lk(C_1, C_2) = \frac{1}{4\pi} \int_0^1 \int_0^1 \frac{\left(C_1(t_1) - C_2(t_2)\right) \cdot \left(\partial C_1/\partial t_1 \times \partial C_2/\partial t_2\right)}{|C_1(t_1) - C_2(t_2)|^3}\, dt_1\, dt_2.$$
The linking number is an integer that measures the entanglement between two curves. Examples are shown in Figure 4; notice that Figure 4c shows two curves that are entangled even though their linking number is zero.
Figure 4. From reference (16).
For any simple closed strip, the curves $C + \varepsilon U$, given parametrically by $C(t) + \varepsilon U(t)$, are, for sufficiently small $\varepsilon > 0$, simple closed curves disjoint from $C$, and the linking number $Lk(C, C + \varepsilon U)$ is defined and independent of $\varepsilon$. The vectors $\dot C(t)$, $U(t)$ and $V(t) = \dot C(t) \times U(t)$ define a moving frame $(\dot C, U, V)$ along $C$. Let $\Omega$ denote the angular velocity vector describing the rate of rotation of the frame with respect to the arclength $t$, so that $\ddot C = \Omega \times \dot C$, $\dot U = \Omega \times U$ and $\dot V = \Omega \times V$. Let $\omega_1$, $\omega_2$ and $\omega_3$ be the components of $\Omega$ in the moving frame, i.e., $\Omega = \omega_1 \dot C + \omega_2 U + \omega_3 V$. Then $\omega_1$ is the angular rate at which $U$ revolves around $C$; it is called the twist of the strip at each point of the curve.
¹ A curve $C$ is smooth if it is infinitely differentiable.
  • 9. Definition 2. The total twist $Tw(C, U)$ is the integral of $\omega_1$ with respect to arclength over the curve $C$, divided by $2\pi$:
$$Tw(C, U) = \frac{1}{2\pi} \int_C \omega_1\, dt.$$
The total twist need not be an integer, and if $C$ is a simple plane curve then the linking number $Lk(C, C + \varepsilon U)$ and the total twist $Tw(C, U)$ are equal.
Definition 3. The difference $Wr(C) = Lk(C, C + \varepsilon U) - Tw(C, U)$ does not depend on the choice of $U$; it is a geometric invariant of the curve $C$, called the writhing number.
1.3.1 Directional Writhing Number
Definition 4. A smooth simple closed curve $C$ and a fixed unit vector $\sigma$ are said to be in general position if the tangents to $C$ are never parallel to $\sigma$. In this case the curves $C + \varepsilon\sigma$ are disjoint from $C$ for all sufficiently small $\varepsilon > 0$, so for such $\varepsilon$ we may define the directional writhing number of $C$ in the direction $\sigma$ by $Wr(C, \sigma) = Lk(C, C + \varepsilon\sigma)$.
If $C$ and $\sigma$ are in general position, the orthogonal projection of $C$ onto a plane with normal $\sigma$ defines a smooth closed plane curve $C_\sigma$ for which undercrossings and overcrossings can be distinguished at each crossing point (see Figure 5). A crossing point $c$ of an oriented regular diagram has one of two possible configurations, $\mathrm{sign}(c) = +1$ or $\mathrm{sign}(c) = -1$, as shown in Figure 5; the sign of a crossing follows the right-hand rule convention.
Figure 5.
If one adds the signs of all crossings of a fixed regular projection of a curve in the direction $\sigma$, one obtains the directional writhing number $Wr(C, \sigma)$.
  • 10. The writhing number $Wr$ of a curve $C$ is equal to the average of the directional writhing number over all projection directions, the average being taken with respect to area on the unit sphere. Figure 6 shows regular projections of two knots: for the oriented projection of the trefoil knot (left) the projected writhing number is 3, while for the oriented projection of the figure-eight knot (right) it is 0.
Figure 6.
The writhing number of a closed space curve $\gamma$ can be calculated using generalized Gauss integrals:
$$Wr(\gamma) = \frac{1}{4\pi} \iint_{\gamma \times \gamma \setminus D} w(t_1, t_2)\, dt_1\, dt_2, \qquad w(t_1, t_2) = \frac{[\gamma'(t_1), \gamma(t_1) - \gamma(t_2), \gamma'(t_2)]}{|\gamma(t_1) - \gamma(t_2)|^3},$$
where $D$ is the diagonal of $\gamma \times \gamma$. The numerator of $w(t_1, t_2)$ is the triple scalar product
$$[\gamma'(t_1), \gamma(t_1) - \gamma(t_2), \gamma'(t_2)] = \gamma'(t_1) \cdot \{[\gamma(t_1) - \gamma(t_2)] \times \gamma'(t_2)\},$$
which is also the oriented volume of the parallelepiped spanned by $\gamma'(t_1)$, $\gamma(t_1) - \gamma(t_2)$ and $\gamma'(t_2)$. Exchanging $t_1$ and $t_2$ swaps the first and third vectors and negates the middle one; each change flips the sign of the triple product, so $w(t_1, t_2) = w(t_2, t_1)$. Hence, assuming $\gamma$ is parametrized by $[0,1]$, it suffices to integrate over half of the domain, $\Delta^2 = \{(t_1, t_2) : 0 < t_1 < t_2 < 1\}$. If $I_{(1,2)} = \int_{\Delta^2} w(t_1, t_2)\, dt_1\, dt_2$, then
$$Wr(\gamma) = \frac{1}{2\pi} I_{(1,2)}.$$
Another measure for curves is the average crossing number, defined by taking the absolute value of the integrand:
$$I_{|1,2|}(\gamma) = \int_{\Delta^2} |w(t_1, t_2)|\, dt_1\, dt_2.$$
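The Gauss integral of Definition 1 can be checked numerically. The sketch below is a minimal illustration (Python with NumPy; `linking_number` and `unit_circle` are hypothetical names). Each closed curve is replaced by a fine polygon, and the double integral is replaced by a midpoint sum over pairs of edges. This discretization is our own choice, not a method from (15) or (16). For two unit circles forming a Hopf link the result should be close to ±1, and close to 0 once the circles are pulled apart. The sign depends on the orientations.

```python
import numpy as np


def linking_number(a: np.ndarray, b: np.ndarray) -> float:
    """Approximate Lk(a, b) for two disjoint closed polygons given as (n, 3)
    arrays of vertices; the edge from the last vertex back to the first is
    implied. Each edge contributes through its midpoint and edge vector:
        Lk ~ 1/(4 pi) * sum_ij (m_i - m_j) . (da_i x db_j) / |m_i - m_j|^3
    """
    da = np.roll(a, -1, axis=0) - a              # edge vectors of a
    db = np.roll(b, -1, axis=0) - b              # edge vectors of b
    ma, mb = a + 0.5 * da, b + 0.5 * db          # edge midpoints
    r = ma[:, None, :] - mb[None, :, :]          # C1(t1) - C2(t2), all pairs
    cross = np.cross(da[:, None, :], db[None, :, :])
    integrand = np.sum(r * cross, axis=-1) / np.linalg.norm(r, axis=-1) ** 3
    return float(integrand.sum() / (4.0 * np.pi))


def unit_circle(t: np.ndarray, center, plane: str) -> np.ndarray:
    """Hypothetical helper: unit circle in the 'xy' or 'xz' plane."""
    zero = np.zeros_like(t)
    pts = (np.stack([np.cos(t), np.sin(t), zero], axis=1) if plane == "xy"
           else np.stack([np.cos(t), zero, np.sin(t)], axis=1))
    return pts + np.asarray(center, dtype=float)


if __name__ == "__main__":
    t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
    A = unit_circle(t, (0, 0, 0), "xy")
    B = unit_circle(t, (1, 0, 0), "xz")   # threads through A: a Hopf link
    C = unit_circle(t, (4, 0, 0), "xz")   # far away: unlinked
    print(round(linking_number(A, B), 3))  # close to +1 or -1
    print(round(linking_number(A, C), 3))  # close to 0
```

Reversing the orientation of one curve flips the sign of the result. This is consistent with the linking number being defined for oriented curves.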
  • 11. The main difference between knot projections and the space curves representing protein backbones is that for knots we deal with simple closed curves, while protein backbones are polygonal curves that are not closed.
1.3.2 Natural Notion of the Writhing Number for Polygonal Curves
For a polygonal curve, the natural definition of the writhing number is
$$I_{(1,2)}(\gamma) = Wr(\gamma) = \sum_{0 < i_1 < i_2 < N} W(i_1, i_2),$$
with
$$W(i_1, i_2) = \frac{1}{2\pi} \int_{t_1 = i_1}^{i_1 + 1} \int_{t_2 = i_2}^{i_2 + 1} w(t_1, t_2)\, dt_1\, dt_2$$
and $w(t_1, t_2) = [\gamma'(t_1), \gamma(t_1) - \gamma(t_2), \gamma'(t_2)] / |\gamma(t_1) - \gamma(t_2)|^3$. Here $W(i_1, i_2)$ is the contribution to the writhing number coming from the $i_1$-th and the $i_2$-th line segments. It equals the probability that the $i_1$-th and $i_2$-th line segments are seen to cross when viewed from a random direction, multiplied by the sign of this crossing. Thus, geometrically, this notion of writhing number is still the projected writhing number averaged over all projections.
By combining these numbers we can build a whole set of structural measures, e.g.
$$I_{|1,2|}(\gamma) = \sum_{0 < i_1 < i_2 < N} |W(i_1, i_2)|,$$
$$I_{|1,3|(2,4)}(\gamma) = \sum_{0 < i_1 < i_2 < i_3 < i_4 < N} |W(i_1, i_3)|\, W(i_2, i_4),$$
$$I_{|1,5|(2,4)(3,6)}(\gamma) = \sum_{0 < i_1 < i_2 < i_3 < i_4 < i_5 < i_6 < N} |W(i_1, i_5)|\, W(i_2, i_4)\, W(i_3, i_6),$$
where $N$ is the number of vertices of the polygonal curve. Numbers like these will constitute the building blocks for our protein domain descriptors, which are described in the next section.
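The polygonal definition lends itself to direct computation. The sketch below uses Python with NumPy and is an illustration only, not the algorithm of (17). It approximates each $W(i_1, i_2)$ by Gauss-Legendre quadrature of the integrand $w$ over the two segments. On each segment the tangent $\gamma'$ is the constant edge vector. Adjacent segments are skipped because they lie in a common plane, where the triple product vanishes. Segments are indexed from 0 in the code. The quadrature order 16 is an arbitrary assumption. It is accurate for well-separated segments but degrades when two segments nearly touch, where exact closed-form segment-pair formulas would be preferable.

```python
import numpy as np


def segment_pair_W(p0, p1, q0, q1, order=16):
    """Approximate W(i1, i2) for segments [p0, p1] and [q0, q1]:
    (1 / 2 pi) * integral over both unit parameter intervals of
    w = [g'(t1), g(t1) - g(t2), g'(t2)] / |g(t1) - g(t2)|^3."""
    x, wts = np.polynomial.legendre.leggauss(order)
    s, ws = 0.5 * (x + 1.0), 0.5 * wts        # map nodes/weights to [0, 1]
    dp, dq = p1 - p0, q1 - q0                 # constant tangents on each segment
    P = p0 + s[:, None] * dp                  # quadrature points on segment 1
    Q = q0 + s[:, None] * dq                  # quadrature points on segment 2
    r = P[:, None, :] - Q[None, :, :]         # g(t1) - g(t2), shape (order, order, 3)
    triple = np.einsum("k,ijk->ij", dp, np.cross(r, dq))  # dp . (r x dq)
    w = triple / np.linalg.norm(r, axis=-1) ** 3
    return float(ws @ w @ ws / (2.0 * np.pi))


def writhe_matrix(coords, order=16):
    """Symmetric matrix of W(i, j) over the segments of an open polygon
    whose vertices are the rows of `coords` (e.g. C-alpha positions)."""
    coords = np.asarray(coords, dtype=float)
    m = len(coords) - 1                       # number of segments
    W = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 2, m):             # adjacent segments contribute 0
            W[i, j] = W[j, i] = segment_pair_W(coords[i], coords[i + 1],
                                               coords[j], coords[j + 1], order)
    return W


def writhe_and_acn(coords):
    """Return (I_(1,2), I_|1,2|) = (sum of W, sum of |W|) over pairs i < j."""
    W = writhe_matrix(coords)
    upper = W[np.triu_indices(len(W), k=1)]
    return float(upper.sum()), float(np.abs(upper).sum())
```

As a sanity check, two perpendicular segments of length 2 that cross at their midpoints at height 1 subtend a solid angle of $2\pi/3$, so $W = 1/3$. A planar chain gives $W = 0$ for every pair, and mirroring a chain negates its writhe while leaving $I_{|1,2|}$ unchanged.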
  • 12. 1.4 Representing Proteins in $\mathbb{R}^{20}$
As mentioned before, the protein backbone is a space curve (see Figure 7). We are interested in absolute measures of the geometry of these curves, obtained by studying the self-crossings seen in planar projections. These measures are inspired by the generalized Gauss integrals that appear in formulas for the Vassiliev knot invariants.
Figure 7. Backbone curve of lysozyme from Gallus gallus, from (3).
For each protein domain in CATH 2.4, the measures are computed on the polygonal curve connecting the $\alpha$-carbon atoms, and each domain is assigned a 20-dimensional vector containing the following measures:
$I_{(1,2)}$, $I_{|1,2|}$, $I_{(1,3)(2,4)}$, $I_{(1,2)(3,4)}$, $I_{(1,4)(2,3)}$, $I_{(1,2)(3,4)(5,6)}$, $I_{(1,2)(3,5)(4,6)}$, $I_{(1,2)(3,6)(4,5)}$, $I_{(1,3)(2,4)(5,6)}$, $I_{(1,3)(2,5)(4,6)}$, $I_{(1,3)(2,6)(4,5)}$, $I_{(1,4)(2,3)(5,6)}$, $I_{(1,4)(2,5)(3,6)}$, $I_{(1,4)(2,6)(3,5)}$, $I_{(1,5)(2,3)(4,6)}$, $I_{(1,5)(2,4)(3,6)}$, $I_{(1,5)(2,6)(3,4)}$, $I_{(1,6)(2,3)(4,5)}$, $I_{(1,6)(2,4)(3,5)}$ and $I_{(1,6)(2,5)(3,4)}$.
The measures are normalized so that each value lies between $-1$ and $1$. The normalization factors are one over 146, 1277, 119, 101,023, 1206, 477,989, 6612, 23,946, 6448, 203, 1884, 54,581, 172, 258, 1246, 293, 1396, 36,143, 442 and 2468, respectively, for the measures in the order above.
Once each protein chain is mapped to a point in 20-dimensional space, the usual Euclidean metric is used to compare the protein chains.
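The notation $I_{(a,b)(c,d)\dots}$ can be read as a recipe. Sum over increasing segment indices $i_1 < i_2 < \dots$, and multiply the factors $W(i_a, i_b)$, taking an absolute value where the subscript has bars. The sketch below evaluates any such measure by brute force from a matrix of $W(i_1, i_2)$ values, however that matrix was computed. It then scales the 20 measures, in the order listed above, by the stated normalization factors. The encoding of each measure as `(pairs, absolute)` is a hypothetical convention of this sketch. The brute-force sum costs $O(N^6)$ for third-order measures, which is only practical for short chains; reference (17) is devoted to computing these descriptors efficiently.

```python
import itertools

import numpy as np

# Each measure is (pairs, absolute): pairs are 1-based positions in an
# increasing index tuple i1 < i2 < ..., and `absolute` lists the pairs whose
# W factor is taken in absolute value. Order follows the list in Section 1.4.
MEASURES = [
    ([(1, 2)], ()), ([(1, 2)], {(1, 2)}),
    ([(1, 3), (2, 4)], ()), ([(1, 2), (3, 4)], ()), ([(1, 4), (2, 3)], ()),
    ([(1, 2), (3, 4), (5, 6)], ()), ([(1, 2), (3, 5), (4, 6)], ()),
    ([(1, 2), (3, 6), (4, 5)], ()), ([(1, 3), (2, 4), (5, 6)], ()),
    ([(1, 3), (2, 5), (4, 6)], ()), ([(1, 3), (2, 6), (4, 5)], ()),
    ([(1, 4), (2, 3), (5, 6)], ()), ([(1, 4), (2, 5), (3, 6)], ()),
    ([(1, 4), (2, 6), (3, 5)], ()), ([(1, 5), (2, 3), (4, 6)], ()),
    ([(1, 5), (2, 4), (3, 6)], ()), ([(1, 5), (2, 6), (3, 4)], ()),
    ([(1, 6), (2, 3), (4, 5)], ()), ([(1, 6), (2, 4), (3, 5)], ()),
    ([(1, 6), (2, 5), (3, 4)], ()),
]
SCALE = np.array([146, 1277, 119, 101023, 1206, 477989, 6612, 23946, 6448, 203,
                  1884, 54581, 172, 258, 1246, 293, 1396, 36143, 442, 2468],
                 dtype=float)


def gauss_measure(W, pairs, absolute=()):
    """Brute-force sum over i1 < ... < i_2k of the product of W(i_a, i_b)."""
    total = 0.0
    for idx in itertools.combinations(range(W.shape[0]), 2 * len(pairs)):
        term = 1.0
        for a, b in pairs:
            value = W[idx[a - 1], idx[b - 1]]
            term *= abs(value) if (a, b) in absolute else value
        total += term
    return total


def descriptor_vector(W):
    """Scaled 20-dimensional descriptor from a symmetric W matrix."""
    raw = np.array([gauss_measure(W, p, a) for p, a in MEASURES])
    return raw / SCALE
```

With every $W$ set to 1, a measure built from $k$ index pairs counts the increasing $2k$-tuples of segments, i.e. $\binom{N-1}{2k}$ for $N - 1$ segments. This gives a quick check of the indexing.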
  • 13. $$d(x, y) = \sqrt{\sum_{i=1}^{20} (x_i - y_i)^2}$$
Because it uses the scaling factors given above, this metric is called the Scaled Gauss Metric (SGM).
1.4.1 Results of the SGM when Tested for CATH 2.4
Let $x$, $y$ and $z$ be points in $\mathbb{R}^{20}$. Then the Scaled Gauss Metric satisfies the three properties of a pseudometric:
i) $d(x, y) = 0$ if $x = y$;
ii) $d(x, y) = d(y, x)$ (symmetry);
iii) $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).
(On the descriptor vectors themselves $d$ is an ordinary metric; as a distance between proteins it is only a pseudometric, because two different chains may in principle map to the same vector.)
The fact that the SGM satisfies the triangle inequality is important because it allows us to judge dissimilarity between proteins. A computer algorithm (12, 13, 17) based on this metric was used to classify all 20,937 domains of CATH 2.4 (as of September 2002), with a total success rate of 98.6%. The remaining 1.4% of the chains are unknown; of these, 0.9% are actually new folds. The method made no mistakes, since unknown structures were flagged instead of being misclassified. In addition, proteins of different sizes can be compared directly, without alignment or gap penalties. Figure 8 shows a projection map from $\mathbb{R}^{20}$ to $\mathbb{R}^2$ that reflects the CATH hierarchy; every point represents a protein domain in CATH.
As described by the authors (12), the rectangle in the upper left contains all the chains in CATH, colored according to their class ($\alpha$, $\beta$, $\alpha\beta$ and few secondary structures); notice that the $\alpha\beta$ group lies between the $\alpha$ and the $\beta$ groups. This observation shows the agreement between the automatic classification produced by the SGM and the current CATH assignments.
Figure 9 shows the usefulness of the second order invariants. In this example the curves A and B possess the same crossing number and average crossing number; however, the second order invariants can differentiate between the two curves.
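The SGM itself is the Euclidean norm of the difference between two scaled descriptor vectors. The minimal sketch below (Python with NumPy) computes it. `nearest_label` is a hypothetical illustration of how a true metric combined with a rejection threshold lets unfamiliar structures be flagged as "unknown" rather than forced into a class. It is not the classification procedure of (12, 13, 17), and the threshold is an assumed parameter.

```python
import numpy as np


def sgm_distance(x, y) -> float:
    """Euclidean distance between two scaled 20-dimensional descriptors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


def nearest_label(query, references, labels, max_distance):
    """Hypothetical nearest-neighbour rule with a rejection threshold:
    return the label of the closest reference descriptor, or "unknown"
    if even the closest one is farther than max_distance."""
    refs = np.asarray(references, dtype=float)
    d = np.sqrt(np.sum((refs - np.asarray(query, dtype=float)) ** 2, axis=1))
    best = int(np.argmin(d))
    return labels[best] if d[best] <= max_distance else "unknown"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y, z = rng.uniform(-1.0, 1.0, size=(3, 20))
    print(sgm_distance(x, z) <= sgm_distance(x, y) + sgm_distance(y, z))  # True
```

Because the triangle inequality holds, a query that is far from every reference is also far from everything close to those references. This is what makes a single distance threshold meaningful.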
  • 14. Figure 8. From reference (12). Figure 9. From reference (12).
  • 15. 2. The Experimental Plan
2.1 Purpose and Objectives
The SGM results shown in the previous section are excellent, and the method is elegant, fast and computationally viable; this motivates us to understand the true geometric meaning of such measures.
As mentioned before, the geometric meaning of these measures is still not fully understood (12-13). While there is a geometric interpretation of the writhing number ($I_{(1,2)}$) and of the average crossing number ($I_{|1,2|}$), the meaning of the higher order measures is still a mystery. Another important question is whether protein structure domains can be classified with fewer of the Gauss measures described in Section 1.4: some of the measures may be strongly correlated, others may provide more information, and it may be possible to improve the combination used. Finally, it may be possible to apply this method to the classification of RNA secondary structures.
In this research I intend, with the support of my advisor, De Witt Sumners, to complete the following objectives:
    I) Determine the geometric meaning of the higher order invariants obtained from the Gauss integral measures. This work will validate the role of these numbers and explain the excellent results obtained empirically.
    II) Optimize the choice of the invariants used to classify protein structures. Selecting the best shape descriptors, and the minimum number of them needed, will increase the speed and efficiency of the classification algorithms.
    III) Study the mathematical ideas behind these numbers and their possible applications to branches of mathematics such as Knot Theory and Differential Geometry.
    IV) Explore the application of these methods to the classification of RNA secondary structures. Since an RNA secondary structure can be seen as a chain, or polygonal curve, work on this unexplored topic could lead to promising new applications of mathematics in biology.
The research questions are as follows:
  • 16. Are the numbers obtained from the higher order writhe calculations truly shape descriptors of space curves? Or are they just numbers chosen by chance that work only for very particular curves? The answers to these questions will reveal the true geometric meaning of the higher order invariants, which is fundamental to validating the automatic classification method for novel protein structure domains.
2.2 Procedures
The research will draw on both mathematics and biology, as described below.
To begin with, we will review the classical literature on the writhing number, such as the work of J. H. White (18), G. Călugăreanu (19) and F. Brock Fuller (15-16), as well as the recent literature on the writhing number of open and closed curves (20-28). Studying the proofs and methods used for the first-order cases may provide clues for the general case of the higher order invariants.
Another fundamental source of information is current computer algorithms designed to calculate the writhing number, particularly as applied in fields such as biology and physics (27). Some of these algorithms are in the public domain and can be downloaded (28).
An algorithm to compute the writhing number is essential to understand and verify the geometric ideas. Using Monte Carlo simulations, we intend to estimate the writhing number of polygonal curves of length n in the simple cubic lattice. The advantage of the simple cubic lattice is that, for a closed curve, computing the writhing number reduces to averaging the linking numbers of the curve with four of its pushoffs (24). The next step will be to study the higher order invariants on the simple cubic lattice.
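The following is a minimal Monte Carlo sketch of the lattice experiment. It rests on assumptions that go beyond the proposal. Walks are open self-avoiding walks on $\mathbb{Z}^3$, generated by simple rejection sampling, which samples n-step self-avoiding walks uniformly but becomes very slow for long walks. The writhe of each walk is computed with the closed-form solid-angle formula for segment pairs due to Klenin and Langowski. The overall sign is chosen to agree with the integrand $w$ of Section 1.3. The pushoff method of (24), which applies to closed lattice polygons, is not implemented here, and all function names are hypothetical.

```python
import numpy as np

STEPS = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                  [0, -1, 0], [0, 0, 1], [0, 0, -1]])


def random_saw(n_steps, rng, max_tries=100_000):
    """Uniform random n-step self-avoiding walk on Z^3 by simple sampling:
    grow a random walk and restart whenever it revisits a site."""
    for _ in range(max_tries):
        path = [(0, 0, 0)]
        visited = {(0, 0, 0)}
        for _ in range(n_steps):
            step = STEPS[rng.integers(6)]
            site = tuple(int(c) for c in np.add(path[-1], step))
            if site in visited:
                break                          # self-intersection: reject
            visited.add(site)
            path.append(site)
        else:
            return np.array(path, dtype=float)
    raise RuntimeError("too many rejections; use a smaller n_steps")


def pair_omega(p1, p2, p3, p4):
    """Signed solid-angle term for segments p1->p2 and p3->p4; divided by
    2 pi it gives that pair's contribution W(i1, i2) to the writhe."""
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    faces = [np.cross(r13, r14), np.cross(r14, r24),
             np.cross(r24, r23), np.cross(r23, r13)]
    norms = [np.linalg.norm(f) for f in faces]
    if min(norms) < 1e-12:                     # degenerate: points coplanar
        return 0.0
    n = [f / nf for f, nf in zip(faces, norms)]
    omega = sum(np.arcsin(np.clip(np.dot(n[k], n[(k + 1) % 4]), -1.0, 1.0))
                for k in range(4))
    # Orientation factor chosen so the sign agrees with w(t1, t2) in 1.3.
    return float(omega * np.sign(np.dot(np.cross(p2 - p1, p4 - p3), r13)))


def writhe(coords):
    """Writhe of an open polygon (rows of coords are vertices)."""
    m = len(coords) - 1
    total = 0.0
    for i in range(m):
        for j in range(i + 2, m):              # skip adjacent segments
            total += pair_omega(coords[i], coords[i + 1], coords[j], coords[j + 1])
    return total / (2.0 * np.pi)


def mean_abs_writhe(n_steps, n_samples, seed=0):
    """Monte Carlo estimate of the mean |Wr| over random n-step SAWs."""
    rng = np.random.default_rng(seed)
    return float(np.mean([abs(writhe(random_saw(n_steps, rng)))
                          for _ in range(n_samples)]))


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(n, mean_abs_writhe(n, 200))
```

Many lattice segment pairs are coplanar, and these contribute exactly zero. For longer walks, a growth method such as the pivot algorithm would replace rejection sampling.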
To verify the simulation results, we will consider some examples: first simple cases where the answer is known, and then polygonal curves describing the backbones of some crystallized proteins. Such data can be obtained from the Protein Data Bank (3).
Finally, we would like to apply this method to RNA secondary structures. A ribonucleic acid (RNA) molecule consists of a chain of ribonucleotides linked together by covalent chemical bonds (29). Figure 10 shows a model of an RNA structure obtained from the Protein Data Bank. RNA structures such as the one in Figure 10 can be seen as chains that bend and twine about themselves. Such self-crossings are of particular interest because the Gauss measures, designed to describe the shape of proteins, can be applied to these chains as well.
  • 17. With these approaches we expect to understand the geometric meaning of the higher order invariants.
Figure 10. Pseudoknot within the gene 32 messenger RNA of bacteriophage T2. Image obtained from the Protein Data Bank (3).
  • 18. References
(1) Gale Rhodes. Crystallography Made Crystal Clear. Academic Press, 2000, Second Edition.
(2) Joseph P. Hornak. The Basics of NMR. <http://www.cis.rit.edu/htbooks/nmr/>.
(3) Protein Data Bank. Available from <http://beta.rcsb.org/pdb/>.
(4) CATH Protein Structure Classification. <http://www.biochem.ucl.ac.uk/bsm/cath/>.
(5) Pearl, F. M. G., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P., Thornton, J. M. and Orengo, C. A. Assigning Genomic Sequences to CATH. Nucleic Acids Research, Vol. 28 (2000), No. 1, 277-282.
(6) Murzin, A. G., Brenner, S. E., Hubbard, T. and Chothia, C. SCOP: A Structural Classification of Proteins Database for the Investigation of Sequences and Structures. J. Mol. Biol. 247 (1995), 536-540.
(7) Patrice Koehl. Protein Structure Similarities. Curr. Opin. Struct. Biol. 11 (2001), 348-353.
(8) CE Combinatorial Extension. <http://cl.sdsc.edu>, available for download from <ftp://ftp.sdsc.edu/pub/sdsc/biology/CE/src>.
(9) DALI Distance Matrix Alignment. <http://www2.ebi.ac.uk/dali>, available for download from <http://jura.ebi.ac.uk:8765/~holm/DaliLite>.
(10) KENOBI Alignment Using a Genetic Algorithm. <http://sullivan.bu.edu/kenobi>, available for download from <http://www.columbia.edu/~ay1>.
(11) STRUCTAL Double Dynamic Programming. <http://bioinfo.mbb.yale.edu/align/server.cgi>.
(12) Peter Rogen, Boris Fain. Automatic Classification of Protein Structure by Using Gauss Integrals. PNAS, Vol. 100 (2003), No. 1, 119-124.
(13) Peter Rogen, Henrik Bohr. A New Family of Global Protein Shape Descriptors. Math. Biosci. 182 (2003), 167-181.
(14) Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B. and Thornton, J. M. CATH: A Hierarchic Classification of Protein Domain Structures. Structure, Vol. 5 (1997), No. 8, 1093-1108.
  • 19.
(15) F. Brock Fuller. The Writhing Number of a Space Curve. Proc. Natl. Acad. Sci. USA, Vol. 68 (1971), No. 4, 815-819.
(16) F. Brock Fuller. Mathematical Problems in the Biological Sciences. Proceedings of Symposia in Applied Mathematics, ed. R. E. Bellman (American Mathematical Society, Providence), Vol. 14 (1962), 64-68.
(17) Peter Rogen, Robert Sinclair. Computing a New Family of Shape Descriptors for Protein Structures. J. Chem. Inf. Comput. Sci. 43 (2003), 1740-1747.
(18) J. H. White. Self-Linking and the Gauss Integral in Higher Dimensions. Am. J. Math. 91 (1969), 693-727.
(19) G. Călugăreanu. Sur les classes d'isotopie des noeuds tridimensionnels et leurs invariants. Czechoslovak Mathematical Journal 11 (1961), 588-625.
(20) X.-S. Lin, Z. Wang. Integral Geometry of Plane Curves and Knot Invariants. J. Differential Geom. 44 (1996), 74-95.
(21) Yu. Aminov. Differential Geometry and Topology of Curves. Gordon and Breach Science Publishers (2000).
(22) Eric S. Lander, Michael Waterman. Calculating the Secrets of Life. National Research Council (1995).
(23) Levitt group server. <http://www.stanford.edu/~bfain/>.
(24) E. Orlandini, M. C. Tesi, E. J. Janse van Rensburg, D. W. Sumners, S. G. Whittington. The Writhe of a Self-avoiding Polygon. J. Phys. A: Math. Gen. 26 (1993), 981-986.
(25) E. Orlandini, S. G. Whittington, D. W. Sumners, M. C. Tesi, E. J. Janse van Rensburg. The Writhe of a Self-avoiding Path. J. Phys. A: Math. Gen. 27 (1994), 333-338.
(26) Meivys Garcia, Emmanuel Ilangko, Stuart G. Whittington. The Writhe of Polygons on the Face-centered Cubic Lattice. J. Phys. A: Math. Gen. 32 (1999), 4593-4600.
(27) Corinne Cerf, Andrzej Stasiak. A Topological Invariant to Predict the Three-dimensional Writhe of Ideal Configurations of Knots and Links. PNAS, Vol. 97 (2000), 3795-3798.
(28) Pankaj K. Agarwal, Herbert Edelsbrunner, Yusu Wang. Computing the Writhing Number of a Polygonal Knot. SODA (2002), 791-799.
  • 20.
(29) RNA World at IMB Jena. <http://www.imb-jena.de/RNA.html>.
