The document proposes a new method for classifying protein structures using Gauss integrals. It discusses current methods for protein classification that have limitations. The proposal focuses on developing a "family of global protein shape descriptors" using concepts from knot theory, including the writhing number. It aims to provide a fully automated, efficient method for protein structure comparison that overcomes current method limitations.
A Family of Global Protein Shape Descriptors Using Gauss Integrals
By Christian Edgar Laing Celestino
The Florida State University
College of Arts and Sciences
A proposal submitted to the Department of Mathematics in partial fulfillment of
the doctoral preliminary examination
April 30, 2004
Table of Contents
Abstract
1 Background and Significance
1.1 CATH Protein Structure Classification
1.2 Current Methods and Importance of a New Approach
1.3 The Writhing Number
1.3.1 Directional Writhing Number
1.3.2 Natural Notion of the Writhing Number for Polygonal Curves
1.4 Representing Proteins in R^20
1.4.1 Results of the SGM when Tested for CATH 2.4
2 The Experimental Plan
2.1 Purpose and Objectives
2.2 Procedures
References
Abstract
Within the field of biology, the comparison, description, and prediction of biological
structures are important tasks. In the case of proteins, it is of great interest to characterize
and thereby classify their three-dimensional structures. Protein structures can be
classified in a variety of interrelated ways, such as by functional similarity, evolutionary
similarity, and fold similarity. Two similar proteins can have different sequences,
but comparing their structures can reveal distant evolutionary
relationships that would not be evident from sequence information alone. The
three-dimensional structure of a protein also provides clues to its function in living organisms.
Protein classification focuses on identifying proteins that have similar chemical
architectures and topology. Because it is not practical to study in detail all the protein
structures in every genome, the functional role of a new protein in the cell can be inferred
from an already classified protein with a similar structure. This is why it is important to
develop new methods for classifying the 3D structures of proteins.
Today, a large amount of protein information is available from experimental
methods such as X-ray crystallography (1) and NMR spectroscopy (2). The data are
deposited in public-domain resources such as the Protein Data Bank (3). Structural
classifications of proteins, such as CATH (Class, Architecture, Topology and
Homologous Superfamily, see 4-5) and SCOP (Structural Classification of Proteins, see
6), are also available as databases. However, some of these classifications rely on
manual inspection. Because the number of known protein structures is increasing rapidly (as
of April 2004, 25,004 and growing by more than 450 per month (3)), a fully automatic method
(using solely computer algorithms) is required.
Currently there are several computer methods for the structural comparison of
proteins (7). Examples are CE (8), DALI (9), KENOBI (10), and STRUCTAL
(11). These methods are in the public domain, and in some cases the program itself is
available for download. They are based on computing
pairwise distances between the alpha carbon atoms of the proteins, but this approach
presents several complications. First, these methods have a high computational cost
because they require an alignment between two molecules in order to compare the proteins.
Additionally, the measures that are used violate the triangle inequality
(d(x, z) ≤ d(x, y) + d(y, z)). Consequently, these computations have little meaning for
pairs of proteins separated by a large distance, that is, when their structures are far from similar.
Because of these complications, a better and different approach is needed.
Peter Rogen and Boris Fain, in the group of Michael Levitt at Stanford University,
have developed a new automatic classification of proteins using Gauss integrals. A vector
of 20 numbers, inspired by Vassiliev knot invariants, captures the topology of a protein
(12), (13). These 20 numbers are obtained from multiple combinations of a geometrical
tool called the “writhing number”.
This work is still in progress and it has shown good results when it was tested on
a protein database known as CATH 2.4, correctly classifying 98.6% of the protein crystal
structure data used.
The authors leave an interesting point open (12): “While we have geometric
interpretation of the writhing number we would like to understand the other generalized
Gauss integrals used in this work”. We intend to investigate and answer this question.
1. Background and Significance
Proteomics is the study of the full set of proteins encoded by a genome, and
Structural Proteomics is a sub-area of Proteomics that studies the structure of proteins. So
far, many genomes have been fully sequenced, including yeast, Drosophila melanogaster
and Homo sapiens. The full value of the sequence data will be realized when we assign
the role of each protein in the cell, and this requires a full set of tools for the classification of
proteins, such as computer databases like CATH and structural comparison methods like DALI.
1.1 CATH Protein Structure Classification
CATH is a hierarchical classification of protein domain structures in the Protein
Data Bank (3) which clusters proteins at four levels: Class (C), Architecture (A),
Topology (T) and Homologous Superfamily (H).
Such classification operates at the level of structural domains, as these domains
are likely to be the fundamental evolutionary building blocks or units (5). When a protein
has a similarity to another protein already in the database, then the new protein inherits
the domain boundaries of the existing entry. If the new protein has no relative in the
CATH database, three different algorithms (DETECTIVE, PUU and DOMAK) are used
to identify the structural domain automatically. If all the programs agree, the domain
boundaries are assigned. If not, then the domain boundaries are assigned manually based
on the rules below (see also 14). The four levels of CATH are described below, and figure 1
shows the hierarchy for the C, A, and T levels. References for CATH can be found in (4),
(5) and (14).
• Class C level is assigned by considering the secondary structure and packing
within the structure. Four classes are recognized: mainly alpha, mainly beta,
alpha-beta, and a fourth class that contains protein domains with low
secondary structure content. More than 90% of protein structures are assigned to
their class automatically; the rest are assigned by hand.
• Architecture A level describes the overall shape of secondary structures in three-
dimensional space but ignores their connectivity. Although an automatic
procedure is being developed, it is currently assigned manually using a basic
description of the secondary arrangements (e.g. roll, sandwich).
• Topology T level groups structures into fold families depending on the shape and
connectivity of the secondary structures. This fold group is also related to protein
domains that show a similarity in structure but have no sequence similarity. The
assignments are made by sequence and structure comparison (a SSAP score
greater than 70 is required) (5).
• Homologous Superfamily H level groups domains that are thought to share
a common ancestor (homologous families), based on either sequence similarity
(35%) or high structural similarity (20%). Structural similarity is assessed by an
automatic method (SSAP>80).
Figure 1. Hierarchy of CATH at C, A and T levels. From reference (4).
1.2 Current Methods and Importance of a New Approach
In order to find similarity between 3D protein structures in the crystal state,
scientists have built a wide variety of protein structure alignment methods and techniques
such as distance matrix alignment (9), genetic algorithms (10) and double dynamic
programming (11). The general idea is to consider the protein backbone of two proteins
as two chains, A and B in the three dimensional space, and to find sub-chains α and β
of A and B respectively, such that the lengths of the sub chains α and β are equal and
maximal with the property that α and β are similar (see figure 2).
The most common parameter that expresses the difference between two proteins
is RMSD or root mean square deviation. RMSD can be computed using the position of
the alpha carbon atoms of the protein backbone and is a function of the distance between
atoms in one structure and the same atoms in another structure.
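As a minimal sketch of the RMSD computation, the following assumes that the two structures have already been aligned and superimposed, so that atom k in one list corresponds to atom k in the other. The function name `rmsd` and the sample coordinates are illustrative and do not come from any of the cited programs.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed and that point k in
    coords_a corresponds to point k in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical alpha-carbon positions (arbitrary units), for illustration only.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
chain_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(chain_a, chain_b))  # every atom is displaced by 1, so the RMSD is 1.0
```

In practice the expensive part is not this formula but the search for the alignment and superposition that it presupposes.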
Because of the nature of these methods, we encounter some complications:
• A protein structure can contain several hundreds of atoms, therefore finding such
alignments may be high in computational cost. A structural comparison method
needs to be fast.
• As discussed in the introduction, these methods fail to satisfy the triangle
inequality. Consider three proteins made of the following sequences:
protein A=DEF-LMN, protein B=GHI-LMN and protein C=GHI-OPQ.
There is a similarity between proteins A and B in the LMN region, and
a similarity between proteins B and C in the GHI region. However, we cannot infer
a similarity between A and C (see figure 3). The triangle inequality
d(A, C) ≤ d(A, B) + d(B, C) is violated because A and C are each close to B
but far from each other. When this occurs, we are
unable to judge dissimilarity, and the problem worsens with increasing distance.
• In order to compute such measures, the methods require a series of adjustable
parameters such as gap and insertion penalties, weights, etc.
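The A, B, C example in the list above can be made concrete with a toy dissimilarity. This is only a sketch: `toy_distance` is a hypothetical score that looks solely at the best-matching segment, and it is not one of the published alignment measures.

```python
def toy_distance(p, q):
    """Toy dissimilarity: 0 if the proteins share any segment, 1 otherwise.

    This mimics a score computed only on the best-matching sub-chains;
    it is NOT one of the published alignment measures.
    """
    shared = set(p.split("-")) & set(q.split("-"))
    return 0.0 if shared else 1.0

A, B, C = "DEF-LMN", "GHI-LMN", "GHI-OPQ"
d_ab, d_bc, d_ac = toy_distance(A, B), toy_distance(B, C), toy_distance(A, C)
print(d_ab, d_bc, d_ac)     # 0.0 0.0 1.0
print(d_ac <= d_ab + d_bc)  # False: the triangle inequality fails
```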
Figure 2. Two chains in three dimensional space.
Figure 3. Failure of triangle inequality. From reference (12).
These complications motivate the search for a better, more efficient and fully
automated method. The protein backbone is a space curve, and mathematicians study
such curves in areas such as Knot Theory and Differential Geometry. We wish to apply
these mathematical techniques to the protein classification problem.
1.3 The Writhing Number
We start with the concepts of the linking number and the twist. These two numbers,
together with the writhing number, are related by a simple formula. These concepts
are taken from (15) and (16).
A strip (C,U) is a smooth1 curve C together with a smoothly varying unit vector
U(t) perpendicular to C at each point.
Definition 1. If C1(t1) and C2(t2) are two disjoint oriented closed curves in space
parametrized by [0,1], the linking number is defined by the integral
$$
Lk(C_1, C_2) = \frac{1}{4\pi} \int_{C_1}\!\int_{C_2} \frac{\bigl(C_1(t_1) - C_2(t_2)\bigr) \cdot \bigl(\partial C_1/\partial t_1 \times \partial C_2/\partial t_2\bigr)}{|C_1(t_1) - C_2(t_2)|^3}\, dt_1\, dt_2
$$
The linking number is an integer that measures the entanglement between two
curves. Examples of the linking number are shown in figure 4 below. Notice that figure
4c shows an example of two curves that are entangled even though their linking number is
zero.
Figure 4. From reference (16).
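The integral in Definition 1 can be checked numerically. The sketch below approximates it with a midpoint rule for two unit circles. The function name `gauss_linking_number`, the grid size and the test curves are assumptions made for illustration. The sign of the result depends on the chosen orientations, so only its magnitude is meaningful here.

```python
import math

def gauss_linking_number(c1, dc1, c2, dc2, n=200):
    """Approximate Lk(C1, C2) with a midpoint rule on an n x n grid over [0,1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t1 = (i + 0.5) * h
        p1, v1 = c1(t1), dc1(t1)
        for j in range(n):
            t2 = (j + 0.5) * h
            p2, v2 = c2(t2), dc2(t2)
            r = (p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2])
            cross = (v1[1] * v2[2] - v1[2] * v2[1],
                     v1[2] * v2[0] - v1[0] * v2[2],
                     v1[0] * v2[1] - v1[1] * v2[0])
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]) / dist ** 3
    return total * h * h / (4.0 * math.pi)

TWO_PI = 2.0 * math.pi

def circle_xy(t):
    """Unit circle in the xy-plane, centred at the origin."""
    return (math.cos(TWO_PI * t), math.sin(TWO_PI * t), 0.0)

def d_circle_xy(t):
    return (-TWO_PI * math.sin(TWO_PI * t), TWO_PI * math.cos(TWO_PI * t), 0.0)

def make_circle_xz(cx):
    """Unit circle in the xz-plane, centred at (cx, 0, 0), with its derivative."""
    def c(t):
        return (cx + math.cos(TWO_PI * t), 0.0, math.sin(TWO_PI * t))
    def dc(t):
        return (-TWO_PI * math.sin(TWO_PI * t), 0.0, TWO_PI * math.cos(TWO_PI * t))
    return c, dc

linked, d_linked = make_circle_xz(1.0)  # threads through the first circle
apart, d_apart = make_circle_xz(3.0)    # lies entirely outside it

lk_linked = gauss_linking_number(circle_xy, d_circle_xy, linked, d_linked)
lk_apart = gauss_linking_number(circle_xy, d_circle_xy, apart, d_apart)
print(round(lk_linked, 6), round(lk_apart, 6))
```

The first pair forms a simple link and should give a value of magnitude close to 1, while the second pair is separated and should give a value close to 0.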
For any simple closed strip, the curves C + εU, given parametrically by C(t) + εU(t),
are, for sufficiently small ε > 0, simple closed curves disjoint from C, and the linking
number Lk(C, C + εU) is defined and independent of ε. The vectors C'(t), U(t) and
V(t) = C'(t) × U(t) define a moving frame (C', U, V) along C. Let Ω denote the angular
velocity vector describing the rate of rotation of the frame with respect to the arclength t,
so that dC'/dt = Ω × C', dU/dt = Ω × U and dV/dt = Ω × V. Let ω1, ω2 and ω3 be the components of
Ω referred to the moving frame, i.e., Ω = ω1 C' + ω2 U + ω3 V. Then ω1 represents the
angular rate at which U revolves around C; ω1 is called the twist of the strip at each point
of the curve.
1 A curve C is smooth if it is infinitely differentiable.
Definition 2. The total twist number Tw(C,U) is defined as the integral of ω1 with
respect to the arclength t over the curve C, divided by 2π. That is,
$$
Tw(C, U) = \frac{1}{2\pi} \int_C \omega_1 \, dt .
$$
The total twist number need not be an integer. If the curve C is
a simple plane curve, then the linking number Lk(C, C + εU) and the total twist number
Tw(C,U) are equal.
Definition 3. The difference Wr(C) = Lk(C, C + εU) − Tw(C, U) is a geometric invariant
of the curve C and is called the writhing number.
1.3.1 Directional Writhing Number
Definition 4. A smooth simple closed curve C and a fixed unit vector σ are said to be in
general position if the tangents to C are never parallel to σ . In this case the curves
C + εσ are disjoint from C for all sufficiently small ε > 0; hence for such ε we
may define the directional writhing number of C in the direction of σ by
Wr(C, σ) = Lk(C, C + εσ).
If C and σ are in general position, the orthogonal projection of C onto a plane
with normal σ defines a smooth closed plane curve Cσ for which undercrossings and
overcrossings can be distinguished at each crossing point (see figure 5 below). At a
crossing point c of an oriented regular diagram for a curve, we have two possible
configurations: either sign(c)=+1 or sign(c)=−1, as shown in figure 5. The sign of a
crossing is determined by the right hand rule convention.
Figure 5.
If one adds the signs of all the crossings for a fixed regular projection of a
curve along a direction σ, one obtains the directional writhing number Wr(C, σ). The
writhing number Wr of a curve C is equal to the average of the directional writhing
number over all projections, where the average is taken with respect to the area on the unit
sphere.
Figure 6 shows examples of regular projections of two knots. For the oriented
projection of the trefoil knot (left) the directional writhing number is 3, while for
the oriented projection of the figure eight knot (right) it is 0.
Figure 6
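The signed crossing count and its average over directions can be sketched for a closed polygon as follows. This is an illustrative implementation under stated assumptions: the function names, the 60-vertex trefoil parametrization, the sample count and the random seed are all choices made here, and the overall sign depends on the orientation and handedness of the chosen curve.

```python
import math
import random

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def directional_writhe(points, sigma):
    """Sum of crossing signs of a closed polygon seen from the direction sigma."""
    norm = math.sqrt(_dot(sigma, sigma))
    s = tuple(c / norm for c in sigma)
    # Right-handed orthonormal frame (e1, e2, s); the viewer sits at +s.
    helper = (1.0, 0.0, 0.0) if abs(s[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = _cross(helper, s)
    n1 = math.sqrt(_dot(e1, e1))
    e1 = tuple(c / n1 for c in e1)
    e2 = _cross(s, e1)
    proj = [(_dot(p, e1), _dot(p, e2), _dot(p, s)) for p in points]
    n = len(proj)
    total = 0
    for i in range(n):
        a0, a1 = proj[i], proj[(i + 1) % n]
        ax, ay = a1[0] - a0[0], a1[1] - a0[1]
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edges share a vertex
            b0, b1 = proj[j], proj[(j + 1) % n]
            bx, by = b1[0] - b0[0], b1[1] - b0[1]
            denom = ax * by - ay * bx
            if denom == 0.0:
                continue  # parallel projected edges do not cross transversally
            dx, dy = b0[0] - a0[0], b0[1] - a0[1]
            u = (dx * by - dy * bx) / denom  # position of the crossing on edge a
            v = (dx * ay - dy * ax) / denom  # position of the crossing on edge b
            if 0.0 < u < 1.0 and 0.0 < v < 1.0:
                height_a = a0[2] + u * (a1[2] - a0[2])
                height_b = b0[2] + v * (b1[2] - b0[2])
                # Sign of (over tangent x under tangent) along the viewing direction.
                over_x_under = denom if height_a > height_b else -denom
                total += 1 if over_x_under > 0 else -1
    return total

def monte_carlo_writhe(points, samples=100, seed=0):
    """Average the directional writhe over random directions on the unit sphere."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        sigma = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        total += directional_writhe(points, sigma)
    return total / samples

# A 60-vertex polygonal trefoil (a common parametrization, chosen for illustration).
trefoil = [(math.sin(t) + 2 * math.sin(2 * t), math.cos(t) - 2 * math.cos(2 * t), -math.sin(3 * t))
           for t in (2 * math.pi * k / 60 for k in range(60))]
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

print(directional_writhe(trefoil, (0.0, 0.0, 1.0)))  # three crossings of equal sign
print(monte_carlo_writhe(trefoil))                   # Monte Carlo estimate of Wr
print(monte_carlo_writhe(square))                    # a planar convex curve has no crossings
```

Normalized Gaussian samples are used because they are uniformly distributed on the sphere, which matches the area-weighted average in the text.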
The writhing number Wr of a closed space curve γ can be calculated using
generalized Gauss integrals:
$$
Wr(\gamma) = \frac{1}{4\pi} \iint_{\gamma \times \gamma \setminus D} w(t_1, t_2)\, dt_1\, dt_2 ,
$$
where
$$
w(t_1, t_2) = \frac{[\gamma'(t_1),\, \gamma(t_1) - \gamma(t_2),\, \gamma'(t_2)]}{|\gamma(t_1) - \gamma(t_2)|^3}
$$
and D is the diagonal of γ × γ. The numerator of w(t1, t2) is the triple scalar product
$$
[\gamma'(t_1),\, \gamma(t_1) - \gamma(t_2),\, \gamma'(t_2)] = \gamma'(t_1) \cdot \{[\gamma(t_1) - \gamma(t_2)] \times \gamma'(t_2)\} ,
$$
which is also equal to the oriented volume of the parallelepiped spanned by γ'(t1), γ(t1) − γ(t2),
and γ'(t2). Exchanging t1 and t2 swaps the first and last vectors and negates the middle one;
these two sign changes cancel, so w(t1, t2) = w(t2, t1). Assuming that γ is parametrized by [0,1], it
therefore suffices to calculate the integral on the domain ∆2 = {(t1, t2); 0 < t1 < t2 < 1}. If
$$
I_{(1,2)}(\gamma) = \int_{\Delta^2} w(t_1, t_2)\, dt_1\, dt_2
$$
then
$$
Wr(\gamma) = \frac{1}{2\pi} I_{(1,2)}(\gamma) .
$$
Another measure for curves is the average crossing number, which is defined by
taking the absolute value of the integrand:
$$
I_{|1,2|}(\gamma) = \int_{\Delta^2} |w(t_1, t_2)|\, dt_1\, dt_2
$$
The main difference between the projection of a knot and space curves
(representing protein backbones) is that for knots we deal with simple closed curves,
while for protein backbones we have polygonal curves which are not closed.
1.3.2 Natural Notion of the Writhing Number for Polygonal Curves
For a polygonal curve the natural definition of the writhing number is
$$
I_{(1,2)}(\gamma) = Wr(\gamma) = \sum_{0 < i_1 < i_2 < N} W(i_1, i_2),
$$
with
$$
W(i_1, i_2) = \frac{1}{2\pi} \int_{t_1 = i_1}^{i_1 + 1} \int_{t_2 = i_2}^{i_2 + 1} w(t_1, t_2)\, dt_1\, dt_2
$$
and $w(t_1, t_2) = [\gamma'(t_1),\, \gamma(t_1) - \gamma(t_2),\, \gamma'(t_2)] / |\gamma(t_1) - \gamma(t_2)|^3$.
Note that the factor 1/2π is here absorbed into W(i1, i2).
Here W(i1, i2) is the contribution to the writhing number coming from the i1-th
and the i2-th line segments. W(i1, i2) is equal to the probability of seeing, from an arbitrary
direction, the i1-th and the i2-th line segments cross, multiplied by the sign of this
crossing. Thus, geometrically, this notion of writhing number is still the projected writhing
number averaged over all projections.
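The polygonal definition can be evaluated directly by numerical quadrature. The sketch below applies a midpoint rule to each pair of segments. It is a plain discretization of the double integral written for illustration, not the algorithm of (17), and the names `segment_pair_writhe` and `polygonal_writhe`, the subdivision count `m`, and the helix test curve are assumptions.

```python
import math

def segment_pair_writhe(p0, p1, q0, q1, m=8):
    """Midpoint-rule estimate of W(i1, i2) for the segments p0->p1 and q0->q1.

    Each segment is parametrized over a unit interval, so gamma' is simply
    the edge vector.
    """
    a = (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])  # gamma'(t1)
    b = (q1[0] - q0[0], q1[1] - q0[1], q1[2] - q0[2])  # gamma'(t2)
    total = 0.0
    for i in range(m):
        s = (i + 0.5) / m
        x = (p0[0] + s * a[0], p0[1] + s * a[1], p0[2] + s * a[2])
        for j in range(m):
            u = (j + 0.5) / m
            y = (q0[0] + u * b[0], q0[1] + u * b[1], q0[2] + u * b[2])
            r = (x[0] - y[0], x[1] - y[1], x[2] - y[2])
            r_x_b = (r[1] * b[2] - r[2] * b[1],
                     r[2] * b[0] - r[0] * b[2],
                     r[0] * b[1] - r[1] * b[0])
            triple = a[0] * r_x_b[0] + a[1] * r_x_b[1] + a[2] * r_x_b[2]
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += triple / dist ** 3
    return total / (m * m) / (2.0 * math.pi)

def polygonal_writhe(vertices, m=8):
    """Sum W(i1, i2) over all pairs of non-adjacent segments of an open polygon."""
    segments = list(zip(vertices[:-1], vertices[1:]))
    total = 0.0
    for i in range(len(segments)):
        # Adjacent segments are coplanar, so their triple product vanishes.
        for j in range(i + 2, len(segments)):
            total += segment_pair_writhe(*segments[i], *segments[j], m=m)
    return total

# Hypothetical test curves: a short helix, its mirror image, and a planar zigzag.
helix = [(math.cos(0.6 * k), math.sin(0.6 * k), 0.3 * k) for k in range(15)]
mirror = [(x, y, -z) for (x, y, z) in helix]
planar = [(float(k), float(k % 2), 0.0) for k in range(10)]

print(polygonal_writhe(helix), polygonal_writhe(mirror), polygonal_writhe(planar))
```

Mirroring the curve reverses the sign of the writhing number, and a planar curve gives zero, which provides two simple sanity checks for any implementation.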
By combining these numbers we can make a whole set of structural measures, e.g.
$$
I_{|1,2|}(\gamma) = \sum_{0 < i_1 < i_2 < N} |W(i_1, i_2)|,
$$
$$
I_{|1,3|(2,4)}(\gamma) = \sum_{0 < i_1 < i_2 < i_3 < i_4 < N} |W(i_1, i_3)|\, W(i_2, i_4),
$$
$$
I_{|1,5|(2,4)(3,6)}(\gamma) = \sum_{0 < i_1 < i_2 < i_3 < i_4 < i_5 < i_6 < N} |W(i_1, i_5)|\, W(i_2, i_4)\, W(i_3, i_6),
$$
where N is the number of vertices of the polygonal curve.
Numbers like the ones just mentioned will constitute the building blocks for our protein
domain descriptors, which are described in the next section.
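Given a table of segment-pair contributions W(i1, i2), the combined sums can be written as a direct enumeration over increasing index tuples. The following is a naive sketch for illustration only: the function `gauss_measure`, its argument conventions, and the 4-segment table are hypothetical, and indices are 0-based in the code while the pair labels follow the 1-based notation of the text.

```python
from itertools import combinations

def gauss_measure(W, pairs, absolute=()):
    """Sum over i1 < i2 < ... of products of W entries, one factor per index pair.

    W        -- square table with W[i][j] the contribution of segments i < j
    pairs    -- e.g. ((1, 3), (2, 4)) for I_(1,3)(2,4), using the labels of the text
    absolute -- positions in `pairs` whose factor enters as |W|, e.g. (0,) for I_|1,3|(2,4)
    """
    order = 2 * len(pairs)
    total = 0.0
    for idx in combinations(range(len(W)), order):  # idx is strictly increasing
        term = 1.0
        for k, (a, b) in enumerate(pairs):
            value = W[idx[a - 1]][idx[b - 1]]
            term *= abs(value) if k in absolute else value
        total += term
    return total

# Hypothetical 4-segment table (upper triangle only), for illustration.
W = [[0.0, 0.0, 0.5, -0.2],
     [0.0, 0.0, 0.0, -0.4],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]]

writhe = gauss_measure(W, ((1, 2),))                        # I_(1,2)
acn = gauss_measure(W, ((1, 2),), absolute=(0,))            # I_|1,2|
second = gauss_measure(W, ((1, 3), (2, 4)), absolute=(0,))  # I_|1,3|(2,4)
print(writhe, acn, second)
```

The enumeration visits on the order of N^4 tuples for a second order measure and N^6 for a third order one, so it is only practical for very short chains.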
1.4 Representing Proteins in R^20
As mentioned before, the protein backbone is a space curve (see figure 7 below).
We are interested in the absolute measures of the geometry of these curves by studying
the self-crossings seen in a planar projection. These measures are inspired by generalized
Gauss integrals involved in formulas for the Vassiliev knot invariants.
Figure 7. Backbone curve of Lysozyme from Gallus Gallus, from (3).
For each protein domain in CATH 2.4, we compute geometric invariants of the polygonal
curve connecting the α-carbon atoms. Each domain is assigned a 20-dimensional vector
containing the following measures:
$$
\begin{aligned}
& I_{(1,2)},\ I_{|1,2|},\ I_{(1,3)(2,4)},\ I_{(1,2)(3,4)},\ I_{(1,4)(2,3)}, \\
& I_{(1,2)(3,4)(5,6)},\ I_{(1,2)(3,5)(4,6)},\ I_{(1,2)(3,6)(4,5)},\ I_{(1,3)(2,4)(5,6)},\ I_{(1,3)(2,5)(4,6)}, \\
& I_{(1,3)(2,6)(4,5)},\ I_{(1,4)(2,3)(5,6)},\ I_{(1,4)(2,5)(3,6)},\ I_{(1,4)(2,6)(3,5)},\ I_{(1,5)(2,3)(4,6)}, \\
& I_{(1,5)(2,4)(3,6)},\ I_{(1,5)(2,6)(3,4)},\ I_{(1,6)(2,3)(4,5)},\ I_{(1,6)(2,4)(3,5)},\ I_{(1,6)(2,5)(3,4)}.
\end{aligned}
$$
The measures are normalized such that each value is between –1 and 1. The
normalization factors are one over 146, 1277, 119, 101 023, 1206, 477 989, 6612, 23 946,
6448, 203, 1884, 54 581, 172, 258, 1246, 293, 1396, 36 143, 442, and 2468 respectively
for the measures in the order above.
Once each protein chain is mapped onto a point in the 20-dimensional space, the
usual Euclidean metric is used to compare the protein chains:
$$
d(x, y) = \sqrt{\sum_{i=1}^{20} (x_i - y_i)^2}
$$
Because it is built on the scaling factors given above, this metric is called the Scaled
Gauss Metric (SGM).
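The scaling and the distance can be sketched in a few lines. The divisors below are the normalization factors quoted in the text, in the same order as the 20 measures. The function names and the two raw measure vectors are made up for illustration.

```python
import math

# Normalization divisors quoted in the text, in the order of the 20 measures.
SCALE = [146, 1277, 119, 101023, 1206, 477989, 6612, 23946, 6448, 203,
         1884, 54581, 172, 258, 1246, 293, 1396, 36143, 442, 2468]

def normalize(raw):
    """Divide each raw Gauss measure by its scaling factor."""
    if len(raw) != len(SCALE):
        raise ValueError("expected 20 raw measures")
    return [value / s for value, s in zip(raw, SCALE)]

def sgm_distance(x, y):
    """Euclidean distance between two normalized 20-dimensional vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical raw vectors, chosen so that the normalized entries are 0.5 and -0.25.
raw_p = [0.5 * s for s in SCALE]
raw_q = [-0.25 * s for s in SCALE]
p, q = normalize(raw_p), normalize(raw_q)
print(sgm_distance(p, q))  # sqrt(20 * 0.75**2), about 3.354
```

Because every protein is reduced to a fixed-length vector before comparison, the cost of one comparison does not depend on the lengths of the chains.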
1.4.1 Results of the SGM when Tested for CATH 2.4
Let x, y and z be points in R^20. Then the Scaled Gauss Metric satisfies the three
properties of a pseudometric:
i) d(x, y) = 0 if x = y
ii) d(x, y) = d(y, x) (symmetry)
iii) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
The fact that SGM satisfies the triangle inequality is important because it allows
us to judge dissimilarity between proteins.
A computer algorithm (12,13,17) based on this metric was used to classify
all 20,937 domains of CATH 2.4 as of September 2002. The total success rate
was 98.6%. The remaining 1.4% of the chains were flagged as unknown; of these, 0.9% are actually
new folds. The algorithm made no mistakes, since unknown structures were flagged instead of
being misclassified. Also, proteins of different sizes can be compared directly without the use of
alignment or gap penalties. Figure 8 shows a projection map from R^20 to R^2 that
displays the CATH hierarchy. Here, every point represents a protein domain in CATH.
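One simple way to obtain the "flag instead of misclassify" behaviour is a nearest-neighbour rule with a distance cutoff. The sketch below is an assumption made for illustration and is not the published algorithm of (12): the function `classify`, the cutoff value, the labels and the 2-dimensional points (standing in for 20-dimensional SGM vectors) are all hypothetical.

```python
import math

def classify(query, references, cutoff):
    """Return the label of the nearest reference, or None when nothing lies within cutoff."""
    best_label, best_dist = None, math.inf
    for label, point in references:
        dist = math.dist(query, point)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= cutoff else None

# Hypothetical reference points; real SGM vectors would have 20 coordinates.
references = [("fold A", (0.0, 0.0)), ("fold B", (1.0, 1.0))]

print(classify((0.1, 0.0), references, cutoff=0.5))  # fold A
print(classify((5.0, 5.0), references, cutoff=0.5))  # None, i.e. flagged as unknown
```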
As described by the authors (12), the rectangle in the upper left contains all the
chains in CATH, colored according to their class (α, β, αβ and few secondary
structures). Notice that the αβ group resides between the α and the β groups. This
observation shows the agreement between the automatic classification produced
by the SGM and the assignments currently given in the CATH database.
Figure 9 shows the usefulness of the second order invariants. In this example the
curves A and B possess the same crossing number and average crossing number.
However the second order invariants can differentiate between the two curves.
Figure 8. From reference (12).
Figure 9. From reference (12).
2. The Experimental Plan
2.1 Purpose and Objectives
The SGM described in the previous section gives excellent results and is elegant, fast,
and computationally viable. This motivates us to understand the true geometric meaning of
such measures.
As mentioned before, the geometric meaning of all these measures is still not
fully understood (12-13). While there is a geometric interpretation of the writhing
number ( I (1, 2) ) and the average crossing number ( I |1, 2| ), the meaning of the higher order
measures is still a mystery. Another important question worth investigating is
whether it is possible to classify protein structure domains with fewer Gauss measures
than those described in 1.4: whether some of the measures are strongly correlated, whether some
provide more information than others, and whether the combinations used can be improved. Finally, it might be
possible to apply this method to the classification of RNA secondary structures.
During this research proposal I intend, with the support of my advisor, De Witt
Sumners, to complete the following objectives:
I) Determine the geometric meaning of the higher order invariants obtained
from the Gauss integral measures. Such work will validate the importance
of the role of these numbers and corroborate the excellent results obtained
from experimental evidence.
II) Optimize the choice of the invariant numbers used to classify the protein
structures. This will allow an increase of the speed and efficiency of the
computer algorithms to classify the protein structures by selecting the best
shape descriptors, and the minimum quantity necessary of such
descriptors.
III) Study the mathematical idea involved in these numbers and the possible
applications to branches of mathematics such as Knot Theory and
Differential Geometry.
IV) Explore the possibility of application of these methods to the classification
of RNA secondary structures. Since an RNA secondary structure can be
seen as a chain or a polygonal curve, an approach to this unexplored topic
could result in promising and new applications of mathematics in biology.
The research questions are as follows:
Are the numbers obtained by using the higher order writhe calculations truly shape
descriptors of space curves? Or, are they just numbers chosen by chance, that work only
for very particular curves?
The answer to these questions will unveil the true geometric meaning of these higher
order invariants. This is fundamental to validate the automatic classification computer
method for novel protein structure domains.
2.2 Procedures
The research will be based on mathematics and on biology as described below.
To begin with, we will review the older literature related to the writhing
number, such as the work by J. H. White (18), G. Călugăreanu (19), and Brock Fuller (15-
16), as well as the newer literature that also focuses on the concept of writhing number for
open and closed curves (20-28). A study of the proofs and the methods for solving the
primary cases would provide clues for solving the general case for the higher order
invariants.
Another fundamental step is to review current computer algorithms
designed to calculate the writhing number, particularly those applied to fields such as biology
and physics (27). Some of these computer algorithms are in the public domain and can be
downloaded (28).
An algorithm to compute the writhing number is essential to understand and to verify
the geometric ideas. Using Monte Carlo simulations, we intend to estimate the writhing
number of a polygonal curve with n edges in the simple cubic lattice. The advantage of using a
simple cubic lattice is that, for a closed curve, the writhing number
computation reduces to the average of the linking numbers of the given curve with four of its
pushoffs (24). The next step would be to study the higher order invariants on this simple
cubic lattice.
To verify the simulation results we would like to consider some examples.
We will first consider simple cases where we know the answer, and then we will apply
these methods to polygonal curves describing the backbones of some protein crystal structures.
Such data can be obtained from the Protein Data Bank (3).
Finally, we would like to apply this method to RNA secondary structures. A
ribonucleic acid (RNA) molecule consists of a chain of ribonucleotides linked together by
covalent chemical bonds (29). Figure 10 shows a model of an RNA structure obtained
from the Protein Data Bank. We notice that RNA structures, like the one in figure 10, can be
seen as a chain that bends and twines about itself. Such self-crossings are of particular
interest because the Gauss measures, designed to describe the shape of proteins, can be
applied to these chains.
With these approaches we expect to understand the geometric meaning of these higher
order invariants.
Figure 10. Pseudoknot within the gene 32 messenger RNA
of Bacteriophage T2. Image obtained from the Protein Data Bank (3).
References
(1) Gale Rhodes. Crystallography Made Crystal Clear. Academic Press, 2000,
Second Edition.
(2) Joseph P. Hornak. The Basics of NMR. <http://www.cis.rit.edu/htbooks/nmr/>.
(3) Protein Databank, Available from <http://beta.rcsb.org/pdb/>.
(4) CATH Protein Structure Classification <http://www.biochem.ucl.ac.uk/bsm/cath/>.
(5) Pearl, F. M. G., Lee, D., Bray, J. E., Sillitoe, I., Todd, A. E., Harrison, A. P.,
Thornton, J. M. and Orengo, C. A. Assigning Genomic Sequences to CATH.
Nucleic Acids Research. 2000, Vol 28. No 1. 277-282.
(6) Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP : A Structural
Classification of Proteins Database for the Investigation of Sequences and
Structures. J. Mol. Biol. 1995, 247:536-540.
(7) Patrice Koehl, Protein Structure Similarities. Curr. Opin. Struct. Biol. 2001,
11:348-353.
(8) CE Combinatorial Extension <http://cl.sdsc.edu>, available to download from
<ftp://ftp.sdsc.edu/pub/sdsc/biology/CE/src>.
(9) DALI Distance Matrix Alignment <http://www2.ebi.ac.uk/dali>, available to
download from <http://jura.ebi.ac.uk:8765/~holm/DaliLite>.
(10) KENOBI Alignment Using a Genetic Algorithm
<http://sullivan.bu.edu/kenobi>, available to download from
<http://www.columbia.edu/~ay1>.
(11) STRUCTAL Double Dynamic Programming
<http://bioinfo.mbb.yale.edu/align/server.cgi>.
(12) Peter Rogen, Boris Fain. Automatic Classification of Protein Structure by Using
Gauss Integrals. PNAS, Vol 100 (2003), no.1, 119-124.
(13) Peter Rogen, Henrik Bohr. A New Family of Global Protein Shape Descriptors.
Math Biosc 182 (2003), 167-181.
(14) Orengo, C. A., Michie, A. D., Jones, S., Jones, D. T., Swindells, M. B., and
Thornton, J. M. CATH - A Hierarchic Classification of Protein Domain Structures.
Structure. Vol 5 (1997), No 8. 1093-1108.
(15) F. Brock Fuller, The Writhing Number of a Space Curve. Proc. Nat. Acad. Sci.
USA, Vol. 68, No. 4 (1971), 815-819.
(16) F. Brock Fuller, Mathematical Problems in the Biological Sciences, Proceedings of
Symposia in Applied Mathematics, ed. R. E. Bellman (American Mathematical
Society, Providence) Vol. 14 (1962), 64-68.
(17) Peter Rogen, Robert Sinclair. Computing a New Family of Shape Descriptors for
Protein Structures. J. Chem. Inf. Comput. Sci. 43 (2003), 1740-1747.
(18) White J. H., Self-Linking and the Gauss Integral in Higher Dimensions. Am. J.
Math. 91 (1969), 693-727
(19) G. Călugăreanu, Sur les Classes d'Isotopie des Noeuds Tridimensionnels et Leurs
Invariants, Czechoslovak Mathematical Journal 11 (1961), 588-625.
(20) Lin, X-S, Wang, Z. Integral Geometry of Plane Curves and Knot Invariants. J.
Differ. Geom. 44 (1996), 74-95.
(21) Yu. Aminov, Differential Geometry and Topology of Curves, Gordon and Breach
Science Publishers (2000).
(22) Eric S. Lander, Michael Waterman, Calculating the Secrets of Life, National
Research Council (1995).
(23) Levitt group Server, <http://www.stanford.edu/~bfain/>.
(24) E. Orlandini, M. C. Tesi, E. J. Janse van Rensburg, D. W. Sumners, S. G.
Whittington, The Writhe of a Self-avoiding Polygon, J. Phys. A: Math. Gen. 26
(1993), 981-986.
(25) E. Orlandini, S. G. Whittington, D. W. Sumners, M. C. Tesi, E. J. Janse van
Rensburg, The Writhe of a Self-avoiding Path, J. Phys. A: Math. Gen. 27 (1994),
333-338.
(26) Meivys Garcia, Emmanuel Ilangko, Stuart G. Whittington, The Writhe of Polygons
on the Face-centered Cubic Lattice, J. Phys. A: Math. Gen. 32 (1999), 4593-
4600.
(27) Corinne Cerf, Andrzej Stasiak, A Topological Invariant to Predict the three-
dimensional Writhe of Ideal Configurations of Knots and Links, PNAS Vol. 97
(2000), 3795-3798.
(28) Pankaj K. Agarwal, Herbert Edelsbrunner, Yusu Wang, Computing the Writhing
Number of a Polygonal Knot, SODA, (2002), 791-799.
(29) RNA World at IMB Jena: <http://www.imb-jena.de/RNA.html>.