This document proposes a new graph model and algorithm for assigning signals in 3D NMR spectra of RNA molecules. The graph model represents the 3D NMR spectrum as a graph where vertices are cross-peaks and edges represent possible connections between cross-peaks based on their coordinates. The algorithm performs an "assignment walk" through this graph representation to reconstruct pathways of magnetization transfer and assign signals. It was tested on exemplary 3D NMR spectral data and aims to automate the currently bottlenecked process of signal assignment in RNA structure determination.
even disables resonance signal identification on the basis of
two-dimensional experiments. Moving to three-dimensional
spectra is the most evident solution to this problem. In this
paper, we focus on a novel approach to the analysis of
three-dimensional spectra of RNA molecules. We introduce a
graph-based theoretical model to represent the signal
assignment problem in a 3D NMR spectrum. Based on this
model, we present an enumerative algorithm performing an
assignment walk through the graph representation of the
spectral data. Finally, we show the algorithm processing
example three-dimensional spectra and point out directions
for further research.
II. GRAPH MODEL
Among the many three-dimensional NMR experiments, three are
used for sequential assignment: HCP, HSQC-NOESY, and
NOESY-NOESY [15][16]. Each of these experiments is used to
analyze different correlation signals. However, the
assignment procedure is common to all of them. It starts from
the identification of the sequence-specific connectivity
pathway representing magnetization transfer between the
selected nuclei of the analyzed molecule. Consequently, the
H4’-C4’-P signals in the heteronuclear HCP spectrum,
representing the intra- and internucleotide scalar
interactions, form the pathway
(H4’n-C4’n-Pn)-(H4’n-C4’n-Pn+1)-(H4’n+1-C4’n+1-Pn+1)-…,
where n stands for a residue number [16]. Fig. 1 shows an
example of such a track of magnetization transfer within a
single RNA chain, whereas the corresponding pathway in the
HCP spectrum is presented in Fig. 2. HSQC-NOESY is a mixed,
homo- and heteronuclear experiment and is the one most
frequently used for resonance assignment of RNAs. It provides
information about many different interactions, collected in
separate regions of its spectrum. The most meaningful are the
signals forming the following pathways:
(C1’n-H1’n-H8/H6n)-(C1’n-H1’n-H8/H6n+1)-(C1’n+1-H1’n+1-H8/H6n+1)-…, and
(C8/C6n-H8/H6n-H1’n)-(C8/C6n-H8/H6n-H1’n+1)-(C8/C6n+1-H8/H6n+1-H1’n+1)-… [16].
Finally, homonuclear NOESY-NOESY spectra can be used to
reconstruct several magnetization transfer tracks, of which
the crucial one is
(H8/H6n+1-H1’n-H8/H6n)-(H8/H6n+1-H1’n-H8/H6n+1)-(H8/H6n+1-H1’n+1-H8/H6n+1)-... [16].
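The alternation of intra- and internucleotide signals in the HCP pathway can be illustrated with a short Python sketch. The function name hcp_pathway and the string labels are hypothetical and only enumerate the pattern quoted from [16]; whether a terminal phosphorus signal actually appears in a given spectrum is not modeled.

```python
def hcp_pathway(first_residue, last_residue):
    """List the (H4', C4', P) triples of the HCP pathway for consecutive residues."""
    steps = []
    for n in range(first_residue, last_residue + 1):
        # Intranucleotide signal: H4'(n)-C4'(n)-P(n)
        steps.append((f"H4'{n}", f"C4'{n}", f"P{n}"))
        if n < last_residue:
            # Internucleotide signal: H4'(n)-C4'(n)-P(n+1)
            steps.append((f"H4'{n}", f"C4'{n}", f"P{n + 1}"))
    return steps
```

Consecutive triples in the returned list differ either in the phosphorus entry only or in both the H4' and C4' entries, which is the regularity the pathway has to keep.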
A correlation signal recorded during an NMR experiment is
visualized as a cross-peak in the spectrum. Each cross-peak
(signal) is characterized by its location (i.e. three
coordinates, F1, F2, F3), size (width in each dimension), and
intensity. When reconstructing a single connection in the
pathway, one must apply one of the following principles:
- link two cross-peaks having one common coordinate;
- link two cross-peaks having two common coordinates.
The pathway itself should be as long as possible, and the
directions of its consecutive connections must follow a
regular pattern.
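The two linking principles can be sketched in Python as a test on the number of shared coordinates. This is a minimal illustration, not the authors' implementation: the names common_coordinates and can_link, and the tolerance tol used to decide that two coordinates coincide, are assumptions introduced here.

```python
from typing import Tuple

Peak = Tuple[float, float, float]  # (F1, F2, F3) coordinates of a cross-peak


def common_coordinates(a: Peak, b: Peak, tol: float = 0.01) -> int:
    """Count the dimensions in which two cross-peaks coincide within tol."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)


def can_link(a: Peak, b: Peak, tol: float = 0.01) -> bool:
    """Two cross-peaks may be linked if they share one or two coordinates."""
    return common_coordinates(a, b, tol) in (1, 2)
```

Peaks that share no coordinate, or that coincide in all three dimensions, are rejected by can_link.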
Following the above description of the problem, we have
proposed its mathematical model based on graph theory. Let us
denote by DFi(a,b) the direction of an edge between
cross-peaks a and b that have different coordinates in
dimension Fi, and by DFiFj(a,b) the direction of an edge
between cross-peaks a and b that differ in dimensions Fi and
Fj. Now, we can define a spectral graph, representing an
assignment problem in a 3D NMR spectrum:
Fig. 1. Magnetization transfer between H4’, C4’, and P nuclei
observed during 3D HCP experiment for r(ACGU).
Fig. 2. A fragment of simulated 3D HCP spectrum for r(ACGU) with
magnetization transfer pathway between H4’, C4’, and P. F1, F2, and
F3 axes represent chemical shift of these nuclei, respectively.
Definition 1 (spectral graph)
Let G=(V,E) be an undirected graph satisfying the following
conditions:
1) Every vertex v∈V represents one cross-peak from 3D
NMR spectrum S.
2) The number |V| of vertices in graph G equals the number of
cross-peaks in the corresponding spectrum S.
3) Every edge ej∈E, j=1..|E|, is assigned a label
lj∈{0,1,2,3,4,5}, where
\[
l_j\bigl(e_j(v_m, v_n)\bigr) =
\begin{cases}
0 & \text{if } D_{F1}(v_m, v_n) \\
1 & \text{if } D_{F3}(v_m, v_n) \\
2 & \text{if } D_{F2}(v_m, v_n) \\
3 & \text{if } D_{F2,F3}(v_m, v_n) \\
4 & \text{if } D_{F2,F1}(v_m, v_n) \\
5 & \text{if } D_{F3,F1}(v_m, v_n)
\end{cases}
\]
4) The number |E| of edges in graph G equals the number of all
possible connections that can be drawn in the spectrum.
Let us notice that, apart from the location, which determines
edge labeling, other features of the cross-peaks are not
transferred to the elements of the spectral graph. However,
these features can be used at the user's request during the
construction of graph edges.
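A spectral graph in the sense of Definition 1 can be sketched in Python as a dictionary of labeled edges. The label table below is an assumption (0: F1, 1: F3, 2: F2, 3: F2 and F3, 4: F1 and F2, 5: F1 and F3), as are the names LABELS, edge_label, build_spectral_graph and the tolerance tol; only the idea that the label follows from the dimensions in which two cross-peaks differ comes from the definition.

```python
from itertools import combinations

# Assumed label table. Key: the set of dimensions (0 = F1, 1 = F2, 2 = F3)
# in which the two cross-peaks differ. Value: the edge label.
LABELS = {
    frozenset({0}): 0,     # direction D_F1
    frozenset({2}): 1,     # direction D_F3
    frozenset({1}): 2,     # direction D_F2
    frozenset({1, 2}): 3,  # direction D_F2,F3
    frozenset({0, 1}): 4,  # direction D_F1,F2
    frozenset({0, 2}): 5,  # direction D_F1,F3
}


def edge_label(a, b, tol=0.01):
    """Return the label of the edge between peaks a and b, or None if no edge exists."""
    differing = frozenset(i for i in range(3) if abs(a[i] - b[i]) > tol)
    # Peaks that coincide everywhere or differ in all three dimensions get no edge.
    return LABELS.get(differing)


def build_spectral_graph(peaks, tol=0.01):
    """Map each vertex pair (m, n), m < n, to its edge label."""
    edges = {}
    for m, n in combinations(range(len(peaks)), 2):
        label = edge_label(peaks[m], peaks[n], tol)
        if label is not None:
            edges[(m, n)] = label
    return edges
```

Keeping the label table as data rather than as branching code makes it easy to swap in a different numbering if the intended one differs from the assumption made here.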
The sequential assignments of NMR signals correspond to
a reconstruction of a transfer (assignment) pathway between
the vertices of the spectral graph. Let us then formulate such
a pathway definition in terms of graph theory.
Definition 2 (assignment pathway)
Let PG=e1,e2,…,ek, k=|E|, be a sequence of edges of spectral
graph G=(V,E). We will call PG the assignment pathway in
G, if the following conditions are satisfied:
1) Every vertex v∈V and every edge e∈E of G occurs in PG
at most once.
2) PG is constructed according to one of the following
principles:
a) \( \forall e_j \in P_G,\ j = 1..k-2:\ l_j \in \{0,1,2\},\ (l_j \neq l_{j+1},\ l_j \neq l_{j+2}) \)
b) \( \forall e_j \in P_G,\ j = 1..k-1:\ (l_j) \bmod 3 = (l_{j+1}) \bmod 3 \)
3) PG does not contain collinear edges.
The above definition assumes the ideal case where all the
vertices are included in the path. In real spectra we
construct the longest possible path between the vertices.
There are two possible kinds of assignment pathway, depending
on the type of interactions traced during the pathway
construction. When homonuclear correlations are analyzed,
each edge of PG satisfies principle (2a) from Definition 2,
whereas for heteronuclear interactions the edges follow
principle (2b).
Fig. 3 presents a fragment of an NMR spectrum with
enumerated cross-peaks and the corresponding spectral
graph. The spectrum has been projected on the F2-F3 plane.
An appropriate label is assigned to each edge of the graph.
For better visualization, each label has been associated with
a different color. Thus, we obtain an edge-colored graph
[17]. The assignment pathway has been marked in the graph.
Let us notice that any such pathway found in a spectral graph
is an alternating walk.
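The two kinds of assignment pathway can be told apart by looking only at the sequence of edge labels. The Python sketch below assumes one reading of Definition 2: principle (2a) is taken to mean that all labels lie in {0,1,2} and any three consecutive labels are pairwise different, and principle (2b) that consecutive labels are congruent modulo 3. The function names are hypothetical, and the collinearity condition (3) is not checked here.

```python
def satisfies_2a(labels):
    """Homonuclear case: labels in {0, 1, 2}, three consecutive labels pairwise distinct."""
    if any(label not in (0, 1, 2) for label in labels):
        return False
    k = len(labels)
    return all(
        labels[j] != labels[j + 1] and (j + 2 >= k or labels[j] != labels[j + 2])
        for j in range(k - 1)
    )


def satisfies_2b(labels):
    """Heteronuclear case: consecutive labels are congruent modulo 3."""
    return all(labels[j] % 3 == labels[j + 1] % 3 for j in range(len(labels) - 1))
```

Under this reading a homonuclear pathway cycles through the three single-dimension directions, while a heteronuclear pathway alternates between a single-dimension direction and a two-dimension one.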
III. ASSIGNMENT WALK ALGORITHM
On the basis of the graph model of assignment pathway
reconstruction in 3D NMR spectra, we have proposed the first
enumerative branch-and-bound algorithm to solve the problem.
The algorithm builds a graph representation of the provided
spectral data and runs the search procedure. It uses domain
expert knowledge to introduce additional constraints that
limit the search space to a reasonable size. The algorithm
has been implemented in the C programming language and runs
in both Unix and Windows environments.
The number of possible assignment pathways and their
lengths depend on the RNA structure and spectrum
characteristics (e.g. signal overlapping). Usually there exist
several pathways that satisfy all the required conditions. We
assumed that in the first tests of the method all the possible
solutions should be returned. However, we equipped the
algorithm with procedures that prune the search according to
supplemental data that can be provided by the user.
Fig. 3. A fragment of NOESY-HSQC spectrum projected on the plane
F2-F3 (a) and the corresponding spectral graph with the assignment
pathway drawn with the thick line (b).
Let us now briefly describe the input data. All the spectral
parameters are listed in a text file generated by NMR
software (e.g. Accelrys Felix) from the 3D NMR spectrum
after the peak-picking procedure. The file specifies all the
cross-peaks contained in the spectrum. For each cross-peak,
it gives: its number, three coordinates (F1,F2,F3) in ppm or
Hz, widths in three dimensions in Hz, and volume (i.e.
intensity of the NMR signal). Additionally, the user may
provide a file with supplemental data defining: the type of
interaction (homo- or heteronuclear), the resolution of the
spectrum, a region for analysis (if it is not necessary to
consider the whole spectrum), incorrect cross-peaks if known,
the minimum and maximum length of the path, start points of
the pathway, positions of selected cross-peaks within the
sequence, regions with poor signal separation, the buffer
size, and the maximum number of solutions to be returned.
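The cross-peak record described above can be sketched in Python as follows. The column layout (number, F1, F2, F3, three widths, volume, separated by whitespace) is a hypothetical one chosen for illustration and is not the actual Felix export format; CrossPeak and parse_peak_list are likewise assumed names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CrossPeak:
    number: int
    position: Tuple[float, float, float]  # F1, F2, F3 (ppm or Hz)
    widths: Tuple[float, float, float]    # widths in the three dimensions (Hz)
    volume: float                         # intensity of the NMR signal


def parse_peak_list(text: str) -> List[CrossPeak]:
    """Parse one cross-peak per line; blank lines and '#' comments are skipped."""
    peaks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        values = [float(x) for x in fields[1:]]
        peaks.append(
            CrossPeak(int(fields[0]), tuple(values[0:3]), tuple(values[3:6]), values[6])
        )
    return peaks
```

Widths and volume are kept in the record even though only the position determines edge labels, so that they remain available for optional filtering of cross-peaks.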
The proposed method starts by building a graph
representation of the spectrum. First, all the information
about cross-peaks is placed in the vertex array. This
structure is used to construct the edge set of the graph.
Next, the adjacency list used by the main search procedures
is created. The current solution is stored in a stack of
vertices. An array of indexes is an additional structure that
keeps the information about the allowed sequence of moves.
Fig. 4 presents the general view of the method.
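The step from an edge set to an adjacency list can be sketched in Python as below. The input format (a dictionary mapping a vertex pair to its edge label) and the name build_adjacency are assumptions made for this illustration; the original program is written in C and its data layout is not given in the text.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn {(m, n): label} into {vertex: [(neighbor, label), ...]}."""
    adjacency = defaultdict(list)
    for (m, n), label in edges.items():
        # The spectral graph is undirected, so each edge is stored twice.
        adjacency[m].append((n, label))
        adjacency[n].append((m, label))
    return adjacency
```

Storing the label next to each neighbor lets the search check the labeling rule for a candidate move without a second lookup in the edge set.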
In the first step, the algorithm reads the input files, rejects
the cross-peaks that can be omitted during further analysis,
and constructs all the correct edges on the modified set of
vertices. Every edge is automatically assigned an appropriate
label. Unlabeled edges are considered incorrect and are not
added to the graph structure. By default, edges are generated
according to the spectral parameters. However, if the
resolution is defined by the user in the supplemental data
file, the algorithm lets the values of cross-peak coordinates
deviate within the error range. This usually results in a
larger edge set. Next, the procedures take every single edge
from the set and try to build a path starting with this edge.
All the supplemental data concerning starting points and
known signal positions within the pathway are considered
during the reconstruction process. The set of solutions is
returned when the stopping criterion is satisfied. The
criterion is defined by the buffer size, the maximum number
of solutions, and the number of iterations.
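The search phase can be approximated by a plain depth-first enumeration of vertex-simple paths, sketched below in Python. This is a simplified stand-in, not the authors' branch-and-bound code: the names enumerate_pathways, rule_ok and congruent_mod3 are hypothetical, the list path plays the role of the vertex stack, and max_solutions stands in for the stopping criterion. Each undirected path is reported once per direction.

```python
def congruent_mod3(labels):
    """Example labeling rule: consecutive labels are congruent modulo 3."""
    return all(a % 3 == b % 3 for a, b in zip(labels, labels[1:]))


def enumerate_pathways(adjacency, rule_ok, min_length=1, max_solutions=1000):
    """Collect maximal paths (vertices, labels) whose label sequence passes rule_ok."""
    solutions = []

    def extend(path, labels):
        if len(solutions) >= max_solutions:
            return
        extended = False
        for nxt, label in adjacency.get(path[-1], []):
            if nxt in path:  # every vertex may occur at most once
                continue
            if not rule_ok(labels + [label]):
                continue
            extended = True
            extend(path + [nxt], labels + [label])
        # A path that cannot be extended any further is reported as a solution.
        if not extended and len(labels) >= min_length and len(solutions) < max_solutions:
            solutions.append((path, labels))

    for start in sorted(adjacency):
        extend([start], [])
    return solutions
```

Starting points or known signal positions supplied by the user could be enforced by restricting the outer loop or by adding checks inside rule_ok.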
IV. ALGORITHM’S PERFORMANCE
The first experiments were performed on a PC (AMD
Athlon XP 1600+, 512 MB RAM) in the Windows XP environment.
The algorithm was tested on spectral data simulated for
the experiments listed in Table I.
During the experiments the buffer size was set to 50,000.
This reduced the input/output operations that greatly
increase the computation time. Thus, the pathway
reconstruction was performed in RAM, with no need to use
disk space.
Table II presents the test results. The first rows of the
table describe the input data: the number of cross-peaks in
the spectrum, the type of interactions, which determines the
directions of edges in the assignment pathway, and the
supplemental data defined. The next rows show the results of
the computational experiments, i.e. the number of solutions
generated by the algorithm and the computation time. Every
test completed in at most 0.3 s and returned between 2 and 16
solutions. In all cases the original assignment pathway,
known a priori, was reconstructed by the algorithm. It is
important to add that supplemental data greatly decrease the
number of solutions. In particular, specifying the spectral
region is necessary if we are interested in reconstructing
the pathway representing the selected NMR interactions.
Fig. 4. General scheme of the assignment walk algorithm.
TABLE I
EXPERIMENTAL DATA SET
No  Type of experiment  Molecule
1   NOESY-HSQC          r(CGCCGGUA)
2   NOESY-HSQC          r(UACGACGGUACG)
3   NOESY-HSQC          r(CCCUGAAAAGG)
4   NOESY-NOESY         r(GGGUAGCGAAAGCUACCC)
TABLE II
TEST RESULTS
                       Sample 1          Sample 2          Sample 3          Sample 4
Number of cross-peaks  30                41                57                51
Interactions           Heteronuc.        Heteronuc.        Homo-/heteronuc.  Heteronuc.
Supplemental data      Maximum length;   Maximum length;   Selected region   Selected region
                       two separate      separate regions  of a spectrum     of a spectrum
                       regions of a      of a spectrum
                       spectrum
Number of solutions    2                 4                 16                14
Computation time [s]   0.05              0.093             0.3               0.025
Fig. 5 presents the tested spectra for Sample 1 and Sample
4 in their projection on a plane. In both cases, the original
assignment pathway has been drawn in the spectrum.
V. CONCLUSION
In this paper we have analyzed the problem of signal
assignment in 3D NMR spectra and proposed the first model
of the problem based on graph theory. We have implemented
an enumerative algorithm for the reconstruction of
assignment pathways and performed computational tests.
In the near future, a representative set of experimental
data should be recorded, containing spectral parameters for
already known and unknown RNA structures. The influence of
the supplemental data on the algorithm performance and on
solution quality should be analyzed.
ACKNOWLEDGMENT
The authors thank Slawomir Klemczak from the Institute of
Computing Science, Poznan University of Technology, for
technical assistance.
REFERENCES
[1] P.E. Bourne, and H. Weissig, Structural Bioinformatics, La Jolla, CA:
Wiley-Liss, 2003.
[2] P.D. Zamore, and B. Haley, “Ribo-gnome: the big world of small
RNAs”, Science, vol. 309, 2005, pp. 1519-1524.
[3] I. Tinoco Jr., and C. Bustamante, “How RNA folds”, J. Mol. Biol.,
vol. 293, 1999, pp. 271-281.
[4] N.B. Leontis, A. Lescoute, and E. Westhof, “The building blocks and
motifs of RNA architecture”, Curr. Opin. Struct. Biol., vol. 16, 2006,
pp. 279-287.
[5] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H.
Weissig, I.N. Shindyalov, and P.E. Bourne, “The Protein Data Bank”,
Nucleic Acids Res., vol. 28, 2000, pp. 235-242.
[6] G. Varani, and I. Tinoco Jr., “RNA structure and NMR spectroscopy”,
Q. Rev. Biophys., vol. 24, 1991, pp. 479-532.
[7] R.W. Adamiak, J. Blazewicz, P. Formanowicz, Z. Gdaniec, M.
Kasprzak, M. Popenda, and M. Szachniuk, “An algorithm for an
automatic NOE pathways analysis of 2D NMR spectra of RNA
duplexes”, J. Comp. Biol., vol. 11, 2004, pp. 163-180.
[8] J. Blazewicz, M. Szachniuk, and A. Wojtowicz, “RNA tertiary
structure determination: NOE pathways construction by tabu search”,
Bioinformatics, vol. 21/10, 2005, pp. 2356-2361.
[9] H.N.B. Moseley, and G.T. Montelione, “Automated analysis of NMR
assignments and structures for proteins”, Curr. Opin. Struct. Biol., vol.
9, 1999, pp. 635-642.
[10] H.S. Atreya, S.C. Sahu, K.V. Chary, and G. Govil, “A tracked
approach for automated NMR assignments in protein (TATAPRO)”,
J. Biomol. NMR, vol. 17, 2000, pp. 125-136.
[11] J.P. Linge, M. Habeck, W. Rieping, and M. Nilges, “ARIA:
automated NOE assignment and NMR structure calculation”,
Bioinformatics, vol. 19, 2003, pp. 315-316.
[12] C. Bailey-Kellogg, S. Chainraj, and G. Pandurangan, “A Random
Graph Approach to NMR Sequential Assignment”, Curr. Comp. Mol.
Biol., 2004, pp. 58-67.
[13] R. Dunkel, and X. Wu, “Identification of organic molecules from a
structure database using proton and carbon NMR analysis results”, J.
Magn. Reson., vol. 188, 2007, pp. 97-110.
[14] H. Heise, K. Seidel, M. Etzkorn, S. Becker, and M. Baldus, “3D NMR
spectroscopy for resonance assignment and structure elucidation of
proteins under MAS: novel pulse schemes and sensitivity
considerations”, J. Magn. Reson., vol. 173, 2005, pp. 64-74.
[15] M. Popenda, “An application of NMR and molecular modeling in
structural analysis of RNA”, Ph.D. thesis, Institute of Bioorganic
Chemistry, PAS, Poznan, Poland, 1998.
[16] M. Szachniuk, M. Popenda, S. Klemczak, and J. Blazewicz, “An
analysis of 3-dimensional NMR spectra in the process of RNA
structure determination”, Poznan Supercomputing and Networking
Center, Poznan, Poland, RA-001/2007, 2007.
[17] H. Li, G. Wang, and S. Zhou, “Long alternating cycles in edge-
colored complete graphs”, Laboratoire de Recherche en Informatique,
CNRS, Orsay, France, No. 1481, 2007.
Fig. 5. The spectra for sample 1 (a) and sample 4 (b) with the original
assignment walks.