RKB – A Semantic Knowledge Base For RNA (RNA ontology consortium meeting)
 

Increasingly sophisticated knowledge about RNA structure and function requires an inclusive knowledge representation that facilitates the integration of independently-generated information arising from such efforts as genome sequencing projects, microarray analyses, structure determination and RNA SELEX experiments. While RNAML, an XML-based representation, has been proposed as an exchange format for a select subset of information, it lacks the machine-understandable semantics that would make it arbitrarily user-extensible, as is the case for formal logic-based languages. Here, we describe an RNA knowledge base (RKB) for structure-based knowledge using RDF/OWL Semantic Web technologies. RKB contains basic terminology for nucleic acid composition along with context/model-specific representation of structural features such as sugar conformations, base pairings and base stackings. RKB is populated with RNA PDB entries and MC-Annotate structural annotation. The use of Semantic Web technologies addresses the reality of diverse interests of the RNA Ontology Consortium and supports knowledge discovery over independently-published RNA knowledge.

  • This nomenclature has several advantages. First, the names are easy to remember and there is no need to reference any documentation. Second, the name alone gives a good idea of the base pair geometry. Third, isosteric pairs have the same name. Despite these advantages, LW cannot differentiate base pairing types that differ by a sliding of the bases along the interacting faces, especially in the context of single H-bond base pairs (e.g. the GU base pair).
  • To increase the precision of the LW nomenclature, Lemieux and Major defined the LW+ nomenclature by decomposing the edges into faces. They then defined and implemented an algorithm that reduces possible identification ambiguities to anecdotal occurrences.
  • Base pairs are described by the hydrogen bonding that occurs between the edges of the corresponding nitrogen-containing bases, but they can be described more precisely by the interactions between their faces. Base pairs necessarily involve the participating nucleotides, but face interactions, which are part of the base pair, can specifically identify the participating edges, which are qualities of their respective nucleotides. Part A of the figure illustrates the representation of a Ww / O2’ face interaction as part of a GG base pair in model 5 of PDB:1AJU. In model 10 (part B of the figure) an additional face interaction (Ss/O2’) can easily be added as another part of the base pair.
  • The base pair shown is part of the solution NMR RNA structure of the “hairpin ribozyme loop B domain” (PDBID: 1B36). This base pair is formed in model 4 of the PDB entry between AMP at position 9 and AMP at position 29. It is realized through a single hydrogen bond formed between the Hw face of residue 9 and the Hh face of residue 29. Both of these faces are part of the Hoogsteen edge of their respective base. (The legend A [square] A is a well-used notation indicating a trans Hoogsteen/Hoogsteen base pair.) (The OWL-DL class shown in the box represents the types that were asserted or inferred at the instance level of the base pair.) This base pair is classified as Class II according to the Saenger nomenclature and as (8) Trans, Hoogsteen/Hoogsteen, Parallel according to the Leontis-Westhof classification.
  • The ribose ring presents two main puckering modes, “envelope” and “twist”. The “envelope” geometry is observed when one atom is located above or below the plane formed by the four others, whereas the “twist” geometry is observed when one atom is above and another is below the plane formed by the three others. The classification of a ribose into either geometry depends on the position of the carbon atoms of the ribose relative to its C5’ atom. Hence carbon atoms in a ribose bear either the endo or the exo role with respect to the plane formed by the other atoms.
  • Our implementation of situational modeling ensures that objects are represented by a single entity throughout their lifetime, thus avoiding the need to create multiple distinct instances of the same object, with different attributes, in each particular spatial-temporal context. The RKB enables the description of the roles played by the carbon atoms in the ribose of a nucleotide that define its pucker quality. “When the plane is horizontal and the C5’ atom is oriented to left, the atoms located over the plane are said to be endo to the C5’”; hence these atoms are considered to have an endo role, and thus the ribose instantiates the envelope quality.

Presentation Transcript

  • RKB – A Semantic Knowledge Base for RNA. Michel Dumontier (1), José Cruz-Toledo (1), Marc Parisien (2), Francois Major (2). (1) Carleton University; (2) Université de Montreal
  • Objectives i. To represent biochemistry of nucleic acids and their structural characteristics including base pairing/stacking ii. Represent context specific knowledge iii. Capture the structural annotation generated by MC-Annotate 5/25/2009 Carleton University -- Dumontier Lab dumontierlab.com 2
  • Guided design
    • Modeling with Upper Level Ontologies
      – interoperability and semantic coherency
      – New Upper Level Ontology (NULO)
        • distinguishes objects, qualities, roles, processes and spatial regions
        • Based on BFO/RO, but for OWL
  • Biological Modeling
    • Objects
      – Occupy space
        • Nucleic acids, nucleotides, riboses and phosphates
    • Qualities
      – Intrinsic categorical or numeric valued property
        • Nucleotide bears the quality of conformation
    • Roles
      – Defined by extrinsic interactions
        • A C3’ atom may hold the exo role during some sugar puckering
    • Processes
      – Entities that extend in time
        • structure determination, an interaction
  • Contextual Modeling of Nucleic Acids
    • Base stacking varies between XRD/NMR models
    • We need to know in which model a given piece of information is found
    • We want to set the stage for representing simulation
  • RKB populated with PDB, MC-Annotate
    • The ontology population involved 3 steps:
      i. Assigning names
      ii. Asserting class membership
      iii. Assigning relations between entities
    • The following naming convention was used:
      – Objects:
        • Polymer: PDBID_cCHAIN
        • Residue: PDBID_cCHAIN_rRESIDUE
        • Atom: PDBID_cCHAIN_rRESIDUE_aAtom
      – Quality/Roles:
        • PDBID_mMODEL_cCHAIN_rRESIDUE_type
      – Processes:
        • Structure determination: PDBID_mMODEL
        • Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
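The naming convention above can be sketched as a few string helpers. This is a minimal illustration, not the population code used for the RKB: the function names are hypothetical, and the sketch assumes that the lowercase letters c, r, a and m in the patterns are literal prefixes placed before the chain, residue, atom and model values.

```python
# Hypothetical helpers that follow the identifier patterns listed on the slide.

def polymer_id(pdb_id, chain):
    """PDBID_cCHAIN"""
    return f"{pdb_id}_c{chain}"


def residue_id(pdb_id, chain, residue):
    """PDBID_cCHAIN_rRESIDUE"""
    return f"{polymer_id(pdb_id, chain)}_r{residue}"


def atom_id(pdb_id, chain, residue, atom):
    """PDBID_cCHAIN_rRESIDUE_aAtom"""
    return f"{residue_id(pdb_id, chain, residue)}_a{atom}"


def quality_or_role_id(pdb_id, model, chain, residue, type_name):
    """PDBID_mMODEL_cCHAIN_rRESIDUE_type (qualities and roles are model-specific)"""
    return f"{pdb_id}_m{model}_c{chain}_r{residue}_{type_name}"


def structure_determination_id(pdb_id, model):
    """PDBID_mMODEL"""
    return f"{pdb_id}_m{model}"


def interaction_id(pdb_id, model, process_type, participant):
    """PDBID_mMODEL_PROCESSTYPE_PARTICIPANT"""
    return f"{pdb_id}_m{model}_{process_type}_{participant}"


if __name__ == "__main__":
    # The chain label "A" is an illustrative value, not taken from the slides.
    print(atom_id("1B36", "A", 9, "C8"))
    print(structure_determination_id("1B36", 1))
```

Objects carry no model number in their names, while qualities, roles and processes do, which matches the goal of keeping one entity per object across models.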
  • Support for Leontis-Westhof Nomenclature
    • The RKB incorporates LW nomenclature
    • Describes the three edges for H-bonding interactions in purines (R) and pyrimidines (Y)
    • Atom composition:
      i. Watson-Crick Edge: A(N6)/G(O6), R(N1), A(C2)/G(N2), U(O4)/C(N4), Y(N3) and Y(O2)
      ii. Hoogsteen Edge (CH edge for Y): A(N6)/G(O6), R(N7), U(O4)/C(N4) and Y(C5)
      iii. Sugar Edge: A(C2)/G(N2), R(N3), Y(O2) and O2’
    • cis and trans orientations
      • relative orientations of the glycosidic bonds, which link each sugar to its base
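The atom composition of the three edges can be written down as a small lookup table. The sketch below expands the R (purine: A, G) and Y (pyrimidine: U, C) shorthand of the slide into per-base atom lists; the table layout and the function name are assumptions made for illustration and are not part of the RKB.

```python
# Edge -> base -> atoms, expanded from the slide's R (A, G) and Y (U, C) shorthand.
EDGE_ATOMS = {
    "Watson-Crick": {
        "A": ["N6", "N1", "C2"],
        "G": ["O6", "N1", "N2"],
        "U": ["O4", "N3", "O2"],
        "C": ["N4", "N3", "O2"],
    },
    "Hoogsteen": {
        "A": ["N6", "N7"],
        "G": ["O6", "N7"],
        "U": ["O4", "C5"],
        "C": ["N4", "C5"],
    },
    "Sugar": {
        "A": ["C2", "N3", "O2'"],
        "G": ["N2", "N3", "O2'"],
        "U": ["O2", "O2'"],
        "C": ["O2", "O2'"],
    },
}


def edges_for_atom(base, atom):
    """Return the edges (in table order) on which `atom` of `base` is listed."""
    return [
        edge
        for edge, per_base in EDGE_ATOMS.items()
        if atom in per_base.get(base, [])
    ]


if __name__ == "__main__":
    print(edges_for_atom("G", "N2"))
    print(edges_for_atom("A", "N7"))
```

Several atoms sit on two edges at once (for example G N2 on both the Watson-Crick and Sugar edges), which is the kind of ambiguity that the finer-grained faces of LW+ are meant to resolve.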
  • Support for LW+ Nomenclature
    • The extension adds faces to each edge:
      – WC edge: Wh, Ww and Ws faces
      – Hoogsteen Edge: C8(R), Hh, Hw and Bh
      – Sugar Edge: Bs, Ss(R), Sw and O2’
    • The Bh and Bs faces involve the Hoogsteen side amino/keto group and the sugar side amino/keto group respectively
    • The C8 face was introduced for the C8-H8 donor group in purines
  • Describing Base Pairs
    • Base pairs are composed of interactions between the edges or faces of the interacting bases
    • Role chains capture additional knowledge:
      – Objects that participate in sub-processes (face interactions) are also participants of the process whole (base pair): hasPart ◦ hasParticipant -> hasParticipant
      – Objects are involved in processes when their qualities are involved: isBearerOf ◦ isParticipantIn -> isParticipantIn
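The effect of a role chain can be illustrated on a handful of triples. The sketch below is a plain-Python stand-in for what an OWL reasoner would do with a property chain; the function name and the instance names are hypothetical, and only the two chains quoted on the slide are assumed.

```python
def apply_chain(triples, first, second, implied):
    """Apply the chain `first ◦ second -> implied` until no new triple appears.

    `triples` is an iterable of (subject, property, object) tuples.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Index objects by (subject, property) for this pass.
        index = {}
        for s, p, o in inferred:
            index.setdefault((s, p), set()).add(o)
        for s, p, o in list(inferred):
            if p != first:
                continue
            for target in index.get((o, second), ()):
                triple = (s, implied, target)
                if triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


if __name__ == "__main__":
    # Illustrative instance names only.
    facts = {
        ("basePair", "hasPart", "faceInteraction"),
        ("faceInteraction", "hasParticipant", "residue9"),
        ("residue9", "isBearerOf", "HwFace"),
        ("HwFace", "isParticipantIn", "faceInteraction"),
    }
    facts = apply_chain(facts, "hasPart", "hasParticipant", "hasParticipant")
    facts = apply_chain(facts, "isBearerOf", "isParticipantIn", "isParticipantIn")
    for triple in sorted(facts):
        print(triple)
```

With these facts, the first chain adds that residue9 participates in the base pair as a whole, and the second adds that residue9 participates in the face interaction because its face quality does.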
  • The RKB is compatible with both the LW and the Saenger nomenclature for base pairs
    • The semantics of the RKB enables the usage of consistent bp naming schemes
    • The AA BP in model 4 of PDB:1B36 can be classified as being a member of the following classes:
      – Saenger type II A A
      – LW Trans Hoogsteen/Hoogsteen (8)
    • NucleotideBasePair and ParallelBasePair and TransBasePair and HoogsteenHoogsteenBasePair and hasAgent exactly 2 AMP
  • Sugar Puckering
    • The ribose ring presents two distinct puckering modes, envelope and twist
    • The classification into either geometry depends on the position of the carbon atoms of the ribose relative to its C5’ atom
    • Carbon atoms in a ribose thus bear either the endo or the exo role with respect to the plane formed by the other atoms
  • Sugar Puckering (cont’d)
    • Our implementation of situational modeling ensures that objects are represented by a single entity throughout their lifetime, thus avoiding the need to create multiple distinct instances of the same object, with different attributes, in each particular spatial-temporal context
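The distinction between the two puckering modes, and the endo/exo roles that define them, can be sketched as follows. This is an illustrative classifier, not the MC-Annotate procedure: it assumes that signed out-of-plane displacements of the ring atoms are already available, that a positive sign means the atom lies on the same side of the plane as C5’ (endo), and that the tolerance value is an arbitrary placeholder.

```python
def pucker(displacements, tolerance=0.1):
    """Classify a ribose pucker from signed out-of-plane displacements.

    `displacements` maps ring atom names to a signed distance from the
    reference plane; positive means the C5' side (endo), negative the
    opposite side (exo). Atoms within `tolerance` count as in-plane.
    Returns (mode, roles), where roles maps out-of-plane atoms to their role.
    """
    roles = {}
    for atom, distance in displacements.items():
        if distance > tolerance:
            roles[atom] = "endo"
        elif distance < -tolerance:
            roles[atom] = "exo"

    sides = sorted(roles.values())
    if len(roles) == 1:
        mode = "envelope"   # one atom out of the plane of the four others
    elif sides == ["endo", "exo"]:
        mode = "twist"      # one atom above and one below the plane of the three others
    else:
        mode = "unclassified"
    return mode, roles


if __name__ == "__main__":
    print(pucker({"C1'": 0.0, "C2'": 0.0, "C3'": 0.5, "C4'": 0.0, "O4'": 0.0}))
    print(pucker({"C1'": 0.0, "C2'": -0.3, "C3'": 0.3, "C4'": 0.0, "O4'": 0.0}))
```

Because the roles are computed per model from that model's coordinates, the same atom entity can hold the endo role in one model and the exo role in another without being duplicated.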
  • RKB is SPARQL accessible
    • SPARQL is a graph query language
    • Loaded instantiated ontology into Virtuoso 6
    • SPARQL endpoint
      – http://codemonkey.dumontierlab.com/sparql/
    • Specify Graphs to restrict search
      – http://semanticscience.org/rkb/mcannotate/pdb/dna
      – http://semanticscience.org/rkb/mcannotate/pdb/rna
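A query can be sent to a SPARQL endpoint as an HTTP GET request whose `query` and `default-graph-uri` parameters carry the query text and the graph to search; both parameter names come from the SPARQL Protocol. The sketch below only builds the request URL and does not contact the server, since the endpoint listed on the slide may no longer be reachable. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint and graph URIs as given on the slide.
ENDPOINT = "http://codemonkey.dumontierlab.com/sparql/"
RNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/rna"


def build_query_url(query, graph=RNA_GRAPH, endpoint=ENDPOINT):
    """Return a GET URL that restricts `query` to the given default graph."""
    params = urlencode({"query": query, "default-graph-uri": graph})
    return f"{endpoint}?{params}"


if __name__ == "__main__":
    url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    print(url)
    # Round-trip check: the parameters decode back to the original values.
    print(parse_qs(urlsplit(url).query))
```

Restricting the default graph keeps DNA and RNA annotations apart without changing the query text.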
  • Query 1: Find all face interactions (model 1 of PDB:1B36)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ss: <http://semanticscience.org/>
      SELECT DISTINCT ?faceInteraction
      WHERE {
        ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
        ?pair ss:hasProperPart ?faceInteraction .
        ?faceInteraction rdf:type ss:FaceInteraction .
      }
    Nucleotide base pairs are composed of one or more face interactions. Where these are known, as in the MC-Annotate results, this query retrieves all 18 matching instances.
  • See results: http://tinyurl.com/porxdb
  • Query 2: Find all C8 mediated base pairs (model 1 of PDB:1B36)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ss: <http://semanticscience.org/>
      SELECT DISTINCT ?faceInteraction ?residue ?hasC8Face
      WHERE {
        ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
        ?pair ss:hasProperPart ?faceInteraction .
        ?faceInteraction rdf:type ss:FaceInteraction .
        ?hasC8Face ss:isAgentIn ?faceInteraction .
        ?hasC8Face rdf:type ss:C8Face .
        ?residue ss:hasQuality ?hasC8Face
      }
    Results: http://tinyurl.com/r7b5e4
    Face interactions are mediated by the faces of bases. Nucleotides and their face qualities are related by the hasQuality relation, whereas faces are agents in the face interaction and are related to it by the hasAgent relation.
  • Query 3: Find base pairs involving a GMP sugar-sugar face (model 1 of PDB:1B36)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ss: <http://semanticscience.org/>
      SELECT DISTINCT ?faceInteraction ?residue ?hasSSFace
      WHERE {
        ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
        ?pair ss:hasProperPart ?faceInteraction .
        ?faceInteraction rdf:type ss:FaceInteraction .
        ?hasSSFace rdf:type ss:SugarSugarFace .
        ?hasSSFace ss:isAgentIn ?faceInteraction .
        ?residue ss:hasQuality ?hasSSFace .
        ?residue rdf:type ss:GMP
      }
    Results found at: http://tinyurl.com/qpup8z
    This query builds on Query 2, in that it requires an Ss face to be on a GMP that is participating in a base pair. Two GMPs are found to have this particular face participating with other nucleotides in base pairs in this structure.
  • Query 4: Find Hoogsteen – O2’ face interactions (model 1 of PDB:1B36)
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ss: <http://semanticscience.org/>
      SELECT DISTINCT ?faceInteraction ?residue1 ?residue2 ?hasHhFace ?hasO2pFace
      WHERE {
        ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
        ?pair ss:hasProperPart ?faceInteraction .
        ?faceInteraction rdf:type ss:FaceInteraction .
        ?hasHhFace rdf:type ss:HoogsteenHoogsteenFace .
        ?hasHhFace ss:isAgentIn ?faceInteraction .
        ?hasO2pFace rdf:type ss:O2pFace .
        ?hasO2pFace ss:isAgentIn ?faceInteraction .
        ?residue1 ss:hasQuality ?hasHhFace .
        ?residue2 ss:hasQuality ?hasO2pFace
      }
    Results found at: http://tinyurl.com/oo4fp8
    The LW+ nomenclature is more detailed for base interactions. The result of this query describes a single base pair in this structure.
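Queries 1 to 4 share one graph pattern: a pair that is a proper part of a model, a face interaction that is a proper part of the pair, and zero or more typed faces that are agents in the interaction and qualities of a residue. The sketch below generates that pattern for a list of face classes. The helper name and the numbered variable names (?face1, ?residue1, ...) are assumptions made for illustration; the property and class names are those used in the queries above.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX ss: <http://semanticscience.org/>\n"
)


def face_interaction_query(model_uri, face_classes=()):
    """Build a SPARQL query for face interactions in one model.

    Each entry of `face_classes` (e.g. "C8Face") adds one typed face that
    must be an agent in the interaction, plus the residue bearing it.
    """
    select_vars = ["?faceInteraction"]
    patterns = [
        f"?pair ss:isProperPartOf <{model_uri}> .",
        "?pair ss:hasProperPart ?faceInteraction .",
        "?faceInteraction rdf:type ss:FaceInteraction .",
    ]
    for i, face_class in enumerate(face_classes, start=1):
        face, residue = f"?face{i}", f"?residue{i}"
        select_vars += [residue, face]
        patterns += [
            f"{face} rdf:type ss:{face_class} .",
            f"{face} ss:isAgentIn ?faceInteraction .",
            f"{residue} ss:hasQuality {face} .",
        ]
    body = "\n  ".join(patterns)
    return (
        f"{PREFIXES}SELECT DISTINCT {' '.join(select_vars)}\n"
        f"WHERE {{\n  {body}\n}}"
    )


if __name__ == "__main__":
    model = "http://semanticscience.org/pdb/1B36_m1"
    # Same shape as Query 4: one Hh face and one O2' face in the same interaction.
    print(face_interaction_query(model, ["HoogsteenHoogsteenFace", "O2pFace"]))
```

Calling the helper with an empty list reproduces the shape of Query 1, and with a single class such as "C8Face" the shape of Query 2.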
  • Future Directions
    • Specify Saenger nomenclature
    • Map other structural annotator output (e.g. 3DNA)
    • Extend structural knowledge with 6 backbone angles
      – range restrictions on classes
    • SWRL / DL-safe rules or SPARQL query required to specify cyclic motifs
    • Publish as part of Bio2RDF network
  • RKB Availability
    • Creative Commons License
    • Google Code Project:
      – http://semanticscience.org
    • Instructions: http://code.google.com/p/semanticscience/wiki/RKBDownload
  • References
    • Dumontier, M., et al. (2009). RKB: A Semantic Web Knowledge Base for RNA. Accepted in Bio-Ontologies 2009, Stockholm, Sweden.
    • Smith, B., et al. (2005). Relations in biomedical ontologies. Genome Biol, 6(5): R46.
    • Leontis, N. B. and E. Westhof (2001). Geometric nomenclature and classification of RNA base pairs. RNA, 7(4): 499-512.
    • Lemieux, S. and F. Major (2002). RNA canonical and non-canonical base pairing types: a recognition method and complete repertoire. Nucleic Acids Res, 30(19): 4250-63.
    • Major, F. and P. Thibault (2005). Computer Modeling of RNA Three-Dimensional Structures. In Encyclopedia of Molecular Cell Biology and Molecular Medicine, R.A. Meyers, Editor. Wiley-VCH Verlag GmbH & Co., Weinheim, p. 605-636.