RKB – A Semantic Knowledge Base For RNA (RNA ontology consortium meeting)


Increasingly sophisticated knowledge about RNA structure and function requires an inclusive knowledge representation that facilitates the integration of independently generated information arising from efforts such as genome sequencing projects, microarray analyses, structure determination and RNA SELEX experiments. While RNAML, an XML-based representation, has been proposed as an exchange format for a select subset of this information, it lacks the machine-understandable semantics that would make it arbitrarily user-extensible, as formal logic-based languages are. Here, we describe an RNA knowledge base (RKB) for structure-based knowledge built with RDF/OWL Semantic Web technologies. RKB contains basic terminology for nucleic acid composition along with context/model-specific representations of structural features such as sugar conformations, base pairings and base stackings. RKB is populated with RNA PDB entries and MC-Annotate structural annotations. The use of Semantic Web technologies accommodates the diverse interests of the RNA Ontology Consortium and supports knowledge discovery over independently published RNA knowledge.


  1. RKB – A Semantic Knowledge Base for RNA
     Michel Dumontier (1), José Cruz-Toledo (1), Marc Parisien (2), Francois Major (2)
     (1) Carleton University, (2) Université de Montreal
  2. Objectives
     i.   To represent the biochemistry of nucleic acids and their structural
          characteristics, including base pairing and stacking
     ii.  To represent context-specific knowledge
     iii. To capture the structural annotation generated by MC-Annotate
     5/25/2009 Carleton University -- Dumontier Lab dumontierlab.com
  3. Guided design
     • Modeling with upper-level ontologies
         – Interoperability and semantic coherency
         – New Upper Level Ontology (NULO)
             • Distinguishes objects, qualities, roles, processes and spatial regions
             • Based on BFO/RO, but for OWL
  4. Biological Modeling
     • Objects
         – Occupy space
             • Nucleic acids, nucleotides, riboses and phosphates
     • Qualities
         – Intrinsic categorical or numeric-valued properties
             • A nucleotide bears the quality of conformation
     • Roles
         – Defined by extrinsic interactions
             • A C3’ atom may hold the exo role in some sugar puckering modes
     • Processes
         – Entities that extend in time
             • Structure determination, an interaction
  5. Contextual Modeling of Nucleic Acids
     • Base stacking varies across different XRD/NMR models
     • We need to know in which model a given piece of information is found
     • We want to set the stage for representing simulations
  6. RKB populated with PDB, MC-Annotate
     • The ontology population involved three steps:
         i.   Assigning names
         ii.  Asserting class membership
         iii. Assigning relations between entities
     • The following naming convention was used:
         – Objects:
             • Polymer: PDBID_cCHAIN
             • Residue: PDBID_cCHAIN_rRESIDUE
             • Atom: PDBID_cCHAIN_rRESIDUE_aAtom
         – Qualities/Roles:
             • PDBID_mMODEL_cCHAIN_rRESIDUE_type
         – Processes:
             • Structure determination: PDBID_mMODEL
             • Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT
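The naming convention above can be expressed as a few string-formatting helpers. The following is a minimal sketch, assuming the placeholders are substituted verbatim with no extra escaping; the function names and the example residue, atom and quality type are hypothetical and not taken from the RKB code base. Object names carry no model number, while quality/role and process names do.

```python
def polymer_name(pdb_id: str, chain: str) -> str:
    """Polymer: PDBID_cCHAIN"""
    return f"{pdb_id}_c{chain}"


def residue_name(pdb_id: str, chain: str, residue: int) -> str:
    """Residue: PDBID_cCHAIN_rRESIDUE"""
    return f"{polymer_name(pdb_id, chain)}_r{residue}"


def atom_name(pdb_id: str, chain: str, residue: int, atom: str) -> str:
    """Atom: PDBID_cCHAIN_rRESIDUE_aAtom (no model number: one entity for all models)"""
    return f"{residue_name(pdb_id, chain, residue)}_a{atom}"


def quality_name(pdb_id: str, model: int, chain: str, residue: int, kind: str) -> str:
    """Quality/role: PDBID_mMODEL_cCHAIN_rRESIDUE_type (model-specific)"""
    return f"{pdb_id}_m{model}_c{chain}_r{residue}_{kind}"


def structure_determination_name(pdb_id: str, model: int) -> str:
    """Structure determination: PDBID_mMODEL"""
    return f"{pdb_id}_m{model}"


def interaction_name(pdb_id: str, model: int, process_type: str, participant: str) -> str:
    """Interaction: PDBID_mMODEL_PROCESSTYPE_PARTICIPANT (participant format assumed)"""
    return f"{structure_determination_name(pdb_id, model)}_{process_type}_{participant}"


if __name__ == "__main__":
    print(atom_name("1B36", "A", 5, "C3'"))                  # 1B36_cA_r5_aC3'
    print(quality_name("1B36", 1, "A", 5, "conformation"))  # 1B36_m1_cA_r5_conformation
    print(structure_determination_name("1B36", 1))          # 1B36_m1
```

The structure-determination name `1B36_m1` has the same form as the model IRI suffix used in the SPARQL queries later in the deck.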
  7. Support for Leontis-Westhof Nomenclature
     • The RKB incorporates the LW nomenclature
     • It describes the three edges available for H-bonding interactions in
       purines (R) and pyrimidines (Y)
     • Atom composition:
         i.   Watson-Crick edge:
                • A(N6)/G(O6), R(N1), A(C2)/G(N2), U(O4)/C(N4), Y(N3) and Y(O2)
         ii.  Hoogsteen edge (CH edge for Y):
                • A(N6)/G(O6), R(N7), U(O4)/C(N4) and Y(C5)
         iii. Sugar edge:
                • A(C2)/G(N2), R(N3), Y(O2) and O2’
     • cis and trans orientations
         – relative orientation of the glycosidic bonds (sugar–base bonds) of the
           two interacting bases
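The atom composition of the three LW edges can be stored as a lookup table, which makes it easy to ask which edge(s) a given H-bonding atom belongs to. The following is a minimal sketch that transcribes the atom lists above, assuming the standard IUPAC codes R (any purine) and Y (any pyrimidine) and using "*" as a hypothetical marker for the ribose O2’ shared by all nucleotides. The table and function names are illustrative only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}

# (residue class, atom) pairs per edge. "R" = purine, "Y" = pyrimidine,
# "*" = any ribonucleotide (used here for the ribose O2' atom).
LW_EDGES = {
    "WatsonCrick": [("A", "N6"), ("G", "O6"), ("R", "N1"), ("A", "C2"), ("G", "N2"),
                    ("U", "O4"), ("C", "N4"), ("Y", "N3"), ("Y", "O2")],
    "Hoogsteen": [("A", "N6"), ("G", "O6"), ("R", "N7"),
                  ("U", "O4"), ("C", "N4"), ("Y", "C5")],
    "Sugar": [("A", "C2"), ("G", "N2"), ("R", "N3"), ("Y", "O2"), ("*", "O2'")],
}


def matches(residue_class: str, residue: str) -> bool:
    """True if a one-letter residue (A, C, G, U) belongs to the given class."""
    if residue_class == "*":
        return True
    if residue_class == "R":
        return residue in PURINES
    if residue_class == "Y":
        return residue in PYRIMIDINES
    return residue_class == residue


def edges_of(residue: str, atom: str) -> list[str]:
    """Return the LW edges (in table order) whose atom list contains this atom."""
    return [edge for edge, members in LW_EDGES.items()
            if any(matches(cls, residue) and name == atom for cls, name in members)]


if __name__ == "__main__":
    print(edges_of("A", "N6"))   # ['WatsonCrick', 'Hoogsteen']
    print(edges_of("G", "N7"))   # ['Hoogsteen']
    print(edges_of("C", "O2"))   # ['WatsonCrick', 'Sugar']
```

Atoms such as A(N6) or Y(O2) sit at the boundary between two edges, so the lookup deliberately returns every edge that lists them rather than forcing a single answer.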
  8. Support for LW+ Nomenclature
     • The extension assigns faces to each edge:
         – WC edge:
             • Wh, Ww and Ws faces
         – Hoogsteen edge:
             • C8(R), Hh, Hw and Bh
         – Sugar edge:
             • Bs, Ss(Y), Sw and O2’
     • The Bh and Bs faces involve the Hoogsteen-side amino/keto group and the
       sugar-side amino/keto group, respectively
     • The C8 face was introduced for the C8-H8 donor group in purines
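Because every LW+ face belongs to exactly one LW edge, a face-level annotation can always be coarsened to the LW edge level. The following is a minimal sketch with the face labels copied from the slide. The names are hypothetical, and the sketch does not encode which faces occur only on purines or pyrimidines.

```python
# Faces per LW edge, as listed for the LW+ nomenclature.
LW_PLUS_FACES = {
    "WatsonCrick": ["Wh", "Ww", "Ws"],
    "Hoogsteen": ["C8", "Hh", "Hw", "Bh"],
    "Sugar": ["Bs", "Ss", "Sw", "O2'"],
}

# Inverted index: face label -> parent LW edge.
FACE_TO_EDGE = {face: edge for edge, faces in LW_PLUS_FACES.items() for face in faces}


def coarsen(face1: str, face2: str) -> tuple[str, str]:
    """Map a pair of interacting LW+ faces to the corresponding pair of LW edges."""
    return FACE_TO_EDGE[face1], FACE_TO_EDGE[face2]


if __name__ == "__main__":
    print(coarsen("Hh", "O2'"))  # ('Hoogsteen', 'Sugar')
```

For example, an Hh/O2’ face interaction (the kind searched for in Query 4) coarsens to a Hoogsteen/Sugar edge interaction.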
  9. Describing Base Pairs
     • Base pairs are composed of interactions between the edges or faces of the
       interacting bases
     • Role chains capture additional knowledge:
         – Objects that participate in sub-processes (face interactions) are also
           participants in the whole process (base pair):
             hasPart ◦ hasParticipant -> hasParticipant
         – Objects participate in the processes in which their qualities participate:
             isBearerOf ◦ isParticipantIn -> isParticipantIn
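In OWL 2 each of these chains is an axiom such as `SubObjectPropertyOf(ObjectPropertyChain(:hasPart :hasParticipant) :hasParticipant)`, and a reasoner derives the implied triples automatically. The following sketch mimics that effect with naive forward chaining over plain triples until no new triple appears. It is only an illustration: the identifiers (bp1, fi1, G5, G5_face) are hypothetical, and inverse properties such as hasParticipant/isParticipantIn are not handled.

```python
Triple = tuple[str, str, str]

# The two role chains from the slide: (first property, second property) -> inferred property.
CHAINS = {
    ("hasPart", "hasParticipant"): "hasParticipant",
    ("isBearerOf", "isParticipantIn"): "isParticipantIn",
}


def apply_chains(triples: set[Triple]) -> set[Triple]:
    """Saturate a triple set under CHAINS (naive fixpoint, quadratic per round)."""
    closed = set(triples)
    while True:
        new = set()
        for (x, p1, y) in closed:
            for (y2, p2, z) in closed:
                if y == y2 and (p1, p2) in CHAINS:
                    inferred = (x, CHAINS[(p1, p2)], z)
                    if inferred not in closed:
                        new.add(inferred)
        if not new:
            return closed
        closed |= new


# Hypothetical base pair bp1 with one face interaction fi1 involving residue G5.
EXAMPLE = {
    ("bp1", "hasPart", "fi1"),
    ("fi1", "hasParticipant", "G5"),
    ("G5", "isBearerOf", "G5_face"),
    ("G5_face", "isParticipantIn", "fi1"),
}

if __name__ == "__main__":
    for triple in sorted(apply_chains(EXAMPLE) - EXAMPLE):
        print(triple)
    # ('G5', 'isParticipantIn', 'fi1')
    # ('bp1', 'hasParticipant', 'G5')
```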
  10. The RKB is compatible with both the LW and the Saenger nomenclature for base pairs
      • The semantics of the RKB enables the use of consistent base-pair naming schemes
      • The AA base pair in model 4 of PDB:1B36 can be classified as a member of the
        following classes:
          – Saenger type II (AA)
          – LW trans Hoogsteen/Hoogsteen (family 8)
      • As an OWL class expression (Manchester syntax):
          NucleotideBasePair and ParallelBasePair and TransBasePair
            and HoogsteenHoogsteenBasePair and hasAgent exactly 2 AMP
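The LW classification above places the AA pair in family 8 (trans Hoogsteen/Hoogsteen). The twelve LW geometric families can be numbered from the unordered pair of interacting edges and the cis/trans orientation. The following is a minimal sketch, assuming the family order of Leontis & Westhof (2001): WW, WH, WS, HH, HS and SS, each with a cis (odd) and a trans (even) member. The function name is hypothetical.

```python
EDGES = ("W", "H", "S")  # Watson-Crick, Hoogsteen, Sugar edge


def lw_family(orientation: str, edge1: str, edge2: str) -> int:
    """Return the LW geometric family number (1-12) of a base pair."""
    if orientation not in ("cis", "trans"):
        raise ValueError(f"unknown orientation: {orientation}")
    # Edge order within a pair does not matter (e.g. H/W is the same family as W/H).
    i, j = sorted((EDGES.index(edge1), EDGES.index(edge2)))
    combos = [(a, b) for a in range(3) for b in range(a, 3)]  # WW, WH, WS, HH, HS, SS
    return 2 * combos.index((i, j)) + (1 if orientation == "cis" else 2)


if __name__ == "__main__":
    print(lw_family("trans", "H", "H"))  # 8
    print(lw_family("cis", "W", "W"))    # 1
```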
  11. Sugar Puckering
      • The ribose ring presents two distinct puckering modes: envelope and twist
      • The classification into either geometry depends on the positions of the
        ribose carbon atoms relative to its C5’ atom
      • Carbon atoms in a ribose thus bear either the endo or the exo role with
        respect to the plane formed by the other ring atoms
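The endo/exo role of a ring atom amounts to a sign test: is the atom displaced to the same side of a reference plane as C5’ (endo) or to the opposite side (exo)? The following is a minimal geometric sketch. For simplicity, it assumes the reference plane is the one through three ring atoms (here C4’, O4’ and C1’), rather than the exact plane used for each envelope or twist conformation. The coordinates, the 0.05 Å tolerance and the function names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_offset(point: Vec, p0: Vec, p1: Vec, p2: Vec) -> float:
    """Signed distance of `point` from the plane through p0, p1 and p2."""
    normal = cross(sub(p1, p0), sub(p2, p0))
    return dot(sub(point, p0), normal) / math.sqrt(dot(normal, normal))


def pucker_role(atom: Vec, c5: Vec, plane: tuple[Vec, Vec, Vec], tol: float = 0.05) -> str:
    """'endo' if the atom is on the C5' side of the plane, 'exo' if opposite, else 'in-plane'."""
    d_atom = plane_offset(atom, *plane)
    d_c5 = plane_offset(c5, *plane)
    if abs(d_atom) < tol:
        return "in-plane"
    return "endo" if d_atom * d_c5 > 0 else "exo"


if __name__ == "__main__":
    # Hypothetical coordinates (in Å), chosen only to illustrate the sign test.
    plane = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # C4', O4', C1'
    c5 = (0.0, 0.0, 1.0)
    print(pucker_role((0.5, 0.5, 0.5), c5, plane))   # endo (e.g. a C3'-endo pucker)
    print(pucker_role((0.5, 0.5, -0.3), c5, plane))  # exo
```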
  12. Sugar Puckering (cont’d)
      • Our implementation of situational modeling ensures that each object is
        represented by a single entity throughout its lifetime. This avoids creating
        multiple distinct instances of the same object, each with different
        attributes, for every spatio-temporal context.
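Situational modeling keeps one entity per atom and attaches context-specific qualities and roles to it, so the entity itself is never duplicated per model. The following is a minimal sketch of that idea. The atom name follows the deck's object naming convention, but the atom itself, the role labels EndoRole/ExoRole and the per-model assignments are hypothetical illustration data, not RKB content.

```python
from collections import defaultdict

# One atom entity: its name carries no model number, so it is shared by all models.
ATOM = "1B36_cA_r5_aC3'"  # hypothetical atom

# Hypothetical context-specific assertions: (bearer, role type, model context).
ROLE_ASSERTIONS = [
    (ATOM, "EndoRole", "1B36_m1"),
    (ATOM, "ExoRole", "1B36_m2"),
]


def roles_by_context(assertions):
    """Index role types by (bearer, context); the bearer entity is never copied."""
    index = defaultdict(set)
    for bearer, role, context in assertions:
        index[(bearer, context)].add(role)
    return dict(index)


if __name__ == "__main__":
    for (bearer, context), roles in sorted(roles_by_context(ROLE_ASSERTIONS).items()):
        print(bearer, context, sorted(roles))
```

The same atom can hold different roles in different models, and a query selects the context rather than choosing among several copies of the atom.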
  13. RKB is SPARQL accessible
      • SPARQL is a query language for RDF graphs
      • The instantiated ontology was loaded into Virtuoso 6
      • SPARQL endpoint:
          – http://codemonkey.dumontierlab.com/sparql/
      • Specify graphs to restrict the search:
          – http://semanticscience.org/rkb/mcannotate/pdb/dna
          – http://semanticscience.org/rkb/mcannotate/pdb/rna
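The SPARQL 1.1 Protocol sends a query as the `query` parameter of an HTTP request. The protocol can also restrict the dataset with one or more `default-graph-uri` parameters, which is one way to limit a search to the RNA or DNA graph listed above. The following sketch only builds such a GET URL. It assumes the endpoint follows the standard protocol parameters and does not contact the server, which may no longer be online. The function name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint and graph IRIs as listed on the slide; availability is not guaranteed.
ENDPOINT = "http://codemonkey.dumontierlab.com/sparql/"
DNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/dna"
RNA_GRAPH = "http://semanticscience.org/rkb/mcannotate/pdb/rna"


def build_query_url(query: str, graphs: list[str], endpoint: str = ENDPOINT) -> str:
    """Build a SPARQL 1.1 Protocol GET URL; each graph is passed as a default-graph-uri."""
    params = [("query", query)] + [("default-graph-uri", g) for g in graphs]
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 10", [RNA_GRAPH])
    print(url)
    # Round-trip check: the graph restriction survives URL encoding.
    print(parse_qs(urlparse(url).query)["default-graph-uri"])
```

The URL could then be fetched with `urllib.request.urlopen`, typically with an `Accept: application/sparql-results+json` header to request JSON results.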
  14. Query 1: Find all face interactions (model 1 of PDB:1B36)

        PREFIX ss:  <http://semanticscience.org/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?faceInteraction
        WHERE {
          ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
          ?pair ss:hasProperPart ?faceInteraction .
          ?faceInteraction rdf:type ss:FaceInteraction .
        }

      Nucleotide base pairs are composed of one or more face interactions. Where
      these are known, as in the MC-Annotate results, the query retrieves all 18
      instances that satisfy it.
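To make the semantics of Query 1 concrete, the following toy sketch evaluates its three triple patterns as a basic graph pattern: each pattern is matched against every triple, and compatible variable bindings are joined left to right. It assumes prefixed names are plain strings and that terms starting with "?" are variables. The data (bp1, fi1, and so on) is made up, and the sketch says nothing about how Virtuoso actually evaluates queries.

```python
def match(pattern, triple, binding):
    """Unify one triple pattern with a triple; return the extended binding or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant term does not match
    return result


def evaluate(patterns, triples):
    """Evaluate a basic graph pattern by joining pattern matches left to right."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


MODEL = "pdb:1B36_m1"
TRIPLES = [  # hypothetical data: two face interactions in model 1, one in model 2
    ("bp1", "ss:isProperPartOf", MODEL),
    ("bp1", "ss:hasProperPart", "fi1"),
    ("bp1", "ss:hasProperPart", "fi2"),
    ("fi1", "rdf:type", "ss:FaceInteraction"),
    ("fi2", "rdf:type", "ss:FaceInteraction"),
    ("bp2", "ss:isProperPartOf", "pdb:1B36_m2"),
    ("bp2", "ss:hasProperPart", "fi3"),
    ("fi3", "rdf:type", "ss:FaceInteraction"),
]
QUERY_1 = [
    ("?pair", "ss:isProperPartOf", MODEL),
    ("?pair", "ss:hasProperPart", "?faceInteraction"),
    ("?faceInteraction", "rdf:type", "ss:FaceInteraction"),
]

if __name__ == "__main__":
    # A set mimics SELECT DISTINCT.
    print(sorted({b["?faceInteraction"] for b in evaluate(QUERY_1, TRIPLES)}))  # ['fi1', 'fi2']
```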
  15. See results: http://tinyurl.com/porxdb
  16. Query 2: Find all C8-mediated base pairs (model 1 of PDB:1B36)

        PREFIX ss:  <http://semanticscience.org/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?faceInteraction ?residue ?hasC8Face
        WHERE {
          ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
          ?pair ss:hasProperPart ?faceInteraction .
          ?faceInteraction rdf:type ss:FaceInteraction .
          ?hasC8Face ss:isAgentIn ?faceInteraction .
          ?hasC8Face rdf:type ss:C8Face .
          ?residue ss:hasQuality ?hasC8Face .
        }

      Results: http://tinyurl.com/r7b5e4
      Face interactions are mediated by the faces of bases. Nucleotides are related
      to their face qualities by the hasQuality relation, whereas faces are agents in
      the face interaction and are related to it by the hasAgent relation (queried
      here in the isAgentIn direction).
  17. Query 3: Find base pairs involving a GMP sugar-sugar face (model 1 of PDB:1B36)

        PREFIX ss:  <http://semanticscience.org/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?faceInteraction ?residue ?hasSSFace
        WHERE {
          ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
          ?pair ss:hasProperPart ?faceInteraction .
          ?faceInteraction rdf:type ss:FaceInteraction .
          ?hasSSFace rdf:type ss:SugarSugarFace .
          ?hasSSFace ss:isAgentIn ?faceInteraction .
          ?residue ss:hasQuality ?hasSSFace .
          ?residue rdf:type ss:GMP .
        }

      Results: http://tinyurl.com/qpup8z
      This query builds on Query 2: it requires an Ss face to belong to a GMP that
      participates in a base pair. In this structure, two GMPs are found whose Ss
      face participates in base pairs with other nucleotides.
  18. Query 4: Find Hoogsteen – O2’ face interactions (model 1 of PDB:1B36)

        PREFIX ss:  <http://semanticscience.org/>
        PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?faceInteraction ?residue1 ?residue2 ?hasHhFace ?hasO2pFace
        WHERE {
          ?pair ss:isProperPartOf <http://semanticscience.org/pdb/1B36_m1> .
          ?pair ss:hasProperPart ?faceInteraction .
          ?faceInteraction rdf:type ss:FaceInteraction .
          ?hasHhFace rdf:type ss:HoogsteenHoogsteenFace .
          ?hasHhFace ss:isAgentIn ?faceInteraction .
          ?hasO2pFace rdf:type ss:O2pFace .
          ?hasO2pFace ss:isAgentIn ?faceInteraction .
          ?residue1 ss:hasQuality ?hasHhFace .
          ?residue2 ss:hasQuality ?hasO2pFace .
        }

      Results: http://tinyurl.com/oo4fp8
      The LW+ nomenclature describes base interactions in more detail than LW. The
      result of this query describes a single base pair in this structure.
  19. Future Directions
      • Specify the Saenger nomenclature
      • Map the output of other structural annotators (e.g. 3DNA)
      • Extend structural knowledge with the six backbone torsion angles
          – range restrictions on classes
      • SWRL/DL-safe rules or SPARQL queries are required to specify cyclic motifs
      • Publish as part of the Bio2RDF network
  20. RKB Availability
      • Creative Commons License
      • Google Code Project:
          – http://semanticscience.org
      • Instructions: http://code.google.com/p/semanticscience/wiki/RKBDownload
  21. References
      • Dumontier, M., et al. (2009). RKB: A Semantic Web Knowledge Base for RNA.
        Accepted in Bio-Ontologies 2009, Stockholm, Sweden.
      • Smith, B., et al. (2005). Relations in biomedical ontologies. Genome Biol,
        6(5): R46.
      • Leontis, N. B. and E. Westhof (2001). Geometric nomenclature and
        classification of RNA base pairs. RNA, 7(4): 499-512.
      • Lemieux, S. and F. Major (2002). RNA canonical and non-canonical base pairing
        types: a recognition method and complete repertoire. Nucleic Acids Res,
        30(19): 4250-4263.
      • Major, F. and P. Thibault (2005). Computer Modeling of RNA Three-Dimensional
        Structures. In R. A. Meyers (ed.), Encyclopedia of Molecular Cell Biology and
        Molecular Medicine. Wiley-VCH Verlag GmbH & Co., Weinheim, pp. 605-636.
