Representing sequences of parts in processes using OWL

Published in: Technology, Health & Medicine

  • Processes are those sorts of things that necessarily involve temporal extent. They *happen*, rather than *existing*. Examples are the development of organisms over time; an organism’s life; specific phases in life such as pregnancy; and biological processes such as transcription. Processes have a special dependency relationship to their participants, but are not identical to them. The Gene Ontology ‘biological process’ ontology consists of more than 17 000 terms (by far the largest portion of the GO overall).
  • Many or most of the interesting processes described in biology consist of sub-processes which form a part of the overall process. The sub-processes are usually ordered with respect to time. They may be repeated in the same sequence. Processes are often illustrated diagrammatically, as in the familiar biochemical pathway diagrams. For our purposes in this paper we will, however, ignore the complications that cycles in such pathway illustrations pose for representation, since we are primarily interested here in classification based on straightforwardly linear sequences of parts of processes. Temporal sequences of process parts are rarely represented in bio-ontologies.
  • Note that developmental anatomy ontologies have a rather more complicated and necessary relationship to temporal sequences than do straightforward process hierarchies, since in developmental anatomy ontologies, the physical entity being described doesn’t exist in certain parts of the temporal hierarchy.
  • Initiation begins with an RNA polymerase enzyme binding to a region on a DNA double strand, which depends on the right pre-conditions being met. First, the promoter sequence of the region to be transcribed needs to be accessible. Then, relevant proteins called transcription factors need to recognise the specific promoter. When the specific transcription factors are bound to the promoter, the RNA polymerase can bind. This forms the transcription initiation complex.
    Elongation can be summarized in the following series of sub-processes:
    a. RNA nucleotide monomers are paired with complementary DNA bases and added to the 3' end of the new RNA macromolecule being synthesized. A sugar-phosphate backbone forms with assistance from RNA polymerase. As the double strand unwinds, 10 to 20 nucleotides are exposed to the enzyme for base pairing and elongation of the RNA.
    b. The rate of polymerisation is about 60 nucleotides per second in eukaryotes. Furthermore, multiple molecules of RNA polymerase can simultaneously transcribe the same DNA strand, following each other like a truck convoy.
    c. If the cell has a nucleus, the RNA is further processed (addition of a 3' poly-A tail and a 5' cap) and exits to the cytoplasm through the nuclear pore complex.
    In eukaryotic cells, when the polymerase encounters the termination signal (a specific sequence on the DNA), it continues transcribing for hundreds of nucleotides past the termination signal, but at a point about 10 to 35 nucleotides past the signal (AAUAAA sequence in the pre-RNA), the mRNA is cut free from the enzyme. Subsequently, if the cell has a nucleus, the mRNA is further processed by the addition of a 3' poly-A tail and a 5' cap, and exits to the cytoplasm through the nuclear pore complex. By contrast, in prokaryotes, transcription stops right at the end of the termination signal and the RNA and DNA are released.
  • We give a brief sketch of the more straightforward aspects of our model, before going into detail on the more problematic areas in the next section. Firstly, we model the biological entities described above as material entities (DNA, mRNA, cell, cell nucleus, etc.), which are inherited from BioTop. The various macromolecular complexes involved in transcription are included as well, such as the TranscriptionInitiationComplex.
  • The transcription process is modelled together with its parts (sub-processes), i.e. initiation, elongation and termination, using the transitive precededBy relation to indicate the temporal sequence of process parts, as follows:
  • Transcription subClassOf Process
        and (hasProcessualPart some TranscriptionInitiation)
        and (hasProcessualPart some TranscriptionElongation)
        and (hasProcessualPart some TranscriptionTermination)
    TranscriptionInitiation subClassOf (Process and processualPartOf some Transcription)
    TranscriptionElongation subClassOf (Process and processualPartOf some Transcription)
    TranscriptionTermination subClassOf (Process and processualPartOf some Transcription)
    TranscriptionElongation subClassOf precededBy some TranscriptionInitiation
    TranscriptionTermination subClassOf precededBy some TranscriptionElongation
    EukaryoticTranscription subClassOf Transcription
    EukaryoticTranscription subClassOf rateOfTranscription value "60"^^int
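As a rough illustration of what the axioms above make directly answerable, the sketch below holds the part-of and ordering axioms as plain triples and performs two simple lookups (which classes are processual parts of Transcription, and what precedes TranscriptionElongation). This is a lookup table in Python, not an OWL reasoner; the helper names `subjects_of` and `fillers_of` are hypothetical and introduced only for this example.

```python
# Class-level axioms from the model, held as (subject, property, filler) triples.
# Assumption: each triple abbreviates "subject subClassOf (property some filler)".
AXIOMS = [
    ("TranscriptionInitiation", "processualPartOf", "Transcription"),
    ("TranscriptionElongation", "processualPartOf", "Transcription"),
    ("TranscriptionTermination", "processualPartOf", "Transcription"),
    ("TranscriptionElongation", "precededBy", "TranscriptionInitiation"),
    ("TranscriptionTermination", "precededBy", "TranscriptionElongation"),
]


def subjects_of(prop, filler):
    """Classes asserted to stand in `prop` to some instance of `filler`."""
    return sorted(s for s, p, o in AXIOMS if p == prop and o == filler)


def fillers_of(subject, prop):
    """Classes that `subject` is asserted to stand in `prop` to."""
    return sorted(o for s, p, o in AXIOMS if s == subject and p == prop)


# What are the steps which comprise a transcription process?
parts = subjects_of("processualPartOf", "Transcription")

# What directly precedes transcription elongation (as asserted)?
before_elongation = fillers_of("TranscriptionElongation", "precededBy")
```

Only asserted axioms are consulted here; anything that follows from transitivity or from the class hierarchy would need a reasoner.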
  • The first question could be addressed using SPARQL-DL querying [6], in which ontology information is collapsed into a graph and can be queried in a similar fashion to RDF data with SPARQL. However, a query that retrieves the sequence of sub-processes would have to assume a maximum number of possible sub-processes, which is not very intuitive.
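To make the limitation concrete: a fixed query pattern needs one variable per step, whereas a procedural walk over the direct precededBy links does not need to know the number of steps in advance. The sketch below is such a walk in plain Python, offered as a contrast to the query approach and not as a SPARQL-DL feature. It assumes a single linear chain without cycles; `order_steps` and the step names are hypothetical.

```python
def order_steps(parts, preceded_by):
    """Return the parts of one process in temporal order.

    parts: iterable of step names belonging to the process.
    preceded_by: dict mapping a step to the step that directly precedes it.
    Assumption: the steps form one linear chain with no cycles.
    """
    parts = set(parts)
    # The first step is the only one without a predecessor inside this process.
    firsts = [p for p in parts if preceded_by.get(p) not in parts]
    if len(firsts) != 1:
        raise ValueError("expected exactly one starting step")
    # Invert the direct links so the chain can be followed forwards.
    successor = {
        pred: step
        for step, pred in preceded_by.items()
        if step in parts and pred in parts
    }
    ordered = [firsts[0]]
    while ordered[-1] in successor:
        ordered.append(successor[ordered[-1]])
    if len(ordered) != len(parts):
        raise ValueError("steps do not form one linear chain")
    return ordered


sequence = order_steps(
    ["termination1", "initiation1", "elongation1"],
    {"elongation1": "initiation1", "termination1": "elongation1"},
)
```

The same function handles a chain of any length, which is exactly what the fixed three-variable query pattern cannot do.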
  • The second question can potentially be answered if it is reformulated as ‘How many RNA polymerases can bind a promoter?’ and the relevant cardinality restriction is captured somewhere in the ontology. Additional logic is then needed to translate the ‘Can multiple …?’ query into a yes or no answer, based on whether the answer to the ‘How many …?’ question is greater than one. The third question is beyond the scope of what OWL knowledge bases can cope with: complex processing is required to formulate the relevant mathematical expression and compute the solution from the rate of transcription which is modelled in the ontology.
  • We would also like the ontology to be able to perform correct instance classification. In particular, we can try to classify completed transcription processes, in which the various sub-processes have executed in the sequence specified. This is more complex than the above query answering. To see this, we create the following instances in the ontology, each of which, aside from the first, represents a different negative example (an instance we would not expect to see classified): This is a form of error detection as well as
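The extra logic described above is small once the values have been read out of the ontology. The sketch below shows both steps under stated assumptions: that a maximum cardinality for polymerase binding is available as a number, that the timing question concerns transcription at the modelled rate of 60 nucleotides per second, and that a codon is three nucleotides. The function names are hypothetical.

```python
NUCLEOTIDES_PER_CODON = 3


def can_multiple_bind(max_cardinality):
    """Turn a 'how many' answer into a yes/no answer to 'can multiple ...?'."""
    return max_cardinality > 1


def transcription_seconds(orf_codons, rate_nt_per_s=60):
    """Time to transcribe an ORF of the given length at a constant rate.

    Assumption: the rate is the rateOfTranscription value from the ontology,
    interpreted as nucleotides per second.
    """
    return orf_codons * NUCLEOTIDES_PER_CODON / rate_nt_per_s


answer = can_multiple_bind(1)              # a 'max 1' restriction gives False
seconds = transcription_seconds(10_000)    # 30 000 nt / 60 nt per s = 500.0
```

Both calculations live outside the ontology: OWL supplies the numbers, and a separate program does the comparison and the arithmetic.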
  • transcription1: contains subprocesses initiation1, elongation1, and termination1. elongation1 precededBy initiation1, and termination1 precededBy elongation1. transcription2: contains subprocesses initiation2 and elongation2, but no other subprocesses.
  • transcription3: contains subprocesses initiation3, elongation3, and termination3, but they are in the incorrect sequence (initiation precededBy termination). transcription4: contains subprocesses initiation4, elongation4 and termination4, but relates them to the subprocesses of the previous instance: elongation4 precededBy initiation3, and termination4 precededBy termination3.
  • transcription5: contains two different copies of each of the subprocesses initiation, elongation and termination.
  • With this definition, executing HermiT for classification, we obtain the following instance members: transcription1, -3, -4 and -5 (i.e. only transcription2 failed to be classified as an instance). Clearly we must do better. Our next attempt uses an exact cardinality constraint to strengthen the requirement:
  • However, reasoning with this definition finds no instances as members, indicating that the cardinality constraint is not met. This may be due to the open world assumption underlying OWL reasoning: although we have stated that our instances have only one of the relevant sub-processes as parts, nevertheless in all possible models nothing prevents additional sub-process parts being included.
  • In another attempt to address some of the issues in correct classification, we can try to enforce that the sub-processes forming the sequence are all part of the same overall transcription process. We can attempt this using the special Self keyword in OWL (see the slide ‘Does the ‘Self’ keyword help?’ for the definition). Sadly, we found that reasoning with this construction, although it was syntactically accepted, did not yield the desired result (due to the reasoner implementation), and no instances were classified.
  • Is the precededBy relation transitive? If (elongation precededBy initiation) and (termination precededBy elongation), then transitivity gives (termination precededBy initiation). But within one organism, two transcription processes may be temporally ordered such that transcription2 precededBy transcription1, and then we have initiation2 precededBy termination1.
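For comparison, the intended outcome for the five test individuals can be written as a closed-world check outside OWL. The sketch below is not a reproduction of the HermiT results and is not OWL reasoning; it treats the asserted facts as complete, which is precisely what the open world assumption does not allow. The assertions for transcription3 are an assumption (only the one link mentioned in the description is asserted), the ordering within each copy in transcription5 is assumed, and `is_completed` is a hypothetical helper.

```python
# part name -> (kind, parent process), following the test individuals described above.
PARTS = {
    "initiation1": ("initiation", "transcription1"),
    "elongation1": ("elongation", "transcription1"),
    "termination1": ("termination", "transcription1"),
    "initiation2": ("initiation", "transcription2"),
    "elongation2": ("elongation", "transcription2"),
    "initiation3": ("initiation", "transcription3"),
    "elongation3": ("elongation", "transcription3"),
    "termination3": ("termination", "transcription3"),
    "initiation4": ("initiation", "transcription4"),
    "elongation4": ("elongation", "transcription4"),
    "termination4": ("termination", "transcription4"),
    "initiation5a": ("initiation", "transcription5"),
    "elongation5a": ("elongation", "transcription5"),
    "termination5a": ("termination", "transcription5"),
    "initiation5b": ("initiation", "transcription5"),
    "elongation5b": ("elongation", "transcription5"),
    "termination5b": ("termination", "transcription5"),
}

# (step, predecessor) pairs; transcription3 carries only the one stated (wrong) link.
PRECEDED_BY = {
    ("elongation1", "initiation1"), ("termination1", "elongation1"),
    ("elongation2", "initiation2"),
    ("initiation3", "termination3"),
    ("elongation4", "initiation3"), ("termination4", "termination3"),
    ("elongation5a", "initiation5a"), ("termination5a", "elongation5a"),
    ("elongation5b", "initiation5b"), ("termination5b", "elongation5b"),
}

PROCESSES = ["transcription1", "transcription2", "transcription3",
             "transcription4", "transcription5"]


def parts_of(process, kind):
    """Parts of the given kind whose parent is this process."""
    return [p for p, (k, parent) in PARTS.items() if k == kind and parent == process]


def is_completed(process):
    """Closed-world test: exactly one part of each kind, ordered within the process."""
    inits = parts_of(process, "initiation")
    elongs = parts_of(process, "elongation")
    terms = parts_of(process, "termination")
    if not (len(inits) == len(elongs) == len(terms) == 1):
        return False
    i, e, t = inits[0], elongs[0], terms[0]
    ordered = (e, i) in PRECEDED_BY and (t, e) in PRECEDED_BY
    backwards = any(pair in PRECEDED_BY for pair in [(i, e), (i, t), (e, t)])
    return ordered and not backwards


completed = [p for p in PROCESSES if is_completed(p)]
```

Under these assumptions only transcription1 passes, which matches the expectations stated for the test individuals. The check works because it counts asserted parts as the only parts, a step that an OWL reasoner is not entitled to take.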
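The consequence of transitivity can be checked mechanically. The sketch below computes the transitive closure of the asserted precededBy pairs for two consecutive transcription processes, including the cross-process link initiation2 precededBy termination1 from the example above. The individual names follow the example; `transitive_closure` is a hypothetical helper written for this illustration.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing the given (step, predecessor) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a precededBy b and b precededBy d  =>  a precededBy d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


asserted = {
    ("elongation1", "initiation1"), ("termination1", "elongation1"),
    ("elongation2", "initiation2"), ("termination2", "elongation2"),
    # Follows from transcription2 precededBy transcription1.
    ("initiation2", "termination1"),
}

closure = transitive_closure(asserted)
crosses_processes = ("elongation2", "initiation1") in closure
```

Here elongation2 ends up precededBy initiation1, so a condition such as ‘precededBy some TranscriptionInitiation’ can be satisfied by a part of a different transcription process. That is the difficulty the question points at.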

    1. Representing the sequence of parts of processes using OWL. Janna Hastings, Samy Deghou, Christoph Steinbeck (EBI Cheminformatics and Metabolism); Stefan Schulz (Medical University of Graz, Austria). Deep Knowledge Representation Challenge Workshop, Banff, Alberta, Canada, 26 June 2011
    2. Processes happen in time
        • GO ‘biological process’ ontology: > 17 000 terms
        • Diagram labels: Mary; birth; Mary’s life; childhood; adulthood
        • Slide footer: 30.06.11 Processes and their parts – DKRC, Banff, Alberta
    3. Processes consist of sub-processes
        • Diagram labels: EATING; biting, chewing, swallowing, digesting; temporal sequence; biochemical pathways
    4. Temporal sequence in bio-ontologies
        • Gene Ontology: describes parts of processes, but contains no information as to their sequence
        • Fly development: starts_at_end_of relation; annotation: temporal ordering number
        • Xenopus anatomy and development: start_stage, end_stage, and develops_from relations
        • Human anatomy and development: starts_at and ends_at
    5. TRANSCRIPTION
    6. Diagram labels: material entity; DNA; mRNA; cell nucleus; cell; transcription initiation complex
    7. (no slide text)
    8. Diagram labels: process; transcription; transcription initiation; transcription elongation; transcription termination
    9. (no slide text)
    10. Composition and ordering of process parts
        • Diagram labels: transcription; transcription initiation, transcription elongation, transcription termination; precededBy; processualPartOf; eukaryotic transcription (rateOfTranscription = 60)
    11. Asking questions of the ontology
        • What are the steps which comprise a transcription process?
          ? processualPartOf some Transcription
        • What precedes transcription elongation?
          ? inv(precededBy) some TranscriptionElongation
    12. More difficult...
        • What is the sequence of steps involved in a transcription process?
        • Can multiple RNA polymerases bind a promoter at the same time?
        • How long would it take for a eukaryotic cell to translate an RNA whose ORF length is about 10 000 codons?
    13. SPARQL-DL to query sequences of steps
        • What is the sequence of steps involved in a transcription process?
          SELECT ?first, ?second, ?third
          WHERE ?second precededBy ?first,
                ?third precededBy ?second,
                ?first processualPartOf ?trans,
                ?second processualPartOf ?trans,
                ?third processualPartOf ?trans.
        • Number of steps must be known!
    14. Counting and cardinality constraints
        • Can multiple RNA polymerases bind a promoter at the same time?
          promoter subClassOf bindsTo max 1 RNA polymerase
        • How long would it take for a eukaryotic cell to translate an RNA whose ORF length is about 10 000 codons?
          Needs a numeric equation solver
    15. Classification of instances
        • Can our ontology correctly classify instances of completed transcription processes
        • ... and not classify erroneous examples?
    16. Test individuals
        • Transcription1 should be classified as completed transcription.
        • We expect transcription2 will not be classified as a completed transcription process, although it may be classified as a transcription in progress.
        • Diagram labels: transcription1 (initiation1, elongation1, termination1; precededBy; processualPartOf); transcription2 (initiation2, elongation2; precededBy; processualPartOf)
    17. Test individuals
        • We expect this will not be classified as an instance of a completed transcription.
        • We expect this will not be classified as an instance of a completed transcription.
        • Diagram labels: transcription3 (initiation3, elongation3, termination3; precededBy; processualPartOf); transcription4 (initiation4, elongation4, termination4; precededBy; processualPartOf; initiation3, elongation3)
    18. Test individuals
        • We would expect to see this instance not classified as a completed transcription since it violates the ‘exactly 1’ cardinality constraint.
        • Diagram labels: transcription5 (initiation5a, elongation5a, termination5a; initiation5b, elongation5b, termination5b; precededBy; processualPartOf)
    19. Fully defined ‘Completed Transcription’
        CompletedTranscription equivalentTo
          Transcription
          and (hasProcessualComponent some TranscriptionInitiation)
          and (hasProcessualComponent some (TranscriptionElongation and precededBy some TranscriptionInitiation))
          and (hasProcessualComponent some (TranscriptionTermination and precededBy some TranscriptionElongation))
        • Diagram labels: trans1, trans2, trans3, trans4, trans5
    20. Completed Transcription, take 2
          Transcription
          and (hasProcessualComponent exactly 1 (TranscriptionElongation and precededBy some TranscriptionInitiation))
          and (hasProcessualComponent exactly 1 TranscriptionInitiation)
          and (hasProcessualComponent exactly 1 (TranscriptionTermination and precededBy some TranscriptionElongation))
        • Diagram labels: trans1, trans2, trans3, trans4, trans5
    21. Does the ‘Self’ keyword help?
          (hasProcessualComponent some TranscriptionInitiation)
          and (hasProcessualComponent some
            (TranscriptionElongation
              and (precededBy some
                (TranscriptionInitiation
                  and (processualComponentOf some Self)))))
          and (hasProcessualComponent some
            (TranscriptionTermination
              and (precededBy some
                (TranscriptionElongation
                  and (processualComponentOf some Self)))))
        • Diagram labels: trans1, trans2, trans3, trans4, trans5
    22. Some additional challenges
        • Multiple processes of the same sort occurring in sequence
        • Nesting of process parts
        • Diagram labels: transcription1 (initiation1, elongation1, termination1; processualPartOf); transcription2; precededBy; initiation (promoter binding, ...); elongation (base pairing, unwinding, ...); termination
    23. Sequence of components of an emotion? The Emotion Ontology, Hastings et al., ICBO 2011
    24. Thank you
