
Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity



Published in: Technology, Education


  1. 1. Integrating Large, Disparate, Biomedical Ontologies to Boost Organ Development Network Connectivity. Chimezie Ogbuji (1) and Rong Xu (2). (1) Metacognition LLC; (2) Case Western Reserve University
  2. 2. Outline ◦ Background ◦ Motivation ◦ Literature review / related work ◦ Opportunity / specific example ◦ Hypothesis ◦ Method ◦ Evaluation ◦ Discussion
  3. 3. Background Controlled biomedical vocabulary systems (and ontologies) play a key role in the analysis of genetic disease ◦ Structured, interoperable, and machine-readable ◦ Facilitate reproducibility of scientific results and use of intelligent software that can leverage underlying meaning ◦ Scientific results and the structured biomedical knowledge they are based on may be used for multiple - even unanticipated - purposes
  4. 4. Motivation ◦ Want descriptive relations that comprise terminology paths between (congenital) diseases and the anatomical entities that become malformed ◦ Want to use these as the basis for analysis and classification of congenital disorders according to their underlying molecular mechanism
  5. 5. Opportunity The Gene Ontology (GO) is arguably the most prominent example of how highly organized and structured medical knowledge can be leveraged to facilitate medical genetics ◦ Has a hierarchy of biological processes involving organ development. The Foundational Model of Anatomy (FMA) is a vast ontology whose objective is to conceptualize the physical objects and spaces that constitute the human body ◦ Covers macroscopic, microscopic, and sub-cellular canonical anatomy.
  6. 6. Opportunity (continued) Their skeletal relations (is_a, part_of, and has_part) have the same meaning. However, there are no immediately usable terminology paths between concepts in the GO's anatomy development process hierarchy and the participating anatomical entities defined in the FMA.
  7. 7. Literature review ◦ Cellular components function via interaction with each other in a highly complex and interconnected network ◦ Interdependencies among a cell's molecular components lead to functional, molecular, and causal relationships among distinct phenotypes ◦ Network-based approaches to disease have the potential to provide a framework for classifying disease, defining susceptibility, predicting disease outcome, and identifying tailored therapeutic strategies. Barabási et al. Network Medicine: A Network-based Approach to Human Disease, Nature Reviews Genetics 2011.
  8. 8. For over a decade, analysis of biological networks via network and graph theory has revealed the importance of locally dense and well-connected subgraphs (hubs). Schwikowski et al. A network of protein-protein interactions in yeast, 2000; Barabási et al. 2011
  9. 9. Related work ◦ Investigation of structural and lexical concordance between anatomy terms in the FMA and SNOMED-CT (Bodenreider & Zhang 2006) ◦ Leveraging this concordance for integrating modules from each for a specific domain (Ogbuji et al. 2010) ◦ Discussion of logical consequences of using part_of between both anatomical entities (in the FMA) and biological processes (the GO) (Jimenez-Ruiz et al. 2010)
  10. 10. Opportunity: Cardiovascular disease and development ◦ Understanding the formation of the heart is critical to the understanding of cardiovascular diseases ◦ The study of genes and gene products involved in cardiovascular development is an important research area ◦ There have been recent efforts to expand the subset of the GO's anatomy development hierarchy involved in heart development
  11. 11. Marfan Syndrome (MFS) […] mainly characterized by aneurysm formation in the proximal ascending aorta, leading to aortic dissection or rupture at a young age when left untreated. The identification of the underlying genetic cause of MFS, namely mutations in the fibrillin-1 gene (FBN1), has further enhanced [...] insights into the complex pathophysiology of aneurysm formation. In UMLS Metathesaurus: • Finding site: connective tissue structure (SNOMED-CT) • Category: congenital skeletal disorder (CRISP Thesaurus and NLM MTH)
  12. 12. Marfan Syndrome example In the GO, FBN1 is annotated with the GO_0001501 (skeletal system development) and GO_0007507 (heart development) concepts (amongst others). The former coincides with the more common finding site and classification of MFS as a congenital skeletal disorder. This is in spite of the fact that associations (causal and otherwise) between MFS and cardiovascular diseases such as aortic root dilation are well-documented in the medical literature.
  13. 13. Hypothesis A high-quality integration of the GO's development process hierarchy with the FMA will have several benefits: ◦ New biological pathways from genetic diseases to the anatomical entities whose development is involved in their underlying molecular mechanisms ◦ Graph and network analysis can benefit from an increase in connectivity for discovering biologically meaningful motifs ◦ Similarly, classification algorithms can also take advantage of this
  14. 14. Copper: annotates human gene; Gold: does not annotate human gene
  15. 15. Method and materials Integration is performed on the following GO development process hierarchies: ◦ Anatomical structure development ◦ Anatomical structure arrangement ◦ Anatomical structure morphogenesis. Only GO concepts that annotate human genes are considered. In processing the GO, the logical properties (transitivity, for example) of the relations are fully considered ◦ This will always be the case, henceforth
  16. 16. Method and materials (continued) ◦ The FMA ontology is loaded (as OWL/RDF) into a triple store for remote querying via SPARQL ◦ The prefix of the human-readable label for each GO concept in the development hierarchies is stemmed and used as a basis for case-insensitive, lexical matching on primary labels and exact synonyms of FMA classes via a SPARQL query ◦ FMA classes that match exactly are considered to denote the anatomical entities that participate in the corresponding GO biological process
  17. 17. Example: GO_0007507 (heart development). Prefix: heart. Matching FMA concept: FMA_7088 (Heart)
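
The matching step on the two preceding slides can be sketched in plain Python. This is only an illustration under stated assumptions: the slides run the match as a SPARQL query against a triple store and stem the prefix, whereas this sketch uses in-memory dictionaries and skips stemming. The function names (`label_prefix`, `match_fma`), the suffix list, and the input layout are hypothetical; only the heart example comes from the slides.

```python
# Assumed suffixes, taken from the names of the three GO hierarchies on the slides.
PROCESS_SUFFIXES = ("development", "morphogenesis", "arrangement")

def label_prefix(go_label):
    """Return the anatomical prefix of a GO process label, or None if there is none."""
    words = go_label.strip().lower().split()
    if len(words) > 1 and words[-1] in PROCESS_SUFFIXES:
        return " ".join(words[:-1])
    return None

def match_fma(go_terms, fma_names):
    """Pair GO ids with FMA ids whose label or exact synonym equals the GO prefix.

    go_terms:  {go_id: label}
    fma_names: {fma_id: [primary label, exact synonyms...]}
    """
    # Case-insensitive index from FMA name to the FMA ids carrying that name.
    index = {}
    for fma_id, names in fma_names.items():
        for name in names:
            index.setdefault(name.strip().lower(), set()).add(fma_id)
    pairs = set()
    for go_id, label in go_terms.items():
        for fma_id in index.get(label_prefix(label), ()):
            pairs.add((go_id, fma_id))
    return pairs

# The example from the slide above.
go_terms = {"GO_0007507": "heart development"}
fma_names = {"FMA_7088": ["Heart"]}
pairs = match_fma(go_terms, fma_names)
```

Exact, case-insensitive equality keeps the sketch conservative; a real run would also need the stemming step the slides mention.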
  18. 18. Evaluation Result: 1644 development process and anatomical entity pairs. We calculate the Jaccard coefficient of the overlap between hierarchies for 6 major organs and the anatomical development processes they participate in.
  19. 19. Evaluation (continued) Using the GO development process for some FMA organ O as the starting point, the set of all subordinate terms is calculated: GOsubgraph(O) Example: ◦ GO_0007507 (heart development) has GO_0003170 (heart valve development) as a component (via has_part) ◦ GO_0003170 subsumes GO_0003176 (aortic valve development) and has GO_0003179 (heart valve morphogenesis) as a component ◦ Each of these would be considered as subordinates of GO_0007507
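
As a rough sketch of how a set like GOsubgraph(O) could be computed, the following breadth-first traversal collects every term reachable from a starting term. It assumes the ontology has already been reduced to a mapping from each term to its direct subordinates (subclasses and has_part components); the function name `subgraph` and that mapping are hypothetical, and the edges are only those named in the example above.

```python
from collections import deque

def subgraph(root, edges):
    """Return all terms reachable from root via subordinate edges (root excluded)."""
    seen = set()
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in edges.get(term, ()):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Edges from the slide's example: term -> direct subordinates.
go_edges = {
    "GO_0007507": ["GO_0003170"],                # has_part: heart valve development
    "GO_0003170": ["GO_0003176", "GO_0003179"],  # subsumes / has_part
}
go_subgraph_heart = subgraph("GO_0007507", go_edges)
```

Following edges until nothing new is found is what treating the relations as transitive amounts to here.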
  20. 20. Evaluation (continued) In a similar fashion, the subordinate anatomical entities for each O amongst the 6 chosen organs are calculated: FMAsubgraph(O). For each O, we calculate the GO terms that are both in GOsubgraph(O) and were matched with an FMA class that is in FMAsubgraph(O). This resulting set of GO terms is considered the intersecting set, and the Jaccard coefficient is calculated with respect to this, FMAsubgraph(O), and GOsubgraph(O).
  21. 21. Jaccard Coefficient (overlap)
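
The slides do not spell out the formula, so the following is one plausible reading rather than the authors' exact computation: the intersecting set counts as the shared elements, and the union is the two subgraph sizes minus that intersection. The function name `jaccard_overlap` and the toy identifiers are hypothetical.

```python
def jaccard_overlap(go_subgraph, fma_subgraph, matches):
    """Jaccard coefficient of two hierarchies linked by (go_id, fma_id) matches.

    Assumption: each GO term matched into the FMA subgraph is one shared element.
    """
    intersecting = {
        go_id for go_id, fma_id in matches
        if go_id in go_subgraph and fma_id in fma_subgraph
    }
    union_size = len(go_subgraph) + len(fma_subgraph) - len(intersecting)
    return len(intersecting) / union_size if union_size else 0.0

# Toy data with made-up identifiers: only G1 is matched into the FMA subgraph.
go_sub = {"G1", "G2", "G3"}
fma_sub = {"F1", "F2", "F3", "F4"}
matches = {("G1", "F1"), ("G2", "F9")}
overlap = jaccard_overlap(go_sub, fma_sub, matches)  # 1 / (3 + 4 - 1)
```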
  22. 22. Evaluation: network connectivity We calculate the number of new paths from OMIM diseases through their genes to the anatomical entities in the FMA: ◦ P+dgo. Similarly, we calculate the number of new paths starting from the genes to additional FMA anatomical entities: ◦ P+go
  23. 23. Network connectivity: continued Only genes that are annotated with anatomical development processes matched to FMA classes and OMIM diseases associated with these genes were considered ◦ Genesdev
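
A minimal sketch of the path counting, assuming a path is a chain disease → gene → GO development process → matched FMA class. The slides do not define paths this precisely, and the reported averages suggest the actual counts also follow subordinate FMA entities, which this sketch leaves out. The function name, the input layout, and the "MFS" key are hypothetical; the FBN1 annotations and the heart match come from the slides.

```python
def count_new_paths(disease_genes, gene_processes, process_fma):
    """Count gene-to-FMA paths (P+go) and disease-to-FMA paths (P+dgo)."""
    p_go = {}
    for gene, processes in gene_processes.items():
        # One path per (process, matched FMA class) pair reachable from the gene.
        n = sum(len(process_fma.get(p, ())) for p in processes)
        if n:
            p_go[gene] = n
    p_dgo = {}
    for disease, genes in disease_genes.items():
        # Each disease-gene pairing contributes all of that gene's paths.
        n = sum(p_go.get(g, 0) for g in genes)
        if n:
            p_dgo[disease] = n
    return p_dgo, p_go

# Toy data: only the heart match from the slides is included.
disease_genes = {"MFS": ["FBN1"]}
gene_processes = {"FBN1": ["GO_0001501", "GO_0007507"]}
process_fma = {"GO_0007507": ["FMA_7088"]}
p_dgo, p_go = count_new_paths(disease_genes, gene_processes, process_fma)
```

Summing over every disease-gene pairing is where the combinatorial factor comes from when a disease has many associated genes.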
  24. 24. Number of additional P+dgo paths on a logarithmic scale
  25. 25. Histogram of the distribution of additional P+dgo paths as a whole and normalized by the number of genes associated with each disease
  26. 26. Log-scaled histogram of additional paths from Genesdev to FMA classes, only for those genes that had additional paths
  27. 27. Evaluation summary ◦ On average, the mapping introduces 9,549 additional P+dgo paths per OMIM disease ◦ On average, each Genesdev gene had 17,037 additional paths to FMA classes ◦ Caveat in normalizing the number of P+dgo paths by number of genes: paths from diseases to anatomical entities introduce a combinatorial factor of disease-gene pairings
  28. 28. Discussion Overlap results indicate little overlap between the GO hierarchies and the corresponding FMA hierarchies. This is not surprising, as both cover disparate domains within medicine and one is specific to humans while the other is not.
  29. 29. Discussion (continued) This, along with the size of the FMA as a whole and of the portions mapped to the GO hierarchies, indicates an opportunity to build on the mapping and to integrate both ontologies in a meaningful way. Connectivity results demonstrate a significant increase in biological paths from genetic diseases (and their genes) to the anatomical entities participating in the development process.
  30. 30. Discussion (continued) As these paths are at least as logically and biologically sound as the ontologies they were forged from, we expect that an appreciable number of them will be useful for analysis. To our knowledge, this is the first attempt of this kind to integrate the anatomical structural development, morphogenesis, and organization hierarchies in the GO with the FMA.
  31. 31. Limitations Regarding deductions (formal or otherwise) that follow from an integration of the FMA and GO ◦ Need to be careful to only consider annotations for humans or to have a robust way to manage the uncertainty introduced in not doing so