The Phenoscape Knowledgebase


Data integration challenge entry presentation for iEvoBio 2011.

  1. Phenoscape Knowledgebase
     Jim Balhoff, Wasila Dahdul, Hilmar Lapp, Paula Mabee, Peter Midford, Todd Vision, Monte Westerfield
  2. Phenoscape
     • Collaboration between P. Mabee (U. South Dakota), M. Westerfield (ZFIN, U. Oregon), and T. Vision (NESCent, UNC)
     • Aim: foster semantic integration of phenotype data by
       • Prototyping a database of curated, machine-interpretable evolutionary phenotypes.
       • Integrating these with mutant phenotypes from model organisms.
       • Providing reasoner-enabled semantic tools that facilitate data mining of phenotypic diversity and discovery of candidate genes for evolutionary phenotype transitions.
  3. Phenoscape Knowledgebase
     • ~4 million asserted semantic links (+ 7 million inferred)
     • 52 fish systematics publications
       • 4,700 morphological characters
       • 11,000 states + ontological descriptions (EQ)
       • 2,500 referenced taxa
     • ZFIN
       • 3,900 genes, 6,800 genotypes
       • 30,000 experimental phenotype annotations
     • 13 ontologies
       • ~80,000 terms
  4. Workflow for phenotype annotation
     • Step 1. Students: gather publications (scan hard copies, produce OCR PDFs)
     • Step 2. Students: manual entry of free-text character descriptions, matrix, taxon list, specimens, and museum numbers using Phenex
     • Step 3. Character annotation by experts: entry of phenotypes using Phenex
     • Step 4. Phenoscape Knowledgebase: OBD, data services, web application
     • 501,862 phenotypes for taxa
     • Dahdul et al., 2010, PLoS ONE
  5. Knowledgebase architecture
     • Knowledgebase User Interface: Web Application for Exploration & Mining (Ruby on Rails, JavaScript)
     • External web sites and client applications
     • Knowledgebase Data Services API (REST)
     • OBD Programming API and OBD Reasoner (Java)
     • Knowledgebase (OBD) (PostgreSQL)
     • Data sources
       • OBO Library: Teleost Taxonomy Ontology (TTO), Phenotypic Quality Ontology (PATO), Anatomy Ontologies (ZFA, TAO)
       • Homology assertions
       • Zebrafish Model Organism Database: genes & genotypes, mutant EQ phenotypes
       • Phenex: evolutionary EQ phenotypes (through annotation), NeXML
       • Skeletal character data (from phylogenetic treatments in literature), entered in Phenex (evolutionary EQ annotation)
  6. OBD: Ontology-Based Database
     • Stores data and ontologies in a combined, triple-based semantic model
     • Reasoner executes inference rules as SQL queries; results are iteratively added to the database
     • Supports class expressions based on property restrictions, intersections, and unions; transitive properties; property chains; and subsumption
     • Provenance tracking via reification
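The iterative rule application described on this slide can be sketched in a few lines. OBD itself runs its rules as SQL against PostgreSQL; the in-memory Python below is only an illustration of the fixpoint idea, and the `saturate` helper, the two rules, and the toy triples are all assumptions made for the example.

```python
# Illustrative only: a tiny in-memory stand-in for iterative rule application.
# The rule set, saturate(), and the toy triples are assumptions, not OBD code.

TRANSITIVE = {"is_a", "part_of"}


def saturate(asserted):
    """Apply the rules repeatedly until a pass produces no new triple."""
    triples = set(asserted)
    while True:
        new = set()
        for (s, p, o) in triples:
            for (s2, p2, o2) in triples:
                if o != s2:
                    continue
                # Rule 1: transitivity, e.g. A is_a B, B is_a C => A is_a C
                if p == p2 and p in TRANSITIVE:
                    new.add((s, p, o2))
                # Rule 2: subsumption, a subclass inherits its parent's links,
                # e.g. A is_a B, B part_of C => A part_of C
                if p == "is_a" and p2 != "is_a":
                    new.add((s, p2, o2))
        new -= triples
        if not new:
            return triples
        triples |= new


asserted = {
    ("ethmoid cartilage", "is_a", "chondrocranium cartilage"),
    ("chondrocranium cartilage", "is_a", "cartilage"),
    ("chondrocranium cartilage", "part_of", "chondrocranium"),
}
closure = saturate(asserted)
inferred = closure - asserted  # the links a reasoner would add to the store
```

Keeping `asserted` and `inferred` separate mirrors the slide-3 distinction between asserted and inferred links.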
  7. Reasoning across logical relationships
     • Brachyplatystoma capapretum exhibits some round that inheres_in some ethmoid cartilage
     • tfap2a ts213/ts213 influences some split that inheres_in some ethmoid cartilage
  8. Reasoning across logical relationships
     • Brachyplatystoma capapretum exhibits some round that inheres_in some ethmoid cartilage
     • tfap2a ts213/ts213 influences some split that inheres_in some ethmoid cartilage
     • Figure: graph linking both statements to the terms tfap2a, Brachyplatystoma, round, split, and ethmoid cartilage through variant_of, is_a, and inheres_in edges
  9. Reasoning across logical relationships
     • Brachyplatystoma capapretum exhibits some round that inheres_in some ethmoid cartilage
     • tfap2a ts213/ts213 influences some split that inheres_in some ethmoid cartilage
     • Figure: the same graph extended with the broader terms Pimelodidae, shape, chondrocranium cartilage, olfactory region, and sequence-specific DNA binding transcription factor activity, through additional is_a, part_of, and has_function edges
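As a rough sketch of how the two statements above become comparable, each EQ phenotype can be expanded to every more general "quality that inheres_in some entity" expression and the two expansions intersected. The parent links in `PARENTS` are assumptions read off the figure's labels (in particular, that both round and split are kinds of shape), and `ancestors` and `subsumers` are hypothetical helper names.

```python
# Toy is_a hierarchy. The specific parent links are assumptions taken from the
# slide's graph labels, not an export from the ontologies themselves.
PARENTS = {
    "round": {"shape"},
    "split": {"shape"},
    "ethmoid cartilage": {"chondrocranium cartilage"},
}


def ancestors(term):
    """Return the term plus everything reachable from it through is_a."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumers(quality, entity):
    """All (Q, E) pairs, read as 'Q that inheres_in some E', subsuming the input."""
    return {(q, e) for q in ancestors(quality) for e in ancestors(entity)}


taxon_eq = ("round", "ethmoid cartilage")   # exhibited by Brachyplatystoma capapretum
mutant_eq = ("split", "ethmoid cartilage")  # influenced by tfap2a ts213/ts213
shared = subsumers(*taxon_eq) & subsumers(*mutant_eq)
```

Under these assumptions `shared` holds the two "shape" expressions over ethmoid cartilage and chondrocranium cartilage, which is the kind of common ground that lets a query connect the catfish species to the zebrafish genotype.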
  10. Demo
  11. Phenotype variation in taxa (left) vs. zebrafish mutants (right)
      • Figure: phenotype variation distributed across anatomical systems, each linked by is_a to "anatomical system": cardiovascular, digestive, skeletal, endocrine, sensory, hematopoietic, respiratory, immune, liver and biliary, reproductive, renal, musculature, nervous
      • Legend bins: >85%, 20-30%, 15-19%, 10-14%, 5-9%, 1-4%, <1%
  12. Global view of skeletal data
      • Viewed, summarized, synthesized, at a scale not possible otherwise
      • Figure: skeletal variation across taxa and regions (image from Sabaj-Perez)
  13. Quantify phenotypic similarity

      Similarity (IC)  Taxon             Taxon phenotype                  Candidate gene  Gene phenotype                 Subsuming phenotype
      13.43            Eels              gill rakers, absent              eda             gill rakers, absent            gill rakers, count
      12.85            Minytrema         lateral line, absent             pcsk5a          lateral line, reduced          lateral line, size
      12.44            Siluriformes      basihyal, absent                 brpf1           basihyal, absent               basihyal, count
      11.32            Gyrinocheilus     ceratobranchial 5 teeth, absent  eda             ceratobranchial teeth, absent  tooth, count
      11.11            Siluriformes      scales, absent                   eda             scales, absent                 scales, count
      9.91             Siluriformes      opercle, shape                   edn1            maxilla, shape                 dermatocranium, shape
      9.19             Mola              caudal fin, absent               yap1            median fin, absent             median fin, count
      9.19             Gonorynchiformes  dorsal fin, absent               tf1p2a          median fin, absent             median fin, count
      5.17             Siluriformes      eye, reduced                     pbx2            eye, small                     eye, size
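The slide does not spell out how the IC score is computed. A common reading, assumed here, is that a phenotype class's information content is -log2 of the fraction of annotations it subsumes, and that a taxon/gene pair is scored by its most informative shared subsumer (a Resnik-style measure). The counts, the "eye, morphology" class, and the function names below are made up for illustration.

```python
import math

# Hypothetical annotation counts: how many annotated items fall under each
# phenotype class. These numbers are invented for the example.
TOTAL = 1024
COUNTS = {
    "eye, size": 32,
    "eye, morphology": 256,
    "phenotype": 1024,
}


def information_content(term):
    """IC = -log2(p), where p is the fraction of annotations the term subsumes."""
    return -math.log2(COUNTS[term] / TOTAL)


def similarity(subsumers_a, subsumers_b):
    """Score a pair by its most informative common subsumer."""
    common = subsumers_a & subsumers_b
    best = max(common, key=information_content)
    return best, information_content(best)


# Subsumer sets for a taxon phenotype and a gene phenotype (assumed, not computed).
taxon_subsumers = {"eye, reduced", "eye, size", "eye, morphology", "phenotype"}
gene_subsumers = {"eye, small", "eye, size", "eye, morphology", "phenotype"}
best, score = similarity(taxon_subsumers, gene_subsumers)
```

With these toy counts the pair is scored through "eye, size" at 5 bits; rarer shared classes score higher, which fits the table's ordering from specific matches down to general ones.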
  14. Summary
      Semantic framework and reasoning tools provide:
      • Powerful queries not previously possible for evolutionary phenotype data
      • Meaningful integration with model organism phenotypic and genetic data
  15. Acknowledgments
      • National Science Foundation (BDI-0641025)
      • National Evolutionary Synthesis Center
      • Phenoscape workshop participants & contributors, contributors to the teleost ontologies, and curators: Arhat Abzhanov, Gloria Arratia, Michael Ashburner, Judith Blake, Stan Blum, Miles Coburn, Kevin Conway, Quentin Cronk, Wasila Dahdul, Mário de Pinna, Andy Deans, Jeff Engemen, Bill Eschmeyer, George Gkoutos, Terry Grande, Melissa Haendel, Brian Hall, Eric Hilton, Hopi Hoekstra, Hans Hofmann, Elizabeth Jockusch, Elizabeth Kellogg, Chuck Kimmel, Suzanna Lewis, John Lundberg, Paula Mabee, Anne Maglia, Austin Mast, Richard Mayden, Chris Mungall, Martin Ramirez, Sue Rhee, Martin Ringwald, Nelson Rios, Mark Sabaj Pérez, Eric Segerdell, Brian Sidlauskas, Barry Smith, David Stern, Sandrine Tercerie, Richard Vari, Peter Vize, Gunter Wagner, Nicole Washington, Jacqueline Webb, Edward Wiley