
Ontology-based services for querying and mining plant genomic and phenomic data


Finding phenotype associations across multiple plant species, annotation strategies, and environments has become more difficult as the amount of annotated data has continued to grow. By associating annotations with ontologies as metadata, we can provide a structured, inferable, and standardized context that improves our ability to mine data by defining our own data more accurately.
To this end, the Planteome project (http://planteome.org) ingests over 20 database sources, 80 taxa, and 2 million bioentities (genes, germplasm, QTL). Over 17 million annotations link these bioentities to defined ontology terms in a standardized manner. With this infrastructure in place, Planteome provides a browsable resource for multiple reference ontologies for plants: the Plant Ontology (PO), describing anatomy and growth and developmental stages; the Plant Trait Ontology (TO), describing phenotype traits; the Gene Ontology (GO), describing molecular function, biological process, and cellular components; the Phenotype and Attribute Trait Ontology (PATO); and species-specific application ontologies such as the Crop Ontology (CO). The database also allows an ontology-based, faceted, cross-species search of plant phenomic and genomic data annotated with the reference ontologies. Data is denormalized using the GOlr infrastructure (http://wiki.geneontology.org/index.php/GOlr), built on top of the Solr search platform, providing quick and meaningful querying capabilities.
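
A minimal sketch of how such a faceted GOlr query could be assembled, assuming the Solr select endpoint and the field names (document_category, regulates_closure, taxon_label, type) that appear in the download links shown in the slides. The match-all clause q=*:* and the helper names are assumptions added for illustration; the code only builds a URL and does not contact the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as it appears in the download links on the slides.
GOLR_SELECT = "http://browser.planteome.org/solr/select"


def annotation_query_url(term_id, rows=10, facet_fields=("taxon_label", "type")):
    """Build a Solr URL for annotations to term_id or any of its descendants.

    Filtering on regulates_closure (rather than on the direct annotation class)
    is what lets a parent term return the annotations made to its child terms.
    """
    params = [
        ("defType", "edismax"),
        ("qt", "standard"),
        ("wt", "csv"),
        ("rows", str(rows)),
        ("q", "*:*"),  # assumption: match everything, then narrow with filters
        ("fl", "bioentity,bioentity_name,type,annotation_class,taxon"),
        ("fq", 'document_category:"annotation"'),
        ("fq", f'regulates_closure:"{term_id}"'),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # One facet.field parameter per facet to be counted.
    params += [("facet.field", field) for field in facet_fields]
    return GOLR_SELECT + "?" + urlencode(params)


def filters_of(url):
    """Return the decoded fq (filter query) values of a query URL."""
    return parse_qs(urlsplit(url).query)["fq"]


# Example: endosperm quality (TO:0000587), the parent term used in the slides.
url = annotation_query_url("TO:0000587")
print(url)
print(filters_of(url))
```

Keeping the term restriction in a filter query rather than in the main query leaves the main query free for text search, which is a common Solr pattern; whether Planteome's own client does exactly this is not stated in the talk.
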
Work is currently underway to adopt a standardized Biolink web-services API (https://github.com/biolink/biolink-api) that, with GOlr, has already been adopted by the Monarch Initiative (https://monarchinitiative.org), an ontology-based search and aggregation service focused on human disease through analysis of cross-species annotations.

Published in: Data & Analytics


  1. http://planteome.org/ Planteome: Ontology-based Services for Querying and Mining Plant Genomic and Phenomic Data. Seth Carbon (1), Justin Elser (2), Laurel Cooper (2), Nathan Dunn (1), Eric Douglass (1), Suzanna Lewis (1), Monica Munoz-Torres (1), Pankaj Jaiswal (2), and Chris Mungall (1). (1) Berkeley Bioinformatics Open-Source Projects, Lawrence Berkeley National Laboratory; (2) Oregon State University, Corvallis, Oregon
  2. Questions we ask our data • What genes are associated with a plant trait? • What are the effects of a gene on an organism? • Are there useful genes from another organism that affect these traits?
  3. Questions we ask our data • What genes are associated with a plant trait? • What are the effects of a gene on an organism? • Are there useful genes from another organism that affect these traits? Questions we ask our answers • Browse and search related terms • Cross-species analysis search • Search multiple terms and term types
  4. Lots of data (diagram labels: i, X, Y)
  5. Lots of data, types (diagram labels: i, X, Y; types: protein, germplasm, gene model, mRNA, QTL, transcript)
  6. Lots of data, types, backgrounds (diagram labels: i, X, Y; types: protein, germplasm, gene model, mRNA, QTL, transcript; backgrounds: traits, phenotypes, anatomy, development, treatments, environments, mutants)
  7. Lots of data, multiple dimensions (same type and background labels as slide 6)
  8. Annotate data with ontologies • Ontology: a logical, interconnected dictionary • Annotation: an association between data and an ontology term (diagram: a BioEntity (Data), which may be a protein, germplasm, gene model, mRNA, QTL, or transcript, linked by an annotation to an ontology term; ontology terms linked to one another by is_a and part_of relations)
  9. Connect data via ontology logic (diagram: the same ontology terms and is_a / part_of relations as slide 8, now with two BioEntity (Data) nodes, each linked to ontology terms by annotations)
  10. Planteome • 20+ database sources • 80+ taxa • 2+ million bioentities (genes, germplasm, QTL) • 17+ million standardized annotations • link BioEntities to ontology terms (diagram: BioEntity (Data) nodes linked to Ontology Term nodes)
  11. Annotations modeled naturally by ontology: http://browser.planteome.org/amigo/gene_product/IRIC:IRIS_313-10844 (diagram labels: germplasm INDO NO 7505, Oryza sativa; two annotation edges; trait terms flag leaf angle, leaf length, leaf size, flag leaf morphology trait, adult leaf morphology trait, vascular leaf morphology trait, leaf morphology trait, phyllome morphology trait; terms connected by is_a relations)
  12. Annotations modeled naturally by ontology (same diagram as slide 11, extended with the ns1 gene, Zea mays, and the FCL1 gene, Medicago truncatula)
  13. Genes annotated to “shrunken endosperm”: http://browser.planteome.org/amigo/term/TO:0000100
  14. Genes annotated to “shrunken endosperm”: http://browser.planteome.org/amigo/term/TO:0000100 (callouts: Multiple Taxa, Multiple Types)
  15. Genes annotated to “shrunken endosperm”: http://browser.planteome.org/amigo/term/TO:0000100 (callout: Facets)
  16. Genes annotated to “shrunken endosperm” (callout: View Tree)
  17. Parent term provides more results. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 4 direct annotations
  18. Parent term provides more results. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • 4 direct annotations
  19. Parent term provides more results. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (URL truncated on the slide): http://browser.planteome.org/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=bioentity,bioentity_name,type,annotation_class,annotation_extension_json,taxon,evidence_typ %7C&fq=document_category:%22annotation%22&fq=regulates_closure:%22TO:0000587%22&facet.field=source&facet.field=assigned_by&facet.field=aspect&facet.field=evide
  20. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (same truncated URL as slide 19) (callout: Direct annotations)
  21. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (same truncated URL as slide 19) (callout: Inferred annotations)
  22. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (same truncated URL as slide 19) (callout: Database Source)
  23. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (same truncated URL as slide 19) (callout: Evidence Type)
  24. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (same truncated URL as slide 19) (callout: Taxon)
  25. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • TSV Download (same truncated URL as slide 19) (callout: Data Type)
  26. Facets. endosperm quality: http://browser.planteome.org/amigo/term/TO:0000587 • 139 total annotations • 13 direct annotations • Custom Download • Drag to Results (URL truncated on the slide): http://browser.planteome.org/solr/select?defType=edismax&qt=standard&indent=on&wt=csv&rows=100000&start=0&fl=bioentity,bioentity_name,type,annotation_class,annotation_extension_json,taxon,evidence_type,evidence_with,reference,assigned_by&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&hl=true&hl.simple.pre=%3Cem%20class=%22hilite%22%3E&csv.encapsulator=&csv.sepa %7C&fq=document_category:%22annotation%22&fq=regulates_closure:%22TO:0000587%22&facet.field=source&facet.field=assigned_by&facet.field=aspect&facet.field=evidence_type_closure&facet.field=qualifier&facet.field=taxon_label&facet.field=type&facet.field=annotation_class_label&facet.field=regulates_closure_label&facet.field=annotation_extension_class_closure_label&
  27-29. Planteome: Browse (http://planteome.org/)
  30. http://planteome.org/
  31. Planteome Architecture (diagram labels: Ontologies, OWL, ingest, RDF, ‘graphs’, Solr/GOlr, BioLink Service Layer, AmiGO, Noctua, Client, App, Service, Planteome Curator)
  32. Planteome Ontologies • Plant Ontology (PO): anatomy, growth and developmental stages • Plant Trait Ontology (TO): phenotype traits • Gene Ontology (GO): molecular function, biological process, and cellular components • Phenotype and Attribute Trait Ontology (PATO) • Application ontologies: e.g., Crop Ontology (CO)
  33. Planteome: GOlr • Data is optimized for querying using GOlr • uses Solr / Lucene • very fast • denormalized • supports loose queries
  34. BioLink • Automatically generate client
  35-36. BioLink • Provides Parameters
  37. Example: Jupyter Notebook (BioLink): Planteome/planteome-notebooks
  38-39. Example: Galaxy Tool (BioLink)
  40. Monarch Initiative • Similar ontology model for association • Humans, model organisms, non-model organisms • Supports BioLink API • Will be used to drive interface • Galaxy Toolshed Tool
  41. Monarch Initiative • Similar concepts • Demonstrates further possibilities
  42-43. Monarch Initiative Architecture (diagram labels: dipper, RDF, ‘graphs’, Neo4J/SciGraph, Solr/GOlr, BioLink Service Layer, app; note on slide, truncated: “Describe differen noctua / phenote”)
  44. Phenotypes across anatomy and organism
  45. Phenogrid: cross-species / phenotype-driven comparison (labels: Human Disease Phenotype, Mouse Genes, Zebrafish Genes)
  46. Phenogrid: Control vs relevance score (labels: Human Disease Phenotype, Mouse Genes, Zebrafish Genes)
  47-49. Directed phenotypic search
  50-52. Phenotype Text Annotator
  53. • Searchable “plant database” • Annotation to ontologies defines and connects data • Ontologies organize high dimensional data • BioLink API enhances integration • Monarch shows additional possibilities of ontologies. LBNL: Seth Carbon*, Chris Mungall*, Suzanna Lewis, Monica Munoz-Torres, Eric Douglass, Jeremy Nguyen-Xuan, Nathan Dunn*. Oregon State University: Justin Elser*, Pankaj Jaiswal, Laurel Cooper*. OHSU: Melissa Haendel*, Kent Shefchek, Tom Conlin, Dan Keith. Planteome is an international collaborative effort and is supported by primary funding (award IOS:1340112) from the National Science Foundation of the USA. Planteome: http://planteome.org/
  54. Extra Slides (http://planteome.org/)
  55. http://planteome.org/
  56. Connect data via ontology logic • Connect arbitrary types across species (diagram labels: gene product Os03g0107300, Oryza sativa; germplasm Ames 28933, Zea mays; annotation edges)
  57. http://planteome.org/
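
The "parent term provides more results" and "Facets" slides rest on one idea: an annotation made to a child term also counts for every ancestor of that term. The sketch below illustrates this with toy data. The is_a edge from shrunken endosperm (TO:0000100) to endosperm quality (TO:0000587) is what the slides suggest; the sibling term TOY:0000001, the gene names, and all annotations are hypothetical and are not Planteome data. Only is_a is followed here, whereas the regulates_closure field name in the slides' URLs suggests the real index follows more relation types.

```python
from collections import Counter

# Toy ontology: child term -> list of is_a parents.
IS_A = {
    "TO:0000100": ["TO:0000587"],   # shrunken endosperm is_a endosperm quality
    "TOY:0000001": ["TO:0000587"],  # hypothetical sibling term
    "TO:0000587": [],               # endosperm quality (root of this toy graph)
}

# Hypothetical annotations: (bioentity, taxon, ontology term).
ANNOTATIONS = [
    ("geneA", "Zea mays", "TO:0000100"),
    ("geneB", "Oryza sativa", "TO:0000100"),
    ("geneC", "Oryza sativa", "TOY:0000001"),
    ("geneD", "Zea mays", "TO:0000587"),
]


def closure(term):
    """Return the term together with all of its is_a ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def annotations_for(term, direct_only=False):
    """Direct annotations name the term itself; inferred ones reach it via the closure."""
    if direct_only:
        return [row for row in ANNOTATIONS if row[2] == term]
    return [row for row in ANNOTATIONS if term in closure(row[2])]


def facet_by_taxon(rows):
    """Count annotations per taxon, like a taxon facet in the search interface."""
    return Counter(taxon for _, taxon, _ in rows)


direct = annotations_for("TO:0000587", direct_only=True)
total = annotations_for("TO:0000587")
print(len(direct), "direct,", len(total), "total")
print(facet_by_taxon(total))
```

A denormalized index in the style the GOlr slide describes would store the precomputed closure on each annotation document, so the ancestor lookup happens at load time rather than at query time; the sketch recomputes it on every call only to keep the example short.
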
