Gene World: A large-scale gene-centric semantic web knowledge base for molecular biology (ORE2013)

The discovery of the central role of genes in regulating the fundamental biochemical processes of living things has driven biologists to collect, analyze and re-use enormous amounts of information, and to make this information available in thousands of curated databases. The increasingly popular use of specialized terminologies, often organized into hierarchical taxonomies or more formal ontologies, to describe this data indicates that managing the total amount of available resources (big data) will remain an ongoing challenge. Here, we describe a biological gene-centric dataset (available at http://semanticscience.org/projects/gene-world), aimed at providing the reasoner community with a fully connected graph of data and ontologies that is of value to the bioinformatics community. Using automated reasoning for consistency checking and query answering over such large ontology-mapped linked data currently poses significant challenges.

Published in: Technology, Education
  • The Bio2RDF project transforms silos of life science data into a globally distributed network of linked data for biological knowledge discovery.
  • Transcript

    • 1. Gene World: A large-scale, gene-centric semantic web knowledge base for molecular biology
        José Cruz-Toledo, Alison Callahan and Michel Dumontier, Carleton University
        Dumontier::ORE 2013:Gene World
    • 2. At the heart of Linked Data for the Life Sciences
        • Free and open source
        • Uses Semantic Web standards
        • Release 2 (Jan 2013): 1B+ interlinked statements from 19 conventional and high value datasets
        • Provenance, statistics
        • Partnerships with EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
        Figure labels: chemicals/drugs/formulations; genomes/genes/proteins, domains; interactions, complexes & pathways; animal models and phenotypes; disease, genetic markers, treatments; terminologies & publications
    • 3. Gene World
        • Goal: to establish a Bio2RDF-based life science dataset for evaluation of large instance-based reasoners
        • Approach: select a medium-size, well annotated dataset with links to rich ontologies. Augment with disjunction and provide mappings to richer upper level ontologies. Provide sample queries.
    • 4. Gene World : Data
        • NCBI Gene: database of genes including names, reference sequences, variants, phenotypes, pathways and cross-references to related resources.
            – 394,026,267 triples
            – 12,543,449 unique subjects
            – 60 unique predicates
            – 121,538,103 unique objects
        • HomoloGene: database of homologous gene groups, including paralogs and orthologs, from a set of 21 completely sequenced eukaryotic genomes.
            – 1,281,881 triples
            – 43,605 unique subjects
            – 17 unique predicates
            – 1,011,783 unique objects
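The four figures reported for each dataset (triples, unique subjects, unique predicates, unique objects) can be read as simple counts over the RDF statements. Below is a minimal Python sketch of that counting, assuming the statements are already parsed into (subject, predicate, object) tuples. The `ex:` identifiers and predicate names are made-up placeholders, not actual Bio2RDF data or vocabulary, and the function name `triple_stats` is hypothetical.

```python
from typing import Iterable, NamedTuple


class TripleStats(NamedTuple):
    triples: int
    unique_subjects: int
    unique_predicates: int
    unique_objects: int


def triple_stats(triples: Iterable[tuple[str, str, str]]) -> TripleStats:
    """Count statements and the distinct terms seen in each triple position."""
    subjects, predicates, objects = set(), set(), set()
    count = 0
    for s, p, o in triples:
        count += 1
        subjects.add(s)
        predicates.add(p)
        objects.add(o)
    return TripleStats(count, len(subjects), len(predicates), len(objects))


# Hypothetical sample: identifiers and predicates are invented for illustration.
SAMPLE = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:has_taxid", "ex:taxon_a"),
    ("ex:gene2", "rdf:type", "ex:Gene"),
    ("ex:gene2", "ex:has_taxid", "ex:taxon_b"),
    ("ex:gene2", "ex:has_function", "ex:function_x"),
]

stats = triple_stats(SAMPLE)
print(stats)
```

At the scale quoted on the slide (hundreds of millions of triples) in-memory sets like these would likely be impractical; the sketch only pins down what each number means.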
    • 5. Gene World : Ontologies
        • Gene Ontology (GO)
            – Ontology for annotating gene products. Consists of three main branches: molecular function, biological process and cellular component
            – 34k classes, 6 object properties, 63k subclass axioms
        • Evidence Code Ontology (ECO)
            – Ontology for capturing the source of evidence used for the GO annotation
            – 297 classes, 2 object properties, 453 subclass axioms
        • NCBI Taxonomy (TAXON)
            – Ontology of species; widely used, excludes anything that we don’t have a sequence for.
            – 1M classes, 15 object properties, 1M subclass axioms
    • 6. Gene World : Mappings
        • The Semanticscience Integrated Ontology (SIO) is a simple upper level description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes
            – 1385 classes, 201 object properties and 1 datatype property; SRIQ(D)
            – Basic design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, with specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics.
            – Mapped types and relations to 19 Bio2RDF datasets and 700+ SADI semantic web services
        • The Sequence Ontology (SO) provides vocabulary for the physical attributes of biological sequences (e.g. binding sites, exons) and the processes in which biological sequences may be involved
            – 2151 classes, 74 object properties; SHI
    • 7. [Figure] SRIQ(D); 10700+ axioms; 1300+ classes; 201 object properties (inc. inverses); 1 datatype property
    • 8. Semantic data integration, consistency checking and query answering over Bio2RDF with the Semanticscience Integrated Ontology (SIO)
        [Figure] Panel labels: dataset, ontology, Knowledge Base. Node labels: uniprot:P05067, uniprot:Protein, refseq:Protein, pharmgkb:PA30917, pharmgkb:Gene, omim:189931, omim:Gene, sio:gene. Edges are labelled "is a".
        Reference: Querying Bio2RDF Linked Open Data with a Global Schema. Alison Callahan, José Cruz-Toledo and Michel Dumontier. Bio-ontologies 2012.
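The idea behind this slide, dataset-specific types mapped to a shared upper-level class so that one query spans several datasets, can be illustrated with a small Python sketch. Everything about the wiring is an assumption: the slide's edges are not recoverable from the transcript, so the instance-to-type pairs are guessed from identifier prefixes, the gene mappings to `sio:gene` are assumed, and `sio:protein` is a hypothetical label that does not appear on the slide.

```python
# ASSUMPTION: instance-to-type pairs are inferred from the identifier prefixes
# on the slide; the transcript does not preserve the actual edges.
INSTANCE_TYPES = {
    "omim:189931": "omim:Gene",
    "pharmgkb:PA30917": "pharmgkb:Gene",
    "uniprot:P05067": "uniprot:Protein",
}

# ASSUMPTION: illustrative mapping of dataset types to upper-level classes.
TYPE_TO_UPPER = {
    "omim:Gene": "sio:gene",
    "pharmgkb:Gene": "sio:gene",
    "uniprot:Protein": "sio:protein",  # hypothetical label, not on the slide
}


def instances_of(upper_class: str) -> list[str]:
    """Return instances whose dataset type maps to the given upper-level class."""
    return sorted(
        instance
        for instance, dataset_type in INSTANCE_TYPES.items()
        if TYPE_TO_UPPER.get(dataset_type) == upper_class
    )


genes = instances_of("sio:gene")
print(genes)
```

A single lookup on the upper-level class returns gene records from both source datasets, which is the effect a global schema is meant to provide.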
    • 9. Gene World : Mappings (repeats slide 6)
    • 11. DL Queries
        Q4: retrieve genes that are annotated with a specific enzymatic function
            DL query: gene that 'has function' some 'acetylglucosaminyltransferase activity [go:0008375]'
            -> subclass reasoning over SIO mappings and GO
        Q6: retrieve organisms that have genes with an enzymatic activity that was not obtained by computational analysis
            DL query: 'Mammalia [taxid:40674]' that inverse(has_taxid) some (gene that 'has function' some (function that inverse(go_term) some ('has evidence' some not('inferred by electronic annotation'))))
            -> subclass reasoning with disjunction, inverse property and negation
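The subclass reasoning that Q4 relies on can be sketched in plain Python: a gene matches if it is annotated with the target function class or with any of its transitive subclasses. This is only a toy illustration, not how a DL reasoner is implemented. The class hierarchy, the gene identifiers, and the function names are invented placeholders; only `go:0008375` comes from the slide. Q6 additionally needs inverse properties and negation, which this sketch does not attempt.

```python
from collections import defaultdict


def descendants(cls, subclass_of):
    """Return cls together with every class reachable through subclass links."""
    children = defaultdict(set)
    for child, parent in subclass_of:
        children[parent].add(child)
    seen, stack = {cls}, [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_with_function(target, subclass_of, has_function):
    """Genes annotated with the target class or any of its subclasses."""
    wanted = descendants(target, subclass_of)
    return sorted({gene for gene, function in has_function if function in wanted})


# Hypothetical hierarchy: (child, parent) pairs below go:0008375 are invented.
SUBCLASS_OF = [
    ("ex:activity_a", "go:0008375"),
    ("ex:activity_b", "ex:activity_a"),
    ("ex:other_activity", "ex:root"),
]

# Hypothetical 'has function' annotations.
HAS_FUNCTION = [
    ("ex:gene1", "go:0008375"),
    ("ex:gene2", "ex:activity_b"),
    ("ex:gene3", "ex:other_activity"),
]

matches = genes_with_function("go:0008375", SUBCLASS_OF, HAS_FUNCTION)
print(matches)
```

Here `ex:gene2` is returned even though it is annotated only with an indirect subclass, which is the inference a purely syntactic lookup would miss.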
    • 12. SPARQL-DL
        Q9: retrieve orthologous human and mouse genes annotated with function to bind ATP
            Type(?human_gene, gene),
            Type(?mouse_gene, gene),
            Type(?homologene_group, HomoloGene_Group),
            PropertyValue(?human_gene, has_taxid, 'Homo sapiens'),
            PropertyValue(?mouse_gene, has_taxid, 'Mus musculus'),
            PropertyValue(?human_gene, 'has function', 'ATP binding'),
            PropertyValue(?mouse_gene, 'has function', 'ATP binding'),
            PropertyValue(?homologene_group, has_gene, ?human_gene),
            PropertyValue(?homologene_group, has_gene, ?mouse_gene)
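Q9 is a conjunction of atoms over three variables, so its answers are the bindings that satisfy every atom at once. The Python sketch below evaluates the same conjunction over a handful of asserted facts. It is a toy join, not a SPARQL-DL engine: it matches only explicitly stated facts, whereas a reasoner would also use inferred ones. All `ex:` identifiers and the `"other function"` value are invented for illustration; the predicate and class labels follow the slide.

```python
def orthologous_atp_binders(facts):
    """Return (group, human_gene, mouse_gene) bindings satisfying all Q9 atoms."""

    def subjects(predicate, obj):
        return {s for s, p, o in facts if p == predicate and o == obj}

    def objects(subject, predicate):
        return {o for s, p, o in facts if s == subject and p == predicate}

    genes = subjects("type", "gene")
    atp_binders = subjects("has function", "ATP binding")
    human = genes & atp_binders & subjects("has_taxid", "Homo sapiens")
    mouse = genes & atp_binders & subjects("has_taxid", "Mus musculus")

    results = []
    for group in sorted(subjects("type", "HomoloGene_Group")):
        members = objects(group, "has_gene")
        for human_gene in sorted(members & human):
            for mouse_gene in sorted(members & mouse):
                results.append((group, human_gene, mouse_gene))
    return results


# Hypothetical facts: two homology groups, only the first fully satisfies Q9.
FACTS = {
    ("ex:group1", "type", "HomoloGene_Group"),
    ("ex:group1", "has_gene", "ex:human_gene1"),
    ("ex:group1", "has_gene", "ex:mouse_gene1"),
    ("ex:group2", "type", "HomoloGene_Group"),
    ("ex:group2", "has_gene", "ex:human_gene2"),
    ("ex:group2", "has_gene", "ex:mouse_gene2"),
    ("ex:human_gene1", "type", "gene"),
    ("ex:human_gene2", "type", "gene"),
    ("ex:mouse_gene1", "type", "gene"),
    ("ex:mouse_gene2", "type", "gene"),
    ("ex:human_gene1", "has_taxid", "Homo sapiens"),
    ("ex:human_gene2", "has_taxid", "Homo sapiens"),
    ("ex:mouse_gene1", "has_taxid", "Mus musculus"),
    ("ex:mouse_gene2", "has_taxid", "Mus musculus"),
    ("ex:human_gene1", "has function", "ATP binding"),
    ("ex:mouse_gene1", "has function", "ATP binding"),
    ("ex:human_gene2", "has function", "ATP binding"),
    ("ex:mouse_gene2", "has function", "other function"),
}

pairs = orthologous_atp_binders(FACTS)
print(pairs)
```

The second group is excluded because its mouse gene lacks the ATP binding annotation, showing that both genes in a pair must satisfy the function atom.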
    • 13. Availability & Future work
        • http://semanticscience.org/projects/gene-world
        • Try this dataset with different reasoners: reasoner-world?
        • Generate informative statistics for reasoner developers and develop variants based on evaluation need
    • 14. Michel Dumontier
        michel_dumontier@carleton.ca
        Publications: http://dumontierlab.com
        Presentations: http://slideshare.com/micheldumontier
