Model Organism Linked Data


Model organisms such as budding yeast provide a common platform to interrogate and understand cellular and physiological processes. Knowledge about model organisms, whether generated during the course of scientific investigation or extracted from published articles, is made available by model organism databases (MODs) such as the Saccharomyces Genome Database (SGD) for powerful, data-driven bioinformatic analyses. Integrative platforms such as InterMine offer a standard platform for MOD data exploration and data mining. Yet today’s bioinformatic analyses also require access to a significantly broader set of structured biomedical data, such as that found in the emerging network of Linked Open Data (LOD). If MOD data could be provisioned as FAIR (Findable, Accessible, Interoperable, and Reusable), then scientists could leverage a greater amount of interoperable data in knowledge discovery.

The goal of this proposal is to increase the utility of MOD data by implementing standards-compliant data access interfaces that interoperate with Linked Data. We will focus our efforts on developing interfaces for data access, data retrieval, and query answering for SGD. Our software will publish InterMine data as LOD that are semantically annotated with ontologies and can be retrieved in standardized formats (e.g. JSON-LD, Turtle). We will facilitate the exploration of MOD data for hypothesis testing by implementing efficient query answering using Linked Data Fragments, and by developing a set of graphical user interfaces to search for data of interest, explore connections, and answer questions that leverage the wider LOD network. Finally, we will develop an image that can be deployed locally or in the cloud to enable rapid deployment of the proposed infrastructure. Our efforts to increase interoperability and ease of deployment for biomedical data repositories will increase research productivity and reduce costs associated with data integration and warehouse maintenance.

Model Organism Linked Data

  1. Model Organism Linked Data (NIH Commons/MOD Interoperability supplement to SGD)
     Michel Dumontier, Associate Professor of Medicine, Stanford University
  2. Team
     - Michel Dumontier (Biomedical Informatics Research, Stanford)
     - Maxime Déraspe (U. Laval & Biomedical Informatics Research, Stanford)
     - Jacques Corbeil (U. Laval)
     - Mike Cherry (Department of Genetics, Stanford)
     - Kalpana Karra (Department of Genetics, Stanford)
     - Gail Binkley (Department of Genetics, Stanford)
     - Gos Micklem (Cambridge Systems Biology Centre, U. of Cambridge)
     - Julie Sullivan (Cambridge Systems Biology Centre, U. of Cambridge)
  3. InterMine is a platform for Model Organism Data
     - 25+ endpoints available for different MODs: YeastMine, WormMine, FlyMine, ZebrafishMine, MouseMine, ThaleMine, HumanMine
     - Access via (db query) API
     - Core data object model (commonly used) + Mine-specific customizations -> heterogeneity in tables, fields, and terminologies poses challenges for interoperability and pan-database queries
  4. Towards increased interoperability with Semantic Web technologies
     - Linked Data and Semantic Web technologies (RDF, SPARQL) are increasingly adopted in the bioinformatics data provider community: DBCLS, EBI, NCBI, NLM, and many others
     - MODs, like many Omics databases, often rely on other people’s content
     - Linked Data can offer dereferenceable links to authoritative sources
     - Opportunity to improve MOD data interoperability through mapping of their ontologies and vocabularies
  5. Model Organism Linked Data (MO-LD)
     Effort to expose InterMine data as FAIR: Findable, Accessible, Interoperable, Reusable
     Specific Aims:
     1. To improve interoperability of MOD data by publishing Linked Data
     2. To enable and demonstrate federated queries between MOD data and the network of Linked Data
     3. To package our software and data for easier local and cloud-based deployment
  6. Model Organism Linked Database (MO-LD)
     - Includes 6 MODs: YeastMine, FlyMine, ZebrafishMine, RatMine, MouseMine, HumanMine
     - Linked with 38 Bio2RDF datasets: RefSeq, PantherDB, GO, NCBI gene, HGNC, ENSEMBL, OMIM, …
     - InterMine-RDFizer script to reproduce with any InterMine instance
     - Web application to visualize, explore and query the Linked Datasets
  7. RDFization of InterMine
     - Query InterMine API with Object Model
     - Convert the tabular results into triples (RDF)
     - Merge the resources with the same primary keys
     - Link Data with external datasets
     - Load the RDF data into a triple store
  8. InterMine-LD
  9. Linking MODs with LOD - incomplete linking
     - External linked datasets (38) with the 6 MODs
     Cross References Table* from InterMine:
     InterMine primary key   Identifier    DataSource
     00001                   Q6GZX4        Uniprot
     00002                   ASIC1         HGNC
     00003                   GO:0004396    GO
     00004                   AL732629.6    RefSeq
     * Also done with Ontology tables
  10. Linked Data Platform (MO-LD.org)
     - SPARQL Query Editor
     - Faceted Browser (Virtuoso)
     - RelFinder for Relation Visualization
     - Application Programming Interface (Swagger.io - OpenAPIs specification)
  11. SPARQL Support for Programmers
     Example: get all reactions from KEGG that are associated with genes that are extrinsic components of the cell membrane
  12. Federated Query
  13. RelFinder - Find connections between 2 or more entities
  14. Infrastructure Deployment and Reusability
     - Docker (container engine) to build and deploy the MOLD infrastructure: https://hub.docker.com/u/mold
     - Microservices architecture for reusability and extensibility: Web application, API and Virtuoso images
     - Cloud-ready: tested on Amazon EC2
     - Tutorial: https://github.com/mo-ld/mold-dock
     - Only 5 commands to deploy a Linked-MOD!
  15. Reflections
     - Not all data in MODs are available in the InterMine instance
     - Not all references are in the cross-references table, which limits Linked Data generation
     - Team interactions led to a change in the export process
     - RDFizer focuses only on two table pairs of the core object model offered as a template by InterMine (CrossReference + DataSource and Ontology + OntologyTerm). Support for mine-specific tables would also improve coverage of contents and links
  16. Future Directions
     - Can we improve the quality of the representation by using community vocabularies (FALDO, CiTo, SIO)?
     - Can we offer high-performance query services (Triple Pattern Fragments/HDT)?
     - How can we persist data in other archives (Wikidata / schema.org+cse)?
     - Are curation priorities in line with what users want?
     - Can pan-species analyses tell us something about success in drug discovery?
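
The RDFization steps on slide 7 and the cross-references table on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the InterMine-RDFizer script itself: the `MOLD_BASE` IRI, the use of `rdfs:seeAlso` as the linking predicate, the row field names, and the mapping from DataSource names to Bio2RDF namespaces are all assumptions made for the example. Only the `http://bio2rdf.org/namespace:identifier` IRI pattern and the sample rows come from known conventions and the slides. Rows whose DataSource has no known mapping are skipped, which mirrors the "incomplete linking" noted on slide 9.

```python
from collections import defaultdict

# Hypothetical base IRI for MO-LD resources (not taken from the slides).
MOLD_BASE = "http://example.org/mold/"
# Assumed linking predicate; the real RDFizer may use a different one.
XREF_PREDICATE = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
# Assumed mapping from InterMine DataSource names to Bio2RDF namespaces.
BIO2RDF_NAMESPACES = {
    "Uniprot": "uniprot",
    "HGNC": "hgnc.symbol",
    "GO": "go",
    "RefSeq": "refseq",
}


def bio2rdf_iri(identifier, data_source):
    """Return a Bio2RDF-style IRI, or None if the data source is unmapped."""
    namespace = BIO2RDF_NAMESPACES.get(data_source)
    if namespace is None:
        return None
    # Drop a leading "GO:"-style prefix so the namespace is not repeated.
    if identifier.upper().startswith(namespace.upper() + ":"):
        identifier = identifier.split(":", 1)[1]
    return f"http://bio2rdf.org/{namespace}:{identifier}"


def rows_to_ntriples(rows):
    """Convert cross-reference rows into N-Triples lines.

    Rows sharing a primary key are merged into one subject, and
    duplicate links are emitted only once.
    """
    links = defaultdict(set)
    for row in rows:
        target = bio2rdf_iri(row["identifier"], row["data_source"])
        if target is not None:
            links[row["primary_key"]].add(target)

    lines = []
    for key in sorted(links):
        subject = f"{MOLD_BASE}{key}"
        for target in sorted(links[key]):
            lines.append(f"<{subject}> <{XREF_PREDICATE}> <{target}> .")
    return lines


if __name__ == "__main__":
    # Sample rows taken from the cross-references table on slide 9.
    sample = [
        {"primary_key": "00001", "identifier": "Q6GZX4", "data_source": "Uniprot"},
        {"primary_key": "00002", "identifier": "ASIC1", "data_source": "HGNC"},
        {"primary_key": "00003", "identifier": "GO:0004396", "data_source": "GO"},
        {"primary_key": "00004", "identifier": "AL732629.6", "data_source": "RefSeq"},
    ]
    print("\n".join(rows_to_ntriples(sample)))
```

The resulting lines could then be loaded into a triple store; the query against the InterMine API and the load step are left out because they depend on the deployment.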
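
The federated query idea from slides 11 and 12 might look something like the sketch below, which only builds the query text. The `SERVICE` clause is the standard SPARQL 1.1 mechanism for delegating part of a query to a remote endpoint. Everything else is hypothetical and not taken from the slides: the endpoint URL, the `ex:` vocabulary, and the predicate names (`ex:hasGoAnnotation`, `ex:xref`, `ex:involvesGene`). The GO term IRI is left as a parameter for the caller to supply.

```python
from string import Template

# Hypothetical remote endpoint; the slides do not give the KEGG endpoint URL.
KEGG_ENDPOINT = "http://example.org/kegg/sparql"

# The ex: vocabulary and its predicates are placeholders for illustration.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/vocab/>
SELECT DISTINCT ?gene ?reaction
WHERE {
  # Local part: genes annotated with the given GO term.
  ?gene ex:hasGoAnnotation <$go_term> ;
        ex:xref ?keggGene .
  # Remote part: evaluated by the other endpoint (SPARQL 1.1 SERVICE).
  SERVICE <$endpoint> {
    ?reaction ex:involvesGene ?keggGene .
  }
}
LIMIT $limit
""")


def build_federated_query(go_term, endpoint=KEGG_ENDPOINT, limit=100):
    """Return a federated SPARQL query string for the given GO term IRI."""
    if int(limit) <= 0:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.substitute(
        go_term=go_term, endpoint=endpoint, limit=int(limit)
    )


if __name__ == "__main__":
    # The GO term IRI here is a placeholder, not a real annotation target.
    print(build_federated_query("http://example.org/go/TERM"))
```

The returned string could be pasted into a SPARQL query editor or sent to an endpoint with any HTTP client; the real predicates would need to be taken from the published MO-LD and Bio2RDF data.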

