NETTAB 2013

Presentation at NETTAB 2013, Venice Lido, Italy. October 2013. http://nettab.org/2013/progr.php

NETTAB 2013 Presentation Transcript

  • Bio-GraphIIn: a graph-based, integrative and semantically enabled repository for life science experimental data
    Alejandra González-Beltrán, PhD
    Oxford e-Research Centre, University of Oxford
    alejandra.gonzalezbeltran@oerc.ox.ac.uk, @alegonbel
    NETTAB 2013, October 16-18, 2013, Venice Lido, Italy
  • Experimental workflow (diagram): the Data Scientist at the centre of a cycle of stages
    • stages shown: Planning, Use existing data, Perform new experiment, Data Collection, Data Management, Analysis, Visualization, Publication
    • data + metadata accompany the cycle
    • overlaid labels: Science Reproducibility, Data Reusability
  • Outline
    • Motivation for an integrative and semantically enabled metadata repository in life sciences
      • retrospective data submissions
      • heterogeneous experimental data
      • fragmentation of formats and databases
      • semantic queries leading to integrative analysis
    • Context: the ISA infrastructure
    • Bio-GraphIIn requirements
    • Bio-GraphIIn design & architecture
    • Bio-GraphIIn graph queries
    • Bio-GraphIIn prototype
    • Summary and future work
  • Motivation 1/4: retrospective data submissions
    • retrospective (diagram): metadata is attached at a single point of the workflow cycle
      • Metadata edits to repositories are not straightforward, often requiring deleting the submission and re-submitting the data
    • prospective (diagram): metadata is attached throughout the workflow cycle
      • Support incremental data deposition + metadata edits
  • Motivation 2/4: heterogeneous experimental data (workflow stage: Data Collection)
  • Motivation 3/4: fragmentation of formats and databases (workflow stage: Publication)
  • Motivation 4/4: semantic queries leading to integrative analysis (workflow stages: Visualization, Analysis)
    • support for a rich and uniform query interface across studies, enabling integrative data analysis to provide new insights at systems biology level
      • e.g. find all data files associated with samples from a particular organism (e.g. Homo sapiens) and a particular tissue type (e.g. liver)
    • allow a set of samples/data files to be selected through browsing and semantic filtering
    • provide links to analysis and visualisation platforms
    • diagram label: life science experiments repo
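The example query on this slide (data files for samples from a given organism and tissue, across studies) can be sketched in Python over a toy in-memory list of sample records. This is only an illustration: the record layout, the field names, the study identifiers and the file names are all hypothetical, and none of it is the Bio-GraphIIn data model or API.

```python
# Toy sample records spanning two studies (all values are made up for illustration).
SAMPLES = [
    {"study": "study-1", "sample": "s1", "organism": "Homo sapiens",
     "tissue": "liver", "files": ["study1-s1.cel"]},
    {"study": "study-1", "sample": "s2", "organism": "Homo sapiens",
     "tissue": "kidney", "files": ["study1-s2.cel"]},
    {"study": "study-2", "sample": "s1", "organism": "Mus musculus",
     "tissue": "liver", "files": ["study2-s1.cel"]},
    {"study": "study-2", "sample": "s2", "organism": "Homo sapiens",
     "tissue": "liver", "files": ["study2-s2.cel", "study2-s2b.cel"]},
]


def find_data_files(samples, organism, tissue):
    """Return the data files of every sample matching organism and tissue.

    Matching is case-insensitive and ignores which study a sample belongs to,
    which is the point of a uniform query interface across studies.
    """
    wanted = (organism.lower(), tissue.lower())
    return [
        data_file
        for record in samples
        if (record["organism"].lower(), record["tissue"].lower()) == wanted
        for data_file in record["files"]
    ]


if __name__ == "__main__":
    for name in find_data_files(SAMPLES, "Homo sapiens", "liver"):
        print(name)
```

A real repository would answer the same question with a query over its graph store rather than a list scan; the sketch only shows the shape of the question being asked.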
  • The Investigation/Study/Assay (ISA) infrastructure
    • generic format for experimental description and data exchange
    • community engagement
    • open source software tools
  • investigation: high level concept to link related studies
    • study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays
    • assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
    • data: pointers to data file names/locations; external files in native or other formats
    • communities listed on the slide: environmental health, environmental genomics, metabolomics, metagenomics, nanotechnology, proteomics, stem cell discovery, system biology, transcriptomics, toxicogenomics, communities working to build a library of cellular signatures
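The investigation/study/assay nesting described above can be sketched as three small Python dataclasses. This is a minimal illustration under assumptions: the class and field names are hypothetical and do not reproduce the ISA-TAB specification or any ISA tools API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """A test on material from the subject; holds pointers to external data files."""
    name: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: subject characteristics, treatments and associated assays."""
    identifier: str
    characteristics: Dict[str, str] = field(default_factory=dict)
    treatments: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High level concept that links related studies."""
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def all_data_files(self) -> List[str]:
        # Walk investigation -> studies -> assays -> data file pointers.
        return [
            data_file
            for study in self.studies
            for assay in study.assays
            for data_file in assay.data_files
        ]


# Hypothetical example content, reusing file names that appear later in the slides.
example = Investigation(
    identifier="inv-1",
    studies=[
        Study(
            identifier="study-1",
            characteristics={"organism": "Homo sapiens"},
            assays=[Assay(name="scanning", data_files=["h1-s1.cel", "h1-s2.cel"])],
        )
    ],
)
```

Note that the assay stores only file names, mirroring the slide's point that data stay in external files and the metadata merely point to them.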
  • Experimental workflow - graph representation
    • graph (diagram)
      • sources: H1 (H. sapiens, 35 years), H2 (H. sapiens, 33 years)
      • samples: H1.sample1, H1.sample2, H2.sample1
      • Labeling outputs: H1.sample1.labeled, H2.sample1.labeled
      • Scanning outputs: h1-s1.cel, h1-s2.cel, h2-s1.cel
    • spreadsheets for end-users: the same graph flattened into rows, with the source columns repeated for each sample (e.g. H1 appears once for H1.sample1 and once for H1.sample2)
    • vocabulary for the description of the experimental workflow
    • syntactic interoperability across biological experiments of different types
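The relation between the workflow graph and the end-user spreadsheet can be sketched as a path enumeration: each source-to-file path in the graph becomes one row. The sketch below is an assumption-laden illustration. The node names come from the slides, but the exact edges are not recoverable from the transcript, so the wiring in `EDGES` is hypothetical, and the Labeling and Scanning steps are collapsed into plain edges.

```python
# Hypothetical adjacency list: node -> nodes derived from it.
EDGES = {
    "H1": ["H1.sample1", "H1.sample2"],
    "H2": ["H2.sample1"],
    "H1.sample1": ["H1.sample1.labeled"],
    "H1.sample2": ["H1.sample2.labeled"],
    "H2.sample1": ["H2.sample1.labeled"],
    "H1.sample1.labeled": ["h1-s1.cel"],
    "H1.sample2.labeled": ["h1-s2.cel"],
    "H2.sample1.labeled": ["h2-s1.cel"],
}


def paths(graph, node):
    """Return every path from node down to a leaf (a node with no children)."""
    children = graph.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest for child in children for rest in paths(graph, child)]


def to_rows(graph, roots):
    """Flatten the graph into spreadsheet-like rows, one per root-to-leaf path."""
    return [path for root in roots for path in paths(graph, root)]


if __name__ == "__main__":
    for row in to_rows(EDGES, ["H1", "H2"]):
        print("\t".join(row))
```

Because a row is a full path, a source with two samples shows up in two rows, which matches the repeated H1 entries visible in the slide's spreadsheet view.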
  • Machine-readable representation: Graph + Semantics
    • diagram (RDF graph) nodes: H1, H1.sample1, H1.sample2, H1.sample1.labeled, H1.sample2.labeled, labeling1, labeling2, scanning1, scanning2, labeling protocol, h1-s1.cel, h1-s2.cel
    • classes: obi:material entity, obi:material sample, obi:material processing, obi:processed material, obi:planned process, obi:protocol, isa:raw data file, tax:homo sapiens
    • properties: obi:is_specified_input_of, obi:is_specified_output_of, bfo:derives_from, isa:executes
    • semantic interoperability across biological experiments of different types
  • architecture (diagram)
    • components: ISA-TAB parser, graph analysis, isa2owl mapping, mapping parser, configuration file
    • mappings between the ISA-TAB syntax and ontologies
    • output: Resource Description Framework (RDF)
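The idea of a configurable mapping from ISA-TAB syntax to ontology terms can be sketched as a lookup table applied to one spreadsheet row. This is an assumption, not the isa2owl configuration format or code: the dictionary `COLUMN_TO_CLASS` and the function `row_to_triples` are hypothetical, the class labels are copied from the preceding slide, and triples are plain Python tuples rather than real RDF.

```python
# Hypothetical mapping from ISA-TAB column headers to ontology class labels.
COLUMN_TO_CLASS = {
    "Source Name": "obi:material entity",
    "Sample Name": "obi:material sample",
    "Raw Data File": "isa:raw data file",
}


def row_to_triples(row):
    """Turn one spreadsheet row (column -> value) into (subject, predicate, object) tuples."""
    triples = []
    # Type every mapped, non-empty cell with its ontology class.
    for column, value in row.items():
        ontology_class = COLUMN_TO_CLASS.get(column)
        if ontology_class and value:
            triples.append((value, "rdf:type", ontology_class))
    # Link the sample back to the source it was taken from.
    source, sample = row.get("Source Name"), row.get("Sample Name")
    if source and sample:
        triples.append((sample, "bfo:derives_from", source))
    return triples


if __name__ == "__main__":
    example_row = {
        "Source Name": "H1",
        "Sample Name": "H1.sample1",
        "Raw Data File": "h1-s1.cel",
    }
    for triple in row_to_triples(example_row):
        print(triple)
```

Keeping the mapping in data rather than in code is what lets the same parser target a different ontology by swapping the configuration.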
  • ISA-OBI mapping: Ontology for Biomedical Investigations (OBI)
  • Bio-GraphIIn Requirements
    • Bio-GraphIIn (pronounced “bio-graphene”) stands for Biological Graph Investigation Index
    • BioInvestigation Index (BII)
  • Bio-GraphIIn Requirements
    • support prospective annotation of experiments
      • support Create Read Update Delete (CRUD) operations
    • manage heterogeneous biological and biomedical metadata
      • relying on ISA-TAB
    • support data integration & semantic queries
      • relying on ISA2OWL
    • links to analysis and visualisation platforms
    • take advantage of experimental design information, improving metadata such as including study groups
  • Functionality provided by existing repositories & Bio-GraphIIn requirements (X = no, YES = yes)
    • BioSample DB
      • data types: sample info; format: SampleTAB
      • browsing/searching: browse/search
      • programmatic submission: X (email submission); programmatic access: REST API
      • CRUD operations: X; community curation: X; RDF: YES
    • ArrayExpress/GEO
      • data types: sequencing; format: MAGE-TAB
      • browsing/searching: browse/filter/search/advanced search
      • programmatic submission: MAGE-TAB spreadsheet/MIAMExpress; programmatic access: REST API
      • CRUD operations: X; community curation: X; RDF: X*
    • SRA/ENA
      • data types: next generation sequencing; format: SRA-XML
      • browsing/searching: browse/text/sequence/advanced search
      • programmatic submission: Webin, REST; programmatic access: REST API
      • CRUD operations: X; community curation: X; RDF: X
    • PRIDE
      • data types: mass spectrometry; format: PRIDE-ML
      • browsing/searching: PRIDE inspector/PRIDE Biomart
      • programmatic submission: X (FTP upload); programmatic access: Java API
      • CRUD operations: X; community curation: X; RDF: X
    • BII
      • data types: All; format: ISA-TAB
      • browsing/searching: browse/text search/filtering
      • programmatic submission: X; programmatic access: SOAP web services
      • CRUD operations: X; community curation: X; RDF: X
    • Bio-GraphIIn (in the slide build, these entries are overlaid with "prototype" labels)
      • data types: All; format: ISA-TAB
      • browsing/searching: browse/filter/search/advanced search
      • programmatic submission: YES (upload, REST); programmatic access: REST API
      • CRUD operations: YES; community curation: YES; RDF: YES
    • *We are referring to the ArrayExpress repository, not to the Expression Atlas, which is available in RDF
  • semantic representation of the graph, rich queries over common semantic framework enabling integration with other repositories
  • property graphs (http://www.tinkerpop.com/): independence from underlying graph technology
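Independence from the underlying graph technology is usually achieved by writing repository logic against a small interface and plugging a concrete store in behind it. The Python sketch below illustrates that design choice only. `GraphStore`, `InMemoryGraph` and `reachable` are hypothetical names invented for this example; they are not the TinkerPop API and not Bio-GraphIIn code.

```python
from typing import Dict, Iterable, List, Protocol, Tuple


class GraphStore(Protocol):
    """Minimal interface the repository logic depends on (hypothetical)."""

    def add_edge(self, source: str, label: str, target: str) -> None: ...

    def out_neighbours(self, node: str, label: str) -> Iterable[str]: ...


class InMemoryGraph:
    """One possible backend; a graph database adapter would expose the same methods."""

    def __init__(self) -> None:
        self._edges: Dict[Tuple[str, str], List[str]] = {}

    def add_edge(self, source: str, label: str, target: str) -> None:
        self._edges.setdefault((source, label), []).append(target)

    def out_neighbours(self, node: str, label: str) -> Iterable[str]:
        return list(self._edges.get((node, label), []))


def reachable(store: GraphStore, start: str, label: str) -> List[str]:
    """Nodes reachable from start by following edges with the given label.

    Written only against GraphStore, so it works with any backend.
    """
    seen, order, stack = {start}, [], [start]
    while stack:
        node = stack.pop()
        for neighbour in store.out_neighbours(node, label):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                stack.append(neighbour)
    return order


if __name__ == "__main__":
    graph = InMemoryGraph()
    graph.add_edge("H1", "produces", "H1.sample1")
    graph.add_edge("H1.sample1", "produces", "h1-s1.cel")
    print(reachable(graph, "H1", "produces"))
```

Swapping the backend then means writing one more class with the same two methods, while traversal code such as `reachable` stays untouched.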
  • R SPARQL package (http://www.r-bloggers.com/sparql-with-r-in-less-than-5-minutes/); Refinery (http://refinery-platform.org/), a Django-based analysis and visualisation platform that relies on ISA-TAB metadata
  • SPARQL queries
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>   # not listed on the slide; needed for rdf:type
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX bfo: <http://purl.obolibrary.org/obo/BFO_>
    PREFIX iao: <http://purl.obolibrary.org/obo/IAO_>
    PREFIX obi: <http://purl.obolibrary.org/obo/OBI_>
    PREFIX tax: <http://purl.obolibrary.org/obo/NCBITaxon_>
    PREFIX isa: <http://purl.org/isa-tools/ISA_>
    PREFIX ro: <http://purl.obolibrary.org/obo/RO_>

    SELECT DISTINCT ?i_id ?s_id ?s_title ?organism
    WHERE {
      ?study rdf:type obi:0000066 .              # obi:investigation
      ?study rdfs:label ?s_id .
      ?s_title_iri rdf:type obi:0001622 .        # obi:investigation_title
      ?s_title_iri iao:0000219 ?study .          # iao:denotes
      ?s_title_iri isa:00000089 ?s_title .
      ?source rdf:type bfo:0000040 .             # bfo:material_entity
      ?source obi:0000295 ?study .               # obi:is_specified_input_of
      OPTIONAL {
        ?study bfo:0000050 ?investigation .      # bfo:part_of
        ?investigation rdf:type obi:0000011 .    # obi:planned_process
        ?investigation rdfs:label ?i_id .
      }
      OPTIONAL {
        ?source rdf:type ?organism_iri .
        ?organism_iri rdf:type obi:0100026 .     # obi:organism
        ?organism_iri rdfs:label ?organism .
      }
      OPTIONAL {
        ?source bfo:0000053 ?characteristic .
        ?characteristic rdf:type bfo:0000005 .   # bfo:dependent continuant
        ?characteristic rdfs:comment ?comment .
        ?characteristic rdfs:label ?organism .
        FILTER regex(str(?comment), "organism")
      }
    }
    • Considering theoretical results on SPARQL to improve query performance, such as AND-OPT well-designed graph patterns
      • Pérez et al., Semantics and complexity of SPARQL, ACM Trans. Database Syst., 2009
      • Letelier et al., Static analysis and optimization of semantic web queries, PODS 2012
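The notion of a well-designed AND-OPT pattern mentioned on this slide can be made concrete with a small checker. Following the definition in Pérez et al., a pattern is well designed if, for every sub-pattern (P1 OPT P2), any variable that occurs in P2 and also outside that sub-pattern occurs in P1 as well. The Python sketch below is a simplification under stated assumptions: basic graph patterns are reduced to the set of variable names they mention, FILTER conditions are ignored, and the encoding of the slide's query as nested tuples is one reading of it, not something taken from the authors.

```python
def variables(pattern):
    """Variables mentioned anywhere in a pattern."""
    if isinstance(pattern, frozenset):  # basic graph pattern, modelled by its variables
        return set(pattern)
    _, left, right = pattern
    return variables(left) | variables(right)


def well_designed(pattern, outside=frozenset()):
    """Check the well-designedness condition for an AND/OPT pattern.

    `pattern` is either a frozenset of variable names or a tuple
    (op, left, right) with op in {"AND", "OPT"}. `outside` holds the
    variables occurring outside the sub-pattern currently examined.
    """
    if isinstance(pattern, frozenset):
        return True
    op, left, right = pattern
    if op == "OPT":
        # Variables of the optional side that also occur outside must be in the left side.
        leaked = (variables(right) & set(outside)) - variables(left)
        if leaked:
            return False
    return (well_designed(left, set(outside) | variables(right))
            and well_designed(right, set(outside) | variables(left)))


# One possible encoding of the slide's query: a core pattern plus three OPTIONAL blocks.
core = frozenset({"study", "s_id", "s_title_iri", "s_title", "source"})
opt_investigation = frozenset({"study", "investigation", "i_id"})
opt_organism_type = frozenset({"source", "organism_iri", "organism"})
opt_characteristic = frozenset({"source", "characteristic", "comment", "organism"})

single_optional = ("OPT", core, opt_investigation)
slide_query = ("OPT", ("OPT", single_optional, opt_organism_type), opt_characteristic)

if __name__ == "__main__":
    print(well_designed(single_optional))
    print(well_designed(slide_query))
```

Under this encoding the full query fails the check, because ?organism is shared by two OPTIONAL blocks without appearing in the mandatory part, whereas the core with a single OPTIONAL block passes.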
  • Prototype (screenshot labels): investigation, studies, assays, measurement, technology
  • http://bii.oerc.ox.ac.uk
  • Summary and future work
    • Bio-GraphIIn - the new integrative and semantically enabled repository for the ISA infrastructure: motivation, requirements, design & architecture, prototype
    • Support for data integration, uniform semantic queries across experiments enabled by a common semantic framework (ISA2OWL)
    • More work required on
      • Querying: performance analysis, support for ad hoc queries
      • Extension/improvement of prototype
      • Interfaces to services (e.g. BioPortal) and analysis/visualisation platforms (e.g. R/Bioconductor & Refinery)
  • funders
  • Thanks for your attention! Questions?
    • You can email us: isatools@googlegroups.com
    • View our website: http://www.isa-tools.org
    • View our Git repo & contribute: http://github.com/ISA-tools
    • View our blog: http://isatools.wordpress.com
    • Follow us on Twitter: @isatools