NETTAB 2013


Presentation at NETTAB 2013, Venice Lido, Italy. October 2013. http://nettab.org/2013/progr.php

Published in: Education, Technology

Transcript of "NETTAB 2013"

  1. Bio-GraphIIn: a graph-based, integrative and semantically enabled repository for life science experimental data. Alejandra González-Beltrán, PhD, Oxford e-Research Centre, University of Oxford. alejandra.gonzalezbeltran@oerc.ox.ac.uk, @alegonbel. NETTAB 2013, October 16-18, 2013, Venice Lido, Italy.
  2. Experimental workflow (diagram): Planning; Use existing data; Publication; Data Collection; Data Scientist; Data Management; Visualization; Analysis; Perform new experiment
  3. Experimental workflow (same diagram, annotated "data + metadata")
  4. Experimental workflow (same diagram, annotated "data + metadata" and "Science Reproducibility")
  5. Experimental workflow (the diagram shown twice, annotated "Data Reusability")
  6. Outline
     • Motivation for an integrative and semantically enabled metadata repository in life sciences
       • retrospective data submissions
       • heterogeneous experimental data
       • fragmentation of formats and databases
       • semantic queries leading to integrative analysis
     • Context: the ISA infrastructure
     • Bio-GraphIIn requirements
     • Bio-GraphIIn design & architecture
     • Bio-GraphIIn graph queries
     • Bio-GraphIIn prototype
     • Summary and future work
  8. Motivation 1/4: retrospective data submissions (experimental workflow diagram as on slide 2)
  9. Motivation 1/4: retrospective data submissions (same diagram, with a "metadata" label marked "retrospective")
  10. Motivation 1/4: retrospective data submissions (same diagram). Metadata edits to repositories are not straightforward, often requiring deleting the submission and re-submitting the data.
  11. Motivation 1/4: prospective / retrospective data submissions (same diagram, with "metadata" labels throughout the workflow)
  12. Motivation 1/4: prospective / retrospective data submissions (same diagram, with "metadata" labels throughout the workflow). Support incremental data deposition + metadata edits.
  13. Motivation 2/4: heterogeneous experimental data (Data Collection)
  14. Motivation 3/4: fragmentation of formats and databases (Publication)
  15. Motivation 4/4: semantic queries leading to integrative analysis (Visualization, Analysis)
     • support for a rich and uniform query interface across studies, enabling integrative data analysis to provide new insights at the systems biology level
       • e.g. find all data files associated with samples from a particular organism (e.g. Homo sapiens) and a particular tissue type (e.g. liver)
     • allow selection of a set of samples/data files through browsing and semantic filtering
     • provide links to analysis and visualisation platforms
     (diagram label: life science experiments repo)
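The example query on slide 15 (all data files for samples from a given organism and tissue type) can be sketched over a toy in-memory metadata set. The `Sample` layout, the tissue assignments and the mouse sample below are hypothetical and chosen only for illustration; this is not Bio-GraphIIn's data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    """A sample annotated with uniform metadata (hypothetical layout)."""
    name: str
    organism: str
    tissue: str
    data_files: tuple = field(default_factory=tuple)


def find_data_files(samples, organism, tissue):
    """Return the data files of every sample matching organism and tissue."""
    wanted = (organism.lower(), tissue.lower())
    return sorted(
        data_file
        for sample in samples
        if (sample.organism.lower(), sample.tissue.lower()) == wanted
        for data_file in sample.data_files
    )


# Toy samples; tissues and the mouse entry are made up for the example.
SAMPLES = [
    Sample("H1.sample1", "Homo sapiens", "liver", ("h1-s1.cel",)),
    Sample("H1.sample2", "Homo sapiens", "kidney", ("h1-s2.cel",)),
    Sample("H2.sample1", "Homo sapiens", "liver", ("h2-s1.cel",)),
    Sample("M1.sample1", "Mus musculus", "liver", ("m1-s1.cel",)),
]

if __name__ == "__main__":
    print(find_data_files(SAMPLES, "Homo sapiens", "liver"))
```

The point of the sketch is that such a filter only works across studies when organism and tissue are recorded in the same way everywhere, which is what the uniform query interface on the slide presupposes.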
  17. The Investigation/Study/Assay (ISA) infrastructure: generic format for experimental description and data exchange; community engagement; open source software tools
  18. investigation: high level concept to link related studies
      study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays
      assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
      data: pointers to data file names/locations; external files in native or other formats
      • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • system biology • transcriptomics • toxicogenomics • communities working to build a library of cellular signatures
  19. Experimental workflow - graph representation (figure): H1 (H. sapiens, 35 years) and H2 (H. sapiens, 33 years); samples H1.sample1, H1.sample2 and H2.sample1; Labeling steps yielding H1.sample1.labeled and H2.sample1.labeled; Scanning steps yielding h1-s1.cel, h1-s2.cel and h2-s1.cel
  20. Experimental workflow - graph representation (same figure as slide 19), shown next to "Spreadsheets for end-users": one row per sample (H1, H. sapiens, 35 years, H1.sample1; H1, H. sapiens, 35 years, H1.sample2; H2, H. sapiens, 33 years, H2.sample1), followed by Labeling and Scanning columns leading to h1-s1.cel, h1-s2.cel and h2-s1.cel. Vocabulary for the description of the experimental workflow.
  21. Experimental workflow - graph representation (same figure and spreadsheet as slide 20). Vocabulary for the description of the experimental workflow; syntactic interoperability across biological experiments of different types.
  22. Machine-readable representation: Graph + Semantics (figure). Nodes: H1, H1.sample1, H1.sample2, H1.sample1.labeled, H1.sample2.labeled, labeling1, labeling2, scanning1, scanning2, h1-s1.cel, h1-s2.cel, labeling protocol. Classes: obi:material entity, obi:material sample, tax:homo sapiens, obi:material processing, obi:processed material, isa:raw data file, obi:planned process, obi:protocol. Edges: obi:is_specified_input_of, obi:is_specified_output_of, bfo:derives_from, isa:executes. Semantic interoperability across biological experiments of different types.
  23. architecture) Diagram labels: ISA-TAB parser; graph; analysis; isa2owl mapping parser; Configuration file; mappings between the ISA-TAB syntax and ontologies; Resource Description Framework (RDF)
  24. ISA-OBI mapping: Ontology for Biomedical Investigations
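The idea running through slides 20-24 (spreadsheet rows mapped onto ontology terms and emitted as a graph) can be sketched in a few lines. The column names, the `CLASS_OF` table and the `scanning(...)` node identifiers are assumptions made for illustration; the actual isa2owl mapping files and the ISA-TAB parser are not shown in the slides. The class and predicate labels are the ones visible on slide 22, used here as plain strings rather than IRIs, and the labeling step is collapsed to keep the example short.

```python
# Hypothetical, simplified rows in the spirit of the slide-20 spreadsheet.
ROWS = [
    {"Source Name": "H1", "Sample Name": "H1.sample1", "Raw Data File": "h1-s1.cel"},
    {"Source Name": "H1", "Sample Name": "H1.sample2", "Raw Data File": "h1-s2.cel"},
    {"Source Name": "H2", "Sample Name": "H2.sample1", "Raw Data File": "h2-s1.cel"},
]

# Assumed mapping from spreadsheet column to ontology class label.
CLASS_OF = {
    "Source Name": "obi:material entity",
    "Sample Name": "obi:material sample",
    "Raw Data File": "isa:raw data file",
}


def row_to_triples(row):
    """Turn one spreadsheet row into (subject, predicate, object) triples."""
    source, sample, data_file = (
        row["Source Name"], row["Sample Name"], row["Raw Data File"])
    process = f"scanning({sample})"  # hypothetical process node identifier
    triples = [(row[col], "rdf:type", cls) for col, cls in CLASS_OF.items()]
    triples += [
        (sample, "bfo:derives_from", source),
        (sample, "obi:is_specified_input_of", process),
        (data_file, "obi:is_specified_output_of", process),
    ]
    return triples


def build_graph(rows):
    """Merge all rows into one set of triples; shared nodes are deduplicated."""
    graph = set()
    for row in rows:
        graph.update(row_to_triples(row))
    return graph


def data_files_for_source(graph, source):
    """Follow source <- sample -> process <- data file through the graph."""
    samples = {s for s, p, o in graph if p == "bfo:derives_from" and o == source}
    processes = {o for s, p, o in graph
                 if p == "obi:is_specified_input_of" and s in samples}
    return sorted(s for s, p, o in graph
                  if p == "obi:is_specified_output_of" and o in processes)


if __name__ == "__main__":
    print(data_files_for_source(build_graph(ROWS), "H1"))
```

Keeping the column-to-class mapping in a separate table, rather than hard-coding it in the parser, mirrors the separation between parser and mapping configuration that the architecture slide suggests.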
  26. Bio-GraphIIn Requirements: Bio-GraphIIn (pronounced “bio-graphene”) stands for Biological Graph Investigation Index. BioInvestigation Index (BII)
  27. Bio-GraphIIn Requirements
     • support prospective annotation of experiments
       • support Create Read Update Delete (CRUD) operations
     • manage heterogeneous biological and biomedical metadata
       • relying on ISA-TAB
     • support data integration & semantic queries
       • relying on ISA2OWL
     • links to analysis and visualisation platforms
     • take advantage of experimental design information, improving metadata, for example by including study groups
  28. Functionality provided by existing repositories & Bio-GraphIIn requirements

      Repository | Data types | Format | Browsing/Searching | Programmatic submission | Programmatic access | CRUD operations | Community curation | RDF
      BioSample DB | sample info | SampleTAB | browse/search | X (email submission) | REST API | X | X | YES
      ArrayExpress/GEO | sequencing | MAGE-TAB | browse/filter/search/advanced search | MAGE-TAB spreadsheet/MIAMExpress | REST API | X | X | X*
      SRA/ENA | next generation sequencing | SRA-XML | browse/text/sequence/advanced search | Webin, REST | REST API | X | X | X
      PRIDE | mass spectrometry | PRIDE-ML | PRIDE Inspector/PRIDE BioMart | X (FTP upload) | Java API | X | X | X
      BII | All | ISA-TAB | browse/text search/filtering | X | SOAP web services | X | X | X
      Bio-GraphIIn | All | ISA-TAB | browse/filter/search/advanced search | YES (upload, REST) | REST API | YES | YES | YES

      *We are referring to the ArrayExpress repository, not to the Expression Atlas, which is available in RDF.
  29. Functionality provided by existing repositories & Bio-GraphIIn requirements (same table as slide 28; the Bio-GraphIIn row is overlaid with the labels "browsing prototype" and "prototype")
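The CRUD requirement on slide 27, together with the complaint on slide 10 that metadata edits often mean deleting and re-submitting, can be illustrated with a minimal in-memory sketch. `MetadataStore` and its method names are hypothetical; they stand in for whatever interface a repository exposes and are not the Bio-GraphIIn REST API.

```python
class MetadataStore:
    """In-memory stand-in for a repository that supports CRUD on metadata."""

    def __init__(self):
        self._records = {}

    def create(self, study_id, metadata):
        """Register a study, possibly with only planning-stage metadata."""
        if study_id in self._records:
            raise KeyError(f"{study_id} already exists")
        self._records[study_id] = dict(metadata)

    def read(self, study_id):
        """Return a copy so callers cannot mutate stored metadata directly."""
        return dict(self._records[study_id])

    def update(self, study_id, **changes):
        """Edit fields in place; no delete-and-resubmit cycle is needed."""
        self._records[study_id].update(changes)

    def delete(self, study_id):
        del self._records[study_id]


if __name__ == "__main__":
    store = MetadataStore()
    # Prospective annotation: start at planning time, refine as work proceeds.
    store.create("study-1", {"title": "Planned liver study"})
    store.update("study-1", organism="Homo sapiens")
    store.update("study-1", data_files=["h1-s1.cel"])
    print(store.read("study-1"))
```

Each `update` call corresponds to one incremental deposition step, so the record grows with the experiment instead of being assembled retrospectively at publication time.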
  31. semantic representation of the graph, rich queries over common semantic framework enabling integration with other repositories
  32. independence from underlying graph technology
  33. property graphs (http://www.tinkerpop.com/); independence from underlying graph technology
  34. R SPARQL package (http://www.r-bloggers.com/sparql-with-r-in-less-than-5-minutes/); http://refinery-platform.org/: Django-based analysis and visualisation platform, relies on ISA-TAB metadata
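"Independence from underlying graph technology" (slides 32-33) is commonly achieved by coding against a small interface and swapping the storage behind it. The sketch below is one way to picture that, assuming two toy backends: a triple set and a labelled adjacency map. `GraphBackend`, both implementations and `samples_of` are hypothetical names; neither backend uses or imitates the TinkerPop API or any RDF store.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class GraphBackend(ABC):
    """Minimal operations the application needs from any graph store."""

    @abstractmethod
    def add_edge(self, source, label, target):
        """Store a labelled edge from source to target."""

    @abstractmethod
    def sources(self, label, target):
        """Return, sorted, all nodes with a `label` edge pointing at target."""


class TripleBackend(GraphBackend):
    """Keeps edges as (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add_edge(self, source, label, target):
        self._triples.add((source, label, target))

    def sources(self, label, target):
        return sorted(s for s, p, o in self._triples if p == label and o == target)


class AdjacencyBackend(GraphBackend):
    """Keeps edges in an index keyed by (label, target)."""

    def __init__(self):
        self._incoming = defaultdict(set)

    def add_edge(self, source, label, target):
        self._incoming[(label, target)].add(source)

    def sources(self, label, target):
        return sorted(self._incoming.get((label, target), ()))


def samples_of(backend, source_name):
    """Application code: written once, works with either backend."""
    return backend.sources("derives_from", source_name)


if __name__ == "__main__":
    for backend in (TripleBackend(), AdjacencyBackend()):
        backend.add_edge("H1.sample1", "derives_from", "H1")
        backend.add_edge("H1.sample2", "derives_from", "H1")
        print(type(backend).__name__, samples_of(backend, "H1"))
```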
  36. SPARQL queries

      # rdf: prefix is not declared on the slide; it is needed for rdf:type
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      PREFIX bfo: <http://purl.obolibrary.org/obo/BFO_>
      PREFIX iao: <http://purl.obolibrary.org/obo/IAO_>
      PREFIX obi: <http://purl.obolibrary.org/obo/OBI_>
      PREFIX tax: <http://purl.obolibrary.org/obo/NCBITaxon_>
      PREFIX isa: <http://purl.org/isa-tools/ISA_>
      PREFIX ro: <http://purl.obolibrary.org/obo/RO_>

      SELECT DISTINCT ?i_id ?s_id ?s_title ?organism
      WHERE {
        ?study rdf:type obi:0000066 .                  # obi:investigation
        ?study rdfs:label ?s_id .
        ?s_title_iri rdf:type obi:0001622 .            # obi:investigation_title
        ?s_title_iri iao:0000219 ?study .              # iao:denotes
        ?s_title_iri isa:00000089 ?s_title .
        ?source rdf:type bfo:0000040 .                 # bfo:material_entity
        ?source obi:0000295 ?study .                   # obi:is_specified_input_of
        OPTIONAL {
          ?study bfo:0000050 ?investigation .          # bfo:part_of
          ?investigation rdf:type obi:0000011 .        # obi:planned_process
          ?investigation rdfs:label ?i_id .
        }
        OPTIONAL {
          ?source rdf:type ?organism_iri .
          ?organism_iri rdf:type obi:0100026 .         # obi:organism
          ?organism_iri rdfs:label ?organism .
        }
        OPTIONAL {
          ?source bfo:0000053 ?characteristic .
          ?characteristic rdf:type bfo:0000005 .       # bfo:dependent continuant
          ?characteristic rdfs:comment ?comment .
          ?characteristic rdfs:label ?organism .
          FILTER regex(str(?comment), "organism")
        }
      }

      Considering theoretical results on SPARQL to improve query performance, such as AND-OPT well-designed graph patterns. Pérez et al., Semantics and complexity of SPARQL, ACM Trans. Database Syst., 2009. Letelier et al., Static analysis and optimization of semantic web queries, PODS 2012.
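The query on slide 36 leans on OPTIONAL blocks. In the algebraic semantics discussed in the SPARQL literature cited on the slide, OPTIONAL behaves like a left outer join over solution mappings: a left-hand solution is extended by every compatible right-hand solution and kept unchanged when there is none. The following is a minimal sketch of that operator over Python dicts; the variable bindings are made up, lists are used instead of sets for readability, and nothing here reflects how Bio-GraphIIn or any SPARQL engine implements it.

```python
def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(m2[var] == value for var, value in m1.items() if var in m2)


def left_join(left, right):
    """OPTIONAL as a left outer join: extend where possible, otherwise keep."""
    result = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        result.extend(extended if extended else [a])
    return result


# Mandatory part: every study has an identifier (toy bindings).
STUDIES = [
    {"study": "s1", "s_id": "S-001"},
    {"study": "s2", "s_id": "S-002"},
]

# Optional part: only some studies belong to an investigation.
INVESTIGATIONS = [
    {"study": "s1", "i_id": "I-001"},
]

if __name__ == "__main__":
    for solution in left_join(STUDIES, INVESTIGATIONS):
        print(solution)
```

Study `s2` survives without an `i_id` binding, which is why `?i_id` can come back unbound in the slide's query for studies that are not part of any investigation.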
  38. investigation; studies; assays; measurement; technology
  39. http://bii.oerc.ox.ac.uk
  42. Summary and future work
     • Bio-GraphIIn - the new integrative and semantically enabled repository for the ISA infrastructure: motivation, requirements, design & architecture, prototype
     • Support for data integration, uniform semantic queries across experiments enabled by a common semantic framework (ISA2OWL)
     • More work required on
       • Querying: performance analysis, support for ad hoc queries
       • Extension/improvement of prototype
       • Interfaces to services (e.g. BioPortal) and analysis/visualisation platforms (e.g. R/Bioconductor & Refinery)
  43. funders
  44. Thanks for your attention! Questions? Email us: isatools@googlegroups.com. Website: http://www.isa-tools.org. Git repo (contributions welcome): http://github.com/ISA-tools. Blog: http://isatools.wordpress.com. Twitter: @isatools