Presentation at BOSC2012 by P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
"Ontology-centric navigation of the scientific literature"bridgingworlds2008
This document discusses ontology-centric knowledge navigation of scientific literature. It motivates the need for scientists to integrate information from various sources, and notes that over 50% of information is unstructured. It proposes that providing structured access to information according to explicit knowledge representations can help scientists. The document then outlines various applications of semantic web technologies like ontologies, reasoning, and text mining to develop ontology-driven systems that can integrate information from multiple sources and enable complex queries over the structured knowledge.
Research objects aim to preserve digital science by aggregating all elements needed to understand a research investigation, including data, computational processes, and annotations. They promote reuse and verification of reproducibility. The anatomy of a research object includes resources like datasets and workflows that are described and related using semantic technologies. Tools are being developed to work with research objects, and standards like the Open Annotation Data Model and PROV are being used to represent their evolution over time.
Menager H - Mobyle web framework: new features (Jan Aerts)
Mobyle is an easy-to-use command-line and pipeline tool for bioinformatics. It integrates common bioinformatics tools through BMID and supports straightforward pipeline design and execution with BMPS. Upcoming releases will add new editing widgets and improvements to BMID and BMPS; continued development is supported by NIAID and GenOuest. Mobyle is available through apt-get or from its source code repositories.
This document summarizes a talk on open science given by Jonathan Eisen. Some key points:
1. Eisen recounted his early skepticism of open access but eventual conversion after experiences like publishing an open access paper that received more attention.
2. He discussed experiments with openly releasing genomic data that helped convince him of the benefits of openness in science.
3. Eisen argued that limiting access to scientific literature and data hinders scientific progress, and outlined several ways scientists can promote openness.
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
Jan Aerts is a faculty member at the Faculty of Engineering - ESAT/SCD who is involved in genomics research including DNA sequencing of chickens, cows, and humans. Their research aims to identify genetic variations responsible for phenotypes and diseases. They focus on developing visual analytics tools to help with (1) filtering large datasets to find relevant parameters and (2) making sense of patterns in the data such as gene networks. The goal is to help researchers better understand complicated genomic data through interactive visualization.
"Ontology-centric navigation of the scientific literature"bridgingworlds2008
This document discusses ontology-centric knowledge navigation of scientific literature. It motivates the need for scientists to integrate information from various sources, and notes that over 50% of information is unstructured. It proposes that providing structured access to information according to explicit knowledge representations can help scientists. The document then outlines various applications of semantic web technologies like ontologies, reasoning, and text mining to develop ontology-driven systems that can integrate information from multiple sources and enable complex queries over the structured knowledge.
Research objects aim to preserve digital science by aggregating all elements needed to understand a research investigation, including data, computational processes, and annotations. They promote reuse and verification of reproducibility. The anatomy of a research object includes resources like datasets and workflows that are described and related using semantic technologies. Tools are being developed to work with research objects, and standards like the Open Annotation Data Model and PROV are being used to represent their evolution over time.
Menager H - Mobyle web framework: new featuresJan Aerts
Mobyle is an easy to use command line and pipeline tool for bioinformatics. It features integration with common bioinformatics tools through BMID and easy pipeline design and execution with BMPS. Upcoming releases will include new edition widgets, improvements to BMID and BMPS, and continued development is supported by NIAID and GenOuest. Mobyle is available through apt-get or from source code repositories.
This document summarizes a talk on open science given by Jonathan Eisen. Some key points:
1. Eisen recounted his early skepticism of open access but eventual conversion after experiences like publishing an open access paper that received more attention.
2. He discussed experiments with openly releasing genomic data that helped convince him of the benefits of openness in science.
3. Eisen argued that limiting access to scientific literature and data hinders scientific progress, and outlined several ways scientists can promote openness.
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
Jan Aerts is a faculty member at the Faculty of Engineering - ESAT/SCD who is involved in genomics research including DNA sequencing of chickens, cows, and humans. Their research aims to identify genetic variations responsible for phenotypes and diseases. They focus on developing visual analytics tools to help with (1) filtering large datasets to find relevant parameters and (2) making sense of patterns in the data such as gene networks. The goal is to help researchers better understand complicated genomic data through interactive visualization.
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Cu... (GigaScience, BGI Hong Kong)
Eamonn Maguire's talk on "The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe" at ISCB-Asia, December 17th 2012
Managing Experimental Metadata using ISA data structures discusses using the ISA (Investigation/Study/Assay) format and tools to capture experimental workflows, make annotations explicit and discoverable, and structure descriptions for consistency and tracking. The ISA format supports data provenance tracking using a node/edge concept and tabular representation inspired by object models. It can be applied to experiments in various omics domains like microarrays, sequencing, flow cytometry, and mass spectrometry. The ISA tools provide a suite of modular, open source tools for creating, validating, loading, browsing, and analyzing ISA-formatted metadata and linking it to associated data files.
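The node/edge reading of an ISA-Tab-style row can be sketched in a few lines of Python. The column names below are illustrative, not the normative ISA-Tab set: alternating material/data columns become nodes and "Protocol REF" columns become the process edges connecting them.

```python
# Sketch: reading one row of an ISA-Tab-style table as a provenance chain
# of nodes (materials/data) connected by process edges (protocols).
# Column and sample names here are hypothetical.

row = {
    "Source Name": "patient-01",
    "Protocol REF.1": "sample collection",
    "Sample Name": "blood-01",
    "Protocol REF.2": "RNA extraction",
    "Extract Name": "rna-01",
}

def row_to_graph(row):
    """Alternating node / edge columns become a linear provenance chain."""
    nodes, edges = [], []
    for column, value in row.items():
        if column.startswith("Protocol REF"):
            edges.append(value)      # a process applied to the previous node
        else:
            nodes.append(value)      # a material or data node
    # pair each process edge with the nodes on either side of it
    return [(nodes[i], edges[i], nodes[i + 1]) for i in range(len(edges))]

print(row_to_graph(row))
# [('patient-01', 'sample collection', 'blood-01'),
#  ('blood-01', 'RNA extraction', 'rna-01')]
```

This chain structure is what makes provenance queries ("which protocol produced this extract?") straightforward over the tabular representation.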
Pal gov.tutorial2.session13 1.data schema integration (Mustafa Jarrar)
This document discusses data schema integration, which involves identifying correspondences between different data schemas and resolving conflicts between them to create an integrated schema. It describes challenges in schema integration including identifying corresponding concepts and analyzing conflicts. It then presents a generic framework for schema integration involving schema transformation, schema matching to identify correspondences, and integration and mapping generation to create the integrated schema and mappings. Finally, it provides examples of different types of conflicts and integration methods.
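One common first step in the schema-matching phase described above is name-based matching. The sketch below compares normalized attribute names with a string-similarity score; real matchers also exploit types, instances and structure, and the schemas shown are hypothetical.

```python
# Sketch: a minimal name-based schema matcher. Real matchers combine this
# with type, instance and structural evidence; this compares names only.
from difflib import SequenceMatcher

def normalize(name):
    return name.lower().replace("_", "").replace("-", "")

def match_schemas(attrs_a, attrs_b, threshold=0.8):
    """Return candidate correspondences between two attribute lists."""
    matches = []
    for a in attrs_a:
        for b in attrs_b:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                matches.append((a, b, round(score, 2)))
    return matches

# Illustrative schemas from two sources describing the same entity.
print(match_schemas(["StudentName", "birth_date"],
                    ["student-name", "DateOfBirth"]))
# [('StudentName', 'student-name', 1.0)]
```

Note how `birth_date` and `DateOfBirth` are missed: semantically identical names can score poorly on surface similarity, which is exactly the kind of naming conflict the integration framework has to resolve.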
White Paper: Hadoop in Life Sciences — An Introduction (EMC)
This White Paper reviews the Apache Hadoop technology, its components — MapReduce and the Hadoop Distributed File System — and its adoption in the life sciences, with an example in genomics data analysis.
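The MapReduce model can be sketched in plain Python with a genomics-flavoured example: counting k-mers across reads, the kind of embarrassingly parallel task the white paper has in mind. The read data is made up; in Hadoop Streaming the map and reduce phases would run as separate processes over HDFS blocks, with the shuffle simulated here by a dict.

```python
# Sketch of MapReduce in plain Python: counting k-mers in sequencing reads.
# Hadoop would distribute map() over HDFS blocks and shuffle/sort the
# emitted pairs before reduce(); here the shuffle is a local dict.
from collections import defaultdict

def map_kmers(read, k=3):
    """Map phase: emit (k-mer, 1) pairs for one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1

def reduce_counts(pairs):
    """Reduce phase: sum the counts for each key."""
    totals = defaultdict(int)
    for kmer, count in pairs:          # shuffle/sort is implicit here
        totals[kmer] += count
    return dict(totals)

reads = ["GATTACA", "TACAGAT"]         # illustrative reads
pairs = [pair for read in reads for pair in map_kmers(read)]
print(reduce_counts(pairs))
```

Because each map call touches only one read and each reduce key is independent, the job scales by simply adding nodes, which is the core appeal of Hadoop for sequencing workloads.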
"Towards a Science of Reproducible Science?" DPRMA Workshop talk at JCDL 2013, Indianapolis, 25th July 2013. Workshop website is http://dprma.oerc.ox.ac.uk/
The accompanying paper is: David De Roure. 2013. Towards computational research objects. In Proceedings of the 1st International Workshop on Digital Preservation of Research Methods and Artefacts (DPRMA '13). ACM, New York, NY, USA, 16-19. DOI: 10.1145/2499583.2499590, http://doi.acm.org/10.1145/2499583.2499590
This document provides an overview of the Web Ontology Language (OWL). OWL is built on top of RDF and is used to process information on the web by computers. It allows for stronger constraints and rules than RDF. There are three sublanguages of OWL with varying expressiveness. OWL is written in XML and is a W3C standard, making it suitable for exchanging and processing web information across different systems.
This tutorial discusses the Web Ontology Language (OWL). OWL is built on top of RDF and is used to process information on the web by computers. It allows for stronger constraints and rules than RDF. There are three sublanguages of OWL with varying expressiveness. OWL is written in XML and is a W3C standard for representing ontologies on the semantic web.
Pal gov.tutorial2.session13 2.gav and lav integration (Mustafa Jarrar)
This document discusses Global-As-View (GAV) and Local-As-View (LAV) integration approaches. GAV defines the global schema in terms of the local schemas by writing views over the local schemas. LAV defines the local schemas in terms of the global schema by writing views from the global schema to the local schemas. The document provides an example of each approach and discusses how queries are executed differently under GAV versus LAV.
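The GAV direction can be illustrated with SQLite views. In this sketch, two hypothetical local sources each hold part of the data, and the global relation is defined as a view (a query) over them, so a global query unfolds directly into source queries.

```python
# Sketch: GAV integration with SQLite views. Table and column names are
# illustrative, not from the tutorial.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- local source schemas
    CREATE TABLE src1_genes(gene_id TEXT, symbol TEXT);
    CREATE TABLE src2_annotations(gene_id TEXT, function TEXT);
    INSERT INTO src1_genes VALUES ('g1', 'TP53');
    INSERT INTO src2_annotations VALUES ('g1', 'tumor suppression');

    -- GAV: the global relation is a view written over the local sources
    CREATE VIEW global_gene AS
        SELECT s1.gene_id, s1.symbol, s2.function
        FROM src1_genes s1 JOIN src2_annotations s2 USING (gene_id);
""")

print(db.execute("SELECT symbol, function FROM global_gene").fetchall())
# [('TP53', 'tumor suppression')]
```

Under LAV the direction is reversed: each source would instead be described as a view over the global schema, and answering a global query requires rewriting it in terms of those source views, which is why query processing differs so much between the two approaches.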
Presentation given at a topic meeting of the Hauptbibliothek Universität Zürich on "New open access topics of relevance to academic libraries", 23 July 2012
Cloud Programming Models: eScience, Big Data, etc. (Alexandru Iosup)
This document discusses cloud programming models. It begins by defining programming models and noting that they provide an abstraction of a computer system through a language, libraries and runtime system. It then lists some key characteristics of a cloud programming model including efficiency, scalability, fault tolerance and data models. The document outlines an agenda to cover programming models for compute-intensive and big data workloads. It provides examples of bags of tasks and workflow programming models and their applications in fields like bioinformatics.
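The bag-of-tasks model mentioned above can be approximated locally with a process pool: each task is independent, so a scheduler can hand tasks to any idle worker, the same shape used to run many independent alignments or simulations in the cloud. The task function below is a stand-in for real work.

```python
# Sketch: the bag-of-tasks model with a local process pool. In a cloud
# setting the pool would be a cluster of workers pulling from a task queue.
from multiprocessing import Pool

def task(n):
    """One independent unit of work; no communication with other tasks."""
    return n * n

if __name__ == "__main__":
    bag = range(10)                      # the "bag" of independent inputs
    with Pool(processes=4) as pool:
        results = pool.map(task, bag)    # dynamic assignment to idle workers
    print(sum(results))                  # 285
```

Because tasks share nothing, this model gets fault tolerance almost for free: a failed task can simply be resubmitted, one of the cloud-model characteristics the talk lists.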
This document provides an overview of storing Resource Description Framework (RDF) graphs in relational database management systems. Specifically:
- RDF represents data as subject-predicate-object triples that form a directed graph. This triples-based data model allows for easy data integration.
- RDF graphs are typically stored as a single subject-predicate-object table in a relational database for persistent storage.
- Queries to retrieve and manipulate data in the RDF graph can then be performed using SQL on this table.
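The single subject-predicate-object table described above can be sketched with SQLite: a graph pattern with several triple patterns becomes a self-join on the one table, once per extra pattern. The triples below are invented for illustration.

```python
# Sketch: storing an RDF graph as one subject-predicate-object table and
# querying it with SQL self-joins. Triple values are hypothetical.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triples(s TEXT, p TEXT, o TEXT)")
db.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("gene:TP53",  "rdf:type",     "so:Gene"),
    ("gene:TP53",  "ex:locatedOn", "chr:17"),
    ("gene:BRCA1", "rdf:type",     "so:Gene"),
])

# "Which subjects are Genes located on chr17?" -- one self-join per
# additional triple pattern in the graph query.
rows = db.execute("""
    SELECT t1.s FROM triples t1
    JOIN triples t2 ON t1.s = t2.s
    WHERE t1.p = 'rdf:type'     AND t1.o = 'so:Gene'
      AND t2.p = 'ex:locatedOn' AND t2.o = 'chr:17'
""").fetchall()
print(rows)    # [('gene:TP53',)]
```

The join-per-pattern cost is why production triple stores add indexes over (s, p, o) permutations rather than relying on the bare table.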
This document proposes representing scientific workflows as first-class citizens called research objects. It presents a model for workflow research objects that aggregates all necessary elements to understand an investigation. These include experiments, annotations, results, datasets and provenance. Research objects are encoded using semantic technologies like RDF and follow standards such as OAI-ORE (Object Reuse and Exchange). The lifecycle of research objects is also described.
Being Reproducible: SSBSS Summer School 2017 (Carole Goble)
Lecture 2:
Being Reproducible: Models, Research Objects and R* Brouhaha
Reproducibility is an R* minefield, depending on whether you are testing for robustness (rerun), defence (repeat), certification (replicate), comparison (reproduce) or transfer between researchers (reuse). Different forms of "R" make different demands on the completeness, depth and portability of research. Sharing is another minefield, raising concerns over credit and protection from sharp practices.
In practice the exchange, reuse and reproduction of scientific experiments is dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not “finished”: the codes fork, data is updated, algorithms are revised, workflows break, service updates are released. ResearchObject.org is an effort to systematically support more portable and reproducible research exchange.
In this talk I will explore these issues in more depth using the FAIRDOM Platform and its support for reproducible modelling. The talk will cover initiatives and technical issues, and raise social and cultural challenges.
This document provides an overview of a tutorial on data integration and open information systems. It discusses the goals of the semantic web and linked data, which aim to create a universal medium for data exchange by publishing and connecting structured data on the web. Currently, web APIs allow access to data but use different data models and formats. Linked data uses common RDF standards and links entities to enable querying across diverse domains and data sources, forming a global data web.
This document discusses the intersection of machine learning and search-based software engineering (ML & SBSE). It provides examples of how data miners can find signals in software engineering artifacts using machine learning techniques. It then discusses how better algorithms do not necessarily lead to better mining yet and emphasizes the importance of sharing data, models, and analysis methods. Finally, it outlines a vision for "discussion mining" to guide teams in walking across the space of local models, with the goal of building a science of localism in ML and SBSE.
The document discusses Oracle Semantic Technologies for storing and querying RDF data. It provides an overview of how RDF data is stored and organized in Oracle databases using ID triples and URI mapping tables. It describes how the SEM_MATCH SQL function allows querying RDF data using a SPARQL-like syntax. Optimization techniques for SEM_MATCH queries include indexes and materialized views. The core entities in the Oracle Semantic Store include semantic networks, models, rulebases, and entailments. Functionality includes bulk loading, incremental loading, SPARQL querying, and built-in or user-defined inference rules.
This document discusses visualizing genomic variation from DNA sequencing data. It begins by defining genomic variation such as single nucleotide polymorphisms and structural variations. It then discusses analyzing multiple samples, showing affected genes and clustering individuals. The document outlines challenges in visualizing high-dimensional genomic data from deep sequencing at scale, while maintaining computational performance for interactivity. It proposes representing rearranged chromosomes based on segment relationships to focus on functional impacts.
Visual Analytics in Omics - why, what, how? (Jan Aerts)
This document discusses visual analytics in omics data. It begins by noting the shift from hypothesis-driven to data-driven research due to large datasets. Visual analytics can help explore these data by opening the "black box" of algorithms and enabling researchers to develop hypotheses. Effective visualization leverages human perception through techniques like preattentive vision and Gestalt laws. Challenges to visual analytics include scalability issues for large datasets and identifying interesting patterns for further analysis. Examples demonstrate data exploration, filtering, and user-guided analysis in genomic applications.
Visual Analytics in Omics: why, what, how? (Jan Aerts)
Visual Analytics in omics can help address several challenges in analyzing complex biological data:
- It allows researchers to explore large datasets in an interactive way to generate hypotheses, as the initial analysis is often exploratory rather than driven by a specific hypothesis.
- It opens the "black box" of automated analysis by making the analysis process transparent and understandable to domain experts.
- Effective visualization techniques leverage human visual perception and cognition to facilitate reasoning about the data.
This document discusses the shift from hypothesis-driven to data-driven scientific research paradigms and the role of visualization in facilitating human reasoning about complex data. It describes visualization as a framework involving interaction, visual representations, and analytics to support biological data exploration and hypothesis generation. Examples are provided of visualization tools that enable interactive analysis, algorithm development by making black boxes transparent, and user-guided analysis through continuous refinement. Challenges in scalability, uncertainty, evaluation and infrastructure are also discussed.
Visualizing the Structural Variome (VMLS-Eurovis 2013) (Jan Aerts)
This document discusses visualizing structural variation in genomes. It begins by defining structural variation and copy number variation. It then discusses why structural variation is important, listing examples of traits influenced by copy number differences. The document outlines challenges in visualizing structural variation data from techniques like array CGH and sequencing. It proposes dual approaches - focusing on functional impact and representing rearranged chromosomes based on segment relationships. Future directions discussed include single-cell analysis and cross-omic data integration.
The document discusses humanizing data analysis by putting the human back in the loop of data analysis processes. It notes that current data analysis involves filtering and other automated tasks that act as a "black box" for humans. The author argues that data analysis should involve generating hypotheses with the human perspective in mind through techniques like visual analytics and cognitive tasks to make the data analysis process more transparent and understandable for people.
This document provides an introduction to data visualization. It discusses what data visualization is, why it is used, and the stages involved in creating visualizations from data. Key points include:
- Data visualization involves using visual representations of data to help people analyze and communicate information more effectively.
- Visualizations are used for tasks like recording information, analyzing data to support reasoning, and communicating information.
- The process of creating visualizations involves understanding the properties of the data, properties of images and perception, and rules for mapping data to visual encodings.
- Important considerations include which visual variables to use to encode different data properties, principles of visual perception, and enabling interaction with the data. Validation of the effectiveness of a visualization is also discussed.
L Fu - Dao: a novel programming language for bioinformatics (Jan Aerts)
The document introduces Dao, a new programming language for bioinformatics. It discusses Dao's key features like optional typing, native support for concurrent programming, an LLVM-based JIT compiler, simple C interfaces, and the ClangDao tool for wrapping C/C++ libraries. An example demonstrates using thread tasks and futures for concurrent programming. The document outlines future plans to develop BioDao, an open source project providing bioinformatics modules to the Dao language.
J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module... (Jan Aerts)
Presentation at BOSC2012 by J Wang - bioKepler: a comprehensive bioinformatics scientific workflow module for distributed analysis of large-scale biological data
B Temperton - The Bioinformatics Testing Consortium (Jan Aerts)
The Bioinformatics Testing Consortium aims to improve bioinformatics software by having software tested by others in addition to the developers. It will assign testers to review open source bioinformatics projects and ensure they meet minimum standards through running standard tests and verifying output matches test data. This benefits new users by providing more reliable software, developers by identifying bugs, testers by learning quality standards, and journal editors by ensuring published software is fit for purpose. The consortium seeks feedback, participation, test cases, and engagement on Twitter to achieve its goals.
J Goecks - The Galaxy Visual Analysis Framework (Jan Aerts)
The document describes Galaxy, an open-source web-based platform for visual analysis of genomic data. Galaxy provides tools for obtaining, integrating, analyzing, visualizing, sharing and publishing complete genomic analyses through a graphical user interface. It allows users to easily chain tools and create complex analysis workflows. The document highlights several Galaxy visualization tools, including Trackster for interactive exploration of large genomic datasets, Paramamonster for parameter space exploration, and Circster for circular genome-wide views. Future directions include expanding visualization capabilities to other data types and integrating multiple coordinated views.
GMOD in the Cloud provides preinstalled GMOD tools like Tripal, Chado, GBrowse, and JBrowse on cloud.gmod.org. These tools allow users to visualize, annotate, and manage biological data in the cloud. Potential use cases include community annotation events where users can load data, configure tools, annotate, and then export annotations without installing software locally. Using the cloud avoids installation issues and saves money while providing access to sample genomic datasets.
B Chapman - Toolkit for variation comparison and analysis
The document describes a toolkit for comparing variant calls from different variant callers and sequencing technologies. It proposes establishing a set of true variants by comparing calls across multiple callers and technologies on gold standard genomes. The toolkit includes a comparison architecture that analyzes variants, identifies real variants by summarizing metrics, and scales to large numbers of variants and samples. It also describes building analysis pipelines in Clojure and providing comparison results through a web interface with metrics. The goal is to help answer biological questions by determining true variants and prioritizing based on existing evidence.
J Klein - KUPKB: sharing, connecting and exposing kidney and urinary knowledg...
The KUPKB integrates thousands of kidney and urinary pathway studies into an RDF knowledge base using ontologies to provide schema and annotation. The iKUP browser exposes the knowledge in a simple web interface, allowing biologists to more easily survey biological publications and generate hypotheses than traditional literature searches. The tools and APIs used make it possible to build such applications at relatively low cost.
A Kalderimis - InterMine: Embeddable datamining components
InterMine is an integrated data warehouse with an optimizing query engine. It provides web services and embeddable widgets to make powerful data querying accessible to non-technical users. InterMine runs databases for various model organisms and is working to make machine-readable APIs and data displays universally accessible.
E Afgan - Zero to a bioinformatics analysis platform in four minutes
This document discusses how to quickly set up a bioinformatics analysis platform in four minutes using various open source tools. It introduces CloudBioLinux for building custom tool suites, CloudMan for creating scalable processing platforms, Galaxy for exploratory analysis, and BioCloudCentral for getting started easily. A new Python library called Blend is also introduced for automating repetitive tasks related to analysis and infrastructure manipulation using the APIs of these tools.
B Kinoshita - Creating biology pipelines with BioUno
BioUno is an open source project that uses continuous integration tools like Jenkins to create biology pipelines. It was created by Bruno Kinoshita in Brazil as a way to apply DevOps practices to biology. BioUno uses Jenkins for its jobs, notifications, and integration with other tools. The next steps are to enhance documentation, find new developers and users, and compare BioUno to other similar biology tools.
The document discusses updates to the Galaxy API and automatic parallelization capabilities. The RESTful Galaxy API now uses JSON and authentication keys instead of usernames/passwords. Tools can be configured for automatic parallelization to take advantage of available resources. The Tool Shed allows simple installation and updating of tools and workflows in a Galaxy instance.
P Rocca-Serra - The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
1.
The open source ISA metadata tracking framework: from data curation and management at the source, to the linked data universe
BOSC, Long Beach, July 13-14, 2012
Philippe Rocca-Serra (Ph.D.)
ISA Team
twitter: @isatools
philippe.rocca-serra@oerc.ox.ac.uk
http://www.isa-tools.org
Friday, 13 July 2012
5.
MAIN THEME:
It is all about structuring experimental information to make it available to computer and software agents to enable mining.
But let's proceed gradually...
Notes in Lab Books (information for humans) → Spreadsheets and Tables (the compromise) → Facts as RDF statements (information for machines)
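The progression above, from human-readable notes through tables to machine-readable facts, can be sketched in a few lines of Python. The subject and predicate URIs below are invented purely for illustration; they are not part of any ISA vocabulary:

```python
# A spreadsheet row: the "compromise" format, readable by humans.
row = {"sample": "H1.sample1", "organism": "Homo sapiens", "age": "35"}

# The same facts expressed as RDF statements in N-Triples syntax
# (information for machines). All URIs here are hypothetical.
subject = "http://example.org/sample/H1.sample1"
triples = [
    f'<{subject}> <http://example.org/vocab/organism> "{row["organism"]}" .',
    f'<{subject}> <http://example.org/vocab/age> "{row["age"]}" .',
]
for t in triples:
    print(t)
```

Each triple is a standalone fact, so a software agent can mine the statements without knowing the layout of the original table.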
6.
Observations
• Experiments are expensive and often publicly funded, yet many never see the light of day.
• Spreadsheets are the most common vehicle for tracking so-called 'omics' (functional genomics) experimental metadata.
• Technology-centric repositories form de facto silos.
• Conversions are required to allow deposition to public databases.
• Submitting common information across a series of repositories is inefficient.
8.
Many ontologies, many formats, many requirements...
Grr... where are the tools!?!
Credits: http://liverpoolsolfed.wordpress.com/resources/image-bank/demonstration/
10.
Why ISA format and Tools?
– Supports data provenance tracking
– Built on an underlying node/edge concept
– Tabular as a compromise: a presentation layer inspired by object models (FuGE, MAGE-OM)
– A generic representation, applied to:
  • microarray-based experiments (MAGE)
  • sequencing-based experiments (SRA)
  • flow cytometry-based experiments (FuGE-Flow Cyt)
  • mass spectrometry and NMR spectroscopy experiments
11.
Why ISA format and Tools?

investigation: a high-level concept to link related studies.
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays.
assay: a test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data).
assay(s): pointers to data file names/locations.
data: external files in native or other formats.

[Figure: example rows such as "H1 | H. Sapiens | 35 Years | H1.sample1 | Labeling | H1.sample1.labeled | h1-s1.cel", shown both as a flat table and as the corresponding node/edge graph.]

ISA metadata specifications:
• workflow and process orientated
• compatible with checklist enforcement
• compatible with external vocabulary resources
• compatible by design with existing schemas (MAGE-Tab, Pride-xml, SRA-xml)

Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLS IG and the ToxBank Consortium.
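The Investigation/Study/Assay hierarchy just described can be sketched as plain Python dataclasses. This is a toy model for illustration only; the real ISA tools define a far richer object model, and all field names here are our own:

```python
from dataclasses import dataclass, field

@dataclass
class Assay:
    # A test producing measurements; points at external data files.
    measurement: str
    data_files: list = field(default_factory=list)

@dataclass
class Study:
    # The central unit: subject, characteristics, treatments, assays.
    subject: str
    characteristics: dict = field(default_factory=dict)
    assays: list = field(default_factory=list)

@dataclass
class Investigation:
    # High-level concept linking related studies.
    title: str
    studies: list = field(default_factory=list)

# Mirror the example row from the slide.
inv = Investigation(
    title="Example investigation",
    studies=[Study(
        subject="H1",
        characteristics={"organism": "H. sapiens", "age": "35 Years"},
        assays=[Assay(measurement="transcription profiling",
                      data_files=["h1-s1.cel"])],
    )],
)
print(inv.studies[0].assays[0].data_files)  # ['h1-s1.cel']
```

The nesting makes the containment explicit: data files hang off assays, assays off studies, studies off an investigation.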
16.
ISA syntax and Table definition
• Material Transformations:
  – Inputs and outputs of Protocols are Material Nodes (Source Name, Sample Name, Extract Name, Labeled Extract Name).

[Diagram: Material Node → Protocol REF → Material Node → Data File Node]
  – Material Node attributes: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…]
  – Protocol REF attributes: Parameter Value[…], Performer (operator effect), Date (day effect)
  – Data File Node attributes: Comment[…]
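The node/edge reading of a flat ISA-Tab row can be illustrated with a small parser. The column names follow the slide, but the splitting logic is a deliberate simplification of the real format, which also handles attribute columns like Characteristics[…]:

```python
# A flattened row: Material Node -> Protocol REF -> Material Node -> Data File Node
header = ["Source Name", "Protocol REF", "Sample Name", "Raw Data File"]
row = ["H1", "Labeling", "H1.sample1.labeled", "h1-s1.cel"]

# Node columns become graph nodes; a Protocol REF becomes the edge
# connecting the node before it to the node after it.
NODE_COLUMNS = {"Source Name", "Sample Name", "Extract Name",
                "Labeled Extract Name", "Raw Data File"}

nodes, edges = [], []
pending_protocol = None
for col, value in zip(header, row):
    if col == "Protocol REF":
        pending_protocol = value
    elif col in NODE_COLUMNS:
        if nodes and pending_protocol is not None:
            edges.append((nodes[-1], pending_protocol, value))
            pending_protocol = None
        nodes.append(value)

print(nodes)  # ['H1', 'H1.sample1.labeled', 'h1-s1.cel']
print(edges)  # [('H1', 'Labeling', 'H1.sample1.labeled')]
```

Reading the table column by column recovers the provenance chain the format encodes: which protocol transformed which material into which output.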
19.
How do ISA tools access ontology servers?
20.
The ISAcreator
Developed to be a user-friendly way to enter standards-compliant metadata: it has lots of features...
But these are just some of them... we also have a data entry wizard and an import utility...
21.
Select and Annotate in ISAcreator
23.
Plugins in ISAcreator
In ISAcreator, we use the Apache Felix implementation of the OSGi framework... it's really good.
• Plugins can be developed for 3 different purposes:
  – Search (adds extra search space for the ontology tool)
  – Custom cell editors (for the spreadsheet)
  – Extra general functionality (which appears in a plugin menu)
• Two examples of ISA plugins:
  – Access to local metadata stores: Novartis Plugin to Ontology Widget
  – Annotation of findings: Metabolite Identification Plugin (MetaboLights repository contribution to the ISA project).
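ISAcreator's real plugins are Java/OSGi bundles discovered by Apache Felix; as a language-neutral sketch of the same idea, a registry keyed by the three plugin purposes, here is a toy Python version. Every name in it is illustrative, not part of the ISAcreator API:

```python
# Toy plugin registry mirroring the three ISAcreator plugin purposes.
PLUGIN_PURPOSES = ("search", "cell_editor", "menu")

registry = {purpose: [] for purpose in PLUGIN_PURPOSES}

def register(purpose):
    """Decorator that files a plugin class under one known purpose."""
    def wrap(cls):
        registry[purpose].append(cls)
        return cls
    return wrap

@register("search")
class MetastoreSearch:
    """Adds an extra search space to the ontology search tool
    (loosely modelled on the Novartis metastore example)."""
    def search(self, term):
        return f"searching metastore for {term!r}"

print([cls.__name__ for cls in registry["search"]])  # ['MetastoreSearch']
```

The host application only needs to iterate over `registry["search"]` to pick up any plugin dropped into the right slot, which is the same decoupling OSGi provides for the Java tool.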
24.
Plugins example 1: Novartis Metastore Search
The search function on the Novartis Metastore integrates search results from the metastore into the ontology search tool.
So, with the Novartis plugin in your plugin directory, you'll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved with recording the term source, etc.
25.
Plugins example 2: Metabolite Identification plugin
Credits: Kenneth Haug, MetaboLights
26.
Potential issues and known hurdles
• The problem of conflicting versions
  – especially acute when working with big consortia
  – distributed, decentralized groups of users
• Lack of version control and history
• Absence of collaborative features
  – Looking for new solutions while retaining the existing features!
• OntoMaton: bringing Google Docs, NCBO BioPortal and ISA-Tab together!
30.
OntoMaton
• Public release: http://goo.gl/2OKFV
• Can be used in any Google Spreadsheet document
• Applications:
  – Annotating data records
  – Supporting ontology development (see OBI Quick Term Templates)
31.
ISA2RDF work in progress
• Use case on the W3C HCLS scientific discourse list
  – deciding on the granularity of representation
  – building on previous experience
  – evaluating alternative representations
• Participation in the BioHackathon 2011
  – http://blogs.openaccesscentral.com/blogs/bmcblog/entry/biohackathon_2011_number_1
  – discussing best practices
• PURL URIs and identifiers.org as identifiers
• OpenPHACTS guidelines (http://www.nanopub.org/guidelines/OpenPHACTS_Nanopublication_Guidlines_v1.8.1.pdf)
32.
Preparing for Linked Open Data
✴ ISA2RDF (ToxBank collaboration): a contribution to an ecosystem of software tools supporting the ISA syntax
✴ reliance on internet-resolvable identifiers
✴ W3C bio/life science Note on Gene Expression RDF (PMID: 22449719)
✴ TODO: specify comparator groups, analysis methods, and the resulting measurements and statistical measures
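Internet-resolvable identifiers of the kind the slide relies on can be built mechanically. The helper below is ours, but the URI pattern http://identifiers.org/&lt;namespace&gt;/&lt;accession&gt; is the identifiers.org scheme in use at the time, and the PMID is the Gene Expression RDF note cited on this slide:

```python
def identifiers_org_uri(namespace, accession):
    """Build a resolvable identifiers.org URI using the
    http://identifiers.org/<namespace>/<accession> pattern."""
    return f"http://identifiers.org/{namespace}/{accession}"

# The PubMed record for the W3C Gene Expression RDF note (PMID: 22449719)
print(identifiers_org_uri("pubmed", 22449719))
# http://identifiers.org/pubmed/22449719
```

Minting identifiers this way, rather than using local accession strings, is what lets RDF statements from different tools link up in the Linked Data cloud.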
35.
ISA2RDF: work in progress
Credits: Nina Jeliazkova (ToxBank project)
37.
ISA2OWL
• OWL API
• ISA parser (in-memory BII object store objects)
• Mapping the ISA syntax into a target ontological space
• Decoupling the mapping from the conversion engine
  – avoids being tied to one semantic framework
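The "decouple the mapping from the conversion engine" point can be made concrete with a sketch: the mapping is plain data, the engine is generic, and moving to a different semantic framework means swapping the mapping table. All names and URIs below are hypothetical, not the ISA2OWL implementation:

```python
# Mapping: ISA syntax element -> target ontology class URI.
# Plain data, editable without touching the engine; URIs are placeholders.
MAPPING = {
    "Source Name": "http://example.org/onto/Source",
    "Sample Name": "http://example.org/onto/Sample",
}

def convert(record, mapping):
    """Generic engine: type each ISA field using whichever mapping it
    is handed, so it is not tied to one semantic framework."""
    return {field: {"value": value, "type": mapping.get(field)}
            for field, value in record.items()}

record = {"Source Name": "H1", "Sample Name": "H1.sample1"}
typed = convert(record, MAPPING)
print(typed["Source Name"]["type"])  # http://example.org/onto/Source
```

Retargeting the output (say, to a different ontology) then only requires authoring a new `MAPPING`; `convert` stays untouched.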
40.
ISA2OWL: mapping issues
• Stability over time
• Keeping track of resource versions
• Gaps in coverage
• Use of local extensions
• Direct requests/contributions
41.
ISA2OWL: development
• include graph metadata (graph provenance to aid indexing)
• extend semantic validation of the ISA archive
• augment annotation by suggesting additions
• facilitate curation work
• create new mappings to other frameworks (OPM model, SIO, …)
42.
Publication
ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level.
Philippe Rocca-Serra, Marco Brandizi, Eamonn Maguire, Nataliya Sklyar, Chris Taylor, Kimberly Begley, Dawn Field, Stephen Harris, Winston Hide, Oliver Hofmann, Steffen Neumann, Peter Sterk, Weida Tong, Susanna-Assunta Sansone.
Bioinformatics 2010, 26: 2354-2356.
43.
Acknowledgements
Groups and individuals participating in:
MIBBI: http://mibbi.org
ISA-Tab format: http://isatab.sf.net
OBO Foundry: http://obofoundry.org
OBI: http://obi-ontology.org/page/Main_Page
ISA Infrastructure Team:
Alejandra Gonzalez-Beltran (Oxford), Eamonn Maguire (Oxford), Philippe Rocca-Serra (Oxford)
Collaborators at: Cambridge University, EuNuGO, Harvard School of Public Health, FDA's NCTR, Leibniz Plant Institute, NERC's NEBC, SIDR, INIST, MetaboLights, EMBL-EBI
Funders: EU Carcinogenomics Project, UK BBSRC
44.
Groups and individuals participating in:
Winston Hide: HSPH
Oliver Hofmann: HSPH
Shannan Ho Sui: HSPH
Brad Chapman: HSPH
Christoph Steinbeck: MetaboLights
Kenneth Haug: MetaboLights
Paula de Matos: MetaboLights
Magali Roux: INIST
Florian Mazur: INIST
Alain Zasadzinki: INIST
Marie Christine Jacquemot: INIST
Nina Jeliazkova: ToxBank
And many more who have to forgive us!