
Eamonn Maguire: The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe


Eamonn Maguire's talk on "The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe" at ISCB-Asia, December 17th 2012



  1. The Open Source ISA Metadata Tracking Framework: From Data Curation and Management at the Source, to the Linked Data Universe. Eamonn Maguire, Lead Software Engineer, Oxford University (eamonn.maguire@oerc.ox.ac.uk). ISCB-Asia, 17th December 2012.
  2. What is ISA all about? We want to enable better reporting of experiments... We want to make it easier for submitters... We want to provide tooling which biologists will want to use...
  3. What’s the problem? Could be beans. Could be peas. Could be soup. Analogy time: each can is an experiment. We have no labels, so there is no indication of what is in the can. (Tin can analogy borrowed from Norman Morrison and converted from ontologies to metadata transfer standards.) In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the same language. (1) There is fragmentation in formats: the formats used to describe experiments differ, e.g. MAGE-Tab, PRIDE-ML, SRA-XML. (2) Different formats often capture different information, often not enough to actually repeat an experiment correctly. (3) The terminologies used to describe an experiment differ, e.g. human vs Homo sapiens or rat vs Rattus norvegicus, making search more difficult.
  4. What’s the problem? Could be beans. Could be peas. Could be soup. 可能是豌豆 ("could be peas"): a different representation, a non-Latin language. (Speaker notes as for slide 3.)
  5. What’s the problem? Could be beans. Could be peas. Could be soup. 可能是豌豆 ("could be peas"): a different representation, a non-Latin language. Might be petit pois: a different terminology. (Speaker notes as for slide 3.)
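To make point (3) concrete, here is a minimal Python sketch (not part of the ISA tools) of why shared ontology terms help search: free-text organism labels are mapped to NCBI Taxonomy identifiers, so "humans" and "Homo sapiens" resolve to the same term. The synonym table and the `normalize`/`search` helpers are hypothetical; only the NCBITaxon identifiers are real.

```python
# Hypothetical synonym table: free-text label -> (ontology term ID, preferred label).
# The NCBI Taxonomy IDs are real; the synonym lists are illustrative only.
SYNONYMS = {
    "human": ("NCBITaxon:9606", "Homo sapiens"),
    "humans": ("NCBITaxon:9606", "Homo sapiens"),
    "homo sapiens": ("NCBITaxon:9606", "Homo sapiens"),
    "rat": ("NCBITaxon:10116", "Rattus norvegicus"),
    "rattus norvegicus": ("NCBITaxon:10116", "Rattus norvegicus"),
}


def normalize(label: str):
    """Return (term_id, preferred_label) for a free-text organism label, or None."""
    return SYNONYMS.get(label.strip().lower())


def search(records, query):
    """Find records whose organism resolves to the same ontology term as the query."""
    target = normalize(query)
    if target is None:
        return []
    return [r for r in records if normalize(r["organism"]) == target]


records = [
    {"id": "E1", "organism": "humans"},
    {"id": "E2", "organism": "Homo sapiens"},
    {"id": "E3", "organism": "rat"},
]
print([r["id"] for r in search(records, "human")])  # ['E1', 'E2']
```

A plain string match on "human" would have found only E1; matching on the term ID finds both human experiments.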
  6. (1) There is fragmentation in formats. Can you imagine having to translate everything you write into a different language in order to submit your data?
  7. (1) There is fragmentation in formats. Can you imagine having to translate everything you write into a different language in order to submit your data? 你能想象有翻译成不同的语言编写的一切,以提交的数据吗?即使转换工具,像谷歌,翻译弄错了。 (The same question in Chinese, adding: "Even conversion tools, like Google Translate, get it wrong.")
  8. (1) There is fragmentation in formats. Can you imagine having to translate everything you write into a different language in order to submit your data? 你能想象有翻译成不同的语言编写的一切,以提交的数据吗?即使转换工具,像谷歌,翻译弄错了。 An féidir leat a shamhlú go bhfuil gach rud a scríobh tú a aistriú isteach i dteanga eile d'fhonn a chur isteach do chuid sonraí? Fiú uirlisí chomhshó, cosúil le google translate a fháil sé mícheart. (The same question in Chinese and in Irish, each adding: "Even conversion tools, like Google Translate, get it wrong.")
  9. (1) There is fragmentation in formats: our solution. Repositories are making it difficult for biologists to submit data, and for others to use it. This is particularly true for those performing multi-omic experiments: to submit, say, proteomic and transcriptomic data, one must provide slightly different information in two very different formats. Why? Our solution is one general-purpose, flexible format, ISA-Tab: a domain-agnostic format to capture experimental metadata in omic experiments (transcriptomic, genomic, proteomic, metabolomic) as well as in traditional experiments such as clinical chemistry and histology. It already works in lots of domains: nutrigenomics, toxicogenomics, public health, etc.
  10. (1) There is fragmentation in formats: our solution. [Figure: the ISA-Tab structure, investigation → study → assay(s) → data.] Investigation: a high-level concept to link related studies. Study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays. Assay: a test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data). Assays hold pointers to data file names/locations; the data are external files in native or other formats. Biologists like tab. They don’t like XML. Through basic inference... ISA-Tab is good :)
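The investigation/study/assay hierarchy on this slide can be sketched as a small Python data model. This is an illustrative assumption, not the ISA tools' own (Java) object model: the class and field names are hypothetical, and data files are represented only by name/location, as in ISA-Tab.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on material from the study subject; data files are referenced by path only."""
    measurement: str
    technology: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """The central unit: the subject, any treatments applied, and its assays."""
    identifier: str
    subject: str
    treatments: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level container linking related studies."""
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        """Collect every data file referenced by any assay in any study."""
        return [f for s in self.studies for a in s.assays for f in a.data_files]


# Hypothetical multi-omic example: one study, two assays, one data file each.
inv = Investigation("INV-1", [
    Study("S-1", "mouse liver", ["high-fat diet"], [
        Assay("transcription profiling", "DNA microarray", ["data/array1.CEL"]),
        Assay("metabolite profiling", "mass spectrometry", ["data/run1.CDF"]),
    ]),
])
print(inv.data_files())  # ['data/array1.CEL', 'data/run1.CDF']
```

Both the transcriptomic and the metabolomic assay hang off the same study, which is the point of using one format for multi-omic submissions.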
  11. (2) Different formats often capture different information... but there are lots of similarities. MIBBI: Minimum Information for Biological and Biomedical Investigations. The information captured by a format is generated via a ‘checklist’, ideally a list of fields that together provide the minimal amount of information required to reproduce an experiment. MIBBI is trying to harmonise these checklists to reduce redundancy and make them interoperable. We have 32 checklists at present because what is deemed important differs depending on the experiment being performed.
  12. Now integrated in [logo]. Helping to demystify the unwieldy world of standards: find out what standards are out there (MI checklists, ontologies and formats) plus which domains they are suited to; find out about data sharing policies, from the NIH for example; find out about databases, which standards they use, etc.
  13. Now integrated in [logo]. In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the same language. What do I mean by this? Well: (1) there is fragmentation; (2) different formats often capture different information; (3) the terminologies used to describe an experiment differ, so we promote the use of ontologies to harmonize the recording of experiments.
  14. The ISA tools... [Figure: ontologies, MI checklists, common representation.] The ISA tools bring together a common representation, MI checklists and ontologies.
  15. The ISA tools: developed on top of the ISA-Tab format; modular, configurable, open source, Java-based*. See them all at isa-tools.org.
  16. The ISA tools... a tool for all your needs.
  17. Configurable... We need to support lots of different checklists, and it should be easy for people to change their requirements should they need to. So our infrastructure is built upon XML files, created by the ISAConfigurator. A configuration XML file describes the fields (the checklist) required to describe a particular experiment and any ontologies to be used.
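A minimal sketch of how a configuration file might be read to obtain a checklist, assuming a simplified, hypothetical XML layout: a `configuration` root with `field` elements carrying `header`, `required` and `ontology` attributes. The real ISAConfigurator schema is richer and uses different element names; this only illustrates the idea of required fields plus ontology bindings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified configuration (NOT the real ISAConfigurator schema).
CONFIG_XML = """
<configuration table="studySample" measurement="metabolite profiling">
  <field header="Source Name" required="true"/>
  <field header="Characteristics[organism]" required="true" ontology="NCBITaxon"/>
  <field header="Sample Name" required="true"/>
  <field header="Comment[notes]" required="false"/>
</configuration>
"""


def load_checklist(xml_text: str):
    """Return (required column headers, {header: ontology}) from the configuration."""
    root = ET.fromstring(xml_text.strip())
    required, ontologies = [], {}
    for f in root.iter("field"):
        header = f.get("header")
        if f.get("required") == "true":
            required.append(header)
        if f.get("ontology"):
            ontologies[header] = f.get("ontology")
    return required, ontologies


required, ontologies = load_checklist(CONFIG_XML)
print(required)    # ['Source Name', 'Characteristics[organism]', 'Sample Name']
print(ontologies)  # {'Characteristics[organism]': 'NCBITaxon'}
```

Changing the checklist then means editing the XML, not the code, which is the flexibility the slide describes.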
  18. Create configuration XML files.
  19. ISAcreator: create & edit ISA-Tab.
  20. The ISAcreator... Developed to be a user-friendly way to enter standards-compliant metadata, it has lots of features: a file chooser, a publication searcher, visualization, ontology search, a QR code generator, a spreadsheet-like interface, and automated ontology tagging (tag terms, visualise, suggest, clear all, help; powered by the NCBO Annotator). But these are just some of them: we also have a data entry wizard and an import utility...
  21. Ontology search and automated annotation in Google Docs.
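Automated ontology tagging can be illustrated with a naive dictionary-based tagger in Python. This is not the NCBO Annotator, which is a web service backed by full ontologies; the mini-lexicon, the `annotate` helper and the longest-match-first regular expression are assumptions for illustration only (the NCBI Taxonomy IDs are real).

```python
import re

# Hypothetical mini-lexicon: lowercase label -> ontology term ID.
LEXICON = {
    "homo sapiens": "NCBITaxon:9606",
    "human": "NCBITaxon:9606",
    "mus musculus": "NCBITaxon:10090",
    "mouse": "NCBITaxon:10090",
    "rattus norvegicus": "NCBITaxon:10116",
    "rat": "NCBITaxon:10116",
}

# Try longer labels first so multi-word labels win; \b avoids matching inside words.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def annotate(text: str):
    """Return (matched text, term ID) for each lexicon label found in free text."""
    return [(m.group(0), LEXICON[m.group(0).lower()]) for m in _PATTERN.finditer(text)]


print(annotate("Liver samples from mouse and Rattus norvegicus; no human tissue."))
# [('mouse', 'NCBITaxon:10090'), ('Rattus norvegicus', 'NCBITaxon:10116'), ('human', 'NCBITaxon:9606')]
```

Word boundaries keep "rat" from firing inside words such as "ratio"; a real annotator also handles synonyms, stemming and overlapping matches across whole ontologies.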
  22. Make sure the ISA-Tab is correct.
  23. Validate from the dedicated tool... or from the command line... or within ISAcreator directly.
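Part of what a validator checks can be sketched as follows: required columns must be present in a tab-delimited table, and their cells must not be empty. This is a hypothetical, heavily simplified check (`validate_table` is not the ISA validator's API); the real validator applies the configuration's full set of rules.

```python
import csv
import io


def validate_table(tsv_text: str, required):
    """Return a list of error strings for a tab-delimited, ISA-Tab-style table."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows:
        return ["empty file"]
    header, body = rows[0], rows[1:]
    errors = [f"missing column: {c}" for c in required if c not in header]
    # Line numbers count the header as line 1.
    for line_no, row in enumerate(body, start=2):
        for col in required:
            if col in header:
                idx = header.index(col)
                if idx >= len(row) or not row[idx].strip():
                    errors.append(f"line {line_no}: empty value for {col}")
    return errors


table = (
    "Source Name\tCharacteristics[organism]\tSample Name\n"
    "src1\tHomo sapiens\tsample1\n"
    "src2\t\tsample2\n"
)
print(validate_table(table, ["Source Name", "Characteristics[organism]", "Sample Name", "Protocol REF"]))
# ['missing column: Protocol REF', 'line 3: empty value for Characteristics[organism]']
```

The list of required columns would come from the configuration for the chosen experiment type, so the same check can serve different checklists.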
  24. Convert to or from differing formats.
  25. The converters. Fully endorsed by ArrayExpress, PRIDE and the European Nucleotide Archive (ENA)... Converts MAGE-Tab to ISA-Tab. This is still in beta, but we are getting close to a fully working version: we’ve successfully created validated ISA-Tab for ~90% of the 21k experiments in ArrayExpress. Available as a web service and a web interface, and the source is available for running conversions locally: http://isatab.sourceforge.net/magetoisa/
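One structural step a MAGE-Tab to ISA-Tab conversion needs can be sketched as splitting a single SDRF-style table into an ISA-Tab study-sample table and an assay table. In this hypothetical Python sketch the split happens at the `Sample Name` column, which is kept in both tables so they can be rejoined, and the `HEADER_RENAMES` entry is an assumed example of header mapping. This is not the converter's actual logic.

```python
# Assumed example of a header that differs between the two formats (illustrative only).
HEADER_RENAMES = {"Hybridization Name": "Hybridization Assay Name"}


def split_sdrf(header, rows, pivot="Sample Name"):
    """Split SDRF-style rows into (study table, assay table) at the pivot column.

    The pivot column appears in both outputs so the tables can be joined again.
    """
    cut = header.index(pivot)
    renamed = [HEADER_RENAMES.get(h, h) for h in header]
    study = (renamed[: cut + 1], [r[: cut + 1] for r in rows])
    assay = (renamed[cut:], [r[cut:] for r in rows])
    return study, assay


sdrf_header = ["Source Name", "Characteristics[organism]", "Sample Name",
               "Extract Name", "Hybridization Name", "Array Data File"]
sdrf_rows = [["src1", "Homo sapiens", "s1", "ex1", "hyb1", "hyb1.CEL"]]

(study_h, study_r), (assay_h, assay_r) = split_sdrf(sdrf_header, sdrf_rows)
print(study_h)  # ['Source Name', 'Characteristics[organism]', 'Sample Name']
print(assay_h)  # ['Sample Name', 'Extract Name', 'Hybridization Assay Name', 'Array Data File']
```

A real conversion also has to handle the IDF/investigation metadata, repeated samples across rows, and experiments without an explicit Sample Name column.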
  26. The converters...semantic web. [Figure: RDF graph linking the source Saghantelian_1, sample collection, KO1, extraction, KO1_extract, mass spectrometry and the data file ./cdf/KO/ko15.CDF through "has specified input", "has specified output" and "derives from" relations, with node types such as material entity, processed material, material processing and information content entity.]
  27. The converters...semantic web. • Make the semantics of ISA-Tab explicit, including materials & data entities & processes • Exploit the semantic annotations available in ISA-Tab datasets • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design • Facilitate querying, data integration & knowledge discovery/reasoning
  28. The converters...semantic web. Notes in lab books (information for humans) → spreadsheets & tables (ISA-Tab metadata) → facts as RDF statements (information for machines).
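The provenance chain from the semantic-web slides can be expressed as subject-predicate-object triples and queried with plain Python. This is a sketch based on our reading of the slide's graph: predicates are human-readable labels rather than the ontology URIs a real RDF conversion would emit, and `lineage` and `produced_by` are hypothetical helpers.

```python
# Triples transcribed from the slide's example graph (readable labels, not real URIs).
TRIPLES = [
    ("sample collection", "has specified input", "Saghantelian_1"),
    ("sample collection", "has specified output", "KO1"),
    ("KO1", "derives from", "Saghantelian_1"),
    ("extraction", "has specified input", "KO1"),
    ("extraction", "has specified output", "KO1_extract"),
    ("KO1_extract", "derives from", "KO1"),
    ("mass spectrometry", "has specified input", "KO1_extract"),
    ("mass spectrometry", "has specified output", "./cdf/KO/ko15.CDF"),
    ("./cdf/KO/ko15.CDF", "derives from", "KO1_extract"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def produced_by(entity):
    """Return the processes that list `entity` as a specified output."""
    return [s for s, p, o in TRIPLES if p == "has specified output" and o == entity]


def lineage(entity):
    """Follow 'derives from' links from an entity back to its source material."""
    chain = [entity]
    parents = objects(entity, "derives from")
    # Take the first parent at each step; stop rather than loop on a cycle.
    while parents and parents[0] not in chain:
        chain.append(parents[0])
        parents = objects(parents[0], "derives from")
    return chain


print(lineage("./cdf/KO/ko15.CDF"))
# ['./cdf/KO/ko15.CDF', 'KO1_extract', 'KO1', 'Saghantelian_1']
print(produced_by("./cdf/KO/ko15.CDF"))  # ['mass spectrometry']
```

Once the relationships are explicit facts rather than spreadsheet column order, questions like "which source material did this data file come from?" become simple graph queries.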
  29. Get ISA-Tab into a database. Share it (or don’t) with the world.
  30. Database & web application.
  31. Web application.
  32. Web application.
  33. Web application.
  34. Last but not least... analysis.
  35. A package to read ISA-Tab into R, and especially Bioconductor, to run analysis scripts on your data. It can automatically call microarray, mass spectrometry and flow cytometry analysis packages on appropriate datasets. Available from Bioconductor. There is also a script to create Galaxy libraries from ISA-Tab (Brad Chapman is working on this at HSPH). A dedicated ISAcreator mode allows for persistence and perusal of ISA experiments in GenomeSpace.
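The R package itself is not shown here. As a language-neutral illustration, this Python sketch performs the first step any ISA-Tab reader must take: finding the study and assay files listed in an investigation file via its `Study File Name` and `Study Assay File Name` rows. The helper name and the tiny example file are hypothetical.

```python
import csv
import io


def study_and_assay_files(investigation_text: str):
    """List study and assay file names referenced by an ISA-Tab investigation file."""
    studies, assays = [], []
    # Investigation files are tab-delimited; csv handles any quoted values.
    for row in csv.reader(io.StringIO(investigation_text), delimiter="\t"):
        if not row:
            continue
        key, values = row[0], [v for v in row[1:] if v]
        if key == "Study File Name":
            studies.extend(values)
        elif key == "Study Assay File Name":
            assays.extend(values)
    return studies, assays


# Hypothetical, heavily trimmed investigation file.
i_file = (
    "STUDY\n"
    "Study Identifier\tSTUDY-1\n"
    "Study File Name\ts_study.txt\n"
    "STUDY ASSAYS\n"
    'Study Assay File Name\t"a_transcriptome.txt"\t"a_metabolome.txt"\n'
)
print(study_and_assay_files(i_file))
# (['s_study.txt'], ['a_transcriptome.txt', 'a_metabolome.txt'])
```

With the assay files located, an analysis tool can read each assay table, see its measurement and technology type, and dispatch to the appropriate analysis package.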
  36. ISA Commons: a growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including Stem Cell Commons and the Nanotechnology Informatics Working Group.
  38. Rocca-Serra P, Brandizi M, Maguire E, Sklyar N, Taylor C, Begley K, Field D, Harris S, Hide W, Hofmann O, Neumann S, Sterk P, Tong W, Sansone SA. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Bioinformatics 2010; 26: 2354-2356. Sansone SA, Rocca-Serra P, Field D, Maguire E, et al. Towards interoperable bioscience data. Nature Genetics 2012.
  39. Thanks for listening... Questions? Email us: isatools@googlegroups.com. Website: http://www.isa-tools.org. Git repo (contribute!): http://github.com/ISA-tools. Blog: http://isatools.wordpress.com. Twitter: @isatools.
