ISA tools presentation

    • The ISA software suite. Eamonn Maguire, Lead Software Engineer, eamonn.maguire@oerc.ox.ac.uk. Novartis, 21st October 2011. Tuesday, 8 November 2011
    • Who am I? (It’s rhetorical...) I’m Irish. My formal background is in Computer Science (Bachelor’s) and Bioinformatics (Master’s). I am the lead software engineer on the ISA project and a DPhil student in Visualization in the Department of Computer Science at Oxford. I also have my own graphic design company (Antarctic Design). I’m part of a small but productive and vibrant team at Oxford headed by Susanna-Assunta Sansone; our work includes the ISA tools/infrastructure, MIBBI and BioSharing.
    • What is ISA all about? We want to enable better reporting of experiments... We want to make it easier for submitters... We want to provide tooling that biologists will want to use...
    • What’s the problem? Analogy time: each can is an experiment. Could be beans. Could be peas. Could be soup. We have no labels, so no indication of what is in the can. A label might also use a different representation (a non-Latin script), or it might say “petit pois”: a different terminology. (The tin can analogy is borrowed from Norman Morrison and converted from ontologies to metadata transfer standards.) In biology, things aren’t quite as bad as this: we have some labels, but they aren’t all in the same language. What do I mean by this? Well... 1. There is fragmentation: the formats used to describe experiments differ, e.g. MAGE-Tab, PRIDE-ML and SRA-XML, yet in essence they capture much of the same information. 2. The terminologies used to describe experiments differ, even though many concepts, such as sample description, are shared. This applies to field names as well as values...
    • What’s the problem? Can you imagine having to translate everything you write into a different language in order to submit your data? [The slide scatters Chinese characters meaning “translate”, “language”, “convert” and “wrong”, then repeats the question in Irish, adding: “Even conversion tools, like Google Translate, get it wrong.”]
    • Take home point... Repositories are making it difficult for biologists to submit data, and for others to use it. This is particularly true for multi-omic experiments: to submit, say, proteomic and transcriptomic data, one must provide the same general information in two very different formats. Why? People like to have their own formats, and ad hoc is generally easier. Our solution is one general-purpose, flexible format, referred to here as ISA-Tab: a domain-agnostic format to capture experimental metadata in omic experiments (transcriptomic, genomic, proteomic, metabolomic) as well as traditional experiments such as clinical chemistry and histology. It works on lots of (I won’t dare say all) types of data: nutrigenomics, toxicogenomics, public health, etc.
    • Tell me more... ISA-Tab is organised in three levels. Investigation: a high-level concept to link related studies. Study: the central unit, containing information on the subject under study, its characteristics and any treatments applied; a study has associated assays. Assay: a test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data). Assays hold pointers to data file names/locations, i.e. external files in native or other formats. Biologists like tab. They don’t like XML. Through basic inference... ISA-Tab is good :)
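To make the three levels concrete, here is a minimal Python sketch of the investigation/study/assay hierarchy as plain data classes. The class names, fields and file names are illustrative assumptions, not ISA-Tab column headers or any ISA tools API; the point is only that data files hang off assays, assays off studies, and studies off an investigation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on material from the subject (or the whole subject) that yields data."""
    measurement: str
    technology: str
    data_files: List[str] = field(default_factory=list)  # pointers to external files


@dataclass
class Study:
    """The central unit: the subject under study, plus its assays."""
    subject: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """A high-level container linking related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        # Walk investigation -> studies -> assays, collecting the file pointers.
        return [f for s in self.studies for a in s.assays for f in a.data_files]


inv = Investigation("Hypothetical toxicity investigation", [
    Study("rat liver", [
        Assay("transcription profiling", "DNA microarray", ["h1-s1.cel", "h1-s2.cel"]),
        Assay("metabolite profiling", "NMR spectroscopy", ["nmr-run1.zip"]),
    ]),
])
print(inv.data_files())
```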
    • But we don’t want to do this... http://xkcd.com/927/ (xkcd’s “Standards” comic)
    • A format on its own isn’t very much, though... Too true. The secret to adoption is to provide the tooling that enables biologists to get data into the format, share it, convert it and analyse it! The ISA tools provide this support.
    • The ISA tools: developed on top of the ISA-Tab format; modular, configurable, open source and Java-based*. Others are being developed by the ISA community: a Perl parser for ISA by Bob MacCallum and a Python parser for ISA by Brad Chapman. *Apart from the R, Perl and Python packages, of course...
    • The ISA tools... modular. Configure: a curator creates a template. Create (isacreator): an experimentalist uses the editor to report the investigation. Validate: check adherence to the template. Load: a curator stores the metadata in a database using the BII data management tool. Browse: users browse investigations, query and view experimental metadata, and access associated data files. Convert to ISA (converter): convert from MAGE-Tab to ISA-Tab; more formats coming soon... Convert from ISA (converter): convert to MAGE-TAB, PRIDE-ML and SRA-XML for submission to international public repositories. Analyze: perform analysis of data in context with the metadata using the Galaxy or R analysis engines. (Diagram annotation: “Requires Configuration XML”.)
    • The ISA tools... configurable. Are you just using buzzwords? Well, we like buzzwords as much as everyone else, but no. We need to be configurable to support evolving checklists and requirements. Just check out mibbi.org: lots of checklists! 32, in fact, at the last count. MIBBI is trying to harmonise these checklists to reduce redundancy and make them interoperable.
    • Checklists...what are they? When we report things, some things are really important. In a school report, we have the child’s name, their class, teacher, subjects taken and so on. In a biological experiment, the very same principles apply: we need information about the sample (species, strain, age) and about the protocols applied during the experiment and their parameters. We have 32 checklists at present because what is deemed important differs depending on the experiment being performed. Good reporting means that statistics can be applied better, experiments can be reproduced more easily, and data mashups can happen in the future. Experiments are expensive; we should make sure their full value is realised.
    • On this point... Helping to demystify the unwieldy world of standards... Find out what standards are out there (MI checklists, ontologies and formats) and which domains they are suited to... Find out about data sharing policies, from the NIH for example.
    • Configurable...back to that. We need to support lots of different checklists, and it should be easy for people to change their requirements should they need to. So our infrastructure is built upon XML files, which are created by the ISAconfigurator. A configuration XML file describes the fields (or checklist) required to describe a particular experiment!
    • ISAconfigurator and configuration XML: the ISAconfigurator is the brick maker (a kiln); the configuration XML files are the bricks...
    • Create configuration XML files
    • The ISAconfigurator...
    • The configuration XML... This is an example of a field definition created by the configurator. In this instance we are describing a label field, specifically the label used in a microarray experiment. We have defined its values to come from an ontology, and we recommend the ChEBI ontology. The field is also required.
    • The configuration XML... Aside from strong ontology support, the configuration XML allows you to specify a regular expression that field contents should match, and whether a field is an integer, double, list value, boolean, string or file location... The configuration XML is an important part of the infrastructure and is used by various components in differing capacities. In isacreator, it is used in content validation, but its main purpose there is to build the user interface (more on this later). Elsewhere it is used in content validation; the validation component is also called by the ISAconverter and the BII data manager before conversion and loading, respectively.
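A minimal sketch of how field definitions like these could drive validation, assuming a simplified XML layout: the element and attribute names (`field`, `header`, `data-type`, `is-required`, `value-format`, `ontology`) are illustrative and are not guaranteed to match the real ISA configuration schema. Ontology-term fields are only parsed here; checking a value against ChEBI would need an ontology service.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical configuration fragment; element/attribute names are illustrative only.
CONFIG = """
<configuration>
  <field header="Label" data-type="Ontology term" is-required="true">
    <recommended-ontologies><ontology abbreviation="CHEBI"/></recommended-ontologies>
  </field>
  <field header="Age" data-type="Integer" is-required="false"/>
  <field header="Sample Name" data-type="String" is-required="true">
    <value-format>^[A-Za-z0-9._-]+$</value-format>
  </field>
</configuration>
"""


def load_fields(xml_text: str) -> dict:
    """Parse each <field> element into a dict of its constraints."""
    fields = {}
    for f in ET.fromstring(xml_text.strip()).iter("field"):
        fields[f.get("header")] = {
            "type": f.get("data-type"),
            "required": f.get("is-required") == "true",
            "regex": f.findtext("value-format"),  # None when absent
            "ontologies": [o.get("abbreviation") for o in f.iter("ontology")],
        }
    return fields


def validate(fields: dict, record: dict) -> list:
    """Return error messages for one row; ontology terms are not looked up here."""
    errors = []
    for header, spec in fields.items():
        value = record.get(header, "")
        if not value:
            if spec["required"]:
                errors.append(f"{header}: required")
            continue
        if spec["type"] == "Integer" and not re.fullmatch(r"-?\d+", value):
            errors.append(f"{header}: not an integer")
        if spec["regex"] and not re.fullmatch(spec["regex"], value):
            errors.append(f"{header}: does not match {spec['regex']}")
    return errors


fields = load_fields(CONFIG)
print(validate(fields, {"Label": "biotin", "Age": "35", "Sample Name": "H1.sample1"}))
print(validate(fields, {"Age": "35 years", "Sample Name": "H1 sample"}))
```

Keeping the constraints as data rather than code is the design point: changing a checklist means editing XML, not the validator.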
    • isacreator: Create & Edit ISA-Tab
    • The ISAcreator... isacreator was developed to be a user-friendly way to enter standards-compliant metadata, and it has lots of features: a spreadsheet-like interface, a file chooser, a publication searcher, visualization, ontology search, a QR code generator and automated ontology tagging (tag terms, visualise, suggest, clear all, help; powered by the NCBO Annotator). But these are just some of them; we also have a data entry wizard and an import utility...
    • Use of the configuration XML. The configuration XML schema (XSD) is consumed by an XMLBeans goal in Maven, which generates Java stubs; these are then used to load the XML files into memory. The steps are: XML definition(s), e.g. <xml> <field>sample</field> <field>protocol ref</field> <field>extract name</field> <field>label</field> ... </xml>; import into the Java object model (a TableReferenceObject) using classes created by XMLBeans; construct the spreadsheet model (columns, rows, etc.); assign cell editors: ontology terms are given the ontology selection tool as a cell editor, file fields are given a file chooser, etc. The configuration is also used to define the form view using a similar mechanism...
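The column-building step can be sketched in a few lines of Python. ISAcreator does this in Java with XMLBeans-generated classes; the `FieldDef`/`Column` types, the data-type names and the editor names below are hypothetical stand-ins used only to show the idea of choosing a cell editor from each field’s type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldDef:
    header: str
    data_type: str  # illustrative type names, e.g. "String", "Ontology term", "File"


@dataclass
class Column:
    header: str
    editor: str


# Hypothetical type -> editor mapping; ISAcreator's real editors are Java components.
EDITORS = {
    "Ontology term": "ontology-selection-tool",
    "File": "file-chooser",
    "List": "drop-down",
    "Boolean": "check-box",
}


def build_columns(fields: List[FieldDef]) -> List[Column]:
    """One column per configured field, in configuration order; default to plain text."""
    return [Column(f.header, EDITORS.get(f.data_type, "text")) for f in fields]


config = [
    FieldDef("Sample Name", "String"),
    FieldDef("Protocol REF", "List"),
    FieldDef("Extract Name", "String"),
    FieldDef("Label", "Ontology term"),
    FieldDef("Raw Data File", "File"),
]
for col in build_columns(config):
    print(f"{col.header:<15} {col.editor}")
```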
    • Sounds good...what does it look like?...
    • Ontologies. We use the NCBO BioPortal and the EBI’s Ontology Lookup Service (OLS) to search and browse ontologies. Features: ontology field restriction, ontology browsing & searching, and ontology tagging. The Ontology Resource Manager provides seamless searching of ontology resources regardless of their origins, their underlying data schema or the mechanism (REST, SOAP or local file store) through which they are accessed. [Diagram: NCBO ontology plugin; BioPortal Search, Hierarchy and Annotator services; Ontology Lookup Service (OLS).] ISAcreator manages ontology metadata such as version information, as well as individual term accessions, source, URI and so forth. The ontology search code is usable outside of ISAcreator. In fact, the ISAconfigurator imports ISAcreator as a Maven dependency and reuses its components to do ontology restriction, and plugins can also make use of our ontology search and browse functionality.
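A minimal sketch of the resource-manager idea: callers search through one interface while each backend hides how it is reached. The in-memory services below stand in for BioPortal and OLS (no real web-service calls are made), all class names are hypothetical, and the `DEMO:` accessions are placeholders, not real ontology identifiers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OntologyTerm:
    label: str
    accession: str
    source: str  # ontology abbreviation, recorded as the term source


class OntologyService(ABC):
    """Common interface; a concrete class may use REST, SOAP or a local file."""

    @abstractmethod
    def search(self, text: str) -> List[OntologyTerm]:
        """Return terms whose label contains text."""


class InMemoryService(OntologyService):
    """Stand-in backend holding a fixed list of terms."""

    def __init__(self, terms: List[OntologyTerm]):
        self._terms = list(terms)

    def search(self, text: str) -> List[OntologyTerm]:
        needle = text.lower()
        return [t for t in self._terms if needle in t.label.lower()]


class OntologyResourceManager:
    def __init__(self) -> None:
        self._services: Dict[str, OntologyService] = {}

    def register(self, name: str, service: OntologyService) -> None:
        self._services[name] = service

    def search(self, text: str) -> Dict[str, List[OntologyTerm]]:
        # Query every registered resource and group the hits by resource name.
        return {name: svc.search(text) for name, svc in self._services.items()}


manager = OntologyResourceManager()
manager.register("bioportal", InMemoryService([OntologyTerm("Cy3 dye", "DEMO:0001", "CHEBI")]))
manager.register("ols", InMemoryService([
    OntologyTerm("Cy5 dye", "DEMO:0002", "CHEBI"),
    OntologyTerm("liver", "DEMO:0003", "UBERON"),
]))
hits = manager.search("cy")
print({name: [t.label for t in terms] for name, terms in hits.items()})
```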
    • Ontologies...some more technical details. How do we browse so quickly without downloading and reasoning over ontologies? (Disclaimer: speed also depends on whether you access OLS/BioPortal from Europe or America.) Ontologies are all accessed via web services; that part is clear. But browsing ontologies, especially ones coming from two separate resources in different parts of the world with very different implementations, isn’t easy. To make the browsing experience less slow and painful, we preload parts of the ontology tree before the user requests them. [Diagram, three stages: ontology loaded (root, level 0); root expanded (level(root) + 1, including branch a); node a expanded (level(a) + 1, including branch b, and level(b) + 1).]
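The preloading strategy amounts to fetching one level ahead of what the user can see. Below is a sketch of that idea, assuming a `fetch_children` callable that stands in for a slow web-service lookup; the class and method names are hypothetical, not ISAcreator’s code.

```python
from typing import Callable, Dict, List


class PrefetchingTree:
    """Browse an ontology tree while fetching one level ahead of the user."""

    def __init__(self, root: str, fetch_children: Callable[[str], List[str]]):
        self.fetch_children = fetch_children  # stands in for a slow remote call
        self.cache: Dict[str, List[str]] = {}
        self.root = root
        self._prefetch(root)  # on load: fetch level 1 before anything is expanded

    def _prefetch(self, node: str) -> None:
        if node not in self.cache:
            self.cache[node] = self.fetch_children(node)

    def expand(self, node: str) -> List[str]:
        """Return node's children (normally cached) and prefetch the next level."""
        self._prefetch(node)
        children = self.cache[node]
        for child in children:
            self._prefetch(child)
        return children


TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": [], "a1": [], "a2": []}
calls: List[str] = []


def fake_fetch(node: str) -> List[str]:
    calls.append(node)  # record every "remote" request
    return TREE.get(node, [])


tree = PrefetchingTree("root", fake_fetch)
tree.expand("root")  # also fetches the children of a and b ahead of time
tree.expand("a")     # a's children are already cached; only a1 and a2 are fetched
print(calls)
```

Each expansion is answered from the cache, and the network cost of the next level is paid before the user clicks.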
    • Plugins. In ISAcreator, we use the Apache Felix implementation of the OSGi framework; it’s really good. Plugins can be developed for three different purposes: search (adds extra search space to the ontology tool); custom cell editors (for the spreadsheet); and extra general functionality (which appears in a plugin menu).
    • Plugins...example: Novartis Metastore Search. A search function on the Novartis Metastore that integrates metastore search results into the ontology search tool. So, with the Novartis plugin in your plugin directory, you’ll be able to search the Novartis metastore directly within ISAcreator, and it will handle all the tasks involved in recording the term source, etc.
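ISAcreator’s plugins run on Apache Felix/OSGi; the sketch below does not use OSGi at all. It is a toy Python registry, with hypothetical names, showing how search plugins of the kind described above could contribute extra results to a single search box alongside the built-in ontology search.

```python
from typing import Callable, Dict, List

# The three plugin purposes listed above.
PLUGIN_KINDS = ("search", "cell_editor", "menu")


class PluginRegistry:
    """Toy registry; not the OSGi/Apache Felix API."""

    def __init__(self) -> None:
        self._plugins: Dict[str, List[Callable]] = {k: [] for k in PLUGIN_KINDS}

    def register(self, kind: str, plugin: Callable) -> None:
        if kind not in self._plugins:
            raise ValueError(f"unknown plugin kind: {kind}")
        self._plugins[kind].append(plugin)

    def search_all(self, text: str) -> List[str]:
        """Merge results from every search plugin, in registration order."""
        results: List[str] = []
        for plugin in self._plugins["search"]:
            results.extend(plugin(text))
        return results


registry = PluginRegistry()
# Hypothetical plugins: one plays the built-in ontology search, one a metastore.
registry.register("search", lambda text: [f"ontology:{text}"])
registry.register("search", lambda text: [f"metastore:{text}"])
print(registry.search_all("liver"))
```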
    • Make sure the ISA-Tab is correct
    • Checks: the structure of the ISA-Tab, to ensure it’s well formed; and the contents, to ensure they match what is defined in the configuration XML. Then: maps the tab structure into a graph-based object model. [Diagram: table rows “H1 | H. sapiens | 35 Years | H1.sample1 | Labeling | H1.sample1.labeled | h1-s1.cel”, “H1 | H. sapiens | 35 Years | H1.sample2 | h1-s2.cel” and “H2 | H. sapiens | 33 Years | H2.sample1 | Labeling | H2.sample1.labeled | h2-s1.cel” become a graph in which source H1 (H. sapiens, 35 Years) branches into H1.sample1 and H1.sample2, and source H2 (H. sapiens, 33 Years) leads to H2.sample1, each path ending in its data file.] Actions such as conversion to other formats and persisting to the DB are performed on this object model (called the BIIObjectStore).
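The row-to-graph mapping can be sketched with Python’s `csv` module. The column headers are simplified stand-ins (the protocol column such as “Labeling” is omitted), and the rows follow the example above; repeated names such as H1 collapse into a single node, which is what turns the table into a graph. This is an illustration, not the BIIObjectStore.

```python
import csv
import io
from collections import defaultdict

# Simplified tab-separated study rows following the example above.
STUDY_TAB = (
    "Source Name\tOrganism\tAge\tSample Name\tExtract Name\tRaw Data File\n"
    "H1\tH. sapiens\t35 years\tH1.sample1\tH1.sample1.labeled\th1-s1.cel\n"
    "H1\tH. sapiens\t35 years\tH1.sample2\t\th1-s2.cel\n"
    "H2\tH. sapiens\t33 years\tH2.sample1\tH2.sample1.labeled\th2-s1.cel\n"
)
NODE_COLUMNS = ["Source Name", "Sample Name", "Extract Name", "Raw Data File"]


def tab_to_graph(text: str):
    """Turn each row into a chain of node names; identical names share one node."""
    edges = defaultdict(set)   # node name -> names of downstream nodes
    characteristics = {}       # source name -> (organism, age)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        chain = [row[c] for c in NODE_COLUMNS if row[c]]  # skip empty cells
        for upstream, downstream in zip(chain, chain[1:]):
            edges[upstream].add(downstream)
        characteristics[row["Source Name"]] = (row["Organism"], row["Age"])
    return edges, characteristics


edges, chars = tab_to_graph(STUDY_TAB)
print(sorted(edges["H1"]))  # H1 branches to both of its samples
```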
    • or... validate from the command line... or... within ISAcreator directly...
    • Convert to or from differing formats
    • The converters. Fully endorsed by ArrayExpress, PRIDE and the European Nucleotide Archive (ENA)... The MAGE-Tab converter converts MAGE-Tab to ISA-Tab. This is still in beta, but we are getting close to a fully working version: we’ve successfully created validated ISA-Tab for ~90% of the 21k experiments in ArrayExpress. It is available as a web service and a web interface, and the source is available for running conversions locally: http://isatab.sourceforge.net/magetoisa/
    • or... convert from the command line... or... within ISAcreator directly...
    • Automagically filters out the formats you can’t export to; e.g., if I have no sequencing experiments, I won’t need to export to SRA.
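A sketch of the filtering idea: offer a target format only when the investigation contains an assay that format can carry. The mapping from (measurement, technology) pairs to formats is an assumption for illustration and is not the converter’s actual rule set.

```python
from typing import Dict, List, Set, Tuple

Assay = Tuple[str, str]  # (measurement type, technology type)

# Illustrative rules only; the real converter decides this its own way.
FORMAT_FOR_ASSAY: Dict[Assay, str] = {
    ("transcription profiling", "DNA microarray"): "MAGE-TAB",
    ("protein expression profiling", "mass spectrometry"): "PRIDE-ML",
    ("genome sequencing", "nucleotide sequencing"): "SRA-XML",
}


def exportable_formats(assays: List[Assay]) -> Set[str]:
    """Offer only formats for which at least one assay in the investigation qualifies."""
    return {FORMAT_FOR_ASSAY[a] for a in assays if a in FORMAT_FOR_ASSAY}


study_assays = [
    ("transcription profiling", "DNA microarray"),
    ("protein expression profiling", "mass spectrometry"),
    ("histology", "microscopy"),
]
# No sequencing assays, so SRA-XML is not among the options.
print(sorted(exportable_formats(study_assays)))
```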
    • Get ISA-Tab into a database. Share it (or don’t) with the world.
    • A GUI & command line interface to get ISA-Tab into an instance of the BII (BioInvestigation Index). It calls the validator first, then persists the BIIObjectStore object to the database via Hibernate.
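The validate-then-persist order can be sketched with Python’s built-in `sqlite3` standing in for Hibernate. The one-table schema and the structural check are hypothetical simplifications; the point is that nothing reaches the database unless validation passes, and the insert runs in a single transaction.

```python
import sqlite3
from typing import List, Tuple

Edge = Tuple[str, str]  # (upstream node, downstream node)


def validate_edges(edges: List[Edge]) -> List[str]:
    """Toy structural check: every edge needs two non-empty node names."""
    return [f"bad edge {e!r}" for e in edges if not (e[0] and e[1])]


def load(conn: sqlite3.Connection, edges: List[Edge]) -> None:
    """Validate first; persist only when there are no errors."""
    errors = validate_edges(edges)
    if errors:
        raise ValueError("; ".join(errors))
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("CREATE TABLE IF NOT EXISTS edge (src TEXT, dst TEXT)")
        conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)


conn = sqlite3.connect(":memory:")
load(conn, [("H1", "H1.sample1"), ("H1.sample1", "h1-s1.cel")])
print(conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0])
```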
    • Lots of admin functionality is available from the GUI; it is also available via the command line or API. Disclaimer: over X11, using such an interface is slow... I’d suggest using the API or command line tools instead...
    • Database
    • Database. The term BioInvestigation Index is an overloaded one: it refers to both the database and the web application. The database itself is too complicated to describe in detail in a single presentation, but the key take-home message is that it is graph-based... remember this? [Diagram: the sample graph in which sources H1 (H. sapiens, 35 Years) and H2 (H. sapiens, 33 Years) branch into samples, labeled extracts and data files.] In the BII, we have Materials, Processes, Cross References and Annotations. This makes things pretty generic... and the BII model is even more generic than ISA-Tab.
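A rough sketch of a material/process graph in the spirit described above: materials carry annotations and cross references, and processes link input materials to outputs, so provenance can be traced from a data file back to its source. Class, field and protocol names are hypothetical and much simpler than the BII schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Material:
    name: str
    annotations: Dict[str, str] = field(default_factory=dict)  # e.g. organism, age
    xrefs: List[str] = field(default_factory=list)             # cross references
    # Back-link to the producing process; excluded from repr/eq to avoid cycles.
    produced_by: Optional["Process"] = field(default=None, repr=False, compare=False)


@dataclass
class Process:
    protocol: str
    inputs: List[Material]
    outputs: List[Material]


def run(protocol: str, inputs: List[Material], outputs: List[Material]) -> Process:
    """Record a process and link each output material back to it."""
    proc = Process(protocol, inputs, outputs)
    for m in outputs:
        m.produced_by = proc
    return proc


def lineage(m: Material) -> List[str]:
    """Walk back through producing processes (first input only) to the source."""
    path = [m.name]
    while m.produced_by is not None:
        m = m.produced_by.inputs[0]
        path.append(m.name)
    return path


source = Material("H1", {"organism": "H. sapiens", "age": "35 years"})
sample = Material("H1.sample1")
extract = Material("H1.sample1.labeled")
data = Material("h1-s1.cel")
run("sampling", [source], [sample])
run("labeling", [sample], [extract])
run("hybridization and scanning", [extract], [data])
print(lineage(data))
```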
    • Database. One more word about the database (and a few sentences), then I’ll show the web application. Scalable, as far as we know... :) ArrayExpress v2 makes use of all of the BII object model. They just add a table for bio entities (or genes) and that’s it! AE has >21,000 experiments and >500,000 hybridizations loaded into its database.
    • Web Application
    • Web application. We created the web application as a lightweight solution enabling users to share their data. (But it’s a J2EE solution, so I think we’ve got an oxymoron on our hands.) Even though it’s enterprise-level, it is at least light on maintenance: you’ll not have to do much with the BII once it is running. The EBI version, running across 2 servers (one as backup), has been live for 6 months so far without a single restart... and I only restarted it to deploy a new instance.
    • Web application. We use JBoss Seam, mainly because we don’t have to worry about HTTP sessions, scope, etc. It manages everything for us, which is useful; this is particularly important in highly accessed systems, and it frees up time to work on more interesting things... It’s also a really good “integration framework”, pulling in JSP, JSF, EJB, JPA, Hibernate, etc.
    • Web application. We use HQL instead of platform-specific SQL, so the database can be Oracle, MySQL, PostgreSQL...: a database-independent application. We can deal with objects directly from the database queries. We construct the schema using POJOs and some XML.
    • Web application. Lucene creates a document-based index of the database contents, and we use annotations to specify which fields should be indexed. This index can be accessed and queried very quickly, so we use it to build the user interface.
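Lucene itself is a Java library; the sketch below only mimics the idea in Python. Fields are marked for indexing through dataclass metadata (standing in for Java annotations), and a naive whitespace-token inverted index maps terms to record accessions. Record names and accessions are hypothetical, and real Lucene analysis (tokenizers, scoring) is far richer.

```python
from collections import defaultdict
from dataclasses import dataclass, field, fields
from typing import Dict, List, Set


def indexed():
    """Mark a dataclass field for indexing (a stand-in for a Java annotation)."""
    return field(metadata={"indexed": True})


@dataclass
class StudyRecord:
    accession: str
    title: str = indexed()
    description: str = indexed()
    submitter_email: str = ""  # deliberately not indexed


def build_index(records: List[StudyRecord]) -> Dict[str, Set[str]]:
    """Inverted index: lower-cased token -> accessions of records containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        for f in fields(rec):
            if f.metadata.get("indexed"):
                for token in getattr(rec, f.name).lower().split():
                    index[token].add(rec.accession)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return accessions containing every query token (AND semantics)."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()


records = [
    StudyRecord("S-1", "Liver toxicity study", "aspirin dose response", "a@example.org"),
    StudyRecord("S-2", "Kidney study", "aspirin exposure in kidney"),
]
index = build_index(records)
print(sorted(search(index, "aspirin")), sorted(search(index, "liver aspirin")))
```

Answering queries from the prebuilt index rather than the relational tables is what keeps the user interface fast.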
    • Being deployed on a cloud-enabled instance of the BioLinux VM. This will make it easier to create deployments of the BII database and web application...
    • Last but not least... Analysis
    • A package to read ISA-Tab into R, especially BioConductor, to run analysis scripts on your data... It can automatically call microarray, mass spec and flow cytometry analysis packages on appropriate datasets... We still need to upload this to BioConductor. Created by Audrey Kauffman. There is also a script to create Galaxy libraries from ISA-Tab; Brad Chapman is working on this at HSPH.
    • Who’s using ISA? Fortunately, lots of people are now taking ISA on board... People are realising that MAGE-TAB, SOFT, PRIDE-ML and SRA-XML are an overhead that can be avoided, especially in multi-omic experiments. Users include the National Center for Toxicological Research (NCTR) and others; see the case study section on the ISA tools web site.
    • Who’s using ISA? Case study: Metabolomics repository, Metabolights. Built on top of the ISA infrastructure with a custom front-end web interface. Data entry tooling: ISAcreator, ISAvalidator and ISAconverter. Data management tools: BII data manager, BII database. They are also developing their own plugins for ISAcreator (of the custom cell editor type) to help users report metabolite assignments.
    • Who’s using ISA? Case study: SCDE. A curated stem cell informatics resource linked with the Galaxy analysis engine. Built on top of the ISA infrastructure in its entirety. They are contributing automatic deployment scripts for the BII (linked with the Cloud BioLinux initiative) and created the Python parser for ISA-Tab.
    • Who’s using ISA? Case study: GeneData - InnoMed. The biggest public study of its kind, available only in ISA-Tab: 720 animals, 16 compounds, 3 doses, ~20,000 assays. Assay types: protein expression profiling by mass spectrometry; transcription profiling by DNA microarray; metabolite profiling by mass spectrometry; metabolite profiling by NMR spectroscopy; histology; clinical chemistry; hematology.
    • Our next steps... as a community: visualization, further adoption and analysis. [Figure: experiment workflows from a low dose aspirin study, drawn as chains of steps (SAMP, EX, LABEL, HYB, SCAN, TRANS) for liver, kidney, blood serum and blood plasma samples (x5 each). One workflow is a well-described process from sample to data file; another has missing protocols and no information about what was being measured.] Making visual comparisons is straightforward using this approach. The longest path is constructed based on all other known datasets in the pool of workflows being compared.
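One way to read the comparison idea, sketched under simplifying assumptions: take the longest workflow in the pool as the reference path (the slide says the path is constructed from all known datasets; picking the single longest one is a simplification), then line each workflow up against it and mark missing steps. Step codes follow the figure; the function names are hypothetical.

```python
from typing import Dict, List


def longest_path(workflows: Dict[str, List[str]]) -> List[str]:
    """Pick the longest step sequence in the pool as the reference path."""
    return max(workflows.values(), key=len)


def align(workflow: List[str], reference: List[str]) -> List[str]:
    """Show each reference step, or '--' where the workflow lacks it.

    Greedy in-order matching; assumes the workflow's steps appear in reference order.
    """
    out, i = [], 0
    for step in reference:
        if i < len(workflow) and workflow[i] == step:
            out.append(step)
            i += 1
        else:
            out.append("--")
    return out


pool = {
    "liver": ["SAMP", "EX", "LABEL", "HYB", "SCAN", "TRANS"],
    "blood plasma": ["SAMP", "SCAN", "TRANS"],
}
ref = longest_path(pool)
for name, wf in pool.items():
    print(f"{name:<12}", " ".join(align(wf, ref)))
```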
    • We can’t do everything by ourselves... Thanks to the ISA team, contributors, collaborators and funders: Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire, Marco Brandizi, Natalija Sklyar, Brad Chapman, Bob MacCallum, Kenneth Haug, Pablo Conesa, Audrey Kauffman, and the National Center for Toxicological Research (NCTR).
    • ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Philippe Rocca-Serra, Marco Brandizi, Eamonn Maguire, Nataliya Sklyar, Chris Taylor, Kimberly Begley, Dawn Field, Stephen Harris, Winston Hide, Oliver Hofmann, Steffen Neumann, Peter Sterk, Weida Tong, Susanna-Assunta Sansone. Bioinformatics 2010, 26: 2354-2356.
    • Thanks for listening... Questions? You can email us: isatools@googlegroups.com. View our website: http://www.isa-tools.org. View our Git repo & contribute: http://github.com/ISA-tools. View our blog: http://isatools.wordpress.com. Follow us on Twitter: @antarcticdesign