BCU 2013
  1. The Investigation/Study/Assay (ISA) metadata framework for reproducible and reusable bioscience research. Alejandra González-Beltrán, PhD, on behalf of the ISATeam. Oxford e-Research Centre, University of Oxford. Faculty of Technology, Environment and Engineering, Birmingham City University, 12th March 2013
  2. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009), doi:10.1038/ng.295
  3. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009), doi:10.1038/ng.295
  4. http://www.nature.com/news/2011/110111/full/469139a.html
  5. http://www.nature.com/news/2011/110111/full/469139a.html http://www.economist.com/node/21528593
  6. http://www.nature.com/news/2011/110111/full/469139a.html http://www.economist.com/node/21528593 http://www.nytimes.com/2011/07/08/health/research/08genes.html
  7. Contextual information (metadata): • Sample characteristics • Technology and measurement types • Instrument parameters • …
  8. Need for a generic representation, applied to: • microarray-based experiments (MAGE) • sequencing-based experiments (SRA) • flow cytometry-based experiments (FuGE-Flow Cyt) • mass spectrometry and NMR spectroscopy experiments (Metabolights and PRIDE)
  9. Roadmap: Reproducible & Reusable Bioscience Research
  10. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research
  11. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. User community
  12. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Community Standards. Software Tools. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. User community
  13. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Reproducible & Reusable Bioscience Research
  14. Bioscience is multi-domain… health, env, agro, tox/pharma. Interdisciplinary and integrative in character: • need to deal with new and existing datasets • deal with a variety of data types. Source of the figure: EBI website
  15. Multiple communities, multiple norms and standards, e.g.: • use the same term to refer to the same ‘thing’ • allow data to flow from one system to another • report the same core, essential information. Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage… hinders interoperability
  16. Growing number of bioscience reporting standards. Counts shown: 303+, 150+, 130+. Sources: MIBBI, EQUATOR; BioPortal; estimated. Databases, annotation, curation tools. MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  17. But… what do we know about them and how they are related? MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  18. But… what do we know about them and how they are related? • I use high throughput sequencing technologies, which ones are relevant to me? • Which tools and databases implement which standards? • How can I get involved to propose extensions or modifications? • What are the criteria to evaluate their status and value? • Which ones are mature enough for me to use or recommend? • Which formats support specific minimum information guidelines? • I work on plants, are these standards just for biomedical applications?
  19. A coherent, curated and searchable catalogue of data sharing resources • Bioscience standards and associated data-sharing policies, publications, tools and databases • Assessment criteria for usability and popularity of standards • Relationships among standards • Encouragement for communication & interaction among groups • Promoting interoperability & informed decisions about standards
  20. infrastructure
  21. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al., 2010, Bioinformatics) • Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking • Deal with high-throughput studies using one or a combination of omics and other technologies • Empower users to take up community-defined checklists and ontologies • Facilitate data sharing, re-use, comparison and reproducibility of experiments, submission to international public repositories
  22. faahKO dataset • Available in Bioconductor • Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004) • LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  23. faahKO investigation: - Define key entities (e.g. factors, protocols, parameters) - Grouping of studies - Relate studies and assays
  24. faahKO study: - Subjects studied: source(s), sampling methodology, characteristics - treatments/manipulations performed to prepare the specimens. NEWT UniProt Taxonomy Database; Mouse Genome Informatics
  25. faahKO study: - Subjects studied: source(s), sampling methodology, characteristics - treatments/manipulations performed to prepare the specimens. Mouse Adult Gross Anatomy
  26. faahKO assay: - measurement type, e.g. metabolite profiling - technology, e.g. mass spectrometry
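
The investigation, study and assay levels described in slides 23 to 26 nest inside one another, and a small data structure can make that nesting concrete. The following is a minimal sketch in Python, not the ISA tools' own object model: the class names, field names and the `assays_by_technology` helper are all hypothetical and chosen only to mirror the wording of the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Assay:
    # Both fields mirror the two bullets on the assay slide.
    measurement_type: str  # e.g. "metabolite profiling"
    technology_type: str   # e.g. "mass spectrometry"


@dataclass
class Study:
    identifier: str
    sources: List[str] = field(default_factory=list)   # subjects studied
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    studies: List[Study] = field(default_factory=list)  # grouping of studies


def assays_by_technology(inv: Investigation, technology: str) -> List[Tuple[str, Assay]]:
    """Return (study identifier, assay) pairs that use the given technology."""
    return [
        (study.identifier, assay)
        for study in inv.studies
        for assay in study.assays
        if assay.technology_type == technology
    ]


# Illustrative instance loosely based on the faahKO example in the slides.
faahko = Investigation(
    "faahKO",
    studies=[
        Study(
            "faahKO study",
            sources=["wild-type mouse", "FAAH knockout mouse"],
            assays=[Assay("metabolite profiling", "mass spectrometry")],
        )
    ],
)
```

Keeping assays inside studies, and studies inside an investigation, is what lets a query such as `assays_by_technology(faahko, "mass spectrometry")` relate an assay back to the study it belongs to.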
  27. Create template(s) to fit the type of experiments to be described. Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be • text (with/without regular expression testing), • ontology terms, • numbers etc.
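
The idea of constraining each field to text (optionally checked by a regular expression), ontology terms or numbers can be illustrated with a small validator. This is only a sketch under stated assumptions: the `FIELD_RULES` layout, the rule kinds and the example field names are hypothetical and do not reproduce the configuration format used by the ISA tools.

```python
import re

# Hypothetical template: one rule per field name.
FIELD_RULES = {
    "Sample Name": {"kind": "text", "pattern": r"^[A-Za-z0-9_.-]+$"},
    "Characteristics[organism]": {
        "kind": "ontology",
        "allowed": {"Mus musculus", "Homo sapiens"},
    },
    "Parameter Value[injection volume]": {"kind": "number"},
}


def validate(field_name, value, rules=FIELD_RULES):
    """Return True if `value` satisfies the rule configured for `field_name`."""
    rule = rules.get(field_name)
    if rule is None:
        return True  # fields without a rule are left unconstrained
    kind = rule["kind"]
    if kind == "text":
        pattern = rule.get("pattern")
        return pattern is None or re.fullmatch(pattern, value) is not None
    if kind == "ontology":
        return value in rule["allowed"]
    if kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown rule kind: {kind}")
```

For example, `validate("Sample Name", "ko 15")` is rejected because of the space, while `validate("Characteristics[organism]", "Mus musculus")` passes. A real template would draw the allowed ontology terms from an ontology service, not from a hard-coded set.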
  28. Describe, curate your experiment using a desktop-based tool. Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as • ontology search (access via ) • term-tagging features • import from spreadsheets etc…
  29. OntoMaton: a Bioportal powered Ontology widget for Google Spreadsheets (Maguire et al., 2013, Bioinformatics) • Ontology search and automated tagging (relying on NCBO Bioportal services) on Google Spreadsheets • Collaborative annotation; support for distributed users • Version control & history
  30. • R package available in BioConductor 2.11 http://bioconductor.org/packages/release/bioc/html/Risa.html • ISAtab class • Read ISAtab files into ISAtab objects and write ISAtab files back to disk • Increment metadata with the definition of factors/treatments/groups • Build xcmsSet (xcms package) objects from mass spectrometry assays • Augment the ISAtab dataset after analysis • source & issues tracking https://github.com/ISA-tools/Risa
  31. • faahKO package v. 2.12 contains ISAtab files describing the experiment:
        faahkoISA = readISAtab(find.package("faahKO"))
        assay.filename <- faahkoISA["assay.filenames"][[1]]
        xset = processAssayXcmsSet(faahkoISA, assay.filename)
        …
        updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")
      • MTBLS2 processing and analysis using Risa, xcms and CAMERA BioConductor packages
      Metabolights: an open access general-purpose repository for metabolomics studies and associated meta-data (Haug et al., 2012, Nucleic Acids Research)
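
The last step of the R workflow above, writing the name of a derived file back into the assay metadata, amounts to adding a column to a tab-delimited table. The sketch below shows that operation in plain Python. It is an illustration only and makes no claim about how Risa's `updateAssayMetadata` is implemented; the `add_assay_column` helper and the raw file names in the example are hypothetical.

```python
import csv
import io


def add_assay_column(assay_text, column, value):
    """Append `column` to a tab-delimited assay table, filled with `value`.

    Rows are handled as plain lists (not dicts) because ISA-TAB tables may
    repeat header names such as "Term Source REF".
    """
    rows = list(csv.reader(io.StringIO(assay_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + [column])
    for row in body:
        writer.writerow(row + [value])
    return out.getvalue()


# Illustrative two-row assay table (file names are made up for the example).
assay = (
    "Sample Name\tRaw Spectral Data File\n"
    "ko15\tko15.CDF\n"
    "wt15\twt15.CDF\n"
)
updated = add_assay_column(assay, "Derived Spectral Data File", "faahkoDSDF.txt")
```

Recording the derived file next to the raw file in the same row is what keeps the analysis output traceable to the sample it came from.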
  32. The implicit semantics of the syntax
  33. Sample Name | Material Type | Hybridization Assay Name | Assay Design REF | Array Data File | Protocol REF | Derived Array Data File
      sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
      sample2 | genomic DNA | assay2 | A-AFFY-107 | assay2.cel | data normalization | assay2.txt
      sample3 | genomic DNA | assay3 | A-AFFY-107 | assay3.cel | data normalization | assay3.txt
      Material transformations... Material Node; Data File Node.
      Material: Characteristics[…], Factor Value[…] (independent variables), Material Type, Comment[…].
      Process: Protocol, Parameter Value[…], Performer (operator effect), Date (day effect).
      Data: Raw Data File, Derived Data File.
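
The point of slide 33 is that the column order of an assay table implicitly encodes a chain of nodes (materials and data files) linked by processes. One way to make that explicit is to walk each row left to right and emit an edge whenever a new node column is reached. The sketch below is a simplified illustration, not the parser used by the ISA tools: the column classification in `MATERIAL_COLUMNS` and `DATA_COLUMNS` covers only the headers shown on the slide, and `row_to_edges` is a hypothetical helper.

```python
# Column roles for the headers shown on the slide (an assumption of this sketch).
MATERIAL_COLUMNS = {"Sample Name"}
DATA_COLUMNS = {"Array Data File", "Derived Array Data File"}


def row_to_edges(header, row):
    """Walk one assay row left to right and emit (input, process, output) edges."""
    edges = []
    previous = None  # last material or data node seen in this row
    process = None   # most recent protocol or assay name seen since that node
    for column, value in zip(header, row):
        if column == "Protocol REF" or column.endswith("Assay Name"):
            process = value
        elif column in MATERIAL_COLUMNS or column in DATA_COLUMNS:
            if previous is not None:
                edges.append((previous, process, value))
            previous, process = value, None
        # every other column is treated as an attribute and skipped here
    return edges


header = [
    "Sample Name", "Material Type", "Hybridization Assay Name",
    "Assay Design REF", "Array Data File", "Protocol REF",
    "Derived Array Data File",
]
row = [
    "sample1", "genomic DNA", "assay1",
    "A-AFFY-107", "assay1.cel", "data normalization", "assay1.txt",
]
edges = row_to_edges(header, row)
```

For the first row of the slide's table this yields two edges: `sample1` to `assay1.cel` via `assay1`, and `assay1.cel` to `assay1.txt` via `data normalization`.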
  34. Tagging: from free text to ontology-based
      • single intervention representation, free text annotation
      Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
      individual1 | human | aspirin | high dose | 12 weeks
      • single intervention, ontology-based annotation
      Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
      individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
      Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
      low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
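
Moving from free text to ontology-based annotation means replacing each free-text value with three cells: a preferred label, a Term Source REF and a Term Accession Number. A minimal sketch of that step follows. The `TERM_LOOKUP` table is a hypothetical stand-in for an ontology search service, seeded only with two values that appear on the slide, and `tag` and `tag_row` are illustrative helpers.

```python
# Hypothetical lookup: free-text value -> (label, Term Source REF, accession).
# A real tool would query an ontology service instead of a fixed dictionary.
TERM_LOOKUP = {
    "human": ("Homo sapiens", "NCBITax", "9606"),
    "week": ("week", "UO", "0000034"),
}


def tag(value, lookup=TERM_LOOKUP):
    """Expand one free-text value into [label, source, accession].

    Values with no match keep their original text and get empty
    Term Source REF and Term Accession Number cells.
    """
    key = value.strip().lower()
    label, source, accession = lookup.get(key, (value, "", ""))
    return [label, source, accession]


def tag_row(row, lookup=TERM_LOOKUP):
    """Tag every value in a row, tripling the number of cells."""
    tagged = []
    for value in row:
        tagged.extend(tag(value, lookup))
    return tagged
```

Leaving unmatched values in place with empty source and accession cells keeps the table valid while showing a curator which values still need a term.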
  35. ToxBank effort, developed by Nina Jeliazkova. Health Care & Life Sciences Interest Group. Kohonen et al., The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.
  36. • Make the semantics of ISAtab explicit, including materials & data entities & processes & their relationships • Provide incentives for provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design • Facilitate data integration & knowledge discovery/reasoning
  37. architecture (diagram labels): ISA-TAB parser, graph, isa2owl mapping, analysis, parser, Configuration file. Implementation: - java-based - Using owlapi
  38. vocabularies: Chemical domain, Biomolecular domain, Information domain, Experimental domain
      Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
      individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
  39. Open Biological and Biomedical Ontologies (OBO) Foundry: BFO, ChEBI, GO, IAO, OBI
      Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
      individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
  40. ISA-OBI mapping
  41. ISA-SIO mapping
  42. faahKO dataset. Available in Bioconductor (with ISA-TAB metadata). Global metabolite profiling. Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  43. • support different conversion modes (different levels of granularity) • querying for ISA-TAB datasets, across multiple experiment types • reasoning exploiting ontology annotations - semantic validation of ISA-TAB datasets • augmented annotation over native ISA syntax - identification of gaps in ontological representations - feedback of findings to community ontologies
  44. Increasing level of structure for experimental metadata: Notes in Lab books; Spreadsheets & Tables; Facts as RDF statements (ISAtab metadata)
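
The step from spreadsheet rows to "facts as RDF statements" can be illustrated by turning each cell of a row into one subject, predicate, object statement. The sketch below emits N-Triples lines using only the Python standard library. It is not the isa2owl mapping: the `http://example.org/isa/` namespace is a placeholder, column headers are used directly as predicates where a real conversion would map them to ontology terms, and only quotes and backslashes are escaped in literals.

```python
from urllib.parse import quote

BASE = "http://example.org/isa/"  # placeholder namespace (an assumption)


def iri(local):
    """Build an IRI in the placeholder namespace, percent-encoding the name."""
    return "<" + BASE + quote(local, safe="") + ">"


def literal(text):
    """Build a plain N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def row_to_triples(subject_column, row):
    """Turn one table row (a dict of column -> value) into N-Triples lines."""
    subject = iri(row[subject_column])
    triples = []
    for column, value in row.items():
        if column == subject_column or value == "":
            continue  # the subject itself and empty cells produce no statement
        triples.append(f"{subject} {iri(column)} {literal(value)} .")
    return triples


row = {"Source Name": "individual1", "Characteristics[organism]": "Homo sapiens"}
triples = row_to_triples("Source Name", row)
```

Each resulting line is a self-contained fact, which is what allows statements from different datasets to be merged and queried together once they share vocabulary terms.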
  45. Towards interoperable bioscience data (Sansone et al., 2012, Nature Genetics). A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
  46. Implementation at Harvard: ISA http://discovery.hsci.harvard.edu/
  47. Implementation at the European Bioinformatics Institute: http://www.ebi.ac.uk/metabolights
  48. reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Reproducible & Reusable Bioscience Research
  49. @isatools @biosharing isa-tools.org isacommons.org biosharing.org
