BCU 2013


Transcript

  • 1. The Investigation/Study/Assay (ISA) metadata framework for reproducible and reusable bioscience research. Alejandra González-Beltrán, PhD, on behalf of the ISA Team. Oxford e-Research Centre, University of Oxford. Faculty of Technology, Environment and Engineering, Birmingham City University, 12th March 2013
  • 2. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 3. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 4. http://www.nature.com/news/2011/110111/full/469139a.html
  • 5. http://www.nature.com/news/2011/110111/full/469139a.html http://www.economist.com/node/21528593
  • 6. http://www.nature.com/news/2011/110111/full/469139a.html http://www.economist.com/node/21528593 http://www.nytimes.com/2011/07/08/health/research/08genes.html
  • 7. Contextual information (metadata): • Sample characteristics • Technology and measurement types • Instrument parameters • …
  • 8. Need for a generic representation, applied to: • microarray-based experiments (MAGE) • sequencing-based experiments (SRA) • flow cytometry-based experiments (FuGE-Flow Cyt) • mass spectrometry and NMR spectroscopy experiments (MetaboLights and PRIDE)
  • 9. Roadmap: Reproducible & Reusable Bioscience Research
  • 10. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Well-annotated & Structured Data; Reproducible & Reusable Bioscience Research
  • 11. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Well-annotated & Structured Data; Reproducible & Reusable Bioscience Research; User community
  • 12. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Community Standards; Software Tools; Well-annotated & Structured Data; Reproducible & Reusable Bioscience Research; User community
  • 13. Roadmap: reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Reproducible & Reusable Bioscience Research
  • 14. Bioscience is multi-domain… health, env, agro, tox/pharma. Interdisciplinary and integrative in character: • need to deal with new and existing datasets • deal with a variety of data types. Source of the figure: EBI website
  • 15. Multiple communities, multiple norms and standards, e.g.: • use the same term to refer to the same ‘thing’ • allow data to flow from one system to another • report the same core, essential information. Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage… all of which hinder interoperability
  • 16. Growing number of bioscience reporting standards: 303+, 150+, 130+ (sources: MIBBI, EQUATOR; BioPortal; estimated). Databases, annotation, curation tools. MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 17. But… what do we know about them and how they are related? MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 18. But… what do we know about them and how they are related? • I use high throughput sequencing technologies, which ones are relevant to me? • Which tools and databases implement which standards? • How can I get involved to propose extensions or modifications? • What are the criteria to evaluate their status and value? • Which ones are mature enough for me to use or recommend? • Which formats support specific minimum information guidelines? • I work on plants, are these standards just for biomedical applications?
  • 19. A coherent, curated and searchable catalogue of data sharing resources • Bioscience standards and associated data-sharing policies, publications, tools and databases • Assessment criteria for usability and popularity of standards • Relationships among standards • Encouragement for communication & interaction among groups • Promoting interoperability & informed decisions about standards
  • 20. infrastructure
  • 21. infrastructure. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Rocca-Serra et al, 2010 Bioinformatics • Assist in the annotation and management of experimental metadata at source, supporting data provenance tracking • Deal with high-throughput studies using one or a combination of omics and other technologies • Empower users to take up community-defined checklists and ontologies • Facilitate data sharing, re-use, comparison and reproducibility of experiments, submission to international public repositories
  • 22. faahKO dataset • Available in Bioconductor • Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004) • LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  • 23. faahKO investigation - Define key entities (e.g. factors, protocols, parameters) - Grouping of studies - Relate studies and assays
  • 24. faahKO study - Subjects studied: source(s), sampling methodology, characteristics - Treatments/manipulations performed to prepare the specimens. NEWT UniProt Taxonomy Database; Mouse Genome Informatics
  • 25. faahKO study - Subjects studied: source(s), sampling methodology, characteristics - Treatments/manipulations performed to prepare the specimens. Mouse Adult Gross Anatomy
  • 26. faahKO assay - measurement type, e.g. metabolite profiling - technology, e.g. mass spectrometry
  • 27. Create template(s) to fit the type of experiments to be described. Create templates detailing the steps to be reported for different investigations, complying with community standards, e.g. configuring the value(s) allowed for each field to be • text (with/without regular expression testing), • ontology terms, • numbers etc.
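Slide 27's idea of configuring the values allowed for each field can be illustrated with a minimal Python sketch. This is not the ISA configuration format or any ISA tool's API: `FieldRule`, `TEMPLATE`, `is_valid` and `validate_row` are hypothetical names, and the regular expression and the list of organism labels are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    """Hypothetical rule for one template field (not the ISA configuration format)."""
    kind: str                               # "text", "ontology" or "number"
    pattern: Optional[str] = None           # optional regular expression for text fields
    allowed_terms: frozenset = frozenset()  # permitted labels for ontology fields


def is_valid(rule: FieldRule, value: str) -> bool:
    """Check one cell value against the rule configured for its field."""
    if rule.kind == "text":
        return rule.pattern is None or re.fullmatch(rule.pattern, value) is not None
    if rule.kind == "ontology":
        return value in rule.allowed_terms
    if rule.kind == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {rule.kind}")


# Illustrative template: field names follow the slides, the rules are assumptions.
TEMPLATE = {
    "Sample Name": FieldRule("text", pattern=r"sample\d+"),
    "Characteristics[organism]": FieldRule(
        "ontology", allowed_terms=frozenset({"Homo sapiens", "Mus musculus"})
    ),
    "Factor Value[dose]": FieldRule("number"),
}


def validate_row(row: dict) -> list:
    """Return the names of the fields whose values break the template."""
    return [name for name, rule in TEMPLATE.items()
            if not is_valid(rule, row.get(name, ""))]
```

Calling `validate_row` on a row returns an empty list when every field passes, and otherwise the offending field names in template order, which is roughly the feedback a curator would need.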
  • 28. Describe, curate your experiment using a desktop-based tool. Report and edit the description using this tool (also customized using the templates), with a spreadsheet-like look and feel, packed with functionalities such as • ontology search (access via ) • term-tagging features • import from spreadsheets etc…
  • 29. OntoMaton: a BioPortal-powered ontology widget for Google Spreadsheets (Maguire et al, 2013 Bioinformatics) • Ontology search and automated tagging (relying on NCBO BioPortal services) on Google Spreadsheets • Collaborative annotation; support for distributed users • Version control & history
  • 30. • R package available in Bioconductor 2.11 http://bioconductor.org/packages/release/bioc/html/Risa.html • ISAtab class • Read ISAtab files into ISAtab objects and write ISAtab files back to disk • Increment metadata with definition factors/treatments/groups • Build xcmsSet (xcms package) objects from mass spectrometry assays • Augment the ISAtab dataset after analysis • source & issues tracking https://github.com/ISA-tools/Risa
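  • 31. • faahKO package v. 2.12 contains ISAtab files describing the experiment:
        library(Risa)  # provides readISAtab, processAssayXcmsSet, updateAssayMetadata
        # read the ISAtab files shipped with the faahKO package
        faahkoISA = readISAtab(find.package("faahKO"))
        # pick the first assay file listed in the investigation
        assay.filename <- faahkoISA["assay.filenames"][[1]]
        # build an xcmsSet object from the mass spectrometry assay
        xset = processAssayXcmsSet(faahkoISA, assay.filename)
        …
        # record the derived data file in the assay metadata after analysis
        updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")
      • MTBLS2 processing and analysis using Risa, xcms and CAMERA Bioconductor packages
      MetaboLights: an open access general-purpose repository for metabolomics studies and associated meta-data. Haug et al, 2012 Nucleic Acids Research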
  • 32. The implicit semantics of the syntax
  • 33. Sample Name | Material Type | Hybridization Assay Name | Array Design REF | Array Data File | Protocol REF | Derived Array Data File
        sample1 | genomic DNA | assay1 | A-AFFY-107 | assay1.cel | data normalization | assay1.txt
        sample2 | genomic DNA | assay2 | A-AFFY-107 | assay2.cel | data normalization | assay2.txt
        sample3 | genomic DNA | assay3 | A-AFFY-107 | assay3.cel | data normalization | assay3.txt
      Material transformations...
      Diagram labels: Material Node (Material Type, Characteristics[…], Factor Value[…] (independent variables), Comment[…]); Protocol / Process (Parameter Value[…], Performer (operator effect), Date (day effect)); Data File Node (Raw Data File, Derived Data File)
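The table on slide 33 implies a left-to-right path from a sample to its raw and derived data files, with a protocol applied between some of the steps. The following minimal Python sketch makes that path explicit as edges. It is an illustration only, not the ISA tools' parser: `NODE_COLUMNS` and `assay_edges` are hypothetical names, treating the assay name as a node is a simplification, and the header spellings are assumed to follow ISA-Tab usage.

```python
import csv
import io

# Tab-delimited rows as on the slide (header spellings are an assumption).
ASSAY_TABLE = (
    "Sample Name\tMaterial Type\tHybridization Assay Name\tArray Design REF\t"
    "Array Data File\tProtocol REF\tDerived Array Data File\n"
    "sample1\tgenomic DNA\tassay1\tA-AFFY-107\tassay1.cel\tdata normalization\tassay1.txt\n"
    "sample2\tgenomic DNA\tassay2\tA-AFFY-107\tassay2.cel\tdata normalization\tassay2.txt\n"
    "sample3\tgenomic DNA\tassay3\tA-AFFY-107\tassay3.cel\tdata normalization\tassay3.txt\n"
)

# Columns treated as nodes in this sketch; every other column is an attribute
# and is ignored here.
NODE_COLUMNS = (
    "Sample Name",
    "Hybridization Assay Name",
    "Array Data File",
    "Derived Array Data File",
)


def assay_edges(table_text):
    """Return (source, target, protocol) edges, reading each row left to right."""
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    edges = []
    for row in body:
        previous, protocol = None, None
        for column, value in zip(header, row):
            if column == "Protocol REF":
                protocol = value  # applies to the next node reached
            elif column in NODE_COLUMNS:
                if previous is not None:
                    edges.append((previous, value, protocol))
                previous, protocol = value, None
    return edges
```

Each row yields three edges here, and only the last one (raw file to derived file) carries a protocol label, which matches the position of `Protocol REF` in the table.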
  • 34. Tagging: from free text to ontology-based
      • single intervention representation, free text annotation
        Source Name | Characteristics[organism] | Factor Value[perturbation agent] | Factor Value[dose] | Factor Value[duration]
        individual1 | human | aspirin | high dose | 12 weeks
      • single intervention, ontology-based annotation
        Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
        individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
        Factor Value[dose (OBI_0000984)] | Term Source REF | Term Accession Number | Factor Value[time (PATO_0000165)] | Unit | Term Source REF | Term Accession Number
        low dose | LNC | LP30872-3 | 12 | week | UO | 0000034
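The move from free text to ontology-based annotation on slide 34 amounts to replacing each free-text cell with three cells: a preferred label, a Term Source REF and a Term Accession Number. A minimal Python sketch of that expansion follows. The lookup table is a hypothetical stand-in for an ontology search service, `TERM_LOOKUP` and `tag_row` are made-up names, and the source and accession values are copied from the slide rather than checked against the ontologies.

```python
# Hypothetical lookup standing in for an ontology search service.
# Values are copied from the slide, not verified against the ontologies.
TERM_LOOKUP = {
    "human": ("Homo sapiens", "NCBITax", "9606"),
    "aspirin": ("aspirin", "CHEBI", "1231354"),
}


def tag_row(row):
    """Expand (header, value) pairs into value, Term Source REF, Term Accession Number.

    Values without a match keep their free text and get empty term cells.
    """
    tagged = []
    for header, value in row:
        key = value.strip().lower()
        label, source, accession = TERM_LOOKUP.get(key, (value, "", ""))
        tagged += [
            (header, label),
            ("Term Source REF", source),
            ("Term Accession Number", accession),
        ]
    return tagged
```

The row is kept as a list of pairs rather than a dictionary because the `Term Source REF` and `Term Accession Number` headers repeat after every annotated column.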
  • 35. ToxBank effort, developed by Nina Jeliazkova. Health Care & Life Sciences Interest Group. Kohonen et al. The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.
  • 36. • Make the semantics of ISAtab explicit, including materials & data entities & processes & their relationships • Provide incentives for provision of ontology-based annotations in ISA-TAB datasets; exploit those annotations • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design • Facilitate data integration & knowledge discovery/reasoning
  • 37. architecture: ISA-TAB parser; graph analysis; isa2owl mapping parser; configuration file. Implementation: - Java-based - using owlapi
  • 38. vocabularies: Chemical domain, Biomolecular domain, Information domain, Experimental domain
        Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
        individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
  • 39. Open Biological and Biomedical Ontologies (OBO) Foundry: BFO, ChEBI, GO, IAO, OBI
        Source Name | Characteristics[organism (obi:0100026)] | Term Source REF | Term Accession Number | Factor Value[chemical compound (CHEBI_37577)] | Term Source REF | Term Accession Number
        individual1 | Homo sapiens | NCBITax | 9606 | aspirin | CHEBI | 1231354
  • 40. ISA-OBI mapping
  • 41. ISA-SIO mapping
  • 42. faahKO dataset. Available in Bioconductor (with ISA-TAB metadata). Global metabolite profiling. Data subset: LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  • 43. • support different conversion modes (different levels of granularity) • querying for ISA-TAB datasets, across multiple experiment types • reasoning exploiting ontology annotations - semantic validation of ISA-TAB datasets • augmented annotation over native ISA syntax - identification of gaps in ontological representations - feedback of findings to community ontologies
  • 44. Increasing level of structure for experimental metadata: Notes in Lab books; Spreadsheets & Tables; Facts as RDF statements (ISAtab metadata)
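The last step on slide 44, turning tabular metadata into facts as RDF statements, can be sketched in Python as one N-Triples line per cell. This is only an illustration under stated assumptions: the `http://example.org/isa/` namespace, the header-to-predicate naming and the helper names are hypothetical, every value is emitted as a plain literal, and the isa2owl conversion described in the talk maps to ontologies such as OBI and SIO instead.

```python
import re

BASE = "http://example.org/isa/"  # hypothetical namespace, for illustration only


def iri(local_name):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{BASE}{local_name}>"


def slug(header):
    """Turn a column header into a local name, e.g. 'Characteristics_organism'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", header).strip("_")


def literal(text):
    """Quote a value as a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def row_to_triples(subject_id, row):
    """Emit one 'subject predicate literal .' statement per column of the row."""
    subject = iri(subject_id)
    return [f"{subject} {iri(slug(header))} {literal(value)} ."
            for header, value in row.items()]
```

For the `individual1` row used in the talk, the organism cell becomes a single statement linking the source to the literal "Homo sapiens"; replacing literals with ontology term IRIs is where a real mapping would do more work than this sketch.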
  • 45. Towards interoperable bioscience data. Sansone et al, 2012 Nature Genetics. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains.
  • 46. Implementation at Harvard. ISA. http://discovery.hsci.harvard.edu/
  • 47. Implementation at the European Bioinformatics Institute http://www.ebi.ac.uk/metabolights
  • 48. reasoning, visualization, analysis, browsing, integration, exchange, retrieval; Reproducible & Reusable Bioscience Research
  • 49. @isatools @biosharing isa-tools.org isacommons.org biosharing.org
