NETTAB 2012


  1. The open source ISA software suite and its international user community: Knowledge management of experimental data
     Alejandra González-Beltrán, Senior Software Engineer, ISA Team, Oxford e-Research Centre, University of Oxford, Oxford, UK
     NETTAB 2012 – Integrated Bio-Search, Como, Italy, November 14-16
  2. Outline
     • Knowledge management of experimental data
       – Setting the scene
       – The ISA ecosystem: ISA-tab, tools, community
       – Use case
     • Latest additions
     • Related projects & main points
  3. Setting the scene: Bioscience is multi-domain… (health, agro, env, tox/pharma). Source of the figure: EBI website
  4. Setting the scene: Bioscience is multi-domain… (health, agro, env, tox/pharma). Petabytes of data. Source of the figure: EBI website
  5. Setting the scene: Bioscience is multi-domain… (health, agro, env, tox/pharma). Petabytes of data. Experimental metadata in lab books. Source of the figure: EBI website
  6. investigation, study, assay
     • Assist in the annotation and management of experimental data at source
     • Deal with data from high-throughput studies using one or a combination of omics and other technologies
     • Empower users to take up community-defined checklists and ontologies
     • Facilitate data sharing, reuse, comparison and reproducibility of experiments, and submission to international public repositories
  7. The ISA ecosystem
  8. The ISA ecosystem
     • ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level. Rocca-Serra et al, 2010, Bioinformatics
     • Towards interoperable bioscience data. Sansone et al, 2012, Nature Genetics
  9. • General purpose & flexible format
     • Domain agnostic
     • Captures metadata in omics experiments and traditional experiments (e.g. clinical chemistry and histology)
  10. faahKO dataset
      • Available in Bioconductor
      • Subset of the original data on global metabolite profiling (Saghatelian et al., Biochemistry, 2004)
      • LC/MS peaks from the spinal cords of 6 wild-type and 6 FAAH (fatty acid amide hydrolase) knockout mice
  11. faahKO investigation
      - Define key entities (e.g. factors, protocols, parameters)
      - Grouping of studies
      - Relate studies and assays
  12. faahKO study
      - Subjects studied: source(s), sampling methodology, characteristics
      - Treatments/manipulations performed to prepare the specimens
      Ontology sources shown: NEWT UniProt Taxonomy Database, Mouse Genome Informatics
  13. faahKO study
      - Subjects studied: source(s), sampling methodology, characteristics
      - Treatments/manipulations performed to prepare the specimens
      Ontology source shown: Mouse Adult Gross Anatomy
  14. faahKO assay
      - Measurement type, e.g. metabolite profiling
      - Technology, e.g. mass spectrometry
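The investigation, study and assay levels described on slides 11 to 14 nest inside one another. The following is a minimal Python sketch of that nesting, not the ISA tools' own data model: the class names, field names and the `iter_assays` helper are hypothetical, and only the example values (faahKO, metabolite profiling, mass spectrometry) come from the slides.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Assay:
    measurement_type: str  # e.g. "metabolite profiling"
    technology: str        # e.g. "mass spectrometry"


@dataclass
class Study:
    identifier: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    studies: List[Study] = field(default_factory=list)

    def iter_assays(self) -> Iterator[Tuple[Study, Assay]]:
        """Yield (study, assay) pairs, i.e. relate studies to their assays."""
        for study in self.studies:
            for assay in study.assays:
                yield study, assay


# Hypothetical identifiers; values mirror the faahKO use case on the slides.
faahko = Investigation(
    "faahKO investigation",
    [Study("faahKO study", [Assay("metabolite profiling", "mass spectrometry")])],
)

for study, assay in faahko.iter_assays():
    print(study.identifier, "->", assay.measurement_type, "/", assay.technology)
```

An investigation groups studies, and each study owns the assays run on its samples, which is why the helper walks both levels.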
  15. Report and edit the description of the investigation using Google Spreadsheets.
      Use Google Spreadsheets in combination with ISA-Tab templates (created by importing the Excel file from the ISAconfigurator) and OntoMaton (for ontology search and tagging support) to report an investigation.
  16. Ontology Search and Tagging in Google Spreadsheets
      - Collaborative annotation
      - Distributed groups of users
      - Version control & history
  17. Create templates detailing the steps to be reported for different investigations, complying with community standards (listed at [logo]), e.g. configuring fields to be (i) ontology terms, (ii) text (with/without regular expression testing), (iii) numbers etc.
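The three field kinds named on slide 17 (ontology term, text with optional regular expression, number) can be pictured as per-field validation rules. This is a hedged Python sketch only: the `FIELD_RULES` table, the rule keys and the `is_valid` function are hypothetical and do not reflect the ISAconfigurator's actual configuration format. The field names and allowed values are illustrative assumptions.

```python
import re

# Hypothetical rules, one per configured field (not the ISAconfigurator format).
FIELD_RULES = {
    "Characteristics[organism]": {"kind": "ontology", "allowed": {"Mus musculus"}},
    "Sample Name": {"kind": "text", "pattern": r"(KO|WT)\d+"},
    "Comment[notes]": {"kind": "text"},  # text without regular expression testing
    "Parameter Value[injection volume]": {"kind": "number"},
}


def is_valid(field_name: str, value: str) -> bool:
    """Check one cell value against the rule configured for its field."""
    rule = FIELD_RULES[field_name]
    if rule["kind"] == "ontology":
        # Stand-in for an ontology lookup: accept only listed terms.
        return value in rule["allowed"]
    if rule["kind"] == "text":
        pattern = rule.get("pattern")
        return pattern is None or re.fullmatch(pattern, value) is not None
    if rule["kind"] == "number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    raise ValueError(f"unknown field kind: {rule['kind']}")


print(is_valid("Sample Name", "KO1"))
print(is_valid("Parameter Value[injection volume]", "ten"))
```

A real ontology field would query a term service rather than a fixed set; the fixed set keeps the sketch self-contained.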
  18. From the ISA-Tab we can perform analysis, convert to RDF/OWL and other formats for submission/sharing to local/remote repositories
  19. From the ISA-Tab we can perform analysis, convert to RDF/OWL and other formats for submission/sharing to local/remote repositories + Visualisation Methods
  20. faahKO Groups; faahKO Workflow
      Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)
  21. • R package available in Bioconductor 2.11: http://bioconductor.org/packages/release/bioc/html/Risa.html
      • ISAtab class
      • Read ISAtab files into ISAtab objects and save ISAtab files
      • Build xcmsSet (xcms package) objects from mass spectrometry assays
      • Augment the ISAtab dataset after analysis
      • Source & issue tracking: https://github.com/ISA-tools/Risa
  22. • faahKO package v. 2.12 contains ISAtab files describing the experiment

          library(Risa)
          # read the ISAtab files shipped with the faahKO package
          faahkoISA <- readISAtab(find.package("faahKO"))
          # take the first assay file listed in the dataset
          assay.filename <- faahkoISA["assay.filenames"][[1]]
          # build an xcmsSet object from the mass spectrometry assay
          xset <- processAssayXcmsSet(faahkoISA, assay.filename)
          …
          # record the derived data file back into the assay metadata
          updateAssayMetadata(faahkoISA, assay.filename, "Derived Spectral Data File", "faahkoDSDF.txt")

      • MTBLS2 processing and analysis using the Risa, xcms and CAMERA Bioconductor packages
      MetaboLights – an open access general-purpose repository for metabolomics studies and associated meta-data. Haug et al, 2012, Nucleic Acids Research
  23. ISA syntax & underlying material/data workflows
      Input Material or Data Node (Characteristics[…], Factor Value[…]) → Protocol REF (Parameter Value[…]) → Output Material or Data Node (Characteristics[…], Factor Value[…])
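The node / Protocol REF / node pattern on slide 23 can be read off a tab-delimited row column by column. The sketch below is a simplified Python illustration under stated assumptions, not the parser used by the ISA tools: it treats any column ending in " Name" or " File" as a node, attaches Characteristics, Factor Value and Parameter Value columns to the nearest preceding node or protocol, and uses a made-up one-row table (the organism and genotype values are illustrative).

```python
import csv
import io

# Illustrative one-row table; the organism and genotype values are assumptions.
SAMPLE_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\t"
    "Sample Name\tFactor Value[genotype]\n"
    "Saghantelian_1\tMus musculus\tsample collection\t"
    "KO1\tFAAH knockout\n"
)

NODE_SUFFIXES = (" Name", " File")  # simplification: node columns end this way


def parse_row(headers, values):
    """Turn one row into an ordered list of nodes and protocol applications."""
    steps = []
    for header, value in zip(headers, values):
        if header == "Protocol REF":
            steps.append({"kind": "protocol", "name": value, "attributes": {}})
        elif header.endswith(NODE_SUFFIXES):
            steps.append({"kind": "node", "name": value, "attributes": {}})
        elif steps:
            # Characteristics[...], Factor Value[...], Parameter Value[...]
            # qualify the nearest preceding node or protocol.
            steps[-1]["attributes"][header] = value
    return steps


def parse_table(text):
    """Parse a tab-delimited table into one list of steps per data row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    headers = next(reader)
    return [parse_row(headers, row) for row in reader]


for step in parse_table(SAMPLE_TABLE)[0]:
    print(step["kind"], step["name"], step["attributes"])
```

Zipping headers with values, rather than building a dictionary per row, keeps repeated column headers such as several Protocol REF columns from overwriting each other.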
  24. • Make the semantics of ISAtab explicit, including materials & data entities & processes
      • Exploit the semantic annotations available in ISAtab datasets
      • Augment ISA syntax with new elements (e.g. groups), facilitating the understanding & querying of experimental design
      • Facilitate data integration & knowledge discovery/reasoning
  25. ISAtab datasets as linked data
      • Connect to the growing Linked Data universe
        RDF = Resource Description Framework, OWL = Web Ontology Language
      • Collaborations with ToxBank & the W3C Health Care & Life Sciences Interest Group (HCLSIG)
      <subject, predicate, object>
      <lipoprotein> <participates_in> <inflammatory response>
      <PRO:212342352> <BFO_0000056> <GO:0006954>
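The subject, predicate, object pattern on slide 25 maps directly onto a three-field record. Below is a minimal Python sketch using only the standard library; the `Triple` type, the `LABELS` table and the `match` helper are hypothetical names for illustration, and the identifiers are copied from the slide as written rather than checked against the ontologies.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Identifiers and labels exactly as they appear on the slide.
LABELS = {
    "PRO:212342352": "lipoprotein",
    "BFO_0000056": "participates_in",
    "GO:0006954": "inflammatory response",
}

TRIPLES = [Triple("PRO:212342352", "BFO_0000056", "GO:0006954")]


def readable(triple: Triple) -> str:
    """Render a triple with human-readable labels where they are known."""
    return " ".join(f"<{LABELS.get(term, term)}>" for term in triple)


def match(triples: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching every given position; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


for triple in match(TRIPLES, predicate="BFO_0000056"):
    print(readable(triple))
```

The identifier form is what machines exchange; the labelled form is what the second line of the slide shows to a human reader.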
  26. Diagram labels: ISAtab dataset; ISAtab Graph Parser; Analysis; ISA Mapping Parser
  27. ISA-OBO-mapping
  28. Graph for one faahKO sample: Saghantelian_1 → sample collection → KO1 → extraction → KO1_extract → mass spectrometry → ./cdf/KO/ko15.CDF
      Relations shown: has specified input, has specified output, derives from, type
      Types shown: material entity, processed material, material processing, information content entity
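The relations on slide 28 follow a regular pattern: each process has a specified input and a specified output, and the output derives from the input. The Python sketch below generates those statements and walks the "derives from" links back to the source. It assumes one reading of the figure, namely a single linear chain from Saghantelian_1 to the CDF file; the function names are hypothetical.

```python
# (process, input, output) steps, assuming the figure shows one linear chain.
PROCESSES = [
    ("sample collection", "Saghantelian_1", "KO1"),
    ("extraction", "KO1", "KO1_extract"),
    ("mass spectrometry", "KO1_extract", "./cdf/KO/ko15.CDF"),
]


def to_triples(processes):
    """Emit the three statements implied by each process step."""
    for process, source, product in processes:
        yield (process, "has specified input", source)
        yield (process, "has specified output", product)
        yield (product, "derives from", source)


def derives_from_chain(node, triples):
    """Follow 'derives from' links from a node back to its original source."""
    parents = {s: o for s, p, o in triples if p == "derives from"}
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


triples = list(to_triples(PROCESSES))
print(derives_from_chain("./cdf/KO/ko15.CDF", triples))
```

Walking the chain answers the provenance question the graph is meant to support: which material a given data file ultimately came from.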
  29. Increasing level of structure… …different target audiences
      • Notes in lab books (information for humans)
      • Spreadsheets & tables (ISAtab metadata)
      • Facts as RDF statements (information for machines)
  30. core organization in the UK Node
  31. ISA implementation at Harvard: http://discovery.hsci.harvard.edu/
  32. Implementation at the EBI: http://www.ebi.ac.uk/metabolights
      MetaboLights – an open access general-purpose repository for metabolomics studies and associated meta-data. Haug et al, 2012, Nucleic Acids Research
  33. The ISA ecosystem
  34. @isatools, @biosharing; isa-tools.org, isacommons.org, biosharing.org
