Drug Discovery- ELRIG -2012

  1. Community standards for reproducible and reusable research - fundamentals and challenges. Alejandra González-Beltrán, PhD, Senior Software Engineer, ISA Team, University of Oxford e-Research Centre, Oxford, UK. Drug Discovery 2012, Manchester, UK, September 6-7
  2. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  3. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  4. Roadmap. Reproducible & Reusable Bioscience Research. Principles & Challenges
  5. Roadmap. Reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. Principles & Challenges
  6. Roadmap. Reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Community Standards. Software Tools. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. Principles & Challenges
  7. Bioscience is multi-domain… health, env, agro, tox/pharma. § Interdisciplinary and integrative in character • need to deal with new and existing datasets • deal with a variety of data types. Source of the figure: EBI website
  8. From reusable data to reproducible research. To make the datasets comprehensible and interoperable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results. Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs. Community Standards. The International Conference on Systems Biology (ICSB), 22-28 August, 2008, Susanna-Assunta Sansone, www.ebi.ac.uk/net-project
  9. Different communities, different norms and standards, e.g.: use the same term to refer to the same ‘thing’; allow data to flow from one system to another; report the same core, essential information
  10. Different communities, different norms and standards, e.g.: use the same term to refer to the same ‘thing’; allow data to flow from one system to another; report the same core, essential information. Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage… hinders interoperability
  11. Guidelines for Information About Therapy Experiments (GIATE). Therapeutic Investigation Generic Model. Molecular Model, Cellular Model, Animal Model, Clinical Model (each appears twice in the diagram)
  12. Growing number of bioscience reporting standards: MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  13. Growing number of bioscience reporting standards. Counts shown: 303+, 150+, 130+. Sources shown: MIBBI, EQUATOR; BioPortal; Estimated. Databases, annotation, curation tools. MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  14. But… what do we know about them and how they are related? MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  15. But… what do we know about them and how they are related? • I use high throughput sequencing technologies, which ones are relevant to me? • Which tools and databases implement which standards? • How can I get involved to propose extensions or modifications? • What are the criteria to evaluate their status and value? • Which ones are mature enough for me to use or recommend? • Which formats support specific minimum information guidelines? • I work on plants, are these just for biomedical applications?
  16. A coherent, curated and searchable catalogue of data sharing resources • Bioscience standards and associated data-sharing policies, publications, tools and databases • Assessment criteria for usability and popularity of standards • Relationships among standards • Encouragement for communication & interaction among groups • Promoting interoperability & informed decisions about standards
  17. Standards compliance is challenging… Is it possible to achieve a common, structured representation of diverse bioscience experiments that: • transcends individual bioscience domains, but also • follows the appropriate community norms and standards?
  18. Structured description of datasets § Capture all salient features of the experimental workflow § Make annotation explicit and discoverable § Structure the descriptions for consistency, tracking § independent variables § dependent variables and using § resolvable identifiers and cross-references
  19. Not too much, not too little, just ‘right’ § We must strike a balance between sufficiency and practicability: • depth and breadth of information • burden to produce and maintain the information
  20. Metadata tracking framework, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats, used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT
  21. User community
  22. ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al., 2010)
  23. Empowering researchers to use standards
  24. Ontology Search and Tagging in Google Spreadsheets
  25. Ontology Search and Tagging in Google Spreadsheets
  26. ISA infrastructure & linked data • Work in progress to convert to RDF/OWL to connect to the growing Linked Data universe (RDF = Resource Description Framework, OWL = Web Ontology Language) • Collaborations with Toxbank & W3C HCLSIG. <subject, predicate, object>: <lipoprotein> <participates_in> <inflammatory response>; <PRO:212342352> <BFO_0000056> <GO:0006954>
  27. Increasing levels of structure… Notes in Lab Books (information for humans); Spreadsheets and Tables (the compromise); Facts as RDF statements (information for machines)
  28. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • systems biology • transcriptomics • toxicogenomics • also by communities working to build a library of cellular signatures. We aim to achieve a common representation of experimental content that transcends individual bioscience domains. Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054
  29. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: • environmental health • environmental genomics • metabolomics • metagenomics • nanotechnology • proteomics • stem cell discovery • systems biology • transcriptomics • toxicogenomics • also by communities working to build a library of cellular signatures. Some of the public groups/resources and some of the internal projects: Stem Cell Commons, Nanotechnology Informatics Working Group
  30. Implementation at Harvard (ISA): http://discovery.hsci.harvard.edu/
  31. Implementation at the EBI: http://www.ebi.ac.uk/metabolights
  32. Reasoning, visualization, analysis, browsing, integration, exchange, retrieval. Community Standards: Guidelines, GIATE, Formats, Terminologies. Software Tools. Well-annotated & Structured Data. Reproducible & Reusable Bioscience Research. Lack of coordination, fragmentation and uneven coverage. Standards-compliant data sharing is demanding and time-consuming
  33. @isatools @biosharing isa-tools.org isacommons.org biosharing.org
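
The <subject, predicate, object> statements on slide 26, and the step from spreadsheets to machine-readable facts on slide 27, can be sketched in plain Python. This is only an illustration under stated assumptions, not the ISA tools' own conversion: the identifiers and labels are copied from slide 26, while the `Triple` type, the helper functions and the example spreadsheet row are hypothetical names and data introduced here.

```python
from typing import Dict, List, NamedTuple


class Triple(NamedTuple):
    """One <subject, predicate, object> statement."""
    subject: str
    predicate: str
    obj: str


# Identifiers copied from slide 26, paired with the labels shown there.
LABELS = {
    "PRO:212342352": "lipoprotein",
    "BFO_0000056": "participates_in",
    "GO:0006954": "inflammatory response",
}


def render(triple: Triple) -> str:
    """Render a triple in the slide's angle-bracket notation."""
    return " ".join(f"<{part}>" for part in triple)


def with_labels(triple: Triple) -> Triple:
    """Swap each identifier for its label, leaving unknown terms unchanged."""
    return Triple(*(LABELS.get(part, part) for part in triple))


def row_to_triples(row: Dict[str, str], subject_column: str) -> List[Triple]:
    """Turn one spreadsheet row into triples (hypothetical mapping):
    each remaining column name becomes a predicate and its non-empty
    cell value becomes the object."""
    subject = row[subject_column]
    return [
        Triple(subject, column, value)
        for column, value in row.items()
        if column != subject_column and value
    ]


# The machine-oriented statement from slide 26 and its human-readable form.
machine = Triple("PRO:212342352", "BFO_0000056", "GO:0006954")
print(render(machine))
print(render(with_labels(machine)))

# Hypothetical spreadsheet row; column names and values are invented.
row = {"sample": "sample_1", "organism": "Homo sapiens", "dose": ""}
for triple in row_to_triples(row, "sample"):
    print(render(triple))
```

Keeping identifiers and labels in separate layers mirrors the slide's two renderings of the same fact: the identifier form is what machines can link across resources, and the label form is what a curator reads.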