Drug Discovery - ELRIG - 2012

  • 1.
    Community-standards for reproducible and reusable research - fundamentals and challenges
    Alejandra González-Beltrán, PhD
    Senior Software Engineer, ISA Team
    University of Oxford e-Research Centre, Oxford, UK
    Drug Discovery 2012, Manchester, UK, September 6-7
  • 2.
    Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 3.
    Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • 4.
    Roadmap
    Reproducible & Reusable Bioscience Research
    Principles & Challenges
  • 5.
    Roadmap
    reasoning, visualization, analysis, browsing, integration, exchange, retrieval
    Well-annotated & Structured Data
    Reproducible & Reusable Bioscience Research
    Principles & Challenges
  • 6.
    Roadmap
    reasoning, visualization, analysis, browsing, integration, exchange, retrieval
    Community Standards
    Software Tools
    Well-annotated & Structured Data
    Reproducible & Reusable Bioscience Research
    Principles & Challenges
  • 7.
    Bioscience is multi-domain…
    [Figure: health, env, agro, tox/pharma. Source of the figure: EBI website]
    §  Interdisciplinary and integrative in character
       •  need to deal with new and existing datasets
       •  deal with a variety of data types
  • 8.
    From reusable data to reproducible research
    To make the datasets comprehensible and interoperable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results.
    Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.
    Community Standards
    The International Conference on Systems Biology (ICSB), 22-28 August, 2008, Susanna-Assunta Sansone, www.ebi.ac.uk/net-project
  • 9.
    Different communities, different norms and standards, e.g.:
    •  allow data to flow from one system to another
    •  use the same term to refer to the same ‘thing’
    •  report the same core, essential information
  • 10.
    Different communities, different norms and standards, e.g.:
    •  allow data to flow from one system to another
    •  use the same term to refer to the same ‘thing’
    •  report the same core, essential information
    Challenges: lack of interaction and coordination, duplication of effort, fragmentation and uneven coverage… hinders interoperability
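The "same term for the same thing" point can be illustrated with a small sketch. It assumes a hand-written synonym table, which is hypothetical; a real project would more likely query an ontology service. The identifier GO:0006954 is the one this deck uses for "inflammatory response" on the linked-data slide. The synonyms and the function name are made up for illustration.

```python
# Hypothetical synonym table: free-text labels -> one shared identifier.
SYNONYMS = {
    "inflammatory response": "GO:0006954",
    "inflammation response": "GO:0006954",
    "inflammatory reaction": "GO:0006954",
}


def normalize_term(label):
    """Return the shared identifier for a label, or None if it is unknown."""
    key = " ".join(label.lower().split())  # ignore case and extra whitespace
    return SYNONYMS.get(key)


if __name__ == "__main__":
    for raw in ["Inflammatory  Response", "inflammatory reaction", "fever"]:
        print(raw, "->", normalize_term(raw))
```

Returning None for unknown labels, rather than guessing, keeps unmapped terms visible so that a curator can decide how to handle them.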
  • 11.
    Guidelines for Information About Therapy Experiments (GIATE)
    [Figure: Therapeutic Investigation Generic Model, with Molecular, Cellular, Animal and Clinical Models]
  • 12.
    Growing number of bioscience reporting standards
    MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 13.
    Growing number of bioscience reporting standards
    130+ (estimated); 303+ (source: BioPortal); 150+ (source: MIBBI, EQUATOR)
    Databases, annotation, curation tools
    MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 14.
    But… what do we know about them and how they are related?
    MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, GIATE, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 15.
    But… what do we know about them and how they are related?
    •  Which tools and databases implement which standards?
    •  I use high throughput sequencing technologies, which ones are relevant to me?
    •  What are the criteria to evaluate their status and value?
    •  How can I get involved to propose extensions or modifications?
    •  Which ones are mature enough for me to use or recommend?
    •  Which formats support specific minimum information guidelines?
    •  I work on plants, are these just for biomedical applications?
  • 16.
    A coherent, curated and searchable catalogue of data sharing resources
    •  Bioscience standards and associated data-sharing policies, publications, tools and databases
    •  Assessment criteria for usability and popularity of standards
    •  Relationships among standards
    •  Encouragement for communication & interaction among groups
    •  Promoting interoperability & informed decisions about standards
  • 17.
    Standards compliance is challenging…
    Is it possible to achieve a common, structured representation of diverse bioscience experiments that:
    •  transcends individual bioscience domains, but also
    •  follows the appropriate community norms and standards?
  • 18.
    Structured description of datasets
    §  Capture all salient features of the experimental workflow
    §  Make annotation explicit and discoverable
    §  Structure the descriptions for consistency, tracking
        §  independent variables
        §  dependent variables
       and using
        §  resolvable identifiers and cross-references
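As a rough illustration of a structured description, the sketch below models one sample with its independent variables, dependent variables and cross-references. All class names, field names and values are hypothetical and chosen only for this example; they are not part of any standard or of the ISA tools.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str       # e.g. "dose"
    value: str      # value recorded for this sample
    unit: str = ""  # optional unit, appended to the value when printed


@dataclass
class SampleDescription:
    sample_id: str  # ideally a resolvable identifier
    independent: list = field(default_factory=list)  # what the experimenter sets
    dependent: list = field(default_factory=list)    # what is measured
    cross_refs: list = field(default_factory=list)   # identifiers of related records

    def summary(self):
        """One-line, human-readable view of the structured record."""
        factors = ", ".join(f"{v.name}={v.value}{v.unit}" for v in self.independent)
        responses = ", ".join(f"{v.name}={v.value}{v.unit}" for v in self.dependent)
        return f"{self.sample_id}: [{factors}] -> [{responses}]"


if __name__ == "__main__":
    sample = SampleDescription(
        sample_id="sample-001",  # made-up identifier
        independent=[Variable("dose", "10", "mg")],
        dependent=[Variable("expression", "2.5")],
        cross_refs=["GO:0006954"],
    )
    print(sample.summary())
```

Keeping independent and dependent variables in separate fields is one simple way to make the experimental design explicit instead of leaving it implied by column names.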
  • 19.
    Not too much, not too little, just ‘right’
    §  We must strike a balance between sufficiency and practicability:
       •  depth and breadth of information
       •  burden to produce and maintain the information
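One way to picture the balance between sufficiency and practicability is a checklist with a small set of required fields and a few optional ones. The sketch below is only an assumption about how such a check might look; the field names are invented and do not come from any published minimum-information guideline.

```python
# Hypothetical minimum-information checklist: required vs optional fields.
REQUIRED = {"organism", "assay_type", "sample_id"}
OPTIONAL = {"instrument", "operator"}


def check_record(record):
    """Return (missing required fields, fields outside the checklist), both sorted."""
    # Empty strings and None count as "not reported".
    present = {k for k, v in record.items() if v not in (None, "")}
    missing = sorted(REQUIRED - present)
    extra = sorted(present - REQUIRED - OPTIONAL)
    return missing, extra


if __name__ == "__main__":
    record = {"organism": "Homo sapiens", "sample_id": "s1", "assay_type": "", "notes": "x"}
    print(check_record(record))
```

A short required set keeps the reporting burden low, while the list of unexpected fields shows where the checklist may need to grow.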
  • 20.
    Metadata tracking framework, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats, used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT
  • 21.
    user community
  • 22.
    ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level (Rocca-Serra et al., 2010)
  • 23.
    empowering researchers to use standards
  • 24.
    Ontology Search and Tagging in Google Spreadsheets
  • 25.
    Ontology Search and Tagging in Google Spreadsheets
  • 26.
    ISA infrastructure & linked data
    •  Work in progress to convert to RDF/OWL to connect to the growing Linked Data universe
       RDF = Resource Description Framework, OWL = Web Ontology Language
    •  Collaborations with Toxbank & W3C HCLSIG
    <subject, predicate, object>
    <lipoprotein> <participates_in> <inflammatory response>
    <PRO:212342352> <BFO_0000056> <GO:0006954>
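The step from the labelled triple to the identifier triple can be sketched in a few lines. This is a minimal illustration using plain Python tuples, not an RDF library. The label-to-identifier pairs are copied from the slide, while the class and function names are assumptions made for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Label -> identifier pairs copied from the slide.
IDENTIFIERS = {
    "lipoprotein": "PRO:212342352",
    "participates_in": "BFO_0000056",
    "inflammatory response": "GO:0006954",
}


def to_identifiers(triple):
    """Swap each human-readable label for its identifier."""
    return Triple(*(IDENTIFIERS[part] for part in triple))


def render(triple):
    """Format a triple in the angle-bracket notation used on the slide."""
    return " ".join(f"<{part}>" for part in triple)


if __name__ == "__main__":
    labelled = Triple("lipoprotein", "participates_in", "inflammatory response")
    print(render(labelled))
    print(render(to_identifiers(labelled)))
```

The labelled form is easier for people to read, and the identifier form is what lets separate datasets agree that they refer to the same entities.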
  • 27.
    Increasing levels of structure…
    Notes in Lab Books (information for humans)
    Spreadsheets and Tables (the compromise)
    Facts as RDF statements (information for machines)
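The move from the spreadsheet level to the statement level can be sketched as follows. The example assumes a tiny tab-delimited table with made-up content and turns each cell into a (row key, column name, value) statement. These are plain tuples that only mimic the shape of RDF statements; the table, column names and function name are all hypothetical.

```python
import csv
import io

# A tiny tab-delimited table standing in for a spreadsheet (made-up content).
TABLE = "sample\torganism\tassay\ns1\tHomo sapiens\tmicroarray\ns2\tMus musculus\tsequencing\n"


def rows_to_triples(text, key_column="sample"):
    """Turn each non-key cell into a (row key, column name, cell value) statement."""
    triples = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                triples.append((subject, column, value))
    return triples


if __name__ == "__main__":
    for triple in rows_to_triples(TABLE):
        print(triple)
```

A real conversion would also replace the column names and values with resolvable identifiers, which this sketch leaves out.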
  • 28.
    A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
    •  environmental health
    •  environmental genomics
    •  metabolomics
    •  metagenomics
    •  nanotechnology
    •  proteomics
    •  stem cell discovery
    •  systems biology
    •  transcriptomics
    •  toxicogenomics
    •  also by communities working to build a library of cellular signatures
    We aim to achieve a common representation of experimental content that transcends individual bioscience domains
    Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054
  • 29.
    A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
    •  environmental health
    •  environmental genomics
    •  metabolomics
    •  metagenomics
    •  nanotechnology
    •  proteomics
    •  stem cell discovery
    •  systems biology
    •  transcriptomics
    •  toxicogenomics
    •  also by communities working to build a library of cellular signatures
    Some of the public groups/resources:
    Some of the internal projects:
    [Logos, including Stem Cell Commons and the Nanotechnology Informatics Working Group]
  • 30.
    Implementation at Harvard
    ISA
    http://discovery.hsci.harvard.edu/
  • 31.
    Implementation at the EBI
    http://www.ebi.ac.uk/metabolights
  • 32.
    reasoning, visualization, analysis, browsing, integration, exchange, retrieval
    Community Standards: Guidelines (GIATE), Formats, Terminologies
    Software Tools
    Well-annotated & Structured Data
    Reproducible & Reusable Bioscience Research
    Challenges:
    •  lack of coordination, fragmentation and uneven coverage
    •  Standards-compliant data sharing is demanding and time-consuming
  • 33.
    @isatools  @biosharing
    isa-tools.org  isacommons.org  biosharing.org