Sa sansone dccroadshow-nov2012: Delivering reproducible bioscience data by enabling biocuration at the source
Event page: http://www.dcc.ac.uk/events/data-management-roadshows/dcc-roadshow-london-2

  1. Delivering reproducible bioscience data by enabling biocuration at the source (www.slideshare.net/SusannaSansone). Susanna-Assunta Sansone, PhD, Principal Investigator and Team Leader, University of Oxford e-Research Centre, Oxford, UK; Academic Consultant, Open Access Data Products, Nature Publishing Group. Digital Curation Centre (DCC) 13th Regional Data Management Roadshow, London, 20 November 2012.
  2. University of Oxford e-Research Centre
  3. University of Oxford e-Research Centre: providing research computing and high-performance computing; integrating with national and international infrastructure; supporting leading-edge facilities through education and training.
  4. University of Oxford e-Research Centre: collaborating with European and wider international groups in, e.g., energy, radio astronomy, biological data federation, life sciences simulation, biodiversity, computational chemistry, neuroscience, digital humanities tools, digital music analysis, visualization, …
  5. My team’s activities and the stakeholders we work with: data management and biocuration; collaborative development of software, databases, standards and ontologies; in environmental genomics, stem cell discovery, metabolomics, systems biology, metagenomics, transcriptomics, nanotechnology, toxicogenomics, proteomics and environmental health.
  6. Outline: “The buzz around reproducible bioscience data: the communities and the standards”; “The reality from the buzz: challenges and exemplar project”
  7. [Image: http://www.flickr.com/photos/notbrucelee/8016189356/, CC BY]
  8. COMPREHENSIBLE
  9. COMPREHENSIBLE, INTEROPERABLE
  10. COMPREHENSIBLE, INTEROPERABLE, REUSABLE
  11. Experimental design, sample characteristic(s), experimental variable(s), technology(s), measurement(s), protocol(s), data file(s), ... (slide footer: The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project)
  12. We must strike a balance between the depth and breadth of information on the one hand and the information sufficient to reuse the data on the other. Capture all salient features of the experimental workflow; make annotation explicit and discoverable; structure the descriptions for consistency and tracking.
  13. Growing, worldwide movement for reproducible research. [Figure, source: http://ebbailey.wordpress.com: esoteric formats, lack of sufficient contextual information, and ad hoc or proprietary terminologies, each raising the questions: comprehensible? interoperable? reusable?] Researchers and bioinformaticians in both academic and commercial science, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structuring and enriching the annotation of entities of interest (e.g., genes, metabolites, phenotypes) and of experimental steps (e.g., provenance of study materials, technology and measurement types).
  14. Community mobilization to develop standards, e.g., to: use the same word and refer to the same ‘thing’; allow data to flow from one system to another; report the same core, essential information.
  15. Is this general mobilization good or bad? (Use the same word and refer to the same ‘thing’; allow data to flow from one system to another; report the same core, essential information.) Fragmentation of the standards is a major issue: focusing on particular communities’ interests, whether individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, to the development of (largely arbitrarily) different standards. This severely hinders the interoperability of databases and tools and, ultimately, the integration of datasets.
  16. Growing number of reporting standards: MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  17. Growing number of reporting standards: figure with the counts +303, +150 and +130 (sources: MIBBI, BioPortal, EQUATOR) and an estimated number of databases and annotation/curation tools, shown alongside the same standards listed on slide 16.
  18. But how much do we know about these standards? Which tools and databases implement which standards? I use high-throughput sequencing technologies: which ones are applicable to me? What are the criteria to evaluate their status and value? How can I get involved to propose extensions or modifications? Which ones are mature enough for me to use or recommend? I work on plants: are these just for biomedical applications?
  19. A catalogue to map the landscape of standards and the systems implementing them: over 400 bio-standards (public and in curation). Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  20. A coherent, curated and searchable catalogue of data-sharing resources: bioscience standards and associated data-sharing policies, publications, tools and databases; assessment criteria for the usability and popularity of standards; relationships among standards; encouragement for communication and interaction among groups; promotion of interoperability and of informed decisions about standards.
  21. Social engineering
  22. Ownership of open standards can be problematic in broad, grass-roots collaborations; improved models are required to encourage maintenance of and contributions to these efforts and to support their evolution.
  23. The extensive community liaison needs to be managed and funded; rewards and incentives need to be identified for all contributors.
  24. The cost of implementing a standards-supported data-sharing vision grows with the number of stakeholders that must operate synchronously.
  25. Funders are actively developing data policies
  26. Similar trend in the regulatory arena…
  27. … and in the commercial sector
  28. … the rise of data-driven journals, e.g.: [journal logos not preserved]; partnering with: [partner logos not preserved]
  29. [Figure: core organization in the UK node; UK Node (work in progress)]
  30. [Diagram: Community Standards and Software Tools → Well-annotated & Structured Data, supporting reasoning, visualization, analysis, browsing, integration, exchange and retrieval → Reproducible & Reusable Bioscience Research]
  31. An exemplar approach to the status quo: a grass-roots collaboration that works to facilitate the collection, curation and sharing of experiments using a common, structured representation of the experiments that transcends individual biological and technological domains and can be ‘configured’ to implement (several of) the community standards.
  32. [Figure: metadata tracking framework; user community]
  33. Collection, curation and sharing of bioscience experiments
  34. A growing ecosystem of over 30 public and internal resources uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including environmental health, stem cell discovery, environmental genomics, systems biology, metabolomics, transcriptomics, metagenomics, toxicogenomics, nanotechnology and proteomics; the framework is also used by communities working to build a library of cellular signatures. Towards interoperable bioscience data (Feb 2012): Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
  35. Implementations at Harvard: importance of a local community
  36. Implementations at Harvard: data sharing in ISA-Tab; importance of a local community
  37. Implementations at Harvard: data sharing in ISA-Tab; importance of a local community
  38. Implementation at the EBI
  39. Extensions of the Nanotechnology Informatics Working Group
  40. We must increase the level of annotation, moving from notes in lab books (information for humans), through spreadsheets and tables (the compromise), to facts as RDF statements (information for machines). Invest in curating and managing data at the source using a common metadata tracking framework, such as ISA; publicly available, community-developed terminologies; and sufficient contextual information recorded for the experimental steps. Progressively, datasets will become more comprehensible, interoperable, reproducible and (re)usable, underpinning future investigations.
  41. Collaborative approaches are highly valuable but take time. Development timeline, 2007-2012. Community involvement and uptake: 1st, 2nd and 3rd ISA-Tab workshops; user workshops/visits start; 1st public instance: Harvard Stem Cell Discovery Engine; other tools implement ISA-Tab; a growing number of systems start to adopt the ISA framework. Core developments: strawman ISA-Tab spec; ISA software v1; conversions to Pride-XML/SRA-XML/MAGE-Tab and more; final ISA-Tab spec; database instance at EBI; links to analysis tools start; RDF format starts. Publications: Omics data sharing (Science); workshop reports; ISA-Tab and ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA Commons (Nature Genetics).
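Slides 11 and 12 list the descriptors an experiment record should carry (experimental design, sample characteristics, experimental variables, technology, measurement, protocols, data files) and argue for structuring them so that the information needed for reuse can be checked. A minimal Python sketch of that idea follows, assuming a flat record; the class name `ExperimentRecord` and the `REQUIRED_FIELDS` checklist are hypothetical and are not taken from any published minimum-information standard.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExperimentRecord:
    """Hypothetical structured description of one experiment (descriptors from slide 11)."""
    experimental_design: str = ""
    sample_characteristics: dict[str, str] = field(default_factory=dict)
    experimental_variables: list[str] = field(default_factory=list)
    technology: str = ""
    measurement: str = ""
    protocols: list[str] = field(default_factory=list)
    data_files: list[str] = field(default_factory=list)


# Illustrative checklist only: fields that must be filled before the record is shared.
REQUIRED_FIELDS = frozenset(
    {"experimental_design", "technology", "measurement", "protocols", "data_files"}
)


def missing_fields(record: ExperimentRecord, required: frozenset[str] = REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if f.name in required and not getattr(record, f.name)]


record = ExperimentRecord(
    experimental_design="dose response",
    sample_characteristics={"organism": "Mus musculus"},
    technology="DNA microarray",
    measurement="transcription profiling",
    protocols=["RNA extraction"],
)
print(missing_fields(record))  # ['data_files']
```

Keeping the checklist separate from the record type mirrors the balance slide 12 describes: the record can hold as much depth as a lab wants, while the checklist encodes only what is needed for reuse.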
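The first goal on slide 14, using the same word to refer to the same ‘thing’, is what shared terminologies provide. As an illustration only, the sketch below maps free-text organism names onto NCBI Taxonomy identifiers. The synonym table and the function name `normalize_term` are assumptions; a real pipeline would look terms up in an ontology service rather than a hand-written dictionary.

```python
from typing import Optional

# Hypothetical synonym table; a real pipeline would query an ontology service instead.
SYNONYMS = {
    "mouse": "NCBITaxon:10090",
    "mus musculus": "NCBITaxon:10090",
    "human": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
}


def normalize_term(label: str) -> Optional[str]:
    """Map a free-text label to a shared identifier, or None if it needs manual curation."""
    key = " ".join(label.lower().split())  # case-fold and collapse whitespace
    return SYNONYMS.get(key)


labels = ["Mouse", "Mus  musculus", "H. sapiens"]
print([normalize_term(x) for x in labels])
# ['NCBITaxon:10090', 'NCBITaxon:10090', None]
```

Returning `None` rather than guessing leaves unmatched labels such as "H. sapiens" for a curator, which is where biocuration at the source would step in.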
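Slides 18 to 20 describe a searchable catalogue that records standards, the tools and databases implementing them, and the relationships among them, so that questions such as "which tools and databases implement which standards?" can be answered. The sketch below is a hypothetical in-memory model of such a catalogue. The `Entry`/`Catalogue` names and the relation labels `implements` and `converts_to` are assumptions, and the sample entries are illustrative rather than an extract of any real catalogue.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    name: str
    kind: str  # e.g. "reporting guideline", "format", "terminology", "database"
    domains: set[str] = field(default_factory=set)
    relations: dict[str, set[str]] = field(default_factory=dict)  # relation -> target names


class Catalogue:
    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str, kind: Optional[str] = None) -> list[str]:
        """Case-insensitive match on name or domain, optionally filtered by kind."""
        k = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if (kind is None or e.kind == kind)
            and (k in e.name.lower() or any(k in d.lower() for d in e.domains))
        )

    def related(self, name: str, relation: str) -> set[str]:
        """Follow one relation type outward from an entry."""
        return set(self._entries[name].relations.get(relation, set()))

    def implementers(self, name: str) -> list[str]:
        """Entries that declare they implement the named standard."""
        return sorted(
            e.name for e in self._entries.values()
            if name in e.relations.get("implements", set())
        )


cat = Catalogue()
cat.add(Entry("MIAME", "reporting guideline", {"transcriptomics"}))
cat.add(Entry("MAGE-Tab", "format", {"transcriptomics"}, {"implements": {"MIAME"}}))
cat.add(Entry("ISA-Tab", "format", {"multi-domain"}, {"converts_to": {"MAGE-Tab"}}))
cat.add(Entry("ArrayExpress", "database", {"transcriptomics"}, {"implements": {"MAGE-Tab"}}))

print(cat.search("transcriptomics"))          # ['ArrayExpress', 'MAGE-Tab', 'MIAME']
print(cat.implementers("MAGE-Tab"))           # ['ArrayExpress']
print(cat.related("ISA-Tab", "converts_to"))  # {'MAGE-Tab'}
```

Storing relations as named edges, rather than fixed columns, lets new relationship types be added as the catalogue grows without changing the data model.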
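Slides 31 and 36 describe a common, structured representation that can be ‘configured’ to implement community standards, with data shared as ISA-Tab. ISA-Tab is tab-delimited, and a study table lists sources, their characteristics, the protocols applied and the resulting samples under headers such as `Source Name`, `Characteristics[...]`, `Protocol REF` and `Sample Name`. The sketch below writes such a table with Python's `csv` module and enforces a hypothetical `CONFIG` dictionary. That dictionary is not the ISA tools' own configuration format, and the sketch ignores investigation and assay files and the term-source columns of the full specification.

```python
import csv
import io

# Hypothetical "configuration": the characteristics a given community standard requires.
# (ISA tools use their own configuration files; this dict only mimics the idea.)
CONFIG = {"required_characteristics": ["organism", "organism part"]}


def study_table(samples: list[dict], config: dict = CONFIG) -> str:
    """Render samples as a tab-delimited, ISA-Tab-style study table."""
    chars = config["required_characteristics"]
    header = (["Source Name"] + [f"Characteristics[{c}]" for c in chars]
              + ["Protocol REF", "Sample Name"])
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for s in samples:
        # Reject rows that do not satisfy the configured standard.
        missing = [c for c in chars if not s.get(c)]
        if missing:
            raise ValueError(f"{s.get('sample', '?')}: missing {missing}")
        writer.writerow([s["source"]] + [s[c] for c in chars] + [s["protocol"], s["sample"]])
    return buf.getvalue()


print(study_table([
    {"source": "mouse1", "organism": "Mus musculus", "organism part": "liver",
     "protocol": "sample collection", "sample": "mouse1.liver"},
]), end="")
```

Swapping in a different `CONFIG` changes which columns are required without touching the writer, which is the sense in which one representation can serve several community standards.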
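Slide 40 places spreadsheets and tables between lab-book notes and machine-readable RDF statements. As a minimal sketch of that last step, each spreadsheet row can become one subject with one triple per column, serialized as N-Triples. The namespace `http://example.org/`, the function names and the property-per-column mapping are assumptions; a real conversion would map columns to community terminologies rather than minting ad hoc properties.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for illustration only


def escape_literal(value: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(row: dict[str, str], id_column: str = "Sample Name") -> list[str]:
    """Turn one spreadsheet row into N-Triples: subject = row id, one triple per other column."""
    subject = f"<{BASE}sample/{quote(row[id_column])}>"
    triples = []
    for column, value in row.items():
        if column == id_column or not value:
            continue  # skip the identifier column and empty cells
        predicate = f"<{BASE}property/{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


row = {"Sample Name": "mouse1.liver", "organism": "Mus musculus", "organism part": "liver"}
for triple in row_to_ntriples(row):
    print(triple)
```

Each printed line is a standalone statement, so rows from different spreadsheets can be concatenated and loaded into the same triple store.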