Susanna Sansone at DataCite: The ISA-Commons - experiences from the field

Susanna-Assunta Sansone's talk at the DataCite Summer meeting in Copenhagen on "The ISA-Commons - experiences from the field", 14th June 2012

Presentation Transcript

  • The ISA Commons: experiences from the bioscience field. Susanna-Assunta Sansone, PhD, Principal Investigator, Team Leader, University of Oxford e-Research Centre, Oxford, UK. http://uk.linkedin.com/in/sasansone #biosharing. DataCite Summer Meeting, DIGITAL RESEARCH DATA IN PRACTICE: solutions for improving discovery, access and use, June 14, 2012, Copenhagen
  • Reproducible research •  annotated research data and methods offer new discovery opportunities and prevent unnecessary repetition of work; •  improved data sharing underpins the science of the future; •  but... shared data have little or no value if they are not interpretable and, consequently, reusable. Image from datacite.org
  • Reproducibility. Ioannidis et al., Repeatability of published microarray gene expression analyses. Nature Genetics 41(2), 149-55 (2009) doi:10.1038/ng.295
  • Reproducibility
  • Across studies and groups
  • Reproducibility
  • NO to ‘data blobs’; YES to verifiable, complete and structured information. Image from datacite.org
  • Structured description of datasets •  Capture all salient features of the experimental workflow •  Make annotation explicit and discoverable •  Structure the descriptions for consistency, tracking independent and dependent variables using cross-references and resolvable identifiers
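The slide above asks for structured descriptions that separate independent variables (the factors varied) from dependent variables (the values measured) and annotate them with resolvable identifiers. A minimal sketch of that idea follows. The column names loosely follow ISA-Tab conventions (`Source Name`, `Characteristics[...]`, `Term Accession Number`, `Factor Value[...]`); the `Measurement[...]` column, the class names and the `to_tab` function are hypothetical illustrations, not part of the ISA tools.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class OntologyTerm:
    label: str      # human-readable term, e.g. "Rattus norvegicus"
    source: str     # term source (ontology) the term comes from
    accession: str  # resolvable identifier (URI) for the term


@dataclass
class SampleRecord:
    source_name: str
    sample_name: str
    organism: OntologyTerm
    # independent variables: the factors deliberately varied in the study
    factors: dict[str, str] = field(default_factory=dict)
    # dependent variables: the values measured for each sample
    measurements: dict[str, float] = field(default_factory=dict)


def to_tab(records: list[SampleRecord]) -> str:
    """Serialize records as one tab-delimited table with a fixed column order."""
    factor_keys = sorted({k for r in records for k in r.factors})
    measure_keys = sorted({k for r in records for k in r.measurements})
    header = (
        ["Source Name", "Characteristics[organism]", "Term Source REF",
         "Term Accession Number", "Sample Name"]
        + [f"Factor Value[{k}]" for k in factor_keys]
        # hypothetical column naming for measured values
        + [f"Measurement[{k}]" for k in measure_keys]
    )
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for r in records:
        writer.writerow(
            [r.source_name, r.organism.label, r.organism.source,
             r.organism.accession, r.sample_name]
            + [r.factors.get(k, "") for k in factor_keys]
            + [r.measurements.get(k, "") for k in measure_keys]
        )
    return buf.getvalue()


rat = OntologyTerm("Rattus norvegicus", "NCBITaxon",
                   "http://purl.obolibrary.org/obo/NCBITaxon_10116")
print(to_tab([SampleRecord("animal-1", "liver-1", rat, {"dose": "high"}, {"ALT": 42.0})]))
```

Keeping factors and measurements in separate, explicitly named columns is what lets a reader, or a program, tell which variables were controlled and which were observed.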
  • Not too much, not too little, just ‘right’. We must strike a balance between •  depth and breadth of information; and •  sufficient information required to reuse the data
  • Example of experiments by InnoMed PredTox, an FP6 public-private consortium. The International Conference on Systems Biology (ICSB), 22-28 August, 2008, Susanna-Assunta Sansone, www.ebi.ac.uk/net-project
  • Different communities, different norms and standards, e.g. to: use the same word and refer to the same ‘thing’; allow data to flow from one system to another; report the same core, essential information. Challenges: lack of coordination, fragmentation and uneven coverage
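"Use the same word and refer to the same ‘thing’" can be illustrated by mapping the free-text labels different groups use onto one shared identifier. The sketch below is a hypothetical lookup table (the `SYNONYMS` dictionary and both functions are illustrations, not an existing tool); the identifiers are NCBI Taxonomy IDs for human and mouse.

```python
from typing import Optional

# Hypothetical lookup table: free-text labels used by different groups,
# each mapped to one shared taxonomy identifier (NCBI Taxonomy IDs).
SYNONYMS = {
    "human": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
    "h. sapiens": "NCBITaxon:9606",
    "mouse": "NCBITaxon:10090",
    "mus musculus": "NCBITaxon:10090",
}


def normalize(label: str) -> Optional[str]:
    """Return the shared identifier for a label, or None if it is unknown."""
    key = " ".join(label.lower().split())  # case- and whitespace-insensitive
    return SYNONYMS.get(key)


def same_thing(a: str, b: str) -> bool:
    """True only if both labels resolve to the same known identifier."""
    id_a, id_b = normalize(a), normalize(b)
    return id_a is not None and id_a == id_b


print(same_thing("human", "H. sapiens"))
```

Unknown labels return `None` rather than a guess, so two records are only treated as describing the same thing when both resolve to a shared term.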
  • Growing number of reporting standards: +303, +150, +130 (sources: MIBBI and EQUATOR; BioPortal; estimated). MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • A catalogue to map the landscape of standards and the systems implementing them: over 400 bio-standards (public and in curation). Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
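A catalogue that maps standards to the systems implementing them is, at its core, a two-way index. The sketch below assumes a simple in-memory design; the `Catalogue` class and the system names in the usage lines are hypothetical, while the standard names and their kinds come from the slides (MIAME is a reporting guideline, MAGE-Tab an exchange format).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Standard:
    name: str
    kind: str  # e.g. "reporting guideline", "terminology", "exchange format"


class Catalogue:
    """Two-way index between standards and the systems that implement them."""

    def __init__(self) -> None:
        self._by_system: dict[str, set[Standard]] = defaultdict(set)
        self._by_standard: dict[Standard, set[str]] = defaultdict(set)

    def register(self, system: str, standard: Standard) -> None:
        self._by_system[system].add(standard)
        self._by_standard[standard].add(system)

    def standards_of(self, system: str, kind: Optional[str] = None) -> list[str]:
        # .get avoids creating empty entries for unknown systems
        found = self._by_system.get(system, set())
        return sorted(s.name for s in found if kind is None or s.kind == kind)

    def systems_implementing(self, name: str) -> list[str]:
        return sorted(
            system
            for standard, systems in self._by_standard.items()
            if standard.name == name
            for system in systems
        )


cat = Catalogue()
cat.register("repository_A", Standard("MIAME", "reporting guideline"))
cat.register("repository_A", Standard("MAGE-Tab", "exchange format"))
cat.register("tool_B", Standard("MAGE-Tab", "exchange format"))
print(cat.systems_implementing("MAGE-Tab"))
```

Indexing in both directions keeps the two questions a catalogue user asks ("what does this system support?" and "who implements this standard?") equally cheap.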
  • Bioscience is not one domain. Bioscience is interdisciplinary and integrative in character •  need to deal with new and existing datasets •  deal with a variety of data types. Source of the figure: EBI website
  • Is it possible to achieve a common, structured representation of diverse bioscience experiments that: •  transcends individual bioscience domains, but also •  follows the appropriate community norms and standards?
  • A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: •  environmental health •  stem cell discovery •  environmental genomics •  systems biology •  metabolomics •  transcriptomics •  metagenomics •  toxicogenomics •  nanotechnology •  proteomics •  and also by communities working to build a library of cellular signatures. We aim to achieve a common representation of experimental content that transcends individual bioscience domains. Sansone et al., Towards interoperable bioscience data. Nature Genetics 44, 121-126 (2012) doi:10.1038/ng.1054
  • A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework. Some of the public groups/resources and internal projects: Stem Cell Commons, Nanotechnology Informatics Working Group
  • Metadata tracking framework, designed to support the use of several standards (checklists, terminologies), with conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-XML, SRA-XML, SOFT. Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG
  • Empowering researchers to use standards. To mint DOIs
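Before a DOI is minted, the dataset description has to carry a minimum set of citation metadata. The sketch below assumes the core properties the DataCite Metadata Schema treats as mandatory (Identifier, Creator, Title, Publisher, PublicationYear); the snake_case field names and both functions are hypothetical, and the DOI pattern is a common heuristic (prefix `10.` plus a 4-9 digit registrant code), not the full DOI syntax.

```python
import re

# Core properties assumed mandatory (based on the DataCite Metadata Schema);
# the snake_case field names are hypothetical.
MANDATORY = ("identifier", "creators", "title", "publisher", "publication_year")

# Heuristic DOI check: "10." + registrant code + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def missing_fields(record: dict) -> list[str]:
    """List the mandatory fields that are absent or empty."""
    return [k for k in MANDATORY if not record.get(k)]


def is_doi_syntax(doi: str) -> bool:
    """Accept a bare DOI or one written with a 'doi:' prefix."""
    return bool(DOI_PATTERN.match(doi.removeprefix("doi:")))


record = {"identifier": "10.1038/ng.1054", "title": "Towards interoperable bioscience data"}
print(missing_fields(record), is_doi_syntax(record["identifier"]))
```

A check like this catches simple slips before registration, for example a DOI written as `0.1126/...` instead of `10.1126/...`.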
  • Towards interoperable bioscience data. Nature Genetics, Feb 2012, doi:10.1038/ng.1054. Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W. www.biosharing.org, www.isacommons.org
  • Development timeline, 2007-2012:
    • Community involvement and uptake: 1st, 2nd and 3rd ISA-Tab workshops; user workshops/visits start; 1st public instance, the Harvard Stem Cell Discovery Engine; other tools implement ISA-Tab; a growing number of systems start to adopt ISA-Tab
    • Core developments: strawman ISA-Tab spec; ISA software v1; final ISA-Tab spec; database instance at EBI; conversions to Pride-XML/SRA-XML/MAGE-Tab and more; links to analysis tools start; RDF format starts
    • Publications: workshop reports; Omics data sharing (Science); ISA software suite (Bioinformatics); Stem Cell Discovery Engine (NAR); ISA-Tab and ISA Commons (Nature Genetics)