Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways.

Prepared and presented by Jo McEntyre (EMBL_EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.

Published in: Science

  1. 1. Citing data in research articles: principles, implementation, challenges - and the benefits of changing our ways Jo McEntyre Europe PMC, EMBL-EBI www.ebi.ac.uk
  2. 2. Life Science Data
  3. 3. Familiar Complexity! Article | ‘Package’ | External Resources. “Recognized” data repos: file|structured record, Accession|DOI|API+Accession; Institutional repos: file|structured record, URL|DOI|API+Accession; Author database|‘website’: file|structured record, URL|DOI|API+Accession; Supp info tables/data: file, URL|DOI; Cross-reference; Dataset list; Ref to external resource; Ref to external resource; Reference list; Fig source data: file, URL|DOI; Fig (caption + graphic); Cross-reference; Ref to external resource. Adapted from Thomas Lemberger, EMBO
  4. 4. Europe PMC literature database Europe PMC • Abstracts: 30 million • Full-text articles: 3 million • Article citation counts • Grants • ORCIDs • Semantic annotation • Data citations • Data integration Europe PMC is a member of the PMC International Collaboration. Funded by 28 European funders of life science research
  5. 5. About EMBL-EBI • Part of the European Molecular Biology Laboratory • International, non-profit research institute • Europe’s hub for biological data services and research
  6. 6. Making data discoverable Labs around the world deposit data and we… Archive it Classify it Share it with other data providers Analyse, add value and integrate it …provide tools to help researchers use it A collaborative enterprise
  7. 7. Journal Data Publishing
  8. 8. Data Citation in Europe PMC full text Literature* Added-Value Submitted *OMIM, Clinical trials, GO Submission statements vs reuse? 260K
  9. 9. Data Citation Principles Engender Two Big Ideas "sound, reproducible scholarship rests upon a foundation of robust, accessible data" "data should be considered legitimate, citable products of research" These slides are adapted from: http://www.slideshare.net/joanstarr/data-citation-a-joint-declaration-
  10. 10. 1 Importance 2 Credit and Attribution 3 Evidence 4 Unique Identification 5 Access 6 Persistence 7 Specificity and Verifiability 8 Interoperability and flexibility Full Principles: https://www.force11.org/datacitation Joint Declaration on Data Citation Principles
  11. 11. Joint Declaration Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications. 1. Importance
  12. 12. Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data. 2. Credit and Attribution Joint Declaration
  13. 13. In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited. 3. Evidence Joint Declaration
  14. 14. A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community. 4. Unique identification etc.. !!! Joint Declaration
  15. 15. Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data. 5. Access Joint Declaration
  16. 16. Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe. 6. Persistence Joint Declaration
  17. 17. Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited. 7. Specificity and Verifiability Joint Declaration
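The "fixity" requirement in Principle 7 can be illustrated with a checksum comparison. This is a minimal sketch, assuming the citation metadata records a SHA-256 digest of the cited file; the function names, the sample data, and the choice of SHA-256 are illustrative assumptions and are not specified by the Joint Declaration.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def matches_cited_version(retrieved: bytes, cited_digest: str) -> bool:
    """True if the retrieved bytes are identical to the version that was cited."""
    return sha256_digest(retrieved) == cited_digest.lower()


# Hypothetical example: the digest is recorded when the data are first cited.
original = b"sample,value\nA,1\nB,2\n"
recorded = sha256_digest(original)

# Later retrievals are compared against the recorded digest.
unchanged = matches_cited_version(original, recorded)
changed = matches_cited_version(original + b"C,3\n", recorded)
print(unchanged, changed)
```

A digest only shows whether the bytes changed; identifying which version or time slice was cited still depends on the provenance information in the citation metadata.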
  18. 18. Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities. 8. Interoperability and flexibility Joint Declaration
  19. 19. Many organizational endorsements
  20. 20. An implementation example Principle 2: Credit and Attribution Principle 4, 5, 6: Unique ID Access Persistence Principle 7: Specificity and Verifiability Principle 8: Interoperability and flexibility Creators, Year, Dataset Title, DOI, Data Repository, version (Resolves to landing page with access to metadata, docs, and data) Slide from Mercè Crosas, Ph.D. Harvard University
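The element order on this slide (Creators, Year, Dataset Title, DOI, Data Repository, version) could be rendered along the following lines. This is a sketch only: the `DataCitation` class, the separator choices, and the sample record (including its DOI) are hypothetical and do not reproduce any particular repository's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    creators: List[str]
    year: int
    title: str
    doi: str
    repository: str
    version: Optional[str] = None

    def format(self) -> str:
        """Render: Creators, Year, Dataset Title, DOI, Data Repository, version."""
        parts = [
            "; ".join(self.creators),
            str(self.year),
            self.title,
            f"https://doi.org/{self.doi}",  # DOI expressed as a resolvable link
            self.repository,
        ]
        if self.version:  # version is optional and appended only when present
            parts.append(self.version)
        return ", ".join(parts)


# Hypothetical record, for illustration only.
example = DataCitation(
    creators=["Doe J", "Roe R"],
    year=2015,
    title="Example survey dataset",
    doi="10.1234/example",
    repository="Example Repository",
    version="V2",
)
print(example.format())
```

Keeping the fields separate until rendering means the same record can be output in whichever citation style a journal requires, which is in the spirit of Principle 8.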
  21. 21. http://europepmc.org/articles/PMC3089613 Large dataset:
  22. 22. http://europepmc.org/articles/PMC3535838
  23. 23. http://europepmc.org/articles/PMC3766260
  24. 24. http://europepmc.org/articles/PMC3704603
  25. 25. http://europepmc.org/articles/PMC3710810 Fig. 2
  26. 26. !! 2469 references !! http://europepmc.org/articles/PMC2672098
  27. 27. Examples of Implementations of Data Citations in Reference Lists
  28. 28. http://europepmc.org/articles/PMC3661987 <mixed-citation publication-type="other"> Occurrence in reference list: Occurrence in text: Tagged in reference list as:
  29. 29. http://europepmc.org/articles/PMC3646594 <mixed-citation publication-type="thesis"> Occurrence in text: Occurrence in reference list: Tagged in reference list as:
  30. 30. http://europepmc.org/articles/PMC3722494 <mixed-citation publication-type="webpage"> Also in this reference list: a non-DOI data citation Occurrence in text: Occurrence in reference list: Tagged in reference list as:
  31. 31. http://europepmc.org/articles/PMC3626513 <mixed-citation publication-type="journal"> Occurrence in text: Occurrence in reference list: Tagged in reference list as: Cite data generated in the course of the work described?
  32. 32. JATS support for data citation <mixed-citation publication-type='data'> <name><surname>Heinz</surname><given-names>D.W.</given-names></name>, <name><surname>Baase</surname><given-names>W.A.</given-names></name>, <etal>et al.</etal> <data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>. <source>PDB Europe</source>, accession <pub-id pub-id-type='accession' assigning-authority='pdb' xlink:href='http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L'>102l</pub-id>. <pub-id pub-id-type='doi' xlink:href='http://dx.doi.org/10.2210/pdb102l/pdb'>10.2210/pdb102l/pdb</pub-id> </mixed-citation>
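Tagging a reference as `publication-type='data'` makes it straightforward to pull data citations out of full text by machine. The following is a sketch using Python's standard `xml.etree.ElementTree`; the wrapping `<ref>` element (added here only to declare the `xlink` prefix), the `data_citations` function, and the shape of the returned dictionaries are assumptions for illustration, not how Europe PMC processes articles.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The citation from the slide, wrapped so that the xlink prefix is declared.
JATS = """<ref xmlns:xlink="http://www.w3.org/1999/xlink">
<mixed-citation publication-type="data">
<name><surname>Heinz</surname><given-names>D.W.</given-names></name>,
<name><surname>Baase</surname><given-names>W.A.</given-names></name>,
<etal>et al.</etal>
<data-title>How amino-acid insertions are allowed in an alpha-helix of T4 lysozyme</data-title>.
<source>PDB Europe</source>, accession
<pub-id pub-id-type="accession" assigning-authority="pdb"
 xlink:href="http://www.ebi.ac.uk/pdbe/entry/search/index?text:102L">102l</pub-id>.
<pub-id pub-id-type="doi"
 xlink:href="http://dx.doi.org/10.2210/pdb102l/pdb">10.2210/pdb102l/pdb</pub-id>
</mixed-citation>
</ref>"""


def data_citations(xml_text):
    """Return one dict per mixed-citation tagged publication-type='data'."""
    root = ET.fromstring(xml_text)
    found = []
    for cit in root.iter("mixed-citation"):
        if cit.get("publication-type") != "data":
            continue  # skip journal, thesis, webpage and other reference types
        pub_ids = list(cit.iter("pub-id"))
        found.append({
            "title": cit.findtext("data-title"),
            "source": cit.findtext("source"),
            # identifier type -> identifier value (e.g. accession, doi)
            "ids": {p.get("pub-id-type"): (p.text or "").strip() for p in pub_ids},
            # identifier type -> link given in the xlink:href attribute
            "links": {p.get("pub-id-type"): p.get(f"{{{XLINK}}}href") for p in pub_ids},
        })
    return found


result = data_citations(JATS)
print(result[0]["ids"])
```

References tagged as "other", "thesis", "webpage" or "journal", as in the preceding examples, would be skipped by this filter, which is the practical argument for a dedicated `data` type.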
  33. 33. Minimal, maximal & extensible citation: Resource name, ID; Resource name, Resolution ‘template’, ID; Author list, Resource name, Resolution ‘template’, ID, Time?; Author list, Resource name, Resolution ‘template’, ID, Time? For example: new data vs pre-existing data. For example: version. Thomas Lemberger, EMBO
  34. 34. Integrated Research Reused from: seier+seier, Flickr Reused from: Images Money, Flickr Articles Data People Institutions Funders
  35. 35. A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community. 4. Unique identification etc.. Joint Declaration
  36. 36. 1. Discoverability through accessibility • Deposit in a public/open database • Where possible, structured archive (e.g. PDB, ENA) >> unstructured archive (e.g. Zenodo, Figshare) • Uniquely identify it: PID, Accession number, DOI, ROI • Give it context: metadata (and more) • All of the above = citable =
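A unique identifier becomes "machine actionable" once it can be turned into a resolvable link. The sketch below assumes two resolver patterns: `https://doi.org/<doi>` for DOIs and the identifiers.org compact form `https://identifiers.org/<namespace>:<accession>` for accession numbers. The `resolver_url` function and its parameters are hypothetical, and the namespace for a given database would need to be checked against the resolver's registry.

```python
def resolver_url(kind: str, identifier: str, namespace: str = "") -> str:
    """Build a resolvable URL for a DOI or a database accession number."""
    if kind == "doi":
        return f"https://doi.org/{identifier}"
    if kind == "accession":
        if not namespace:
            # An accession is only unique within its database, so the
            # database namespace must be supplied alongside it.
            raise ValueError("an accession needs a database namespace")
        return f"https://identifiers.org/{namespace}:{identifier}"
    raise ValueError(f"unsupported identifier kind: {kind}")


# Identifiers taken from the JATS example elsewhere in these slides.
print(resolver_url("doi", "10.2210/pdb102l/pdb"))
print(resolver_url("accession", "102l", namespace="pdb"))
```

The accession branch shows why an accession on its own is not globally unique: it needs the assigning database alongside it, which the JATS example captures with the `assigning-authority` attribute.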
  37. 37. 2. Discoverability through structured data Structured data is one of the true enablers of life science: - Discovery of homology between genes across species - Predicting function based on protein folds • Structured data can be cross-analysed and compared by algorithm, and it encourages the development of new products and tools
  38. 38. Structured data is good value for money Annual cost of generating new protein structure data in labs around the world Annual cost of maintaining it in a central database
  39. 39. Degrees of Data Unstructured/semi-structured Structured Added Value Metadata A picture of a graph A spreadsheet of my results A record in a DNA sequence database A graphical display of a genome A narrative with citations, pictures and attachments Article
  40. 40. Metadata – critical to discoverability. Generic: title, submitters, date, file format, version → citation, basic search. Example: Wagner F.F., 23-APR-2002, TPA: Homo sapiens SMP1 gene, RHD gene and RHCE gene, INSDC, 14-NOV-2006 (Rel. 89, Last updated, Version 7). BN000065. Specific: organism, tissue, assay, page number … → deep search, analysis, computation
  41. 41. BioStudyEBI BioStudy database for unstructured data Study Publications Ontologies Data files Other DBs Metadata Other DBs
  42. 42. Elixir: An international distributed infrastructure for • Data • Standards • Tools • Compute • Training • Industry
  43. 43. THE END
