
Publishing perspectives on data management & future directions


Professor Virginia Barbour's presentation at the Research Integrity Advisor Data Management Workshop, Brisbane, 31 March 2017.

  1. Publishing perspectives on data management & future directions. Research Integrity Advisors Data Management Workshop, Friday 31 March. Virginia Barbour, Director, AOASG. ORCID: 0000-0002-2358-2440. ginny.barbour@qut.edu.au
  2. My roles: Director, Australasian Open Access Strategy Group; Chair, Committee on Publication Ethics (COPE); Editor, PLOS Medicine, then Editorial Director, PLOS, 2004–2015; involved in publishing initiatives including AllTrials and reporting guidelines. Joint appointment between the Office of Research Ethics and Integrity and the Division of Technology, Information and Library Services, QUT.
  3. Journals’ interest in data: Background, Motives, Practicalities. (Images: https://commons.wikimedia.org/wiki/File:Network-mapping.gif, https://commons.wikimedia.org/wiki/File:Question_mark_1.svg)
  4. Background. (Images: https://commons.wikimedia.org/wiki/File:Network-mapping.gif, https://commons.wikimedia.org/wiki/File:Question_mark_1.svg)
  5. This is the age of data: “Today, the CMS Collaboration at CERN has released more than 300 terabytes (TB) of high-quality open data. These include over 100 TB, or 2.5 inverse femtobarns (fb⁻¹), of data from proton collisions.” (Image: https://www.flickr.com/photos/jerry-raia/13522426525/in/photostream/)
  6. Journals may see the problem first, but they are not the source of the problem. (Image: https://www.flickr.com/photos/studiomiguel/3946174063)
  7. Data are often an issue in ethics cases.
  8. Cases can be complex.
  9. Classification of COPE cases, 1997–2012: categories with >7 instances in a 4-year period.
  10. Increasing number of cases relevant to data. Data: the top categories over 16 years were fabrication (17%) and selective/misleading reporting/interpretation (13%); unauthorized use and image manipulation were high in 2009–12. Correction of the literature: retractions 47%, corrections 27%, expressions of concern 11%, disputes 9%, corrigenda & errata 6%.
  11. Poor data management scuppers research: a case study. “Dear Editor, In xxx, yyy published my colleagues’ and my article. Since the manuscript’s publication, we have been working on other, unrelated studies using the same database. When results in these new, unrelated studies were implausible, I undertook an intensive, several-weeks-long investigation … I found we had failed to load 8 files of data into the dataset. This mistake resulted in the under-reporting of xxx … this mistake occurred despite the intensive quality checks we have in place to ensure data quality and accuracy. We sincerely apologize for these data issues and are committed to correcting the article…”
  12. Poor data management leads to an accusation of research misconduct: a case study. A student submitted a paper to a journal as part of his PhD work. The research was data heavy: it was based on digital scans of cell images. The paper was published. Six months later a reader noted an anomaly and asked the journal for the underlying data; the journal in turn asked the author. The PhD student had moved on. None of his data had been stored securely at his previous institution and it could not be found. The journal felt that the lack of available data made the paper unreliable and asked the institution to investigate whether misconduct had occurred. The investigation found that the student had asked repeatedly for a place to store his data but the university had not been able to provide one. The university accepted responsibility, and the investigation led to the development of a data management policy there. The student was exonerated.
  13. Motives. (Images: https://commons.wikimedia.org/wiki/File:Network-mapping.gif, https://commons.wikimedia.org/wiki/File:Question_mark_1.svg)
  14. Institutions want data managed. Journals want data published.
  15. From: How Does the Availability of Research Data Change With Time Since Publication? Timothy H. Vines and colleagues, Abstract (podium), Peer Review Congress, 2013.
  16. The ideal situation: do some research; write a narrative description that is inextricably linked to the data and methods; produce an integrated collection of methods, results, data, and metadata; store all data in an accessible, usable format, linked to the publication; facilitate re-use and replication by people or machines.
  17. What we often have at journals • Unextractable data • Everything “extra” in one (unreadable) file • Third-party licenses • Proprietary data • No metadata
  18. Data availability in research papers allows: replication; validation; new analysis; better interpretation; inclusion in meta-analyses; facilitation of reproducibility of research; closer scrutiny of published work; better ‘bang for the buck’ out of research investment.
  19. Practicalities. (Images: https://commons.wikimedia.org/wiki/File:Network-mapping.gif, https://commons.wikimedia.org/wiki/File:Question_mark_1.svg)
  20. “The evidence shows that the current research data policy ecosystem is in critical need of standardization and harmonization.” How many journals have a research data policy? All journals: 52.4% full policy, 23.2% partial policy, 23.2% no policy. Science journals: 64.8% full, 14.4% partial, 18.4% no policy. Social science journals: 40% full, 32% partial, 28% no policy. Data source: Linda Naughton, JISC Journal Research Data Policy Bank project presentation (n = 250). Slide: Iain Hrynaszkiewicz.
  21. Not all research data are Open Data. Different levels of openness in research data publishing, from least to most open: 1. Accessible only to an individual researcher/group 2. Accessible to others on (reasonable) request 3. Published as electronic supplementary material 4. Deposited in a general or institutional data repository (e.g. figshare) 5. Deposited in a subject/community-specific data repository.
  22. Are researchers sharing research data? Wiley data sharing survey: 2,886 responses (3.2% response rate); 52% had shared/published data. Data publishing: 67% via supplementary material in journals; 28% via an institutional repository; 19% via a discipline-specific data repository; 6% via a general-purpose repository such as Dryad or figshare. Data sharing (informal): 57% sharing at a conference; 42% sharing on request via email, direct contact, etc.; 37% via a personal, institutional, or project website. Slide from Iain Hrynaszkiewicz.
  23. Data management is largely regarded by academics as: • Boring • A waste of time • Expensive • Hard • Confusing
  24. They need to be persuaded that it is not boring, a waste of time, expensive, hard, or confusing, but rather: • Part of the job • Time saving • Cost effective • Easy • Rewarded
  25. What are publishers doing about it? • Content types, e.g. data articles and journals • Credit and incentives, e.g. data citation and data articles • Encouraging reuse, e.g. open licenses • Improving data quality, e.g. data peer review, community standards and repositories • Data discoverability, e.g. repository partnerships, linking, integration with submission systems and research data metadata • Raising awareness, e.g. editorials, outreach • Guidance, e.g. information for authors • Policy – and its implementation. Slide: Iain Hrynaszkiewicz.
  26. Journal data policy landscape, from weakest to strongest: • Nothing stated • Data sharing encouraged • Data sharing implied as a condition of submission/publication, with mandates for specific data types (e.g. Nature pre-2016) • Mandated data availability statements in every paper and mandates for specific data types (Royal Society, BioMed Central, Palgrave Communications, Nature 2016–) • Mandated data sharing for all, with exceptions, with a statement in the paper (PLOS, BMJ) • Mandated data sharing for all, with a statement and link to data (e.g. American Economic Review) • Mandated open data and data citation as a condition of submission (e.g. F1000Research). Adapted from Iain Hrynaszkiewicz.
  27. “PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception.”
  28. References • Naughton, L. & Kernohan, D. (2016). Making sense of journal research data policies. Insights, 29(1), pp. 84–89. DOI: http://doi.org/10.1629/uksg.284 • Lin, J. & Strasser, C. (2014). Recommendations for the Role of Publishers in Access to Data. PLoS Biol 12(10): e1001975. doi:10.1371/journal.pbio.1001975 • Hrynaszkiewicz, I., Li, P. & Edmunds, S.C. (2014). Open science and the role of publishers in reproducible research. In: Stodden, V., Leisch, F. & Peng, R.D. (editors), Implementing Reproducible Research. CRC Press. Public copy: https://osf.io/35s9d/ • https://scholarlykitchen.files.wordpress.com/2014/11/researcher-data-insights-infographic-final.pdf
