Enhance your research impact through open science

Gareth Knight
Research Data Manager
Library & Archives Service
researchdatamanagement@lshtm.ac.uk

Open Science
A broad movement that seeks to improve the quality of
research through greater:
• Transparency: Ensure methods are clearly explained and made
available earlier
• Consistency: Common standards, tools and services are used to
perform analysis.
• Collaboration: Opportunities are available for external
contribution & collaboration on research
• Access: All resources necessary to recreate the analysis are
made available in a form that enables verification & reuse
(Summary: it’s science with the benefit of 21st-century tools)

Reproducibility Crisis
Vines et al. (2014) investigated data availability for 516 articles
published 2 to 22 years earlier: the odds of a dataset being
obtainable fell by 17% per year
A 2016 Nature survey revealed that 52% of 1,576 surveyed researchers
considered there to be a 'significant' reproducibility crisis in
science.
• Approx. 68% of respondents had failed to reproduce a medical experiment.
Research replication is time-consuming and expensive
• Cancer Biology: https://osf.io/e81xl/wiki/home/
• Psychological Science: https://osf.io/ezcuj/wiki/home/
Retraction Watch lists 18,000+ papers that have been retracted,
many as a result of faulty science
Vines et al. (2014) https://doi.org/10.1016/j.cub.2013.11.014
Nature (2016) https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
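A 17% yearly fall in the odds compounds: if the odds are multiplied by 0.83 each year, after t years they are the starting odds × 0.83^t. A minimal sketch of that arithmetic, assuming a constant multiplicative decline and a hypothetical starting probability of 0.9 (neither is taken from the paper):

```python
# Sketch: compound a constant 17% yearly fall in the odds of a dataset being obtainable.
# Assumption: the decline is exactly multiplicative and constant across years.

ANNUAL_ODDS_RATIO = 1 - 0.17  # odds are multiplied by 0.83 each year


def relative_odds(years: float) -> float:
    """Odds after `years`, relative to the odds at the starting point."""
    return ANNUAL_ODDS_RATIO ** years


def probability_after(p0: float, years: float) -> float:
    """Convert a hypothetical starting probability p0 into the probability after `years`."""
    odds0 = p0 / (1 - p0)
    odds = odds0 * relative_odds(years)
    return odds / (1 + odds)


if __name__ == "__main__":
    for years in (2, 10, 22):
        print(f"{years:>2} years: odds x{relative_odds(years):.3f}, "
              f"p = {probability_after(0.9, years):.2f} (if p0 = 0.9)")
```

Under this assumption the odds after a decade are roughly 0.16 of their starting value.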

What are the benefits of open science?
Analysis of open research practices and motivations of
583 Wellcome & 259 ESRC funded researchers:
• Improved visibility of research
• More publications
• Higher citation rate – See Piwowar & Vision (2013)
• Contribute to academic profile
• Career benefits (e.g. promotion)
• New collaborations
Van den Eynden, V. et al. (2016) Towards Open Research: Practices, experiences, barriers and
opportunities. Wellcome Trust. https://doi.org/10.6084/m9.figshare.4055448
Piwowar, H.A. & Vision, T.J. (2013) Data reuse and the open data citation advantage. PeerJ 1:e175.
https://doi.org/10.7717/peerj.175

Open Science by Design
Plan → Collect → Manage → Analyse → Publish
Open science opportunities:
• Enhanced research standards
• Open Education Resources
• Open software
• Citizen science & peer review opportunities
• Open access
• Reusable resources
Icon: https://www.flaticon.com/free-icon/scientist_857648

Research Objectives
Research is reviewed for many purposes:
• Verify: check the analysis to confirm conclusions are valid
• Replicate: apply the same methods in a different environment to get the
same result
• Reproduce: apply the same methods with a different setup
• Reuse: use the same data for different research
What steps do you take to ensure your research is easier for others to
verify, replicate, reproduce or reuse?
The Difference (xkcd): https://xkcd.com/242/

Plan for openness from the outset
Plan
• Be aware of requirements
• Consider community engagement opportunities
• Document the research protocol & publish it
Data collection
• Inform participants and relevant stakeholders
• Acquire raw data in electronic form using secure systems (e.g. ODK)
Data management
• Organise resources logically
• Ensure raw data is read-only
• Assign unique IDs to relevant items
Data processing
• Automate processing activities (as far as possible) in an open format so
they can be re-applied
• Document the activities performed to create an audit trail
Data analysis
• Provide opportunities for relevant individuals to contribute
• Store the resources used to underpin the analysis (including those used
to produce graphs)
Reporting
• Consider how resources can be made accessible
• Ensure resources are curated & accessible in the long term
https://doi.org/10.1371/journal.pcbi.1003285
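Two of the data management and processing steps above, keeping raw data read-only and recording an audit trail, can be partly automated. A minimal sketch, assuming a hypothetical CSV audit log (timestamp, action, file, SHA-256 checksum); the log layout is illustrative rather than any required standard, and on Windows Python's chmod only toggles the read-only flag:

```python
# Sketch: protect a raw data file and append an audit-trail entry for it.
# Assumption: the CSV log layout below is a hypothetical example.
import csv
import hashlib
import stat
from datetime import datetime, timezone
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path: Path) -> None:
    """Remove write permission for owner, group and others."""
    mode = path.stat().st_mode
    path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def log_action(log_path: Path, action: str, target: Path) -> None:
    """Append one audit-trail row: UTC timestamp, action, file, checksum."""
    new_log = not log_path.exists()
    with log_path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if new_log:
            writer.writerow(["timestamp_utc", "action", "file", "sha256"])
        writer.writerow([datetime.now(timezone.utc).isoformat(), action,
                         str(target), sha256sum(target)])


# Example usage (hypothetical paths):
# raw = Path("data/raw/survey.csv")
# make_read_only(raw)
# log_action(Path("audit_log.csv"), "raw data locked", raw)
```

Storing the checksum alongside each action lets a later reviewer confirm that the raw file used in the analysis is the one originally collected.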

Openness requirements
Research practice
• Demonstrate rigour of research
Funder requirements:
• Open access route: Gold vs. Green
• Outputs covered: publications, research data, other outputs
Domain-specific reporting guidelines:
• For study protocol and project outputs
https://www.equator-network.org/
Journal policies:
• Transparency and Openness Promotion (TOP)
https://cos.io/our-services/top-guidelines/
• Joint Data Archiving Policy (JDAP)
https://datadryad.org//pages/jdap
Preregistration: https://cos.io/prereg/

Storage and organisation
• Ensure project resources are stored in a location that is
secure and available to relevant parties
• Can you find files from a project completed 10 years ago?
• Store on Secure Server or other defined location
• Adopt a consistent structure to organise & label content
• Content type (data, documents, code)
• Version (raw, processed)
• Sensitivity: store personal information in a secure location
• Create a file inventory spreadsheet
• Filename, location, content, source, sensitivity, etc.
https://xkcd.com/1459/
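A file inventory spreadsheet can be started automatically and then completed by hand. A minimal sketch, assuming the columns listed on this slide and a hypothetical top-level folder convention (data/, documents/, code/) from which the "content" column is guessed; "source" and "sensitivity" are left blank because only the researcher can fill them in:

```python
# Sketch: build a file inventory (CSV) for a project folder.
# Assumption: top-level folder names (data, documents, code) indicate content type.
import csv
from pathlib import Path

FIELDS = ["filename", "location", "content", "source", "sensitivity", "size_bytes"]


def build_inventory(project_root: Path, output_csv: Path) -> int:
    """Write one row per file under project_root; return the number of rows."""
    rows = []
    for path in sorted(project_root.rglob("*")):
        # Skip folders and the inventory file itself if it sits inside the project.
        if not path.is_file() or path.resolve() == output_csv.resolve():
            continue
        relative = path.relative_to(project_root)
        top_folder = relative.parts[0] if len(relative.parts) > 1 else ""
        rows.append({
            "filename": path.name,
            "location": str(relative.parent),
            "content": top_folder,  # e.g. data, documents, code
            "source": "",           # to be completed by hand
            "sensitivity": "",      # to be completed by hand
            "size_bytes": path.stat().st_size,
        })
    with output_csv.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Re-running the function regenerates the inventory, so it stays in step with the folder structure as the project grows.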

Tidy data
Common issues:
• Column headers contain values
• Multiple variables held in 1
column.
• Variables held in both rows and
columns.
• Multiple types of observation
recorded in the same table.
Wickham applies 3rd Normal Form:
• One row for each observation
• One column for each variable
• One table for each type of
observation
• Column headers (where they are
used) should be variable names
Tidy data tools:
tidyr, dplyr, ggplot2, data.table, pandas
A set of principles to make data more consistent
https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf
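The first common issue, column headers that contain values, is fixed by reshaping "wide" data into "long" form, which is what `pandas.melt()` and `tidyr::pivot_longer()` do. A minimal sketch in plain Python so the logic is visible; the country/year/cases values are made-up illustration data:

```python
# Sketch: reshape wide data (years as column headers) into tidy long form:
# one row per observation, one column per variable.
# The data below are made up for illustration.

wide = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 30, "2000": 35},
]


def melt(rows, id_vars, var_name, value_name):
    """Turn every non-id column into a (var_name, value_name) pair, one row each."""
    tidy = []
    for row in rows:
        ids = {key: row[key] for key in id_vars}
        for column, value in row.items():
            if column in id_vars:
                continue
            tidy.append({**ids, var_name: column, value_name: value})
    return tidy


tidy = melt(wide, id_vars=["country"], var_name="year", value_name="cases")
for record in tidy:
    print(record)
```

The former headers become values of the new `year` variable (still stored as text here), and each country-year observation gets its own row.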

Documentation & metadata
What info is needed to replicate or re-apply your analysis?
What info is needed to analyse and use your data?
User guide:
• Study design and data collection methods
• Data analysis and preparation
• Quality checks applied
Codebook:
• Variable type (continuous, ordinal, categorical)
• Missing values and censored/redacted values
• Permitted responses & their meaning (what is 1?)
• Abbreviations & phrases
Supporting documents:
• Research protocols
• Standard Operating Procedures
• Codebooks & data dictionaries
• Informed Consent form &
participant information sheet
• Questionnaires, interview
guide and other collection tools
• Data papers and other
publications
• Other relevant documents
http://www.dcc.ac.uk/resources/metadata-standards
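A codebook is most useful when it can also be checked against the data. A minimal sketch of a machine-readable codebook that records variable type, permitted codes and their meaning ("what is 1?"), and missing-value codes, plus a check for one record; the variables, codes and ranges are hypothetical:

```python
# Sketch: a machine-readable codebook and a check that data values match it.
# The variables (age, sex, smoker), codes and ranges are hypothetical.

CODEBOOK = {
    "age": {"type": "continuous", "min": 0, "max": 120, "missing": [-99]},
    "sex": {"type": "categorical", "codes": {1: "Male", 2: "Female"}, "missing": [9]},
    "smoker": {"type": "categorical", "codes": {0: "No", 1: "Yes"}, "missing": [9]},
}


def check_record(record):
    """Return a list of problems for one record (an empty list means it passed)."""
    problems = []
    for variable, spec in CODEBOOK.items():
        if variable not in record:
            problems.append(f"{variable}: missing column")
            continue
        value = record[variable]
        if value in spec.get("missing", []):
            continue  # documented missing-value code
        if spec["type"] == "categorical" and value not in spec["codes"]:
            problems.append(f"{variable}: undocumented code {value!r}")
        elif spec["type"] == "continuous" and not spec["min"] <= value <= spec["max"]:
            problems.append(f"{variable}: {value!r} outside {spec['min']}-{spec['max']}")
    return problems


print(check_record({"age": 34, "sex": 1, "smoker": 0}))   # []
print(check_record({"age": 150, "sex": 3, "smoker": 9}))  # ['age: 150 outside 0-120', 'sex: undocumented code 3']
```

Keeping the codebook in a structured form means the same file can be published as documentation and reused as a quality check.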

Working with code and scripts in workflows
• Use ‘open’ programming/scripting languages not dependent upon
proprietary software
• Don’t reinvent the wheel: reuse existing code if it serves purpose
• Don’t update the source data; generate a derived file & label the version number.
• Add a header to each code file that explains its purpose and indicates
who created it & when
• Add comments throughout code explaining the purpose of functions/specific
lines (if not obvious)
• Document dependencies, including version number
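The practices above can be combined in one script. A minimal sketch, assuming a hypothetical script name, file layout and cleaning rule (drop rows without an `id`); it shows a header block, a derived file whose name carries a version number while the raw file is only read, and a record of the Python, OS and package versions used:

```python
"""
clean_survey.py  (hypothetical example script)

Purpose : Derive an analysis-ready file from the raw survey export without
          modifying the raw file.
Author  : <name>                  Created: <YYYY-MM-DD>
Inputs  : data/raw/survey.csv     (read-only source data)
Outputs : data/processed/survey_v02.csv (derived file, version in the name)
"""
import csv
import platform
from importlib import metadata
from pathlib import Path

VERSION = "v02"  # bump when the processing logic changes


def derive(raw_path: Path, out_dir: Path, version: str = VERSION) -> Path:
    """Write rows with a non-empty 'id' to a new, versioned file; raw data is only read."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{raw_path.stem}_{version}{raw_path.suffix}"
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Example cleaning rule: drop records without a participant ID.
            if (row.get("id") or "").strip():
                writer.writerow(row)
    return out_path


def package_versions(names):
    """Look up installed versions of the named packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def record_environment(out_dir: Path, packages=()) -> Path:
    """Write Python, OS and package versions next to the outputs."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"python {platform.python_version()}", platform.platform()]
    lines += [f"{name} {ver}" for name, ver in package_versions(packages).items()]
    path = out_dir / "environment.txt"
    path.write_text("\n".join(lines) + "\n")
    return path


# Example usage (hypothetical paths and packages):
# derive(Path("data/raw/survey.csv"), Path("data/processed"))
# record_environment(Path("data/processed"), ["pandas", "numpy"])
```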

Providing access to resources
What do you make available?
• Anonymised data
• Code
• Research tools
• Workflows
When do you make it available?
• During the project lifetime
• On publication of findings
• Within 6-12 months of publication
Where do you host it?
• What platforms are appropriate to your needs?
How will access be provided?
• Open vs. controlled access
• Controlled access needs a reason, e.g. participant consent or identifiable data
How will it be managed?
• Corresponding author, Data Access Committee, Data Sharing Agreement
https://www.flickr.com/photos/lwr/3897479560
https://www.flickr.com/photos/ryanr/142455033/

Data sharing principles
• Publish a description in a research catalogue
• Obtain a permanent ID to make the data easy to cite
• Provide a clear method to obtain files (open vs. safeguarded)
• Handle access requests consistently (PLOS requirement)
• Use recognised domain standards & vocabularies
• Use common formats, e.g. Stata, CSV
• Apply a clear usage licence (Creative Commons or other)
• Provide documentation relevant to researchers in your field
The FAIR Guiding Principles for scientific data management and stewardship
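Several of these principles (a description, a permanent ID, a licence, common formats) can be checked before a dataset is deposited. A minimal sketch, assuming a hypothetical metadata record whose field names loosely follow DataCite properties; the DOI is a placeholder, the list of "common" formats is an assumption, and the citation follows a Creator (Year): Title. Publisher. Identifier pattern:

```python
# Sketch: check a dataset description against a few data sharing principles
# and format a citation. Field names loosely follow DataCite; the record,
# its placeholder DOI and the COMMON_FORMATS set are hypothetical.

REQUIRED = ["identifier", "creators", "title", "publisher", "publicationYear",
            "licence", "formats", "description"]
COMMON_FORMATS = {"csv", "dta", "txt"}  # assumption: formats most users can open


def check_metadata(record):
    """Return a list of missing or questionable fields."""
    issues = [f"missing {field}" for field in REQUIRED if not record.get(field)]
    identifier = record.get("identifier", "")
    if identifier and not identifier.startswith("https://doi.org/"):
        issues.append("identifier is not a resolvable DOI URL")
    formats = {fmt.lower() for fmt in record.get("formats", [])}
    if formats and not formats & COMMON_FORMATS:
        issues.append("no commonly used format listed")
    return issues


def cite(record):
    """Format as 'Creators (Year): Title. Publisher. Identifier'."""
    creators = "; ".join(record["creators"])
    return (f"{creators} ({record['publicationYear']}): {record['title']}. "
            f"{record['publisher']}. {record['identifier']}")


record = {
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder, not a real DOI
    "creators": ["Researcher, A.", "Researcher, B."],
    "title": "Example survey dataset",
    "publisher": "Example Repository",
    "publicationYear": 2018,
    "licence": "CC BY 4.0",
    "formats": ["CSV", "dta"],
    "description": "Anonymised survey responses with codebook.",
}

print(check_metadata(record))  # []
print(cite(record))
```

Requiring a DOI URL is a simplification; other persistent identifiers (e.g. handles) would need their own check.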

Resource management tools
Functionality:
• Lifecycle management
• Object & version identifiers
• Workflow description standards that balance generic &
domain-specific needs (e.g. DDI Lifecycle, BPM variants)
Platforms:
• Electronic Lab Notebooks (RSpace, SciNote, LabArchives)
• Code hosting: myExperiment, RunMyCode, GitHub/GitLab
• Repository platforms: OSF, Data Compass

Analysis and reporting tools
Growing number of online tools allow you to
create and share interactive documents that
contain live code, data, and other resources
• R Markdown - https://rmarkdown.rstudio.com/
• Jupyter - http://jupyter.org/
• Colaboratory - https://colab.research.google.com/
• Benefits:
• Dynamic content that combines data & analysis
• Development environment for R, Python, SQL
• Disadvantages:
• Another complex platform to host & manage
• Content may become publicly accessible
Images sourced from project webpages
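To show what one of these interactive documents contains, the sketch below writes a minimal Jupyter notebook that mixes a narrative cell with a code cell, using only the standard library. It assumes the nbformat 4.4 JSON layout (version 4.5 added a per-cell "id" field); the cell contents and file name are hypothetical, and in practice Jupyter or the `nbformat` package would create and validate notebooks:

```python
# Sketch: build a two-cell Jupyter notebook by hand to show its structure.
# Assumption: nbformat 4.4 layout (no per-cell "id" field).
import json
from pathlib import Path


def make_notebook(markdown_text, code_text):
    """Return a notebook dict with one Markdown cell and one unexecuted code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3",
                           "language": "python"},
            "language_info": {"name": "python"},
        },
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {"cell_type": "code", "metadata": {}, "execution_count": None,
             "outputs": [], "source": code_text},
        ],
    }


def save_notebook(notebook, path):
    """Write the notebook as JSON (.ipynb) for Jupyter or Colaboratory to open."""
    Path(path).write_text(json.dumps(notebook, indent=1))


# Example usage (hypothetical file name):
# nb = make_notebook("# Survey analysis\nSummary of cleaned data.",
#                    "print('analysis goes here')")
# save_notebook(nb, "analysis.ipynb")
```

Because the notebook is plain JSON, the narrative, code and (once run) outputs travel together in a single shareable file.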

In summary
Open science requires you to consider:
• Research stakeholders who will be interested in
your work
• The value of research outputs for verification and
further use
• Systems that will be used to collect, manage,
analyse and provide access to research
https://www.flickr.com/photos/keith_marshall_avery/8132240925/