A Guide for Reproducible Research
Yasmin AlNoamany
University of California, Berkeley
yasminal@berkeley.edu
Introduction
Reproducibility in research is the ability to replicate the final product of academic
research, so that others can reproduce the results and build on the research. The main
entities of academic research are data, scripts/software for processing and analysis, the
workflow of the research process, and the research output (Figure 1). Documenting the
workflow, data, and code during the active phase of the research is important for
communicating the scholarship and replicating the results. When researchers submit
scientific papers or build on their own work, they face the challenge of remembering every
detail of that work if they have not documented it well. To sustain reproducibility in
scientific research and advance the research process, this poster presents guidelines that
help researchers manage the research entities during the active phase of the research
process.
The main entities of the scientific research
Research Software
Research Software – source code or executables that researchers generate or integrate
into the workflow of the scientific research.
What to document:
•  The experimental environment – e.g., hardware, operating system
•  The computing platform and prerequisites
•  Scripts and libraries
•  Input and output parameters
•  The functionality of each script
•  Dependencies of the software, indicating versions
•  The structure of the code/software and details about individual components
Good practices in managing your software:
•  Write custom scripts to automate research analysis.
•  Attach examples of how the code works.
•  Generate a list of all scripts, how to run them, and in what order.
•  Use tools that capture the experimental environment, such as Docker and ReproZip.
•  Use metadata standards for each generated module. Each module should have at least
the following:
Ø  Name of the module
Ø  Name of the project
Ø  Name of the author
Ø  Input and output
Ø  Purpose of the module
Ø  A brief description
Naming files should be descriptive and consistent!
Tools
•  Docker
•  Apache Ivy
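As a minimal sketch of the software guidelines above, the following Python module opens with a header carrying the six suggested metadata fields and records the experimental environment (operating system, hardware architecture, Python version, and package versions). The module name, project name, author, and function names are hypothetical placeholders, not part of the poster; tools such as Docker or ReproZip capture far more than this.

```python
"""
Module:      capture_environment.py (hypothetical name)
Project:     example-project (placeholder)
Author:      A. Researcher (placeholder)
Input:       a list of package names to report on
Output:      a dictionary (optionally saved as JSON) describing the environment
Purpose:     record the experimental environment next to the results
Description: collects the operating system, hardware architecture, Python
             version, and the installed version of each listed package.
"""
import json
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return a dictionary describing the current computing environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "packages": versions,
    }


def save_environment(path, packages):
    """Write the environment description to a JSON file and return it."""
    env = capture_environment(packages)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(env, handle, indent=2, sort_keys=True)
    return env
```

Calling `save_environment("environment.json", ["numpy"])` at the start of an analysis would store the record beside the results; the package list here is only an example and should name whatever the analysis actually imports.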
Research Output
Scientific paper(s) along with graphs/tables – document(s) that contain the results of
the scientific research as well as all the associated graphs and tables. This could be:
•  Compiled files (e.g., PDF)
•  Source files (e.g., .tex files, figures, .bib file)
•  Packages/libraries/styles installed (e.g., graphics)
•  Graphs and tables
Good practices in managing output files:
•  Document the environment and the file structure.
•  Track versions of produced papers, graphs, etc.
•  Document any problems you encounter with the computing environment.
•  Back up your files regularly.
•  Save your files on Dropbox or any other cloud storage to keep track of your
versions.
•  For writing your manuscript, use LaTeX and BibTeX for these reasons:
Ø  LaTeX is free and open source.
Ø  A .tex file can be edited in any text editor.
Ø  The content is separated from style.
Ø  With a couple of lines and style files, you can change how your PDF looks.
Ø  LaTeX allows you to preserve your files for a longer time.
Ø  The output document looks better.
Naming files should be descriptive and consistent!
Tools
•  LaTeX
•  BibTeX
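One possible way to keep output file names descriptive and consistent is to generate them from the same fields every time. The sketch below assumes a project_description_date_version pattern; the pattern and the function names `slugify` and `build_filename` are illustrative choices, not a convention prescribed by the poster.

```python
import re
from datetime import date


def slugify(text):
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, version, extension, day=None):
    """Return a name such as 'project_description_2017-05-03_v02.pdf'."""
    day = day or date.today()
    return "{}_{}_{}_v{:02d}.{}".format(
        slugify(project),
        slugify(description),
        day.isoformat(),       # ISO dates sort chronologically by name
        version,               # zero-padded so v02 sorts before v10
        extension.lstrip("."),
    )
```

For example, `build_filename("Web Archives", "Figure 2: recall", 3, ".pdf", date(2017, 5, 3))` yields `web-archives_figure-2-recall_2017-05-03_v03.pdf`.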
Data
Data – files that were used or produced during the scientific research process. These files
can be raw data or different versions of processed data.
Good practices in managing data:
•  Include a README file in the directory that has the data.
•  Write a data management plan, which has become a requirement of funding agencies.
•  Provide a detailed description of the data, the data source(s), and how the data will be used.
•  Provide a description of the process of capturing the data.
•  Describe all the steps of data preprocessing.
•  Provide a description of and information about each new version of the data.
•  Provide details about the software/code that is used for preprocessing the data.
•  Adopt metadata standards for describing the data.
•  Back up your files regularly.
Naming files should be descriptive and consistent!
Tools
•  DMPTool
•  DASH
•  Figshare
•  EZID
•  Box and Drive
•  Merritt repository
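The advice to describe each new version of the data can be partly automated. The following is a rough sketch, assuming a JSON Lines log file kept next to the data; the function names and the log fields are hypothetical, and a checksum only shows that a file changed, so the written description is still needed.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_data_version(data_file, log_file, source, description):
    """Append one entry describing data_file to log_file and return it."""
    data_file = Path(data_file)
    entry = {
        "file": data_file.name,
        "size_bytes": data_file.stat().st_size,
        "sha256": sha256_of(data_file),
        "source": source,              # where the data came from
        "description": description,    # what changed in this version
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_file, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

Each call adds one line to the log, so the history of raw and processed versions stays in a single plain-text file that can sit beside the README.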
Source: http://data-archive.ac.uk/create-manage/life-cycle
References
1.  AlNoamany, Yasmin. "How to make your research reproducible." http://guides.lib.berkeley.edu/reproducibility-guide (2017).
2.  Stodden, Victoria. "Enabling reproducible research: Open licensing for scientific innovation." (2009).
3.  Bailey, David H., Jonathan M. Borwein, and Victoria Stodden. "Facilitating reproducibility in scientific computing:
Principles and practice." Reproducibility: Principles, Problems, Practices, and Prospects (2014): 205-232.
4.  Stodden, Victoria, et al. "Enhancing reproducibility for computational methods." Science 354.6317 (2016): 1240-1241.
Workflow
Workflow documentation – detailed steps of the workflow
that capture the process of the scientific research.
•  Weekly/daily notes on the project's stages
•  Documentation for the steps of the workflow
For managing the research workflow, document:
•  The steps of the research, from the design through fetching the data to
producing the graphs and tables in the scientific output.
•  All adopted libraries and integrated algorithms.
•  Citations and information for all code and data used.
•  The input and the output of each step.
Electronic notebooks, such as Jupyter, help document the workflow!
Tools
•  Jupyter
•  knitr
•  Overleaf
•  ShareLaTeX
•  GitHub
•  Zenodo
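Recording the input and the output of each workflow step can be sketched in a few lines of Python. The `WorkflowLog` class and the three toy steps below are hypothetical illustrations; an electronic notebook or a dedicated workflow tool would normally do this job.

```python
import json


class WorkflowLog:
    """Collects an ordered record of workflow steps with inputs and outputs."""

    def __init__(self):
        self.steps = []

    def run(self, name, func, *inputs):
        """Run func(*inputs), record the step, and return its output."""
        output = func(*inputs)
        self.steps.append({
            "order": len(self.steps) + 1,
            "step": name,
            "inputs": [repr(value) for value in inputs],
            "output": repr(output),
        })
        return output

    def to_json(self):
        """Return the recorded steps as a JSON string."""
        return json.dumps(self.steps, indent=2)


# Toy workflow: fetch, preprocess, analyse.
log = WorkflowLog()
raw = log.run("fetch data", lambda: [3, 1, 2])
clean = log.run("sort values", sorted, raw)
total = log.run("sum values", sum, clean)
```

Saving `log.to_json()` alongside the results gives a step-by-step record that can be compared between runs.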
Sponsored in part through grants from the Alfred P. Sloan Foundation #G-2014-13746 and from the National Science
Foundation NSF ACI #1349002

 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Predicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdfPredicting property prices with machine learning algorithms.pdf
Predicting property prices with machine learning algorithms.pdf
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
EY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptxEY - Supply Chain Services 2018_template.pptx
EY - Supply Chain Services 2018_template.pptx
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 

A Guide for Reproducible Research

Yasmin AlNoamany
University of California, Berkeley
yasminal@berkeley.edu

Introduction
Reproducibility in research is the ability to replicate the final product of academic research, so that others can reproduce the results and build on them. The main entities of academic research are data, scripts/software for processing and analysis, the workflow of the research process, and the research output (Figure 1). Documenting the workflow, data, and code during the active phase of the research is important for communicating the scholarship and replicating the results. When researchers submit scientific papers or build on their earlier work, they must remember every detail of that work unless they documented it well. To sustain the integrity of reproducibility in scientific research and advance the research process, this poster presents guidelines that help researchers manage the research entities during the active phase of the research process.

The main entities of the scientific research

Research Software
Research Software – source code or executables that researchers generate or integrate into the workflow of the scientific research.
Good practices in managing your software:
•  Custom scripts to automate research analysis.
•  Attach examples of how the code works.
•  Generate a list of all scripts, how to run them, and in what order.
•  Use tools that capture the experimental environment, such as Docker and ReproZIP.
•  Use metadata standards for each generated module. Each module should have at least the following:
Ø  Name of the module
Ø  Name of the project
Ø  Name of the author
Ø  Input and output
Ø  Purpose of the module
Ø  A brief description
Naming files should be descriptive and consistent!
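The per-module metadata listed above can be kept as a plain "Field: value" header at the top of each script. The following is a minimal sketch, not a published metadata standard: the field labels, the file paths, and the helper names `parse_header` and `missing_fields` are all assumptions made for illustration.

```python
# Hypothetical labels for the metadata fields the poster asks each module to carry.
REQUIRED_FIELDS = (
    "Module", "Project", "Author", "Input", "Output", "Purpose", "Description",
)

# Example header text; every name and path here is a placeholder.
HEADER = """\
Module: clean_survey.py
Project: example-study
Author: A. Researcher
Input: data/raw/survey.csv
Output: data/processed/survey_clean.csv
Purpose: Remove incomplete responses before analysis.
Description: Drops rows with missing answers and normalizes column names.
"""


def parse_header(text):
    """Return a dict of the 'Field: value' pairs found in a header block."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(text):
    """List the required fields that are absent or empty in a header block."""
    fields = parse_header(text)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

Running `missing_fields` over every script in a project gives a quick check that no module was left undocumented.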
Tools
•  Docker
•  Apache Ivy

Research Software: what to document
•  The experimental environment – e.g., hardware, operating system
•  The computing platform and prerequisites
•  Scripts and libraries
•  Input and output parameters
•  The functionality of each script
•  Dependencies of the software, with their versions
•  The structure of the code/software and details about individual components

Research Output
Scientific paper(s) along with graphs/tables – document(s) that contain the results of the scientific research as well as all the assorted graphs and tables. This could be:
•  Compiled files (e.g., PDF)
•  Source files (e.g., .tex files, figures, .bib file)
•  Packages/libraries/styles installed (e.g., graphics)
•  Graphs and tables
Good practices in managing output files:
•  Document the environment and the file structure.
•  Track versions of produced papers, graphs, etc.
•  Document any problem you encounter with the computing environment.
•  Back up your files regularly.
•  Save your files on Dropbox or any other cloud storage to keep track of your versions.
•  For writing your manuscript, use LaTeX and BibTeX for these reasons:
Ø  LaTeX is free and open source.
Ø  A .tex file can be edited in any text editor.
Ø  The content is separated from the style.
Ø  With a couple of lines and style files, you can change how your PDF looks.
Ø  LaTeX files remain usable for a longer time.
Ø  The output document looks better.
Naming files should be descriptive and consistent!
Tools
•  LaTeX
•  BibTeX

Data
Data – files that were used or produced during the scientific research process. These files can be raw data or different versions of processed data.
Good practices in managing data:
•  Include a README file in the directory that holds the data.
•  Write a data management plan, which has become a requirement of funding agencies.
•  Provide a detailed description of the data, the data source(s), and how the data will be used.
•  Provide a description of the process of capturing the data.
•  Describe all the steps of data preprocessing.
•  Provide a description and information about each new version of the data.
•  Provide details about the software/code used for preprocessing the data.
•  Adopt metadata standards for describing the data.
•  Back up your files regularly.
Naming files should be descriptive and consistent!
Tools
•  DMPTool
•  DASH
•  Figshare
•  EZID
•  Box and Drive
•  Merritt repository
Source: http://data-archive.ac.uk/create-manage/life-cycle

Workflow
Workflow documentation – detailed steps of the workflow that capture the process of the scientific research.
•  Weekly/daily notes on the project's stages
•  Documentation for the steps of the workflow
For managing the research workflow, document:
•  The steps of the research, from the design through fetching the data to producing the graphs and tables in the scientific output.
•  All adopted libraries and integrated algorithms.
•  All citations and information of code and data used.
•  The input and the output of each step.
Electronic notebooks, such as Jupyter, help document the workflow!
Tools
•  Jupyter
•  knitr
•  Overleaf
•  ShareLaTeX
•  GitHub
•  Zenodo

References
1.  AlNoamany, Yasmin. "How to make your research reproducible", http://guides.lib.berkeley.edu/reproducibility-guide, (2017).
2.  Stodden, Victoria. "Enabling reproducible research: Open licensing for scientific innovation." (2009).
3.  Bailey, David H., Jonathan M. Borwein, and Victoria Stodden. "Facilitating reproducibility in scientific computing: Principles and practice." Reproducibility: Principles, Problems, Practices, and Prospects (2014): 205-232.
4.  Stodden, Victoria, et al. "Enhancing reproducibility for computational methods." Science 354.6317 (2016): 1240-1241.

Sponsored in part through grants from the Alfred P. Sloan Foundation #G-2014-13746 and from the National Science Foundation NSF ACI #1349002
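One way to follow the workflow guidance above (record the input and the output of each step) is to append an entry to a small log whenever a step runs. The sketch below is an illustration under stated assumptions, not a tool named on the poster: the function names `sha256_of` and `record_step`, the JSON-lines log format, and the use of checksums to tell data versions apart are all hypothetical choices.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_step(log_path, step, inputs, outputs, note=""):
    """Append one workflow step, with checksums of its files, to a JSON-lines log."""
    entry = {
        "step": step,
        "time_utc": datetime.now(timezone.utc).isoformat(),
        # Checksums make it possible to tell later which data version was used.
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "outputs": {str(p): sha256_of(p) for p in outputs},
        "note": note,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Calling `record_step("workflow_log.jsonl", "clean", ["raw.csv"], ["clean.csv"])` at the end of each script (the file names are placeholders) leaves one line per step, in the order the steps were run.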