DCSF 19: Towards Reproducible Climate Research

Aparna Radhakrishnan, Engility

NOAA/GFDL was founded in 1955 and remains at the forefront of climate research, contributing to the many policies and decisions made in response to a changing climate; these decisions in turn ripple through sectors such as agriculture, health, and GDP. The scale of computing and data has grown significantly over the last decade, making the delivery of data to the world a herculean research problem in its own right. In addition, analyzing and peer-reviewing a research article demands considerable time and effort from a user. Numerous outstanding climate studies are published in international climate assessment reports, such as those of the Intergovernmental Panel on Climate Change (IPCC), the United Nations body for assessing the science related to climate change. Verifying research, and making it reproducible and transparent before it is translated into major decisions, is now more than ever one of our most critical challenges. In this presentation, we will paint a picture of the history of climate computing and analytics, and of the significant transformations applied to make climate research meaningful, quantifiable, credible, interoperable, accessible and reusable. In other words, we will draw a path towards reproducible research using Docker containers for massive data publishing and climate analytics. The presentation will also discuss pioneering efforts by collaborators at other laboratories and organizations (such as ESGF, Google, NASA JPL, Columbia University, PMEL, etc.) in using Docker containers for computing and analysis on and off the cloud.

  1. Aparna Radhakrishnan. Reproducible climate research. U.S. Department of Commerce, National Oceanic and Atmospheric Administration. Facilitating inspiration-driven and industrial-strength analysis.
  2. Research-driven decisions: publications, policy decisions, data consumers, IPCC, Climate Model Intercomparison Project. A. Radhakrishnan et al., Towards Reproducible Climate Research
  3. Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them. (Ref. https://www.coursera.org/learn/reproducible-research)
  4. Learn, give credit, build trust, make informed decisions, extend research, collaborate. Big data, source code, documentation: numerous possibilities, value-added research, research the research.
  5. Code, products, data, documentation. Specifications document: e.g. platform, dependencies, software. How-to: e.g. configure, install.
  6. When will I get to conduct my research?
  7. Explore the use of Docker containers.
  8. Use case 1: Climate analysis. Reproducing figures in papers. (Test image only.)
  9. How do we guide a researcher to provide pointers for reproducing this figure from a paper publication? (Test image only.) A beginner's guide to developing and sharing climate analysis using Docker containers.
  10. Cheat sheet:
      1. Create a Dockerfile from within your project directory that contains your Jupyter notebook and other supporting code.
      2. Build your Docker image.
      3. Run your application. Use -v to include any additional data volumes.
      4. Activate your conda environment and run Jupyter Notebook.
      5. Open localhost:8888?token=<paste_token_from_above_cmd>
      Reference: https://cloud.docker.com/u/aparnadotnoaa/repository/docker/aparnadotnoaa/gfdlanalysis-example
  11. Live demo: using Docker and Jupyter Notebook for analysis in the cloud.
  12. Step 6. Share with colleagues. Create a Docker Hub account and a repository. Push your awesome Docker image! Pull your awesome Docker image! See steps 3 and 4 to run.
  13. [Image-only slide]
  14. How does it work?
      1. Enter your GitHub repository information for the Jupyter notebooks.
      2. Pangeo-binder builds a Docker image of your repository.
      3. Interact with your notebooks in a live environment.
      4. Scale your computations across an adaptive Dask cluster.
  15. Live demo:
      1. Open Pangeo-binder: https://binder.pangeo.io/
      2. Provide a GitHub repo URL, e.g. https://github.com/rabernat/pangeo_esgf_demo
      3. Tap on Launch.
  16. Docker container. Use case 2: Using Docker containers for massive data publishing.
  17. Earth System Grid Federation architecture (VM instances). Nikonov et al., ESGF F2F 2018.
  18. Quick start: https://esgf.github.io/esgf-docker/compose/quick-start
      Prerequisites: Bash shell, Docker Engine, Docker Compose.
      Some benefits:
      • Data publishing on the Earth System Grid Federation
      • Collaborative software development for international projects
      • Ease of installation and maintenance
  19. References
      • https://github.com/pangeo-data/pangeo-stacks
      • http://pangeo.io
      • https://mybinder.readthedocs.io/en/latest/introduction.html
      • https://esgf.github.io/esgf-docker/compose/quick-start
      • https://dream.llnl.gov/about.html
      • https://www.coursera.org/learn/reproducible-research
      • Gundersen and Kjensmo, 2018. "State of the Art: Reproducibility in Artificial Intelligence."
      • Gundersen, "On Reproducible AI." AI Magazine, Vol. 39, No. 3, Fall 2018. http://doi.org/10.1609/aimag.v39i3.2816
      Takeaway: Docker, a recipe for reproducible, interoperable and collaborative research.
  20. Acknowledgments: Ryan Abernathey, V. Balaji, Luca Cinquini, John Krasting, Colleen McHugh, Serguei Nikonov, Roland Schweitzer, Hans Vahlenkamp, Chandin Wilson, Andrew Wittenberg. Thank you! @HV Photography
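
Slide 5 calls for a specifications document covering platform, dependencies and software. As a minimal sketch, assuming a Python-based analysis, the platform details and installed package versions can be captured with the standard library alone. The package names below are hypothetical examples, and emitting the result as JSON is an assumption, not something the slides prescribe.

```python
import json
import platform
from importlib import metadata


def environment_spec(packages):
    """Describe the platform, Python version and installed versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record a missing dependency explicitly rather than silently dropping it.
            versions[name] = None
    return {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Hypothetical dependency list for a climate-analysis notebook.
    spec = environment_spec(["numpy", "xarray", "netCDF4"])
    print(json.dumps(spec, indent=2))
```

Committing this output next to the notebooks gives readers the "platform, dependencies, software" half of the document; the "how-to" half (configure, install) still has to be written by hand.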
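
The cheat sheet on slide 10 can also be scripted. This sketch assumes the `continuumio/miniconda3` base image, an `environment.yml` whose environment is named `analysis`, and the image tag `gfdl-analysis-example`. None of these come from the slides, and the Dockerfile is illustrative rather than the one behind the referenced Docker Hub repository. The helpers only assemble commands and, by default, print them instead of running them.

```python
import re
import shlex
import subprocess
from pathlib import Path

# Step 1 (hypothetical): a Dockerfile placed in the project directory next to the
# notebooks. It assumes an environment.yml whose `name:` field is "analysis".
DOCKERFILE = """\
FROM continuumio/miniconda3
WORKDIR /home/project
COPY . /home/project
RUN conda env create -f environment.yml
EXPOSE 8888
"""

# Step 4: typed inside the container's interactive shell. --ip=0.0.0.0 makes the
# server reachable through the published port; --allow-root because the
# container runs as root by default.
JUPYTER_CMD = ("conda activate analysis && "
               "jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --allow-root")


def build_and_run_commands(tag, project_dir=".", data_dir=None):
    """Steps 2 and 3: build the image, then start an interactive container."""
    build = ["docker", "build", "-t", tag, project_dir]
    run = ["docker", "run", "-it", "-p", "8888:8888"]
    if data_dir is not None:
        # -v bind-mounts an extra host directory (e.g. model output) at /data.
        # Docker expects an absolute host path for a bind mount.
        run += ["-v", f"{Path(data_dir).resolve()}:/data"]
    run += [tag, "/bin/bash"]
    return build, run


def notebook_url(jupyter_log, host="localhost", port=8888):
    """Step 5: extract the login token from Jupyter's startup output."""
    match = re.search(r"token=([0-9A-Za-z]+)", jupyter_log)
    if match is None:
        raise ValueError("no token found in the Jupyter output")
    return f"http://{host}:{port}/?token={match.group(1)}"


def run_step(cmd, dry_run=True):
    """Print the command (dry run) or execute it, failing loudly on errors."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    build, run = build_and_run_commands("gfdl-analysis-example", data_dir="data")
    run_step(build)
    run_step(run)
```

When the container runs on a cloud VM rather than a laptop (the setting of the slide 11 demo), passing the VM's hostname as `host` to `notebook_url` gives the address to open, provided port 8888 is reachable from your browser.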
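
Step 6 on slide 12 (share with colleagues) maps onto four Docker CLI commands. In this sketch the account, repository and tag names are hypothetical placeholders, and pushing assumes you are logged in to Docker Hub with permission to write to that repository.

```python
import shlex


def share_commands(local_image, hub_user, repo, tag="latest"):
    """Docker CLI commands for sharing an image through Docker Hub."""
    remote = f"{hub_user}/{repo}:{tag}"
    return [
        ["docker", "login"],                     # authenticate with your Docker Hub account
        ["docker", "tag", local_image, remote],  # give the local image its Hub name
        ["docker", "push", remote],              # upload it
        ["docker", "pull", remote],              # what a colleague runs to fetch it
    ]


if __name__ == "__main__":
    # Hypothetical account and repository names; prints the commands (dry run).
    for cmd in share_commands("gfdl-analysis-example", "your-dockerhub-user",
                              "gfdlanalysis-example", tag="v1"):
        print(shlex.join(cmd))
```

Pinning an explicit tag such as `v1`, rather than relying on `latest`, lets a paper cite the exact image its figures were produced with.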
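
Slide 15 launches a repository through Pangeo-binder's web form. Pangeo-binder is a BinderHub deployment, and BinderHub also accepts launch links of the form `/v2/gh/<owner>/<repo>/<ref>`, which can be pasted into a paper or README. The helper below builds such a link. The default ref `master` is an assumption about the demo repository, and the binder.pangeo.io service from the talk may no longer be running, in which case the same scheme can be pointed at another BinderHub such as mybinder.org.

```python
from urllib.parse import quote, urlparse


def binder_launch_url(repo_url, ref="master", binderhub="https://binder.pangeo.io"):
    """Build a BinderHub launch link for a GitHub repository URL."""
    parsed = urlparse(repo_url)
    if parsed.netloc not in ("github.com", "www.github.com"):
        raise ValueError("expected a GitHub repository URL")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected https://github.com/<owner>/<repo>")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    # Branch names may contain "/", so the ref is percent-encoded as one segment.
    return f"{binderhub.rstrip('/')}/v2/gh/{owner}/{repo}/{quote(ref, safe='')}"


if __name__ == "__main__":
    print(binder_launch_url("https://github.com/rabernat/pangeo_esgf_demo"))
```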
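
Before following the ESGF quick start on slide 18, the three prerequisites can be checked from a script. This is a minimal sketch: finding the `docker` client on `PATH` does not prove the Docker daemon is running, and the fallback to the `docker compose` plugin reflects newer Docker releases rather than anything stated in the slides.

```python
import shutil
import subprocess


def has_compose(which=shutil.which, run=subprocess.run):
    """True if a standalone docker-compose binary or the `docker compose` plugin works."""
    if which("docker-compose") is not None:
        return True
    if which("docker") is None:
        return False
    result = run(["docker", "compose", "version"], capture_output=True)
    return result.returncode == 0


def check_prerequisites(which=shutil.which, run=subprocess.run):
    """Map each prerequisite from slide 18 to whether it was found on this machine."""
    return {
        "Bash shell": which("bash") is not None,
        # Only checks for the docker client; the daemon may still be stopped.
        "Docker Engine": which("docker") is not None,
        "Docker Compose": has_compose(which, run),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:15} {'found' if ok else 'MISSING'}")
```

The `which` and `run` parameters exist so the check can be exercised without Docker installed; in normal use both defaults apply.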
