The Road to Reproducible Computational Research
1. Testing and Developing Tools to Promote the Reproducibility of Computational Research
Andrey Moskalenko
Center for Theoretical and Computational Materials Science
Daniel Wheeler | Faical Yannick P. Congo
3. Table of Contents
• Context of the Project
• Simulation Management
• Sumatra and CoRR
• Benchmark Phase Field Problem
• Conclusion
4. Table of Contents (next section: Simulation Management)
5. Simulation Management
The Goal vs. Computational Research Now (figure not reproduced)
6. Currently available tools (tool logos not reproduced)
• Robust; command line; web integration; highly collaborative. Limitation: not suitable for capturing execution context.
• Suitable for recording stable automated executions; provides log, search and view of execution history. Limitations: not collaborative with current tools; not robust or ubiquitous.
• Captures the entire simulation context; versions environments; collaborative. Limitation: not suitable for log, search and view of history.
• Suitable for building pipelines of distinct tasks; enables a clear division of tasks for non-experts. Limitations: black-box design for each section of the pipeline; monolithic in nature, encouraging an isolated ecosystem of tools.
7. Table of Contents (next section: Sumatra and CoRR)
8. Table of Contents (next section: Sumatra and CoRR)
• Context of the Project
• Simulation Management
• Sumatra and CoRR
• Benchmark Phase Field Problem
• Environment and Examples
• Conclusion
10. Sumatra and CoRR
1
- What is it good for?
- What are the limitations?
2
- Autonomous
- Local and cloud storage
- Continuously recording
- Compatible
- Click-and-run
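A minimal sketch of what "autonomous", "continuously recording" and "local storage" can mean in practice: the hypothetical `RunRecorder` class below appends one JSON line to a local file when a run starts, on each progress update, and when it finishes, so a record exists even if the run never reaches its end. The class, its method names and the JSON-lines layout are assumptions for illustration; this is not the Sumatra or CoRR API.

```python
import json
import platform
import time
import uuid
from pathlib import Path


class RunRecorder:
    """Hypothetical recorder (not the Sumatra or CoRR API).

    Appends one JSON object per line to a local file for every event,
    instead of writing a single record once the run has finished.
    """

    def __init__(self, path, parameters):
        self.path = Path(path)
        self.run_id = uuid.uuid4().hex  # ties all events of one run together
        self._write("start", {
            "parameters": parameters,
            "python": platform.python_version(),
            "platform": platform.platform(),
        })

    def _write(self, event, data):
        record = {"run_id": self.run_id, "event": event, "time": time.time(), **data}
        # Append mode: earlier records survive even if the run crashes later.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def progress(self, **data):
        self._write("progress", data)

    def finish(self, **data):
        self._write("finish", data)


def read_runs(path):
    """Load every recorded event back as a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

For example, `RunRecorder("runs.jsonl", {"dt": 1.0})` followed by `progress(step=0)` and `finish(status="ok")` leaves three JSON lines in `runs.jsonl` that share one `run_id`.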
11. Sumatra and CoRR
dt = 1
equation = f()
elapsed_time = 0
while elapsed_time < desired_duration:
    result1 = equation.solve(dt=dt, solver=LinearPCG)        # one step of size dt
    result2 = equation.solve(dt=small_dt, solver=LinearPCG)  # same interval, smaller steps
    if result1 and result2 differ by more than tolerance:
        decrease dt and solve the same interval again
    else:
        accept the step: elapsed_time += dt
        increase dt for the next step
extract data
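The adaptive time-stepping loop on this slide can be turned into a runnable sketch under stated assumptions: the phase-field equation and the `LinearPCG` solver are replaced by a toy ODE, dy/dt = -k*y, advanced with implicit Euler, and "does not meet tolerance" is read as the relative difference between one step of `dt` (`result1`) and two steps of `dt/2` (`result2`) exceeding `tol`. The toy problem, the step-doubling comparison and the `grow`/`shrink` factors are illustrative choices, not the project's code.

```python
def implicit_euler_step(y, dt, k):
    """One backward (implicit) Euler step for the toy ODE dy/dt = -k*y."""
    return y / (1.0 + k * dt)


def adaptive_solve(y0, k, duration, dt=1.0, tol=1e-4, grow=1.5, shrink=0.5):
    """Step-doubling time control: compare one step of dt with two steps of dt/2.

    Returns the list of accepted (time, value) pairs.
    """
    t, y = 0.0, y0
    history = [(t, y)]
    while t < duration:
        dt = min(dt, duration - t)  # never step past the requested duration
        result1 = implicit_euler_step(y, dt, k)  # one full step
        half = implicit_euler_step(y, dt / 2, k)
        result2 = implicit_euler_step(half, dt / 2, k)  # two half steps
        error = abs(result1 - result2) / max(abs(result2), 1e-12)
        if error > tol:
            dt *= shrink  # reject: retry the same interval with a smaller step
        else:
            t += dt       # accept the more accurate two-half-step result
            y = result2
            history.append((t, y))
            dt *= grow    # try a larger step next time
    return history
```

Step doubling costs three solves per attempted step, but it needs only one time-stepping method; a rejected step is simply retried over the same interval with a smaller `dt`.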
12. Environment
Workflow definition
• Jupyter Notebook (formerly IPython Notebook)
• Libraries
• GitHub
• Cluster
13. Table of Contents (next section: Benchmark Phase Field Problem)
15. Analysis – phase-field model
1 Performance evaluation
2 Test CoRR and Sumatra functionality
3 Results
16. Analysis – phase-field model
Results
17. Why is reproducibility a difficult task?
• Versions and updates
• Legality
• Hardware
• Python libraries and dependencies
• Time drain
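Several of these difficulties (versions and updates, hardware, Python libraries and dependencies) can be partly mitigated by recording the software environment alongside every result. The sketch below uses only the standard library and assumes Python 3.8 or later (for `importlib.metadata`); `environment_snapshot` is a hypothetical helper, and which package names to pass in is left to the caller.

```python
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Hypothetical helper: record interpreter, OS, hardware and package versions.

    A package that is not installed is recorded as None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }
```

For example, `environment_snapshot(["numpy", "fipy"])` returns the installed version string for each package, or `None` where it is missing, so a later rerun can check whether its environment still matches.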
18. Table of Contents (next section: Conclusion)
19. Conclusion
1 Could you reproduce our phase-field results?
2 Problem: CHiMaD benchmark problem
  Solution: CoRR
3 More work to be done in both areas
20. Acknowledgements
1 Mentors: Daniel Wheeler, Ph.D.; Faical Yannick P. Congo, Ph.D.
2 MML Thermodynamics and Kinetics group
3 Anushka Dasgupta
4 All who made NIST SURF possible
Editor's Notes
All scientific research should be reproducible, and its two main areas are computational and experimental. Reproducing computational research is becoming increasingly difficult because of the fast pace of advancement in software and hardware. Some scientists even dedicate their careers to going through older published papers and figuring out how to replicate their results, because the software and computers of five years ago were very different.
Why do we care? To make scientific advancement easier. If you find code or a simulation that someone else created and you could build on it in your own research, you first need to make sure that it is correct and agrees with your own theories, and that step can take a very long time.
CHiMaD is a phase-field community dedicated to distributing phase-field models in order to determine the most efficient way of simulating various materials. Anushka and I worked on a benchmark problem and on figuring out how our code compares with others.
5
Limitations (Sumatra): records only at the end of a run, simultaneous database writing, Python-focused, smtweb?, local storage, hacky, lacks community support, developed by an individual, no method for tracking CPU or memory information.
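The missing CPU and memory tracking could be approximated from inside Python with the standard library. In this sketch `measure` is a hypothetical helper that wraps any function call and reports wall-clock time, CPU time and peak traced memory; note that `tracemalloc` only sees memory allocated through Python's allocators, so memory used inside compiled solver libraries may not be counted.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Hypothetical helper: run func and report wall time, CPU time and peak traced memory."""
    tracemalloc.start()
    wall0, cpu0 = time.perf_counter(), time.process_time()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always stop tracing, even if func raises.
        wall1, cpu1 = time.perf_counter(), time.process_time()
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, {
        "wall_seconds": wall1 - wall0,
        "cpu_seconds": cpu1 - cpu0,   # CPU time of this process only
        "peak_traced_bytes": peak,    # Python-level allocations only
    }
```

The returned dictionary could be stored next to the run's parameters, so each recorded execution also carries its resource usage.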
simulation management – compatible, packaging, click-and-run
CoRR – very early development stage, compatible, packaging (can recreate the environment just from the record and even run it on the cloud), click-and-run
CoRR limitations, current and future?
Save data files
We can calculate other properties such as free energy
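As an example of such a derived property, the total free energy of a concentration field can be computed from each saved step. This sketch assumes the free energy of CHiMaD benchmark problem 1 (spinodal decomposition), f_chem = rho*(c - c_alpha)^2*(c_beta - c)^2 plus a gradient term kappa/2*|grad c|^2, with rho = 5, c_alpha = 0.3, c_beta = 0.7 and kappa = 2, on a square periodic grid with spacing `dx`; the slides do not state which benchmark problem or parameters were used, so treat these as assumptions.

```python
import numpy as np


def free_energy(c, dx, rho=5.0, c_alpha=0.3, c_beta=0.7, kappa=2.0):
    """Total free energy on a periodic 2D grid (assumed benchmark-1 form).

    F = sum over cells of [f_chem(c) + kappa/2 * |grad c|^2] * dx^2
    """
    f_chem = rho * (c - c_alpha) ** 2 * (c_beta - c) ** 2
    # Central differences with periodic wrap-around via np.roll.
    dcdx = (np.roll(c, -1, axis=0) - np.roll(c, 1, axis=0)) / (2 * dx)
    dcdy = (np.roll(c, -1, axis=1) - np.roll(c, 1, axis=1)) / (2 * dx)
    f_grad = 0.5 * kappa * (dcdx ** 2 + dcdy ** 2)
    return float(np.sum(f_chem + f_grad) * dx * dx)
```

For a uniform field at c = 0.5 the gradient term vanishes and each unit cell contributes 5 * 0.2^2 * 0.2^2 = 0.008, so a 10 x 10 grid with `dx = 1` gives about 0.8.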
12
15
16
I had to use Python 2.7.
Proprietary software
Processors, task prioritization
Time drain: it can take a long time to gather everything required to run a simulation, make it available, and answer people’s questions about it.
I’m sure some of you have had to use previously developed code without knowing how it worked, or whether it worked at all.
In 10 years, hardware will be so advanced that we will be able to run as many simulations as we want, and execution control will be even more crucial.
Solution to CHiMaD problem