Master's Thesis - climateprediction.net: A Cloudy Approach
1. Outline Problem Background Computing Infrastructure Migration Storage Central Control System Conclusions
climateprediction.net: A Cloudy Approach
Master in High Performance Computing
Master’s Thesis
Diego Pérez Montes
advised by
Tomás Fernández Pena
Juan Antonio Añel Cabanelas
July 1, 2014
Diego Pérez Montes climateprediction.net: A Cloudy Approach
1 Problem Background
Current Infrastructure
Problem Description
2 Computing Infrastructure Migration
Measuring the Problem...
Infrastructure Redesign
3 Storage
4 Central Control System
Backend Components
Dashboard
Running the Simulation
5 Conclusions
Motivation
Solve a real problem, useful for someone and that can be
expanded in further works.
Apply what I’ve learned in the Master courses.
I do love large infrastructure problems (and this is a big one!).
Current Infrastructure
First of all: How does the project currently work?
Current Infrastructure
Figure: BOINC High-Level Architecture and Workflow
Problem Description
So, what is the problem then?
A new model (HadGEM) needs to be run.
Its resource requirements are higher (hardware: computing
and storage).
The current BOINC workunit processing time is 7-9 days, and
this needs to be reduced.
Heterogeneous and unpredictable environment:
Resources can't be managed on demand.
Execution time can't be properly measured.
Processed data is missing.
Problem Description
So, what is the problem then?
Need to establish metrics on the project.
Rationalization of costs (how much does a simulation really
cost?)
Project Objectives
How is it going to be solved?
Conversion to Infrastructure as a Service (IaaS) in the
Cloud, on Amazon Web Services (AWS): EC2 for computing and
S3 for storage.
Creation of a new abstraction layer, the Central Control
System:
Infrastructure and resources management.
Creation of metrics and statistics.
Free Software.
Fully documented.
Measuring the Problem...
The real size of the problem is unknown, as is how it will
behave in the new environment with the new parametrization.
Initial data from the current BOINC infrastructure
(computing point of view):
A workunit takes 7 to 9 days on average to be processed.
A full simulation is at least 36,000 workunits, split into
sections of 6,000.
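A rough sense of scale follows from these two figures. The sketch below only multiplies the numbers quoted on this slide; it assumes every workunit takes the same time, which is a simplification, and the function names are illustrative.

```python
WORKUNITS_PER_SIMULATION = 36_000   # minimum size of a full simulation
WORKUNITS_PER_SECTION = 6_000       # workunits are grouped into sections


def section_count(total=WORKUNITS_PER_SIMULATION, per_section=WORKUNITS_PER_SECTION):
    """Number of sections needed to cover all workunits (rounded up)."""
    return -(-total // per_section)


def aggregate_days(days_per_workunit, total=WORKUNITS_PER_SIMULATION):
    """Total machine time in workunit-days, assuming a uniform time per workunit."""
    return days_per_workunit * total


print(section_count())      # 6
print(aggregate_days(7))    # 252000
print(aggregate_days(9))    # 324000
```

Under these assumptions a full simulation is 6 sections and between 252,000 and 324,000 workunit-days of aggregate computing.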
Measuring the Problem...
Initial considerations:
Models used in the tests: weather@home UK floods and
weather@home Australia New Zealand (full and regional:
HadAM3P and HadRM3P).
Two representative systems (on EC2) were selected and 10
consecutive executions were performed.
Measuring the Problem...
System #1: Moderate CPU
CPU: 2 x Xeon E5-2650
MEM: 8GB (4GB/Core)
GPU: No
Workunit Time: 7.32 days
Workunit Cost: USD 4.464
Full Simulation Cost: USD 160,704
Measuring the Problem...
System #2: Intensive CPU&GPU
CPU: 16 x Xeon X5570
MEM: 24GB (1.5GB/Core)
GPU: 2 x Tesla M2050
Workunit Time: 1.99 days
Workunit Cost: USD 100.966
Full Simulation Cost: USD 3,634,776
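The full-simulation figures for both systems are consistent with the per-workunit cost multiplied by 36,000 workunits. A minimal sketch of that arithmetic, using only the numbers reported on these slides (the dictionary layout and function names are illustrative, and `Decimal` is used so the currency arithmetic stays exact):

```python
from decimal import Decimal

WORKUNITS_PER_SIMULATION = 36_000

# Per-workunit figures as reported for the two EC2 test systems.
SYSTEMS = {
    "System #1": {"days": Decimal("7.32"), "usd": Decimal("4.464")},
    "System #2": {"days": Decimal("1.99"), "usd": Decimal("100.966")},
}


def full_simulation_cost(usd_per_workunit, workunits=WORKUNITS_PER_SIMULATION):
    """Cost of a full simulation, assuming every workunit costs the same."""
    return usd_per_workunit * workunits


def speedup_and_cost_ratio(slow, fast):
    """How much faster, and how much more expensive, `fast` is than `slow`."""
    return slow["days"] / fast["days"], fast["usd"] / slow["usd"]


for name, figures in SYSTEMS.items():
    print(name, "USD", full_simulation_cost(figures["usd"]))

speedup, cost_ratio = speedup_and_cost_ratio(SYSTEMS["System #1"], SYSTEMS["System #2"])
print(f"speedup: {speedup:.2f}x, cost ratio: {cost_ratio:.2f}x")
```

On these figures, System #2 finishes a workunit about 3.7 times faster than System #1, at about 22.6 times the cost per workunit.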
Measuring the Problem...
How much does it really cost?
Going IaaS
Figure: Proposed Infrastructure
Going IaaS
Steps:
1 Template an instance:
Install the operating system (Amazon Linux 2014.03.1, 64-bit).
Configure network and firewall.
Configure local storage: 16 GB.
Install and configure BOINC to use climateprediction.net.
Install the local client (Simulation Terminator).
2 Contextualize and scale.
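The "scale" step amounts to launching N copies of the templated image. The sketch below is not the thesis code: it only builds the keyword arguments for a boto3-style EC2 `run_instances` call, and the AMI ID and instance type shown are placeholders chosen for illustration.

```python
def build_launch_request(ami_id, instance_count, instance_type):
    """Build keyword arguments shaped like boto3's EC2 run_instances call."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "ImageId": ami_id,              # the templated image from step 1
        "InstanceType": instance_type,  # placeholder; pick per cost/time trade-off
        "MinCount": instance_count,     # request exactly this many workers
        "MaxCount": instance_count,
    }


# Hypothetical values, for illustration only.
request = build_launch_request("ami-00000000", 4, "c3.large")
print(request)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("ec2").run_instances(**request)
```

Keeping the request construction separate from the AWS call makes the scaling logic testable without touching real infrastructure.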
Storage
Every simulation (36,000 workunits) outputs 3.6 TB of data.
There are not enough resources (disk space) on the current
systems.
Figure: Shared Storage Architecture
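The storage figure can be broken down per workunit and per section. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000 GB = 1,000,000 MB) and output spread evenly across workunits; neither assumption is stated in the slides.

```python
TOTAL_OUTPUT_GB = 3_600          # 3.6 TB per full simulation, in decimal GB
WORKUNITS = 36_000
WORKUNITS_PER_SECTION = 6_000


def output_per_workunit_mb(total_gb=TOTAL_OUTPUT_GB, workunits=WORKUNITS):
    """Average output per workunit in MB."""
    return total_gb * 1_000 / workunits


def output_per_section_gb(per_section=WORKUNITS_PER_SECTION):
    """Average output per section of workunits in GB."""
    return output_per_workunit_mb() * per_section / 1_000


print(output_per_workunit_mb())   # 100.0
print(output_per_section_gb())    # 600.0
```

That is roughly 100 MB per workunit and 600 GB per section of 6,000 workunits.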
Architecture
Figure: Central System Architecture
Backend Components
Simple Scheduler: Configures and runs a simulation with the
given parameters (starting and stopping instances).
Reaper: Releases resources (terminates instances) when they
are powered off.
RESTful API: Gives access to configure and run simulations.
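The Reaper's behaviour (terminate instances once they are powered off) could be sketched as follows. This is an assumption-laden illustration, not the thesis implementation: it expects a client exposing boto3-style `describe_instances` and `terminate_instances` methods, ignores pagination, and uses a hypothetical `FakeEC2` stand-in so it runs without AWS access.

```python
def stopped_instance_ids(describe_response):
    """Collect IDs of instances reported as 'stopped' in a describe_instances-style response."""
    ids = []
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("State", {}).get("Name") == "stopped":
                ids.append(instance["InstanceId"])
    return ids


def reap(ec2_client):
    """Terminate every stopped instance and return the IDs that were released."""
    ids = stopped_instance_ids(ec2_client.describe_instances())
    if ids:
        ec2_client.terminate_instances(InstanceIds=ids)
    return ids


class FakeEC2:
    """Hypothetical in-memory stand-in for an EC2 client, for demonstration only."""

    def __init__(self, states):
        self.states = dict(states)   # instance ID -> state name
        self.terminated = []

    def describe_instances(self):
        instances = [
            {"InstanceId": iid, "State": {"Name": state}}
            for iid, state in self.states.items()
        ]
        return {"Reservations": [{"Instances": instances}]}

    def terminate_instances(self, InstanceIds):
        self.terminated.extend(InstanceIds)


client = FakeEC2({"i-a": "running", "i-b": "stopped"})
print(reap(client))   # ['i-b']
```

Only the stopped instance is released; running workers are left untouched.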
API
RESTful API
Get simulation status.
Get metric/statistic data.
Set/modify simulation parameters (number of worker
nodes/instances).
Stop simulation.
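The four operations map naturally onto HTTP verbs. The sketch below is a minimal in-memory dispatcher under stated assumptions: the paths (`/simulation`, `/simulation/metrics`, `/simulation/workers`), the payload fields, and the `SimulationState` class are all hypothetical and are not taken from the actual API.

```python
class SimulationState:
    """Hypothetical in-memory state for one simulation."""

    def __init__(self, workers=0):
        self.workers = workers
        self.running = True
        self.metrics = {"workunits_done": 0}


def handle(state, method, path, body=None):
    """Dispatch one request; returns (HTTP status code, JSON-serialisable payload)."""
    route = (method, path)
    if route == ("GET", "/simulation"):            # get simulation status
        return 200, {"running": state.running, "workers": state.workers}
    if route == ("GET", "/simulation/metrics"):    # get metric/statistic data
        return 200, dict(state.metrics)
    if route == ("PUT", "/simulation/workers"):    # set number of worker instances
        count = (body or {}).get("workers")
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            return 400, {"error": "workers must be a non-negative integer"}
        state.workers = count
        return 200, {"workers": state.workers}
    if route == ("DELETE", "/simulation"):         # stop the simulation
        state.running = False
        state.workers = 0
        return 200, {"running": False}
    return 404, {"error": "unknown route"}


demo = SimulationState(workers=2)
print(handle(demo, "PUT", "/simulation/workers", {"workers": 8}))
print(handle(demo, "GET", "/simulation"))
```

Reads use GET, the worker count is replaced with PUT, and stopping is modelled as DELETE on the simulation resource; other mappings would be equally valid.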
Dashboard
Figure: Dashboard Interface
Running the Simulation
[Overview of a Live System]
Conclusions
Objectives Achieved
Computing and Storage successfully migrated to the Cloud
(EC2 and S3).
Simulations were executed, showing that running the model in
the cloud is possible.
Development of a Central System (scheduler and
dashboard).
Obtained costs and metrics for the project.
Conclusions
What’s Next?
Migrate BOINC server.
More control/interaction with clients so the scheduler can be
improved (and give a full SaaS layer).
Costs: a "warm-up" stage to dynamically recalculate the price.
Thanks!
Used Icons Links
Iconset Windows 8 metro style: https://www.iconfinder.com/iconsets/windows-8-metro-style
Link: http://sta.sh/0228t4fyjyjb