SlideShare a Scribd company logo
1 of 16
Download to read offline
Baptiste Gerondeau, Takeharu Kato, Renato Golin
Arm Workshop, 26th July 2018
OpenHPC Automation with Ansible
Outline
● Linaro’s HPC-SIG Lab
● OpenHPC Ansible Automation
● Results
The HPC-SIG Lab
Goals:
● Cluster Automation & Validation
● Benchmarking & Performance Investigation
Requirements:
● Stability & Repeatability
● Close-to-production environment
● Upstream technology (reproducibility)
● Vendor isolation (hardware, results)
3
The HPC-SIG Lab
Network Layout
● Uplink subnet
○ Access/Provision
● BMC subnet
● MPI subnet
○ 100GB InfiniBand
○ Slave provision
● File System subnet
○ 100GB Ethernet
○ Lustre/Ceph (future)
4
The HPC-SIG Lab
● Stability & Repeatability
○ Critical external components cached locally
○ Strict migration plans (staging)
● Close-to-production environment
○ Hardware and firmware updated frequently
● Upstream technology (reproducibility)
○ All components open source / upstream
● Vendor isolation (hardware, results)
○ VPN, Provisioner, Jenkins, SSH control
5
Open Source and Upstream
Lab admin & tools:
● Jenkins: https://jenkins.io/
● Mr-Provisioner: https://github.com/Linaro/mr-provisioner
Lab Automation:
● https://github.com/Linaro/ans_setup_jenkins
● https://github.com/Linaro/mr-provisioner-role
● https://github.com/Linaro/mr-provisioner-kea-dhcp4-role
● https://github.com/Linaro/ansible-role-mr-provisioner
● https://github.com/Linaro/mr-provisioner-client
HPC-specific automation:
● https://github.com/Linaro/hpc_lab_setup
● https://github.com/Linaro/hpc_deploy_benchmarks
● https://github.com/Linaro/benchmark_harness
● https://github.com/Linaro/ansible-playbook-for-ohpc
6
Existing OpenHPC Automation
● A recipe with all rules described in the official documents
● LaTex snippets containing shell code
● Are converted and merged into a (OS-dependent) bash script
● Scripts in OHPC packages, installed after OpenHPC main package is installed
● Plus a input.local file, with some cluster-specific configuration / environment
7
Existing OpenHPC Automation
Shortcomings:
● The input.local file exports shell variables
● The recipe.sh is not idempotent
● Extensibility is impossible without editing the files
8
Ansible Playbooks
Ansible is a widely used automation tool which can describe the structure and
configuration of IT infrastructure with YAML “playbooks” and “roles”.
OpenHPC with Ansible:
● Ansible playbooks can more easily be made idempotent
● Ansible can manage nodes/tasks according to the the structure of the cluster
● Configuration is passed as a YAML file (no environment handling)
● Composition, using playbooks and roles, building on third-party content
9
Ansible OpenHPC Benefits
● Flexible cluster configuration
○ Fine grained / composable
○ Cluster wide / node group wide / node specific
● Works on both x86_64 and AArch64
○ Ansible gathers information about architecture
○ Same playbook runs on both
● OS is directly inferred by Ansible (gather_facts)
○ Ansible gathers information about OS
○ Yum, apt, zypper… can be switched in the roles’ logic
10
Ansible OpenHPC Recipe
The basic structure of the Ansible playbook
playbook
+---- group_vars/
+-- all.yml cluster wide configurations
+-- group1,group2 … node group(e.g., computing nodes, login, DFS) specific configurations
+---- host_vars/
+--- host1,host2 … host specific configurations
+---- roles/ package specific tasks, templates, config files, and config variables
+--- package-name/
+--- tasks/main.yml … YAML file to describe installation method of package-name
+--- vars/main.yml … package specific configuration variables
+--- templates/*.j2 … template files to generate configuration files
11
Ansible OpenHPC Tasks
12
Upstreaming our work
After multiple discussions, our proposal is:
● Generate from LaTex sources at the same time as recipe.sh
○ Each block/section as a separate role
○ Control OS/Arch via gather_facts/config
○ xCAT vs Warewulf, Slurm vs PBS: only add what hasn’t yet
● Push to a separate repository (openhpc/ansible-role ?)
○ Control release branch/tags as to not overwrite previous recipes
○ Repo can have additional playbooks (CI, new users)
○ Once release is validated, push to Ansible Galaxy
● Direct pull requests to playbooks / support roles
○ Only recipe roles are auto-generated
○ Support roles (specific to Ansible, playbook) are maintained on the repo
13
Results
Test Suite
Most tests green, however:
● Intel-specific tests (CILK, TBB, IMB) disabled
● Others need package install (PDF, CDF, HDF), but pass when installed
● TAU fails because LMod defaults to openmpi (needs openmpi3)
● Lustre fails as package depends on kernel 4.2 (which won’t work on our machines)
● MiniDFT and PRK had make failures, but we haven’t investigated yet
● --enable-longdoesn’t really, need to look into why not
The plan from now on is:
1. Automate package install conditional on enabled tests, fix remaining errors
2. Work with members to prioritise long term ones (like Lustre)
3. Use it for additional packages, so we can test them before sending upstream
4. Add a benchmark mode, making sure to use entire cluster
15
Thank You!

More Related Content

What's hot

OpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community UpdateOpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community UpdateStephen Gordon
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaLinaro
 
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaOpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaNETWAYS
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64Linaro
 
Python Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStackPython Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStackJames Dennis
 
OpenPOWER foundation update new executive director and bright open future_i...
OpenPOWER  foundation update  new executive director and bright open future_i...OpenPOWER  foundation update  new executive director and bright open future_i...
OpenPOWER foundation update new executive director and bright open future_i...Ganesan Narayanasamy
 
Openstack devops challenges
Openstack devops challenges Openstack devops challenges
Openstack devops challenges openstackindia
 
Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?Stephen Gordon
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014Sergey Lukjanov
 
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...Cloud Native Day Tel Aviv
 
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...Linaro
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopLinaro
 
Deploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at EaseDeploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at EaseMichelle Holley
 
LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLinaro
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migrationopenstackindia
 

What's hot (20)

OpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community UpdateOpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community Update
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaOpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
 
Python Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStackPython Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStack
 
OpenPOWER foundation update new executive director and bright open future_i...
OpenPOWER  foundation update  new executive director and bright open future_i...OpenPOWER  foundation update  new executive director and bright open future_i...
OpenPOWER foundation update new executive director and bright open future_i...
 
Openstack devops challenges
Openstack devops challenges Openstack devops challenges
Openstack devops challenges
 
TripleO
 TripleO TripleO
TripleO
 
Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?
 
Openstack ansible
Openstack ansibleOpenstack ansible
Openstack ansible
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014
 
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
 
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
 
Deploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at EaseDeploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at Ease
 
LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoS
 
HA in OpenStack service - meetup #9
HA in OpenStack service - meetup #9HA in OpenStack service - meetup #9
HA in OpenStack service - meetup #9
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migration
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
 

Similar to OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018

LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLinaro
 
Network Automation: Ansible 101
Network Automation: Ansible 101Network Automation: Ansible 101
Network Automation: Ansible 101APNIC
 
LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)Linaro
 
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard UniverityTechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard UniverityOpenNebula Project
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansibleOmid Vahdaty
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
PLNOG Automation@Brainly
PLNOG Automation@BrainlyPLNOG Automation@Brainly
PLNOG Automation@Brainlyvespian_256
 
PLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł RozlachPLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł RozlachPROIDEA
 
Linux Kernel Platform Development: Challenges and Insights
 Linux Kernel Platform Development: Challenges and Insights Linux Kernel Platform Development: Challenges and Insights
Linux Kernel Platform Development: Challenges and InsightsGlobalLogic Ukraine
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...VMware Tanzu
 
Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势Anthony Wong
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status ReportLiu Yuan
 
Lec 10-linux-review
Lec 10-linux-reviewLec 10-linux-review
Lec 10-linux-reviewabinaya m
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205Linaro
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansibleMukul Malhotra
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8AbdullahMunir32
 
Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014vespian_256
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 
#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to AnsibleCédric Delgehier
 
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...Linaro
 

Similar to OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018 (20)

LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMG
 
Network Automation: Ansible 101
Network Automation: Ansible 101Network Automation: Ansible 101
Network Automation: Ansible 101
 
LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)
 
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard UniverityTechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
PLNOG Automation@Brainly
PLNOG Automation@BrainlyPLNOG Automation@Brainly
PLNOG Automation@Brainly
 
PLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł RozlachPLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł Rozlach
 
Linux Kernel Platform Development: Challenges and Insights
 Linux Kernel Platform Development: Challenges and Insights Linux Kernel Platform Development: Challenges and Insights
Linux Kernel Platform Development: Challenges and Insights
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 
Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
 
Lec 10-linux-review
Lec 10-linux-reviewLec 10-linux-review
Lec 10-linux-review
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8
 
Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible
 
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraLinaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaLinaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteLinaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopLinaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineLinaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allLinaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorLinaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMULinaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MLinaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootLinaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramLinaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNLinaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...Linaro
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...Linaro
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionLinaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018

  • 1. Baptiste Gerondeau, Takeharu Kato, Renato Golin Arm Workshop, 26th July 2018 OpenHPC Automation with Ansible
  • 2. Outline ● Linaro’s HPC-SIG Lab ● OpenHPC Ansible Automation ● Results
  • 3. The HPC-SIG Lab Goals: ● Cluster Automation & Validation ● Benchmarking & Performance Investigation Requirements: ● Stability & Repeatability ● Close-to-production environment ● Upstream technology (reproducibility) ● Vendor isolation (hardware, results) 3
  • 4. The HPC-SIG Lab Network Layout ● Uplink subnet ○ Access/Provision ● BMC subnet ● MPI subnet ○ 100GB InfiniBand ○ Slave provision ● File System subnet ○ 100GB Ethernet ○ Lustre/Ceph (future) 4
  • 5. The HPC-SIG Lab ● Stability & Repeatability ○ Critical external components cached locally ○ Strict migration plans (staging) ● Close-to-production environment ○ Hardware and firmware updated frequently ● Upstream technology (reproducibility) ○ All components open source / upstream ● Vendor isolation (hardware, results) ○ VPN, Provisioner, Jenkins, SSH control 5
  • 6. Open Source and Upstream Lab admin & tools: ● Jenkins: https://jenkins.io/ ● Mr-Provisioner: https://github.com/Linaro/mr-provisioner Lab Automation: ● https://github.com/Linaro/ans_setup_jenkins ● https://github.com/Linaro/mr-provisioner-role ● https://github.com/Linaro/mr-provisioner-kea-dhcp4-role ● https://github.com/Linaro/ansible-role-mr-provisioner ● https://github.com/Linaro/mr-provisioner-client HPC-specific automation: ● https://github.com/Linaro/hpc_lab_setup ● https://github.com/Linaro/hpc_deploy_benchmarks ● https://github.com/Linaro/benchmark_harness ● https://github.com/Linaro/ansible-playbook-for-ohpc 6
  • 7. Existing OpenHPC Automation ● A recipe with all rules described in the official documents ● LaTex snippets containing shell code ● Are converted and merged into a (OS-dependent) bash script ● Scripts in OHPC packages, installed after OpenHPC main package is installed ● Plus a input.local file, with some cluster-specific configuration / environment 7
  • 8. Existing OpenHPC Automation Shortcomings: ● The input.local file exports shell variables ● The recipe.sh is not idempotent ● Extensibility is impossible without editing the files 8
  • 9. Ansible Playbooks Ansible is a widely used automation tool which can describe the structure and configuration of IT infrastructure with YAML “playbooks” and “roles”. OpenHPC with Ansible: ● Ansible playbooks can more easily be made idempotent ● Ansible can manage nodes/tasks according to the the structure of the cluster ● Configuration is passed as a YAML file (no environment handling) ● Composition, using playbooks and roles, building on third-party content 9
  • 10. Ansible OpenHPC Benefits ● Flexible cluster configuration ○ Fine grained / composable ○ Cluster wide / node group wide / node specific ● Works on both x86_64 and AArch64 ○ Ansible gathers information about architecture ○ Same playbook runs on both ● OS is directly inferred by Ansible (gather_facts) ○ Ansible gathers information about OS ○ Yum, apt, zypper… can be switched in the roles’ logic 10
  • 11. Ansible OpenHPC Recipe The basic structure of the Ansible playbook playbook +---- group_vars/ +-- all.yml cluster wide configurations +-- group1,group2 … node group(e.g., computing nodes, login, DFS) specific configurations +---- host_vars/ +--- host1,host2 … host specific configurations +---- roles/ package specific tasks, templates, config files, and config variables +--- package-name/ +--- tasks/main.yml … YAML file to describe installation method of package-name +--- vars/main.yml … package specific configuration variables +--- templates/*.j2 … template files to generate configuration files 11
  • 13. Upstreaming our work After multiple discussions, our proposal is: ● Generate from LaTex sources at the same time as recipe.sh ○ Each block/section as a separate role ○ Control OS/Arch via gather_facts/config ○ xCAT vs Warewulf, Slurm vs PBS: only add what hasn’t yet ● Push to a separate repository (openhpc/ansible-role ?) ○ Control release branch/tags as to not overwrite previous recipes ○ Repo can have additional playbooks (CI, new users) ○ Once release is validated, push to Ansible Galaxy ● Direct pull requests to playbooks / support roles ○ Only recipe roles are auto-generated ○ Support roles (specific to Ansible, playbook) are maintained on the repo 13
  • 15. Test Suite Most tests green, however: ● Intel-specific tests (CILK, TBB, IMB) disabled ● Others need package install (PDF, CDF, HDF), but pass when installed ● TAU fails because LMod defaults to openmpi (needs openmpi3) ● Lustre fails as package depends on kernel 4.2 (which won’t work on our machines) ● MiniDFT and PRK had make failures, but we haven’t investigated yet ● --enable-longdoesn’t really, need to look into why not The plan from now on is: 1. Automate package install conditional on enabled tests, fix remaining errors 2. Work with members to prioritise long term ones (like Lustre) 3. Use it for additional packages, so we can test them before sending upstream 4. Add a benchmark mode, making sure to use entire cluster 15