SlideShare a Scribd company logo
Baptiste Gerondeau, Takeharu Kato, Renato Golin
Arm Workshop, 26th July 2018
OpenHPC Automation with Ansible
Outline
● Linaro’s HPC-SIG Lab
● OpenHPC Ansible Automation
● Results
The HPC-SIG Lab
Goals:
● Cluster Automation & Validation
● Benchmarking & Performance Investigation
Requirements:
● Stability & Repeatability
● Close-to-production environment
● Upstream technology (reproducibility)
● Vendor isolation (hardware, results)
3
The HPC-SIG Lab
Network Layout
● Uplink subnet
○ Access/Provision
● BMC subnet
● MPI subnet
○ 100GB InfiniBand
○ Slave provision
● File System subnet
○ 100GB Ethernet
○ Lustre/Ceph (future)
4
The HPC-SIG Lab
● Stability & Repeatability
○ Critical external components cached locally
○ Strict migration plans (staging)
● Close-to-production environment
○ Hardware and firmware updated frequently
● Upstream technology (reproducibility)
○ All components open source / upstream
● Vendor isolation (hardware, results)
○ VPN, Provisioner, Jenkins, SSH control
5
Open Source and Upstream
Lab admin & tools:
● Jenkins: https://jenkins.io/
● Mr-Provisioner: https://github.com/Linaro/mr-provisioner
Lab Automation:
● https://github.com/Linaro/ans_setup_jenkins
● https://github.com/Linaro/mr-provisioner-role
● https://github.com/Linaro/mr-provisioner-kea-dhcp4-role
● https://github.com/Linaro/ansible-role-mr-provisioner
● https://github.com/Linaro/mr-provisioner-client
HPC-specific automation:
● https://github.com/Linaro/hpc_lab_setup
● https://github.com/Linaro/hpc_deploy_benchmarks
● https://github.com/Linaro/benchmark_harness
● https://github.com/Linaro/ansible-playbook-for-ohpc
6
Existing OpenHPC Automation
● A recipe with all rules described in the official documents
● LaTex snippets containing shell code
● Are converted and merged into a (OS-dependent) bash script
● Scripts in OHPC packages, installed after OpenHPC main package is installed
● Plus a input.local file, with some cluster-specific configuration / environment
7
Existing OpenHPC Automation
Shortcomings:
● The input.local file exports shell variables
● The recipe.sh is not idempotent
● Extensibility is impossible without editing the files
8
Ansible Playbooks
Ansible is a widely used automation tool which can describe the structure and
configuration of IT infrastructure with YAML “playbooks” and “roles”.
OpenHPC with Ansible:
● Ansible playbooks can more easily be made idempotent
● Ansible can manage nodes/tasks according to the the structure of the cluster
● Configuration is passed as a YAML file (no environment handling)
● Composition, using playbooks and roles, building on third-party content
9
Ansible OpenHPC Benefits
● Flexible cluster configuration
○ Fine grained / composable
○ Cluster wide / node group wide / node specific
● Works on both x86_64 and AArch64
○ Ansible gathers information about architecture
○ Same playbook runs on both
● OS is directly inferred by Ansible (gather_facts)
○ Ansible gathers information about OS
○ Yum, apt, zypper… can be switched in the roles’ logic
10
Ansible OpenHPC Recipe
The basic structure of the Ansible playbook
playbook
+---- group_vars/
+-- all.yml cluster wide configurations
+-- group1,group2 … node group(e.g., computing nodes, login, DFS) specific configurations
+---- host_vars/
+--- host1,host2 … host specific configurations
+---- roles/ package specific tasks, templates, config files, and config variables
+--- package-name/
+--- tasks/main.yml … YAML file to describe installation method of package-name
+--- vars/main.yml … package specific configuration variables
+--- templates/*.j2 … template files to generate configuration files
11
Ansible OpenHPC Tasks
12
Upstreaming our work
After multiple discussions, our proposal is:
● Generate from LaTex sources at the same time as recipe.sh
○ Each block/section as a separate role
○ Control OS/Arch via gather_facts/config
○ xCAT vs Warewulf, Slurm vs PBS: only add what hasn’t yet
● Push to a separate repository (openhpc/ansible-role ?)
○ Control release branch/tags as to not overwrite previous recipes
○ Repo can have additional playbooks (CI, new users)
○ Once release is validated, push to Ansible Galaxy
● Direct pull requests to playbooks / support roles
○ Only recipe roles are auto-generated
○ Support roles (specific to Ansible, playbook) are maintained on the repo
13
Results
Test Suite
Most tests green, however:
● Intel-specific tests (CILK, TBB, IMB) disabled
● Others need package install (PDF, CDF, HDF), but pass when installed
● TAU fails because LMod defaults to openmpi (needs openmpi3)
● Lustre fails as package depends on kernel 4.2 (which won’t work on our machines)
● MiniDFT and PRK had make failures, but we haven’t investigated yet
● --enable-longdoesn’t really, need to look into why not
The plan from now on is:
1. Automate package install conditional on enabled tests, fix remaining errors
2. Work with members to prioritise long term ones (like Lustre)
3. Use it for additional packages, so we can test them before sending upstream
4. Add a benchmark mode, making sure to use entire cluster
15
Thank You!

More Related Content

What's hot

OpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community UpdateOpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community Update
Stephen Gordon
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
George Markomanolis
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Linaro
 
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaOpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
NETWAYS
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
Linaro
 
Python Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStackPython Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStack
James Dennis
 
OpenPOWER foundation update new executive director and bright open future_i...
OpenPOWER  foundation update  new executive director and bright open future_i...OpenPOWER  foundation update  new executive director and bright open future_i...
OpenPOWER foundation update new executive director and bright open future_i...
Ganesan Narayanasamy
 
Openstack devops challenges
Openstack devops challenges Openstack devops challenges
Openstack devops challenges
openstackindia
 
TripleO
 TripleO TripleO
TripleO
Kiran Murari
 
Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?
Stephen Gordon
 
Openstack ansible
Openstack ansibleOpenstack ansible
Openstack ansible
George Paraskevas
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014
Sergey Lukjanov
 
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Cloud Native Day Tel Aviv
 
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
Linaro
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
Linaro
 
Deploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at EaseDeploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at Ease
Michelle Holley
 
LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoS
Linaro
 
HA in OpenStack service - meetup #9
HA in OpenStack service - meetup #9HA in OpenStack service - meetup #9
HA in OpenStack service - meetup #9
Vietnam Open Infrastructure User Group
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migration
openstackindia
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
Anil Bidari ( CEO , Cloud Enabled)
 

What's hot (20)

OpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community UpdateOpenStack Toronto: Juno Community Update
OpenStack Toronto: Juno Community Update
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmap
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph GaluschkaOpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
OpenNebula Conf 2014: CentOS, QA an OpenNebula - Christoph Galuschka
 
BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64BKK16-305B ILP32 Performance on AArch64
BKK16-305B ILP32 Performance on AArch64
 
Python Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStackPython Basics for Operators Troubleshooting OpenStack
Python Basics for Operators Troubleshooting OpenStack
 
OpenPOWER foundation update new executive director and bright open future_i...
OpenPOWER  foundation update  new executive director and bright open future_i...OpenPOWER  foundation update  new executive director and bright open future_i...
OpenPOWER foundation update new executive director and bright open future_i...
 
Openstack devops challenges
Openstack devops challenges Openstack devops challenges
Openstack devops challenges
 
TripleO
 TripleO TripleO
TripleO
 
Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?Dude, This Isn't Where I Parked My Instance?
Dude, This Isn't Where I Parked My Instance?
 
Openstack ansible
Openstack ansibleOpenstack ansible
Openstack ansible
 
OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014OpenStack Data Processing ("Sahara") project update - December 2014
OpenStack Data Processing ("Sahara") project update - December 2014
 
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
Andy McCrae, Rackspace - Using Ansible to Deploy and Automate OpenStack, Open...
 
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
LAS16-301: OpenStack on Aarch64, running in production, upstream improvements...
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
 
Deploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at EaseDeploying OpenDaylight and OpenStack at Ease
Deploying OpenDaylight and OpenStack at Ease
 
LAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoSLAS16-207: Bus scaling QoS
LAS16-207: Bus scaling QoS
 
HA in OpenStack service - meetup #9
HA in OpenStack service - meetup #9HA in OpenStack service - meetup #9
HA in OpenStack service - meetup #9
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migration
 
OpenStack Neutron behind the Scenes
OpenStack Neutron behind the ScenesOpenStack Neutron behind the Scenes
OpenStack Neutron behind the Scenes
 

Similar to OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018

LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMG
Linaro
 
Network Automation: Ansible 101
Network Automation: Ansible 101Network Automation: Ansible 101
Network Automation: Ansible 101
APNIC
 
LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)
Linaro
 
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard UniverityTechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
OpenNebula Project
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
Omid Vahdaty
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
Ceph Community
 
PLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł RozlachPLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł Rozlach
PROIDEA
 
PLNOG Automation@Brainly
PLNOG Automation@BrainlyPLNOG Automation@Brainly
PLNOG Automation@Brainly
vespian_256
 
Linux Kernel Platform Development: Challenges and Insights
 Linux Kernel Platform Development: Challenges and Insights Linux Kernel Platform Development: Challenges and Insights
Linux Kernel Platform Development: Challenges and Insights
GlobalLogic Ukraine
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
VMware Tanzu
 
Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势
Anthony Wong
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
Liu Yuan
 
Lec 10-linux-review
Lec 10-linux-reviewLec 10-linux-review
Lec 10-linux-review
abinaya m
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
Linaro
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
Mukul Malhotra
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8
AbdullahMunir32
 
Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014
vespian_256
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
Jaime Crespo
 
#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible
Cédric Delgehier
 
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
Linaro
 

Similar to OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018 (20)

LAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMGLAS16-209: Finished and Upcoming Projects in LMG
LAS16-209: Finished and Upcoming Projects in LMG
 
Network Automation: Ansible 101
Network Automation: Ansible 101Network Automation: Ansible 101
Network Automation: Ansible 101
 
LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)LAS16-400: Mini Conference 3 AOSP (Session 1)
LAS16-400: Mini Conference 3 AOSP (Session 1)
 
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard UniverityTechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
TechDay - Cambridge 2016 - OpenNebula at Harvard Univerity
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
PLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł RozlachPLNOG14: Automation at Brainly - Paweł Rozlach
PLNOG14: Automation at Brainly - Paweł Rozlach
 
PLNOG Automation@Brainly
PLNOG Automation@BrainlyPLNOG Automation@Brainly
PLNOG Automation@Brainly
 
Linux Kernel Platform Development: Challenges and Insights
 Linux Kernel Platform Development: Challenges and Insights Linux Kernel Platform Development: Challenges and Insights
Linux Kernel Platform Development: Challenges and Insights
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 
Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势Linux 开源操作系统发展新趋势
Linux 开源操作系统发展新趋势
 
Sheepdog Status Report
Sheepdog Status ReportSheepdog Status Report
Sheepdog Status Report
 
Lec 10-linux-review
Lec 10-linux-reviewLec 10-linux-review
Lec 10-linux-review
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
 
Introduction to ansible
Introduction to ansibleIntroduction to ansible
Introduction to ansible
 
Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8Parallel and Distributed Computing Chapter 8
Parallel and Distributed Computing Chapter 8
 
Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014Automation@Brainly - Polish Linux Autumn 2014
Automation@Brainly - Polish Linux Autumn 2014
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 
#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible#OktoCampus - Workshop : An introduction to Ansible
#OktoCampus - Workshop : An introduction to Ansible
 
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
LCE13: Test and Validation Mini-Summit: Review Current Linaro Engineering Pro...
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Linaro
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
Linaro
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
Linaro
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
Linaro
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
Linaro
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
Linaro
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
Linaro
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
Linaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
HKG18-500K1 - Keynote: Dileep Bhandarkar - Emerging Computing Trends in the D...
 
HKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready ProgramHKG18-317 - Arm Server Ready Program
HKG18-317 - Arm Server Ready Program
 
HKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NNHKG18-312 - CMSIS-NN
HKG18-312 - CMSIS-NN
 
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integra...
 
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
HKG18-300K2 - Keynote: Tomas Evensen - All Programmable SoCs? – Platforms to ...
 
HKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: IntroductionHKG18-212 - Trusted Firmware M: Introduction
HKG18-212 - Trusted Firmware M: Introduction
 

Recently uploaded

High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
BrainSell Technologies
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and CitiesThe Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
Arpan Buwa
 
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
Priyanka Aash
 
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
alexjohnson7307
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
SubhamMandal40
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
ankush9927
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
BrainSell Technologies
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
AmandaCheung15
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
alexjohnson7307
 
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
Priyanka Aash
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
Bhajan Mehta
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
shanihomely
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Nicolás Lopéz
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
Jimmy Lai
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
SAI KAILASH R
 
(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf
(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf
(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf
Priyanka Aash
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 

Recently uploaded (20)

High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and CitiesThe Impact of the Internet of Things (IoT) on Smart Homes and Cities
The Impact of the Internet of Things (IoT) on Smart Homes and Cities
 
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
 
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
leewayhertz.com-Generative AI tech stack Frameworks infrastructure models and...
 
Sonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdfSonkoloniya documentation - ONEprojukti.pdf
Sonkoloniya documentation - ONEprojukti.pdf
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 
Zaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdfZaitechno Handheld Raman Spectrometer.pdf
Zaitechno Handheld Raman Spectrometer.pdf
 
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
leewayhertz.com-AI agents for healthcare Applications benefits and implementa...
 
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
(CISOPlatform Summit & SACON 2024) Regulation & Response In Banks.pdf
 
Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17Mule Experience Hub and Release Channel with Java 17
Mule Experience Hub and Release Channel with Java 17
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
 
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python CodebaseEuroPython 2024 - Streamlining Testing in a Large Python Codebase
EuroPython 2024 - Streamlining Testing in a Large Python Codebase
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
 
(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf
(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf
(CISOPlatform Summit & SACON 2024) Cyber Insurance & Risk Quantification.pdf
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 

OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018

  • 1. Baptiste Gerondeau, Takeharu Kato, Renato Golin Arm Workshop, 26th July 2018 OpenHPC Automation with Ansible
  • 2. Outline ● Linaro’s HPC-SIG Lab ● OpenHPC Ansible Automation ● Results
  • 3. The HPC-SIG Lab Goals: ● Cluster Automation & Validation ● Benchmarking & Performance Investigation Requirements: ● Stability & Repeatability ● Close-to-production environment ● Upstream technology (reproducibility) ● Vendor isolation (hardware, results) 3
  • 4. The HPC-SIG Lab Network Layout ● Uplink subnet ○ Access/Provision ● BMC subnet ● MPI subnet ○ 100GB InfiniBand ○ Slave provision ● File System subnet ○ 100GB Ethernet ○ Lustre/Ceph (future) 4
  • 5. The HPC-SIG Lab ● Stability & Repeatability ○ Critical external components cached locally ○ Strict migration plans (staging) ● Close-to-production environment ○ Hardware and firmware updated frequently ● Upstream technology (reproducibility) ○ All components open source / upstream ● Vendor isolation (hardware, results) ○ VPN, Provisioner, Jenkins, SSH control 5
  • 6. Open Source and Upstream Lab admin & tools: ● Jenkins: https://jenkins.io/ ● Mr-Provisioner: https://github.com/Linaro/mr-provisioner Lab Automation: ● https://github.com/Linaro/ans_setup_jenkins ● https://github.com/Linaro/mr-provisioner-role ● https://github.com/Linaro/mr-provisioner-kea-dhcp4-role ● https://github.com/Linaro/ansible-role-mr-provisioner ● https://github.com/Linaro/mr-provisioner-client HPC-specific automation: ● https://github.com/Linaro/hpc_lab_setup ● https://github.com/Linaro/hpc_deploy_benchmarks ● https://github.com/Linaro/benchmark_harness ● https://github.com/Linaro/ansible-playbook-for-ohpc 6
  • 7. Existing OpenHPC Automation ● A recipe with all rules described in the official documents ● LaTex snippets containing shell code ● Are converted and merged into a (OS-dependent) bash script ● Scripts in OHPC packages, installed after OpenHPC main package is installed ● Plus a input.local file, with some cluster-specific configuration / environment 7
  • 8. Existing OpenHPC Automation Shortcomings: ● The input.local file exports shell variables ● The recipe.sh is not idempotent ● Extensibility is impossible without editing the files 8
  • 9. Ansible Playbooks Ansible is a widely used automation tool which can describe the structure and configuration of IT infrastructure with YAML “playbooks” and “roles”. OpenHPC with Ansible: ● Ansible playbooks can more easily be made idempotent ● Ansible can manage nodes/tasks according to the the structure of the cluster ● Configuration is passed as a YAML file (no environment handling) ● Composition, using playbooks and roles, building on third-party content 9
  • 10. Ansible OpenHPC Benefits ● Flexible cluster configuration ○ Fine grained / composable ○ Cluster wide / node group wide / node specific ● Works on both x86_64 and AArch64 ○ Ansible gathers information about architecture ○ Same playbook runs on both ● OS is directly inferred by Ansible (gather_facts) ○ Ansible gathers information about OS ○ Yum, apt, zypper… can be switched in the roles’ logic 10
  • 11. Ansible OpenHPC Recipe The basic structure of the Ansible playbook playbook +---- group_vars/ +-- all.yml cluster wide configurations +-- group1,group2 … node group(e.g., computing nodes, login, DFS) specific configurations +---- host_vars/ +--- host1,host2 … host specific configurations +---- roles/ package specific tasks, templates, config files, and config variables +--- package-name/ +--- tasks/main.yml … YAML file to describe installation method of package-name +--- vars/main.yml … package specific configuration variables +--- templates/*.j2 … template files to generate configuration files 11
  • 13. Upstreaming our work After multiple discussions, our proposal is: ● Generate from LaTex sources at the same time as recipe.sh ○ Each block/section as a separate role ○ Control OS/Arch via gather_facts/config ○ xCAT vs Warewulf, Slurm vs PBS: only add what hasn’t yet ● Push to a separate repository (openhpc/ansible-role ?) ○ Control release branch/tags as to not overwrite previous recipes ○ Repo can have additional playbooks (CI, new users) ○ Once release is validated, push to Ansible Galaxy ● Direct pull requests to playbooks / support roles ○ Only recipe roles are auto-generated ○ Support roles (specific to Ansible, playbook) are maintained on the repo 13
  • 15. Test Suite Most tests green, however: ● Intel-specific tests (CILK, TBB, IMB) disabled ● Others need package install (PDF, CDF, HDF), but pass when installed ● TAU fails because LMod defaults to openmpi (needs openmpi3) ● Lustre fails as package depends on kernel 4.2 (which won’t work on our machines) ● MiniDFT and PRK had make failures, but we haven’t investigated yet ● --enable-longdoesn’t really, need to look into why not The plan from now on is: 1. Automate package install conditional on enabled tests, fix remaining errors 2. Work with members to prioritise long term ones (like Lustre) 3. Use it for additional packages, so we can test them before sending upstream 4. Add a benchmark mode, making sure to use entire cluster 15