CPU Optimizations in the CERN Cloud - February 2016
2. CPU optimizations in the CERN Cloud
Ops Midcycle - High Performance Computing with OpenStack - Manchester, 2016
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
Arne Wiebalck
Tim Bell
Sean Crosby (Univ. of Melbourne)
Ulrich Schwickerath
6. OpenStack at CERN by numbers
~ 5500 Compute Nodes (~140k cores)
• ~ 5300 KVM
• ~ 200 Hyper-V
~ 2800 Images ( ~ 44 TB in use)
~ 2000 Volumes ( ~ 800 TB allocated)
~ 2200 Users
~ 2500 Projects
> 17000 VMs running
Chart: Number of VMs created (green) and VMs deleted (red) every 30 minutes
7. The “20% overhead” problem
• When running the batch system on top of the cloud infrastructure, we reach the limit on the total number of hosts that LSF can manage
• On our full-node batch VMs we noticed that the HS06 rating was ~20% lower than on the underlying host
• Smaller VMs behaved much better: ~8% overhead (sum of simultaneous HS06 runs on 4x 8-core VMs on a 32-core host)
8. HS06 on virtual batch workers
HWDB HS06   VM size (cores)   Per-VM HS06   Total HS06   Overhead
357±16      4x 8              82.3±11       329          7.8%
            2x 16             150±5         300          16%
            1x 32             284±11        284          20.4%
Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
9. Testing Optimizations – KSM off
• ATLAS T0 batch VMs show an IOwait of 20-30%
• Compute nodes started to swap, even when leaving 2 GB of memory for the OS
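For reference, KSM is controlled through the standard Linux sysfs interface; a minimal sketch of how it can be toggled on a KVM compute node (run as root):

  echo 0 > /sys/kernel/mm/ksm/run        # 0 = stop scanning (keep merged pages), 1 = run, 2 = stop and unmerge all pages
  cat /sys/kernel/mm/ksm/pages_shared    # number of host pages currently deduplicated by KSM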
10. Optimization by numbers – EPT off
Before (EPT on):
HWDB HS06   VM size (cores)   Per-VM HS06   Total HS06   Overhead
357±16      4x 8              82.3±11       329          7.8%
            2x 16             150±5         300          16%
            1x 32             284±11        284          20.4%

After (EPT off):
HWDB HS06   VM size (cores)   Per-VM HS06   Total HS06   Overhead   Overhead reduction
357±16      4x 8              87±11         348          2.5%       68%
            2x 16             163.5±1       327          8.4%       52%
            1x 32             311±1         311          12.9%      37%
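EPT is a parameter of the kvm_intel kernel module; a minimal sketch of how it can be switched off on a compute node (the config file name is illustrative, and reloading the module requires all VMs on the host to be shut down first):

  cat /sys/module/kvm_intel/parameters/ept            # 'Y' while EPT is enabled
  echo "options kvm_intel ept=0" > /etc/modprobe.d/kvm-intel.conf
  modprobe -r kvm_intel && modprobe kvm_intel         # reload the module to pick up the option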
11. General virtualization issue?
• Crosscheck w/ SLC6 VMs on Hyper-V
- 0.8% HS06 loss on 4x 8-core
- 3.3% HS06 loss on 1x 32-core SLC6 VM
• No general virtualization overhead issue!
- Rather a feature or configuration issue
• What’s the difference between the VMs on Hyper-V and KVM?
12. NUMA
• Hyper-V VMs have vCPUs pinned to physical NUMA nodes
- Pinned to CPU sets that correspond to the physical NUMA nodes
• Wider OpenStack support for this is available since Kilo (see the flavor sketch below)
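A minimal sketch of how this can be requested through Nova flavor extra specs with the nova CLI of that era (the flavor name is illustrative):

  nova flavor-key m1.batch.32core set hw:numa_nodes=2           # expose two guest NUMA nodes
  nova flavor-key m1.batch.32core set hw:cpu_policy=dedicated   # pin each vCPU to a dedicated host core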
13. NUMA - in the lab
… reduced the overhead to ~3% relative to bare metal
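One way such pinning can be tried by hand on a libvirt guest in the lab is sketched below (domain name and CPU/node numbers are illustrative):

  virsh vcpupin batch-test 0 0                           # pin vCPU 0 to host CPU 0 (repeat for each vCPU)
  virsh numatune batch-test --mode strict --nodeset 0    # keep the guest's memory on host NUMA node 0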
14. Deploying in production
• EPT off; KSM on; NUMA-aware
• System services add ~1-2% overhead
• We got a total overhead of ~5%
15. And then: extremely slow nodes...
• Small fraction of jobs 10x slower
- VMs look OK, actually pretty good
- Hosts: 30-50% system load, >100k IRQ/s (mostly TLB shoot-downs)
• Load attributed to qemu-kvm
- ‘perf top’: 90% in _raw_spin_lock
- ‘systemtap’: paging64_page_fault and kvm_mmu_pte* …
Charts: VM CPU utilization vs. compute node CPU utilization
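The tools mentioned above are the standard ones; a minimal sketch of the kind of command used to attribute the load (the process selection is illustrative):

  perf top -p "$(pgrep -d, qemu-kvm)"    # sample where the qemu-kvm processes spend their CPU time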
16. Back to the drawing board
• Needed to combine the optimizations with EPT on
• Huge pages as a way out?
- Idea: reduce the number of pages to be handled, increase the TLB hit ratio
• 1GB huge pages
- Best HS06 results (with EPT on)
• 2MB huge pages
- Also one of the default sizes
- Performance loss of around 5% compared to bare metal on batch VMs (configuration sketch below)
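A minimal sketch of the corresponding configuration, assuming 2MB pages reserved on the compute node at boot and the flavor extra spec available since Kilo (the hugepage count and flavor name are illustrative):

  # compute node kernel command line: reserve 2MB huge pages at boot
  default_hugepagesz=2M hugepagesz=2M hugepages=28000
  # flavor: back the guest memory with huge pages
  nova flavor-key m1.batch.32core set hw:mem_page_size=2MB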
17. Optimization by numbers
- NUMA + Pinning
- 2MB huge pages
- EPT on
- KSM on

VM size (cores)   Overhead before   Overhead after
4x 8              7.8%              3.3%
2x 16             16%               4.6%
1x 32             20.4%             3-6%
19. Summary
• Reduced the virtualization HS06 overhead to a few percent compared to bare metal
- On full-node VMs!
- NUMA + pinning + huge pages + EPT on + KSM on
• Pre-deployment testing is very difficult
- EPT-off side effects were initially undetected