This document summarizes Tim Bell's presentation on OpenStack at CERN. It discusses how CERN began adopting OpenStack in 2011 to manage its growing computing infrastructure needs for processing massive data sets from the Large Hadron Collider. OpenStack has since been scaled up to manage over 300,000 CPU cores and 500,000 physics jobs per day across CERN's private cloud. The document also briefly outlines CERN's use of other open source technologies like Ceph and Kubernetes.
10 Years of OpenStack at CERN - From 0 to 300k cores - Belmiro Moreira
CERN, the European Laboratory for Particle Physics, provides the infrastructure and resources to thousands of scientists all around the world to uncover the mysteries of the Universe. In the quest to build a private cloud infrastructure to support its users, CERN started evaluating the OpenStack project early, building several prototypes and engaging with the community. Finally, in 2013 CERN released its production cloud infrastructure using OpenStack. Since then we have moved from a few hundred cores to a multi-cell deployment spread between different regions. After 7 years of deploying and managing OpenStack in production at large scale, we now look back and discuss the challenges of building a massive scale infrastructure from 0 to +300K cores. In this talk we will dive into the history, architecture, tools and technical decisions behind the CERN Cloud Infrastructure over the years.
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale - November, 2014 - Belmiro Moreira
Multi-Cell OpenStack: How to Evolve Your Cloud to Scale
OpenStack Design Summit, Paris - November, 2014
Belmiro Moreira - CERN
Matt Van Winkle - Rackspace
Sam Morrison - NeCTAR, University of Melbourne
Containers on Baremetal and Preemptible VMs at CERN and SKA - Belmiro Moreira
CERN, the European Organization for Nuclear Research, and SKA, the Square Kilometre Array, are preparing the next generation of research infrastructure for new large-scale scientific instruments that will produce data of new magnitudes. At the Sydney OpenStack Summit we presented the collaboration and the platform that we plan to develop for scaling science.
In this talk we will present the work done on Preemptible VMs and Containers on Baremetal.
Preemptible VMs are instances that use idle allocated resources in the infrastructure and can be terminated when this capacity is required. Containers on baremetal eliminate the virtualization overhead, enabling the full container performance required for scientific workloads.
We will present the current state, development and integration decisions and how these functionalities can be used in a common OpenStack infrastructure.
CERN, the European Organization for Nuclear Research, has for several years been running a large OpenStack cloud that helps thousands of scientists to analyze the data from the LHC.
In 2012, early in the design phase of the CERN Cloud, we decided to use Nova Cells to enable the infrastructure to scale to thousands of nodes. Now, with more than 280K cores spread across 70 cells hosted in two data centres, we were faced with the challenge of migrating to Nova Cells V2, required in the Pike release.
In this presentation, we will describe how Nova Cells allowed CERN to scale to thousands of nodes, its advantages, and how we mitigated the implementation issues of Nova Cells V1. Next, we will cover how we upgraded Nova from Newton with Cells V1 to Pike with Cells V2. We will explain the steps that we followed and the issues that we faced during the upgrade. Finally, we will report our experience with Cells V2 at scale, its caveats and how we are mitigating them.
What can I expect to learn?
This presentation describes how CERN migrated from Cells V1 to Cells V2 when upgrading from the Newton to the Pike release.
You will learn the procedures followed by CERN in order to migrate from Cells V1 to Cells V2 in a large production environment.
The issues found during the upgrade and how we mitigated them will be discussed.
Also, we will present how Cells V2 behaves in a large-scale deployment with several thousand nodes in 70 cells.
CERN is the European Centre for Particle Physics based in Geneva. The home of the Large Hadron Collider and the birthplace of the world wide web is expanding its computing resources with a second data centre to process over 35PB/year from one of the largest scientific experiments ever constructed.
Within the constraints of fixed budget and manpower, agile computing techniques and common open source tools are being adopted to support over 11,000 physicists in their search for how the universe works and what it is made of.
By challenging special requirements and understanding how other large computing infrastructures are built, we have deployed a 50,000-core cloud-based infrastructure built on tools such as Puppet, OpenStack and Kibana.
In moving to a cloud model, this has also required close examination of the IT processes and culture. Finding the right approach between Enterprise and DevOps techniques has been one of the greatest challenges of this transformation.
This talk will cover the requirements, tools selected, results achieved so far and the outlook for the future.
Learning to Scale Openstack: A Case Study in Rackspace's Open Cloud Deployment was presented at the OpenStack Design Summit in Portland, OR on April 17, 2013. Watch the recording of the presentation on YouTube at the following link: http://www.youtube.com/watch?v=3x8X6f5mnzc
Demonstrating a Pre-Exascale, Cost-Effective Multi-Cloud Environment for Scie... - Igor Sfiligoi
Presented at PEARC20.
This talk presents expanding IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode from the three major cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, integrating over one EFLOP32-hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind the cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise.
Slides from our Q3 meetup held in Montreal on September 27th 2017 at the Cloud.ca Center.
Video recording can be seen at: https://www.youtube.com/watch?v=_1btwHW39ms&list=PLSsQodeQD6LPyqrvvczcC5mkOOnPt469o
Overlay Opportunistic Clouds in CMS/ATLAS at CERN: The CMSooooooCloud in Detail - Jose Antonio Coarasa Perez
Overlay opportunistic clouds in CMS/ATLAS at CERN: The CMSooooooCloud in detail
The CMS and ATLAS online clusters consist of more than 3000 computers each. They have been exclusively used for the data acquisition that led to the Higgs particle discovery, handling 100Gbytes/s data flows and archiving 20Tbytes of data per day.
An OpenStack cloud layer has been deployed on the newest part of the clusters (totalling 1300 hypervisors and more than 13000 cores in CMS alone) as a minimal overlay so as to leave the primary role of the computers untouched while allowing an opportunistic usage of the cluster.
This presentation will show how to share resources with a minimal impact on the existing infrastructure. We will present the architectural choices made to deploy an unusual, as opposed to dedicated, "overlaid cloud infrastructure". These architectural choices ensured a minimal impact on the running cluster configuration while giving a maximal segregation of the overlaid virtual computer infrastructure. The use of Open vSwitch to avoid changes on the network infrastructure and encapsulate the virtual machines' traffic will be illustrated, as well as the networking configuration adopted due to the nature of our private network. The design and performance of the OpenStack cloud controlling layer will be presented. We will also show the integration carried out to allow the cluster to be used in an opportunistic way while giving full control to the CMS online run control.
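For a flavor of the overlay technique described above, here is a minimal sketch of encapsulating VM traffic with Open vSwitch over a GRE tunnel between two hypervisors; the bridge name, port name and peer address are illustrative assumptions, not the CMS configuration:
$ ovs-vsctl add-br br-int    # integration bridge carrying the VM traffic
$ ovs-vsctl add-port br-int gre0 -- set interface gre0 type=gre options:remote_ip=192.0.2.2    # tunnel VM traffic to the peer hypervisor without touching the physical network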
Overview of what has happened in HNSciCloud over the last five months, delivered by Helge Meinhard of CERN at the HEPiX Workshop on October 21st, 2016, in Berkeley, California, USA.
Burst data retrieval after 50k GPU Cloud run - Igor Sfiligoi
We ran a 50k-GPU multi-cloud simulation to support IceCube science. This talk provided an overview of what happened to the associated data.
Presented at the Internet2 booth at SC19.
For IceCube, a large amount of photon propagation simulation is needed to properly calibrate the natural ice. The simulation is compute intensive and ideal for GPU compute. This cloud run was more data intensive than previous ones, producing 130 TB of output data. To keep egress costs in check, we created dedicated network links via the Internet2 Cloud Connect Service.
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud) - Jose Antonio Coarasa Perez
The CMS online cluster consists of more than 3000 computers. It has been exclusively used for the Data Acquisition of the CMS experiment at CERN, archiving around 20Tbytes of data per day.
An OpenStack cloud layer has been deployed on part of the cluster (totalling more than 13000 cores) as a minimal overlay so as to leave the primary role of the computers untouched while allowing an opportunistic usage of the cluster. This allows running offline computing jobs on the online infrastructure while it is not (fully) used.
We will present the architectural choices made to deploy an unusual, as opposed to dedicated, "overlaid cloud infrastructure". These architectural choices ensured a minimal impact on the running cluster configuration while giving a maximal segregation of the overlaid virtual computer infrastructure. Open vSwitch was chosen during the proof-of-concept phase in order to avoid changes on the network infrastructure. Its use will be illustrated as well as the final networking configuration used. The design and performance of the OpenStack cloud controlling layer will also be presented, together with new developments and experience from the first year of usage.
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic... - Databricks
The physicists at CERN are increasingly turning to Spark to process large physics datasets in a distributed fashion, with the aim of reducing time-to-physics with increased interactivity. The physics data itself is stored in CERN's mass storage system, EOS, and CERN's IT department runs an on-premises private cloud based on OpenStack as a way to provide on-demand compute resources to physicists. This presents both opportunities and challenges to the Big Data team at CERN in providing elastic, scalable, reliable Spark-as-a-service on OpenStack.
The talk focuses on the design choices made and the challenges faced while developing Spark-as-a-service over Kubernetes on OpenStack to simplify provisioning, automate management, and minimize the operating burden of managing Spark clusters. In addition, the service tooling simplifies submitting applications on behalf of users, mounting user-specified ConfigMaps, copying application logs to S3 buckets for troubleshooting, performance analysis and accounting of Spark applications, and support for stateful Spark streaming applications. We will also share results from running large-scale sustained workloads over terabytes of physics data.
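For context, this is roughly what submitting a Spark application to Kubernetes looks like with upstream spark-submit; a minimal sketch, where the API server address, container image and application jar are placeholder assumptions rather than CERN's actual endpoints:
$ spark-submit \
    --master k8s://https://<k8s-api-server>:6443 \
    --deploy-mode cluster \
    --name physics-analysis \
    --conf spark.executor.instances=10 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples.jar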
CERN, the European Organization for Nuclear Research, is one of the world's largest centres for scientific research. Its business is fundamental physics, finding out what the universe is made of and how it works. At CERN, accelerators such as the 27km Large Hadron Collider are used to study the basic constituents of matter. This talk reviews the challenges to record and analyse the 25 Petabytes/year produced by the experiments and the investigations into how OpenStack could help to deliver a more agile computing infrastructure.
CERN and Huawei, one of the companies that is working with T-Systems to develop a prototype for HNSciCloud, in the context of CERN openlab, will jointly work on improvements to OpenStack for running large scale scientific workloads.
This collaboration was announced at the CERN openlab open day on 21st September 2017.
Over 90% of CERN’s compute resources are delivered using OpenStack. OpenStack provides software for private and public clouds, and CERN and Huawei are among the major contributors to the open source project (Huawei is a platinum member of the OpenStack Foundation).
With the needs of LHC computing in future years, efficient and flexible delivery of compute resources will be key. That's why CERN and Huawei have joined forces to jointly work on improvements to OpenStack for running large scale scientific workloads.
The developments will be done within the OpenStack community following the standard open source processes.
Focus areas will be:
Flexible resource management
Quotas
Bare metal allocation
Compute cells
Changes, resulting from this activity, will then be included into the CERN private cloud and Huawei’s private and public cloud offerings.
This is a presentation by Prof. Anne Elster at the International Workshop on Open Source Supercomputing held in conjunction with the 2017 ISC High Performance Computing Conference.
How HPC and large-scale data analytics are transforming experimental science - inside-BigData.com
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science.
"Debbie Bard leads the Data Science Engagement Group NERSC. NERSC is the mission supercomputing center for the USA Department of Energy, and supports over 7000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and has worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech... - Databricks
In this session, you will learn how CERN easily applied end-to-end deep learning and analytics pipelines on Apache Spark at scale for High Energy Physics using BigDL and Analytics Zoo open source software running on Intel Xeon-based distributed clusters.
Technical details and development learnings will be shared using an example of topology classification to improve real-time event selection at the Large Hadron Collider experiments. The classifier has demonstrated very good performance figures for efficiency, while also reducing the false positive rate compared to the existing methods. It could be used as a filter to improve the online event selection infrastructure of the LHC experiments, where one could benefit from a more flexible and inclusive selection strategy while reducing the amount of downstream resources wasted in processing false positives.
This is part of CERN’s research on applying Deep Learning and Analytics using open source and industry standard technologies as an alternative to the existing customized rule based methods. We show how we could quickly build and implement distributed deep learning solutions and data pipelines at scale on Apache Spark using Analytics Zoo and BigDL, which are open source frameworks unifying Analytics and AI on Spark with easy to use APIs and development interfaces seamlessly integrated with Big Data Platforms.
HNSciCloud shared with the IT experts in scientific computing of the HEPiX forum the status of the ongoing Pre-Commercial Procurement of innovative cloud services and what the expected results might be.
High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biom... - Larry Smarr
11.04.06
Joint Presentation
UCSD School of Medicine Research Council
Larry Smarr, Calit2 & Phil Papadopoulos, SDSC/Calit2
Title: High Performance Cyberinfrastructure Enabling Data-Driven Science in the Biomedical Sciences
Academic research institutions are at a precipice. They have historically been constrained to supporting classic “job” style workloads. With the growth of new workflow practices such as streaming data, science gateways, and more “dynamic” research using lambda-like functions, they must now support a variety of workloads.
In this talk, Lindsey and Bob will discuss some difficulties faced by academic institutions and how Kubernetes offers an extensible solution to support the future of research. They will present a selection of projects currently benefiting from Kubernetes enabled tools, like Argo, Kubeflow, and kube-batch. These workflows will be demonstrated using specific examples from two large research institutions: Compute Canada, Canada’s national computation research consortium and the University of Michigan, one of the largest public Universities in the United States.
KubeCon EU 2019
A “meta‑cloud” for building clouds
Build your own cloud on our hardware resources
Agnostic to specific cloud software
Run existing cloud software stacks (like OpenStack, Hadoop, etc.)
... or new ones built from the ground up
Control and visibility all the way to the bare metal
“Sliceable” for multiple, isolated experiments at once
Review of CERN's objectives and how the computing infrastructure is evolving to address the challenges at scale using community supported software such as Puppet and OpenStack.
2. Grappling with Massive Data Sets
Gavin McCance, CERN IT - Digital Energy 2018, 1 May 2018 | Aberdeen
OpenStack at CERN: A 5 year perspective
Tim Bell - tim.bell@cern.ch - @noggin143
OpenStack Days Budapest 2018
3. About Me - @noggin143
• Responsible for Compute and Monitoring at CERN
• Elected member of the OpenStack Foundation board
• Member of the OpenStack user committee from 2013-2015
4. CERN: a worldwide collaboration
CERN’s primary mission: SCIENCE
Fundamental research on particle physics, pushing the boundaries of knowledge and technology
6. Evolution of the Universe
Test the Standard Model? What’s matter made of? What holds it together? Anti-matter? (Gravity?)
7. The Large Hadron Collider: LHC
27km ring, 1232 dipole magnets, 15 metres and 35t EACH
Image credit: CERN
9. LHC: Highest Vacuum
Vacuum? Yes.
104 km of PIPES at 10^-11 mbar (~ the Moon)
Image credit: CERN
10. ATLAS, CMS, ALICE and LHCb
HEAVIER than the EIFFEL TOWER
Image credit: CERN
11. 40 million pictures per second, 1PB/s
Image credit: CERN
12. Data Flow to Storage and Processing (Run 2 → CERN DC)
ALICE: 4GB/s
ATLAS: 1GB/s
CMS: 600MB/s
LHCb: 750MB/s
13. CERN Data Centre: Primary Copy of LHC Data
Data Centre on Google Street View
90k disks, 15k servers, > 200 PB on TAPES
Image credit: CERN
14. WLCG: LHC Computing Grid
About WLCG:
• A community of 10,000 physicists
• ~250,000 jobs running concurrently
• 600,000 processing cores
• 700 PB storage available worldwide
• 20-40 Gbit/s connect CERN to the Tier-1s
Tier-0 (CERN): initial data reconstruction, data recording & archiving, data distribution to rest of world
Tier-1s (14 centres worldwide): permanent storage, re-processing, Monte Carlo simulation, end-user analysis
Tier-2s (>150 centres worldwide): Monte Carlo simulation, end-user analysis
170 sites WORLDWIDE, > 10,000 users
Image credit: CERN
15. CERN in 2017
230 PB on tape, 550 million files
55 PB produced in 2017
16. CERN Data Centre: Private OpenStack Cloud
More than 300,000 cores
More than 500,000 physics jobs per day
17. Infrastructure in 2011
• Data centre managed by home-grown toolset: Quattor, Lemon, …
• Initial development funded by EU projects
• Development environment based on CVS
• 100K or so lines of Perl
• At the limit for power and cooling in Geneva
• No simple expansion options
21. 2011 - First OpenStack summit talk
https://www.slideshare.net/noggin143/cern-user-story
22. The Agile Infrastructure Project
2012, a turning point for CERN IT:
- LHC computing and data requirements were increasing … Moore’s law would help, but not enough
- EU-funded projects for the fabric management toolset ended
- Staff numbers fixed, but resources must grow
- LS1 (2013) ahead, next window only in 2019!
- Other deployments had surpassed CERN’s
Three core areas:
- Centralized monitoring
- Config’ management
- IaaS based on OpenStack
“All servers shall be virtual!”
31. OpenStack Magnum
An OpenStack API service that allows creation of container clusters
● Use your Keystone credentials
● You choose your cluster type
● Multi-tenancy
● Quickly create new clusters with advanced features such as multi-master
32. OpenStack Magnum: single command cluster creation
$ openstack coe cluster create --cluster-template kubernetes --node-count 100 … mycluster
$ openstack coe cluster list
+------+----------------+------------+--------------+-----------------+
| uuid | name           | node_count | master_count | status          |
+------+----------------+------------+--------------+-----------------+
| .... | mycluster      | 100        | 1            | CREATE_COMPLETE |
+------+----------------+------------+--------------+-----------------+
$ $(magnum cluster-config mycluster --dir mycluster)
$ kubectl get pod
$ openstack coe cluster update mycluster replace node_count=200
33. Why Bare-Metal Provisioning?
• VMs not sensible/suitable for all of our use cases
- Storage and database nodes, HPC clusters, bootstrapping, critical network equipment or specialised network setups, precise/repeatable benchmarking for s/w frameworks, …
• Complete our service offerings
- Physical nodes (in addition to VMs and containers)
- OpenStack UI as the single pane of glass
• Simplify hardware provisioning workflows (see the sketch after this slide)
- For users: openstack server create/delete
- For procurement & h/w provisioning team: initial on-boarding, server re-assignments
• Consolidate accounting & bookkeeping
- Resource accounting input will come from fewer sources
- Machine re-assignments will be easier to track
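As a minimal sketch of that single pane of glass: with Ironic behind Nova, a physical node is requested and released with the same commands as a VM; the flavor, image and key names below are illustrative assumptions:
$ openstack server create --flavor baremetal --image centos7 --key-name mykey mynode    # lands on a physical machine instead of a hypervisor
$ openstack server delete mynode    # returns the node to the pool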
34. Compute Intensive Workloads on VMs
• Up to 20% performance loss on very large VMs!
• “Tuning”: KSM*, EPT**, pinning, … → 10%
• Compare with Hyper-V: no issue
• NUMA-aware flavors & node pinning … <3%!
• Cross-over: patches from Telecom
(*) Kernel Shared Memory
(**) Extended Page Tables
VM config   Before   After
4x 8        7.8%
2x 16       16%
1x 24       20%      5%
1x 32       20%      3%
35. A new use case: Containers on Bare-Metal
• OpenStack managed containers and bare metal, so put them together
• General service offer: managed clusters (see the sketch after this slide)
- Users get only K8s credentials
- Cloud team manages the cluster and the underlying infra
• Batch farm runs in VMs as well
- Evaluating federated Kubernetes for hybrid cloud integration
- Federation of 7 clouds demonstrated at KubeCon
- OpenStack and non-OpenStack clouds transparently managed
Integration: seamless! (based on a specific template)
Monitoring (metrics/logs)? Pod in the cluster; logs: fluentd + ES; metrics: cadvisor + influx
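A minimal sketch of how such a managed cluster might be requested through Magnum, assuming a bare-metal cluster template has been published (the template name kubernetes-baremetal is hypothetical):
$ openstack coe cluster create --cluster-template kubernetes-baremetal --node-count 10 k8s-bm    # users then receive only the K8s credentials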
36. Hardware Burn-in in the CERN Data Centre (1)
• h/w purchases: formal procedure compliant with public procurements
- Market survey identifies potential bidders
- Tender spec is sent to ask for offers
- Larger deliveries 1-2 times / year
• “Burn-in” before acceptance (“bathtub curve”)
- Compliance with technical spec (e.g. performance)
- Find failed components (e.g. broken RAM)
- Find systematic errors (e.g. bad firmware)
- Provoke early failures by stress
Whole process can take weeks!
37. Hardware Burn-in in the CERN Data Centre (2)
• Initial checks: Serial Asset Tag and BIOS settings
- Purchase order ID and unique serial no. to be set in the BMC (node name!)
• “Burn-in” tests (see the sketch after this slide)
- CPU: burnK7, burnP6, burnMMX (cooling)
- RAM: memtest; Disk: badblocks
- Network: iperf(3) between pairs of nodes, with automatic node pairing
- Benchmarking: HEPSpec06 (& fio), a derivative of SPEC06; we buy total compute capacity (not newest processors)
$ ipmitool fru print 0 | tail -2
Product Serial : 245410-1
Product Asset Tag : CD5792984
$ openstack baremetal node show CD5792984-245410-1
“Double peak” structure in the benchmark results due to slower hardware threads (OpenAccess paper)
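The burn-in tests named above map onto standard tools; a minimal sketch of running two of the checks by hand, where the device path and peer hostname are placeholders:
$ badblocks -sv /dev/sda    # disk: read-only scan for bad blocks, showing progress
$ iperf3 -s    # network: server side on the paired node
$ iperf3 -c <peer-node>    # network: client side, measures throughput within the pair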
39. Network Migration
Phase 1. Nova Network + Linux Bridge (already running, but still used in 2018)
Phase 2. Neutron + Linux Bridge (new region coming in 2018)
Phase 3. SDN: Tungsten Fabric (testing)
40. Spectre / Meltdown
In January, a security vulnerability was disclosed, requiring a new kernel everywhere
Campaign over two weeks from 15th January: 7 reboot days, 7 tidy-up days, by availability zone
Benefits
- Automation now in place to reboot the cloud if needed: 33,000 VMs on 9,000 hypervisors
- Latest QEMU and RBD user code on all VMs
Downside
- Discovered a kernel bug in XFS which may mean we have to do it again soon
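A rough sketch of how one hypervisor in such a campaign could be cycled with standard Nova operations; the hostname is a placeholder and this is an illustration, not CERN's actual automation:
$ openstack compute service set --disable --disable-reason "kernel upgrade" <hypervisor> nova-compute    # stop scheduling new VMs here
$ openstack server list --all-projects --host <hypervisor>    # VMs that the reboot will touch
(reboot the hypervisor into the patched kernel)
$ openstack compute service set --enable <hypervisor> nova-compute    # return it to the pool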
41. Community Experience
Open source collaboration sets the model for in-house teams
External recognition by the community is highly rewarding for contributors
Reviewing and being reviewed is a constant learning experience
Productive for the job market for staff
Working groups, like the Scientific and Large Deployment teams, discuss a wide range of topics
Effective knowledge transfer mechanisms consistent with the CERN mission
110 outreach talks since 2011
Dojos at CERN bring good attendance: Ceph, CentOS, Elastic, OpenStack CH, …
42. HL-LHC: More collisions!
Increased complexity due to much higher pile-up and higher trigger rates will bring several challenges to reconstruction algorithms
CMS had to cope with monster pile-up: 8b4e bunch structure → pile-up of ~60 events/x-ing (instead of ~20 events/x-ing)
CMS: event from 2017 with 78 reconstructed vertices
ATLAS: simulation for HL-LHC with 200 vertices
43. [Timeline figure: First run, LS1, Second run, LS2, Third run, LS3, HL-LHC Run 4, spanning 2009 to ~2030]
Raw data volume increases significantly for High Luminosity LHC (2026)
Significant part of cost comes from global operations
Even with a technology increase of ~15%/year, we still have a big gap if we keep trying to do things with our current compute models
45. Development areas going forward
Spot Market
Cells V2
Neutron scaling
Magnum rolling upgrades
Block Storage Performance
Federated Kubernetes
Collaborations with Industry and SKA
46. Summary
OpenStack has provided flexible infrastructure at CERN since 2013
The open infrastructure toolchain has been stable at scale
Clouds are part, but not all, of the solution
Open source collaborations have been fruitful for CERN, industry and the communities
Further efforts will be needed to ensure that physics is not limited by the computing resources available
47. Thanks for all your help .. Some links
CERN OpenStack blog at http://openstack-in-production.blogspot.com
Recent CERN OpenStack talks at the Vancouver summit at https://www.openstack.org/videos/search?search=cern
CERN Tools at https://github.com/cernops
49. Hardware Evolution
Looking at new hardware platforms to reduce the upcoming resource gap
Explorations have been made in low cost and low power ARM processors
Interesting R&Ds in high performance hardware:
- GPUs for deep learning network training and fast simulation
- FPGAs for neural network inference and data transformations
Significant algorithm changes needed to benefit from the potential
Editor's Notes
Reference: Fabiola’s talk @ Univ of Geneva
https://www.unige.ch/public/actualites/2017/le-boson-de-higgs-et-notre-vie/
European Organization for Nuclear Research
Founded in 1954, today 22 member states
World's largest particle physics laboratory
~2,300 staff, 13k users on site
Budget ~1,000 MCHF
Mission:
Answer fundamental questions on the universe
Advance the technology frontiers
Train the scientists of tomorrow
Bring nations together
https://communications.web.cern.ch/fr/node/84
For all this fundamental research, CERN provides different facilities to scientists, for example the LHC.
It's a ring 27 km in circumference that crosses 2 countries, 100 m underground; it accelerates 2 particle beams to near the speed of light and makes them collide at 4 different points, where detectors observe the fireworks.
2,500 people employed by CERN, >10k users on site
Talk about the LHC here, describe the experiments, Lake Geneva, Mont Blanc, and then jump in
The big ring is the LHC, the small one is the SPS; the computer centre is not far away.
Pushing the boundary of technology
CERN facilitates research: we just run the accelerators; the experiments are done by institutes, member states, universities
On the Franco-Swiss border, very close to Geneva
Our flagship program is the LHC
Trillions of protons race around the 27km ring in opposite directions over 11,000 times a second, travelling at 99.9999991 per cent of the speed of light.
Largest machine on Earth
With an operating temperature of about -271 degrees Celsius, just 1.9 degrees above absolute zero, the LHC is one of the coldest places in the universe
120 t of helium; only at that temperature is there no electrical resistance
https://home.cern/about/engineering/vacuum-empty-interstellar-space
Inside, the beams travel in a very high vacuum, comparable to the vacuum of the Moon; there are actually 2 proton beams, going in 2 directions, and the vacuum avoids the protons interacting with other particles
The detectors are very advanced beasts, 4 of them; ATLAS and CMS are the most well known, general purpose, testing Standard Model properties; in those detectors the Higgs particle was discovered in 2012
In the picture you can see physicists. ALICE and LHCb
To sample and record the debris from up to 600 million proton collisions per second, scientists are building gargantuan devices that measure particles with micron precision.
100 Mpixel camera, 40 million pictures per second
https://www.ethz.ch/en/news-and-events/eth-news/news/2017/03/new-heart-for-cerns-cms.html
https://home.cern/about/computing/processing-what-record
First run: about 5GB/s
Size of the fibres from the pits to the DC?
What do we do with all this data? First we store it; the analysis is done offline and can go on for years.
A tiered system where Tier-0 is CERN: data is recorded, reconstructed and distributed.
All these detectors will generate loads of data… about 1 PB (petabyte = a million gigabytes) per… SECOND!
Impossible to store so much data. Anyway, not needed.
The events the experiments are trying to create and observe are very rare.
That's why we make so many collisions but keep only the interesting ones.
Therefore next to each detector is a «trigger», a kind of filter made of various layers (first electronics, then computers) which selects and keeps only 1 collision out of a million on average.
In the end we still generate dozens of petabytes of data each year. We need about 200'000 computer CPUs to analyze this data.
As CERN has only about 100'000 CPUs, we share the data over more than 100 computer centres across the planet (usually located in the physics institutes participating in the LHC collaboration). This is the Computing Grid, a gigantic planetary computer and hard drive!
Biggest scientific Grid project in the world
~170 computer centres (sites)
1 Tier-0 (distributed in two locations)
14 bigger centres (Tier-1)
~160 Tier-2
42 countries
10,000 users
Running since Oct 2008
3 million jobs per day
~600,000 cores
300 PB data
Do you want to contribute?
http://lhcathome.web.cern.ch/
Optimized the usage of resources and computing (~2012: private cloud based on OpenStack), focusing on virtualization and scaling options.