Towards a self automated CERN Cloud
José Castro León
CERN Cloud Infrastructure
Who am I? CERN Cloud Team
Outline
● Introduction
● CERN Cloud service
● Automation status
● Upcoming challenges
● Improvement plan
● Source code
European Organization for Nuclear Research
● World's largest particle physics laboratory
● Founded in 1954
● 22 member states
● Fundamental research in physics
CERN Cloud Service
● Infrastructure as a Service
● Production since July 2013
● CentOS 7 based
● Geneva and Wigner computer centres
● Highly scalable architecture: > 70 nova cells
● Currently running the Rocky release
CERN Cloud Infrastructure – initial offering
IaaS
● Compute: nova
● Storage: glance
● Identity: keystone
● Web UI: horizon
CERN Cloud Infrastructure - now
IaaS
● Compute: nova
● Storage: cinder, glance
● Identity: keystone
● Web UI: horizon
● Network: neutron
● Also shown: ironic, manila
IaaS+
● Orchestration: heat
● Key manager: barbican
● Container Orchestration: magnum
● Automation: mistral
Automation in the CERN Cloud
[Diagram: mistral, HR, Resources, cornerstone, collectd, grafana, GNI]
Back in 2012
[Chart: Run 1 to Run 4; series GRID, ATLAS, CMS, LHCb, ALICE; axis from 0 to 160]
● LHC computing and data requirements were increasing
● Constant team size
● Improve manageability and efficiency
● Automation
  – Considered early on
  – Exercise it as much as possible
Situation now
● 300k-core cloud and increasing
  – Addition of new services
  – Continuous improvements on existing ones
● No change in number of staff
● Automation is key
  – Keep service knowledge
  – Offload common tasks
  – Simplify management
Automation in the CERN Cloud @today
● Resource lifecycle management
● Host and service monitoring
● Optimize resource availability
● Improve VM availability and performance
Host and Service Monitoring
● Monitor HW events with Collectd
● Collect service logs through Flume
● General Notification Infrastructure
  – Support tickets for repairs
● Service alarms in Grafana
● Rundeck jobs
  – Time-scheduled jobs to fix common issues
  – Offload ticket handling
  – Schedule interventions
Rundeck: Task delegation
● Rely on Rundeck for offloading tasks to different teams
  – Procurement
  – Repair Team
  – Resource Coordinator
  – Cloud Service operations
● Example: disk replacement
[Diagram: collectd, GNI, Repair Team]
Resource Lifecycle Management
● Types of projects
● Provisioning and cleanup in Mistral workflows
  – Service inter-dependencies

Project type   Affiliation Expired   User Disabled   User Deletion
Shared         Promote               -               -
Personal       -                     Stop            Delete
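
The policy table above can be read as a lookup from project type and owner event to an action. The following is a minimal sketch of that reading, not the CERN workflow code; the enum and function names are hypothetical, and only the three actions come from the slide.

```python
from enum import Enum


class ProjectType(Enum):
    SHARED = "shared"
    PERSONAL = "personal"


class Event(Enum):
    AFFILIATION_EXPIRED = "affiliation_expired"
    USER_DISABLED = "user_disabled"
    USER_DELETION = "user_deletion"


# Entries copied from the slide's table; a missing entry ("-") means no action.
POLICY = {
    (ProjectType.SHARED, Event.AFFILIATION_EXPIRED): "promote",
    (ProjectType.PERSONAL, Event.USER_DISABLED): "stop",
    (ProjectType.PERSONAL, Event.USER_DELETION): "delete",
}


def lifecycle_action(project_type, event):
    """Return the action for a project type and owner event, or None."""
    return POLICY.get((project_type, event))


if __name__ == "__main__":
    print(lifecycle_action(ProjectType.PERSONAL, Event.USER_DISABLED))
```

Keeping the policy as data rather than branching logic makes it easy to extend when a new project type or event appears.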
Resource Lifecycle Management in detail
● Set of interconnected workbooks to manage
  – Projects
  – Services
[Diagram: mistral project_delete workflow with keystone.project_get, service_delete and keystone.project_delete; service_delete covers magnum, barbican, heat, nova, cinder, manila, s3, glance, neutron]
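
One way to picture how service inter-dependencies affect project cleanup is to order the per-service cleanup with a topological sort. This is only a sketch: the dependency map below is hypothetical (the slides do not state the actual ordering), and `service_delete` and `keystone_project_delete` are placeholder callables standing in for the workflow steps named in the diagram. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each service lists the services whose resources it uses.
# These edges are illustrative and are not taken from the CERN workbooks.
DEPENDS_ON = {
    "magnum": {"heat"},
    "heat": {"nova", "cinder", "neutron"},
    "nova": {"glance", "neutron", "cinder"},
    "manila": set(),
    "barbican": set(),
    "s3": set(),
    "cinder": set(),
    "glance": set(),
    "neutron": set(),
}


def deletion_order(depends_on):
    """Order services so that consumers are cleaned before what they use."""
    # static_order() yields dependencies first, so reverse it for deletion.
    return list(reversed(list(TopologicalSorter(depends_on).static_order())))


def project_delete(project_id, service_delete, keystone_project_delete):
    """Clean every service for the project, then remove the project itself."""
    for service in deletion_order(DEPENDS_ON):
        service_delete(service, project_id)
    keystone_project_delete(project_id)


if __name__ == "__main__":
    project_delete(
        "demo-project",
        lambda service, project: print(f"cleaning {service} in {project}"),
        lambda project: print(f"deleting project {project}"),
    )
```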
Resource Lifecycle Management for end user
[Diagram: mistral]
Optimize resource availability - Expiration
● Each VM in a personal project has an expiration date
● Set shortly after creation and evaluated daily
● Configured to 180 days and renewable
● Reminder mails starting 30 days before expiration
● Implemented as a workbook in Mistral
[Diagram: instance states ACTIVE and EXPIRED; events Reminder, Expiration, Deletion]
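
The date arithmetic behind these rules can be sketched as follows. The 180-day lifetime and the 30-day reminder window come from the slide; the function names are hypothetical, and the renewal rule (a fresh 180 days counted from the renewal date) is an assumption, since the slide only says the expiration is renewable.

```python
from datetime import date, timedelta

LIFETIME = timedelta(days=180)        # from the slide
REMINDER_WINDOW = timedelta(days=30)  # from the slide


def initial_expiration(created_on):
    """Expiration date set shortly after the instance is created."""
    return created_on + LIFETIME


def renew(today):
    """Assumption: renewing grants a fresh lifetime starting today."""
    return today + LIFETIME


def daily_state(expire_at, today):
    """Classify an instance during the daily evaluation."""
    if today >= expire_at:
        return "expired"
    if today >= expire_at - REMINDER_WINDOW:
        return "reminder"
    return "active"


if __name__ == "__main__":
    expire_at = initial_expiration(date(2018, 1, 1))
    print(expire_at, daily_state(expire_at, date(2018, 6, 15)))
```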
Expiration of Personal Instances
Expiration workbook in detail
● Based on project expiration tag and expire_at instance attribute
● daily_expiration_global: retrieve_projects, daily.project_expiration
● daily_expiration_project: retrieve_instances, daily.instance_expiration
● daily_expiration_instance: check_status, check_expiration, fix_expiration, process_expiration (reminder, expire, delete)
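
The nesting of the three workflows (global, per project, per instance) can be sketched in plain Python. This is an illustration under stated assumptions rather than the Mistral workbook itself: the dictionary layout, the tag name `expiration`, and the meaning given to `fix_expiration` (stamping a missing `expire_at`) are all assumptions. The delete step is left out because the slides do not say how long an expired instance is kept.

```python
from datetime import date, timedelta


def daily_expiration_instance(instance, today):
    """Per-instance step: fix a missing attribute or pick an action."""
    expire_at = instance.get("expire_at")
    if expire_at is None:
        # fix_expiration (assumed meaning): stamp the missing attribute.
        instance["expire_at"] = instance["created"] + timedelta(days=180)
        return "fix"
    if today >= expire_at:
        return "expire"
    if today >= expire_at - timedelta(days=30):
        return "reminder"
    return "none"


def daily_expiration_project(project, today):
    """Per-project step: walk the instances of one project."""
    return [
        (instance["name"], daily_expiration_instance(instance, today))
        for instance in project["instances"]
    ]


def daily_expiration_global(projects, today):
    """Top-level step: visit only projects carrying the expiration tag."""
    actions = []
    for project in projects:
        if "expiration" in project["tags"]:  # tag name is an assumption
            actions.extend(daily_expiration_project(project, today))
    return actions


if __name__ == "__main__":
    sample = [{
        "tags": ["expiration"],
        "instances": [{"name": "vm-a", "created": date(2018, 1, 1)}],
    }]
    print(daily_expiration_global(sample, date(2018, 1, 2)))
```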
Improve VM availability and performance
● Hyperconverged servers
  – Compute + Storage Nodes
  – Local Ceph pool
    ● Instances
    ● Volumes
  – Ease management
  – Small IO latency
  – Increased Disk capacity
  – Use cases:
    ● DB and Storage services
Automation in the CERN Cloud @next
● Add new services
● Root Cause Analysis
● Kubernetes Jobs
● Further improve availability and performance
Continuous addition of new services
● Project management workbooks are prepared to be extended
● Latest addition is the S3 service through RadosGW
● Uses AdminOps API for quota operations
  – python-radosgw-admin
  – python-mistral-radosgw-actions
● Modify workflows accordingly
disable_user:
  join: all
  action: radosgw.user_update
  input:
    uid: <% $.id %>
    suspended: true
    access_key: <% $.access_key %>
    secret_key: <% $.secret_key %>
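
For orientation, the suspension performed by this task roughly corresponds to a "modify user" call on the RadosGW Admin Ops API. The sketch below only builds the request line and is not the code of python-radosgw-admin or mistral-radosgw-actions. It assumes the default `/admin` entry point, uses a placeholder endpoint, and omits the S3-style request signing with admin credentials that a real call needs.

```python
from urllib.parse import urlencode


def suspend_user_request(endpoint, uid):
    """Build (method, url) for suspending a RadosGW user via Admin Ops.

    Assumes the default "/admin" entry point. Authentication is not
    handled here; a real request must be signed with admin keys.
    """
    query = urlencode({"uid": uid, "suspended": "true", "format": "json"})
    return "PUT", f"{endpoint.rstrip('/')}/admin/user?{query}"


if __name__ == "__main__":
    # "s3.example.org" is a placeholder endpoint, not a CERN host.
    print(suspend_user_request("https://s3.example.org", "alice"))
```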
Root Cause Analysis
● Find root cause of issues
  – Degradation of response of an application
    ● CPU issue? Kernel degradation?
● Improve alarms with scope
  – Automatically list impacted services
● Find hidden service dependencies
● Trigger automatic resolutions
  – Run healing workflows
[Diagram: mistral, collectd, vitrage, cloud]
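
Listing impacted services amounts to walking a dependency graph upwards from the failed resource. The sketch below shows that idea with a breadth-first search; it does not use Vitrage, and the graph contents and names are hypothetical.

```python
from collections import deque

# Hypothetical "depends on" edges; a real graph would come from the
# monitoring and topology data, not from a hard-coded dictionary.
DEPENDS_ON = {
    "web-app": ["database", "s3"],
    "database": ["vm-1"],
    "s3": ["ceph"],
    "vm-1": ["hypervisor-7"],
    "ceph": [],
    "hypervisor-7": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the edges so we can walk from a resource to its consumers.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for consumer in dependents.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


if __name__ == "__main__":
    print(impacted_by("hypervisor-7", DEPENDS_ON))
```

An alarm on a hypervisor could then carry the resulting list as its scope.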
Kubernetes jobs
● Moving towards running the control plane in Kubernetes
  – Based on Helm charts
  – Healing operations added as jobs
● All automated tasks in Rundeck can be “dockerized”
● Rundeck now interfaces with Kubernetes
● Start moving tasks into jobs
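
As an illustration of what moving a task into a job could look like, the sketch below builds a Kubernetes `batch/v1` Job manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The job name, image, and command are hypothetical placeholders, not CERN artifacts.

```python
import json


def healing_job(name, image, command):
    """Return a batch/v1 Job manifest that runs one containerized task."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # retry a failed task at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image and command, used only for illustration.
    manifest = healing_job(
        "restart-agent",
        "registry.example.org/cloud/healing:latest",
        ["/usr/local/bin/heal", "--service", "agent"],
    )
    print(json.dumps(manifest, indent=2))
```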
Get even more performance
● Hyperconverged servers
  – Fixed CPU allocation for protecting IO operations
● Dynamically adjust CPU usage in the setup
  – Keeping free resources for IO
  – Avoid impact on compute
  – Automatic live-migration
[Diagram: watcher]
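
The idea of keeping CPU headroom for IO and live-migrating VMs when compute load eats into it can be sketched as a simple selection rule. This is not a Watcher strategy; the function, its parameters, and the "busiest first" heuristic are assumptions chosen for illustration.

```python
def vms_to_migrate(total_cores, io_reserved_cores, vm_cpu_usage):
    """Pick VMs to live-migrate so compute load fits beside the IO reserve.

    vm_cpu_usage maps a VM name to the number of cores it currently uses.
    """
    budget = total_cores - io_reserved_cores
    used = sum(vm_cpu_usage.values())
    selected = []
    # Move the busiest VMs first so that fewer migrations are needed.
    for name, cores in sorted(
        vm_cpu_usage.items(), key=lambda item: item[1], reverse=True
    ):
        if used <= budget:
            break
        selected.append(name)
        used -= cores
    return selected


if __name__ == "__main__":
    print(vms_to_migrate(32, 8, {"vm-a": 10, "vm-b": 12, "vm-c": 6}))
```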
Improve Cloud utilization
● Interested in preemptibles: Preemptible Instances at CERN on Thursday Nov 15th, 1:40pm, Hall A3
[Diagram: user VMs and preemptible (pre) VMs; aardvark]
Improve Cloud utilization
● Dynamic allocation of preemptible instances
[Diagram: user VMs and preemptible (pre) VMs; watcher, aardvark]
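
The principle behind preemptible instances is that they occupy spare capacity and give it back when a regular user VM needs it. The sketch below shows one possible reclaim rule (smallest preemptibles first). It is an assumption made for illustration and does not describe how aardvark actually selects instances.

```python
def reclaim_for_user_vm(free_cores, requested_cores, preemptibles):
    """Choose preemptible instances to terminate for an incoming user VM.

    preemptibles maps an instance name to its core count. Returns the
    list of instances to terminate, or None if the request cannot fit
    even after reclaiming every preemptible instance.
    """
    if requested_cores <= free_cores:
        return []
    victims = []
    # Smallest first, so that as little preemptible work as possible is lost.
    for name, cores in sorted(preemptibles.items(), key=lambda item: item[1]):
        victims.append(name)
        free_cores += cores
        if free_cores >= requested_cores:
            return victims
    return None


if __name__ == "__main__":
    print(reclaim_for_user_vm(2, 8, {"pre-1": 4, "pre-2": 2, "pre-3": 8}))
```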
#talk is cheap
show me the code
Here are the links
● https://gitlab.cern.ch/cloud-infrastructure/
  – cinder, horizon, ironic, keystone, mistral, neutron and nova
  – mistral-workflows
  – mistral-radosgw-actions (python-radosgw-admin)
  – hzrequestspanel
  – cci-scripts
  – cci-tools
Thank you
gitlab.cern.ch/cloud-infrastructure
openstack-in-production.blogspot.ch
jose.castro.leon@cern.ch
@josecastroleon
BACKUP SLIDES
