SlideShare a Scribd company logo
1 sysadmin vs 250 clusters
Etienne Menguy
SysadminDays
November 19, 2019
OVHcloud
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
2
1 500 000 customers
2200 employees
380 000 Bare-metal servers
Ceph at OVHcloud
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
3
Public Cloud
Virtual
machines
Additional
disks
Additional
disks
Additional
disks
Additional
disks
Cloud Disk Array
As A
Service
Evolution
„2015
• 4 dev
• 1 ops
• 8 clusters
• 4 regions
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
4
„2019
• 9 dev
• 250 clusters
• 10 regions
Daily work
„1 sysadmin
• Monitoring
• Prodding
• Support
• Training
• Deploying regions, servers
• And the daily surprises
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
5
8 devs
• Ceph as a service
• Infra as code
• Code review
• Tests
• R&D
Ceph setup
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
6
FlashcacheFlashcacheFlashcache
LXC
Data
LXC
Data
LXC
Data
NVME
Partition
Partition
Partition
x12
HDD
HDD
x12
HDD
Flashcache
LXC
Data
Bare-metal server
40Gbps NIC
Ceph as a service
„Autonomous users
• Creating cluster
• Managing users, pools, rights
• Managing network
• Cluster growth
„Backup management
• 500TB/day
• Ceph -> Swift
• Ceph -> Ceph
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
7
„Managing our infrastructure
• Cluster upgrade
• Deploy new ceph versions
• Manage tasks
• Host management
• Network management
• Containers management
Infrastructure
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
8
Serveurs
Conteneurs
VM
Instances
BDD
Puppet
API
Python
API
OVH
RabbitMQ
Celery
Task management
„ RabbitMQ
„ Celery
• https://github.com/ovh/celery-dyrygent
• Complex workflow
• Reliable
• Monitoring
• Web interface
• Planned tasks
• NVME replacement
• Self healing
• Triggered by monitoring probe
• Executes any operation
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
9
Example
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
10
start
Check
operation
safety
Lower disk
weight
Wait
cluster_health_ok
Remove disk
from cluster
Yes
No
Weight
equals 0
Continuous delivery
„CDS
• https://github.com/ovh/cds
„Each pull request
• Lint
• Unit test
„Daily prodding
• All tests executed
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
11
Infra as code
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
12
Inconsistent hardware
„Hardware profile
• 12 profils on production
• CPU
• NVME
• HDD
„Firmwares
„Ceph versions
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
13
• Generic tools
• 1 profile = 1 cluster
Monitoring
„ Automatic downtimes by tasks
„ Some alarms on working hours
„ Services/hosts aggregation
„ 143 000 services
„ 25 000 hosts
„ 3 infrastructures
• 6 masters
• 12 satellites
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
14
Metrics
„ Clusters metrics
• Usage
• Latency
„ Hardware
• Cpu, mermory usage
• Cache hit ratio
„ Service
• KPI
• Usage per openstack region
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
15
„ Metrics Data Platform
• https://www.ovh.com/fr/data-platforms/metrics/
„ 13 Millions series
„ 13 Billions points per day
„ Performance
• IO/s
• Latency
Logs
„ Infrastructure
• OS
• Ceph
„ Applications
• CAAS
• Celery / RabbitMQ
• Uniq step/task ID
„ API
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
16
„ Logs Data Platform
• https://www.ovh.com/fr/data-
platforms/logs/
„ 15 000 logs/second
„ Graylog
„ Filebeat
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
p ag e 17
Conclusion
D at e
F o o t er can b e p er so n alized as
fo llo w : In ser t / H ead er an d fo o t er
p ag e 18
Questions?

More Related Content

What's hot

VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
Ceph Community
 
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebula Project
 
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebula Project
 
Introducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud DatabasesIntroducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud Databases
OVHcloud
 
TDS-16489U - Dual Processor
TDS-16489U - Dual ProcessorTDS-16489U - Dual Processor
TDS-16489U - Dual Processor
Fernando Barrientos
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
Ceph Community
 
Developing a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsDeveloping a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure Environments
Ceph Community
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
Ceph Community
 
Disk health prediction for Ceph
Disk health prediction for CephDisk health prediction for Ceph
Disk health prediction for Ceph
Ceph Community
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed Switch
VMworld
 
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Community
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migration
Ramkumar Nottath
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
OpenStack Korea Community
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Community
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual Controller
Fernando Barrientos
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Danielle Womboldt
 
Ceph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFSCeph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFS
Ceph Community
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Community
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Community
 

What's hot (20)

VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers VMworld 2013: How SRP Delivers More Than Power to Their Customers
VMworld 2013: How SRP Delivers More Than Power to Their Customers
 
2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph2016-JAN-28 -- High Performance Production Databases on Ceph
2016-JAN-28 -- High Performance Production Databases on Ceph
 
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
OpenNebulaConf 2016 - Budgeting: the Ugly Duckling of Cloud computing? by Mat...
 
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
OpenNebulaConf2018 - Is Hyperconverged Infrastructure what you need? - Boyan ...
 
Introducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud DatabasesIntroducing OVHcloud Enterprise Cloud Databases
Introducing OVHcloud Enterprise Cloud Databases
 
TDS-16489U - Dual Processor
TDS-16489U - Dual ProcessorTDS-16489U - Dual Processor
TDS-16489U - Dual Processor
 
iSCSI Target Support for Ceph
iSCSI Target Support for Ceph iSCSI Target Support for Ceph
iSCSI Target Support for Ceph
 
Developing a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure EnvironmentsDeveloping a Ceph Appliance for Secure Environments
Developing a Ceph Appliance for Secure Environments
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 
Disk health prediction for Ceph
Disk health prediction for CephDisk health prediction for Ceph
Disk health prediction for Ceph
 
VMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed SwitchVMworld 2014: vSphere Distributed Switch
VMworld 2014: vSphere Distributed Switch
 
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
Ceph Day Melbourne - Scale and performance: Servicing the Fabric and the Work...
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migration
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 
Enterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual ControllerEnterprise Storage NAS - Dual Controller
Enterprise Storage NAS - Dual Controller
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
 
Ceph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFSCeph Day San Jose - HA NAS with CephFS
Ceph Day San Jose - HA NAS with CephFS
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
 

Similar to 1 sysadmin vs 250 clusters de stockage

Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph Community
 
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
Insight Technology, Inc.
 
Online spanish meetup #2
Online spanish meetup #2Online spanish meetup #2
Online spanish meetup #2
Alexandra N. Martinez
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage service
ShapeBlue
 
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
donaghmccabe
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
Evan McGee
 
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftRed Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Jeremy Eder
 
Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11
ShapeBlue
 
Whats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlinesWhats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlines
ShapeBlue
 
KACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewKACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting Overview
Dell World
 
The 12 Factor App
The 12 Factor AppThe 12 Factor App
The 12 Factor App
rudiyardley
 
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltStack
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
mffiedler
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
Tokyo Azure Meetup
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStack
ShapeBlue
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
Wei Ting Chen
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
InfluxData
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Tibo Beijen
 
Lattice yapc-slideshare
Lattice yapc-slideshareLattice yapc-slideshare
Lattice yapc-slideshare
Gwenn Etourneau
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
C4Media
 

Similar to 1 sysadmin vs 250 clusters de stockage (20)

Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
[db tech showcase Tokyo 2015] C16:Oracle Disaster Recovery at New Zealand sto...
 
Online spanish meetup #2
Online spanish meetup #2Online spanish meetup #2
Online spanish meetup #2
 
CloudStack usage service
CloudStack usage serviceCloudStack usage service
CloudStack usage service
 
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
Openstack Summit Vancouver 2015 - Maintaining and Operating Swift at Public C...
 
FreeSWITCH as a Microservice
FreeSWITCH as a MicroserviceFreeSWITCH as a Microservice
FreeSWITCH as a Microservice
 
Red Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShiftRed Hat Summit 2018 5 New High Performance Features in OpenShift
Red Hat Summit 2018 5 New High Performance Features in OpenShift
 
Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11Paul Angus - what's new in ACS 4.11
Paul Angus - what's new in ACS 4.11
 
Whats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlinesWhats new in Cloudstack 4.11 - behind the headlines
Whats new in Cloudstack 4.11 - behind the headlines
 
KACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting OverviewKACE Agent Architecture and Troubleshooting Overview
KACE Agent Architecture and Troubleshooting Overview
 
The 12 Factor App
The 12 Factor AppThe 12 Factor App
The 12 Factor App
 
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
SaltConf14 - Eric johnson, Google - Orchestrating Google Compute Engine with ...
 
Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209Testing kubernetes and_open_shift_at_scale_20170209
Testing kubernetes and_open_shift_at_scale_20170209
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
 
Using the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStackUsing the KVMhypervisor in CloudStack
Using the KVMhypervisor in CloudStack
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
Wayfair Storefront Performance Monitoring with InfluxEnterprise by Richard La...
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
Lattice yapc-slideshare
Lattice yapc-slideshareLattice yapc-slideshare
Lattice yapc-slideshare
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 

More from OVHcloud

OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
OVHcloud
 
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud
 
Webinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud DatabasesWebinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud Databases
OVHcloud
 
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud
 
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud
 
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud
 
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilitéOVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud
 
OVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML servingOVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML serving
OVHcloud
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
OVHcloud
 
Les APIs OpenStack
Les APIs OpenStackLes APIs OpenStack
Les APIs OpenStack
OVHcloud
 
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
OVHcloud
 
Industrialize Machine Learning
Industrialize Machine Learning Industrialize Machine Learning
Industrialize Machine Learning
OVHcloud
 
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
OVHcloud
 
Online passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattackOnline passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattack
OVHcloud
 
OVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sectorOVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sector
OVHcloud
 
The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?
OVHcloud
 
Maximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructureMaximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructure
OVHcloud
 
One year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloudOne year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloud
OVHcloud
 
Rethinking the Public Cloud user experience
Rethinking the Public Cloud user experienceRethinking the Public Cloud user experience
Rethinking the Public Cloud user experience
OVHcloud
 

More from OVHcloud (20)

OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
 
Webinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud DatabasesWebinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud Databases
 
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
 
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
 
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
 
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilitéOVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
 
OVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML servingOVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML serving
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Les APIs OpenStack
Les APIs OpenStackLes APIs OpenStack
Les APIs OpenStack
 
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
 
Industrialize Machine Learning
Industrialize Machine Learning Industrialize Machine Learning
Industrialize Machine Learning
 
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
 
Online passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattackOnline passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattack
 
OVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sectorOVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sector
 
The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?
 
Maximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructureMaximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructure
 
One year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloudOne year later… Revisiting the GDPR and what it means for the cloud
One year later… Revisiting the GDPR and what it means for the cloud
 
Rethinking the Public Cloud user experience
Rethinking the Public Cloud user experienceRethinking the Public Cloud user experience
Rethinking the Public Cloud user experience
 

Recently uploaded

"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 

Recently uploaded (20)

"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 

1 sysadmin vs 250 clusters de stockage

  • 1. 1 sysadmin vs 250 clusters Etienne Menguy SysadminDays November 19, 2019
  • 2. OVHcloud D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 2 1 500 000 customers 2200 employees 380 000 Bare-metal servers
  • 3. Ceph at OVHcloud D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 3 Public Cloud Virtual machines Additional disks Additional disks Additional disks Additional disks Cloud Disk Array As A Service
  • 4. Evolution „2015 • 4 dev • 1 ops • 8 clusters • 4 regions D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 4 „2019 • 9 dev • 250 clusters • 10 regions
  • 5. Daily work „1 sysadmin • Monitoring • Prodding • Support • Training • Deploying regions, servers • And the daily surprises D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 5 8 devs • Ceph as a service • Infra as code • Code review • Tests • R&D
  • 6. Ceph setup D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 6 FlashcacheFlashcacheFlashcache LXC Data LXC Data LXC Data NVME Partition Partition Partition x12 HDD HDD x12 HDD Flashcache LXC Data Bare-metal server 40Gbps NIC
  • 7. Ceph as a service „Autonomous users • Creating cluster • Managing users, pools, rights • Managing network • Cluster growth „Backup management • 500TB/day • Ceph -> Swift • Ceph -> Ceph D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 7 „Managing our infrastructure • Cluster upgrade • Deploy new ceph versions • Manage tasks • Host management • Network management • Containers management
  • 8. Infrastructure D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 8 Serveurs Conteneurs VM Instances BDD Puppet API Python API OVH RabbitMQ Celery
  • 9. Task management „ RabbitMQ „ Celery • https://github.com/ovh/celery-dyrygent • Complex workflow • Reliable • Monitoring • Web interface • Planned tasks • NVME replacement • Self healing • Triggered by monitoring probe • Executes any operation D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 9
  • 10. Example D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 10 start Check operation safety Lower disk weight Wait cluster_health_ok Remove disk from cluster Yes No Weight equals 0
  • 11. Continuous delivery „CDS • https://github.com/ovh/cds „Each pull request • Lint • Unit test „Daily prodding • All tests executed D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 11
  • 12. Infra as code D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 12
  • 13. Inconsistent hardware „Hardware profile • 12 profils on production • CPU • NVME • HDD „Firmwares „Ceph versions D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 13 • Generic tools • 1 profile = 1 cluster
  • 14. Monitoring „ Automatic downtimes by tasks „ Some alarms on working hours „ Services/hosts aggregation „ 143 000 services „ 25 000 hosts „ 3 infrastructures • 6 masters • 12 satellites D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 14
  • 15. Metrics „ Clusters metrics • Usage • Latency „ Hardware • Cpu, mermory usage • Cache hit ratio „ Service • KPI • Usage per openstack region D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 15 „ Metrics Data Platform • https://www.ovh.com/fr/data-platforms/metrics/ „ 13 Millions series „ 13 Billions points per day „ Performance • IO/s • Latency
  • 16. Logs „ Infrastructure • OS • Ceph „ Applications • CAAS • Celery / RabbitMQ • Uniq step/task ID „ API D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er 16 „ Logs Data Platform • https://www.ovh.com/fr/data- platforms/logs/ „ 15 000 logs/second „ Graylog „ Filebeat
  • 17. D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er p ag e 17 Conclusion
  • 18. D at e F o o t er can b e p er so n alized as fo llo w : In ser t / H ead er an d fo o t er p ag e 18 Questions?