Ceph at DigitalOcean
1
Alex Marangone - Sr Engineer
Agenda
○ What’s DigitalOcean
○ Ceph at DigitalOcean
○ Operating Ceph
○ Ceph gripes
○ Questions
2
3
What’s DigitalOcean
A brief history
DigitalOcean
○ Cloud provider founded in 2012
○ Core concept: simplicity
○ First product: Droplets (i.e. VMs)
○ Main attraction was SSD-based VMs
○ Second product: Volumes (Block) in 2016
○ Many more products have been added since, including Spaces (Object)
○ Datacenters in 8 regions
○ IPO in 2021
4
5
Ceph at DigitalOcean
Ceph at DigitalOcean
○ Block and Object storage are backed by Ceph
○ Image backups/BYOI aren’t yet, but we’re looking into it
○ 37 production clusters on Ceph Nautilus
○ 1 on Ceph Luminous
○ >54 PB of raw Ceph storage
○ ~21,500 OSDs
○ Block storage is all flash
○ Object storage is HDD/Flash
○ Index nodes are flash
6
Why Ceph
● Scalability
○ Scales horizontally
○ And vertically
● Reliability
○ Self-healing
○ Strongly consistent
● Performance
○ Generally acceptable performance
○ Performance scales roughly linearly with disk/node count
● Works with off-the-shelf components
● Usable for multiple features/products
7
8
Operating Ceph
Operating Ceph
○ Our team should scale independently of the number of clusters we run; the goal is to automate as much as possible
○ Ceph operations are mostly done via Ansible and AWX
○ Some standalone tooling that we want to move to services in the future
9
Deploying Ceph
○ In-house Ansible playbooks deploy Ceph in containers
○ Build our own Ceph packages
○ ceph/daemon-base image is used with a minimal script to mount OSDs at boot
○ Creating OSDs in sequential order is fine, but slow…
○ … so we pre-create all OSDs in CRUSH and launch OSD creation in parallel on all hosts (sketch below)
○ Pushes all the information needed by services to our secrets management platform
○ Future: automatic inventory discovery for zero-intervention deployments
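A minimal sketch of that parallel-creation idea, not the actual playbooks: reserve OSD IDs and CRUSH positions up front, then let every host build its OSDs concurrently. The inventory, SSH access, and the exact ceph-volume invocation here are assumptions for illustration.

```python
# Hedged sketch, assuming host buckets already exist in CRUSH and that
# ceph-volume is handed the pre-reserved ID/FSID per device.
import subprocess
import uuid
from concurrent.futures import ThreadPoolExecutor

INVENTORY = {"storage-host-01": ["/dev/sdb", "/dev/sdc"]}  # hypothetical inventory

def reserve_osd(host: str) -> tuple[int, str]:
    """Reserve an OSD ID/UUID with `ceph osd new` and park it in CRUSH at weight 0."""
    osd_fsid = str(uuid.uuid4())
    osd_id = int(subprocess.check_output(["ceph", "osd", "new", osd_fsid]).decode().strip())
    subprocess.check_call(["ceph", "osd", "crush", "add", f"osd.{osd_id}", "0", f"host={host}"])
    return osd_id, osd_fsid

def build_host(host: str, plan: list[tuple[str, int, str]]) -> None:
    """On-host creation: hand ceph-volume the pre-reserved ID and FSID for each device."""
    for dev, osd_id, osd_fsid in plan:
        subprocess.check_call([
            "ssh", host, "ceph-volume", "lvm", "create",
            "--data", dev, "--osd-id", str(osd_id), "--osd-fsid", osd_fsid,
        ])

# Reserve IDs for every disk first, then fan out host-level creation in parallel.
plans = {h: [(d, *reserve_osd(h)) for d in devs] for h, devs in INVENTORY.items()}
with ThreadPoolExecutor() as pool:
    for host, plan in plans.items():
        pool.submit(build_host, host, plan)
```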
10
Monitoring Ceph
○ ceph_exporter to monitor overall cluster health and gather useful data
○ https://github.com/digitalocean/ceph_exporter
○ storexporter and node_exporter are used to gather data from the admin socket of each daemon on every host (sketch below)
○ A canary process runs regularly to check the performance and availability of every cluster
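As an illustration of what admin-socket collection looks like (not storexporter’s actual implementation), a small sketch that pulls perf counters from a co-located daemon; the daemon name and metric path are examples and vary by release.

```python
import json
import subprocess

def perf_dump(daemon: str) -> dict:
    """Read perf counters from a local daemon via its admin socket."""
    out = subprocess.check_output(["ceph", "daemon", daemon, "perf", "dump"])
    return json.loads(out)

stats = perf_dump("osd.0")
# e.g. BlueStore commit latency counters; exact paths differ between releases
print(stats.get("bluestore", {}).get("commit_lat", {}))
```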
11
Augmenting Ceph
○ Two ways to augment Ceph today
○ Block
○ Create all OSDs in a dummy root tree
○ Move hosts to the default tree and slowly upweight them to avoid latency issues
■ Archimedes for upweights: https://github.com/digitalocean/archimedes
○ Object
○ Set nobackfill/norebalance
○ Create all OSDs
○ Cancel all backfills via upmaps (pgremapper)
○ Unset nobackfill/norebalance
○ ceph balancer or pgremapper’s undo-upmaps to remove the mappings (see the sketch below)
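A condensed sketch of that object-side sequence using the ceph CLI and pgremapper; the pgremapper flag is taken from its README and may differ by version, and the OSD-creation step is elided, so treat this as illustrative rather than an exact runbook.

```python
import subprocess

def sh(*cmd: str) -> None:
    subprocess.check_call(list(cmd))

# 1. Freeze data movement while the new OSDs come up.
sh("ceph", "osd", "set", "nobackfill")
sh("ceph", "osd", "set", "norebalance")

# 2. Create all new OSDs here (playbooks / ceph-volume, omitted in this sketch).

# 3. Pin PGs to their current OSDs with upmaps so nothing starts backfilling.
sh("pgremapper", "cancel-backfill", "--yes")  # flag per the README; may differ by version

# 4. Unfreeze; nothing moves yet because of the upmap pins.
sh("ceph", "osd", "unset", "nobackfill")
sh("ceph", "osd", "unset", "norebalance")

# 5. Let the balancer (or pgremapper undo-upmaps) remove the pins at a controlled pace.
sh("ceph", "balancer", "on")
```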
12
pgremapper
○ pgremapper is open source and heavily inspired by CERN’s scripts
○ https://github.com/digitalocean/pgremapper
○ https://github.com/cernceph/ceph-scripts/tree/master/tools/upmap
○ Can cancel backfill during augments or prioritize recovery_wait PGs
○ Undo upmaps
○ Balance a CRUSH bucket
○ Drain an OSD
○ Remap PG
○ Export/import mappings (upmap primitive illustrated below)
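All of the above is built on the upmap machinery in the OSD map. A hedged illustration of that primitive with plain ceph commands (PG and OSD IDs are made up): mapping a PG’s backfill target back to its current OSD cancels that backfill, and dropping the entry later lets the data move.

```python
import subprocess

# Suppose PG 2.1a is due to backfill from osd.3 to osd.7. Mapping 7 -> 3 keeps
# the up set equal to the acting set, so that backfill is cancelled for now.
subprocess.check_call(["ceph", "osd", "pg-upmap-items", "2.1a", "7", "3"])

# Later (what undo-upmaps or the balancer do gradually): remove the pin.
subprocess.check_call(["ceph", "osd", "rm-pg-upmap-items", "2.1a"])
```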
13
OSD Lifecycle
○ Standalone tooling to diagnose, recondition, deploy, and upgrade the firmware of OSDs
○ Wraps ceph-volume for OSD deployment and reconditioning
○ Automatically fetches the latest firmware on recondition
○ Diagnose tooling looks at SMART data and syslog to determine if a disk needs to be physically replaced (sketch below)
○ Needs to be turned into a service
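A sketch of the kind of check the diagnose tooling performs, not the tool itself: read SMART health with smartctl and flag disks that need physical replacement. It assumes smartmontools 7+ for --json, and the attribute chosen is just an example.

```python
import json
import subprocess

def disk_needs_replacement(device: str) -> bool:
    """Return True if SMART reports a failing disk or reallocated sectors."""
    # smartctl can exit non-zero on warnings, so don't raise on the return code.
    out = subprocess.run(["smartctl", "--json", "-a", device], capture_output=True, text=True)
    data = json.loads(out.stdout)
    if not data.get("smart_status", {}).get("passed", True):
        return True
    for attr in data.get("ata_smart_attributes", {}).get("table", []):
        if attr["name"] == "Reallocated_Sector_Ct" and attr["raw"]["value"] > 0:
            return True
    return False

print(disk_needs_replacement("/dev/sda"))
```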
14
Automatic remediation service
○ Fixes recurring issues (e.g. ceph-mgr running out of memory)
○ Monitors every cluster in our fleet
○ If a known recurring issue is detected, an AWX job is executed to fix it (sketch below)
○ Dramatically reduces alerts and interventions from our Cloud Operations
team
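A toy version of that loop (not the production service): inspect cluster health, match known symptoms, and launch the matching AWX job template over its REST API. The AWX URL, token, template IDs, and the issue mapping chosen here are placeholders.

```python
import json
import subprocess
import requests

AWX_URL = "https://awx.example.internal"   # hypothetical
AWX_TOKEN = "REDACTED"                     # hypothetical
KNOWN_ISSUES = {
    # health check code -> AWX job template that remediates it (IDs are made up)
    "MGR_MODULE_ERROR": 101,  # e.g. restart a wedged ceph-mgr
    "OSD_NEARFULL": 102,
}

def remediate_once() -> None:
    health = json.loads(subprocess.check_output(["ceph", "health", "detail", "--format", "json"]))
    for code in health.get("checks", {}):
        template = KNOWN_ISSUES.get(code)
        if template is None:
            continue  # unknown issue: leave it to alerting and humans
        requests.post(
            f"{AWX_URL}/api/v2/job_templates/{template}/launch/",
            headers={"Authorization": f"Bearer {AWX_TOKEN}"},
            timeout=30,
        ).raise_for_status()

remediate_once()
```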
15
16
Ceph gripes
Ceph upgrades are hard
○ Process is simple but issues are many
○ Documentation is sparse
○ Can be slow (days) on HDDs with on-disk checks/changes
○ History of the cluster matters
○ A successful upgrade does not make a trend
○ Testing is extremely difficult
○ We’re thinking of ways to improve this!
17
RGW defaults
○ Dynamic resharding (config sketch below)
○ Can DoS a cluster on very large buckets
○ Makes listing very expensive over time
○ Beast FE (Nautilus)
○ Most librados calls are still synchronous
○ Slow Ops can consume all the threads very quickly
○ Auto shard removal
○ Can DoS a cluster on very large shards
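These behaviours hang off a handful of RGW options. A hedged example of adjusting them through the config database: the values are illustrative rather than recommendations, and targeting the client.rgw section is an assumption (a ceph.conf section works as well).

```python
import subprocess

def cfg(option: str, value: str, who: str = "client.rgw") -> None:
    """Set an RGW option in the cluster config database (Nautilus+)."""
    subprocess.check_call(["ceph", "config", "set", who, option, value])

cfg("rgw_dynamic_resharding", "false")    # opt out of automatic resharding on huge buckets
cfg("rgw_max_objs_per_shard", "100000")   # threshold that drives resharding decisions
cfg("rgw_thread_pool_size", "1024")       # more Beast worker threads before slow ops starve the pool
```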
18
19
Questions?
We’re hiring! do.co/jobs - amarangone@digitalocean.com
