SlideShare a Scribd company logo
1 of 20
Download to read offline
Making Ceph Fast in the Face of Failure
Josh Durgin and Neha Ojha
2018.08.28
The Dark Ages
Hammer
Baseline Hammer
0
0.2
0.4
0.6
0.8
1
1.2
Recovery vs Baseline IOPS
IOPS/Baseline
Hammer
●
Favored maximum recovery speed
●
Default client impact was huge
●
Tuning could help
osd max backfills = 10 → 1
osd recovery max active = 15 → 3
osd recovery op priority = 10 → 3
osd recovery max single start = 5 → 1
Infernalis
Baseline Hammer Infernalis
0
0.2
0.4
0.6
0.8
1
1.2
Recovery vs Baseline IOPS
IOPS/Baseline
Manual way to recover higher-level constructs, e.g. rbd images
pg force-recovery command
ceph pg force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...]
ceph pg force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...]
If you change your mind or prioritize wrong groups, use:
ceph pg cancel-force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...]
ceph pg cancel-force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...]
Luminous – High-level Priority
Luminous – Throttling Recovery
●
osd_recovery_sleep
●
changing this will shift the balance between
recovery and client I/O
●
Different configurations based on underlying
hardware
Recovery Sleep - HDDs
BlueStore on HDDs with Fio 4k random writes
➢ osd_recovery_sleep_hdd chosen as 0.1
Recovery Sleep - SSDs
BlueStore on SSDs with Fio 4k random writes
➢ osd_recovery_sleep_ssd chosen as 0
Recovery Sleep - Hybrid
BlueStore on HDDs+SSDs with Fio 4k random writes
➢ osd_recovery_sleep_hybrid chosen as 0.025
Luminous Defaults are much better
Baseline Hammer Infernalis Luminous
0
0.2
0.4
0.6
0.8
1
1.2
Recovery vs Baseline IOPS
IOPS/Baseline
Recovery in Ceph has been a synchronous process - it blocked writes to an object
until it was recovered.
Problem: This increases write latencies and affects availability.
Solution in Mimic: Asynchronous Recovery
Do not block writes on objects, which are only missing on non-actingset OSDs.
Perform recovery in the background on an OSD, out of the acting set, similar to
backfill, and use the PG log to determine what needs recovery.
Improving Latency during Recovery
Async recovery targets - OSDs that are not part of the acting set and are chosen
based on the following:
● approximate magnitude of the difference in length of logs is used as the cost
of recovery, async recovery targets have higher cost to recover
● threshold osd_async_recovery_min_pg_log_entries(default value=100) is
used to determine when asynchronous recovery is appropriate
● min_size replicas should be available
When do we perform asynchronous recovery?
RGW Workload generated using Cosbench.
Operations - read, list, write, delete
Cluster prefilled ~ 30%, 40000 objects
Kill one OSD to induce recovery
Recovery Experiments
Impact on Throughput during Recovery
Impact on Latency during Recovery
No Failure
OSD Failure
Throughput Comparison - COSBench
- Optimizations
- Partial object recovery
- Speed up backfill scanning and transfer
- Adaptive recovery throttling - set the value of osd_recovery_sleep based on
client load.
- QoS
- Recovery Order
Future Work
- Upgrade! Much better defaults and finer tuning in Luminous and Mimic
- Recovery sleep is all you need if you want to change client i/o vs recovery
balance
- Tuning older options (e.g. max active, single start, max backfill) only needed if
you want to increase recovery/backfill throughput
Summary
THANK YOU
Josh Durgin : jdurgin@redhat.com
Neha Ojha : nojha@redhat.com

More Related Content

What's hot

Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedAdrian Huang
 
High Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniZalando Technology
 
Webinar: MongoDB Schema Design and Performance Implications
Webinar: MongoDB Schema Design and Performance ImplicationsWebinar: MongoDB Schema Design and Performance Implications
Webinar: MongoDB Schema Design and Performance ImplicationsMongoDB
 
Secure container: Kata container and gVisor
Secure container: Kata container and gVisorSecure container: Kata container and gVisor
Secure container: Kata container and gVisorChing-Hsuan Yen
 
Windows IOCP vs Linux EPOLL Performance Comparison
Windows IOCP vs Linux EPOLL Performance ComparisonWindows IOCP vs Linux EPOLL Performance Comparison
Windows IOCP vs Linux EPOLL Performance ComparisonSeungmo Koo
 
Gateway/APIC security
Gateway/APIC securityGateway/APIC security
Gateway/APIC securityShiu-Fun Poon
 
Minio Cloud Storage
Minio Cloud StorageMinio Cloud Storage
Minio Cloud StorageMinio
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific DashboardCeph Community
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRoberto Franchini
 
Redis in Practice
Redis in PracticeRedis in Practice
Redis in PracticeNoah Davis
 
Building Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPABuilding Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPALDAPCon
 
Pynvme introduction
Pynvme introductionPynvme introduction
Pynvme introductionCrane Chu
 
Black Hat Europe 2017. DPAPI and DPAPI-NG: Decryption Toolkit
Black Hat Europe 2017. DPAPI and DPAPI-NG: Decryption ToolkitBlack Hat Europe 2017. DPAPI and DPAPI-NG: Decryption Toolkit
Black Hat Europe 2017. DPAPI and DPAPI-NG: Decryption ToolkitPaula Januszkiewicz
 
Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용흥배 최
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortNAVER D2
 
How to Import JSON Using Cypher and APOC
How to Import JSON Using Cypher and APOCHow to Import JSON Using Cypher and APOC
How to Import JSON Using Cypher and APOCNeo4j
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?Pradeep Kumar
 

What's hot (20)

Vmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is bootedVmlinux: anatomy of bzimage and how x86 64 processor is booted
Vmlinux: anatomy of bzimage and how x86 64 processor is booted
 
High Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando Patroni
 
Webinar: MongoDB Schema Design and Performance Implications
Webinar: MongoDB Schema Design and Performance ImplicationsWebinar: MongoDB Schema Design and Performance Implications
Webinar: MongoDB Schema Design and Performance Implications
 
Secure container: Kata container and gVisor
Secure container: Kata container and gVisorSecure container: Kata container and gVisor
Secure container: Kata container and gVisor
 
Windows IOCP vs Linux EPOLL Performance Comparison
Windows IOCP vs Linux EPOLL Performance ComparisonWindows IOCP vs Linux EPOLL Performance Comparison
Windows IOCP vs Linux EPOLL Performance Comparison
 
Gateway/APIC security
Gateway/APIC securityGateway/APIC security
Gateway/APIC security
 
Minio Cloud Storage
Minio Cloud StorageMinio Cloud Storage
Minio Cloud Storage
 
2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard2021.02 new in Ceph Pacific Dashboard
2021.02 new in Ceph Pacific Dashboard
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
 
Redis in Practice
Redis in PracticeRedis in Practice
Redis in Practice
 
Building Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPABuilding Open Source Identity Management with FreeIPA
Building Open Source Identity Management with FreeIPA
 
Pynvme introduction
Pynvme introductionPynvme introduction
Pynvme introduction
 
Xen Debugging
Xen DebuggingXen Debugging
Xen Debugging
 
Black Hat Europe 2017. DPAPI and DPAPI-NG: Decryption Toolkit
Black Hat Europe 2017. DPAPI and DPAPI-NG: Decryption ToolkitBlack Hat Europe 2017. DPAPI and DPAPI-NG: Decryption Toolkit
Black Hat Europe 2017. DPAPI and DPAPI-NG: Decryption Toolkit
 
Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용Twitter의 snowflake 소개 및 활용
Twitter의 snowflake 소개 및 활용
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
How to Import JSON Using Cypher and APOC
How to Import JSON Using Cypher and APOCHow to Import JSON Using Cypher and APOC
How to Import JSON Using Cypher and APOC
 
QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?QEMU Disk IO Which performs Better: Native or threads?
QEMU Disk IO Which performs Better: Native or threads?
 

Similar to Making Ceph fast in the face of failure

Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Community
 
Cephalocon apac china
Cephalocon apac chinaCephalocon apac china
Cephalocon apac chinaVikhyat Umrao
 
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Ceph Community
 
Postgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterPostgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterEDB
 
LCU14 201- Binary Analysis Tools
LCU14 201- Binary Analysis ToolsLCU14 201- Binary Analysis Tools
LCU14 201- Binary Analysis ToolsLinaro
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus cagingBertrandDrouvot
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionKaran Singh
 
Techniques to Improve Cache Speed
Techniques to Improve Cache SpeedTechniques to Improve Cache Speed
Techniques to Improve Cache SpeedZohaib Hassan
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practiceAlexey Lesovsky
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystifiedOmid Vahdaty
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephRongze Zhu
 
Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017Grant McAlister
 
Filesystem Performance from a Database Perspective
Filesystem Performance from a Database PerspectiveFilesystem Performance from a Database Perspective
Filesystem Performance from a Database PerspectiveMark Wong
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Community
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDinakar Guniguntala
 
Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Joshua Drake
 

Similar to Making Ceph fast in the face of failure (20)

Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons LearnedCeph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
Ceph Day Chicago - Ceph Deployment at Target: Best Practices and Lessons Learned
 
Cephalocon apac china
Cephalocon apac chinaCephalocon apac china
Cephalocon apac china
 
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
 
Postgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even FasterPostgres Vision 2018: Making Postgres Even Faster
Postgres Vision 2018: Making Postgres Even Faster
 
LCU14 201- Binary Analysis Tools
LCU14 201- Binary Analysis ToolsLCU14 201- Binary Analysis Tools
LCU14 201- Binary Analysis Tools
 
Oracle: Binding versus caging
Oracle: Binding versus cagingOracle: Binding versus caging
Oracle: Binding versus caging
 
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake SolutionCeph Object Storage Performance Secrets and Ceph Data Lake Solution
Ceph Object Storage Performance Secrets and Ceph Data Lake Solution
 
MesosCon 2018
MesosCon 2018MesosCon 2018
MesosCon 2018
 
Techniques to Improve Cache Speed
Techniques to Improve Cache SpeedTechniques to Improve Cache Speed
Techniques to Improve Cache Speed
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
 
Ceph barcelona-v-1.2
Ceph barcelona-v-1.2Ceph barcelona-v-1.2
Ceph barcelona-v-1.2
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
 
Emr spark tuning demystified
Emr spark tuning demystifiedEmr spark tuning demystified
Emr spark tuning demystified
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
 
Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017
 
Filesystem Performance from a Database Perspective
Filesystem Performance from a Database PerspectiveFilesystem Performance from a Database Perspective
Filesystem Performance from a Database Perspective
 
Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph Ceph Day Melbourne - Troubleshooting Ceph
Ceph Day Melbourne - Troubleshooting Ceph
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
DevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on KubernetesDevoxxUK: Optimizating Application Performance on Kubernetes
DevoxxUK: Optimizating Application Performance on Kubernetes
 
Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)
 

Recently uploaded

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 

Recently uploaded (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 

Making Ceph fast in the face of failure

  • 1. Making Ceph Fast in the Face of Failure Josh Durgin and Neha Ojha 2018.08.28
  • 4. Hammer ● Favored maximum recovery speed ● Default client impact was huge ● Tuning could help osd max backfills = 10 → 1 osd recovery max active = 15 → 3 osd recovery op priority = 10 → 3 osd recovery max single start = 5 → 1
  • 6. Manual way to recover higher-level constructs, e.g. rbd images pg force-recovery command ceph pg force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...] ceph pg force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...] If you change your mind or prioritize wrong groups, use: ceph pg cancel-force-recovery {pg-id} [{pg-id #2}] [{pg-id #3} ...] ceph pg cancel-force-backfill {pg-id} [{pg-id #2}] [{pg-id #3} ...] Luminous – High-level Priority
  • 7. Luminous – Throttling Recovery ● osd_recovery_sleep ● changing this will shift the balance between recovery and client I/O ● Different configurations based on underlying hardware
  • 8. Recovery Sleep - HDDs BlueStore on HDDs with Fio 4k random writes ➢ osd_recovery_sleep_hdd chosen as 0.1
  • 9. Recovery Sleep - SSDs BlueStore on SSDs with Fio 4k random writes ➢ osd_recovery_sleep_ssd chosen as 0
  • 10. Recovery Sleep - Hybrid BlueStore on HDDs+SSDs with Fio 4k random writes ➢ osd_recovery_sleep_hybrid chosen as 0.025
  • 11. Luminous Defaults are much better Baseline Hammer Infernalis Luminous 0 0.2 0.4 0.6 0.8 1 1.2 Recovery vs Baseline IOPS IOPS/Baseline
  • 12. Recovery in Ceph has been a synchronous process - it blocked writes to an object until it was recovered. Problem: This increases write latencies and affects availability. Solution in Mimic: Asynchronous Recovery Do not block writes on objects, which are only missing on non-actingset OSDs. Perform recovery in the background on an OSD, out of the acting set, similar to backfill, and use the PG log to determine what needs recovery. Improving Latency during Recovery
  • 13. Async recovery targets - OSDs that are not part of the acting set and are chosen based on the following: ● approximate magnitude of the difference in length of logs is used as the cost of recovery, async recovery targets have higher cost to recover ● threshold osd_async_recovery_min_pg_log_entries(default value=100) is used to determine when asynchronous recovery is appropriate ● min_size replicas should be available When do we perform asynchronous recovery?
  • 14. RGW Workload generated using Cosbench. Operations - read, list, write, delete Cluster prefilled ~ 30%, 40000 objects Kill one OSD to induce recovery Recovery Experiments
  • 15. Impact on Throughput during Recovery
  • 16. Impact on Latency during Recovery
  • 17. No Failure OSD Failure Throughput Comparison - COSBench
  • 18. - Optimizations - Partial object recovery - Speed up backfill scanning and transfer - Adaptive recovery throttling - set the value of osd_recovery_sleep based on client load. - QoS - Recovery Order Future Work
  • 19. - Upgrade! Much better defaults and finer tuning in Luminous and Mimic - Recovery sleep is all you need if you want to change client i/o vs recovery balance - Tuning older options (e.g. max active, single start, max backfill) only needed if you want to increase recovery/backfill throughput Summary
  • 20. THANK YOU Josh Durgin : jdurgin@redhat.com Neha Ojha : nojha@redhat.com