SlideShare a Scribd company logo
1 of 17
Download to read offline
Pinpoint Ceph bottleneck
out of cluster behavior mists
Cheng, Yingxin
yingxin.cheng@intel.com
Why performance matters?
What is performance?
How to improve it?
- Collect
- Represent
- Analyze
Why performance matters?
Answer: better user experience
• Not all activities matter. Requests impact users.
• Not all costs matter. Costs impact responses.
• The requests’ responding costs matter.
Bad: bottom-up strategies.
Approach?
1. Request-oriented distributed tracing.
2. Back-trace from response point.
3. Collect responding costs.
Why performance matters?
Collect responding costs.
Responding costs
• Motivation: Cover the request responding time.
• Consecutive: Single path, and no overlap with each other.
• Category: Execution costs in thread, or waiting costs between threads.
Represent performance of concurrent requests … ?
What is performance?
Answer: Latency and throughput
Bad: latency-only analysis, measure requests individually.
Approach?
1. Focus on performance of parallel requests.
2. New visualization for both throughput and latency of requests.
✓ ✗
✗ ✓
Understand
performance
Latency Throughput
What is performance?
Represent both latency and throughput
start end
Throughput
(out)
Latency
Throughput
(in)
latency
Better throughput
Better latency
Better latency
Better throughput
What is performance?
Represent performance of parallel requests
Responding costs: One-way, Consecutive, Flatten
More: Cluster behavior of 4M-Seq, iodepth=32
What is performance?
Represent bottleneck
Latencies are not necessarily dependent.
Throughputs are dependent.
• Lowest throughput
• Worse: causes wait latencies; most of times, bottleneck
IN OUT
1000 requests/second 100 requests/second
Bottleneck:
100 requests/second
-> system throughput.
How to improve it?
Answer: identify bottleneck root causes
Root causes categories:
• Physical: configuration, deployment, hardware
• Logical: parameters, algorithm, architecture
• Other workload
Bad: do optimization subjectively and in blindness.
Approach?
1. Relate each cost with:
• Physical location: host, component, process(service), thread
• Logical location: code, workflow
• Runtime context: request, write length, offset …
2. Incremental analysis
• Controlled-variables
• Orthogonal methods
• Verification
How to improve it?
Incremental & interactive analysis
Refinements
• Filter
• Search
• Organize
Understand
• Overview
• Statistics
• Visualizations
• Factor impacts
Distributed-tracing:
Visualization:
Interactive frontend:
Motivation-aligned.
Straightforward
performance representation.
Be analysis-friendly.
An example
1. Distributed-tracing: RBD image write
Component Request Entity
RBD
ImageWriteRequest Images
ObjectRequest
Objects
RADOS
ObjectWriteOperationOSD
ObjectStore
Background: RBD image write data-path
Experiment: 3VMs, 4M-SEQ-Write, iodepth=16
An example
2.1. Visualize performance (ImageWriteRequests)
RBD::ImageWriteRequests
RBD::ObjectRequests
RADOS::ObjectWriteOperations
writex
omap operations
aiooperate
Queue operations
Object store
An example
3.1. Interactive Analysis
Physical location: costs by host Logical location: costs by workflow order
Filter by workflow step “j2” (osd enqueueop -> dequeueop)
OSD::do-op
An example
3.2. Root cause Analysis (do-op, 4M-SEQ)
Parameters:
- osd_op_num_shards
- osd_op_num_threads_per_shard
- pg_num
1 Shard
8 Shards
32 Shards
An example
Interactive analysis ...
• Lapse, host/thread count distribution
• Cost distribution by hosts, steps …
• Show longest, most-complex request
• Message heatmap between hosts
• Write balance
• RBD cache validity
• Combination with resource monitoring tools
Thank you!
yingxin.cheng@intel.com
Distributed-tracing
Visualization
Interactive frontend

More Related Content

What's hot

Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Chris Nauroth
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJANicolas Poggi
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges DataWorks Summit
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSDataWorks Summit
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceSteve Loughran
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Apache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerHBaseCon
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera FieldHBaseCon
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...DataWorks Summit
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayDataWorks Summit
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesCloudera, Inc.
 

What's hot (20)

Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
sudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJAsudoers: Benchmarking Hadoop with ALOJA
sudoers: Benchmarking Hadoop with ALOJA
 
Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges Big data processing meets non-volatile memory: opportunities and challenges
Big data processing meets non-volatile memory: opportunities and challenges
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
HDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFSHDFS Tiered Storage: Mounting Object Stores in HDFS
HDFS Tiered Storage: Mounting Object Stores in HDFS
 
HDP-1 introduction for HUG France
HDP-1 introduction for HUG FranceHDP-1 introduction for HUG France
HDP-1 introduction for HUG France
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ TwitterCross-DC Fault-Tolerant ViewFileSystem @ Twitter
Cross-DC Fault-Tolerant ViewFileSystem @ Twitter
 
Apache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at CernerApache HBase in the Enterprise Data Hub at Cerner
Apache HBase in the Enterprise Data Hub at Cerner
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Tales from the Cloudera Field
Tales from the Cloudera FieldTales from the Cloudera Field
Tales from the Cloudera Field
 
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
Optimizing, profiling and deploying high performance Spark ML and TensorFlow ...
 
Benchmarking
BenchmarkingBenchmarking
Benchmarking
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native way
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
HBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 MinutesHBaseCon 2013: 1500 JIRAs in 20 Minutes
HBaseCon 2013: 1500 JIRAs in 20 Minutes
 

Similar to Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng

Web Performance BootCamp 2013
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013Daniel Austin
 
Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Daniel Austin
 
Apache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenchesApache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenchesVinay Shukla
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsSAIL_QU
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...Michael Kehoe
 
Requirement analysis and UML modelling in Software engineering
Requirement analysis and UML modelling in Software engineeringRequirement analysis and UML modelling in Software engineering
Requirement analysis and UML modelling in Software engineeringsnehalkulkarni74
 
Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Steve Feldman
 
Performance Testing Java Applications
Performance Testing Java ApplicationsPerformance Testing Java Applications
Performance Testing Java ApplicationsC4Media
 
Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Lionel Briand
 
requirement engineering
requirement engineeringrequirement engineering
requirement engineeringanam singla
 
Performance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introPerformance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introMykola Kovsh
 
B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingSteve Feldman
 

Similar to Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng (20)

Web Performance BootCamp 2013
Web Performance BootCamp 2013Web Performance BootCamp 2013
Web Performance BootCamp 2013
 
Web Performance Bootcamp 2014
Web Performance Bootcamp 2014Web Performance Bootcamp 2014
Web Performance Bootcamp 2014
 
Man.ppt
Man.pptMan.ppt
Man.ppt
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
Apache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenchesApache con big data 2015 - Data Science from the trenches
Apache con big data 2015 - Data Science from the trenches
 
Requirements analysis lecture
Requirements analysis lectureRequirements analysis lecture
Requirements analysis lecture
 
Automated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise ApplicationsAutomated Discovery of Performance Regressions in Enterprise Applications
Automated Discovery of Performance Regressions in Enterprise Applications
 
Soft requirement
Soft requirementSoft requirement
Soft requirement
 
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
SRECon-Europe-2017: Reducing MTTR and False Escalations: Event Correlation at...
 
Requirement analysis and UML modelling in Software engineering
Requirement analysis and UML modelling in Software engineeringRequirement analysis and UML modelling in Software engineering
Requirement analysis and UML modelling in Software engineering
 
Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07
 
Performance Testing Java Applications
Performance Testing Java ApplicationsPerformance Testing Java Applications
Performance Testing Java Applications
 
Requirement Analysis - Software Enigneering
Requirement Analysis - Software EnigneeringRequirement Analysis - Software Enigneering
Requirement Analysis - Software Enigneering
 
Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...Scalable Software Testing and Verification of Non-Functional Properties throu...
Scalable Software Testing and Verification of Non-Functional Properties throu...
 
Apsec 2014 Presentation
Apsec 2014 PresentationApsec 2014 Presentation
Apsec 2014 Presentation
 
The art of project estimation
The art of project estimationThe art of project estimation
The art of project estimation
 
requirement engineering
requirement engineeringrequirement engineering
requirement engineering
 
Performance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter introPerformance Testing from Scratch + JMeter intro
Performance Testing from Scratch + JMeter intro
 
Bdd with m spec
Bdd with m specBdd with m spec
Bdd with m spec
 
B2 2006 sizing_benchmarking
B2 2006 sizing_benchmarkingB2 2006 sizing_benchmarking
B2 2006 sizing_benchmarking
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Pinpoint Ceph Bottleneck Out of Cluster Behavior Mists - Yingxin Cheng

  • 1. Pinpoint Ceph bottleneck out of cluster behavior mists Cheng, Yingxin yingxin.cheng@intel.com
  • 2. Why performance matters? What is performance? How to improve it? - Collect - Represent - Analyze
  • 3. Why performance matters? Answer: better user experience • Not all activities matter. Requests impact users. • Not all costs matter. Costs impact responses. • The requests’ responding costs matter. Bad: bottom-up strategies. Approach? 1. Request-oriented distributed tracing. 2. Back-trace from response point. 3. Collect responding costs.
  • 4. Why performance matters? Collect responding costs. Responding costs • Motivation: Cover the request responding time. • Consecutive: Single path, and no overlap with each other. • Category: Execution costs in thread, or waiting costs between threads. Represent performance of concurrent requests … ?
  • 5. What is performance? Answer: Latency and throughput Bad: latency-only analysis, measure requests individually. Approach? 1. Focus on performance of parallel requests. 2. New visualization for both throughput and latency of requests. ✓ ✗ ✗ ✓ Understand performance Latency Throughput
  • 6. What is performance? Represent both latency and throughput start end Throughput (out) Latency Throughput (in) latency Better throughput Better latency Better latency Better throughput
  • 7. What is performance? Represent performance of parallel requests Responding costs: One-way, Consecutive, Flatten More: Cluster behavior of 4M-Seq, iodepth=32
  • 8. What is performance? Represent bottleneck Latencies are not necessarily dependent. Throughputs are dependent. • Lowest throughput • Worse: causes wait latencies; most of times, bottleneck IN OUT 1000 requests/second 100 requests/second Bottleneck: 100 requests/second -> system throughput.
  • 9. How to improve it? Answer: identify bottleneck root causes Root causes categories: • Physical: configuration, deployment, hardware • Logical: parameters, algorithm, architecture • Other workload Bad: do optimization subjectively and in blindness. Approach? 1. Relate each cost with: • Physical location: host, component, process(service), thread • Logical location: code, workflow • Runtime context: request, write length, offset … 2. Incremental analysis • Controlled-variables • Orthogonal methods • Verification
  • 10. How to improve it? Incremental & interactive analysis Refinements • Filter • Search • Organize Understand • Overview • Statistics • Visualizations • Factor impacts
  • 12. An example 1. Distributed-tracing: RBD image write Component Request Entity RBD ImageWriteRequest Images ObjectRequest Objects RADOS ObjectWriteOperationOSD ObjectStore Background: RBD image write data-path Experiment: 3VMs, 4M-SEQ-Write, iodepth=16
  • 13. An example 2.1. Visualize performance (ImageWriteRequests) RBD::ImageWriteRequests RBD::ObjectRequests RADOS::ObjectWriteOperations writex omap operations aiooperate Queue operations Object store
  • 14. An example 3.1. Interactive Analysis Physical location: costs by host Logical location: costs by workflow order Filter by workflow step “j2” (osd enqueueop -> dequeueop) OSD::do-op
  • 15. An example 3.2. Root cause Analysis (do-op, 4M-SEQ) Parameters: - osd_op_num_shards - osd_op_num_threads_per_shard - pg_num 1 Shard 8 Shards 32 Shards
  • 16. An example Interactive analysis ... • Lapse, host/thread count distribution • Cost distribution by hosts, steps … • Show longest, most-complex request • Message heatmap between hosts • Write balance • RBD cache validity • Combination with resource monitoring tools