Which Hypervisor is Best?
MySQL on Ceph
3:30pm – 4:20pm
Room 203
WHOIS
Kyle Bader
Storage Solution Architectures
Red Hat
Yves Trudeau
Principal Architect
Percona
AGENDA
• Ceph Architecture Elevator Pitch
• Tuning Ceph Block (RBD)
• Tuning QEMU Block Virtualization
• Benchmarks
Ceph Architecture
ARCHITECTURAL COMPONENTS
RGW
A web services gateway for object storage, compatible with S3 and Swift
LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD
A reliable, fully distributed block device with cloud platform integration
CEPHFS
A distributed file system with POSIX semantics and scale-out metadata
APP HOST/VM CLIENT
Linux Containers vs Virtual Machines
KVM/QEMU RBD BACKEND
RADOS CLUSTER
PERCONA ON KRBD
RADOS CLUSTER
TUNING CEPH BLOCK
• Format
• Order
• Fancy Striping
• TCP_NODELAY
RBD FORMAT
• Format 1
  • Deprecated
  • Supported by all versions of Ceph
  • No reason to use it in a greenfield environment
• Format 2
  • New default format
  • Supports snapshots and clones
RBD ORDER
• The chunk / striping boundary for the block device
• Default order is 22, which gives 4MB objects
• 4MB = 2^22 bytes
• Used default during our testing
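The relationship between order and object size can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for the example, and the object-index calculation assumes default (non-fancy) striping, where an image is cut into consecutive objects of 2^order bytes.

```python
def object_size_bytes(order: int) -> int:
    """Object size for an RBD image of the given order (2**order bytes)."""
    return 1 << order


def object_index(offset: int, order: int) -> int:
    """Index of the object holding byte `offset`, assuming default striping."""
    return offset >> order


DEFAULT_ORDER = 22  # value from the slide: 4MB = 2^22

size = object_size_bytes(DEFAULT_ORDER)
print(size, size // (1024 * 1024))                     # 4194304 4
print(object_index(10 * 1024 * 1024, DEFAULT_ORDER))   # 2 (third object)
```

A smaller order means more, smaller objects for the same image; a larger order means fewer, larger ones.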
RBD: Fancy Striping
• Only available to QEMU / librbd
• Finer striping to parallelize small writes across multiple objects
• Helps with some HDD workloads
• Used default during our testing
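To make the effect of finer striping concrete, the sketch below maps a byte offset in the image to an object number and an offset inside that object. It follows the stripe unit / stripe count layout described in the Ceph documentation as I understand it; the function and field names are hypothetical, and the 64 KiB stripe unit with a stripe count of 4 is an arbitrary example, not a setting from the talk.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    object_no: int      # which object the byte lands in
    object_offset: int  # byte offset inside that object


def map_offset(offset: int, stripe_unit: int, stripe_count: int,
               object_size: int) -> Extent:
    """Map an image offset to (object, offset) for a striped layout."""
    if object_size % stripe_unit != 0:
        raise ValueError("object_size must be a multiple of stripe_unit")
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit          # index of the stripe unit
    stripe_no = block_no // stripe_count      # row of stripe units
    stripe_pos = block_no % stripe_count      # object within the object set
    object_set_no = stripe_no // stripes_per_object
    object_no = object_set_no * stripe_count + stripe_pos
    block_start = (stripe_no % stripes_per_object) * stripe_unit
    return Extent(object_no, block_start + offset % stripe_unit)


KIB, MIB = 1024, 1024 * 1024

# Default layout: stripe unit == object size, stripe count 1.
print(map_offset(5 * MIB, 4 * MIB, 1, 4 * MIB))

# Hypothetical fancy layout: consecutive 64 KiB writes rotate over 4 objects.
for off in range(0, 5 * 64 * KIB, 64 * KIB):
    print(off // KIB, map_offset(off, 64 * KIB, 4, 4 * MIB))
```

With the default layout, sequential small writes all hit the same object until 4MB has been written; with the striped layout they fan out over four objects, which is where the parallelism comes from.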
TCP_NODELAY
• Disables Nagle's algorithm, which coalesces small packets before sending
• Important for latency sensitive workloads
• Good for maximizing IOPS -> MySQL
• Default in QEMU
• Default in KRBD
  • Added in mainline 4.2
  • Backported to RHEL 7.2 3.10-236+
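For readers who have not met the option before, this is what setting it looks like at the socket level. The snippet is a generic illustration using Python's standard `socket` module, not Ceph or QEMU code; the helper name is invented for the example.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1: send small segments immediately instead of batching them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

Small synchronous writes, such as a database log flush, are the kind of traffic that would otherwise wait on Nagle's batching.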
TUNING QEMU BLOCK VIRTUALIZATION
TUNING QEMU BLOCK
• Paravirtual Devices
• AIO Mode
• Caching
• x-data-plane
• num_queues
QEMU: PARAVIRTUAL DEVICES
• Virtio-blk
• Virtio-scsi
QEMU: AIO MODE
• Threads
  • Software implementation of AIO using a thread pool
• Native
  • Uses kernel AIO
  • Way to go in the future
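The "threads" idea can be sketched in a few lines: blocking reads are handed to a pool of worker threads so the caller can keep several requests in flight at once. This is an analogy written with Python's standard library, not QEMU's implementation; `read_blocks` is a hypothetical helper and `os.pread` is only available on Unix-like systems.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List


def read_blocks(path: str, offsets: List[int], block_size: int,
                workers: int = 4) -> List[bytes]:
    """Issue one blocking pread per offset on a thread pool and gather results."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Each worker blocks in pread; the submitting thread does not.
            futures = [pool.submit(os.pread, fd, block_size, off)
                       for off in offsets]
            return [f.result() for f in futures]
    finally:
        os.close(fd)


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
    path = tmp.name

blocks = read_blocks(path, [0, 4096, 8192], 4096)
print([b[:1] for b in blocks])  # [b'A', b'B', b'C']
os.unlink(path)
```

Native AIO removes the worker threads by letting the kernel track the outstanding requests itself.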
QEMU: CACHING
                       Writeback   None      Writethrough   Directsync
Uses Host Page Cache   Yes         No        Yes            No
Guest Disk WCE         Enabled     Enabled   Disabled       Disabled
rbd_cache              True        False     True           False
rbd_max_dirty          25165824    0         0              0
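The first two rows of the caching table can be captured as a small lookup, which also makes it easy to check a cache/AIO pairing before launching a guest. This is a sketch, not part of the talk: `check_drive` is a hypothetical helper, and the rule it enforces (QEMU accepts `aio=native` only when the host page cache is bypassed, i.e. `cache=none` or `cache=directsync`) is stated from my understanding of QEMU's behaviour.

```python
# mode: (uses host page cache, guest disk write cache enabled)
CACHE_MODES = {
    "writeback":    (True,  True),
    "none":         (False, True),
    "writethrough": (True,  False),
    "directsync":   (False, False),
}


def check_drive(cache: str, aio: str) -> str:
    """Validate a cache/aio pairing and return it as a -drive option fragment."""
    if cache not in CACHE_MODES:
        raise ValueError(f"unknown cache mode: {cache}")
    if aio not in ("threads", "native"):
        raise ValueError(f"unknown aio mode: {aio}")
    uses_page_cache, _guest_wce = CACHE_MODES[cache]
    if aio == "native" and uses_page_cache:
        raise ValueError("aio=native needs cache=none or cache=directsync")
    return f"cache={cache},aio={aio}"


print(check_drive("none", "native"))        # cache=none,aio=native
print(check_drive("writeback", "threads"))  # cache=writeback,aio=threads
```

The benchmark labels later in the deck pair native AIO only with `cache=none` and `cache=directsync`, which is consistent with this rule.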
QEMU: Timers
• Block storage benchmark tool: fio
• Very frequent access to CPU timing registers
• Accesses need to be emulated
• Can block main QEMU event loop with concurrent high IO load
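A rough way to see how expensive clock reads are in a given environment is to time a tight loop of them. The micro-benchmark below is only an assumption-laden sketch: it measures Python's `time.perf_counter_ns`, so the result includes interpreter and loop overhead and is useful only for comparing the same script on bare metal and inside a guest.

```python
import time


def clock_read_cost_ns(samples: int = 200_000) -> float:
    """Average wall-clock cost of one clock read, in nanoseconds (loop overhead included)."""
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.perf_counter_ns()
    elapsed = time.perf_counter_ns() - start
    return elapsed / samples


print(f"{clock_read_cost_ns():.1f} ns per clock read")
```

If timer accesses are being emulated, the per-read figure in the guest would be expected to be noticeably higher than on the host.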
BENCHMARKS
• Sysbench OLTP, 32 tables of 28M rows each, ~200GB
• MySQL config: 50GB buffer pool, 8MB log file size, ACID
• Filesystem: XFS with noatime, nodiratime, nobarrier
• Data reloaded before each test
• 100% reads: --oltp-point-select=100
• 100% writes: --oltp-index-updates=100
• 70%/30% reads/writes: --oltp-index-updates=28 --oltp-point-select=70 --rand-type=uniform
• 20 minute run time per test, iterations averaged
• 64 threads, 8 cores
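A little arithmetic on the figures above shows how the data set relates to the buffer pool and what the mixed workload works out to. The sketch treats "~200GB" and "50GB" as decimal gigabytes and assumes the mixed test runs only the two statement types named in its flags; both are assumptions, not statements from the slides.

```python
TABLES = 32
ROWS_PER_TABLE = 28_000_000
DATASET_BYTES = 200 * 10**9       # "~200GB", assumed decimal
BUFFER_POOL_BYTES = 50 * 10**9    # "50GB buffer pool", assumed decimal

total_rows = TABLES * ROWS_PER_TABLE
bytes_per_row = DATASET_BYTES / total_rows
cached_fraction = BUFFER_POOL_BYTES / DATASET_BYTES


def write_share(index_updates: int, point_selects: int) -> float:
    """Fraction of statements that are writes, given per-transaction counts."""
    return index_updates / (index_updates + point_selects)


print(total_rows)                        # 896000000
print(round(bytes_per_row, 1))           # 223.2
print(cached_fraction)                   # 0.25
print(round(write_share(28, 70), 3))     # 0.286
```

Roughly a quarter of the data fits in the buffer pool, so most uniformly distributed accesses have to go to storage, and the "70/30" mix is closer to 71/29 under these assumptions.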
BASIC QEMU PERFORMANCE
[Bar chart: IOPS (axis 0 to 35,000) for Reads, Writes, and R/W 70/30 under four configurations: qemu tcg; qemu-kvm-default; io=threads cache=none; io=native cache=none]
THREAD CACHING MODES
[Bar chart: IOPS (axis 0 to 30,000) for Reads, Writes, and R/W 70/30 under three configurations: io=threads cache=none; io=threads cache=writethrough; io=threads cache=writeback]
DEDICATED DISPATCH THREADS
[Bar chart: IOPS (axis 0 to 35,000) for Reads, Writes, and R/W 70/30 under four configurations: io=native cache=none; io=native cache=directsync; io=native cache=directsync iothread=1; io=native cache=directsync iothread=2]
DATA PLANE AND VIRTIO-SCSI QUEUES
[Bar chart: IOPS (axis 0 to 40,000) for Reads, Writes, and R/W 70/30 under four configurations: x-data-plane; virtio-scsi, num-queues=4; virtio-scsi, num-queues=2, vectors=3; virtio-scsi, num-queues=4, vectors=5]
CONTAINERS AND METAL
[Bar chart: IOPS (axis 0 to 60,000) for Reads, Writes, and R/W 70/30 under five configurations: Metal (taskset -c 10-17); lxc (cgroup cpu 10-17); io=threads cache=none; io=native cache=none; virtio-scsi, num-queues=2, vectors=3]
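The bare-metal and LXC runs were confined to CPUs 10-17, which matches the 8 cores given to the other configurations. The sketch below shows one way to reproduce that pinning from Python on Linux. The helper names are hypothetical, `os.sched_setaffinity` is Linux-only, and the pinning function is defined but not called here because it changes the affinity of the running process.

```python
import os
from typing import Set


def parse_cpu_list(spec: str) -> Set[int]:
    """Parse a taskset-style CPU list such as '10-17' or '0,2,4-6'."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec: str) -> Set[int]:
    """Restrict this process to the requested CPUs that are actually available."""
    usable = parse_cpu_list(spec) & os.sched_getaffinity(0)
    if not usable:
        raise RuntimeError(f"none of the CPUs in {spec!r} are available")
    os.sched_setaffinity(0, usable)
    return usable


cpus = parse_cpu_list("10-17")
print(sorted(cpus), len(cpus))  # [10, 11, 12, 13, 14, 15, 16, 17] 8
```

Pinning every configuration to the same number of cores keeps the comparison between metal, containers, and virtual machines about the I/O path rather than CPU count.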
THANK YOU
