MySQL and Ceph
1:20pm – 2:10pm, Room 203

MySQL in the Cloud
Head-to-Head Performance Lab
2:20pm – 3:10pm, Room 203
WHOIS
Brent Compton and Kyle Bader – Storage Solution Architectures, Red Hat
Yves Trudeau – Principal Architect, Percona
AGENDA
MySQL on Ceph
• Why MySQL on Ceph
• Ceph Architecture
• Tuning: MySQL on Ceph
• HW Architectural Considerations
MySQL in the Cloud: Head-to-Head Performance Lab
• MySQL on Ceph vs. AWS
• Head-to-head: Performance
• Head-to-head: Price/performance
• IOPS performance nodes for Ceph
WHY MYSQL ON CEPH?
MARKET DRIVERS
• Ceph: #1 block storage for OpenStack clouds
• MySQL: #4 workload on OpenStack (#1–3 often use databases too!)
• 70% of apps on OpenStack use the LAMP stack
• Ceph: the leading open-source SDS
• MySQL: the leading open-source RDBMS
WHY MYSQL ON CEPH?
OPS EFFICIENCY
• Shared, elastic storage pool
• Dynamic DB placement
• Flexible volume resizing
• Live instance migration
• Backup to object pool
• Read replicas via copy-on-write snapshots (see the sketch below)
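A minimal sketch of the copy-on-write read-replica flow using the rbd CLI (pool and image names are illustrative; in practice the source instance should be quiesced or backup-locked before the snapshot so the replica starts from a consistent data directory):

    # Snapshot the primary's data volume and protect it so it can be cloned
    rbd snap create mysql/db01@replica-base
    rbd snap protect mysql/db01@replica-base

    # Create a copy-on-write clone to back a new read replica; only blocks
    # that change afterwards consume additional space in the pool
    rbd clone mysql/db01@replica-base mysql/db01-replica1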
WHY MYSQL ON CEPH?
PUBLIC CLOUD FIDELITY
• Hybrid Cloud requires familiar platforms
• Developers want platform consistency
• Block storage, like the big kids
• Object storage, like the big kids
• Your hardware, datacenter, staff
WHY MYSQL ON CEPH?
HYBRID CLOUD REQUIRES HIGH IOPS
Ceph Provides
• Spinning Block – General Purpose
• Object Storage – Capacity
• SSD Block – High IOPS
CEPH ARCHITECTURE
ARCHITECTURAL COMPONENTS
• RGW – A web services gateway for object storage, compatible with S3 and Swift
• LIBRADOS – A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
• RADOS – A software-based, reliable, autonomous, distributed object store comprised of self-healing, self-managing, intelligent storage nodes and lightweight monitors
• RBD – A reliable, fully-distributed block device with cloud platform integration
• CEPHFS – A distributed file system with POSIX semantics and scale-out metadata
(Diagram: apps, hosts/VMs and clients sit above these access methods, all of which are built on RADOS; Ceph OSD daemons make up the RADOS cluster.)
RADOS COMPONENTS
OSDs
• 10s to 10000s in a cluster
• Typically one per disk
• Serve stored objects to clients
• Intelligently peer for replication & recovery
Monitors
• Maintain cluster membership and state
• Provide consensus for distributed decision-making
• Small, odd number (typically 3 or 5)
• Monitors do not serve stored objects to clients (a quick status check with the ceph CLI is sketched below)
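A quick way to see both component types on a running cluster, using read-only ceph CLI status commands:

    ceph -s          # overall health, monitor quorum, OSD in/up counts
    ceph osd tree    # OSDs arranged under the CRUSH hierarchy (hosts, racks, ...)
    ceph mon stat    # monitor membership and current quorum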
WHERE DO OBJECTS LIVE?
• A metadata server?
• Calculated placement
EVEN BETTER: CRUSH
(Objects are grouped into placement groups (PGs), and CRUSH maps PGs onto OSDs across the cluster.)
CRUSH IS A QUICK CALCULATION
DYNAMIC DATA PLACEMENT
CRUSH:
• Pseudo-random placement algorithm
• Fast calculation, no lookup (see the example after this list)
• Repeatable, deterministic
• Statistically uniform distribution
• Stable mapping
• Limited data migration on change
• Rule-based configuration
• Infrastructure topology aware
• Adjustable replication
• Weighting
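Because placement is computed rather than looked up, any client can ask where an object will land. A small illustration with the ceph CLI (pool and object names are illustrative):

    # Show the placement group and acting set of OSDs that CRUSH computes
    ceph osd map mysql-pool some-object

    # Dump the CRUSH rules currently in effect
    ceph osd crush rule dump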
DATA IS ORGANIZED INTO POOLS
(Diagram: the cluster is divided into pools A, B, C and D, each containing PGs.)
ACCESS METHODS
ARCHITECTURAL COMPONENTS
RGW
A web services
gateway for object
storage, compatible
with S3 and Swift
LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
A software-based, reliable, autonomous, distributed object store comprised of
self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD
A reliable, fully-
distributed block
device with cloud
platform integration
CEPHFS
A distributed file
system with POSIX
semantics and scale-
out metadata
APP HOST/VM CLIENT
ARCHITECTURAL COMPONENTS
RGW
A web services
gateway for object
storage, compatible
with S3 and Swift
LIBRADOS
A library allowing apps to directly access RADOS (C, C++, Java, Python, Ruby, PHP)
RADOS
A software-based, reliable, autonomous, distributed object store comprised of
self-healing, self-managing, intelligent storage nodes and lightweight monitors
RBD
A reliable, fully-
distributed block
device with cloud
platform integration
CEPHFS
A distributed file
system with POSIX
semantics and scale-
out metadata
APP HOST/VM CLIENT
ACCESSING A RADOS CLUSTER
(Diagram: an application talks to the RADOS cluster directly over a network socket.)
RADOS ACCESS FOR APPLICATIONS
LIBRADOS
• Direct access to RADOS for applications
• C, C++, Python, PHP, Java, Erlang
• Direct access to storage nodes
• No HTTP overhead (a minimal Python example follows)
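As a concrete illustration, a minimal sketch with the python-rados binding (the conffile path and pool name are assumptions; it also assumes a client keyring with access to the pool):

    import rados

    # Connect to the cluster using the local ceph.conf and default keyring
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on an existing pool and read/write objects directly;
    # no gateway or HTTP in the data path
    ioctx = cluster.open_ioctx('mysql-pool')
    ioctx.write_full('hello-object', b'hello from librados')
    print(ioctx.read('hello-object'))

    ioctx.close()
    cluster.shutdown()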
STORING VIRTUAL DISKS
(Diagram: virtual machine disks are stored as RBD images in the RADOS cluster.)
PERCONA ON KRBD
(Diagram: Percona Server hosts consume RBD volumes from the RADOS cluster through the kernel RBD driver; a provisioning sketch follows.)
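A sketch of how a Percona Server host might consume a kernel RBD (krbd) volume as its data directory (pool, image name, size and device name are illustrative):

    # Create an RBD image and map it through the kernel RBD driver
    rbd create mysql/percona01 --size 102400    # size in MB (~100 GB)
    rbd map mysql/percona01                     # appears as e.g. /dev/rbd0

    # Put a filesystem on it and mount it as the MySQL data directory
    mkfs.xfs /dev/rbd0
    mount /dev/rbd0 /var/lib/mysql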
TUNING MYSQL ON CEPH
TUNING FOR HARMONY
OVERVIEW
Tuning MySQL
• Buffer pool > 20% of dataset size
• Flush each Tx or batch?
• Parallel doublewrite-buffer flush
Tuning Ceph
• RHCS 1.3.2, tcmalloc 2.4
• 128 MB thread cache
• Co-resident journals
• 2–4 OSDs per SSD
(Example settings are sketched below.)
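Hedged examples of the knobs behind these bullets (values are illustrative, not recommendations; file paths follow RHEL/RHCS conventions):

    # /etc/my.cnf (MySQL side)
    [mysqld]
    innodb_buffer_pool_size        = 8G    # size to >20% of the working dataset
    innodb_flush_log_at_trx_commit = 2     # write at commit, flush ~once per second
                                           # (1 = flush every transaction)

    # /etc/sysconfig/ceph (Ceph OSD hosts, RHCS 1.3.x built against tcmalloc 2.4)
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728    # 128 MB thread cache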
TUNING FOR HARMONY
SAMPLE EFFECT OF MYSQL BUFFER POOL ON TpmC
[Chart: tpmC (0–1,200,000) vs. time in seconds (0–8000, one data point per minute); 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses; series for 1%, 5%, 25%, 50% and 75% buffer pool.]
TUNING FOR HARMONY
SAMPLE EFFECT OF MYSQL Tx FLUSH ON TpmC
[Chart: tpmC (0–2,500,000) vs. time in seconds (0–8000, one data point per minute); 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses; series for batch Tx flush (1 sec) and per-Tx flush.]
TUNING FOR HARMONY
SAMPLE EFFECT OF CEPH TCMALLOC VERSION ON TpmC
[Chart: tpmC (0–1,200,000) vs. time in seconds (0–8000, one data point per minute); 64 MySQL instances on a Ceph cluster, each with 25 TPC-C warehouses; series for per-Tx flush and per-Tx flush with tcmalloc v2.4.]
TUNING FOR HARMONY
CREATING A SEPARATE POOL TO SERVE IOPS WORKLOADS
Creating multiple pools in the CRUSH map
• Distinct branch in OSD tree
• Edit CRUSH map, add SSD rules
• Create pool, set crush_ruleset to SSD rule
• Add Volume Type to Cinder (see the CLI sketch below)
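A minimal CLI sketch of those steps (names and PG counts are illustrative; it assumes the CRUSH map already contains an 'ssd' branch holding the SSD-backed hosts, and it uses the crush_ruleset syntax of that era's releases):

    # Create a CRUSH rule that places replicas only under the 'ssd' branch
    ceph osd crush rule create-simple ssd-rule ssd host

    # Create a pool for IOPS workloads and point it at the SSD rule
    ceph osd pool create mysql-ssd 512 512
    ceph osd pool set mysql-ssd crush_ruleset 1    # rule id from 'ceph osd crush rule dump'
                                                   # (newer releases use 'crush_rule' instead)

    # Expose the pool to OpenStack as a Cinder volume type; the backend name
    # must match an RBD backend configured for this pool in cinder.conf
    cinder type-create ceph-ssd
    cinder type-key ceph-ssd set volume_backend_name=CEPH_SSD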
TUNING FOR HARMONY
IF YOU MUST USE MAGNETIC MEDIA
Reducing seeks on magnetic pools
• RBD cache is safe (example settings below)
• RAID Controllers with write-back cache
• SSD Journals
• Software caches
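On the "RBD cache is safe" point: librbd's cache honours flush requests from the guest, so it behaves like a disk's volatile write cache rather than silently reordering InnoDB writes. Illustrative settings for the client-side ceph.conf:

    [client]
    rbd cache = true
    rbd cache writethrough until flush = true   # stay writethrough until the guest issues a flush
    rbd cache size = 67108864                   # 64 MB per client (illustrative)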
HW ARCHITECTURE CONSIDERATIONS
ARCHITECTURAL CONSIDERATIONS
UNDERSTANDING THE WORKLOAD
Traditional Ceph Workload
• $/GB
• PBs
• Unstructured data
• MB/sec
MySQL Ceph Workload
• $/IOP
• TBs
• Structured data
• IOPS
NEXT UP
MySQL in the Cloud
Head-to-Head Performance Lab
2:20pm – 3:10pm, Room 203

Editor's Notes

  • Benchmark configuration for the three tpmC charts above: MySQL TPC-C (tpmC) benchmark on XS instances; 2.5 GB dataset (25 warehouses) per instance; 64 MySQL instances on a 48x OSD/HDD Ceph cluster with 1024 GB total server RAM; MySQL buffer pool to dataset ratio varied from 1% to 100%.