Scaling MySQL – Sharding Made Easy
Agenda
• Scalability Issues
• MySQL 5.6
• Why Do-It-Yourself (DIY) Sharding Sucks
• ScaleBase Data Distribution:
– Successful sharding on Amazon and private clouds
– Single vs. multiple shards per server
– Eliminating data silos
– Creating a redundant, fault-tolerant architecture
– Re-balancing and splitting shards
• Q & A
Doron Levari, Founder & CTO
A technologist and long-time veteran of the database industry. Prior to founding ScaleBase, Doron was CEO of Aluna.
What We Do
Simply and cost-effectively scale MySQL to support an infinite number of users, transactions and data, with NO disruption to the existing infrastructure
Scalability Issues and MySQL 5.6
MySQL Scalability Challenges
• Too many transactions
• Too many users
• Too much data
• Too many writes
• Capacity
• Throughput
• Performance inconsistencies
Improvements in MySQL 5.6 – Single Box
Partitioning Improvements
– Explicit Partition Selection:
    SELECT * FROM employees PARTITION (p0, p2);
– Import / Export for Partitioned Tables: bring a new data set into a partitioned table, or export a partition to manage it as a regular table:
    ALTER TABLE e EXCHANGE PARTITION p0 WITH TABLE e2;
http://dev.mysql.com/tech-resources/articles/whats-new-in-mysql-5.6.html
Replication Improvements
– Optimizations to Row-Based
Replication
– Multi-Threaded Slaves
– Improvements to Data Integrity
– Crash-Safe Slaves
– Replication Checksums
Scalability issues remain due to the limitations of a single box. To ensure ACID, you still face limitations with:
– Memory management
– Thread management
– Semaphores
– Locking
– Recovery tasks
No new functionality for sharing workloads across multiple boxes
What Are My Options?
1. More/Bigger Hardware?
– Temporary fix…you will need new hardware again
– More memory…helps mostly with “reads,” but not with “writes”
– Every write operation is at least 4 write operations in the database, plus multiple activities in the database engine memory
2. Application re-architecture?
– Steer workload away from the database
– Example: introduce a caching layer
– Forces application rewrites; new test & QA cycles
3. Do-It-Yourself Sharding?
4. Migrate to a new database architecture?
– Other RDBMS / NewSQL / NoSQL?
– Forces application rewrites; new test & QA cycles
– ACID/Durability issues
Scale Out your Existing MySQL
• Keep your MySQL - keep your InnoDB
• Ecosystem compatibility, preserve skills
• 100% application compatibility
• Smoother migration, no down-time, no forklift
• Your data is safe
• No “in-memory” magic
• No “in-memory” size limit
Don’t throw out the baby with the bath water!
Why Do It Yourself Sharding “Sucks”
What is Sharding?
Wikipedia - Shard (database architecture) http://en.wikipedia.org/wiki/Shard_(database_architecture)
A database shard is a horizontal partition in a
database or search engine. Each individual partition
is referred to as a shard.
Horizontal partitioning is a database design
principle whereby rows of a database table are held
separately, rather than being split into columns.
Each partition forms part of a shard, which may in
turn be located on a separate database server or
physical location.
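As a rough illustration of the definition above, the sketch below splits whole rows across shards by hashing a key. The shard count, the `customer_id` key and the function names are assumptions made for the example; they do not come from the presentation or from any particular product.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count, chosen for the example


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a row key to a shard index using a stable hash (CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def partition_rows(rows, key_field, num_shards=NUM_SHARDS):
    """Horizontal partitioning: each row goes, whole, to exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


# Illustrative data: 100 customer rows, partitioned by customer_id.
rows = [{"customer_id": i, "name": f"customer-{i}"} for i in range(100)]
shards = partition_rows(rows, "customer_id")
```

Every row keeps all of its columns; only the set of rows differs from shard to shard, which is what distinguishes horizontal from vertical partitioning.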
DIY Sharding Challenges
Applications must be modified to support multiple shards
Challenges exist because application code changes are required to support multiple database instances:
• Maintaining DB ops and IPs in the app
• Non-optimized sharding strategies
– No good way to maintain global tables replicated across all databases
• Sacrifices development agility and adds administrative complexity
• Results in database silos
• Database ecosystem breaks because the application “conceals” sharding strategies internally
• Risks for data inconsistency
• Adding and removing databases is not supported, which leads to overprovisioning
• Jeopardizes high availability, backups & disaster recovery
• Demands custom application code that can fail ACID compliance
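To make the first and seventh challenges concrete, here is a minimal sketch of what do-it-yourself routing often looks like inside application code. The IP addresses, the modulo rule and the helper names are hypothetical. The point is that the shard map lives in the app, and that changing the number of databases silently remaps most keys.

```python
# Hypothetical shard map, hard-coded in the application (the DIY approach).
SHARD_HOSTS = {0: "10.0.0.11", 1: "10.0.0.12", 2: "10.0.0.13"}


def host_for_customer(customer_id: int) -> str:
    """Pick the database host for a customer with a simple modulo rule."""
    return SHARD_HOSTS[customer_id % len(SHARD_HOSTS)]


def moved_keys(ids, old_n: int, new_n: int):
    """Keys whose shard changes when the shard count goes from old_n to new_n."""
    return [i for i in ids if i % old_n != i % new_n]


# Adding a fourth database under this rule relocates most existing keys.
relocated = moved_keys(range(1200), old_n=3, new_n=4)
fraction_moved = len(relocated) / 1200
```

With this naive rule, three quarters of the keys would have to be copied to a different database before the fourth server could be used, which is one reason teams tend to overprovision instead.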
ScaleBase Data Distribution Overview
Data Distribution: Application Experience
Without ScaleBase: App must be customized to support shards
With ScaleBase: App sees ONE database…
…and doesn’t require any customization
ScaleBase acts as a proxy between the app and the
database, virtualizing the database environment
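The proxy idea can be sketched in a few lines. This is only a conceptual model, not ScaleBase's implementation: the class name, the `users` table and the modulo routing are assumptions, and in-memory SQLite databases stand in for MySQL servers so that the example runs anywhere. A real proxy would speak the MySQL wire protocol and parse SQL.

```python
import sqlite3


class ShardingProxy:
    """Hypothetical facade: callers see one 'database'; routing is hidden."""

    def __init__(self, num_shards: int):
        # In-memory SQLite connections stand in for separate MySQL servers.
        self._shards = [sqlite3.connect(":memory:") for _ in range(num_shards)]
        for db in self._shards:
            db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def _shard(self, user_id: int):
        return self._shards[user_id % len(self._shards)]

    def insert_user(self, user_id: int, name: str) -> None:
        self._shard(user_id).execute(
            "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
        )

    def get_user(self, user_id: int):
        row = self._shard(user_id).execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def count_users(self) -> int:
        # A query with no shard key fans out to every shard.
        return sum(
            db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
            for db in self._shards
        )


# Application code never mentions shards or hosts.
proxy = ShardingProxy(num_shards=4)
for i in range(10):
    proxy.insert_user(i, f"user-{i}")
```

The application calls `insert_user`, `get_user` and `count_users` without knowing how many databases sit behind the proxy.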
Manual Sharding versus ScaleBase
Sharding Limitations:
• Major app rewrite, maintaining code
• Maintaining DB ops & IPs in the app
• Administration/3rd party tools are broken
• DB silos/Database ecosystem is blind
– Application “hides” sharding strategies
• Non-optimized data distribution policy
– No good way to maintain global tables replicated across all databases
• Sacrifices development agility
• Adding/removing DBs is not supported
• Risks for data inconsistency
• Demands custom application code that can fail ACID compliance
• Jeopardizes high availability, backups, and disaster recovery
ScaleBase Benefits:
• No hard-coding, no application rewrites
• Unlimited scalability
• Improve performance
• Real time elasticity
• ACID compliance
• Verified data consistency
• Real time monitoring, traffic analysis
• Carefully analyze distribution policy
• Enable system upgrades and updates
• Simplified, centralized admin
– Adding users
– Changing schemas
– Maintenance scripts
– Management queries
Typical ScaleBase Data Traffic Manager Deployment
[Diagram: application servers, BI tools and management tools on one side; Database A through Database D, each with its own replica (Replica A through Replica D), on the other. Callouts: Unlimited Scale; ScaleBase Architecture is Fault Tolerant.]
ScaleBase Data Distribution – In Detail
ScaleBase Enables MySQL Scale Out without Rewriting Apps
• Data distribution and scale-out is part of the database
architecture, not the application
• One IP to connect to, and “see a unified database”
– The application
– Entire ecosystem (ETL, mysqldump, phpMyAdmin)
– No special sharding wizard developer
– No app re-design, re-dev, re-QA, re-test, re-deploy
– No hard-coded variables lost in the code
– No special documentation
ScaleBase Enables Scale Out on AWS and Private Clouds
• A virtualized DB environment makes it easy to change real
infrastructure, because it’s decoupled from the application
• No cloud, by itself, makes your database elastic
• ScaleBase enables elasticity of MySQL in the cloud (EC2, RDS, etc.)
Scale-up hits AWS’s tiered configuration limits fast
Scale-out is unlimited and gives cloud flexibility
ScaleBase Supports Scale Out on Single & Multiple Machines
Advantages of several shards on one machine:
– Several smaller MySQL instances better utilize cores and memory
– When data grows, each instance can later migrate to a bigger machine of its own
Advantages of several shards on multiple machines:
– Leverage commodity hardware
– When a machine reaches its limits, ScaleBase enables online data redistribution (resharding) and shard splitting
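One way to picture the single-machine option is a placement table that maps each logical shard to an endpoint. In the sketch below, the host names, ports and function names are all hypothetical. Four shards start as four MySQL instances on one host, and moving one to its own machine changes only the table, not the key-to-shard rule.

```python
# Hypothetical endpoints: four MySQL instances on one machine, one per shard.
placement = {
    "shard0": ("host-a", 3306),
    "shard1": ("host-a", 3307),
    "shard2": ("host-a", 3308),
    "shard3": ("host-a", 3309),
}


def shard_for(key: int, placement: dict) -> str:
    """Key-to-shard rule; it depends on shard names, not on where they run."""
    names = sorted(placement)
    return names[key % len(names)]


def migrate(placement: dict, shard: str, new_endpoint: tuple) -> dict:
    """Return a new placement with one shard moved to a different endpoint."""
    if shard not in placement:
        raise KeyError(shard)
    updated = dict(placement)
    updated[shard] = new_endpoint
    return updated


# shard2 outgrows the shared machine and gets a host of its own.
after = migrate(placement, "shard2", ("host-b", 3306))
```

Because keys still map to the same logical shard, no data has to be re-partitioned when an instance moves; only its endpoint changes.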
ScaleBase Enables Splitting Shards
• ScaleBase also redistributes data across the array to eliminate hot spots, splitting the hot spot into two databases
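A shard split can be sketched with a range-based shard map. The code below is an illustration under assumed names (`db_a`, `db_b`, `db_e`) and an assumed midpoint rule. It shows only the bookkeeping and says nothing about how ScaleBase moves the data online.

```python
from typing import List, Tuple

# Each entry maps the half-open key range [lo, hi) to a database name.
Range = Tuple[int, int, str]


def split_shard(ranges: List[Range], hot_db: str, new_db: str) -> List[Range]:
    """Split every range owned by hot_db at its midpoint; upper half goes to new_db."""
    result = []
    for lo, hi, db in ranges:
        if db == hot_db and hi - lo > 1:
            mid = (lo + hi) // 2
            result.append((lo, mid, db))
            result.append((mid, hi, new_db))
        else:
            result.append((lo, hi, db))
    return result


def lookup(ranges: List[Range], key: int) -> str:
    """Find the database that owns a key."""
    for lo, hi, db in ranges:
        if lo <= key < hi:
            return db
    raise KeyError(key)


# Hypothetical layout: db_b is the hot spot and is split into db_b and db_e.
before = [(0, 1000, "db_a"), (1000, 2000, "db_b")]
after_split = split_shard(before, hot_db="db_b", new_db="db_e")
```

Keys that were not on the hot shard keep their owner, so only half of the hot shard's key range has to move.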
ScaleBase Re-balances Shards
• Special analysis and alerts about approaching limits
• ScaleBase dynamically redistributes data (resharding), moving the data across the array from the over-utilized databases to the under-utilized ones
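The rebalancing idea can be sketched as a greedy loop over small data buckets. The algorithm, bucket sizes and names below are assumptions for illustration, not the product's method. Each step moves one bucket from the most loaded database to the least loaded one, and only if that narrows the gap between them.

```python
def db_totals(assignment: dict, load: dict, dbs: list) -> dict:
    """Total load per database, including databases that hold no buckets."""
    totals = {db: 0 for db in dbs}
    for bucket, db in assignment.items():
        totals[db] += load[bucket]
    return totals


def rebalance(assignment: dict, load: dict, dbs: list):
    """Greedy sketch: repeatedly move a bucket from the hottest to the coldest db."""
    assignment = dict(assignment)
    moves = []
    while True:
        totals = db_totals(assignment, load, dbs)
        hot = max(dbs, key=lambda d: totals[d])
        cold = min(dbs, key=lambda d: totals[d])
        gap = totals[hot] - totals[cold]
        # Only a bucket smaller than the gap leaves the pair more even.
        candidates = [
            b for b, d in assignment.items() if d == hot and 0 < load[b] < gap
        ]
        if not candidates:
            return assignment, moves
        bucket = min(candidates, key=lambda b: load[b])
        assignment[bucket] = cold
        moves.append((bucket, hot, cold))


# Hypothetical starting point: everything sits on db_a, db_b is empty.
dbs = ["db_a", "db_b"]
load = {"b0": 40, "b1": 30, "b2": 20, "b3": 10}
start = {"b0": "db_a", "b1": "db_a", "b2": "db_a", "b3": "db_a"}
balanced, moves = rebalance(start, load, dbs)
```

Each accepted move strictly reduces the imbalance between the two databases involved, so the loop always terminates. A production system would also weigh the cost of each move, which this sketch ignores.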
ScaleBase Provides Optimal Data Distribution Policies
A good data distribution policy ensures that a specific transaction is directed to a specific database.
[Diagram: 1,000 transactions are split across four databases, 250 transactions each, for a total of 1,000 transactions.]
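The effect of the distribution key can be shown with a small, hypothetical example. Here a transaction writes three order rows for one customer. Distributing by `customer_id` keeps the transaction on one database, while distributing by `order_id` spreads it over three. The field names and the modulo rule are assumptions for illustration.

```python
NUM_DATABASES = 4  # matches the four databases in the slide's diagram


def databases_touched(txn_rows, key_field: str, num_dbs: int = NUM_DATABASES) -> set:
    """Set of database indexes a transaction touches under a given key."""
    return {row[key_field] % num_dbs for row in txn_rows}


# One hypothetical transaction: three orders placed by customer 42.
txn = [
    {"customer_id": 42, "order_id": 1001},
    {"customer_id": 42, "order_id": 1002},
    {"customer_id": 42, "order_id": 1003},
]

by_customer = databases_touched(txn, "customer_id")
by_order = databases_touched(txn, "order_id")
```

With the customer key the transaction stays local to a single database; with the order key it would need coordination across several, which is the situation a good policy tries to avoid.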
ScaleBase Eliminates Data Silos
When a query needs data from several databases, ScaleBase:
– Runs the query in parallel on all databases
– Aggregates results into one meaningful result-set to be returned to the client: the same result-set that would have been returned from a single DB
– Including cross-db GROUP BY, ORDER BY, aggregate functions
– Including cross-db JOIN operations
– Enables 2-phase commit for transactions spanning multiple databases
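The first three points above follow a scatter-gather pattern, sketched below with plain Python lists standing in for databases. The data, the function names and the thread pool are assumptions for the example. The sketch covers a cross-database `SUM ... GROUP BY ... ORDER BY`, not JOINs or two-phase commit.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (country, amount) rows, spread over three databases.
SHARDS = [
    [("US", 10), ("DE", 5)],
    [("US", 7), ("FR", 3)],
    [("DE", 4), ("FR", 6)],
]


def partial_sum(rows) -> Counter:
    """Per-database step: SUM(amount) GROUP BY country over local rows only."""
    totals = Counter()
    for country, amount in rows:
        totals[country] += amount
    return totals


def scatter_gather(shards):
    """Run the partial aggregate on every shard in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_sum, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the partial sums together
    # Final ORDER BY total DESC, country ASC is applied after the merge.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))


result = scatter_gather(SHARDS)
```

Sums merge by simple addition. Aggregates such as `AVG` cannot be merged this way and would need each shard to return a sum and a count instead.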
ScaleBase Provides a Fault Tolerant Architecture
[Diagram: application servers, BI tools and management tools on one side; Database A through Database D, each with its own replica (Replica A through Replica D), on the other. Callouts: Fully Redundant; Resilience to failures; Scheduled maintenance without downtime.]
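The database-plus-replica pairing suggests a simple failover rule, sketched here for read queries only. The `Node` class, the `Unavailable` exception and the query string are hypothetical. Failing over writes would also require promoting the replica, which this sketch does not attempt.

```python
class Unavailable(Exception):
    """Raised by a node that cannot serve requests."""


class Node:
    """Hypothetical stand-in for a database server or its replica."""

    def __init__(self, name: str, up: bool = True):
        self.name = name
        self.up = up

    def query(self, sql: str) -> str:
        if not self.up:
            raise Unavailable(self.name)
        return f"{self.name} answered: {sql}"


def query_with_failover(primary: Node, replica: Node, sql: str) -> str:
    """Try the primary first; fall back to its replica if it is down."""
    try:
        return primary.query(sql)
    except Unavailable:
        return replica.query(sql)


# Database A is taken down for maintenance; Replica A keeps serving reads.
database_a = Node("Database A", up=False)
replica_a = Node("Replica A")
answer = query_with_failover(database_a, replica_a, "SELECT 1")
```

If both nodes of a pair are down the exception still propagates, so the caller can tell a degraded pair from a failed one.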
Summary
ScaleBase Delivers Scalability
• Scale to Unlimited Throughput
• No Specialized Hardware
• No Re-architecture
• No Application Rewrites
Detailed Scale Out Case Studies
Large Chip Co
• Scalability
• Multiple Apps
• Multiple growing users
• Availability
• MySQL DB
Solar Edge
• Next Gen Monitoring App
• Massive Scale
• Monitors real time data from thousands of distributed systems
Mozilla
• New Product / Next Gen App / AppStore
• Scalability
• Geo-clustering
AppDynamics
• Next gen APM company
• Scalability for the Netflix implementation
ScaleBase Deployment
Environments
– Public cloud: AWS, Rackspace, any
– Private cloud
– Hosted / on-premise
Databases Supported
– MySQL 5.1, 5.5, 5.6 (under certification)
– AWS RDS MySQL 5.1, 5.5
– MariaDB 10.0 (under certification)
Path to Scale-Out:
1. Data Distribution Policy Analysis
2. Functional Test
3. Load Test
4. Production Migration (safe, online)
Summary
ScaleBase provides cost-effective Scale-Out solutions
• Scale to an infinite number of users, data and transactions
• Improve performance
• No application rewrites
• Real-time elasticity
• ACID Compliant
• Expert analysis and simple deployment
• Leverage existing MySQL ecosystem/skills
• Improve database visibility with real-time monitoring
• Simplified, centralized administration
Questions (please enter directly into the GTW side panel)
paul.campaniello@scalebase.com
doron.levari@scalebase.com
www.ScaleBase.com
617.630.2800
Additional Resources
http://www.scalebase.com/blog/
http://www.scalebase.com/resources/
@scalebase
Thank You

ScaleBase Webinar: Scaling MySQL - Sharding Made Easy!
