Flexible transactional scale for the connected world.
Challenges to Scaling MySQL:
Scaling In and Down – The Costs
Dave A. Anselmi @AnselmiDave
Director of Product Management
Questions for Today
o Why and when is scaling down MySQL a good idea?
o What options are there to scale down MySQL?
o How do I figure out the costs of not scaling down?
o How does ClustrixDB scale down differently than MySQL?
o How real is elastically scaling in ClustrixDB? What are the catches?
PROPRIETARY & CONFIDENTIAL 2
MySQL Scaling is Usually 1-Way: Up/Out
The Typical Path to Scale
(Diagram: scale/growth over time)
1. Start on the LAMP stack (AWS, Azure, RAX, GCE, private cloud, etc.)
2. Reach a limit: app too slow; lost users
3. Migrate to a bigger machine
4. Reach the limit (again): app too slow; lost users
5. Add read slaves, then sharding, etc.: add more hardware & DBAs; refactor code / the hardwired app
– More expensive
– Higher risk
– Lost revenue
6. Ongoing: refactoring hardware, data balancing, shard maintenance
7. Repeat: migrate to a bigger machine
Why is Scaling Down/In Needed?
Scaling for Peak Workloads
Peak or Periodic Workloads Waste Resources
o Many workloads have some periodicity
o Maintaining capacity for peaks while undersubscribed
results in wasted resources
o This bugs the CFO… & affects DevOps budgets
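As a rough illustration of how that waste adds up (all numbers here are hypothetical, not from the deck):

```python
# Rough estimate of spend wasted by provisioning for peak year-round.
# All numbers are hypothetical, for illustration only.

def wasted_spend(peak_servers, avg_servers, cost_per_server_month, months=12):
    """Cost of capacity that sits idle outside of peak."""
    idle_servers = peak_servers - avg_servers
    return idle_servers * cost_per_server_month * months

# e.g. sized for a 30-server peak, but averaging 10 servers of real load:
waste = wasted_spend(peak_servers=30, avg_servers=10, cost_per_server_month=500)
print(waste)  # 20 * 500 * 12 = 120000
```

This is the number the CFO sees: capacity bought for the worst hour of the year, billed for all of them.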
Why Costs Should Matter to Tech People
o DevOps, DBAs, and data architects focus on product features and technical feasibility. "TCO ain't our TLA."
o However, at some point the 'business side' of your company has to authorize purchase of the actual system(s).
o Whether it's licensing, support, or cloud solutions (AWS, etc.), all of them have a price, and all of them have to be 'justified.'
o Knowing how to frame your implementation recommendations as pros/cons-based 'business cases' greatly affects your resource requests:
– either with your team lead/department head
– or with the folks in finance
Peak or Periodic Workloads by Sector
o E-Commerce
– Black Friday/Cyber Monday, Singles' Day, 'Back to School,' flash sales, etc.
– 80% of revenue in 2 months
– Provisioning > 3x capacity for those 2 months
o Gaming
– New game released, new update
– Need the ability to quickly scale either out (game servers oversubscribed) or in (fewer gamers than estimated)
– Cannibalization: a new game migrates the previously subscribed base away from the old game(s)
Peak or Periodic Workloads by Sector
o Social Media
– Some events are periodic/predictable (e.g. Awards Season,
movie releases, Hallmark holidays, TV shows)
– Some events are much less so (current events, 'hot trends,' politics, social outrage, etc.)
o Sports
– Playoffs, Super Bowl, March Madness, etc
– Provisioning for these requires quickly available additional
resources
– After the main event, sports app utilization can fall severely,
leaving server arrays overprovisioned
How are MySQL workloads Scaled Down/In?
Scaling Down/In: The Work Required

Approach: Scale-Up
How: Keep increasing the size of the (single) database server
Scale Up/Out:
• Console: Click for a larger server, until the largest available
• EC2: Bring up a larger (redundant) server from a backup, use replication to catch up, then change the application to the new DB endpoint
Scale Down/In:
• Console: Click for a smaller server. Works well if the max workload fits in a DBaaS offering
• EC2: Bring up a smaller (redundant) server from a backup, use replication to catch up, then change the application to the new DB endpoint

Approach: Read Slaves
How: Add 'Slave' read server(s) to the 'Master' database server
Scale Up/Out:
• Console: Click to add 'Read Replicas'
• EC2: Bring up redundant server(s) from a backup, and turn on replication
• Set up read/write fan-out in the app or at the proxy level
Scale Down/In:
• Console: Click to remove 'Read Replicas'
• EC2: Bring down the read slave(s)
• Change the read/write fan-out in the app or at the proxy level

(Console: DBaaS with shared storage. EC2: instance, or bare metal)
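The "read/write fan-out in the app or at the proxy level" step can be sketched in a few lines. A hypothetical app-side router (illustrative only, not a real proxy; the class and its naive SQL classification are invented for this example):

```python
import itertools

class FanOutRouter:
    """Route writes to the master and reads round-robin across read slaves.

    Scaling in is just removing a replica from the pool; no other app
    changes are needed. Hypothetical sketch, not a production proxy.
    """

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._cycle = itertools.cycle(self.replicas)

    def endpoint_for(self, sql):
        # Naive classification: only plain SELECTs go to replicas.
        if sql.lstrip().upper().startswith("SELECT") and self.replicas:
            return next(self._cycle)
        return self.master

    def remove_replica(self, replica):
        # Scale in: drop a read slave and rebuild the rotation.
        self.replicas.remove(replica)
        self._cycle = itertools.cycle(self.replicas or [self.master])

router = FanOutRouter("master:3306", ["replica1:3306", "replica2:3306"])
print(router.endpoint_for("SELECT * FROM orders"))   # replica1:3306
print(router.endpoint_for("UPDATE orders SET x=1"))  # master:3306
```

In production this logic usually lives in a proxy (so every app instance sees one endpoint), which is exactly why the table lists "change the fan-out" as a separate scale-down chore.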
Scaling Down/In: The Work Required

Approach: Master-Master
How: Add additional 'Master'(s) to the 'Master' database server
Scale Up/Out:
• Console: No native support. Can deploy larger instances, but must set up master/master yourself
• EC2: Provision 2 new larger masters via backup, use replication to catch up, then change the application to the new DBs' endpoints
Scale Down/In:
• Console: No native support. Can deploy smaller instances, but must set up master/master yourself
• EC2: Provision 2 new smaller masters via backup, use replication to catch up, then change the application to the new DBs' endpoints

(Console: DBaaS with shared storage. EC2: instance, or bare metal)
Scaling Down/In: The Work Required

Approach: Vertical Sharding
How: Separating tables across separate database servers
Scale Up/Out:
• Console: No native support. Can deploy additional instances, but must set up the table distribution yourself
• EC2: Provision additional instances via backup, (manually) redistribute tables across the shards, then change the application to include the new shards
Scale Down/In:
• Console: No native support. Can deprovision instances, but must consolidate the tables yourself
• EC2: Consolidate tables from redundant shards, deprovision the redundant shards, and change the application/table mapping to match the new data distribution

Approach: Horizontal Sharding
How: Partitioning tables across separate database servers
Scale Up/Out:
• Console: No native support. Can deploy additional instances, but must set up the partition distribution yourself
• EC2: Provision additional instances via backup, (manually) redistribute partitions across the shards, then change the application to include the new shards
Scale Down/In:
• Console: No native support. Can deploy smaller instances, but must consolidate the partitions yourself
• EC2: Consolidate partitions from redundant shards, deprovision the redundant shards, and change the application/table mapping to match the new data distribution

(Console: DBaaS with shared storage. EC2: instance, or bare metal)
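A toy example of why sharded scale-down forces an application/table-mapping change: with naive modulo sharding (hypothetical; real deployments typically use range maps or consistent hashing), changing the shard count relocates most keys, and each one must be physically moved before the mapping can flip:

```python
def shard_for(user_id, shard_count):
    """Modulo sharding: which shard owns this key."""
    return user_id % shard_count

# With 4 shards, user 10 lives on shard 2:
print(shard_for(10, 4))  # 2

# Scale in from 4 shards to 3 and count how many keys change homes:
moved = sum(1 for uid in range(10_000) if shard_for(uid, 4) != shard_for(uid, 3))
print(moved)  # 7498 -- three quarters of the keys relocate
```

This is the hidden cost behind "consolidate partitions yourself": the data motion dwarfs the deprovisioning click.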
What are the Costs of NOT Scaling Down/In?
o Idle Server/Overcapacity cost
– CAPEX budget wasted on unused resources
– OPEX budget probably OK: idle servers need less DevOps
o Low Overall Impact to DevOps Infra:
– “Everything’s Working” / “Not broken; Don’t Fix it”
– A low CAPEX budget means low budgets for replacements, so teams instead cannibalize underutilized infra. "No problem."
DevOps Impact #1 from Overprovisioning
1. One-way scaling to handle peaks => idle resources at non-peak, often most of the time
2. Idle resources => blown/shrunk DevOps budgets
– Both CAPEX and OPEX
– The finance team pays attention!
3. Blown/shrunk DevOps budgets => hard to get approval for further capacity
4. No budget => can't scale for growing peaks
5. Higher risk of site slowdowns or outages at the next peak(s)
DevOps Impact #2 from Overprovisioning
Black Friday/Cyber Monday Outage Highlights
o 2011: PC Mall, Newegg, Toys R'Us, Avon: 30+ min outages. Walmart: 3 hr outage
o 2012: Kohl's: repeated multi-hour outages
o 2013: Urban Outfitters, Motorola: offline most of Cyber Monday
o 2014: Best Buy: 2+ hrs of total outages. HP, Nike: site crashes
o 2015: Neiman Marcus: 4+ hr outage
o 2016: Old Navy, Macy’s: multi-hour outages
2016 Black Friday/Cyber Monday
Total Online Sales: $5.27B, 21.6% increase over 2015
Even Larger Business Impact of Outages
o Opportunity cost
– Each missed visitor is potentially a customer or a referral
o Single-Sale cost
– Each missed sale is a tangible missed dollar value
o Customer-Lifetime cost
– Unhappy customers find sites they like better and won't return
o Market/Brand cost
– All customers use social media: a communication 'force multiplier'
– "If you make customers unhappy in the physical world, they might each tell six friends. If you make customers unhappy on the internet, they can each tell 6,000." – Jeff Bezos
– W. Edwards Deming said "5" and "20"…
– Call it "Customer Satisfaction at Web-Scale"
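A back-of-envelope way to turn the $5.27B figure above into an outage cost, under the hypothetical assumption that sales were spread evenly over the two days (real traffic is far spikier, and this is market-wide, not per-site):

```python
# Back-of-envelope: revenue at stake per minute of outage during the
# 2016 Black Friday/Cyber Monday peak ($5.27B total online sales).
# The even-spread assumption is hypothetical; real traffic is spikier,
# so peak-minute losses for a given site would be higher still.

total_sales = 5.27e9           # USD, 2016 BF/CM online sales
peak_minutes = 2 * 24 * 60     # two days

per_minute = total_sales / peak_minutes
print(round(per_minute))  # ~1.83M USD per minute, across the whole market
```

Even a site holding a fraction of a percent of that market loses real money per minute offline, which is what makes the multi-hour outages above so expensive.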
How ClustrixDB Scales-In
ClustrixDB
ACID Compliant
Transactions & Joins
Optimized for OLTP
Built-In Fault Tolerance
Flex-Up and Flex-Down
Minimal DB Admin
• Write + Read Linear Scale-Out
• Click to Elastically Add/Remove Servers
• MySQL-Compatible
Adding + Removing Nodes: Scaling Out + In
o Easy and simple Flex Up (or Flex Down)
– Single minimal ‘database pause’
o All servers handle writes and reads
– Workload is spread across more servers
after Flex Up
o Data is automatically rebalanced across
the cluster
– Tables are online for reads and writes
– MVCC for lockless reads while writing
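The "MVCC for lockless reads while writing" point can be illustrated with a toy multi-version store. This is a sketch of the general technique only, not ClustrixDB's implementation:

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writers append new versions; readers pin
    a snapshot, so reads never block on concurrent writes.
    Illustrative of MVCC in general -- not ClustrixDB's implementation."""

    def __init__(self):
        self._versions = {}        # key -> list of (txn_id, value)
        self._txn = itertools.count(1)

    def write(self, key, value):
        txn = next(self._txn)
        self._versions.setdefault(key, []).append((txn, value))
        return txn

    def snapshot(self):
        return next(self._txn)     # reads see versions committed before this

    def read(self, key, snap):
        # Scan newest-first for the latest version visible to this snapshot.
        for txn, value in reversed(self._versions.get(key, [])):
            if txn < snap:
                return value
        return None

store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()
store.write("balance", 50)          # concurrent write, does not block readers
print(store.read("balance", snap))  # 100 -- the reader still sees its snapshot
```

The same property is what keeps tables online for reads and writes while the rebalancer moves data underneath them.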
(Diagram: a five-node ClustrixDB cluster, servers S1–S5, shown before and after a flex operation)
Review: Questions for Today
o Why and When is scaling down MySQL a good idea?
– Periodic workloads, Flash Sales, new Releases, etc
o What options are there to scale down MySQL?
– Single Node: Shrink single node
– Master/Slave: Remove read slaves, shrink master
– Master/Master: Drop and/or shrink a master
– Sharding: Drop and combine shards
o How do I figure out the costs of not scaling down?
– Cost 1: Undersubscribed resources
– Cost 2: Budget impact on the ability to scale for peaks
Review: Questions for Today
o How does ClustrixDB scale down differently than MySQL?
– It's a shared-nothing, scale-out, clustered RDBMS
– Simply add or drop nodes to scale out or scale in
o How real is elastically scaling in ClustrixDB? What are the catches?
– Add nodes via IP. Add the IP to the load balancer. No app changes.
– Remove nodes via IP. Remove the IP from the load balancer. No app changes.
– Minor 'database pause' for the multi-node 'group change'
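Those add/remove steps can be sketched as a toy load-balancer pool (the class and its API are invented for illustration, not ClustrixDB's or any real balancer's interface). The application only ever connects to the balancer, so flexing out or in is just editing the backend list:

```python
class LoadBalancerPool:
    """Toy load-balancer pool: the app always connects to the balancer;
    flexing up/down only edits the backend node list.
    Hypothetical sketch -- not a real load balancer's API."""

    def __init__(self, node_ips):
        self.node_ips = list(node_ips)
        self._next = 0

    def add_node(self, ip):        # flex up: register one new backend IP
        self.node_ips.append(ip)

    def remove_node(self, ip):     # flex in: drop an IP; app untouched
        self.node_ips.remove(ip)

    def pick_backend(self):        # round-robin across current nodes
        ip = self.node_ips[self._next % len(self.node_ips)]
        self._next += 1
        return ip

pool = LoadBalancerPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
pool.add_node("10.0.0.4")      # scale out: 4 nodes now share writes + reads
pool.remove_node("10.0.0.2")   # scale in: 3 nodes; the app changes nothing
print(pool.pick_backend())     # 10.0.0.1
```

Contrast this with the MySQL table earlier, where every scale-down path ends in "change the application."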
QUESTIONS?
THANK YOU!
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?