Flexible transactional scale for the connected world.
Challenges to Scaling MySQL:
Scaling In and Down – The Costs
Dave A. Anselmi @AnselmiDave
Director of Product Management
Questions for Today
o Why and when is scaling down MySQL a good idea?
o What options are there to scale down MySQL?
o How do I figure out the costs of not scaling down?
o How does ClustrixDB scale down differently than MySQL?
o How real is elastically scaling in ClustrixDB? What are the catches?
PROPRIETARY & CONFIDENTIAL 2
MySQL Scaling is Usually 1-Way: Up/Out
The Typical Path to Scale
(Diagram: scale/growth over time)
1. Start on the LAMP stack (AWS, Azure, RAX, GCE, private cloud, etc.)
2. Reach a limit: app too slow; lost users
3. Migrate to a bigger machine
4. Reach the limit (again): app too slow; lost users
5. Add read slaves, then sharding, etc.: add more hardware & DBAs; refactor code / the hardwired app
– More expensive
– Higher risk
– Lost revenue
6. Ongoing: refactoring hardware, data balancing, shard maintenance
7. Repeat: migrate to a bigger machine
Why is Scaling Down/In Needed?
Scaling for Peak Workloads
Peak or Periodic Workloads Waste Resources
o Many workloads have some periodicity
o Maintaining capacity for peaks while undersubscribed
results in wasted resources
o This bugs the CFO… & affects DevOps budgets
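As a rough illustration of how that waste adds up (all numbers here are hypothetical, not from the deck):

```python
# Rough estimate of spend wasted by provisioning for peak year-round.
# All numbers are hypothetical, for illustration only.

def wasted_spend(peak_servers, avg_servers, cost_per_server_month, months=12):
    """Cost of capacity that sits idle outside of peak."""
    idle_servers = peak_servers - avg_servers
    return idle_servers * cost_per_server_month * months

# e.g. sized for a 30-server peak, but averaging 10 servers of real load:
waste = wasted_spend(peak_servers=30, avg_servers=10, cost_per_server_month=500)
print(waste)  # 20 * 500 * 12 = 120000
```

This is the number the CFO sees: capacity bought for the worst hour of the year, billed for all of them.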
Why Costs Should Matter to Tech People
o DevOps, DBAs, and data architects focus on product features and technical feasibility. "TCO ain't our TLA."
o However, at some point the 'business side' of your company has to authorize purchase of the actual system(s).
o Whether it's licensing, support, or cloud solutions (AWS, etc.), all of them have a price, and all of them have to be 'justified.'
o Knowing how to frame your implementation recommendations as pros/cons-based 'business cases' greatly affects your resource requests:
– either with your team lead/department head
– or with the folks in finance
Peak or Periodic Workloads by Sector
o E-Commerce
– Black Friday/Cyber Monday, Singles' Day, 'Back to School,' flash sales, etc.
– 80% of revenue in 2 months
– Provisioning > 3x capacity for those 2 months
o Gaming
– New game released, new update
– Need the ability to quickly scale either out (game servers oversubscribed) or in (fewer gamers than estimated)
– Cannibalization: a new game migrates the previously subscribed base away from the old game(s)
Peak or Periodic Workloads by Sector
o Social Media
– Some events are periodic/predictable (e.g. Awards Season,
movie releases, Hallmark holidays, TV shows)
– Some events are much less so (current events, 'hot trends,' politics, social outrage, etc.)
o Sports
– Playoffs, Super Bowl, March Madness, etc
– Provisioning for these requires quickly available additional
resources
– After the main event, sports app utilization can fall severely,
leaving server arrays overprovisioned
How are MySQL workloads Scaled Down/In?
Scaling Down/In: The Work Required

Approach: Scale-Up
How: Keep increasing the size of the (single) database server
Scale Up/Out:
• Console: Click for a larger server, until the largest available
• EC2: Bring up a larger (redundant) server from a backup, use replication to catch up, then change the application to the new DB endpoint
Scale Down/In:
• Console: Click for a smaller server. Works well if the max workload fits in a DBaaS offering
• EC2: Bring up a smaller (redundant) server from a backup, use replication to catch up, then change the application to the new DB endpoint

Approach: Read Slaves
How: Add 'Slave' read server(s) to the 'Master' database server
Scale Up/Out:
• Console: Click to add 'Read Replicas'
• EC2: Bring up redundant server(s) from a backup, and turn on replication
• Set up read/write fan-out in the app or at the proxy level
Scale Down/In:
• Console: Click to remove 'Read Replicas'
• EC2: Bring down the read slave(s)
• Change the read/write fan-out in the app or at the proxy level

(Console: DBaaS with shared storage. EC2: instance, or bare metal)
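The "read/write fan-out in the app or at the proxy level" step can be sketched in a few lines. A hypothetical app-side router (illustrative only, not a real proxy; the class and its naive SQL classification are invented for this example):

```python
import itertools

class FanOutRouter:
    """Route writes to the master and reads round-robin across read slaves.

    Scaling in is just removing a replica from the pool; no other app
    changes are needed. Hypothetical sketch, not a production proxy.
    """

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._cycle = itertools.cycle(self.replicas)

    def endpoint_for(self, sql):
        # Naive classification: only plain SELECTs go to replicas.
        if sql.lstrip().upper().startswith("SELECT") and self.replicas:
            return next(self._cycle)
        return self.master

    def remove_replica(self, replica):
        # Scale in: drop a read slave and rebuild the rotation.
        self.replicas.remove(replica)
        self._cycle = itertools.cycle(self.replicas or [self.master])

router = FanOutRouter("master:3306", ["replica1:3306", "replica2:3306"])
print(router.endpoint_for("SELECT * FROM orders"))   # replica1:3306
print(router.endpoint_for("UPDATE orders SET x=1"))  # master:3306
```

In production this logic usually lives in a proxy (so every app instance sees one endpoint), which is exactly why the table lists "change the fan-out" as a separate scale-down chore.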
Scaling Down/In: The Work Required

Approach: Master-Master
How: Add additional 'Master'(s) to the 'Master' database server
Scale Up/Out:
• Console: No native support. Can deploy larger instances, but must set up master/master yourself
• EC2: Provision 2 new larger masters via backup, use replication to catch up, then change the application to the new DBs' endpoints
Scale Down/In:
• Console: No native support. Can deploy smaller instances, but must set up master/master yourself
• EC2: Provision 2 new smaller masters via backup, use replication to catch up, then change the application to the new DBs' endpoints

(Console: DBaaS with shared storage. EC2: instance, or bare metal)
Scaling Down/In: The Work Required

Approach: Vertical Sharding
How: Separating tables across separate database servers
Scale Up/Out:
• Console: No native support. Can deploy additional instances, but must set up the table distribution yourself
• EC2: Provision additional instances via backup, (manually) redistribute tables across the shards, then change the application to include the new shards
Scale Down/In:
• Console: No native support. Can deprovision instances, but must consolidate the tables yourself
• EC2: Consolidate tables from redundant shards, deprovision the redundant shards, and change the application/table mapping to match the new data distribution

Approach: Horizontal Sharding
How: Partitioning tables across separate database servers
Scale Up/Out:
• Console: No native support. Can deploy additional instances, but must set up the partition distribution yourself
• EC2: Provision additional instances via backup, (manually) redistribute partitions across the shards, then change the application to include the new shards
Scale Down/In:
• Console: No native support. Can deploy smaller instances, but must consolidate the partitions yourself
• EC2: Consolidate partitions from redundant shards, deprovision the redundant shards, and change the application/table mapping to match the new data distribution

(Console: DBaaS with shared storage. EC2: instance, or bare metal)
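A toy example of why sharded scale-down forces an application/table-mapping change: with naive modulo sharding (hypothetical; real deployments typically use range maps or consistent hashing), changing the shard count relocates most keys, and each one must be physically moved before the mapping can flip:

```python
def shard_for(user_id, shard_count):
    """Modulo sharding: which shard owns this key."""
    return user_id % shard_count

# With 4 shards, user 10 lives on shard 2:
print(shard_for(10, 4))  # 2

# Scale in from 4 shards to 3 and count how many keys change homes:
moved = sum(1 for uid in range(10_000) if shard_for(uid, 4) != shard_for(uid, 3))
print(moved)  # 7498 -- three quarters of the keys relocate
```

This is the hidden cost behind "consolidate partitions yourself": the data motion dwarfs the deprovisioning click.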
What are the Costs of NOT Scaling Down/In?
o Idle Server/Overcapacity cost
– CAPEX budget wasted on unused resources
– OPEX budget probably OK: idle servers need less DevOps
o Low Overall Impact to DevOps Infra:
– “Everything’s Working” / “Not broken; Don’t Fix it”
– A low CAPEX budget means low budgets for replacements, so teams instead cannibalize underutilized infra. "No problem."
DevOps Impact #1 from Overprovisioning
1. One-way scaling to handle peaks => idle resources at non-peak, often most of the time
2. Idle resources => blown/shrunk DevOps budgets
– Both CAPEX and OPEX
– The finance team pays attention!
3. Blown/shrunk DevOps budgets => hard to get approval for further capacity
4. No budget => can't scale for growing peaks
5. Higher risk of site slowdowns or outages at the next peak(s)
DevOps Impact #2 from Overprovisioning
Black Friday/Cyber Monday Outage Highlights
o 2011: PC Mall, Newegg, Toys R'Us, Avon: 30+ min outages. Walmart: 3 hr outage
o 2012: Kohl's: repeated multi-hour outages
o 2013: Urban Outfitters, Motorola: offline most of Cyber Monday
o 2014: Best Buy: 2+ hrs of total outages. HP, Nike: site crashes
o 2015: Neiman Marcus: 4+ hr outage
o 2016: Old Navy, Macy’s: multi-hour outages
2016 Black Friday/Cyber Monday
Total Online Sales: $5.27B, 21.6% increase over 2015
Even Larger Business Impact of Outages
o Opportunity cost
– Each missed visitor is potentially a customer or a referral
o Single-Sale cost
– Each missed sale is a tangible missed dollar value
o Customer-Lifetime cost
– Unhappy customers find sites they like better and won't return
o Market/Brand cost
– All customers use social media: a communication 'force multiplier'
– "If you make customers unhappy in the physical world, they might each tell six friends. If you make customers unhappy on the internet, they can each tell 6,000." – Jeff Bezos
– W. Edwards Deming said "5" and "20"…
– Call it "Customer Satisfaction at Web-Scale"
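A back-of-envelope way to turn the $5.27B figure above into an outage cost, under the hypothetical assumption that sales were spread evenly over the two days (real traffic is far spikier, and this is market-wide, not per-site):

```python
# Back-of-envelope: revenue at stake per minute of outage during the
# 2016 Black Friday/Cyber Monday peak ($5.27B total online sales).
# The even-spread assumption is hypothetical; real traffic is spikier,
# so peak-minute losses for a given site would be higher still.

total_sales = 5.27e9           # USD, 2016 BF/CM online sales
peak_minutes = 2 * 24 * 60     # two days

per_minute = total_sales / peak_minutes
print(round(per_minute))  # ~1.83M USD per minute, across the whole market
```

Even a site holding a fraction of a percent of that market loses real money per minute offline, which is what makes the multi-hour outages above so expensive.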
How ClustrixDB Scales-In
ClustrixDB
ACID Compliant
Transactions & Joins
Optimized for OLTP
Built-In Fault Tolerance
Flex-Up and Flex-Down
Minimal DB Admin
• Write + Read Linear Scale-Out
• Click to Elastically Add/Remove Servers
• MySQL-Compatible
Adding + Removing Nodes: Scaling Out + In
o Easy and simple Flex Up (or Flex Down)
– Single minimal ‘database pause’
o All servers handle writes and reads
– Workload is spread across more servers
after Flex Up
o Data is automatically rebalanced across
the cluster
– Tables are online for reads and writes
– MVCC for lockless reads while writing
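The "MVCC for lockless reads while writing" point can be illustrated with a toy multi-version store. This is a sketch of the general technique only, not ClustrixDB's implementation:

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writers append new versions; readers pin
    a snapshot, so reads never block on concurrent writes.
    Illustrative of MVCC in general -- not ClustrixDB's implementation."""

    def __init__(self):
        self._versions = {}        # key -> list of (txn_id, value)
        self._txn = itertools.count(1)

    def write(self, key, value):
        txn = next(self._txn)
        self._versions.setdefault(key, []).append((txn, value))
        return txn

    def snapshot(self):
        return next(self._txn)     # reads see versions committed before this

    def read(self, key, snap):
        # Scan newest-first for the latest version visible to this snapshot.
        for txn, value in reversed(self._versions.get(key, [])):
            if txn < snap:
                return value
        return None

store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()
store.write("balance", 50)          # concurrent write, does not block readers
print(store.read("balance", snap))  # 100 -- the reader still sees its snapshot
```

The same property is what keeps tables online for reads and writes while the rebalancer moves data underneath them.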
(Diagram: a five-node ClustrixDB cluster, servers S1–S5, shown before and after a flex operation)
Review: Questions for Today
o Why and When is scaling down MySQL a good idea?
– Periodic workloads, Flash Sales, new Releases, etc
o What options are there to scale down MySQL?
– Single Node: Shrink single node
– Master/Slave: Remove read slaves, shrink master
– Master/Master: Drop and/or shrink a master
– Sharding: Drop and combine shards
o How do I figure out the costs of not scaling down?
– Cost 1: Undersubscribed resources
– Cost 2: Budget impact on the ability to scale for peaks
Review: Questions for Today
o How does ClustrixDB scale down differently than MySQL?
– It's a shared-nothing, scale-out, clustered RDBMS
– Simply add or drop nodes to scale out or scale in
o How real is elastically scaling in ClustrixDB? What are the catches?
– Add nodes via IP. Add the IP to the load balancer. No app changes.
– Remove nodes via IP. Remove the IP from the load balancer. No app changes.
– Minor 'database pause' for the multi-node 'group change'
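Those add/remove steps can be sketched as a toy load-balancer pool (the class and its API are invented for illustration, not ClustrixDB's or any real balancer's interface). The application only ever connects to the balancer, so flexing out or in is just editing the backend list:

```python
class LoadBalancerPool:
    """Toy load-balancer pool: the app always connects to the balancer;
    flexing up/down only edits the backend node list.
    Hypothetical sketch -- not a real load balancer's API."""

    def __init__(self, node_ips):
        self.node_ips = list(node_ips)
        self._next = 0

    def add_node(self, ip):        # flex up: register one new backend IP
        self.node_ips.append(ip)

    def remove_node(self, ip):     # flex in: drop an IP; app untouched
        self.node_ips.remove(ip)

    def pick_backend(self):        # round-robin across current nodes
        ip = self.node_ips[self._next % len(self.node_ips)]
        self._next += 1
        return ip

pool = LoadBalancerPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
pool.add_node("10.0.0.4")      # scale out: 4 nodes now share writes + reads
pool.remove_node("10.0.0.2")   # scale in: 3 nodes; the app changes nothing
print(pool.pick_backend())     # 10.0.0.1
```

Contrast this with the MySQL table earlier, where every scale-down path ends in "change the application."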
QUESTIONS?
THANK YOU!
Tech Talk Series, Part 3: Why is your CFO right to demand you scale down MySQL?