data tiering
Squeezing scale out of MySQL

Julien, VP Engineering at HouseTrip
github.com/mezis
disclaimer
IANADBA*

I’m not a database administrator

3
the scene

4
Load balancer
(ELB)

3k rpm
30x10
web workers
(Passenger/Rack)

6x20
job workers
(DJ)

85k qpm
Memcache
4x7GB

MySQL
400GB
5x replica

MongoDB
120GB
2x replica

S3
TBs

5
predictable traffic
~25% searches

6
search =
[destination, start date, end date]

↓

[ [property, price], … ]
[diagram: tables property, availability, rate; columns destination_id, property_id, rate_id, start_date, end_date, price]

7
search → big bulky join

booking → long transaction + business logic + many small R/W


8
the crisis

9
peak traffic 7pm - 10pm
write queries !
read queries "
write IO ⛅️
cpu load ⛅️
memory ☀️
10
contention
noun (kənˈtɛnʃən)
1. a struggling between opponents
2. competition for limited resources

11
slow reads ?
poor use of indices during large write transactions

http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-transaction-management.html

http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_covering_index

12
slow writes ?
load+locking on rollback segments

http://dev.mysql.com/doc/refman/5.1/en/innodb-multi-versioning.html

http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_rollback_segment

13
digging deeper
SHOW ENGINE INNODB STATUS
---TRANSACTION 72C, ACTIVE 755 sec

4 lock struct(s), …, 3 row lock(s), undo log entries 12

TABLE LOCK …

RECORD LOCKS …

RECORD LOCKS … locks rec but not gap

RECORD LOCKS … lock_mode X locks gap before rec 

http://www.mysqlperformanceblog.com/2012/03/27/innodbs-gap-locks/

http://dev.mysql.com/doc/refman/5.1/en/innodb-monitors.html

14
15
(not) fixing it

16
horizontal scaling
“throw money at it”
→ does not work
→ ops + maintenance cost

17
fine-tune the DB engine
“bring in the experts”
→ only works short-term

18
vertical scaling
“throw more money at it”
→ bigger database instances

~ +4k$/mo
→ only 2 cartridges in that gun

19
put it in a service
de-normalise the data
use that noSQL thing
webscale

webscale

webscale
20
good solution ?
→ live in 1-2 weeks
→ buys 6-12 months

http://xkcd.com/1205/

21
intermission

22
frame tearing
(not tiering)

23
frame tearing
caused by “simple buffering”
render scene in buffer

draw to screen from buffer


24
frame(buffer) tiering
aka “double buffering”
render scene in buffer

draw to screen from buffer


25
data tiering
separate read and write tables
[diagram]
read → availabilities_front
swap: availabilities_front ⇄ availabilities_back
copy: availabilities → availabilities_back
read/write → availabilities

http://en.wikipedia.org/wiki/File:Comparison_double_triple_buffering.svg

26
how it works

27
tables
availabilities

availabilities_0

availabilities_1
data_tiering_switches

data_tiering_sync_logs 
28
using tiered tables
DataTiering::Switch.new.active_scope_for(Availability)
equivalent to one of
Availability.scoped(:from => 'availabilities_0')
Availability.scoped(:from => 'availabilities_1')
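
A minimal sketch of how the active table could be resolved, assuming the switch is a single pointer holding 0 or 1. `TierSwitch`, `active_table_for`, `inactive_table_for` and `flip!` are hypothetical names for illustration, not the gem's API.

```ruby
# Hypothetical stand-in for a row of data_tiering_switches: it holds the
# suffix (0 or 1) of the tier that readers should currently use.
class TierSwitch
  attr_reader :current

  def initialize(current = 0)
    @current = current
  end

  # Name of the tiered table that reads should hit right now.
  def active_table_for(base_name)
    "#{base_name}_#{current}"
  end

  # Name of the tiered table that the next sync should write into.
  def inactive_table_for(base_name)
    "#{base_name}_#{1 - current}"
  end

  # Point readers at the other tier.
  def flip!
    @current = 1 - @current
  end
end
```

The string returned by `active_table_for("availabilities")` is what would be passed as the `:from` option of the scope.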

29
syncing
DataTiering::Sync.sync_and_switch!
regularly* scheduled task (every 5 min for us, takes ~60s)
*depends on acceptable staleness
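
The sync-then-switch cycle can be modelled in memory. This is only a sketch: plain Hashes stand in for the MySQL tables, the copy is a full copy for brevity, and `TieredStore` with its methods is a hypothetical name, not part of the gem.

```ruby
# In-memory model of the tiers; Hashes stand in for MySQL tables.
class TieredStore
  def initialize
    @live = {}          # the read/write table (availabilities)
    @tiers = [{}, {}]   # availabilities_0 and availabilities_1
    @active = 0         # index of the tier readers use (the "front")
  end

  # Searches read the front tier only, so they never touch @live.
  def read(id)
    @tiers[@active][id]
  end

  # Bookings and updates go to the live table.
  def write(id, row)
    @live[id] = row
  end

  # Copy the live data into the back tier, then flip the pointer.
  def sync_and_switch!
    back = 1 - @active
    @tiers[back] = @live.dup   # copy step (shallow copy of the Hash)
    @active = back             # swap step: a single assignment
  end
end
```

Readers see a write only after the next `sync_and_switch!`, which is the staleness the schedule has to keep acceptable.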
30
syncing: schema
lazily (no migrations):
- create missing /.*_[01]/ tables
- compare schemas with SHOW CREATE TABLE
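
One way the schema comparison could work, as a sketch: normalise the two SHOW CREATE TABLE results before comparing them, since the table name and the AUTO_INCREMENT counter legitimately differ between a table and its copies. The helper names below are hypothetical, and other differences (constraint names, for instance) are not handled.

```ruby
# Strip the parts of a SHOW CREATE TABLE result that legitimately differ
# between a table and its tiered copies.
def normalized_schema(create_sql)
  create_sql
    .sub(/\ACREATE TABLE `[^`]+`/, "CREATE TABLE `t`")  # table name
    .sub(/ AUTO_INCREMENT=\d+/, "")                      # next-id counter
    .strip
end

# True when the two definitions match once normalised.
def same_schema?(source_sql, tier_sql)
  normalized_schema(source_sql) == normalized_schema(tier_sql)
end

# Names of the two tiered copies, matching the /.*_[01]/ pattern.
def tier_names(base_name)
  [0, 1].map { |i| "#{base_name}_#{i}" }
end
```

When `same_schema?` returns false, the tier would be recreated and refilled with a bulk sync.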
31
syncing: bulk
- run TRUNCATE TABLE then INSERT INTO … SELECT FROM
- too slow at runtime (only for setup / after migrations)
32
syncing: deltas
- deletions:
  SELECT id … LEFT JOIN …
  DELETE … WHERE id IN …
- insertions & updates:
  REPLACE INTO … row_touched_at > X
- remember last sync in data_tiering_sync_logs
- row_touched_at: “magic” TIMESTAMP column
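
The delta logic, sketched in memory rather than in SQL. Rows are Hashes keyed by id, integers stand in for timestamps, and `sync_deltas!` is a hypothetical helper, not the gem's code.

```ruby
# Rows are modelled as { id => { price: ..., row_touched_at: ... } }.
# Returns the ids deleted from, and upserted into, the back tier.
def sync_deltas!(source, back_tier, last_sync_at)
  # Deletions: ids present in the tier but gone from the source
  # (the SELECT ... LEFT JOIN then DELETE step).
  deleted = back_tier.keys - source.keys
  deleted.each { |id| back_tier.delete(id) }

  # Insertions and updates: rows touched since the last sync
  # (the REPLACE INTO step).
  touched = source.select { |_id, row| row[:row_touched_at] > last_sync_at }
  touched.each { |id, row| back_tier[id] = row.dup }

  { deleted: deleted, upserted: touched.keys }
end
```

Each tier is refreshed only every other cycle, so in this sketch `last_sync_at` has to be the time that particular tier was last synced, not the time of the most recent sync overall.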

33
swapping
- renaming tables not transactional
- atomically change a pointer instead

(and cache it)
→ data_tiering_switches
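
A sketch of what caching the pointer might look like, assuming a simple time-based expiry. `CachedPointer` is a hypothetical class; the block passed to it stands in for reading data_tiering_switches, and the clock is injectable only so the expiry can be exercised without sleeping.

```ruby
# Caches the active-tier pointer so that most reads skip the lookup.
class CachedPointer
  def initialize(ttl_seconds, clock: -> { Time.now.to_f }, &loader)
    @ttl = ttl_seconds
    @clock = clock
    @loader = loader      # stands in for a query on data_tiering_switches
    @value = nil
    @expires_at = 0.0
  end

  # Returns the cached pointer, reloading it once the TTL has elapsed.
  def current
    now = @clock.call
    if now >= @expires_at
      @value = @loader.call
      @expires_at = now + @ttl
    end
    @value
  end
end
```

With a cache like this a reader may keep using the previous tier for up to the TTL after a swap, so the TTL would need to be shorter than the time before that tier is next overwritten.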
34
gem by
@kratob

@hubb

@mconnell

@danielgrieve

35
outcome

36
outcome
average DB time unchanged
nominal at peak traffic
deadlocks
timeouts
faster reads
37
epilogue

38
so long, data tiering !
you served us well
39
keep calm
♡
refactor
@mezis_fr
http://dev.housetrip.com/

40

Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)