Data Tiering: Squeezing Scale out of MySQL (LRUG Presentation 2014-01-13)


A story of how the HouseTrip team found an easy-to-implement solution that bought 6 months' time on a scaling / contention problem with MySQL.

Transcript

  • 1. data tiering Squeezing scale out of MySQL Julien, VP Engineering at HouseTrip github.com/mezis
  • 2. disclaimer IANADBA*
 * I’m not a database administrator
  • 3. the scene
  • 4. Load balancer (ELB) 3k rpm
 30x10 web workers (Passenger/Rack)
 6x20 job workers (DJ)
 85k qpm
 Memcache 4x7GB
 MySQL 400GB 5x replica
 MongoDB 120GB 2x replica
 S3 TBs
  • 5. predictable traffic ~25% searches
  • 6. search = [destination, start date, end date]
 ↓
 [ [property, price], … ]
 property, availability, rate
 destination_id property_id rate_id start_date end_date price
  • 7. search → big bulky join
 booking → long transaction
 + business logic
 + many small R/W
 property, availability, rate
 destination_id property_id rate_id start_date end_date price
  • 8. the crisis
  • 9. peak traffic 7pm - 10pm
 write queries
 read queries
 write IO ⛅️
 cpu load ⛅️
 memory ☀️
  • 10. contention noun (kənˈtɛnʃən) 1. a struggling between opponents 2. competition for limited resources
  • 11. slow reads ? poor use of indices
 during large write transactions
 http://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-transaction-management.html
 http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_covering_index
  • 12. slow writes ? load+locking on rollback segments
 http://dev.mysql.com/doc/refman/5.1/en/innodb-multi-versioning.html
 http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_rollback_segment
  • 13. digging deeper
 SHOW ENGINE INNODB STATUS
 ---TRANSACTION 72C, ACTIVE 755 sec
 4 lock struct(s), …, 3 row lock(s), undo log entries 12
 TABLE LOCK …
 RECORD LOCKS …
 RECORD LOCKS … locks rec but not gap
 RECORD LOCKS … lock_mode X locks gap before rec
 http://www.mysqlperformanceblog.com/2012/03/27/innodbs-gap-locks/
 http://dev.mysql.com/doc/refman/5.1/en/innodb-monitors.html
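
A transaction that has been ACTIVE for 755 seconds is the kind of line worth hunting for in that output. A minimal sketch of pulling long-running transactions out of the status text, assuming the `---TRANSACTION <id>, ACTIVE <n> sec` line format shown on the slide; the method name, the 60-second threshold and the second sample line are hypothetical.

```ruby
# Matches the transaction header line as it appears on the slide.
TRANSACTION_LINE = /^---TRANSACTION (\h+), ACTIVE (\d+) sec/

# Returns the id and age of every transaction active for at least
# min_seconds (the threshold is an assumption, not from the talk).
def long_transactions(status_text, min_seconds: 60)
  found = status_text.each_line.map do |line|
    match = TRANSACTION_LINE.match(line)
    next unless match && match[2].to_i >= min_seconds

    { id: match[1], active_seconds: match[2].to_i }
  end
  found.compact
end

# First line is from the slide; the second is a made-up short transaction.
sample = <<~STATUS
  ---TRANSACTION 72C, ACTIVE 755 sec
  ---TRANSACTION 72D, ACTIVE 2 sec
STATUS

long_transactions(sample).each do |txn|
  puts "transaction #{txn[:id]} active for #{txn[:active_seconds]}s"
end
```

With the sample above only transaction 72C is reported; the 2-second one falls under the threshold.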
  • 14.
  • 15. (not) fixing it
  • 16. horizontal scaling “throw money at it” → does not work → ops + maintenance cost
  • 17. fine-tune the DB engine “bring in the experts” → only works short-term
  • 18. vertical scaling “throw more money at it” → bigger database instances
 ~ +4k$/mo → only 2 cartridges in that gun
  • 19. put it in a service
 de-normalise the data
 use that noSQL thing
 webscale webscale webscale
  • 20. good solution ? → live in 1-2 weeks → buys 6-12 months http://xkcd.com/1205/
  • 21. intermission
  • 22. frame tearing (not tiering)
  • 23. frame tearing caused by “simple buffering”: render scene in buffer, draw to screen from buffer
  • 24. frame(buffer) tiering aka “double buffering”: render scene in buffer, draw to screen from buffer
  • 25. data tiering
 separate read and write tables
 diagram: availabilities_front (read) ↔ swap ↔ availabilities_back ← copy ← availabilities (read/write)
 http://en.wikipedia.org/wiki/File:Comparison_double_triple_buffering.svg
  • 26. how it works
  • 27. tables
 availabilities
 availabilities_0
 availabilities_1
 data_tiering_switches
 data_tiering_sync_logs
  • 28. using tiered tables
 DataTiering::Switch.new.active_scope_for(Availability)
 equivalent to one of
 Availability.scoped(:from => 'availabilities_0')
 Availability.scoped(:from => 'availabilities_1')
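
The idea behind the scope is just picking one of two table names. A minimal sketch of that name resolution, not the gem's actual implementation: `TieredTableSwitch` and its methods are hypothetical, and only the `_0` / `_1` suffix convention comes from the slides.

```ruby
# Hypothetical stand-in for the switch: it only remembers which of the
# two tiered copies (suffix _0 or _1) readers should currently use.
class TieredTableSwitch
  def initialize(active_index = 0)
    @active_index = active_index
  end

  # Table that readers should query right now (the "front" table).
  def active_table_for(base_name)
    "#{base_name}_#{@active_index}"
  end

  # Table that the next sync is free to rewrite (the "back" table).
  def inactive_table_for(base_name)
    "#{base_name}_#{1 - @active_index}"
  end

  # Swap front and back.
  def flip!
    @active_index = 1 - @active_index
  end
end

switch = TieredTableSwitch.new
puts switch.active_table_for("availabilities")
switch.flip!
puts switch.active_table_for("availabilities")
```

Readers only ever ask for the active name, so they never touch the copy being rewritten.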
  • 29. syncing
 DataTiering::Sync.sync_and_switch!
 regularly* scheduled task (every 5 min for us, takes ~ 60s)
 * depends on acceptable staleness
  • 30. syncing: schema
 lazily (no migrations):
 - create missing /.*_[01]/ tables
 - compare schemas with SHOW CREATE TABLE
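
SHOW CREATE TABLE output embeds the table name and the current AUTO_INCREMENT counter, so two copies of the same schema never compare equal as raw strings. A minimal sketch of one way to compare them, assuming both are normalised away first; the helper names are hypothetical and this is not necessarily how the gem does it.

```ruby
# Strip the parts of SHOW CREATE TABLE output that legitimately differ
# between a source table and its tiered copy.
def normalised_ddl(ddl, table_name)
  ddl
    .sub(/\ACREATE TABLE `#{Regexp.escape(table_name)}`/, "CREATE TABLE `<table>`")
    .sub(/ AUTO_INCREMENT=\d+/, "") # table option only; column attributes have no "="
    .strip
end

# True when the tiered copy still has the same structure as the source.
def schema_in_sync?(source_ddl, source_name, tier_ddl, tier_name)
  normalised_ddl(source_ddl, source_name) == normalised_ddl(tier_ddl, tier_name)
end

# Illustrative DDL; the columns are made up for the example.
source_ddl = <<~SQL
  CREATE TABLE `availabilities` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `price` int(11) DEFAULT NULL,
    PRIMARY KEY (`id`)
  ) ENGINE=InnoDB AUTO_INCREMENT=42 DEFAULT CHARSET=utf8
SQL

tier_ddl = source_ddl
  .sub("`availabilities`", "`availabilities_0`")
  .sub("AUTO_INCREMENT=42", "AUTO_INCREMENT=40")

puts schema_in_sync?(source_ddl, "availabilities", tier_ddl, "availabilities_0")
```

When the comparison fails (for instance after a migration on the source table), the tiered copy would be rebuilt with the bulk path.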
  • 31. syncing: bulk
 - run TRUNCATE TABLE then INSERT INTO … SELECT FROM
 - too slow at runtime (only for setup / after migrations)
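
The bulk path boils down to two statements per tiered table. A minimal sketch that builds them as strings, assuming the tiered copy has the same columns in the same order as the source (which is what makes `SELECT *` safe here); the helper names are hypothetical, and executing the statements is left to whatever database connection is in use.

```ruby
# Quote a table name, refusing anything that is not a plain identifier.
def quoted_identifier(name)
  unless name.match?(/\A\w+\z/)
    raise ArgumentError, "unexpected table name: #{name.inspect}"
  end

  "`#{name}`"
end

# Statements for a full rebuild of a tiered table, in execution order.
# SELECT * is an assumption: it relies on identical column order.
def bulk_copy_statements(source, target)
  [
    "TRUNCATE TABLE #{quoted_identifier(target)}",
    "INSERT INTO #{quoted_identifier(target)} SELECT * FROM #{quoted_identifier(source)}"
  ]
end

puts bulk_copy_statements("availabilities", "availabilities_0")
```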
  • 32. syncing: deltas
 - deletions:
 SELECT id … LEFT JOIN …
 DELETE … WHERE id IN …
 - insertions & updates:
 REPLACE INTO … row_touched_at > X
 - remember last sync in data_tiering_sync_logs
 - row_touched_at “magic” TIMESTAMP column
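
The delta logic can be sketched in memory, with hashes standing in for tables. This is an illustration of the two steps on the slide, not the gem's code: rows are modelled as `id => attributes`, timestamps are plain integers, and `delta_sync!` is a hypothetical name.

```ruby
# Bring `tier` up to date with `source`, touching only what changed
# since `last_sync_at`. Both arguments are Hashes of id => row.
def delta_sync!(source, tier, last_sync_at)
  # Deletions: ids still in the tier but gone from the source
  # (the slide's SELECT id … LEFT JOIN … / DELETE … WHERE id IN …).
  (tier.keys - source.keys).each { |id| tier.delete(id) }

  # Insertions and updates: rows touched since the last sync
  # (the slide's REPLACE INTO … row_touched_at > X).
  source.each do |id, row|
    tier[id] = row.dup if row[:row_touched_at] > last_sync_at
  end

  tier
end

source = {
  1 => { price: 100, row_touched_at: 10 }, # unchanged since last sync
  2 => { price: 250, row_touched_at: 35 }, # updated after last sync
  4 => { price: 80,  row_touched_at: 40 }  # inserted after last sync
}
tier = {
  1 => { price: 100, row_touched_at: 10 },
  2 => { price: 200, row_touched_at: 20 }, # stale copy
  3 => { price: 90,  row_touched_at: 15 }  # deleted from source
}

delta_sync!(source, tier, 30)
puts tier == source
```

Only rows 2, 3 and 4 are touched; row 1 is left alone, which is what keeps the delta pass much cheaper than the bulk copy.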
  • 33. swapping
 - renaming tables not transactional
 - atomically change a pointer instead (and cache it) → data_tiering_switches
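
A rough sketch of the "pointer plus cache" idea, under assumptions: `PointerStore` is an in-memory stand-in for a row in data_tiering_switches, `CachedPointer` and the TTL are hypothetical, and the clock is injected so the example does not depend on real time. The actual table layout and caching strategy of the gem are not shown on the slides.

```ruby
# Stand-in for the persisted pointer; counts reads to show the cache effect.
class PointerStore
  attr_reader :reads

  def initialize
    @active = 0
    @reads = 0
  end

  def read
    @reads += 1
    @active
  end

  # The swap is a single small write, unlike renaming two tables.
  def flip!
    @active = 1 - @active
  end
end

# Remembers the pointer for `ttl` seconds to avoid a lookup per query.
class CachedPointer
  def initialize(store, ttl:, clock:)
    @store = store
    @ttl = ttl
    @clock = clock
    @cached = nil
    @fetched_at = nil
  end

  def active_index
    now = @clock.call
    if @cached.nil? || now - @fetched_at >= @ttl
      @cached = @store.read
      @fetched_at = now
    end
    @cached
  end
end

now = 0
store = PointerStore.new
pointer = CachedPointer.new(store, ttl: 10, clock: -> { now })

pointer.active_index # first call hits the store
store.flip!          # a sync finishes and swaps the tables
pointer.active_index # still served from the cache
now = 11
pointer.active_index # cache expired, the new pointer is picked up
puts store.reads
```

The trade-off in this sketch is that a reader can keep using the old front table for up to one TTL after a swap, so the TTL adds to the staleness budget and should stay well below the sync interval.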
  • 34. gem by @kratob
 @hubb
 @mconnell
 @danielgrieve
  • 35. outcome
  • 36. outcome
 average DB time unchanged nominal at peak traffic deadlocks timeouts
 faster reads
  • 37. epilogue
  • 38. so long, data tiering ! you served us well
  • 39. keep calm ♡ refactor @mezis_fr http://dev.housetrip.com/