The Etsy Shard Architecture: Starts With S and Ends With Hard

32,436 views
31,380 views

Published on

Overview of the etsy shard architecture

Published in: Technology, Business
0 Comments
72 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
32,436
On SlideShare
0
From Embeds
0
Number of Embeds
37
Actions
Shares
0
Downloads
333
Comments
0
Likes
72
Embeds 0
No embeds

No notes for slide

The Etsy Shard Architecture: Starts With S and Ends With Hard

  1. The Etsy Shard Architecture Starts With S and Ends With Hard jgoulah@etsy.com / @johngoulah
  2. 1.5B page views / mo.525MM sales in 201140MM unique visitors/mo.800K shops / 150 countries
  3. 25K+ queries/sec avg3TB InnoDB buffer pool15TB+ data stored99.99% queries under 1ms
  4. 50+ MySQL servers Server Spec HP DL 380 G7 96GB RAM16 spindles / 1TB RAID 10 24 Core
  5. Ross SnyderScaling Etsy - What Went Wrong, What Went Right http://bit.ly/rpcxtP Matt Graham Migrating From PG to MySQL Without Downtime http://bit.ly/rQpqZG
  6. Architecture
  7. Redundancy
  8. Master - Master
  9. Master - Master R/W R/W
  10. Master - Master R/W R/W Side A Side B
  11. Scalability
  12. shard 1 shard 2 shard N ...
  13. shard 1 shard 2 shard N ... shard N + 1
  14. shard 1 shard 2 shard N ...Migrate Migrate Migrate shard N + 1
  15. Bird’s-Eye View
  16. tickets indexshard 1 shard 2 shard N
  17. tickets index Unique IDsshard 1 shard 2 shard N
  18. tickets index Shard Lookupshard 1 shard 2 shard N
  19. tickets indexshard 1 shard 2 shard N Store/Retrieve Data
  20. Basics
  21. users_groupsuser_id group_id 1 A 1 B 2 A 2 C 3 A 3 B 3 C
  22. users_groupsuser_id group_id 1 A 1 B 2 A 2 C 3 A 3 B 3 C
  23. users_groupsuser_id group_id 1 A 1 B 2 A user_id group_id 2 C 3 A 3 A 3 B 3 B 3 C 3 C
  24. users_groups shard 1user_id group_id 1 A 1 B shard 2 2 A user_id group_id 2 C 3 A 3 B 3 C
  25. Index Servers
  26. Shards NOT Determined by key hashing range partitions partitioning by function
  27. Look-Up Data
  28. indexshard 1 shard 2 shard N
  29. index select shard_id from user_index where user_id = Xshard 1 shard 2 shard N
  30. index select shard_id from user_index where user_id = X returns 1shard 1 shard 2 shard N
  31. index select join_date from users where user_id = Xshard 1 shard 2 shard N
  32. index select join_date from users where user_id = X returns 2012-02-05shard 1 shard 2 shard N
  33. Ticket Servers
  34. Globally Unique ID
  35. CREATE TABLE `tickets` ( `id` bigint(20) unsigned NOT NULL auto_increment, `stub` char(1) NOT NULL default , PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`)) ENGINE=MyISAM
  36. Ticket GenerationREPLACE INTO tickets (stub) VALUES (a);SELECT LAST_INSERT_ID();
  37. Ticket GenerationREPLACE INTO tickets (stub) VALUES (a);SELECT LAST_INSERT_ID();SELECT * FROM tickets; id stub 4589294 a
  38. tickets A auto-increment-increment = 2 auto-increment-offset = 1tickets B auto-increment-increment = 2 auto-increment-offset = 2
  39. tickets A auto-increment-increment = 2 auto-increment-offset = 1tickets B auto-increment-increment = 2 auto-increment-offset = 2 NOT master-master
  40. Shards
  41. Object Hashing
  42. A Buser_id : 500
  43. A Buser_id : 500 % (# active replicants)
  44. A Betsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, user_id : 500 % (# active replicants)
  45. A Betsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, user_id : 500 % (# active replicants)
  46. A Buser_id : 500 % (2)
  47. A Buser_id : 500 % (2) == 0
  48. A B select ...user_id : 500 % (2) == 0 insert ... update ...
  49. A Buser_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  50. 500 A B 501select ... select ...insert ... insert ...update ... update ...user_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  51. Failure
  52. A Buser_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  53. A Buser_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  54. A Buser_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  55. A Betsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, user_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  56. A Betsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw,etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw,etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, user_id : 500 % (2) == 0 user_id : 501 % (2) == 1
  57. A Buser_id : 500 % (1) == 0 user_id : 501 % (1) == 0
  58. ORM
  59. connection handling shard lookup replicant selection
  60. CRUDcache handling data validationdata abstraction
  61. Shard Selection
  62. Non-Writable Shards$config["non_writable_shards"] = array(1, 2, 3, 4); public static function getKnownWritableShards(){ return array_values( array_diff( self::getKnownShards(), self::getNonwritableShards() )); }
  63. Initial Selection$shards = EtsyORM::getKnownWritableShards();$user_shard = $shards[rand(0, count($shards) - 1)]; user_id shard_id 500
  64. Initial Selection$shards = EtsyORM::getKnownWritableShards();$user_shard = $shards[rand(0, count($shards) - 1)]; user_id shard_id 500 2
  65. Later.... select shard_id from user_index index where user_id = X shard 1 shard 2 shard N
  66. Variants
  67. shard 1 shard 2 user_id group_id user_id group_id 1 A 3 A 1 B 3 B 2 A 4 A 2 C 5 CSELECT user_id FROM users_groups WHERE group_id = ‘A’
  68. shard 1 shard 2 user_id group_id user_id group_id 1 A 3 A 1 B 3 B 2 A 4 A 2 C 5 CSELECT user_id FROM users_groups WHERE group_id = ‘A’ Broken!
  69. shard 1 shard 2 user_id group_id user_id group_id 1 1 A B JOIN? 3 3 A B 2 A 4 A 2 C 5 CSELECT user_id FROM users_groups WHERE group_id = ‘A’ Broken!
  70. shard 1 shard 2 user_id group_id user_id group_id 1 1 A B JOIN? 3 3 A B 2 A 4 A 2 C 5 CSELECT user_id FROM users_groups WHERE group_id = ‘A’ Broken!
  71. users_groups groups_usersuser_id group_id group_id user_id 1 A A 1 1 B A 3 2 A A 2 2 C B 3 3 A B 1 3 B C 2 3 C C 3
  72. users_groups_index groups_users_index user_id shard_id group_id shard_idindex 1 1 A 1 2 1 B 2 3 2 C 2 4 3 D 3 separate indexes for different slices of data
  73. users_groups_index groups_users_index user_id shard_id group_id shard_idindex 1 1 A 1 2 1 B 2 3 2 C 2 4 3 D 3 user_id group_id shard 3 4 A 4 B 4 C 4 D
  74. Schema Changes
  75. shard 1 shard 2 shard N
  76. shard 1 shard 2 shard N
  77. Schemanator
  78. shard 1 shard 2 shard N
  79. shard 1 shard 2 shard NSET SQL_LOG_BIN = 0; ALTER TABLE user ....
  80. shard migration
  81. Why?
  82. Prevent disk from filling
  83. Prevent disk from fillingHigh traffic objects (shops, users)
  84. Prevent disk from fillingHigh traffic objects (shops, users)Shard rebalancing
  85. When?
  86. Balance
  87. Added Shards
  88. per object migration <object type> <object id> <shard># migrate_object User 5307827 2
  89. percentage migration<object type> <percent> <old shard> <new shard> # migrate_pct User 25 3 6
  90. index user_id shard_id migration_lock old_shard_id 1 1 0 0 shard 1 shard 2 shard N
  91. index user_id shard_id migration_lock old_shard_id 1 1 1 0 •Lock shard 1 shard 2 shard N
  92. index user_id shard_id migration_lock old_shard_id 1 1 1 0 •Lock •Migrate shard 1 shard 2 shard N
  93. index user_id shard_id migration_lock old_shard_id 1 1 1 0 •Lock •Migrate •Checksum shard 1 shard 2 shard N
  94. index user_id shard_id migration_lock old_shard_id 1 1 1 0 •Lock •Migrate •Checksum shard 1 shard 2 shard N
  95. index user_id shard_id migration_lock old_shard_id 1 2 0 1 •Lock •Migrate •Checksum •Unlock shard 1 shard 2 shard N
  96. index user_id shard_id migration_lock old_shard_id 1 2 0 1 •Lock •Migrate •Checksum •Unlock •Delete (from old shard) shard 1 shard 2 shard N
  97. Usage Patterns
  98. Arbitrary Key Hash
  99. tag1 tag2 co_occurrence _count“red” “cloth” 666
  100. tag1 tag2 shard_id “red” “cloth” 1“vintage” “doll” 3“antique” “radio” 5 “gift” “vinyl” 2 hash_bucket shard_id “toy” “car” 1 1 2 “wool” “felt” 2 “floral”“wood” “wreath” “table” 5 8 OR 2 3 3 1 “box” “wood” 4 4 2 “doll” “happy” 5 5 3 “smile” “clown” 3 “radio” “vintage” 10 “blue” “luggage” 8“shoes” “green” 12 ... ... ...
  101. 1. provide some key
  102. 1. provide some key2. compute corresponding hash bucket
  103. 1. provide some key2. compute corresponding hash bucket3. lookup hash bucket on index to find shard
  104. 1,000,000 buckets each with a row in arbitrary_key_index which points to a shard hash_bucket shard_id 1 2 2 3 3 1 4 2 5 3hash_bucket == hash(‘red’, ‘cloth’) % BUCKETS
  105. 1,000,000 buckets each with a row in arbitrary_key_index which points to a shard hash_bucket shard_id 1 2 2 3 3 1 4 2 5 3hash_bucket == hash(‘red’, ‘cloth’) % BUCKETS
  106. 1,000,000 buckets each with a row in arbitrary_key_index which points to a shard hash_bucket shard_id 1 2 2 3 3 1 4 2 5 3hash_bucket == hash(‘red’, ‘cloth’) % BUCKETS
  107. 1,000,000 buckets each with a row in arbitrary_key_index which points to a shard hash_bucket shard_id 1 2 2 3 3 1 4 2 5 3hash_bucket == hash(‘red’, ‘cloth’) % BUCKETS
  108. Partitions
  109. PARTITION BY RANGE (reference_timestamp)( PARTITION P5 VALUES LESS THAN (1317441600), PARTITION P6 VALUES LESS THAN (1320120000), PARTITION P7 VALUES LESS THAN (1322715600), PARTITION P8 VALUES LESS THAN (1325394000));
  110. Deleting a large partition:few hours, tons of disk IO
  111. Deleting a large partition: few hours, tons of disk IODropping a 2G partition with 2M rows :
  112. Deleting a large partition: few hours, tons of disk IODropping a 2G partition with 2M rows : < 1s
  113. # file= "shop_stats_syndication_hourly#P#P1345867200.ibd"# ln $file $file.remove"
  114. # file= "shop_stats_syndication_hourly#P#P1345867200.ibd"# ln $file $file.remove"# stat "shop_stats_syndication_hourly#P#P1345867200.ibd" File: `shop_stats_syndication_hourly#P#P1345867200.ibd Size: 65536 Blocks: 136 IO Block: 4096 regular fileDevice: 6804h/26628d Inode: 41321163 Links: 2Access: (0660/-rw-rw----) Uid: ( 104/ mysql) Gid: ( 106/ mysql)
  115. tickets indexshard 1 shard 2 shard N
  116. Thank youetsy.com/jobs

×