The Etsy Shard Architecture: Starts With S and Ends With Hard


Overview of the Etsy shard architecture


Transcript

  • 1. The Etsy Shard Architecture: Starts With S and Ends With Hard. jgoulah@etsy.com / @johngoulah
  • 2. 1.5B page views / mo., 525MM sales in 2011, 40MM unique visitors / mo., 800K shops / 150 countries
  • 3. 25K+ queries/sec avg, 3TB InnoDB buffer pool, 15TB+ data stored, 99.99% of queries under 1ms
  • 4. 50+ MySQL servers. Server spec: HP DL380 G7, 96GB RAM, 16 spindles / 1TB RAID 10, 24 cores
  • 5. Ross Snyder, Scaling Etsy: What Went Wrong, What Went Right http://bit.ly/rpcxtP; Matt Graham, Migrating From PG to MySQL Without Downtime http://bit.ly/rQpqZG
  • 6. Architecture
  • 7. Redundancy
  • 8. Master - Master
  • 9. Master - Master R/W R/W
  • 10. Master - Master R/W R/W Side A Side B
  • 11. Scalability
  • 12. shard 1 shard 2 shard N ...
  • 13. shard 1 shard 2 shard N ... shard N + 1
  • 14. shard 1 shard 2 shard N ... Migrate Migrate Migrate shard N + 1
  • 15. Bird’s-Eye View
  • 16. tickets, index, shard 1 shard 2 shard N
  • 17. tickets (Unique IDs), index, shard 1 shard 2 shard N
  • 18. tickets, index (Shard Lookup), shard 1 shard 2 shard N
  • 19. tickets, index, shard 1 shard 2 shard N (Store/Retrieve Data)
  • 20. Basics
  • 21. users_groups (user_id, group_id): (1, A) (1, B) (2, A) (2, C) (3, A) (3, B) (3, C)
  • 23. users_groups (user_id, group_id): the full table, with user 3's rows (3, A) (3, B) (3, C) pulled out into their own table
  • 24. users_groups split: shard 1 gets (1, A) (1, B) (2, A) (2, C); shard 2 gets (3, A) (3, B) (3, C)
  • 25. Index Servers
  • 26. Shards are NOT determined by: key hashing, range partitions, partitioning by function
  • 27. Look-Up Data
  • 28. index; shard 1 shard 2 shard N
  • 29. index: select shard_id from user_index where user_id = X (shard 1 shard 2 shard N)
  • 30. index: select shard_id from user_index where user_id = X returns 1 (shard 1 shard 2 shard N)
  • 31. select join_date from users where user_id = X (index; shard 1 shard 2 shard N)
  • 32. select join_date from users where user_id = X returns 2012-02-05 (index; shard 1 shard 2 shard N)
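The slides above walk through a two-step read: ask the index which shard owns the object, then query that shard for the row. A minimal sketch of that flow, with in-memory dicts standing in for the index server and the shards (all names and data here are illustrative):

```python
# Sketch of the two-step lookup; dicts stand in for the index server
# and the shards (illustrative data only).

user_index = {1: 1, 2: 1, 3: 2}  # index server: user_id -> shard_id

shards = {  # each shard holds the actual user rows
    1: {1: {"join_date": "2012-02-05"}, 2: {"join_date": "2011-11-30"}},
    2: {3: {"join_date": "2010-06-01"}},
}

def get_join_date(user_id):
    # Step 1: select shard_id from user_index where user_id = X
    shard_id = user_index[user_id]
    # Step 2: select join_date from users where user_id = X, on that shard
    return shards[shard_id][user_id]["join_date"]

get_join_date(1)  # -> "2012-02-05", served from shard 1
```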
  • 33. Ticket Servers
  • 34. Globally Unique ID
  • 35. CREATE TABLE `tickets` ( `id` bigint(20) unsigned NOT NULL auto_increment, `stub` char(1) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`)) ENGINE=MyISAM
  • 36. Ticket Generation: REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID();
  • 37. Ticket Generation: REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID(); SELECT * FROM tickets; -- id: 4589294, stub: a
  • 38. tickets A: auto-increment-increment = 2, auto-increment-offset = 1; tickets B: auto-increment-increment = 2, auto-increment-offset = 2
  • 39. tickets A: auto-increment-increment = 2, auto-increment-offset = 1; tickets B: auto-increment-increment = 2, auto-increment-offset = 2 (NOT master-master)
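The ticket scheme can be simulated to show why the two servers never collide: each keeps a single auto-increment counter, and the offset/increment settings keep server A on odd IDs and server B on even IDs. A sketch of just the counter behavior, not Etsy's code (class and method names are made up):

```python
# Simulation of the two ticket servers: REPLACE INTO against the
# single-row `tickets` table bumps the auto_increment counter on every
# call; offsets 1 and 2 with increment 2 yield disjoint odd/even ID
# streams, so the servers stay independent (NOT master-master).

class TicketServer:
    def __init__(self, offset, increment=2):
        # auto-increment-offset / auto-increment-increment
        self.next_id = offset
        self.increment = increment

    def replace_into(self):
        # Emulates: REPLACE INTO tickets (stub) VALUES ('a');
        #           SELECT LAST_INSERT_ID();
        issued = self.next_id
        self.next_id += self.increment
        return issued

tickets_a = TicketServer(offset=1)
tickets_b = TicketServer(offset=2)

ids_a = [tickets_a.replace_into() for _ in range(3)]  # [1, 3, 5]
ids_b = [tickets_b.replace_into() for _ in range(3)]  # [2, 4, 6]
```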
  • 40. Shards
  • 41. Object Hashing
  • 42. A B; user_id : 500
  • 43. A B; user_id : 500 % (# active replicants)
  • 44. A B; user_id : 500 % (# active replicants)
    etsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw
    etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw
    etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
  • 46. A B; user_id : 500 % (2)
  • 47. A B; user_id : 500 % (2) == 0
  • 48. A B; user_id : 500 % (2) == 0 → select ..., insert ..., update ...
  • 49. A B; user_id : 500 % (2) == 0, user_id : 501 % (2) == 1
  • 50. 500 → A, 501 → B; select ..., insert ..., update ... on each side; user_id : 500 % (2) == 0, user_id : 501 % (2) == 1
  • 51. Failure
  • 52. A B; user_id : 500 % (2) == 0, user_id : 501 % (2) == 1
  • 55. A B; user_id : 500 % (2) == 0, user_id : 501 % (2) == 1
    etsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw
    etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw
    etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
    etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw
  • 57. A B; user_id : 500 % (1) == 0, user_id : 501 % (1) == 0
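The side-selection and failure slides reduce to one expression: the object ID modulo the number of active replicants. A sketch with illustrative names (not Etsy's ORM):

```python
# Sketch of replicant selection within a master-master pair: the
# object ID modulo the number of active sides picks which side serves
# the object. When a side fails, the modulus shrinks and every object
# maps to the survivor.

def pick_side(object_id, active_sides):
    return active_sides[object_id % len(active_sides)]

pick_side(500, ["A", "B"])  # 500 % 2 == 0 -> "A"
pick_side(501, ["A", "B"])  # 501 % 2 == 1 -> "B"

# Failure: side B is gone, so the modulus becomes 1.
pick_side(500, ["A"])       # 500 % 1 == 0 -> "A"
pick_side(501, ["A"])       # 501 % 1 == 0 -> "A"
```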
  • 58. ORM
  • 59. connection handling, shard lookup, replicant selection
  • 60. CRUD, cache handling, data validation, data abstraction
  • 61. Shard Selection
  • 62. Non-Writable Shards: $config["non_writable_shards"] = array(1, 2, 3, 4); public static function getKnownWritableShards() { return array_values(array_diff(self::getKnownShards(), self::getNonwritableShards())); }
  • 63. Initial Selection: $shards = EtsyORM::getKnownWritableShards(); $user_shard = $shards[rand(0, count($shards) - 1)]; index row: user_id 500, shard_id (not yet set)
  • 64. Initial Selection: $shards = EtsyORM::getKnownWritableShards(); $user_shard = $shards[rand(0, count($shards) - 1)]; index row: user_id 500, shard_id 2
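Putting slides 62-64 together: a new object gets a random known-writable shard, and the choice is persisted in the index so every later lookup is deterministic. A hedged Python sketch of that flow (the PHP on the slides is the real shape; shard numbers and helper names here are illustrative):

```python
# Sketch of initial shard selection: filter out non-writable shards,
# pick one at random, persist the choice in the index.
import random

known_shards = [1, 2, 3, 4, 5, 6]
non_writable_shards = [1, 2, 3, 4]  # $config["non_writable_shards"]

def known_writable_shards():
    # array_diff(getKnownShards(), getNonwritableShards())
    return [s for s in known_shards if s not in non_writable_shards]

def assign_shard(user_id, user_index):
    # $user_shard = $shards[rand(0, count($shards) - 1)];
    shard = random.choice(known_writable_shards())
    user_index[user_id] = shard  # fill in the index row: user_id -> shard_id
    return shard

user_index = {}
shard = assign_shard(500, user_index)
# Later requests look up user_index[500] instead of rolling again.
```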
  • 65. Later... index: select shard_id from user_index where user_id = X (shard 1 shard 2 shard N)
  • 66. Variants
  • 67. shard 1 (user_id, group_id): (1, A) (1, B) (2, A) (2, C); shard 2: (3, A) (3, B) (4, A) (5, C). SELECT user_id FROM users_groups WHERE group_id = 'A'
  • 68. SELECT user_id FROM users_groups WHERE group_id = 'A' ... Broken!
  • 69. JOIN? ... Broken!
  • 71. users_groups (user_id, group_id): (1, A) (1, B) (2, A) (2, C) (3, A) (3, B) (3, C); groups_users (group_id, user_id): (A, 1) (A, 3) (A, 2) (B, 3) (B, 1) (C, 2) (C, 3)
  • 72. users_groups_index (user_id → shard_id): 1 → 1, 2 → 1, 3 → 2, 4 → 3; groups_users_index (group_id → shard_id): A → 1, B → 2, C → 2, D → 3. Separate indexes for different slices of data
  • 73. Same indexes, plus shard 3's data: user_id 4 with group_ids A, B, C, D
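The variant above stores the relationship twice, sharded by each key, so queries by either key stay single-shard. A sketch of the double-write pattern (names and the shard-picking logic are illustrative only):

```python
# Sketch of the double-write variant: the membership is stored once
# sharded by user_id (users_groups) and once by group_id
# (groups_users), each slice with its own index, so "groups for a
# user" and "users in a group" both hit exactly one shard.

users_groups_index = {}   # user_id  -> shard_id
groups_users_index = {}   # group_id -> shard_id
shards = {1: [], 2: []}

def add_membership(user_id, group_id):
    # Each copy is placed by its own index; the two copies may land
    # on different shards.
    u_shard = users_groups_index.setdefault(user_id, 1 + user_id % 2)
    g_shard = groups_users_index.setdefault(group_id, 1 + len(group_id) % 2)
    shards[u_shard].append(("users_groups", user_id, group_id))
    shards[g_shard].append(("groups_users", group_id, user_id))

add_membership(1, "A")
add_membership(3, "A")

# "SELECT user_id FROM groups_users WHERE group_id = 'A'" now touches
# only shard groups_users_index["A"] -- no cross-shard fan-out, no JOIN.
```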
  • 74. Schema Changes
  • 75. shard 1 shard 2 shard N
  • 76. shard 1 shard 2 shard N
  • 77. Schemanator
  • 78. shard 1 shard 2 shard N
  • 79. shard 1 shard 2 shard N: SET SQL_LOG_BIN = 0; ALTER TABLE user ....
  • 80. shard migration
  • 81. Why?
  • 82. Prevent disk from filling
  • 83. Prevent disk from filling; High traffic objects (shops, users)
  • 84. Prevent disk from filling; High traffic objects (shops, users); Shard rebalancing
  • 85. When?
  • 86. Balance
  • 87. Added Shards
  • 88. per object migration: <object type> <object id> <shard>. # migrate_object User 5307827 2
  • 89. percentage migration: <object type> <percent> <old shard> <new shard>. # migrate_pct User 25 3 6
  • 90. index row: user_id 1, shard_id 1, migration_lock 0, old_shard_id 0 (shard 1 shard 2 shard N)
  • 91. index row: user_id 1, shard_id 1, migration_lock 1, old_shard_id 0 • Lock
  • 92. index row: user_id 1, shard_id 1, migration_lock 1, old_shard_id 0 • Lock • Migrate
  • 93. index row: user_id 1, shard_id 1, migration_lock 1, old_shard_id 0 • Lock • Migrate • Checksum
  • 95. index row: user_id 1, shard_id 2, migration_lock 0, old_shard_id 1 • Lock • Migrate • Checksum • Unlock
  • 96. index row: user_id 1, shard_id 2, migration_lock 0, old_shard_id 1 • Lock • Migrate • Checksum • Unlock • Delete (from old shard)
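The lock → migrate → checksum → unlock → delete sequence above can be sketched end to end. In-memory stand-ins only; a real migration would compare checksums of the copied rows rather than the dicts directly:

```python
# Sketch of per-object migration, mirroring the slide's index columns
# (shard_id, migration_lock, old_shard_id); dicts stand in for the
# index server and the shards.
index = {1: {"shard_id": 1, "migration_lock": 0, "old_shard_id": 0}}
shards = {1: {1: {"join_date": "2012-02-05"}}, 2: {}}

def migrate_object(user_id, new_shard):
    row = index[user_id]
    old_shard = row["shard_id"]
    row["migration_lock"] = 1                    # Lock: writes held off
    data = shards[old_shard][user_id]
    shards[new_shard][user_id] = dict(data)      # Migrate: copy the rows
    assert shards[new_shard][user_id] == data    # Checksum: verify the copy
    row.update(shard_id=new_shard,               # Unlock: repoint the index
               old_shard_id=old_shard,
               migration_lock=0)
    del shards[old_shard][user_id]               # Delete (from old shard)

migrate_object(1, new_shard=2)
```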
  • 97. Usage Patterns
  • 98. Arbitrary Key Hash
  • 99. tag1 "red", tag2 "cloth", co_occurrence_count 666
  • 100. Two options for the index. Either (tag1, tag2) → shard_id directly: ("red", "cloth") → 1, ("vintage", "doll") → 3, ("antique", "radio") → 5, ("gift", "vinyl") → 2, ("toy", "car") → 1, ("wool", "felt") → 2, ... OR hash_bucket → shard_id: 1 → 2, 2 → 3, 3 → 1, 4 → 2, 5 → 3, ...
  • 101. 1. provide some key
  • 102. 1. provide some key2. compute corresponding hash bucket
  • 103. 1. provide some key2. compute corresponding hash bucket3. lookup hash bucket on index to find shard
  • 104. 1,000,000 buckets, each with a row in arbitrary_key_index which points to a shard. hash_bucket → shard_id: 1 → 2, 2 → 3, 3 → 1, 4 → 2, 5 → 3. hash_bucket == hash('red', 'cloth') % BUCKETS
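The bucket scheme in sketch form. The slides don't say which hash function is used, so MD5 below is an assumption; the point is a stable hash taken modulo a fixed bucket count, with the bucket-to-shard mapping living in the index:

```python
# Sketch of the arbitrary-key-hash lookup: hash the key into one of a
# fixed number of buckets, then resolve the bucket to a shard via the
# index. hashlib.md5 stands in for whatever stable hash is actually
# used; Python's built-in hash() won't do, since it is randomized
# per process.
import hashlib

BUCKETS = 1_000_000  # one row per bucket in arbitrary_key_index

def hash_bucket(*key_parts):
    digest = hashlib.md5("|".join(key_parts).encode()).hexdigest()
    return int(digest, 16) % BUCKETS

def index_lookup(bucket):
    # Stand-in for: select shard_id from arbitrary_key_index
    #               where hash_bucket = <bucket>
    return 1 + bucket % 3  # pretend there are three shards

def shard_for(*key_parts):
    return index_lookup(hash_bucket(*key_parts))

shard_for("red", "cloth")  # the same key always lands on the same shard
```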
  • 108. Partitions
  • 109. PARTITION BY RANGE (reference_timestamp)( PARTITION P5 VALUES LESS THAN (1317441600), PARTITION P6 VALUES LESS THAN (1320120000), PARTITION P7 VALUES LESS THAN (1322715600), PARTITION P8 VALUES LESS THAN (1325394000));
  • 110. Deleting a large partition: few hours, tons of disk IO
  • 111. Deleting a large partition: few hours, tons of disk IODropping a 2G partition with 2M rows :
  • 112. Deleting a large partition: few hours, tons of disk IODropping a 2G partition with 2M rows : < 1s
  • 113. # file="shop_stats_syndication_hourly#P#P1345867200.ibd" # ln $file $file.remove
  • 114. # file="shop_stats_syndication_hourly#P#P1345867200.ibd" # ln $file $file.remove # stat "shop_stats_syndication_hourly#P#P1345867200.ibd" File: `shop_stats_syndication_hourly#P#P1345867200.ibd' Size: 65536 Blocks: 136 IO Block: 4096 regular file Device: 6804h/26628d Inode: 41321163 Links: 2 Access: (0660/-rw-rw----) Uid: (104/mysql) Gid: (106/mysql)
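The hard link is what makes the drop fast: a filesystem only frees an inode's blocks when its last link is removed, so with a second link in place MySQL's DROP PARTITION becomes a quick unlink, and the expensive block freeing is deferred until the `.remove` link is deleted at leisure. A small Python sketch of that filesystem behavior (not of MySQL itself):

```python
# Sketch of why the hard link makes DROP PARTITION cheap: removing one
# of two links is a metadata update; the data blocks survive until the
# last link goes away.
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    ibd = os.path.join(d, "part.ibd")      # stand-in for the .ibd file
    with open(ibd, "w") as f:
        f.write("partition data")

    os.link(ibd, ibd + ".remove")          # ln $file $file.remove
    links = os.stat(ibd).st_nlink          # Links: 2, as in the stat output

    os.unlink(ibd)                         # what DROP PARTITION amounts to
    with open(ibd + ".remove") as f:       # data still reachable via the link
        recovered = f.read()
```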
  • 115. tickets, index, shard 1 shard 2 shard N
  • 116. Thank you! etsy.com/jobs