The Etsy Shard Architecture: Starts With S and Ends With Hard

Overview of the Etsy shard architecture.

Published in Technology, Business

Transcript

  • 1. The Etsy Shard Architecture: Starts With S and Ends With Hard. jgoulah@etsy.com / @johngoulah
  • 2. 1.5B page views / mo.; 525MM sales in 2011; 40MM unique visitors / mo.; 800K shops / 150 countries
  • 3. 25K+ queries/sec avg; 3TB InnoDB buffer pool; 15TB+ data stored; 99.99% queries under 1ms
  • 4. 50+ MySQL servers. Server spec: HP DL 380 G7; 96GB RAM; 16 spindles / 1TB RAID 10; 24 core
  • 5. Ross Snyder, Scaling Etsy - What Went Wrong, What Went Right: http://bit.ly/rpcxtP; Matt Graham, Migrating From PG to MySQL Without Downtime: http://bit.ly/rQpqZG
  • 6. Architecture
  • 7. Redundancy
  • 8. Master - Master
  • 9. Master - Master R/W R/W
  • 10. Master - Master R/W R/W Side A Side B
  • 11. Scalability
  • 12. shard 1 shard 2 shard N ...
  • 13. shard 1 shard 2 shard N ... shard N + 1
  • 14. shard 1 shard 2 shard N ...; Migrate, Migrate, Migrate; shard N + 1
  • 15. Bird’s-Eye View
  • 16. tickets; index; shard 1 shard 2 shard N
  • 17. tickets (Unique IDs); index; shard 1 shard 2 shard N
  • 18. tickets; index (Shard Lookup); shard 1 shard 2 shard N
  • 19. tickets; index; shard 1 shard 2 shard N (Store/Retrieve Data)
  • 20. Basics
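  • 21. users_groups (user_id, group_id): (1, A), (1, B), (2, A), (2, C), (3, A), (3, B), (3, C)
  • 22. users_groups (user_id, group_id): (1, A), (1, B), (2, A), (2, C), (3, A), (3, B), (3, C)
  • 23. users_groups (user_id, group_id): (1, A), (1, B), (2, A), (2, C), (3, A), (3, B), (3, C); second table (user_id, group_id): (3, A), (3, B), (3, C)
  • 24. users_groups; shard 1 (user_id, group_id): (1, A), (1, B), (2, A), (2, C); shard 2 (user_id, group_id): (3, A), (3, B), (3, C)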
  • 25. Index Servers
  • 26. Shards NOT determined by: key hashing; range partitions; partitioning by function
  • 27. Look-Up Data
  • 28. index; shard 1 shard 2 shard N
  • 29. index: select shard_id from user_index where user_id = X; shard 1 shard 2 shard N
  • 30. index: select shard_id from user_index where user_id = X; returns 1; shard 1 shard 2 shard N
  • 31. index; select join_date from users where user_id = X; shard 1 shard 2 shard N
  • 32. index; select join_date from users where user_id = X; returns 2012-02-05; shard 1 shard 2 shard N
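The two queries on the look-up slides form a two-step read: ask the index which shard owns the user, then run the real query on that shard. A minimal Python sketch of that flow follows. In-memory SQLite databases stand in for the MySQL index and shard servers, and the function name `get_join_date` and the sample user 500 are assumptions made for illustration, not part of the talk.

```python
import sqlite3

# Hypothetical stand-ins: one in-memory SQLite database plays the index server,
# and one per shard plays the MySQL shards.
index_db = sqlite3.connect(":memory:")
index_db.execute("CREATE TABLE user_index (user_id INTEGER PRIMARY KEY, shard_id INTEGER)")
index_db.execute("INSERT INTO user_index VALUES (500, 1)")

shards = {1: sqlite3.connect(":memory:"), 2: sqlite3.connect(":memory:")}
for db in shards.values():
    db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, join_date TEXT)")
shards[1].execute("INSERT INTO users VALUES (500, '2012-02-05')")


def get_join_date(user_id):
    """Step 1: ask the index for the shard. Step 2: query that shard."""
    row = index_db.execute(
        "SELECT shard_id FROM user_index WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None  # the user is not in the index
    shard_db = shards[row[0]]
    result = shard_db.execute(
        "SELECT join_date FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    return result[0] if result else None


print(get_join_date(500))  # 2012-02-05
```

Because the index row is the only place that records where a user lives, moving a user means changing one row rather than rehashing keys, which fits the "NOT determined by key hashing" slide.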
  • 33. Ticket Servers
  • 34. Globally Unique ID
  • 35. CREATE TABLE `tickets` ( `id` bigint(20) unsigned NOT NULL auto_increment, `stub` char(1) NOT NULL default '', PRIMARY KEY (`id`), UNIQUE KEY `stub` (`stub`) ) ENGINE=MyISAM
  • 36. Ticket Generation: REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID();
  • 37. Ticket Generation: REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID(); SELECT * FROM tickets; result (id, stub): (4589294, a)
  • 38. tickets A: auto-increment-increment = 2, auto-increment-offset = 1; tickets B: auto-increment-increment = 2, auto-increment-offset = 2
  • 39. tickets A: auto-increment-increment = 2, auto-increment-offset = 1; tickets B: auto-increment-increment = 2, auto-increment-offset = 2; NOT master-master
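The increment and offset settings on the two ticket servers mean each server hands out its own interleaved sequence, so the servers never need to talk to each other. The Python sketch below only simulates the resulting ID sequences. The `TicketServer` class is a hypothetical stand-in, not Etsy's code, and it does not talk to MySQL.

```python
import itertools


class TicketServer:
    """Simulates one ticket server's auto-increment sequence (illustrative only)."""

    def __init__(self, increment, offset):
        self._ids = itertools.count(start=offset, step=increment)

    def next_id(self):
        # Stands in for:
        #   REPLACE INTO tickets (stub) VALUES ('a'); SELECT LAST_INSERT_ID();
        return next(self._ids)


tickets_a = TicketServer(increment=2, offset=1)  # odd IDs
tickets_b = TicketServer(increment=2, offset=2)  # even IDs

ids_a = [tickets_a.next_id() for _ in range(3)]
ids_b = [tickets_b.next_id() for _ in range(3)]
print(ids_a, ids_b)  # [1, 3, 5] [2, 4, 6]
```

Either server can be asked for the next ID. The values stay globally unique because the two sequences are disjoint by construction, which is why the pair does not have to replicate master-master.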
  • 40. Shards
  • 41. Object Hashing
  • 42. A B; user_id : 500
  • 43. A B; user_id : 500 % (# active replicants)
  • 44. A B; etsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw; user_id : 500 % (# active replicants)
  • 45. A B; etsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw; user_id : 500 % (# active replicants)
  • 46. A B; user_id : 500 % (2)
  • 47. A B; user_id : 500 % (2) == 0
  • 48. A B; user_id : 500 % (2) == 0; select ..., insert ..., update ...
  • 49. A B; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 50. A (500): select ..., insert ..., update ...; B (501): select ..., insert ..., update ...; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 51. Failure
  • 52. A B; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 53. A B; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 54. A B; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 55. A B; etsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 56. A B; etsy_index_A => mysql:host=dbindex01.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_index_B => mysql:host=dbindex02.ny4.etsy.com;port=3306;dbname=etsy_index;user=etsy_rw, etsy_shard_001_A => mysql:host=dbshard01.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_001_B => mysql:host=dbshard02.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_A => mysql:host=dbshard03.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_002_B => mysql:host=dbshard04.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_A => mysql:host=dbshard05.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw, etsy_shard_003_B => mysql:host=dbshard06.ny4.etsy.com;port=3306;dbname=etsy_shard;user=etsy_rw; user_id : 500 % (2) == 0; user_id : 501 % (2) == 1
  • 57. A B; user_id : 500 % (1) == 0; user_id : 501 % (1) == 0
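The object-hashing and failure slides reduce to one rule: take the user_id modulo the number of active replicants to choose a side. A minimal Python sketch follows. The function name `pick_replicant`, and the choice of side B as the one that fails, are assumptions made for illustration.

```python
def pick_replicant(user_id, active_replicants):
    """Choose a side by taking user_id modulo the number of active replicants."""
    if not active_replicants:
        raise RuntimeError("no active replicants for this shard")
    return active_replicants[user_id % len(active_replicants)]


both_up = ["A", "B"]
print(pick_replicant(500, both_up))  # 500 % 2 == 0, so side A
print(pick_replicant(501, both_up))  # 501 % 2 == 1, so side B

# Assume side B has been pulled from the config: every user_id % 1 == 0,
# so all traffic lands on the remaining side.
only_a = ["A"]
print(pick_replicant(500, only_a), pick_replicant(501, only_a))  # A A
```

Pinning a given user to one side while both are healthy keeps that user's reads and writes on the same server, so the user does not see the lag between the two masters.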
  • 58. ORM
  • 59. connection handling; shard lookup; replicant selection
  • 60. CRUD; cache handling; data validation; data abstraction
  • 61. Shard Selection
  • 62. Non-Writable Shards: $config["non_writable_shards"] = array(1, 2, 3, 4); public static function getKnownWritableShards() { return array_values(array_diff(self::getKnownShards(), self::getNonwritableShards())); }
  • 63. Initial Selection: $shards = EtsyORM::getKnownWritableShards(); $user_shard = $shards[rand(0, count($shards) - 1)]; index row (user_id, shard_id): user_id = 500, shard_id not yet set
  • 64. Initial Selection: $shards = EtsyORM::getKnownWritableShards(); $user_shard = $shards[rand(0, count($shards) - 1)]; index row (user_id, shard_id): (500, 2)
  • 65. Later.... index: select shard_id from user_index where user_id = X; shard 1 shard 2 shard N
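The shard-selection slides describe a pick-once policy: a new user gets a random shard from the writable set, the choice is stored in the index, and every later request reads the index. The Python sketch below mirrors the PHP on the slides. The `known_shards` list and the in-memory `user_index` dict are assumptions standing in for values the slides do not show.

```python
import random

known_shards = [1, 2, 3, 4, 5, 6]     # hypothetical: the slides do not list the known shards
non_writable_shards = [1, 2, 3, 4]    # value shown on the config slide
user_index = {}                       # stand-in for the user_index table: user_id -> shard_id


def get_known_writable_shards():
    """Known shards minus the ones configured as non-writable."""
    return [s for s in known_shards if s not in non_writable_shards]


def shard_for_user(user_id):
    """Pick a random writable shard the first time; read the index every time after."""
    if user_id not in user_index:
        user_index[user_id] = random.choice(get_known_writable_shards())
    return user_index[user_id]


first = shard_for_user(500)
later = shard_for_user(500)
print(first, later, first == later)
```

Marking a shard non-writable only affects where new objects are placed. Objects already assigned to it keep resolving through the index.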
  • 66. Variants
  • 67. shard 1 (user_id, group_id): (1, A), (1, B), (2, A), (2, C); shard 2 (user_id, group_id): (3, A), (3, B), (4, A), (5, C); SELECT user_id FROM users_groups WHERE group_id = 'A'
  • 68. shard 1 (user_id, group_id): (1, A), (1, B), (2, A), (2, C); shard 2 (user_id, group_id): (3, A), (3, B), (4, A), (5, C); SELECT user_id FROM users_groups WHERE group_id = 'A'; Broken!
  • 69. shard 1 (user_id, group_id): (1, A), (1, B), (2, A), (2, C); JOIN?; shard 2 (user_id, group_id): (3, A), (3, B), (4, A), (5, C); SELECT user_id FROM users_groups WHERE group_id = 'A'; Broken!
  • 70. shard 1 (user_id, group_id): (1, A), (1, B), (2, A), (2, C); JOIN?; shard 2 (user_id, group_id): (3, A), (3, B), (4, A), (5, C); SELECT user_id FROM users_groups WHERE group_id = 'A'; Broken!
  • 71. users_groups (user_id, group_id): (1, A), (1, B), (2, A), (2, C), (3, A), (3, B), (3, C); groups_users (group_id, user_id): (A, 1), (A, 3), (A, 2), (B, 3), (B, 1), (C, 2), (C, 3)
  • 72. index; users_groups_index (user_id, shard_id): (1, 1), (2, 1), (3, 2), (4, 3); groups_users_index (group_id, shard_id): (A, 1), (B, 2), (C, 2), (D, 3); separate indexes for different slices of data
  • 73. index; users_groups_index (user_id, shard_id): (1, 1), (2, 1), (3, 2), (4, 3); groups_users_index (group_id, shard_id): (A, 1), (B, 2), (C, 2), (D, 3); shard 3 (user_id, group_id): (4, A), (4, B), (4, C), (4, D)
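One reading of the variants slides is that the same relation is stored twice, once sharded by user and once by group, so each access pattern is answered by a single shard. The Python sketch below illustrates that reading. The dual-write helper `add_membership` and the in-memory `shards` structure are assumptions, since the slides show only the tables and their indexes.

```python
from collections import defaultdict

# Index tables from the slide: each maps a key to the shard that holds its rows.
users_groups_index = {1: 1, 2: 1, 3: 2, 4: 3}
groups_users_index = {"A": 1, "B": 2, "C": 2, "D": 3}

# shard_id -> table name -> list of rows (a stand-in for the real shards)
shards = defaultdict(lambda: defaultdict(list))


def add_membership(user_id, group_id):
    """Write the relation twice, once per access pattern."""
    shards[users_groups_index[user_id]]["users_groups"].append((user_id, group_id))
    shards[groups_users_index[group_id]]["groups_users"].append((group_id, user_id))


def groups_of(user_id):
    rows = shards[users_groups_index[user_id]]["users_groups"]
    return [g for (u, g) in rows if u == user_id]


def users_in(group_id):
    rows = shards[groups_users_index[group_id]]["groups_users"]
    return [u for (g, u) in rows if g == group_id]


for user_id, group_id in [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (4, "A"), (4, "D")]:
    add_membership(user_id, group_id)

print(users_in("A"))  # [1, 2, 3, 4], answered by one shard
print(groups_of(4))   # ['A', 'D']
```

The cost is a second write per change and the need to keep the two copies consistent. The benefit is that the "Broken!" query, which would otherwise have to visit every shard, becomes a single-shard read.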
  • 74. Schema Changes
  • 75. shard 1 shard 2 shard N
  • 76. shard 1 shard 2 shard N
  • 77. Schemanator
  • 78. shard 1 shard 2 shard N
  • 79. shard 1 shard 2 shard N; SET SQL_LOG_BIN = 0; ALTER TABLE user ....
  • 80. shard migration
  • 81. Why?
  • 82. Prevent disk from filling
  • 83. Prevent disk from filling; High traffic objects (shops, users)
  • 84. Prevent disk from filling; High traffic objects (shops, users); Shard rebalancing
  • 85. When?
  • 86. Balance
  • 87. Added Shards
  • 88. per object migration: <object type> <object id> <shard>; # migrate_object User 5307827 2
  • 89. percentage migration: <object type> <percent> <old shard> <new shard>; # migrate_pct User 25 3 6
  • 90. index (user_id, shard_id, migration_lock, old_shard_id): (1, 1, 0, 0); shard 1 shard 2 shard N
  • 91. index (user_id, shard_id, migration_lock, old_shard_id): (1, 1, 1, 0); Lock; shard 1 shard 2 shard N
  • 92. index (user_id, shard_id, migration_lock, old_shard_id): (1, 1, 1, 0); Lock, Migrate; shard 1 shard 2 shard N
  • 93. index (user_id, shard_id, migration_lock, old_shard_id): (1, 1, 1, 0); Lock, Migrate, Checksum; shard 1 shard 2 shard N
  • 94. index (user_id, shard_id, migration_lock, old_shard_id): (1, 1, 1, 0); Lock, Migrate, Checksum; shard 1 shard 2 shard N
  • 95. index (user_id, shard_id, migration_lock, old_shard_id): (1, 2, 0, 1); Lock, Migrate, Checksum, Unlock; shard 1 shard 2 shard N
  • 96. index (user_id, shard_id, migration_lock, old_shard_id): (1, 2, 0, 1); Lock, Migrate, Checksum, Unlock, Delete (from old shard); shard 1 shard 2 shard N
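The migration slides walk one index row through five steps: lock, migrate, checksum, unlock, delete from the old shard. The Python sketch below follows those steps on in-memory data. The `migrate_user` function, the SHA-256 checksum, and the dict-based shards are assumptions for illustration. The slides do not say how the checksum is computed or what happens on a mismatch.

```python
import hashlib

# Index rows keyed by user_id, mirroring the slide's columns.
index = {1: {"shard_id": 1, "migration_lock": 0, "old_shard_id": 0}}
# shard_id -> user_id -> list of rows; a stand-in for the real shards.
shards = {1: {1: ["row-a", "row-b"]}, 2: {}}


def checksum(rows):
    return hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()


def migrate_user(user_id, new_shard):
    entry = index[user_id]
    old_shard = entry["shard_id"]
    entry["migration_lock"] = 1                                  # Lock
    try:
        shards[new_shard][user_id] = list(shards[old_shard][user_id])  # Migrate
        copied = checksum(shards[new_shard][user_id])
        if copied != checksum(shards[old_shard][user_id]):       # Checksum
            del shards[new_shard][user_id]
            raise RuntimeError("checksum mismatch, migration aborted")
        entry["shard_id"] = new_shard                            # repoint the index
        entry["old_shard_id"] = old_shard
    finally:
        entry["migration_lock"] = 0                              # Unlock
    del shards[old_shard][user_id]                               # Delete (from old shard)


migrate_user(1, 2)
print(index[1])
```

After the call the index row reads shard_id 2, migration_lock 0, old_shard_id 1, the same state the final slides show. The old copy is deleted only after the index has been repointed and unlocked.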
  • 97. Usage Patterns
  • 98. Arbitrary Key Hash
  • 99. (tag1, tag2, co_occurrence_count): (“red”, “cloth”, 666)
  • 100. (tag1, tag2, shard_id): (“red”, “cloth”, 1), (“vintage”, “doll”, 3), (“antique”, “radio”, 5), (“gift”, “vinyl”, 2), (“toy”, “car”, 1), (“wool”, “felt”, 2), (“floral”, “wreath”, 5), (“wood”, “table”, 8), (“box”, “wood”, 4), (“doll”, “happy”, 5), (“smile”, “clown”, 3), (“radio”, “vintage”, 10), (“blue”, “luggage”, 8), (“shoes”, “green”, 12), ...; OR (hash_bucket, shard_id): (1, 2), (2, 3), (3, 1), (4, 2), (5, 3)
  • 101. 1. provide some key
  • 102. 1. provide some key; 2. compute corresponding hash bucket
  • 103. 1. provide some key; 2. compute corresponding hash bucket; 3. lookup hash bucket on index to find shard
  • 104. 1,000,000 buckets, each with a row in arbitrary_key_index which points to a shard; (hash_bucket, shard_id): (1, 2), (2, 3), (3, 1), (4, 2), (5, 3); hash_bucket == hash('red', 'cloth') % BUCKETS
  • 105. 1,000,000 buckets, each with a row in arbitrary_key_index which points to a shard; (hash_bucket, shard_id): (1, 2), (2, 3), (3, 1), (4, 2), (5, 3); hash_bucket == hash('red', 'cloth') % BUCKETS
  • 106. 1,000,000 buckets, each with a row in arbitrary_key_index which points to a shard; (hash_bucket, shard_id): (1, 2), (2, 3), (3, 1), (4, 2), (5, 3); hash_bucket == hash('red', 'cloth') % BUCKETS
  • 107. 1,000,000 buckets, each with a row in arbitrary_key_index which points to a shard; (hash_bucket, shard_id): (1, 2), (2, 3), (3, 1), (4, 2), (5, 3); hash_bucket == hash('red', 'cloth') % BUCKETS
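The three steps on the arbitrary-key slides (provide a key, compute its bucket, look the bucket up in the index) can be sketched in Python as below. CRC32 as the hash, the three-shard count, and the round-robin fill of the bucket table are all assumptions. The slides give only the formula hash_bucket == hash('red', 'cloth') % BUCKETS and a handful of sample index rows.

```python
import zlib

BUCKETS = 1_000_000
NUM_SHARDS = 3  # hypothetical shard count

# One entry per bucket, standing in for arbitrary_key_index.
# The round-robin fill is an assumption made only to have data to look up.
bucket_to_shard = bytearray((b % NUM_SHARDS) + 1 for b in range(BUCKETS))


def hash_bucket(*key_parts):
    """Step 2: map an arbitrary key to a bucket (CRC32 is an assumed hash)."""
    joined = "\x00".join(key_parts).encode("utf-8")
    return zlib.crc32(joined) % BUCKETS


def shard_for(*key_parts):
    """Step 3: look the bucket up in the index to find the shard."""
    return bucket_to_shard[hash_bucket(*key_parts)]


bucket = hash_bucket("red", "cloth")
print(bucket, shard_for("red", "cloth"))

# Moving a bucket to another shard is a single index update;
# the key's bucket number never changes.
bucket_to_shard[bucket] = 2
print(shard_for("red", "cloth"))  # 2
```

Indexing a fixed number of buckets rather than every key pair keeps the index at a bounded size while still letting buckets move between shards one at a time.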
  • 108. Partitions
  • 109. PARTITION BY RANGE (reference_timestamp) ( PARTITION P5 VALUES LESS THAN (1317441600), PARTITION P6 VALUES LESS THAN (1320120000), PARTITION P7 VALUES LESS THAN (1322715600), PARTITION P8 VALUES LESS THAN (1325394000) );
  • 110. Deleting a large partition: few hours, tons of disk IO
  • 111. Deleting a large partition: few hours, tons of disk IO; Dropping a 2G partition with 2M rows:
  • 112. Deleting a large partition: few hours, tons of disk IO; Dropping a 2G partition with 2M rows: < 1s
  • 113. # file="shop_stats_syndication_hourly#P#P1345867200.ibd" then # ln "$file" "$file.remove"
  • 114. # file="shop_stats_syndication_hourly#P#P1345867200.ibd" then # ln "$file" "$file.remove" then # stat "shop_stats_syndication_hourly#P#P1345867200.ibd"; File: `shop_stats_syndication_hourly#P#P1345867200.ibd'; Size: 65536; Blocks: 136; IO Block: 4096; regular file; Device: 6804h/26628d; Inode: 41321163; Links: 2; Access: (0660/-rw-rw----); Uid: ( 104/ mysql); Gid: ( 106/ mysql)
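The `ln` and `stat` slides show a second hard link being added to a partition's .ibd file, which brings its link count to 2. A plausible reading, not spelled out in the transcript, is that the extra link keeps the data blocks allocated when MySQL unlinks the file, so the drop returns quickly and the leftover `.remove` file can be deleted later at a gentler pace. The Python sketch below reproduces only the filesystem behaviour in a temporary directory. `os.remove` stands in for the drop, and no MySQL is involved.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as datadir:
    ibd = os.path.join(datadir, "shop_stats_syndication_hourly#P#P1345867200.ibd")
    with open(ibd, "wb") as f:
        f.write(b"\0" * 65536)  # placeholder contents, same size as in the stat output

    os.link(ibd, ibd + ".remove")         # same effect as: ln "$file" "$file.remove"
    links_before = os.stat(ibd).st_nlink  # 2, matching "Links: 2" on the slide

    os.remove(ibd)                        # assumed stand-in for the drop unlinking the file
    links_after = os.stat(ibd + ".remove").st_nlink
    size_after = os.path.getsize(ibd + ".remove")

    print(links_before, links_after, size_after)  # 2 1 65536
```

Removing the first name only decrements the link count. The blocks are freed when the last name goes away, which is the expensive step the extra link lets an operator defer.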
  • 115. tickets; index; shard 1 shard 2 shard N
  • 116. Thank you; etsy.com/jobs