Sharding: patterns and antipatterns
Konstantin Osipov (Mail.Ru, Tarantool)
Alexey Rybak (Badoo)

A talk by Konstantin Osipov (Mail.Ru, Tarantool) and Alexey Rybak (Badoo)

Published in: Internet

  1. Sharding: patterns and antipatterns
     Konstantin Osipov (Mail.Ru, Tarantool), Alexey Rybak (Badoo)
  2. Big picture: scalable databases
     ● replication
     ● sharding and re-sharding
     ● distributed queries & jobs, Map/Reduce
     ● DDL
     ● will focus on sharding/re-sharding only
  3. Contents
     I. sharding function
     II. routing
     III. re-sharding
  4. I. Sharding function
  5. Selecting a good shard key
     ● the identified object should be small
     ● some data you won’t be able to shard (and have to duplicate in each shard)
     ● don’t store the key if you don’t have to
  6. Good and bad shard keys
     ● good: user session, shopping order
     ● maybe: user (if user data isn’t too thick)
     ● bad: inventory item, order date
  7. Garage sharding: numbers
     ● replication based doubling (2, 4, 8, out of cash)
     ● the magic number 48 (divisible by 2, 3 and 4)
  8. Garage sharding thru hashing
     ● good: remainders
       o f(key) ≡ key % n_srv
       o f(key) ≡ crc32(key) % n_srv
     ● bad: first login letter
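
The two remainder functions on slide 8 can be sketched as follows. This is a minimal Python sketch, not code from the talk: the function names, the sample logins and the `moved_fraction` helper are assumptions added for illustration. It also shows why the first-letter scheme is listed as bad, and why remainder sharding stays a "garage" technique: changing `n_srv` remaps most keys.

```python
import zlib


def shard_by_remainder(key: int, n_srv: int) -> int:
    """f(key) = key % n_srv, for integer keys that are already well spread."""
    return key % n_srv


def shard_by_crc32(key: str, n_srv: int) -> int:
    """f(key) = crc32(key) % n_srv, for string keys such as logins."""
    return zlib.crc32(key.encode("utf-8")) % n_srv


def shard_by_first_letter(login: str, n_srv: int) -> int:
    """The anti-pattern: first letters of logins are far from uniform."""
    return ord(login[0].lower()) % n_srv


def moved_fraction(keys, old_n: int, new_n: int) -> float:
    """Share of keys whose shard changes when n_srv goes from old_n to new_n."""
    moved = sum(
        1 for k in keys if shard_by_crc32(k, old_n) != shard_by_crc32(k, new_n)
    )
    return moved / len(keys)


# Hypothetical sample data: every login starts with the same letter.
logins = [f"user{i}" for i in range(10_000)]

print("shards used by first letter:", {shard_by_first_letter(l, 4) for l in logins})
print("shards used by crc32:", {shard_by_crc32(l, 4) for l in logins})
print("fraction moved, 4 -> 5 servers:", moved_fraction(logins, 4, 5))
```

For a uniformly distributed hash `h`, going from 4 to 5 servers keeps a key in place only when `h % 4 == h % 5`, which is about one key in five.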
  9. Sharding for grown-ups
     ● table function
     ● consistent hashing
  10. Table functions
      ● virtual buckets: key -> bucket -> shard
        o “key -> bucket” function, “bucket -> shard” table
        o “key -> bucket” table, “bucket -> shard” table
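
The first variant on slide 10 (a "key -> bucket" function plus a "bucket -> shard" table) might look like the sketch below. The bucket count of 48, the shard names and the helper names are assumptions made for illustration; the slides do not prescribe them.

```python
import zlib

# Assumed value: the number of virtual buckets is fixed for the cluster's lifetime.
N_BUCKETS = 48


def key_to_bucket(key: str) -> int:
    """The "key -> bucket" function: never changes, so keys never change bucket."""
    return zlib.crc32(key.encode("utf-8")) % N_BUCKETS


# The "bucket -> shard" table: the only thing edited when the cluster changes.
# Shard names are hypothetical.
bucket_to_shard = {b: f"shard-{b % 3}" for b in range(N_BUCKETS)}


def locate(key: str) -> str:
    """Resolve key -> bucket -> shard."""
    return bucket_to_shard[key_to_bucket(key)]


def move_bucket(bucket: int, new_shard: str) -> None:
    """Re-point one bucket; only the keys in that bucket change shard."""
    bucket_to_shard[bucket] = new_shard
```

The second variant replaces `key_to_bucket` with a lookup table as well, which allows per-key placement at the cost of storing one entry per key.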
  11. Consistent hashing
      ● Danny Lewin RIP
      ● Kinda ring and like... uhm... points, you know ...
      ● Libraries: Ketama
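
The ring and points mentioned on slide 11 can be made concrete with a small sketch. The class below is a generic hash ring in Python, not Ketama's actual algorithm: the class name, the MD5-based point function and the 100 points per server are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Each server owns many points; a key belongs to the next point clockwise."""

    def __init__(self, servers, points_per_server: int = 100):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._ring = sorted(
            (_point(f"{server}#{i}"), server)
            for server in servers
            for i in range(points_per_server)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key: str) -> str:
        """Find the first point after the key's position, wrapping around."""
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._points)
        return self._ring[idx][1]
```

Adding a server only adds points to the ring, so a key either stays where it was or moves to the new server; keys are never shuffled between the existing servers.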
  12. Guava/Sumbur
      ● f(key, n_servers) => server_id
      ● strictly uniform key-to-server mapping
      ● recurrence formula (15 lines of code)
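
Slide 12 gives only the signature `f(key, n_servers) => server_id` and mentions a short recurrence. One published algorithm of that shape is jump consistent hash (Lamping and Veach), which Guava's `Hashing.consistentHash` is generally understood to follow. The slides do not say whether Sumbur uses the same recurrence, so the sketch below illustrates the idea and should not be read as either library's code. The function name is an assumption.

```python
MASK64 = (1 << 64) - 1


def jump_hash(key: int, n_servers: int) -> int:
    """Jump consistent hash: map a 64-bit integer key to a server id in [0, n_servers)."""
    if n_servers <= 0:
        raise ValueError("n_servers must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < n_servers:
        b = j
        # 64-bit linear congruential step drives the pseudo-random "jumps".
        key = (key * 2862933555777941757 + 1) & MASK64
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

No ring or table is stored: the sequence of jumps depends only on the key, so growing the cluster from `n` to `n + 1` servers either leaves a key in place or moves it to the new server `n`. The trade-off is that servers are identified by consecutive numbers, so only the last server can be removed cheaply.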
  13. II. Routing
  14. Routing types
      ● smart client
      ● coordinator
      ● proxy
      ● local proxy on every app server
      ● intra-database routing
  15. Smart Client
      ● no extra hops
      ● all clients (PHP/Python/C...) should implement it
      ● resharding is hard
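
A smart client, as on slide 15, carries the sharding function and the shard list itself. The sketch below is a hypothetical Python illustration: `ShardMap`, `SmartClient`, the versioning scheme and the addresses are assumptions, not anything named in the talk. It hints at why resharding is hard in this model: every client must switch to the new map, and a client with a stale map keeps routing to the old layout.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardMap:
    version: int
    shards: tuple  # shard addresses; the index is the shard number


class SmartClient:
    """Routes requests itself, so there is no extra network hop."""

    def __init__(self, shard_map: ShardMap):
        self._map = shard_map

    def route(self, key: str) -> str:
        """Pick the shard address for a key using the locally held map."""
        idx = zlib.crc32(key.encode("utf-8")) % len(self._map.shards)
        return self._map.shards[idx]

    def maybe_update(self, new_map: ShardMap) -> bool:
        """Adopt a newer map; every client has to do this during resharding."""
        if new_map.version > self._map.version:
            self._map = new_map
            return True
        return False
```

The same routing logic would have to be reimplemented in each client language, which is the cost the slide lists next to "no extra hops".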
  16. Proxy
      ● encapsulates routing logic
      ● extra hop, traffic
      ● +1 service
      ● SPOF
      => local proxy
  17. Coordinator
      ● centralized knowledge
      ● SPOF
  18. Intra-database routing
      ● too many nodes
      ● redundancy is high
      ● ad-hoc requests
  19. III. Re-sharding
  20. Re-sharding is a pain
      ● redistribution impacts:
        o clients
        o network performance
        o consistency
      => maintenance time window
      ● forget about it on petabyte scale
  21. Best practice: no data redistribution
      ● update is a move
      ● data expiration (new data on new servers)
      ● new data on selected servers
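
One possible reading of slide 21, sketched in Python under stated assumptions: placement is table-based, new writes go only to the servers currently selected for new data, and an update rewrites the record on such a server, so data migrates gradually without a bulk redistribution. The `Directory` class and its in-memory dictionaries are hypothetical stand-ins for real shards.

```python
class Directory:
    """Table-based placement: a key's shard is chosen when it is written."""

    def __init__(self, shards):
        self.storage = {s: {} for s in shards}  # shard -> {key: value}
        self.accepting = list(shards)           # shards that take new writes
        self.location = {}                      # key -> shard
        self._writes = 0

    def add_shard(self, shard):
        """Bring an empty shard online; from now on writes go only there."""
        self.storage[shard] = {}
        self.accepting = [shard]

    def write(self, key, value):
        """Insert or update; an update of an old record doubles as a move."""
        target = self.accepting[self._writes % len(self.accepting)]
        self._writes += 1
        self.storage[target][key] = value
        old = self.location.get(key)
        self.location[key] = target
        if old is not None and old != target:
            del self.storage[old][key]

    def read(self, key):
        return self.storage[self.location[key]][key]
```

Records that are never updated stay on the old servers until they expire, which matches the "data expiration" bullet: old servers drain on their own.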
  22. DDL
      ● upgrade your app
      ● upgrade your database
      ● update your app and remove any trace of old schema
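
The three steps on slide 22 can be illustrated with a small sketch. It assumes a hypothetical `users` table that gains a `nickname` column, and uses in-memory SQLite databases as stand-ins for shards; none of these names come from the talk. The point is that shards are upgraded at different times, so the app must first understand both schemas.

```python
import sqlite3


def make_shard():
    """Create a demo shard with the old schema (hypothetical table and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    return conn


def has_column(conn, table, column):
    """Check the live schema; row[1] of PRAGMA table_info is the column name."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def load_user_compat(conn, user_id):
    """Step 1: app code that works both before and after the schema change."""
    if has_column(conn, "users", "nickname"):
        name, nick = conn.execute(
            "SELECT name, nickname FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return {"name": name, "nickname": nick or name}
    (name,) = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"name": name, "nickname": name}


def upgrade_schema(conn):
    """Step 2: change the schema, one shard at a time."""
    conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")


def load_user_new(conn, user_id):
    """Step 3: once every shard is upgraded, the old-schema branch is removed."""
    name, nick = conn.execute(
        "SELECT name, nickname FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"name": name, "nickname": nick or name}
```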
  23. Thank you! Questions?
      kostja@tarantool.org
      fisher@corp.badoo.com
