Key-Value-Stores -- The Key to Scaling?

  1. Key-Value-Stores: The Key to Scaling? Tim Lossen
  2. Who? • @tlossen • backend developer - Ruby, Rails, Sinatra ... • passionate about technology
  3. Problem
  4. Challenge • backend for Facebook game
  5. Challenge • backend for Facebook game • expected load: - 1 million daily active users - 20 million total users - 100 KB data per user
  6. Challenge • expected peak traffic: - 10,000 concurrent users - 200,000 requests / minute
  7. Challenge • expected peak traffic: - 10,000 concurrent users - 200,000 requests / minute • write-heavy workload
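A quick back-of-envelope check of those figures, as a sketch in Ruby; nothing here goes beyond the numbers stated on the slides:

```ruby
# Rough sizing from the stated load figures.
total_users      = 20_000_000
bytes_per_user   = 100 * 1024          # 100 KB per user
requests_per_min = 200_000

total_data_gb  = total_users * bytes_per_user / 1024.0**3
requests_per_s = requests_per_min / 60.0

puts "total user data: ~#{total_data_gb.round} GB"     # ~1907 GB, i.e. roughly 2 TB
puts "peak traffic:    ~#{requests_per_s.round} req/s" # ~3333 requests/second
```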
  8. Wanted • scalable database • with high throughput - especially for writes
  9. Options • relational database - with sharding
  10. Options • relational database - with sharding • nosql database - key-value-store - document db - graph db
  14. Shortlist • Cassandra • Redis • Membase
  15. Cassandra
  16. Facts • written in Java - 55,000 lines of code • Thrift API - clients for Java, Ruby, Python ...
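For illustration, a minimal sketch of what the Thrift API looks like from Ruby, using the community `cassandra` gem of that era; the keyspace, column family and column names are placeholders, and the exact method signatures may vary by gem version:

```ruby
require 'rubygems'
require 'cassandra'   # Ruby client wrapping Cassandra's Thrift API

# Hypothetical keyspace and column family, not taken from the slides.
client = Cassandra.new('GameKeyspace', '127.0.0.1:9160')

client.insert(:Users, '42', { 'name' => 'alice', 'coins' => '100' })
row = client.get(:Users, '42')
puts row.inspect
```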
  17. History • originally developed by Facebook - in production for “Inbox Search” • later open-sourced - top-level Apache project
  18. Features • high availability - no single point of failure • incremental scalability • eventual consistency
  19. Architecture • Dynamo-like hash ring - partitioning + replication - all nodes are equal
  20. Hash Ring
  21. Architecture • Dynamo-like hash ring - partitioning + replication - all nodes are equal • Bigtable data model - column families - supercolumns
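The hash-ring idea can be sketched in a few lines of Ruby; this is a toy illustration of consistent hashing, not Cassandra's actual partitioner:

```ruby
require 'digest/md5'

# Toy consistent-hash ring: each node owns the arc of hash space ending
# at its own position, so adding or removing a node only moves the keys
# on the neighbouring arc rather than rehashing everything.
class HashRing
  def initialize(nodes)
    @ring = nodes.map { |n| [Digest::MD5.hexdigest(n).to_i(16), n] }.sort
  end

  def node_for(key)
    h = Digest::MD5.hexdigest(key).to_i(16)
    (@ring.find { |pos, _| pos >= h } || @ring.first).last
  end
end

ring = HashRing.new(%w[node-a node-b node-c])
puts ring.node_for('user:42')
```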
  22. “Cassandra aims to run on an infrastructure of hundreds of nodes.”
  23. Redis
  24. Facts • written in C - 13,000 lines of code • socket API - redis-cli - client libs for all major languages
  25. Features • high read & write throughput - 50,000 to 100,000 ops / second
  26. Features • high read & write throughput - 50,000 to 100,000 ops / second • interesting data structures - lists, hashes, (sorted) sets - atomic operations
  27. Features • high read & write throughput - 50,000 to 100,000 ops / second • interesting data structures - lists, hashes, (sorted) sets - atomic operations • strong consistency
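What those data structures look like through the Ruby client (redis-rb); the key names are invented for illustration:

```ruby
require 'redis'

redis = Redis.new   # defaults to localhost:6379

# lists, sets and sorted sets
redis.lpush('events', 'login')
redis.sadd('online_users', '42')
redis.zadd('highscores', 1500, 'user:42')

# atomic operations, no read-modify-write race on the client side
redis.incr('page_views')
redis.hincrby('user:42', 'coins', 10)
```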
  28. Architecture • in-memory database - append-only log on disk - virtual memory
  29. Architecture • in-memory database - append-only log on disk - virtual memory • single instance - master-slave replication - clustering is on roadmap
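A sketch of the matching redis.conf settings from that era; the virtual-memory options existed in Redis 2.0/2.2 and were removed in later releases, and the values shown are placeholders:

```
# append-only log on disk
appendonly yes
appendfsync everysec

# virtual memory (Redis 2.0/2.2 era feature, later removed)
vm-enabled yes
vm-max-memory 2gb

# on the hot-standby machine: replicate from the master
slaveof 10.0.0.1 6379
```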
  30. “Memory is the new disk, disk is the new tape.” — Jim Gray
  31. Membase
  32. Facts • written in C and Erlang • API-compatible with Memcached - same protocol • client libs for all major languages
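Because Membase speaks the memcached protocol, any memcached client can talk to it; a sketch using the Ruby dalli gem, with the host and key names as placeholders:

```ruby
require 'dalli'
require 'json'

# A plain memcached client pointed at a Membase node.
cache = Dalli::Client.new('membase-node:11211')

cache.set('user:42', { 'coins' => 100 }.to_json)
puts cache.get('user:42')
```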
  33. History • developed by NorthScale & Zynga - used in production (Farmville) • released in June 2010 - Apache 2.0 License
  34. Features • “Memcached with persistence” - extremely fast - throughput scales linearly
  35. Features • “Memcached with persistence” - extremely fast - throughput scales linearly • automatic data placement - memory, SSD, disk
  36. Features • “Memcached with persistence” - extremely fast - throughput scales linearly • automatic data placement - memory, SSD, disk • configurable replica count
  37. Architecture • cluster - all nodes are alike - one elected as “coordinator”
  38. Architecture • cluster - all nodes are alike - one elected as “coordinator” • each node is master for part of key space - handles all reads & writes
  39. Mapping Scheme
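The mapping scheme can be sketched roughly as follows; this is a simplification of Membase's vBucket idea, and the hash function and bucket count are assumptions:

```ruby
require 'zlib'

NUM_VBUCKETS = 1024
SERVERS      = %w[node-a node-b node-c]   # toy cluster map

# Each key hashes to a fixed vBucket; the cluster map says which node
# currently owns that vBucket, so rebalancing moves whole vBuckets
# between nodes instead of rehashing every key.
def vbucket_for(key)
  Zlib.crc32(key) % NUM_VBUCKETS
end

vb = vbucket_for('user:42')
puts "key 'user:42' -> vBucket #{vb} -> #{SERVERS[vb % SERVERS.size]}"
```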
  40. “simple, fast, elastic”
  41. Solution
  42. Which one would you pick?
  43. Decision • Cassandra?
  44. Decision • Cassandra? - too big, too complicated
  45. Decision • Cassandra? - too big, too complicated • Membase?
  46. Decision • Cassandra? - too big, too complicated • Membase? - not yet available (then)
  47. Decision • Cassandra? - too big, too complicated • Membase? - not yet available (then) • Redis!
  48. Motivation • keep operations simple • use as few machines as possible - ideally, only one
  49. Design • two machines (+ load balancer) - Redis master handles all reads / writes - Redis slave as hot standby
  50. Design • two machines (+ load balancer) - Redis master handles all reads / writes - Redis slave as hot standby - both machines used as app servers
  51. Design • two machines (+ load balancer) - Redis master handles all reads / writes - Redis slave as hot standby - both machines used as app servers • dedicated hardware
  52. Data model • one Redis hash per user - key: Facebook id • store data as serialized JSON - booleans, strings, numbers, timestamps ...
  53. Advantages • turns Redis into a “document db” - efficient to swap user data in / out - atomic ops on parts • easy to dump / restore user data
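A sketch of that data model in Ruby; the slides only specify one Redis hash per user, keyed by Facebook id, with values serialized as JSON, so the field names below are invented:

```ruby
require 'redis'
require 'json'

redis = Redis.new
fb_id = '100001234567890'   # placeholder Facebook id, used as the hash key

# one Redis hash per user, parts stored as serialized JSON
redis.hset(fb_id, 'profile', { 'name' => 'Alice', 'level' => 3 }.to_json)
redis.hset(fb_id, 'last_seen', Time.now.to_i)

# atomic operation on a single part of the user record
redis.hincrby(fb_id, 'coins', 25)

profile = JSON.parse(redis.hget(fb_id, 'profile'))
puts profile['name']
```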
  54. Capacity • 4 GB memory for 20 million integer keys - keys always stay in memory!
  55. Capacity • 4 GB memory for 20 million integer keys - keys always stay in memory! • 2 GB memory for 10,000 user hashes - others can be swapped out
  56. Capacity • 4 GB memory for 20 million integer keys - keys always stay in memory! • 2 GB memory for 10,000 user hashes - others can be swapped out • 3.6 million ops / minute - sufficient for 200,000 requests
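Checking those capacity figures; the per-request op budget below is derived from the slides' numbers, not stated on them:

```ruby
# memory overhead per key: 4 GB spread over 20 million integer keys
puts 4.0 * 1024**3 / 20_000_000   # ~215 bytes per key

# resident user hashes: 2 GB for 10,000 users kept in memory
puts 2.0 * 1024**2 / 10_000       # ~210 KB per resident user hash

# throughput budget
puts 3_600_000 / 60               # 60,000 Redis ops per second
puts 3_600_000 / 200_000          # 18 Redis ops per request
```

60,000 ops/second sits inside the 50,000 to 100,000 ops/second single-instance throughput quoted earlier, and ~210 KB per resident hash leaves headroom above the 100 KB raw payload for in-memory overhead.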
  57. Status • game was launched in August - currently still in beta
  58. Status • game was launched in August - currently still in beta • expect to reach 1 million daily active users in Q1/2011
  59. Status • game was launched in August - currently still in beta • expect to reach 1 million daily active users in Q1/2011 • will try to stick to 2 or 3 machines - possibly bigger / faster ones
  60. Conclusions • use the right tool for the job
  61. Conclusions • use the right tool for the job • keep it simple - avoid sharding, if possible
  62. Conclusions • use the right tool for the job • keep it simple - avoid sharding, if possible • don’t scale out too early - but have a viable “plan b”
  63. Conclusions • use the right tool for the job • keep it simple - avoid sharding, if possible • don’t scale out too early - but have a viable “plan b” • use dedicated hardware
  64. Q&A
  65. Links • cassandra.apache.org • redis.io • membase.org • tim.lossen.de
