Key-Value-Stores -- The Key to Scaling? -- Presentation Transcript

    • Key-Value-Stores: The Key to Scaling? Tim Lossen
    • Who? • @tlossen • backend developer - Ruby, Rails, Sinatra ... • passionate about technology
    • Problem
    • Challenge • backend for a Facebook game • expected load: - 1 million daily active users - 20 million total users - 100 KB data per user
    • Challenge • expected peak traffic: - 10,000 concurrent users - 200,000 requests / minute • write-heavy workload
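
    For a sense of scale, the stated numbers work out roughly as follows (a back-of-envelope sketch using only the figures from the slides):

        requests_per_minute = 200_000
        requests_per_minute / 60.0                 # ~3,333 requests / second at peak
        requests_per_minute / 10_000               # => 20 requests per concurrent user per minute

        total_users    = 20_000_000
        bytes_per_user = 100 * 1024                # 100 KB
        total_users * bytes_per_user / 1024.0**3   # ~1,907 GB, i.e. about 2 TB of user data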
    • Wanted • scalable database • with high throughput - especially for writes
    • Options • relational database - with sharding • nosql database - key-value-store - document db - graph db
    • Shortlist • Cassandra • Redis • Membase
    • Cassandra
    • Facts • written in Java - 55,000 lines of code • Thrift API - clients for Java, Ruby, Python ...
    • History • originally developed by Facebook - in production for “Inbox Search” • later open-sourced - top-level Apache project
    • Features • high availability - no single point of failure • incremental scalability • eventual consistency
    • Architecture • Dynamo-like hash ring - partitioning + replication - all nodes are equal • Bigtable data model - column families - supercolumns
    • Hash Ring (diagram)
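
    The ring idea is simple enough to sketch in a few lines of Ruby. This is an illustration of consistent hashing in general, not Cassandra's actual code: each node sits at a hashed position on a ring, a key belongs to the first node clockwise from the key's own hash, so adding or removing a node only remaps one arc of keys.

        require 'digest/md5'

        class HashRing
          def initialize(nodes)
            # place each node at a point on a 128-bit ring
            @ring = nodes.map { |n| [Digest::MD5.hexdigest(n).to_i(16), n] }.sort
          end

          # a key belongs to the first node at or after its hash position,
          # wrapping around to the start of the ring
          def node_for(key)
            pos = Digest::MD5.hexdigest(key).to_i(16)
            (@ring.find { |point, _| point >= pos } || @ring.first).last
          end
        end

        ring = HashRing.new(%w[node-a node-b node-c])
        ring.node_for("user:42")   # => "node-c" (say)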
    • “Cassandra aims to run on an infrastructure of hundreds of nodes.”
    • Redis
    • Facts • written in C - 13,000 lines of code • socket API - redis-cli - client libs for all major languages
    • Features • high read & write throughput - 50,000 to 100,000 ops / second • interesting data structures - lists, hashes, (sorted) sets - atomic operations • strong consistency
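
    Those data structures are a big part of Redis' appeal. A quick taste with the redis-rb client (key names here are made up for illustration):

        require 'redis'

        redis = Redis.new   # connects to localhost:6379 by default

        # atomic counter -- concurrent INCRs never lose updates
        redis.incr("stats:logins")

        # sorted set as a leaderboard: score updates and range reads are atomic
        redis.zincrby("leaderboard", 50, "player:17")
        redis.zrevrange("leaderboard", 0, 9)    # top ten players

        # list as a simple work queue
        redis.lpush("jobs", "harvest:player:17")
        redis.rpop("jobs")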
    • Architecture • in-memory database - append-only log on disk - virtual memory • single instance - master-slave replication - clustering is on roadmap
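
    In configuration terms, that architecture boils down to a handful of directives. A minimal redis.conf sketch from the Redis 2.x era (the values are assumptions for illustration, not the talk's actual production config):

        # master: append-only log for durability
        appendonly yes
        appendfsync everysec          # fsync the log once per second

        # virtual memory: swap cold values to disk, keys stay in RAM
        vm-enabled yes
        vm-max-memory 2147483648      # start swapping values above ~2 GB

        # slave (on the standby machine): replicate from the master
        slaveof 10.0.0.1 6379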
    • “Memory is the new disk, disk is the new tape.” — Jim Gray
    • Membase
    • Facts • written in C and Erlang • API-compatible with Memcached - same protocol • client libs for all major languages
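
    Protocol compatibility means any existing memcached client talks to Membase unchanged. A sketch with Ruby's Dalli memcached client (hostnames are made up):

        require 'dalli'

        # point an ordinary memcached client at the Membase nodes
        cache = Dalli::Client.new(["membase-1:11211", "membase-2:11211"])

        cache.set("user:42", '{"coins":120}')   # persisted by Membase, not just cached
        cache.get("user:42")                    # => '{"coins":120}'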
    • History • developed by NorthScale & Zynga - used in production (FarmVille) • released in June 2010 - Apache 2.0 License
    • Features • “Memcached with persistence” - extremely fast - throughput scales linearly • automatic data placement - memory, ssd, disk • configurable replica count
    • Architecture • cluster - all nodes are alike - one elected as “coordinator” • each node is master for part of key space - handles all reads & writes
    • Mapping Scheme (diagram)
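
    The mapping scheme can be pictured as a two-step lookup: hash the key into one of a fixed number of “vBuckets”, then look the vBucket up in a cluster-wide table that names its current master node. A simplified Ruby sketch (Membase's real implementation differs in detail):

        require 'zlib'

        NUM_VBUCKETS = 1024
        # vBucket -> master node; the cluster rewrites this table when nodes
        # join or leave, so clients always know who owns each key
        VBUCKET_MAP = Array.new(NUM_VBUCKETS) { |i| "node-#{i % 3}" }

        def node_for(key)
          VBUCKET_MAP[Zlib.crc32(key) % NUM_VBUCKETS]
        end

        node_for("user:42")   # => "node-1" (say)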
    • “simple, fast, elastic”
    • Solution
    • Which one would you pick?
    • Decision • Cassandra? - too big, too complicated • Membase? - not yet available (then) • Redis!
    • Motivation • keep operations simple • use as few machines as possible - ideally, only one
    • Design • two machines (+ load balancer) - Redis master handles all reads / writes - Redis slave as hot standby - both machines used as app servers • dedicated hardware
    • Data model • one Redis hash per user - key: Facebook id • store data as serialized JSON - booleans, strings, numbers, timestamps ...
    • Advantages • turns Redis into “document db” - efficient to swap user data in / out - atomic ops on parts • easy to dump / restore user data
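
    A sketch of this data model with redis-rb (field names and the id are made up): one hash per user, each field a JSON-serialized value, so single attributes can be read or updated without touching the rest of the record.

        require 'redis'
        require 'json'

        redis = Redis.new
        key = "user:100001234"   # keyed by Facebook id

        # one Redis hash per user, values serialized as JSON
        redis.mapped_hmset(key,
          "name"       => "Alice".to_json,       # => "\"Alice\""
          "coins"      => 120.to_json,           # => "120"
          "last_login" => Time.now.to_i.to_json)

        # atomic operation on one part -- no read-modify-write race
        # (JSON integers are plain digit strings, so HINCRBY works on them)
        redis.hincrby(key, "coins", 10)

        # dumping and restoring a whole user is one command each
        snapshot = redis.hgetall(key)
        redis.mapped_hmset("backup:#{key}", snapshot)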
    • Capacity • 4 GB memory for 20 million integer keys - keys always stay in memory! • 2 GB memory for 10,000 user hashes - others can be swapped out • 3.6 million ops / minute - sufficient for 200,000 requests
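
    Those capacity figures check out on the back of an envelope:

        4 * 1024**3 / 20_000_000     # ~214 bytes of memory per key -- plausible key overhead
        3_600_000 / 60               # => 60,000 ops / second, inside the
                                     #    50,000-100,000 ops / second quoted earlier
        3_600_000 / 200_000          # => 18 Redis ops available per request at peak load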
    • Status • game was launched in August - currently still in beta • expect to reach 1 million daily active users in Q1/2011 • will try to stick to 2 or 3 machines - possibly bigger / faster ones
    • Conclusions • use the right tool for the job • keep it simple - avoid sharding, if possible • don’t scale out too early - but have a viable “plan B” • use dedicated hardware
    • Q&A
    • Links • cassandra.apache.org • redis.io • membase.org • tim.lossen.de