Cassandra as Memcache
Cassandra, TTL, used Memcache.

Presentation Transcript

  • Cassandra as Memcache, Edward Capriolo, Media6Degrees.com
  • What we learned in Operating Systems
    • CPU (and registers) - Super FAST!
    • Main Memory - Fast
    • Hard Disks - Slow
  • What has changed since my first computer
    • Then
        • 100 MHz
        • 8 MB RAM
        • 1 GB Disk
        • 14.4kbps Modem
        • 686 Windowz 3.11
        • Packard Bell
    • Now
        • Multiple Cores @ 4 GHz
        • 2GB RAM
        • 2TB Disk
        • 1/10Gb Ethernet
        • 64-bit FC 14
        • Sadly no more Packard Bell
  • The Present Situation
    • Computers are not and never will be fast or big enough
    • Until they take over and then they will be too fast and too big
  • Traditional two tier Web Application
    • User facing tier
        • Usually Apache|Tomcat|...
        • Speaks some CGI alternative php|jsp|cfm|...
        • Logging
        • Display
    • Back end
        • Usually an RDBMS
        • Stores and indexes data
        • Supports a data abstraction and manipulation language
  • Simple Schema
    • create table user ( id int auto_increment primary key, name varchar(25) unique, pass varchar(25) )
    • create table book ( id int auto_increment primary key, name varchar(25) unique, author varchar(25) )
    • create table users_books ( uid int, bid int, unique (uid, bid), index (bid) )
  • Some Queries you might see (user login)
    • Select id, pass from user where user.name=?
    • Totally random queries based on user login
    • Not often read - may not be helpful to cache
  • Some queries you might see (Books a user has read)
    • Select user.name, book.name
    • FROM user JOIN users_books
    • ON user.id=users_books.uid
    • JOIN book ON book.id=bid
    • WHERE user.id=?
    • More complex query
    • Two join conditions
    • Result might be on the user's start page
    • Result might be often used by algorithms
  • Some queries you might see (count all the read books)
    • Select users_books.bid, book.name, count(*) from users_books inner join book on users_books.bid=book.id group by users_books.bid, book.name
    • No where clause!
    • Possible table scan
    • Possible intermediate results to temp file
    • Result displayed on main index page
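The three queries above can be tried against a small in-memory database. This is a minimal sketch, assuming SQLite as a stand-in for the RDBMS (so `auto_increment` becomes SQLite's `autoincrement`), the table names from the schema slide (`user`, `book`, `users_books`), and sample rows invented purely for illustration. The function names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the RDBMS tier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table user (id integer primary key autoincrement,
                       name varchar(25) unique, pass varchar(25));
    create table book (id integer primary key autoincrement,
                       name varchar(25) unique, author varchar(25));
    create table users_books (uid int, bid int, unique (uid, bid));
    create index users_books_bid on users_books (bid);
""")

# Hypothetical sample data, invented for this sketch.
conn.executemany("insert into user (name, pass) values (?, ?)",
                 [("alice", "secret1"), ("bob", "secret2")])
conn.executemany("insert into book (name, author) values (?, ?)",
                 [("Dune", "Herbert"), ("Emma", "Austen")])
conn.executemany("insert into users_books (uid, bid) values (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def login_row(name):
    """User login: a point lookup on the unique name column."""
    return conn.execute(
        "select id, pass from user where name = ?", (name,)).fetchone()


def books_read_by(user_id):
    """Books a user has read: two joins filtered by user id."""
    return conn.execute("""
        select user.name, book.name
        from user
        join users_books on user.id = users_books.uid
        join book on book.id = users_books.bid
        where user.id = ?
        order by book.name
    """, (user_id,)).fetchall()


def read_counts():
    """Count all read books: no where clause, so every row is visited."""
    return conn.execute("""
        select users_books.bid, book.name, count(*)
        from users_books
        inner join book on users_books.bid = book.id
        group by users_books.bid, book.name
        order by users_books.bid
    """).fetchall()
```

The first query touches one row, the second a handful, and the third the whole join table, which is why the third is the most attractive candidate for caching.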
  • How fast are these queries?
    • Trick question!
    • How much data?
        • The O(log n) index lookup cost is negligible for 'small' data sets
    • How fast are the disks?
        • Streaming is much faster than seeking*
    • How many QPS?
        • More requests means more contention
    • How much RAM?
        • Unallocated RAM works as page cache...
  • Wait..Page Cache... what?
    • Virtual File System or VFS cache
    • RAM not in use by a process
    • Used to Cache Disk
    • Blocks read often get cached in RAM
    • Large disk-to-RAM ratio reduces hit chance
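The effect of the disk-to-RAM ratio on hit chance can be illustrated with a toy simulation. This is a sketch only: it assumes uniformly random block reads and a strict LRU eviction policy, which real page caches only approximate. The function name and parameters are hypothetical.

```python
import random
from collections import OrderedDict


def hit_rate(disk_blocks, cache_blocks, reads=20000, seed=1):
    """Fraction of uniformly random block reads served from an LRU cache."""
    rng = random.Random(seed)
    cache = OrderedDict()  # keys are block numbers, ordered oldest first
    hits = 0
    for _ in range(reads):
        block = rng.randrange(disk_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True            # read from "disk" and cache it
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / reads
```

With the cache size held fixed, growing `disk_blocks` tenfold drops the hit rate roughly tenfold under these assumptions. Real workloads are usually skewed toward hot blocks, so actual hit rates tend to be better than this uniform model suggests.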
  • Scaling RDBMS challenges
    • Scaling up
        • More RAM, disk
        • Upper limit
    • Adding Slaves
        • Add read capacity
        • Does not add write capacity
        • Monitoring/fixing replication
    • Sharding
        • Possibly giving up DB features
        • Re-shard with growth
  • Enter Memcache
    • Key value store with no persistence*
    • Works with memory slabs
    • Set a key, value, and a Time To Live
    • Typically client-controlled sharding
    • Normal Use Case
        • Check cache
        • If found in cache return
        • Else query and save in cache
    • Saves resources by not re-querying mostly static, non-transactional, and non-time-sensitive data
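The normal use case above (check cache, return on hit, otherwise query and save) can be sketched as follows. `TTLCache` is a hypothetical in-process stand-in written for this example, not the API of any real memcache client, and `cached_query` is likewise an illustrative name.

```python
import time


class TTLCache:
    """Tiny in-memory key/value store with per-key Time To Live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be tested
        self._data = {}      # key -> (value, expiry timestamp)

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value


def cached_query(cache, key, query, ttl=60):
    """Check cache; if found return it, else run the query and save it."""
    value = cache.get(key)
    if value is not None:
        return value
    value = query()
    cache.set(key, value, ttl)
    return value
```

A caller might wrap the "count all the read books" query this way, so the database sees it at most once per TTL interval instead of once per page view.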
  • Memcache...Good Things
    • More control of cache than VFS cache
    • Saves web server memory vs HttpSession
    • Fast to store and access data
    • Simple to use
    • Clients for many languages
  • Memcache (possibly not so good things)
    • Memcache empty on shutdown
    • Is an 8GB hash table better than 8GB more RAM in your database machine?
    • Another tier to manage
    • Is it scalable?...
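One way to see why the scalability question is not trivial: if clients pick a server with a naive `hash(key) % server_count` scheme (an assumption made for this sketch, since many clients use consistent hashing instead), adding a server changes where most keys live, so most of the cache is effectively cold afterward. Function names are hypothetical.

```python
import hashlib


def server_for(key, server_count):
    """Naive client-side sharding: hash the key, take it modulo the pool size."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % server_count


def remapped_fraction(keys, old_count, new_count):
    """Fraction of keys whose server changes when the pool is resized."""
    moved = sum(1 for k in keys
                if server_for(k, old_count) != server_for(k, new_count))
    return moved / len(keys)
```

Growing a pool from 4 to 5 servers under this scheme moves roughly four out of five keys, which shows up as a burst of cache misses against the database.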
  • A highly un-suggested deployment
  • Enter Cassandra...
    • Data sharding and replication
    • Writing
        • Structured log format
        • Linear Writes to sorted memtable
        • Memtables flush (time,size,ops)
    • Reading
        • VFS Cache
        • Bloom filters
        • Row Cache
        • Key Cache
    • 0.7.X brings TTL fields!
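The write and read paths listed above can be modeled in a few lines. This is a deliberately simplified sketch, not Cassandra's implementation: it flushes on size only and omits bloom filters, the row and key caches, compaction, and TTL. `TinyStore` and its attributes are hypothetical names.

```python
import bisect


class TinyStore:
    """Toy log-structured store: commit log, memtable, sorted flushed tables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # recent writes held in memory
        self.sstables = []    # flushed tables, oldest first, each sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out as an immutable, key-sorted table."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def read(self, key):
        """Check the memtable first, then flushed tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))  # binary search by key
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None
```

Writes never modify data in place, which is what keeps them sequential. Reads may have to consult several tables, which is the cost that bloom filters and the caches are there to reduce.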
  • So then... Cassandra is faster than memcache?
    • No!
        • Memcache is an in memory datastore
        • Cassandra has to persist data
    • But may be faster, more efficient, and easier to manage than separate memcache + database tier
  • Configuration 1: De facto Standard
    • 5 Nodes
    • Replication Factor = 3
    • Key Cache
    • Results in:
        • Good Performance
        • Strong consistency
        • Highly fault tolerant
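The slide does not say which consistency levels produce the "strong consistency" result. A common reading, assumed here, is quorum reads and quorum writes, which satisfy the overlap rule R + W > N for replication factor N. The helper names are hypothetical.

```python
def quorum(replication_factor):
    """Smallest majority of replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


def replicas_that_can_fail(replication_factor):
    """Replicas that may be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)
```

With a replication factor of 3, a quorum is 2 replicas, so quorum reads and writes overlap on at least one replica and one replica per key can be down without losing availability.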
  • Configuration 2: Do not care about stale reads
    • 5 nodes
    • Replication Factor = 3
    • Row cache
    • Read Repair Chance = 0 %
    • Results in:
        • One third the read traffic
        • Minor possibility of not found/out of sync data (not much different than memcache)
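The "one third" figure follows from simple arithmetic, sketched below under two assumptions that the slide does not state: reads need only one replica to answer, and a read that triggers read repair also consults every other replica. The function name is hypothetical.

```python
def expected_replica_reads(replication_factor, read_repair_chance,
                           replicas_for_answer=1):
    """Average number of replicas touched by one client read."""
    extra = replication_factor - replicas_for_answer
    return replicas_for_answer + read_repair_chance * extra
```

With a replication factor of 3, a read repair chance of 100% touches 3 replicas per read, while 0% touches 1, which is one third of the read traffic.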
  • Configuration 3: Snitches get stitches
    • 5 nodes
    • Replication Factor = 3
    • Row Cache
    • Read Repair Chance = 0%
    • Dynamic Snitches + Pinning
    • Results in:
        • Reads should hit the same node not random replica
        • Caches on each node have less duplication
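The cache duplication claim can be illustrated with a toy model. It assumes a hypothetical placement rule (the owning node plus the next two around a 5-node ring) and compares picking a random replica per read against always picking the same one. It does not model the dynamic snitch itself.

```python
import random

NODES = 5
REPLICATION_FACTOR = 3


def replicas_for(key):
    """Hypothetical placement: owner node plus the next two around the ring."""
    return [(key + i) % NODES for i in range(REPLICATION_FACTOR)]


def cached_entries(keys, choose_replica, reads_per_key=20):
    """Count distinct (node, key) pairs that end up in some node's row cache."""
    cached = set()
    for key in keys:
        for _ in range(reads_per_key):
            cached.add((choose_replica(replicas_for(key)), key))
    return len(cached)


rng = random.Random(7)  # fixed seed so the comparison is repeatable
keys = range(100)
random_total = cached_entries(keys, rng.choice)
pinned_total = cached_entries(keys, lambda replicas: replicas[0])
```

Pinned reads cache each row on exactly one node, while random replica selection tends to cache each hot row on all three of its replicas, so the same total cache memory holds fewer distinct rows.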
  • Configuration 4: Little Data, Big Request load!
    • 20 nodes
    • Replication Factor 20! (only this keyspace)
    • Row Cache
    • Read Repair Chance = 0%
    • Results in:
        • 20 nodes capable of serving these reads!
        • Writes do not scale (like master-slave replication)
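Why reads scale and writes do not in this configuration can be shown with back-of-the-envelope arithmetic. The sketch assumes evenly spread traffic, reads answered by a single replica, and every write eventually applied on all replicas. The function name and the traffic figures used in the note below are hypothetical.

```python
def per_node_load(nodes, replication_factor, reads_per_sec, writes_per_sec):
    """Return (reads, writes) per second that each node handles on average."""
    reads = reads_per_sec / nodes                          # one replica per read
    writes = writes_per_sec * replication_factor / nodes   # every replica applies it
    return reads, writes
```

For example, with 20,000 reads/s and 1,000 writes/s, 20 nodes at replication factor 20 each serve 1,000 reads/s but still apply all 1,000 writes/s. Adding nodes at "replication factor = node count" keeps dividing the reads while every node continues to take every write.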
  • To recap... Cassandra
    • 0.7.X brings Time To Live
    • 0.7.X brings Read Repair Chance
    • Can serve purely from memory
    • Can serve from disk
    • Replication factor, caching, sharding: many ways to tune
    • General Awesomeness