Cassandra Talk: Austin JUG

Introduction to Cassandra delivered Jan 26th 2010 for the Austin Java Users Group.

Published in: Technology

Transcript of "Cassandra Talk: Austin JUG"

  1. Apache Cassandra: Distributed DB for Massive Scale
     Austin JUG
     Stu Hood (@stuhood) – Technical Lead, Rackspace
     January 26th 2010
  2. My, what a large/volatile dataset you have...
     ● Large
         ● Larger than 1 node can handle
     ● Volatile
         ● More than 25% (ish) writes
     ● Expensive
         ● More than you can afford with a commercial solution
  3. My, what a large/volatile dataset you have...
     ● For example
         ● Event/log data
         ● Output of batch processing or log analytics jobs
             – In Hadoop, particularly
         ● Social network relationships/updates
  4. What about sharding?
     ● shard n. - A horizontal partition in a database
         ● Example: Sharding by userid
     ● Limitations (for the partitioned dimension)
         ● No more joins
         ● No more indexes
  5. What about sharding?
     ● Options
         ● Provided by ORM?
             – Fixed partitions: manual rebalancing
         ● Developing from scratch?
             – Adding/removing nodes
             – Handling failover
             – As a library? As a middle tier?
  6. What about sharding?
     Pain in the ass.
  7. Case Study: Digg
     1. Vertical partitioning and master/slave trees
     2. Developed sharding solution
         ● IDDB
         ● Awkward replication, fragile scaling
     3. Began populating Cassandra in parallel
         ● Initial dataset for 'green badges'
             – 3 TB
             – 76 billion kv pairs
         ● Most applications being ported to Cassandra
  8. Relationship trouble: The NoSQL Movement
  9. Lessons from the NoSQL Movement
     ● Not Only SQL: Specialization!
         ● Relational is not always the answer
         ● Need highly specialized systems to distribute:
             – Transactions
             – Joins
         ● Normalization strives to remove duplication
             – But duplication is an interesting alternative to joins
  10. Lessons from the NoSQL Movement
      ● Understanding your data, finding the most efficient model
          ● Key-value / Document
              – Hint: already denormalized? fits in memcache?
              – Cassandra, Riak, Voldemort, CouchDB, MongoDB
          ● Graph/RDF
              – Hint: frequent schema changes or deeply nested joins?
              – Neo4j, Sesame
          ● Column store
              – Hint: all writes go to the end of a few tables?
              – Cassandra, SDB, HBase, Hypertable
  11. Lessons from the NoSQL Movement
      ● Replication is not one-size-fits-all
          ● Facebook multi datacenter caching
              – Writes occurring in one DC need to expire caches elsewhere
              – Label events as requiring cache expiration
          ● Google semi-synchronous replication patches
              – Selective synchronous replication
          ● Fragile write master
              – DRBD for high availability
  12. Lessons from the NoSQL Movement
      ● Partition tolerance vs. consistency
          ● Brewer's CAP Theorem
              – Consistency, Availability, Partition tolerance
              – Choose 2:
                  ● CA – Corruption possible if live nodes can't communicate (network partition)
                  ● CP – Completely inaccessible if any nodes are dead
                  ● AP – Always available, but may not always read most recent
          ● Tunables!
              – Cassandra chooses AP, but makes consistency configurable
  13. Cassandra's Elders
  14. Standing on the shoulders of: Amazon Dynamo
      ● No node in the cluster is special
          ● No special roles
          ● No scaling bottlenecks
          ● No single point of failure
      ● Techniques
          ● Gossip
          ● Eventual consistency
  15. Standing on the shoulders of: Google Bigtable
      ● Column family data model
      ● Range queries for rows:
          ● Scan rows in order
      ● Memtable/SSTable structure
          ● Always writes sequentially to disk
          ● Bloom filters to minimize random reads
          ● Trounces B-Trees for big data
              – Linear insert performance
              – Log growth for reads
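To illustrate the "Bloom filters to minimize random reads" point on slide 15, here is a minimal Python sketch. It is not Cassandra's implementation: the class names, the bit-array size, and the use of salted SHA-256 hashes are all assumptions chosen for readability. The idea it shows is that an SSTable can consult a small in-memory filter and skip the disk entirely when the filter says a key is definitely absent.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity only

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Hypothetical immutable sorted table guarded by a Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(sorted(rows.items()))  # written once, in key order
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated random reads

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: no disk access needed
        self.disk_reads += 1  # possible hit: pay for the read
        return self.rows.get(key)
```

A lookup for a key that was never written usually returns without touching `disk_reads`; an occasional false positive costs one wasted read but never a wrong answer.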
  16. Enter Cassandra
      ● Hybrid of ancestors
          ● Adopts listed features
      ● And adds:
          ● A sweet logo!
          ● Pluggable partitioning
          ● Multi datacenter support
              – Pluggable locality awareness
          ● Datamodel improvements
  17. Enter Cassandra
      ● Project status
          ● Open sourced by Facebook (no longer active)
          ● Apache License
          ● Incubating with Apache: graduation in progress
          ● Major releases: 0.3, 0.4, 0.5 (this past weekend!)
  18. Enter Cassandra
      ● The code base
          ● Java, Apache Ant, Git/SVN
          ● 5+ committers from 3+ companies
      ● Known deployments at:
          ● Rackspace, Digg, Twitter, Mahalo, SimpleGeo, Cloudkick
  19. The Datamodel
      Cluster
  20. The Datamodel
      Cluster > Keyspace
      Partitioners: OrderPreservingPartitioner, RandomPartitioner
      Like an RDBMS schema: Keyspace per application
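The two partitioners named on slide 20 differ in how a row key becomes a position (token) on the ring. The Python sketch below is a simplified model under stated assumptions: the function names, the `(token, node)` ring layout, and the ownership rule (a key belongs to the first node whose token is greater than or equal to the key's token, wrapping around) are illustrative, not the server's code. RandomPartitioner does hash keys with MD5, which is what the first function imitates.

```python
import bisect
import hashlib


def random_token(key):
    # RandomPartitioner-style: an MD5 hash spreads keys evenly,
    # but neighbouring keys land far apart, so no ordered row scans.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key):
    # OrderPreservingPartitioner-style: the key itself is the token,
    # so rows stay in key order and range queries hit contiguous nodes.
    return key


def owner(ring, token):
    """ring is a sorted list of (token, node) pairs.

    Returns the node with the first token >= the given token,
    wrapping to the first node when the token is past the end.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


# Hypothetical three-node ring using order-preserving (string) tokens.
ring = [("g", "node-a"), ("p", "node-b"), ("z", "node-c")]
```

With the order-preserving ring, `owner(ring, order_preserving_token("alice"))` and `owner(ring, order_preserving_token("bob"))` both resolve to `node-a`, which is why range scans work but hot key ranges can unbalance the cluster.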
  21. The Datamodel
      Cluster > Keyspace > Column Family
      SortedMap<String,Row>
      Key → Row
      Like an RDBMS table: Separates types in an app
  22. The Datamodel
      Cluster > Keyspace > Column Family > Row
      SortedMap<Name,Value> ...
  23. The Datamodel
      Cluster > Keyspace > Column Family > Row > "Column"
      Not like an RDBMS column: Attribute of the row: each row can contain millions of different columns
      …
      Name → Value
      byte[] → byte[]
      +version timestamp
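Slides 19 to 23 describe nested maps: a column family maps keys to rows, and a row maps column names to values that carry a version timestamp. A rough Python model follows. The `ColumnFamily` class and its methods are hypothetical, strings stand in for `byte[]`, and the rule that the write with the highest timestamp wins is a simplification (ties are resolved here in favour of the existing value, which is an assumption).

```python
class ColumnFamily:
    """Toy model: key -> row, row -> {column name: (value, timestamp)}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, key, column, value, timestamp):
        row = self.rows.setdefault(key, {})
        current = row.get(column)
        # Keep whichever version carries the newer timestamp.
        if current is None or timestamp > current[1]:
            row[column] = (value, timestamp)

    def get(self, key, column):
        return self.rows[key][column][0]

    def columns(self, key):
        # Columns within a row are kept sorted by name.
        return sorted(self.rows.get(key, {}))


users = ColumnFamily("Users")
users.insert("strangeloop_stl", "fullname", "Dr. Strange", timestamp=2)
users.insert("strangeloop_stl", "fullname", "Stale Name", timestamp=1)  # older write loses
users.insert("strangeloop_stl", "joindate", "20091023", timestamp=2)
```

Because each row is its own map, another row in `users` could hold an entirely different set of column names without any schema change.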
  24. StatusApp: another Twitter clone.
  25. StatusApp Example
      <ColumnFamily Name="Users">
      ● Unique id as key: name->value pairs contain user attributes.
      {key: "strangeloop_stl", row: {"fullname": "Dr. Strange", "joindate": "20091023" … }}
  26. StatusApp Example
      <ColumnFamily Name="Statuses">
      ● Unique id as key: name->value pairs contain status attributes
      {key: "status19", row: {"userid": "user19", "content": "Excited to be at #strangeloop!" … }}
  27. StatusApp Example
      <ColumnFamily Name="Timelines" ColumnType="Super">
      ● User id as key: row contains lists of timelines that the user cares about. Each list contains name:value pairs representing ordered statuses
      {key: "user19", row: {"personal": [<timeuuid>: "status19", … ], "following": [<timeuuid>: "status21", … ] }}
  28. Client API
      ● Thrift generates client bindings for (almost) any language
          ● RPC framework
      ● Most popular:
          ● Python
          ● Ruby
          ● Java
          ● PHP
  29. Client API
      1. Get the full name for a user:
      ● get(keyspace, key, [column_family, column_name], consistency_level)
      ● get("statusapp", "userid19", ["Users", "fullname"], ONE)
        > "Dr. Strange"
  30. Client API
      2. Get the 50 most recent statuses in a user's personal timeline:
      ● get_slice(keyspace, key, [column_family, column_name], predicate, consistency_level)
      ● get_slice("statusapp", "userid19", ["Timelines", "personal"], {start: "", count: 50}, QUORUM)
        > [<timeuuid>: "status19", <timeuuid>: "status24" …]
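The two calls on slides 29 and 30 can be tried against an in-memory stand-in. The Python sketch below is not the Thrift client: `STORE`, the integer timestamps used in place of timeuuids, and the newest-first ordering of `get_slice` are all assumptions made so the example runs on its own. Only the argument order mirrors the slides.

```python
ONE, QUORUM = "ONE", "QUORUM"  # accepted for shape only; this mock ignores them

# keyspace -> column family -> row key -> columns
STORE = {
    "statusapp": {
        "Users": {
            "userid19": {"fullname": "Dr. Strange", "joindate": "20091023"},
        },
        "Timelines": {
            "userid19": {
                # super column -> {timestamp standing in for a timeuuid: status id}
                "personal": {1: "status19", 2: "status21", 3: "status24"},
            },
        },
    },
}


def get(keyspace, key, column_path, consistency_level):
    """Fetch a single column value."""
    column_family, column_name = column_path
    return STORE[keyspace][column_family][key][column_name]


def get_slice(keyspace, key, column_parent, predicate, consistency_level):
    """Fetch up to predicate['count'] columns of one super column, newest first."""
    if predicate.get("start"):
        raise NotImplementedError("this mock only supports an empty start")
    column_family, super_column = column_parent
    columns = STORE[keyspace][column_family][key][super_column]
    newest_first = sorted(columns.items(), reverse=True)
    return newest_first[: predicate["count"]]
```

Calling `get("statusapp", "userid19", ["Users", "fullname"], ONE)` returns `"Dr. Strange"`, matching slide 29.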
  31. Consistency Levels?
      ● Eventual consistency
          ● Synch to Washington, asynch to Hong Kong
      ● Client API Tunables
          ● Synchronously write to W replicas
          ● Confirm R replicas match at read time
          ● of N total replicas
      ● Allows for almost-strong consistency
          ● When W + R > N
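The W + R > N rule on slide 31 is a pigeonhole argument: if the replicas written and the replicas read add up to more than N, every read set must share at least one replica with every write set, so a read sees the latest acknowledged write. The Python sketch below checks this by brute force over small clusters; the function names are illustrative only.

```python
from itertools import combinations


def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def always_overlap(n, w, r):
    """True if every set of w written replicas intersects every set of r read replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

For example, with N = 3, writing and reading at quorum (W = R = 2) always overlaps, while W = R = 1 does not, which is the eventually consistent case.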
  32. Caveat consumptor
      ● No secondary indexes:
          ● Querying the content of rows always involves iterating over them
          ● Example: Finding users with a certain last name
      ● Solution: Manually maintain indexes
          ● Implications:
              – Indexes must be synced with the indexed data manually
              – New indexes mean new writes, reads
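A manually maintained index, as slide 32 suggests, is just a second column family that the application writes alongside the data. The Python sketch below uses plain dicts; the names `users`, `users_by_lastname`, `put_user`, and `find_by_lastname` are hypothetical. It shows the "new writes" cost: every user write also updates the index and cleans up a stale entry.

```python
users = {}              # "Users": user id -> attributes
users_by_lastname = {}  # index: last name -> {user id: ""}


def put_user(user_id, attrs):
    """Write a user and keep the last-name index in sync by hand."""
    old = users.get(user_id)
    if old is not None and old["lastname"] != attrs["lastname"]:
        # The indexed value changed: drop the stale index entry.
        users_by_lastname[old["lastname"]].pop(user_id, None)
    users[user_id] = dict(attrs)
    users_by_lastname.setdefault(attrs["lastname"], {})[user_id] = ""


def find_by_lastname(lastname):
    """One index-row read instead of iterating over every user row."""
    return sorted(users_by_lastname.get(lastname, {}))
```

In a real cluster the two writes are not atomic, so the application also has to tolerate or repair an index that briefly disagrees with the data.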
  33. Caveat consumptor
      ● No joins:
          ● No server side method to re-query based on a first query
      ● Solution: Denormalize some queries, client join the rest
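The two options on slide 33 can be contrasted in a few lines of Python. The dicts below stand in for the StatusApp column families, with integer timestamps in place of timeuuids, and both function names are made up for this sketch. The client-side join issues follow-up lookups per status; the denormalized variant copies the author's name into the status at write time so a read needs no second lookup.

```python
users = {"user19": {"fullname": "Dr. Strange"}}
statuses = {
    "status19": {"userid": "user19", "content": "Excited to be at #strangeloop!"},
}
timelines = {"user19": {"personal": {1: "status19"}}}


def timeline_client_join(user_id, count):
    """Client-side join: timeline -> statuses -> users, one lookup per hop."""
    newest_first = sorted(timelines[user_id]["personal"].items(), reverse=True)
    result = []
    for _, status_id in newest_first[:count]:
        status = statuses[status_id]
        author = users[status["userid"]]  # the extra query a server-side join would hide
        result.append((author["fullname"], status["content"]))
    return result


def post_status_denormalized(status_id, user_id, content):
    """Denormalize: duplicate the author's name into the status when writing."""
    statuses[status_id] = {
        "userid": user_id,
        "fullname": users[user_id]["fullname"],  # duplicated on purpose
        "content": content,
    }
```

The trade-off is that a later change to `fullname` must be propagated to every status that copied it, or the copies are allowed to go stale.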
  34. The bright side: Ops
      ● Need N new nodes?
          ● Start more nodes with the same config file
          ● New nodes request load information from the cluster and join with a token that balances the cluster
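One way a joining node could pick "a token that balances the cluster" is to find the most heavily loaded node and take the midpoint of that node's range. The Python sketch below assumes that strategy, integer tokens on a ring of `ring_size` positions, and a node owning the range that ends at its own token; the function name and data shapes are hypothetical, not the bootstrap code itself.

```python
def pick_token(ring, load, ring_size):
    """Choose a token that splits the busiest node's range in half.

    ring: {node: token}, load: {node: bytes stored}.
    """
    busiest = max(load, key=load.get)
    tokens = sorted(ring.values())
    end = ring[busiest]
    # The range starts at the previous token; index -1 wraps around the ring.
    start = tokens[tokens.index(end) - 1]
    width = (end - start) % ring_size
    if width == 0:
        width = ring_size  # a single node owns the whole ring
    return (start + width // 2) % ring_size
```

For a ring `{"a": 10, "b": 50, "c": 90}` of size 100 where `b` holds the most data, the new node would take token 30 and relieve `b` of half its range.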
  35. The bright side: Ops
      ● Dead drive?
          ● Swap the drive, restart, run 'repair'
          ● Streams missing data from other replicas
  36. The bright side: Ops
      ● Dead node?
          ● Start a new node with the same IP and token, run 'repair'
  37. The bright side: Ops
      ● Adding a datacenter?
          ● Extend the 'EndpointSnitch' class to describe the location of your nodes
          ● Add new nodes as before
  38. The bright side: Performance
  39. Getting started
      ● http://incubator.apache.org/cassandra/
      ● Read "Getting Started"... Roughly:
          ● Start one node
          ● Test/develop app, editing node config as necessary
              – Don't skip: exposes partitioner, data model strengths/weaknesses
          ● Launch cluster by starting more nodes with chosen config
  40. Getting started
      ● Resources
          ● http://incubator.apache.org/cassandra/
          ● Wiki
          ● Mailing List
          ● #cassandra on freenode.net
  41. Thanks!
      Austin JUG
      UT
      Rackspace
  42. Questions?
  43. References
      ● Digg Technology Blog
          ● http://about.digg.com/blog/looking-future-cassandra
          ● http://about.digg.com/blog/introducing-digg’s-iddb-infrastructure
      ● Cassandra Wiki
          ● http://wiki.apache.org/cassandra/
      ● Werner Vogels – Eventual Consistency
          ● http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
      ● Jonathan Ellis's Blog
          ● http://spyced.blogspot.com/2010/01/cassandra-05.html