Cassandra: Open Source Bigtable + Dynamo
 

Cassandra is a highly scalable, eventually consistent, distributed, structured columnfamily store with no single points of failure, initially open-sourced by Facebook and now part of the Apache Incubator. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7975

    Presentation Transcript

    • Cassandra Jonathan Ellis
    • Motivation ● Scaling reads to a relational database is hard ● Scaling writes to a relational database is virtually impossible ● … and when you do, it usually isn't relational anymore
    • The new face of data ● Scale out, not up ● Online load balancing, cluster growth ● Flexible schema ● Key-oriented queries ● CAP-aware
    • CAP theorem ● Pick two of Consistency, Availability, Partition tolerance
    • Two famous papers ● Bigtable: A Distributed Storage System for Structured Data, 2006 ● Dynamo: Amazon's Highly Available Key-value Store, 2007
    • Two approaches ● Bigtable: “How can we build a distributed db on top of GFS?” ● Dynamo: “How can we build a distributed hash table appropriate for the data center?”
    • 10,000 ft summary ● Dynamo partitioning and replication ● Log-structured ColumnFamily data model similar to Bigtable's
    • Cassandra highlights ● High availability ● Incremental scalability ● Eventually consistent ● Tunable tradeoffs between consistency and latency ● Minimal administration ● No SPOF (single point of failure)
    • Dynamo architecture & Lookup
    • Architecture details ● O(1) node lookup ● Explicit replication ● Eventually consistent
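One way to read "O(1) node lookup" is zero-hop routing: every node knows the whole ring, so it can find a key's replicas locally instead of forwarding the request hop by hop. The sketch below illustrates that idea with a Dynamo-style consistent-hash ring. The `Ring` class, the MD5-based `token` function, and the node names are assumptions made for illustration, not Cassandra's actual code.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key (or node name) to a position on the ring (assumed hash: MD5)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical ring where every node holds the full token list."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.tokens = sorted((token(n), n) for n in nodes)
        self._positions = [t for t, _ in self.tokens]

    def replicas_for(self, key):
        """Return the nodes responsible for key: the first node clockwise
        from the key's token, then its successors (explicit replication)."""
        start = bisect.bisect_right(self._positions, token(key)) % len(self.tokens)
        count = min(self.replicas, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
owners = ring.replicas_for("user:42")
```

The local search is a binary search over the token list, so the "O(1)" here refers to network hops, not to CPU work.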
    • Architecture layers ● Messaging service, Gossip, Failure detection, Cluster state, Partitioner, Replication ● Commit log, Memtable, SSTable, Indexes, Compaction ● Tombstones, Hinted handoff, Read repair, Bootstrap, Monitoring, Admin tools
    • Writes ● Any node ● Partitioner ● Commitlog, memtable ● SSTable ● Compaction ● Wait for W responses
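The local part of the write path above can be sketched roughly as follows: append to the commit log, update the memtable, and flush the memtable to an immutable, sorted SSTable once it grows large enough. This is a toy in-memory model; the `Store` class, the `flush_threshold` parameter, and the use of Python lists in place of on-disk files are assumptions for illustration. Replication and waiting for W responses are left out.

```python
class Store:
    """Toy single-node write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # stand-in for a sequential, append-only file
        self.memtable = {}     # key -> {column: (value, timestamp)}
        self.sstables = []     # immutable, sorted snapshots of past memtables
        self.flush_threshold = flush_threshold  # assumed: number of rows

    def write(self, key, column, value, timestamp):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, column, value, timestamp))
        # 2. Update the in-memory table; the newest timestamp wins.
        row = self.memtable.setdefault(key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)
        # 3. Flush when the memtable is full.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write rows sorted by key, columns sorted by name, then start fresh.
        sstable = tuple(
            (key, tuple(sorted(columns.items())))
            for key, columns in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay


store = Store(flush_threshold=2)
store.write("keyA", "column1", "v1", 1)
store.write("keyC", "column1", "v2", 2)  # second row triggers a flush
```

Note that the write never reads existing SSTables, which is the property the later "No reads, no seeks" slide refers to.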
    • Memtable / SSTable Disk Commit log
    • SSTable format ● Key / data
    • SSTable Indexes ● Bloom filter ● Key ● Column (Similar to Hadoop MapFile / TFile)
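To illustrate why a Bloom filter is useful as an SSTable index, here is a minimal sketch: it can say "definitely not in this SSTable" without touching disk, at the cost of occasional false positives. The `BloomFilter` class, its sizes, and the SHA-256 based hashing are assumptions chosen for the example, not the format Cassandra uses.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        # False means the key was never added; True means "maybe".
        return all((self.bits >> p) & 1 for p in self._positions(key))


bloom = BloomFilter()
bloom.add("keyA")
```

A reader would check `might_contain(key)` for each SSTable and skip the ones that answer `False`.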
    • Compaction ● Merge keys ● Combine columns ● Discard tombstones
    • Remove ● Deletion marker (tombstone) necessary to suppress data in older SSTables, until compaction ● Read repair complicates things a little ● Eventual consistency complicates things more ● Solution: configurable delay before tombstone GC, after which tombstones are not repaired
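The compaction and tombstone rules above can be sketched together: merge rows by key, keep the newest version of each column, and drop a tombstone only once the configured delay has passed. The dict-based SSTable representation, the `TOMBSTONE` sentinel, and the `gc_grace_seconds` parameter name are assumptions for illustration; timestamps are assumed to be in seconds.

```python
TOMBSTONE = object()  # hypothetical deletion marker stored as a column value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables (oldest first) into one.

    Each SSTable is modeled as {key: {column: (value, timestamp)}}.
    """
    merged = {}
    for table in sstables:
        for key, columns in table.items():          # merge keys
            row = merged.setdefault(key, {})
            for name, (value, ts) in columns.items():  # combine columns
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)

    result = {}
    for key in sorted(merged):
        kept = {
            name: cell
            for name, cell in merged[key].items()
            # Discard tombstones only after the grace period has elapsed.
            if not (cell[0] is TOMBSTONE and cell[1] + gc_grace_seconds <= now)
        }
        if kept:
            result[key] = kept
    return result


older = {"k": {"a": ("1", 100), "b": ("2", 100)}}
newer = {"k": {"a": (TOMBSTONE, 200)}}
still_marked = compact([older, newer], now=250, gc_grace_seconds=100)
collected = compact([older, newer], now=300, gc_grace_seconds=100)
```

In `still_marked` the tombstone survives so that it can still suppress stale copies on other replicas; in `collected` it has been garbage-collected along with the value it deleted.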
    • Cassandra write properties ● No reads ● No seeks ● Fast ● Atomic within ColumnFamily ● Always writable
    • Read path ● Any node ● Partitioner ● Wait for R responses ● Wait for N – R responses in the background and perform read repair
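A rough sketch of the read path: answer from the first R replicas, then compare all N replicas and push the newest version to any that are stale. Replicas are modeled as plain dicts and the "background" step runs inline here; both are simplifying assumptions, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key, r):
    """Return the newest (value, timestamp) among the first r replicas,
    then repair every replica with the newest version seen across all of them."""
    # Foreground: wait for R responses and pick the newest.
    answered = [rep[key] for rep in replicas[:r] if key in rep]
    result = max(answered, key=lambda cell: cell[1]) if answered else None

    # Background (inline in this sketch): collect the remaining responses
    # and overwrite stale or missing copies.
    versions = [rep[key] for rep in replicas if key in rep]
    if versions:
        newest = max(versions, key=lambda cell: cell[1])
        for rep in replicas:
            if rep.get(key) != newest:
                rep[key] = newest
    return result


replicas = [{"k": ("old", 1)}, {"k": ("new", 2)}, {}]
first = read_with_repair(replicas, "k", r=1)   # may be stale with R=1
second = read_with_repair(replicas, "k", r=1)  # repaired by the first read
```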
    • Cassandra read properties ● Read multiple SSTables ● Slower than writes (but still fast) ● Seeks can be mitigated with more RAM ● Scales to billions of rows
    • Consistency in a BASE world ● If W + R > N, you will have consistency ● W=1, R=N ● W=N, R=1 ● W=Q, R=Q where Q = N / 2 + 1
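The W + R > N rule works because any set of W replicas and any set of R replicas must then share at least one node, so a read always reaches a replica that saw the latest write. The sketch below checks this by brute force for small clusters; the function names are made up for illustration, and `N / 2 + 1` is read as integer division.

```python
from itertools import combinations


def quorum(n):
    """Q = N / 2 + 1, using integer division."""
    return n // 2 + 1


def is_consistent(n, w, r):
    """The rule from the slide."""
    return w + r > n


def always_overlap(n, w, r):
    """Brute force: does every write set intersect every read set?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


for n in range(1, 6):
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            assert always_overlap(n, w, r) == is_consistent(n, w, r)
```

With N = 3, for example, W = R = `quorum(3)` = 2 is consistent, while W = R = 1 is faster but can return stale data.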
    • vs MySQL with 50GB of data ● MySQL: ~300ms write, ~350ms read ● Cassandra: ~0.12ms write, ~15ms read ● Achtung!
    • Data model ● Rows, ColumnFamilies, Columns
    • ColumnFamilies ● keyA: column1, column2, column3 ● keyC: column1, column7, column11 ● Column: byte[] name, byte[] value, i64 timestamp
    • Super ColumnFamilies ● keyF: Super1 (column, column, column), Super2 (column, column, column) ● keyJ: Super1 (column, column, column), Super5 (column, column, column)
    • Types of queries ● Single column ● Slice ● Set of names / range of names ● Simple slice -> columns ● Super slice -> supercolumns ● Key range
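Because columns within a row are kept sorted by name, a slice is a contiguous run of that sorted order. A minimal sketch of the two slice flavors, with a row modeled as a plain dict and hypothetical function names:

```python
import bisect


def slice_by_names(row, names):
    """Return the requested columns that exist, in column-name order."""
    return [(name, row[name]) for name in sorted(names) if name in row]


def slice_by_range(row, start, finish, count=100):
    """Return up to count columns with start <= name <= finish."""
    names = sorted(row)
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi][:count]]


row = {"a": 1, "c": 3, "b": 2, "e": 5}
by_range = slice_by_range(row, "b", "d")
by_names = slice_by_names(row, {"e", "a", "z"})
```

A real store would keep the columns sorted on disk instead of sorting on every call; the sort here only keeps the sketch short.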
    • Range queries ● Add “master” server ● Implement on top of K/V ● Order-preserving partitioning
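The third option, order-preserving partitioning, uses the key itself as the ring position, so adjacent keys live on the same or neighboring nodes and a key range maps to a contiguous run of nodes. The sketch below shows the idea; the `OrderPreservingRing` class and the single-letter tokens are assumptions for illustration, and it assumes `start <= stop`.

```python
import bisect


class OrderPreservingRing:
    """Each node owns the keys in (previous token, its own token]."""

    def __init__(self, node_tokens):
        # node_tokens: {node name: token}, where tokens sort like keys.
        self.tokens = sorted((t, n) for n, t in node_tokens.items())
        self._positions = [t for t, _ in self.tokens]

    def node_for(self, key):
        i = bisect.bisect_left(self._positions, key) % len(self.tokens)
        return self.tokens[i][1]

    def nodes_for_range(self, start, stop):
        """Nodes to contact, in key order, for keys in [start, stop]."""
        first = bisect.bisect_left(self._positions, start)
        last = bisect.bisect_left(self._positions, stop)
        size = len(self.tokens)
        return [self.tokens[i % size][1] for i in range(first, last + 1)]


opp = OrderPreservingRing({"A": "g", "B": "n", "C": "z"})
```

Here `opp.nodes_for_range("h", "p")` touches only nodes B and C. With a hashing partitioner the same keys would be scattered across the whole ring, which is why the other two options on the slide need a master server or an index built on top of the key/value layer.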
    • Modification ● Insert / update ● Remove ● Single column or batch ● Specify W, number of nodes to wait for
    • Thrift
        struct Column {
            1: binary name,
            2: binary value,
            3: i64 timestamp,
        }
        struct SuperColumn {
            1: binary name,
            2: list<Column> columns,
        }
        Column get_column(table, key, column_path, block_for=1)
        list<string> get_key_range(table, column_family, start_with="", stop_at="", max_results=100)
        void insert(table, key, column_path, value, timestamp, block_for=0)
        void remove(tablename, key, column_path_or_parent, timestamp)
    • Honestly, Thrift kinda sucks
    • Example: a multiuser blog Two queries - the most recent posts belonging to a given blog, in reverse chronological order - a single post and its comments, in chronological order
    • First try ● One Super ColumnFamily: row per blog, supercolumn per post, subcolumn per comment ● JBE: posts "Cassandra is teh awesome", "BASE FTW", each with comments ● Evan: posts "I like kittens", "And Ruby", each with comments ● <ColumnFamily Type="Super" CompareWith="TimeString" CompareSubcolumnsWith="UUID" Name="Blog"/>
    • Second try ● Blog ColumnFamily: row per blog, column per post (JBE: "Cassandra is teh awesome", "BASE FTW"; Evan: "I like kittens", "And Ruby") ● Comment ColumnFamily: row per post, column per comment ● <ColumnFamily CompareWith="UUIDType" Name="Blog"/> ● <ColumnFamily CompareWith="UUIDType" Name="Comment"/>
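The second design can be mimicked with two dictionaries to show how it serves the two queries from the blog example. This is only an in-memory model: plain integers stand in for the time-ordered UUID column names, the Comment rows are assumed to be keyed by post id, and all function names are hypothetical.

```python
blog = {}     # Blog ColumnFamily: blog name -> {post_id: post body}
comment = {}  # Comment ColumnFamily: post_id -> {comment_id: comment body}


def add_post(blog_name, post_id, body):
    blog.setdefault(blog_name, {})[post_id] = body


def add_comment(post_id, comment_id, body):
    comment.setdefault(post_id, {})[comment_id] = body


def recent_posts(blog_name, limit=10):
    """Query 1: most recent posts of a blog, newest first."""
    posts = blog.get(blog_name, {})
    return [posts[i] for i in sorted(posts, reverse=True)[:limit]]


def post_with_comments(blog_name, post_id):
    """Query 2: a single post and its comments, oldest first."""
    comments = comment.get(post_id, {})
    return blog[blog_name][post_id], [comments[i] for i in sorted(comments)]


add_post("JBE", 1, "Cassandra is teh awesome")
add_post("JBE", 2, "BASE FTW")
add_comment(1, 11, "first!")
add_comment(1, 12, "agreed")
```

Each query reads one row from one ColumnFamily (plus one extra row for the comments), which matches the key-oriented access pattern the earlier slides describe.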
    • Roadmap
    • Cassandra 0.3 ● Remove support ● OPP / Range queries ● Test suite ● Workarounds for JDK bugs ● Rudimentary multi-datacenter support
    • Cassandra 0.4 ● Branched May 18 ● Data file format change to support billions of rows per node instead of millions ● API changes (no more colon delimiters) ● Multi-table (keyspace) support ● LRU key cache ● fsync support ● Bootstrap ● Web interface
    • Cassandra 0.5 ● Bootstrap ● Load balancing ● Closely related to “bootstrap done right” ● Merkle tree repair ● Millions of columns per row ● This will require another data format change ● Multiget ● Callout support
    • Users ● Production: Facebook, RocketFuel ● Production RSN: Digg, Rackspace ● No date yet: IBM Research, Twitter ● Evaluating: 50+ in #cassandra on freenode
    • More ● Eventual consistency: http://www.allthingsdistributed.com/2008/12/ ● Introduction to distributed databases by Todd Lipcon at NoSQL 09: http://www.vimeo.com/5145059 ● Other articles/videos about Cassandra: http://wiki.apache.org/cassandra/ArticlesAndP ● #cassandra on irc.freenode.net
    • Cassandra