Cassandra FrOSCon 10
Upcoming SlideShare
Loading in...5
×
 

Cassandra FrOSCon 10

on

  • 4,679 views

 

Statistics

Views

Total Views
4,679
Views on SlideShare
4,679
Embed Views
0

Actions

Likes
7
Downloads
32
Comments
2

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel

12 of 2

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
  • thx'
    Are you sure you want to
    Your message goes here
    Processing…
  • Nice talk. Thanks!

    Video here: http://bit.ly/9aveah
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Cassandra FrOSCon 10 Cassandra FrOSCon 10 Presentation Transcript

    • Inside the Cassandra Database Jonathan Ellis jbellis@riptano.com / @spyced Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Why NoSQL? ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
    • Myth 1 ✤ “NoSQL is for people who don’t understand {SQL, denormalization, query tuning, ...}” ✤ Similarly: “Only users of [database X] are turning to NoSQL databases, because X sucks.” Saturday, August 21, 2010
    • eBay: NoSQL pioneer ✤ “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.” ✤ --BASE: An Acid Alternative, Dan Pritchett, eBay ✤ See also: The eBay Architecture, Randy Shoup and Dan Pritchett Saturday, August 21, 2010
    • (“The eBay Architecture,” Randy Shoup and Dan Pritchett) Saturday, August 21, 2010
    • Myth 2 ✤ “NoSQL is nothing new because we had key/value databases like bdb years ago.” Saturday, August 21, 2010
    • Myth 3 ✤ “Only huge web 2.0 sites like Facebook and Twitter care about scalability.” Saturday, August 21, 2010
    • Cassandra ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Use cases ✤ Digital Reasoning: NLP + entity analytics ✤ OpenX: largest publisher-side ad network in the world ✤ Cloudkick: performance data & aggregation ✤ SimpleGEO: location-as-API ✤ Ooyala: video analytics and business intelligence ✤ ngmoco: massively multiplayer game worlds Saturday, August 21, 2010
    • Cassandra ✤ Relational databases don’t scale ✤ The relational model maps poorly to some problems ✤ Relational databases are slow Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Cassandra memory-aware index <key 127> <row data 0> <key 255> <row data 1> ... ... <row data 127> ... <row data 255> ... Saturday, August 21, 2010
    • Reader Writer Memtable Commitlog The Log-Structured Merge-Tree, Bigtable: A Distributed Storage System for Structured Data Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Durable ✤ Write to commitlog ✤ fsync is cheap since it’s append-only ✤ Write to memtable ✤ [amortized] flush memtable to sstable Saturday, August 21, 2010
    • Scaling Saturday, August 21, 2010
    • W A T L Saturday, August 21, 2010
    • W A F T L Saturday, August 21, 2010
    • W A F (A-L] T L Saturday, August 21, 2010
    • W A (A-F] F T (F-L] L Saturday, August 21, 2010
    • Key “C” W A F T L Saturday, August 21, 2010
    • Reliability ✤ No single points of failure, clean design ✤ Multiple datacenters ✤ Monitorable Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Some headlines ✤ “Resyncing Broken MySQL Replication” ✤ “How To Repair MySQL Replication” ✤ “Fixing Broken MySQL Database Replication” ✤ “Replication on Linux broken after db restore” ✤ “MySQL :: Repairing broken replication” Saturday, August 21, 2010
    • The opposite of heroes ✤ “If your software wakes someone up at 4 AM to fix it, you’re doing it wrong.” Saturday, August 21, 2010
    • Good architecture solves multiple problems at once ✤ Availability in single datacenter ✤ Availablility in multiple datacenters ✤ Built-in repair rather than an afterthought ✤ or worse, having to restart replication entirely Saturday, August 21, 2010
    • Y Key “C” A W U F T L P Saturday, August 21, 2010
    • Y A W U F T L P Saturday, August 21, 2010
    • Y A W U F T L P Saturday, August 21, 2010
    • Y Key “C” A W U F T L P Saturday, August 21, 2010
    • Y Key “C” A W U F T L P Saturday, August 21, 2010
    • Y Key “C” A W U F T L P Saturday, August 21, 2010
    • Y Key “C” A W U F T L P Saturday, August 21, 2010
    • Y Key “C” A W U F T L P Saturday, August 21, 2010
    • Y Key “C” A W U F X T L P Saturday, August 21, 2010
    • Y Key “C” A W U F X T hint L P Saturday, August 21, 2010
    • Y Key “C” A W U F X T hint L P Saturday, August 21, 2010
    • Y A W U F T L P Saturday, August 21, 2010
    • Y A W U F T L P Saturday, August 21, 2010
    • Y A W U F T L P Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Eventual Tuneable consistency ✤ ONE, QUORUM, ALL ✤ R+W>N ✤ Choose availability vs consistency (and latency) Saturday, August 21, 2010
    • Monitorable Saturday, August 21, 2010
    • Events Saturday, August 21, 2010
    • 1500 mails sent 1125 750 375 0 Jan Feb Apr May Jun Jul (0.5) (0.5.1) Mar (0.6, 0.6.1) (0.6.2) (0.6.3) (0.6.4) Saturday, August 21, 2010
    • Data model ✤ Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.” Saturday, August 21, 2010
    • ColumnFamilies Columns Saturday, August 21, 2010
    • Saturday, August 21, 2010
    • Cassandra “indexes” are materialized views SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?) ORDER BY time_tweeted DESC LIMIT 40 followers timeline ? SORT ? uuid:tweet tweets Saturday, August 21, 2010
    • Analytics in Cassandra ✤ @afex: “Cassandra + Pig (Hadoop) is very exciting. A 7 line script to analyze data from my entire cluster transparently, with no ETL? Yes, please” Saturday, August 21, 2010
    • TaskTracker JobTracker Saturday, August 21, 2010
    • Coming in 0.7 [beta1 released 13/8] ✤ Secondary indexes and expressions ✤ Large (> 2GB) rows ✤ More control over replica placement ✤ Online schema changes ✤ Flow control ✤ Dynamic routing around slow nodes Saturday, August 21, 2010
    • More ✤ http://wiki.apache.org/cassandra/ ArticlesAndPresentations ✤ http://wiki.apache.org/cassandra/ArchitectureInternals Saturday, August 21, 2010
    • Questions Saturday, August 21, 2010