Cassandra devoxx 2010

3,797 views

Published on

Introduction to Cassandra at Devoxx 2010

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,797
On SlideShare
0
From Embeds
0
Number of Embeds
27
Actions
Shares
0
Downloads
54
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Cassandra devoxx 2010

  1. 1. Jonathan Ellis jbellis@riptano.com / @spyced The Cassandra Distributed Database Tuesday, November 30, 2010
  2. 2. Bigtable, 2006 Dynamo, 2007 OSS, 2008 Incubator, 2009 TLP, 2010 Tuesday, November 30, 2010
  3. 3. Tuesday, November 30, 2010
  4. 4. Why Cassandra? ✤ Relational databases are not designed to scale ✤ B-trees are slow ✤ and require read-before-write Tuesday, November 30, 2010
  5. 5. Tuesday, November 30, 2010
  6. 6. Tuesday, November 30, 2010
  7. 7. Tuesday, November 30, 2010
  8. 8. Tuesday, November 30, 2010
  9. 9. Tuesday, November 30, 2010
  10. 10. Tuesday, November 30, 2010
  11. 11. Tuesday, November 30, 2010
  12. 12. (“The eBay Architecture,” Randy Shoup and Dan Pritchett) Tuesday, November 30, 2010
  13. 13. Tuesday, November 30, 2010
  14. 14. eBay: NoSQL pioneer ✤ “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.” ✤ ”BASE: An Acid Alternative,” Dan Pritchett, eBay Tuesday, November 30, 2010
  15. 15. Tuesday, November 30, 2010
  16. 16. Tuesday, November 30, 2010
  17. 17. Tuesday, November 30, 2010
  18. 18. Tuesday, November 30, 2010
  19. 19. Tuesday, November 30, 2010
  20. 20. Tuesday, November 30, 2010
  21. 21. Commitlog MemtableWriter Reader The Log-Structured Merge-Tree, Bigtable: A Distributed Storage System for Structured Data Tuesday, November 30, 2010
  22. 22. Myth 1 ✤ “Cassandra is for people who don’t understand {SQL, denormalization, query tuning, ...}” ✤ Similarly: “Only users of [database X] are turning to Cassandra, because X sucks.” Tuesday, November 30, 2010
  23. 23. Myth 2 ✤ “Only huge social media sites care about scalability.” Tuesday, November 30, 2010
  24. 24. Cassandra in production ✤ Digital Reasoning: NLP + entity analytics ✤ OpenX: largest publisher-side ad network in the world ✤ Cloudkick: performance data & aggregation ✤ SimpleGEO: location-as-API ✤ Ooyala: video analytics and business intelligence ✤ ngmoco: massively multiplayer game worlds Tuesday, November 30, 2010
  25. 25. Myth 3 ✤ “Cassandra is only appropriate for unimportant data.” Tuesday, November 30, 2010
  26. 26. Durabilty ✤ Write to commitlog ✤ fsync is cheap since it’s append-only ✤ Write to memtable ✤ [amortized] flush memtable to sstable Tuesday, November 30, 2010
  27. 27. Commitlog MemtableWriter Reader The Log-Structured Merge-Tree, Bigtable: A Distributed Storage System for Structured Data Tuesday, November 30, 2010
  28. 28. SSTable format, briefly <row data 0> <row data 1> ... <row data 127> ... <row data 255> ... <key 127> <key 255> ... Sorted [clustered] by row key Tuesday, November 30, 2010
  29. 29. Scaling Tuesday, November 30, 2010
  30. 30. A L T W Tuesday, November 30, 2010
  31. 31. A L T W F Tuesday, November 30, 2010
  32. 32. A L T W F (A-L] Tuesday, November 30, 2010
  33. 33. A L T W F(A-F] (F-L] Tuesday, November 30, 2010
  34. 34. A L T W F Key “C” Tuesday, November 30, 2010
  35. 35. Reliability ✤ No single points of failure ✤ Multiple datacenters ✤ Monitorable Tuesday, November 30, 2010
  36. 36. Some headlines ✤ “Resyncing Broken MySQL Replication” ✤ “How To Repair MySQL Replication” ✤ “Fixing Broken MySQL Database Replication” ✤ “Replication on Linux broken after db restore” ✤ “MySQL :: Repairing broken replication” Tuesday, November 30, 2010
  37. 37. Tuesday, November 30, 2010
  38. 38. Tuesday, November 30, 2010
  39. 39. Good architecture solves multiple problems at once ✤ Availability in single datacenter ✤ Availablility in multiple datacenters Tuesday, November 30, 2010
  40. 40. A L T W F P Y Key “C” U Tuesday, November 30, 2010
  41. 41. A L T W F P Y Key “C” U Tuesday, November 30, 2010
  42. 42. A L T W F P Y Key “C” U Tuesday, November 30, 2010
  43. 43. A L T W F P Y Key “C” U Tuesday, November 30, 2010
  44. 44. A L T W F P Y Key “C” U X Tuesday, November 30, 2010
  45. 45. A L T W F P Y Key “C” U X hint Tuesday, November 30, 2010
  46. 46. A L T W F P Y Key “C” U X hint Tuesday, November 30, 2010
  47. 47. A L T W F P Y U Tuesday, November 30, 2010
  48. 48. A L T W F P Y U Tuesday, November 30, 2010
  49. 49. A L T W F P Y U Tuesday, November 30, 2010
  50. 50. Tuesday, November 30, 2010
  51. 51. A L T W F P Y Key “C” U Tuesday, November 30, 2010
  52. 52. A L T W F P Y U Key “C” Tuesday, November 30, 2010
  53. 53. Tuneable consistency ✤ ONE, QUORUM, ALL ✤ R + W > N ✤ Choose availability vs consistency (and latency) Tuesday, November 30, 2010
  54. 54. Monitorable Tuesday, November 30, 2010
  55. 55. JMX Tuesday, November 30, 2010
  56. 56. Ripcord Tuesday, November 30, 2010
  57. 57. Data model tradeoffs ✤ Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.” Tuesday, November 30, 2010
  58. 58. A static ColumnFamily Tuesday, November 30, 2010
  59. 59. Tuesday, November 30, 2010
  60. 60. A dynamic ColumnFamily Tuesday, November 30, 2010
  61. 61. SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?) followers ? tweets timeline ? uuid:tweet Tuesday, November 30, 2010
  62. 62. SuperColumns = full denormalization Tuesday, November 30, 2010
  63. 63. A little deeper ✤ http://twissandra.com ✤ http://github.com/jhermes/twissjava Tuesday, November 30, 2010
  64. 64. Tuesday, November 30, 2010
  65. 65. Tuesday, November 30, 2010
  66. 66. Mutator<String> m = createMutator("Twissandra", stringExtractor); MutationResult mr = m.insert(tweetId, "Tweet", createStringColumn("uname", uname))   .insert(tweetId, "Tweet", createStringColumn("body", body)); for (String follower : getFollowers(uname)) { mr.insert(follower, "Timeline", createColumn(timestamp, tweetId, longExtractor, stringExtractor)); } m.execute() Tuesday, November 30, 2010
  67. 67. SliceQuery<String, String, String> q = createSliceQuery("Twissandra", stringExtractor, stringExtractor, stringExtractor); q.setColumnFamily("Timeline") .setKey(uname) .setRange(startTimestamp, null, true, 40); ColumnSlice<String, String> slice = q.execute().get(); Tuesday, November 30, 2010
  68. 68. API cake ✤ libpq ✤ JDBC ✤ JPA ✤ Thrift ✤ Pelops, Hector ✤ Kundera, ? Tuesday, November 30, 2010
  69. 69. Analytics in Cassandra ✤ @afex: “Cassandra + Pig (Hadoop) is very exciting. A 7 line script to analyze data from my entire cluster transparently, with no ETL? Yes, please” Tuesday, November 30, 2010
  70. 70. TaskTracker JobTracker Tuesday, November 30, 2010
  71. 71. 0.7 ✤ More control over replica placement ✤ Hadoop refinements ✤ Secondary indexes ✤ Online schema changes ✤ Large row support (> 2GB) ✤ Dynamic routing around slow nodes Tuesday, November 30, 2010
  72. 72. When do you need Cassandra? ✤ Ian Eure: “If you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL data store” Tuesday, November 30, 2010
  73. 73. Not Only SQL ✤ Curt Monash: “ACID-compliant transaction integrity commonly costs more in terms of DBMS licenses and many other components of TCO (Total Cost of Ownership) than [scalable NoSQL]. Worse, it can actually hurt application uptime, by forcing your system to pull in its horns and stop functioning in the face of failures that a non-transactional system might smoothly work around. Other flavors of “complexity can be a bad thing” apply as well. Thus, transaction integrity can be more trouble than it’s worth.” [Curt’s emphasis] Tuesday, November 30, 2010
  74. 74. More ✤ http://riptano.com/docs ✤ http://wiki.apache.org/cassandra/ ArticlesAndPresentations ✤ http://wiki.apache.org/cassandra/ArchitectureInternals Tuesday, November 30, 2010
  75. 75. Questions Tuesday, November 30, 2010

×