Phil Coleman Presentation

Phil Coleman gives a talk on NoSQL and Cassandra

Published in: Technology

  1. NoSQL Or: How I Learned to Stop Worrying and Love Cassandra
  2. What is NoSQL?
  3. NoSQL != No SQL
       • Not Only SQL
           • Mostly Same Functionality
           • Same Purpose
           • No Longer Relational
  4. ACID vs WHAM
       • ACID
           • Atomicity
           • Consistency
           • Isolation
           • Durability
       • Issues
           • Dated
           • Not Necessary
  5. ACID vs WHAM
       • WHAM
           • Web-scale
           • High Traffic
           • Adaptive Schema
           • Multi-Use
       • Benefits
           • Better Suited for the Current Data Ecosystem
  6. No Family Drama
       • Removes Existing Slowdowns
           • Relations
           • Consistency Between Related Data
           • Joins
       • More Easily Distributed
       • Faster Queries
  7. Implementations
  8. Cassandra - History
       • 2007 - Initially Developed by Facebook
           • To Manage User Inbox Search
       • 2008 - Released as Open Source
       • 2009 - Picked up by Apache
       • Present - Cassandra 1.0
  9. Bigtable + Dynamo
       • Basic Design Principle
           • Bigtable Column Families
               • Adaptable
               • Similarities with SQL
           • Dynamo's Decentralized, Distributed Nature
               • Easily Scalable
               • Durable
  10. Cassandra Pyramid
  11. Example
  12. Building the Pyramid
       • Columns
           • 3 Values
               • Name
               • Value
               • Time Stamp
           • Primitive Types
           • Similar Role to Columns in an RDBMS
  13. Example
       • Column 1
           • Name: City
           • Value: Houston
       • Column 2
           • Name: Attire
           • Value: Casual
  14. Building the Pyramid
       • Super Columns
           • 2 Values
               • Name
               • Collection of Columns
           • Columns Can Be Sorted by Name
  15. Example
       • Super Column 1
           • Name: Hours
           • Value: Column 1
               • Name: Mon-Thu
               • Value: 10 am – 10 pm
           • Value: Column 2
               • Name: Fri-Sat
               • Value: 10 am – 11 pm
           • Value: Column 3
               • Name: Sun
               • Value: 10 am – 9 pm
  16. Building the Pyramid
       • Rows
           • A Collection of Columns
           • Identified by a Unique Key
           • Columns Not Necessarily Related
           • Similar Role to Rows in an RDBMS
  17. Example
       • Row 1
           • Key: nikonikogreekcafe
           • Column 1: City
           • Column 2: Attire
  18. Building the Pyramid
       • Column Family
           • A Collection of Rows
           • Holds Either Columns or Super Columns
           • Queries Run Against Only One Column Family
           • Similar Role to Tables in an RDBMS
  19. Example
       • Column 1
           • Name: Hours.Mon-Thu
           • Value: 10 am – 10 pm
       • Column 2
           • Name: Hours.Fri-Sat
           • Value: 10 am – 11 pm
       • Column 3
           • Name: Hours.Sun
           • Value: 10 am – 9 pm
  20. Example
       • Column Family:
           • Name: Location
           • Row 1: Niko Niko's
           • Row 2: Goode Co.
           • Row 3: Datafiniti
  21. Building the Pyramid
       • Keyspace
           • A Collection of Column Families
           • Global Settings
               • Sorting
               • Replication Factor
           • Similar Role to Databases in an RDBMS
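
The pyramid in slides 10 to 21 can be pictured as nested maps: keyspace, then column family, then row key, then column. The Python sketch below only illustrates that shape; it is not Cassandra's storage engine or client API. The `Column` class and the `insert` and `get_row` helpers are hypothetical names, the timestamps are made-up integers, and the "newest timestamp wins" rule is a simplified version of how Cassandra reconciles competing writes to one column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column as described on slide 12: name, value, timestamp."""
    name: str
    value: str
    timestamp: int  # illustrative integer, not a real clock reading


# keyspace -> column family -> row key -> column name -> Column
keyspace = {"Location": {}}


def insert(column_family, row_key, name, value, timestamp):
    """Store a column; for an existing name, the newer timestamp wins."""
    row = keyspace[column_family].setdefault(row_key, {})
    current = row.get(name)
    if current is None or timestamp >= current.timestamp:
        row[name] = Column(name, value, timestamp)


def get_row(column_family, row_key):
    """Return the row's columns ordered by column name."""
    row = keyspace[column_family].get(row_key, {})
    return [row[name] for name in sorted(row)]


# The Niko Niko's row from the example slides, with the Hours super
# column flattened into dotted column names as on slide 19.
insert("Location", "nikonikogreekcafe", "City", "Houston", 1)
insert("Location", "nikonikogreekcafe", "Attire", "Casual", 1)
insert("Location", "nikonikogreekcafe", "Hours.Mon-Thu", "10 am - 10 pm", 1)
insert("Location", "nikonikogreekcafe", "Hours.Sun", "10 am - 9 pm", 1)

for column in get_row("Location", "nikonikogreekcafe"):
    print(column.name, "=", column.value)
```

Rows in the same column family need not share column names, which is the "adaptive schema" point from the WHAM slide: a second row could carry a `Phone` column without any change to the first.
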
  22. Advantages
       • Horizontally Scalable
           • Decentralized Approach
           • Eventually Consistent
           • Auto-bootstrapping
           • Distributed Computing
           • Distributed Storage
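
The slides name a replication factor and distributed storage but do not say how rows are assigned to nodes. As a loose sketch of the idea, the Python below hashes a row key to pick a starting node and copies the row to the next nodes in a fixed order. The function `replicas_for`, the node names, and the hash-modulo scheme are all assumptions made for illustration; this is not Cassandra's actual partitioner.

```python
import hashlib


def replicas_for(row_key, nodes, replication_factor):
    """Pick which nodes hold a row: hash the key, then walk the ring."""
    ring = sorted(nodes)
    digest = hashlib.sha256(row_key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(ring)
    # Never ask for more copies than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
print(replicas_for("nikonikogreekcafe", nodes, replication_factor=2))
```

Because every node can evaluate the same function, no central coordinator is needed to locate a row, which matches the decentralized approach listed on the slide.
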
  23. Advantages
       • Real Time Responses
           • Reads and Writes
           • Removes Slower Functions
               • Joins/Relations
               • Consistency
           • Does Not Need to Sort
               • Able to Return the First Matches
  24. Advantages
       • Developed for Programmers
           • Good API Support
               • In Most Major Languages
               • Object-based Interaction Model
               • Few Function Calls
               • Greater Control
  25. Advantages
       • No Single Point of Failure
           • Redundancy Across Multiple Nodes
           • Decentralized
           • Hinted Handoff
               • Other Nodes Handle Writes for a Failed Node
               • Failed Nodes Are Updated When Back Online
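
The hinted handoff bullets on slide 25 can be sketched in a few lines of Python. This is a toy in-memory simulation under stated assumptions, not Cassandra's implementation: the `Node` class and the `write` and `recover` functions are hypothetical, and hints are simply parked on the first live replica.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}   # key -> value stored on this node
        self.hints = []  # (target node name, key, value) kept for down replicas


def write(replicas, key, value):
    """Write to every live replica; leave a hint for each one that is down."""
    live = [node for node in replicas if node.up]
    if not live:
        raise RuntimeError("no live replica to accept the write")
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            live[0].hints.append((node.name, key, value))


def recover(node, replicas):
    """Bring a node back online and replay the hints held for it."""
    node.up = True
    for peer in replicas:
        kept = []
        for target, key, value in peer.hints:
            if target == node.name:
                node.data[key] = value
            else:
                kept.append((target, key, value))
        peer.hints = kept


a, b, c = Node("a"), Node("b"), Node("c")
replicas = [a, b, c]
c.up = False
write(replicas, "nikonikogreekcafe:City", "Houston")  # c misses this write
recover(c, replicas)                                  # c catches up from a's hint
print(c.data)
```
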
  26. Disadvantages
       • No Relational Model
           • No Internal Joins
           • Less Consistency
           • Faster Queries
  27. Disadvantages
       • No Sorting at Query Time
           • Less Flexibility on Data Returned
           • No Ranking
           • Able to Return First Results It Finds
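
Slides 14, 23 and 27 fit together: columns are kept sorted by name when they are written, so a read can return the first matches without sorting, and for the same reason cannot re-sort at query time. A minimal Python sketch of that trade-off, assuming a hypothetical `SortedRow` class that is not part of any Cassandra client:

```python
import bisect


class SortedRow:
    """Keeps column names in sorted order at write time."""

    def __init__(self):
        self._names = []   # column names, always sorted
        self._values = {}  # column name -> value

    def put(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # pay the ordering cost on write
        self._values[name] = value

    def slice(self, start="", limit=10):
        """Return up to `limit` columns whose names are >= `start`."""
        first = bisect.bisect_left(self._names, start)
        return [(n, self._values[n]) for n in self._names[first:first + limit]]


row = SortedRow()
row.put("Hours.Sun", "10 am - 9 pm")
row.put("City", "Houston")
row.put("Hours.Mon-Thu", "10 am - 10 pm")
row.put("Attire", "Casual")
row.put("Hours.Fri-Sat", "10 am - 11 pm")

# The first two columns at or after "Hours.", in stored (name) order.
print(row.slice(start="Hours.", limit=2))
```

The read never sorts; it only walks the stored order. Asking for the same columns ranked by value would require fetching them and sorting in the application.
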
  28. Disadvantages
       • No SQL
           • Uses Its Own Query Language (CQL, the Cassandra Query Language)
           • Less Familiar
  29. Potential Uses
       • Key Elements
           • Large Amounts of Data
           • Data Parameters Shift or Grow Frequently
           • Real Time Responses
           • Data Is Not Reliant on Relations
  30. Potential Uses
       • Inbox Management
       • Key/Value Store
       • Social Network Management
       • Data Warehouse
  31. Potential Uses
       • Major Users
  32. Questions?
