Cassandra Intro -- TheEdge2012

This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.


  1. #theedge2012: Practical Introduction to Cassandra. Sonia Margulis, @robosonia, March 2012
  2. Your Application
  3. Gone Viral
  4. Best Hardware Money Can Buy
  5. Improve Reads
  6. Sharding RDBMS – A Nightmare
  7. Cassandra's Sweet Spot
      » Many concurrent users
      » Linear scalability
      » Distributed
      » High volumes of operations
      » Inherently clustered
  8. The Road to Mastership (current stop: Introduction to Cassandra)
      » Introduction to Cassandra
      » Running a Server
      » Data Model
      » Modeling Data
      » Communicating with the Server
      » Growing a Cluster
  9. A non-relational database
      » Values availability
      » Scales out, not up
      » Open source
      » Active community
  10. Always Available
  11. Who Uses It?
  12. Use Case: Social & Timelines
  13. Use Case: Statistics & Logs (image: "Logs" by Rick Payette)
  14. The Road to Mastership (current stop: Running a Server)
  15. The Cassandra Project
      » Project
      » Runs on:
      » Apache License
      » Current release: 1.0.8
      You are here: sonia@hiro:~/apache-cassandra-1.0.8$
  16. Running a Server
      sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra -f
      ....
      Now serving reads.
      localhost/127.0.0.1:9160
  17. Connecting to Our Server: the Cassandra command line interface (CLI) tool
      sonia@hiro:~/apache-cassandra-1.0.8$ bin/cassandra-cli -host 127.0.0.1 -port 9160
      Connected to: "Test Cluster" on localhost/9160
      Welcome to Cassandra CLI version 1.0.8
  18. Creating a Keyspace: Cassandra's equivalent of an RDBMS database
      [default@unknown] create keyspace demo;
      Let's start using it:
      [default@unknown] use demo;
      [default@demo]
  19. Creating a Column Family: a column family holds data, much like a table in an RDBMS
      [default@demo] create column family user;
      Start adding data:
      [default@demo] set user[1][a]=utf8('foo');
      [default@demo] set user[2][b]=utf8('bar');
      [default@demo] set user[2][c]=utf8('test');
  20. Retrieving Data: retrieving columns by user key
      [default@demo] get user[2];
      (column=b, value=bar)
      (column=c, value=test)
      Returned 2 results.
  21. The Road to Mastership (current stop: Data Model)
  22. Column: a column name and a column value
  23. Column example: name = Peter Parker
  24. Row: spiderman, with columns icon, name = Peter Parker, residence = New York
  25. Row: a row id (spiderman) and its columns (icon, name = Peter Parker, residence = New York)
  26. Column Family
      spiderman: icon, name = Peter P, residence = New York
      batman: icon, name = Bruce W, residence = Gotham
      hulk: icon, name = Bruce B, residence = New York
  27. Column Family (same rows as slide 26), annotated with a write
      set user['spiderman']['name'] = 'Peter Parker'
      Labels: user is the column family, spiderman the row id, name the column name, 'Peter Parker' the value
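The column, row and column family slides above can be pictured as a map of row ids to sorted maps of columns. The following is only a minimal in-memory sketch of that shape, not Cassandra code; the class name `ColumnFamilySketch` and its methods are made up for illustration.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a column family; not part of Cassandra.
class ColumnFamilySketch {
    // row id -> (column name -> value); columns stay sorted by name within a row
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    // Mirrors the CLI form: set user['spiderman']['name'] = 'Peter Parker'
    void set(String rowId, String columnName, String value) {
        rows.computeIfAbsent(rowId, k -> new TreeMap<>()).put(columnName, value);
    }

    // Mirrors the CLI form: get user['spiderman']; unknown rows give an empty map
    SortedMap<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, new TreeMap<>());
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("spiderman", "name", "Peter Parker");
        user.set("spiderman", "residence", "New York");
        user.set("batman", "residence", "Gotham"); // rows need not share the same columns
        System.out.println(user.get("spiderman"));
    }
}
```

Each row carries its own set of columns, which is what the later "Allies" and "Published Issues" slides rely on.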
  28. The Allies Column Family
      batman: Robin, Alfred
      spiderman: Iceman, Firestar, Iron Man, Storm
  29. Published Issues Column Family
      spiderman (~2600 columns): 1/8/1962 = ###, ..., 1/3/2012 = ###, 8/3/2012 = ###
      batman (~3800 columns): 1/5/1939 = ###, ..., 2/3/2012 = ###, 9/3/2012 = ###
  30. Model Flexibility: Flexible Data Model (image: photostock / FreeDigitalPhotos.net)
  31. Keyspace
      » Like an RDBMS database
      » A container for column families
      [default@unknown] create keyspace demo;
      » One keyspace per application, in most cases
  32. Expiring Columns – TTL
      spiderman: icon, name = Peter P, passwd_reminder = abcd, residence = New York
      set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200;
      7200 s = 2 hours
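To make the TTL behaviour on the slide above concrete, here is a small sketch of a row whose columns can expire. It is an illustration only: the class `ExpiringColumnsSketch` is hypothetical, and the current time is passed in explicitly (an assumption made so the example is deterministic) instead of being read from the system clock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of column expiry for a single row; not Cassandra's implementation.
class ExpiringColumnsSketch {
    private static final class Cell {
        final String value;
        final long expiresAtMillis; // Long.MAX_VALUE means the column never expires

        Cell(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Cell> columns = new HashMap<>();

    // A plain write without a TTL
    void set(String name, String value) {
        columns.put(name, new Cell(value, Long.MAX_VALUE));
    }

    // Mirrors: set users['spiderman']['passwd_reminder'] = 'abcd' with ttl = 7200;
    void setWithTtl(String name, String value, int ttlSeconds, long nowMillis) {
        columns.put(name, new Cell(value, nowMillis + ttlSeconds * 1000L));
    }

    // Returns null once the TTL has elapsed, as if the column had never been written
    String get(String name, long nowMillis) {
        Cell cell = columns.get(name);
        if (cell == null || nowMillis >= cell.expiresAtMillis) {
            return null;
        }
        return cell.value;
    }
}
```

With a TTL of 7200 seconds written at time 0, a read at 7,199,999 ms still sees the value and a read at 7,200,000 ms does not.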
  33. Distributed Counters
      javaedge.com: speakers = 1035, sessions = 3402
      incr page_views['javaedge.com']['speakers'] by 1
      get page_views['javaedge.com']['speakers']
  34. The Road to Mastership (current stop: Communication with the Server: Clients)
  35. Cassandra Query Language
      » Looks a lot like SQL
      INSERT INTO users (KEY, name, universe) VALUES (hulk, Bruce, marvel)
      » Mostly valid SQL
      SELECT name, universe FROM users WHERE KEY = 'hulk'
  36. Advantages of using CQL
      » Run ad-hoc queries
      » Very familiar, easier to use
      » Stable interface
        ▪ For library developers
        ▪ For users
  37. CQL Example
      SELECT name, residence FROM users
      SELECT 01/1/2011 .. 1/1/2012 FROM published_issues WHERE KEY = 'spiderman'
      SELECT FIRST 5 FROM allies WHERE KEY = 'spiderman'
  40. Cassandra JDBC Driver
      import java.sql.*;
      Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
      Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/keyspace");
  41. Cassandra JDBC Driver
      Statement stmt = con.createStatement();
      ResultSet rs = stmt.executeQuery("SELECT name, residence FROM users WHERE KEY =" + key + "");
  42. Cassandra JDBC Driver: JDBC
  43. Hector
      SliceQuery<...> query = HFactory.createSliceQuery(keyspace, ...);
      query.setRange(startDate, endDate, false, 100)
           .setColumnFamily("published_issues")
           .setKey("spiderman");
      QueryResult<ColumnSlice<Date, String>> result = query.execute();
  44. Hector: Advanced Features
      » Failover support
      » Connection pooling
      » Load balancing
      » JMX counters
      » Object mapper
  45. Maven plugin
      mvn cassandra:start
      Run your tests
      mvn cassandra:cql-exec
      mvn cassandra:stop
  46. The Road to Mastership (current stop: Modeling Data)
  47. Queries First
      » Use the same Column Family for data that should be fetched together
        ▪ Reduces IO
      » Consider filtering and ordering
  48. Denormalize
      » Fewer seeks, faster reads
      » Storing redundant data
        ▪ Manually handling data integrity
      » Disk space is cheaper than seek time
  49. Secondary Index
      » Requirement: find all superheroes that live in New York
      spiderman: icon, name = Peter Parker, residence = New York
  50. Secondary Index
      » Requirement: find all superheroes that live in New York
      spiderman: icon, name = Peter Parker, residence = New York
      create column family users ... and column_metadata = [{column_name: residence, index_type: KEYS}];
      » Good for indexes with low cardinality
      SELECT name FROM users WHERE residence = 'New York'
  51. Manually Managed Index
      » Requirement: find a superhero by name
  52. Manually Managed Index
      » Requirement: find a superhero by name
      Search term = keys in users CF
      Bruce = hulk, batman
      Peter = spiderman
      » Manually maintain an inverted index
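The manually managed index above amounts to writing to two places: the users data and an inverted index from search term to row keys. Below is a rough in-memory sketch of that bookkeeping, under the assumption that the search term is the first word of the name; `NameIndexSketch` and its methods are invented for this example and are not part of Cassandra or Hector.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of an application-maintained inverted index.
class NameIndexSketch {
    private final Map<String, String> users = new HashMap<>();                  // row key -> full name
    private final Map<String, SortedSet<String>> byFirstName = new HashMap<>(); // search term -> row keys

    // Every write updates both the data and the index; keeping them in sync is the application's job.
    void putUser(String rowKey, String fullName) {
        String previous = users.put(rowKey, fullName);
        if (previous != null) {
            // Remove the stale index entry left by the old name
            String oldTerm = firstName(previous);
            SortedSet<String> keys = byFirstName.get(oldTerm);
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                byFirstName.remove(oldTerm);
            }
        }
        byFirstName.computeIfAbsent(firstName(fullName), t -> new TreeSet<>()).add(rowKey);
    }

    // Looks up row keys by search term; unknown terms give an empty set
    SortedSet<String> findByFirstName(String term) {
        return byFirstName.getOrDefault(term, Collections.emptySortedSet());
    }

    private static String firstName(String fullName) {
        return fullName.split(" ")[0];
    }
}
```

The extra work on every update is the "manually handling data integrity" cost named on the Denormalize slide.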
  53. Bucketing: all issues, by month
      hulk_jan_2012: 1/1/2012 = Issue-1, 2/1/2012 = Issue-2, 4/1/2012 = Issue-3
      hulk_feb_2012: 2/2/2012 = Issue-4, 28/2/2012 = Issue-5, 29/2/2012 = Issue-6
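Bucketing as shown above comes down to deriving the row key from the entity plus a time bucket. A possible sketch of building keys in the `hulk_jan_2012` style follows; the helper name `BucketKeySketch.rowKey` is an assumption for illustration, and the key layout is simply copied from the slide.

```java
import java.time.LocalDate;

// Hypothetical helper that builds month-bucketed row keys such as "hulk_jan_2012".
class BucketKeySketch {
    private static final String[] MONTHS = {
        "jan", "feb", "mar", "apr", "may", "jun",
        "jul", "aug", "sep", "oct", "nov", "dec"
    };

    // All issues published in the same month land in the same row
    static String rowKey(String hero, LocalDate published) {
        return hero + "_" + MONTHS[published.getMonthValue() - 1] + "_" + published.getYear();
    }
}
```

For example, `BucketKeySketch.rowKey("hulk", LocalDate.of(2012, 2, 29))` gives `hulk_feb_2012`, so a reader who wants February's issues touches only that row.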
  54. The Road to Mastership (current stop: Cassandra Cluster)
  55. Virtual Ring: nodes at tokens 10, 40, 60, 75, 90
  56. Node Token (ring of 10, 40, 60, 75, 90)
      Node 10: keys 91-10
      Node 40: keys 11-40
      Node 60: keys 41-60
      Node 75: keys 61-75
      Node 90: keys 76-90
  57. Node Token: MD5'(hulk) = 20 (ring of 10, 40, 60, 75, 90)
  58. Node Token: MD5'(hulk) = 20; hulk is now shown on the ring
  59. Node Token: MD5'(thor) = 42; hulk is already on the ring
  60. Node Token: MD5'(thor) = 42; hulk and thor are both shown on the ring
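The node and key-range table on the Node Token slides can be expressed as a lookup in a sorted set: a key goes to the first node whose token is greater than or equal to the key's token, wrapping around at the top of the ring. This is a minimal sketch using the slides' simplified tokens; `TokenRingSketch` is a made-up class, and the hashing step (MD5' on the slides) is assumed to have happened already.

```java
import java.util.TreeSet;

// Hypothetical sketch of token-ring lookup; nodes are identified by their token.
class TokenRingSketch {
    private final TreeSet<Integer> nodeTokens = new TreeSet<>();

    void addNode(int token) {
        nodeTokens.add(token);
    }

    // The owner is the first node token >= the key token, wrapping past the top of the ring
    int ownerOf(int keyToken) {
        if (nodeTokens.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Integer owner = nodeTokens.ceiling(keyToken);
        return owner != null ? owner : nodeTokens.first();
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        for (int token : new int[] {10, 40, 60, 75, 90}) {
            ring.addNode(token);
        }
        System.out.println(ring.ownerOf(20)); // hulk, MD5'(hulk) = 20, falls in 11-40
        System.out.println(ring.ownerOf(42)); // thor, MD5'(thor) = 42, falls in 41-60
        System.out.println(ring.ownerOf(95)); // falls in 91-10 and wraps to the first node
    }
}
```

With the slide's ring, token 20 maps to node 40, token 42 to node 60, and token 95 wraps to node 10, matching the 91-10 range in the table.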
  61. Inter-Node Communication (ring of 10, 40, 60, 75, 90)
      » Gossip
      » Failure Detection
  62. Fault Tolerance (ring of 10, 40, 60, 75, 90, with hulk and thor placed)
      » Replication factor
      » Hinted Handoff
  63. Replication Factor (ring of 10, 40, 60, 75, 90)
      » Replication factor
      » Hinted Handoff
      Replication factor = 3: hulk and thor each appear on three nodes
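One way to read the replication factor slide is that each key is stored on its owning node plus the next nodes clockwise until there are as many copies as the replication factor. The sketch below assumes that simple clockwise placement; `ReplicaPlacementSketch.replicasFor` is a hypothetical helper, and it ignores racks and data centers entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch: pick replicas by walking the ring clockwise from the owning node.
class ReplicaPlacementSketch {
    static List<Integer> replicasFor(NavigableSet<Integer> nodeTokens, int keyToken, int replicationFactor) {
        if (nodeTokens.isEmpty()) {
            throw new IllegalArgumentException("ring has no nodes");
        }
        // Cannot place more copies than there are nodes
        int copies = Math.min(replicationFactor, nodeTokens.size());
        List<Integer> replicas = new ArrayList<>();
        Integer token = nodeTokens.ceiling(keyToken); // owning node, or null if we must wrap
        while (replicas.size() < copies) {
            if (token == null) {
                token = nodeTokens.first(); // wrap around the ring
            }
            replicas.add(token);
            token = nodeTokens.higher(token); // next node clockwise
        }
        return replicas;
    }
}
```

On the ring of 10, 40, 60, 75, 90 with a replication factor of 3, token 20 (hulk) gives nodes 40, 60, 75 and token 42 (thor) gives nodes 60, 75, 90.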
  64. Fault Tolerance (ring of 10, 40, 60, 75, 90)
      » Replication factor
      » Hinted Handoff
  65. Hinted Handoff (ring of 10, 40, 60, 75, 90)
      » Replication factor
      » Hinted Handoff
  66. Hinted Handoff (ring of 10, 40, 60, 75, 90)
      » Replication factor
      » Hinted Handoff
  67. Client Requests: a write request arrives at the coordinator (ring diagram)
  68. Consistency Level: write request with consistency level = ONE (ring diagram)
  69. Consistency Level: write request with consistency level = ALL (ring diagram)
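The consistency level slides can be summarised as: the coordinator reports success once enough replicas have acknowledged the write. Here is a small sketch of that rule. ONE and ALL come from the slides; QUORUM (a majority of replicas) is added here as an assumption for completeness, and `ConsistencySketch` is not a real client API.

```java
// Hypothetical sketch of how many replica acknowledgements a write needs.
class ConsistencySketch {
    enum Level { ONE, QUORUM, ALL }

    static int requiredAcks(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;                         // any single replica is enough
            case QUORUM:
                return replicationFactor / 2 + 1; // a majority of the replicas
            case ALL:
                return replicationFactor;         // every replica must answer
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    // The coordinator can answer the client once enough replicas have acknowledged
    static boolean writeSucceeds(Level level, int replicationFactor, int acksReceived) {
        return acksReceived >= requiredAcks(level, replicationFactor);
    }
}
```

With a replication factor of 3 and one replica down, a write at ONE still succeeds while a write at ALL does not, which is the trade-off between availability and consistency that the two slides contrast.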
  70. The Road to Mastership (current stop: Summary)
  71. Where Do You Sign?
      » Cassandra
        ▪ http://cassandra.apache.org
        ▪ http://www.datastax.com/
          • Docs, tutorials & videos
        ▪ IRC: #cassandra on freenode
      » Hector
        ▪ https://github.com/rantav/hector
        ▪ https://github.com/zznate/hector-examples
