Introduction to Cassandra and Data Modeling

Slide 1: Cassandra
Nick Bailey (@nickmbailey, nick@datastax.com)
Thursday, May 30, 2013
©2012 DataStax

Slide 2: Introduction

Slide 3: Why does Cassandra exist?

Slide 4: Big Data: Analytics + Real Time

Slide 5: Architecture

Slide 6: Dynamo + BigTable

Slide 7: Why do people like Cassandra?

Slide 8: Availability

Slide 9: Scalability

Slide 10: [figure; no text recovered]

Slide 11: Performance

Slide 12: [figure; no text recovered]

Slide 13: Multi-Datacenter Support

Slide 14: [figure; no text recovered]

Slide 15: Hadoop Support

Slide 16: Hadoop Support
• InputFormat
• Run tasktrackers/datanodes locally
• Run namenode/jobtracker anywhere

Slide 17: Data Locality / Workload Partitioning

Slide 18: Data Modeling

Slide 19: Keyspace, Column Families

Slide 20: Database, Tables

Slide 21: Column Family = Row Key + Columns (name, value)...

Slide 22: Static Column Families / Dynamic Column Families

Slide 23: Static - Users Column Family
    Row Key     Columns
    g_m_bluth   password: banana stand | name: George Michael
    tobias_f    password: c_weathers | name: Tobias | phone: 512-7777

Slide 24: Dynamic - Friend Column Family
    Row Key     Columns
    g_m_bluth   <date>: ann_v | <date>: maeby
    tobias_f    <date>: barry_z | <date>: carl_w | <date>: lindsay | ...
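The two layouts can be made concrete with a toy in-memory model. It is not Cassandra's API or storage engine, and the `ColumnFamily` class and its methods are hypothetical. It captures only the two properties that slides 21-24 rely on:

* A row is a row key plus (name, value) columns.
* A row's columns come back sorted by column name.

The dates used as column names in the friend example are made up.

```python
class ColumnFamily:
    """Hypothetical toy model of a column family (not Cassandra's API).

    Each row key maps to (column name -> value). Cassandra keeps a row's
    columns sorted by name, so reads here return them sorted by name too.
    """

    def __init__(self):
        self.rows = {}  # row key -> {column name: value}

    def insert(self, row_key, name, value):
        # Writes are upserts: writing an existing column name replaces its value.
        self.rows.setdefault(row_key, {})[name] = value

    def get_row(self, row_key):
        # Return the row's columns as (name, value) pairs ordered by name.
        return sorted(self.rows.get(row_key, {}).items())


# Static column family: every row uses the same, known set of column names.
users = ColumnFamily()
users.insert("g_m_bluth", "password", "banana stand")
users.insert("g_m_bluth", "name", "George Michael")
users.insert("tobias_f", "password", "c_weathers")
users.insert("tobias_f", "name", "Tobias")
users.insert("tobias_f", "phone", "512-7777")

# Dynamic column family: the column names are data (hypothetical dates here),
# so each row can hold a different number of columns.
friends = ColumnFamily()
friends.insert("g_m_bluth", "2013-05-01", "maeby")
friends.insert("g_m_bluth", "2013-04-15", "ann_v")
```

In the static family the column names are fixed attributes. In the dynamic one the names themselves carry data, which is why their sort order matters.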

Slide 25: Time Series Data
• Event logs
• Metrics
• Sensor data
• Etc.

Slide 26: Time Series - Login CF (column names are Unix timestamps in seconds)
    Row Key     Columns
    g_m_bluth   1369633061: United States | 1369625839: Mexico | ...
    tobias_f    1369932413: Canada | 1369681738: United States | ...

Slide 27: What Else?

Slide 28: Counter Columns
• Increment/decrement operations
• Not idempotent
• Possibility of overcounting
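A small simulation shows why "not idempotent" leads to overcounting. The `Node` class and `write_with_retry` helper are hypothetical and not part of any Cassandra driver. Each write is sent twice:

1. The first attempt is applied, but its acknowledgement is lost.
2. The client cannot tell that the first attempt succeeded, so it retries.

Retrying an overwrite is harmless, but retrying an increment counts twice.

```python
class Node:
    """Hypothetical stand-in for a replica holding one counter and one regular column."""

    def __init__(self):
        self.count = 0      # counter column
        self.value = None   # regular column

    def increment(self, delta, ack_lost=False):
        self.count += delta  # the write is applied...
        if ack_lost:         # ...even though the client never hears back
            raise TimeoutError("applied, but the acknowledgement was lost")

    def overwrite(self, value, ack_lost=False):
        self.value = value
        if ack_lost:
            raise TimeoutError("applied, but the acknowledgement was lost")


def write_with_retry(operation, argument):
    """Client that retries once after a timeout, unable to tell whether the write landed."""
    try:
        operation(argument, ack_lost=True)  # first attempt: applied, ack lost
    except TimeoutError:
        operation(argument)                 # retry: applied a second time


node = Node()
write_with_retry(node.increment, 1)         # intended +1, stored +2
write_with_retry(node.overwrite, "Mexico")  # replaying an overwrite changes nothing
```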

Slide 29: Expiring Columns
• TTL (time to live)
• Set per column
• Possibly an anti-pattern (we’ll get to that later)
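A per-column TTL can be sketched the same way. `ExpiringRow` is a hypothetical toy, and its injectable `clock` exists only to make the example deterministic. In Cassandra an expired column is not removed immediately. It is treated as deleted and is eventually turned into a tombstone, which links TTLs to the tombstone problem described under Queues.

```python
import time


class ExpiringRow:
    """Hypothetical toy model of per-column TTLs (not Cassandra's API)."""

    def __init__(self, clock=time.time):
        self.clock = clock   # injectable clock so the example is testable
        self.columns = {}    # name -> (value, expires_at or None)

    def insert(self, name, value, ttl=None):
        # The TTL (in seconds) is chosen per column, at write time.
        expires_at = self.clock() + ttl if ttl is not None else None
        self.columns[name] = (value, expires_at)

    def get(self, name):
        value, expires_at = self.columns.get(name, (None, None))
        if expires_at is not None and self.clock() >= expires_at:
            return None      # an expired column is no longer visible to reads
        return value


now = [1000.0]                       # fake clock, in seconds
row = ExpiringRow(clock=lambda: now[0])
row.insert("session", "abc", ttl=60) # expires at t = 1060
row.insert("name", "Nick")           # no TTL: never expires
```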

Slide 30: Secondary Indexes
• SELECT * FROM Users WHERE name = 'Nick';
• Only support '=' clauses (for the first condition)
• Often misused
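Conceptually, a secondary index is an inverted map from a column value to the row keys that contain it. That structure answers equality lookups naturally but has no ordering for range queries. The sketch below is a hypothetical single-process model, not Cassandra's implementation, and it keeps the index in sync when a row is overwritten.

In Cassandra the built-in index is stored locally on each node. An indexed query that does not name a row key may therefore have to ask many nodes, which is one common source of misuse.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Hypothetical toy model of a secondary index on one column (not Cassandra's API)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                 # row key -> {column: value}
        self.index = defaultdict(set)  # indexed value -> set of row keys

    def insert(self, row_key, columns):
        old = self.rows.get(row_key, {}).get(self.indexed_column)
        if old is not None:
            self.index[old].discard(row_key)  # drop the stale index entry on overwrite
        self.rows.setdefault(row_key, {}).update(columns)
        new = self.rows[row_key].get(self.indexed_column)
        if new is not None:
            self.index[new].add(row_key)

    def select_where_equals(self, value):
        # Equality only, in the spirit of: SELECT * FROM Users WHERE name = 'Nick';
        return {key: self.rows[key] for key in sorted(self.index.get(value, ()))}


users = IndexedColumnFamily("name")
users.insert("nickmbailey", {"name": "Nick", "state": "TX"})
users.insert("g_m_bluth", {"name": "George Michael"})
```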

Slide 31: CQL (Cassandra Query Language)

Slide 32:
    CREATE COLUMNFAMILY songs (
        id uuid PRIMARY KEY,
        title text,
        album text,
        artist text,
        data blob
    );

    INSERT INTO songs (id, title, artist, album)
    VALUES (a3e64f8f..., 'La Grange', 'ZZ Top', 'Tres Hombres');

    SELECT * FROM songs;

     id          | album        | artist         | title
    -------------+--------------+----------------+------------------
     2b09185b... |    Roll Away | Back Door Slam | Outside Woman...
     8a172618... | We Must Obey |      Fu Manchu | Moving in Ste...
     a3e64f8f... | Tres Hombres |         ZZ Top | La Grange

Slide 33: How do I start?

Slide 34: Define your questions

Slide 35:
    SELECT time, location FROM logins
    WHERE user = 'nickmbailey'
    ORDER BY time DESC LIMIT 10;

Slide 36:
    WHERE user = 'nickmbailey'
    → Row Key

Slide 37:
    ORDER BY time DESC LIMIT 10;
    → Store columns in chronological order

Slide 38:
    CREATE COLUMNFAMILY logins (
        user text,
        time timestamp,
        location text,
        PRIMARY KEY (user, time)
    );
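Slides 35-38 reduce to two ideas. The user selects the row, and the time-sorted columns make "newest 10" a short slice from one end of that row. The toy model below keeps each row sorted by time and answers the slide 35 query with a reverse slice. `LoginsTable` is hypothetical, not a driver API.

If you would rather store the newest logins first, CQL3 also accepts `WITH CLUSTERING ORDER BY (time DESC)` on the table.

```python
from bisect import bisect_left


class LoginsTable:
    """Hypothetical toy model of logins with PRIMARY KEY (user, time); not Cassandra's API."""

    def __init__(self):
        self.rows = {}  # user (row key) -> list of (login_time, location), sorted by time

    def insert(self, user, login_time, location):
        row = self.rows.setdefault(user, [])
        times = [t for t, _ in row]
        i = bisect_left(times, login_time)
        if i < len(row) and row[i][0] == login_time:
            row[i] = (login_time, location)        # same primary key: overwrite
        else:
            row.insert(i, (login_time, location))  # keep the row in time order

    def latest(self, user, limit=10):
        # SELECT time, location FROM logins WHERE user = ? ORDER BY time DESC LIMIT ?
        if limit <= 0:
            return []
        return list(reversed(self.rows.get(user, [])[-limit:]))


logins = LoginsTable()
logins.insert("g_m_bluth", 1369625839, "Mexico")
logins.insert("g_m_bluth", 1369633061, "United States")
logins.insert("g_m_bluth", 1369622839, "Canada")
```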

Slide 39: What about?

Slide 40:
    SELECT time FROM logins
    WHERE user = 'nickmbailey'
    AND location = 'United States';

Slide 41: The row is ordered by time, so the 'United States' columns are scattered through it and cannot be sliced out directly
    g_m_bluth   1369633061: United States | 1369625839: Mexico | ... | 1369622839: Canada | 1369422839: Canada | 1368422839: Canada | ... | 1368421839: Canada | 1367421839: United States | 1367411839: Mexico | ...

Slide 42:
    CREATE COLUMNFAMILY logins (
        user text,
        time timestamp,
        location text,
        PRIMARY KEY (user, location)
    );

Slide 43:
    g_m_bluth   United States: 1369633061 | Canada: 1369622839 | ...
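PRIMARY KEY (user, location) has a consequence worth spelling out. Because (user, location) is the entire primary key, a second login from the same location overwrites the first instead of adding a column. The row therefore keeps one time per location, as in slide 43, and that time comes from whichever write landed last. The model below is a hypothetical toy.

```python
class LastLoginByLocation:
    """Hypothetical toy model of logins with PRIMARY KEY (user, location); not Cassandra's API."""

    def __init__(self):
        self.rows = {}  # user (row key) -> {location: login_time}

    def insert(self, user, location, login_time):
        # Cassandra resolves the overwrite by write timestamp (last write wins);
        # this toy simply keeps whichever insert ran last.
        self.rows.setdefault(user, {})[location] = login_time

    def time_for(self, user, location):
        # SELECT time FROM logins WHERE user = ? AND location = ?
        return self.rows.get(user, {}).get(location)


logins = LastLoginByLocation()
logins.insert("g_m_bluth", "Canada", 1368422839)
logins.insert("g_m_bluth", "United States", 1367421839)
logins.insert("g_m_bluth", "Canada", 1369622839)         # replaces the earlier Canada login
logins.insert("g_m_bluth", "United States", 1369633061)  # replaces the earlier US login
```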

Slide 44: To Normalize or Not

Slide 45:
    SELECT time, location FROM .....
    +
    SELECT city, state, zip.... FROM locations .....

Slide 46:
    g_m_bluth   1369633061: <United States, Austin, Texas, 78701> | 1369625839: <Mexico, Tijuana, 88191> | 1358633061: <United States, Austin, Texas, 78701>
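The trade-off in slides 45-46 is one read versus two. The sketch below uses plain Python dicts, and the `loc_austin` id and the `Location` fields are hypothetical.

* The normalized layout stores a reference, which needs a second lookup in a locations table.
* The denormalized layout copies the location details into each login column.

The cost of denormalizing is duplicated data that must be rewritten everywhere it appears if it ever changes.

```python
from typing import NamedTuple


class Location(NamedTuple):
    country: str
    city: str
    state: str
    zip_code: str


# Normalized: each login column stores a (hypothetical) location id, and a
# second query against a locations table resolves it.
locations = {"loc_austin": Location("United States", "Austin", "Texas", "78701")}
normalized_logins = {"g_m_bluth": {1369633061: "loc_austin"}}

# Denormalized: the full location is copied into every login column.
denormalized_logins = {
    "g_m_bluth": {1369633061: Location("United States", "Austin", "Texas", "78701")}
}


def login_details_normalized(user, login_time):
    location_id = normalized_logins[user][login_time]  # lookup 1: logins
    return locations[location_id]                      # lookup 2: locations


def login_details_denormalized(user, login_time):
    return denormalized_logins[user][login_time]       # single lookup
```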

Slide 47: Anti-Patterns

Slide 48: Batched Writes
• Failure case is suboptimal
• Increased chance of failure
• Tune to your workload

Slide 49: BOP/OPP (ByteOrderedPartitioner / OrderPreservingPartitioner)
• You don’t really need it
• Your ops team will hate you
• Really, you don’t need it.
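The ops complaint usually comes down to uneven data placement. The sketch below is a simplification, not Cassandra's partitioner code, and the node count and range boundaries are hypothetical.

* `ordered_owner` assigns keys by sorted range, as an order-preserving partitioner does.
* `hashed_owner` hashes the key with MD5, the hash RandomPartitioner uses. It then takes the result modulo the node count as a stand-in for real token ranges.

With keys that share a prefix, the ordered scheme puts every key on one node, while the hashed scheme spreads them out.

```python
import hashlib
from collections import Counter

NODES = 4
BOUNDARIES = ["f", "m", "s", "~"]  # hypothetical upper bound of each node's key range


def ordered_owner(key, boundaries=BOUNDARIES):
    """Order-preserving placement: the first node whose range upper bound is >= key."""
    for node, upper in enumerate(boundaries):
        if key <= upper:
            return node
    return 0  # past the last bound: wrap around the ring to the first node


def hashed_owner(key, nodes=NODES):
    """Hash placement: MD5 of the key reduced modulo the node count (a simplification)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % nodes


keys = [f"user{i:04d}" for i in range(1000)]  # sequential keys sharing a prefix
ordered_load = Counter(ordered_owner(k) for k in keys)
hashed_load = Counter(hashed_owner(k) for k in keys)
```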

Slide 50: Super Columns
• Performance penalty
    • Speed
    • Memory
• Replaced by CQL3

Slide 51: Read Before Write
• Race conditions
• Hurts performance
    • Cache
    • IO
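The race condition is the classic lost update. In the first function, two hypothetical clients each read a value, add one and write it back, with their steps interleaved the way concurrent requests can be, and one increment disappears. The second function shows the write-only alternative used by the ItemLike layout on slide 60: each liker writes its own column, so nothing has to be read first and concurrent writes do not clobber each other. Both functions are toy illustrations over plain dicts, not Cassandra calls.

```python
def interleaved_read_modify_write(store, key):
    """Two clients do read -> +1 -> write on the same column, interleaved."""
    a_seen = store.get(key, 0)  # client A reads
    b_seen = store.get(key, 0)  # client B reads the same value
    store[key] = a_seen + 1     # client A writes
    store[key] = b_seen + 1     # client B overwrites A's result
    return store[key]


def write_only_likes(store, item_id, user_ids):
    """Each user writes its own column under the item's row; no read needed.

    On slide 60 the column value would be the like time; True stands in here.
    """
    row = store.setdefault(item_id, {})
    for user_id in user_ids:
        row[user_id] = True     # rewriting the same column is harmless
    return len(row)
```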

Slide 52: Queues
• More generally, many deletes within a row
• A delete in Cassandra is actually a tombstone
• Read 1000 tombstones in order to find 10 columns
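The 1000-tombstones figure falls out of a queue stored as a single row. In this hypothetical toy (`QueueRow` is not a Cassandra API), consuming an item overwrites its column with a tombstone marker instead of removing it. A later read from the head of the queue therefore has to step over every consumed item before it finds live ones. In real Cassandra, compaction purges tombstones only after gc_grace_seconds has passed.

```python
TOMBSTONE = object()  # marker a delete leaves behind in this toy model


class QueueRow:
    """Hypothetical toy model of a queue kept in one row; not a Cassandra API."""

    def __init__(self):
        self.columns = []  # (sequence number, value or TOMBSTONE), in sequence order

    def enqueue(self, seq, value):
        # Assumes sequence numbers arrive in increasing order.
        self.columns.append((seq, value))

    def consume(self, n):
        """Delete the first n live items by overwriting them with tombstones."""
        consumed = 0
        for i, (seq, value) in enumerate(self.columns):
            if consumed == n:
                break
            if value is not TOMBSTONE:
                self.columns[i] = (seq, TOMBSTONE)
                consumed += 1

    def peek(self, n):
        """Return up to n live values plus how many tombstones were read to find them."""
        if n <= 0:
            return [], 0
        live, tombstones_read = [], 0
        for _, value in self.columns:
            if value is TOMBSTONE:
                tombstones_read += 1
                continue
            live.append(value)
            if len(live) == n:
                break
        return live, tombstones_read


queue = QueueRow()
for seq in range(1010):
    queue.enqueue(seq, f"job{seq}")
queue.consume(1000)
```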

Slide 53: Use Cases

Slide 54: eBay

Slide 55: http://www.youtube.com/watch?v=F-fYqPu2ciQ

Slide 56: eBay
• Dozens of nodes
• 200 TB+ of storage

Slide 57: eBay
• Social Signals
• Hunch Taste Graph
• Various Time Series

Slide 58: Social Signals
• Like, Own, Want
• Need:
    • Scalable counters
    • High-performance writes
    • Want to find the most popular items in a given category

Slide 59: Social Signals
    ItemCount
    Row Key     Columns
    item_id_1   like: 300 | own: 104 | want: 105
    item_id_2   ...

    UserCount
    Row Key     Columns
    user_id_1   like: 50 | own: 10 | want: 75
    user_id_2   ...

Slide 60: Social Signals
    ItemLike
    Row Key     Columns
    item_id_1   user_id_1: <time> | user_id_2: <time> | ...
    item_id_2   ...

    UserLike
    Row Key     Columns
    user_id_1   <time>: <item_id> | <time>: <item_id> | ...
    user_id_2   ...

Slide 61: Social Signals - Possibilities
• Store aggregated counts per category
• Column names are counts
• Get top N items in a category
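The "column names are counts" idea relies on columns being sorted by name. If each category row stores columns named (count, item_id), the most popular items sit at one end of the row and top N is a slice. The sketch below is a hypothetical toy, not eBay's implementation, and the category name is made up.

Changing a count means deleting the old column name and writing a new one. That requires knowing the current count (for example, from the ItemCount counters), and each delete leaves a tombstone.

```python
from bisect import insort


class CategoryRanking:
    """Hypothetical toy model: one row per category, columns named (count, item_id)."""

    def __init__(self):
        self.rows = {}     # category -> sorted list of (count, item_id) column names
        self.current = {}  # (category, item_id) -> count currently stored

    def set_count(self, category, item_id, count):
        row = self.rows.setdefault(category, [])
        old = self.current.get((category, item_id))
        if old is not None:
            row.remove((old, item_id))  # the old column name must be deleted
        insort(row, (count, item_id))   # columns stay sorted by (count, item_id)
        self.current[(category, item_id)] = count

    def top(self, category, n):
        # Highest counts are at the end of the sorted row: take a reverse slice.
        if n <= 0:
            return []
        return [item for _, item in reversed(self.rows.get(category, [])[-n:])]


ranking = CategoryRanking()
ranking.set_count("guitars", "item_id_1", 300)
ranking.set_count("guitars", "item_id_2", 120)
ranking.set_count("guitars", "item_id_3", 250)
```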

Slide 62: Questions?

Slide 63: Come to the Summit!
Ask me for a discount code.
June 11-12, 2013, San Francisco, CA
http://www.datastax.com/company/news-and-events/events/cassandrasummit2013

×