Introduction to Cassandra and Data Modeling

    Introduction to Cassandra and Data Modeling: Presentation Transcript

    • Cassandra
        Nick Bailey, @nickmbailey, nick@datastax.com
        Thursday, May 30, 2013
    • Introduction (©2012 DataStax)
    • Why does Cassandra Exist?
    • Big Data: Analytics + Real Time
    • Architecture
    • Dynamo + BigTable
    • Why do people like Cassandra?
    • Availability
    • Scalability
    • Performance
    • Multi Datacenter Support
    • Hadoop Support
        • InputFormat
        • Run tasktrackers/datanodes locally
        • Run namenode/jobtracker anywhere
    • Data Locality, Workload Partitioning
    • Data Modeling
    • Keyspace, Column Families
    • Database, Tables
    • Column Family = Row Key + Columns (name, value) ...
    • Static Column Families, Dynamic Column Families
    • Static - Users Column Family
        Row Key      Columns
        g_m_bluth    password: banana stand | name: George Michael
        tobias_f     password: c_weathers | name: Tobias | phone: 512-7777
    • Dynamic - Friend Column Family
        Row Key      Columns
        g_m_bluth    <date>: ann_v | <date>: maeby
        tobias_f     <date>: barry_z | <date>: carl_w | <date>: lindsay | ...
    • Time Series Data
        • Event logs
        • Metrics
        • Sensor Data
        • Etc.
    • Time Series - Login CF
        Row Key      Columns
        g_m_bluth    1369633061: United States | 1369625839: Mexico | ...
        tobias_f     1369932413: Canada | 1369681738: United States | ...
    • What Else?
    • Counter Columns
        • Inc/Dec operations
        • Not idempotent
        • Possibility for over counting
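
The "not idempotent" point on the counter slide can be illustrated with a small sketch. This is a toy model in plain Python, not Cassandra's counter implementation; the `CounterColumn` class, the `write_with_retry` helper, and the `ack_lost` flag are hypothetical names invented for the illustration. It assumes a client that blindly retries when an acknowledgement is lost.

```python
class CounterColumn:
    """Toy stand-in for a counter column: it only supports inc/dec."""

    def __init__(self):
        self.value = 0

    def increment(self, delta=1):
        self.value += delta


def write_with_retry(counter, delta, ack_lost):
    """Apply an increment and retry once if the acknowledgement was lost.

    ack_lost=True simulates a timeout: the write was applied, but the
    client never heard back, so it cannot tell success from failure.
    """
    counter.increment(delta)
    if ack_lost:
        # The blind retry applies the same delta a second time.
        counter.increment(delta)


likes = CounterColumn()
write_with_retry(likes, 1, ack_lost=False)  # first "like", acknowledged
write_with_retry(likes, 1, ack_lost=True)   # second "like", ack lost, retried

# Two logical likes were sent, but the stored value reflects three increments.
print(likes.value)

# By contrast, rewriting a regular column with the same value is idempotent.
user = {}
for _ in range(2):
    user["name"] = "Tobias"
```

Retrying a regular column write leaves the same value in place, while retrying an increment changes the total, which is where the over counting comes from.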
    • Expiring Columns
        • TTL - Time to live
        • Set per column
        • Possibly an anti-pattern (we'll get to that later)
    • Secondary Indexes
        • SELECT * FROM Users WHERE name = 'Nick';
        • Only support '=' clauses (for first condition)
        • Often misused
    • CQL: Cassandra Query Language
    • CQL example
        CREATE COLUMNFAMILY songs (
            id uuid PRIMARY KEY,
            title text,
            album text,
            artist text,
            data blob
        );

        INSERT INTO songs (id, title, artist, album)
        VALUES (a3e64f8f..., 'La Grange', 'ZZ Top', 'Tres Hombres');

        SELECT * FROM songs;

         id          | album        | artist         | title
        -------------+--------------+----------------+------------------
         2b09185b... |    Roll Away | Back Door Slam | Outside Woman...
         8a172618... | We Must Obey |      Fu Manchu | Moving in Ste...
         a3e64f8f... | Tres Hombres |         ZZ Top | La Grange
    • How do I start?
    • Define your questions
    • SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;
    • WHERE user = 'nickmbailey': Row Key
    • ORDER BY time DESC LIMIT 10: Store columns in chronological order
    • CREATE COLUMNFAMILY logins (user, time, location, PRIMARY KEY (user, time));
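
The mapping from the question to the model (user as row key, columns ordered by time) can be sketched in plain Python. This is only an in-memory illustration of the layout, not how Cassandra stores data; `logins`, `record_login`, and `latest_logins` are hypothetical names, and the sample values are taken from the Login CF slide.

```python
import bisect

# Row key (user) -> list of (time, location) columns, kept sorted by time.
logins = {}


def record_login(user, time, location):
    """Insert a column into the user's row at its chronological position."""
    bisect.insort(logins.setdefault(user, []), (time, location))


def latest_logins(user, limit=10):
    """Mimic ORDER BY time DESC LIMIT n: read the row backwards."""
    return logins.get(user, [])[::-1][:limit]


record_login("g_m_bluth", 1369625839, "Mexico")
record_login("g_m_bluth", 1369633061, "United States")
record_login("tobias_f", 1369932413, "Canada")
record_login("tobias_f", 1369681738, "United States")

for time, location in latest_logins("g_m_bluth"):
    print(time, location)
```

Because each row is already sorted by time, answering the question is a single row lookup followed by a short slice, with no sorting at read time.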
    • What about?
    • SELECT time FROM logins WHERE user = 'nickmbailey' AND location = 'United States';
    • Logins keyed by (user, time)
        Row Key      Columns
        g_m_bluth    1369633061: United States | 1369625839: Mexico | ....
                     1369622839: Canada | 1369422839: Canada | 1368422839: Canada | ....
                     1368421839: Canada | 1367421839: United States | 1367411839: Mexico | ....
    • CREATE COLUMNFAMILY logins (user, time, location, PRIMARY KEY (user, location));
    • Logins keyed by (user, location)
        Row Key      Columns
        g_m_bluth    United States: 1369633061 | Canada: 1369622839 | ....
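
The difference between the two layouts can be sketched as follows. This is a toy in-memory model under stated assumptions: the helper names are hypothetical, the events are the ones visible on the slides, and the writes are applied in chronological order so that the last write for a given (user, location) pair is also the most recent login.

```python
# Columns for g_m_bluth in the (user, time) layout, newest first.
by_time = [
    (1369633061, "United States"),
    (1369625839, "Mexico"),
    (1369622839, "Canada"),
    (1369422839, "Canada"),
    (1368422839, "Canada"),
    (1368421839, "Canada"),
    (1367421839, "United States"),
    (1367411839, "Mexico"),
]


def times_for_location_scan(columns, location):
    """(user, time) layout: every column must be examined and filtered."""
    return [time for time, loc in columns if loc == location]


# (user, location) layout: row key -> {location: time}.
by_location = {}


def record_login(user, time, location):
    """Same (user, location) key means the new value replaces the old one."""
    by_location.setdefault(user, {})[location] = time


for time, location in sorted(by_time):  # replay the events oldest first
    record_login("g_m_bluth", time, location)

print(times_for_location_scan(by_time, "Canada"))
print(by_location["g_m_bluth"]["Canada"])
```

In this sketch the second layout answers the location question with one direct lookup, but it keeps a single time per location, which matches the one-entry-per-country row shown on the slide.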
    • To Normalize or Not
    • SELECT time, location FROM ..... + SELECT city, state, zip .... FROM locations .....
    • Denormalized logins
        Row Key      Columns
        g_m_bluth    1369633061: <United States, Austin, Texas, 78701>
                     1369625839: <Mexico, Tijuana, 88191>
                     1358633061: <United States, Austin, Texas, 78701>
    • Anti Patterns
    • Batched Writes
        • Failure case is suboptimal
        • Increased chance of failure
        • Tune to your workload
    • BOP/OPP (ByteOrderedPartitioner / OrderPreservingPartitioner)
        • You don't really need it
        • Your Ops Team will hate you
        • Really, you don't need it.
    • Super Columns
        • Performance penalty
            • Speed
            • Memory
        • Replaced by CQL3
    • Read Before Write
        • Race conditions
        • Hurts performance
            • Cache
            • IO
    • Queues
        • More generally, many deletes within a row
        • A delete in Cassandra is actually a tombstone
        • Read 1000 tombstones in order to find 10 columns
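
The cost described on the Queues slide can be made concrete with a toy model. This is a sketch in plain Python, not Cassandra's storage engine; `TOMBSTONE`, `delete`, and `read_first` are hypothetical names, and the row is assumed to hold 1010 queue entries of which the first 1000 have been consumed and deleted.

```python
# Marker written in place of a deleted column instead of removing it.
TOMBSTONE = object()


def delete(row, name):
    """A delete writes a tombstone; the column name stays in the row."""
    row[name] = TOMBSTONE


def read_first(row, limit):
    """Return the first `limit` live column names and how many were scanned."""
    live = []
    scanned = 0
    for name in sorted(row):
        scanned += 1
        if row[name] is TOMBSTONE:
            continue  # tombstones still have to be read and skipped
        live.append(name)
        if len(live) == limit:
            break
    return live, scanned


queue = {i: f"job-{i}" for i in range(1010)}
for i in range(1000):
    delete(queue, i)  # consume the head of the queue

live, scanned = read_first(queue, 10)
print(len(live), scanned)
```

The read returns only 10 live columns, but it has to step over every tombstone at the head of the row first, so the work grows with the number of deletes rather than with the amount of live data.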
    • Use Cases
    • eBay
    • http://www.youtube.com/watch?v=F-fYqPu2ciQ
    • eBay
        • dozens of nodes
        • 200 TB+ of storage
    • eBay
        • Social Signals
        • Hunch Taste Graph
        • Various Time Series
    • Social Signals
        • Like, Own, Want
        • Need:
            • scalable counters
            • high performance writes
            • want to find most popular items in a given category
    • Social Signals
        ItemCount
        Row Key      Columns
        item_id_1    like: 300 | own: 104 | want: 105
        item_id_2    ... | ... | ...

        UserCount
        Row Key      Columns
        user_id_1    like: 50 | own: 10 | want: 75
        user_id_2    ... | ... | ...
    • Social Signals
        ItemLike
        Row Key      Columns
        item_id_1    user_id_1: <time> | user_id_2: <time> | ...
        item_id_2    ... | ... | ...

        UserLike
        Row Key      Columns
        user_id_1    <time>: <item_id> | <time>: <item_id> | ...
        user_id_2    ... | ... | ...
    • Social Signals - Possibilities
        • Store aggregated counts per category
        • Column names are counts
        • Get top N items in a category
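
One way to read "column names are counts" is that each category row keeps its columns sorted by count, so the most popular items sit at one end of the row. The sketch below is a plain-Python illustration of that idea only; the category name, item ids, counts, and helper functions are all hypothetical, and the slides do not say how eBay actually maintains these rows.

```python
import bisect

# Row key (category) -> list of (count, item_id) column names, sorted by count.
category_counts = {}


def add_item(category, count, item_id):
    """Insert a column whose name starts with the aggregated count."""
    bisect.insort(category_counts.setdefault(category, []), (count, item_id))


def top_n(category, n):
    """Read the row from the high-count end and keep the first n items."""
    columns = category_counts.get(category, [])
    return [item_id for count, item_id in reversed(columns)][:n]


add_item("electronics", 300, "item_id_1")
add_item("electronics", 120, "item_id_2")
add_item("electronics", 450, "item_id_3")

print(top_n("electronics", 2))
```

Since the ordering is part of the column name, the top N query becomes a short slice of a single row instead of a sort over every item in the category.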
    • Questions?
    • Come to the Summit!
        Ask me for a discount code
        June 11-12, 2013, San Francisco, CA
        http://www.datastax.com/company/news-and-events/events/cassandrasummit2013