Introduction to Cassandra Basics

An introduction to some basic concepts and data modeling techniques in Cassandra.

Transcript

  • 1. Introduction to Cassandra Nick Bailey @nickmbailey Monday, October 28, 13
  • 2. Who am I? ©2012 DataStax
  • 3. What’s DataStax?
  • 4. On to the good stuff!
  • 5.
      • Why Cassandra?
      • Cluster Architecture
      • Node Architecture
      • Data Modeling
      • Wrap up
  • 6. Why Cassandra?
  • 7. Time for buzz words! Big Data! NoSQL!
  • 8. Big Data
      • Gartner: “...high-volume, high-velocity and high-variety...”
      • 2 sides of ‘big data’
          • Analytics
          • Real-time
  • 9. NoSQL
      • A terrible label
      • Covers a wide range of DBs
          • Cassandra
          • Redis
          • MongoDB
          • HBase
          • ...
  • 10. Started by Facebook
  • 11. Dynamo (Amazon) + Bigtable (Google)
  • 12.
  • 13. Cassandra is great for...
      • Massive, linear scaling (e.g. CERN hadron collider, Barracuda Networks)
      • Extremely heavy writes (e.g. BlueMountain Capital – financial tick data)
      • High availability (e.g. eBay, Eventbrite, Netflix, SoundCloud, HealthCare Anytime, Comcast, GoDaddy, Sony Entertainment Network)
  • 14.
  • 15.
  • 16. http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html
  • 17. One size does not fit all. Polyglot persistence.
  • 18. More Resources
      • PlanetCassandra.org
      • Blog
      • 5 minute interviews
  • 19. Cluster Architecture
  • 20. Data Distribution
      • Ring diagram labeled 0, 25, 50, 75
      • Hash_Function(Partition Key) >> Token
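The Hash_Function(Partition Key) >> Token idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's partitioner: the 0-99 token space, the reading of 0/25/50/75 as node tokens, the node names, and the use of MD5 are all illustrative choices. It also assumes the common convention that a node owns the range ending at its own token.

```python
import bisect
import hashlib

# Toy token space and node tokens, loosely following the ring on the slide.
# RING_SIZE and the node names are assumptions made for illustration.
RING_SIZE = 100
NODE_TOKENS = [0, 25, 50, 75]
NODE_NAMES = {0: "node_a", 25: "node_b", 50: "node_c", 75: "node_d"}


def token_for(partition_key: str) -> int:
    """Hash a partition key into the toy token space (MD5 is a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of(token: int) -> str:
    """Return the node with the first token >= the given token, wrapping around."""
    index = bisect.bisect_left(NODE_TOKENS, token)
    return NODE_NAMES[NODE_TOKENS[index % len(NODE_TOKENS)]]


if __name__ == "__main__":
    key = "nickmbailey"
    print(key, "->", token_for(key), "->", owner_of(token_for(key)))
```

Because the token depends only on the partition key, every node can work out where a row lives without consulting a central directory.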
  • 21. Replication
  • 22. Failure Modes
  • 23. Consistency Level
      • Multiple options
          • ONE
          • QUORUM
          • ALL
          • LOCAL_QUORUM
          • ...
      • Can be specified per request
  • 24. Quorum
  • 25. Quorum
  • 26. Consistency. Write CL: ONE
  • 27. Consistency. Read CL: ONE
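The arithmetic behind the Quorum and Consistency slides is small enough to write down. The sketch below assumes the usual definition of a quorum as a majority of replicas, and the usual overlap rule that a read sees the latest acknowledged write when the write and read replica counts together exceed the replication factor. The function names are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(write_replicas: int, read_replicas: int,
                        replication_factor: int) -> bool:
    """True when every read set must share at least one replica with the write set."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print("QUORUM with RF=3 needs", q, "replicas")
    print("QUORUM write + QUORUM read overlap:", read_overlaps_write(q, q, rf))
    print("ONE write + ONE read overlap:", read_overlaps_write(1, 1, rf))
```

With a replication factor of 3, writing and reading at ONE touches one replica each, so a read can land on a replica the write has not reached yet.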
  • 28. Failure Types
      • UnavailableException
          • Didn’t even try
      • TimedOutException
          • Possible success or failure
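The difference between the two failure types can be illustrated with a simplified coordinator check. This is a sketch, not driver code: the exception classes below are local stand-ins named after the slide, and `coordinate_write` and its parameters are hypothetical.

```python
class UnavailableException(Exception):
    """Too few live replicas: the request was never attempted."""


class TimedOutException(Exception):
    """Too few replicas answered in time: the write may or may not have landed."""


def coordinate_write(live_replicas: int, acks_before_timeout: int,
                     required: int) -> str:
    """Toy coordinator: reject up front, or report a timeout after trying."""
    if live_replicas < required:
        # Nothing was sent, so the caller knows the write did not happen.
        raise UnavailableException(
            f"need {required} replicas, only {live_replicas} alive")
    if acks_before_timeout < required:
        # The write was sent; some replicas may still have applied it.
        raise TimedOutException(
            f"need {required} acks, got {acks_before_timeout} in time")
    return "ok"
```

An UnavailableException is safe to treat as "nothing happened", while a TimedOutException leaves the outcome unknown, so the caller usually has to retry or read back.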
  • 29. Multi DC
  • 30. Gossip
      • Manages cluster state
          • Nodes up/down
          • Nodes joining/leaving
      • Decentralized
  • 31. Snitch
      • Responsible for determining cluster topology
      • Tracks node responsiveness
      • Simple, PropertyFile, Ec2Snitch, etc...
  • 32. Node Architecture
  • 33. Write Path (diagram: Write → Memtable in memory; commit log and SSTable on disk)
  • 34. Read Path (diagram: Read → Memtable in memory; two SSTables on disk)
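The write and read paths in the two diagrams can be modeled with a toy node. This is a heavily simplified sketch: `ToyNode`, its `memtable_limit` parameter, and the flush trigger are assumptions for illustration, and real Cassandra merges data by write timestamp rather than by "newest table wins".

```python
class ToyNode:
    """Toy model of one node's storage: commit log, memtable, SSTables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []      # stands in for the append-only log on disk
        self.memtable = {}        # recent writes held in memory
        self.sstables = []        # flushed, immutable tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        """Append to the commit log first, then update the memtable."""
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as a new SSTable and start a fresh one."""
        self.sstables.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs the log

    def read(self, key):
        """Check the memtable, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None
```

Writes only append and update memory, which is why they are cheap. Reads may have to consult several SSTables, which is one reason the deck later says to optimize the data model for reads.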
  • 35. Data Modeling
  • 36. CQL: Cassandra Query Language
  • 37. Terminology
      • Keyspace
      • Table (Column Family)
      • Row
      • Column
      • Partition Key
      • Clustering Key (Optional)
  • 38. For Example:
        CREATE KEYSPACE packagetracker WITH REPLICATION = {
          'class' : 'SimpleStrategy', 'replication_factor' : 1 };

        CREATE KEYSPACE packagetracker WITH REPLICATION = {
          'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2 };

        CREATE TABLE events (
          package_id text,
          status_timestamp timestamp,
          location text,
          notes text,
          PRIMARY KEY (package_id, status_timestamp)
        );
  • 39. Constructs
  • 40. Basic Data Types
      • blob
      • int
      • text
      • bigint (64-bit long)
      • uuid
      • etc
  • 41. More Data Modeling Constructs
      • Collections
          • map, set, list
      • Time to live (TTL)
      • Counters
      • Secondary Indexes
  • 42. Approaching Data Modeling
      • Model your queries, not your data
      • Optimize your data model for reads
      • Don’t be afraid to denormalize
      • You will get it wrong, iterate
  • 43. An Example: User Logins
  • 44. The Query
      What are the last 10 locations nickmbailey logged in from?
        SELECT time, location FROM logins
        WHERE user = 'nickmbailey'
        ORDER BY time DESC LIMIT 10;
  • 45. The Query (same question and query as slide 44), annotated with: Partition Key
  • 46. The Query (same question and query), annotated with: Partition Key, Clustering Key
  • 47. The Query (same question and query), annotated with: Partition Key, Clustering Key, Additional Columns
  • 48. The Query (same question, query, and annotations), plus the table definition:
        CREATE TABLE logins (
          user text,
          time timestamp,
          location text,
          PRIMARY KEY (user, time));
  • 49. The Query (same question, query, and table definition), with sample rows labeled Partition key and Primary key:
         User        | Time                | Location
        -------------+---------------------+----------------------
         nickmbailey | 2013-07-19 09:22:18 | Austin, Texas
         nickmbailey | 2013-07-19 14:49:27 | Blacksburg, Virginia
         jsmith      | 2013-07-20 07:59:34 | Atlanta, Georgia
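One way to picture why this table answers the query cheaply: the partition key (user) selects one partition, and rows inside it are ordered by the clustering key (time). The Python model below is only an analogy under that assumption. The dictionary layout and function names are invented for illustration, and the sample rows are the ones from the slide.

```python
from collections import defaultdict
from datetime import datetime

# partition key (user) -> {clustering key (time): location}
logins = defaultdict(dict)


def record_login(user: str, time: datetime, location: str) -> None:
    """Insert a row; the same (user, time) overwrites, like a CQL upsert."""
    logins[user][time] = location


def last_locations(user: str, limit: int = 10):
    """Mimic: WHERE user = ? ORDER BY time DESC LIMIT ?"""
    rows = logins.get(user, {})
    return sorted(rows.items(), reverse=True)[:limit]


record_login("nickmbailey", datetime(2013, 7, 19, 9, 22, 18), "Austin, Texas")
record_login("nickmbailey", datetime(2013, 7, 19, 14, 49, 27), "Blacksburg, Virginia")
record_login("jsmith", datetime(2013, 7, 20, 7, 59, 34), "Atlanta, Georgia")

if __name__ == "__main__":
    for time, location in last_locations("nickmbailey"):
        print(time, location)
```

The query never touches another user's partition, which is the property the partition key is chosen for.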
  • 50. Time-series data
      • By far, the most common data model
      • Event logs
      • Metrics
      • Sensor Data
      • Etc
  • 51. Another Query
      When was the last time nickmbailey logged in from San Francisco, California?
        SELECT time FROM logins
        WHERE user = 'nickmbailey' AND location = 'San Francisco, California';

         User        | Time                | Location
        -------------+---------------------+---------------------------
         nickmbailey | 2013-07-19 09:22:18 | Austin, Texas
         nickmbailey | 2013-07-19 14:49:27 | Blacksburg, Virginia
         nickmbailey | 2013-07-19 14:49:27 | Austin, Texas
         nickmbailey | 2013-05-19 14:49:27 | Austin, Texas
         nickmbailey | 2013-04-19 14:49:27 | San Francisco, California
         ...         | ...                 | ...
         jsmith      | 2013-07-20 07:59:34 | Atlanta, Georgia
  • 52. Another Query
      When was the last time nickmbailey logged in from San Francisco, California?
        SELECT time FROM logins_by_location
        WHERE user = 'nickmbailey' AND location = 'San Francisco, California';

        CREATE TABLE logins_by_location (
          user text,
          time timestamp,
          location text,
          PRIMARY KEY (user, location));
  • 53. Another Query (same question, query, and table definition as slide 52), with sample rows:
         User        | Location                  | Time
        -------------+---------------------------+---------------------
         nickmbailey | Austin, Texas             | 2013-07-19 09:22:18
         nickmbailey | Blacksburg, Virginia      | 2013-07-19 14:49:27
         nickmbailey | San Francisco, California | 2013-07-19 14:49:27
  • 54. Denormalize
      • Create materialized views of the same data to support different queries
      • Storage space is cheap, Cassandra is fast
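Denormalizing here means the application writes each login twice, once per query it needs to serve. The sketch below models the two tables from the earlier slides as Python dictionaries keyed by their primary keys. The function names are hypothetical, and it assumes logins are recorded in chronological order, so that a plain overwrite leaves the latest time per location.

```python
from datetime import datetime

logins = {}               # (user, time) -> location, like PRIMARY KEY (user, time)
logins_by_location = {}   # (user, location) -> time, like PRIMARY KEY (user, location)


def record_login(user: str, time: datetime, location: str) -> None:
    """Write the same event to both tables, one per supported query."""
    logins[(user, time)] = location
    # Overwrites any earlier row for this (user, location), like an upsert.
    logins_by_location[(user, location)] = time


def last_login_from(user: str, location: str):
    """Single-key lookup that replaces filtering the logins table by location."""
    return logins_by_location.get((user, location))


record_login("nickmbailey", datetime(2013, 4, 19, 14, 49, 27), "San Francisco, California")
record_login("nickmbailey", datetime(2013, 5, 19, 14, 49, 27), "Austin, Texas")
record_login("nickmbailey", datetime(2013, 7, 19, 9, 22, 18), "Austin, Texas")
```

The cost is extra storage and a second write per event. The benefit is that each query becomes a lookup by primary key.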
  • 55. Debugging your data model
        cqlsh> tracing on;
        Now tracing requests.
        cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
        Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

         activity                          | timestamp    | source    | source_elapsed
        -----------------------------------+--------------+-----------+----------------
         execute_cql3_query                | 00:02:37,015 | 127.0.0.1 | 0
         Parsing statement                 | 00:02:37,015 | 127.0.0.1 | 81
         Preparing statement               | 00:02:37,015 | 127.0.0.1 | 273
         Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540
         Sending message to /127.0.0.2     | 00:02:37,015 | 127.0.0.1 | 779
         Message received from /127.0.0.1  | 00:02:37,016 | 127.0.0.2 | 63
         Applying mutation                 | 00:02:37,016 | 127.0.0.2 | 220
         Acquiring switchLock              | 00:02:37,016 | 127.0.0.2 | 250
         Appending to commitlog            | 00:02:37,016 | 127.0.0.2 | 277
         Adding to memtable                | 00:02:37,016 | 127.0.0.2 | 378
         Enqueuing response to /127.0.0.1  | 00:02:37,016 | 127.0.0.2 | 710
         Sending message to /127.0.0.1     | 00:02:37,016 | 127.0.0.2 | 888
  • 56. A note on Transactions
      • In general, you want to design your data model to work around the lack of them
      • The latest version of Cassandra has ‘Compare and swap’
          • An implementation of Paxos
          • ...IF NOT EXISTS;
          • ...IF column1 = 'value';
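The semantics of the two conditional forms can be shown with a single-process sketch. This only illustrates what "compare and swap" means for the caller. It does not model Paxos or replication, and the function names and the `applied` return value are assumptions made for this example.

```python
def insert_if_not_exists(table: dict, key, row: dict) -> bool:
    """Mimic INSERT ... IF NOT EXISTS: apply only when the key is absent."""
    if key in table:
        return False          # not applied; the existing row is untouched
    table[key] = dict(row)
    return True


def update_if(table: dict, key, column: str, expected, new_value) -> bool:
    """Mimic UPDATE ... IF column = expected: apply only on a match."""
    row = table.get(key)
    if row is None or row.get(column) != expected:
        return False          # condition failed; nothing changes
    row[column] = new_value
    return True
```

In both cases the caller learns whether the change was applied, which is what lets two competing clients agree on a single winner.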
  • 57. Try it out
  • 58. CCM
      • CCM - Cassandra Cluster Manager
      • https://github.com/pcmanus/ccm
      • Warning: not lightweight
      • Example:
          • ccm create test -v 2.0.1
          • ccm populate -n 3
          • ccm start
  • 59. Clients
      • Cqlsh
          • Bundled with Cassandra
      • Drivers
          • java: https://github.com/datastax/java-driver
          • python: https://github.com/datastax/python-driver
          • .net: https://github.com/datastax/csharp-driver
          • and more: http://www.datastax.com/download/clientdrivers
  • 60. Get Help
      • IRC: #cassandra on freenode
      • Mailing Lists
      • Stack Overflow
      • DataStax Docs
          • http://www.datastax.com/docs
  • 61. Questions?