Cassandra Community Webinar: Back to Basics with CQL3

Cassandra is a distributed, massively scalable, fault-tolerant, wide-column data store, and if you need the ability to make fast writes, the only thing faster than Cassandra is /dev/null! In this fast-paced presentation, we'll briefly describe big data and the part of the big data landscape that Cassandra is designed to fill. We will cover Cassandra's masterless architecture, in which every node plays the same role. We will reveal Cassandra's internal data structures and explain just why Cassandra is so darned fast. Finally, we'll wrap up with a discussion of data modeling using the new standard interface: CQL (Cassandra Query Language).

Cassandra Community Webinar: Back to Basics with CQL3 Presentation Transcript

  • 1. Back to Basics with CQL3. Matt Overstreet, OpenSource Connections
  • 2. Outline
      • Overview
      • Architecture
      • Data Modeling
      • Good At/Bad At
      • Using Cassandra
  • 3. Outline
      • Overview
          o What is Big Data?
          o How does Cassandra fit?
      • Architecture
      • Data Modeling
      • Good At/Bad At
      • Using Cassandra
  • 4. What is Big Data?
      • The three V's (and a C): Velocity, Volume, Variety, and Complexity
  • 5. What is Big Data?
      • Brewer's CAP theorem
          o Consistency: all nodes have the same view of the data
          o Availability: every request can be serviced
          o Partition tolerance: the system keeps working when network or machine failures split the cluster
          o Can't have all 3. Pick 2!
      • Examples
          o MySQL: Consistent, Available
          o HBase: Consistent, Partition Tolerant
          o Cassandra: Available, Partition Tolerant, and "Tunably Consistent"!
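The "tunably consistent" claim can be made concrete with the usual replica-overlap rule. The sketch below assumes N is the replication factor, W the number of replicas that must acknowledge a write, and R the number that must answer a read (as chosen through consistency levels such as ONE, QUORUM, and ALL). When R + W > N, every read set overlaps every write set. It is a simplified model that ignores failures, hinted handoff, and clock skew, and the function names are hypothetical.

```python
def quorum(n: int) -> int:
    """Replicas needed for a majority of n: floor(n / 2) + 1."""
    return n // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


N = 3  # hypothetical replication factor
print(is_strongly_consistent(N, quorum(N), quorum(N)))  # QUORUM/QUORUM -> True
print(is_strongly_consistent(N, 1, 1))                  # ONE/ONE -> False
print(is_strongly_consistent(N, 1, N))                  # read ONE, write ALL -> True
```

Choosing ONE for both reads and writes favors availability and latency; QUORUM for both keeps reads current while still tolerating the loss of a minority of replicas.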
  • 6. What is Big Data?
      • Common theme: Denormalize everything!
          o What's that?
              • JOIN all the tables in the database...
              • ...well, not all the tables
          o Why?
              • You can shard the database at any point
              • All related data is co-located
      • What this means for you
          o No joins
          o No transactions: potential for inconsistency
          o Vastly simplified querying
          o No data modeling; instead, query modeling
          o "Infinite and easy" scaling potential
  • 7. How Does Cassandra Fit?
      • No single point of failure
      • Optimized for writes, still good with reads
      • Lets you choose how to balance Consistency against Availability
  • 8. Outline
      • Overview
      • Architecture
          o Ring architecture
          o Data partitioning
          o Operations
              • Writes
              • Reads
      • Data Modeling
      • Good At/Bad At
      • Using Cassandra
  • 9. Ring Architecture
      • No single point of failure
      • Nodes talk via gossip
      • Democratic: all nodes are equal
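Gossip can be pictured as nodes periodically swapping what they know about each other. This is a toy model, not Cassandra's actual gossip message exchange: each hypothetical `Node` bumps its own heartbeat counter every round, picks a random peer, and both sides keep the highest version seen for every endpoint, so cluster state spreads without any master.

```python
import random


class Node:
    """Toy gossip participant: remembers the highest heartbeat seen per endpoint."""

    def __init__(self, name: str):
        self.name = name
        self.heartbeats = {name: 0}  # endpoint -> latest known heartbeat version

    def tick(self) -> None:
        # A live node bumps its own heartbeat every round.
        self.heartbeats[self.name] += 1

    def gossip_with(self, peer: "Node") -> None:
        # Both sides end up with the newest version known for every endpoint.
        merged = dict(self.heartbeats)
        for endpoint, version in peer.heartbeats.items():
            merged[endpoint] = max(version, merged.get(endpoint, -1))
        self.heartbeats = dict(merged)
        peer.heartbeats = dict(merged)


def run_rounds(nodes, rounds, seed=0):
    """Each round, every node ticks and gossips with one randomly chosen peer."""
    rng = random.Random(seed)
    for _ in range(rounds):
        for node in nodes:
            node.tick()
            peer = rng.choice([n for n in nodes if n is not node])
            node.gossip_with(peer)


nodes = [Node(f"node{i}") for i in range(5)]
run_rounds(nodes, rounds=10)
print(sorted(nodes[0].heartbeats))  # endpoints node0 has heard about
```

A node whose heartbeat stops advancing in everyone else's table is the signal a failure detector would act on.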
  • 10. Data Partitioning: the original partitioning method, with one token per node.
  • 11. Data Partitioning: flexible partitioning with virtual nodes, where each node owns many tokens.
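A minimal sketch of token-ring partitioning with virtual nodes, assuming an MD5-derived 64-bit token as a stand-in hash (Cassandra's real partitioners differ) and 8 virtual nodes per physical node for illustration. Each node owns several points on the ring; a key belongs to the first token at or after its own, and further replicas are found by walking clockwise and skipping nodes already chosen, similar in spirit to SimpleStrategy. `Ring`, `token`, and `replicas` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Stand-in partitioner: first 8 bytes of MD5 as an unsigned 64-bit token."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node claims several tokens (virtual nodes) on the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key: str, rf: int = 3):
        """Owner is the first token >= the key's token; walk clockwise for more replicas."""
        if rf > self._node_count:
            raise ValueError("replication factor exceeds number of nodes")
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        owners = []
        while len(owners) < rf:
            node = self._ring[i][1]
            if node not in owners:  # skip further vnodes of an already chosen node
                owners.append(node)
            i = (i + 1) % len(self._ring)
        return owners


ring = Ring(["A", "B", "C", "D"])
print(ring.replicas("Nate", rf=3))  # three distinct nodes; the first is the owner
```

Adding a node inserts many small token ranges scattered around the ring, so it takes load from every existing node rather than from a single neighbor, which is the main motivation for virtual nodes.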
  • 12. Operations: Writes. Requests are sent out to the owning node and its replicas.
  • 13. Operations: Reads. The coordinator node reaches out to the relevant replicas.
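When replicas disagree, the coordinator has to pick an answer. A minimal sketch, assuming last-write-wins on per-cell timestamps (which is how Cassandra reconciles versions) and ignoring the tie-breaking rules for equal timestamps: the newest cell wins, a deletion is modeled as a tombstone with `value=None`, and replicas that returned an older version are candidates for read repair. `Cell`, `reconcile`, and `stale_replicas` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None models a tombstone (a deletion)
    timestamp: int        # write timestamp supplied with the mutation


def reconcile(responses: List[Cell]) -> Cell:
    """Last write wins: the cell with the highest timestamp is the answer."""
    if not responses:
        raise ValueError("no replica responded")
    return max(responses, key=lambda cell: cell.timestamp)


def stale_replicas(responses: List[Cell]) -> List[int]:
    """Indexes of replicas holding an older version (read-repair candidates)."""
    winner = reconcile(responses)
    return [i for i, cell in enumerate(responses) if cell != winner]


replies = [Cell("v1", 100), Cell("v2", 200), Cell("v1", 100)]
print(reconcile(replies).value)  # v2
print(stale_replicas(replies))   # [0, 2]
```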
  • 14. Outline
      • Overview
      • Architecture
      • Data Modeling
          o Internals
          o Cassandra Query Language
          o Modeling Strategy
          o Example
      • Good At/Bad At
      • Using Cassandra
  • 15. C* Data Model: Keyspace
  • 16. C* Data Model: a Keyspace contains Column Families
  • 19. C* Data Model: Row Key
  • 20. C* Data Model: a Row Key identifies Columns; each Column has a Column Name, a Column Value (or Tombstone), a Timestamp, and a Time-to-live
  • 21. C* Data Model
      • A Row Key identifies Columns; each Column has a Column Name, a Column Value (or Tombstone), a Timestamp, and a Time-to-live
      • Row Key, Column Name, and Column Value have types
      • Column Names are ordered by a comparator
      • Row Keys are placed on the ring by the partitioner
      • Rows can have many columns, and rows in the same column family need not share the same columns
      • Column Values can be omitted (the name alone can carry the data)
      • Time-to-live is useful!
      • Tombstones mark deletions
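The per-column bookkeeping on this slide (value or tombstone, timestamp, time-to-live) can be modeled in a few lines. This sketch assumes timestamps and TTLs both in seconds for simplicity (Cassandra write timestamps are conventionally in microseconds) and represents a row as a dictionary, so two rows in the same column family are free to hold different columns. `Column` and `live_columns` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Column:
    name: str
    value: Optional[str]        # None marks a tombstone (deleted column)
    timestamp: int              # write time in seconds (simplified)
    ttl: Optional[int] = None   # seconds until expiry; None means never expires

    def is_live(self, now: int) -> bool:
        """Visible if not a tombstone and its time-to-live has not run out."""
        if self.value is None:
            return False
        if self.ttl is not None and now >= self.timestamp + self.ttl:
            return False
        return True


def live_columns(row: Dict[str, Column], now: int) -> List[str]:
    """Names of visible columns, in sorted order as a comparator would return them."""
    return sorted(name for name, col in row.items() if col.is_live(now))


row = {
    "email": Column("email", "nate@example.com", timestamp=1000),
    "session": Column("session", "abc123", timestamp=1000, ttl=3600),
    "phone": Column("phone", None, timestamp=1200),  # tombstone
}
print(live_columns(row, now=2000))  # ['email', 'session']
print(live_columns(row, now=5000))  # ['email']
```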
  • 22. C* Data Model: Writes
      [Diagram: MemTable, CommitLog, Row Cache, Bloom Filter, Key Caches, SSTables]
      • Append the write to the CommitLog (sequential, for durability) and insert it into the MemTable
      • No read before the write
      • Very fast!
      • Bound by CPU before I/O!
      • MemTables are later flushed to disk as immutable SSTables
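A toy model of the write path, assuming a JSON-lines file as the commit log and a row-count threshold (`memtable_limit`, hypothetical) in place of Cassandra's size-based flush triggers. A write never reads existing data: it appends to the log, updates the in-memory table, and occasionally flushes a sorted snapshot that is never modified again. Truncating the whole log on flush is a simplification of commit log segment recycling, and `TinyStore` is a hypothetical class.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: commit log append, memtable update, periodic flush."""

    def __init__(self, directory: str, memtable_limit: int = 3):
        self.commitlog_path = os.path.join(directory, "commitlog.jsonl")
        self.memtable = {}       # row_key -> {column: value}, mutable, in memory
        self.sstables = []       # flushed snapshots, never modified afterwards
        self.memtable_limit = memtable_limit  # hypothetical row-count flush trigger

    def write(self, row_key: str, column: str, value: str) -> None:
        # 1. Sequential append for durability; no read of existing data.
        with open(self.commitlog_path, "a") as log:
            log.write(json.dumps([row_key, column, value]) + "\n")
        # 2. Update the in-memory table.
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # 3. Persist a key-sorted snapshot as a new SSTable and start over.
        snapshot = {key: dict(self.memtable[key]) for key in sorted(self.memtable)}
        self.sstables.append(snapshot)
        self.memtable = {}
        open(self.commitlog_path, "w").close()  # flushed data needs no replay


with tempfile.TemporaryDirectory() as data_dir:
    store = TinyStore(data_dir, memtable_limit=2)
    store.write("Patricia", "email", "pat@example.com")
    store.write("Nate", "email", "nate@example.com")  # second row triggers a flush
    store.write("Nate", "city", "NYC")
    print(len(store.sstables), store.memtable)  # 1 {'Nate': {'city': 'NYC'}}
```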
  • 25. C* Data Model: Reads
      [Diagram: MemTable, CommitLog, Row Cache, Bloom Filter, Key Caches, SSTables]
      • Get values from the MemTable
      • Get values from the Row Cache if present
      • Otherwise, check the Bloom Filters to find the SSTables that may hold the row
      • Check the Key Cache for a fast SSTable search
      • Get values from those SSTables
      • Repopulate the Row Cache
      • Super-fast column retrieval
      • Fast row slicing
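A sketch of the SSTable-skipping part of the read path, assuming a small Bloom filter built from SHA-256 (bit count and hash count are illustrative, not Cassandra's) and omitting the row and key caches. A Bloom filter can return false positives but never false negatives, so an SSTable whose filter rejects the key is never read. The merge assumes SSTables are passed oldest first and lets newer sources overwrite older ones; real Cassandra merges by cell timestamp instead. All names are hypothetical.

```python
import hashlib


class BloomFilter:
    """'Definitely absent' or 'maybe present'; never a false negative."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def make_sstable(rows):
    """Pair a row dict with a Bloom filter over its row keys."""
    bloom = BloomFilter()
    for row_key in rows:
        bloom.add(row_key)
    return bloom, rows


def read(row_key, memtable, sstables):
    """Merge one row from SSTables (oldest first) and then the memtable."""
    merged = {}
    for bloom, rows in sstables:
        if bloom.might_contain(row_key):  # skip SSTables that cannot hold the key
            merged.update(rows.get(row_key, {}))
    merged.update(memtable.get(row_key, {}))  # newest writes win
    return merged


sstables = [
    make_sstable({"Nate": {"email": "old@example.com", "city": "NYC"}}),
    make_sstable({"Nate": {"email": "new@example.com"}}),
]
memtable = {"Nate": {"phone": "555-0100"}}
print(read("Nate", memtable, sstables))
# {'email': 'new@example.com', 'city': 'NYC', 'phone': '555-0100'}
```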
  • 31. Internals: Twitter Example
      • 4 ColumnFamilies
          o followers
          o following
          o tweets
          o timeline
  • 32. Internals: Twitter Example
      • 4 ColumnFamilies
          o followers
          o following
          o tweets
          o timeline
      • Nate follows Patricia
          o SET followers[Patricia][Nate] = '';
          o SET following[Nate][Patricia] = '';
          o Storing data in column names (not values)
          o Denormalized, redundant!
      • Get all of Patricia's followers
          o GET followers[Patricia]
          o => Nate, Eric, Scott, Matt, Doug, Kate
          o No JOIN!
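The followers example maps directly onto nested dictionaries: a column family is `row key -> {column name -> value}`, and the relationship is written twice so that both "who follows X?" and "whom does X follow?" are single-row lookups. The names below are hypothetical; sorting the column names mimics a UTF-8 comparator for ASCII names.

```python
from collections import defaultdict
from typing import Dict, List

# Each column family: row key -> {column name -> value}.
followers: Dict[str, Dict[str, str]] = defaultdict(dict)
following: Dict[str, Dict[str, str]] = defaultdict(dict)


def follow(follower: str, followee: str) -> None:
    """Write the relationship twice (denormalized) so each query is one row read."""
    followers[followee][follower] = ""  # the data lives in the column name
    following[follower][followee] = ""


def get_followers(user: str) -> List[str]:
    return sorted(followers.get(user, {}))


def get_following(user: str) -> List[str]:
    return sorted(following.get(user, {}))


follow("Nate", "Patricia")
follow("Eric", "Patricia")
follow("Nate", "Eric")
print(get_followers("Patricia"))  # ['Eric', 'Nate']
print(get_following("Nate"))      # ['Eric', 'Patricia']
```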
  • 33. Internals: Twitter Example
      • Nate tweets
          o SET tweets[Nate][2013-07-19 T 09:20] = "Wonderful morning. This coffee is great."
          o SET tweets[Nate][2013-07-19 T 09:21] = "Oops, smoke is coming out of the SQL server!"
          o SET tweets[Nate][2013-07-19 T 09:51] = "Now my coffee is cold :-("
      • Get Nate's tweets
          o GET tweets[Nate] ...(what you'd expect)...
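Because column names are kept sorted by the comparator, using timestamps as column names turns "Nate's tweets between 09:00 and 09:30" into a contiguous slice of one row. A minimal sketch, assuming ISO-8601 strings as column names (they sort lexicographically in time order); `TimeOrderedRow` is a hypothetical class.

```python
import bisect
from typing import Dict, List, Tuple


class TimeOrderedRow:
    """One row whose column names stay sorted, so range slices are cheap."""

    def __init__(self):
        self._names: List[str] = []
        self._values: Dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        if name not in self._values:
            bisect.insort(self._names, name)  # keep names in comparator order
        self._values[name] = value

    def slice(self, start: str, end: str) -> List[Tuple[str, str]]:
        """Columns with start <= name < end, in natural order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_left(self._names, end)
        return [(name, self._values[name]) for name in self._names[lo:hi]]


nate = TimeOrderedRow()
nate.set("2013-07-19T09:51", "Now my coffee is cold :-(")
nate.set("2013-07-19T09:20", "Wonderful morning. This coffee is great.")
nate.set("2013-07-19T09:21", "Oops, smoke is coming out of the SQL server!")
for when, text in nate.slice("2013-07-19T09:00", "2013-07-19T09:30"):
    print(when, text)
```

Insertion order does not matter: the 09:51 tweet was written first, yet slices always come back in time order.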
  • 34. CQL (Cassandra Query Language)
      CREATE TABLE users (
        id timeuuid PRIMARY KEY,
        lastname varchar,
        firstname varchar,
        dateOfBirth timestamp
      );
  • 35. CQL (Cassandra Query Language)
      CREATE TABLE users (
        id timeuuid PRIMARY KEY,
        lastname varchar,
        firstname varchar,
        dateOfBirth timestamp
      );
      INSERT INTO users (id, lastname, firstname, dateofbirth)
        VALUES (now(), 'Berryman', 'John', '1975-09-15');
  • 36. CQL (Cassandra Query Language)
      CREATE TABLE users (
        id timeuuid PRIMARY KEY,
        lastname varchar,
        firstname varchar,
        dateOfBirth timestamp
      );
      INSERT INTO users (id, lastname, firstname, dateofbirth)
        VALUES (now(), 'Berryman', 'John', '1975-09-15');
      UPDATE users SET firstname = 'John'
        WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;
  • 37. CQL (Cassandra Query Language)
      CREATE TABLE users (
        id timeuuid PRIMARY KEY,
        lastname varchar,
        firstname varchar,
        dateOfBirth timestamp
      );
      INSERT INTO users (id, lastname, firstname, dateofbirth)
        VALUES (now(), 'Berryman', 'John', '1975-09-15');
      UPDATE users SET firstname = 'John'
        WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;
      SELECT dateofbirth, firstname, lastname FROM users;

       dateofbirth              | firstname | lastname
      --------------------------+-----------+----------
       1975-09-15 00:00:00-0400 |      John | Berryman
  • 38. The CQL/Cassandra Mapping
      CREATE TABLE employees (
        company text,
        name text,
        age int,
        role text,
        PRIMARY KEY (company, name)
      );
  • 39. The CQL/Cassandra Mapping
      CREATE TABLE employees (
        company text,
        name text,
        age int,
        role text,
        PRIMARY KEY (company, name)
      );

       company | name | age | role
      ---------+------+-----+------
       OSC     | eric |  38 | ceo
       OSC     | john |  37 | dev
       RKG     | anya |  29 | lead
       RKG     | ben  |  27 | dev
       RKG     | chad |  35 | ops
  • 40. The CQL/Cassandra Mapping
       company | name | age | role
      ---------+------+-----+------
       OSC     | eric |  38 | ceo
       OSC     | john |  37 | dev
       RKG     | anya |  29 | lead
       RKG     | ben  |  27 | dev
       RKG     | chad |  35 | ops

      CREATE TABLE employees (
        company text,
        name text,
        age int,
        role text,
        PRIMARY KEY (company, name)
      );

      Stored as one row per company, with column names combining name and field:
       OSC => eric:age = 38 | eric:role = ceo | john:age = 37 | john:role = dev
       RKG => anya:age = 29 | anya:role = lead | ben:age = 27 | ben:role = dev | chad:age = 35 | chad:role = ops
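The mapping on this slide can be sketched as a function: the first PRIMARY KEY column (`company`) picks the storage row, and every other value is stored under a composite column name made of the clustering value (`name`) plus the CQL column name, sorted by clustering value first. This is a simplified model of the layout shown on the slide; it omits any internal bookkeeping cells, and `to_storage` is a hypothetical helper.

```python
from typing import Any, Dict, List


def to_storage(rows: List[Dict[str, Any]], partition_key: str,
               clustering_key: str) -> Dict[Any, Dict[str, Any]]:
    """Group CQL rows into one storage row per partition key value."""
    partitions: Dict[Any, Dict[tuple, Any]] = {}
    for row in rows:
        cells = partitions.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue  # key columns become the row key / column-name prefix
            cells[(row[clustering_key], column)] = value
    # Cells sort by clustering value first, then by column name.
    return {
        pk: {f"{ck}:{col}": v for (ck, col), v in sorted(cells.items(), key=lambda kv: kv[0])}
        for pk, cells in partitions.items()
    }


employees = [
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
    {"company": "RKG", "name": "ben", "age": 27, "role": "dev"},
    {"company": "RKG", "name": "chad", "age": 35, "role": "ops"},
]
storage = to_storage(employees, "company", "name")
print(storage["OSC"])
# {'eric:age': 38, 'eric:role': 'ceo', 'john:age': 37, 'john:role': 'dev'}
```

Sorting on the `(clustering value, column)` tuple rather than on the joined string keeps the order correct even when a clustering value contains characters that sort before `:`.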
  • 41. Modeling Strategy
      • Don't think about the data structure
      • Do think about the questions you'll ask
      • Consider which operations are efficient in Cassandra
          o Writing (4K writes per second per core)
          o Retrieving a row
          o Retrieving a row slice
          o Retrieving in natural order (which you control)
      • Write the data the way you will query it
      • Disk space is cheap
      • Separate read-heavy and write-heavy tasks
          o Make wise use of caches
  • 42. Modeling Strategy: Anti-Patterns
      • Read-then-write
      • Heavy deletes
          o Scatter dead columns (tombstones) throughout SSTables
          o Won't be cleaned up until the first compaction after gc_grace_seconds (default: 10 days)
      • Distributed queue
      • JOIN-like behavior
      • Super-wide-row sneak attack (more than 2 billion columns in a row)
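The heavy-deletes anti-pattern hinges on one rule: a tombstone can be discarded only by a compaction that runs after the grace period, which exists so that repair can carry the delete to every replica first. A simplified sketch, assuming times in seconds and the 10-day default; real compaction additionally checks that the tombstone cannot shadow data in SSTables outside the compaction. `tombstone_purgeable` and `compact` are hypothetical names.

```python
from typing import Dict, Optional

DAY = 86_400
GC_GRACE_SECONDS = 10 * DAY  # 864000 seconds, the default gc_grace_seconds


def tombstone_purgeable(deleted_at: int, compaction_at: int,
                        gc_grace_seconds: int = GC_GRACE_SECONDS) -> bool:
    """A tombstone may be dropped only once the grace period has fully elapsed."""
    return compaction_at > deleted_at + gc_grace_seconds


def compact(cells: Dict[str, Optional[int]], now: int,
            gc_grace_seconds: int = GC_GRACE_SECONDS) -> Dict[str, Optional[int]]:
    """cells maps column name -> deletion time, or None for a live column."""
    return {
        name: deleted_at
        for name, deleted_at in cells.items()
        if deleted_at is None
        or not tombstone_purgeable(deleted_at, now, gc_grace_seconds)
    }


cells = {"a": None, "b": 0, "c": 5 * DAY}
print(compact(cells, now=12 * DAY))  # {'a': None, 'c': 432000}
```

Until that compaction happens, every read that slices across column `c` still has to step over its tombstone, which is why delete-heavy rows slow down reads.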
  • 43. QUESTIONS?