Cassandra Community Webinar: Back to Basics with CQL3
Cassandra is a distributed, massively scalable, fault tolerant, columnar data store, and if you need the ability to make fast writes, the only thing faster than Cassandra is /dev/null! In this fast-paced presentation, we'll briefly describe big data, and the area of big data that Cassandra is designed to fill. We will cover Cassandra's unique, every-node-the-same architecture. We will reveal Cassandra's internal data structure and explain just why Cassandra is so darned fast. Finally, we'll wrap up with a discussion of data modeling using the new standard protocol: CQL (Cassandra Query Language).


Transcript

  • 1. Back to Basics with CQL3. Matt Overstreet, OpenSource Connections
  • 2. Outline
      • Overview
      • Architecture
      • Data Modeling
      • Good At/Bad At
      • Using Cassandra
  • 3. Outline
      • Overview
          o What is Big Data?
          o How does Cassandra fit?
      • Architecture
      • Data Modeling
      • Good At/Bad At
      • Using Cassandra
  • 4. What is Big Data?
      • The three V's (and a C): Velocity, Volume, Variety, Complexity
  • 5. What is Big Data?
      • Brewer's CAP theorem
          o Consistency: all nodes have the same world view
          o Availability: requests can be serviced
          o Partition tolerance: network/machine failure
          o Can't have all 3. Pick 2!
      • Examples
          o MySQL: Consistent, Available
          o HBase: Consistent, Partition Tolerant
          o Cassandra: Available, Partition Tolerant, and "Tunably Consistent"!
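"Tunably consistent" on slide 5 can be read as: each request chooses how many replicas must respond. A minimal sketch of the usual overlap rule (reads see the latest write when R + W > N) follows; the function names are hypothetical and not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Size of a majority of the replicas for one row."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


# With 3 replicas, quorum writes plus quorum reads overlap on at least one node.
n = 3
print(quorum(n), reads_see_latest_write(n, quorum(n), quorum(n)))
# Writing to one replica and reading from one favors availability and latency instead.
print(reads_see_latest_write(n, 1, 1))
```

Lower read and write counts trade consistency for availability; higher counts do the reverse.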
  • 6. What is Big Data?
      • Common theme: Denormalize everything!
          o What's that?
              • JOIN all the tables in the database...
              • ... well, not all the tables
          o Why?
              • You can shard the database at any point
              • All related data is co-located
      • What this means for you
          o No joins
          o No transactions: potential for inconsistency
          o Vastly simplified querying
          o No data-modeling. Instead, query-modeling
          o "Infinite and easy" scaling potential
  • 7. How Does Cassandra Fit?
      • No single point of failure
      • Optimized for writes, still good with reads
      • Can decide between Consistency and Availability concerns
  • 8. Outline
      • Overview
      • Architecture
          o Ring architecture
          o Data partitioning
          o Operations: Writes, Reads
      • Data Modeling
      • Good At/Bad At
      • Using Cassandra
  • 9. Ring Architecture
      • No single point of failure
      • Nodes talk via gossip
      • Democratic: all nodes are equal
  • 10. Data Partitioning: original partitioning method.
  • 11. Data Partitioning: flexible partitioning with virtual nodes.
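The partitioning slides can be illustrated with a toy consistent-hashing ring in which each physical node claims several tokens (virtual nodes). This is only a sketch: the `Ring` class and `token` function are hypothetical, and MD5 is an assumed stand-in for whatever partitioner a real cluster is configured with.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 is an assumption made for this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node claims several tokens (virtual nodes) on the ring.
        self._entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, row_key, replication_factor=3):
        """Walk clockwise from the key's token, collecting distinct physical nodes."""
        start = bisect.bisect_left(self._tokens, token(row_key))
        found = []
        for offset in range(len(self._entries)):
            _, node = self._entries[(start + offset) % len(self._entries)]
            if node not in found:
                found.append(node)
            if len(found) == replication_factor:
                break
        return found


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("Nate"))
```

Because every node holds many small token ranges, adding or removing a node moves a little data from many peers rather than a lot of data from one neighbor.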
  • 12. Operations: Writes. Requests are sent out to nodes and replicas.
  • 13. Operations: Reads. The coordinator node reaches out to the relevant replicas.
  • 14. Outline
      • Overview
      • Architecture
      • Data Modeling
          o Internals
          o Cassandra Query Language
          o Modeling Strategy
          o Example
      • Good At/Bad At
      • Using Cassandra
  • 15. C* Data Model: Keyspace
  • 16. C* Data Model: Keyspace; Column Family; Column Family
  • 19. C* Data Model: Row Key
  • 20. C* Data Model: Row Key; Column (Column Name, Column Value (or Tombstone), Timestamp, Time-to-live)
  • 21. C* Data Model: Row Key; Column (Column Name, Column Value (or Tombstone), Timestamp, Time-to-live)
      ● Row Key, Column Name, Column Value have types
      ● Column Name has comparator
      ● Row Key has partitioner
      ● Rows can have any number of columns, even in the same column family
      ● Rows can have many columns
      ● Column Values can be omitted
      ● Time-to-live is useful!
      ● Tombstones
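The column fields listed on slides 20 and 21 (name, value or tombstone, timestamp, time-to-live) can be modeled roughly as below. This is a simplified sketch, not Cassandra's implementation: the `Column` class and `reconcile` function are hypothetical, timestamps and TTLs share one unit (seconds) for readability, and ties simply go to the first argument.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: Optional[str]        # None stands for a tombstone (deleted column)
    timestamp: int              # write time, in seconds for this sketch
    ttl: Optional[int] = None   # seconds to live; None means no expiry

    def is_live(self, now: int) -> bool:
        """A column is readable if it is not a tombstone and has not expired."""
        if self.value is None:
            return False
        return self.ttl is None or now < self.timestamp + self.ttl


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep whichever version carries the newer timestamp."""
    return a if a.timestamp >= b.timestamp else b


old = Column("role", "dev", timestamp=100)
new = Column("role", "ops", timestamp=200)
print(reconcile(old, new).value)
```

A delete is just another write with a newer timestamp, which is why tombstones have to be kept around for a while instead of being dropped immediately.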
  • 22. C* Data Model: Writes
      (Diagram labels: MemTable, CommitLog, Row Cache, Bloom Filter, Key Cache ×4, SSTable ×4)
      ● Insert into MemTable
      ● Append to CommitLog
      ● No read
      ● Very Fast!
      ● Blocks on CPU before I/O!
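The write path on slide 22 can be sketched as a toy single-node engine: every write is appended to a log and applied to an in-memory table, and a full memtable is flushed to an immutable SSTable. The `Node` class and its `memtable_limit` parameter are hypothetical, and the read here merges tables oldest to newest instead of comparing per-column timestamps as a real node would.

```python
class Node:
    """Toy storage engine: commit log, memtable, immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only, used for crash recovery
        self.memtable = {}        # in memory: row key -> {column name: value}
        self.sstables = []        # flushed, immutable snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value):
        # A write never reads existing data: append to the log, update memory.
        self.commit_log.append((row_key, column, value))
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable SSTable and start fresh.
        self.sstables.append({k: dict(v) for k, v in sorted(self.memtable.items())})
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs replay

    def read(self, row_key):
        # Merge oldest to newest so that later writes override earlier ones.
        merged = {}
        for table in self.sstables + [self.memtable]:
            merged.update(table.get(row_key, {}))
        return merged


node = Node(memtable_limit=2)
node.write("Nate", "age", 30)
node.write("Kate", "age", 28)   # second row fills the memtable and triggers a flush
node.write("Nate", "age", 31)   # newer value lives only in the memtable
print(node.read("Nate"))
```

Nothing in `write` touches an SSTable, which is the "No read" bullet in miniature: the cost of reconciling versions is deferred to reads and compaction.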
  • 25. C* Data Model: Reads
      (Diagram labels: MemTable, CommitLog, Row Cache, Bloom Filter, Key Cache ×4, SSTable ×4)
      ● Get values from MemTable
      ● Get values from Row Cache if present
      ● Otherwise check Bloom Filter to find appropriate SSTables
      ● Check Key Cache for fast SSTable search
      ● Get values from SSTables
      ● Repopulate Row Cache
      ● Super fast column retrieval
      ● Fast row slicing
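The Bloom filter step on the reads slide lets a node skip SSTables that cannot contain a row key. The following is a generic Bloom filter sketch, not Cassandra's implementation; the class name, the sizes, and the use of salted SHA-256 are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("Nate")
if sstable_filter.might_contain("Nate"):
    print("worth reading this SSTable")
```

A false positive costs one wasted SSTable lookup; a false negative cannot happen, so no data is ever missed.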
  • 31. Internals: Twitter Example
      • 4 ColumnFamilies
          o followers
          o following
          o tweets
          o timeline
  • 32. Internals: Twitter Example
      • 4 ColumnFamilies
          o followers
          o following
          o tweets
          o timeline
      • Nate follows Patricia
          o SET followers[Patricia][Nate] = '';
          o SET following[Nate][Patricia] = '';
          o storing data in column names (not values)
          o denormalized, redundant!
      • Get all Patricia's followers
          o GET followers[Patricia]
          o => Nate, Eric, Scott, Matt, Doug, Kate
          o No JOIN!
  • 33. Internals: Twitter Example
      • Nate tweets
          o SET tweets[Nate][2013-07-19T09:20] = "Wonderful morning. This coffee is great."
          o SET tweets[Nate][2013-07-19T09:21] = "Oops, smoke is coming out of the SQL server!"
          o SET tweets[Nate][2013-07-19T09:51] = "Now my coffee is cold :-("
      • Get Nate's tweets
          o GET tweets[Nate]
          o ...(what you'd expect)...
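The Twitter example stores each column family as row key, then column name, then value, and puts the interesting data in the column names. A rough in-memory model is sketched below; the helper names `follow` and `get_row` are hypothetical, and sorting by column name stands in for the comparator.

```python
from collections import defaultdict

# column family: row key -> {column name: column value}
followers = defaultdict(dict)
following = defaultdict(dict)
tweets = defaultdict(dict)


def follow(user, target):
    # Denormalized: the same fact is written twice, once per query direction.
    followers[target][user] = ""
    following[user][target] = ""


def get_row(column_family, row_key):
    # Columns come back ordered by column name, as a comparator would order them.
    return sorted(column_family.get(row_key, {}).items())


follow("Nate", "Patricia")
follow("Eric", "Patricia")
tweets["Nate"]["2013-07-19T09:21"] = "Oops, smoke is coming out of the SQL server!"
tweets["Nate"]["2013-07-19T09:20"] = "Wonderful morning. This coffee is great."

print([name for name, _ in get_row(followers, "Patricia")])
print([ts for ts, _ in get_row(tweets, "Nate")])
```

Using timestamps as column names is what makes a user's tweets come back in time order from a single row read, with no JOIN and no sort at query time.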
  • 34. CQL (Cassandra Query Language)
      CREATE TABLE users (
        id timeuuid PRIMARY KEY,
        lastname varchar,
        firstname varchar,
        dateOfBirth timestamp
      );
  • 37. CQL (Cassandra Query Language)
      CREATE TABLE users (
        id timeuuid PRIMARY KEY,
        lastname varchar,
        firstname varchar,
        dateOfBirth timestamp
      );

      INSERT INTO users (id, lastname, firstname, dateofbirth)
      VALUES (now(), 'Berryman', 'John', '1975-09-15');

      UPDATE users SET firstname = 'John'
      WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;

      SELECT dateofbirth, firstname, lastname FROM users;

       dateofbirth              | firstname | lastname
      --------------------------+-----------+----------
       1975-09-15 00:00:00-0400 | John      | Berryman
  • 38. The CQL/Cassandra Mapping
      CREATE TABLE employees (
        company text,
        name text,
        age int,
        role text,
        PRIMARY KEY (company, name)
      );
  • 39. The CQL/Cassandra Mapping
      CREATE TABLE employees (
        company text,
        name text,
        age int,
        role text,
        PRIMARY KEY (company, name)
      );

       company | name | age | role
      ---------+------+-----+------
       OSC     | eric |  38 | ceo
       OSC     | john |  37 | dev
       RKG     | anya |  29 | lead
       RKG     | ben  |  27 | dev
       RKG     | chad |  35 | ops
  • 40. The CQL/Cassandra Mapping
       company | name | age | role
      ---------+------+-----+------
       OSC     | eric |  38 | ceo
       OSC     | john |  37 | dev
       RKG     | anya |  29 | lead
       RKG     | ben  |  27 | dev
       RKG     | chad |  35 | ops

      CREATE TABLE employees (
        company text,
        name text,
        age int,
        role text,
        PRIMARY KEY (company, name)
      );

      Internal storage (one wide row per company):
       OSC | eric:age = 38 | eric:role = ceo  | john:age = 37 | john:role = dev
       RKG | anya:age = 29 | anya:role = lead | ben:age = 27  | ben:role = dev | chad:age = 35 | chad:role = ops
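The mapping on slide 40 can be reproduced with a few lines of Python: rows that share the first primary-key column land in one internal row, and the second primary-key column becomes a prefix of each stored column name. The function `to_wide_rows` is a hypothetical helper for illustration, not a Cassandra API.

```python
def to_wide_rows(rows, partition_key, clustering_key):
    """Group CQL rows by partition key; prefix each cell name with the clustering value."""
    storage = {}
    for row in rows:
        cells = storage.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue
            cells[f"{row[clustering_key]}:{column}"] = value
    return storage


employees = [
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
    {"company": "RKG", "name": "ben", "age": 27, "role": "dev"},
    {"company": "RKG", "name": "chad", "age": 35, "role": "ops"},
]

wide = to_wide_rows(employees, partition_key="company", clustering_key="name")
for company, cells in wide.items():
    print(company, cells)
```

Five CQL rows become two internal rows, which is why a query for all employees of one company is a single row read.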
  • 41. Modeling Strategy
      • Don't think about the data structure
      • Do think of the questions you'll ask
      • Consider efficient operations for Cassandra
          o Writing (4K writes per second per core)
          o Retrieving a row
          o Retrieving a row slice
          o Retrieving in natural order (which you control)
      • Write the data in the way you will query it
      • Disk space is cheap
      • Separate read-heavy and write-heavy tasks
          o Make wise use of caches
  • 42. Modeling Strategy: Anti-Patterns
      • Read-then-write
      • Heavy deletes
          o Scatters dead columns throughout SSTables
          o Won't be corrected until the first compaction after gc_grace_seconds (10 days by default)
      • Distributed queue
      • JOIN-like behavior
      • Super wide-row sneak attack (>2B columns)
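The "heavy deletes" anti-pattern follows from how tombstones are purged: a deleted column stays on disk as a marker until a compaction runs after the grace period has passed. A simplified sketch is given below; the `compact` function and its data layout are hypothetical, and only the 10-day figure comes from the slide.

```python
GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 10 days, the value quoted on the slide


def compact(cells, now):
    """Drop tombstones older than the grace period; keep everything else.

    `cells` maps column name -> (value, write_time); a value of None marks a tombstone.
    """
    kept = {}
    for name, (value, write_time) in cells.items():
        expired_tombstone = value is None and now - write_time > GC_GRACE_SECONDS
        if not expired_tombstone:
            kept[name] = (value, write_time)
    return kept


DAY = 24 * 60 * 60
cells = {
    "a": ("live value", 0),
    "b": (None, 0),          # deleted on day 0
    "c": (None, 5 * DAY),    # deleted on day 5
}
print(compact(cells, now=11 * DAY))
```

Until that compaction happens, every read of the row still has to scan past the dead columns, so a delete-heavy workload pays for its deletes at read time.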
  • 43. QUESTIONS?