Back to Basics with CQL3
Matt Overstreet
OpenSource Connections

Cassandra Community Webinar: Back to Basics with CQL3

Cassandra is a distributed, massively scalable, fault tolerant, columnar data store, and if you need the ability to make fast writes, the only thing faster than Cassandra is /dev/null! In this fast-paced presentation, we'll briefly describe big data, and the area of big data that Cassandra is designed to fill. We will cover Cassandra's unique, every-node-the-same architecture. We will reveal Cassandra's internal data structure and explain just why Cassandra is so darned fast. Finally, we'll wrap up with a discussion of data modeling using the new standard protocol: CQL (Cassandra Query Language).

Published in: Technology
  • In Cassandra 1.1?
  • Cassandra Community Webinar: Back to Basics with CQL3

    1. Back to Basics with CQL3. Matt Overstreet, OpenSource Connections
    2. Outline
       • Overview
       • Architecture
       • Data Modeling
       • Good At/Bad At
       • Using Cassandra
    3. Outline: Overview
       • What is Big Data?
       • How does Cassandra fit?
    4. What is Big Data?
       • The three V’s (and a C)
         o Velocity
         o Volume
         o Variety
         o Complexity
    5. What is Big Data?
       • Brewer’s CAP theorem
         o Consistency: all nodes have the same world view
         o Availability: requests can be serviced
         o Partition tolerance: the system survives network/machine failure
         o Can’t have all 3. Pick 2!
       • Examples
         o MySQL: Consistent, Available
         o HBase: Consistent, Partition Tolerant
         o Cassandra: Available, Partition Tolerant, and “Tunably Consistent”!
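One common way to read "tunably consistent" is the replica-overlap rule: with N replicas, a read that contacts R of them is guaranteed to see the latest write acknowledged by W of them when R + W > N. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return r + w > n


n = 3  # assumed replication factor for the example
print(quorum(n))                                         # 2
print(read_sees_latest_write(n, r=1, w=1))               # False: fast, may be stale
print(read_sees_latest_write(n, quorum(n), quorum(n)))   # True: quorum reads and writes
```

Lower R and W favor availability and latency; higher values favor consistency, which is the trade-off the slide refers to.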
    6. What is Big Data?
       • Common theme: Denormalize everything!
         o What’s that?
           • JOIN all the tables in the database...
           • … well, not all the tables
         o Why?
           • You can shard the database at any point
           • All related data is co-located
       • What this means for you
         o No joins
         o No transactions (potential for inconsistency)
         o Vastly simplified querying
         o No data-modeling. Instead, query-modeling
         o “Infinite and easy” scaling potential
    7. How Does Cassandra Fit?
       • No single point of failure
       • Optimized for writes, still good with reads
       • Can decide between Consistency and Availability concerns
    8. Outline: Architecture
       • Ring architecture
       • Data partitioning
       • Operations
         o Writes
         o Reads
    9. Ring Architecture
       • No single point of failure
       • Nodes talk via gossip
       • Democratic: all nodes are equal
    10. Data Partitioning: original partitioning method.
    11. Data Partitioning: flexible partitioning with virtual nodes.
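The two partitioning slides can be approximated with a small hash ring. This is a minimal sketch under stated assumptions: MD5 is only a stand-in partitioner, tokens are derived by hashing hypothetical node names (real clusters assign or randomize tokens), and the `Ring` class is invented for illustration.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5 here is only a stand-in partitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes_per_node=1):
        # Each node claims `vnodes_per_node` positions on the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, row_key: str) -> str:
        """First node position at or after the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return self._ring[i][1]


nodes = ["node-a", "node-b", "node-c"]
single = Ring(nodes)                          # one token per node
virtual = Ring(nodes, vnodes_per_node=64)     # many small ranges per node
print(single.owner("Nate"), virtual.owner("Nate"))
```

With one token per node, each node owns a single large range. With many virtual nodes, each node owns many small ranges, so adding or removing a node moves small slices of data from many peers instead of one big slice from a neighbor.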
    12. Operations: Writes. Requests are sent out to nodes and their replicas.
    13. Operations: Reads. The coordinator node reaches out to the relevant replicas.
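A coordinator read can be sketched as "ask some replicas, keep the newest answer". The `Replica` class and `coordinator_read` function below are hypothetical names for illustration; the sketch assumes conflicts are resolved by the write timestamp, and it ignores networking, failures, and read repair.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    timestamp: int  # write time supplied with the write


class Replica:
    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep only the newest version of each key.
        if current is None or timestamp > current.timestamp:
            self.data[key] = Versioned(value, timestamp)

    def read(self, key):
        return self.data.get(key)


def coordinator_read(replicas, key, consistency_level):
    """Ask `consistency_level` replicas and keep the newest answer."""
    answers = [r.read(key) for r in replicas[:consistency_level]]
    answers = [a for a in answers if a is not None]
    return max(answers, key=lambda a: a.timestamp).value if answers else None


replicas = [Replica(), Replica(), Replica()]
for r in replicas:
    r.write("coffee", "hot", timestamp=1)
replicas[1].write("coffee", "cold", timestamp=2)   # only one replica has the update

print(coordinator_read(replicas, "coffee", consistency_level=1))  # hot
print(coordinator_read(replicas, "coffee", consistency_level=2))  # cold
```

Contacting more replicas costs latency but raises the chance of seeing the most recent write.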
    14. Outline: Data Modeling
        • Internals
        • Cassandra Query Language
        • Modeling Strategy
        • Example
    15. C* Data Model: Keyspace
    16-18. C* Data Model: a Keyspace holds Column Families
    19. C* Data Model: Row Key
    20. C* Data Model: a Row Key identifies a row of Columns. Each Column has:
        • Column Name
        • Column Value (or Tombstone)
        • Timestamp
        • Time-to-live
    21. C* Data Model (continued)
        ● Row Key, Column Name, Column Value have types
        ● Column Name has comparator
        ● RowKey has partitioner
        ● Rows can have any number of columns, even in the same column family
        ● Rows can have many columns
        ● Column Values can be omitted
        ● Time-to-live is useful!
        ● Tombstones
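The column structure described above can be sketched as a small record type. This is an illustrative model only: the `Column` class, the use of `None` as a tombstone marker, and timestamps in whole seconds are assumptions of the sketch, not Cassandra's storage format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    value: Optional[str]        # None stands for a tombstone (deleted column)
    timestamp: int              # write time, in seconds for this sketch
    ttl: Optional[int] = None   # time-to-live in seconds, None = never expires

    def is_live(self, now: int) -> bool:
        if self.value is None:
            return False
        return self.ttl is None or now < self.timestamp + self.ttl


columns = [
    Column("session", "abc", timestamp=100, ttl=60),   # expires at t=160
    Column("email", None, timestamp=90),               # tombstone
    Column("city", "x", timestamp=80),
]
# Columns within a row are kept ordered by name (the role of the comparator).
row = sorted(columns, key=lambda c: c.name)
print([c.name for c in row])                        # ['city', 'email', 'session']
print([c.name for c in row if c.is_live(now=200)])  # ['city']
```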
    22-24. C* Data Model: Writes
        Diagram components: Mem Table, CommitLog, Row Cache, Bloom Filter, Key Cache (×4), SSTable (×4)
        ● Insert into MemTable
        ● Dump to CommitLog
        ● No read
        ● Very Fast!
        ● Blocks on CPU before I/O!
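The write path on these slides can be sketched as an append to a log plus an in-memory update, with an occasional flush to an immutable table. The `Node` class and `memtable_limit` parameter are hypothetical; in this sketch the log append happens before the memory update, and plain Python lists and dicts stand in for files on disk.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only, for crash recovery
        self.memtable = {}        # in-memory, most recent writes
        self.sstables = []        # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit

    def write(self, row_key, column, value):
        # Writes only append and update memory; nothing is read from disk.
        self.commit_log.append((row_key, column, value))
        self.memtable.setdefault(row_key, {})[column] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable SSTable.
        sstable = {k: dict(sorted(v.items())) for k, v in sorted(self.memtable.items())}
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log.clear()   # flushed data no longer needs replaying


node = Node()
node.write("Patricia", "2013-07-19T09:25", "hello")
node.write("Nate", "2013-07-19T09:20", "Wonderful morning.")
print(len(node.sstables), node.memtable)   # 1 {}
print(list(node.sstables[0]))              # ['Nate', 'Patricia']
```

Because a write never has to read existing data first, its cost is dominated by a sequential append and a memory update, which is why the slide calls it very fast.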
    25-30. C* Data Model: Reads
        Diagram components: Mem Table, CommitLog, Row Cache, Bloom Filter, Key Cache (×4), SSTable (×4)
        ● Get values from Memtable
        ● Get values from row cache if present
        ● Otherwise check bloom filter to find appropriate SSTables
        ● Check Key Cache for fast SSTable search
        ● Get values from SSTables
        ● Repopulate Row Cache
        ● Super fast column retrieval
        ● Fast row slicing
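The read steps listed above can be sketched as follows. This is a simplified model under stated assumptions: the `BloomFilter`, `SSTable`, and `read` names are hypothetical, the key cache is omitted, and newer data wins by merge order rather than by per-column timestamp.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=256, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


def read(row_key, memtable, row_cache, sstables):
    if row_key in row_cache:
        on_disk = row_cache[row_key]
    else:
        on_disk = {}
        for table in sstables:                       # oldest first
            if table.bloom.might_contain(row_key):   # skip tables that cannot hold the key
                on_disk.update(table.rows.get(row_key, {}))
        row_cache[row_key] = on_disk                 # repopulate the row cache
    # Values still in the memtable are the most recent, so they win.
    return {**on_disk, **memtable.get(row_key, {})}


old = SSTable({"Nate": {"09:20": "Wonderful morning."}})
new = SSTable({"Patricia": {"10:00": "hello"}})
cache = {}
memtable = {"Nate": {"09:51": "Now my coffee is cold :-("}}
result = read("Nate", memtable, cache, [old, new])
print(result)
```

The Bloom filter never gives a false "absent", so skipping a table on a negative answer is safe; a false "present" only costs one wasted lookup.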
    31. Internals: Twitter Example
        • 4 ColumnFamilies
          o followers
          o following
          o tweets
          o timeline
    32. Internals: Twitter Example
        • Nate follows Patricia
          o SET followers[Patricia][Nate] = '';
          o SET following[Nate][Patricia] = '';
          o storing data in column names (not values)
          o denormalized, redundant!
        • Get all Patricia’s followers
          o GET followers[Patricia]
          o => Nate,Eric,Scott,Matt,Doug,Kate
          o No JOIN!
    33. Internals: Twitter Example
        • Nate tweets
          o SET tweets[Nate][2013-07-19 T 09:20] = “Wonderful morning. This coffee is great.”
          o SET tweets[Nate][2013-07-19 T 09:21] = “Oops, smoke is coming out of the SQL server!”
          o SET tweets[Nate][2013-07-19 T 09:51] = “Now my coffee is cold :-(”
        • Get Nate’s tweets
          o GET tweets[Nate] …(what you’d expect)...
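The SET/GET notation in the Twitter example behaves like nested maps: column family, then row key, then column name. A minimal sketch with plain dictionaries follows; the helper names `follow` and `tweet` are hypothetical, and the `timeline` column family is left out.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
followers = defaultdict(dict)
following = defaultdict(dict)
tweets = defaultdict(dict)


def follow(user, target):
    # Two writes, no read: the relationship is stored once per question asked.
    followers[target][user] = ""
    following[user][target] = ""


def tweet(user, when, text):
    # The timestamp is the column name, so columns sort into time order.
    tweets[user][when] = text


follow("Nate", "Patricia")
follow("Eric", "Patricia")
tweet("Nate", "2013-07-19T09:21", "Oops, smoke is coming out of the SQL server!")
tweet("Nate", "2013-07-19T09:20", "Wonderful morning. This coffee is great.")

print(sorted(followers["Patricia"]))                          # ['Eric', 'Nate']
print([tweets["Nate"][t] for t in sorted(tweets["Nate"])])    # oldest first
```

Each question ("who follows Patricia?", "whom does Nate follow?") is answered by reading a single row, which is the point of storing the same fact twice.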
    34. CQL (Cassandra Query Language)
        CREATE TABLE users (
          id timeuuid PRIMARY KEY,
          lastname varchar,
          firstname varchar,
          dateOfBirth timestamp
        );
    35. CQL (continued)
        INSERT INTO users (id, lastname, firstname, dateofbirth)
        VALUES (now(), 'Berryman', 'John', '1975-09-15');
    36. CQL (continued)
        UPDATE users SET firstname = 'John'
        WHERE id = f74c0b20-0862-11e3-8cf6-b74c10b01fc6;
    37. CQL (continued)
        SELECT dateofbirth, firstname, lastname FROM users;

         dateofbirth              | firstname | lastname
        --------------------------+-----------+----------
         1975-09-15 00:00:00-0400 | John      | Berryman
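The `timeuuid` produced by `now()` is a version 1 (time-based) UUID, so the creation time can be recovered from the id itself. The sketch below uses only Python's standard `uuid` module; the helper name `timeuuid_to_datetime` is hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUIDs count 100-nanosecond intervals from this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: str) -> datetime:
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is in 100 ns units; convert to microseconds for timedelta.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


created = timeuuid_to_datetime("f74c0b20-0862-11e3-8cf6-b74c10b01fc6")
print(created.date())   # 2013-08-19
```

Because the time is embedded in the key, a `timeuuid` gives unique ids that also carry their creation time.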
    38. The CQL/Cassandra Mapping
        CREATE TABLE employees (
          company text,
          name text,
          age int,
          role text,
          PRIMARY KEY (company, name)
        );
    39. The CQL/Cassandra Mapping: the table as CQL presents it
         company | name | age | role
        ---------+------+-----+------
         OSC     | eric |  38 | ceo
         OSC     | john |  37 | dev
         RKG     | anya |  29 | lead
         RKG     | ben  |  27 | dev
         RKG     | chad |  35 | ops
    40. The CQL/Cassandra Mapping: the same data as stored internally, one wide row per company
         Row Key | eric:age | eric:role | john:age | john:role
         OSC     | 38       | ceo       | 37       | dev

         Row Key | anya:age | anya:role | ben:age | ben:role | chad:age | chad:role
         RKG     | 29       | lead      | 27      | dev      | 35       | ops
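The mapping from CQL rows to wide internal rows can be sketched as a grouping step: the first primary-key column becomes the row key, and the second is folded into each internal column name. The function name `to_wide_rows` is hypothetical, and dictionaries stand in for storage.

```python
def to_wide_rows(rows, partition_key, clustering_key):
    """Group CQL-style rows into one wide row per partition key."""
    wide = {}
    for row in rows:
        cells = wide.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column not in (partition_key, clustering_key):
                # Internal column name = clustering value + ":" + CQL column name.
                cells[f"{row[clustering_key]}:{column}"] = value
    return wide


employees = [
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
]
wide = to_wide_rows(employees, "company", "name")
print(wide["OSC"])
# {'eric:age': 38, 'eric:role': 'ceo', 'john:age': 37, 'john:role': 'dev'}
```

All employees of one company land in the same internal row, so a query by company reads one row, and names within it are stored together.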
    41. Modeling Strategy
        • Don’t think about the data structure
        • Do think of the questions you’ll ask
        • Consider efficient operations for Cassandra
          o Writing (4K writes per second per core)
          o Retrieving a row
          o Retrieving a row slice
          o Retrieving in natural order (which you control)
        • Write the data in the way you will query it
        • Disk space is cheap
        • Separate read-heavy and write-heavy tasks
          o Make wise use of caches
    42. Modeling Strategy: Anti-Patterns
        • Read-then-write
        • Heavy deletes
          o Scatters dead columns throughout SSTables
          o Won’t be corrected until the first compaction after gc_grace_seconds (10 days by default)
        • Distributed queue
        • JOIN-like behavior
        • Super wide-row sneak attack (>2B columns)
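The cost of heavy deletes comes from tombstones lingering until a compaction runs after the grace period. The sketch below models only that rule; the `compact` function and its data layout are hypothetical, and `None` stands in for a tombstone.

```python
GC_GRACE_SECONDS = 10 * 24 * 60 * 60   # 864000, the 10 days quoted on the slide


def compact(columns, now, gc_grace_seconds=GC_GRACE_SECONDS):
    """Keep live columns and tombstones that are still inside the grace period.

    `columns` maps column name -> (value, timestamp); value None marks a tombstone.
    """
    kept = {}
    for name, (value, timestamp) in columns.items():
        expired_tombstone = value is None and now - timestamp > gc_grace_seconds
        if not expired_tombstone:
            kept[name] = (value, timestamp)
    return kept


day = 24 * 60 * 60
columns = {
    "a": ("live", 0),
    "b": (None, 0),          # deleted 11 days before `now`
    "c": (None, 9 * day),    # deleted 2 days before `now`
}
print(sorted(compact(columns, now=11 * day)))   # ['a', 'c']
```

Until a tombstone is purged, reads still have to scan past it, which is why delete-heavy workloads such as queues degrade over time.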
    43. QUESTIONS?