A gentle introduction by @przemur from
Tuesday, October 22, 13
PERFORMANCE

• http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
• http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
• http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html
• http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/

A TAXONOMY OF DISTRIBUTED DATABASES

The same data, employees John Smith (ID 1) and Mike Kowalski (ID 2) of the company ACME, in each family of databases:

• Relational (MySQL, Oracle, ...): a table with columns ID, FIRST, LAST and rows 1, John, Smith and 2, Mike, Kowalski
• Key-Value (Redis, Riak, Dynamo, ...): :name_1 -> “John Smith”, :name_2 -> “Mike Kowalski”
• Wide Column (BigTable, Cassandra, HBase, ...): row ACME with columns Employee:1:Name -> John Smith and Employee:2:Name -> Mike Kowalski
• Document (MongoDB, Couchbase, ...): Employee documents {Name: John Smith, ID: 1} and {Name: Mike Kowalski, ID: 2}
• Graph (Neo4j, ...): John Smith “works with” Mike Kowalski

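To make the taxonomy concrete, here is a minimal sketch that models the slide's two employees in each family with plain Python structures. The shapes are illustrative assumptions (no real database API is involved); the point is that each model reaches the same answer through a different access path.

```python
# Hypothetical illustration: one plain structure per database family from the slide.
relational = [  # Relational: rows of a table with columns ID, FIRST, LAST
    (1, "John", "Smith"),
    (2, "Mike", "Kowalski"),
]
key_value = {":name_1": "John Smith", ":name_2": "Mike Kowalski"}  # Key-Value
wide_column = {  # Wide Column: row key -> {column name: value}
    "ACME": {"Employee:1:Name": "John Smith", "Employee:2:Name": "Mike Kowalski"},
}
documents = [  # Document: self-describing records
    {"Name": "John Smith", "ID": 1},
    {"Name": "Mike Kowalski", "ID": 2},
]
graph_edges = [("John Smith", "works with", "Mike Kowalski")]  # Graph: (from, relation, to)

# "Who is employee 2?" (for the graph: "who works with John Smith?")
answers = [
    next(f"{first} {last}" for emp_id, first, last in relational if emp_id == 2),
    key_value[":name_2"],
    wide_column["ACME"]["Employee:2:Name"],
    next(doc["Name"] for doc in documents if doc["ID"] == 2),
    next(dst for src, rel, dst in graph_edges if src == "John Smith" and rel == "works with"),
]
print(answers)
# prints ['Mike Kowalski', 'Mike Kowalski', 'Mike Kowalski', 'Mike Kowalski', 'Mike Kowalski']
```

A relational query filters rows, a key-value store needs the exact key, a wide-column store needs the row key plus a column name, and a graph follows a named relationship.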
Consistency, Availability, Partition tolerance: “Pick any two” (and have acceptable latency)

• Consistency + Availability: RDBMSs
• Consistency + Partition tolerance, immediate consistency: HBase, ...
• Availability + Partition tolerance, eventual consistency: Cassandra, Riak, ...
• Configurable: MongoDB, Cassandra (to some extent), ...

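OH REALLY?

• Cassandra vs. Consistency: http://aphyr.com/posts/294-call-me-maybe-cassandra
• CAP criticism:
  http://aphyr.com/posts/292-call-me-maybe-nuodb
  http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  http://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/aslett%20cap%20theorem.pdf
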
KEY IDEAS

• Dynamo partitioning + BigTable model
• simple architecture, minimal administration
• no single point of failure
• closer to the metal (e.g. no HDFS)
• low latency

CASSANDRA’S DATA MODEL

• Keyspace ≈ “Database”
• Column Family ≈ “Table”
• Row (Partition) Key ≈ “Primary ID”
• Column Name ≈ sorted “Column”
• Value ≈ “Value”

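The analogies above can be read as nested maps. The sketch below is a conceptual model only, assuming the nesting keyspace -> column family -> row (partition) key -> column name -> value, with a row's columns returned sorted by name; the keyspace "library" and column family "books" are hypothetical, and this is not how Cassandra physically stores data.

```python
from collections import defaultdict

def make_store():
    """keyspace -> column family -> row (partition) key -> {column name: value}."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

def insert(store, keyspace, column_family, row_key, column, value):
    store[keyspace][column_family][row_key][column] = value

def get_row(store, keyspace, column_family, row_key):
    """Return a row's columns as (name, value) pairs, sorted by column name."""
    return sorted(store[keyspace][column_family][row_key].items())

store = make_store()
# "library" and "books" are hypothetical keyspace / column family names
insert(store, "library", "books", "George Orwell", "year", 1949)
insert(store, "library", "books", "George Orwell", "title", "1984")
print(get_row(store, "library", "books", "George Orwell"))
# prints [('title', '1984'), ('year', 1949)]
```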
PARTITIONING

TWO PARTITIONERS OUT OF THE BOX

• Byte Ordered Partitioner: forget it (hot spots, uneven distribution, difficult load balancing)
• Random Partitioner

http://www.datastax.com/docs/1.0/cluster_architecture/partitioning
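A rough sketch of why the byte ordered partitioner leads to hot spots. It assumes the slide's four-node ring with tokens aaa, bbb, xxx and zzz for the byte ordered case, and approximates the random partitioner by an MD5-derived token in [0, 2**127] split into four equal ranges; the sequential user0000... keys are hypothetical.

```python
import bisect
import hashlib

# Hypothetical 4-node ring using the slide's tokens; the node at index i owns
# [NODE_TOKENS[i], NODE_TOKENS[i + 1]) and the last one wraps around to aaa.
NODE_TOKENS = ["aaa", "bbb", "xxx", "zzz"]

def byte_ordered_node(key: str) -> int:
    """Byte ordered: the key itself acts as the token, so key order is kept."""
    # Last token <= key; -1 (key before "aaa") wraps around to the last node.
    return (bisect.bisect_right(NODE_TOKENS, key) - 1) % len(NODE_TOKENS)

def random_token(key: str) -> int:
    """Random: an MD5-derived token in [0, 2**127] (approximates RandomPartitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def random_node(key: str, nodes: int = 4) -> int:
    """Assume the token space is cut into equal ranges, one per node."""
    return min(random_token(key) * nodes // 2**127, nodes - 1)

keys = [f"user{i:04d}" for i in range(1000)]  # hypothetical sequential row keys
bop_load = [0] * len(NODE_TOKENS)
rp_load = [0] * len(NODE_TOKENS)
for k in keys:
    bop_load[byte_ordered_node(k)] += 1
    rp_load[random_node(k)] += 1

print("byte ordered:", bop_load)  # prints byte ordered: [0, 1000, 0, 0]
print("random:      ", rp_load)   # keys spread over all four nodes
```

With sequential keys, every row lands on the node owning [bbb, xxx) under byte ordering, while the MD5-based tokens scatter the same keys across the whole ring.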
(Ring diagram: four nodes, 1 to 4, with initial tokens aaa, bbb, xxx and zzz. Each node owns the range starting at its initial token; node 1, with initial token aaa, owns Range: [aaa,bbb).)

Row Key Hash: ... abc ... klm ... xyz ...

(Diagram: row keys are hashed onto the same ring, and each hash is stored on the node whose range contains it; abc, for example, lands on node 1.)
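The ring lookup on these slides can be sketched as a binary search over sorted tokens. This follows the slide's simplified convention (a node owns [its token, next token)); for comparison, Cassandra itself assigns a key to the first node whose token is greater than or equal to the key's token, so each node owns (previous token, its own token]. The string tokens and the node numbering for bbb, xxx and zzz are assumptions.

```python
import bisect

# The slide's simplified ring: string tokens stand in for hashes, and we pretend
# "abc", "klm" and "xyz" already are hashed row keys. Which node holds bbb, xxx
# and zzz is a hypothetical numbering.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]  # (initial token, node), sorted
TOKENS = [token for token, _ in RING]

def owner_slide(key_hash: str) -> int:
    """Slide convention: the node with initial token t owns [t, next token)."""
    # Last token <= key_hash; index -1 wraps around to the last node.
    return RING[bisect.bisect_right(TOKENS, key_hash) - 1][1]

def owner_cassandra(key_hash: str) -> int:
    """Cassandra convention: the first node whose token is >= key_hash,
    i.e. a node owns (previous token, its own token]."""
    return RING[bisect.bisect_left(TOKENS, key_hash) % len(RING)][1]

for h in ["abc", "klm", "xyz"]:
    print(h, "->", owner_slide(h), "(slide)", owner_cassandra(h), "(Cassandra)")
# prints:
# abc -> 1 (slide) 2 (Cassandra)
# klm -> 2 (slide) 3 (Cassandra)
# xyz -> 3 (slide) 4 (Cassandra)
```

Both conventions give every key exactly one owner; they only differ in which end of a range is inclusive.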
WHAT ABOUT THE REPLICATION!?

Replication Factor = 2, then Replication Factor = 3

(Diagram: each of the rows abc, klm and xyz is stored on RF nodes: the node that owns its range plus the next node(s) around the ring.)

Warning: greatly simplified. Check out the snitch docs for more info.

BTW, QUORUM = floor(RF/2) + 1, e.g. 2 of 3 replicas when Replication Factor = 3.
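Replica placement and the quorum size can be sketched as follows, assuming SimpleStrategy-like placement on the slide's ring (first replica on the node owning the key's range, further replicas on the next nodes clockwise). Real placement also depends on the replication strategy and the snitch (racks, data centers); the node numbering is hypothetical.

```python
import bisect

# The slide's ring (string tokens; node numbers for bbb, xxx, zzz are hypothetical)
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]
TOKENS = [token for token, _ in RING]

def replicas(key_hash: str, rf: int) -> list:
    """SimpleStrategy-like placement under the slide's [token, next token) convention.
    Assumes rf <= number of nodes."""
    start = bisect.bisect_right(TOKENS, key_hash) - 1  # index of the owning node
    # Walk clockwise, wrapping around the ring, until rf nodes are collected
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]

def quorum(rf: int) -> int:
    """QUORUM = floor(RF / 2) + 1, i.e. a strict majority of the replicas."""
    return rf // 2 + 1

print(replicas("abc", 2))                  # prints [1, 2]
print(replicas("xyz", 3))                  # prints [3, 4, 1]
print([quorum(rf) for rf in range(1, 6)])  # prints [1, 2, 2, 3, 3]
# QUORUM reads and writes share at least one replica when R + W > RF
print(quorum(3) + quorum(3) > 3)           # prints True
```

With RF = 3, a QUORUM write and a QUORUM read each touch 2 replicas, and since 2 + 2 > 3 the read always reaches at least one replica that acknowledged the write.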
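WHAT HAPPENS WHEN A NEW NODE IS ADDED?

(Diagram: node 5 joins the four-node ring with tokens aaa, bbb, xxx and zzz; its place on the ring is marked “???”.) With a single token per node, the new node can only split one existing range, so it takes load off just one other node.
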
VNODES

Each node owns several small token ranges (virtual nodes) instead of one:
• node 1: aaa, ccc, ggg
• node 2: bbb, vvv, mmm
• node 3: xxx, uuu, jjj
• node 4: zzz, eee, ddd

When node 5 joins, it takes over vnodes from several nodes (on the slide, ggg from node 1 and mmm from node 2), so the load it picks up comes from across the cluster.

This also helps greatly when a node is down: its vnodes are scattered around the ring, so many nodes share its load instead of one or two neighbours.
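A small sketch of why vnodes spread the work of a joining node. It uses the slide's vnode tokens and the slide's [token, next token) convention, and assumes the joining node 5 brings tokens of its own (Cassandra's vnodes give a joining node num_tokens tokens; the two fixed tokens fff and ppp here are hypothetical, chosen so the result is reproducible).

```python
import bisect

# Vnode tokens from the slide (string tokens, simplified like the slides)
VNODES = {
    1: ["aaa", "ccc", "ggg"],
    2: ["bbb", "vvv", "mmm"],
    3: ["xxx", "uuu", "jjj"],
    4: ["zzz", "eee", "ddd"],
}
# The same cluster with a single token per node, for comparison
SINGLE_TOKEN = {1: ["aaa"], 2: ["bbb"], 3: ["xxx"], 4: ["zzz"]}

def owner(vnodes: dict, key_hash: str) -> int:
    """Node whose vnode range [token, next token) contains key_hash."""
    ring = sorted((token, node) for node, tokens in vnodes.items() for token in tokens)
    tokens = [token for token, _ in ring]
    # Last token <= key_hash; index -1 wraps around to the last vnode.
    return ring[bisect.bisect_right(tokens, key_hash) - 1][1]

def donors(vnodes: dict, new_tokens: list) -> list:
    """Nodes that hand over part of a range when a node joins with new_tokens."""
    return sorted({owner(vnodes, token) for token in new_tokens})

# Hypothetical tokens for the joining node 5
print(donors(VNODES, ["fff", "ppp"]))  # prints [2, 4]
print(donors(SINGLE_TOKEN, ["fff"]))   # prints [2]
```

With one token per node, the join splits a single range owned by one node; with vnodes, each new token falls into a range that may belong to a different node, so the data to stream comes from several machines.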
CASSANDRA 101

INSTALLATION & CONFIGURATION

CQL3

SELECT * FROM books;

INSERT INTO books (author, title, year)
VALUES ('Herman Melville', 'Moby-Dick', 1851);

DELETE FROM books WHERE author = 'Paulo Coelho';
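DATA MODELING PRACTICES
COMPOSITE COLUMNS

Author        | Book        | Year | Number of words
George Orwell | Animal Farm | 1945 | 32451
George Orwell | 1984        | 1949 | 110581
James Joyce   | Ulysses     | 1922 | 265192

CREATE TABLE books (
  author varchar,
  title varchar,
  year int,
  number_of_words int,
  PRIMARY KEY (author, title)
);

This is stored as one row (partition) per author, with composite column names [title, column] sorted by title:

George Orwell -> [1984, Year]: 1949, [1984, Number of words]: 110581, [Animal Farm, Year]: 1945, [Animal Farm, Number of words]: 32451
James Joyce -> [Ulysses, Year]: 1922, [Ulysses, Number of words]: 265192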
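The mapping from CQL rows to the stored layout above can be sketched like this, assuming author is the partition key and title the clustering column, as in the CREATE TABLE statement. This is an illustration, not Cassandra's storage code; the order of Year and Number of words within a title follows the slide rather than Cassandra's internal cell ordering.

```python
from collections import defaultdict

# CQL rows: (author, title, year, number_of_words), as in the table above
rows = [
    ("George Orwell", "Animal Farm", 1945, 32451),
    ("George Orwell", "1984", 1949, 110581),
    ("James Joyce", "Ulysses", 1922, 265192),
]

def to_partitions(rows):
    """Group cells by partition key (author); name each cell [title, column]."""
    partitions = defaultdict(list)
    for author, title, year, words in rows:
        partitions[author].append(((title, "Year"), year))
        partitions[author].append(((title, "Number of words"), words))
    # Sort cells by the clustering value (title); the stable sort keeps
    # Year before Number of words within one title, as on the slide.
    return {a: sorted(cells, key=lambda cell: cell[0][0]) for a, cells in partitions.items()}

for author, cells in to_partitions(rows).items():
    print(author, cells)
# prints:
# George Orwell [(('1984', 'Year'), 1949), (('1984', 'Number of words'), 110581), (('Animal Farm', 'Year'), 1945), (('Animal Farm', 'Number of words'), 32451)]
# James Joyce [(('Ulysses', 'Year'), 1922), (('Ulysses', 'Number of words'), 265192)]
```

"1984" sorts before "Animal Farm" because digits precede letters, which is why Orwell's later book comes first in his partition.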
COUNTERS
http://www.slideshare.net/kevinweil/rainbird-realtimeanalytics-at-twitter-strata-2011

SETS, LISTS, MAPS

CONSOLE TIME

WHAT DID WE HIT?

• no DESCRIBE when calling from a client
• cache settings
• insertion performance with 100,000s of columns
• PRIMARY KEY((a,b,c),d)
• compaction settings

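I WANT TO KNOW MORE

• http://wiki.apache.org/cassandra/ArchitectureOverview
• http://www.datastax.com/documentation/cql/3.0/webhelp/index.html
• http://cassandra.apache.org/doc/cql3/CQL.html
• http://www.slideshare.net/acunu/freakin-fast-cassandra
• http://nosql.mypopescu.com/
• http://planetcassandra.org/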
BY THE WAY...

getbase.com
HAVE YOU SEEN HIM?
Apache Cassandra - A gentle introduction

A presentation about Cassandra, presented by Przemyslaw Maciolek during the DataKRK meetup: www.meetup.com/datakrk/events/145043192/

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,828
On SlideShare
0
From Embeds
0
Number of Embeds
10
Actions
Shares
0
Downloads
21
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
