Apache Cassandra - A gentle introduction
A presentation about Cassandra, presented by Przemyslaw Maciolek during DataKRK meetup: www.meetup.com/datakrk/events/145043192/


Published in: Technology, Business

Transcript

  • 1. A gentle introduction by @przemur from Tuesday, October 22, 13
  • 3. PERFORMANCE
  • 4. http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
  • 5. http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
  • 6. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 7. http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html
  • 8. http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
  • 10. A TAXONOMY OF DISTRIBUTED DATABASES
  • 11. ID | FIRST | LAST: 1 | John | Smith; 2 | Mike | Kowalski. :name_1 -> “John Smith”; :name_2 -> “Mike Kowalski”. Company / Employee: ACME; Employee:1:Name: John Smith; Employee:2:Name: Mike Kowalski. Name: John Smith, Employee ID: 1; Name: Mike Kowalski, ID: 2. John Smith and Mike Kowalski, linked by “works with”.
  • 12. Relational (MySQL, Oracle, ...): ID | FIRST | LAST: 1 | John | Smith; 2 | Mike | Kowalski. :name_1 -> “John Smith”; :name_2 -> “Mike Kowalski”. Company / Employee: ACME; Employee:1:Name: John Smith; Employee:2:Name: Mike Kowalski. Name: John Smith, Employee ID: 1; Name: Mike Kowalski, ID: 2. John Smith and Mike Kowalski, linked by “works with”.
  • 13. Relational (MySQL, Oracle, ...): ID | FIRST | LAST: 1 | John | Smith; 2 | Mike | Kowalski. Key-Value (Redis, Riak, Dynamo, ...): :name_1 -> “John Smith”; :name_2 -> “Mike Kowalski”. Company / Employee: ACME; Employee:1:Name: John Smith; Employee:2:Name: Mike Kowalski. Name: John Smith, Employee ID: 1; Name: Mike Kowalski, ID: 2. John Smith and Mike Kowalski, linked by “works with”.
  • 14. Relational (MySQL, Oracle, ...): ID | FIRST | LAST: 1 | John | Smith; 2 | Mike | Kowalski. Key-Value (Redis, Riak, Dynamo, ...): :name_1 -> “John Smith”; :name_2 -> “Mike Kowalski”. Company / Employee: ACME; Employee:1:Name: John Smith; Employee:2:Name: Mike Kowalski. Document (MongoDB, Couchbase, ...): Name: John Smith, Employee ID: 1; Name: Mike Kowalski, ID: 2. John Smith and Mike Kowalski, linked by “works with”.
  • 15. Relational (MySQL, Oracle, ...): ID | FIRST | LAST: 1 | John | Smith; 2 | Mike | Kowalski. Key-Value (Redis, Riak, Dynamo, ...): :name_1 -> “John Smith”; :name_2 -> “Mike Kowalski”. Company / Employee: ACME; Employee:1:Name: John Smith; Employee:2:Name: Mike Kowalski. Document (MongoDB, Couchbase, ...): Name: John Smith, Employee ID: 1; Name: Mike Kowalski, ID: 2. Graph (Neo4j, ...): John Smith and Mike Kowalski, linked by “works with”.
  • 16. Relational (MySQL, Oracle, ...): ID | FIRST | LAST: 1 | John | Smith; 2 | Mike | Kowalski. Key-Value (Redis, Riak, Dynamo, ...): :name_1 -> “John Smith”; :name_2 -> “Mike Kowalski”. Wide Column (BigTable, Cassandra, HBase, ...): Company / Employee: ACME; Employee:1:Name: John Smith; Employee:2:Name: Mike Kowalski. Document (MongoDB, Couchbase, ...): Name: John Smith, Employee ID: 1; Name: Mike Kowalski, ID: 2. Graph (Neo4j, ...): John Smith and Mike Kowalski, linked by “works with”.
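The five models on the taxonomy slides can be hard to compare from a flattened diagram. As a rough sketch only, here is the same two-employee example written as plain Python structures. All variable names and the helper `full_name_relational` are hypothetical, the document field is shortened to `ID`, and the direction of the "works with" edge is an arbitrary choice, since the transcript does not show it.

```python
# Hypothetical Python stand-ins for the five data models on the slides.

# Relational: fixed columns, one tuple per row.
relational = {
    "columns": ("ID", "FIRST", "LAST"),
    "rows": [(1, "John", "Smith"), (2, "Mike", "Kowalski")],
}

# Key-value: an opaque value looked up by a single key.
key_value = {"name_1": "John Smith", "name_2": "Mike Kowalski"}

# Wide column: one row key holding many (column name -> value) pairs.
wide_column = {
    "ACME": {
        "Employee:1:Name": "John Smith",
        "Employee:2:Name": "Mike Kowalski",
    },
}

# Document: self-describing records.
documents = [
    {"Name": "John Smith", "ID": 1},
    {"Name": "Mike Kowalski", "ID": 2},
]

# Graph: labeled edges between nodes (edge direction chosen arbitrarily).
graph_edges = [("Mike Kowalski", "works with", "John Smith")]


def full_name_relational(emp_id):
    """Scan the relational rows for an ID and join FIRST and LAST."""
    for row_id, first, last in relational["rows"]:
        if row_id == emp_id:
            return f"{first} {last}"
    return None
```

The wide-column entry is the one closest to Cassandra: a single row key (`ACME`) carries an open-ended set of named columns.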
  • 17. Consistency, Availability, Partition tolerance. “Pick any two” (and have acceptable latency)
  • 18. Consistency, Availability, Partition tolerance. RDBMSs. “Pick any two” (and have acceptable latency)
  • 19. Consistency, Availability, Partition tolerance. RDBMSs. Immediate Consistency: HBase, ... “Pick any two” (and have acceptable latency)
  • 20. Consistency, Availability, Partition tolerance. RDBMSs. Immediate Consistency: HBase, ... Eventual Consistency: Cassandra, Riak, ... “Pick any two” (and have acceptable latency)
  • 21. Consistency, Availability, Partition tolerance. RDBMSs. Immediate Consistency: HBase, ... Eventual Consistency: Cassandra, Riak, ... + Configurable (MongoDB, Cassandra - to some extent, ...). “Pick any two” (and have acceptable latency)
  • 22. OH REALLY? • Cassandra vs. Consistency: http://aphyr.com/posts/294-call-me-maybe-cassandra • CAP criticism: http://aphyr.com/posts/292-call-me-maybe-nuodb , http://www.julianbrowne.com/article/viewer/brewers-cap-theorem , http://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/aslett%20cap%20theorem.pdf
  • 23. KEY IDEAS
  • 24. • Dynamo partitioning + BigTable model • simple architecture, minimal administration • no single point of failure • closer to the metal (e.g. no HDFS) • low latency
  • 25. CASSANDRA’S DATA MODEL
  • 26. Keyspace; Column Family; Row (Partition) Key; Column Name; Value
  • 27. Keyspace: “Database”; Column Family; Row (Partition) Key; Column Name; Value
  • 28. Keyspace: “Database”; Column Family: “Table”; Row (Partition) Key; Column Name; Value
  • 29. Keyspace: “Database”; Column Family: “Table”; Row (Partition) Key: “Primary ID”; Column Name; Value
  • 30. Keyspace: “Database”; Column Family: “Table”; Row (Partition) Key: “Primary ID”; Column Name: Sorted “Column”; Value
  • 31. Keyspace: “Database”; Column Family: “Table”; Row (Partition) Key: “Primary ID”; Column Name: Sorted “Column”; Value: “Value”
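The hierarchy on the data model slides (keyspace, column family, row key, sorted column names, values) reads naturally as a map of maps. The following is a toy in-memory sketch of that idea, not Cassandra's storage engine; the class `ColumnFamily` and its methods are hypothetical names chosen for illustration.

```python
from bisect import insort


class ColumnFamily:
    """Toy column family: row key -> columns kept sorted by column name."""

    def __init__(self):
        # row key -> (sorted list of column names, dict of name -> value)
        self._rows = {}

    def insert(self, row_key, column_name, value):
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column_name not in values:
            insort(names, column_name)  # keep names in sorted order
        values[column_name] = value

    def get_row(self, row_key):
        """All (column name, value) pairs of a row, in column-name order."""
        names, values = self._rows.get(row_key, ([], {}))
        return [(name, values[name]) for name in names]

    def column_slice(self, row_key, start, end):
        """Columns with start <= name < end, in sorted order."""
        return [(n, v) for n, v in self.get_row(row_key) if start <= n < end]


# A keyspace ("database") is then just a named collection of column families.
keyspace = {"users": ColumnFamily()}
keyspace["users"].insert("user:1", "name", "John Smith")
keyspace["users"].insert("user:1", "email", "john@example.com")
keyspace["users"].insert("user:1", "city", "Krakow")
```

Because column names are kept sorted within a row, a range of columns can be read in order, which is what the "Sorted" annotation on the slide points at.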
  • 32. PARTITIONING
  • 33. TWO PARTITIONERS OUT OF THE BOX • Byte Ordered Partitioner • Random Partitioner http://www.datastax.com/docs/1.0/cluster_architecture/partitioning
  • 34. TWO PARTITIONERS OUT OF THE BOX • Byte Ordered Partitioner (Forget it: hot spots, uneven distribution, load balancing) • Random Partitioner http://www.datastax.com/docs/1.0/cluster_architecture/partitioning
  • 35. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz
  • 36. Nodes and tokens: 1: aaa (initial token); 2: bbb; 3: xxx; 4: zzz
  • 37. Nodes and tokens: 1: aaa (initial token; range: [aaa,bbb)); 2: bbb; 3: xxx; 4: zzz
  • 38. Nodes and tokens: 1: aaa (range: [aaa,bbb)); 2: bbb; 3: xxx; 4: zzz
  • 39. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz
  • 40. Row Key -> Hash: ... -> abc; ... -> klm; ... -> xyz. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz
  • 41. Row Key -> Hash: ... -> abc; ... -> klm; ... -> xyz. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Placed on the ring: abc
  • 42. Row Key -> Hash: ... -> abc; ... -> klm; ... -> xyz. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Placed on the ring: abc, klm
  • 43. Row Key -> Hash: ... -> abc; ... -> klm; ... -> xyz. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Placed on the ring: abc, klm, xyz
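The token lookup on the partitioning slides can be sketched in a few lines. This is an illustration under assumptions: it follows the slide's convention that node 1, with initial token aaa, covers the range [aaa,bbb), and it uses the slide's letter strings directly as hash values. `RING`, `TOKENS` and `owner` are hypothetical names. Real Cassandra partitioners hash the row key first and, as far as I know, assign each node the range that ends at its token, so treat this as a picture of the idea rather than of the actual implementation.

```python
from bisect import bisect_right

# (token, node) pairs from the slides, sorted by token.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]
TOKENS = [token for token, _ in RING]


def owner(key_hash):
    """Return the node whose range [token, next token) contains key_hash."""
    # Index of the last token <= key_hash; -1 when key_hash sorts before
    # every token, which wraps around to the last node on the ring.
    idx = bisect_right(TOKENS, key_hash) - 1
    return RING[idx][1]


placement = {h: owner(h) for h in ("abc", "klm", "xyz")}
```

With the three hashes from the slides, `abc` falls in node 1's range, `klm` in node 2's and `xyz` in node 3's.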
  • 44. WHAT ABOUT THE REPLICATION!?
  • 45. Replication Factor = 2. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×1, klm ×1, xyz ×1. Warning: greatly simplified. Check out the snitch docs for more info.
  • 46. Replication Factor = 2. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×2, klm ×1, xyz ×1. Warning: greatly simplified. Check out the snitch docs for more info.
  • 47. Replication Factor = 2. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×2, klm ×2, xyz ×1. Warning: greatly simplified. Check out the snitch docs for more info.
  • 48. Replication Factor = 2. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×2, klm ×2, xyz ×2. Warning: greatly simplified. Check out the snitch docs for more info.
  • 49. Replication Factor = 3. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×2, klm ×2, xyz ×2
  • 50. Replication Factor = 3. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×3, klm ×2, xyz ×2
  • 51. Replication Factor = 3. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×3, klm ×3, xyz ×2
  • 52. Replication Factor = 3. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×3, klm ×3, xyz ×3
  • 53. Replication Factor = 3. BTW, QUORUM = (RF/2)+1, with RF/2 rounded down. Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz. Copies shown: abc ×3, klm ×3, xyz ×3
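The replication slides and the QUORUM formula can be worked through with a small sketch. It assumes replicas are simply placed on the next nodes in ring order, which the slides themselves flag as greatly simplified (the snitch and the replication strategy decide this in a real cluster), and it reuses the slide's [aaa,bbb) range convention for node 1. The names `RING`, `replicas`, `quorum` and `reads_see_latest_write` are hypothetical. The last helper encodes the general rule that a read set and a write set must overlap when their sizes add up to more than RF; that rule is not stated on the slides.

```python
from bisect import bisect_right

# (token, node) pairs from the slides, sorted by token.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]
TOKENS = [token for token, _ in RING]


def replicas(key_hash, rf):
    """Nodes holding key_hash: the range owner, then the next rf-1 nodes."""
    if not 1 <= rf <= len(RING):
        raise ValueError("rf must be between 1 and the number of nodes")
    first = (bisect_right(TOKENS, key_hash) - 1) % len(RING)
    return [RING[(first + i) % len(RING)][1] for i in range(rf)]


def quorum(rf):
    """QUORUM = (RF/2)+1 with integer division, as on the slide."""
    return rf // 2 + 1


def reads_see_latest_write(read_nodes, write_nodes, rf):
    """True when any read set must overlap any write set of those sizes."""
    return read_nodes + write_nodes > rf
```

For RF = 3 the quorum is 2, so a cluster can lose one replica of a key and still serve quorum reads and writes for it.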
  • 55. WHAT HAPPENS WHEN A NEW NODE IS ADDED? Nodes and tokens: 1: aaa; 2: bbb; 3: xxx; 4: zzz; 5: ???
  • 56. VNODES. Nodes and tokens: 1: aaa, ccc, ggg; 2: bbb, vvv, mmm; 3: xxx, uuu, jjj; 4: zzz, eee, ddd
  • 57. VNODES. Nodes and tokens: 1: aaa, ccc, ggg; 2: bbb, vvv, mmm; 3: xxx, uuu, jjj; 4: zzz, eee, ddd; 5: (none yet)
  • 58. Nodes and tokens: 1: aaa, ccc; 2: bbb, vvv; 3: xxx, uuu, jjj; 4: zzz, eee, ddd; 5: ggg, mmm
  • 59. Nodes and tokens: 1: aaa, ccc; 2: bbb, vvv; 3: xxx, uuu, jjj; 4: zzz, eee, ddd; 5: ggg, mmm. This also helps greatly when a node is down.
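The vnode slides show node 5 taking over tokens from more than one existing node. The sketch below just diffs the two token assignments read off the slides to make that visible; `BEFORE`, `AFTER`, `token_owners` and `moved_tokens` are hypothetical names, and nothing here models how Cassandra actually picks or streams vnode ranges.

```python
# Token assignments as read from the slides, before and after node 5 joins.
BEFORE = {
    1: {"aaa", "ccc", "ggg"},
    2: {"bbb", "vvv", "mmm"},
    3: {"xxx", "uuu", "jjj"},
    4: {"zzz", "eee", "ddd"},
}
AFTER = {
    1: {"aaa", "ccc"},
    2: {"bbb", "vvv"},
    3: {"xxx", "uuu", "jjj"},
    4: {"zzz", "eee", "ddd"},
    5: {"ggg", "mmm"},
}


def token_owners(assignment):
    """Invert node -> tokens into token -> node."""
    return {t: node for node, tokens in assignment.items() for t in tokens}


def moved_tokens(before, after):
    """Map each token that changed hands to its (old node, new node)."""
    old, new = token_owners(before), token_owners(after)
    return {t: (old[t], new[t]) for t in sorted(old) if old[t] != new[t]}


moves = moved_tokens(BEFORE, AFTER)
donors = {old for old, _ in moves.values()}
```

Here two different nodes (1 and 2) each hand one token to node 5, so the work of bringing the new node up is spread across several machines instead of falling on a single neighbor.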
  • 60. CASSANDRA 101
  • 61. INSTALLATION & CONFIGURATION
  • 62. CQL3: SELECT * FROM books; INSERT INTO books (author, title, year) VALUES ('Herman Melville', 'Moby-Dick', 1851); DELETE FROM books WHERE author='Paulo Coelho';
  • 63. DATA MODELING PRACTICES: COMPOSITE COLUMNS
  • 64. Author | Book | Year | Number of words
    George Orwell | Animal Farm | 1945 | 32451
    George Orwell | 1984 | 1949 | 110581
    James Joyce | Ulysses | 1922 | 265192
  • 65. Author | Book | Year | Number of words
    George Orwell | Animal Farm | 1945 | 32451
    George Orwell | 1984 | 1949 | 110581
    James Joyce | Ulysses | 1922 | 265192
    CREATE TABLE books ( author varchar, title varchar, year int, number_of_words int, PRIMARY KEY (author, title) );
  • 66. Author | Book | Year | Number of words
    George Orwell | Animal Farm | 1945 | 32451
    George Orwell | 1984 | 1949 | 110581
    James Joyce | Ulysses | 1922 | 265192
    CREATE TABLE books ( author varchar, title varchar, year int, number_of_words int, PRIMARY KEY (author, title) );
    George Orwell: [1984, Year]: 1949; [1984, Number of words]: 110581; [Animal Farm, Year]: 1945; [Animal Farm, Number of words]: 32451
    James Joyce: [Ulysses, Year]: 1922; [Ulysses, Number of words]: 265192
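The composite-column layout on the last slide of this group can be reproduced with a short sketch: the first primary key column becomes the row key, and each remaining column is stored under a composite name made of the second key column and the column's own name. The function `to_storage_cells` and the sample `ROWS` are hypothetical; cells are sorted here by plain Python tuple order, which only approximates how Cassandra orders composite names.

```python
ROWS = [
    {"author": "George Orwell", "title": "Animal Farm",
     "year": 1945, "number_of_words": 32451},
    {"author": "George Orwell", "title": "1984",
     "year": 1949, "number_of_words": 110581},
    {"author": "James Joyce", "title": "Ulysses",
     "year": 1922, "number_of_words": 265192},
]


def to_storage_cells(rows, partition_key, clustering_key):
    """Group rows by partition key; name each cell (clustering value, column)."""
    storage = {}
    for row in rows:
        cells = storage.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue  # key columns are part of the names, not cell values
            cells[(row[clustering_key], column)] = value
    # Keep the cells of each partition sorted by composite name.
    return {pk: dict(sorted(cells.items())) for pk, cells in storage.items()}


storage = to_storage_cells(ROWS, "author", "title")
```

Both George Orwell books end up in one wide row, with the cells for "1984" sorted ahead of those for "Animal Farm", matching the grouping shown on the slide.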
  • 67. COUNTERS http://www.slideshare.net/kevinweil/rainbird-realtimeanalytics-at-twitter-strata-2011
  • 68. SETS, LISTS, MAPS
  • 69. CONSOLE TIME
  • 70. WHAT DID WE HIT?
  • 71. • no DESCRIBE when calling from a client • cache settings • insertion performance with 100 000’s of columns • PRIMARY KEY((a,b,c),d) • compaction settings
  • 72. I WANT TO KNOW MORE
  • 73. • http://wiki.apache.org/cassandra/ArchitectureOverview • http://www.datastax.com/documentation/cql/3.0/webhelp/index.html • http://cassandra.apache.org/doc/cql3/CQL.html • http://www.slideshare.net/acunu/freakin-fast-cassandra • http://nosql.mypopescu.com/ • http://planetcassandra.org/
  • 74. BY THE WAY...
  • 75. getbase.com: HAVE YOU SEEN HIM?