Apache Cassandra - A gentle introduction


A presentation about Cassandra, given by Przemyslaw Maciolek at the DataKRK meetup: www.meetup.com/datakrk/events/145043192/

Published in Technology, Business

Transcript

  • 1. A gentle introduction by @przemur from Tuesday, October 22, 13
  • 3. PERFORMANCE
  • 4. http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
  • 5. http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
  • 6. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 7. http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html
  • 8. http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
  • 10. A TAXONOMY OF DISTRIBUTED DATABASES
  • 11-16. The same two employees (John Smith, Mike Kowalski) shown in each data model:
      • Relational (MySQL, Oracle, ...): a table with columns ID, FIRST, LAST and rows (1, John, Smith), (2, Mike, Kowalski)
      • Key-Value (Redis, Riak, Dynamo, ...): :name_1 -> “John Smith”, :name_2 -> “Mike Kowalski”
      • Wide Column (BigTable, Cassandra, HBase, ...): Company/Employee row ACME with columns Employee:1:Name = John Smith and Employee:2:Name = Mike Kowalski
      • Document (MongoDB, Couchbase, ...): {Name: John Smith, Employee ID: 1}, {Name: Mike Kowalski, ID: 2}
      • Graph (Neo4j, ...): John Smith works with Mike Kowalski
  • 17-21. CAP: Consistency, Availability, Partition tolerance. “Pick any two” (and have acceptable latency)
      • RDBMSs
      • Immediate Consistency: HBase, ...
      • Eventual Consistency: Cassandra, Riak, ...
      • + Configurable (MongoDB, Cassandra - to some extent, ...)
  • 22. OH REALLY?
      • Cassandra vs. Consistency: http://aphyr.com/posts/294-call-me-maybe-cassandra
      • CAP criticism:
          • http://aphyr.com/posts/292-call-me-maybe-nuodb
          • http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
          • http://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/aslett%20cap%20theorem.pdf
  • 23. KEY IDEAS
  • 24.
      • Dynamo partitioning + BigTable model
      • simple architecture, minimal administration
      • no single point of failure
      • closer to the metal (e.g. no HDFS)
      • low latency
  • 25. CASSANDRA’S DATA MODEL
  • 26-31. Cassandra terms and their rough equivalents:
      • Keyspace: “Database”
      • Column Family: “Table”
      • Row (Partition) Key: “Primary ID”
      • Column Name: “Column” (sorted)
      • Value: “Value”
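The hierarchy on these slides can be pictured as nested maps. What follows is a minimal Python sketch under that assumption, not Cassandra's actual storage code; the names `keyspace`, `put`, and `get_row` are hypothetical and exist only for illustration. The sample row reuses the ACME example from the taxonomy slides.

```python
from collections import defaultdict

# Hypothetical in-memory picture of the slide's hierarchy:
# keyspace -> column family -> row (partition) key -> {column name: value}
keyspace = defaultdict(lambda: defaultdict(dict))


def put(column_family, row_key, column_name, value):
    """Store one value under (column family, row key, column name)."""
    keyspace[column_family][row_key][column_name] = value


def get_row(column_family, row_key):
    """Return the row's columns as (name, value) pairs sorted by column name."""
    return sorted(keyspace[column_family][row_key].items())


# Columns are inserted out of order on purpose; reads come back sorted.
put("employees", "ACME", "Employee:2:Name", "Mike Kowalski")
put("employees", "ACME", "Employee:1:Name", "John Smith")
print(get_row("employees", "ACME"))
```

The point of the sketch is the "Sorted" label on the Column Name slide: within one row, columns are ordered by name regardless of insertion order.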
  • 32. PARTITIONING
  • 33-34. TWO PARTITIONERS OUT OF THE BOX (http://www.datastax.com/docs/1.0/cluster_architecture/partitioning)
      • Byte Ordered Partitioner. Forget it: hot spots, uneven distribution, load balancing
      • Random Partitioner
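Why ordered partitioning tends to produce hot spots can be shown with a toy experiment. This is a rough Python sketch, not either partitioner's real implementation: `byte_ordered_node` and `random_node` are hypothetical helpers, the four nodes are assumed to split the key space evenly, and MD5 is used as the hash because the Random Partitioner is MD5-based.

```python
import hashlib
from collections import Counter

NODE_COUNT = 4  # assumption: four nodes with evenly spaced tokens


def byte_ordered_node(key):
    """Pick a node from the key's first byte, so similar keys stay together."""
    return key.encode("utf-8")[0] * NODE_COUNT // 256


def random_node(key):
    """Pick a node from the MD5 hash of the key, so similar keys scatter."""
    token = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return (token * NODE_COUNT) >> 128


# Sequential keys with a common prefix, as in many real workloads.
keys = [f"user{i:04d}" for i in range(1000)]
ordered_load = Counter(byte_ordered_node(k) for k in keys)
hashed_load = Counter(random_node(k) for k in keys)
print("byte ordered:", dict(ordered_load))
print("random:      ", dict(hashed_load))
```

With these keys every row lands on a single node under the ordered scheme, while the hashed scheme spreads them over all four.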
  • 35-39. A ring of four nodes, each with an initial token: node 1 = aaa, node 2 = bbb, node 3 = xxx, node 4 = zzz. Node 1 is annotated with Range: [aaa,bbb).
  • 40-43. Row keys are hashed (table: Row Key -> Hash, with hashes abc, klm, xyz), and each row is placed on the ring according to its hash.
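The lookup from a hashed key to a node can be sketched in a few lines of Python. The sketch follows the range convention annotated on the ring slide (a node with token T covers [T, next token)), which is the deck's simplification; the node numbers it returns follow from that convention and are not read off the diagram. `owner` and `toy_hash` are hypothetical names, and `toy_hash` is only a stand-in that produces three-letter values like the ones on the slides.

```python
import bisect
import hashlib

# Tokens and node numbers taken from the slide's four-node ring.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]
TOKENS = [token for token, _ in RING]


def owner(hashed_key):
    """Return the node whose range [token, next token) contains hashed_key."""
    index = bisect.bisect_right(TOKENS, hashed_key) - 1
    # index is -1 for values below the first token; Python's negative
    # indexing then wraps around to the last node on the ring.
    return RING[index][1]


def toy_hash(row_key):
    """Hypothetical stand-in for the partitioner: three lowercase letters."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return "".join(chr(ord("a") + byte % 26) for byte in digest[:3])


for hashed in ("abc", "klm", "xyz"):
    print(hashed, "-> node", owner(hashed))
```

A binary search over the sorted tokens keeps the lookup logarithmic in the number of tokens, which matters once each node holds many of them.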
  • 44. WHAT ABOUT REPLICATION!?
  • 45-48. Replication Factor = 2: every row (abc, klm, xyz) is stored on two nodes of the ring. Warning: greatly simplified. Check out the snitch docs for more info.
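One simple way to choose the extra replicas is to walk clockwise around the ring from the node that owns the row, which is how Cassandra's SimpleStrategy behaves. The Python sketch below assumes that strategy and ignores racks, data centers, and snitches entirely, in the same "greatly simplified" spirit as the slide; `replicas` is a hypothetical helper name.

```python
# Node numbers in ring order, as on the slide.
NODES = [1, 2, 3, 4]


def replicas(primary_node, replication_factor):
    """Return the primary node plus the next nodes clockwise on the ring."""
    if not 1 <= replication_factor <= len(NODES):
        raise ValueError("replication factor must be between 1 and the node count")
    start = NODES.index(primary_node)
    return [NODES[(start + step) % len(NODES)] for step in range(replication_factor)]


print(replicas(1, 2))  # primary on node 1, one extra copy
print(replicas(4, 2))  # wraps around the end of the ring
```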
  • 49-53. Replication Factor = 3: every row (abc, klm, xyz) is stored on three nodes of the ring. BTW, QUORUM = (RF/2)+1, with the division rounded down.
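The quorum formula is easy to get wrong by forgetting that the division is an integer division. A small Python sketch, with hypothetical function names `quorum` and `overlaps`; the second function encodes the usual rule that a read is guaranteed to see the latest write when the number of replicas written plus the number read exceeds the replication factor.

```python
def quorum(replication_factor):
    """QUORUM = (RF / 2) + 1, with the division rounded down."""
    return replication_factor // 2 + 1


def overlaps(replication_factor, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor


for rf in (1, 2, 3, 5):
    print("RF", rf, "-> QUORUM", quorum(rf))
```

With RF = 3, quorum writes and quorum reads touch two replicas each, so the two sets always share at least one node, whereas reading and writing a single replica gives no such guarantee.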
  • 55. WHAT HAPPENS WHEN A NEW NODE IS ADDED? Node 5 joins the four-node ring (aaa, bbb, xxx, zzz); its token is shown as ???
  • 56-59. VNODES
      • Before: each node holds several tokens: node 1 = aaa, ccc, ggg; node 2 = bbb, vvv, mmm; node 3 = xxx, uuu, jjj; node 4 = zzz, eee, ddd
      • After node 5 joins: node 5 = ggg, mmm; node 1 = aaa, ccc; node 2 = bbb, vvv; nodes 3 and 4 unchanged
      • This also helps greatly when a node is down.
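The vnode picture can be mimicked with a plain ownership map. This Python sketch mirrors the slides only: the token lists are read from the diagram, `join` is a hypothetical helper, and real token assignment in Cassandra is more involved than moving existing tokens to the new node.

```python
# Token ownership before node 5 joins, as read from the vnode slide.
before = {
    1: ["aaa", "ccc", "ggg"],
    2: ["bbb", "vvv", "mmm"],
    3: ["xxx", "uuu", "jjj"],
    4: ["zzz", "eee", "ddd"],
}


def join(ownership, new_node, taken_tokens):
    """Return a new ownership map in which new_node takes over taken_tokens."""
    after = {
        node: [token for token in tokens if token not in taken_tokens]
        for node, tokens in ownership.items()
    }
    after[new_node] = list(taken_tokens)
    return after


after = join(before, 5, ["ggg", "mmm"])
# Nodes whose token list changed are the ones that hand data to the newcomer.
donors = [node for node in before if before[node] != after[node]]
print("node 5 tokens:", after[5])
print("donor nodes:  ", donors)
```

Because the new node's tokens come from more than one existing node, the work of streaming data to it is shared rather than falling on a single neighbour, and the same holds in reverse when a node goes down.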
  • 60. CASSANDRA 101
  • 61. INSTALLATION & CONFIGURATION
  • 62. CQL3
      SELECT * FROM books;
      INSERT INTO books (author, title, year) VALUES ('Herman Melville', 'Moby-Dick', 1851);
      DELETE FROM books WHERE author='Paulo Coelho';
  • 63. DATA MODELING PRACTICES: COMPOSITE COLUMNS
  • 64-66.
      Author         Book         Year  Number of words
      George Orwell  Animal Farm  1945  32451
      George Orwell  1984         1949  110581
      James Joyce    Ulysses      1922  265192

      CREATE TABLE books ( author varchar, title varchar, year int, number_of_words int, PRIMARY KEY (author, title) );

      Stored as one wide row per author:
      George Orwell: [1984, Year]: 1949; [1984, Number of words]: 110581; [Animal Farm, Year]: 1945; [Animal Farm, Number of words]: 32451
      James Joyce: [Ulysses, Year]: 1922; [Ulysses, Number of words]: 265192
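How the logical table turns into wide rows can be sketched in Python. This is an illustration of the idea on the slide rather than Cassandra's storage format: `to_wide_rows` is a hypothetical helper, the first primary key component (author) is treated as the partition key, and the second (title) becomes the leading part of each composite column name.

```python
from collections import defaultdict

# Logical CQL rows from the slide: (author, title, year, number_of_words).
rows = [
    ("George Orwell", "Animal Farm", 1945, 32451),
    ("George Orwell", "1984", 1949, 110581),
    ("James Joyce", "Ulysses", 1922, 265192),
]


def to_wide_rows(cql_rows):
    """Group rows by partition key and build (title, field) column names."""
    wide = defaultdict(dict)
    for author, title, year, words in cql_rows:
        wide[author][(title, "Year")] = year
        wide[author][(title, "Number of words")] = words
    # Columns are kept sorted by their composite name within each partition.
    return {key: sorted(columns.items()) for key, columns in wide.items()}


for author, columns in to_wide_rows(rows).items():
    print(author)
    for name, value in columns:
        print("   ", list(name), value)
```

Both Orwell books end up in the same partition, with "1984" sorting before "Animal Farm", so all books by one author can be read with a single sequential scan.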
  • 67. COUNTERS http://www.slideshare.net/kevinweil/rainbird-realtimeanalytics-at-twitter-strata-2011
  • 68. SETS, LISTS, MAPS
  • 69. CONSOLE TIME
  • 70. WHAT DID WE HIT?
  • 71.
      • no DESCRIBE when calling from a client
      • cache settings
      • insertion performance with hundreds of thousands of columns
      • PRIMARY KEY((a,b,c),d)
      • compaction settings
  • 72. I WANT TO KNOW MORE
  • 73.
      • http://wiki.apache.org/cassandra/ArchitectureOverview
      • http://www.datastax.com/documentation/cql/3.0/webhelp/index.html
      • http://cassandra.apache.org/doc/cql3/CQL.html
      • http://www.slideshare.net/acunu/freakin-fast-cassandra
      • http://nosql.mypopescu.com/
      • http://planetcassandra.org/
  • 74. BY THE WAY...
  • 75. getbase.com: HAVE YOU SEEN HIM?