Introduction to NoSQL and Cassandra

An introduction to NoSQL, Cassandra, and Hector that I gave at Globant Laminar in Buenos Aires, Argentina, on December 13, 2012.

  1. 1. Introduction to NoSQL and Apache Cassandra. Patricio Echagüe, patricioe@gmail.com, @patricioe
  2. 2. About me. Present: Relateiq (Data Processing and Scalability); Hector committer. Past: DataStax (The Cassandra Company); Cassandra/Hadoop distribution (formerly Brisk); Cassandra FS; CQL connection pool; Cassandra contributions.
  3. 3. Trends: “NoSQL”
  4. 4. 2011
  5. 5. 2012
  6. 6. What is “NoSQL”? Systems able to store and retrieve large quantities of data with little or no information about the relationships between the data items. Generally they don't have a SQL-like language for data manipulation, and their schema is more relaxed than that of traditional RDBMSs. Full ACID is often not guaranteed.
  7. 7. Brewer's CAP theorem. Consistency: all replicas agree on the same value. Availability: always get an answer from a replica. Partition tolerance: the system works even if replicas can't talk to each other. You can have 2 of these.
  8. 8. Brewer's CAP theorem
  9. 9. CAP Classification: Consistency, Availability, Partitioning
  10. 10. Types: relational; key-value stores; columnar (column-oriented); graph databases; document
  11. 11. What's eventual consistency? It is a promise that eventually, in the absence of new writes, all replicas that are responsible for a data item will agree on the same version.
  12. 12. How eventual is eventual? Write to 1 replica and read from 1 replica, out of a total of 3.
  13. 13. How eventual is eventual? Write to 2 replicas and read from 2 replicas, out of a total of 3.
  14. 14. Why is it good? Because, by contacting fewer replicas, read and write operations complete more quickly, lowering latency.
  15. 15. Cassandra is a distributed, fault-tolerant, scalable, column-oriented data store with tunable consistency.
  16. 16. Cassandra has CAP, but C is tunable.
  17. 17. What is Apache Cassandra?
  18. 18. Key Concepts: multi-master, multi-DC; linearly scalable; integrated caching; performs well with larger-than-memory datasets; tunable consistency; idempotent (client clock); schema optional; no ACID transactions, no locking
  19. 19. Generally complements other systems (not intended to be one-size-fits-all). You should always use the right tool for the right job.
  20. 20. Speaking Cassandra
  21. 21. Data Model: “4-Dimensional Hash Table”. A Keyspace contains a collection of Column Families (and controls replication). A Column Family contains Rows. A Row has a key, and each row has columns (no need to define the columns beforehand). Each column has a name, a value, and a timestamp (TTL is optional).
  22. 22. Data Model – (RDBMS): Keyspace (schema); Column Family (CF) (table); Row (row); Column (column*). *A column may not be present in all rows.
  23. 23. Data Model – Column Family. Static Column Family: models my object data. Dynamic Column Family: precalculated / prematerialized query results. Nothing is stopping you from mixing them!
  24. 24. Data Model – Static Column Family
  25. 25. Data Model – Dynamic CF: stats for a specific date
  26. 26. Data Model – Dynamic CF: timeline of tweets by a user; timeline of tweets by all of the people a user is following; list of comments sorted by score; list of friends grouped by state; metrics for a time bucket
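  27. 27. ...Let's store “foo”
  28. 28. ...Let's store “foo”. [Diagram: “foo” stored on one node]
  29. 29. ...But what if that node is down? [Diagram: “foo” stored on one node]
  30. 30. ...Let's store “foo” on 3 nodes. This is the Replication Factor (N). [Diagram: “foo” stored on three nodes]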
  31. 31. ...Now we need to know which nodes the key was written to so we can read it later.
  32. 32. ...The Initial Token specifies the upper bound of the key range each node is responsible for. [Ring diagram: #1 <= d, #2 <= k, #3 <= p, #4 <= u, #5 <= z; keys a to z, with e to k falling on #2]
  33. 33. ...Gossip is the protocol Cassandra uses to exchange information with the nodes in the cluster (a.k.a. the Ring).
  34. 34. ...Gossip is the protocol Cassandra uses to exchange information with the nodes in the cluster (a.k.a. the Ring). For example, which nodes own the key “foo”.
  35. 35. ...Gossip is the protocol Cassandra uses to exchange information with the nodes in the cluster (a.k.a. the Ring). For example, which nodes own the key “foo”. [Ring diagram: the client reads “foo”; #1 <= d, #2 <= k (owns “foo”), #3 <= p, #4 <= u, #5 <= z]
  36. 36. ...A Partitioner is used to transform the key. “foo1” and “foo2” may end up on different nodes.
  37. 37. ...A Partitioner is used to transform the key. “foo1” and “foo2” may end up on different nodes. The most commonly used is the Random Partitioner: “foo1” -> md5(“foo1”) -> “A99A0B....”
  38. 38. ...A Partitioner is used to transform the key. “foo1” and “foo2” may end up on different nodes. The most commonly used is the Random Partitioner. [Ring diagram: nodes #1 to #5, with foo1 and foo2 on different nodes]
  39. 39. ...A Replica Placement Strategy determines which nodes contain replicas.
  40. 40. ...A Replica Placement Strategy determines which nodes contain replicas. Simple Strategy places them clockwise. [Ring diagram: nodes #1 to #5, three replicas of foo1]
  41. 41. ...A Replica Placement Strategy determines which nodes contain replicas. Network Topology Strategy places them in different DCs, here DC1:3 and DC2:1. [Diagram: two rings of nodes #1 to #5, three replicas of foo1 in DC1 and one in DC2]
  42. 42. ...Consistency Level determines how many replicas to contact.
  43. 43. ...Consistency Level determines how many replicas to contact. CL = 1. [Ring diagram: client, nodes #1 to #5, three replicas of foo1]
  44. 44. ...Consistency Level determines how many replicas to contact. CL = QUORUM. [Ring diagram: client, nodes #1 to #5, three replicas of foo1]
  45. 45. Consistency for Writes: ANY, ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
  46. 46. Consistency for Reads: ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, ALL
  47. 47. Consistency in Math Terms: Cassandra guarantees strong consistency if (nodes_written + nodes_read) > replication_factor, that is, R + W > N.
  48. 48. Back to the example... Consistency Level determines how many replicas to contact. CL = QUORUM. [Ring diagram: client, nodes #1 to #5, three replicas of foo1]
  49. 49. ...But what if node #3 is down?
  50. 50. ...But what if node #3 is down? [Ring diagram: client, nodes #1 to #5, two replicas of foo1 and a stored hint]
  51. 51. ...But what if node #3 is down? The coordinator node will store a hint and replay that mutation when the down node comes back up. This is known as Hinted Handoff.
  52. 52. ...Node #5 will replay the hint to node #3 when it comes back online. [Ring diagram: client, nodes #1 to #5, a hint, and three replicas of foo1]
  53. 53. ...And what if node #5 dies before sending the hints to node #3? [Ring diagram: client, nodes #1 to #5, a hint, and two replicas of foo1]
  54. 54. ...If using QUORUM, node #4 will request foo from all the replicas. [Ring diagram: client, nodes #1 to #5, a hint, and two replicas of foo1]
  55. 55. ...If the results received do not match, a Read Repair process is performed in the background. [Ring diagram: client, nodes #1 to #5, a hint, and two replicas of foo1]
  56. 56. ...And the missing or stale value is pushed to the out-of-date node, #3 in this case. [Ring diagram: client, nodes #1 to #5, with a foo mismatch (foo != foo) at #3]
  57. 57. ...The last feature to achieve consistency is the Anti Entropy Service (AES). It should run periodically as part of cluster maintenance, or after a node has been down.
  58. 58. Recap: Consistency Features: Read Repair; Anti Entropy Service (AES); Hinted Handoff
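  59. 59. Scaling. [Ring diagram: tokens “e”, “j”, “o”, “t”, “z”]
  60. 60. Scaling. [Ring diagram: tokens “e”, “j”, “o”, “t”, “z”, plus a new node with token “?”]
  61. 61. Scaling. [Ring diagram: tokens “e”, “g”, “j”, “o”, “t”, “z”] nodetool move?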
  62. 62. Want 2x performance?! Add 2x nodes. No downtime included!
  63. 63. Want 2x performance?! [Ring diagram: tokens “e”, “j”, “o”, “t”, “z”]
  64. 64. Want 2x performance?! [Ring diagram: tokens “b”, “e”, “g”, “j”, “l”, “o”, “q”, “t”, “v”, “z”]
  65. 65. With RF = 3 we could lose [Ring diagram: the same ten nodes, three of them marked X]
  66. 66. With RF = 3 we could lose? [Ring diagram: the same ten nodes, four of them marked X]
  67. 67. Vs. others. [Diagram: nodes b, e, g, j, l, o, q, t, v, z]
  68. 68. Recap: Replication Factor; Tokens; Gossip; Partitioner; Replica Placement; Consistency; Hinted Handoff; Read Repair; AES; Clustering
  69. 69. Performance: reads on par with writes
  70. 70. Scalability
  71. 71. Internals
  72. 72. Read and Write path
  73. 73. Storage – SSTable: SSTables are sorted; immutable (“merge on read”); newest timestamp wins
  74. 74. Storage – Compaction
  75. 75. Storage – Compaction: merges SSTables together into a larger SSTable; removes tombstones; rebuilds primary and secondary indexes
  76. 76. Storage – Compaction. Two types: size-tiered compaction; leveled compaction
  77. 77. Storage – Compaction: size-tiered compaction. Performance not guaranteed; a row may be spread across many SSTables; waste of space; good for write-heavy ops; rows are written once; 100% more space than SSTables
  78. 78. Storage – Compaction: leveled compaction. SSTables grouped into levels; no overlapping within a level; each level is ten times as large as the previous one; 90% of reads satisfied with 1 SSTable; twice as much I/O
  79. 79. Recap: SSTable; Memtable; Row Cache; Compaction
  80. 80. SSDs and caching. Before: 48 Cassandra on m2.4xlarge, 36 EVcache on m2.xlarge. After: 12 Cassandra on hi1.4xlarge.
  81. 81. API Operations
  82. 82. Five general categories: retrieving; write/update/remove (all the same op!); increment counters; meta information; schema manipulation; CQL execution
  83. 83. Insertion/Deletion => Mutation. Again: every mutation is an insert! Merge on read; SSTables are immutable; highest timestamp wins.
  84. 84. CQL: INSERT INTO Hollywood.NerdMovies (user_uuid, fan) VALUES (cfd66ccc-d857-4e90-b1e5-df98a3d40cd6, 'johndoe') USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;
  85. 85. Hadoop
  86. 86. Using a Client: Hector (http://hector-client.org); Astyanax (https://github.com/Netflix/astyanax); Pelops (https://github.com/s7/scale7-pelops)
  87. 87. Using a Client → Hector: most popular Java client; in use at very large installations; a number of tools and utilities built on top; very active community; MIT licensed
  88. 88. Features: high-level API; failover behavior; high-performance connection pool; JMX counters for management; discoverability of new nodes; automatic retry of downed hosts; suspension of nodes after several timeouts; load balancing (configurable and extensible); locking (beta)
  89. 89. Hector's Architecture
  90. 90. vs. JDBC: Hector is operation-oriented, whereas JDBC is connection-oriented.
  91. 91. API Abstractions: Templates; Mutator; Thrift
  92. 92. ColumnFamilyTemplate: a familiar, type-safe approach; based on the template-method design pattern; generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type). ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get(), StringSerializer.get()); (generics omitted for clarity)
  93. 93. ColumnFamilyTemplate: new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get() /* key format */, StringSerializer.get() /* column name format */); Cassandra calls the column name format a “comparator”. Remember: it defines the column order in the on-disk format.
  94. 94. ColumnFamilyTemplate: ColumnFamilyResult<String, String> res = cft.queryColumns("patricioe"); String value = res.getString("email"); Date startDate = res.getDate("DateOfBirth"); (the two type parameters are the key format and the column name format)
  95. 95. ColumnFamilyTemplate: inserting data with ColumnFamilyUpdater. ColumnFamilyUpdater updater = template.createUpdater("pato"); updater.setString("companyName", "Relateiq"); updater.addKey("sabina"); updater.setString("companyName", "Globant"); template.update(updater);
  96. 96. ColumnFamilyTemplate: deleting data with ColumnFamilyTemplate. template.deleteColumn("zznate", "notNeededStuff"); template.deleteColumn("zznate", "somethingElse"); template.deleteColumn("patricioe", "aDifferentColumnName"); ... template.deleteRow("someuser"); template.executeBatch();
  97. 97. Integrating with existing patterns: Hector Object Mapper -> Apache Gora (https://github.com/hector-client/hector/tree/master/object-mapper); Hector JPA* (https://github.com/riptano/hector-jpa); Spring IOC; CQL: JDBC Driver and Pool in 1.0! JdbcTemplate FTW!
  98. 98. Development Resources: Hector Documentation (http://hector-client.org); Cassandra Unit (https://github.com/jsevellec/cassandra-unit); Cassandra Maven Plugin (http://mojo.codehaus.org/cassandra-maven-plugin/); CCM, a localhost Cassandra cluster (https://github.com/pcmanus/ccm); OpsCenter (http://www.datastax.com/products/opscenter); Cassandra AMIs (https://github.com/riptano/CassandraClusterAMI)
  99. 99. Want to contribute? git clone git@github.com:hector-client/hector.git
  100. 100. Summary: take advantage of strengths; idempotence and asynchronicity are your friends; if it's not in the API, you are probably doing it wrong; seek death is still possible if you model incorrectly; try denormalizing (append-only model?)
  101. 101. Patricio Echagüe, patricioe@gmail.com, @patricioe
  102. 102. Credits: Nate McCall; Aaron Morton (http://thelastpickle.com); DataStax (http://www.datastax.com); http://www.slideshare.net/mikiobraun/cassandra-an-introduction
  103. 103. Additional Resources: DataStax Documentation (http://www.datastax.com/docs); Apache Cassandra project wiki (http://wiki.apache.org/cassandra/); “The Dynamo Paper” (http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf); P. Helland, “Building on Quicksand” (http://arxiv.org/pdf/0909.1788); P. Helland, “Life Beyond Distributed Transactions” (http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf); S. Anand, “Netflix's Transition to High-Availability Storage Systems” (http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf); “The Megastore Paper” (http://research.google.com/pubs/archive/36971.pdf)
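
Slides 12, 13 and 47 describe tunable consistency through the rule R + W > N. The following is a minimal sketch of that arithmetic in plain Java. The class and method names (`ConsistencyMath`, `quorum`, `isStronglyConsistent`) are hypothetical and are not part of Cassandra or Hector; the quorum size is assumed to be floor(N / 2) + 1.

```java
// Hypothetical helper, for illustration only; not a Cassandra or Hector API.
final class ConsistencyMath {
    private ConsistencyMath() {}

    /** Assumed quorum size for a replication factor n: floor(n / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** The rule from slide 47: read and write sets must overlap, R + W > N. */
    static boolean isStronglyConsistent(int replicasRead, int replicasWritten,
                                        int replicationFactor) {
        return replicasRead + replicasWritten > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor used on slides 12 and 13
        // Slide 12: write to 1 and read from 1 of 3, so a read may miss the write.
        System.out.println(isStronglyConsistent(1, 1, n)); // false
        // Slide 13: write to 2 and read from 2 of 3, so the sets always overlap.
        System.out.println(isStronglyConsistent(2, 2, n)); // true
        // QUORUM reads plus QUORUM writes satisfy the rule for N = 3.
        System.out.println(isStronglyConsistent(quorum(n), quorum(n), n)); // true
    }
}
```

With N = 3, QUORUM on both reads and writes contacts 2 replicas each, which is the smallest symmetric setting that satisfies the rule.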

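Slides 32 to 40 walk through tokens, the partitioner, and clockwise replica placement. The sketch below models that idea with a sorted map, assuming one token per node and the letter tokens from the slide 32 diagram. `TokenRing`, `replicasFor` and `md5Token` are hypothetical names, and `md5Token` only illustrates hashing a key with MD5; it is not claimed to reproduce Cassandra's exact token encoding.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a token ring, for illustration only.
final class TokenRing {
    // token (upper bound of a key range) -> node name, kept sorted by token
    private final TreeMap<String, String> ring = new TreeMap<>();

    void addNode(String token, String node) {
        ring.put(token, node);
    }

    /** Finds the first token >= key, then walks clockwise to collect replicas. */
    List<String> replicasFor(String key, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        String token = ring.ceilingKey(key);
        if (token == null) {
            token = ring.firstKey(); // past the last token: wrap around the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    /** Hashes a key with MD5 so that similar keys spread across the ring. */
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // interpret the digest as non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should always be available", e);
        }
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode("d", "#1");
        ring.addNode("k", "#2");
        ring.addNode("p", "#3");
        ring.addNode("u", "#4");
        ring.addNode("z", "#5");
        // "foo" sorts after "d" and before "k", so node #2 owns it.
        System.out.println(ring.replicasFor("foo", 3)); // [#2, #3, #4]
        System.out.println(md5Token("foo1").equals(md5Token("foo2"))); // false
    }
}
```

The sorted map keeps lookups at O(log n) in the number of nodes, and the wrap-around step is what makes the token space a ring rather than a line.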
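
Slides 73 and 83 state that SSTables are immutable, that every mutation is an insert, and that the highest timestamp wins at read time. The following is a rough sketch of that "merge on read" rule for a single row, under simplifying assumptions: each source is an in-memory map, a delete is a cell with a null value (a tombstone), and ties on equal timestamps keep the first version seen. `MergeOnRead` and `Cell` are hypothetical names, not Cassandra classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of merge-on-read for one row, for illustration only.
final class MergeOnRead {
    /** One version of a column; a null value marks a tombstone (a delete). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Merges versions from several immutable sources; highest timestamp wins. */
    static Map<String, Cell> merge(List<Map<String, Cell>> sources) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> source : sources) {
            for (Map.Entry<String, Cell> entry : source.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return merged;
    }

    /** Returns the live value of a column, or null if it is absent or deleted. */
    static String read(List<Map<String, Cell>> sources, String column) {
        Cell winner = merge(sources).get(column);
        return (winner == null || winner.isTombstone()) ? null : winner.value;
    }

    public static void main(String[] args) {
        Map<String, Cell> oldest = new HashMap<>();
        oldest.put("email", new Cell("old@example.com", 1L));
        oldest.put("city", new Cell("Buenos Aires", 1L));
        Map<String, Cell> newer = new HashMap<>();
        newer.put("email", new Cell("new@example.com", 2L)); // an update is just an insert
        Map<String, Cell> newest = new HashMap<>();
        newest.put("city", new Cell(null, 3L)); // a delete is written as a tombstone

        List<Map<String, Cell>> sstables = Arrays.asList(oldest, newer, newest);
        System.out.println(read(sstables, "email")); // new@example.com
        System.out.println(read(sstables, "city"));  // null
    }
}
```

The same comparison is the basis of the Read Repair step on slides 55 and 56: the coordinator compares the versions returned by the replicas and pushes the winning one to any replica that is missing it.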