
Apache Cassandra and Drivers

This is a two-part talk covering the architecture that enables Apache Cassandra's linear scalability and how the DataStax Drivers take full advantage of it to give developers well-designed, fast clients that are extensible to the core.

Published in: Technology

  1. Apache Cassandra and Drivers: Overview of Apache Cassandra and DataStax Drivers. Bulat Shakirzyanov (@avalanche123), Sandeep Tamhankar (@stamhankar999). https://goo.gl/cBsRVv
  2. Introduction: Cassandra Overview
  3. © 2015 DataStax, All Rights Reserved. Cassandra Topology. [Diagram: a cluster of two datacenters, each with four nodes and two clients.]
  4. Request Coordinator. [Diagram: two datacenters with nodes and clients; one node is marked as the coordinator.] Coordinator node: forwards requests to the corresponding replicas.
  5. Row Replica. [Diagram: two datacenters with nodes and clients; one node is the coordinator and three are marked as replicas.] Replica node: stores a slice of the total rows of each keyspace.
  6. Token Ring. [Diagram: a ring with twelve positions, numbered 1 to 12.]
  7. Token Ring. Murmur3 Partitioner: tokens range from -2^63 to +2^63 - 1.
  8. Token Ring. [Diagram: twelve nodes on the ring, one per token range: 11…12, 12…1, 1…2, 2…3, 3…4, 4…5, 5…6, 6…7, 7…8, 8…9, 9…10, 10…11.] Murmur3 Partitioner: tokens range from -2^63 to +2^63 - 1.
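The token ring slides can be illustrated with a small sketch that maps a key to a token and the token to its owning node. This is only an illustration under stated assumptions: the node names are hypothetical, the twelve tokens are evenly spaced, a node is assumed to own the range ending at its own token, and `toy_token` uses MD5 as a stand-in because Murmur3 is not in Ruby's standard library.

```ruby
require 'digest'

# Token range of the Murmur3 partitioner, as stated on the slide.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

# Twelve evenly spaced node tokens with hypothetical node names.
NODE_COUNT = 12
STEP = 2**64 / NODE_COUNT
TOY_RING = (1..NODE_COUNT).map { |i| [MIN_TOKEN + i * STEP - 1, "node#{i}"] }.freeze

# Stand-in hash (NOT Murmur3): the first 8 bytes of an MD5 digest read as a
# signed big-endian 64-bit integer, so the result falls in the token range.
def toy_token(partition_key)
  Digest::MD5.digest(partition_key.to_s).unpack1('q>')
end

# Assumption: a node owns the range (previous_token, own_token], so the owner
# is the first node whose token is >= the row token, wrapping to the first node.
def owner_for(token, ring = TOY_RING)
  entry = ring.find { |node_token, _name| node_token >= token } || ring.first
  entry.last
end

token = toy_token('avalanche123')
puts "token=#{token} owner=#{owner_for(token)}"
```

Because the ring is sorted by token, a real implementation would likely use a binary search rather than a linear `find`.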
  9. Keyspaces. CREATE KEYSPACE default WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3 }
  10. C* Data Partitioning. [Diagram: a row in a keyspace, with token(PK) = 1 and RF = 3.] Partitioner: computes a token by hashing the partition key of a row (the first component of its primary key).
  11. C* Replication Strategy. [Diagram: a row in a keyspace, with token(PK) = 1 and RF = 3.] Replication strategy: determines the first replica for the row.
  12. C* Replication Factor. [Diagram: a row in a keyspace, with token(PK) = 1 and RF = 3.] Replication factor: specifies the total number of replicas for each row.
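To make the interplay of token, replication strategy and replication factor concrete, here is a minimal sketch of replica placement. It assumes a SimpleStrategy-like rule (the first replica is the node that owns the row token, the remaining replicas are the next nodes clockwise) and reuses the twelve-position ring from the slides; `replica_tokens` and `CLOCK_RING` are illustrative names.

```ruby
# Ring from the slides: tokens 1..12, one hypothetical node per token.
CLOCK_RING = (1..12).to_a.freeze

# Returns the ring positions holding replicas of a row.
# Assumption: SimpleStrategy-like placement that ignores datacenters and racks.
def replica_tokens(row_token, replication_factor, ring = CLOCK_RING)
  # First replica: the first node at or after the row token (wrapping to 0).
  first = ring.index { |node_token| node_token >= row_token } || 0
  # There can never be more replicas than nodes.
  count = [replication_factor, ring.size].min
  # Remaining replicas: walk clockwise from the first one.
  (0...count).map { |offset| ring[(first + offset) % ring.size] }
end

p replica_tokens(1, 3)   # the example from the slides: token(PK) = 1, RF = 3
p replica_tokens(12, 3)  # placement wraps around the end of the ring
```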
  13. Consistency Level. [Diagram: an application talks to a coordinator node, which talks to three replicas; RF = 3, CL = Quorum.]
  14. Consistency Level. [Diagram: the application, coordinator node and three replicas handling an INSERT; RF = 3, CL = Quorum.]
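The RF = 3, CL = Quorum example on these slides comes down to a little arithmetic, sketched below. The helper names are illustrative; the only rule used is that a quorum is a majority of the replicas.

```ruby
# Number of replicas that must acknowledge a request at CL = QUORUM:
# a majority of the replication factor (integer division rounds down).
def quorum(replication_factor)
  replication_factor / 2 + 1
end

# True once enough replicas have acknowledged the request.
def quorum_met?(acks, replication_factor)
  acks >= quorum(replication_factor)
end

puts quorum(3)          # replicas required in the slide example
puts quorum_met?(2, 3)  # two of three replicas answered
```

With RF = 3 the coordinator needs two acknowledgements, so one replica can be down or slow without failing the request.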
  18. DataStax Drivers: Smart clients for Apache Cassandra
  19. Goals of DataStax Drivers • Consistent set of features across languages • Asynchronous execution of requests • Load balancing • Fault tolerant • Address Resolution (multi-region!) • Automatic cluster discovery and reconnection • Flexible to the core • Consistent terminology • Open source
  21. Asynchronous Execution: IO Reactor, Request Pipelining and Future Composition
  22. Asynchronous Core. [Diagram: the application thread runs the business logic; the driver's background thread runs the IO reactor.]
  23. Request Pipelining. [Diagram: requests 1, 2 and 3 travelling between client and server, shown without and with request pipelining.]
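One way to picture request pipelining is a connection that keeps several requests in flight and matches each response to its request by a stream id. The class below is a toy model of that idea, not driver code; `ToyPipelinedConnection` and its methods are hypothetical names, and no network I/O is performed.

```ruby
# Toy pipelined connection: many requests in flight on one connection,
# each response matched back to its request by stream id.
class ToyPipelinedConnection
  def initialize
    @pending = {}     # stream id => [query, callback]
    @next_stream = 0
  end

  # Registers a request without waiting for earlier ones to complete.
  def send_request(query, &on_response)
    stream = @next_stream
    @next_stream += 1
    @pending[stream] = [query, on_response]
    stream
  end

  # Delivers a response; responses may arrive in any order.
  def receive(stream, body)
    _query, callback = @pending.delete(stream)
    callback.call(body) if callback
  end

  def in_flight
    @pending.size
  end
end

conn = ToyPipelinedConnection.new
results = []
ids = %w[q1 q2 q3].map { |q| conn.send_request(q) { |body| results << body } }
puts conn.in_flight          # all three requests are outstanding at once
conn.receive(ids[2], 'r3')   # the server may answer out of order
conn.receive(ids[0], 'r1')
conn.receive(ids[1], 'r2')
p results
```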
  24. What is a Future? • Represents the result of an asynchronous operation • Returned by any *_async method in the Ruby driver (execute_async, prepare_async) • Will block if asked for the true result
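The behaviour listed on this slide can be mimicked with a few lines of plain Ruby. The class below is a toy built on `Thread`, not the driver's `Cassandra::Future`; it only copies the `then` / `get` / `all` shape used in the talk so the blocking and chaining semantics can be tried without a cluster.

```ruby
# Toy future (NOT the driver's implementation): runs the work on a thread.
class ToyFuture
  def initialize(&work)
    @thread = Thread.new(&work)
  end

  # Blocks the caller until the asynchronous result is available.
  def get
    @thread.value
  end

  # Returns a new future holding the block's result. If the block itself
  # returns a future, it is flattened so that chains compose.
  def then(&block)
    ToyFuture.new do
      result = block.call(get)
      result.is_a?(ToyFuture) ? result.get : result
    end
  end

  # Combines several futures into one that resolves to all results, in order.
  def self.all(futures)
    ToyFuture.new { futures.map(&:get) }
  end
end

chained = ToyFuture.new { 20 }
                   .then { |n| ToyFuture.new { n + 1 } } # nested future is flattened
                   .then { |n| n * 2 }
puts chained.get
p ToyFuture.all([ToyFuture.new { 1 }, ToyFuture.new { 2 }]).get
```

Nothing blocks until `get` is called, which is why the composition slides only call `get` once, at the very end.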
  25. Future Composition

          select_user = session.prepare('SELECT * FROM users WHERE id = ?')
          select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')
          user_ids = [1, 2, 3, 4]
          futures = user_ids.map do |id|
            future = session.execute_async(select_user, arguments: [id])
            future.then do |users|
              user = users.first
              future = session.execute_async(select_page, arguments: [user['username']])
              future.then do |pages|
                page = pages.first
                User.new(user, Page.new(page))
              end
            end
          end
          Cassandra::Future.all(futures).get

  31. Future Composition. Result: [#<User @id=1 @username="avalanche123"; @page=#<Page @slug="avalanche123" ... > ... >, ... ]
  32. Pop Quiz: How to make this faster?

          select_user = session.prepare('SELECT * FROM users WHERE id = ?')
          select_page = session.prepare('SELECT * FROM pages WHERE slug = ?')
          user_ids = [1, 2, 3, 4]
          futures = user_ids.map do |id|
            future = session.execute_async(select_user, arguments: [id])
            future.then do |users|
              user = users.first
              future = session.execute_async(select_page, arguments: [user['username']])
              future.then do |pages|
                page = pages.first
                User.new(user, Page.new(page))
              end
            end
          end
          Cassandra::Future.all(futures).get

  33. Pop Quiz: How to make this faster? Answer: prepare both statements asynchronously, so the two prepare requests run concurrently.

          user_future = session.prepare_async('SELECT * FROM users WHERE id = ?')
          page_future = session.prepare_async('SELECT * FROM pages WHERE slug = ?')
          user_ids = [1, 2, 3, 4]
          futures = user_ids.map do |id|
            future = session.execute_async(user_future.get, arguments: [id])
            future.then do |users|
              user = users.first
              future = session.execute_async(page_future.get, arguments: [user['username']])
              future.then do |pages|
                page = pages.first
                User.new(user, Page.new(page))
              end
            end
          end
          Cassandra::Future.all(futures).get

  34. Load Balancing: Principles and Implementations
  35. Load Balancing. [Diagram: three application threads, the driver's client and session with one connection pool per node, the load balancing policy, and a cluster of nodes.]
  38. DataCenter Aware Balancing. [Diagram: two datacenters, each with nodes and clients.] Local nodes are queried first; if none are available, the request can be sent to a remote node.
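The datacenter-aware rule can be expressed as an ordering of hosts into a query plan. This is a rough sketch of the idea rather than the driver's policy class; `ToyHost` and `dc_aware_plan` are hypothetical names, and the addresses are made up.

```ruby
# Hypothetical host record: address, datacenter name, and liveness flag.
ToyHost = Struct.new(:address, :datacenter, :up)

# Orders live hosts so that local-datacenter hosts are tried first and
# remote hosts are only reached once every local host has been tried.
def dc_aware_plan(hosts, local_dc)
  live = hosts.select(&:up)
  local, remote = live.partition { |host| host.datacenter == local_dc }
  local + remote
end

hosts = [
  ToyHost.new('10.0.1.1', 'dc2', true),
  ToyHost.new('10.0.0.1', 'dc1', false), # down, so it is skipped
  ToyHost.new('10.0.0.2', 'dc1', true)
]
p dc_aware_plan(hosts, 'dc1').map(&:address)
```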
  39. Token Aware Balancing. [Diagram: a client, three replicas and three other nodes.] Routes requests directly to replicas. Uses prepared statement metadata to get the token.
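Token-aware routing can be sketched as a reordering of an existing query plan: hosts known to be replicas for the statement's token move to the front, so the coordinator is itself a replica whenever possible. The function name is illustrative, and the replica list is assumed to have been computed elsewhere from the token.

```ruby
# Moves replicas to the front of a base plan, keeping the base plan's
# relative order inside each group. Non-replica hosts remain as fallbacks.
def token_aware_plan(base_plan, replicas)
  preferred, others = base_plan.partition { |host| replicas.include?(host) }
  preferred + others
end

p token_aware_plan(%w[n1 n2 n3 n4], %w[n3 n4])
```

Keeping the non-replica hosts at the end means the request can still be served through an ordinary coordinator if every replica is unreachable from the client.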
  40. Other built-in policies • Round Robin Policy: ignores topology • White List Policy: only connects to certain hosts
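The two policies on this slide are simple enough to sketch in plain Ruby. These toy classes are not the driver's implementations; `ToyRoundRobin`, `ToyWhiteList` and their `plan` method are hypothetical names chosen for the illustration.

```ruby
# Toy round robin: every new plan starts one host further along the list,
# ignoring datacenters and token ownership.
class ToyRoundRobin
  def initialize(hosts)
    @hosts = hosts
    @index = 0
  end

  def plan
    return [] if @hosts.empty?
    rotated = @hosts.rotate(@index)
    @index = (@index + 1) % @hosts.size
    rotated
  end
end

# Toy white list: wraps another policy and drops hosts that are not allowed.
class ToyWhiteList
  def initialize(allowed, wrapped)
    @allowed = allowed
    @wrapped = wrapped
  end

  def plan
    @wrapped.plan.select { |host| @allowed.include?(host) }
  end
end

policy = ToyWhiteList.new(%w[a c], ToyRoundRobin.new(%w[a b c]))
3.times { p policy.plan }
```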
  41. Fault Tolerance: Sources of Failure and Error Handling
  42. Fault Tolerance. [Diagram: the application's business logic and driver, a coordinator node, three replicas and one other node.]
  43. Possible Failures. [Diagram: the application's business logic and driver, a coordinator node and three replicas, annotated with invalid requests, network timeouts and server errors.]
  44. Automatic Retry of Server Errors. [Diagram: three application threads, the driver's client and session with one connection pool per node, the load balancing policy, and a cluster of nodes.]
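Automatic retry on server errors can be thought of as walking the query plan produced by the load balancing policy until one host succeeds. The sketch below shows that loop in isolation; `execute_with_failover` and `ToyNoHostsAvailable` are hypothetical names, and the block stands in for sending the request to a host.

```ruby
# Raised by this sketch when every host in the plan has failed.
class ToyNoHostsAvailable < StandardError; end

# Tries each host in plan order and returns the first successful result.
# A host that raises is recorded and the next host is tried.
def execute_with_failover(plan)
  errors = {}
  plan.each do |host|
    begin
      return yield(host)
    rescue StandardError => e
      errors[host] = e
    end
  end
  raise ToyNoHostsAvailable, "all hosts failed: #{errors.keys.join(', ')}"
end

result = execute_with_failover(%w[node1 node2 node3]) do |host|
  raise 'server error' if host == 'node1' # simulate a failing coordinator
  "answered by #{host}"
end
puts result
```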
  47. Unreachable Consistency. [Diagram: the application's business logic and driver, a coordinator node, three replicas and one other node.]
  50. Read / Write Timeout Error. [Diagram: the application's business logic and driver, a coordinator node and three replicas, with the error labelled "read / write timeout".]
  52. Unavailable Error. [Diagram: the application's business logic and driver, a coordinator node and three replicas, with the error labelled "unavailable".]
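The difference between the two errors can be summarized from the coordinator's point of view: an unavailable error means too few replicas are alive to even attempt the request, while a timeout means the request was attempted but too few replicas answered in time. A minimal sketch, with `coordinator_outcome` as a hypothetical helper:

```ruby
# Outcome of one request as seen by the coordinator.
#   required: acknowledgements demanded by the consistency level
#   alive:    replicas the coordinator believes are up
#   acks:     replicas that answered before the timeout
def coordinator_outcome(required:, alive:, acks:)
  return :unavailable if alive < required # request is not even attempted
  return :timeout if acks < required      # attempted, but too few answers
  :ok
end

# RF = 3 at CL = QUORUM requires two acknowledgements.
p coordinator_outcome(required: 2, alive: 1, acks: 0)
p coordinator_outcome(required: 2, alive: 3, acks: 1)
p coordinator_outcome(required: 2, alive: 3, acks: 2)
```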
  53. Error Handling
  54. Address Resolution: Topology Aware Client
  55. Multiple Addresses. [Diagram: two datacenters, each with four nodes and two clients.] Within a datacenter: private IPs. Across datacenters: public IPs.
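The rule on this slide amounts to a per-node choice between two addresses. The sketch below assumes each node's private and public address and its datacenter are already known; `ToyNode` and `address_for` are hypothetical names, and the IPs are placeholder values.

```ruby
# Hypothetical node record with both of its addresses.
ToyNode = Struct.new(:private_ip, :public_ip, :datacenter)

# Picks the address a client should connect to: the private IP for nodes
# in the client's own datacenter, the public IP for nodes elsewhere.
def address_for(node, client_datacenter)
  node.datacenter == client_datacenter ? node.private_ip : node.public_ip
end

node = ToyNode.new('10.0.0.1', '203.0.113.10', 'us-east')
puts address_for(node, 'us-east') # same datacenter
puts address_for(node, 'eu-west') # different datacenter
```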
  56. Address Resolution. [Diagram: three application threads, the driver's client, and the cluster.]
  57. Address Resolution. [Diagram: as before, now showing a node and the address resolution policy.]
  58. Address Resolution. [Diagram: as before, now showing the cluster's nodes and the control connection.]
  60. Address Resolution. [Diagram: as before, now showing the session with one connection pool per node.]
  61. EC2 Multi-Region Address Resolution
  62. More • Request Tracing • Execution Information: which node was used, number of retries for the query, etc. • State Listeners: node goes down or comes up, schema changes, etc. • Result Paging • SSL and Authentication
  63. Questions
