Cassandra drivers

Published in: Technology
  • Maintainer of pycassa, phpcassa, and telephus. I've also worked on Hector and clj-hector a bit.
    The DataStax Python driver was originally ported from the DataStax Java driver.

    1. Cassandra Native Protocol Drivers
        ● Tyler Hobbs, C* and C* driver engineer
    2. About Me
        ● Thrift drivers: pycassa, phpcassa, telephus, and others
        ● DataStax python driver (native protocol)
        ● Cassandra Engineer
    3. Thrift Drivers
        ● RPC framework, machine generated
    4. Thrift Drivers: Problems?
        ● Backwards & forwards compatibility
        ● Too many connections
        ● No standard interface
        ● Thrift overhead
        ● Cluster state must be polled
    5. Problems & Solutions: Backwards/Forwards Compatibility
        ● Possible with Thrift, but easier with a query language (CQL)
        ● Separately versioned query language and protocol
    6. Problems & Solutions: Too Many Connections
        ● Operation pipelining with the native protocol
    7. Problems & Solutions: Operation Pipelining
        ● 127 in-flight ops per connection
        ● Improves throughput (not latency)
        ● Out-of-order processing
        ● Async, event-loop driven
    8. Problems & Solutions: No Standard Interface
        ● Query language
        ● Standard policies for load balancing, connection management, and retries/failure handling
        ● More similar to standard RDBMS drivers
    9. Problems & Solutions: Thrift Overhead
        ● Custom protocol, prepared statements
    10. Problems & Solutions: Cluster State Must be Polled
        ● "Control connection," register for pushed notifications
    11. New Driver API: Sync/Async Operations
        result = session.execute("SELECT * FROM foo")
    12. New Driver API: Sync/Async Operations
        future = session.execute_async("SELECT * FROM foo")
        …
        result = future.result()
    13. New Driver API: Sync/Async Operations
        session.execute_async(query).add_callbacks(
            callback=process_data,
            errback=log_error
        )
    14. New Driver Architecture: Connection Pooling
        ● Min/max conns per remote, local nodes
        ● Use least busy conn
        ● Open and close conns as needed
    15. New Driver Architecture: What happens during Operations?
        ● Nodes to query are picked by LoadBalancingPolicy
        ● Failures are handled by RetryPolicy
        ● On errors, nodes are marked down by ConvictionPolicy
    16. New Driver Architecture: Load Balancing Policies
        ● RoundRobin
        ● DcAwareRoundRobin
        ● TokenAware wrapper
        ● Custom
    17. New Driver Architecture: What happens during Operations?
        ● Nodes to query are picked by LoadBalancingPolicy
        ● Failures are handled by RetryPolicy
        ● On errors, nodes are marked down by ConvictionPolicy
    18. New Driver Architecture: Retry Policies
        ● Operation type
        ● Consistency level
        ● Number (and type) of responses
        ● Type of failure
        ● Retry, raise error, or ignore error
    19. New Driver Architecture: What happens during Operations?
        ● Nodes to query are picked by LoadBalancingPolicy
        ● Failures are handled by RetryPolicy
        ● On errors, nodes are marked down by ConvictionPolicy, reconnect with ReconnectionPolicy
    20. New Driver Architecture: Reconnection Policy
        ● Schedule for attempting reconnects to down nodes
        ● Constant and Exponential backoff
    21. New Driver Architecture: Policy Defaults
        ● RoundRobin load balancing (not token or DC aware)
        ● Retry at most once (in a small number of cases)
        ● Mark node down after one failure
        ● Exponential backoff on reconnection attempts
    22. New Driver Architecture: Prepared Statements
        ● Prepared against all nodes
        ● Cache
        ● Re-preparation
        prepared = session.prepare("SELECT foo FROM bar WHERE id=?")
        result = session.execute(prepared, [user_id1])
    23. New Driver Architecture: Control Connection
        ● Listens for pushed updates to cluster state and schema
        ● Marks nodes up and down
        ● Auto discovers nodes in cluster
        ● Updates schema metadata
    24. New Driver Architecture: Metrics
        ● Count timeouts, connection errors, and other errors
        ● Open connection stats
        ● Operation latency histogram
    25. New Driver Architecture: Cursors
        ● No more manual paging over large queries
        ● Works across multiple nodes
        ● Paging state provided by client
    26. New Driver Architecture: Quick Python Benchmark
        ● 3 nodes (local, ccm), one conn per host
        ● 50k individual inserts, single threaded
        ● Pycassa: ~1200 ops/sec
        ● DataStax python driver (sync, blocking): ~950 ops/sec
        ● DataStax python driver (future batching): ~3000 ops/sec
        ● DataStax python driver (callback chaining): ~7300 ops/sec
    27. New Driver Architecture: Languages Supported
        ● Java - 1.0 released in Spring 2013
            ● Simple object mapper under development
        ● C# - 1.0 released in Summer 2013
            ● LINQ integration
        ● Python - Beta since Summer 2013, 1.0 coming soon
            ● Basic mapper available through cqlengine
        ● C++ - Currently in Alpha state
        ● Ruby, JS, PHP - planned, but no development so far
    28. New Driver Architecture: Languages Supported
        ● github.com/datastax
    29. Questions?
        ● @tylhobbs
        ● thobbs on #cassandra, #datastaxdrivers
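
Slides 11 to 13 show the async API, and the benchmark on slide 26 names "callback chaining" without showing it. What follows is a minimal sketch of one way such chaining could look. It assumes only that `session.execute_async(query, params)` returns a future exposing `add_callbacks(callback=..., errback=...)`, as on slide 13. `run_chained`, `FakeSession`, and `FakeFuture` are hypothetical names. The fakes stand in for a real cluster so the example runs on its own, and they are not part of any driver.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_EXHAUSTED = object()  # sentinel marking the end of the parameter rows


def run_chained(session, query, param_rows, concurrency=4):
    """Keep up to `concurrency` requests in flight; each completion starts the next."""
    rows = iter(param_rows)
    lock = threading.Lock()
    finished = threading.Event()
    state = {"in_flight": 0, "results": [], "errors": []}

    def submit_next():
        with lock:
            params = next(rows, _EXHAUSTED)
            if params is _EXHAUSTED:
                # Nothing left to send; finish once the last response is in.
                if state["in_flight"] == 0:
                    finished.set()
                return
            state["in_flight"] += 1
        # The lock is released before calling out, because a callback
        # may fire synchronously on this same thread.
        future = session.execute_async(query, params)
        future.add_callbacks(callback=on_success, errback=on_error)

    def on_success(result):
        with lock:
            state["results"].append(result)
            state["in_flight"] -= 1
        submit_next()

    def on_error(exc):
        with lock:
            state["errors"].append(exc)
            state["in_flight"] -= 1
        submit_next()

    for _ in range(concurrency):
        submit_next()
    finished.wait()
    return state["results"], state["errors"]


class FakeFuture:
    """Hypothetical stand-in for the driver's response future."""

    def __init__(self, inner):
        self._inner = inner

    def add_callbacks(self, callback, errback):
        def done(f):
            exc = f.exception()
            if exc is None:
                callback(f.result())
            else:
                errback(exc)
        self._inner.add_done_callback(done)


class FakeSession:
    """Hypothetical stand-in for a session; it echoes the parameters back."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=4)

    def execute_async(self, query, params):
        return FakeFuture(self._pool.submit(lambda: params))


results, errors = run_chained(
    FakeSession(),
    "INSERT INTO foo (id) VALUES (?)",
    [(i,) for i in range(20)],
)
```

Results arrive in completion order, not submission order, so callers that need ordering would have to carry an index alongside each row.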

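Slide 7 describes operation pipelining: many requests share one connection, and responses may come back out of order. The toy model below sketches how per-request stream ids could make that work. `PipelinedConnection` and its methods are hypothetical names for illustration, not the driver's classes. The limit of 127 is the figure quoted on the slide.

```python
class PipelinedConnection:
    """Toy model of one connection multiplexing many requests (not driver code)."""

    MAX_IN_FLIGHT = 127  # per-connection limit quoted on slide 7

    def __init__(self):
        self._free_ids = list(range(self.MAX_IN_FLIGHT))
        self._pending = {}  # stream id -> callback awaiting that response

    def send(self, request, callback):
        """Reserve a stream id for the request and remember who wants the reply."""
        if not self._free_ids:
            raise RuntimeError("no free stream ids; use another connection")
        stream_id = self._free_ids.pop()
        self._pending[stream_id] = callback
        # A real driver would now write (stream_id, request) to the socket.
        return stream_id

    def on_response(self, stream_id, payload):
        """Route a response to its caller and release the stream id for reuse."""
        callback = self._pending.pop(stream_id)
        self._free_ids.append(stream_id)
        callback(payload)

    @property
    def in_flight(self):
        return len(self._pending)


conn = PipelinedConnection()
received = []
first = conn.send("SELECT * FROM foo", received.append)
second = conn.send("SELECT * FROM bar", received.append)

# The server is free to answer the second request before the first.
conn.on_response(second, "bar rows")
conn.on_response(first, "foo rows")
```

Because each response carries its stream id, the connection never has to wait for one reply before sending the next request. That matches the slide's point that pipelining improves throughput, not latency.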
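Slides 16 and 20 name round-robin load balancing and exponential backoff for reconnection. The sketch below illustrates both ideas as plain Python generators. `round_robin_plan` and `exponential_schedule` are hypothetical helpers, the host addresses and delay values are made up, and none of this is the driver's own policy code.

```python
import itertools


def round_robin_plan(hosts, start=0):
    """Yield every host once, beginning at offset `start` and wrapping around."""
    count = len(hosts)
    for i in range(count):
        yield hosts[(start + i) % count]


def exponential_schedule(base_delay, max_delay):
    """Yield reconnect delays that double on each attempt, capped at `max_delay`."""
    delay = base_delay
    while True:
        yield min(delay, max_delay)
        delay = min(delay * 2, max_delay)


hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # made-up addresses

# Successive queries start from successive hosts, spreading load evenly;
# the remaining hosts in each plan are the fallbacks if the first one fails.
plans = [list(round_robin_plan(hosts, start=query_no)) for query_no in range(3)]

# First seven delays (in seconds) before reconnect attempts to a down node.
delays = list(itertools.islice(exponential_schedule(1.0, 30.0), 7))
```

The cap keeps the wait between attempts bounded, so a node that stays down for a long time is still retried at a steady interval.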