C* Summit EU 2013: Cassandra Internals

Speaker: Aaron Morton, Apache Cassandra Committer & Co-Founder/Principal Consultant at The Last Pickle Inc.
Video: http://www.youtube.com/watch?v=efI5fL8eEfo&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=23
From the microsecond your request hits an Apache Cassandra node, there are many code paths, threads and machines involved in storing or fetching your data. This talk will step through the common operations and highlight the code responsible. Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault-tolerant database. Cluster-wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the log-structured storage engine provides high-performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years. This talk will step through read and write requests, automatic processes and manual maintenance tasks. I'll discuss the general approach to solving each problem and drill down to the code responsible for the implementation. Existing Cassandra users, those wanting to contribute to the project and people interested in Dynamo-based systems will all benefit from this tour of the code base.

Published in: Technology, Business


  1. CASSANDRA EU 2013. CASSANDRA INTERNALS. Aaron Morton, @aaronmorton, Co-Founder & Principal Consultant. www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License. #CassandraEU
  2. About The Last Pickle. Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP, Hector Maintainer, Apache Usergrid Committer. Based in New Zealand & Austin, TX.
  3. Architecture, Code
  4. Cassandra Architecture. Clients, APIs, Cluster Aware, Cluster Unaware, Disk
  5. Cassandra Cluster Architecture. Clients. Node 1: APIs, Cluster Aware, Cluster Unaware, Disk. Node 2: APIs, Cluster Aware, Cluster Unaware, Disk
  6. Dynamo Cluster Architecture. Clients. Node 1: APIs, Dynamo, Database, Disk. Node 2: APIs, Dynamo, Database, Disk
  7. Architecture: API, Dynamo, Database
  8. API Transports. Thrift, Native Binary
  9. Thrift Transport. // Custom TServer implementations; o.a.c.thrift.CustomTThreadPoolServer; o.a.c.thrift.CustomTHsHaServer
  10. API Transports. Thrift, Native Binary
  11. Native Binary Transport. Beta in Cassandra 1.2, now GA. Uses Netty. CQL 3 only.
  12. o.a.c.transport.Server.run(): // Setup the Netty server; new ExecutionHandler(); new NioServerSocketChannelFactory(); ServerBootstrap.setPipelineFactory()
  13. o.a.c.transport.Message.Dispatcher.messageReceived(): // Process message from client; ServerConnection.validateNewMessage(); Request.execute(); ServerConnection.applyStateTransition(); Channel.write()
  14. Messages. Defined in the Native Binary Protocol: $SRC/doc/native_protocol.spec
  15. API Services. JMX, Thrift, CQL 3
  16. JMX Management Beans. Spread around the code base. Interfaces named *MBean
  17. JMX Management Beans. Registered with names such as org.apache.cassandra.db:type=StorageProxy
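
Slides 16 and 17 describe the standard JMX pattern: a management interface whose name ends in `MBean`, registered with the MBean server under an object name. The following is a minimal sketch of that pattern using only the JDK's `javax.management` API. The names `JmxSketch`, `ReadStats`, `ReadStatsMBean` and the object name `org.example.demo:type=ReadStats` are hypothetical and are not taken from the Cassandra code base.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxSketch {
    // Hypothetical management interface. A Standard MBean needs a public
    // interface named <ImplementationClass>MBean.
    public interface ReadStatsMBean {
        long getReadCount();
    }

    // Hypothetical implementation; this is not a Cassandra class.
    public static final class ReadStats implements ReadStatsMBean {
        private final AtomicLong reads = new AtomicLong();

        void recordRead() {
            reads.incrementAndGet();
        }

        @Override
        public long getReadCount() {
            return reads.get();
        }
    }

    // Registers the bean under the given name, then reads the attribute back
    // through the MBean server, the way a JMX client would.
    static long registerAndRead(String name, int readsToRecord) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            ReadStats stats = new ReadStats();
            for (int i = 0; i < readsToRecord; i++) {
                stats.recordRead();
            }
            if (server.isRegistered(objectName)) {
                server.unregisterMBean(objectName);
            }
            server.registerMBean(stats, objectName);
            // The getter getReadCount() is exposed as the attribute "ReadCount".
            return (Long) server.getAttribute(objectName, "ReadCount");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead("org.example.demo:type=ReadStats", 3));
    }
}
```

Once registered this way, the bean can be browsed from any JMX client attached to the JVM, which is how names such as the one on slide 17 become visible to operators.
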
  18. API Services. JMX, Thrift, CQL 3
  19. o.a.c.thrift.CassandraServer: // Implements Thrift Interface; // Access control; // Input validation; // Mapping to/from Thrift and internal types
  20. Thrift Interface. Thrift IDL: $SRC/interface/cassandra.thrift
  21. o.a.c.thrift.CassandraServer.get_slice(): // get columns for one row; Tracing.begin(); ClientState cState = state(); cState.hasColumnFamilyAccess(); multigetSliceInternal()
  22. CassandraServer.multigetSliceInternal(): // get columns for many rows; ThriftValidation.validate*(); // Create ReadCommands; getSlice()
  23. CassandraServer.getSlice(): // Process ReadCommands; // return Thrift types; readColumnFamily(); thriftifyColumnFamily()
  24. CassandraServer.readColumnFamily(): // Process ReadCommands; // Return ColumnFamilies; StorageProxy.read()
  25. API Services. JMX, Thrift, CQL 3
  26. o.a.c.cql3.QueryProcessor: // Prepares and executes CQL3 statements; // Used by Thrift & Native transports; // Access control; // Input validation; // Returns transport.ResultMessage
  27. CQL3 Grammar. ANTLR Grammar: $SRC/o.a.c.cql3/Cql.g
  28. o.a.c.cql3.statements.ParsedStatement: // Subclasses generated by ANTLR; // Tracks bound term count; // Prepare CQLStatement; prepare()
  29. o.a.c.cql3.statements.CQLStatement: checkAccess(ClientState state); validate(ClientState state); execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)
  30. statements.SelectStatement.RawStatement: // Implements ParsedStatement; // Input validation; prepare()
  31. statements.SelectStatement.execute(): // Create ReadCommands; StorageProxy.read()
  32. Architecture: API, Dynamo, Database
  33. Dynamo Layer. o.a.c.service, o.a.c.net, o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.stream
  34. o.a.c.service.StorageProxy: // Cluster wide storage operations; // Select endpoints & check CL available; // Send messages to Stages; // Wait for response; // Store Hints
  35. o.a.c.service.StorageService: // Ring operations; // Track ring state; // Start & stop ring membership; // Node & token queries
  36. o.a.c.service.IResponseResolver: preprocess(MessageIn<T> message); resolve() throws DigestMismatchException. RowDigestResolver, RowDataResolver, RangeSliceResponseResolver
  37. Response Handlers / Callback. implements IAsyncCallback<T>; response(MessageIn<T> msg)
  38. o.a.c.service.ReadCallback.get(): // Wait for blockfor & data response; condition.await(timeout, TimeUnit.MILLISECONDS); throw ReadTimeoutException(); resolver.resolve()
  39. o.a.c.service.StorageProxy.fetchRows(): getLiveSortedEndpoints(); new RowDigestResolver(); new ReadCallback(); MessagingService.sendRR(); ReadCallback.get() # blocking; catch (DigestMismatchException ex); catch (ReadTimeoutException ex)
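
Slides 37 to 39 outline a callback that collects replica responses while the coordinating thread blocks until enough of them have arrived or a timeout expires. The sketch below illustrates that general idea only. It assumes a `CountDownLatch` in place of the condition shown on slide 38, uses `java.util.concurrent.TimeoutException` in place of `ReadTimeoutException`, and leaves out digest resolution entirely. `ReadCallbackSketch` and its `completes` helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a blocking read callback; not Cassandra's ReadCallback.
final class ReadCallbackSketch<T> {
    private final int blockFor;
    private final CountDownLatch remaining;
    private final List<T> responses = new CopyOnWriteArrayList<>();

    ReadCallbackSketch(int blockFor) {
        this.blockFor = blockFor;
        this.remaining = new CountDownLatch(blockFor);
    }

    // Called from a messaging thread each time a replica replies.
    void response(T message) {
        responses.add(message);
        remaining.countDown();
    }

    // Blocks the caller until blockFor replies have arrived or the timeout expires.
    List<T> get(long timeoutMillis) throws TimeoutException, InterruptedException {
        if (!remaining.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException(
                    responses.size() + " of " + blockFor + " responses received");
        }
        return new ArrayList<>(responses);
    }

    // Demo helper: sends `replies` responses from other threads and reports
    // whether a callback that blocks for `blockFor` completed in time.
    static boolean completes(int blockFor, int replies, long timeoutMillis) {
        ReadCallbackSketch<String> callback = new ReadCallbackSketch<>(blockFor);
        for (int i = 0; i < replies; i++) {
            final String reply = "replica-" + i;
            Thread sender = new Thread(() -> callback.response(reply));
            sender.setDaemon(true);
            sender.start();
        }
        try {
            callback.get(timeoutMillis);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(completes(2, 2, 1000)); // enough replies arrive
        System.out.println(completes(2, 1, 100));  // one reply short, so it times out
    }
}
```

In this sketch the value of `blockFor` plays the role of the number of replicas the consistency level requires. The caller decides what to do with a timeout, which matches the catch blocks listed on slide 39.
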
  40. Dynamo Layer. o.a.c.service, o.a.c.net, o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.stream
  41. o.a.c.net.MessagingService.Verb <<enum>>: MUTATION, READ, REQUEST_RESPONSE, TREE_REQUEST, TREE_RESPONSE (And more...)
  42. o.a.c.net.MessagingService.verbHandlers: new EnumMap<Verb, IVerbHandler>(Verb.class)
  43. o.a.c.net.IVerbHandler<T>: doVerb(MessageIn<T> message, String id);
  44. o.a.c.net.MessagingService.verbStages: new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)
  45. o.a.c.net.MessagingService.receive(): runnable = new MessageDeliveryTask(message, id, timestamp); StageManager.getStage(message.getMessageType()); stage.execute(runnable);
  46. o.a.c.net.MessageDeliveryTask.run(): // If droppable and rpc_timeout; MessagingService.incrementDroppedMessages(verb); return; MessagingService.getVerbHandler(verb); verbHandler.doVerb(message, id)
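
Slides 41 to 46 describe verb-based dispatch: each message carries a verb, an `EnumMap` maps verbs to handlers, and the delivery task drops a droppable message that has already outlived the RPC timeout instead of processing it. A minimal sketch of that logic follows, assuming a simplified `String` payload and caller-supplied timestamps so the behaviour is deterministic. `MessagingSketch`, `VerbHandler`, and the choice of which verbs are droppable are illustrative assumptions, not Cassandra's actual definitions.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of verb dispatch; not Cassandra's MessagingService.
final class MessagingSketch {
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }

    // Simplified handler contract; the payload type is an assumption.
    interface VerbHandler {
        void doVerb(String payload, String id);
    }

    private final Map<Verb, VerbHandler> handlers = new EnumMap<>(Verb.class);
    // Assumed for illustration: requests may be dropped, responses may not.
    private final Set<Verb> droppable = EnumSet.of(Verb.MUTATION, Verb.READ);
    private final AtomicInteger dropped = new AtomicInteger();
    private final long timeoutMillis;

    MessagingSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void register(Verb verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    // Drops a stale droppable message, otherwise hands it to the verb's handler.
    // Returns true if the handler ran.
    boolean deliver(Verb verb, String payload, String id,
                    long createdAtMillis, long nowMillis) {
        if (droppable.contains(verb) && nowMillis - createdAtMillis > timeoutMillis) {
            dropped.incrementAndGet();
            return false;
        }
        VerbHandler handler = handlers.get(verb);
        if (handler == null) {
            throw new IllegalStateException("no handler registered for " + verb);
        }
        handler.doVerb(payload, id);
        return true;
    }

    int droppedCount() {
        return dropped.get();
    }

    public static void main(String[] args) {
        MessagingSketch messaging = new MessagingSketch(100);
        messaging.register(Verb.READ, (payload, id) ->
                System.out.println("READ " + id + " for " + payload));
        messaging.deliver(Verb.READ, "key-1", "1", 0, 50);   // fresh: handled
        messaging.deliver(Verb.READ, "key-2", "2", 0, 500);  // stale: dropped
        System.out.println("dropped=" + messaging.droppedCount());
    }
}
```

Dropping stale requests is a load-shedding choice: the coordinator has already given up on the request, so doing the work would only add pressure to a busy node.
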
  47. Architecture: API Layer, Dynamo Layer, Database Layer
  48. Database Layer. o.a.c.concurrent, o.a.c.db, o.a.c.cache, o.a.c.io, o.a.c.trace
  49. o.a.c.concurrent.StageManager: stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class); getStage(Stage stage)
  50. o.a.c.concurrent.Stage: READ, MUTATION, GOSSIP, REQUEST_RESPONSE, ANTI_ENTROPY (And more...)
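
Slides 49 and 50 show that each stage is backed by its own thread pool, looked up through an `EnumMap` keyed by the `Stage` enum. The sketch below shows one way such a registry can be built with the JDK's `ThreadPoolExecutor`. The class name `StageSketch`, the reduced set of stages, the pool size of 2, and the thread naming scheme are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a per-stage thread pool registry; not Cassandra's StageManager.
final class StageSketch {
    enum Stage { READ, MUTATION, REQUEST_RESPONSE }

    private static final Map<Stage, ThreadPoolExecutor> STAGES = new EnumMap<>(Stage.class);

    static {
        for (Stage stage : Stage.values()) {
            STAGES.put(stage, newPool(stage, 2)); // pool size is arbitrary here
        }
    }

    // One fixed-size pool per stage, with daemon threads named after the stage.
    private static ThreadPoolExecutor newPool(Stage stage, int threads) {
        return new ThreadPoolExecutor(threads, threads, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(),
                runnable -> {
                    Thread thread = new Thread(runnable, stage + "-worker");
                    thread.setDaemon(true);
                    return thread;
                });
    }

    static ThreadPoolExecutor getStage(Stage stage) {
        return STAGES.get(stage);
    }

    // Runs a task on the given stage and returns the name of the thread that ran it.
    static String threadNameFor(Stage stage) {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return getStage(stage).submit(task).get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(threadNameFor(Stage.READ));
        System.out.println(threadNameFor(Stage.MUTATION));
    }
}
```

Keeping a separate pool per stage means a backlog of one kind of work, such as reads, queues up in its own executor rather than starving the threads that serve other stages.
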
  51. Database Layer. o.a.c.concurrent, o.a.c.db, o.a.c.cache, o.a.c.io, o.a.c.trace
  52. o.a.c.db.Table: // Keyspace; open(String table); getColumnFamilyStore(String cfName); getRow(QueryFilter filter); apply(RowMutation mutation, boolean writeCommitLog)
  53. o.a.c.db.ColumnFamilyStore: // Column Family; getColumnFamily(QueryFilter filter); getTopLevelColumns(...); apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
  54. o.a.c.db.IColumnContainer: addColumn(IColumn column); remove(ByteBuffer columnName). ColumnFamily, SuperColumn (Removed in 2.0)
  55. o.a.c.db.ISortedColumns: addColumn(IColumn column, Allocator allocator); removeColumn(ByteBuffer name). ArrayBackedSortedColumns, AtomicSortedColumns, TreeMapBackedSortedColumns
  56. o.a.c.db.Memtable: put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer); flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
  57. o.a.c.db.ReadCommand: getRow(Table table). SliceByNamesReadCommand, SliceFromReadCommand, RangeSliceCommand (Additional classes for paging in 2.0)
  58. o.a.c.db.IDiskAtomFilter: getMemtableColumnIterator(...); getSSTableColumnIterator(...). IdentityQueryFilter, NamesQueryFilter, SliceQueryFilter
  59. Summary. API: CustomTThreadPoolServer, Message.Dispatcher, CassandraServer, QueryProcessor. Dynamo: ReadCommand, StorageProxy, IResponseResolver, IAsyncCallback, MessagingService, IVerbHandler. Database: Table, ColumnFamilyStore, IDiskAtomFilter
  60. Thanks.
  61. Aaron Morton, @aaronmorton, www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
