Cassandra SF 2013 - Cassandra Internals

Cassandra SF 2013 Conference - Cassandra Internals

  1. CASSANDRA SF 2013: CASSANDRA INTERNALS
      Aaron Morton, @aaronmorton, www.thelastpickle.com
      #Cassandra13
      Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
  2. About Me: Freelance Cassandra Consultant. Based in Wellington, New Zealand. Apache Cassandra Committer.
  3. Architecture; Code
  4. Cassandra Architecture: layers Clients, APIs, Cluster Aware, Cluster Unaware, Disk
  5. Cassandra Cluster Architecture: Clients; Node 1 and Node 2, each with APIs, Cluster Aware, Cluster Unaware and Disk layers
  6. Dynamo Cluster Architecture: Clients; Node 1 and Node 2, each with APIs, Dynamo, Database and Disk layers
  7. Architecture: API, Dynamo, Database
  8. API Transports: Thrift, Native Binary
  9. Thrift Transport
      // Custom TServer implementations
      o.a.c.thrift.CustomTThreadPoolServer
      o.a.c.thrift.CustomTNonBlockingServer
      o.a.c.thrift.CustomTHsHaServer
  10. API Transports: Thrift, Native Binary
  11. Native Binary Transport: Beta in Cassandra 1.2. Uses Netty. Enabled with start_native_transport (disabled by default).
  12. o.a.c.transport.Server.run()
      // Set up the Netty server
      new ExecutionHandler()
      new NioServerSocketChannelFactory()
      ServerBootstrap.setPipelineFactory()
  13. o.a.c.transport.Message.Dispatcher.messageReceived()
      // Process message from client
      ServerConnection.validateNewMessage()
      Request.execute()
      ServerConnection.applyStateTransition()
      Channel.write()
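The handling order on slide 13 (validate, execute, apply the state transition, write) can be modelled with a small per-connection state machine. This is a minimal sketch, not Cassandra's code. SketchConnection, SketchDispatcher, ConnState and Op are hypothetical names, and authentication is left out. It also assumes that only STARTUP and OPTIONS are accepted before the handshake.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, cut-down model of the checks Message.Dispatcher delegates to
// ServerConnection. State and opcode names are simplified for illustration.
enum ConnState { UNINITIALIZED, READY }

enum Op { STARTUP, OPTIONS, QUERY, PREPARE, EXECUTE }

final class SketchConnection {
    // Requests accepted before the STARTUP handshake (assumed set).
    private static final Set<Op> PRE_STARTUP = EnumSet.of(Op.STARTUP, Op.OPTIONS);

    private ConnState state = ConnState.UNINITIALIZED;

    ConnState state() { return state; }

    // validateNewMessage(): reject requests that are illegal in the current state.
    void validateNewMessage(Op op) {
        if (state == ConnState.UNINITIALIZED && !PRE_STARTUP.contains(op))
            throw new IllegalStateException("Unexpected " + op + ", expecting STARTUP or OPTIONS");
        if (state == ConnState.READY && op == Op.STARTUP)
            throw new IllegalStateException("Connection is already initialized");
    }

    // applyStateTransition(): a successful STARTUP moves the connection to READY.
    void applyStateTransition(Op op) {
        if (state == ConnState.UNINITIALIZED && op == Op.STARTUP)
            state = ConnState.READY;
    }
}

final class SketchDispatcher {
    // Same order as the slide: validate, execute, transition, then write.
    static String messageReceived(SketchConnection conn, Op op) {
        conn.validateNewMessage(op);
        String response = execute(op);   // stands in for Request.execute()
        conn.applyStateTransition(op);
        return response;                 // stands in for Channel.write(response)
    }

    private static String execute(Op op) {
        switch (op) {
            case STARTUP: return "READY";
            case OPTIONS: return "SUPPORTED";
            default:      return "RESULT";
        }
    }
}
```

Running the transition only after execute() means a rejected or failed request leaves the connection state unchanged.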
  14. Messages: defined in the Native Binary Protocol, $SRC/doc/native_protocol.spec
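Version 1 of the native protocol frames every message with a fixed 8-byte header. The header has one byte each for version, flags, stream id and opcode, followed by a 4-byte body length. The sketch below encodes and decodes that header with java.nio.ByteBuffer. The class name FrameHeader is hypothetical, and native_protocol.spec remains the authority on field meanings and opcode values.

```java
import java.nio.ByteBuffer;

// Sketch of the fixed 8-byte frame header in version 1 of the native protocol:
// version, flags, stream and opcode (one byte each), then a 4-byte body length.
final class FrameHeader {
    static final int HEADER_LENGTH = 8;
    static final int REQUEST_V1 = 0x01;
    static final int RESPONSE_V1 = 0x81;   // the 0x80 bit marks a response
    static final int OPCODE_QUERY = 0x07;
    static final int OPCODE_RESULT = 0x08;

    final int version;
    final int flags;
    final byte stream;       // lets a client multiplex requests on one connection
    final int opcode;
    final int bodyLength;

    FrameHeader(int version, int flags, byte stream, int opcode, int bodyLength) {
        this.version = version;
        this.flags = flags;
        this.stream = stream;
        this.opcode = opcode;
        this.bodyLength = bodyLength;
    }

    boolean isResponse() { return (version & 0x80) != 0; }

    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_LENGTH);   // big-endian by default
        buf.put((byte) version).put((byte) flags).put(stream).put((byte) opcode).putInt(bodyLength);
        buf.flip();
        return buf;
    }

    static FrameHeader decode(ByteBuffer buf) {
        int version = buf.get() & 0xFF;   // mask to read the byte as unsigned
        int flags = buf.get() & 0xFF;
        byte stream = buf.get();
        int opcode = buf.get() & 0xFF;
        int bodyLength = buf.getInt();
        return new FrameHeader(version, flags, stream, opcode, bodyLength);
    }
}
```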
  15. API Services: JMX, Thrift, CQL 3
  16. JMX Management Beans: spread around the code base. Interfaces named *MBean.
  17. JMX Management Beans: registered with names such as org.apache.cassandra.db:type=StorageProxy
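The sketch below shows the *MBean convention using only the standard javax.management API. ReadStats, ReadStatsMBean and the name com.example.db:type=ReadStats are hypothetical. They follow the same domain:type=Name pattern as org.apache.cassandra.db:type=StorageProxy.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

final class JmxSketch {
    // Standard MBean convention: the public management interface is the
    // implementation class name plus "MBean"; getters become attributes.
    public interface ReadStatsMBean {
        long getReadCount();   // exposed as the "ReadCount" attribute
    }

    public static class ReadStats implements ReadStatsMBean {
        private final AtomicLong reads = new AtomicLong();
        public void recordRead() { reads.incrementAndGet(); }   // not part of the MBean interface
        @Override public long getReadCount() { return reads.get(); }
    }

    // Register under a "domain:type=Name" ObjectName on the platform MBeanServer.
    static ObjectName register(Object bean, String name) {
        try {
            ObjectName objectName = new ObjectName(name);
            ManagementFactory.getPlatformMBeanServer().registerMBean(bean, objectName);
            return objectName;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // Read an attribute by name, the way a JMX client would.
    static Object attribute(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A client such as JConsole sees only what the *MBean interface declares. That is why recordRead() stays off the interface.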
  18. API Services: JMX, Thrift, CQL 3
  19. o.a.c.thrift.CassandraServer
      // Implements Thrift interface
      // Access control
      // Input validation
      // Mapping to/from Thrift and internal types
  20. Thrift Interface: Thrift IDL, $SRC/interface/cassandra.thrift
  21. o.a.c.thrift.CassandraServer.get_slice()
      // Get columns for one row
      Tracing.begin()
      ClientState cState = state()
      cState.hasColumnFamilyAccess()
      multigetSliceInternal()
  22. CassandraServer.multigetSliceInternal()
      // Get columns for many rows
      ThriftValidation.validate*()
      // Create ReadCommands
      getSlice()
  23. CassandraServer.getSlice()
      // Process ReadCommands
      // Return Thrift types
      readColumnFamily()
      thriftifyColumnFamily()
  24. CassandraServer.readColumnFamily()
      // Process ReadCommands
      // Return ColumnFamilies
      StorageProxy.read()
  25. API Services: JMX, Thrift, CQL 3
  26. o.a.c.cql3.QueryProcessor
      // Prepares and executes CQL3 statements
      // Used by Thrift & Native transports
      // Access control
      // Input validation
      // Returns transport.ResultMessage
  27. CQL3 Grammar: ANTLR grammar, $SRC/o.a.c.cql3/Cql.g
  28. o.a.c.cql3.statements.ParsedStatement
      // Subclasses generated by ANTLR
      // Tracks bound term count
      // Prepare CQLStatement
      prepare()
  29. o.a.c.cql3.statements.CQLStatement
      checkAccess(ClientState state)
      validate(ClientState state)
      execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)
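The division of labour between QueryProcessor and CQLStatement can be sketched as follows, assuming heavily simplified types:

- ClientState carries a hypothetical set of readable tables.
- QueryState is folded into ClientState.
- execute() returns a string instead of a transport.ResultMessage.
- boundTermsCount() stands in for the bound term count tracked at prepare time.

SelectByKey and MiniQueryProcessor are illustrative names, not Cassandra classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for the real classes; QueryState is folded into ClientState.
enum ConsistencyLevel { ONE, QUORUM, ALL }

final class ClientState {
    final Set<String> readableTables;   // hypothetical permission model
    ClientState(Set<String> readableTables) { this.readableTables = readableTables; }
}

interface CQLStatement {
    int boundTermsCount();              // number of '?' markers seen when preparing
    void checkAccess(ClientState state);
    void validate(ClientState state);
    String execute(ConsistencyLevel cl, ClientState state, List<ByteBuffer> variables);
}

// Hypothetical prepared form of "SELECT * FROM <table> WHERE key = ?".
final class SelectByKey implements CQLStatement {
    private final String table;
    SelectByKey(String table) { this.table = table; }

    public int boundTermsCount() { return 1; }

    public void checkAccess(ClientState state) {
        if (!state.readableTables.contains(table))
            throw new SecurityException("No SELECT permission on " + table);
    }

    public void validate(ClientState state) {
        if (table.isEmpty()) throw new IllegalArgumentException("Table name required");
    }

    public String execute(ConsistencyLevel cl, ClientState state, List<ByteBuffer> variables) {
        String key = StandardCharsets.UTF_8.decode(variables.get(0).duplicate()).toString();
        // The real statement would build ReadCommands and call StorageProxy.read().
        return "read " + table + "[" + key + "] at " + cl;
    }
}

final class MiniQueryProcessor {
    static String process(CQLStatement stmt, ConsistencyLevel cl, ClientState state, List<ByteBuffer> variables) {
        if (variables.size() != stmt.boundTermsCount())
            throw new IllegalArgumentException("Expected " + stmt.boundTermsCount()
                    + " bound values, got " + variables.size());
        stmt.checkAccess(state);   // access control
        stmt.validate(state);      // input validation
        return stmt.execute(cl, state, variables);
    }
}
```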
  30. statements.SelectStatement.RawStatement
      // Implements ParsedStatement
      // Input validation
      prepare()
  31. statements.SelectStatement.execute()
      // Create ReadCommands
      StorageProxy.read()
  32. Architecture: API, Dynamo, Database
  33. Dynamo Layer: o.a.c.service, o.a.c.net, o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.stream
  34. o.a.c.service.StorageProxy
      // Cluster-wide storage operations
      // Select endpoints & check CL available
      // Send messages to Stages
      // Wait for response
      // Store hints
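Selecting endpoints and checking that the consistency level is achievable comes down to one comparison. The coordinator compares the number of live replicas with the number of responses it must block for. The sketch below assumes a single datacenter and supports only ONE, QUORUM and ALL. Level, UnavailableException and AvailabilityCheck are hypothetical local types. The quorum size is replication_factor / 2 + 1 in integer arithmetic.

```java
import java.util.List;

// Hypothetical, single-datacenter subset of consistency levels.
enum Level { ONE, QUORUM, ALL }

final class UnavailableException extends RuntimeException {
    UnavailableException(Level cl, int required, int alive) {
        super("Cannot achieve " + cl + ": required " + required + ", alive " + alive);
    }
}

final class AvailabilityCheck {
    // Number of replica responses the coordinator must wait for.
    static int blockFor(Level cl, int replicationFactor) {
        switch (cl) {
            case ONE:    return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL:    return replicationFactor;
            default:     throw new AssertionError(cl);
        }
    }

    // Fail fast, before any message is sent, if too few replicas are alive.
    static int checkAvailable(Level cl, int replicationFactor, List<String> liveReplicas) {
        int required = blockFor(cl, replicationFactor);
        if (liveReplicas.size() < required)
            throw new UnavailableException(cl, required, liveReplicas.size());
        return required;
    }
}
```

Failing before sending lets the client distinguish "not enough replicas up" from a timeout after the request was already in flight.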
  35. o.a.c.service.StorageService
      // Ring operations
      // Track ring state
      // Start & stop ring membership
      // Node & token queries
  36. o.a.c.service.IResponseResolver
      preprocess(MessageIn<T> message)
      resolve() throws DigestMismatchException
      Implementations: RowDigestResolver, RowDataResolver, RangeSliceResponseResolver
  37. Response Handlers / Callback
      implements IAsyncCallback<T>
      response(MessageIn<T> msg)
  38. o.a.c.service.ReadCallback.get()
      // Wait for blockfor & data
      condition.await(timeout, TimeUnit.MILLISECONDS)
      // If not signalled before the timeout:
      throw ReadTimeoutException()
      resolver.resolve()
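The wait in ReadCallback.get() is a latch with a timeout. This sketch assumes a CountDownLatch sized to blockFor and a resolver passed in as a java.util.function.Function. SketchReadCallback and its local ReadTimeoutException are hypothetical stand-ins for the Cassandra classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class ReadTimeoutException extends RuntimeException {
    ReadTimeoutException(int received, int blockFor) {
        super("Timed out: received " + received + " of " + blockFor + " responses");
    }
}

// response() is called by messaging threads; get() blocks the coordinator until
// blockFor responses arrive or the timeout expires, then runs the resolver.
final class SketchReadCallback<T, R> {
    private final int blockFor;
    private final CountDownLatch condition;
    private final ConcurrentLinkedQueue<T> responses = new ConcurrentLinkedQueue<>();
    private final Function<List<T>, R> resolver;

    SketchReadCallback(int blockFor, Function<List<T>, R> resolver) {
        this.blockFor = blockFor;
        this.condition = new CountDownLatch(blockFor);
        this.resolver = resolver;
    }

    // One call per replica reply (IAsyncCallback.response() on the slides).
    void response(T message) {
        responses.add(message);
        condition.countDown();
    }

    R get(long timeoutMillis) {
        boolean signalled;
        try {
            signalled = condition.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!signalled) throw new ReadTimeoutException(responses.size(), blockFor);
        // Late extra replies may also be included; the resolver sees a snapshot.
        return resolver.apply(new ArrayList<>(responses));
    }
}
```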
  39. o.a.c.service.StorageProxy.fetchRows()
      getLiveSortedEndpoints()
      new RowDigestResolver()
      new ReadCallback()
      MessagingService.sendRR()
      ---------------------------------------
      ReadCallback.get() // blocking
      catch (DigestMismatchException ex)
      catch (ReadTimeoutException ex)
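Digest reads let one replica return the full row while the others return only a hash of it. A mismatch is what sends fetchRows() into the DigestMismatchException branch. This sketch makes several assumptions:

- Digests are MD5 over a toy row made of a value plus a write timestamp.
- On mismatch, it re-reads full data and keeps the newest version. This stands in for Cassandra's timestamp-based reconciliation, which works per column rather than per row.
- RowVersion and DigestSketch are hypothetical, and the exception is a local class named after Cassandra's.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Local checked exception named after Cassandra's.
final class DigestMismatchException extends Exception {
    DigestMismatchException(String message) { super(message); }
}

// Toy row: a value plus the timestamp of the write that produced it.
final class RowVersion {
    final String value;
    final long timestamp;
    RowVersion(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }

    byte[] digest() {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest((value + "@" + timestamp).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // every Java SE platform must provide MD5
        }
    }
}

final class DigestSketch {
    // RowDigestResolver-style check: the data response must match every digest.
    static RowVersion resolve(RowVersion data, List<byte[]> digests) throws DigestMismatchException {
        byte[] expected = data.digest();
        for (byte[] d : digests)
            if (!Arrays.equals(expected, d))
                throw new DigestMismatchException("Digest mismatch for value " + data.value);
        return data;
    }

    // fetchRows()-style handling: on mismatch, re-read full data and keep the newest version.
    static RowVersion read(RowVersion data, List<byte[]> digests, Supplier<List<RowVersion>> fullDataRead) {
        try {
            return resolve(data, digests);
        } catch (DigestMismatchException ex) {
            RowVersion newest = null;
            for (RowVersion v : fullDataRead.get())
                if (newest == null || v.timestamp > newest.timestamp) newest = v;
            return newest;
        }
    }
}
```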
  40. Dynamo Layer: o.a.c.service, o.a.c.net, o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.stream
  41. o.a.c.net.MessagingService.Verb <<enum>>
      MUTATION, READ, REQUEST_RESPONSE, TREE_REQUEST, TREE_RESPONSE (and more...)
  42. o.a.c.net.MessagingService.verbHandlers
      new EnumMap<Verb, IVerbHandler>(Verb.class)
  43. o.a.c.net.IVerbHandler<T>
      doVerb(MessageIn<T> message, String id);
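The verb-to-handler table is an EnumMap keyed by Verb. The sketch below is cut down: it assumes a three-value Verb enum and a minimal MessageIn that carries only the verb and a payload. VerbRegistry and these local types are illustrative, not Cassandra's classes.

```java
import java.util.EnumMap;
import java.util.Map;

// Cut-down verb set and message type, for illustration only.
enum Verb { MUTATION, READ, REQUEST_RESPONSE }

final class MessageIn<T> {
    final Verb verb;
    final T payload;
    MessageIn(Verb verb, T payload) { this.verb = verb; this.payload = payload; }
}

interface IVerbHandler<T> {
    void doVerb(MessageIn<T> message, String id);
}

final class VerbRegistry {
    // EnumMap: compact, array-backed lookup by enum ordinal.
    private final Map<Verb, IVerbHandler<?>> verbHandlers = new EnumMap<>(Verb.class);

    void registerVerbHandlers(Verb verb, IVerbHandler<?> handler) {
        if (verbHandlers.put(verb, handler) != null)
            throw new IllegalStateException("Handler already registered for " + verb);
    }

    IVerbHandler<?> getVerbHandler(Verb verb) { return verbHandlers.get(verb); }

    // Look up the handler for the message's verb and invoke it.
    @SuppressWarnings("unchecked")
    <T> void deliver(MessageIn<T> message, String id) {
        IVerbHandler<T> handler = (IVerbHandler<T>) verbHandlers.get(message.verb);
        if (handler == null) throw new IllegalArgumentException("No handler for " + message.verb);
        handler.doVerb(message, id);
    }
}
```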
  44. o.a.c.net.MessagingService.verbStages
      new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)
  45. o.a.c.net.MessagingService.receive()
      runnable = new MessageDeliveryTask(message, id, timestamp);
      stage = StageManager.getStage(message.getMessageType());
      stage.execute(runnable);
  46. o.a.c.net.MessageDeliveryTask.run()
      // If droppable and rpc_timeout exceeded:
      MessagingService.incrementDroppedMessages(verb);
      // Otherwise:
      MessagingService.getVerbHandler(verb)
      verbHandler.doVerb(message, id)
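The sketch below puts receive(), verbStages, the stage thread pools and MessageDeliveryTask together. It assumes one single-threaded executor per stage, a hypothetical droppable set of READ and MUTATION, and a fixed timeout standing in for rpc_timeout. DemoMessaging, DemoVerb and DemoStage are illustrative names, and a plain Runnable stands in for the verb handler.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

enum DemoVerb { MUTATION, READ, GOSSIP_DIGEST_SYN }
enum DemoStage { MUTATION, READ, GOSSIP }

final class DemoMessaging {
    // Verb -> Stage routing (MessagingService.verbStages on the slides).
    private final Map<DemoVerb, DemoStage> verbStages = new EnumMap<>(DemoVerb.class);
    // Stage -> thread pool (StageManager.stages on the slides).
    private final Map<DemoStage, ExecutorService> stages = new EnumMap<>(DemoStage.class);
    // Assumed droppable subset: client reads and writes, never gossip.
    private static final Set<DemoVerb> DROPPABLE = EnumSet.of(DemoVerb.MUTATION, DemoVerb.READ);
    final AtomicInteger dropped = new AtomicInteger();
    private final long timeoutMillis;

    DemoMessaging(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        verbStages.put(DemoVerb.MUTATION, DemoStage.MUTATION);
        verbStages.put(DemoVerb.READ, DemoStage.READ);
        verbStages.put(DemoVerb.GOSSIP_DIGEST_SYN, DemoStage.GOSSIP);
        for (DemoStage s : DemoStage.values()) stages.put(s, Executors.newSingleThreadExecutor());
    }

    // receive(): wrap the message in a delivery task and hand it to its stage.
    void receive(DemoVerb verb, long constructionTime, Runnable handler) {
        stages.get(verbStages.get(verb)).execute(() -> {
            // MessageDeliveryTask.run(): drop droppable messages that are already stale.
            if (DROPPABLE.contains(verb) && System.currentTimeMillis() > constructionTime + timeoutMillis) {
                dropped.incrementAndGet();   // incrementDroppedMessages(verb)
                return;
            }
            handler.run();                   // verbHandler.doVerb(message, id)
        });
    }

    void shutdownAndWait() {
        for (ExecutorService e : stages.values()) e.shutdown();
        try {
            for (ExecutorService e : stages.values()) e.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Dropping stale requests at dequeue time avoids spending a stage thread on work whose coordinator has already timed out.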
  47. Architecture: API Layer, Dynamo Layer, Database Layer
  48. Database Layer: o.a.c.concurrent, o.a.c.db, o.a.c.cache, o.a.c.io, o.a.c.trace
  49. o.a.c.concurrent.StageManager
      stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);
      getStage(Stage stage)
  50. o.a.c.concurrent.Stage: READ, MUTATION, GOSSIP, REQUEST_RESPONSE, ANTI_ENTROPY (and more...)
  51. Database Layer: o.a.c.concurrent, o.a.c.db, o.a.c.cache, o.a.c.io, o.a.c.trace
  52. o.a.c.db.Table
      // Keyspace
      open(String table)
      getColumnFamilyStore(String cfName)
      getRow(QueryFilter filter)
      apply(RowMutation mutation, boolean writeCommitLog)
  53. o.a.c.db.ColumnFamilyStore
      // Column Family
      getColumnFamily(QueryFilter filter)
      getTopLevelColumns(...)
      apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
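The local write path through Table.apply() can be sketched in two steps. First, append to the commit log when writeCommitLog is true. Then apply each column family's changes to that ColumnFamilyStore's memtable. SketchTable, SketchRowMutation and SketchColumnFamilyStore are hypothetical. Strings stand in for ByteBuffers, and an in-memory list stands in for the commit log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for ColumnFamilyStore: one sorted in-memory memtable.
final class SketchColumnFamilyStore {
    final String name;
    // row key -> (column name -> value), both kept sorted
    final ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, String>> memtable =
            new ConcurrentSkipListMap<>();

    SketchColumnFamilyStore(String name) { this.name = name; }

    void apply(String key, Map<String, String> columns) {
        memtable.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>()).putAll(columns);
    }
}

// Hypothetical stand-in for RowMutation: one row key, changes for several column families.
final class SketchRowMutation {
    final String key;
    final Map<String, Map<String, String>> modifications;   // column family -> columns
    SketchRowMutation(String key, Map<String, Map<String, String>> modifications) {
        this.key = key;
        this.modifications = modifications;
    }
}

// Hypothetical stand-in for Table (a keyspace).
final class SketchTable {
    private final Map<String, SketchColumnFamilyStore> stores = new ConcurrentHashMap<>();
    // Stands in for the on-disk commit log.
    final List<String> commitLog = Collections.synchronizedList(new ArrayList<>());

    SketchColumnFamilyStore getColumnFamilyStore(String cfName) {
        return stores.computeIfAbsent(cfName, SketchColumnFamilyStore::new);
    }

    void apply(SketchRowMutation mutation, boolean writeCommitLog) {
        if (writeCommitLog)
            commitLog.add(mutation.key + " " + mutation.modifications);   // log first, for durability
        for (Map.Entry<String, Map<String, String>> e : mutation.modifications.entrySet())
            getColumnFamilyStore(e.getKey()).apply(mutation.key, e.getValue());
    }
}
```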
  54. o.a.c.db.IColumnContainer
      addColumn(IColumn column)
      remove(ByteBuffer columnName)
      Implementations: ColumnFamily, SuperColumn
  55. o.a.c.db.ISortedColumns
      addColumn(IColumn column, Allocator allocator)
      removeColumn(ByteBuffer name)
      Implementations: ArrayBackedSortedColumns, AtomicSortedColumns, TreeMapBackedSortedColumns
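One idea behind an array-backed implementation is that columns often arrive already sorted, for example when read back from an SSTable. In that case appending is the common case. The sketch below is written under that assumption:

- A binary-search insert is the fallback for out-of-order columns.
- For duplicate names, the higher timestamp wins.
- ArrayBackedColumns is a hypothetical name, and strings stand in for ByteBuffer column names.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayBackedColumns {
    static final class Column {
        final String name;
        final String value;
        final long timestamp;
        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final List<Column> columns = new ArrayList<>();   // always sorted by name

    void addColumn(Column column) {
        int size = columns.size();
        if (size == 0 || columns.get(size - 1).name.compareTo(column.name) < 0) {
            columns.add(column);                 // fast path: append in sorted order
            return;
        }
        int pos = binarySearch(column.name);
        if (pos >= 0) {
            // Same name: keep the column with the higher timestamp.
            if (column.timestamp > columns.get(pos).timestamp) columns.set(pos, column);
        } else {
            columns.add(-pos - 1, column);       // slow path: shift and insert
        }
    }

    String value(String name) {
        int pos = binarySearch(name);
        return pos >= 0 ? columns.get(pos).value : null;
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Column c : columns) out.add(c.name);
        return out;
    }

    // Returns the index of name, or -(insertionPoint + 1) if absent.
    private int binarySearch(String name) {
        int lo = 0, hi = columns.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = columns.get(mid).name.compareTo(name);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }
}
```

The trade-off is that inserts out of order cost O(n) shifting, which is why a map-backed or atomic implementation suits concurrent, unordered writes better.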
  56. o.a.c.db.Memtable
      put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
      flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
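flushAndSignal() takes a CountDownLatch so the thread that requested the flush can wait for the write to finish. Below is a minimal sketch of that hand-off. It assumes a string-keyed memtable, a List standing in for the SSTable, and a caller-supplied flush executor. SketchMemtable is hypothetical, and the Future<ReplayPosition> argument is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

final class SketchMemtable {
    // Sorted by key, so a flush can write rows in SSTable order without sorting.
    private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();

    void put(String key, String value) { rows.put(key, value); }

    // Write every row, in key order, on the flush executor, then count down the
    // latch so whoever requested the flush can stop waiting.
    void flushAndSignal(CountDownLatch latch, List<String> sstable, ExecutorService flushWriter) {
        flushWriter.execute(() -> {
            try {
                for (Map.Entry<String, String> e : rows.entrySet())
                    sstable.add(e.getKey() + "=" + e.getValue());
            } finally {
                latch.countDown();   // signal even if the write fails
            }
        });
    }
}
```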
  57. o.a.c.db.ReadCommand
      getRow(Table table)
      Subclasses: SliceByNamesReadCommand, SliceFromReadCommand
  58. o.a.c.db.IDiskAtomFilter
      getMemtableColumnIterator(...)
      getSSTableColumnIterator(...)
      Implementations: IdentityQueryFilter, NamesQueryFilter, SliceQueryFilter
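How the three filters select columns from a row can be sketched over a sorted map:

- NamesQueryFilter picks named columns. It is the filter behind SliceByNamesReadCommand.
- SliceQueryFilter walks a start..finish range with an optional reverse and a limit. It is the filter behind SliceFromReadCommand.
- IdentityQueryFilter is an unbounded, unlimited slice.

The class names here are hypothetical, strings stand in for ByteBuffers, and an empty start or finish is assumed to mean unbounded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedSet;

interface ColumnFilter {
    List<String> select(NavigableMap<String, String> row);   // names of the selected columns
}

// By-name lookup (NamesQueryFilter's role).
final class NamesFilter implements ColumnFilter {
    private final SortedSet<String> names;
    NamesFilter(SortedSet<String> names) { this.names = names; }

    public List<String> select(NavigableMap<String, String> row) {
        List<String> out = new ArrayList<>();
        for (String n : names) if (row.containsKey(n)) out.add(n);
        return out;
    }
}

// Range scan with optional reversal and a column limit (SliceQueryFilter's role).
// When reversed, start is the high bound and finish the low bound.
class SliceFilter implements ColumnFilter {
    final String start, finish;
    final boolean reversed;
    final int count;

    SliceFilter(String start, String finish, boolean reversed, int count) {
        this.start = start;
        this.finish = finish;
        this.reversed = reversed;
        this.count = count;
    }

    public List<String> select(NavigableMap<String, String> row) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        List<String> out = new ArrayList<>();
        for (String name : view.keySet()) {
            if (out.size() >= count) break;
            boolean beforeStart = !start.isEmpty()
                    && (reversed ? name.compareTo(start) > 0 : name.compareTo(start) < 0);
            if (beforeStart) continue;
            boolean pastFinish = !finish.isEmpty()
                    && (reversed ? name.compareTo(finish) < 0 : name.compareTo(finish) > 0);
            if (pastFinish) break;
            out.add(name);
        }
        return out;
    }
}

// Identity: an unbounded forward slice with no limit.
final class IdentityFilter extends SliceFilter {
    IdentityFilter() { super("", "", false, Integer.MAX_VALUE); }
}
```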
  59. Summary
      Layers: API, Dynamo, Database
      CustomTThreadPoolServer, Message.Dispatcher, CassandraServer, QueryProcessor, ReadCommand, StorageProxy, IResponseResolver, IAsyncCallback, MessagingService, IVerbHandler, Table, ColumnFamilyStore, IDiskAtomFilter
  60. Thanks.
  61. Aaron Morton, @aaronmorton, www.thelastpickle.com
      Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
