Cassandra Community Webinar: Apache Cassandra Internals

Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault-tolerant database. Cluster-wide operations track node membership, direct requests, and implement consistency guarantees. At the node level, the log-structured storage engine provides high-performance reads and writes. All of this is implemented in a Java code base that has matured greatly over the past few years.

In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for implementation.

Speaker: Aaron Morton, Apache Cassandra Committer
Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.

Transcript of "Cassandra Community Webinar: Apache Cassandra Internals"

  1. CASSANDRA COMMUNITY WEBINARS, AUGUST 2013. CASSANDRA INTERNALS. Aaron Morton, @aaronmorton, Co-Founder & Principal Consultant. www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.
  2. About The Last Pickle: Work with clients to deliver and improve Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP, Hector Maintainer, 6+ years combined Cassandra experience. Based in New Zealand & Austin, TX.
  3. Architecture; Code
  4. Cassandra Architecture: APIs; Cluster Aware; Cluster Unaware; Clients; Disk
  5. Cassandra Cluster Architecture: Clients. Node 1: APIs; Cluster Aware; Cluster Unaware; Disk. Node 2: APIs; Cluster Aware; Cluster Unaware; Disk.
  6. Dynamo Cluster Architecture: Clients. Node 1: APIs; Dynamo; Database; Disk. Node 2: APIs; Dynamo; Database; Disk.
  7. Architecture: API; Dynamo; Database
  8. API Transports: Thrift; Native Binary
  9. Thrift Transport: // Custom TServer implementations; o.a.c.thrift.CustomTThreadPoolServer; o.a.c.thrift.CustomTNonBlockingServer; o.a.c.thrift.CustomTHsHaServer
  10. API Transports: Thrift; Native Binary
  11. Native Binary Transport: Beta in Cassandra 1.2. Uses Netty. Enabled with start_native_transport (disabled by default).
  12. o.a.c.transport.Server.run(): // Set up the Netty server; new ExecutionHandler(); new NioServerSocketChannelFactory(); ServerBootstrap.setPipelineFactory()
  13. o.a.c.transport.Message.Dispatcher.messageReceived(): // Process message from client; ServerConnection.validateNewMessage(); Request.execute(); ServerConnection.applyStateTransition(); Channel.write()
  14. Messages: Defined in the Native Binary Protocol; $SRC/doc/native_protocol.spec
  15. API Services: JMX; Thrift; CQL 3
  16. JMX Management Beans: Spread around the code base. Interfaces named *MBean.
  17. JMX Management Beans: Registered with names such as org.apache.cassandra.db:type=StorageProxy
  18. API Services: JMX; Thrift; CQL 3
  19. o.a.c.thrift.CassandraServer: // Implements Thrift Interface; // Access control; // Input validation; // Mapping to/from Thrift and internal types
  20. Thrift Interface: Thrift IDL; $SRC/interface/cassandra.thrift
  21. o.a.c.thrift.CassandraServer.get_slice(): // Get columns for one row; Tracing.begin(); ClientState cState = state(); cState.hasColumnFamilyAccess(); multigetSliceInternal()
  22. CassandraServer.multigetSliceInternal(): // Get columns for many rows; ThriftValidation.validate*(); // Create ReadCommands; getSlice()
  23. CassandraServer.getSlice(): // Process ReadCommands; // Return Thrift types; readColumnFamily(); thriftifyColumnFamily()
  24. CassandraServer.readColumnFamily(): // Process ReadCommands; // Return ColumnFamilies; StorageProxy.read()
  25. API Services: JMX; Thrift; CQL 3
  26. o.a.c.cql3.QueryProcessor: // Prepares and executes CQL3 statements; // Used by Thrift & Native transports; // Access control; // Input validation; // Returns transport.ResultMessage
  27. CQL3 Grammar: ANTLR Grammar; $SRC/o.a.c.cql3/Cql.g
  28. o.a.c.cql3.statements.ParsedStatement: // Subclasses generated by ANTLR; // Tracks bound term count; // Prepare CQLStatement; prepare()
  29. o.a.c.cql3.statements.CQLStatement: checkAccess(ClientState state); validate(ClientState state); execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)
  30. statements.SelectStatement.RawStatement: // Implements ParsedStatement; // Input validation; prepare()
  31. statements.SelectStatement.execute(): // Create ReadCommands; StorageProxy.read()
  32. Architecture: API; Dynamo; Database
  33. Dynamo Layer: o.a.c.service; o.a.c.net; o.a.c.dht; o.a.c.gms; o.a.c.locator; o.a.c.stream
  34. o.a.c.service.StorageProxy: // Cluster wide storage operations; // Select endpoints & check CL available; // Send messages to Stages; // Wait for response; // Store Hints
  35. o.a.c.service.StorageService: // Ring operations; // Track ring state; // Start & stop ring membership; // Node & token queries
  36. o.a.c.service.IResponseResolver: preprocess(MessageIn<T> message); resolve() throws DigestMismatchException. Implementations: RowDigestResolver; RowDataResolver; RangeSliceResponseResolver
  37. Response Handlers / Callback: implements IAsyncCallback<T>; response(MessageIn<T> msg)
  38. o.a.c.service.ReadCallback.get(): // Wait for blockfor & data; condition.await(timeout, TimeUnit.MILLISECONDS); throw ReadTimeoutException(); resolver.resolve()
  39. o.a.c.service.StorageProxy.fetchRows(): getLiveSortedEndpoints(); new RowDigestResolver(); new ReadCallback(); MessagingService.sendRR(). Then: ReadCallback.get() // blocking; catch (DigestMismatchException ex); catch (ReadTimeoutException ex)
  40. Dynamo Layer: o.a.c.service; o.a.c.net; o.a.c.dht; o.a.c.gms; o.a.c.locator; o.a.c.stream
  41. o.a.c.net.MessagingService.Verb <<enum>>: MUTATION; READ; REQUEST_RESPONSE; TREE_REQUEST; TREE_RESPONSE; (and more...)
  42. o.a.c.net.MessagingService.verbHandlers: new EnumMap<Verb, IVerbHandler>(Verb.class)
  43. o.a.c.net.IVerbHandler<T>: doVerb(MessageIn<T> message, String id);
  44. o.a.c.net.MessagingService.verbStages: new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)
  45. o.a.c.net.MessagingService.receive(): runnable = new MessageDeliveryTask(message, id, timestamp); StageManager.getStage(message.getMessageType()); stage.execute(runnable);
  46. o.a.c.net.MessageDeliveryTask.run(): // If droppable and past rpc_timeout; MessagingService.incrementDroppedMessages(verb); MessagingService.getVerbHandler(verb); verbHandler.doVerb(message, id)
  47. Architecture: API Layer; Dynamo Layer; Database Layer
  48. Database Layer: o.a.c.concurrent; o.a.c.db; o.a.c.cache; o.a.c.io; o.a.c.trace
  49. o.a.c.concurrent.StageManager: stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class); getStage(Stage stage)
  50. o.a.c.concurrent.Stage: READ; MUTATION; GOSSIP; REQUEST_RESPONSE; ANTI_ENTROPY; (and more...)
  51. Database Layer: o.a.c.concurrent; o.a.c.db; o.a.c.cache; o.a.c.io; o.a.c.trace
  52. o.a.c.db.Table: // Keyspace; open(String table); getColumnFamilyStore(String cfName); getRow(QueryFilter filter); apply(RowMutation mutation, boolean writeCommitLog)
  53. o.a.c.db.ColumnFamilyStore: // Column Family; getColumnFamily(QueryFilter filter); getTopLevelColumns(...); apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
  54. o.a.c.db.IColumnContainer: addColumn(IColumn column); remove(ByteBuffer columnName). Implementations: ColumnFamily; SuperColumn
  55. o.a.c.db.ISortedColumns: addColumn(IColumn column, Allocator allocator); removeColumn(ByteBuffer name). Implementations: ArrayBackedSortedColumns; AtomicSortedColumns; TreeMapBackedSortedColumns
  56. o.a.c.db.Memtable: put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer); flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
  57. o.a.c.db.ReadCommand: getRow(Table table). Subclasses: SliceByNamesReadCommand; SliceFromReadCommand
  58. o.a.c.db.IDiskAtomFilter: getMemtableColumnIterator(...); getSSTableColumnIterator(...). Implementations: IdentityQueryFilter; NamesQueryFilter; SliceQueryFilter
  59. Summary: CustomTThreadPoolServer; Message.Dispatcher; CassandraServer; QueryProcessor; ReadCommand; StorageProxy; IResponseResolver; IAsyncCallback; MessagingService; IVerbHandler; Table; ColumnFamilyStore; IDiskAtomFilter. Layers: API; Dynamo; Database
  60. Thanks.
  61. Aaron Morton, @aaronmorton, Co-Founder & Principal Consultant. www.thelastpickle.com. Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License.
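
Slides 36 to 39 outline a coordinator that sends a read to several replicas, blocks until enough responses arrive or a timeout passes, and then resolves the responses, failing on a digest mismatch. The following is a minimal sketch of that pattern, not Cassandra's code. The class name, the two unchecked exception types, and the use of plain strings as responses are all assumptions made for illustration, and a `CountDownLatch` is swapped in for the `condition` object the slide mentions. It also assumes `blockFor` is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a read callback; not Cassandra's actual ReadCallback. */
class BlockingReadCallbackSketch {
    /** Unchecked stand-ins for the exception types named on the slides. */
    static class ReadTimeout extends RuntimeException {
        ReadTimeout(String msg) { super(msg); }
    }
    static class DigestMismatch extends RuntimeException {
        DigestMismatch(String msg) { super(msg); }
    }

    private final int blockFor;              // responses required before resolving
    private final long timeoutMillis;        // how long get() is willing to wait
    private final CountDownLatch latch;      // reaches zero after blockFor responses
    private final List<String> responses = new ArrayList<>();

    BlockingReadCallbackSketch(int blockFor, long timeoutMillis) {
        this.blockFor = blockFor;
        this.timeoutMillis = timeoutMillis;
        this.latch = new CountDownLatch(blockFor);
    }

    /** Called once per replica response, typically from a messaging thread. */
    void response(String payload) {
        synchronized (responses) {
            responses.add(payload);
        }
        latch.countDown();
    }

    /** Blocks until blockFor responses arrive, then resolves them. */
    String get() {
        boolean signalled;
        try {
            signalled = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new ReadTimeout("interrupted while waiting for responses");
        }
        if (!signalled) {
            throw new ReadTimeout("fewer than " + blockFor
                    + " responses within " + timeoutMillis + " ms");
        }
        return resolve();
    }

    /** Returns the agreed value, or fails if any response differs from the first. */
    private String resolve() {
        synchronized (responses) {
            String first = responses.get(0);
            for (String r : responses) {
                if (!r.equals(first)) {
                    throw new DigestMismatch("replica responses disagree");
                }
            }
            return first;
        }
    }
}
```

In this sketch a caller would create the callback, hand it to whatever delivers replica responses, and then call `get()`, catching the mismatch and timeout cases separately, which mirrors the two `catch` clauses listed on slide 39.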
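
Slides 41 to 50 describe how an incoming message is routed: a verb is looked up in one `EnumMap` to find its handler and in another to find its stage, and the stage's thread pool runs a delivery task that drops the message if it is already too old. The sketch below illustrates that routing under simplifying assumptions. All names are hypothetical, messages are plain strings, every verb is treated as droppable, each stage is a single-threaded executor, and the stage for a verb is assumed to share its name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of verb-to-handler and verb-to-stage dispatch. */
class VerbDispatchSketch {
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }
    enum Stage { MUTATION, READ, REQUEST_RESPONSE }

    /** Simplified handler interface; the message is just a string here. */
    interface VerbHandler {
        void doVerb(String message, String id);
    }

    private final Map<Verb, VerbHandler> verbHandlers = new EnumMap<>(Verb.class);
    private final Map<Verb, Stage> verbStages = new EnumMap<>(Verb.class);
    private final Map<Stage, ExecutorService> stages = new EnumMap<>(Stage.class);
    private final long timeoutMillis;

    final List<String> handled = Collections.synchronizedList(new ArrayList<>());
    final AtomicInteger dropped = new AtomicInteger();

    VerbDispatchSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        for (Stage s : Stage.values()) {
            stages.put(s, Executors.newSingleThreadExecutor());
        }
        for (Verb v : Verb.values()) {
            // Assumption: each verb runs on the stage with the same name.
            verbStages.put(v, Stage.valueOf(v.name()));
            verbHandlers.put(v, (msg, id) -> handled.add(v + ":" + id + ":" + msg));
        }
    }

    /** Wraps the message in a delivery task and submits it to the verb's stage. */
    void receive(Verb verb, String message, String id, long timestampMillis) {
        VerbHandler handler = verbHandlers.get(verb);
        ExecutorService stage = stages.get(verbStages.get(verb));
        stage.execute(() -> {
            // Drop messages that waited longer than the timeout before running.
            if (System.currentTimeMillis() - timestampMillis > timeoutMillis) {
                dropped.incrementAndGet();
                return;
            }
            handler.doVerb(message, id);
        });
    }

    /** Stops all stages and waits briefly for queued tasks to finish. */
    void shutdownAndWait() {
        for (ExecutorService e : stages.values()) {
            e.shutdown();
        }
        try {
            for (ExecutorService e : stages.values()) {
                e.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Keeping handlers and stages in separate maps means the work done for a verb and the thread pool it runs on can be changed independently, which appears to be the point of the two `EnumMap` slides.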