Cassandra Codebase 2011: Getting to know the codebase
Gary Dusbabek (@gdusbabek)
Questions?
Outline
  • How to contribute
  • Internals
  • Some thoughts
How to Contribute
How to Contribute
  • http://wiki.apache.org/cassandra/HowToContribute
  • JIRA: “lhf” label (Low hanging fruit)
  • Scratch your itch
How to Contribute
  • Run the tests
      ant test
      nosetests test/system/test_thrift_server.py
How to Contribute
  • http://wiki.apache.org/cassandra/CodeStyle
  • Avoid:
      • Reformatting white space
      • Renaming things everywhere
      • Unrelated changes
How to Contribute
  • Use git
  • Attach patches (git format-patch) as JIRA attachments
  • Group them sensibly
How to Contribute
  • Someone will review your code
      • Usually a committer
  • Persistence helps
  • Don’t get your feelings hurt
  • It usually takes a few rounds
How to Contribute
  • Participate!
      • #cassandra-dev on freenode
      • dev@cassandra.apache.org
Internals
Services
  • Ring Operations (StorageService)
  • Storage Operations (StorageProxy)
Startup Sequence
  • bin/cassandra
      • Finds cassandra.in.sh
          • $CLASSPATH (mandatory)
          • $CASSANDRA_HOME
          • $CASSANDRA_CONF (mandatory)
      • Executes $CASSANDRA_CONF/cassandra-env.sh
          • Sets heap sizes (gc tuning goes here!)
o.a.c.thrift.CassandraDaemon (o.a.c = org.apache.cassandra)
AbstractCassandraDaemon
  • ACD.setup():
      • Reads configuration: DatabaseDescriptor
      • Loads schema: DD.loadSchemas()
      • Scrub directories
      • Initialize storage (keyspaces + CFs)
      • Commit log recovery: CL.recover()
      • StorageService.initServer() -> StorageService.joinTokenRing()
Attn Tinkerers!
  • Abstracted initialization of transport.
  • Handy if you’re experimenting with transports/RPC.
  • Just extend AbstractCassandraDaemon and make sure that class is started up via bin/cassandra.
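The extension point on this slide is essentially a template-method pattern: shared setup in a base class, transport start-up in a subclass. The sketch below illustrates only that shape. SketchDaemon, EchoTransportDaemon, and the method names are hypothetical stand-ins, not the actual AbstractCassandraDaemon API; the step names are taken from the ACD.setup() slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an abstract daemon: shared setup lives in the
// base class, and the transport is left to subclasses.
abstract class SketchDaemon {
    protected final List<String> log = new ArrayList<>();

    // Common initialization, in the order listed on the setup() slide.
    final void setup() {
        log.add("read configuration");
        log.add("load schema");
        log.add("scrub directories");
        log.add("initialize storage");
        log.add("recover commit log");
        log.add("join token ring");
    }

    // Subclasses plug in their own transport/RPC server here.
    abstract void startServer();

    // Runs the shared setup, then the subclass-specific transport start-up.
    final List<String> activate() {
        setup();
        startServer();
        return log;
    }
}

// An experimental transport only has to supply the server start-up.
class EchoTransportDaemon extends SketchDaemon {
    @Override
    void startServer() {
        log.add("start echo transport");
    }
}
```

Calling new EchoTransportDaemon().activate() returns the six shared steps followed by the transport step.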
o.a.c.thrift.CassandraServer
  • Implements thrift interface methods (the API).
  • Start here when trying to understand the read/write path and RPC.
Configuration
  • DatabaseDescriptor
      • Side-effect of ACD.setup()
      • Reads config settings from yaml
      • Defines system tables
      • Changes regularly
      • I hate this code. Please fix it.
Main Singletons
  • StorageService
  • StorageProxy
  • MessagingService
  • CompactionManager
  • StageManager
  • MigrationManager
Did you just say ‘Singletons?’
Main Singletons
  • StorageService
  • StorageProxy
  • MessagingService
  • CompactionManager
  • StageManager
  • MigrationManager
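For readers less familiar with the idiom, here is a minimal sketch of an eagerly initialized Java singleton. RingService is a hypothetical class; the slide does not say which exact form each of the listed classes uses.

```java
// Hypothetical service illustrating the eagerly initialized singleton idiom:
// one shared instance reachable through a static final field.
final class RingService {
    static final RingService instance = new RingService();

    private int operations = 0;

    private RingService() {
        // Private constructor: nothing outside this class can create another.
    }

    // Shared mutable state is why singletons need care with concurrency
    // and make isolated testing harder.
    synchronized int recordOperation() {
        return ++operations;
    }
}
```

Every caller that goes through RingService.instance sees the same counter, which is convenient for wiring but couples callers to global state.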
JMX MBeans
  • Tooling supplied by MBeans
  • Anything that does measurable/configurable work is tooled
      • Thread pools
      • Compaction
      • Hinted handoff
      • Streaming
      • Storage
      • Commit log
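As a rough illustration of how a component gets "tooled", the sketch below registers a standard MBean with the JVM's platform MBean server and reads one attribute back. It uses only the standard javax.management API. The bean, its attribute, and the sketch.example object name are hypothetical and do not correspond to any MBean Cassandra actually exposes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxSketch {
    // Standard MBean convention: the interface name is the class name + "MBean".
    public interface CompactionStatsMBean {
        int getPendingTasks();
    }

    public static class CompactionStats implements CompactionStatsMBean {
        private volatile int pendingTasks;

        void setPendingTasks(int n) { pendingTasks = n; }

        @Override
        public int getPendingTasks() { return pendingTasks; }
    }

    // Registers the bean, then reads the attribute back through the MBean
    // server, the same route a JMX client would take.
    static int registerAndRead(int pending) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("sketch.example:type=CompactionStats");
            CompactionStats stats = new CompactionStats();
            stats.setPendingTasks(pending);
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(stats, name);
            try {
                // The getter getPendingTasks() is exposed as attribute "PendingTasks".
                return (Integer) server.getAttribute(name, "PendingTasks");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX sketch failed", e);
        }
    }
}
```

While the bean is registered, a JMX client such as jconsole would show it under the sketch.example domain.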
StorageService
  • initServer() -> joinTokenRing()
      • Starts gossip
      • Starts MessagingService
      • Negotiates bootstrap
  • Many ring operations live here.
  • Repository of ring topology
      • TokenMetadata (quasi-singleton via SS.tokenMetadata_)
      • Partitioner instance is also here
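To make "ring topology" concrete, here is a simplified sketch of a token ring lookup. It assumes numeric tokens and that each node owns the range ending at its own token. TokenRingSketch and its methods are hypothetical and much simpler than the real TokenMetadata and partitioner classes.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical token ring: each node owns the range ending at its
// token, so a key belongs to the first node whose token is >= the key's token.
class TokenRingSketch {
    private final TreeMap<Long, String> tokenToEndpoint = new TreeMap<>();

    void addNode(long token, String endpoint) {
        tokenToEndpoint.put(token, endpoint);
    }

    String primaryEndpoint(long keyToken) {
        if (tokenToEndpoint.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> owner = tokenToEndpoint.ceilingEntry(keyToken);
        // Past the highest token, wrap around to the first node on the ring.
        return owner != null ? owner.getValue() : tokenToEndpoint.firstEntry().getValue();
    }
}
```

With nodes at tokens 0, 100, and 200, a key token of 50 maps to the node at 100, and a key token of 250 wraps around to the node at 0.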
MessagingService
  • Verb handlers live here (initialized from SS).
      • Main event handlers; haven’t changed much.
  • Socket listener
      • 2 threads per ring node
  • Message gateway
      • Emitted from MessageProducer impls
      • MS.sendRR()
      • MS.sendOneWay()
      • MS.receive()
  • Messages are versioned now (0.8)
      • IncomingTCPConnection
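The verb-handler idea can be sketched as a registry that maps each verb to a handler and dispatches incoming messages. Everything here (VerbDispatchSketch, the Verb values, the String payloads) is a hypothetical simplification and not the MessagingService API; real messages are serialized and arrive over sockets.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical message gateway: handlers are registered per verb, and an
// incoming message is dispatched to the handler for its verb.
class VerbDispatchSketch {
    enum Verb { MUTATION, READ }

    interface VerbHandler {
        String doVerb(String payload);
    }

    private final Map<Verb, VerbHandler> handlers = new HashMap<>();

    void registerVerbHandler(Verb verb, VerbHandler handler) {
        handlers.put(verb, handler);
    }

    // Looks up the handler for the message's verb and runs it.
    String receive(Verb verb, String payload) {
        VerbHandler handler = handlers.get(verb);
        if (handler == null) {
            throw new IllegalArgumentException("no handler registered for " + verb);
        }
        return handler.doVerb(payload);
    }
}
```

Registering handlers once at start-up and dispatching by verb keeps the network layer unaware of what each message type does.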
StorageProxy
  • Top level of all read/write operations
  • Called from o.a.c.thrift.CassandraServer
  • Write path changed because of counters
      • Notion of WritePerformer
  • Eventually to Table and ColumnFamilyStore
  • Further, to SSTable and related classes.
StageManager
  • Fancy Java ThreadPoolExecutor
  • SEDA: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
  • Consumes callables from a queue.
  • Manages concurrency.
  • Hasn’t changed much.
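A minimal sketch of the staged idea, using only java.util.concurrent: each named stage gets its own queue and thread pool, and work is submitted as callables. StageSketch, its Stage values, and the pool sizes are assumptions for illustration; the real StageManager is configured differently.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class StageSketch {
    enum Stage { READ, MUTATION }

    // One bounded-concurrency executor per stage, each with its own queue.
    private final Map<Stage, ThreadPoolExecutor> stages = new EnumMap<>(Stage.class);

    StageSketch(int threadsPerStage) {
        for (Stage stage : Stage.values()) {
            stages.put(stage, new ThreadPoolExecutor(
                    threadsPerStage, threadsPerStage,
                    60L, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<Runnable>()));
        }
    }

    <T> Future<T> submit(Stage stage, Callable<T> task) {
        return stages.get(stage).submit(task);
    }

    void shutdown() {
        for (ThreadPoolExecutor executor : stages.values()) {
            executor.shutdown();
        }
    }

    // Runs one task on the READ stage and waits for its result.
    static int demo() {
        StageSketch sketch = new StageSketch(2);
        try {
            return sketch.submit(Stage.READ, () -> 21 * 2).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            sketch.shutdown();
        }
    }
}
```

Separate pools mean a backlog in one stage does not consume the threads of another, which is the concurrency control the slide refers to.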
Adding API Methods
  • Define method + structures in IDL
      • interface/cassandra.thrift
  • Regenerate files
      • ant gen-thrift-java gen-thrift-py
  • Implement stubs: o.a.c.thrift.CassandraServer
  • Create a system test
      • test/system/test_thrift_server.py
Reading
  • Socket -> CassandraServer
      • Permissions
      • Request validation
      • Marshalling
  • ReadCommands created in CS.multigetSliceInternal, passed to StorageProxy
      • 1 per key
Reading
  • StorageProxy.read(), fetchRows()
  • For each ReadCommand
      • Determine endpoints
      • Local & remote branches
Reading
  • StorageProxy local
      • READ stage executes a LocalReadRunnable
      • True read vs digest
  • Table, ColumnFamilyStore
      • CFS.getTopLevelColumns
      • Make QueryFilter
      • Query Memtables
      • Query SSTables
      • Coalesce in iterators
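The last three steps (query memtables, query SSTables, coalesce) can be sketched as a merge in which the newest timestamp wins for each column. This sketch merges through a map rather than the iterators the slide mentions, and Column and merge are hypothetical names, so treat it as an illustration of the outcome and not of the actual code path.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReadMergeSketch {
    // Hypothetical column: a name, a value, and the write timestamp.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merges column lists from several sources (for example a memtable and
    // one or more SSTables), keeping the newest version of each column.
    @SafeVarargs
    static Map<String, Column> merge(List<Column>... sources) {
        Map<String, Column> merged = new TreeMap<>();
        for (List<Column> source : sources) {
            for (Column column : source) {
                Column existing = merged.get(column.name);
                if (existing == null || column.timestamp > existing.timestamp) {
                    merged.put(column.name, column);
                }
            }
        }
        return merged;
    }
}
```

Because the comparison is by timestamp, the order in which sources are passed does not change which version of a column survives.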
Reading
  • StorageProxy remote
      • Read command
      • Response handler
      • Send to remote nodes
  • Read repair happens in SP.fetchRows().
Writing
  • CS.doInsert()
      • Marshalling, creates RMs (RowMutations)
  • StorageProxy
      • Local/remote branch
      • SP.sendToHintedEndpoints()
  • RowMutation
      • One key per RowMutation (several CFs)
  • ColumnFamily
      • Collection of column modifications
Writing
  • RM.apply -> Table.apply
      • Write to CL (commit log)
      • Iterate over RM CFs
          • CFS.apply()
  • Overwrites results on pre-existing column families
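A minimal sketch of the ordering on this slide: record the mutation in the commit log first, then apply it to each column family it touches. WritePathSketch, its in-memory list standing in for the commit log, and the nested maps standing in for memtables are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WritePathSketch {
    // Stand-in for the commit log: an append-only list of mutations.
    final List<String> commitLog = new ArrayList<>();
    // Column family name -> (row key -> value); a stand-in for per-CF memtables.
    final Map<String, Map<String, String>> memtables = new HashMap<>();

    // A mutation targets one row key and may touch several column families.
    void apply(String rowKey, Map<String, String> valueByColumnFamily) {
        // 1. Durability first: record the whole mutation in the commit log.
        commitLog.add(rowKey + "=" + valueByColumnFamily);
        // 2. Then apply the change to each column family's memtable.
        for (Map.Entry<String, String> change : valueByColumnFamily.entrySet()) {
            memtables.computeIfAbsent(change.getKey(), cf -> new HashMap<>())
                     .put(rowKey, change.getValue());
        }
    }
}
```

Writing the log entry before touching the memtables is what lets commit log recovery replay mutations after a crash.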
Writing
  • RM is serialized into a Message and sent to other nodes
  • Waits for ACKs depending on CL (consistency level)
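Waiting for acknowledgements can be sketched with a countdown latch sized by the consistency level. The three levels and the quorum formula (replication factor / 2 + 1) are the commonly documented ones; AckWaitSketch and its methods are hypothetical and ignore hints, timeouts per replica, and failure handling.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class AckWaitSketch {
    enum ConsistencyLevel { ONE, QUORUM, ALL }

    // Number of replica acknowledgements to wait for.
    static int blockFor(ConsistencyLevel level, int replicationFactor) {
        switch (level) {
            case ONE:    return 1;
            case QUORUM: return replicationFactor / 2 + 1;
            case ALL:    return replicationFactor;
            default:     throw new IllegalArgumentException("unknown level " + level);
        }
    }

    private final CountDownLatch pending;

    AckWaitSketch(ConsistencyLevel level, int replicationFactor) {
        pending = new CountDownLatch(blockFor(level, replicationFactor));
    }

    // Called once for each replica that acknowledges the write.
    void ack() {
        pending.countDown();
    }

    // Returns true if enough acks arrived before the timeout.
    boolean await(long timeoutMillis) {
        try {
            return pending.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a replication factor of 3, QUORUM waits for two acknowledgements, so the write can succeed while one replica is slow or down.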
Challenges
Challenges
  • To have an in-depth understanding of everything.
      • Hard for hobbyists/part-timers
      • Outside of DataStax, little support for full-timers
  • Still changing fast
      • Keeping up
Challenge: Lines of Code
  Release   Date        Size
  0.4       Sep 2009    52 kloc
  0.5       Jan 2010    59 kloc
  0.6       Apr 2010    73 kloc
  0.7       Jan 2011    122 kloc
  0.8       Jun 2011    146 kloc
  Trunk     yesterday   149 kloc
  Average: 4,500 lines per month
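The slide does not say which span the average covers. Assuming it runs from 0.4 (Sep 2009) to 0.8 (Jun 2011), which is 21 months, the arithmetic below gives a figure close to the quoted 4,500. GrowthSketch is a hypothetical helper written only for this check.

```java
class GrowthSketch {
    // Average lines added per month between two releases, rounded to the
    // nearest line. Sizes are in thousands of lines (kloc).
    static long linesPerMonth(int startKloc, int endKloc, int months) {
        return Math.round((endKloc - startKloc) * 1000.0 / months);
    }
}
```

For example, linesPerMonth(52, 146, 21) works out to 94,000 / 21, or roughly 4,476 lines per month.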
Challenges
  • Codewise
      • Growing pains
      • Software maturity
      • Decisions made early on

Cassandra Codebase 2011

  • 1. Codebase 2011 Getting to know the codebase Gary Dusbabek @gdusbabek
  • 3. Outline How to contribute Internals Some thoughts
  • 5. How to Contribute http://wiki.apache.org/cassandra/HowToContribute JIRA: “lhf” label (Low hanging fruit) Scratch your itch
  • 6. How to Contribute Run the tests ant test nosetests test/system/test_thrift_server.py
  • 7. How to Contribute http://wiki.apache.org/cassandra/CodeStyle Avoid: Reformatting white space Renaming things everywhere Unrelated changes
  • 8. How to Contribute Use git Attach patches git format-patch as jira attachments. Group them sensibly
  • 9. How to Contribute Someone will review your code Usually a committer Persistence helps Don’t get your feelings hurt It usually takes a few rounds
  • 10. How to Contribute Participate! #cassandra-dev on freenode dev@cassandra.apache.org
  • 12. Services Ring Operations (StorageService) Storage Operations (StorageProxy)
  • 13. Startup Sequence bin/cassandra Finds cassandra.in.sh $CLASSPATH (mandatory) $CASSANDRA_HOME $CASSANDRA_CONF (mandatory) Executes $CASSANDRA_CONF/cassandra-env.sh Sets heap sizes (gc tuning goes here!)
  • 15. AbstractCassandraDaemon ACD.setup(): Reads configuration: DatabaseDescriptor Loads schema: DD.loadSchemas() Scrub directories Initialize storage (keyspaces + CFs) Commit log recovery: CL.recover() StorageService.initServer() -> StorageService.joinTokenRing()
  • 16. Attn Tinkerers! Abstracted initialization of transport. Handy if you’re experimenting with transports/RPC Just extend AbstractCassandraDaemon and make sure that class is started up via bin/cassandra.
  • 17. o.a.c.thrift.CassandraServer Implements thrift interface methods (the API). Start here when trying to understand the read/write path and RPC.
  • 18. Configuration DatabaseDescriptor Side-effect of ACD.setup() Reads config settings from yaml Defines system tables Changes regularly I hate this code. Please fix it.
  • 19. Main Singletons StorageService StorageProxy MessagingService CompactionManager StageManager MigrationManager
  • 20. Did you just say ‘Singletons?’
  • 21. Main Singletons StorageService StorageProxy MessagingService CompactionManager StageManager MigrationManager
  • 22. JMX MBeans Tooling supplied by MBeans Anything that does measurable/configurable work is tooled Thread pools Compaction Hinted handoff Streaming Storage Commit log
  • 23. StorageService initServer() -> joinTokenRing() Starts gossip Starts MessagingService Negotiates bootstrap Many ring operations live here. Repository of ring topology TokenMetadata (quasi-singleton via SS.tokenMetadata_) Partitioner instance is also here
  • 24. MessagingService Verb handlers live here (initialized from SS). Main event handlers, haven’t changed much. Socket listener 2 threads per ring node Message gateway emitted from MessageProducer impls MS.sendRR() MS.sendOneWay() MS.receive() Messages are versioned now (0.8) IncomingTCPConnection
  • 25. StorageProxy Top level of all read/write operations Called from o.a.c.thrift.CassandraServer Write path changed because of counters Notion of WritePerformer Eventually to Table and ColumnFamilyStore Further, to SSTable and related classes.
  • 26. StageManager Fancy java ThreadPoolExecutor SEDA: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf consumes callables from a queue. Manages concurrency. Hasn’t changed much.
  • 27. Adding API Methods Define method+structures in IDL interface/cassandra.thrift Regenerate files ant gen-thrift-java gen-thrift-py Implement stubs: o.a.c.thrift.CassandraServer Create a system test test/system/test_thrift_server.py
  • 28. Reading Socket->CassandraServer Permissions Request validation Marshalling ReadCommands created in CS.multigetSliceInternal, passed to StorageProxy 1 per key
  • 29. Reading StorageProxy.read(), fetchRows() For each ReadCommand Determine endpoints Local & remote branches
  • 30. Reading StorageProxy local READ stage executes a LocalReadRunnable True read vs digest Table, ColumnFamilyStore CFS.getTopLevelColumns Make QueryFilter Query Memtables Query SSTables Coalesce in iterators
  • 31. Reading StorageProxy remote read command Response handler Send to remote nodes Read repair happens in SP.fetchRows().
  • 32. Writing CS.doInsert() Marshalling, creates RMs StorageProxy local/remote branch SP.sendToHintedEndpoints() RowMutation one Key per (several CFs) ColumnFamily Collection of column modifications
  • 33. Writing RM.apply->Table.apply Write to CL Iterate over RM CFs CFS.apply() Overwrites results on pre-existing column families
  • 34. Writing RM is serialized into a Message and sent to other nodes Waits for ACKs depending on CL
  • 36. Challenges To have an in-depth understanding of everything. Hard for hobbyist/part-timers Outside of Datastax, little support for full-timers Still changing fast Keeping up
  • 37. Challenge: Lines of Code 0.4 (Sep 2009): 52 kloc; 0.5 (Jan 2010): 59 kloc; 0.6 (Apr 2010): 73 kloc; 0.7 (Jan 2011): 122 kloc; 0.8 (Jun 2011): 146 kloc; Trunk (yesterday): 149 kloc. Average: 4,500 lines per month
  • 38. Challenges Codewise Growing pains Software maturity Decisions made early on
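
The reading slides (28 to 31) end with "Query Memtables, Query SSTables, Coalesce in iterators". As a rough illustration of that last step only, here is a minimal sketch that merges columns from several sources and keeps the newest version of each. The class ReadMergeSketch, the Column type, and the coalesce method are all hypothetical names invented for this example, and picking the winner by highest timestamp is an assumption about how versions are reconciled; none of this is the actual CFS.getTopLevelColumns code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not Cassandra's real classes or method names.
public class ReadMergeSketch {

    // A simplified column: name, value, and a write timestamp.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Merge columns from every source (memtable first, then SSTables).
    // Assumption: when a name appears more than once, the highest timestamp wins.
    static Map<String, Column> coalesce(List<List<Column>> sources) {
        Map<String, Column> merged = new TreeMap<>(); // sorted by column name
        for (List<Column> source : sources) {
            for (Column candidate : source) {
                Column current = merged.get(candidate.name);
                if (current == null || candidate.timestamp > current.timestamp) {
                    merged.put(candidate.name, candidate);
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The memtable holds the most recent write for "email".
        List<Column> memtable = Arrays.asList(new Column("email", "new@example.com", 20L));
        // An older SSTable holds a stale "email" and the only copy of "name".
        List<Column> sstable = Arrays.asList(
                new Column("email", "old@example.com", 10L),
                new Column("name", "gary", 5L));

        Map<String, Column> row = coalesce(Arrays.asList(memtable, sstable));
        for (Column c : row.values()) {
            System.out.println(c.name + " = " + c.value + " @ " + c.timestamp);
        }
    }
}
```

The sketch materializes everything into a map for brevity; the slide's wording suggests the real code merges lazily through iterators instead.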

Editor's Notes

  1. Who was here last year? Very good presentations on data modeling and capacity planning.
  2. Turn it around. Ask questions first.
  3. Transport still not initialized though. DD getting loaded is just a side-effect.
  4. This is actually a good exercise.
  5. Good place to extend and experiment on your own.
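
For readers who want something concrete to experiment with, slide 26 describes StageManager as a "fancy java ThreadPoolExecutor" that consumes callables from a queue. The following is a minimal sketch of that idea under stated assumptions: the class StageSketch, the two stage names, and the thread counts are hypothetical choices made for this example, not the real StageManager API or its configuration.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of SEDA-style stages: not Cassandra's StageManager.
public class StageSketch {

    // Stage names chosen for illustration only.
    enum Stage { READ, MUTATION }

    private static final Map<Stage, ExecutorService> STAGES = new ConcurrentHashMap<>();

    static {
        // Each stage gets its own fixed-size pool and its own work queue,
        // so concurrency is bounded per kind of work (thread counts are assumptions).
        STAGES.put(Stage.READ, newStage(4));
        STAGES.put(Stage.MUTATION, newStage(2));
    }

    private static ExecutorService newStage(int threads) {
        return new ThreadPoolExecutor(
                threads, threads,            // core and maximum pool size
                60L, TimeUnit.SECONDS,       // keep-alive for idle threads
                new LinkedBlockingQueue<Runnable>()); // unbounded task queue
    }

    // Queue a task on the named stage; the caller receives a Future for the result.
    static <T> Future<T> submit(Stage stage, Callable<T> task) {
        return STAGES.get(stage).submit(task);
    }

    // Stop all stages so the JVM can exit.
    static void shutdown() {
        for (ExecutorService stage : STAGES.values()) {
            stage.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Future<String> result = submit(Stage.READ, () -> "row-for-key-42");
        System.out.println(result.get());
        shutdown();
    }
}
```

Giving each kind of work its own pool and queue is the point of the staged design: a backlog of slow mutations fills the MUTATION queue without starving the READ threads.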