SlideShare a Scribd company logo
1 of 38
Codebase 2011 Getting to know the codebase Gary Dusbabek @gdusbabek
Questions?
Outline How to contribute Internals Some thoughts
How to Contribute
How to Contribute http://wiki.apache.org/cassandra/HowToContribute JIRA: “lhf” label (Low hanging fruit) Scratch your itch
How to Contribute Run the tests ant test nosetests test/system/test_thrift_server.py
How to Contribute http://wiki.apache.org/cassandra/CodeStyle Avoid: Reformatting white space Renaming things everywhere Unrelated changes
How to Contribute Use git Attach patches git format-patch as jira attachments. Group them sensibly
How to Contribute Someone will review your code Usually a committer Persistence helps Don’t get your feelings hurt It usually takes a few rounds
How to Contribute Participate! #cassandra-dev on freenode dev@cassandra.apache.org
Internals
Services Ring Operations (StorageService) Storage Operations (StorageProxy)
Startup Sequence bin/cassandra Finds cassandra.in.sh $CLASSPATH (mandatory) $CASSANDRA_HOME $CASSANDRA_CONF (mandatory) Executes $CASSANDRA_CONF/cassandra-env.sh Sets heap sizes (gc tuning goes here!)
o.a.c.thrift.CassandraDaemon
AbstractCassandraDaemon ACD.setup(): Reads configuration: DatabaseDescriptor Loads schema: DD.loadSchemas() Scrub directories Initialize storage (keyspaces + CFs) Commit log recovery: CL.recover() StorageService.initServer() -> StorageService.joinTokenRing()
Attn Tinkerers! Abstracted initialization of transport. Handy if you’re experimenting with transports/RPC Just extend AbstractCassandraDaemon and make sure that class is started up via bin/cassandra.
o.a.c.thrift.CassandraServer Implements thrift interface methods (the API). Start here when trying to understand the read/write path and RPC.
Configuration DatabaseDescriptor Side-effect of ACD.setup() Reads config settings from yaml Defines system tables Changes regularly I hate this code.  Please fix it.
Main Singletons StorageService StorageProxy MessagingService CompactionManager StageManager MigrationManager
Did you just say ‘Singletons?’
Main Singletons StorageService StorageProxy MessagingService CompactionManager StageManager MigrationManager
JMX MBeans Tooling supplied by Mbeans Anything that does measureable/configurable work is tooled Thread pools Compaction Hinted handoff Streaming Storage Commit log
StorageService initServer() -> joinTokenRing() Starts gossip Starts MessagingService Negotiates bootstrap Many ring operations live here. Repository of ring topology TokenMetadata (quasi-singleton via SS.tokenMetadata_) Partitioner instance is also here
MessagingService Verb handlers live here (initialized from SS). Main event handlers, haven’t changed much. Socket listener 2 threads per ring node Message gateway emitted from MessageProducerimpls MS.sendRR() MS.sendOneWay() MS.receive() Messages are versioned now (0.8) IncomingTCPConnection
StorageProxy Top level of all read/write operations Called from o.a.c.thrift.CassandraServer Write path changed because of counters Notion of WritePerformer Eventually to Table and ColumnFamilyStore Further, to SSTable and related classes.
StageManager Fancy java ThreadPoolExecutor SEDA:  http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf consumes callables from a queue. Manages concurrency. Hasn’t changed much.
Adding API Methods Define method+structures in IDL interface/cassandra.thrift Regenerate files ant gen-thrift-java gen-thrift-py Implement stubs: o.a.c.thrift.CassandraServer Create a system test tests/system/test_thrift_server.py
Reading Socket->CassandraServer Permissions Request validation Marshalling ReadCommands created in CS.multigetSliceInternal, passed to StorageProxy 1 per key
Reading StorageProxy.read(), fetchRows() For each ReadCommand Determine endpoints Local & remote branches
Reading StorageProxy local READ stage executes a LocalReadRunnable True read vs digest Table, ColumnFamilyStore CFS.getTopLevelColumns Make QueryFilter Query Memtables Query SSTables Coalesce in iterators
Reading StorageProxy remote read command Response handler Send to remote nodes Read repair happens in SP.fetchRows().
Writing CS.doInsert() Marshalling, creates RMs StorageProxy local/remote branch SP.sendToHintedEndpoints() RowMutation one Key per (several CFs) ColumnFamily Collection of column modifications
Writing RM.apply->Table.apply Write to CL Iterate over RM CFs CFS.apply() Overwrites results on pre-existing column families
Writing RM is serialized into a Message and sent to other nodes Waits for ACKs depending on CL
Challenges
Challenges To have an in-depth understanding of everything. Hard for hobbyist/part-timers Outside of Datastax, little support for full-timers Still changing fast Keeping up
Challenge: Lines of Code 0.4 (Sep 2009) 52 kloc 0.5 (Jan 2010) 59 kloc 0.6 (Apr 2010) 73 kloc 0.7 (Jan 2011) 122 kloc 0.8 (Jun 2011) 146 kloc Trunk (yesterday) 149 kloc Average: 4,500 lines per month
Challenges Codewise Growing pains Software maturity Decisions made early on

More Related Content

What's hot

使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡Lawrence Huang
 
Gude for C++11 in Apache Traffic Server
Gude for C++11 in Apache Traffic ServerGude for C++11 in Apache Traffic Server
Gude for C++11 in Apache Traffic ServerApache Traffic Server
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQXin Wang
 
Asynchronous Orchestration DSL on squbs
Asynchronous Orchestration DSL on squbsAsynchronous Orchestration DSL on squbs
Asynchronous Orchestration DSL on squbsAnil Gursel
 
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.XCassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.Xaaronmorton
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparisonshsedghi
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Xavier Lucas
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redisDaeMyung Kang
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net DriverDataStax Academy
 
Logging for OpenStack - Elasticsearch, Fluentd, Logstash, Kibana
Logging for OpenStack - Elasticsearch, Fluentd, Logstash, KibanaLogging for OpenStack - Elasticsearch, Fluentd, Logstash, Kibana
Logging for OpenStack - Elasticsearch, Fluentd, Logstash, KibanaMd Safiyat Reza
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Reactivesummit
 
Cassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisCassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisDuyhai Doan
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaAkara Sucharitakul
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresDataStax Academy
 
Specs2 whirlwind tour at Scaladays 2014
Specs2 whirlwind tour at Scaladays 2014Specs2 whirlwind tour at Scaladays 2014
Specs2 whirlwind tour at Scaladays 2014Eric Torreborre
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper Omid Vahdaty
 
Testing Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitTesting Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitMarkus Günther
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 Peopleconfluent
 

What's hot (20)

使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡使用ZooKeeper打造軟體式負載平衡
使用ZooKeeper打造軟體式負載平衡
 
Gude for C++11 in Apache Traffic Server
Gude for C++11 in Apache Traffic ServerGude for C++11 in Apache Traffic Server
Gude for C++11 in Apache Traffic Server
 
Realtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQRealtime Statistics based on Apache Storm and RocketMQ
Realtime Statistics based on Apache Storm and RocketMQ
 
Asynchronous Orchestration DSL on squbs
Asynchronous Orchestration DSL on squbsAsynchronous Orchestration DSL on squbs
Asynchronous Orchestration DSL on squbs
 
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.XCassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
Cassandra SF Meetup - CQL Performance With Apache Cassandra 3.X
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparison
 
Apache Zookeeper
Apache ZookeeperApache Zookeeper
Apache Zookeeper
 
Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28Openstack meetup lyon_2017-09-28
Openstack meetup lyon_2017-09-28
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Introduction to .Net Driver
Introduction to .Net DriverIntroduction to .Net Driver
Introduction to .Net Driver
 
Logging for OpenStack - Elasticsearch, Fluentd, Logstash, Kibana
Logging for OpenStack - Elasticsearch, Fluentd, Logstash, KibanaLogging for OpenStack - Elasticsearch, Fluentd, Logstash, Kibana
Logging for OpenStack - Elasticsearch, Fluentd, Logstash, Kibana
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Ka...
 
Cassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS ParisCassandra NodeJS driver & NodeJS Paris
Cassandra NodeJS driver & NodeJS Paris
 
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & KafkaBack-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
Back-Pressure in Action: Handling High-Burst Workloads with Akka Streams & Kafka
 
PagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra FailuresPagerDuty: One Year of Cassandra Failures
PagerDuty: One Year of Cassandra Failures
 
Specs2 whirlwind tour at Scaladays 2014
Specs2 whirlwind tour at Scaladays 2014Specs2 whirlwind tour at Scaladays 2014
Specs2 whirlwind tour at Scaladays 2014
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
Testing Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnitTesting Kafka components with Kafka for JUnit
Testing Kafka components with Kafka for JUnit
 
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 PeopleKafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
Kafka Summit NYC 2017 - Running Hundreds of Kafka Clusters with 5 People
 

Similar to Cassandra Codebase 2011

About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"Jihyun Ahn
 
MySQL HA with PaceMaker
MySQL HA with  PaceMakerMySQL HA with  PaceMaker
MySQL HA with PaceMakerKris Buytaert
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceAshok Modi
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0Joe Stein
 
540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdf540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdfhamzadamani7
 
WE18_Performance_Up.ppt
WE18_Performance_Up.pptWE18_Performance_Up.ppt
WE18_Performance_Up.pptwebhostingguy
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-applicationNguyễn Duy Nhân
 
Drupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityDrupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityAshok Modi
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disquszeeg
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginnerswebhostingguy
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Michael Renner
 
CoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdCoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdRichard Lister
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streamingphanleson
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisationgrooverdan
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...javier ramirez
 

Similar to Cassandra Codebase 2011 (20)

About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
MySQL HA with PaceMaker
MySQL HA with  PaceMakerMySQL HA with  PaceMaker
MySQL HA with PaceMaker
 
Performance_Up.ppt
Performance_Up.pptPerformance_Up.ppt
Performance_Up.ppt
 
DrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performanceDrupalCampLA 2011: Drupal backend-performance
DrupalCampLA 2011: Drupal backend-performance
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdf540slidesofnodejsbackendhopeitworkforu.pdf
540slidesofnodejsbackendhopeitworkforu.pdf
 
WE18_Performance_Up.ppt
WE18_Performance_Up.pptWE18_Performance_Up.ppt
WE18_Performance_Up.ppt
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
 
Drupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityDrupal Backend Performance and Scalability
Drupal Backend Performance and Scalability
 
Java se7 features
Java se7 featuresJava se7 features
Java se7 features
 
DjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling DisqusDjangoCon 2010 Scaling Disqus
DjangoCon 2010 Scaling Disqus
 
Scalable Apache for Beginners
Scalable Apache for BeginnersScalable Apache for Beginners
Scalable Apache for Beginners
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014Postgres Vienna DB Meetup 2014
Postgres Vienna DB Meetup 2014
 
CoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love SystemdCoreOS, or How I Learned to Stop Worrying and Love Systemd
CoreOS, or How I Learned to Stop Worrying and Love Systemd
 
Learning spark ch10 - Spark Streaming
Learning spark ch10 - Spark StreamingLearning spark ch10 - Spark Streaming
Learning spark ch10 - Spark Streaming
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
 
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ...
 
Best practices tekx
Best practices tekxBest practices tekx
Best practices tekx
 

More from gdusbabek

My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015gdusbabek
 
How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014gdusbabek
 
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014gdusbabek
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014gdusbabek
 
Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013gdusbabek
 
Introduction to Blueflood at Berlin Buzzwords 2013
Introduction to Blueflood at Berlin Buzzwords 2013Introduction to Blueflood at Berlin Buzzwords 2013
Introduction to Blueflood at Berlin Buzzwords 2013gdusbabek
 
Rackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYCRackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYCgdusbabek
 
Austin cassandra meetup
Austin cassandra meetupAustin cassandra meetup
Austin cassandra meetupgdusbabek
 
How Rackspace Cloud Monitoring uses Cassandra
How Rackspace Cloud Monitoring uses CassandraHow Rackspace Cloud Monitoring uses Cassandra
How Rackspace Cloud Monitoring uses Cassandragdusbabek
 
Breaking the Relational Headlock: A Survey of NoSQL Datastores
Breaking the Relational Headlock: A Survey of NoSQL DatastoresBreaking the Relational Headlock: A Survey of NoSQL Datastores
Breaking the Relational Headlock: A Survey of NoSQL Datastoresgdusbabek
 
Building Rackspace Cloud Monitoring
Building Rackspace Cloud MonitoringBuilding Rackspace Cloud Monitoring
Building Rackspace Cloud Monitoringgdusbabek
 
Data Modeling with Cassandra Column Families
Data Modeling with Cassandra Column FamiliesData Modeling with Cassandra Column Families
Data Modeling with Cassandra Column Familiesgdusbabek
 
Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)gdusbabek
 
Cassandra Presentation for San Antonio JUG
Cassandra Presentation for San Antonio JUGCassandra Presentation for San Antonio JUG
Cassandra Presentation for San Antonio JUGgdusbabek
 

More from gdusbabek (14)

My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
 
How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014How To (Not) Open Source - Javazone, Oslo 2014
How To (Not) Open Source - Javazone, Oslo 2014
 
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
Blueflood and Beyond: The Future of Metrics - Berlin Buzzwords 2014
 
Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014Measure All the Things! - Austin Data Day 2014
Measure All the Things! - Austin Data Day 2014
 
Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013Blueflood: Open Source Metrics Processing at CassandraEU 2013
Blueflood: Open Source Metrics Processing at CassandraEU 2013
 
Introduction to Blueflood at Berlin Buzzwords 2013
Introduction to Blueflood at Berlin Buzzwords 2013Introduction to Blueflood at Berlin Buzzwords 2013
Introduction to Blueflood at Berlin Buzzwords 2013
 
Rackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYCRackspace Cloud Monitoring - Strata NYC
Rackspace Cloud Monitoring - Strata NYC
 
Austin cassandra meetup
Austin cassandra meetupAustin cassandra meetup
Austin cassandra meetup
 
How Rackspace Cloud Monitoring uses Cassandra
How Rackspace Cloud Monitoring uses CassandraHow Rackspace Cloud Monitoring uses Cassandra
How Rackspace Cloud Monitoring uses Cassandra
 
Breaking the Relational Headlock: A Survey of NoSQL Datastores
Breaking the Relational Headlock: A Survey of NoSQL DatastoresBreaking the Relational Headlock: A Survey of NoSQL Datastores
Breaking the Relational Headlock: A Survey of NoSQL Datastores
 
Building Rackspace Cloud Monitoring
Building Rackspace Cloud MonitoringBuilding Rackspace Cloud Monitoring
Building Rackspace Cloud Monitoring
 
Data Modeling with Cassandra Column Families
Data Modeling with Cassandra Column FamiliesData Modeling with Cassandra Column Families
Data Modeling with Cassandra Column Families
 
Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)Introduction to Cassandra (June 2010)
Introduction to Cassandra (June 2010)
 
Cassandra Presentation for San Antonio JUG
Cassandra Presentation for San Antonio JUGCassandra Presentation for San Antonio JUG
Cassandra Presentation for San Antonio JUG
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Cassandra Codebase 2011

  • 1. Codebase 2011 Getting to know the codebase Gary Dusbabek @gdusbabek
  • 3. Outline How to contribute Internals Some thoughts
  • 5. How to Contribute http://wiki.apache.org/cassandra/HowToContribute JIRA: “lhf” label (Low hanging fruit) Scratch your itch
  • 6. How to Contribute Run the tests ant test nosetests test/system/test_thrift_server.py
  • 7. How to Contribute http://wiki.apache.org/cassandra/CodeStyle Avoid: Reformatting white space Renaming things everywhere Unrelated changes
  • 8. How to Contribute Use git Attach patches git format-patch as jira attachments. Group them sensibly
  • 9. How to Contribute Someone will review your code Usually a committer Persistence helps Don’t get your feelings hurt It usually takes a few rounds
  • 10. How to Contribute Participate! #cassandra-dev on freenode dev@cassandra.apache.org
  • 12. Services Ring Operations (StorageService) Storage Operations (StorageProxy)
  • 13. Startup Sequence bin/cassandra Finds cassandra.in.sh $CLASSPATH (mandatory) $CASSANDRA_HOME $CASSANDRA_CONF (mandatory) Executes $CASSANDRA_CONF/cassandra-env.sh Sets heap sizes (gc tuning goes here!)
  • 15. AbstractCassandraDaemon ACD.setup(): Reads configuration: DatabaseDescriptor Loads schema: DD.loadSchemas() Scrub directories Initialize storage (keyspaces + CFs) Commit log recovery: CL.recover() StorageService.initServer() -> StorageService.joinTokenRing()
  • 16. Attn Tinkerers! Abstracted initialization of transport. Handy if you’re experimenting with transports/RPC Just extend AbstractCassandraDaemon and make sure that class is started up via bin/cassandra.
  • 17. o.a.c.thrift.CassandraServer Implements thrift interface methods (the API). Start here when trying to understand the read/write path and RPC.
  • 18. Configuration DatabaseDescriptor Side-effect of ACD.setup() Reads config settings from yaml Defines system tables Changes regularly I hate this code. Please fix it.
  • 19. Main Singletons StorageService StorageProxy MessagingService CompactionManager StageManager MigrationManager
  • 20. Did you just say ‘Singletons?’
  • 21. Main Singletons StorageService StorageProxy MessagingService CompactionManager StageManager MigrationManager
  • 22. JMX MBeans Tooling supplied by Mbeans Anything that does measureable/configurable work is tooled Thread pools Compaction Hinted handoff Streaming Storage Commit log
  • 23. StorageService initServer() -> joinTokenRing() Starts gossip Starts MessagingService Negotiates bootstrap Many ring operations live here. Repository of ring topology TokenMetadata (quasi-singleton via SS.tokenMetadata_) Partitioner instance is also here
  • 24. MessagingService Verb handlers live here (initialized from SS). Main event handlers, haven’t changed much. Socket listener 2 threads per ring node Message gateway emitted from MessageProducerimpls MS.sendRR() MS.sendOneWay() MS.receive() Messages are versioned now (0.8) IncomingTCPConnection
  • 25. StorageProxy Top level of all read/write operations Called from o.a.c.thrift.CassandraServer Write path changed because of counters Notion of WritePerformer Eventually to Table and ColumnFamilyStore Further, to SSTable and related classes.
  • 26. StageManager Fancy java ThreadPoolExecutor SEDA: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf consumes callables from a queue. Manages concurrency. Hasn’t changed much.
  • 27. Adding API Methods Define method+structures in IDL interface/cassandra.thrift Regenerate files ant gen-thrift-java gen-thrift-py Implement stubs: o.a.c.thrift.CassandraServer Create a system test tests/system/test_thrift_server.py
  • 28. Reading Socket->CassandraServer Permissions Request validation Marshalling ReadCommands created in CS.multigetSliceInternal, passed to StorageProxy 1 per key
  • 29. Reading StorageProxy.read(), fetchRows() For each ReadCommand Determine endpoints Local & remote branches
  • 30. Reading StorageProxy local READ stage executes a LocalReadRunnable True read vs digest Table, ColumnFamilyStore CFS.getTopLevelColumns Make QueryFilter Query Memtables Query SSTables Coalesce in iterators
  • 31. Reading StorageProxy remote read command Response handler Send to remote nodes Read repair happens in SP.fetchRows().
  • 32. Writing CS.doInsert() Marshalling, creates RMs StorageProxy local/remote branch SP.sendToHintedEndpoints() RowMutation one Key per (several CFs) ColumnFamily Collection of column modifications
  • 33. Writing RM.apply->Table.apply Write to CL Iterate over RM CFs CFS.apply() Overwrites results on pre-existing column families
  • 34. Writing RM is serialized into a Message and sent to other nodes Waits for ACKs depending on CL
  • 36. Challenges To have an in-depth understanding of everything. Hard for hobbyist/part-timers Outside of Datastax, little support for full-timers Still changing fast Keeping up
  • 37. Challenge: Lines of Code 0.4 (Sep 2009) 52 kloc 0.5 (Jan 2010) 59 kloc 0.6 (Apr 2010) 73 kloc 0.7 (Jan 2011) 122 kloc 0.8 (Jun 2011) 146 kloc Trunk (yesterday) 149 kloc Average: 4,500 lines per month
  • 38. Challenges Codewise Growing pains Software maturity Decisions made early on

Editor's Notes

  1. Who was here last year?Very good presentations on data modeling and capacity planning.
  2. Turn it around.Ask questions first.
  3. Transport still not initialized though.DD getting loaded is just a side-effect
  4. This is actually a good exercise.
  5. Good place to extend and experiment on your own.