Apache Cassandra
Community Health
NGCC 2017
Ben Bromhead
Agenda
● What this talk is, what it is not
● Bias
● Data, Anecdotes and Observations
● Final thoughts
What this talk is
● A chance to look at some (arbitrary) measures about our community
● A chance for me to talk about my personal and corporate experience in the
community
● A chance to reflect
● A chance to celebrate
What this talk is not
● Blaming, finger-pointing, etc.
● A definitive state of the community
● This is not the way things have always been and always will be
● A roadmap / plan to fix things
Bias
● Using Cassandra since 2012
● Active since 2013
● I am not a committer or a PMC member
● I have a leadership and ownership position in a company with direct
commercial interest in the Apache Cassandra project
● My company is in competition with others that have an interest/influence in
the Apache Cassandra project
● We also have informal and formal partnerships with other companies that
have direct interest/influence on the Apache Cassandra project
Each measure of an Open Source project's health is a brush stroke,
not the whole picture
Methodology
● Code Activity
● Release history
● User community
● Longevity
● Surrounding ecosystem
● db-engines popularity ranking*
*vanity metric
Code Activity
● Volume and timeliness of commits
○ Proxy for community activity
● As projects mature / stabilize, volume can reduce
○ Shift to incremental features, bug fixes, etc.
○ Cyclical, following the project's approach to versions
● Who works on what, when and how often
○ Governance can impact volume and timeliness
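A minimal sketch of how commit volume over time could be tallied, assuming the commit dates have already been exported as ISO date strings. The function name `commits_per_month` and the sample dates are hypothetical and are not data from the talk.

```python
from collections import Counter
from datetime import datetime


def commits_per_month(iso_dates):
    """Count commits per calendar month from ISO-8601 date strings."""
    counts = Counter()
    for raw in iso_dates:
        stamp = datetime.fromisoformat(raw)
        counts[stamp.strftime("%Y-%m")] += 1
    # Sort by month so the trend reads chronologically.
    return dict(sorted(counts.items()))


# Hypothetical sample. In practice the dates could come from:
#   git log --date=short --pretty=format:%ad
sample = [
    "2017-06-03", "2017-06-19",
    "2017-07-01",
    "2017-08-11", "2017-08-12", "2017-08-30",
]
monthly = commits_per_month(sample)
print(monthly)
```

Monthly counts like these are only a proxy: a drop can mean a maturing project, a release freeze, or fewer active contributors.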
Release Activity
● Push and pull between releasing new features and stability for users
○ Proxy for community maturity
● Databases tend to favor stability
○ New ones still have a lot of features to ship!
○ 9 years old, Cassandra is still a baby
○ 1.0 was only released 6 years ago
● A regular, consistent release cadence indicates maturity
○ The interval between releases can be long or short, as long as it is consistent
○ Generally reflects mature development processes and release infrastructure
Release Activity
Branch  Previous Release  Time Delta from previous  Current Release  Time Delta from previous
3.x     3.10              4 months                  3.11.0           5 months
3.0     3.0.14            1 month                   3.0.14           2 months
2.2     2.2.9             5 months                  2.2.10           4 months
2.1     2.1.17            4 months                  2.1.18           4 months

Cassandra 3.11.1 (unreleased) elapsed time: 3 months, 3 days
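One way to put a number on "regular, consistent cadence" is to look at the mean and spread of the gaps between releases on each branch. The sketch below uses the month deltas from the table above. The helper `cadence_summary` is a hypothetical name, and with only two gaps per branch the result is illustrative rather than statistically meaningful.

```python
from statistics import mean, pstdev

# Months between consecutive releases, copied from the table above:
# [previous-release delta, current-release delta] per branch.
deltas = {
    "3.x": [4, 5],
    "3.0": [1, 2],
    "2.2": [5, 4],
    "2.1": [4, 4],
}


def cadence_summary(gaps):
    """Return the average gap and its spread (population std dev) in months."""
    return {"mean_months": mean(gaps), "spread_months": pstdev(gaps)}


summary = {branch: cadence_summary(gaps) for branch, gaps in deltas.items()}
for branch, stats in summary.items():
    print(branch, stats)
```

A small spread relative to the mean suggests a steady cadence. A large one suggests releases are driven by events rather than by a schedule.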
User Community
● Hardest to measure
○ The preceding measures (code and release activity) cover <1% of those who use Cassandra
○ Public measures of community generally only include the 10% who engage
● Use a diversity of measures
● Will include some commercial measures
● Will include some qualitative data
Apache Cassandra (search term) vs AWS DynamoDB (search term)
Apache Cassandra (topic) vs DynamoDB (search term)
DB-Engines
Longevity
● Cassandra is often considered a “winner” when looking at the NoSQL cohort
from the late 2000s
● Large user base focused primarily in USA and Europe
● Top 1% of Apache Cassandra users are heavily invested in it
● A number of companies looking to take a similar approach
● Community health and engagement with users/new contributors is a
challenge
Ecosystem
● Direct - Consulting companies, support providers, managed service providers
● OEM - C* as the embedded data store for their core product
● Integrations - OSS integrations with other big/fast data projects
● Startups - Companies targeting Cassandra with their initial product release
Commercial Performance
● Instaclustr performance
○ 100%+ CAGR across the whole business
○ Significantly better growth for our Cassandra support line of business
● Other players in the space all appear to be growing well and rapidly
○ DataStax
○ Scylla
○ Consulting groups
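For readers unfamiliar with the term, CAGR (compound annual growth rate) is the constant yearly growth rate that takes a starting value to an ending value over a number of years. The sketch below shows the standard formula. The revenue figures are hypothetical and are not Instaclustr numbers.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical example: a business that grows 8x over 3 years
# has doubled every year, which is a CAGR of roughly 100%.
growth = cagr(start_value=1.0, end_value=8.0, years=3)
print(f"{growth:.0%}")
```

In other words, a sustained 100%+ CAGR means at least doubling every year.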
DataStax withdrawal
● It has happened
○ Definite reduction in commits / time / JIRA comments from DataStax-employed committers
○ Awesome individual contributions from DataStax-employed committers who are passionate about
Apache Cassandra
● How users viewed it
○ It appears to be a very binary thing in the wider community
○ Seen as a clear fork
● Perception Impact (survey conducted by Instaclustr)
○ 20% say it's a negative
○ 60% say it doesn’t matter
○ 20% say it’s a positive
User Discussions - Bad
● The DataStax withdrawal rattled a few Cassandra evaluators, but also polarised some
users towards Apache Cassandra
● A long tail of old advice, cargo culting, bad documentation and tick-tock is causing
first-time adopters to struggle with the “right” way to run Cassandra
● Performance issues and half-finished features
User Discussions - Good
● Cassandra is part of the standard developer toolbelt. We now see second- and
third-generation deployments due to cross-pollination
● I would argue that a significant % of the global population interacts with a
C*-backed service on a daily basis!
● Passionate community who are excited about solving these challenges
● Existing large users who did not have a heavy investment in Cassandra now
feel like they have the opportunity and responsibility to be better community
citizens
Community Challenges - Summary
From our perspective the Cassandra community faces two challenges:
● Filling the gap left by the DataStax withdrawal; this is primarily a matter of
resources and time
● Improving first (n) time experiences
○ Documentation
○ Development
○ Deployment
○ Operations
○ Contributors
45% of engineers who submit and have a patch accepted will submit a second
one!
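The talk does not say how the 45% figure was calculated. A minimal sketch of one way to compute such a repeat-contributor rate, assuming a list with one author name per accepted patch; the function name and sample authors are hypothetical:

```python
from collections import Counter


def repeat_contributor_rate(patch_authors):
    """Share of contributors with an accepted patch who have at least two."""
    counts = Counter(patch_authors)
    if not counts:
        return 0.0
    repeaters = sum(1 for n in counts.values() if n >= 2)
    return repeaters / len(counts)


# Hypothetical sample: one entry per accepted patch, keyed by author.
accepted = ["alice", "bob", "alice", "carol", "dave", "bob", "erin"]
rate = repeat_contributor_rate(accepted)
print(f"{rate:.0%} of contributors submitted a second accepted patch")
```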
Questions?