SlideShare a Scribd company logo
Apache Cassandra
    Present and Future
    Jonathan Ellis




Monday, October 24, 2011
History


   Bigtable, 2006                                    Dynamo, 2007



                                             OSS, 2008




                                                  TLP, 2010
                           Incubator, 2009
                                             1.0, October 2011

Monday, October 24, 2011
Why people choose Cassandra

    ✤    Multi-master, multi-DC
    ✤    Linearly scalable
    ✤    Larger-than-memory datasets
    ✤    High performance
    ✤    Full durability
    ✤    Integrated caching
    ✤    Tuneable consistency



Monday, October 24, 2011
Cassandra users

    ✤    Financial
    ✤    Social Media
    ✤    Advertising
    ✤    Entertainment
    ✤    Energy
    ✤    E-tail
    ✤    Health care
    ✤    Government

Monday, October 24, 2011
Road to 1.0

    ✤    Storage engine improvements
    ✤    Specialized replication modes
    ✤    Performance and other real-world considerations
    ✤    Building an ecosystem




Monday, October 24, 2011
Compaction

                           Size-Tiered




                           Leveled




Monday, October 24, 2011
Differences in Cassandra’s leveling

    ✤    ColumnFamily instead of key/value
    ✤    Multithreading (experimental)
    ✤    Optional throttling (16MB/s by default)
    ✤    Per-sstable bloom filter for row keys
    ✤    Larger data files (5MB by default)
    ✤    Does not block writes if compaction falls behind




Monday, October 24, 2011
Column (“secondary”) indexing

cqlsh> CREATE INDEX state_idx ON users(state);
cqlsh> INSERT INTO users (uname, state, birth_date)
                VALUES (‘bsanderson’, ‘UT’, 1975)


    users                                       state_idx
                           state   birth_date
         bsanderson        UT        1975            UT     bsanderson   htayler
           prothfuss        WI       1973            WI      prothfuss
             htayler       UT        1968




Monday, October 24, 2011
Querying


cqlsh> SELECT * FROM users
                WHERE state='UT' AND birth_date > 1970
                ORDER BY tokenize(key);




Monday, October 24, 2011
More sophisticated indexing?

    ✤    Want to:
          ✤   Support more operators
          ✤   Support user-defined ordering
          ✤   Support high-cardinality values
    ✤    But:
          ✤   Scatter/gather scales poorly
          ✤   If it’s not node local, we can’t guarantee atomicity
          ✤   If we can’t guarantee atomicity, doublechecking across nodes is a
              huge penalty


Monday, October 24, 2011
Other storage engine improvements

        ✤    Compression
        ✤    Expiring columns
        ✤    Bounded worst-case reads by re-writing fragmented
             rows




Monday, October 24, 2011
Eventually-consistent counters

    ✤    Counter is partitioned by replica; each replica is “master”
         of its own partition




Monday, October 24, 2011
13

Monday, October 24, 2011
14

Monday, October 24, 2011
15

Monday, October 24, 2011
16

Monday, October 24, 2011
The interesting parts

    ✤    Tuneable consistency
    ✤    Avoiding contention on local increments
          ✤   Store increments, not full value; merge on read + compaction
    ✤    Renewing counter id to deal with data loss
    ✤    Interaction with tombstones




Monday, October 24, 2011
(What about version vectors?)

    ✤    Version vectors allow detecting conflict, but do not give
         enough information to resolve it except in special cases
          ✤   In the counters case, we’d need to hold onto all previous versions
              until we can be sure no new conflict with them can occur
          ✤   Jeff Darcy has a longer discussion at http://pl.atyp.us/
              wordpress/?p=2601




Monday, October 24, 2011
Performance




     A single four-core machine; one million inserts + one million updates



Monday, October 24, 2011
Dealing with the JVM

    ✤    JNA
          ✤   mlockall()
          ✤   posix_fadvise()
          ✤   link()
    ✤    Memory
          ✤   Move cache off-heap
          ✤   In-heap arena allocation for memtables, bloom filters
          ✤   Move compaction to a separate process?



Monday, October 24, 2011
The Cassandra ecosystem

    ✤    Replication into Cassandra
          ✤   Gigaspaces
          ✤   Drizzle
    ✤    Solandra: Cassandra + Solr search
    ✤    DataStax Enterprise: Cassandra + Hadoop analytics




Monday, October 24, 2011
DataStax Enterprise




Monday, October 24, 2011
Operations

    ✤    “Vanilla” Hadoop
          ✤   8+ services to setup, monitor, backup, and recover
              (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper,
              Metastore, ...)
          ✤   Single points of failure
          ✤   Can't separate online and offline processing

    ✤    DataStax Enterprise
          ✤   Single, simplified component
          ✤   Peer to peer
          ✤   JobTracker failover
          ✤   No additional cassandra config



Monday, October 24, 2011
What’s next

    ✤    Ease Of Use
    ✤    CQL: “Native” transport, prepared statements
    ✤    Triggers
    ✤    Entity groups
    ✤    Smarter range queries enabling Hive predicate push-
         down
    ✤    Blue sky: streaming / CEP




Monday, October 24, 2011
Questions?

    ✤    jbellis@datastax.com




    ✤    DataStax is hiring!
          ✤   ~15 engineers (35 employees), want to double in 2012
               ✤    Austin, Burlingame, NYC, France, Japan, Belarus
          ✤   100+ customers
          ✤   $11M Series B funding

Monday, October 24, 2011

More Related Content

Similar to Cassandra at High Performance Transaction Systems 2011

Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
mubarakss
 

Similar to Cassandra at High Performance Transaction Systems 2011 (20)

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Migrating Social Content to Drupal
Migrating Social Content to DrupalMigrating Social Content to Drupal
Migrating Social Content to Drupal
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
6269441.ppt
6269441.ppt6269441.ppt
6269441.ppt
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
Storage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems PresentationStorage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems Presentation
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
 
SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 

More from jbellis

Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
jbellis
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
jbellis
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
jbellis
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
jbellis
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
jbellis
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
jbellis
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
jbellis
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
jbellis
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
jbellis
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
jbellis
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
jbellis
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
jbellis
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
jbellis
 
Brisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by CassandraBrisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by Cassandra
jbellis
 

More from jbellis (20)

Vector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptxVector Search @ sw2con for slideshare.pptx
Vector Search @ sw2con for slideshare.pptx
 
Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
 
Brisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by CassandraBrisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by Cassandra
 
PyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialPyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorial
 

Recently uploaded

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 

Cassandra at High Performance Transaction Systems 2011

  • 1. Apache Cassandra Present and Future Jonathan Ellis Monday, October 24, 2011
  • 2. History Bigtable, 2006 Dynamo, 2007 OSS, 2008 TLP, 2010 Incubator, 2009 1.0, October 2011 Monday, October 24, 2011
  • 3. Why people choose Cassandra ✤ Multi-master, multi-DC ✤ Linearly scalable ✤ Larger-than-memory datasets ✤ High performance ✤ Full durability ✤ Integrated caching ✤ Tuneable consistency Monday, October 24, 2011
  • 4. Cassandra users ✤ Financial ✤ Social Media ✤ Advertising ✤ Entertainment ✤ Energy ✤ E-tail ✤ Health care ✤ Government Monday, October 24, 2011
  • 5. Road to 1.0 ✤ Storage engine improvements ✤ Specialized replication modes ✤ Performance and other real-world considerations ✤ Building an ecosystem Monday, October 24, 2011
  • 6. Compaction Size-Tiered Leveled Monday, October 24, 2011
  • 7. Differences in Cassandra’s leveling ✤ ColumnFamily instead of key/value ✤ Multithreading (experimental) ✤ Optional throttling (16MB/s by default) ✤ Per-sstable bloom filter for row keys ✤ Larger data files (5MB by default) ✤ Does not block writes if compaction falls behind Monday, October 24, 2011
  • 8. Column (“secondary”) indexing cqlsh> CREATE INDEX state_idx ON users(state); cqlsh> INSERT INTO users (uname, state, birth_date) VALUES (‘bsanderson’, ‘UT’, 1975) users state_idx state birth_date bsanderson UT 1975 UT bsanderson htayler prothfuss WI 1973 WI prothfuss htayler UT 1968 Monday, October 24, 2011
  • 9. Querying cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970 ORDER BY tokenize(key); Monday, October 24, 2011
  • 10. More sophisticated indexing? ✤ Want to: ✤ Support more operators ✤ Support user-defined ordering ✤ Support high-cardinality values ✤ But: ✤ Scatter/gather scales poorly ✤ If it’s not node local, we can’t guarantee atomicity ✤ If we can’t guarantee atomicity, doublechecking across nodes is a huge penalty Monday, October 24, 2011
  • 11. Other storage engine improvements ✤ Compression ✤ Expiring columns ✤ Bounded worst-case reads by re-writing fragmented rows Monday, October 24, 2011
  • 12. Eventually-consistent counters ✤ Counter is partitioned by replica; each replica is “master” of its own partition Monday, October 24, 2011
  • 17. The interesting parts ✤ Tuneable consistency ✤ Avoiding contention on local increments ✤ Store increments, not full value; merge on read + compaction ✤ Renewing counter id to deal with data loss ✤ Interaction with tombstones Monday, October 24, 2011
  • 18. (What about version vectors?) ✤ Version vectors allow detecting conflict, but do not give enough information to resolve it except in special cases ✤ In the counters case, we’d need to hold onto all previous versions until we can be sure no new conflict with them can occur ✤ Jeff Darcy has a longer discussion at http://pl.atyp.us/ wordpress/?p=2601 Monday, October 24, 2011
  • 19. Performance A single four-core machine; one million inserts + one million updates Monday, October 24, 2011
  • 20. Dealing with the JVM ✤ JNA ✤ mlockall() ✤ posix_fadvise() ✤ link() ✤ Memory ✤ Move cache off-heap ✤ In-heap arena allocation for memtables, bloom filters ✤ Move compaction to a separate process? Monday, October 24, 2011
  • 21. The Cassandra ecosystem ✤ Replication into Cassandra ✤ Gigaspaces ✤ Drizzle ✤ Solandra: Cassandra + Solr search ✤ DataStax Enterprise: Cassandra + Hadoop analytics Monday, October 24, 2011
  • 23. Operations ✤ “Vanilla” Hadoop ✤ 8+ services to setup, monitor, backup, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Metastore, ...) ✤ Single points of failure ✤ Can't separate online and offline processing ✤ DataStax Enterprise ✤ Single, simplified component ✤ Peer to peer ✤ JobTracker failover ✤ No additional cassandra config Monday, October 24, 2011
  • 24. What’s next ✤ Ease Of Use ✤ CQL: “Native” transport, prepared statements ✤ Triggers ✤ Entity groups ✤ Smarter range queries enabling Hive predicate push- down ✤ Blue sky: streaming / CEP Monday, October 24, 2011
  • 25. Questions? ✤ jbellis@datastax.com ✤ DataStax is hiring! ✤ ~15 engineers (35 employees), want to double in 2012 ✤ Austin, Burlingame, NYC, France, Japan, Belarus ✤ 100+ customers ✤ $11M Series B funding Monday, October 24, 2011