Apache Cassandra:
Real-world scalability, today!
Jonathan Ellis
CTO
Cassandra Job Trends




©2012 DataStax
“Big Data” trend




Why Big Data Matters




Research done by McKinsey & Company shows the eye-opening 10-year category growth rate differences between businesses that smartly use their big data and those that do not.

Big data



           Analytics        Realtime
                       ?
           (Hadoop)        (“NoSQL”)




Some Cassandra users

Industries & use cases
  • Financial      • Time series data
  • Social Media   • Messaging
  • Advertising    • Ad tracking
  • Entertainment  • Data mining
  • Energy         • User activity streams
  • E-tail         • User sessions
  • Health care    • Anything requiring:
  • Government       scalable, performant
                     + highly available

Why Cassandra?
 • Fully distributed, no SPOF
 • Multi-master, multi-DC
 • Linearly scalable
 • Larger-than-memory datasets
 • Best-in-class performance (not just writes!)
 • Fully durable
 • Integrated caching
 • Tuneable consistency
Availability
  • “There is no such thing as standby infrastructure: there is stuff you
    always use and stuff that won’t work when you need it.”
    -- Ben Black: founder, Boundary; ex-AWS
  • “The biggest problem with failover is that you're almost never using it
    until it really hurts. It's like backups that you never test.”
    -- Rick Branson: Instagram; ex-DataStax

Classic partitioning with SPOF
[Diagram: client -> router -> partition 1, partition 2, partition 3, partition 4]

Fully distributed, no SPOF
[Diagram: a client and a cluster of nodes; surviving labels: p3, p6, p1, p1, p1]

Partitioning
Primary key determines placement*

  jim      age: 36   car: camaro   gender: M
  carol    age: 37   car: subaru   gender: F
  johnny   age: 12                 gender: M
  suzy     age: 10                 gender: F

  PK       MD5 hash
  jim      5e02739678...
  carol    a9a0198010...
  johnny   f4eb27cea7...
  suzy     78b421309e...

The MD5 hash operation yields a 128-bit number for keys of any size.

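The key-to-hash step can be sketched in a few lines. This is a minimal illustration using Python's standard hashlib, not the partitioner's exact procedure: the function name md5_token is hypothetical, and treating the digest as an unsigned big-endian integer is an assumption.

```python
import hashlib


def md5_token(key: str) -> int:
    """Return the MD5 digest of a key as a 128-bit unsigned integer.

    Hypothetical helper for illustration; a real partitioner may
    post-process the digest differently.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, byteorder="big")


if __name__ == "__main__":
    # Keys of any length map to a fixed-width 32-hex-digit number.
    for key in ["jim", "carol", "johnny", "suzy"]:
        print(f"{key:8} {md5_token(key):032x}")
```

Because the output width is fixed regardless of key size, the whole key space can be divided into ranges ahead of time, which is what the token ring does.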
The “token ring”

  Node A   Node B

  Node D   Node C

         Start               End
  A      0xc000000000..1     0x0000000000..0
  B      0x0000000000..1     0x4000000000..0
  C      0x4000000000..1     0x8000000000..0
  D      0x8000000000..1     0xc000000000..0

  jim      5e02739678...
  carol    a9a0198010...
  johnny   f4eb27cea7...
  suzy     78b421309e...

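Looking up which node owns a token can be sketched as a search over the range end points. This is a rough illustration under stated assumptions: the end tokens are the table's values padded with zeros to 128 bits (the slide elides the low digits), the truncated hashes are padded the same way, and the names RING, owner, and padded are hypothetical.

```python
from bisect import bisect_left

# End token of each node's range, in ring order. Each range runs from
# just after the previous node's end token up to and including its own.
# Zero-padding to 128 bits is an assumption; the slide shows only the
# leading digits.
RING = [
    (0x00 << 120, "A"),
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]


def owner(token: int) -> str:
    """Return the node whose range contains the given 128-bit token."""
    ends = [end for end, _ in RING]
    idx = bisect_left(ends, token)   # first end token >= token
    return RING[idx % len(RING)][1]  # tokens past D's end wrap to A


def padded(hex_prefix: str) -> int:
    """Pad a truncated hex digest from the slide to 32 hex digits."""
    return int(hex_prefix.ljust(32, "0"), 16)


if __name__ == "__main__":
    for name, prefix in [("jim", "5e02739678"), ("carol", "a9a0198010"),
                         ("johnny", "f4eb27cea7"), ("suzy", "78b421309e")]:
        print(name, "->", owner(padded(prefix)))
```

Under these assumptions jim and suzy land on node C, carol on node D, and johnny wraps around to node A.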
Replication

  Node A   Node B

  Node D   Node C

  carol    a9a0198010...

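The replication slides place carol's token on the ring, but the surviving text does not say how the copies are chosen. A minimal sketch, assuming the simplest rule (the owning node holds the first copy and the next nodes clockwise hold the rest) and a replication factor of 3; neither assumption comes from the slides, and the names NODES and replicas are hypothetical.

```python
from typing import List

NODES = ["A", "B", "C", "D"]  # clockwise ring order from the diagram


def replicas(owner: str, replication_factor: int = 3) -> List[str]:
    """Return the owner plus the next nodes clockwise on the ring.

    Assumed placement rule for illustration only.
    """
    if not 1 <= replication_factor <= len(NODES):
        raise ValueError("replication factor must be between 1 and the node count")
    start = NODES.index(owner)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


if __name__ == "__main__":
    # carol's token (a9a0198010...) falls in node D's range.
    print(replicas("D"))
```

With these assumptions carol's row would live on D, A, and B, so losing any single node still leaves two copies.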
Highlights
  • Adding capacity is application-transparent and requires no downtime
  • No SPOF, not even temporarily
      • No “primary” replica
  • Configurable synchronous/asynchronous replication
  • Tolerates node failure; never have to restart replication “from scratch”
  • “Smart” replication avoids correlated failures

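One way to read the synchronous/asynchronous choice is as the number of replicas that must acknowledge a request before it returns. The arithmetic below is a small worked sketch of the standard quorum rule (a read is guaranteed to see the latest write when read acknowledgements plus write acknowledgements exceed the replica count); the function names are hypothetical and the replication factor of 3 is an assumption.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def overlaps(write_acks: int, read_acks: int, replication_factor: int) -> bool:
    """True if every read set must share a replica with every write set."""
    return write_acks + read_acks > replication_factor


if __name__ == "__main__":
    n = 3  # assumed replication factor
    q = quorum(n)
    print("quorum of", n, "is", q)
    print("quorum writes + quorum reads overlap:", overlaps(q, q, n))
    print("one-replica writes + one-replica reads overlap:", overlaps(1, 1, n))
```

Waiting for fewer acknowledgements lowers latency and lets the remaining replicas catch up asynchronously; waiting for more gives stronger guarantees at the cost of availability when nodes are down.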
What about performance?
  • Log-structured storage engine avoids random I/O
  • Excellent performance on both reads and writes
  • Row-level isolation via concurrent algorithms
      • no locking
  • Built-in compression improves cache hotness
  • “Row cache” can replace memcached

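The idea behind a log-structured engine can be shown with a toy model: writes land in memory, and full in-memory tables are written out as sorted, immutable runs that are never updated in place. This is an illustrative sketch only, not Cassandra's storage engine; the class name, the flush threshold, and the in-memory lists standing in for files are all assumptions.

```python
class ToyLogStructuredStore:
    """Toy model of log-structured storage (illustration only)."""

    def __init__(self, flush_threshold: int = 2):
        self.flush_threshold = flush_threshold
        self.memtable = {}  # recent writes, mutable, in memory
        self.runs = []      # flushed runs, oldest first, never modified

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memtable:
            # One sequential write of sorted data; nothing is updated in place.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # the newest run wins
            for stored_key, value in run:
                if stored_key == key:
                    return value
        return None


if __name__ == "__main__":
    store = ToyLogStructuredStore()
    store.put("jim", 36)
    store.put("carol", 37)  # reaches the threshold and triggers a flush
    store.put("jim", 37)    # newer value shadows the flushed one
    print(store.get("jim"), store.get("carol"), len(store.runs))
```

An overwrite never touches an old run; it simply shadows it, which is why writes stay sequential. Real engines also merge old runs in the background, which this sketch omits.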
[Chart: reads/s and writes/s, Cassandra 0.6 vs. Cassandra 1.0; y-axis from 0 to 35,000]

Netflix
Application/Use Case
  • Manage subscriber interactions with downloaded movies
  • Need to handle distributed databases all over the world (40 countries)
  • Need better TCO than Oracle

Why Cassandra?
  • Easy scale and multi-data center support for geographical data distribution
  • Data model perfect fit for customer interaction data
  • Much better TCO than Oracle or SimpleDB

“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.”

Constant Contact
Application/Use Case
  • Manage marketing/email campaigns for small businesses
  • Needed a database to handle social media data that is very large in volume and must be maintained for a long time
  • Data is unstructured in nature

Why Cassandra?
  • Cassandra built for big data scale and able to persist, manage, and quickly query big data
  • Deployed application on Cassandra in 1/3rd the time and 1/10th the cost of Oracle

“Whenever we need new capacity, we just add new nodes online and we’re able to meet whatever demand we have. Cassandra is great for that.”

ReachLocal
Application/Use Case
  • ReachLocal provides end-to-end Internet advertising services to small and medium-sized businesses in eight countries
  • Must track most or all user interaction with marketing campaigns on web sites

Why Cassandra?
  • The amount of information was beyond the scalability limits of traditional RDBMSs
  • Has to replicate data to six data centers around the world
  • Needed integration with real-time data and analytics/search

Backupify
Application/Use Case
  • Cloud-based utility that enables backups and searches of Google Apps, Gmail, Facebook, Twitter, Blogger and other content
  • Must write lots of data very quickly

Why Cassandra?
  • Big data requirements necessitated easy scale-out and a continuously available database architecture
  • Strong community support of Cassandra
  • TCO was much better than others

“Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’”

Openwave
Application/Use Case
  • Openwave Messaging delivers a next-generation converged messaging platform with cloud and social integration capabilities

Why Cassandra?
  • Needed new database that would support geographic redundancy, continuous availability, and big data scale
  • Required high IOPS database speed
  • Better TCO than prior Oracle database

“Here are the big ‘checkbox’ items for us with Apache Cassandra: There is no single point of failure, it offers high read-and-write performance, and it has the ability to work on commodity hardware.”

Healthx
Application/Use Case
  • Develops and manages online portals for the healthcare market
  • Delivered via cloud platform
  • Manages provider, patient, and other related data

Why DataStax Enterprise?
  • Needed to scale, perform, and search data faster than previous Microsoft SQL Server database farm
  • Integrated big data platform that provides one database cluster for all real-time and search data

“We really like the integration with Solr. We get the full redundancy that you’d expect out of Cassandra as well as the full text indexing of Solr. The two things together make a win.”

Big data



           Analytics        Realtime
                       ?
           (Hadoop)        (“NoSQL”)




The evolution of Analytics




                 Analytics + Realtime
The evolution of Analytics


                             replication




                 Analytics                 Realtime

The evolution of Analytics


                 ETL




Big data



           Analytics    DataStax     Realtime
           (Hadoop)    Enterprise   (Cassandra)

Reunification of realtime + analytics




Portfolio Demo dataflow

  Portfolios              Portfolios
  Historical Prices       Live Prices for today
  Intermediate Results
  Largest loss            Largest loss

Better Hadoop than Hadoop
  • “Vanilla” Hadoop
      • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server, ...)
      • Single points of failure
      • Can't separate online and offline processing
  • DataStax Enterprise
      • Single, simplified component
      • Self-organizes based on workload
      • Peer to peer
      • JobTracker failover

Enterprise search with Solr
 SELECT title FROM solr WHERE solr_query='title:natio*';

  title
 --------------------------------------------------------------------------
                                       Bolivia national football team 2002
  List of French born footballers who have played for other national teams
                     Lithuania national basketball team at Eurobasket 2009
                                       Bolivia national football team 2000
                                     Kenya national under-20 football team
                                       Bolivia national football team 1999
                                  Israel men's national inline hockey team
                                       Bolivia national football team 2001
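
The trailing wildcard in title:natio* matches any title containing a word that begins with "natio". The sketch below imitates that behavior in plain Python to show why every row above qualifies. It is a rough approximation under stated assumptions: real Solr matching depends on how the field is analyzed, the function name is hypothetical, and the last title in the usage example is made up to show a non-match.

```python
import re


def matches_prefix_query(title: str, prefix: str) -> bool:
    """True if any word in the title starts with the prefix, ignoring case.

    Approximates a trailing-wildcard term query on a tokenized,
    lowercased field; not Solr's actual implementation.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return any(word.startswith(prefix.lower()) for word in words)


if __name__ == "__main__":
    titles = [
        "Bolivia national football team 2002",
        "Kenya national under-20 football team",
        "History of football",  # hypothetical title, not from the slide
    ]
    for title in titles:
        print(matches_prefix_query(title, "natio"), title)
```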




Managing & Monitoring Big Data
DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations

Questions?
  • http://www.datastax.com/docs
  • http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
  • http://www.datastax.com/products/enterprise


Cassandra at NoSql Matters 2012
