Cassandra at NoSql Matters 2012

Transcript of "Cassandra at NoSql Matters 2012"

  1. Apache Cassandra: Real-world scalability, today! Jonathan Ellis, CTO
  2. Cassandra Job Trends ©2012 DataStax
  3. “Big Data” trend
  4. Why Big Data Matters
     Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.
  5. Big data: Analytics (Hadoop); Realtime (“NoSQL”); ?
  6. Some Cassandra users
  7. Industries & use cases
     Industries:
     • Financial
     • Social Media
     • Advertising
     • Entertainment
     • Energy
     • E-tail
     • Health care
     • Government
     Use cases:
     • Time series data
     • Messaging
     • Ad tracking
     • Data mining
     • User activity streams
     • User sessions
     • Anything requiring: scalable, performant + highly available
  8. Why Cassandra?
     • Fully distributed, no SPOF
     • Multi-master, multi-DC
     • Linearly scalable
     • Larger-than-memory datasets
     • Best-in-class performance (not just writes!)
     • Fully durable
     • Integrated caching
     • Tunable consistency
  9. Availability
     • “There is no such thing as standby infrastructure: there is stuff you always use and stuff that won’t work when you need it.” -- Ben Black: founder, Boundary; ex-AWS
     • “The biggest problem with failover is that you’re almost never using it until it really hurts. It’s like backups that you never test.” -- Rick Branson: Instagram; ex-DataStax
  10. Classic partitioning with SPOF
     Diagram labels: partition 1, partition 2, partition 3, partition 4, router, client
  11. Fully distributed, no SPOF
     Diagram labels: client, p3, p6, p1, p1, p1
  12. (no transcribed text)
  13. Partitioning
     jim     age: 36   car: camaro   gender: M
     carol   age: 37   car: subaru   gender: F
     johnny  age: 12                 gender: M
     suzy    age: 10                 gender: F
  14. Partitioning: Primary key determines placement*
     jim     age: 36   car: camaro   gender: M
     carol   age: 37   car: subaru   gender: F
     johnny  age: 12                 gender: M
     suzy    age: 10                 gender: F
  15. MD5 hash operation yields a 128-bit number for keys of any size.
     PK      MD5 Hash
     jim     5e02739678...
     carol   a9a0198010...
     johnny  f4eb27cea7...
     suzy    78b421309e...
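The hashing step on slide 15 can be sketched in a few lines of Python. This is only an illustration of "MD5 yields a 128-bit number for keys of any size"; the function name `md5_token` is an assumption made for this example, and the slides do not show how Cassandra's partitioner post-processes the digest into its final token.

```python
import hashlib


def md5_token(key: str) -> int:
    """Return the MD5 digest of a row key as a 128-bit unsigned integer.

    Hypothetical helper for illustration; not Cassandra's own code.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, byteorder="big")


if __name__ == "__main__":
    # Keys from the slide; each hash is printed as 32 hex digits (128 bits).
    for key in ("jim", "carol", "johnny", "suzy"):
        print(f"{key:<7} {md5_token(key):032x}")
```

Whatever the key length, the result fits in 128 bits, which is what lets every key be placed somewhere on a fixed-size ring.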
  16. The “token ring”
     Ring diagram labels: Node A, Node B, Node C, Node D
  17-21. (animation build of the same table)
     Node   Start              End
     A      0xc000000000..1    0x0000000000..0
     B      0x0000000000..1    0x4000000000..0
     C      0x4000000000..1    0x8000000000..0
     D      0x8000000000..1    0xc000000000..0

     jim     5e02739678...
     carol   a9a0198010...
     johnny  f4eb27cea7...
     suzy    78b421309e...
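As a rough sketch of how the range table on slides 17 to 21 maps a hashed key to a node, the lookup below finds the first range whose end token is at or past the key's token, wrapping around to Node A. The names `RING`, `owner`, and `token_from_prefix` are assumptions made for this example, and the tokens are built by zero-padding the truncated hash prefixes shown on the slide, so they are not the full MD5 values.

```python
from bisect import bisect_left

# End token of each node's range, read off the slide's table and padded to
# 128 bits (0x40 << 120 stands in for "0x4000000000..").
RING = [
    (0x00 << 120, "A"),  # A's range wraps around: 0xc0.. up to 0x00..
    (0x40 << 120, "B"),
    (0x80 << 120, "C"),
    (0xC0 << 120, "D"),
]
END_TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the node whose (start, end] range contains the token."""
    idx = bisect_left(END_TOKENS, token)
    # Tokens past the last end token wrap around to the first node.
    return RING[idx % len(RING)][1]


def token_from_prefix(hex_prefix: str) -> int:
    """Zero-pad a truncated hex prefix to 32 hex digits (illustration only)."""
    return int(hex_prefix.ljust(32, "0"), 16)


KEYS = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}

if __name__ == "__main__":
    for name, prefix in KEYS.items():
        print(f"{name:<7} -> Node {owner(token_from_prefix(prefix))}")
```

With these ranges, jim and suzy land on Node C, carol on Node D, and johnny wraps around to Node A.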
  22-24. Replication (animation build of the same diagram)
     Ring diagram labels: Node A, Node B, Node C, Node D
     Key shown: carol a9a0198010...
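The replication slides do not state a replication factor or placement rule, so the following is a sketch under stated assumptions: replicas go to the owning node and then to the next nodes clockwise on the ring (the simplest placement scheme), with a replication factor of 3 chosen purely for illustration. The names `NODES` and `replicas` are hypothetical.

```python
# Ring order as drawn on the slides: A -> B -> C -> D -> back to A.
NODES = ["A", "B", "C", "D"]


def replicas(primary: str, replication_factor: int) -> list:
    """Return the primary node followed by the next nodes clockwise.

    Assumes simple clockwise placement; rack- or datacenter-aware
    placement would choose nodes differently.
    """
    if not 1 <= replication_factor <= len(NODES):
        raise ValueError("replication factor must be between 1 and the node count")
    start = NODES.index(primary)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


if __name__ == "__main__":
    # carol's hash (a9a0198010...) falls in Node D's range on the slide.
    print(replicas("D", 3))
```

Under these assumptions, carol's row would be stored on D, A, and B, so losing any single node still leaves two copies.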
  25. Highlights
     • Adding capacity is application-transparent and requires no downtime
     • No SPOF, not even temporarily
     • No “primary” replica
     • Configurable synchronous/asynchronous
     • Tolerates node failure; never have to restart replication “from scratch”
     • “Smart” replication avoids correlated failures
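The "configurable synchronous/asynchronous" point can be made concrete with the usual quorum arithmetic: if a write waits for W replicas and a read consults R replicas out of RF, the two sets are guaranteed to share at least one replica whenever R + W > RF. The sketch below only does that arithmetic; the function names are assumptions for this example and it does not talk to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas: int, write_replicas: int,
                        replication_factor: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # illustrative replication factor, not taken from the slides
    q = quorum(rf)
    print("quorum:", q)
    print("quorum reads + quorum writes overlap:", read_overlaps_write(q, q, rf))
    print("one-replica reads + one-replica writes overlap:", read_overlaps_write(1, 1, rf))
```

Waiting for fewer replicas lowers latency and keeps requests succeeding when nodes are down, at the cost of possibly reading a replica that has not yet received the latest write.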
  26. What about performance?
     • Log-structured storage engine avoids random I/O
     • Excellent performance on both reads and writes
     • Row-level isolation via concurrent algorithms
       • no locking
     • Built-in compression improves cache hotness
     • “Row cache” can replace memcached
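To illustrate why a log-structured engine avoids random I/O on writes, here is a toy in-memory model: writes are appended to a log and buffered in a memtable, and a full memtable is written out in one pass as an immutable sorted table. This is a teaching sketch, not Cassandra's storage engine; the class name, the `memtable_limit` parameter, and the sample values are assumptions made for this example.

```python
from bisect import bisect_left


class ToyLogStructuredStore:
    """Toy model: append-only log + memtable + immutable sorted tables."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # flushed tables as (sorted keys, values), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted, immutable table."""
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        self.sstables.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newest table first, so later writes shadow earlier ones.
        for keys, values in reversed(self.sstables):
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


if __name__ == "__main__":
    store = ToyLogStructuredStore(memtable_limit=2)
    store.put("jim", "camaro")
    store.put("carol", "subaru")   # second write triggers a flush
    store.put("jim", "mustang")    # hypothetical update, stays in the memtable
    print(store.get("jim"), store.get("carol"), store.get("suzy"))
```

Existing tables are never modified in place; an update is simply a newer entry that shadows the older one at read time.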
  27. (chart: reads/s and writes/s, Cassandra 0.6 vs. Cassandra 1.0; axis from 0 to 35000)
  28. (no transcribed text)
  29. Netflix
     Application/Use Case
     • Manage subscriber interactions with downloaded movies
     • Need to handle distributed databases all over the world (40 countries)
     • Need better TCO than Oracle
     Why Cassandra?
     • Easy scale and multi-data center support for geographical data distribution
     • Data model perfect fit for customer interaction data
     • Much better TCO than Oracle or SimpleDB
     “I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.”
  30. Constant Contact
     Application/Use Case
     • Manage marketing/email campaigns for small businesses
     • Needed database to handle social media data that is very large in volume and must be maintained for a long time
     • Data is unstructured in nature
     Why Cassandra?
     • Cassandra built for big data scale and able to persist, manage, and quickly query big data
     • Deployed application on Cassandra in 1/3rd the time and 1/10th the cost of Oracle
     “Whenever we need new capacity, we just add new nodes online and we’re able to meet whatever demand we have. Cassandra is great for that.”
  31. ReachLocal
     Application/Use Case
     • ReachLocal provides end-to-end Internet advertising services to small and medium-sized businesses in eight countries
     • Must track most or all user interaction with marketing campaigns on web sites
     Why Cassandra?
     • The amount of information was beyond the scalability limits of traditional RDBMSs
     • Has to replicate data to six data centers around the world
     • Needed integration with real-time data and analytics/search
  32. Backupify
     Application/Use Case
     • Cloud-based utility that enables backups and searches of Google Apps, Gmail, Facebook, Twitter, Blogger and other content.
     • Must write lots of data very quickly
     Why Cassandra?
     • Big data requirements necessitated easy scale out and continuously available database architecture
     • Strong community support of Cassandra
     • TCO was much better than others
     “Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’”
  33. Openwave
     Application/Use Case
     • Openwave Messaging delivers a next-generation converged messaging platform with cloud and social integration capabilities.
     Why Cassandra?
     • Needed new database that would support geographic redundancy, continuous availability, and big data scale
     • Required high IOPS database speed
     • Better TCO than prior Oracle database
     “Here are the big ‘checkbox’ items for us with Apache Cassandra: There is no single point of failure, it offers high read-and-write performance, and it has the ability to work on commodity hardware.”
  34. Healthx
     Application/Use Case
     • Develops and manages online portals for healthcare market
     • Delivered via cloud platform
     • Manages provider, patient, and other related data
     Why DataStax Enterprise?
     • Needed to scale, perform, and search data faster than previous Microsoft SQL Server database farm
     • Integrated big data platform that provides one database cluster for all real-time and search data
     “We really like the integration with Solr. We get the full redundancy that you’d expect out of Cassandra as well as the full text indexing of Solr. The two things together make a win.”
  35. Big data: Analytics (Hadoop); Realtime (“NoSQL”); ?
  36. The evolution of Analytics
     Diagram labels: Analytics + Realtime
  37. The evolution of Analytics
     Diagram labels: Analytics, replication, Realtime
  38. The evolution of Analytics
     Diagram labels: ETL
  39. Big data: Analytics (Hadoop); DataStax Enterprise; Realtime (Cassandra)
  40. Reunification of realtime + analytics
  41. (no transcribed text)
  42. Portfolio Demo dataflow
     Left column: Portfolios, Historical Prices, Intermediate Results, Largest loss
     Right column: Portfolios, Live Prices for today, Largest loss
  43. Better Hadoop than Hadoop
     • “Vanilla” Hadoop
       • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server,...)
       • Single points of failure
       • Can’t separate online and offline processing
     • DataStax Enterprise
       • Single, simplified component
       • Self-organizes based on workload
       • Peer to peer
       • JobTracker failover
  44. Enterprise search with Solr
     SELECT title FROM solr WHERE solr_query='title:natio*';

     title
     --------------------------------------------------------------------------
     Bolivia national football team 2002
     List of French born footballers who have played for other national teams
     Lithuania national basketball team at Eurobasket 2009
     Bolivia national football team 2000
     Kenya national under-20 football team
     Bolivia national football team 1999
     Israel men’s national inline hockey team
     Bolivia national football team 2001
  45. Managing & Monitoring Big Data
     DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations
  46. Questions?
     • http://www.datastax.com/docs
     • http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
     • http://www.datastax.com/products/enterprise