Apache Cassandra: NoSQL in the enterprise


Published in: Technology


  1. Apache Cassandra: NoSQL in the Enterprise, today. Jonathan Ellis, CTO, @spyced
  2. Cassandra Job Trends (
  3. “Big Data” trend
  4. Why Big Data Matters
     Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.
  5. Big data: Analytics (Hadoop) | ? | Realtime (“NoSQL”)
  6. Some users
     ✤ Financial
     ✤ Social Media
     ✤ Advertising
     ✤ Entertainment
     ✤ Energy
     ✤ E-tail
     ✤ Health care
     ✤ Government
  7. Common use cases
     ✤ Time series data
     ✤ Messaging
     ✤ Ad tracking
     ✤ Data mining
     ✤ User activity streams
     ✤ User sessions
     ✤ Anything requiring: Scalable + performant + highly available
  8. Why Cassandra?
     ✤ Fully distributed, no SPOF
     ✤ Multi-master, multi-DC
     ✤ Linearly scalable
     ✤ Larger-than-memory datasets
     ✤ Best-in-class performance (not just writes!)
     ✤ Fully durable
     ✤ Integrated caching
     ✤ Tuneable consistency
  9. Classic partitioning with SPOF
     [Diagram labels: partition 1, partition 2, partition 3, partition 4; master, slave, slave; request router]
  10. Fully distributed, no SPOF
      [Diagram labels: client; p1, p3, p6]
  11. Performance summary
  12. “With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.”
      Adrian Cockcroft, Cloud Architect
  13. Netflix on Cassandra
      ✤ Could not build datacenters fast enough
      ✤ Made decision to go to cloud (AWS)
      ✤ Applications include Netflix’s subscriber system, A/B testing, and viewing history service
      ✤ Over a year in, Netflix finds Cassandra to be
        ✤ Fast
        ✤ Cost-effective
        ✤ Scalable
        ✤ Flexible
        ✤ Reliable: no SPOF
  14. “Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.”
      Matt Conway, VP Engineering
  15. Backupify on Cassandra
      ✤ Cloud-based utility that enables businesses and consumers to back up, search, and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger
      ✤ Cassandra findings:
        ✤ Solved scaling, allowing engineers to focus on their business
        ✤ DataStax OpsCenter made it easy to monitor the health and performance of their cluster
        ✤ Reliable, redundant, and scalable data storage helped eliminate downtime
        ✤ Ability to offer not only backup and storage, but also analysis
  16. “You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.”
      Harry Robertson, Tech Lead
  17. Ooyala on Cassandra
      ✤ Ooyala provides a suite of technologies and services that support content owners in managing, analyzing, and monetizing the digital video they publish online
      ✤ Cassandra findings:
        ✤ Classic “Big Data” problem did not require re-architecting
        ✤ Delivered ability to respond to increasingly sophisticated analytic needs of customers
        ✤ Developers spend time building application features, not figuring out how to scale
  18. “Cassandra has allowed us to build bigger features faster and more reliably, while using less money and without needing to expand our staff.”
      Kyle Ambroff, Sr. Engineer
  19. Formspring on Cassandra
      ✤ Users of Formspring engage with and learn more about each other by asking and responding to questions. Close to 4B responses in the system and 30M unique users
      ✤ Cassandra experience
        ✤ No sharding needed: just add nodes to scale
        ✤ Performance: the popular users with many followers saw no speed reduction. No more memcached!
        ✤ Flexibility of a schema-optional architecture is very developer friendly
  20. Big data: Analytics (Hadoop) | ? | Realtime (“NoSQL”)
  21. The evolution of Analytics: Analytics + Realtime
  22. The evolution of Analytics
      [Diagram labels: replication; Analytics; Realtime]
  23. The evolution of Analytics: ETL
  24. Big data: Analytics (Hadoop) | DataStax Enterprise | Realtime (“NoSQL”)
  25. DataStax Enterprise re-unifies realtime and analytics
  26. Portfolio Demo dataflow
      [Diagram labels: Portfolios; Historical Prices; Live Prices for today; Intermediate Results; Largest loss]
  27. Operations
      ✤ “Vanilla” Hadoop
        ✤ 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server,...)
        ✤ Single points of failure
        ✤ Can’t separate online and offline processing
      ✤ DataStax Enterprise
        ✤ Single, simplified component
        ✤ Self-organizes based on workload
        ✤ Peer to peer
        ✤ JobTracker failover
  28. Managing & Monitoring Big Data
      ✤ DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations
  29. Questions?
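
Appendix note on slides 9 and 10: the contrast between a central request router and a fully distributed layout can be illustrated with a small hash-ring sketch. This is only a minimal illustration under stated assumptions, not Cassandra's actual implementation: the names `Ring`, `token`, `replicas`, and `without`, the node labels, and the choice of MD5 as a stable hash are all hypothetical and chosen for the example. The idea it shows is that every node (or client) can compute the owners of a key locally, so there is no router to fail, and losing one node leaves the remaining replicas for a key unchanged.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is used only as a stable hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical hash ring: one token per node, replicas found by walking clockwise."""

    def __init__(self, nodes, replication_factor=3):
        nodes = list(nodes)
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the ring can be searched with bisect.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str):
        """Return the nodes responsible for `key`, computed without any central router."""
        # First node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def without(self, node: str) -> "Ring":
        """Return a new ring with `node` removed, e.g. to model a node failure."""
        remaining = [n for _, n in self._ring if n != node]
        return Ring(remaining, self.replication_factor)


if __name__ == "__main__":
    ring = Ring(["n1", "n2", "n3", "n4", "n5", "n6"], replication_factor=3)
    owners = ring.replicas("user:42")
    print("owners:", owners)
    # After the first owner fails, the other two owners still hold the key.
    print("after failure:", ring.without(owners[0]).replicas("user:42"))
```

In this sketch, removing a node only shifts responsibility to the next node on the ring, which is one way to read the "no SPOF" claim on slide 10; production systems add details (multiple tokens per node, datacenter-aware placement) that are deliberately left out here.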