The Future Of Big Data

A high-level overview of common Cassandra use cases, reasons for adoption, Big Data trends, DataStax Enterprise, and the future of Big Data, given at the 7th Advanced Computing Conference in Seoul, South Korea.

  1. Cassandra 1.0: The Future Of Big Data. Matthew F. Dennis // @mdennis. 7th Advanced Computing Conference, Seoul, South Korea, February 15th, 2012
  2. Cassandra Job Trends (indeed.com)
  3. Cassandra Job Trends (indeed.com)
  4. “Big Data” Job Trends (indeed.com)
  5. Big Data
  6. Why People Choose Cassandra: true multi-DC support; linearly scalable; larger-than-memory datasets; best-in-class performance (not just for writes!); fully durable; integrated caching; tuneable consistency; no single point of failure (SPOF)
  7. Common Cassandra Use Cases: time series; sensor data; messaging; ad tracking; financial market data; user activity streams; fraud detection / risk analysis; anything requiring linear scale + high performance + global availability
  8. “With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.” (Adrian Cockcroft, Cloud Architect)
  9. Netflix’s problems: could not build datacenters fast enough; made the decision to go to the cloud (AWS). Cassandra on AWS is a key infrastructure component of its globally distributed streaming product. Applications include Netflix’s subscriber system, A/B testing, and viewing history service (including pause/resume).
  10. Netflix on Cassandra: fast; cheap; scalable; flexible; no SPOF
  11. Scale Horizontally (http://www.datastax.com/1-million-writes) [chart: client writes per second vs. number of nodes]
  12. “Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.” (Matt Conway, VP Engineering)
  13. Backupify’s problem: Backupify is a cloud-based utility that enables businesses and consumers to back up, search, and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger
  14. Backupify on Cassandra: ease of scale enabled engineers to focus on building great applications; DataStax OpsCenter made it easy to monitor the health and performance of their cluster; reliable, redundant, scalable, and cheap data storage helped eliminate downtime; ability to offer not only backup and storage but also analysis of data in the future
  15. “You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.” (Harry Robertson, Tech Lead)
  16. Ooyala’s problem: Ooyala provides a suite of technologies and services that support content owners in managing, analyzing, and monetizing the digital video they publish online
  17. Ooyala on Cassandra: classic “Big Data” problem did not require re-architecting; enabled application agility (developers spend time building cool apps, not figuring out how to scale); enabled more powerful and granular analytics for their customers
  18. Some More Cassandra Users (http://www.datastax.com/cassandrausers): Financial, Social Media, Advertising, Entertainment, Energy, E-Tail, Health Care, Infrastructure, Government
  19. Big Data
  20. The evolution of Analytics [diagram: Analytics + Realtime]
  21. The evolution of Analytics [diagram: Realtime, Analytics, replication]
  22. The evolution of Analytics [diagram: Realtime, Analytics, ETL]
  23. DataStax Enterprise re-unifies realtime and analytics
  24. realtime and analytics
  25. Portfolio Demo dataflow [diagram: Portfolios, Historical Prices, Live Prices for today, Intermediate Results, Largest loss]
  26. Operations. “Vanilla” Hadoop: many pieces to set up, monitor, back up, and maintain (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server, ...); single points of failure. DataStax Enterprise: single simplified system; self-organizes based on workload; peer to peer; JobTracker failover; no additional Cassandra config
  27. Monitoring Cassandra (OpsCenter)
  28. Q? Matthew F. Dennis // @mdennis, http://slideshare.net/mattdennis

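Slide 6 lists tuneable consistency among the reasons people choose Cassandra. A minimal sketch of the arithmetic behind it, assuming a single data center and only the ONE, QUORUM and ALL levels: a read is guaranteed to reach at least one replica holding the latest acknowledged write when the read and write replica counts sum to more than the replication factor (R + W > RF). The helper names are hypothetical; this is not Cassandra code.

```python
# Hypothetical helpers illustrating the quorum arithmetic behind tuneable
# consistency; this is not Cassandra's own code.


def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas for a key."""
    return replication_factor // 2 + 1


# Number of replicas that must respond for each consistency level
# (assumption: single data center, only these three levels).
LEVELS = {
    "ONE": lambda rf: 1,
    "QUORUM": quorum,
    "ALL": lambda rf: rf,
}


def replicas_for(level: str, replication_factor: int) -> int:
    return LEVELS[level](replication_factor)


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must intersect every write replica set
    (R + W > RF), so a read contacts at least one replica that acknowledged
    the latest write."""
    r = replicas_for(read_level, replication_factor)
    w = replicas_for(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(read, write, read_overlaps_write(read, write, rf))
```

With a replication factor of 3 this prints `ONE ONE False`, `QUORUM QUORUM True` and `ONE ALL True`: waiting on fewer replicas lowers latency but gives up the overlap guarantee.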
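Slide 7 names time series and sensor data as common use cases. One common way to model them in Cassandra is to bucket readings into one row per sensor per time period, with the readings kept sorted by timestamp inside the row. The sketch below emulates that layout in memory; the `sensor_id:YYYYMMDD` row key, the daily bucket size and the class name are hypothetical choices for illustration, not taken from the talk.

```python
from bisect import bisect_right, bisect_left, insort
from datetime import datetime, timedelta, timezone


def row_key(sensor_id: str, ts: datetime) -> str:
    """Hypothetical row key: one row per sensor per UTC day, so no single
    row grows without bound."""
    return f"{sensor_id}:{ts.astimezone(timezone.utc):%Y%m%d}"


class TimeSeriesStore:
    """In-memory stand-in for a column family whose column names are
    timestamps kept in sorted order (expects timezone-aware datetimes)."""

    def __init__(self) -> None:
        self.rows: dict[str, list[tuple[datetime, float]]] = {}

    def insert(self, sensor_id: str, ts: datetime, value: float) -> None:
        # insort keeps each row's (timestamp, value) columns ordered by timestamp.
        insort(self.rows.setdefault(row_key(sensor_id, ts), []), (ts, value))

    def slice(self, sensor_id: str, start: datetime, end: datetime) -> list[tuple[datetime, float]]:
        """Return readings with start <= ts <= end, visiting each day bucket in range."""
        out: list[tuple[datetime, float]] = []
        day = start.astimezone(timezone.utc).date()
        last = end.astimezone(timezone.utc).date()
        while day <= last:
            cols = self.rows.get(f"{sensor_id}:{day:%Y%m%d}", [])
            stamps = [ts for ts, _ in cols]
            # Binary search on the sorted timestamps, like a column slice.
            out.extend(cols[bisect_left(stamps, start):bisect_right(stamps, end)])
            day += timedelta(days=1)
        return out
```

Bucketing by day bounds the size of each row; a range query then touches one row per day it spans.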
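Slide 11 shows client writes per second growing with the number of nodes. Part of what makes horizontal scaling practical is that Cassandra places data on a token ring, so a new node takes over one slice of the ring instead of forcing a full reshuffle. A simplified sketch, assuming one token per node, tokens derived by hashing node names (real clusters choose tokens differently), and no replication, so each key has a single owner:

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """MD5-derived token, loosely modeled on a random partitioner (simplified)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Token ring with one token per node; each node owns the range
    (previous token, its own token], wrapping around at the end."""

    def __init__(self, nodes: list[str]) -> None:
        # Hypothetical: derive each node's token from its name.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def owner(self, key: str) -> str:
        # First node whose token is >= the key's token; wrap to the first node.
        i = bisect_left(self.tokens, token(key)) % len(self.entries)
        return self.entries[i][1]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    before = Ring(["node-a", "node-b", "node-c"])
    after = Ring(["node-a", "node-b", "node-c", "node-d"])
    moved = [k for k in keys if before.owner(k) != after.owner(k)]
    # Every moved key lands on the new node; nothing shuffles between old nodes.
    print(len(moved), {after.owner(k) for k in moved})
```

Every key that changes owner moves to `node-d`, and keys owned by the other nodes stay put, which is why capacity can grow one node at a time.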
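Slide 25 outlines the Portfolio Demo dataflow: portfolios and historical prices feed intermediate results that end in a largest loss. The core arithmetic can be sketched in plain Python, assuming holdings given as shares per ticker, daily closing-price series per ticker, and "largest loss" taken as the biggest drop in portfolio value over any `horizon`-day window. All of these, and the function names, are hypothetical; the sketch does not reproduce how the demo itself stored or distributed the work.

```python
def portfolio_values(holdings: dict[str, float], prices: dict[str, list[float]]) -> list[float]:
    """Daily portfolio value: sum of shares * closing price per ticker.
    Uses only the days covered by every held ticker's price series."""
    days = min(len(prices[ticker]) for ticker in holdings)
    return [
        sum(shares * prices[ticker][day] for ticker, shares in holdings.items())
        for day in range(days)
    ]


def largest_loss(holdings: dict[str, float], prices: dict[str, list[float]], horizon: int = 1) -> float:
    """Largest drop in value over any `horizon`-day window (0.0 if it never fell)."""
    values = portfolio_values(holdings, prices)
    drops = [values[i] - values[i + horizon] for i in range(len(values) - horizon)]
    return max([0.0] + drops)


if __name__ == "__main__":
    holdings = {"AAA": 10, "BBB": 5}  # hypothetical tickers and share counts
    prices = {"AAA": [10.0, 12.0, 9.0, 11.0], "BBB": [20.0, 18.0, 19.0, 15.0]}
    print(largest_loss(holdings, prices))  # 25.0
```

Here the portfolio is worth 200, 210, 185 and 185 on successive days, so the largest one-day loss is the 25-point fall from day 2 to day 3.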