Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar


Published in: Technology

  1. Why every NoSQL deployment should be paired with Hadoop
     James Phillips, Co-founder and SVP Products, Couchbase
     Amr Awadallah, Co-founder and CTO, Cloudera
  2. Agenda
     • Big Audience vs. Big Data
     • NoSQL for Big Audience
     • Hadoop for Big Data
     • Big Audiences create and consume Big Data
       – NoSQL and Hadoop are highly synergistic
     • Couchbase + Cloudera
  3. Aren’t NoSQL, Hadoop, “Big Data” all the same? No.
  4. Two challenges at the data layer
     • “Big Audience.” Most new interactive software systems are accessed via browser, with 2 billion potential users and a 24x7 uptime requirement.
     • “Big Data.” IDC estimates that more than 1.8 trillion gigabytes of information was created in 2011 and that it will double every two years.
  6. Changes in interactive software – NoSQL driver
  7. Modern interactive software architecture
     • Application Scales Out: Just add more commodity web servers
     • Database Scales Up: Get a bigger, more complex server
     Note – Relational database technology is great for what it is great for, but it is not great for this.
  8. Extending the scope of RDBMS technology
     • Data partitioning (“sharding”)
       – Disruptive to reshard; impacts application
       – No cross-shard joins
       – Schema management at every shard
     • Denormalizing
       – Increases speed
       – At the limit, provides complete flexibility
       – Eliminates relational query benefits
     • Distributed caching
       – Accelerate reads
       – Scale out
       – Another tier, no write acceleration, coherency management
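The "disruptive to reshard" point can be illustrated with a minimal sketch. It assumes the application places rows by hashing a key modulo the shard count; the slide does not name a placement scheme, and `shard_for` and `fraction_moved` are hypothetical helpers introduced only for this example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash and modulo placement (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical key space: 10,000 user rows.
keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_shards=4, new_shards=5)
print(f"{moved:.0%} of keys change shard when growing from 4 to 5 shards")
```

Under this assumed scheme, most keys land on a different shard after one server is added, so data must be migrated and the application's routing logic updated at the same time.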
  9. Lacking market solutions, users forced to invent
     Bigtable (November 2006), Dynamo (October 2007), Cassandra (August 2008), Voldemort (February 2009)
     • No schema required before inserting data
     • No schema change required to change data format
     • Auto-sharding without application participation
     • Distributed queries
     • Integrated main memory caching
     • Data synchronization (mobile, multi-datacenter)
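The first two bullets (no schema before insert, no schema change to alter the data format) can be sketched with a toy store. `DocumentStore` is a hypothetical in-memory stand-in written for this example; it is not the API of any of the systems named on the slide.

```python
import json


class DocumentStore:
    """Toy in-memory key-value document store (illustration only)."""

    def __init__(self):
        self._docs = {}

    def set(self, key, doc):
        # Store a serialized copy so later changes to `doc` do not leak in.
        self._docs[key] = json.dumps(doc)

    def get(self, key):
        return json.loads(self._docs[key])


store = DocumentStore()

# No schema is declared before the first insert.
store.set("user:1", {"name": "Ada"})

# A later document adds fields without any schema migration step.
store.set(
    "user:2",
    {"name": "Grace", "last_login": "2012-03-01", "badges": ["early-adopter"]},
)

fields = sorted(store.get("user:2").keys())
print(fields)
```

Older documents keep their original shape, so the application has to tolerate missing fields when it reads them.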
  10. NoSQL database matches application logic tier architecture
      Data layer now scales with linear cost and constant performance.
      • Application Scales Out: Just add more commodity web servers
      • Database Scales Out (NoSQL database servers): Just add more commodity data servers
      Scaling out flattens the cost and performance curves.
  11. Survey: Schema inflexibility #1 adoption driver
      What is the biggest data management problem driving your use of NoSQL in the coming year?
      • Lack of flexibility/rigid schemas: 49%
      • Inability to scale out data: 35%
      • High latency/low performance: 29%
      • Costs: 16%
      • All of these: 12%
      • Other: 11%
      Source: Couchbase NoSQL Survey, December 2011, n=1351
  19. Two peas. One pod.
  20. Hadoop as a Web application feeder or consumer
      • Pattern 1: Hadoop feeding a web application
      • Pattern 2: Hadoop consuming web application data
      [Diagram labels, both patterns: big audience, web application, big data, insights]
  21. Pattern 1 Case Study: AOL Ad Targeting
      • One of the largest online ad targeting operations
      • Ad slot filling optimization
        – Serve the most relevant ad to a given user
        – Meet contracted impression counts
      • Relevancy criteria
        – Demographic
        – Psychographic
        – Current behavioral
      • 40 milliseconds to fill all slots
  22. AOL Advertising: Hadoop as an ad targeting feeder
      40 milliseconds to respond with the decision.
      [Diagram with numbered flows 1 to 3; labels: events; profiles, campaigns; profiles, real time campaign statistics; affiliates]
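Pattern 1 can be sketched as three steps: a batch job derives user profiles from logged events, the results are loaded into a low-latency store, and the serving path does lookups only. This is a minimal illustration, not AOL's system: a plain function stands in for the Hadoop job, a dict stands in for the key-value store, and every name and data value below is hypothetical.

```python
from collections import Counter, defaultdict

# Raw events as (user_id, category) pairs; in this pattern they would be
# logged by the serving tier and processed offline.
events = [
    ("u1", "sports"), ("u1", "sports"), ("u1", "travel"),
    ("u2", "travel"), ("u2", "travel"), ("u2", "autos"),
]


def build_profiles(events):
    """Batch step (stand-in for a Hadoop job): most frequent category per user."""
    counts = defaultdict(Counter)
    for user_id, category in events:
        counts[user_id][category] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


# Campaigns keyed by the category they target (hypothetical data).
campaigns = {"sports": "ad-running-shoes", "travel": "ad-flight-deals"}
DEFAULT_AD = "ad-generic"

# Load step: a dict stands in for the low-latency key-value store.
profile_store = build_profiles(events)


def choose_ad(user_id):
    """Serving step: lookups only, no analysis on the request path."""
    category = profile_store.get(user_id)
    return campaigns.get(category, DEFAULT_AD)


print(choose_ad("u1"), choose_ad("u2"), choose_ad("unknown-user"))
```

Keeping the analysis out of the request path is what makes a tight response budget, such as the 40 milliseconds on the slide, plausible: the serving tier only reads precomputed values.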
  23. Pattern 2 Case Study: Social gaming user analysis
      • Tens to hundreds of millions of users
      • Game optimization requirements
        – Keep game fresh and retain audience
        – Maximize revenue through offer and experience tuning
      • Very different data management tasks
        – Serving game data
          • System of record game data
          • Very low latency data access
          • Non-disruptive elasticity
          • Complex queries
        – Analyzing user behavior
          • Not game data, rather user behavior data
          • High-throughput data analysis
  24. Social Game: Game optimization via Hadoop
      [Diagram with numbered steps 1 to 5; labels: User interacting with game; Validation and response; Game and user data system of record; User behavioral data; Insights]
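Pattern 2 runs the other way: behavioral events flow out of the game into a batch analysis that produces insights. The sketch below imitates a map step and a reduce step in plain Python rather than using Hadoop itself; the event fields and the "quits per level" metric are hypothetical choices made for illustration.

```python
from collections import defaultdict

# User behavior records (not game state), as the game might log them.
behavior_log = [
    {"user": "u1", "action": "level_start", "level": 3},
    {"user": "u1", "action": "quit", "level": 3},
    {"user": "u2", "action": "level_start", "level": 3},
    {"user": "u2", "action": "quit", "level": 3},
    {"user": "u3", "action": "quit", "level": 7},
]


def map_quits(record):
    """Map step: emit (level, 1) for every quit event."""
    if record["action"] == "quit":
        yield record["level"], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


pairs = (pair for record in behavior_log for pair in map_quits(record))
quits_by_level = reduce_counts(pairs)

# Insight fed back to the game team: the level where most players give up.
hardest_level = max(quits_by_level, key=quits_by_level.get)
print(quits_by_level, hardest_level)
```

The serving database never runs this scan; it only receives the resulting insight, which keeps high-throughput analysis separate from low-latency game data access.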
  26. Couchbase Sqoop connector for Cloudera
      • Cloudera-certified connector
      • Bi-directional data movement
        – Hadoop -> Couchbase
        – Couchbase -> Hadoop