Why Every NoSQL Deployment Should Be Paired with Hadoop Webinar
1. Why every NoSQL deployment should be paired with Hadoop
James Phillips, Co-founder and SVP Products, Couchbase
Amr Awadallah, Co-founder and CTO, Cloudera
2. Agenda
• Big Audience vs. Big Data
• NoSQL for Big Audience
• Hadoop for Big Data
• Big Audiences create and consume Big Data
– NoSQL and Hadoop are highly synergistic
• Couchbase + Cloudera
4. Two challenges at the data layer
“Big Audience.”
Most new interactive software systems are accessed via browser with 2 billion potential users and a 24x7 uptime requirement.
“Big Data.”
IDC estimates that more than 1.8 trillion gigabytes of information was created in 2011 and that it will double every two years.
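The doubling estimate above implies simple exponential growth. A minimal sketch of the arithmetic, assuming the 2011 figure (1.8 trillion gigabytes, which is 1.8 zettabytes in decimal units) as the baseline and a constant two-year doubling period. The function name and the chosen years are illustrative only; the projected values are extrapolations, not IDC figures.

```python
def projected_zettabytes(year, base_year=2011, base_zb=1.8, doubling_years=2.0):
    """Extrapolate data created per year under a constant doubling period."""
    return base_zb * 2 ** ((year - base_year) / doubling_years)


# Assumption: the doubling period stays constant after 2011.
for year in (2011, 2013, 2015):
    print(year, round(projected_zettabytes(year), 1))
```

Under these assumptions the volume doubles by 2013 and doubles again by 2015.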
7. Modern interactive software architecture
Application Scales Out
Just add more commodity web servers
Database Scales Up
Get a bigger, more complex server
Note – Relational database technology is great at what it was designed for, but scaling out alongside the application tier is not one of those things.
8. Extending the scope of RDBMS technology
• Data partitioning (“sharding”)
– Disruptive to reshard – impacts application
– No cross-shard joins
– Schema management at every shard
• Denormalizing
– Increases speed
– At the limit, provides complete flexibility
– Eliminates relational query benefits
• Distributed caching
– Accelerate reads
– Scale out
– Adds another tier, provides no write acceleration, requires coherency management
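Why resharding is disruptive can be seen with a small experiment. The sketch below assumes the application routes keys with a simple hash-modulo scheme (one common approach, not necessarily what any given deployment uses) and measures how many keys map to a different shard when a fifth shard is added to four. All names are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard with hash-modulo (assumed routing scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_shards=4, new_shards=5)
print(f"{moved:.0%} of keys change shards")
```

With uniformly distributed hashes, roughly four out of five keys change shards in this scenario, so most of the data has to be physically moved and the application's routing logic updated at the same time.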
9. Lacking market solutions, users forced to invent
Bigtable: November 2006
Dynamo: October 2007
Cassandra: August 2008
Voldemort: February 2009
• No schema required before inserting data
• No schema change required to change data format
• Auto-sharding without application participation
• Distributed queries
• Integrated main memory caching
• Data synchronization (mobile, multi-datacenter)
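Two of the properties listed above, inserting data without a schema and sharding without application participation, can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not how Bigtable, Dynamo, Cassandra, Voldemort, or Couchbase are implemented: `TinyCluster`, `NUM_BUCKETS`, and the round-robin bucket map are all hypothetical.

```python
import zlib

NUM_BUCKETS = 64  # assumption: a small, fixed number of virtual buckets


class TinyCluster:
    """Toy document store: schema-free values, keys routed via a bucket map."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}
        self.bucket_owner = self._assign(list(self.nodes))

    @staticmethod
    def _assign(node_names):
        # Spread buckets round-robin across the current nodes.
        return [node_names[b % len(node_names)] for b in range(NUM_BUCKETS)]

    @staticmethod
    def _bucket(key):
        # A key always hashes to the same bucket; only bucket ownership changes.
        return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

    def put(self, key, document):
        owner = self.bucket_owner[self._bucket(key)]
        self.nodes[owner][key] = document

    def get(self, key):
        owner = self.bucket_owner[self._bucket(key)]
        return self.nodes[owner].get(key)

    def add_node(self, name):
        # Naive rebalance: collect everything, remap buckets, store again.
        documents = [(k, d) for store in self.nodes.values() for k, d in store.items()]
        self.nodes[name] = {}
        for store in self.nodes.values():
            store.clear()
        self.bucket_owner = self._assign(list(self.nodes))
        for key, document in documents:
            self.put(key, document)


cluster = TinyCluster(["node-a", "node-b"])
cluster.put("user:1", {"name": "Ada"})
# A document with new fields needs no schema change.
cluster.put("user:2", {"name": "Lin", "level": 7, "badges": ["early"]})
cluster.add_node("node-c")  # the caller's get/put code does not change
print(cluster.get("user:2"))
```

The calling code only ever uses `put` and `get` with a key; which node holds the data is decided inside the cluster, so adding a node does not touch application logic.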
10. NoSQL database matches application logic tier architecture
Data layer now scales with linear cost and constant performance.
Application Scales Out
Just add more commodity web servers
NoSQL Database Servers
Database Scales Out
Just add more commodity data servers
Scaling out flattens the cost and performance curves.
11. Survey: Schema inflexibility #1 adoption driver
What is the biggest data management problem driving your use of NoSQL in the coming year?
Lack of flexibility/rigid schemas 49%
Inability to scale out data 35%
High latency/low performance 29%
Costs 16%
All of these 12%
Other 11%
Source: Couchbase NoSQL Survey, December 2011, n=1351
20. Hadoop as a Web application feeder or consumer
Pattern 1: Hadoop feeding a web application
[Diagram labels: big data, insights, Web application, big audience]
Pattern 2: Hadoop consuming web application data
[Diagram labels: big audience, Web application, big data, insights]
21. Pattern 1 Case Study: AOL Ad Targeting
• One of the largest online ad targeting operations
• Ad slot filling optimization
– Serve the most relevant ad to a given user
– Meet contracted impression counts
• Relevancy criteria
– Demographic
– Psychographic
– Current behavioral
• 40 milliseconds to fill all slots
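The constraints listed above (most relevant ad, contracted impression counts, a 40 millisecond budget) can be sketched as a small selection loop. The slides do not describe AOL's actual algorithm; the greedy choice, the segment-overlap relevance score, and every name below are assumptions made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class Campaign:
    name: str
    target_segments: frozenset
    remaining_impressions: int  # contracted impressions still to be served


def fill_slots(user_segments, campaigns, num_slots, budget_ms=40.0):
    """Greedily pick the most relevant eligible campaign for each slot."""
    deadline = time.monotonic() + budget_ms / 1000.0
    chosen = []
    for _ in range(num_slots):
        if time.monotonic() >= deadline:
            break  # out of time: return whatever has been chosen so far
        candidates = [
            c for c in campaigns
            if c.remaining_impressions > 0 and c.name not in chosen
        ]
        if not candidates:
            break
        # Hypothetical relevance score: number of shared profile segments.
        best = max(candidates, key=lambda c: len(user_segments & c.target_segments))
        best.remaining_impressions -= 1
        chosen.append(best.name)
    return chosen


campaigns = [
    Campaign("sports-shoes", frozenset({"runner", "age-25-34"}), 2),
    Campaign("city-breaks", frozenset({"traveler"}), 5),
    Campaign("sold-out", frozenset({"runner", "age-25-34", "traveler"}), 0),
]
slots = fill_slots({"runner", "age-25-34", "traveler"}, campaigns, num_slots=2)
print(slots)
```

The profile segments and campaign counters are the kind of data that must be readable within the latency budget, which is why they would sit in a low-latency store rather than be computed per request.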
22. AOL Advertising: Hadoop as an ad targeting feeder
40 milliseconds to respond with the decision.
[Diagram: numbered steps 1 to 3; labels: affiliates; events; profiles, campaigns; profiles, real time campaign statistics]
23. Pattern 2 Case Study: Social gaming user analysis
• Tens to hundreds of millions of users
• Game optimization requirements
– Keep game fresh and retain audience
– Maximize revenue through offer and experience tuning
• Very different data management tasks
– Serving game data
• System of record game data
• Very low latency data access
• Non-disruptive elasticity
• Complex queries
– Analyzing user behavior
• Not game data, rather user behavior data
• High-throughput data analysis
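The contrast between the two tasks above can be sketched in a few lines: serving is a single keyed lookup, while analysis scans every behavior event. The example below imitates the map and reduce phases of a batch job in one process; a real Hadoop job would distribute these phases across a cluster. The event fields, function names, and sample data are hypothetical.

```python
from collections import defaultdict

# Serving side: one low-latency lookup by key (a dict stands in for the store).
game_state = {"user:42": {"level": 7, "coins": 120}}
print(game_state["user:42"])


# Analysis side: a full pass over behavior events, in map/reduce style.
def map_event(event):
    """Emit (offer_id, revenue) for every accepted offer."""
    if event["action"] == "offer_accepted":
        yield event["offer_id"], event["revenue_cents"]


def reduce_revenue(offer_id, values):
    """Sum the revenue emitted for one offer."""
    return offer_id, sum(values)


def run_job(events):
    grouped = defaultdict(list)  # the "shuffle" step: group values by key
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return dict(reduce_revenue(k, vs) for k, vs in grouped.items())


events = [
    {"user": "user:42", "action": "offer_shown", "offer_id": "gold-pack"},
    {"user": "user:42", "action": "offer_accepted", "offer_id": "gold-pack", "revenue_cents": 499},
    {"user": "user:7", "action": "offer_accepted", "offer_id": "gold-pack", "revenue_cents": 499},
    {"user": "user:7", "action": "offer_accepted", "offer_id": "extra-life", "revenue_cents": 99},
]
revenue_by_offer = run_job(events)
print(revenue_by_offer)
```

The result of such a job (for example, which offers earn the most) is the kind of insight that can then be written back to the serving store to tune the game.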
24. Social Game: Game optimization via Hadoop
[Diagram: numbered steps 1 to 5; labels: User interacting with game; Validation and response; Game and user data system of record; User behavioral data; Insights]