NoSQL! is it for you?
 


A detailed analysis of various NoSQL solutions based on the CAP theorem. This talk was given by me at BASIS SoftExpo.


Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Presentation Transcript

    • NoSQL: What it is and is it for you? Iraj Islam, Rubayeet Islam, Nurul Ferdous. NewsCred, Thursday, February 3, 2011
    • Agenda • Part 1. Why NoSQL? • Part 2. NoSQL Use Cases • Part 3. Choosing a NoSQL Solution • Part 4. Understanding MongoDB • Part 5. Building a MongoDB App • Part 6. Scaling MongoDB • Questions
    • Who We Are • Iraj Islam, CTO/Co-founder, NewsCred • Rubayeet Islam, Senior Software Engineer, NewsCred • Nurul Ferdous, Senior Software Engineer, NewsCred
    • Our Story • Launched 2008, founded by two Bangladeshis • Funded in 2008 by investors of Twitter and LinkedIn: Floodgate Ventures (Twitter), Bessemer Cap. (LinkedIn) • Top-tier clients: Yahoo!, Orange Telecom, Harvard U, The Daily Star, etc.
    • What We Do • Domain Expertise • Big Data • Information Retrieval • Machine Learning • Semantic Web Technologies • Apache Solr • MySQL/MongoDB • Python/Java
    • Part 1: Why NoSQL?
    • What's NoSQL? NoSQL: what's with the weird name?
    • What's NoSQL? NoSQL: a non-relational, web-scale database.
    • Why NoSQL? Web 1.0: the read-intensive web • Publishing model • Textual content • Small data • Browsing • Search • Personal computer
    • Why NoSQL? The Age of Big Data [chart: exabytes (10^18 bytes) of data stored per year, 2006 to 2010; y-axis 0 to 1,000]
    • Why NoSQL? Web 2.0+: the write-intensive web • User-generated content • Big data • Semi-structured data • Semantic web • Real-time • Ubiquity: any device, anywhere
    • Why NoSQL? The MySQL Problem, 1. Default: the application writes incoming data to a single MySQL server, which also serves every user read. Bottleneck, too much load!
    • Why NoSQL? The MySQL Problem, 2. Replication: the application writes to a MySQL master, and user reads are served by MySQL slaves. Scalable reads! But there is still a bottleneck: writes won't scale, because every write goes to the single master.
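To make the replication trade-off concrete, here is a minimal sketch in plain JavaScript (Node.js) of the read/write split an application typically performs against a master/slave setup. The class name and the connection strings are hypothetical stand-ins for real MySQL connections. The only point is that every write goes to the one master while reads rotate across the slaves.

```javascript
// Application-side read/write splitting for a master/slave setup.
// The connection values are hypothetical placeholders for MySQL clients.
class ReplicatedRouter {
  constructor(master, slaves) {
    this.master = master; // every write goes here
    this.slaves = slaves; // reads are spread across these
    this.next = 0;        // round-robin cursor into slaves
  }

  // All writes hit the single master: this is the write bottleneck.
  forWrite() {
    return this.master;
  }

  // Reads rotate through the slaves, so adding slaves adds read capacity.
  forRead() {
    const conn = this.slaves[this.next];
    this.next = (this.next + 1) % this.slaves.length;
    return conn;
  }
}

const router = new ReplicatedRouter("master", ["slave-1", "slave-2", "slave-3"]);
const reads = [1, 2, 3, 4].map(() => router.forRead());
console.log(reads.join(", ")); // slave-1, slave-2, slave-3, slave-1
console.log(router.forWrite()); // master
```

MySQL replication is asynchronous by default, so a read routed to a slave this way may briefly return stale data.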
    • Why NoSQL? The MySQL Problem, 3. Sharding: the application splits writes and reads across several MySQL shards (S). Great, scalable writes! But development and maintenance costs just skyrocketed!
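This sketch shows why sharding raises development costs: with manual sharding, the application (not the database) must map every key to a shard. The shard names and the djb2-style hash below are illustrative assumptions, not anything from the talk.

```javascript
// Manual, application-level sharding: the application itself decides
// which MySQL shard owns each user's rows. Shard names are hypothetical.
const SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2"];

// Deterministic 32-bit string hash (a djb2 variant).
function hash(key) {
  let h = 5381;
  for (const ch of String(key)) {
    h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  }
  return h;
}

// Every query path must route through this function. Queries that span
// users (joins, global ORDER BY, totals) must be sent to every shard and
// merged by hand in application code.
function shardFor(userId, shards = SHARDS) {
  return shards[hash(userId) % shards.length];
}

// Counts how many keys land on a different shard after resizing.
function movedKeys(keys, before, after) {
  return keys.filter((k) => shardFor(k, before) !== shardFor(k, after)).length;
}

const userIds = Array.from({ length: 1000 }, (_, i) => `user${i}`);
const grown = [...SHARDS, "mysql-shard-3"];
console.log(shardFor("joegunchy"));
console.log(movedKeys(userIds, SHARDS, grown), "of", userIds.length, "keys move");
```

Modulo placement ties each key to the shard count. With a well-spread hash, going from three shards to four moves about three quarters of the keys, and that migration is part of the maintenance burden the slide refers to.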
    • Why NoSQL? The NoSQL Solution: Design Goals • Semi-structure → Schema-free • Big Data → Scalable reads/writes • Real-time → High performance • Ubiquity → High availability
    • NoSQL vs RDBMS • Schema-free vs relational schema • Scalable writes/reads vs scalable reads • Automatic high availability vs custom high availability • Limited queries vs flexible queries • Eventual consistency* vs consistency • BASE vs ACID (* applies to most NoSQL systems)
    • Is NoSQL For You? (The same NoSQL vs RDBMS trade-offs, revisited.)
    • Part 2: NoSQL Use Cases
    • Who's Using NoSQL?
    • NoSQL Use Cases • Consumer use cases: Facebook, Twitter, Netflix • Enterprise use cases: Rackspace, TrendMicro, NewsCred
    • NoSQL Use Cases: Facebook • HBase: Facebook Messages • Scribe: real-time click logs • Hive: SQL queries → MapReduce jobs • Hadoop: web analytics warehouse, distributed datastore, MySQL backups
    • NoSQL Use Cases: Twitter • Hadoop: analytics • HBase: people search • Scribe: log collection framework • FlockDB: social graph analysis
    • NoSQL Use Cases • Rackspace: Cassandra for stat collection, mail and apps • TrendMicro: HBase and Hadoop for reputation databases • NewsCred: MongoDB for API usage analytics, pixel tracking analytics and entity metadata storage
    • Demo: NewsCred API Analytics
    • Part 3: Choosing a NoSQL Solution
    • Choosing a NoSQL Solution: the CAP theorem • Consistency (C): all clients have the same view of the data • Availability (A): each client can always read and write • Partition tolerance (P): the system works well despite physical network partitions • CA: RDBMSs (MySQL, PostgreSQL), Aster Data, Greenplum, Vertica • AP: Dynamo, Cassandra, Voldemort, SimpleDB, Tokyo Cabinet, CouchDB, Riak • CP: BigTable, HyperTable, HBase, MongoDB, Redis, Scalaris, Berkeley DB, MemcacheDB
    • Consistent, Available (CA): CA systems have trouble with partitions and typically deal with them through replication. • Examples: MySQL (relational), Aster Data (relational), Greenplum (relational), Vertica (column)
    • Available, Partition-Tolerant (AP): AP systems have trouble with consistency; they achieve "eventual consistency" through replication. • Examples: Cassandra (column/tabular), Dynamo (key-value), Voldemort (key-value), Tokyo Cabinet (key-value), CouchDB (document), SimpleDB (document), Riak (document)
    • Consistent, Partition-Tolerant (CP): CP systems have trouble with availability while keeping data consistent across partitioned nodes. • Examples: MongoDB (document), BigTable (column/tabular), HyperTable (column/tabular), HBase (column/tabular), Redis (key-value), Scalaris (key-value), MemcacheDB (key-value)
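Dynamo, Riak and Cassandra tune eventual consistency with Dynamo-style quorums. There are N replicas, a write waits for W acknowledgements, and a read consults R replicas. The sketch below is a deliberately simplified model (fixed replica placement, integer versions) that shows why R + W > N avoids stale reads. The function names are hypothetical.

```javascript
// Dynamo-style quorum replication: N replicas, a write is acknowledged by
// W of them and a read consults R of them. Placement is simplified here:
// writes land on the first W replicas, reads consult the last R replicas.
function makeReplicas(n) {
  return Array.from({ length: n }, () => ({ version: 0, value: null }));
}

function write(replicas, w, value, version) {
  for (let i = 0; i < w; i++) {
    replicas[i] = { version, value };
  }
}

function read(replicas, r) {
  const contacted = replicas.slice(replicas.length - r);
  // Return the value with the highest version among the replicas contacted.
  return contacted.reduce((a, b) => (b.version > a.version ? b : a)).value;
}

// When R + W > N, every read set overlaps every write set.
const overlaps = (n, r, w) => r + w > n;

const strong = makeReplicas(3);
write(strong, 3, "v1", 1); // initial value on all replicas
write(strong, 2, "v2", 2); // W = 2
console.log(read(strong, 2), overlaps(3, 2, 2)); // v2 true

const weak = makeReplicas(3);
write(weak, 3, "v1", 1);
write(weak, 1, "v2", 2); // W = 1, then R = 1: the read misses the new write
console.log(read(weak, 1), overlaps(3, 1, 1)); // v1 false
```

With R + W ≤ N, a read can miss the latest write. The system is still eventually consistent because background repair (Dynamo uses read repair and anti-entropy) later copies v2 to the remaining replicas.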
    • HBase • Selling point: billions of rows, millions of columns • Use when you need: random, real-time access to Big Data • CAP: CP • Written in: Java • License: Apache • Type: column/tabular • Protocol: HTTP/REST/Thrift • Community support: good • Learning curve: high • Users: Yahoo!, Facebook, Microsoft, Adobe, StumbleUpon, etc.
    • Cassandra • Selling point: the best of Google BigTable and Amazon Dynamo • Use when you need: to write more than you read (logging) • CAP: AP • Written in: Java • License: Apache • Type: column/tabular • Protocol: custom, binary (Thrift) • Community support: great • Learning curve: medium • Users: Facebook, Twitter, Digg, Reddit, Rackspace, Cisco, SimpleGeo, Cloudkick, etc.
    • Redis • Selling point: blazing fast, in-memory, like memcached • Use when you need: to manage rapidly changing data • CAP: CP • Written in: C • License: BSD • Type: key-value • Protocol: Telnet-like • Community support: good • Learning curve: low • Users: GitHub, Craigslist, Stack Overflow, Disqus, The Guardian (UK), etc.
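The "Telnet-like" protocol is plain text. Redis accepts inline commands typed into a telnet session, and clients normally send each command as an array of length-prefixed bulk strings (the Redis serialization protocol, RESP). Here is a minimal Node.js encoder that assumes only that standard request format; the function name is hypothetical.

```javascript
// Frames a command in the Redis protocol (RESP): an array header
// "*<count>" followed by one bulk string "$<bytes>" per argument,
// with every line terminated by CRLF.
function encodeCommand(...args) {
  let out = `*${args.length}\r\n`;
  for (const arg of args) {
    const s = String(arg);
    // Bulk string length is counted in bytes, not characters.
    out += `$${Buffer.byteLength(s, "utf8")}\r\n${s}\r\n`;
  }
  return out;
}

console.log(JSON.stringify(encodeCommand("SET", "name", "Redis")));
// "*3\r\n$3\r\nSET\r\n$4\r\nname\r\n$5\r\nRedis\r\n"
```

Because every argument is length-prefixed, the server can read values containing spaces or newlines without any escaping.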
    • MongoDB • Selling point: the best of NoSQL and RDBMS • Use when you need: dynamic queries and indexing on a big DB • CAP: CP • Written in: C++ • License: AGPL • Type: document • Protocol: custom, binary (BSON) • Community support: great • Learning curve: low • Users: NewsCred, Foursquare, GitHub, SourceForge, The New York Times, Etsy, Shutterfly, etc.
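BSON, MongoDB's binary document format, is a length-prefixed encoding. The layout is: total length, then a type byte, NUL-terminated field name and value for each element, then a closing 0x00. The sketch below encodes only int32 and string fields to show that layout. It uses the Node.js Buffer API, the function name is hypothetical, and it is no substitute for a real BSON library.

```javascript
// Minimal BSON encoder for int32 (type 0x10) and string (type 0x02) fields.
function encodeBson(doc) {
  const parts = [];
  for (const [name, value] of Object.entries(doc)) {
    const key = Buffer.from(name + "\0", "utf8"); // field name as a cstring
    if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const v = Buffer.alloc(4);
      v.writeInt32LE(value, 0);
      parts.push(Buffer.from([0x10]), key, v);
    } else if (typeof value === "string") {
      const str = Buffer.from(value + "\0", "utf8");
      const len = Buffer.alloc(4);
      len.writeInt32LE(str.length, 0); // byte length including trailing NUL
      parts.push(Buffer.from([0x02]), key, len, str);
    } else {
      throw new TypeError(`unsupported value for field ${name}`);
    }
  }
  const body = Buffer.concat(parts);
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length + 1, 0); // length prefix + elements + terminator
  return Buffer.concat([total, body, Buffer.from([0x00])]);
}

console.log(encodeBson({ a: 1 }).toString("hex"));
// 0c0000001061000100000000
```

The leading length lets a reader skip over a whole document, or a whole embedded document, without parsing its fields.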
    • Part 4: Understanding MongoDB
    • Understanding MongoDB • Database == Database • Table == Collection • Row == Document
    • Understanding MongoDB • Mongo Shell
    • Understanding MongoDB • INSERT
    • Understanding MongoDB • SELECT
        SELECT * FROM users WHERE X = 3 AND Y = 'abc';
        db.users.find({X: 3, Y: "abc"})

        SELECT * FROM users WHERE X = 3 AND Y = 'abc' ORDER BY X ASC;
        db.users.find({X: 3, Y: "abc"}).sort({X: 1})

        SELECT username, email FROM users WHERE X = 3 AND Y = 'abc';
        db.users.find({X: 3, Y: "abc"}, {username: true, email: true})
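To make the SQL-to-find() mapping concrete, here is a toy in-memory model of those queries in plain JavaScript. The find function and its argument order are hypothetical (in the shell, sort is chained with .sort()). It models only exact-match queries, single-field sorts and inclusion projections.

```javascript
// Toy in-memory model of the find() calls above, for illustration only:
// every field in the query must match exactly (an implicit AND), sort
// takes {field: 1} for ascending or {field: -1} for descending, and a
// projection keeps only the listed fields plus _id, as MongoDB does by default.
function find(docs, query, projection = null, sort = null) {
  let result = docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
  if (sort) {
    const [field, dir] = Object.entries(sort)[0]; // single-field sort only
    result = [...result].sort((a, b) =>
      a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0
    );
  }
  if (projection) {
    result = result.map((doc) => {
      const out = { _id: doc._id };
      for (const [field, keep] of Object.entries(projection)) {
        if (keep) out[field] = doc[field];
      }
      return out;
    });
  }
  return result;
}

const sampleUsers = [
  { _id: 1, username: "joegunchy", email: "joe@example.com", X: 3, Y: "abc" },
  { _id: 2, username: "ann", email: "ann@example.com", X: 3, Y: "xyz" },
  { _id: 3, username: "bob", email: "bob@example.com", X: 3, Y: "abc" },
];

// Like: SELECT username, email FROM users WHERE X = 3 AND Y = 'abc' ORDER BY username ASC;
const rows = find(sampleUsers, { X: 3, Y: "abc" }, { username: true, email: true }, { username: 1 });
console.log(rows.map((r) => r.username).join(", ")); // bob, joegunchy
```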
    • Understanding MongoDB • UPDATE
        db.collection.update(criteria, modifier, upsert, multi)
          criteria: query that selects the document(s) to update
          modifier: $set, $inc, $unset, $push, $pop, ...
          upsert:   insert if no document matches, update otherwise
          multi:    update every document matching the criteria (by default only the first match)

        UPDATE users SET X = 4, Y = 'abc' WHERE username = 'joegunchy';
        db.users.update({username: "joegunchy"}, {$set: {X: 4, Y: "abc"}}, false, true)
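The upsert and multi flags are the easiest parts to misread, so here is a toy in-memory model of update() that covers only $set and $inc. The function names and data are hypothetical. It mirrors the semantics described above, not MongoDB's code.

```javascript
// Toy in-memory model of update(criteria, modifier, upsert, multi),
// supporting only the $set and $inc modifiers. Illustration only.
function matches(doc, criteria) {
  return Object.entries(criteria).every(([field, value]) => doc[field] === value);
}

function applyModifier(doc, modifier) {
  for (const [field, value] of Object.entries(modifier.$set || {})) {
    doc[field] = value;
  }
  for (const [field, delta] of Object.entries(modifier.$inc || {})) {
    doc[field] = (doc[field] || 0) + delta; // $inc on a missing field starts from 0
  }
}

function update(docs, criteria, modifier, upsert = false, multi = false) {
  let n = 0;
  for (const doc of docs) {
    if (!matches(doc, criteria)) continue;
    applyModifier(doc, modifier);
    n++;
    if (!multi) break; // without multi, only the first match changes
  }
  if (n === 0 && upsert) {
    // Upsert: start from the equality criteria, then apply the modifier.
    const doc = { ...criteria };
    applyModifier(doc, modifier);
    docs.push(doc);
    n = 1;
  }
  return n; // number of documents updated or inserted
}

const accounts = [
  { username: "joegunchy", X: 1 },
  { username: "joegunchy", X: 2 },
];
const updated = update(accounts, { username: "joegunchy" }, { $set: { X: 4, Y: "abc" } }, false, true);
const upserted = update(accounts, { username: "newbie" }, { $inc: { logins: 1 } }, true, false);
console.log(updated, upserted, accounts.length); // 2 1 3
```

This model also shows why a plain SQL UPDATE corresponds to upsert = false, multi = true. SQL changes every matching row but never inserts one.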
    • Understanding MongoDB • DELETE
        db.articles.remove({})                          /* remove all */
        db.articles.remove({tag: "sql"})                /* remove all articles with tag = "sql" */
        db.articles.remove({tag: "sql", $atomic: true}) /* block other ops while removing */
    • Understanding MongoDB • AGGREGATION
        > db.users.count()
        42
        > db.addresses.distinct("zipcode", {city: "Dhaka"})
        [1000, 1100, 1204, 1205, ...]
    • Understanding MongoDB • Map/Reduce • A model introduced by Google for processing large datasets on clusters • MongoDB uses it for: aggregation (GROUP BY, AVG, SUM, etc.) and batch processing jobs
    • Understanding MongoDB • Map/Reduce
    • Understanding MongoDB • Map/Reduce Example: Document. We want to do something like...
    • Understanding MongoDB • Map/Reduce Example: Map, Reduce
    • Understanding MongoDB • Map/Reduce Example: Execute
    • Understanding MongoDB • Map/Reduce Example: Result
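The code from the example slides did not survive the transcript. The following is therefore not the talk's example. It is a toy single-process driver that shows the same map/emit/reduce shape on hypothetical article data, counting articles per tag.

```javascript
// Toy single-process driver mirroring the shape of MongoDB's mapReduce:
// map runs with `this` bound to each document and calls emit(key, value);
// reduce(key, values) folds all values emitted for one key. Unlike the
// MongoDB shell, where emit is a global, emit is passed in as an argument.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of groups) {
    // MongoDB skips reduce for a key with a single value (and may re-reduce
    // partial results), so reduce must return the same shape map emits.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Hypothetical data: count how many articles carry each tag.
const articles = [
  { title: "NoSQL intro", tags: ["nosql", "mongodb"] },
  { title: "Sharding", tags: ["mongodb", "scaling"] },
  { title: "MySQL tips", tags: ["sql"] },
];
const mapTags = function (emit) {
  this.tags.forEach((tag) => emit(tag, { count: 1 }));
};
const sumCounts = (key, values) => ({
  count: values.reduce((total, v) => total + v.count, 0),
});

for (const { _id, value } of mapReduce(articles, mapTags, sumCounts)) {
  console.log(_id, value.count);
}
// nosql 1
// mongodb 2
// scaling 1
// sql 1
```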
    • Part 5: Building a MongoDB App
    • Part 6: Scaling with MongoDB
    • Scaling with MongoDB • Scaling is a challenge • No silver bullet • Strategies: Replication, Replica Sets, Auto-sharding
    • Scaling with MongoDB • Replication [diagram: Master, Slave, Slave, Slave]
    • Scaling with MongoDB • Replica Sets [diagram: User, Primary, Secondary, Passive]
    • Scaling with MongoDB • Replica Sets: Election [diagram: members A to E; labels "Synced 3ms ago" and "Synced 1ms ago", three members at priority 1 and one at priority 0]
    • Scaling with MongoDB • Replica Sets: Network Partition • An election is initiated when a node can't reach the primary, or when the primary can't reach a majority of the nodes in the set • The new primary is elected by a majority of the nodes in the set • The node with the most recent data gets priority • An arbiter node is used to break ties
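The election rules above can be sketched as a small function. This is a simplified model of the slide's rules, not MongoDB's actual election protocol. It assumes each member has a priority, an optime for data recency and an arbiter flag, and that `reachable` lists what one side of a partition can see. All names and values are hypothetical.

```javascript
// Simplified model of the election rules above (not MongoDB's protocol).
// Members: priority 0 = can never become primary (passive); optime = how
// recent the member's data is; arbiter = votes but stores no data.
function electPrimary(members, reachable) {
  const visible = members.filter((m) => reachable.includes(m.name));
  // A primary needs the votes of a strict majority of the whole set.
  if (visible.length * 2 <= members.length) return null;
  const candidates = visible.filter((m) => !m.arbiter && m.priority > 0);
  if (candidates.length === 0) return null;
  // Most recent data wins; ties go to the higher priority.
  candidates.sort((a, b) => b.optime - a.optime || b.priority - a.priority);
  return candidates[0].name;
}

const replicaSet = [
  { name: "A", priority: 1, optime: 100 },
  { name: "B", priority: 1, optime: 98 },
  { name: "C", priority: 0, optime: 100 }, // passive member
  { name: "D", priority: 1, optime: 97 },
  { name: "E", priority: 0, optime: 0, arbiter: true },
];

// A partition cuts the old primary A off together with D.
console.log(electPrimary(replicaSet, ["B", "C", "E"])); // B
console.log(electPrimary(replicaSet, ["A", "D"])); // null
```

The minority side (A, D) cannot elect anyone, which is how the set avoids two primaries. On the majority side, the arbiter E supplies the third vote without being electable, and the passive C is skipped even though its data is newer than B's.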
    • Scaling with MongoDB • Auto-sharding • The cluster handles sharding data and rebalancing automatically • No administrative headaches of manual sharding • The application is oblivious to the existence of shards
    • Scaling with MongoDB • Auto-sharding [diagram: Big Collection]
    • Scaling with MongoDB • Auto-sharding [diagram: User → Router]
    • Scaling with MongoDB • Auto-sharding • Connect to a single server: db = connect('localhost:27017') • Connect to a router: db = connect('localhost:27017') • The call is the same either way [diagram: User, MongoDB]
    • Scaling with MongoDB • When to shard? • Running out of disk space • Write-intensive workload • Need to keep a large chunk of data in memory • Don't start out with a sharded collection! • Shard "if and when" you need to
    • Scaling with MongoDB • Choosing a Shard Key • Incremental, e.g. timestamps such as 'created_at': queries on the shard key are highly efficient • Random, e.g. 'username': writes are distributed across multiple shards
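The trade-off between the two kinds of shard key follows from range-based chunking, which is how MongoDB partitions a sharded collection. In the sketch below, the split points, shard names and keys are hypothetical. It simply counts which shard receives a batch of new inserts under each key.

```javascript
// Range-based chunk routing: chunk i owns [splitPoints[i-1], splitPoints[i]),
// and the last chunk is open-ended. Split points, shard names and data are
// hypothetical; in MongoDB the cluster picks chunk boundaries itself.
function makeRouter(splitPoints, shards) {
  return (key) => {
    let i = 0;
    while (i < splitPoints.length && key >= splitPoints[i]) i++;
    return shards[i];
  };
}

function countPerShard(route, keys) {
  const counts = {};
  for (const key of keys) {
    const shard = route(key);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

const shards = ["shard0", "shard1", "shard2"];

// Incremental key: each new 'created_at' value is the largest yet,
// so every insert lands in the last chunk (one hot shard).
const byCreatedAt = makeRouter([1000, 2000], shards);
const newTimestamps = Array.from({ length: 20 }, (_, i) => 3000 + i);
console.log(JSON.stringify(countPerShard(byCreatedAt, newTimestamps))); // {"shard2":20}

// 'username' key: new usernames arrive in no particular order,
// so inserts spread across all chunks.
const byUsername = makeRouter(["h", "p"], shards);
const newUsers = ["alice", "bob", "carol", "dave", "erin", "frank", "grace",
  "heidi", "ivan", "judy", "mallory", "niaj", "olivia",
  "peggy", "rupert", "sybil", "trent", "victor", "walter", "zoe"];
console.log(JSON.stringify(countPerShard(byUsername, newUsers))); // {"shard0":7,"shard1":6,"shard2":7}
```

The same ranges explain the upside of the incremental key. A query for a time window on created_at touches only the few chunks that cover that window.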
    • Scaling with MongoDB • Sharding + Replica Sets [diagram: User → Router → two shards, each a replica set with one primary (P) and two secondaries (S)]
    • Questions? • Iraj Islam: iraj@newscred.com, @irajislam • Rubayeet Islam: rubayeet@newscred.com, @rubayeet • Nurul Ferdous: nurul@newscred.com, @ferdous