NoSQL! is it for you?

A detailed analysis of various NoSQL solutions based on the CAP theorem. This talk was given by me at BASIS SoftExpo.


NoSQL! is it for you? Presentation Transcript

  • 1. NoSQL: What it is and is it for you? Iraj Islam, Rubayeet Islam, Nurul Ferdous (NewsCred). Thursday, February 3, 2011
  • 2. Agenda
      • Part 1. Why NoSQL?
      • Part 2. NoSQL Use Cases
      • Part 3. Choosing a NoSQL Solution
      • Part 4. Understanding MongoDB
      • Part 5. Building a MongoDB App
      • Part 6. Scaling MongoDB
      • Questions
  • 3. Who We Are
      • Iraj Islam: CTO/Co-founder, NewsCred
      • Rubayeet Islam: Senior Software Engineer, NewsCred
      • Nurul Ferdous: Senior Software Engineer, NewsCred
  • 4. Our Story
      • Launched 2008: founded by two Bangladeshis, 2008
      • Funded by investors of Twitter: Floodgate Ventures (Twitter), Bessemer Cap. (LinkedIn)
      • Top-tier clients: Yahoo!, Orange Telecom, Harvard U, The Daily Star etc.
  • 5. What We Do
      • Domain Expertise: Big Data, Information Retrieval, Machine Learning, Semantic Web
      • Technologies: Apache Solr, MySQL/MongoDB, Python/Java
  • 6. Part 1: Why NoSQL?
  • 7. What’s NoSQL? NoSQL: What’s with the weird name?
  • 8. What’s NoSQL? NoSQL: Non-relational, web-scale database.
  • 9-14. Why NoSQL? Web 1.0: the read-intensive web
      • Publishing Model
      • Textual Content
      • Small Data
      • Browsing
      • Search
      • Personal Computer
  • 15. Why NoSQL? The Age of Big Data. Chart: exabytes (10^18 bytes) of data stored per year, 2006 to 2010; y-axis from 0 to 1000.
  • 16-22. Why NoSQL? Web 2.0+: the write-intensive web
      • User-generated Content
      • Big Data
      • Semi-structured Data
      • Semantic Web
      • Real-time
      • Ubiquity: Any device. Anywhere.
  • 23-24. Why NoSQL? The MySQL Problem: 1. Default
      • Diagram (Application): Data Source writes to MySQL; User reads from MySQL
      • Bottleneck, too much load!
  • 25-27. Why NoSQL? The MySQL Problem: 2. Replication
      • Diagram (Application): Data Source writes to the MySQL Master; User reads from the MySQL Slaves
      • Scalable reads!
      • Bottleneck, writes won’t scale!
  • 28-30. Why NoSQL? The MySQL Problem: 3. Sharding
      • Diagram (Application): Data Source writes to, and User reads from, MySQL nodes marked S
      • Great, scalable writes!
      • Development and maintenance costs just skyrocketed!
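The maintenance cost of manual sharding can be illustrated with a minimal sketch, assuming the application itself has to route every read and write. The shard names, the `userId` shard key, and the hash function are hypothetical and are not taken from the talk.

```javascript
// Hypothetical shard list; in a real deployment each entry would be a MySQL connection.
const SHARDS = ["mysql-shard-0", "mysql-shard-1"];

// Simple deterministic string hash (illustrative only, not production quality).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return h;
}

// The application must pick the shard for every read and write.
function shardFor(userId) {
  return SHARDS[hashKey(userId) % SHARDS.length];
}

// Queries that do not include the shard key must fan out to every shard.
function shardsForQuery(query) {
  return "userId" in query ? [shardFor(query.userId)] : [...SHARDS];
}

console.log(shardFor("alice"));
console.log(shardsForQuery({ email: "alice@example.com" }));
```

Every query path, schema change, and rebalancing step has to go through logic like this, which is the cost the slide is pointing at.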
  • 31. Why NoSQL? Web 2.0+: the write-intensive web
      • User-generated Content
      • Big Data
      • Semi-structured Data
      • Semantic Web
      • Real-time
      • Ubiquity: Any device. Anywhere.
  • 32-35. Why NoSQL? The NoSQL Solution: Design Goals
      • Semi-structure >> Schema-free
      • Big Data >> Scalable reads/writes
      • Real-time >> High-performance
      • Ubiquity >> High-availability
  • 36-37. NoSQL vs RDBMS (NoSQL listed first in each pair)
      • Schema-free vs Relational schema
      • Scalable writes/reads vs Scalable reads
      • Auto high-availability vs Custom high-availability
      • Limited queries vs Flexible queries
      • Eventual Consistency * vs Consistency
      • BASE vs ACID
      • * Applies to most NoSQL systems
  • 38-39. Is NoSQL For You? The same NoSQL vs RDBMS comparison as on the previous slide.
  • 40. Part 2: NoSQL Use Cases
  • 41. Who’s Using NoSQL?
  • 42. NoSQL Use Cases
      • Consumer Use Cases
          • Facebook
          • Twitter
          • Netflix
      • Enterprise Use Cases
          • Rackspace
          • TrendMicro
          • NewsCred
  • 43. NoSQL Use Cases
      • Facebook
          • Hbase - Facebook messages
          • Scribe - Real-time click logs
          • Hive - SQL queries -> MapReduce jobs
          • Hadoop
              • Web analytics warehouse
              • Distributed datastore
              • MySQL backups
  • 44. NoSQL Use Cases
      • Twitter
          • Hadoop - Analytics
          • Hbase - People search
          • Scribe - Log collection framework
          • FlockDB - Social graph analysis
  • 45. NoSQL Use Cases
      • Rackspace
          • Cassandra - stat collection, mail and apps
      • TrendMicro
          • Hbase & Hadoop - reputation databases
      • NewsCred
          • MongoDB
              • API usage analytics
              • Pixel tracking analytics
              • Entity metadata storage
  • 46. Demo: NewsCred API Analytics
  • 47. Part 3: Choosing a NoSQL Solution
  • 48. Choosing a NoSQL Solution (CAP triangle)
      • Availability (A): Each client can always read and write
      • Consistency (C): All clients have the same view of the data
      • Partition-tolerance (P): The system works well despite physical network partitions
      • CA: RDBMSs (MySQL, PostgreSQL), Aster Data, GreenPlum, Vertica
      • AP: Cassandra, Voldemort, CouchDB, Dynamo, SimpleDB, Tokyo Cabinet, Riak
      • CP: BigTable, HyperTable, Hbase, MongoDB, Scalaris, Berkeley DB, MemcacheDB, Redis
  • 49. Consistent, Available (CA). CA-systems have trouble with partitions and deal with them through replication.
      • Examples
          • MySQL (relational)
          • Aster Data (relational)
          • Greenplum (relational)
          • Vertica (column)
  • 50. Available, Partition-Tolerant (AP). AP-systems have trouble with consistency and achieve “eventual consistency” through replication.
      • Examples
          • Cassandra (column/tabular)
          • Dynamo (key-value)
          • Voldemort (key-value)
          • Tokyo Cabinet (key-value)
          • CouchDB (document)
          • SimpleDB (document)
          • Riak (document)
  • 51. Consistent, Partition-Tolerant (CP). CP-systems have trouble with availability while keeping data consistent across partitioned nodes.
      • Examples
          • MongoDB (document)
          • BigTable (column/tabular)
          • HyperTable (column/tabular)
          • Hbase (column/tabular)
          • Redis (key-value)
          • Scalaris (key-value)
          • MemcacheDB (key-value)
  • 52. Hbase
      • Selling point: Billions of rows, millions of columns
      • Use when you need: Random, real-time access to Big Data
      • Written in: Java
      • License: Apache
      • Type: Column/Tabular
      • Protocol: HTTP/REST/Thrift
      • Community Support: Good
      • Learning Curve: High
      • Users: Yahoo!, Facebook, Microsoft, Adobe, StumbleUpon etc.
  • 53. Cassandra
      • Selling point: Best of Google BigTable and Amazon Dynamo
      • Use when you need: To write more than you read (logging)
      • Written in: Java
      • License: Apache
      • Type: Column/Tabular
      • Protocol: Custom, binary (Thrift)
      • Community Support: Great
      • Learning Curve: Medium
      • Users: Facebook, Twitter, Digg, Reddit, Rackspace, Cisco, SimpleGeo, Cloudkick etc.
  • 54. Redis
      • Selling point: Blazing fast, in-memory like memcached
      • Use when you need: To manage rapidly changing data
      • Written in: C/C++
      • License: BSD
      • Type: Key-value
      • Protocol: Telnet-like
      • Community Support: Good
      • Learning Curve: Low
      • Users: Github, Craigslist, Stackoverflow, Disqus, The Guardian UK etc.
  • 55. MongoDB
      • Selling point: Best of NoSQL and RDBMS
      • Use when you need: Dynamic queries and indexing on a Big DB
      • Written in: C++
      • License: AGPL
      • Type: Document
      • Protocol: Custom, binary (BSON)
      • Community Support: Great
      • Learning Curve: Low
      • Users: NewsCred, Foursquare, Github, Sourceforge, The New York Times, Etsy, Shutterfly etc.
  • 56. Part 4: Understanding MongoDB
  • 57. Understanding MongoDB
      • Database == Database
      • Table == Collection
      • Row == Document
  • 58. Understanding MongoDB: Mongo Shell
  • 59. Understanding MongoDB: INSERT
  • 60. Understanding MongoDB: SELECT
      • SELECT * FROM users WHERE X = 3 AND Y = 'abc';
        db.users.find({X: 3, Y: "abc"})
      • SELECT * FROM users WHERE X = 3 AND Y = 'abc' ORDER BY X ASC;
        db.users.find({X: 3, Y: "abc"}).sort({X: 1})
      • SELECT username, email FROM users WHERE X = 3 AND Y = 'abc';
        db.users.find({X: 3, Y: "abc"}, {username: true, email: true})
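What the filter and field-selection arguments of `find` do can be sketched in plain JavaScript over an in-memory array. This is only an illustration of the semantics, not MongoDB's implementation; the sample documents and the `matches` and `find` helpers are made up for the example.

```javascript
// Sample documents standing in for the `users` collection (made-up data).
const users = [
  { username: "a", email: "a@example.com", X: 3, Y: "abc" },
  { username: "b", email: "b@example.com", X: 3, Y: "xyz" },
  { username: "c", email: "c@example.com", X: 3, Y: "abc" },
];

// A document matches when its fields equal every key/value pair in the query.
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

// Equality filter plus an optional projection of selected fields.
function find(docs, query, fields) {
  const hits = docs.filter((doc) => matches(doc, query));
  if (!fields) return hits;
  return hits.map((doc) => {
    const out = {};
    for (const name of Object.keys(fields)) {
      if (fields[name] && name in doc) out[name] = doc[name];
    }
    return out;
  });
}

const selected = find(users, { X: 3, Y: "abc" }, { username: true, email: true });
console.log(selected);
```

Unlike this sketch, a real MongoDB projection also returns the `_id` field unless it is explicitly excluded.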
  • 61. Understanding MongoDB: UPDATE
      • db.collection.update(criteria, modifier, upsert, multi)
          • criteria: Query which selects the record(s) to update
          • modifier: $set, $inc, $unset, $push, $pop...
          • upsert: Insert if no matching document exists, update otherwise
          • multi: Update multiple docs matching the criteria
      • UPDATE users SET X = 4, Y = 'abc' WHERE username = 'joegunchy';
        db.users.update({username: "joegunchy"}, {$set: {X: 4, Y: "abc"}}, true, true)
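The `upsert` and `multi` flags are easy to mix up, so here is a minimal in-memory sketch of how the four arguments interact. It assumes equality-only criteria and supports just `$set` and `$inc`; the `update` and `applyModifier` functions are illustrative stand-ins, not the driver's code.

```javascript
// Apply a modifier document supporting only $set and $inc (a subset, for illustration).
function applyModifier(doc, modifier) {
  for (const [field, value] of Object.entries(modifier.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(modifier.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;
  }
}

// In-memory stand-in for update(criteria, modifier, upsert, multi).
// Returns the number of documents written.
function update(docs, criteria, modifier, upsert = false, multi = false) {
  const hits = docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value)
  );
  if (hits.length === 0) {
    if (!upsert) return 0;
    const created = { ...criteria }; // a new document starts from the criteria fields
    applyModifier(created, modifier);
    docs.push(created);
    return 1;
  }
  const targets = multi ? hits : hits.slice(0, 1); // without multi, only the first match
  targets.forEach((doc) => applyModifier(doc, modifier));
  return targets.length;
}

const users = [{ username: "joegunchy", X: 1 }];
update(users, { username: "joegunchy" }, { $set: { X: 4, Y: "abc" } }, true, true);
update(users, { username: "newuser" }, { $inc: { visits: 1 } }, true);
console.log(users);
```

The first call modifies the existing document; the second finds no match and, because `upsert` is true, inserts a new one.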
  • 62. Understanding MongoDB: DELETE
      • db.articles.remove({}) /* remove all */
      • db.articles.remove({tag: "sql"}) /* remove all articles with tag = sql */
      • db.articles.remove({tag: "sql", $atomic: true}) /* block other ops while removing */
  • 63. Understanding MongoDB: AGGREGATION
      • > db.users.count()
        42
      • > db.addresses.distinct("zipcode", {city: "Dhaka"})
        [1000, 1100, 1204, 1205, ...]
  • 64. Understanding MongoDB: Map/Reduce
      • Algorithm introduced by Google for processing large datasets on clusters
      • MongoDB uses it for:
          • Aggregation (Group By, Avg, Sum etc.)
          • Batch processing jobs
  • 65. Understanding MongoDB: Map/Reduce
  • 66. Understanding MongoDB: Map/Reduce Example. Document. We want to do something like...
  • 67. Understanding MongoDB: Map/Reduce Example. Map. Reduce.
  • 68. Understanding MongoDB: Map/Reduce Example. Execute.
  • 69. Understanding MongoDB: Map/Reduce Example. Result.
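The example document and functions shown on these slides are not part of the transcript, so the following is a hypothetical stand-in rather than the speakers' example: a "group by tag and count" aggregation written as map and reduce functions, with a tiny single-process runner. In the mongo shell the map function reads the document from `this` and calls a global `emit`; here both are passed explicitly so the sketch runs anywhere.

```javascript
// Hypothetical documents; the slide's own example document is not in the transcript.
const articles = [
  { title: "Intro to NoSQL", tags: ["nosql", "mongodb"] },
  { title: "Scaling MySQL", tags: ["sql", "mysql"] },
  { title: "Sharding MongoDB", tags: ["mongodb", "sharding"] },
];

// Map: emit one (tag, 1) pair per tag on each document.
function map(doc, emit) {
  doc.tags.forEach((tag) => emit(tag, 1));
}

// Reduce: combine all values emitted for one key into a single value.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Tiny single-process runner: group emitted values by key, then reduce each group.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

const tagCounts = mapReduce(articles, map, reduce);
console.log(tagCounts);
```

On a cluster the grouping step is what gets distributed: each node maps its own documents, and the partial results for a key are reduced together.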
  • 70. Part 5: Building a MongoDB App
  • 71. Part 6: Scaling with MongoDB
  • 72. Scaling with MongoDB
      • Scaling is a challenge
      • No silver bullet
      • Strategies
          • Replication
          • Replica Sets
          • Auto-sharding
  • 73. Scaling with MongoDB: Replication. Diagram: one Master and three Slaves.
  • 74. Scaling with MongoDB: Replica Sets. Diagram: User, Primary, Secondary, Passive.
  • 75. Scaling with MongoDB: Replica Sets: Election. Diagram: nodes A, B, C, D and E, labelled “Synced 3ms ago”, “Synced 1ms ago”, “Priority 1” (three times) and “Priority 0”.
  • 76. Scaling with MongoDB: Replica Sets: Network Partition
      • Election process initiated
          • When a node can’t reach the primary
          • When the primary can’t reach a majority of nodes in the set
      • New primary is elected by a majority of nodes in the set
      • Node with the most recent data gets priority
      • Arbiter node used to break ties
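The election rules listed above can be modelled with a short sketch. It is a simplified model of those rules only, not MongoDB's actual election protocol; the node fields (`priority`, `syncLagMs`, `arbiter`) and the five-node set are assumptions made for the example.

```javascript
// Pick a new primary from the nodes on one side of a network partition.
// Returns null when this side cannot see a majority of the whole set.
function electPrimary(reachable, setSize) {
  // A primary needs a strict majority of all nodes in the set.
  if (reachable.length <= setSize / 2) return null;

  // Arbiters and priority-0 nodes take part in the vote but never become primary.
  const candidates = reachable.filter((n) => !n.arbiter && n.priority > 0);
  if (candidates.length === 0) return null;

  // Prefer the node that synced most recently (smallest lag).
  return candidates.reduce((best, n) => (n.syncLagMs < best.syncLagMs ? n : best));
}

// Hypothetical five-node set whose primary just became unreachable.
const survivors = [
  { name: "B", priority: 1, syncLagMs: 3 },
  { name: "C", priority: 1, syncLagMs: 1 },
  { name: "D", priority: 0, syncLagMs: 0 },
];

console.log(electPrimary(survivors, 5).name); // "C": majority present, freshest eligible node
console.log(electPrimary(survivors.slice(0, 2), 5)); // null: two of five is not a majority
```

An arbiter holds no data; in this model it only adds to the count of reachable nodes, which is how it breaks ties in an otherwise evenly split set.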
  • 77. Scaling with MongoDB: Auto-sharding
      • Cluster handles sharding data and rebalancing automatically
      • No administrative headaches of manual sharding
      • Application is oblivious to the existence of shards
  • 78. Scaling with MongoDB: Auto-sharding. Diagram: Big Collection.
  • 79. Scaling with MongoDB: Auto-sharding. Diagram: User, Router.
  • 80. Scaling with MongoDB: Auto-sharding. Diagram: User, MongoDB.
      • Connect to a single server
          • db = connect('localhost:27017')
      • Connect to a router
          • db = connect('localhost:27017')
  • 81. Scaling with MongoDB: When to shard?
      • Running out of disk space
      • Write intensive
      • Need to keep a large chunk of data in memory
      • Don’t start out with a sharded collection!
      • Shard “if and when” you need to
  • 82. Scaling with MongoDB: Choosing a Shard Key
      • Incremental
          • Example: timestamps, e.g. 'created_at'
          • Queries on the shard key are highly efficient
      • Random
          • Example: 'username'
          • Writes are distributed across multiple shards
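The trade-off between the two kinds of shard key can be shown with a minimal sketch of range-based routing, in which each shard owns a contiguous range of key values. The split points, timestamps, and usernames below are made-up values, and the functions are illustrative rather than MongoDB's balancer.

```javascript
// Range-based routing: shard i owns keys below upperBounds[i];
// the last shard takes everything at or above the final bound.
function shardForKey(key, upperBounds) {
  const index = upperBounds.findIndex((bound) => key < bound);
  return index === -1 ? upperBounds.length : index;
}

// Count how many writes land on each shard.
function distribution(keys, upperBounds) {
  const counts = new Array(upperBounds.length + 1).fill(0);
  for (const key of keys) counts[shardForKey(key, upperBounds)] += 1;
  return counts;
}

// Incremental key: new timestamps are always larger than the existing split points.
const newTimestamps = [1001, 1002, 1003, 1004, 1005, 1006];
console.log(distribution(newTimestamps, [400, 800])); // [0, 0, 6]

// Random-looking key: usernames fall on all sides of the split points.
const usernames = ["alice", "dave", "karim", "nurul", "rubayeet", "zara"];
console.log(distribution(usernames, ["h", "p"])); // [2, 2, 2]
```

With the incremental key every new write goes to the last shard, although a query for a time range touches only the shards that own that range. With the random key, writes spread evenly across shards.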
  • 83. Scaling with MongoDB: Sharding + Replica Sets. Diagram: User, Router, two primaries (P) and four secondaries (S).
  • 84. Questions?
      • Iraj Islam: iraj@newscred.com, @irajislam
      • Rubayeet Islam: rubayeet@newscred.com, @rubayeet
      • Nurul Ferdous: nurul@newscred.com, @ferdous