NoSQL! is it for you?

A detailed analysis of various NoSQL solutions based on the CAP theorem. This talk was given by me at BASIS SoftExpo.

Published in: Technology, Education

Transcript of "NoSQL! is it for you?"

  1. NoSQL: What it is and is it for you? Iraj Islam, Rubayeet Islam, Nurul Ferdous. NewsCred. Thursday, February 3, 2011
  2. Agenda • Part 1. Why NoSQL? • Part 2. NoSQL Use Cases • Part 3. Choosing a NoSQL Solution • Part 4. Understanding MongoDB • Part 5. Building a MongoDB App • Part 6. Scaling MongoDB • Questions
  3. Who We Are • Iraj Islam, CTO/Co-founder, NewsCred • Rubayeet Islam, Senior Software Engineer, NewsCred • Nurul Ferdous, Senior Software Engineer, NewsCred
  4. Our Story • Launched 2008 • Founded by two Bangladeshis • 2008 • Funded by investors of Twitter: Floodgate Ventures (Twitter), Bessemer Cap. (LinkedIn) • Top-tier clients: Yahoo!, Orange Telecom, Harvard U, The Daily Star etc.
  5. What We Do • Domain Expertise: Big Data, Information Retrieval, Machine Learning, Semantic Web • Technologies: Apache Solr, MySQL/MongoDB, Python/Java
  6. Part 1: Why NoSQL?
  7. What’s NoSQL? What’s with the weird name?
  8. What’s NoSQL? Non-relational, web-scale database.
  14. Why NoSQL? Web 1.0: The read-intensive web • Publishing Model • Browsing • Small Data • Textual Content • Personal Computer • Search
  15. Why NoSQL? The Age of Big Data: exabytes (10^18 bytes) of data stored per year [Bar chart: years 2006 to 2010, y-axis 0 to 1000]
  22. Why NoSQL? Web 2.0+: The write-intensive web • Semi-structured Data • Real-time • Big Data • Semantic Web • Ubiquity (Any device. Anywhere.) • User-generated Content
  24. Why NoSQL? The MySQL Problem, 1. Default: [Diagram: User -> Application -> MySQL data source, which takes both writes and reads] Bottleneck, too much load!
  27. Why NoSQL? The MySQL Problem, 2. Replication: [Diagram: User -> Application -> writes go to the MySQL master, reads go to the MySQL slaves] Scalable reads! Bottleneck, writes won’t scale!
  30. Why NoSQL? The MySQL Problem, 3. Sharding: [Diagram: User -> Application -> writes and reads spread across MySQL shards (S)] Great, scalable writes! Development and maintenance costs just skyrocketed!
  31. Why NoSQL? Web 2.0+: The write-intensive web • Semi-structured Data • Real-time • Big Data • Semantic Web • Ubiquity (Any device. Anywhere.) • User-generated Content
  35. Why NoSQL? The NoSQL Solution, Design Goals • Semi-structure >> Schema-free • Big Data >> Scalable reads/writes • Real-time >> High-performance • Ubiquity >> High-availability
  37. NoSQL vs RDBMS
      NoSQL: Schema-free • Scalable writes/reads • Auto high-availability • Limited queries • Eventual Consistency * • BASE
      RDBMS: Relational schema • Scalable reads • Custom high-availability • Flexible queries • Consistency • ACID
      * Applies to most NoSQL systems
  39. Is NoSQL For You? (shown against the same NoSQL vs RDBMS comparison as slide 37)
  40. Part 2: NoSQL Use Cases
  41. Who’s Using NoSQL?
  42. NoSQL Use Cases • Consumer Use Cases: Facebook, Twitter, Netflix • Enterprise Use Cases: Rackspace, TrendMicro, NewsCred
  43. NoSQL Use Cases • Facebook • Hbase - Facebook messages • Scribe - Real-time click logs • Hive - SQL queries -> MapReduce jobs • Hadoop - Web analytics warehouse, Distributed datastore, MySQL backups
  44. NoSQL Use Cases • Twitter • Hadoop - Analytics • Hbase - People search • Scribe - Log collection framework • FlockDB - Social graph analysis
  45. NoSQL Use Cases • Rackspace: Cassandra - stat collection, mail and apps • TrendMicro: Hbase & Hadoop - reputation databases • NewsCred: MongoDB - API usage analytics, Pixel tracking analytics, Entity metadata storage
  46. Demo: NewsCred API Analytics
  47. Part 3: Choosing a NoSQL Solution
  48. Choosing a NoSQL Solution [CAP triangle diagram]
      Availability (A): Each client can always read and write
      Consistency (C): All clients have the same view of the data
      Partition tolerance (P): The system works well despite physical network partitions
      CA: RDBMSs (MySQL, PostgreSQL), Aster Data, Greenplum, Vertica
      AP: Cassandra, Voldemort, CouchDB, Dynamo, SimpleDB, Tokyo Cabinet, Riak
      CP: BigTable, Scalaris, HyperTable, Berkeley DB, Hbase, MemcacheDB, MongoDB, Redis
  49. Consistent, Available (CA): CA systems have trouble with partitions and deal with them through replication. • Examples • MySQL (relational) • Aster Data (relational) • Greenplum (relational) • Vertica (column)
  50. Available, Partition-Tolerant (AP): AP systems have trouble with consistency and achieve “eventual consistency” through replication. • Examples • Cassandra (column/tabular) • Dynamo (key-value) • Voldemort (key-value) • Tokyo Cabinet (key-value) • CouchDB (document) • SimpleDB (document) • Riak (document)
  51. Consistent, Partition-Tolerant (CP): CP systems have trouble with availability while keeping data consistent across partitioned nodes. • Examples • MongoDB (document) • BigTable (column/tabular) • HyperTable (column/tabular) • Hbase (column/tabular) • Redis (key-value) • Scalaris (key-value) • MemcacheDB (key-value)
  52. Hbase • Selling point: Billions of rows, millions of columns • Use when you need: Random, real-time access to Big Data • Written in: Java • License: Apache • Type: Column/Tabular • Protocol: HTTP/REST/Thrift • Users: Yahoo!, Facebook, Microsoft, Adobe, StumbleUpon etc. • Community Support: Good • Learning Curve: High
  53. Cassandra • Selling point: Best of Google BigTable and Amazon Dynamo • Use when you need: To write more than you read (logging) • Written in: Java • License: Apache • Type: Column/Tabular • Protocol: Custom, binary (Thrift) • Users: Facebook, Twitter, Digg, Reddit, Rackspace, Cisco, SimpleGeo, Cloudkick etc. • Community Support: Great • Learning Curve: Medium
  54. Redis • Selling point: Blazing fast, in-memory like memcached • Use when you need: To manage rapidly changing data • Written in: C/C++ • License: BSD • Type: Key-value • Protocol: Telnet-like • Users: Github, Craigslist, Stackoverflow, Disqus, The Guardian UK etc. • Community Support: Good • Learning Curve: Low
  55. MongoDB • Selling point: Best of NoSQL and RDBMS • Use when you need: Dynamic queries and indexing on a Big DB • Written in: C++ • License: AGPL • Type: Document • Protocol: Custom, binary (BSON) • Users: NewsCred, Foursquare, Github, Sourceforge, The New York Times, Etsy, Shutterfly etc. • Community Support: Great • Learning Curve: Low
  56. Part 4: Understanding MongoDB
  57. Understanding MongoDB • Database == Database • Table == Collection • Row == Document
  58. Understanding MongoDB • Mongo Shell
  59. Understanding MongoDB • INSERT
  60. Understanding MongoDB • SELECT
      SELECT * FROM users WHERE X = 3 AND Y = 'abc';
      db.users.find({X: 3, Y: "abc"})
      SELECT * FROM users WHERE X = 3 AND Y = 'abc' ORDER BY X ASC;
      db.users.find({X: 3, Y: "abc"}).sort({X: 1})
      SELECT username, email FROM users WHERE X = 3 AND Y = 'abc';
      db.users.find({X: 3, Y: "abc"}, {username: true, email: true})
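The query-by-example idea behind `find()` can be sketched in plain JavaScript without a running server. This is a minimal in-memory illustration, not MongoDB's implementation: the `users` array and the `matches`, `project`, and `find` helpers are hypothetical, and only simple equality matching is modeled.

```javascript
// In-memory sketch of equality matching and field projection.
// NOT MongoDB code: `users` and every helper here are hypothetical.
const users = [
  { username: "alice", email: "alice@example.com", X: 3, Y: "abc" },
  { username: "bob", email: "bob@example.com", X: 3, Y: "xyz" },
  { username: "carol", email: "carol@example.com", X: 3, Y: "abc" },
];

// A document matches when every key in the query equals the document's field,
// which is the AND semantics of WHERE X = 3 AND Y = 'abc'.
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

// Keep only the requested fields, like the second argument to find().
function project(doc, fields) {
  const out = {};
  for (const key of Object.keys(fields)) {
    if (fields[key] && key in doc) out[key] = doc[key];
  }
  return out;
}

function find(collection, query, fields) {
  const hits = collection.filter((doc) => matches(doc, query));
  return fields ? hits.map((doc) => project(doc, fields)) : hits;
}

const result = find(users, { X: 3, Y: "abc" }, { username: true, email: true });
console.log(result);
```

A real `find()` with a projection also returns the `_id` field unless it is excluded; the sketch leaves that out for brevity.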
  61. Understanding MongoDB • UPDATE
      db.collection.update(criteria, modifier, upsert, multi)
      criteria : Query which selects the record(s) to update
      modifier : $set, $inc, $unset, $push, $pop...
      upsert : Insert if not exists, update otherwise
      multi : Update multiple docs matching the criteria
      UPDATE users SET X = 4, Y = 'abc' WHERE username = 'joegunchy';
      db.users.update({username: "joegunchy"}, {$set: {X: 4, Y: "abc"}}, true, true)
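How the four `update()` arguments interact may be easier to see in a small model. The following is a sketch under stated assumptions, not the server's logic: `update`, `applyModifier`, and the `accounts` data are hypothetical, only `$set`, `$inc`, and `$unset` are modeled, and criteria are limited to equality.

```javascript
// Hypothetical in-memory model of update(criteria, modifier, upsert, multi).
function matches(doc, criteria) {
  return Object.keys(criteria).every((key) => doc[key] === criteria[key]);
}

// Apply the supported modifiers to one document in place.
function applyModifier(doc, modifier) {
  for (const [field, value] of Object.entries(modifier.$set || {})) doc[field] = value;
  for (const [field, by] of Object.entries(modifier.$inc || {})) doc[field] = (doc[field] || 0) + by;
  for (const field of Object.keys(modifier.$unset || {})) delete doc[field];
  return doc;
}

function update(collection, criteria, modifier, upsert, multi) {
  const hits = collection.filter((doc) => matches(doc, criteria));
  if (hits.length === 0) {
    // upsert: nothing matched, so insert a new document seeded from the criteria
    if (upsert) collection.push(applyModifier({ ...criteria }, modifier));
    return;
  }
  // multi: update every match; otherwise only the first one
  const targets = multi ? hits : hits.slice(0, 1);
  targets.forEach((doc) => applyModifier(doc, modifier));
}

const accounts = [{ username: "joegunchy", X: 1, visits: 2 }];
update(accounts, { username: "joegunchy" }, { $set: { X: 4, Y: "abc" }, $inc: { visits: 1 } }, true, true);
update(accounts, { username: "newuser" }, { $set: { X: 0 } }, true, false);
console.log(accounts);
```

The second call matches nothing, so with `upsert` set to true it inserts a new document instead of doing nothing.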
  62. Understanding MongoDB • DELETE
      db.articles.remove({}) /* remove all */
      db.articles.remove({tag: "sql"}) /* remove all articles with tag = sql */
      db.articles.remove({tag: "sql", $atomic: true}) /* block other ops while removing */
  63. Understanding MongoDB • AGGREGATION
      > db.users.count()
      42
      > db.addresses.distinct("zipcode", {city: "Dhaka"})
      [1000, 1100, 1204, 1205....]
  64. Understanding MongoDB • Map/Reduce • Algorithm introduced by Google for processing large datasets on clusters • MongoDB uses it for: • Aggregation (Group By, Avg, Sum etc.) • Batch processing jobs
  65. Understanding MongoDB • Map/Reduce
  66. Understanding MongoDB • Map/Reduce Example • Document • We want to do something like...
  67. Understanding MongoDB • Map/Reduce Example • Map • Reduce
  68. Understanding MongoDB • Map/Reduce Example • Execute
  69. Understanding MongoDB • Map/Reduce Example • Result
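The code shown on the Map/Reduce example slides did not survive in the transcript, so the following is a stand-in rather than the speakers' example. It emulates the map, group, and reduce steps in plain JavaScript on a hypothetical `articles` collection, counting articles per tag. In the MongoDB shell the map function reads the document from `this` and calls a global `emit`; here both are passed explicitly so the sketch runs on its own.

```javascript
// Plain-JavaScript emulation of the map -> group -> reduce flow.
// The documents and field names are hypothetical; this is not MongoDB's engine.
const articles = [
  { title: "Intro to NoSQL", tags: ["nosql", "mongodb"] },
  { title: "Scaling MySQL", tags: ["sql", "scaling"] },
  { title: "Sharding 101", tags: ["mongodb", "scaling"] },
];

// Map: called once per document, emits (key, value) pairs.
function map(doc, emit) {
  doc.tags.forEach((tag) => emit(tag, 1));
}

// Reduce: folds all values emitted for one key into a single value.
function reduce(key, values) {
  return values.reduce((sum, value) => sum + value, 0);
}

function mapReduce(docs, mapFn, reduceFn) {
  // Group emitted values by key.
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce each group to one result per key.
  const out = {};
  for (const [key, values] of groups) out[key] = reduceFn(key, values);
  return out;
}

const tagCounts = mapReduce(articles, map, reduce);
console.log(tagCounts); // { nosql: 1, mongodb: 2, sql: 1, scaling: 2 }
```

This is the same shape as a SQL `GROUP BY tag` with `COUNT(*)`, which is the aggregation use case named on slide 64.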
  70. Part 5: Building a MongoDB App
  71. Part 6: Scaling with MongoDB
  72. Scaling with MongoDB • Scaling is a challenge • No silver bullet • Strategies • Replication • Replica Sets • Auto-sharding
  73. Scaling with MongoDB • Replication [Diagram: one Master, three Slaves]
  74. Scaling with MongoDB • Replica Sets [Diagram: User, Primary, Secondary, Passive]
  75. Scaling with MongoDB • Replica Sets: Election [Diagram: nodes A to E, each labeled with a priority (0 or 1) and time since last sync (1 ms or 3 ms ago)]
  76. Scaling with MongoDB • Replica Sets: Network Partition • Election process initiated • When a node can’t reach primary • When primary can’t reach majority of nodes in set • New primary is elected by majority of nodes in set • Node with the most recent data gets priority • Arbiter node used to break ties
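The election rules listed above can be approximated with a short model. This is a deliberately simplified sketch, not MongoDB's election protocol: the node objects and their fields (`priority`, `lastSyncMsAgo`, `reachable`, `arbiter`) are hypothetical, and real elections involve voting rounds that are not modeled here.

```javascript
// Simplified model of choosing a new primary after a network partition.
// NOT MongoDB's actual protocol; node names and fields are hypothetical.
function electPrimary(nodes) {
  const reachable = nodes.filter((node) => node.reachable);
  // A primary needs a majority of the whole set, not just of the reachable nodes.
  if (reachable.length <= nodes.length / 2) return null;
  // Arbiters vote but hold no data; priority-0 (passive) nodes never become primary.
  const candidates = reachable.filter((node) => node.priority > 0 && !node.arbiter);
  if (candidates.length === 0) return null;
  // The node with the most recent data wins.
  candidates.sort((a, b) => a.lastSyncMsAgo - b.lastSyncMsAgo);
  return candidates[0].name;
}

const replicaSet = [
  { name: "A", priority: 1, lastSyncMsAgo: 0, reachable: false }, // old primary, cut off
  { name: "B", priority: 1, lastSyncMsAgo: 1, reachable: true },
  { name: "C", priority: 1, lastSyncMsAgo: 3, reachable: true },
  { name: "D", priority: 0, lastSyncMsAgo: 0, reachable: true }, // passive
  { name: "E", priority: 1, lastSyncMsAgo: 5, reachable: false },
];
console.log(electPrimary(replicaSet)); // "B"
```

Three of five nodes can still see each other, so the majority side elects B, the eligible node with the freshest data. If only two nodes were reachable, the function would return `null` and that side would stay without a primary.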
  77. Scaling with MongoDB • Auto-sharding • Cluster handles sharding data and rebalancing automatically • No administrative headaches of manual sharding • Application is oblivious to existence of shards
  78. Scaling with MongoDB • Auto-sharding [Diagram: Big Collection]
  79. Scaling with MongoDB • Auto-sharding [Diagram: User, Router]
  80. Scaling with MongoDB • Auto-sharding • Connect to a single server • db = connect('localhost:27017') • Connect to a router • db = connect('localhost:27017') [Diagram: User, MongoDB]
  81. Scaling with MongoDB • When to shard? • Running out of disk space • Write intensive • Need to keep large chunk of data in memory • Don’t start out with a sharded collection! • Shard “if and when” you need to
  82. Scaling with MongoDB • Choosing a Shard Key • Incremental • Example: timestamps, e.g. ‘created_at’ • Queries on the shard key are highly efficient • Random • Example: ‘username’ • Writes are distributed across multiple shards
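The trade-off between incremental and random shard keys can be illustrated with a toy range-based router. This is a sketch under assumptions, not MongoDB's balancer: `routeByRange`, `countWrites`, the split points, and the sample keys are all hypothetical, and each shard is assumed to own one contiguous key range.

```javascript
// Toy range-based router: shard i owns keys below bounds[i];
// the last shard owns everything at or above the final bound.
// Hypothetical helper names and data, for illustration only.
function routeByRange(key, bounds) {
  const index = bounds.findIndex((bound) => key < bound);
  return index === -1 ? bounds.length : index;
}

// Count how many writes land on each shard.
function countWrites(keys, bounds) {
  const counts = new Array(bounds.length + 1).fill(0);
  keys.forEach((key) => counts[routeByRange(key, bounds)]++);
  return counts;
}

// Incremental key: new timestamps are always larger than the existing split points,
// so every new write goes to the last shard.
const timestampWrites = countWrites([3001, 3002, 3003, 3004], [1000, 2000]);
console.log(timestampWrites); // [0, 0, 4]

// Random-like key: usernames fall all over the key space,
// so writes spread across shards.
const usernameWrites = countWrites(["alice", "zoe", "karim", "nurul"], ["h", "q"]);
console.log(usernameWrites); // [1, 2, 1]
```

The same routing explains the upside of the incremental key: a query for a narrow timestamp range touches a single shard, while a range query on a random-like key may have to visit all of them.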
  83. Scaling with MongoDB • Sharding + Replica Sets [Diagram: User -> Router -> two shards, each a replica set with one primary (P) and two secondaries (S)]
  84. Questions? Iraj Islam: iraj@newscred.com, @irajislam • Rubayeet Islam: rubayeet@newscred.com, @rubayeet • Nurul Ferdous: nurul@newscred.com, @ferdous