Navigating the NoSQL Landscape
Tugdual “tug” Grall, Technology Evangelist (@tgrall)
  • So many applications still use an RDBMS, so how do they scale? In the middleware tier, and this is true for all technologies (Ruby, PHP, Java, .NET, …), we use clustered servers. Each time we need more resources, we install new commodity web servers behind the web load balancer. This lets us manage growth easily: the scale-out approach. What about the database? With a classical relational database, the easiest way is to use bigger servers ("scale up"). This is usually a very expensive approach, and the application is still limited by the size of the server. So architects and developers have searched for alternatives…
  • Key-value store: all you can do is look up the data by its key.
  • Simple and fast, "and dumb". Really easy to use. Not a database; it is a caching layer. "You have no idea what is in the value": this is a limitation. It is a little too simple to be interesting.
  • Combination of the data (logs, events, user profiles, campaigns): Hadoop is used to put the data in the "operational store", which is Couchbase in this example. Add targeting, recommendations… Integration with existing data: for example, with data coming from an RDBMS you may need to combine multiple data sets and then push the result back to Couchbase. Couchbase contains the "stuff" you are serving to the user; Hadoop has everything, all the data.
  • CCB12 Navigating the NoSQL Landscape

    1. Navigating the NoSQL Landscape
       Tugdual “tug” Grall, Technology Evangelist, @tgrall
    2. What we’ll talk about
       • Why are RDBMSs not enough?
       • What are the different NoSQL taxonomies?
       • Which “NoSQL” is right for me?
    3. Growth is the new reality
       • Instagram gained nearly 1 million users overnight when it expanded to Android.
    4. Does it work with an RDBMS backend?
       • Application scales out: just add more commodity web servers.
       • Database scales up: get a bigger, more complex server.
       • Note: relational database technology is great for what it is great for, but it is not great for this.
    5. Some alternatives to scale out your RDBMS
       • Run many SQL servers.
       • Data is sharded (most of the time using client code).
       • Memcached for faster response time.
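Client-side sharding, as mentioned on slide 5, can be sketched as below. This is a minimal illustration, not any particular product's client: the shard names and the `shard_for` and `moved_keys` helpers are hypothetical, and simple modulo hashing is assumed. It also shows why resharding by hand is painful: adding one server changes the shard of most keys.

```python
import hashlib

# Hypothetical shard names; in a real deployment these would be connection handles.
SHARDS = ["sql-server-0", "sql-server-1", "sql-server-2"]


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the key (modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(shards)
    return shards[index]


def moved_keys(keys, old_shards, new_shards):
    """Return the keys whose shard changes when the shard list changes."""
    return [k for k in keys
            if shard_for(k, old_shards) != shard_for(k, new_shards)]
```

With modulo hashing, going from three to four shards relocates roughly three quarters of the keys, which is the kind of migration the following slides warn about.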
    6. Scale out with RDBMS: is this a good approach to scale?
       • Lots of components to deploy.
       • Scale by hand: caching, sharding/replication.
       • Learn from others: this scenario costs time and money. Scaling SQL is potentially disastrous when going viral. It is a very risky time for major code changes and migrations, and you have no time when usage is skyrocketing.
    7. Lacking market solutions, users were forced to invent
       • Bigtable (November 2006), Dynamo (October 2007), Cassandra (August 2008), Voldemort (February 2009)
       • No schema required before inserting data
       • No schema change required to change data format
       • Auto-sharding without application participation
       • Distributed queries
       • Integrated main memory caching
       • Data synchronization (mobile, multi-datacenter)
       Very few organizations want to (and fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
    8. Survey: schema inflexibility is the #1 adoption driver
       What is the biggest data management problem driving your use of NoSQL in the coming year?
       • Lack of flexibility/rigid schemas: 49%
       • Inability to scale out data: 35%
       • High latency/low performance: 29%
       • Costs: 16%
       • All of these: 12%
       • Other: 11%
       Source: Couchbase NoSQL Survey, December 2011, n=1351
    9. NoSQL database matches application logic tier architecture
       The data layer now scales with linear cost and constant performance.
       • Application scales out: just add more commodity web servers.
       • Database scales out: just add more commodity data servers (NoSQL database servers).
       Scaling out flattens the cost and performance curves.
    10. NOSQL TAXONOMY
    11. The key-value store: the foundation of NoSQL
       • A key maps to an opaque binary value.
    12. Memcached: the NoSQL precursor
       • A key maps to an opaque binary value.
       • In-memory only
       • Limited set of operations
          – Blob storage: Set, Add, Replace, CAS
          – Retrieval: Get
          – Structured data: Append, Increment
       • “Simple and fast.”
       • Challenges: cold cache, disruptive elasticity
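The operations listed for Memcached (Set, Add, Replace, CAS, Get) can be modelled in a few lines. The `TinyCache` class below is a hypothetical in-process stand-in, not the memcached client API; it only illustrates the semantics, in particular how a CAS (check-and-set) token rejects a write based on stale data.

```python
class TinyCache:
    """In-memory model of memcached-style storage commands (illustration only)."""

    def __init__(self):
        self._data = {}       # key -> (opaque value, cas token)
        self._next_cas = 1    # token changes on every successful write

    def _store(self, key, value):
        self._data[key] = (value, self._next_cas)
        self._next_cas += 1
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        return False if key in self._data else self._store(key, value)

    def replace(self, key, value):
        """Store only if the key already exists."""
        return self._store(key, value) if key in self._data else False

    def gets(self, key):
        """Return (value, cas token), or (None, None) on a miss."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        """Store only if nobody has written the key since `token` was read."""
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False
        return self._store(key, value)
```

The value stays opaque to the cache: it never inspects the bytes, which is exactly the limitation the speaker notes point out.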
    13. NoSQL catalog
       • Cache (memory only): Memcached (key-value)
       • Database (memory/disk): (empty)
    14. Redis: more “structured data” commands
       • A key maps to a data structure: blob, list, set, hash, …
       • Disk persistence (eventual consistency on the disk)
       • Vast set of operations
          – Blob storage: Set, Add, Replace, CAS
          – Retrieval: Get, Pub-Sub
          – Structured data: strings, hashes, lists, sets, sorted sets
       • Example operations for a set: add, count, subtract sets, intersection, is member, atomic move from one set to another
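The set operations named on the Redis slide correspond to the Redis commands SADD, SCARD, SDIFF, SINTER, SISMEMBER and SMOVE. The sketch below models them with Python's built-in `set` rather than a Redis client, so the `smove` helper and the sample data are illustrative assumptions; unlike Redis, this in-process version makes no atomicity guarantee across threads.

```python
def smove(source, destination, member):
    """Model of SMOVE: move a member from one set to another if present."""
    if member not in source:
        return False
    source.remove(member)
    destination.add(member)
    return True


online = {"ann", "bob"}
premium = {"bob", "cid"}

online.add("dan")                        # SADD
count = len(online)                      # SCARD
only_online = online - premium           # SDIFF
both = online & premium                  # SINTER
ann_is_online = "ann" in online          # SISMEMBER
moved = smove(online, premium, "ann")    # SMOVE
```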
    15. NoSQL catalog
       • Cache (memory only): Memcached (key-value), Redis (data structure)
       • Database (memory/disk): (empty)
    16. Membase: from key-value cache to database
       • A key maps to an opaque binary value.
       • Disk-based with built-in memcached cache
       • Cache refill on restart
       • Memcached compatible (drop-in replacement)
       • Highly available (data replication)
       • Add or remove capacity on a live cluster
       • “Simple, fast, elastic.”
    17. NoSQL catalog
       • Cache (memory only): Memcached (key-value), Redis (data structure)
       • Database (memory/disk): Membase (key-value)
    18. Couchbase: document-oriented database
       • A key maps to a JSON object (“document”):
         { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }
       • Auto-sharding
       • Disk-based with built-in memcached cache
       • Cache refill on restart
       • Memcached compatible (drop-in replacement)
       • Highly available (data replication)
       • Add or remove capacity on a live cluster
       • When values are JSON objects (“documents”): create indices and views, and query against the views
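The idea of creating a view over JSON documents and querying it can be sketched as follows. This is a simplified Python model, not Couchbase's view engine (whose map functions are written in JavaScript); the document keys, fields, and the `build_view` and `query` helpers are hypothetical.

```python
import json

# A plain dict stands in for the key -> JSON document store.
docs = {
    "u::1": json.dumps({"type": "user", "name": "Ann", "city": "Paris"}),
    "u::2": json.dumps({"type": "user", "name": "Bob", "city": "London"}),
    "o::1001": json.dumps({"type": "order", "customer": "Ann"}),
}


def map_users_by_city(doc_id, doc):
    """Map function: emit (city, name) for every user document."""
    if doc.get("type") == "user":
        yield doc["city"], doc["name"]


def build_view(documents, map_fn):
    """Run the map function over all documents and keep the rows sorted by key."""
    rows = []
    for doc_id, raw in documents.items():
        for key, value in map_fn(doc_id, json.loads(raw)):
            rows.append((key, doc_id, value))
    return sorted(rows)


def query(view, key):
    """Return (doc_id, value) pairs for the rows emitted under `key`."""
    return [(doc_id, value) for k, doc_id, value in view if k == key]
```

Because the store understands the value as JSON, it can index fields inside it, which a pure key-value cache cannot do.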
    19. NoSQL catalog
       • Cache (memory only): Memcached (key-value), Redis (data structure)
       • Database (memory/disk): Membase (key-value), Couchbase (document)
    20. MongoDB: document-oriented database
       • A key maps to a BSON object (“document”):
         { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }
       • Disk-based with in-memory “caching”
       • BSON (“binary JSON”) format and wire protocol
       • Master-slave replication
       • Auto-sharding
       • Values are BSON objects
       • Supports ad hoc queries, best when indexed
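"Supports ad hoc queries, best when indexed" can be illustrated with a small model: without an index every query scans all documents, while an index answers equality lookups on one field directly. The code below is a plain-Python sketch with hypothetical data and helper names (`find`, `build_index`); it does not use the MongoDB driver.

```python
from collections import defaultdict

collection = [
    {"_id": 1, "name": "Ann", "city": "Paris"},
    {"_id": 2, "name": "Bob", "city": "London"},
    {"_id": 3, "name": "Cid", "city": "Paris"},
]


def find(documents, criteria):
    """Ad hoc query: full scan, keeping documents that match every field."""
    return [d for d in documents
            if all(d.get(field) == value for field, value in criteria.items())]


def build_index(documents, field):
    """Index one field: value -> list of documents holding that value."""
    index = defaultdict(list)
    for d in documents:
        if field in d:
            index[d[field]].append(d)
    return index


city_index = build_index(collection, "city")
paris_by_scan = find(collection, {"city": "Paris"})
paris_by_index = city_index["Paris"]
```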
    21. NoSQL catalog
       • Cache (memory only): Memcached (key-value), Redis (data structure)
       • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document)
    22. Cassandra: column overlays
       • A key maps to an opaque binary value with columns overlaid on it (column 1, column 2, column 3 not present).
       • Disk-based system
       • Clustered
       • External caching required for low-latency reads
       • “Columns” are overlaid on the data
       • Not all rows must have all columns
       • Supports efficient queries on columns
       • Restart required when adding columns
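"Not all rows must have all columns" can be pictured as a map of row keys to sparse column maps. The sketch below is a conceptual model in plain Python with made-up rows and a hypothetical `column_slice` helper; it says nothing about Cassandra's storage format or query language.

```python
# Each row holds only the columns it actually has.
rows = {
    "user:1": {"name": "Ann", "email": "ann@example.com"},
    "user:2": {"name": "Bob"},  # no "email" column on this row
    "user:3": {"name": "Cid", "email": "cid@example.com", "city": "Paris"},
}


def column_slice(table, column):
    """Return {row_key: value} for the rows that actually have the column."""
    return {key: cols[column] for key, cols in table.items() if column in cols}
```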
    23. NoSQL catalog
       • Cache (memory only): Memcached (key-value), Redis (data structure)
       • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document), Cassandra (column)
    24. Neo4j: graph database
       • Diagram: several keys, each mapping to an opaque binary value, connected to one another.
       • Disk-based system
       • External caching required for low-latency reads
       • Nodes, relationships and paths
       • Properties on nodes
       • Delete, Insert, Traverse, etc.
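Nodes, relationships, properties and traversal can be sketched with a tiny in-memory graph. This is an illustration only: the sample data and the `neighbours` and `shortest_path` helpers are hypothetical and unrelated to Neo4j's API; the traversal is an ordinary breadth-first search.

```python
from collections import deque

# Nodes carry properties; relationships are (source, type, target) triples.
nodes = {
    "ann": {"city": "Paris"},
    "bob": {"city": "London"},
    "cid": {"city": "Paris"},
    "dan": {"city": "Rome"},
}
relationships = [
    ("ann", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "cid"),
    ("ann", "FOLLOWS", "dan"),
]


def neighbours(node, rel_type):
    """Nodes reachable from `node` over one outgoing relationship of `rel_type`."""
    return [dst for src, rel, dst in relationships
            if src == node and rel == rel_type]


def shortest_path(start, goal, rel_type):
    """Breadth-first traversal; returns the node path or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1], rel_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```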
    25. NoSQL catalog
       • Cache (memory only): Memcached (key-value), Redis (data structure)
       • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document), Cassandra (column), Neo4j (graph)
    26. NoSQL catalog
       • Cache (memory only): Memcached, Redis, Coherence
       • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document), Cassandra and HBase (column), Neo4j and InfiniteGraph (graph)
    28. What about Hadoop?
    29. Hadoop: Big Data Swiss Army knife
       • Oozie: workflow, coordination
       • Sqoop: data connector to import/export data
       • Hive: SQL-like interface
       • Pig: high-level programming language
       • Mahout: machine learning library
       • Whirr: Hadoop management tools for cloud services
       • Flume: aggregator
       • MapReduce: framework to process large volumes of data
       • HBase: key-value data store
       • ZooKeeper: centralized configuration management
       • HDFS: distributed file system
    30. So what? Hadoop & Couchbase
       • Diagram labels: (1) click stream events; (2) profiles, campaigns; (3) profiles, real-time campaign statistics.
       • 40 milliseconds to respond with the decision.
    31. WHICH ONE IS RIGHT FOR ME?
    32. Survey: schema inflexibility is the #1 adoption driver
       What is the biggest data management problem driving your use of NoSQL in the coming year?
       • Lack of flexibility/rigid schemas: 49%
       • Inability to scale out data: 35%
       • High latency/low performance: 29%
       • Costs: 16%
       • All of these: 12%
       • Other: 11%
       Source: Couchbase NoSQL Survey, December 2011, n=1351
    33. Lack of flexibility / rigid schema
       • Aggregate data models (Martin Fowler)
          – Flexible data structure
          – Optimized access
          – Easy to distribute data
       • Example aggregate, stored under the key o::1001:
         {
           uid: ji22jd,
           customer: Ann,
           line_items: [
             { sku: 0321293533, quan: 3, unit_price: 48.0 },
             { sku: 0321601912, quan: 1, unit_price: 39.0 },
             { sku: 0131495054, quan: 1, unit_price: 51.0 }
           ],
           payment: { type: Amex, expiry: 04/2001, last5: 12345 }
         }
       http://martinfowler.com/bliki/AggregateOrientedDatabase.html
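The order aggregate from slide 33 can be written as valid JSON and read back in a single lookup, which is the "optimized access" the slide refers to. In this sketch a plain dict stands in for the database, and the `order_total` helper is a hypothetical addition; the field names and values are taken from the slide.

```python
import json

order = {
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
        {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
        {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
    ],
    "payment": {"type": "Amex", "expiry": "04/2001", "last5": "12345"},
}

# A plain dict stands in for the key -> document store.
store = {"o::1001": json.dumps(order)}


def order_total(store, key):
    """One key lookup returns the whole aggregate, line items included."""
    doc = json.loads(store[key])
    return sum(item["quan"] * item["unit_price"] for item in doc["line_items"])
```

In a relational schema the same total would need a join between an orders table and a line-items table; here the line items travel with the order.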
    34. Use cases (thanks to Martin Fowler)
       • Key-value: session management, user profile/preferences, shopping cart
       • Document: event logging, content management, web analytics, e-commerce application
       • Columns: event logging, content management, counters
       • Graph: connected data/social networks, routing and dispatch, recommendations based on the social graph
    35. Production environment
       • EMEA data center, US data center, APAC data center
    36. Scale out your data
       • Modifying the cluster topology should be simple
          – Add, remove, configure nodes on a running system
       • What is the impact of topology changes?
          – Sharding, caching of the data
          – Availability of the service during cluster changes
       • More hardware = more failures
          – Availability, reliability of the system: failover support
    37. Add nodes
       • Two servers added to the cluster
       • One-click operation
       • Docs automatically rebalanced across the cluster
          – Even distribution of docs
          – Minimum doc movement
       • Cluster map updated
       • App database calls now distributed over a larger number of servers
       Diagram: two app servers, each with a Couchbase client library and a cluster map, read/write/update against a Couchbase Server cluster of five servers holding active and replica docs. User-configured replica count = 1.
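"Even distribution" with "minimum doc movement" can be sketched by hashing keys into a fixed number of partitions and letting the cluster map assign partitions to servers. This is a simplified model, not Couchbase's actual rebalance algorithm: the partition count of 64, the server names, and all helper functions are assumptions made for illustration, and the sketch only handles adding servers.

```python
import zlib

NUM_PARTITIONS = 64  # assumed fixed partition count for this sketch


def partition_for(key):
    """Keys hash to a partition; the partition of a key never changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_map(servers):
    """Cluster map: partition -> server, assigned round-robin."""
    return {p: servers[p % len(servers)] for p in range(NUM_PARTITIONS)}


def rebalance(cluster_map, servers):
    """Spread partitions evenly over `servers`, moving as few as possible.

    Assumes every current owner is still in `servers` (nodes are only added).
    """
    owned = {s: [] for s in servers}
    for partition, server in sorted(cluster_map.items()):
        owned[server].append(partition)
    base, extra = divmod(len(cluster_map), len(servers))
    by_load = sorted(servers, key=lambda s: len(owned[s]), reverse=True)
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}
    surplus = []
    for s in servers:                      # collect partitions above quota
        while len(owned[s]) > quota[s]:
            surplus.append(owned[s].pop())
    new_map = dict(cluster_map)
    for s in servers:                      # hand them to servers below quota
        while len(owned[s]) < quota[s]:
            partition = surplus.pop()
            owned[s].append(partition)
            new_map[partition] = s
    return new_map


def server_for(key, cluster_map):
    """What a client library does with its copy of the cluster map."""
    return cluster_map[partition_for(key)]
```

Only the partitions handed to the new servers change owner; every other key keeps resolving to the same server through the updated cluster map.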
    38. Fail over node
       • App servers happily accessing docs on Server 3
       • Server fails
       • App server requests to Server 3 fail
       • Cluster detects that the server has failed
       • Promotes replicas of docs to active
       • Updates cluster map
       • App server requests for docs now go to the appropriate server
       • Typically a rebalance would follow
       Diagram: two app servers, each with a Couchbase client library and a cluster map, in front of a Couchbase Server cluster of five servers holding active and replica docs. User-configured replica count = 1.
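The failover steps above (detect the failed server, promote replicas to active, update the cluster map) can be sketched as follows. The `fail_over` function and the document-to-server maps are hypothetical, and the model assumes a replica count of 1 with each replica on a different server than its active copy.

```python
def fail_over(active, replica, failed):
    """Promote replicas for docs whose active copy lived on the failed server.

    `active` and `replica` map doc id -> server. Returns the new
    (active, replica) maps; docs left without a replica are simply
    absent from the new replica map until a rebalance recreates them.
    """
    new_active = {}
    new_replica = {}
    for doc, server in active.items():
        if server == failed:
            new_active[doc] = replica[doc]       # replica becomes active
        else:
            new_active[doc] = server
            if replica[doc] != failed:
                new_replica[doc] = replica[doc]  # replica copy survived
    return new_active, new_replica
```

After failover some docs have no replica left, which is why the slide notes that a rebalance would typically follow.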
    39. Performance
       • What is my working set?
       • How does the cache work?
          – Put your data in RAM
       • How should I design my data model?
          – Aggregate model
          – Easy to change
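Why the working set matters can be shown with a small least-recently-used cache in front of a slower store. This is a generic sketch, not how any specific product manages memory: the `LruCache` class, the `hit_ratio` helper, and the access pattern are illustrative assumptions.

```python
from collections import OrderedDict


class LruCache:
    """Least-recently-used cache in front of a slower backing store."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store  # dict standing in for disk
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.backing_store[key]      # slow path: read from "disk"
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used
        return value


def hit_ratio(capacity, accesses, disk):
    """Fraction of accesses served from RAM for a given cache capacity."""
    cache = LruCache(capacity, disk)
    for key in accesses:
        cache.get(key)
    return cache.hits / len(accesses)
```

With a working set of three keys read in a loop, a cache of capacity 3 serves 90% of 30 reads from memory, while a cache of capacity 2 serves none of them: once the working set no longer fits in RAM, performance drops sharply.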
    41. Management and monitoring
       • Do not forget about operations!
          – The Service Reliability Engineering team will thank you!
       • Manage your cluster easily:
          – Command line, administration console to change cluster topology
       • Monitor “your NoSQL”
          – Analyze the overall status of your cluster
          – View and fix bottlenecks
    42. Conclusion
       • One size does not fit all
       • Overview of the NoSQL types
       • Choose the right solution
          – Developer productivity
          – Large-scale data
    43. QUESTIONS?
    44. Simple. Fast. Elastic. NoSQL.
       Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.
