Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

CCB12 Navigating the NoSQL Ladnscape

647 views

Published on

  • Be the first to comment

  • Be the first to like this

CCB12 Navigating the NoSQL Ladnscape

  1. 1. Navigating the NoSQLLandscape Tugdual “tug” Grall Technology Evangel @tgrall 1
  2. 2. What we’ll talk about• Why RDBMS are not enough?• What are the different NoSQL taxonomies?• Which “NoSQL” is right for me? 2
  3. 3. Growth is the New Reality• Instagram gained nearly 1 million users overnight when they expanded to Android 3
  4. 4. Does it work with RDBMS backend? Application Scales Out Just add more commodity web servers Database Scales Up Get a bigger, more complex server Note – Relational database technology is great for what it is great for, but it is not great for this. 4
  5. 5. Some alternative to scale out your RDBMS Scale out your RDBMS • Run many SQL Servers • Data are sharded (most of the time using client code) • Memcached for faster response time 5
  6. 6. Scale Out with RDBMS Is this a good approach to scale? • Lot of components to deploy • Scale by Hand – Caching – Sharding/Replication Learn From Others This Scenario Costs Time and Money. Scaling SQL is potentially disastrous when going Viral: Very risky time for major code changes and migrations... You have no Time when skyrocketing up! 6
  7. 7. Lacking market solutions, users forced to invent Bigtable Dynamo Cassandra Voldemort November 2006 October 2007 August 2008 February 2009 • No schema required before inserting data • No schema change required to change data format • Auto-sharding without application participation • Distributed queries • Integrated main memory caching • Data synchronization (mobile, multi-datacenter) Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology. 7
  8. 8. Survey: Schema inflexibility #1 adoption driver What is the biggest data management problem driving your use of NoSQL in the coming year? Lack of flexibility/rigid schemas 49% Inability to scale out data 35% High latency/low performance 29% Costs 16% All of these 12% Other 11% Source: Couchbase NoSQL Survey, December 2011, n=1351 8
  9. 9. NoSQL database matches application logic tier architectureData layer now scales with linear cost and constant performance. Application Scales Out Just add more commodity web servers NoSQL Database Servers Database Scales Out Just add more commodity data servers Scaling out flattens the cost and performance curves. 9
  10. 10. NOSQL TAXONOMY 10
  11. 11. The Key-Value Store – the foundation of NoSQL Key 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Opaque 101100101000100010011101 101100101000100010011101 Binary 101100101000100010011101 101100101000100010011101 Value 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 11
  12. 12. Memcached – the NoSQL precursor Key 101100101000100010011101 Memcached 101100101000100010011101 101100101000100010011101 101100101000100010011101 In-memory only 101100101000100010011101 Limited set of operations Opaque 101100101000100010011101 Blob Storage: Set, Add, Replace, CAS 101100101000100010011101 Binary 101100101000100010011101 Retrieval: Get 101100101000100010011101 Structured Data: Append, Increment Value 101100101000100010011101 101100101000100010011101 “Simple and fast.” 101100101000100010011101 101100101000100010011101 101100101000100010011101 Challenges: cold cache, disruptive elasticity 101100101000100010011101 12
  13. 13. Database Cache (memory/disk) (memory only) Memcached Key-Value NoSQL catalog13
  14. 14. Redis – More “Structured Data” commands Key 101100101000100010011101 Redis 101100101000100010011101 101100101000100010011101 101100101000100010011101 “Data Structures” Disk Persistence (eventual consistency on 101100101000100010011101 the disk) Blob 101100101000100010011101 Vast set of operations 101100101000100010011101 List 101100101000100010011101 Blob Storage: Set, Add, Replace, CAS 101100101000100010011101 Set Retrieval: Get, Pub-Sub 101100101000100010011101 Structured Data: Strings, Hashes, Lists, Sets, Hash 101100101000100010011101 Sorted lists 101100101000100010011101 … 101100101000100010011101 Example operations for a Set 101100101000100010011101 Add, count, subtract sets, intersection, is 101100101000100010011101 member, atomic move from one set to another 14
  15. 15. NoSQL catalog Key-Value Data Structure(memory only) Cache Memcached Redis(memory/disk) Database 15
  16. 16. Membase – From key-value cache to database Key 101100101000100010011101 Membase 101100101000100010011101 101100101000100010011101 101100101000100010011101 Disk-based with built-in memcached cache 101100101000100010011101 Cache refill on restart Opaque 101100101000100010011101 Memcached compatible (drop in replacement) 101100101000100010011101 Binary 101100101000100010011101 Highly-available (data replication) 101100101000100010011101 Add or remove capacity to live cluster Value 101100101000100010011101 101100101000100010011101 “Simple, fast, elastic.” 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 16
  17. 17. NoSQL catalog Key-Value Data Structure(memory only) Cache Memcached Redis(memory/disk) Database Membase 17
  18. 18. Couchbase – document-oriented database Key Couchbase { “string” : “string”, “string” : value, Auto-sharding “string” : Disk-based with built-in memcached cache JSON , “string” : “string”, Cache refill on restart “string” : value -, OBJECT “string” : * array + Memcached compatible (drop in replace) Highly-available (data replication) } (“DOCUMENT”) Add or remove capacity to live cluster When values are JSON objects (“documents”): Create indices, views and query against the views 18
  19. 19. NoSQL catalog Key-Value Data Structure Document(memory only) Cache Memcached Redis(memory/disk) Database Membase Couchbase 19
  20. 20. MongoDB – Document-oriented database Key MongoDB { “string” : “string”, “string” : value, Disk-based with in-memory “caching” “string” : BSON (“binary JSON”) format and wire protocol BSON , “string” : “string”, Master-slave replication OBJECT “string” : value -, Auto-sharding “string” : * array + (“DOCUMENT”) Values are BSON objects } Supports ad hoc queries – best when indexed 20
  21. 21. NoSQL catalog Key-Value Data Structure Document(memory only) Cache Memcached Redis(memory/disk) Database Membase Couchbase MongoDB 21
  22. 22. Cassandra – Column overlays Key 101100101000100010011101 Cassandra 101100101000100010011101Column 1 101100101000100010011101 101100101000100010011101 Disk-based system 101100101000100010011101 Opaque 101100101000100010011101 ClusteredColumn 2 101100101000100010011101 External caching required for low-latency reads Binary 101100101000100010011101 101100101000100010011101 “Columns” are overlaid on the data Value 101100101000100010011101 101100101000100010011101 Not all rows must have all columnsColumn 3 101100101000100010011101(not present) 101100101000100010011101 Supports efficient queries on columns 101100101000100010011101 101100101000100010011101 Restart required when adding columns 22
  23. 23. NoSQL catalog Key-Value Data Structure Document Column(memory only) Cache Memcached Redis(memory/disk) Database Membase Couchbase Cassandra MongoDB 23
  24. 24. Neo4j – Graph database Key 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Opaque 101100101000100010011101 101100101000100010011101 Binary 101100101000100010011101 101100101000100010011101 Value 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Neo4j Key Key Disk-based system 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 External caching required for low-latency reads Nodes, relationships and paths 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Opaque 101100101000100010011101 Opaque 101100101000100010011101 101100101000100010011101 101100101000100010011101 Binary Binary Properties on nodes 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Value 101100101000100010011101 Value 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Delete, Insert, Traverse, etc. Key Key 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 Opaque 101100101000100010011101 Opaque 101100101000100010011101 101100101000100010011101 101100101000100010011101 Binary 101100101000100010011101 Binary 101100101000100010011101 101100101000100010011101 101100101000100010011101 Value 101100101000100010011101 Value 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 101100101000100010011101 24
  25. 25. NoSQL catalog Key-Value Data Structure Document Column Graph(memory only) Cache Memcached Redis(memory/disk) Database Membase Couchbase Cassandra Neo4j MongoDB 25
  26. 26. NoSQL catalog Key-Value Data Structure Document Column Graph(memory only) Cache Memcached Redis Coherence(memory/disk) Database Membase Couchbase Cassandra Neo4j MongoDB HBase InfiniteGraph 26
  27. 27. 27
  28. 28. What about Hadoop? 28
  29. 29. Hadoop : Big Data Swiss Army Knife• Oozie: Workflow, coordination• Sqoop : Data connector to import/export data• Hive : SQL-Like interface• Pig : High level programming language• Mahout : Machine learning library• Whirr : Hadoop management tools for cloud services• Flume : Aggregator• Map Reduce : Framework to process large volume of data• HBase : Key Value data store• Zookeeper : Centralized configuration management• HDFS : Distributed file system 29
  30. 30. So what? Hadoop & Couchbase 40 milliseconds to respond with the decision. profiles, real time campaign 3 statistics 2 1 profiles, campaigns click stream events 30
  31. 31. WHICH ONE IS RIGHT FOR ME ? 31
  32. 32. Survey: Schema inflexibility #1 adoption driver What is the biggest data management problem driving your use of NoSQL in the coming year? Lack of flexibility/rigid schemas 49% Inability to scale out data 35% High latency/low performance 29% Costs 16% All of these 12% Other 11% Source: Couchbase NoSQL Survey, December 2011, n=1351 32
  33. 33. Lack of Flexibility / Rigid Schema• Aggregate Data Models (Martin Fowler) – Flexible Data Structure – Optimized Access – Easy to distribute data o::1001 { uid: ji22jd, customer: Ann, line_items: [ { sku: 0321293533, quan: 3, unit_price: 48.0 }, { sku: 0321601912, quan: 1, unit_price: 39.0 }, { sku: 0131495054, quan: 1, unit_price: 51.0 } ], payment: { type: Amex, expiry: 04/2001, last5: 12345 } } http://martinfowler.com/bliki/AggregateOrientedDatabase.html 33
  34. 34. Use Cases Key Value • Session Management • User Profile/Preferences • Shopping Cart Document • Event Logging • Content Management • Web Analytics • E-Commerce Application Columns • Event Logging • Content Management • Counters Graph • Connected Data / Social Networks • Routing, Dispatch • Recommendations based on Social Graph Thanks to Martin Fowler 34
  35. 35. Production Environment EMEA DC US DATA CENTER APAC DC 35
  36. 36. Scale out your data• Modify cluster topology should be simple – Add, Remove, Configure Nodes on a running system• What is the impact of topology changes? – Sharding, Caching of the data – Availability of the service during cluster changes• More hardware = More failures – Availability, reliability of the system: failover support 36
  37. 37. Add Nodes APP SERVER 1 APP SERVER 2  Two servers added to COUCHBASE CLIENT LIBRARY COUCHBASE CLIENT LIBRARY cluster  One-click operation CLUSTER MAP CLUSTER MAP  Docs automatically rebalanced across cluster  Even distribution of docs Read/Write/Update Read/Write/Update  Minimum doc movement  Cluster map updated  App database calls now distributed over larger # SERVER 1 SERVER 2 SERVER 3 SERVER 4 SERVER 5 of servers Active Docs Active Docs Active Docs Active Docs Active Docs Active Docs Doc 5 DOC Doc 4 DOC Doc 1 DOC Doc 3 Doc 2 DOC Doc 7 DOC Doc 3 DOC Doc 6 Doc 9 DOC Doc 8 DOC Doc 6 DOC Replica Docs Replica Docs Replica Docs Replica Docs Replica Docs Replica Docs Doc 4 DOC Doc 6 DOC Doc 7 DOC Doc 7 Doc 1 DOC Doc 3 DOC Doc 9 DOC Doc 9 Doc 8 DOC Doc 2 DOC Doc 5 DOC COUCHBASE SERVER CLUSTERUser Configured Replica Count = 1 37
  38. 38. Fail Over Node APP SERVER 1 APP SERVER 2  App servers happily accessing docs on Server 3 COUCHBASE CLIENT LIBRARY COUCHBASE CLIENT LIBRARY  Server fails  App server requests to server 3 fail CLUSTER MAP CLUSTER MAP  Cluster detects server has failed  Promotes replicas of docs to active  Updates cluster map  App server requests for docs now go to appropriate server  Typically rebalance would follow SERVER 1 SERVER 2 SERVER 3 SERVER 4 SERVER 5 Active Docs Active Docs Active Docs Active Docs Active Docs Active Docs Doc 5 DOC Doc 4 DOC Doc 1 DOC Doc 9 DOC Doc 6 DOC Doc 3 Doc 2 DOC Doc 7 DOC Doc 3 Doc 8 DOC Doc 6 DOC Replica Docs Replica Docs Replica Docs Replica Docs Replica Docs Replica Docs Doc 4 DOC Doc 6 DOC Doc 7 DOC Doc 5 DOC Doc 8 DOC Doc 7 Doc 1 DOC Doc 3 DOC Doc 9 DOC Doc 2 DOC Doc 9 COUCHBASE SERVER CLUSTERUser Configured Replica Count = 1 38
  39. 39. Performance• What is my working set?• How cache is working? – Put your data in RAM• How to design my data model? – Aggregate Model – Easy to change 39
  40. 40. 40
  41. 41. Management and Monitoring• Do not forget about Operations! – Service Reliability Engineering Team will thank you!• Manage your cluster easily: – Command Line, Administration Console to change cluster toplogy• Monitor “your NoSQL” – Analyze the overall status of your cluster – View and fix bottlenecks 41
  42. 42. Conclusion• One Size Does Not Fit All• Overview of the the NoSQL types• Choose the right solution – Developer Productivity – Large Scale Data 42
  43. 43. QUESTIONS? 43
  44. 44. Simple. Fast. Elastic. NoSQL.Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements. 44

×