Couchbase_UK_2013_NoSQL_Landscape

  • Couchbase_UK_2013_NoSQL_Landscape

    1. The NoSQL Landscape
       Tugdual Grall, Technology Evangelist
    2. What we’ll talk about
       • Why are RDBMSs not enough?
       • What are the different NoSQL taxonomies?
       • Which “NoSQL” is right for me?
    3. Growth is the New Reality
       • Instagram gained nearly 1 million users overnight when they expanded to Android
    4. Does it work with an RDBMS backend?
       • Application scales out: just add more commodity web servers
       • Database scales up: get a bigger, more complex server
       Note: Relational database technology is great for what it is great for, but it is not great for this.
    5. Some alternatives to scale out your RDBMS
       • Run many SQL servers
       • Data are sharded (most of the time using client code)
       • Memcached for faster response time
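The approach on slide 5 (sharding in client code, with a cache in front) can be sketched as follows. This is a minimal in-memory illustration, not a real deployment: the shard names, the `ShardedStore` class and the `moved_keys` helper are hypothetical, and plain dictionaries stand in for the SQL servers and for Memcached.

```python
import zlib

# Hypothetical shard names; a real deployment would hold database connections.
SHARDS = ["sql-server-0", "sql-server-1", "sql-server-2"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key (CRC32 is stable across processes)."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


class ShardedStore:
    """Toy stand-in for 'many SQL servers plus a cache', all in memory."""

    def __init__(self, shards=SHARDS):
        self.shards = list(shards)
        self.tables = {name: {} for name in self.shards}
        self.cache = {}  # plays the role of Memcached

    def write(self, key, row):
        self.tables[shard_for(key, self.shards)][key] = row
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def read(self, key):
        if key in self.cache:  # cache hit: no database round trip
            return self.cache[key]
        row = self.tables[shard_for(key, self.shards)].get(key)
        if row is not None:
            self.cache[key] = row  # fill the cache for later reads
        return row


def moved_keys(keys, old_shards, new_shards):
    """Keys whose shard changes when the shard list changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]


store = ShardedStore()
store.write("user:1", {"name": "Ann"})
row = store.read("user:1")

keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(keys, SHARDS, SHARDS + ["sql-server-3"])
```

With modulo placement, adding a server changes the shard of many existing keys, so the application has to migrate data and invalidate cache entries itself. That is the "scale by hand" cost the next slide describes.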
    6. Scale out with RDBMS
       Is this a good approach to scale?
       • Lots of components to deploy
       • Scale by hand
         - Caching
         - Sharding/Replication
       Learn from others: this scenario costs time and money. Scaling SQL is potentially disastrous when going viral: it is a very risky time for major code changes and migrations, and you have no time while usage is skyrocketing.
    7. Lacking market solutions, users forced to invent
       • Bigtable (November 2006), Dynamo (October 2007), Cassandra (August 2008), Voldemort (February 2009)
       • No schema required before inserting data
       • No schema change required to change data format
       • Auto-sharding without application participation
       • Distributed queries
       • Integrated main memory caching
       • Data synchronization (mobile, multi-datacenter)
       Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
    8. Survey: Schema inflexibility #1 adoption driver
       What is the biggest data management problem driving your use of NoSQL in the coming year?
       • Lack of flexibility/rigid schemas: 49%
       • Inability to scale out data: 35%
       • High latency/low performance: 29%
       • Costs: 16%
       • All of these: 12%
       • Other: 11%
       Source: Couchbase NoSQL Survey, December 2011, n=1351
    9. NoSQL database matches application logic tier architecture
       Data layer now scales with linear cost and constant performance.
       • Application scales out: just add more commodity web servers
       • Database scales out (NoSQL database servers): just add more commodity data servers
       Scaling out flattens the cost and performance curves.
    10. NoSQL Taxonomy
    11. The Key-Value Store – the foundation of NoSQL
        [Diagram: a key mapped to an opaque binary value]
    12. Memcached – the NoSQL precursor
        [Diagram: a key mapped to an opaque binary value]
        • In-memory only
        • Limited set of operations
          - Blob storage: Set, Add, Replace, CAS
          - Retrieval: Get
          - Structured data: Append, Increment
        • “Simple and fast.”
        • Challenges: cold cache, disruptive elasticity
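To make the operation set on slide 12 concrete, here is a toy in-memory model of its semantics. It is a sketch only and not a Memcached client: the `ToyCache` class and its version-token scheme are assumptions made for illustration, with method names loosely following the command names on the slide.

```python
class ToyCache:
    """In-memory model of the operations on the slide; not a Memcached client."""

    def __init__(self):
        self._data = {}    # key -> (value, version token)
        self._version = 0  # bumped on every successful store

    def _store(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        return False if key in self._data else self._store(key, value)

    def replace(self, key, value):
        """Store only if the key already exists."""
        return self._store(key, value) if key in self._data else False

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def gets(self, key):
        """Return (value, token); the token is passed back to cas()."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        """Compare-and-swap: store only if nobody changed the key since gets()."""
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        return self._store(key, value)

    def append(self, key, suffix):
        entry = self._data.get(key)
        return False if entry is None else self._store(key, entry[0] + suffix)

    def incr(self, key, delta=1):
        entry = self._data.get(key)
        if entry is None:
            return None
        new_value = int(entry[0]) + delta
        self._store(key, new_value)
        return new_value
```

The value stays opaque to the store in every operation except `append` and `incr`, which is why the slide files those two under "structured data".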
    13. NoSQL catalog
        • Cache (memory only): Memcached (key-value)
        • Database (memory/disk): (empty)
    14. Redis – More “Structured Data” commands
        [Diagram: a key mapped to a blob, list, set, hash, …]
        • “Data Structures”
        • Disk persistence (eventual consistency on the disk)
        • Vast set of operations
          - Blob storage: Set, Add, Replace, CAS
          - Retrieval: Get, Pub-Sub
          - Structured data: Strings, Hashes, Lists, Sets, Sorted sets
        • Example operations for a Set
          - Add, count, subtract sets, intersection, is member, atomic move from one set to another
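The set operations listed on slide 14 map closely onto Python's built-in `set`, which gives a quick way to see what each one does. This is only an analogy run in local memory, not a Redis client; the Redis command names in the comments are given for orientation, and the `move` helper is hypothetical.

```python
online = {"ann", "bob"}
premium = {"bob", "cat"}

online.add("dan")               # add            (Redis: SADD)
count = len(online)             # count          (Redis: SCARD)
not_premium = online - premium  # subtract sets  (Redis: SDIFF)
both = online & premium         # intersection   (Redis: SINTER)
is_member = "ann" in online     # is member      (Redis: SISMEMBER)


def move(src, dst, member):
    """Move a member from one set to another (Redis: SMOVE).

    Returns False if the member was not in src. A server can perform this
    atomically; this plain Python version is not atomic.
    """
    if member not in src:
        return False
    src.remove(member)
    dst.add(member)
    return True


moved = move(online, premium, "ann")
```

The difference from a plain key-value store is that the server understands the value's structure, so these operations run next to the data instead of requiring a read-modify-write cycle in the client.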
    15. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure)
        • Database (memory/disk): (empty)
    16. Membase – From key-value cache to database
        [Diagram: a key mapped to an opaque binary value]
        • Disk-based with built-in memcached cache
        • Cache refill on restart
        • Memcached compatible (drop-in replacement)
        • Highly available (data replication)
        • Add or remove capacity to live cluster
        • “Simple, fast, elastic.”
    17. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure)
        • Database (memory/disk): Membase (key-value)
    18. Couchbase – document-oriented database
        [Diagram: a key mapped to a JSON object (“document”) of the form { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }]
        • Auto-sharding
        • Disk-based with built-in memcached cache
        • Cache refill on restart
        • Memcached compatible (drop-in replacement)
        • Highly available (data replication)
        • Add or remove capacity to live cluster
        • When values are JSON objects (“documents”): create indices, views and query against the views
    19. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure)
        • Database (memory/disk): Membase (key-value), Couchbase (document)
    20. MongoDB – Document-oriented database
        [Diagram: a key mapped to a BSON object (“document”) of the form { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }]
        • Disk-based with in-memory “caching”
        • BSON (“binary JSON”) format and wire protocol
        • Master-slave replication
        • Auto-sharding
        • Values are BSON objects
        • Supports ad hoc queries – best when indexed
    21. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure)
        • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document)
    22. Cassandra – Column overlays
        [Diagram: a key mapped to an opaque binary value, with Column 1, Column 2 and Column 3 (not present) overlaid on it]
        • Disk-based system
        • Clustered
        • External caching required for low-latency reads
        • “Columns” are overlaid on the data
        • Not all rows must have all columns
        • Supports efficient queries on columns
        • Restart required when adding columns
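The point that "not all rows must have all columns" can be illustrated with a small sparse-row model. This is a conceptual sketch in plain Python dictionaries, not Cassandra's storage format or query API; the row keys, column names and the `select_column` helper are made up for the example.

```python
# Each row is a mapping from column name to value; rows may omit columns.
rows = {
    "user:1": {"name": "Ann", "city": "London"},
    "user:2": {"name": "Bob"},  # no "city" column on this row
    "user:3": {"name": "Cat", "city": "Paris", "age": 31},
}


def select_column(rows, column):
    """Return {row_key: value} for the rows that actually have the column."""
    return {key: cols[column] for key, cols in rows.items() if column in cols}


cities = select_column(rows, "city")
ages = select_column(rows, "age")
```

A missing column costs nothing in this model: the row simply has no entry for it, in contrast to a relational table where every row carries every column, even if NULL.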
    23. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure)
        • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document), Cassandra (column)
    24. Neo4j – Graph database
        [Diagram: several keys, each mapped to an opaque binary value, connected to one another as a graph]
        • Disk-based system
        • External caching required for low-latency reads
        • Nodes, relationships and paths
        • Properties on nodes
        • Delete, Insert, Traverse, etc.
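The graph model on slide 24 (nodes with properties, typed relationships, traversal) can be sketched in a few lines. This is an illustration in plain Python and not Neo4j's API or query language; the node names, the `FOLLOWS` relationship type and both helper functions are hypothetical.

```python
from collections import deque

# Nodes carry properties; relationships are (source, type, target) triples.
nodes = {
    "ann": {"age": 30},
    "bob": {"age": 41},
    "cat": {"age": 25},
    "dan": {"age": 37},
}
relationships = [
    ("ann", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "cat"),
    ("ann", "FOLLOWS", "dan"),
]


def neighbours(node, rel_type):
    """Nodes reachable from `node` over one outgoing relationship of rel_type."""
    return [dst for src, rel, dst in relationships if src == node and rel == rel_type]


def shortest_path(start, goal, rel_type="FOLLOWS"):
    """Breadth-first traversal; returns the nodes on a shortest path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1], rel_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("ann", "cat")
```

Traversals like this are the native operation of a graph store, whereas a key-value or document store would need one lookup per hop driven from the application.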
    25. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure)
        • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document), Cassandra (column), Neo4j (graph)
    26. NoSQL catalog
        • Cache (memory only): Memcached (key-value), Redis (data structure), Coherence
        • Database (memory/disk): Membase (key-value), Couchbase and MongoDB (document), Cassandra and HBase (column), Neo4j and InfiniteGraph (graph)
    27. Speed and Scale
    28. What about Hadoop?
    29. Hadoop: Big Data Swiss Army Knife
        • Oozie: Workflow, coordination
        • Sqoop: Data connector to import/export data
        • Hive: SQL-like interface
        • Pig: High-level programming language
        • Mahout: Machine learning library
        • Whirr: Hadoop management tools for cloud services
        • Flume: Aggregator
        • MapReduce: Framework to process large volumes of data
        • HBase: Key-value data store
        • ZooKeeper: Centralized configuration management
        • HDFS: Distributed file system
    30. So what? Hadoop & Couchbase
        40 milliseconds to respond with the decision.
        [Diagram: three numbered data flows; labels: click stream events; profiles, campaigns; profiles, real-time campaign statistics]
    31. Which one is right for me?
    32. Survey: Schema inflexibility #1 adoption driver
        What is the biggest data management problem driving your use of NoSQL in the coming year?
        • Lack of flexibility/rigid schemas: 49%
        • Inability to scale out data: 35%
        • High latency/low performance: 29%
        • Costs: 16%
        • All of these: 12%
        • Other: 11%
        Source: Couchbase NoSQL Survey, December 2011, n=1351
    33. Lack of Flexibility / Rigid Schema
        • Aggregate Data Models (Martin Fowler)
          - Flexible data structure
          - Optimized access
          - Easy to distribute data
        Example aggregate:
          o::1001 {
            uid: ji22jd,
            customer: Ann,
            line_items: [
              { sku: 0321293533, quan: 3, unit_price: 48.0 },
              { sku: 0321601912, quan: 1, unit_price: 39.0 },
              { sku: 0131495054, quan: 1, unit_price: 51.0 }
            ],
            payment: { type: Amex, expiry: 04/2001, last5: 12345 }
          }
        http://martinfowler.com/bliki/AggregateOrientedDatabase.html
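The order aggregate from slide 33 can be written as a single JSON document stored under one key. The sketch below uses only Python's standard library; the dictionary named `store` is a stand-in for a document database, and quoting the SKU and card digits as strings is an assumption made to keep the JSON valid.

```python
import json

order_key = "o::1001"
order = {
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        {"sku": "0321293533", "quan": 3, "unit_price": 48.0},
        {"sku": "0321601912", "quan": 1, "unit_price": 39.0},
        {"sku": "0131495054", "quan": 1, "unit_price": 51.0},
    ],
    "payment": {"type": "Amex", "expiry": "04/2001", "last5": "12345"},
}

store = {}  # stand-in for a document database: key -> JSON text
store[order_key] = json.dumps(order)

# One lookup returns the whole aggregate; no joins are needed.
loaded = json.loads(store[order_key])
total = sum(item["quan"] * item["unit_price"] for item in loaded["line_items"])
```

Because the line items and payment travel with the order, the whole aggregate lives on one node, which is what makes it easy to distribute. Adding a field later means writing documents with the new field, with no schema migration.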
    34. Use Cases
        • Key-Value
          - Session Management
          - User Profile/Preferences
          - Shopping Cart
        • Document
          - Event Logging
          - Content Management
          - Web Analytics
          - E-Commerce Application
        • Columns
          - Event Logging
          - Content Management
          - Counters
        • Graph
          - Connected Data / Social Networks
          - Routing, Dispatch
          - Recommendations based on Social Graph
    35. Production Environment
        [Diagram: EMEA DC, US data center, APAC DC]
    36. Scale out your data
        • Modifying cluster topology should be simple
          - Add, remove, configure nodes on a running system
        • What is the impact of topology changes?
          - Sharding, caching of the data
          - Availability of the service during cluster changes
        • More hardware = more failures
          - Availability, reliability of the system: failover support
    37. Add Nodes to Cluster
        [Diagram: App Server 1 and App Server 2, each with a Couchbase client library and a cluster map, send read/write/update requests to Servers 1 to 5 of a Couchbase Server cluster; each server holds active docs and replica docs; user-configured replica count = 1]
        • Two servers added
          - One-click operation
        • Docs automatically rebalanced across cluster
          - Even distribution of docs
          - Minimum doc movement
        • Cluster map updated
        • App database calls now distributed over larger number of servers
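The rebalance described on slide 37 (even distribution, minimum movement, updated cluster map) can be sketched with a fixed set of partitions and a map from partition to server. This is a simplified illustration and not Couchbase's implementation: the partition count, the server names and all function names are assumptions.

```python
import zlib

NUM_PARTITIONS = 64  # illustrative value, kept small for readability


def partition_for(key):
    """Keys hash to a fixed partition; only the partition's owner ever changes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def initial_map(servers):
    """Cluster map: list index = partition, value = owning server."""
    return [servers[p % len(servers)] for p in range(NUM_PARTITIONS)]


def server_for(key, cluster_map):
    """What a client library does with its copy of the cluster map."""
    return cluster_map[partition_for(key)]


def rebalance(cluster_map, servers):
    """Return a new map with even ownership, moving as few partitions as possible."""
    owned = {s: [] for s in servers}
    pool = []  # partitions that need a new owner
    for p, s in enumerate(cluster_map):
        (owned[s] if s in owned else pool).append(p)
    quota, extra = divmod(len(cluster_map), len(servers))
    # Let the fullest servers keep the "+1" slots so fewer partitions move.
    by_load = sorted(servers, key=lambda s: len(owned[s]), reverse=True)
    target = {s: quota + (1 if i < extra else 0) for i, s in enumerate(by_load)}
    for s in servers:
        while len(owned[s]) > target[s]:
            pool.append(owned[s].pop())
    new_map = list(cluster_map)
    for s in servers:
        while len(owned[s]) < target[s]:
            p = pool.pop()
            owned[s].append(p)
            new_map[p] = s
    return new_map


old_servers = ["server1", "server2", "server3"]
new_servers = old_servers + ["server4", "server5"]
old_map = initial_map(old_servers)
new_map = rebalance(old_map, new_servers)
moved = sum(1 for a, b in zip(old_map, new_map) if a != b)
```

Only the partitions handed to the two new servers change owner; every other key keeps resolving to the same server. Contrast this with hashing keys directly onto the server count, where adding a server remaps most keys.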
    38. Fail Over Node
        [Diagram: App Server 1 and App Server 2, each with a Couchbase client library and a cluster map, in front of Servers 1 to 5 of a Couchbase Server cluster; each server holds active docs and replica docs; user-configured replica count = 1]
        • App servers accessing docs
        • Requests to Server 3 fail
        • Cluster detects server failed
          - Promotes replicas of docs to active
          - Updates cluster map
        • Requests for docs now go to appropriate server
        • Typically rebalance would follow
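The failover steps on slide 38 can be sketched as an update to two maps, one for active copies and one for replicas. This is a toy model under stated assumptions (replica count of 1, a doc's replica never on the same server as its active copy); the `fail_over` function and the doc and server names are hypothetical, not Couchbase's API.

```python
def fail_over(active, replica, failed):
    """Promote replicas of docs whose active copy lived on the failed server.

    `active` and `replica` map doc id -> server name. Returns updated copies
    of both maps; the inputs are left unchanged.
    """
    new_active = dict(active)
    new_replica = dict(replica)
    for doc, server in active.items():
        if server == failed:
            new_active[doc] = replica[doc]  # the replica becomes the active copy
            del new_replica[doc]            # no replica left until a rebalance
    for doc, server in replica.items():
        if server == failed:
            new_replica.pop(doc, None)      # replica copy was lost with the server
    return new_active, new_replica


active = {"doc1": "server3", "doc2": "server1", "doc3": "server2"}
replica = {"doc1": "server1", "doc2": "server3", "doc3": "server1"}

new_active, new_replica = fail_over(active, replica, failed="server3")
```

After the failover every doc is still readable, but `doc1` and `doc2` have no replica, which is why the slide notes that a rebalance would typically follow.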
    39. Performance
        • What is my working set?
        • How does the cache work?
          - Put your data in RAM
        • How should I design my data model?
          - Aggregate model
          - Easy to change
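Why the working set matters can be shown with a small cache simulation. The sketch assumes least-recently-used eviction and a cyclic access pattern, both chosen only for illustration; the slides do not say which eviction policy any particular product uses, and the `LRUCache` class and `hit_ratio` helper are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = load(key)                # simulated read from disk
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key
        return value


def hit_ratio(capacity, accesses):
    cache = LRUCache(capacity)
    for key in accesses:
        cache.get(key, load=lambda k: f"row-{k}")
    return cache.hits / len(accesses)


# Working set of 10 keys, accessed in a repeating cycle.
accesses = [i % 10 for i in range(1000)]
fits = hit_ratio(10, accesses)       # cache holds the whole working set
too_small = hit_ratio(9, accesses)   # cache is one entry short
```

When the working set fits in RAM, only the first pass misses. With a cache one entry too small, this access pattern evicts each key just before it is needed again, so nearly every read goes to disk. Sizing memory to the working set is the practical meaning of "put your data in RAM".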
    40. Management and Monitoring
        • Do not forget about Operations!
          - Service Reliability Engineering Team will thank you!
        • Manage your cluster easily:
          - Command line, administration console to change cluster topology
        • Monitor “your NoSQL”
          - Analyze the overall status of your cluster
          - View and fix bottlenecks
    41. Conclusion
        • One size does not fit all
        • Overview of the NoSQL types
        • Choose the right solution
          - Developer productivity
          - Large-scale data
    42. Q&A
    43. Thanks!
