Characteristics of NoSQL Databases

  • These slides are meant to discuss and address the technical differences of RDBMS vs. NoSQL from a document modeling and performance/scale perspective.
  • These are the 4 “promises” of NoSQL
  • Do not failover a healthy node!
  • Most of you are probably familiar with the table layout. A table is defined with a set of columns, and each record in the table conforms to the schema. If you wish to capture different data in the future, the table schema must be changed using the ALTER TABLE statement. Typically, data is normalized to third normal form to reduce duplication: large tables are split into smaller tables linked by foreign keys.
  • Summary bullet should read “To get info about a specific user you perform a join across two tables”. Shouldn’t “Geo Info” be “Address Info”? Change the names so they are employee names. Example: a normalized schema with two tables, connected by a foreign key. To get information about a specific user, you perform a join across the two tables.
  • The data is modeled for the application code and not for the database.
  • This example shows how a very simple user profile is represented. In a relational database, the user information might be represented across two interrelated tables. In a document database, the information is aggregated into a single document that is very natural to program with. So, this is how the data model is different. Let’s now talk about how scalability is different.
  • These are the market segments
  • Partial listing of companies with paid production deployments. Thousands more are using open source.

    1. NoSQL for SQL Professionals. Dipti Borkar, Director, Product Management
    2. Link to slides: http://bit.ly/17pgrcP
    3. Macro Trends Driving NoSQL Technology: More Data, More Users, Interactive Apps, NoSQL
    4. Lacking Solutions, Users Forced to Invent: Bigtable (November 2006), Dynamo (October 2007), Cassandra (August 2008), Voldemort (February 2009). Very few organizations can build and maintain database software technology. But every organization building interactive web applications needs this technology.
    5. What Is the Biggest Data Management Problem Driving Use of NoSQL in the Coming Year? Lack of flexibility/rigid schemas: 49%. Inability to scale out data: 35%. Performance challenges: 29%. Cost: 16%. All of these: 12%. Other: 11%. Source: Couchbase Survey, December 2011, n = 1351.
    6. Relational vs. NoSQL
    7. Key Differences
    8. Relational Technology Scales Up. The application tier scales out: just add more commodity web servers. The RDBMS scales up: get a bigger, more complex server, and it won’t scale beyond a certain point. Sharding is expensive and disruptive, and doesn’t perform at web scale. [Charts: system cost and application performance versus users, for the web/app server tier and for the relational database.]
    9. Couchbase Server Scales Out Like the App Tier. The application tier scales out: just add more commodity web servers. The NoSQL database scales out too: cost and performance mirror the app tier. Scaling out flattens the cost and performance curves. [Charts: system cost and application performance versus users, for the web/app server tier and for the Couchbase distributed data store.]
    10. Differences: Tables vs. Documents. Relational databases have tables with predefined columns; the schema must be determined before data can be inserted. Best practice is to normalize by splitting data into several tables, joined by primary key/foreign key (PK-FK) relations.
    11. Differences: Tables vs. Documents (contd.). In Couchbase there are no tables, only documents. A logical entity is stored within a single document. Different documents do not need to have the same set of fields or structure. You distinguish between types of documents either by the key names you provide or by adding attributes.
    12. Relational vs. Document Data Model. Relational data model: highly structured table organization with rigidly defined data formats and record structure. Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format.
    13. Differences: Joins vs. a single logical document. A logical entity lives in a single document, so there is no need for joins. If the data is normalized across several documents, use a series of gets: recipe = couchbase.get("my-recipe-id"); reviews = couchbase.multiget(recipe.comments); Transactions. Relational: atomicity can span several records across several tables. NoSQL: atomicity is confined to the document level.
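The "series of gets" on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Couchbase SDK code: the `store` dict stands in for a bucket, `get` and `multi_get` are hypothetical helpers that mirror the slide's `couchbase.get` and `couchbase.multiget` pseudocode, and the recipe and review contents are invented for the example.

```python
# In-memory stand-in for a document store (assumption: not a real bucket).
store = {
    "my-recipe-id": {
        "type": "recipe",
        "name": "Pancakes",
        "comments": ["review-1", "review-2"],  # IDs of related documents
    },
    "review-1": {"type": "review", "text": "Fluffy", "stars": 5},
    "review-2": {"type": "review", "text": "Too sweet", "stars": 3},
}


def get(key):
    """Fetch one document by key (hypothetical helper, not an SDK call)."""
    return store[key]


def multi_get(keys):
    """Fetch several documents by key, preserving the requested order."""
    return [store[key] for key in keys]


# Instead of a join: one get for the parent, then one batched get for
# the documents it references.
recipe = get("my-recipe-id")
reviews = multi_get(recipe["comments"])

print(recipe["name"], [review["stars"] for review in reviews])
```

The application, not the database, resolves the relationship, which is why the slide describes it as a series of gets rather than a join.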
    14. Key Couchbase Concepts. Clients and servers. Documents (user/application data) are read from and written to data buckets (multitenant architecture), which live on server nodes (based on bucket partitioning), which form a Couchbase cluster (dynamically scalable).
    15. RDBMS Example: User Profile

        User Info
        KEY  First  Last    ZIP_id
        1    Dipti  Borkar  2
        2    Joe    Smith   2
        3    Ali    Dodson  2
        4    John   Doe     3

        Address Info
        ZIP_id  CITY  STATE  ZIP
        1       DEN   CO     30303
        2       MV    CA     94040
        3       CHI   IL     60609
        4       NY    NY     10010

        To get information about a specific user, you perform a join across the two tables.
    16. Document Example: User Profile. { "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA" } (JSON). All data in a single document.
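To make the contrast between the two user-profile slides concrete, here is a small sketch using Python's built-in `sqlite3` module. It loads two rows from each of the slide's tables, runs the join for user 1, and shows that the joined row carries the same data as the single JSON document. The table and column names (`user_info`, `address_info`, and so on) are my own lower-case renderings of the slide's headings, not names from the original deck.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT
    );
    CREATE TABLE user_info (
        id INTEGER PRIMARY KEY, first TEXT, last TEXT,
        zip_id INTEGER REFERENCES address_info(zip_id)
    );
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040');
    INSERT INTO user_info VALUES
        (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2);
    """
)

# Relational model: the profile is assembled with a join at read time.
row = conn.execute(
    """
    SELECT u.id, u.first, u.last, a.zip, a.city, a.state
    FROM user_info AS u
    JOIN address_info AS a ON a.zip_id = u.zip_id
    WHERE u.id = ?
    """,
    (1,),
).fetchone()

# Document model: the same information already lives in one document.
document = dict(zip(("ID", "FIRST", "LAST", "ZIP", "CITY", "STATE"), row))
print(json.dumps(document))
```

The trade-off is the usual one: the document duplicates address data for every user who shares a ZIP code, in exchange for a single read with no join.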
    17. Making a Change Using RDBMS. [Figure: a User table (User ID, First, Last, Zip, Country ID) related through User ID and Country ID to Photo, Status, Affiliations, and Country tables.]
    18. Making the Same Change With a Document DB. { "ID": 1, "FIRST": "Dipti", "LAST": "Borkar", "ZIP": "94040", "CITY": "MV", "STATE": "CA", "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" }, "COUNTRY": "USA" } (JSON). Just add information to a document.
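A rough sketch of the two kinds of change, again with Python's built-in `sqlite3` module for the relational side and plain dicts standing in for documents. The assumptions are mine: a single-column `ALTER TABLE` is only one possible form of the relational change (the slide adds whole tables), the `user::N` keys are invented, and the nesting of `STATUS` follows my reading of the garbled slide text.

```python
import sqlite3

# Relational side: a new attribute needs a schema change before any row
# can store it, and every row in the table gets the new column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_info (id INTEGER PRIMARY KEY, first TEXT, last TEXT, zip TEXT)"
)
conn.execute("INSERT INTO user_info VALUES (1, 'Dipti', 'Borkar', '94040')")
conn.execute("ALTER TABLE user_info ADD COLUMN country TEXT")
conn.execute("UPDATE user_info SET country = 'USA' WHERE id = 1")
columns = [info[1] for info in conn.execute("PRAGMA table_info(user_info)")]

# Document side: just add fields to the one document that needs them.
profiles = {
    "user::1": {"ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
                "ZIP": "94040", "CITY": "MV", "STATE": "CA"},
    "user::2": {"ID": 2, "FIRST": "Joe", "LAST": "Smith", "ZIP": "94040"},
}
profiles["user::1"]["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
profiles["user::1"]["COUNTRY"] = "USA"

print(columns)
print(sorted(profiles["user::1"]))
print(sorted(profiles["user::2"]))  # untouched: documents may differ in shape
```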
    19. Relational vs. Document Performance. [Figure: User, Photo, Status, and Affiliations tables on the relational side; a set of JSON documents on the document side.] Faster response times and higher throughput.
    20. Document Databases Easily Accommodate Unstructured Data. Hotels:

        {
          "ID": 1,
          "NAME": "Fairmont San Francisco",
          "DESCRIPTION": "Historic grandeur…",
          "AVG_REVIEWER_SCORE": "4.3",
          "AMENITY": [
            { "TYPE": "gym", "DESCRIPTION": "fitness center" },
            { "TYPE": "wifi", "DESCRIPTION": "free wifi" }
          ],
          "RATE_TYPE": "nightly",
          "PRICE": "$199",
          "REVIEWS": ["review_1", "review_2"],
          "ATTRACTIONS": "Chinatown"
        }

        {
          "ID": 2,
          "NAME": "W San Francisco",
          "DESCRIPTION": "Chic, hip accommodations..",
          "AVG_REVIEWER_SCORE": "4.0",
          "AMENITY": [
            { "TYPE": "spa", "DESCRIPTION": "Bliss Spa" },
            { "TYPE": "wifi", "DESCRIPTION": "free wifi" },
            { "TYPE": "dining", "DESCRIPTION": "bar/lounge" }
          ],
          "RATE_TYPE": "nightly",
          "PRICE": "$194",
          "REVIEWS": ["review_1", "review_2"]
        }
    21. Document Databases Easily Accommodate Unstructured Data. Hotels: { "ID": 1, "NAME": "Fairmont San Francisco", … }. Reviews:

        {
          "REVIEW_ID": 1,
          "REVIEW": "Loved Hotel & Location",
          "WOULD RECOMMEND": "yes",
          "AVG_REVIEWER_SCORE": "5",
          "REVIEW_DATE": "May 29, 2013",
          "USER_PROFILE_ID": "271"
        }

        {
          "REVIEW_ID": 2,
          "REVIEW": "Nice, but a few kinks",
          "WOULD RECOMMEND": "yes",
          "AVG_REVIEWER_SCORE": "4",
          "REVIEW_DATE": "May 22, 2013",
          "USER_PROFILE_ID": "923"
        }
    22. Document Databases Easily Accommodate Unstructured Data. Hotel descriptions: { "ID": 1, "NAME": "Fairmont San Francisco", … }. Reviews: { "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }, { "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }. User profiles:

        {
          "USER_ID": 1,
          "DISPLAY_NAME": "Ted’s Trip Experience",
          "CITY": "Saratoga",
          "STATE": "California",
          "NUM_OF_REVIEWS": "8"
        }

        {
          "USER_ID": 2,
          "DISPLAY_NAME": "WhatWhat567",
          "CITY": "Kansas City",
          "STATE": "MO",
          "NUM_OF_REVIEWS": "3"
        }
    23. Document Databases Easily Accommodate Unstructured Data. Hotel descriptions: { "ID": 1, "NAME": "Fairmont San Francisco", … }. Reviews: { "REVIEW_ID": 1, "REVIEW": "Loved Hotel…", … }, { "REVIEW_ID": 2, "REVIEW": "Nice, but …", … }. User profiles: { "USER_ID": 1, "DISPLAY": "Ted’s Trip…", … }, { "USER_ID": 2, "DISPLAY": "WhatWhat …", … }. Hotels point to reviews, and reviews point to users. Document IDs associate related objects.
    24. Indexing with Document Databases. Index on AVG_REVIEWER_SCORE.
    25. Indexing with Document Databases. Index on AVG_REVIEWER_SCORE. Index entries (score, doc_id): … (4.0, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (5.0, doc_id) …
    26. Querying with Document Databases. Query on AVG_REVIEWER_SCORE. Index entries (score, doc_id): … (3.4, doc_id), (3.4, doc_id), (3.5, doc_id), (3.6, doc_id), (3.7, doc_id), (3.8, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (4.5, doc_id), (4.7, doc_id), (4.9, doc_id), (5.0, doc_id), (5.0, doc_id) … The query selects a range of index entries as its matching results.
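The indexing and querying slides describe an ordered list of (score, doc_id) entries that a query scans by range. A minimal Python sketch of that idea, using the standard `bisect` module, follows. It illustrates the concept only and is not how any particular product implements its indexes; the `hotel::N` keys and the third hotel are hypothetical, and scores are stored as numbers here (the slides show them as strings) so that they sort numerically.

```python
from bisect import bisect_left, bisect_right

# Documents keyed by ID.
hotels = {
    "hotel::1": {"NAME": "Fairmont San Francisco", "AVG_REVIEWER_SCORE": 4.3},
    "hotel::2": {"NAME": "W San Francisco", "AVG_REVIEWER_SCORE": 4.0},
    "hotel::3": {"NAME": "Hypothetical Inn", "AVG_REVIEWER_SCORE": 3.5},
}

# Build the index: (score, doc_id) pairs kept in sorted order.
index = sorted((doc["AVG_REVIEWER_SCORE"], doc_id) for doc_id, doc in hotels.items())
scores = [score for score, _ in index]


def query(low, high):
    """Return doc IDs whose score lies in [low, high], in index order."""
    start = bisect_left(scores, low)   # first entry with score >= low
    end = bisect_right(scores, high)   # one past the last entry with score <= high
    return [doc_id for _, doc_id in index[start:end]]


matches = query(4.0, 5.0)
print(matches)
print([hotels[doc_id]["NAME"] for doc_id in matches])
```

The query touches only the index to find matching IDs; the documents themselves are fetched by key afterwards.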
    27. Flavors of NoSQL
    28. NoSQL catalog. Storage: cache (memory only) or database (memory/disk). Data model: key-value (memcached, membase), data structure (redis), document (couchbase, mongoDB), column (cassandra), graph (Neo4j).
    29. The Key-Value Store: the foundation of NoSQL. [Figure: a key mapped to an opaque binary value.]
    30. Memcached: the NoSQL precursor. [Figure: a key mapped to an opaque binary value.] In-memory only. Limited set of operations. Blob storage: Set, Add, Replace, CAS. Retrieval: Get. Structured data: Append, Increment. “Simple and fast.” Challenges: cold cache, disruptive elasticity.
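The storage operations listed on this slide differ mainly in when they are allowed to write. The toy class below sketches those semantics in Python. `TinyCache` is a hypothetical in-memory model written for this illustration; it is not a memcached client and does not speak the memcached protocol.

```python
class TinyCache:
    """Toy model of set/add/replace/CAS semantics (illustration only)."""

    def __init__(self):
        self._items = {}      # key -> (value, cas_token)
        self._next_token = 1

    def _store(self, key, value):
        # Every successful write gets a fresh token, invalidating old ones.
        self._items[key] = (value, self._next_token)
        self._next_token += 1
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key does not exist yet."""
        return False if key in self._items else self._store(key, value)

    def replace(self, key, value):
        """Store only if the key already exists."""
        return self._store(key, value) if key in self._items else False

    def gets(self, key):
        """Return (value, cas_token), or (None, None) on a miss."""
        return self._items.get(key, (None, None))

    def cas(self, key, value, token):
        """Store only if the item is unchanged since `token` was read."""
        current = self._items.get(key)
        if current is None or current[1] != token:
            return False
        return self._store(key, value)


cache = TinyCache()
added_first = cache.add("counter", 1)        # key is new, so this stores
added_again = cache.add("counter", 100)      # key exists, so this is refused
value, token = cache.gets("counter")
fresh_write = cache.cas("counter", value + 1, token)  # token still current
stale_write = cache.cas("counter", 99, token)         # token now stale
print(added_first, added_again, fresh_write, stale_write, cache.gets("counter")[0])
```

CAS (check-and-set) is what lets two clients update the same key without silently overwriting each other: the loser's write is rejected and it can re-read and retry.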
    31. Couchbase: document-oriented database. [Figure: a key mapped to a JSON object (“document”): { "string": "string", "string": value, "string": { "string": "string", "string": value }, "string": [ array ] }.] Auto-sharding. Disk-based with built-in memcached cache. Cache refill on restart. Memcached compatible (drop-in replacement). Highly available (data replication). Add or remove capacity on a live cluster. When values are JSON objects (“documents”): create indices and views, and query against the views.
    32. NoSQL catalog, as filled in so far: key-value (memcached, membase), data structure (redis), document (couchbase); column and graph still empty.
    33. MongoDB: document-oriented database. [Figure: a key mapped to a BSON object (“document”): { "string": "string", "string": value, "string": { "string": "string", "string": value }, "string": [ array ] }.] Disk-based with in-memory “caching”. BSON (“binary JSON”) format and wire protocol. Master-slave replication. Auto-sharding. Values are BSON objects. Supports ad hoc queries, which work best when indexed.
    34. MongoDB Architecture
    35. NoSQL catalog, as filled in so far: key-value (memcached, membase), data structure (redis), document (couchbase, mongoDB); column and graph still empty.
    36. Cassandra: column overlays. [Figure: a key whose opaque binary value is overlaid with Column 1, Column 2, and Column 3 (not present).] Disk-based system. Clustered. External caching required for low-latency reads. “Columns” are overlaid on the data. Not all rows must have all columns. Supports efficient queries on columns. Restart required when adding columns. Good cross-datacenter support.
    37. Cassandra Architecture
    38. NoSQL catalog, as filled in so far: key-value (memcached, membase), data structure (redis), document (couchbase, mongoDB), column (cassandra); graph still empty.
    39. Neo4j: graph database. [Figure: several keys, each with an opaque binary value.] Disk-based system. External caching required for low-latency reads. Nodes, relationships, and paths. Properties on nodes. Delete, Insert, Traverse, etc.
    40. NoSQL catalog. Storage: cache (memory only) or database (memory/disk). Data model: key-value (memcached, membase), data structure (redis), document (couchbase, mongoDB), column (cassandra), graph (Neo4j).
    41. Where is NoSQL a good fit?
    42. Market Adoption. Internet companies: social gaming, ad networks, social networks, online business services, e-commerce, online media, content management, cloud services. Enterprises: communications, retail, financial services, health care, automotive/airline, agriculture, consumer electronics, business systems.
    43. Market Adoption: Customers. [Figure: partial listing of internet-company and enterprise customers.] More than 300 customers and 5,000 production deployments worldwide.
    44. Application Characteristics: Data driven. Third-party or user-defined structure (Twitter feeds); support for unlimited data growth (viral apps); data with non-homogeneous structure; need to change data structure quickly and often; variable-length documents; sparse data records; hierarchical data. Couchbase is a good fit.
    45. Application Characteristics: Performance driven. Low latency is critical (e.g. 1 millisecond); high throughput (e.g. 200,000 ops/sec); large number of users; unknown demand with sudden growth of users/data; predominantly direct document access; read-heavy, mixed, or write-heavy workloads. Couchbase is a good fit.
    46. Common Use Cases
        • Social gaming: Couchbase stores player and game data. Example customers include Zynga, Tapjoy, Ubisoft, Tencent.
        • Mobile apps: Couchbase stores user info and app content. Example customers include Kobo, Playtika.
        • Ad targeting: Couchbase stores user information for fast access. Example customers include AOL, Mediamind, Convertro.
        • Session store: Couchbase Server as a key-value store. Example customers include Concur, Sabre.
        • User profile store: Couchbase Server as a key-value store. Example customers include Tunewiki.
        • High-availability cache: Couchbase Server used as a cache tier replacement. Example customers include Orbitz.
        • Content and metadata store: Couchbase document store with Elasticsearch. Example customers include McGraw Hill, Tunewiki.
        • Third-party data aggregation: Couchbase stores social media and data feeds. Example customers include Sambacloud.
    47. Q&A
    48. Thank you. dipti@couchbase.com, @dborkar
