Distributed Document database


           Sharon Barr
         VP of Engineering




Couchbase NoSQL Leadership

    Leading NoSQL database company
    Open Source development & business model
    Behind Couchbase open source project

    Document-oriented NoSQL database
    Focused on interactive internet and mobile applications


    Provide more flexible, higher performance,
    more scalable database than relational alternative

    Most mature, reliable and widely deployed solution
    >5,000 paid production deployments worldwide, over 350 customers


    Headquarters in Silicon Valley (Mountain View, CA)
    ~100 employees including >60 in engineering/product
    >80% of commits to Couchbase, memcached, Apache CouchDB

Agenda




    What is a Document database
    The document model


    Couchbase Server
    Couchbase NoSQL solution




The evolving database landscape




             Matthew Aslett – 451 group – Dec 2012
Where does a document database fit?

    [Figure: database landscape map. Areas shown: Relational, Non-relational,
    NoSQL, NewSQL, Key-value database, Document database, Graph database,
    Caching, Analytics, Transaction processing, Appliances, As-a-service]
Two major types of data management systems

    OLTP / ODS
    Analytics (OLAP) / EDW
Evolution

                         OLAP (Analytics)


            Relational                      NewSQL


       Non - Relational                     NoSQL


                    OLTP (Transactional)



                    Database as a service
Evolution – NoSQL database types

                      OLAP (Analytics)


         Relational                      NewSQL


       Non - Relational        NoSQL: KV/Document/Graph


                  OLTP (Transactional)



                  Database as a service
The evolving database landscape




             Matthew Aslett – 451 group – Nov 2012
NoSQL catalog

                           Key-Value    Data Structure    Document     Column family    Graph
  Cache (memory only)      memcached    redis
  Database (memory/disk)   membase                        couchbase    cassandra        Neo4j
                                                          couchDB
                                                          mongoDB
Survey: The leading driver for NoSQL adoption

           What is the biggest data management problem
           driving your use of NoSQL in the coming year?

           Lack of flexibility/rigid schemas    49%
           Inability to scale out data          35%
           High latency/low performance         29%
           Costs                                16%
           All of these                         12%
           Other                                11%

           Source: Couchbase NoSQL Survey, December 2011, n=1351
FLEXIBLE SCHEMA
COMPARING DATA MODELS
Key Value vs. Document database

    Pure Key-Value Database

           10101001010100
           100011110101100
           010100010100011
           110011000101010
           010010010011001
           101010100100011
           101010101001010

    Document Database

           {
               "ID": 1,
               "FIRST": "Frank",
               "LAST": "Weigel",
               "ZIP": "94040",
               "CITY": "MV",
               "STATE": "CA"
           }

    Couchbase Server 1.8: current release
    Couchbase Server 2.0: adds indexing/querying

    Both Key-Value & Document Use-Cases Supported
Relational vs Document Data Model


   Relational data model
   Highly structured table organization (columns C1, C2, C3, C4) with
   rigidly defined data formats and record structure.

   Document data model
   Collection of complex JSON documents with arbitrary, nested data
   formats and varying "record" format.
RDBMS Example: User Profile

     User Info
     KEY    First    Last     ZIP_id
      1     Frank    Weigel     2
      2     Ali      Dodson     2
      3     Mark     Azad       2
      4     Steve    Yen        3

     Address Info
     ZIP_id    CITY    STATE    ZIP
       1       DEN     CO       30303
       2       MV      CA       94040
       3       CHI     IL       60609
       4       NY      NY       10010

   To get information about a specific user, you perform a join across the two tables.
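The join described on this slide can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table and column names (`user_info`, `address_info`, `user_id`, `zip_id`) are assumptions modeled on the slide's two tables, not a schema taken from the presentation.

```python
import sqlite3

# In-memory database; table and column names are hypothetical,
# modeled on the "User Info" and "Address Info" tables in the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT);
    CREATE TABLE user_info (
        user_id INTEGER PRIMARY KEY, first TEXT, last TEXT,
        zip_id INTEGER REFERENCES address_info(zip_id));
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    INSERT INTO user_info VALUES
        (1, 'Frank', 'Weigel', 2), (2, 'Ali', 'Dodson', 2),
        (3, 'Mark', 'Azad', 2), (4, 'Steve', 'Yen', 3);
""")


def get_user_profile(user_id):
    """Return one user's full profile by joining the two tables."""
    return conn.execute(
        "SELECT u.user_id, u.first, u.last, a.zip, a.city, a.state "
        "FROM user_info AS u "
        "JOIN address_info AS a ON u.zip_id = a.zip_id "
        "WHERE u.user_id = ?",
        (user_id,),
    ).fetchone()


frank = get_user_profile(1)
```

The row returned for user 1 carries the same fields that the document model on the next slide stores in a single JSON object.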
Document Example: User Profile




 {
     "ID": 1,
     "FIRST": "Frank",
     "LAST": "Weigel",
     "ZIP": "94040",
     "CITY": "MV",
     "STATE": "CA"
 }

 [Figure: the JSON document = user info + address info]

                          All data in a single document
Making a Change Using RDBMS
User Table
User ID   First   Last     Zip     Country ID
1         Frank   Weigel   94040   001
2         Joe     Smith    94040   001
3         Ali     Dodson   94040   001
4         Sarah   Gorin    NW1     002
5         Bob     Young    30303   001
6         Nancy   Baker    10010   001
7         Ray     Jones    31311   001
8         Lee     Chen     V5V3M   008
•
•
•
50000     Doug    Moore    04252   001
50001     Mary    White    SW195   002
50002     Lisa    Clark    12425   001

Photo Table
User ID   Photo ID   Comment   Country ID
2         d043       NYC       001
2         b054       Bday      007
5         c036       Miami     001
7         d072       Sunset    133
5002      e086       Spain     133

Status Table
User ID   Status ID   Text      Country ID
1         a42         At conf   134
4         b26         excited   007
5         c32         hockey    008
12        d83         Go A's    001
5000      e34         sailing   005

Affiliations Table
User ID   Affl ID   Affl Name   Country ID
2         a42       Cal         001
4         b96       USC         001
7         c14       UW          001
8         e22       Oxford      002

Country Table
Country ID   Country name
001          USA
002          UK
003          Argentina
004          Australia
005          Aruba
006          Austria
007          Brazil
008          Canada
009          Chile
•
•
•
130          Portugal
131          Romania
132          Russia
133          Spain
134          Sweden
Making the Same Change With Couchbase



                   {
                    "ID": 1,
                    "FIRST": "Frank",
                    "LAST": "Weigel",
                    "ZIP": "94040",
                    "CITY": "MV",
                    "STATE": "CA",
                    "STATUS": { "TEXT": "At Conf", "GEO_LOC": "134" },
                    "COUNTRY": "USA"
                   }

             Just add information to a document
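As a rough sketch of what "just add information to a document" means in practice, the snippet below adds the new fields to one stored JSON document without touching any schema or other record. The `bucket` dict and the `add_fields` helper are stand-ins invented for this example; they are not the Couchbase client API.

```python
import json

# Hypothetical stand-in for a document store: document ID -> JSON string.
bucket = {
    "user_1": json.dumps({
        "ID": 1, "FIRST": "Frank", "LAST": "Weigel",
        "ZIP": "94040", "CITY": "MV", "STATE": "CA",
    })
}


def add_fields(bucket, doc_id, new_fields):
    """Read a document, merge in new fields, and write it back."""
    doc = json.loads(bucket[doc_id])
    doc.update(new_fields)          # no schema migration is involved
    bucket[doc_id] = json.dumps(doc)
    return doc


updated = add_fields(bucket, "user_1", {
    "STATUS": {"TEXT": "At Conf", "GEO_LOC": "134"},
    "COUNTRY": "USA",
})
```

Only this one document changes; other documents in the store keep whatever structure they already had.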
Document Databases


• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Database can look into the documents
   • Content can be indexed and queried

{
    "UUID": "21f7f8de-8051-5b89-86…
    "Time": "2011-04-01T13:01:02.42…
    "Server": "A2223E",
    "Calling Server": "A2213W",
    "Type": "E100",
    "Initiating User": "dsallings@spy.net",
    "Details":
        {
        "IP": "10.1.1.22",
        "API": "InsertDVDQueueItem",
        "Trace": "cleansed",
        "Tags":
            [
            "SERVER",
            "US-West",
            "API"
            ]
        }
}
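Because the database can look inside documents, their content can be indexed and queried. The sketch below shows the idea with a tiny in-memory index over the nested `Tags` field. The sample documents, their IDs, and the `build_tag_index` helper are hypothetical, and this is not how Couchbase views are implemented.

```python
from collections import defaultdict

# Hypothetical sample documents; only the field names follow the slide.
docs = {
    "log_1": {"Type": "E100", "Server": "A2223E",
              "Details": {"Tags": ["SERVER", "US-West", "API"]}},
    "log_2": {"Type": "E200", "Server": "A2213W",
              "Details": {"Tags": ["SERVER"]}},
    "log_3": {"Type": "E100", "Server": "A2213W"},  # no Details at all
}


def build_tag_index(docs):
    """Map each tag to the set of document IDs that carry it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # Documents have independent structure, so missing fields are skipped.
        for tag in doc.get("Details", {}).get("Tags", []):
            index[tag].add(doc_id)
    return index


tag_index = build_tag_index(docs)
api_docs = sorted(tag_index["API"])
server_docs = sorted(tag_index["SERVER"])
```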
Document database

• JSON objects
• Each document has an independent schema

{
    "_id": "brewery_Cleveland_ChopHouse_and_Brewery",
    "_rev": "1-00000061480b50910000000000000000",
    "city": "Cleveland",
    "updated": "2010-07-22 20:00:20",
    "code": "44113",
    "name": "Cleveland ChopHouse and Brewery",
    "country": "United States",
    "phone": "1-216-623-0909",
    "state": "Ohio",
    "address": [
        "824 West St.Clair Avenue"
    ],
    "geo": {
        "loc": [
            "-81.6994",
            "41.4995"
        ],
        "accuracy": "ROOFTOP"
    },
    "$expiration": 0,
    "$flags": 0
}

{
    "_id": "beer_Double_Cream_Oatmeal_Stout",
    "_rev": "1-0000042ee19241b60000000000000000",
    "category": "North American Ale",
    "style": "American-Style Stout",
    "name": "Double Cream Oatmeal Stout",
    "updated": "2010-07-22 20:00:20",
    "brewery": "Olde Peninsula Brewpub and Restaurant",
    "$expiration": 0,
    "$flags": 0
}
Document modeling


 Q       •   Are these separate objects in the model layer?
         •   Are these objects accessed together?
         •   Do you need updates to these objects to be atomic?
         •   Are multiple people editing these objects concurrently?


     When considering how to model data for a given application
     • Think of a logical container for the data
     • Think of how data groups together
Document Design Options

 • One document that contains all related data
    – Data is de-normalized
    – Better performance and scale
    – Eliminate client-side joins

 • Separate documents for different object types with cross
   references
    –   Data duplication is reduced
    –   Objects may not be co-located
    –   Transactions supported only on a document boundary
    –   Most document databases do not support joins or
        multi-document transactions
Document ID / Key selection

•   Similar to primary keys in relational databases
•   Documents are sharded based on the document ID
•   ID based document lookup is extremely fast
•   Usually an ID can only appear once in a bucket

 Q         • Do you have a unique way of referencing objects?
           • Are related objects stored in separate documents?

Options
    • UUIDs, date-based IDs, numeric IDs
    • Hand-crafted (human readable)
    • Matching prefixes (for multiple related objects)
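The key options listed above can be sketched as small helper functions. All function names and key formats here are illustrative assumptions; the slide names the options but does not prescribe a format.

```python
import uuid
from datetime import date


def uuid_key():
    """Opaque, globally unique ID."""
    return str(uuid.uuid4())


def date_key(prefix, day, seq):
    """Date-based ID, e.g. for time-ordered entries (format is an assumption)."""
    return f"{prefix}_{day.isoformat()}_{seq:04d}"


def related_key(owner_key, suffix):
    """Matching prefix: related documents share the owner's key as a prefix."""
    return f"{owner_key}_{suffix}"


user_key = "User_1234"  # hand-crafted, human-readable key
related = [related_key(user_key, s) for s in ("Buddies", "Messages")]
post_key = date_key("post", date(2011, 4, 1), 7)
```

With matching prefixes, an application that knows `User_1234` can compute the keys of the related documents directly, so each of them can be fetched with a fast ID-based lookup.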
Example: Entities for a Blog
   • User profile
     The main pointer into the user data
       • Blog entries
       • Badge settings, like a Twitter badge
   • Blog posts
     Contains the blogs themselves
   • Blog comments
     • Comments from other users
Blog Document – Option 1 – Single document

 {
  "_id": "Hello_World",
  "author": "John Smith",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href="http: …
  "comments": [
      {"format": "markdown", "body": "Awesome post!"},
      {"format": "markdown", "body": "Like it."}
  ]
 }
Threaded Comments

• You can imagine how to take this to a threaded list

[Figure: Blog -> List -> First comment -> List -> Reply to comment; More comments]

Advantages
• Only fetch the data when you need it
  • For example, rendering part of a web page
• Spread the data and load across the entire cluster
Blog Document – Option 2 - Split into multiple docs

BLOG DOC

{
 "_id": "Hello_World",
 "author": "John Smith",
 "type": "post",
 "title": "Hello World",
 "format": "markdown",
 "body": "Hello from [Couchbase](http://couchbase.com).",
 "html": "<p>Hello from <a href="http: …
 "comments": [
     "comment1_Hello_World"
 ]
}

COMMENT

{
 "_id": "comment1_Hello_World",
 "format": "markdown",
 "body": "Awesome post!"
}
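With the split design, the application resolves the references itself: it fetches the blog document first, then fetches each comment by the ID stored in `comments`. The sketch below uses a plain dict as a stand-in for the bucket; `load_post_with_comments` is a hypothetical helper, not a Couchbase API.

```python
# Hypothetical stand-in for a bucket: document ID -> document.
bucket = {
    "Hello_World": {
        "_id": "Hello_World",
        "author": "John Smith",
        "type": "post",
        "title": "Hello World",
        "format": "markdown",
        "body": "Hello from [Couchbase](http://couchbase.com).",
        "comments": ["comment1_Hello_World"],
    },
    "comment1_Hello_World": {
        "_id": "comment1_Hello_World",
        "format": "markdown",
        "body": "Awesome post!",
    },
}


def load_post_with_comments(bucket, post_id):
    """Client-side join: one lookup for the post, one per referenced comment."""
    post = dict(bucket[post_id])  # shallow copy so the stored doc is untouched
    comment_ids = post.get("comments", [])
    post["comments"] = [bucket[cid] for cid in comment_ids if cid in bucket]
    return post


page = load_post_with_comments(bucket, "Hello_World")
```

A page that shows only the post body can skip the second step entirely, which is the "only fetch the data when you need it" advantage of splitting.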
Example 2 – Different object types
 [Serializable]
 class User
 {
     public long ID;
     public string Name;

     [NonSerialized]
     public List<User> Buddies;

     [NonSerialized]
     public List<Messages> Messages;

     [NonSerialized]
     public Dictionary<Game, List<Bet>> BetsByGame;
 }

 User
     Key                        Value
     User_1234                  1234;Cheli;

 Buddies
     Key                        Value
     User_1234_Buddies          User_5678
                                User_9876

 Messages
     Key                        Value
     User_1234_Messages         Expire-> 9/9/9999
                                Message_1234
                                Message_5678

 BetsByGame
     Key                        Value
     User_1234_BetsByGame       User_1234_BetsByGame_1
                                User_1234_BetsByGame_2

     Key                        Value
     User_1234_BetsByGame_1     Bet_1234
                                Bet_2345

     Key                        Value
     User_1234_BetsByGame_2     Bet_9876
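The key layout on this slide can be sketched as a function that flattens one user object into several key/value entries. This Python version is an illustration only: the `User` dataclass and `to_key_values` are hypothetical, and the expiry value shown for the messages entry is left out.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    name: str
    buddies: list = field(default_factory=list)       # buddy user IDs
    messages: list = field(default_factory=list)      # message IDs
    bets_by_game: dict = field(default_factory=dict)  # game ID -> bet IDs


def to_key_values(user):
    """Flatten a user into key/value entries that share the user's key prefix."""
    base = f"User_{user.id}"
    kv = {base: f"{user.id};{user.name};"}
    kv[f"{base}_Buddies"] = [f"User_{b}" for b in user.buddies]
    kv[f"{base}_Messages"] = [f"Message_{m}" for m in user.messages]
    # The top-level BetsByGame entry only lists the per-game keys.
    kv[f"{base}_BetsByGame"] = [f"{base}_BetsByGame_{g}" for g in user.bets_by_game]
    for game, bets in user.bets_by_game.items():
        kv[f"{base}_BetsByGame_{game}"] = [f"Bet_{b}" for b in bets]
    return kv


cheli = User(1234, "Cheli", buddies=[5678, 9876], messages=[1234, 5678],
             bets_by_game={1: [1234, 2345], 2: [9876]})
kv = to_key_values(cheli)
```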
COUCHBASE DATABASE


Relational Technology Scales Up
Application Scales Out
Just add more commodity web servers
[Chart: Web/App Server Tier, system cost and application performance vs. users]

RDBMS Scales Up
Get a bigger, more complex server
[Chart: Relational Database, system cost and application performance vs. users; annotation: "Won't scale beyond this point"]

Expensive and disruptive sharding, doesn't perform at web scale
Couchbase Server Scales Out Like App Tier
Application Scales Out
Just add more commodity web servers
[Chart: Web/App Server Tier, system cost and application performance vs. users]

NoSQL Database Scales Out
Cost and performance mirror the app tier
[Chart: Couchbase Distributed Data Store, system cost and application performance vs. users]

Scaling out flattens the cost and performance curves
Couchbase Server (a.k.a. Membase)

Simple. Fast. Elastic. NoSQL.
Couchbase automatically distributes data across commodity servers. Built-in caching enables
apps to read and write data with sub-millisecond latency. And with no schema to
manage, Couchbase effortlessly accommodates changing data management requirements.
Couchbase Server Is The Complete Solution


 ✔ Easy Scalability
   One-click scalability and no app changes.

 ✔ Consistent High Performance
   Sub-millisecond latency with high throughput for reads and writes.

 ✔ Always On 24x365
   Maintenance, upgrades and cluster resizing all online without application downtime.

 ✔ Flexible Data Model
   JSON document model with no fixed schema.
Use Case Examples

Web app or Use-case                       Couchbase Solution                                   Example Customer
Content and Metadata Management System    Couchbase document store + Elastic Search            McGraw-Hill…
Social Game or Mobile App                 Couchbase stores game and player data                Zynga, OMGPOP…
Ad Targeting                              Couchbase stores user information for fast access    AOL…
User Profile Store                        Couchbase Server as a key-value store                TuneWiki…
Session Store                             Couchbase Server as a key-value store                Concur…
High Availability Caching Tier            Couchbase Server as a memcached tier replacement     Orbitz…
Chat/Messaging Platform                   Couchbase Server                                     DOCOMO…
#1 reason for users to move to NoSQL
PERFORMANCE
PREDICTABLE LATENCY




Key results of Cisco and Solarflare Benchmark

Couchbase Server demonstrates

• Consistent sub-millisecond
  latency for mixed workload

• High throughput

• Linear scalability



     http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf
Your secret weapon: Sub-millisecond AND consistent latency
[Chart: latency (microseconds) vs. object size (bytes)]

Consistently low latencies in microseconds for varying document sizes with a mixed workload
Your secret weapon: Linear scalability

High throughput with 1.4 GB/sec data transfer rate using 4 servers

[Chart: operations per second vs. number of servers in cluster, showing linear throughput scalability]
Write Performance Comparison
Insert/update latencies vs. throughput

[Chart: 95th percentile latency (ms, 0 to 30) vs. operations per second (0 to 22,000) for MongoDB, Cassandra, and Couchbase]

http://altoros.com/nosql_databases_for_interactive_applications.html
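The chart reports 95th percentile latency, the value that 95% of requests stay at or below. A minimal sketch of that calculation follows, using the nearest-rank method. The choice of method and the sample data are assumptions made here; the benchmark's own methodology is not stated on the slide.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p95 = percentile_nearest_rank(latencies_ms, 95)
```

A tail percentile like this is a better guide to predictable latency than the mean, because a few slow requests barely move the mean but show up directly in the tail.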
SCALE




Draw Something by OMGPOP




50 Million Users in 50 Days
[Chart: Draw Something by OMGPOP, daily active users (millions, 0 to 16), 2/6 to 3/21]
Game Data Went Non-Linear
[Chart: Draw Something by OMGPOP, daily active users (millions, 0 to 16), 2/6 to 3/21]

By March 29:
• 30 million downloads
• 3,000+ drawings/second
• 2 billion drawings
• 105,000 TPS
• 3.3 TB data stored
In Contrast: The Simpsons Tapped Out
[Chart: The Simpsons: Tapped Out, daily active users (millions, 0 to 16), 2/6 to 3/21. Annotations: "EA Launches The Simpsons Tapped Out"; "#2 Free app on iPad"; "#3 Free app on iPhone"]
ALWAYS ONLINE




Partitioning The Data – vbucket (internal partitions) map




Basic Operation – scale out

• Docs distributed evenly across servers in the cluster
• Each server stores both active & replica docs
   – Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
   – App never needs to know
• App reads, writes, updates docs
• Multiple App Servers can access same document at same time

[Diagram: APP SERVER 1 and APP SERVER 2 each hold a COUCHBASE CLIENT LIBRARY with a CLUSTER MAP and send Read/Write/Update requests to the COUCHBASE SERVER CLUSTER]

                  SERVER 1         SERVER 2         SERVER 3
   Active Docs    Doc 5, 2, 9      Doc 4, 7, 8      Doc 1, 3, 6
   Replica Docs   Doc 4, 1, 8      Doc 6, 3, 2      Doc 7, 9, 5

User Configured Replica Count = 1                                                                                                           51
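
The routing role of the cluster map can be illustrated with a small sketch. Everything named here (`NUM_PARTITIONS`, `hashKey`, `buildClusterMap`, `locate`, the server names) is a hypothetical illustration, and the hash is a simple stand-in, not the hash Couchbase actually uses. It only shows the idea that the client library turns a key into a partition and then looks up the server, so the app never needs to know where a doc lives.

```javascript
// Hypothetical sketch: route a document key to a server through a cluster map.
// NUM_PARTITIONS, hashKey and the map layout are illustrative assumptions.
const NUM_PARTITIONS = 8;

// Simple deterministic string hash (a stand-in, not Couchbase's real hash).
function hashKey(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0;
  }
  return h;
}

// Assign each partition an active server and a different replica server.
function buildClusterMap(servers) {
  const map = [];
  for (let p = 0; p < NUM_PARTITIONS; p++) {
    const active = servers[p % servers.length];
    const replica = servers[(p + 1) % servers.length];
    map.push({ active, replica });
  }
  return map;
}

// Client-side lookup: key -> partition -> active and replica server.
function locate(clusterMap, key) {
  const partition = hashKey(key) % clusterMap.length;
  return { partition, ...clusterMap[partition] };
}

const clusterMap = buildClusterMap(["server1", "server2", "server3"]);
console.log(locate(clusterMap, "doc5"));
```

Because the lookup is deterministic, every app server holding the same map sends requests for a given key to the same server.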
Add Nodes

• Two servers added to cluster
   – One-click operation
• Docs automatically rebalanced across cluster
   – Even distribution of docs
   – Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger # of servers

[Diagram: APP SERVER 1 and APP SERVER 2 (COUCHBASE CLIENT LIBRARY, CLUSTER MAP) send Read/Write/Update requests to a COUCHBASE SERVER CLUSTER that grows from SERVER 1-3 to SERVER 1-5; active and replica docs are redistributed onto SERVER 4 and SERVER 5]

User Configured Replica Count = 1                                                                                                     52
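
One way to picture "even distribution" with "minimum doc movement" is the sketch below. It is a simplified illustration under stated assumptions, not Couchbase's rebalance algorithm: `rebalance`, the `assignment` array (partition index to server name), and the server names are all hypothetical.

```javascript
// Hypothetical sketch of rebalancing with minimal movement.
// `assignment[i]` is the server that currently owns partition i.
function rebalance(assignment, servers) {
  const target = Math.ceil(assignment.length / servers.length);
  const load = new Map(servers.map((s) => [s, 0]));
  const result = assignment.slice();
  const toMove = [];

  // First pass: leave partitions where they are until a server is full.
  result.forEach((server, partition) => {
    if (load.has(server) && load.get(server) < target) {
      load.set(server, load.get(server) + 1);
    } else {
      toMove.push(partition);
    }
  });

  // Second pass: hand each surplus partition to the least-loaded server.
  for (const partition of toMove) {
    let best = servers[0];
    for (const s of servers) {
      if (load.get(s) < load.get(best)) best = s;
    }
    result[partition] = best;
    load.set(best, load.get(best) + 1);
  }
  return result;
}

const before = ["s1", "s2", "s3", "s1", "s2", "s3", "s1", "s2", "s3", "s1"];
const after = rebalance(before, ["s1", "s2", "s3", "s4", "s5"]);
const moved = after.filter((s, i) => s !== before[i]).length;
console.log(after, "moved:", moved);
```

In this example only the partitions that exceed each old server's fair share move to the two new servers; the rest stay where they were.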
Fail Over Node

• App servers happily accessing docs on Server 3
• Server fails
• App server requests to server 3 fail
• Cluster detects server has failed
   – Promotes replicas of docs to active
   – Updates cluster map
• App server requests for docs now go to appropriate server
• Typically rebalance would follow

[Diagram: APP SERVER 1 and APP SERVER 2 (COUCHBASE CLIENT LIBRARY, CLUSTER MAP) in front of a five-server COUCHBASE SERVER CLUSTER; SERVER 3 fails and replicas of its active docs held on other servers are promoted to active]

User Configured Replica Count = 1                                                                                                                        53
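
The failover steps (promote replicas to active, update the cluster map) can be sketched as follows. This is an illustrative model only: the `failOver` function and the `{ active, replica }` map layout are assumptions made for the example, with a replica count of 1 as on the slide.

```javascript
// Hypothetical sketch of failover: promote replicas and update the cluster map.
// Each entry describes one partition with one active and one replica server.
function failOver(clusterMap, failedServer) {
  return clusterMap.map((entry) => {
    if (entry.active === failedServer) {
      // The replica copy becomes the active copy. The replica slot stays
      // empty until a later rebalance re-creates it.
      return { active: entry.replica, replica: null };
    }
    if (entry.replica === failedServer) {
      // The active copy is unaffected, but its replica is gone.
      return { active: entry.active, replica: null };
    }
    return { ...entry };
  });
}

const clusterMap = [
  { active: "server1", replica: "server2" },
  { active: "server2", replica: "server3" },
  { active: "server3", replica: "server1" },
];
const updated = failOver(clusterMap, "server3");
console.log(updated);
```

The empty replica slots after failover are why a rebalance would typically follow.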
New in Couchbase Server 2.0

• JSON support
• Indexing and Querying
• Incremental Map Reduce
• Cross data center replication

                                                           54
Additional Couchbase Server Features


  Append-only storage layer

  Online compaction

  Better working set management

  Reduce server warm-up time

  Monitoring and admin API & UI

  SDKs, documentation and examples for a variety of languages


                                                                55
Couchbase Server 2.0 Architecture

Ports
   8092            Couch View (Couch API)
   11211           Memcapable 1.0 (Moxi)
   11210           Memcapable 2.0 (Memcached Interface)
   8091            HTTP (REST management API/Web UI)
   4369            Erlang port mapper
   21100 - 21199   Distributed Erlang

Data Manager (CouchBase)
   • Memcached Interface and Couch API
   • Couchbase EP Engine: hash table cache, write/replica queues
   • Storage interface
   • CouchStore with auto compaction
   • Distributed indexing

Cluster Manager (Membase, Erlang/OTP)
   • On each node: heartbeat, process monitor, configuration manager
   • One per cluster: global singleton supervisor, rebalance orchestrator,
     node health monitor, vBucket state and replication manager

                                                                                                                                                                                                                                                                        56
Indexing and querying

• Built-in incremental map reduce

• Map functions are written in JavaScript and executed
  on Google’s V8 engine

• Index is built incrementally as mutations stream in

• Query in a scatter/gather fashion



                                                          59
Map function
• Map functions
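function (doc) {
  // Emit the most specific location key the document supports.
  // Fields are combined with && so every listed field must be present
  // (a comma-separated condition would only test the last field).
  if (doc.country && doc.state && doc.city) {
    emit([doc.country, doc.state, doc.city], 1);
  } else if (doc.country && doc.state) {
    emit([doc.country, doc.state], 1);
  } else if (doc.country) {
    emit([doc.country], 1);
  }
}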




        REST call: http://db1.couchbase.com:8092/beer-sample/_design/dev_beer/_view/by_location?limit=10
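
To see what such a map function produces, it can be run outside the server against a few sample documents. The harness below is a hypothetical sketch: the `emit` stub, the `rows` array and the sample documents are assumptions for illustration, and the conditions are written with `&&` so that each listed field must be present.

```javascript
// Minimal harness: collect the rows produced by emit(), as a view index would.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: emit the most specific location key a document supports.
function map(doc) {
  if (doc.country && doc.state && doc.city) {
    emit([doc.country, doc.state, doc.city], 1);
  } else if (doc.country && doc.state) {
    emit([doc.country, doc.state], 1);
  } else if (doc.country) {
    emit([doc.country], 1);
  }
}

// Hypothetical sample documents with varying levels of location detail.
const docs = [
  { name: "A", country: "United States", state: "California", city: "San Francisco" },
  { name: "B", country: "Belgium", state: "Hainaut" },
  { name: "C", country: "Germany" },
  { name: "D" }, // no location fields, so nothing is emitted
];

docs.forEach(map);
console.log(rows);
```

Each emitted row pairs an array key with the value 1, which is what makes the built-in count and sum reductions useful on this view.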
                                                                                                           60
Reduce functions

• Built in reduce functions
   • _count
   • _sum
   • _stats ({“sum”: 1411, “count”: 1411, “min”: 1, “max”: 1, “sumsqr”:1411})
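
What the built-in reductions compute can be sketched in plain JavaScript. The functions `count`, `sum` and `stats` below are hypothetical stand-ins that only mirror the arithmetic; they are not the server's implementation and ignore re-reduce handling. The input assumes 1411 emitted values of 1, which is consistent with the `_stats` output quoted above.

```javascript
// Stand-ins for the arithmetic behind _count, _sum and _stats.
function count(values) {
  return values.length;
}

function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

function stats(values) {
  return {
    sum: sum(values),
    count: count(values),
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((acc, v) => acc + v * v, 0),
  };
}

// Assumed input: a map function that emitted the value 1 for 1411 documents.
const values = new Array(1411).fill(1);
console.log(count(values), sum(values), stats(values));
```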



• Developing procedure
   • Develop against a subset of the data
   • Build the index on the entire cluster
   • Promote a dev_ view to production



                                                                                61
Indexing and Querying

• Indexing work is distributed amongst nodes
   – Large data set possible
   – Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes

[Diagram: APP SERVER 1 and APP SERVER 2 (COUCHBASE CLIENT LIBRARY, CLUSTER MAP); a Query is sent to the cluster and a Response is returned]

                  SERVER 1         SERVER 2         SERVER 3
   Active Docs    Doc 5, 2, 9      Doc 4, 7, 8      Doc 1, 3, 6
   Replica Docs   Doc 4, 1, 8      Doc 6, 3, 2      Doc 7, 9, 5

User Configured Replica Count = 1                                                                                          62
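
The scatter/gather query pattern can be sketched as below. This is a simplified model under stated assumptions: `nodeIndexes`, `queryNode` and `query` are hypothetical names, each node's index is a small sorted array, and string comparison stands in for the real key collation.

```javascript
// Each node holds index rows only for the docs stored on it, sorted by key.
const nodeIndexes = {
  server1: [{ key: "Belgium", id: "doc5" }, { key: "Germany", id: "doc2" }],
  server2: [{ key: "Canada", id: "doc4" }, { key: "United States", id: "doc7" }],
  server3: [{ key: "Australia", id: "doc1" }, { key: "France", id: "doc3" }],
};

// Scatter: every node answers from its local index, at most `limit` rows.
function queryNode(rows, limit) {
  return rows.slice(0, limit);
}

// Gather: merge the partial results, order them by key, then apply the limit.
function query(indexes, limit) {
  const partials = Object.values(indexes).map((rows) => queryNode(rows, limit));
  const merged = [].concat(...partials);
  merged.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return merged.slice(0, limit);
}

const result = query(nodeIndexes, 3);
console.log(result);
```

Each node only needs to return its first `limit` rows, because no row beyond that could appear in the merged top `limit`.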
Cross Data Center Replication



   US DATA                        EUROPE DATA                 ASIA DATA
   CENTER                           CENTER                     CENTER
             Replication                        Replication


                           Replication




   Data close to users
   Multiple locations for disaster recovery
   Independently managed clusters serving local data

                                                                          63
XDCR: Cross Data Center Replication

• Replicate your Couchbase data across clusters
• Clusters may be spread across geos
• Configured on a per-bucket basis
• Supports unidirectional and bidirectional operation
• Application can read and write from both clusters
  (active – active replication)
• Scales out linearly
• Different from intra-cluster replication



                                                        64
Intra-cluster Replication




                            65
Cross Datacenter Replication (XDCR)




                                      66
Elastic Search integration

• Use the cross data center interface
• Agnostic to topology changes
• De-duplication
• Effective changes feed of the entire cluster

[Diagram: COUCHBASE SERVER CLUSTER (SERVER 1-3, each with active and replica docs) feeds the CROSS DATA CENTER CONNECTOR; the changes feed is consumed by an Elastic Search cluster, or any other consumer]

    http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search

User Configured Replica Count = 1                                                                                                   67
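
The de-duplication point can be illustrated with a short sketch: when a doc is changed several times before a batch is shipped, only its latest change needs to reach the consumer. The `deduplicate` function and the mutation shape (`key`, `seq`, `value`) are assumptions made for this example and do not describe the connector's actual wire format.

```javascript
// Collapse a batch of mutations so each key appears once, with its latest change.
function deduplicate(mutations) {
  const latest = new Map();
  for (const m of mutations) {
    const seen = latest.get(m.key);
    if (!seen || m.seq > seen.seq) {
      latest.set(m.key, m);
    }
  }
  // Preserve the order in which the surviving changes happened.
  return [...latest.values()].sort((a, b) => a.seq - b.seq);
}

// Hypothetical batch: doc1 is updated twice, doc2 once.
const batch = [
  { key: "doc1", seq: 1, value: "v1" },
  { key: "doc2", seq: 2, value: "v1" },
  { key: "doc1", seq: 3, value: "v2" },
];
const feed = deduplicate(batch);
console.log(feed);
```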
Couchbase and Hadoop Integration
• Support large-scale analytics on application data by streaming data
  from Couchbase to Hadoop
   – Real-time integration using Flume
   – Batch integration using Sqoop
• Examples
   – Various game statistics (e.g., monthly / daily / hourly rankings)
   – Analyze game patterns from users to enhance various game metrics



[Diagram: Sqoop connects over TAP to the memcached protocol listener/sender, which sits on the engine interface above the Couchbase Storage Engine]

                                                                                        68
Couchbase Client SDKs

• Java Client SDK
• .Net SDK
• PHP SDK
• Ruby SDK
• Python SDK

[Diagram: User Code, on top of the Java client API, on top of the spymemcached connection and the HTTP couchDB connection, on top of Couchbase Server]

CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein");
// this is all the same as before
cb.set("hello", 0, "world");
cb.get("hello");
// keys is a Collection<String>
Map<String, Object> manyThings = cb.getBulk(keys);
// accessing a view
View view = cb.getView("design_document", "my_view");
Query query = new Query();
query.setRange("abegin", "theend");

              http://www.couchbase.com/develop
                                                                                               69
THANK YOU

          COUCHBASE
  SIMPLE, FAST, ELASTIC NOSQL

sharon@couchbase.com
@sharonyb




                                70

Couchbase at the academic bisilim, Turkey

  • 1. 1
  • 2. Distributed Document database Sharon Barr VP of Engineering 2
  • 3. Couchbase NoSQL Leadership Leading NoSQL database company Open Source development & business model Behind Couchbase open source project Document-oriented NoSQL database Focused on interactive internet and mobile applications Provide more flexible, higher performance, more scalable database than relational alternative Most mature, reliable and widely deployed solution >5,000 paid production deployments worldwide, over 350 customers Headquarters in Silicon Valley (Mountain View, CA) ~100 employees including >60 in engineering/product >80% of commits to Couchbase, memcached, Apache CouchDB 3
  • 4. Agenda What is a Document database The document model Couchbase Server Couchbase nosql solution 5
  • 5. The evolving database landscape Matthew Aslett – 451 group – Dec 2012 6
  • 6. Where does document database fits? NoSql Graph database Analytics Transaction processing New SQL Caching As-a-service None Relational Key Value database Relational Apliances Document database 7
  • 7. 2 major types of data management systems OLTP / ODS Analytics (OLAP) / EDW 8
  • 8. Evolution OLAP (Analytics) Relational NewSQL Non - Relational NoSQL OLTP (Transactional) Database as a service 9
  • 9. Evolution – NoSQL database types OLAP (Analytics) Relational NewSQL Non - Relational NoSQL: KV/Document/Graph OLTP (Transactional) Database as a service 10
  • 10. The evolving database landscape Matthew Aslett – 451 group – Nov 2012 11
  • 11. NoSQL catalog Cache (memory only): memcached (key-value), redis (data structure). Database (memory/disk): membase (key-value); couchbase, couchDB, mongoDB (document); cassandra (column family); Neo4j (graph) 12
  • 12. Survey: The leading driver for NoSQL adoption What is the biggest data management problem driving your use of NoSQL in the coming year? Lack of flexibility/rigid schemas 49% Inability to scale out data 35% High latency/low performance 29% Costs 16% All of these 12% Other 11% Source: Couchbase NoSQL Survey, December 2011, n=1351 13
  • 13. FLEXIBLE SCHEMA COMPARING DATA MODELS 14
  • 14. Key Value vs. Document database Pure Key-Value Database (Couchbase Server 1.8, current release): 10101001010100 100011110101100 010100010100011 110011000101010 010010010011001 101010100100011 101010101001010 Document Database (Couchbase Server 2.0, adds indexing/querying): { “ID”: 1, “FIRST”: “Frank”, “LAST”: “Weigel”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” } Both Key-Value & Document Use-Cases Supported 15
  • 15. Relational vs Document Data Model Relational data model: highly-structured table organization with rigidly-defined data formats and record structure. Document data model: collection of complex documents with arbitrary, nested data formats and varying “record” format. 16
  • 16. RDBMS Example: User Profile User Info (KEY, First, Last, ZIP_id): 1, Frank, Weigel, 2; 2, Ali, Dodson, 2; 3, Mark, Azad, 2; 4, Steve, Yen, 3. Address Info (ZIP_id, CITY, STATE, ZIP): 1, DEN, CO, 30303; 2, MV, CA, 94040; 3, CHI, IL, 60609; 4, NY, NY, 10010. To get info about a specific user, you perform a join across two tables 17
  • 17. Document Example: User Profile { “ID”: 1, “FIRST”: “Frank”, “LAST”: “Weigel”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA” } JSON All data in a single document 18
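A minimal JavaScript sketch of the contrast between the two user-profile examples, using rows from the User Info and Address Info tables. The helper name `joinUserWithAddress`, the in-memory `Map` standing in for a bucket, and the key `user::1` are illustrative assumptions, not Couchbase API.

```javascript
// Rows taken from the "User Info" and "Address Info" tables in the RDBMS example.
const users = [
  { KEY: 1, First: "Frank", Last: "Weigel", ZIP_id: 2 },
  { KEY: 2, First: "Ali", Last: "Dodson", ZIP_id: 2 },
];
const addresses = [
  { ZIP_id: 1, CITY: "DEN", STATE: "CO", ZIP: "30303" },
  { ZIP_id: 2, CITY: "MV", STATE: "CA", ZIP: "94040" },
];

// Hypothetical helper: emulates the join the relational model needs
// to assemble one user's profile from the two tables.
function joinUserWithAddress(userKey) {
  const user = users.find((u) => u.KEY === userKey);
  if (!user) return undefined;
  const addr = addresses.find((a) => a.ZIP_id === user.ZIP_id);
  if (!addr) return undefined;
  return {
    ID: user.KEY,
    FIRST: user.First,
    LAST: user.Last,
    ZIP: addr.ZIP,
    CITY: addr.CITY,
    STATE: addr.STATE,
  };
}

// In the document model the same profile is stored already aggregated,
// so reading it is a single lookup by key (the key format is assumed).
const bucket = new Map();
bucket.set("user::1", {
  ID: 1, FIRST: "Frank", LAST: "Weigel", ZIP: "94040", CITY: "MV", STATE: "CA",
});

const viaJoin = joinUserWithAddress(1);
const viaLookup = bucket.get("user::1");
console.log(viaJoin, viaLookup);
```

Both paths yield the same profile; the difference is that the document version needs one keyed read and no second table.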
  • 18. Making a Change Using RDBMS [Diagram: a User Table (User ID, First, Last, Zip, Country ID) linked to a Photo Table (User ID, Photo ID, Comment, Country ID), a Status Table (User ID, Status ID, Text, Country ID), an Affiliations Table (User ID, Affl ID, Affl Name, Country ID) and a Country Table (Country ID, Country name); sample rows omitted] 19
  • 19. Making the Same Change With Couchbase { “ID”: 1, “FIRST”: “Frank”, “LAST”: “Weigel”, “ZIP”: “94040”, “CITY”: “MV”, “STATE”: “CA”, “STATUS”: { “TEXT”: “At Conf”, “GEO_LOC”: “134” }, “COUNTRY”: “USA” } JSON Just add information to a document 20
  • 20. Document Databases • Each record in the database is a self-describing document • Each document has an independent structure • Documents can be complex • All databases require a unique key • Documents are stored using JSON or XML or their derivatives • Database can look into the documents • Content can be indexed and queried Example: { “UUID”: “21f7f8de-8051-5b89-86…, “Time”: “2011-04-01T13:01:02.42…, “Server”: “A2223E”, “Calling Server”: “A2213W”, “Type”: “E100”, “Initiating User”: “dsallings@spy.net”, “Details”: { “IP”: “10.1.1.22”, “API”: “InsertDVDQueueItem”, “Trace”: “cleansed”, “Tags”: [ “SERVER”, “US-West”, “API” ] } } 21
  • 21. Document database • JSON objects • Each document has an independent schema Brewery document: { "_id": "brewery_Cleveland_ChopHouse_and_Brewery", "_rev": "1-00000061480b50910000000000000000", "city": "Cleveland", "updated": "2010-07-22 20:00:20", "code": "44113", "name": "Cleveland ChopHouse and Brewery", "country": "United States", "phone": "1-216-623-0909", "state": "Ohio", "address": [ "824 West St.Clair Avenue" ], "geo": { "loc": [ "-81.6994", "41.4995" ], "accuracy": "ROOFTOP" }, "$expiration": 0, "$flags": 0 } Beer document: { "_id": "beer_Double_Cream_Oatmeal_Stout", "_rev": "1-0000042ee19241b60000000000000000", "category": "North American Ale", "style": "American-Style Stout", "name": "Double Cream Oatmeal Stout", "updated": "2010-07-22 20:00:20", "brewery": "Olde Peninsula Brewpub and Restaurant", "$expiration": 0, "$flags": 0 } 22
  • 22. Document modeling Q: • Are these separate objects in the model layer? • Are these objects accessed together? • Do you need updates to these objects to be atomic? • Are multiple people editing these objects concurrently? When considering how to model data for a given application • Think of a logical container for the data • Think of how data groups together 23
  • 23. Document Design Options • One document that contains all related data – Data is de-normalized – Better performance and scale – Eliminate client-side joins • Separate documents for different object types with cross references – Data duplication is reduced – Objects may not be co-located – Transactions supported only on a document boundary – Most document databases do not support joins or multi document transactions 24
  • 24. Document ID / Key selection • Similar to primary keys in relational databases • Documents are sharded based on the document ID • ID-based document lookup is extremely fast • Usually an ID can only appear once in a bucket Q: • Do you have a unique way of referencing objects? • Are related objects stored in separate documents? Options: • UUIDs, date-based IDs, numeric IDs • Hand-crafted (human readable) • Matching prefixes (for multiple related objects) 25
  • 25. Example: Entities for a Blog • User profile: the main pointer into the user data (blog entries; badge settings, like a twitter badge) • Blog posts: contains the blogs themselves • Blog comments: comments from other users 26
  • 26. Blog Document – Option 1 – Single document { “_id”: “Hello_World”, “author”: “John Smith”, “type”: “post”, “title”: “Hello World”, “format”: “markdown”, “body”: “Hello from [Couchbase](http://couchbase.com).”, “html”: “<p>Hello from <a href=“http: …, “comments”: [ [“format”: “markdown”, “body”: “Awesome post!”], [“format”: “markdown”, “body”: “Like it.”] ] } 27
  • 27. Threaded Comments • You can imagine how to take this to a threaded list List First Reply to comment Blog List comment More Comments Advantages • Only fetch the data when you need it • For example, rendering part of a web page • Spread the data and load across the entire cluster 28
  • 28. Blog Document – Option 2 - Split into multiple docs BLOG DOC: { “_id”: “Hello_World”, “author”: “John Smith”, “type”: “post”, “title”: “Hello World”, “format”: “markdown”, “body”: “Hello from [Couchbase](http://couchbase.com).”, “html”: “<p>Hello from <a href=“http: …, “comments”: [ “comment1_Hello_World” ] } COMMENT: { “_id”: “comment1_Hello_World”, “format”: “markdown”, “body”: “Awesome post!” } 29
  • 29. Example 2 – Different object types [Serializable] public class User { public long ID; public string Name; [NonSerialized] public List<User> Buddies; [NonSerialized] public List<Messages> Messages; [NonSerialized] public Dictionary<Game, List<Bet>> BetsByGame; } Stored as key-value pairs: User: User_1234 → 1234;Cheli; Buddies: User_1234_Buddies → User_5678, User_9876. Messages (Expire → 9/9/9999): User_1234_Messages → Message_1234, Message_5678. BetsByGame: User_1234_BetsByGame → User_1234_BetsByGame_1, User_1234_BetsByGame_2; User_1234_BetsByGame_1 → Bet_1234, Bet_2345; User_1234_BetsByGame_2 → Bet_9876 30
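A minimal JavaScript sketch of how an application might read the split blog documents: the post stores only comment IDs, and the application resolves them client-side. The in-memory `Map` standing in for a bucket and the helper `loadPostWithComments` are hypothetical, chosen only to illustrate the pattern.

```javascript
// In-memory stand-in for a bucket; the documents mirror the slide's example.
const bucket = new Map([
  ["Hello_World", {
    _id: "Hello_World",
    author: "John Smith",
    type: "post",
    title: "Hello World",
    format: "markdown",
    body: "Hello from [Couchbase](http://couchbase.com).",
    comments: ["comment1_Hello_World"], // references, not embedded objects
  }],
  ["comment1_Hello_World", {
    _id: "comment1_Hello_World",
    format: "markdown",
    body: "Awesome post!",
  }],
]);

// Hypothetical helper: a client-side "join" that fetches the post, then
// resolves at most `limit` referenced comments with separate key lookups.
function loadPostWithComments(postId, limit = Infinity) {
  const post = bucket.get(postId);
  if (!post) return undefined;
  const comments = post.comments
    .slice(0, limit)
    .map((id) => bucket.get(id))
    .filter((c) => c !== undefined);
  return { ...post, comments };
}

const full = loadPostWithComments("Hello_World");
const headerOnly = loadPostWithComments("Hello_World", 0);
console.log(full.comments.length, headerOnly.comments.length);
```

The `limit` parameter reflects the motivation for splitting: a page can fetch only the comments it is about to render, at the cost of extra lookups.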
  • 31. Relational Technology Scales Up Application Scales Out Just add more commodity web servers System Cost Application Performance Web/App Server Tier Users RDBMS Scales Up Get a bigger, more complex server System Cost Application Performance Won’t scale beyond this point Relational Database Users Expensive and disruptive sharding, doesn’t perform at web scale 32
  • 32. Couchbase Server Scales Out Like App Tier Application Scales Out Just add more commodity web servers System Cost Application Performance Web/App Server Tier Users NoSQL Database Scales Out Cost and performance mirrors app tier System Cost Application Performance Couchbase Distributed Data Store Users Scaling out flattens the cost and performance curves 33
  • 33. Couchbase Server (a.k.a. Membase) Simple. Fast. Elastic. NoSQL. Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements. 34
  • 34. Couchbase Server Is The Complete Solution ✔ Easy Scalability: one-click scalability and no app changes. ✔ Consistent High Performance: sub-millisecond latency with high throughput for reads and writes. ✔ Always On 24x365: maintenance, upgrades and cluster resizing all online without application downtime. ✔ Flexible Data Model: JSON document model with no fixed schema. 35
  • 35. Use Case Examples (web app or use case: Couchbase solution; example customer) Content and Metadata Management System: Couchbase document store + Elastic Search; McGraw-Hill… Social Game or Mobile App: Couchbase stores game and player data; Zynga, OMGPOP… Ad Targeting: Couchbase stores user information for fast access; AOL… User Profile Store: Couchbase Server as a key-value store; TuneWiki… Session Store: Couchbase Server as a key-value store; Concur… High Availability Caching Tier: Couchbase Server as a memcached tier replacement; Orbitz… Chat/Messaging Platform: Couchbase Server; DOCOMO… 37
  • 36. #1 reason for users to move to NoSQL 38
  • 38. Key results of Cisco and Solarflare Benchmark Couchbase Server demonstrates • Consistent sub-millisecond latency for mixed workload • High throughput • Linear scalability http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf 40
  • 39. Your secret weapon: Sub-millisecond AND consistent latency Latency (micro seconds) Consistently low latencies in microseconds for varying documents sizes with a mixed workload Object size (Bytes) 41
  • 40. Your secret weapon: Linear scalability High throughput with 1.4 GB/sec data transfer rate using 4 servers Operations per second Linear throughput scalability Number of servers in cluster 42
  • 41. Write Performance Comparison Insert/update latencies vs. throughput [Chart: 95th percentile latency (ms, 0 to 30) against operations per second (0 to 22,000) for Mongodb, Cassandra and Couchbase] http://altoros.com/nosql_databases_for_interactive_applications.html 43
  • 42. SCALE 44
  • 43. Draw Something by OMGPOP 45
  • 44. 50 Million Users in 50 Days Draw Something by OMGPOP [Chart: daily active users (millions, 2 to 16) from 2/6 to 3/21] 46
  • 45. Game Data Went Non-Linear Draw Something by OMGPOP [Chart: daily active users (millions, 2 to 16) from 2/6 to 3/21] By March 29: • 30 million downloads • 3,000+ drawings/second • 2 billion drawings • 105,000 TPS • 3.3 TB data stored 47
  • 46. In Contrast: The Simpsons Tapped Out [Chart: daily active users (millions, 2 to 16) from 2/6 to 3/21, annotated: EA launches The Simpsons Tapped Out; #2 free app on iPad, #3 free app on iPhone] 48
  • 48. Partitioning The Data – vbucket (internal partitions) map 50
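A simplified JavaScript sketch of key-to-partition mapping. It assumes a CRC32 hash taken modulo 1024 (the Editor's Notes mention a fixed 1024 shards) and a round-robin vBucket-to-server table; the real client library's hash details and map layout may differ, and all function names here are hypothetical.

```javascript
const NUM_VBUCKETS = 1024; // fixed number of partitions, per the Editor's Notes

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320) over UTF-8 bytes.
function crc32(str) {
  let crc = 0xffffffff;
  for (const byte of Buffer.from(str, "utf8")) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// A key always hashes to the same vBucket, regardless of cluster size.
function vbucketFor(key) {
  return crc32(key) % NUM_VBUCKETS;
}

// Hypothetical cluster map: entry i names the server holding vBucket i.
// Round-robin assignment is an assumption made to keep the sketch short.
function buildClusterMap(servers) {
  return Array.from({ length: NUM_VBUCKETS }, (_, vb) => servers[vb % servers.length]);
}

// The client resolves a key to a server with two table lookups and no network hop.
function serverFor(key, clusterMap) {
  return clusterMap[vbucketFor(key)];
}

const threeNodes = buildClusterMap(["server1", "server2", "server3"]);
const fiveNodes = buildClusterMap(["server1", "server2", "server3", "server4", "server5"]);
console.log(vbucketFor("user::1"), serverFor("user::1", threeNodes), serverFor("user::1", fiveNodes));
```

Because the key-to-vBucket step never changes, adding servers only requires a new cluster map. The round-robin rebuild used here moves more partitions than a real rebalance would, so it does not model the "minimum doc movement" property.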
  • 49. Basic Operation – scale out • Docs distributed evenly across servers in the cluster • Each server stores both active & replica docs • Only one server active at a time • Client library provides app with simple interface to database • Cluster map provides map to which server doc is on • App never needs to know • App reads, writes, updates docs • Multiple App Servers can access same document at same time [Diagram: App Servers 1 and 2, each with a Couchbase client library and cluster map, read/write/update against a three-server Couchbase cluster. Server 1: active Docs 5, 2, 9; replica Docs 4, 1, 8. Server 2: active Docs 4, 7, 8; replica Docs 6, 3, 2. Server 3: active Docs 1, 3, 6; replica Docs 7, 9, 5. User Configured Replica Count = 1] 51
  • 50. Add Nodes • Two servers added to cluster • One-click operation • Docs automatically rebalanced across cluster • Even distribution of docs • Minimum doc movement • Cluster map updated • App database calls now distributed over larger # of servers [Diagram: the cluster grows from three to five servers; some active and replica docs move from Servers 1 to 3 onto Servers 4 and 5. User Configured Replica Count = 1] 52
  • 51. Fail Over Node • App servers happily accessing docs on Server 3 • Server fails • App server requests to server 3 fail • Cluster detects server has failed • Promotes replicas of docs to active • Updates cluster map • App server requests for docs now go to appropriate server • Typically rebalance would follow [Diagram: five-server cluster in which Server 3 fails and replicas of its docs on the other servers become active. User Configured Replica Count = 1] 53
  • 52. New in Couchbase Server 2.0 • JSON support • Indexing and Querying • Incremental Map Reduce • Cross data center replication 54
  • 53. Additional Couchbase Server Features • Append-only storage layer • Online compaction • Better working set management • Reduce server warm-up time • Monitoring and admin API & UI • SDKs, documentation and examples for a variety of languages 55
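The failover steps can be sketched in JavaScript as a pure function over a cluster map. The map shape (one `active` and one `replica` server per partition, matching a replica count of 1) and the function name `failOver` are assumptions for illustration, not the cluster manager's actual data structures.

```javascript
// Hypothetical cluster map: one entry per partition, replica count = 1.
const before = [
  { active: "server1", replica: "server2" },
  { active: "server2", replica: "server3" },
  { active: "server3", replica: "server1" },
];

// Returns an updated map in which the failed server no longer appears:
// partitions it served are promoted to their replica, and partitions that
// only lost their replica keep their active copy until a rebalance.
function failOver(clusterMap, failedServer) {
  return clusterMap.map((entry) => {
    if (entry.active === failedServer) {
      return { active: entry.replica, replica: null }; // promote replica
    }
    if (entry.replica === failedServer) {
      return { active: entry.active, replica: null }; // replica lost
    }
    return entry;
  });
}

const after = failOver(before, "server3");
console.log(after);
```

Entries left with `replica: null` are why a rebalance typically follows: it recreates the missing replicas on the remaining servers.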
  • 54. Couchbase Server 2.0 Architecture 8092 11211 11210 Couch View Memcapable 1.0 Memcapable 2.0 Moxi REST management API/Web UI vBucket state and replication manager Memcached Interface Couch API Global singleton supervisor Rebalance orchestrator Configuration manager Node health monitor Process monitor Heartbeat Couchbase EP Engine Write/replica Hash table cache Data Manager Queues Cluster Manager Membase storage interface Distributed CouchStore Indexing Auto compaction http on each node one per cluster CouchBase Erlang/OTP HTTP Erlang port mapper Distributed Erlang 8091 4369 21100 - 21199 56
  • 55. Couchbase Server 2.0 Architecture 8092 11211 11210 Couch View Memcapable 1.0 Memcapable 2.0 Moxi REST management API/Web UI vBucket state and replication manager Memcached Interface Couch API Global singleton supervisor Rebalance orchestrator Configuration manager Node health monitor Process monitor Heartbeat Couchbase EP Engine Write/replica Hash table cache Queues Cluster Manager Membase storage interface Distributed CouchStore Indexing Auto compaction http on each node one per cluster CouchBase Erlang/OTP HTTP Erlang port mapper Distributed Erlang 8091 4369 21100 - 21199 57
  • 56. Couchbase Server 2.0 Architecture 8092 11211 11210 Couch View Memcapable 1.0 Memcapable 2.0 Moxi REST management API/Web UI vBucket state and replication manager Memcached Interface Couch API Global singleton supervisor Rebalance orchestrator Configuration manager Node health monitor Process monitor Heartbeat Couchbase EP Engine Hash table cache Write/replica Queues storage interface Distributed CouchStore Indexing Auto compaction http on each node one per cluster CouchBase Erlang/OTP HTTP Erlang port mapper Distributed Erlang 8091 4369 21100 - 21199 58
  • 57. Indexing and querying • Built-in incremental map reduce • Map functions are written in JavaScript (executed using Google’s V8 engine) • Index is built incrementally as mutations stream in • Query in a scatter/gather fashion 59
  • 58. Map function • Map functions function (doc) { if (doc.country && doc.state && doc.city) { emit([doc.country, doc.state, doc.city], 1); } else if (doc.country && doc.state) { emit([doc.country, doc.state], 1); } else if (doc.country) { emit([doc.country], 1); } } REST call: http://db1.couchbase.com:8092/beer-sample/_design/dev_beer/_view/by_location?limit=10 60
  • 59. Reduce functions • Built-in reduce functions • _count • _sum • _stats ({“sum”: 1411, “count”: 1411, “min”: 1, “max”: 1, “sumsqr”: 1411}) • Development procedure • Develop against a subset of the data • Build the index on the entire cluster • Promote a dev_ view to production 61
  • 60. Indexing and Querying • Indexing work is distributed amongst nodes • Large data set possible • Parallelize the effort • Each node has index for data stored on it • Queries combine the results from required nodes [Diagram: app servers send a query through the Couchbase client library and cluster map to a three-server cluster and receive a combined response. User Configured Replica Count = 1] 62
  • 61. Cross Data Center Replication US DATA CENTER, EUROPE DATA CENTER, ASIA DATA CENTER, with replication between them • Data close to users • Multiple locations for disaster recovery • Independently managed clusters serving local data 63
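To see what rows a map function like the one above produces, it can be run locally with a stand-in for `emit`. This JavaScript harness is only a sketch: the `indexDocs` helper is hypothetical, the conditions are joined with `&&` so each branch requires every field it names, and the third sample document is made up.

```javascript
// Hypothetical local harness: collects the rows a map function would emit.
function indexDocs(docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value }); // stand-in for the view engine's emit

  const map = function (doc) {
    if (doc.country && doc.state && doc.city) {
      emit([doc.country, doc.state, doc.city], 1);
    } else if (doc.country && doc.state) {
      emit([doc.country, doc.state], 1);
    } else if (doc.country) {
      emit([doc.country], 1);
    }
  };

  docs.forEach((doc) => map(doc));
  return rows;
}

const rows = indexDocs([
  // Fields taken from the brewery and beer sample documents in the deck.
  { _id: "brewery_Cleveland_ChopHouse_and_Brewery", country: "United States", state: "Ohio", city: "Cleveland" },
  { _id: "beer_Double_Cream_Oatmeal_Stout", style: "American-Style Stout" },
  // Made-up document with only a country, to exercise the last branch.
  { _id: "brewery_example", country: "Belgium" },
]);
console.log(rows);
```

The beer document emits nothing because it has no location fields, so it never appears in the by-location index.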
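What the three built-in reducers compute can be sketched in plain JavaScript. These functions are illustrative re-implementations over an array of emitted values, not the server's code, and they ignore the re-reduce step a distributed index needs.

```javascript
// Illustrative equivalents of _count, _sum and _stats over emitted values.
function count(values) {
  return values.length;
}

function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

function stats(values) {
  return {
    sum: sum(values),
    count: count(values),
    min: values.reduce((m, v) => Math.min(m, v), Infinity),
    max: values.reduce((m, v) => Math.max(m, v), -Infinity),
    sumsqr: values.reduce((acc, v) => acc + v * v, 0),
  };
}

// A map function that emits the value 1 for each of 1411 documents gives
// the figures shown in the _stats example.
const emitted = new Array(1411).fill(1);
const result = stats(emitted);
console.log(result);
```

With every value equal to 1, the sum, count and sum of squares all equal the number of emitted rows, which is why the example shows 1411 three times.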
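A scatter/gather query can be sketched in JavaScript as: ask every node for rows from its local index, merge, order by key, then apply the limit. The per-node rows below are made-up data, `scatterGather` is a hypothetical name, and plain string comparison stands in for the view engine's real key ordering.

```javascript
// Made-up rows as each node's local index might return them.
const perNodeRows = [
  [{ key: "Belgium", value: 1 }, { key: "United States", value: 1 }],
  [{ key: "Canada", value: 1 }],
  [{ key: "Germany", value: 1 }, { key: "Argentina", value: 1 }],
];

// Gather step: combine the partial results, order them by key, and
// truncate to the requested limit (like limit=10 on the REST call).
function scatterGather(nodeResults, limit) {
  const byKey = (a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0);
  return nodeResults.flat().sort(byKey).slice(0, limit);
}

const top3 = scatterGather(perNodeRows, 3);
console.log(top3.map((row) => row.key));
```

The limit is applied only after merging, because the first rows overall may come from any node.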
  • 62. XDCR: Cross Data Center Replication • Replicate your Couchbase data across clusters • Clusters may be spread across geos • Configured on a per-bucket basis • Supports unidirectional and bidirectional operation • Application can read and write from both clusters (active – active replication) • Scales out linearly • Different from intra-cluster replication 64
  • 65. Elastic Search integration • Use the cross data center interface • Agnostic to topology changes • De-duplication • Effective changes feed of the entire cluster [Diagram: Couchbase Server cluster (Servers 1 to 3, each holding active and replica docs; User Configured Replica Count = 1) feeding the CROSS DATA CENTER CONNECTOR] Changes feed to be consumed by Elastic Search cluster, or any other consumer http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search 67
  • 66. Couchbase and Hadoop Integration • Support large-scale analytics on application data by streaming data from Couchbase to Hadoop – Real-time integration using Flume – Batch integration using Sqoop • Examples – Various game statistics (e.g., monthly / daily / hourly rankings) – Analyze game patterns from users to enhance various game metrics memcached Sqoop TAP protocol listener/sender engine interface Couchbase Storage Engine 6 68
  • 67. Couchbase Client SDKs Java Client SDK, .Net SDK, PHP SDK, Ruby SDK, Python SDK [Diagram: User Code → Java client API → spymemcached / HTTP couchDB connection → Couchbase Server] CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein"); // this is all the same as before cb.set("hello", 0, "world"); cb.get("hello"); Map<String, Object> manyThings = cb.getBulk(Collection<String> keys); /* accessing a view */ View view = cb.getView("design_document", "my_view"); Query query = new Query(); query.getRange("abegin", "theend"); http://www.couchbase.com/develop 69
  • 68. THANK YOU COUCHBASE SIMPLE, FAST, ELASTIC NOSQL sharon@couchbase.com @sharonyb 70

Editor's Notes

  1. Partial listing of companies with paid production deployments. Thousands more use the open source version.
  2. Before we jump into evaluating NoSQL, let’s take a look at a sampling of NoSQL databases. Fundamentally, every NoSQL solution is a key-value store: a primary key identifies the record, and the value is just a blob. Document, column and graph databases add more functionality, like indexing and querying, which is what you would expect from a database. Many would argue that memcached, an in-memory key-value store, was the precursor to all NoSQL databases. Redis is also an in-memory key-value store, but with many more operations on lists and sets. On the database side, membase is an open source key-value store with persistence, replication for high availability, and high scalability through consistent sharding; it has a built-in object-level cache and is memcached compatible. Couchbase, a descendant of membase, is a document database that uses JSON as its data model. It is horizontally scalable, replicates data for high availability, and includes a built-in cache for high throughput and low latency; in addition, it embeds couchDB technology for indexing, querying and incremental map reduce. MongoDB stores BSON and has master-slave replication, auto-sharding, and ad-hoc query support that works best when used with indexes.
  3. We ran a survey late last year, and here are the results. We had about 1,300 respondents, and we tried to advertise it in a few different places to get a minimally biased set. Something in these results surprised us. The requirements for applications have changed over the years, particularly for interactive web apps. The need to support millions of users, in some cases within a matter of weeks, points to the need for a highly scalable data tier, so the scalability driver did not surprise us. What did surprise us was the schema flexibility requirement: the need for schema flexibility to rapidly create and push out updates to applications. In our first webinar, James Philips walked through the NoSQL taxonomy and a comparison of key-value, document-oriented and column-oriented databases. Most of these databases can scale out and don’t require schema definition. Today, I will focus on distributed document-oriented NoSQL database technology for the rest of this presentation. While biased, we at Couchbase believe that document-oriented databases give you the best balance between schema flexibility and performance, with MongoDB and Couchbase being the two most visible and widely adopted examples.
  4. Most of you are probably familiar with the table layout. A table is defined with a set of columns, and each record in the table conforms to the schema. If you wish to capture different data in the future, the table schema must be changed using the ALTER TABLE statement. Typically data is normalized to third normal form to reduce duplication: large tables are split into smaller tables linked by foreign keys.
  5. Example: a normalized schema with two tables, connected by foreign keys (links). To get information about a specific user, you perform a JOIN across the two tables.
  6. A single document contains aggregated information that would normally be distributed across tables. Of course, in real use cases that information tends to be spread over tens, hundreds or even thousands of tables in complex real-world systems (like SAP).
  7. The data is modeled for the application code and not for the database.
  8. Document-oriented databases are in some ways extensions of key-value stores: you access a document by its key, but you can also create indexes to ask specific questions about the content within the document. Data is stored as self-describing documents, and each document can have a different schema. A document can be simple, a list of attributes with numbers or strings as values, or it can embed objects within objects to form complex documents. You need a unique key (document ID) to reference and access the document. Couchbase uses JSON; MongoDB uses BSON. In Couchbase, for querying, you first create views over the data; views are built using incremental map reduce. MongoDB supports ad-hoc querying, but in most cases you need indexes. Sharding scales the database horizontally across a cluster, and replication provides high availability in case of node failures.
  9. This heavily depends on your application and use case. Are these separate objects in the ORM layer? Are they accessed together? What are the atomicity requirements? What are the concurrency requirements?
  10. The simpler approach is to embed all related information in one document. Data is denormalized and almost represents a pre-computed join across tables. In contrast, you could split objects out into separate documents and include references in related objects. The join then needs to be processed client-side by the application. Most document databases currently do not support joins.
  11. Key selection is very important: keys are hard to change at a later point. IDs are similar to the primary key defined when a table is created. Lookups are extremely fast because clients know exactly which server a document belongs to, based on consistent hashing. An ID can appear only once per bucket; in Couchbase, a bucket is equivalent to a table or a collection. Selecting your ID depends on your document model as well. Questions. Options: UUIDs… hand-crafted. In some NoSQL database systems, data is sorted by ID, so if you use prefixes for related objects you can look up related objects faster. Selecting a clever ID can make your life a lot easier.
  12. You have different entities within the application.
  13. It has most of the fields you’d expect to have in a blog entry. The comments field is an array of embedded comment objects, so it is easy to get all information about a blog post. There are issues with this approach, though. You may not want to display all of the comments on a page, and some blogs may be very popular and have lots of comments, so you don’t want to fetch such a large amount of data from the database.
  14. If you’re expecting a very large number of comments, or want to display them threaded, you can easily imagine doing so by extending the list technique discussed earlier. This makes the application more complicated but load is spread across the cluster.
  15. As you see here, rather than storing comments inline, we can separate them to a comment list, and then from there to individual comments.
  16. In a typical architecture, we have stateless application servers sitting behind a load balancer. As usage grows, you add application servers, update the load balancer, and scale out the application linearly in both cost and performance. But the data tier has a shared-everything architecture; at a minimum, these are shared-cache or shared-disk systems. So to scale up you need expensive hardware, and even then you hit a performance limit. Both cost and performance are therefore non-linear with this approach.
  17. Contrast this with the architecture of NoSQL systems: with a document model and auto-sharding, the database now scales horizontally along with your app server tier, giving you the linear cost and performance you want.
  18. For those who don’t know what Draw Something is – it is a “social” game like Pictionary. Two players play. A player is presented with a list of three words, from which they pick one to draw. The other player then sees the drawing and has to guess the word. And it goes back and forth like that.
  19. Well, the game launched on February 6, 2012. Like most new games, social media integration (Facebook in this case) makes it easy both to invite your friends to play and to highlight that you are playing the game. This “social component” helps build popularity. A few weeks into its life, Draw Something began to get a lot of attention, including from celebrities who also used social media (Facebook, Twitter and Pinterest) to talk about their experience. One of the stars of Jersey Shore tweeted about the game in early March, kicking off the initial round of growth, to 1 million daily active users. Miley Cyrus tweeted about her Draw Something “addiction” on March 8 and growth accelerated from over 4 million daily users. Two weeks later, at over 14 million daily active users, the company behind Draw Something was acquired by Zynga for a purported $200-plus million.
  20. As user growth exploded, the data associated with the game expanded exponentially. By the time the company was acquired, thousands of drawings were being created and stored by Draw Something every second. This was unprecedented growth, the kind most systems would crumble under.
  21. Unfortunately, not everyone prepares. On March 1, as Vinny and Pauly D of Jersey Shore were tweeting about Draw Something, EA launched a game called The Simpsons: Tapped Out. Almost immediately the game charged to #2 on the iPad and #3 on the iPhone top free app lists. Growth started to follow the same trajectory as Draw Something, but the outcome couldn’t have been more different. While Draw Something continued to grow, EA was unable to keep up with the success of the game. Games were reportedly being “lost,” there was huge lag, and users were beginning to complain, loudly. Rather than praise on Twitter, there was a flood of negative reaction. Just 4 days later, EA was forced to pull the game from the App Store. As of the end of March 2012, it had still not returned. What a contrast.
  22. JSON support: documents are natively stored as JSON, so when you build an app there is no conversion required. There are new document viewing and editing capabilities. Indexing and querying: look inside your JSON, build views, and query for a key, for ranges, or to aggregate data. Incremental map reduce: powers indexing and lets you build complex views over your data; great for real-time analytics. XDCR: replicate information from one cluster to another cluster.
  23. All nodes are equal: there is a single node type, so it is easy to scale your cluster, and there is no single point of failure. Every node manages some active data and some replica data. Data is distributed across the cluster, and hence the load is also uniformly distributed, using auto-sharding. There is a fixed number of shards that a key gets hashed to: 1024 shards, distributed across the cluster. Replication within the cluster provides high availability; the number of replicas is configurable, with up to 3 replicas. With auto-failover or manual failover, replica information is immediately promoted to active. You can add multiple nodes at a time to grow and shrink your cluster.
  24. CAPI interface – basic Couch API of which some goes through the caching layer (CRUD), some goes directly to Couch (Views)
  25. CAPI interface – basic Couch API of which some goes through the caching layer (CRUD), some goes directly to Couch (Views)
  26. CAPI interface – basic Couch API of which some goes through the caching layer (CRUD), some goes directly to Couch (Views)
  27. Not yet enabled in current DP, will be available for Beta
  28. Overview of what this feature is
  29. Review Existing Couchbase Server Replication*NEEDS HIGHER RES IMAGE*