Couchbase at the academic bisilim, Turkey

Distributed Document database

Sharon Barr
VP of Engineering

Couchbase NoSQL Leadership

Leading NoSQL database company
Open Source development & business model
Behind Couchbase open source project

Document-oriented NoSQL database
Focused on interactive internet and mobile applications

Provides a more flexible, higher-performance,
more scalable database than relational alternatives

Most mature, reliable and widely deployed solution
>5,000 paid production deployments worldwide, over 350 customers

Headquarters in Silicon Valley (Mountain View, CA)
~100 employees including >60 in engineering/product
>80% of commits to Couchbase, memcached, Apache CouchDB

Agenda

What is a Document database
The document model

Couchbase Server
Couchbase NoSQL solution

The evolving database landscape

Matthew Aslett – 451 group – Dec 2012

Where does a document database fit?

NoSQL Graph database Analytics

Transaction processing
NewSQL
Caching
As-a-service

Non-Relational
Key Value database

Relational

Appliances Document database

2 major types of data management systems

OLTP / ODS

Analytics (OLAP) / EDW

Evolution

OLAP (Analytics)

Relational NewSQL

Non-Relational NoSQL

OLTP (Transactional)

Database as a service

Evolution – NoSQL database types

OLAP (Analytics)

Relational NewSQL

Non-Relational NoSQL: KV/Document/Graph

OLTP (Transactional)

Database as a service

The evolving database landscape

Matthew Aslett – 451 group – Nov 2012

NoSQL catalog

                         Key-Value   Data Structure   Document                      Column family   Graph
Cache (memory only)      memcached   redis
Database (memory/disk)   membase                      couchbase, couchDB, mongoDB   cassandra       Neo4j

Survey: The leading driver for NoSQL adoption

What is the biggest data management problem
driving your use of NoSQL in the coming year?

Lack of flexibility/rigid schemas 49%

Inability to scale out data 35%

High latency/low performance 29%

Costs 16%

All of these 12%

Other 11%

Source: Couchbase NoSQL Survey, December 2011, n=1351

FLEXIBLE SCHEMA
COMPARING
DATA MODELS

Key Value vs. Document database

Pure Key-Value Database

10101001010100
100011110101100
010100010100011
110011000101010
010010010011001
101010100100011
101010101001010

Document Database

{
  "ID": 1,
  "FIRST": "Frank",
  "LAST": "Weigel",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}

Couchbase Server 1.8: current release
Couchbase Server 2.0: adds indexing/querying

Both Key-Value & Document Use-Cases Supported


Relational vs Document Data Model

Relational data model
Highly-structured table organization with rigidly-defined data formats and record structure.

Document data model
Collection of complex documents with arbitrary, nested data formats and varying "record" format.


RDBMS Example: User Profile

User Info
KEY   First   Last     ZIP_id
1     Frank   Weigel   2
2     Ali     Dodson   2
3     Mark    Azad     2
4     Steve   Yen      3

Address Info
ZIP_id   CITY   STATE   ZIP
1        DEN    CO      30303
2        MV     CA      94040
3        CHI    IL      60609
4        NY     NY      10010

To get info about a specific user, you perform a join across the two tables

Document Example: User Profile

{
  "ID": 1,
  "FIRST": "Frank",
  "LAST": "Weigel",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}

All data in a single document
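
As a rough illustration of the difference between the relational and document user profiles, the sketch below models the relational layout as two Python dicts that have to be joined, and the document layout as one self-contained record. The variable and function names are invented for this example; this is neither SQL nor Couchbase API code.

```python
# Relational layout: user rows point at address rows through ZIP_id.
user_info = {
    1: {"first": "Frank", "last": "Weigel", "zip_id": 2},
    2: {"first": "Ali", "last": "Dodson", "zip_id": 2},
}
address_info = {
    1: {"city": "DEN", "state": "CO", "zip": "30303"},
    2: {"city": "MV", "state": "CA", "zip": "94040"},
}


def profile_via_join(user_key):
    """Assemble one profile from two tables (the join)."""
    user = user_info[user_key]
    address = address_info[user["zip_id"]]
    return {"first": user["first"], "last": user["last"], **address}


# Document layout: the whole profile lives under a single ID.
documents = {
    1: {"ID": 1, "FIRST": "Frank", "LAST": "Weigel",
        "ZIP": "94040", "CITY": "MV", "STATE": "CA"},
}


def profile_via_document(doc_id):
    """A single lookup by ID returns the complete profile."""
    return documents[doc_id]
```

The join needs both tables to be reachable at query time; the document lookup touches exactly one record.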

Making a Change Using RDBMS
User Table
User ID   First   Last     Zip     Country ID   TEL 3
1         Frank   Weigel   94040   001
2         Joe     Smith    94040   001
3         Ali     Dodson   94040   001
4         Sarah   Gorin    NW1     002
5         Bob     Young    30303   001
6         Nancy   Baker    10010   001
7         Ray     Jones    31311   001
8         Lee     Chen     V5V3M   008
...
50000     Doug    Moore    04252   001
50001     Mary    White    SW195   002
50002     Lisa    Clark    12425   001

Photo Table
User ID   Photo ID   Comment   Country ID
2         d043       NYC       001
2         b054       Bday      007
5         c036       Miami     001
7         d072       Sunset    133
5002      e086       Spain     133

Status Table
User ID   Status ID   Text      Country ID
1         a42         At conf   134
4         b26         excited   007
5         c32         hockey    008
12        d83         Go A's    001
5000      e34         sailing   005

Affiliations Table
User ID   Affl ID   Affl Name   Country ID
2         a42       Cal         001
4         b96       USC         001
7         c14       UW          001
8         e22       Oxford      002

Country Table
Country ID   Country name
001          USA
002          UK
003          Argentina
004          Australia
005          Aruba
006          Austria
007          Brazil
008          Canada
009          Chile
...
130          Portugal
131          Romania
132          Russia
133          Spain
134          Sweden

Making the Same Change With Couchbase

{
  "ID": 1,
  "FIRST": "Frank",
  "LAST": "Weigel",
  "ZIP": "94040",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}

Just add information to a document

Document Databases

• Each record in the database is a self-describing document
• Each document has an independent structure
• Documents can be complex
• All databases require a unique key
• Documents are stored using JSON or XML or their derivatives
• Database can look into the documents
• Content can be indexed and queried

{
  "UUID": "21f7f8de-8051-5b89-86
  "Time": "2011-04-01T13:01:02.42
  "Server": "A2223E",
  "Calling Server": "A2213W",
  "Type": "E100",
  "Initiating User": "dsallings@spy.net",
  "Details":
  {
    "IP": "10.1.1.22",
    "API": "InsertDVDQueueItem",
    "Trace": "cleansed",
    "Tags":
    [
      "SERVER",
      "US-West",
      "API"
    ]
  }
}


Document database

• JSON objects
• Each document has an independent schema

{
  "_id": "brewery_Cleveland_ChopHouse_and_Brewery",
  "_rev": "1-00000061480b50910000000000000000",
  "city": "Cleveland",
  "updated": "2010-07-22 20:00:20",
  "code": "44113",
  "name": "Cleveland ChopHouse and Brewery",
  "country": "United States",
  "phone": "1-216-623-0909",
  "state": "Ohio",
  "address": [
    "824 West St.Clair Avenue"
  ],
  "geo": {
    "loc": [
      "-81.6994",
      "41.4995"
    ],
    "accuracy": "ROOFTOP"
  },
  "$expiration": 0,
  "$flags": 0
}

{
  "_id": "beer_Double_Cream_Oatmeal_Stout",
  "_rev": "1-0000042ee19241b60000000000000000",
  "category": "North American Ale",
  "style": "American-Style Stout",
  "name": "Double Cream Oatmeal Stout",
  "updated": "2010-07-22 20:00:20",
  "brewery": "Olde Peninsula Brewpub and Restaurant",
  "$expiration": 0,
  "$flags": 0
}

Document modeling

Questions:
• Are these separate objects in the model layer?
• Are these objects accessed together?
• Do you need updates to these objects to be atomic?
• Are multiple people editing these objects concurrently?

When considering how to model data for a given application
• Think of a logical container for the data
• Think of how data groups together


Document Design Options

• One document that contains all related data
– Data is de-normalized
– Better performance and scale
– Eliminate client-side joins

• Separate documents for different object types with cross references
– Data duplication is reduced
– Objects may not be co-located
– Transactions supported only on a document boundary
– Most document databases do not support joins or multi-document transactions


Document ID / Key selection

• Similar to primary keys in relational databases
• Documents are sharded based on the document ID
• ID based document lookup is extremely fast
• Usually an ID can only appear once in a bucket

Questions:
• Do you have a unique way of referencing objects?
• Are related objects stored in separate documents?

Options
• UUIDs, date-based IDs, numeric IDs
• Hand-crafted (human readable)
• Matching prefixes (for multiple related objects)
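
The key options listed above can be sketched with a few helper functions. The helper names and the exact key formats are assumptions made for this example; the only point is that hand-crafted keys with a shared prefix let an application derive the IDs of related documents without a query.

```python
import uuid


def user_key(user_id):
    """Hand-crafted, human-readable key for a user document."""
    return f"User_{user_id}"


def related_key(user_id, kind):
    """Key for a related document that shares the owner's prefix."""
    return f"{user_key(user_id)}_{kind}"


def random_key(prefix):
    """UUID-based key for objects with no natural identifier."""
    return f"{prefix}_{uuid.uuid4()}"
```

For example, `related_key(1234, "Buddies")` yields `User_1234_Buddies`, so the buddy list of user 1234 can be fetched directly by ID.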

Example: Entities for a Blog
• User profile
  The main pointer into the user data
  • Blog entries
  • Badge settings, like a twitter badge
• Blog posts
  Contains the blogs themselves
• Blog comments
  • Comments from other users


Blog Document – Option 1 – Single document

{
  "_id": "Hello_World",
  "author": "John Smith",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href="http: …
  "comments": [
    {"format": "markdown", "body": "Awesome post!"},
    {"format": "markdown", "body": "Like it."}
  ]
}


Threaded Comments

• You can imagine how to take this to a threaded list

[Diagram: a blog document linked to a list of comments: first comment, reply to comment, more comments]

Advantages
• Only fetch the data when you need it
• For example, rendering part of a web page
• Spread the data and load across the entire cluster

Blog Document – Option 2 - Split into multiple docs

BLOG DOC
{
  "_id": "Hello_World",
  "author": "John Smith",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](http://couchbase.com).",
  "html": "<p>Hello from <a href="http: …
  "comments": [
    "comment1_Hello_World"
  ]
}

COMMENT
{
  "_id": "comment1_Hello_World",
  "format": "markdown",
  "body": "Awesome post!"
}
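
A minimal sketch of how an application might work with the split layout, using a plain Python dict as a stand-in for a bucket. The function names are hypothetical and no Couchbase client is involved. The post stores only comment IDs, so rendering the full thread takes one extra lookup per comment, and adding a comment is two separate writes rather than one atomic update.

```python
# In-memory stand-in for a bucket: document ID -> document body.
bucket = {
    "Hello_World": {
        "author": "John Smith",
        "type": "post",
        "title": "Hello World",
        "comments": ["comment1_Hello_World"],
    },
    "comment1_Hello_World": {"format": "markdown", "body": "Awesome post!"},
}


def load_post_with_comments(bucket, post_id):
    """Fetch the post, then resolve each referenced comment by ID."""
    post = dict(bucket[post_id])  # shallow copy; the stored post is untouched
    post["comments"] = [bucket[comment_id] for comment_id in post["comments"]]
    return post


def add_comment(bucket, post_id, comment_id, body):
    """Two independent writes: the comment document, then the post's ID list."""
    bucket[comment_id] = {"format": "markdown", "body": body}
    bucket[post_id]["comments"].append(comment_id)
```

A page that shows only the post can skip the comment lookups entirely, which is the "only fetch the data when you need it" advantage of splitting.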

Example 2 – Different object types
    [Serializable]
    public class User
    {
        public long ID;
        public string Name;

        [NonSerialized]
        public List<User> Buddies;

        [NonSerialized]
        public List<Messages> Messages;

        [NonSerialized]
        public Dictionary<Game, List<Bet>> BetsByGame;
    }

User
Key                      Value
User_1234                1234;Cheli;

Buddies
Key                      Value
User_1234_Buddies        User_5678
                         User_9876

Messages
Key                      Value
User_1234_Messages       Expire-> 9/9/9999
                         Message_1234
                         Message_5678

BetsByGame
Key                      Value
User_1234_BetsByGame     User_1234_BetsByGame_1
                         User_1234_BetsByGame_2

Key                      Value
User_1234_BetsByGame_1   Bet_1234
                         Bet_2345

Key                      Value
User_1234_BetsByGame_2   Bet_9876

COUCHBASE DATABASE

Relational Technology Scales Up
Application Scales Out
Just add more commodity web servers
[Chart: Web/App Server Tier, system cost and application performance vs. users]

RDBMS Scales Up
Get a bigger, more complex server
[Chart: Relational Database, system cost vs. users, annotated "Won't scale beyond this point"]

Expensive and disruptive sharding, doesn't perform at web scale

Couchbase Server Scales Out Like App Tier
Application Scales Out
Just add more commodity web servers
[Chart: Web/App Server Tier, system cost vs. users]

NoSQL Database Scales Out
Cost and performance mirrors app tier
[Chart: Couchbase Distributed Data Store, system cost vs. users]

Scaling out flattens the cost and performance curves

Couchbase Server (a.k.a. Membase)

Simple. Fast. Elastic. NoSQL.
Couchbase automatically distributes data across commodity servers. Built-in caching enables
apps to read and write data with sub-millisecond latency. And with no schema to
manage, Couchbase effortlessly accommodates changing data management requirements.

Couchbase Server Is The Complete Solution

✔ Easy Scalability
One click scalability and no app changes.

✔ Consistent High Performance
Sub millisecond latency with high throughput for reads and writes.

✔ Always On 24x365
Maintenance, upgrades and cluster resizing all online without application downtime.

✔ Flexible Data Model
JSON document model with no fixed schema.


Use Case Examples

Web app or Use-case | Couchbase Solution | Example Customer
Content and Metadata Management System | Couchbase document store + Elastic Search | McGraw-Hill…
Social Game or Mobile App | Couchbase stores game and player data | Zynga, OMGPOP…
Ad Targeting | Couchbase stores user information for fast access | AOL…
User Profile Store | Couchbase Server as a key-value store | TuneWiki…
Session Store | Couchbase Server as a key-value store | Concur…
High Availability Caching Tier | Couchbase Server as a memcached tier replacement | Orbitz…
Chat/Messaging Platform | Couchbase Server | DOCOMO…

#1 reason for users to move to NoSQL

PERFORMANCE
PREDICTABLE LATENCY

Key results of Cisco and Solarflare Benchmark

Couchbase Server demonstrates

• Consistent sub-millisecond
latency for mixed workload

• High throughput

• Linear scalability

http://www.cisco.com/en/US/prod/collateral/switches/ps9441/ps9670/white_paper_c11-708169.pdf

Your secret weapon: Sub-millisecond AND consistent latency

Consistently low latencies in microseconds for varying document sizes with a mixed workload

[Chart: latency (microseconds) vs. object size (bytes)]


Your secret weapon: Linear scalability

High throughput with 1.4 GB/sec data transfer rate using 4 servers

Linear throughput scalability

[Chart: operations per second vs. number of servers in cluster]


Write Performance Comparison

Insert/update latencies vs. throughput

[Chart: 95th percentile latency (ms, 0 to 30) vs. operations per second (0 to 22000) for MongoDB, Cassandra and Couchbase]

http://altoros.com/nosql_databases_for_interactive_applications.html

Draw Something by OMGPOP

50 Million Users in 50 Days
[Chart: daily active users (millions, 2 to 16) from 2/6 through 3/21]


Game Data Went Non-Linear
By March 29:
• 30 million downloads
• 3,000+ drawings/second
• 2 billion drawings
• 105,000 TPS
• 3.3 TB data stored

[Chart: timeline from 2/6 through 3/21, vertical axis 2 to 16]


In Contrast: The Simpsons Tapped Out
[Chart: timeline from 2/6 through 3/21, vertical axis 2 to 16, with these annotations]
• EA Launches The Simpsons Tapped Out
• #2 Free app on iPad
• #3 Free app on iPhone


Partitioning The Data – vbucket (internal partitions) map

Basic Operation – scale out
• Docs distributed evenly across servers in the cluster
• Each server stores both active & replica docs
• Only one copy of a doc is active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
  • App never needs to know
• App reads, writes, updates docs
• Multiple App Servers can access same document at same time

[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and a cluster map, send Read/Write/Update calls to a Couchbase Server cluster of three servers. Active docs: Server 1 holds Doc 5 and Doc 2, Server 2 holds Doc 4 and Doc 7, Server 3 holds Doc 1 and Doc 3. Each server also holds replica docs.]

User Configured Replica Count = 1
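
The lookup path described on this slide (document ID to internal partition to server) can be sketched as follows. This is a simplified illustration, not the hashing used by any real Couchbase client library: the CRC32-modulo hash, the partition count of 1024, the round-robin placement and all function names are assumptions made for the example.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a document ID to a partition (vbucket) number."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_cluster_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Spread vbuckets round-robin; with replica count 1 the replica
    lives on the next server, so it never shares a node with the active copy."""
    cluster_map = []
    for vb in range(num_vbuckets):
        active = servers[vb % len(servers)]
        replica = servers[(vb + 1) % len(servers)]
        cluster_map.append({"active": active, "replica": replica})
    return cluster_map


def server_for(key, cluster_map):
    """What the client library does for the app: key -> vbucket -> active server."""
    return cluster_map[vbucket_for(key, len(cluster_map))]["active"]
```

Because the application only ever calls something like `server_for`, it does not need to know which server a document lives on; the map can change underneath it.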

Add Nodes

• Two servers added to cluster
  • One-click operation
• Docs automatically rebalanced across cluster
  • Even distribution of docs
  • Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger # of servers

[Diagram: the app servers' client libraries and cluster maps now send Read/Write/Update calls to Servers 1-5; active and replica docs (for example Doc 3, Doc 6, Doc 7, Doc 9) move to the new Servers 4 and 5.]
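
One way to picture "even distribution" with "minimum doc movement" is a rebalance that moves only the surplus partitions from existing servers to the new ones. The sketch below is an illustrative algorithm under that assumption, not Couchbase's actual rebalance orchestrator; the function name and data layout are made up, and it handles only the add-node case where every current owner is still in the server list.

```python
def rebalance(assignment, servers):
    """assignment[i] is the server holding partition i.
    Returns (new_assignment, moved): an even spread over `servers`
    reached by moving only surplus partitions."""
    counts = {s: 0 for s in servers}
    for owner in assignment:
        counts[owner] += 1

    # Even targets; servers that already hold the most keep the larger share.
    base, extra = divmod(len(assignment), len(servers))
    ranked = sorted(servers, key=lambda s: -counts[s])
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(ranked)}

    # One slot per partition that an under-filled server still needs.
    open_slots = [s for s in servers
                  for _ in range(max(0, target[s] - counts[s]))]

    new_assignment = list(assignment)
    moved = 0
    for partition, owner in enumerate(assignment):
        if counts[owner] > target[owner] and open_slots:
            destination = open_slots.pop()
            new_assignment[partition] = destination
            counts[owner] -= 1
            counts[destination] += 1
            moved += 1
    return new_assignment, moved
```

With 12 partitions spread over three servers, adding two servers moves 4 partitions; the other 8 stay where they are.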



Fail Over Node

• App servers happily accessing docs on Server 3
• Server fails
• App server requests to Server 3 fail
• Cluster detects server has failed
  • Promotes replicas of docs to active
  • Updates cluster map
• App server requests for docs now go to appropriate server
• Typically rebalance would follow

[Diagram: Servers 1-5 with active and replica docs; after Server 3 fails, the replicas of its active docs (Doc 1, Doc 3) on other servers are promoted to active.]


New in Couchbase Server 2.0

• JSON support
• Indexing and Querying
• Incremental Map Reduce
• Cross data center replication


Additional Couchbase Server Features

Append-only storage layer

Online compaction

Better working set management

Reduce server warm-up time

Monitoring and admin API & UI

SDKs, documentation and examples for a variety of languages

Couchbase Server 2.0 Architecture

Data Manager
• Ports: 8092 (Couch View), 11211 (Memcapable 1.0), 11210 (Memcapable 2.0)
• Components: Couch API, Moxi, Memcached Interface, Couchbase EP Engine (hash table cache, write/replica queues), Membase, storage interface, CouchStore, distributed indexing, auto compaction

Cluster Manager (Erlang/OTP)
• Ports: 8091 (HTTP: REST management API/Web UI), 4369 (Erlang port mapper), 21100 - 21199 (Distributed Erlang)
• Components: heartbeat, process monitor, node health monitor, configuration manager, global singleton supervisor, rebalance orchestrator, vBucket state and replication manager
• Some components run on each node, others one per cluster


Indexing and querying

• Built-in incremental map reduce

• Map functions are written in JavaScript
(and executed using Google's V8 engine)

• Index is built incrementally as mutations stream in

• Query in a scatter/gather fashion


Map function
• Map functions

    function (doc) {
      // Emit the most specific location key the document supports.
      if (doc.country && doc.state && doc.city) {
        emit([doc.country, doc.state, doc.city], 1);
      } else if (doc.country && doc.state) {
        emit([doc.country, doc.state], 1);
      } else if (doc.country) {
        emit([doc.country], 1);
      }
    }

REST call: http://db1.couchbase.com:8092/beer-sample/_design/dev_beer/_view/by_location?limit=10

Reduce functions

• Built-in reduce functions
• _count
• _sum
• _stats ({"sum": 1411, "count": 1411, "min": 1, "max": 1, "sumsqr": 1411})

• Development procedure
• Develop against a subset of the data
• Build the index on the entire cluster
• Promote a dev_ view to production
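
To make the map and reduce steps concrete, here is a small Python simulation of a location view with the three built-in reducers applied to its rows. It only mimics the idea: real views run JavaScript inside the server, build the index incrementally, and sort keys with their own collation rules. All function names here are invented for the example.

```python
def map_doc(doc):
    """Python rendering of a by-location map function.
    Returns the (key, value) rows that emit() would produce."""
    if doc.get("country") and doc.get("state") and doc.get("city"):
        return [([doc["country"], doc["state"], doc["city"]], 1)]
    if doc.get("country") and doc.get("state"):
        return [([doc["country"], doc["state"]], 1)]
    if doc.get("country"):
        return [([doc["country"]], 1)]
    return []


def build_index(docs):
    """Run the map function over every document and sort rows by key."""
    rows = [row for doc in docs for row in map_doc(doc)]
    return sorted(rows, key=lambda row: row[0])


def reduce_count(rows):
    return len(rows)


def reduce_sum(rows):
    return sum(value for _, value in rows)


def reduce_stats(rows):
    values = [value for _, value in rows]
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sumsqr": sum(value * value for value in values),
    }
```

Since every emitted value is 1, `sum`, `count` and `sumsqr` all equal the number of rows, which matches the shape of the `_stats` example on the slide.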


Indexing and Querying

• Indexing work is distributed amongst nodes
  • Large data set possible
  • Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes

[Diagram: a client library with a cluster map sends a query to Servers 1-3, each holding active docs, and receives a combined response.]
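
The scatter/gather pattern described here can be sketched as a range scan on each node's local index followed by a merge of the sorted partial results. The data layout (a sorted list of key/value pairs per node) and the function names are assumptions for this illustration, not the server's internal index format.

```python
import heapq


def query_node(node_index, start, end):
    """Scatter step: a node scans only its own sorted index."""
    return [(key, value) for key, value in node_index if start <= key <= end]


def scatter_gather(node_indexes, start, end, limit=None):
    """Gather step: merge the already-sorted partial results into one
    sorted result and optionally cut it off at `limit` rows."""
    partials = [query_node(index, start, end) for index in node_indexes]
    merged = list(heapq.merge(*partials))
    return merged if limit is None else merged[:limit]
```

Each node does work proportional to its own share of the data, and the merge step only has to interleave rows that are already in order.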








Cross Data Center Replication

[Diagram: US, Europe and Asia data centers with replication between them]

• Data close to users
• Multiple locations for disaster recovery
• Independently managed clusters serving local data


XDCR: Cross Data Center Replication

• Replicate your Couchbase data across clusters
• Clusters may be spread across geos
• Configured on a per-bucket basis
• Supports unidirectional and bidirectional operation
• Application can read and write from both clusters
(active – active replication)
• Scales out linearly
• Different from intra-cluster replication


Intra-cluster Replication


Cross Datacenter Replication (XDCR)


Elastic Search integration

• Use the cross data center interface
• Agnostic to topology changes
• De-duplication
• Effective changes feed of the entire cluster

[Diagram: Servers 1-3, each holding active docs, feed a cross data center connector]

Changes feed to be consumed by
Elastic Search cluster, or any other consumer
http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search


Couchbase and Hadoop Integration
• Support large-scale analytics on application data by streaming data
from Couchbase to Hadoop
– Real-time integration using Flume
– Batch integration using Sqoop
• Examples
– Various game statistics (e.g., monthly / daily / hourly rankings)
– Analyze game patterns from users to enhance various game metrics

[Diagram: Sqoop connects through the TAP protocol listener/sender and the memcached engine interface to the Couchbase Storage Engine]

Couchbase Client SDKs

SDKs: Java, .Net, PHP, Ruby, Python

[Diagram: user code calls the Java client API; the Java client SDK reaches Couchbase Server through a spymemcached connection and an HTTP couchDB connection]

Java client API

    CouchbaseClient cb = new CouchbaseClient(listURIs, "aBucket", "letmein");
    // this is all the same as before
    cb.set("hello", 0, "world");
    cb.get("hello");
    // keys is a Collection<String> of document IDs
    Map<String, Object> manyThings = cb.getBulk(keys);
    // accessing a view
    View view = cb.getView("design_document", "my_view");
    Query query = new Query();
    query.setRange("abegin", "theend");

http://www.couchbase.com/develop

THANK YOU

COUCHBASE
SIMPLE, FAST, ELASTIC NOSQL

sharon@couchbase.com
@sharonyb

