Couchbase 101

Dipti Borkar
Head, WW Solutions Engineering

©2015 Couchbase Inc.
Agenda
 Where does Couchbase fit in?
 Key Concepts
 Operations
 Cluster-wide operations
 Look at a Live Cluster

Big Data = Operational + Analytic (NoSQL + Hadoop)
Operational (NoSQL)
 Online
 Web/Mobile/IoT apps
 Millions of customers/consumers
Analytic (Hadoop)
 Offline, batch-oriented
 Analytics apps
 Hundreds of business analysts

Couchbase meets today’s & tomorrow’s requirements
Flexible data model
Consistent performance at scale
High availability (24x365)
Easy, affordable scalability

Enterprises use Couchbase to enable key objectives
 360 Degree Customer View
 Profile Management
 Catalog
 Fraud Detection
 Content Management
 Internet of Things
 Digital Communication
 Real Time Big Data
 Mobile Applications
 Personalization

Couchbase can act as a Key-Value Store or a Document Store
Key-Value Store
2014-06-23-10:15am : 75F
2014-06-23-11:30am : 77F
2014-06-23-02:00pm : 82F
Document Store
0001:
{ "firstname": "Dipti",
  "lastname": "Borkar",
  "language": "English",
  "time_zone": "PST",
  "zip": 94403
}
Key - UTF-8 string up to 250 bytes
Value - can be 0 bytes to 20 MB (best practice < 1 MB)
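
The size limits above can be checked before an item is sent to the server. The following is a minimal sketch, not part of any Couchbase SDK: the function name `check_item` is hypothetical, and it assumes that "MB" means 2^20 bytes.

```python
# Limits taken from the slide; treating 1 MB as 2**20 bytes is an assumption.
MAX_KEY_BYTES = 250
MAX_VALUE_BYTES = 20 * 1024 * 1024
RECOMMENDED_VALUE_BYTES = 1024 * 1024


def check_item(key: str, value: bytes) -> bool:
    """Raise ValueError if the item breaks a hard limit.

    Returns True when the value is also under the 1 MB best-practice size,
    False when it is allowed but larger than recommended.
    """
    key_bytes = len(key.encode("utf-8"))  # the limit is in bytes, not characters
    if key_bytes > MAX_KEY_BYTES:
        raise ValueError(f"key is {key_bytes} bytes, limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value is {len(value)} bytes, limit is {MAX_VALUE_BYTES}")
    return len(value) < RECOMMENDED_VALUE_BYTES


if __name__ == "__main__":
    print(check_item("2014-06-23-10:15am", b"75F"))
```

Measuring the key after UTF-8 encoding matters because a non-ASCII character can take more than one byte.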

Fundamentals
Document ID or Key
 Similar to primary keys in relational databases
 Documents are partitioned based on the document ID
 ID based document lookup is extremely fast
 Must be unique
Value
 JSON
 Binary - integers, strings, booleans
 Common binary values include serialized objects, compressed XML, compressed text, encrypted values
Metadata
 CAS Value (unique identifier for concurrency)
 TTL
 Flags (optional client library metadata)
 Revision #

Benefits of JSON
 Can represent complex objects and data structures
 Very simple notation, lightweight, compact, readable
 The most common API return type for integrations
 Facebook, Twitter, you name it, return JSON
 Native to JavaScript (can be useful)
 Can be inserted straight into Couchbase (faster development)
 Serialization and deserialization are very fast

Storing and retrieving documents
[Diagram: Clients read from / write to Documents (user/application data), which live on Data Buckets, on Server Nodes that form a Couchbase Cluster. Annotations: dynamically scalable; based on hash partitioning.]

Objects Serialized to JSON and Back
User Object
string uid
string firstname
string lastname
int age
array favorite_colors
string email
add() serializes the User Object to a JSON document stored under its key; get() reads the document and deserializes it back into a User Object.
u::john@couchbase.com
{ "uid": 123456,
  "firstname": "John",
  "lastname": "Smith",
  "age": 22,
  "favorite_colors": ["blue", "black"],
  "email": "john@couchbase.com"
}
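
The add()/get() round trip can be sketched as follows. This is an illustration only: `InMemoryBucket` is a hypothetical in-memory stand-in, not the Couchbase client library, and `uid` is typed as an integer here because the JSON example shows a number.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class User:
    uid: int
    firstname: str
    lastname: str
    age: int
    favorite_colors: List[str]
    email: str


class InMemoryBucket:
    """Hypothetical stand-in for a data bucket: maps keys to JSON strings."""

    def __init__(self):
        self._docs = {}

    def add(self, key: str, user: User) -> None:
        # add() only creates new documents; an existing key is an error.
        if key in self._docs:
            raise KeyError(f"document {key!r} already exists")
        self._docs[key] = json.dumps(asdict(user))  # object -> JSON

    def get(self, key: str) -> User:
        return User(**json.loads(self._docs[key]))  # JSON -> object


if __name__ == "__main__":
    bucket = InMemoryBucket()
    john = User(123456, "John", "Smith", 22, ["blue", "black"], "john@couchbase.com")
    bucket.add("u::john@couchbase.com", john)
    print(bucket.get("u::john@couchbase.com"))
```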

Couchbase provides a complete Data Management solution
 High availability cache
 Key-value store
 Document database
 Embedded database
 Sync management
Multi-purpose capabilities support a broad range of apps and use cases
Enterprises often start with cache, then broaden usage to other apps and use cases

What makes Couchbase unique?
Enterprises choose Couchbase for several key advantages
 Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture
 Multi-purpose: cache, key value store, document database, and local/mobile database in a single platform
 Simplified administration: easy to deploy & manage; integrated Admin Console, single-click cluster expansion & rebalance
 Always-on availability (24x365): data replication across nodes, clusters, and data centers

Couchbase Server Architecture
DATA MANAGER
 Query Engine
 Object-managed Cache
 Storage Engine
 Ports 11210 / 11211: data access ports
 Port 8092: Query API
CLUSTER MANAGER (Erlang/OTP)
 HTTP REST management API / Web UI
 Replication, Rebalance, Shard State Manager
 Port 8091: Admin Console

Single Node Operations - Write
[Diagram: the App Server writes a Doc to the Managed Cache; from there the Doc goes to the Replication Queue (memory-to-memory replication to other node) and to the Disk Queue, then to Disk.]

Single Node Operations - Read
[Diagram: the App Server issues Get Doc 1; Doc 1 is returned from the Managed Cache. Disk, Disk Queue and Replication Queue (memory-to-memory, to other node) are shown but not involved.]

Single Node Operations - Cache Ejection
[Diagram: the App Server writes Doc 2 to Doc 6 into the Managed Cache; Doc 1, already on Disk, is ejected from the Managed Cache.]

Single Node Operations - Cache Miss
[Diagram: the App Server issues Get Doc 1; Doc 1 is not in the Managed Cache, so it is read from Disk back into the Managed Cache and returned.]
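
The four single-node slides (write, read, cache ejection, cache miss) can be tied together with a toy model. This is a sketch under stated assumptions, not Couchbase's implementation: the class `ToyNode`, the explicit `flush()` call, and the least-recently-used ejection order are all illustrative choices that the slides do not specify.

```python
from collections import OrderedDict, deque


class ToyNode:
    """Toy single node: managed cache, disk queue, replication queue, disk."""

    def __init__(self, cache_capacity: int):
        self.cache_capacity = cache_capacity
        self.cache = OrderedDict()        # managed cache, least recently used first
        self.dirty = set()                # keys written but not yet persisted
        self.disk_queue = deque()         # keys waiting to be written to disk
        self.replication_queue = deque()  # keys waiting to go to another node
        self.disk = {}
        self.cache_misses = 0

    def write(self, key, doc):
        # A write lands in memory first, then is queued for disk and replication.
        self.cache[key] = doc
        self.cache.move_to_end(key)
        self.dirty.add(key)
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        self._eject_if_needed()

    def flush(self):
        # Drain the disk queue; dirty docs are never ejected, so they are in cache.
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]
            self.dirty.discard(key)
        self._eject_if_needed()

    def read(self, key):
        if key in self.cache:             # normal read: served from memory
            self.cache.move_to_end(key)
            return self.cache[key]
        self.cache_misses += 1            # cache miss: fetch from disk
        doc = self.disk[key]
        self.cache[key] = doc
        self._eject_if_needed()
        return doc

    def _eject_if_needed(self):
        # Only docs that are already on disk may leave the cache.
        for key in list(self.cache):
            if len(self.cache) <= self.cache_capacity:
                break
            if key not in self.dirty:
                del self.cache[key]


if __name__ == "__main__":
    node = ToyNode(cache_capacity=3)
    node.write("doc1", {"n": 1})
    node.flush()
    for i in range(2, 7):
        node.write(f"doc{i}", {"n": i})
    node.flush()
    print(list(node.cache), node.read("doc1"), node.cache_misses)
```

In this model an unpersisted document is never ejected, which is why the cache can temporarily exceed its capacity until `flush()` runs.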

Auto sharding – Bucket and vBuckets
 Each bucket has active and replica data sets
 Each data set has 1024 virtual buckets (vBuckets)
 Documents get logically mapped to vBuckets
 Document IDs always get hashed to the same virtual bucket
 Virtual buckets do not have a fixed physical server location
 Mapping between the virtual buckets and physical servers is called the cluster map
 Each virtual bucket contains a 1/1024th portion of the data set
[Diagram: each data bucket is divided into virtual buckets vB 1 ... vB 1024.]
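
The two-step lookup described above (document ID to vBucket, vBucket to server) can be sketched as below. The slides do not say which hash function is used, so CRC32 modulo 1024 is an assumption made for illustration, and the round-robin `build_cluster_map` is a hypothetical helper, not how Couchbase assigns vBuckets.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(doc_id: str) -> int:
    """Hash a document ID to a vBucket number (assumed hash: CRC32 mod 1024)."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Hypothetical cluster map: a list where index = vBucket, value = server."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(doc_id: str, cluster_map) -> str:
    """Route a document ID to the server that currently owns its vBucket."""
    return cluster_map[vbucket_for(doc_id)]


if __name__ == "__main__":
    cluster_map = build_cluster_map(["A", "B", "C"])
    key = "u::john@couchbase.com"
    print(key, "-> vBucket", vbucket_for(key), "-> server", server_for(key, cluster_map))
```

Because the hash depends only on the document ID, moving a vBucket to another server changes the cluster map but never the vBucket a document belongs to.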

Cluster Map
[Diagram: Documents are read from / written to logical partitions vB1, vB2, vB3, vB4, vB5 ... vB1024, selected by Hash function (KEY). The Cluster Map assigns the logical partitions to physical servers A, B, C. Adding a node to scale out produces a New Cluster Map.]

Cluster Map

Cluster Map – 2 nodes added

Multi-Node Operations
[Diagram: App Servers 1 and 2, each with the Couchbase client library and a cluster map, read/write/update docs on three servers.
Server 1 - Active: Shards 5, 2, 9; Replica: Shards 4, 1, 8
Server 2 - Active: Shards 4, 7, 8; Replica: Shards 6, 3, 2
Server 3 - Active: Shards 1, 3, 6; Replica: Shards 7, 9, 5]
• Docs distributed evenly across servers
• Each server stores both active and replica docs
- Only one server active at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
- App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time

Adding Nodes
[Diagram: Servers 4 and 5 join Servers 1, 2 and 3. Some shards (5, 2, 4, 7, 1 and 3 in the figure) move to the new servers while the others stay in place, and App Servers 1 and 2 read/write/update through the updated cluster map.]
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
- Even distribution of docs
- Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
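
"Even distribution" with "minimum doc movement" can be illustrated with a small rebalance routine over a cluster map. This is a sketch of one possible approach, not Couchbase's rebalance algorithm: the function `rebalance` and the list-based cluster map (index = vBucket, value = server) are assumptions made for the example.

```python
from collections import Counter


def rebalance(cluster_map, added_servers):
    """Return a new cluster map that is even across all servers.

    Only vBuckets on overloaded servers are moved, and they only move to
    servers with spare room, so vBuckets never shuffle between old servers.
    """
    new_map = list(cluster_map)
    servers = list(dict.fromkeys(cluster_map)) + list(added_servers)
    base, extra = divmod(len(new_map), len(servers))
    # The first `extra` servers hold one vBucket more than the rest.
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}
    counts = Counter(new_map)
    # One list entry per free slot on an under-loaded server.
    free_slots = [s for s in servers for _ in range(target[s] - counts[s])]
    for vb, owner in enumerate(new_map):
        if counts[owner] > target[owner]:
            new_owner = free_slots.pop()
            counts[owner] -= 1
            counts[new_owner] += 1
            new_map[vb] = new_owner
    return new_map


if __name__ == "__main__":
    old_map = ["ABC"[vb % 3] for vb in range(1024)]
    new_map = rebalance(old_map, ["D", "E"])
    moved = sum(1 for a, b in zip(old_map, new_map) if a != b)
    print(Counter(new_map), "moved:", moved)
```

Going from three to five servers, the new servers must end up with about two fifths of the 1024 vBuckets, so that many moves is the least any even layout can achieve.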

Managing failures
[Diagram: App Servers 1 and 2, each holding a cluster map, access a cluster in which Server 3 (Active: Shards 1, 3, 6; Replica: Shards 7, 9, 5) fails. The figure highlights the replicas of Shard 1 and Shard 3 being promoted on the remaining servers.]
• App servers accessing Shards
• Requests to Server 3 fail
• Cluster detects server failed
o Promotes replicas of Shards to active
o Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
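
The failover steps can be sketched with the three-server shard layout from the slides (Server 1 active 5, 2, 9; Server 2 active 4, 7, 8; Server 3 active 1, 3, 6). The function `fail_over` and the two dictionaries are hypothetical names for this illustration; real failover also involves failure detection, which is not modeled here.

```python
def fail_over(active, replica, failed_server):
    """Promote replicas of the failed server's active shards.

    `active` and `replica` map shard number -> server name. Returns the
    updated (active, replica) maps; a replica of None means the shard has
    no replica until a rebalance recreates one.
    """
    new_active = dict(active)
    new_replica = dict(replica)
    for shard, server in active.items():
        if server == failed_server:
            new_active[shard] = replica[shard]  # replica becomes active
            new_replica[shard] = None           # and no longer acts as a replica
    for shard, server in replica.items():
        if server == failed_server:
            new_replica[shard] = None           # replicas on the failed server are lost
    return new_active, new_replica


# Shard layout as shown on the Multi-Node Operations slide.
ACTIVE = {5: "S1", 2: "S1", 9: "S1", 4: "S2", 7: "S2", 8: "S2", 1: "S3", 3: "S3", 6: "S3"}
REPLICA = {4: "S1", 1: "S1", 8: "S1", 6: "S2", 3: "S2", 2: "S2", 7: "S3", 9: "S3", 5: "S3"}

if __name__ == "__main__":
    new_active, new_replica = fail_over(ACTIVE, REPLICA, "S3")
    print(new_active)
    print(new_replica)
```

After the call, six shards have no replica, which is one way to see why a rebalance would typically follow.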

Cross Data Center Replication (XDCR)

Market leading memory-to-memory replication
[Diagram: replication between a cluster in New York and a cluster in San Francisco.]

XDCR: Cross Data Center Replication
 Application can access both clusters (master – master)
 Scales out linearly
 Different from intra-cluster replication: replication within a cluster is “CP”, while XDCR between clusters is “AP”

XDCR: Flexible topologies
 One-to-one, one-to-many, many-to-one
 Differently sized and resourced clusters supported

XDCR after Write
[Diagram: on a Couchbase Server node, the App Server writes Doc 1 to the Managed Cache. From there Doc 1 goes to the Disk Queue and Disk, to the Replication Queue (memory-to-memory replication to other node), and to the XDCR Queue ((New in 3.0) memory-to-memory replication to remote cluster).]

Indexing and Querying Features
 Index and Query
 Distributed indexing and querying
 Secondary indexes of JSON document content
 Flexible querying of indexes
 Incremental Map-Reduce
 Distributed simple real-time analytics
 Only considers changes due to updated data
 Full-Text Search
 Robust integration with Elasticsearch / Solr clusters
 Flexible full text search and faceted search
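
"Only considers changes due to updated data" can be illustrated with a count-by-key view that is adjusted per changed document instead of being recomputed over the whole data set. This is a conceptual sketch: the class `IncrementalCountView` and its `apply_change` method are hypothetical and do not reflect the Couchbase view engine's internals.

```python
from collections import Counter


class IncrementalCountView:
    """Keeps count-per-key up to date by processing only changed documents."""

    def __init__(self, map_fn):
        self.map_fn = map_fn   # doc -> key to emit, or None to emit nothing
        self.last_key = {}     # doc_id -> key emitted the last time it was seen
        self.counts = Counter()

    def apply_change(self, doc_id, doc):
        """Process one created, updated, or deleted (doc=None) document."""
        old_key = self.last_key.pop(doc_id, None)
        if old_key is not None:
            # Undo the contribution of the previous version of this document.
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]
        if doc is not None:
            new_key = self.map_fn(doc)
            if new_key is not None:
                self.last_key[doc_id] = new_key
                self.counts[new_key] += 1


if __name__ == "__main__":
    view = IncrementalCountView(lambda doc: doc.get("language"))
    view.apply_change("0001", {"firstname": "Dipti", "language": "English"})
    view.apply_change("0002", {"firstname": "John", "language": "English"})
    view.apply_change("0002", {"firstname": "John", "language": "French"})
    print(dict(view.counts))
```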

View processing after write
[Diagram: on a Couchbase Server node, the App Server writes Doc 1 to the Managed Cache. From there Doc 1 goes to the Replication Queue (to other node) and to the Disk Queue and Disk; the View engine then processes Doc 1.]

Couchbase Server Architecture - Views
[Diagram: App Servers 1 and 2, each holding a cluster map, query Servers 1, 2 and 3; every server holds active and replica shards.]
• Indexing work is distributed amongst nodes
• Large data set possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
Couchbase Elastic Search Connector

Couchbase Solr Connector

Why SQL for NoSQL
 JSON document model provides
 Rich Structure (no assembly)
 Structure Evolution (flexible schema, seamless change)
 SQL provides
 Query across relationships
 Query in general
 Why SQL for JSON?
 To address all these data concerns
 N1QL is SQL for JSON

Models for Representing Data
Data Concern: Relational Model vs. JSON Document Model (NoSQL)
Rich Structure
 Relational Model: multiple flat tables; constant assembly and disassembly
 JSON Document Model: documents; no assembly required!
Relationships
 Relational Model: represented; queried (SQL)
 JSON Document Model: represented; queried? Not so far…
Value Evolution
 Relational Model: data can be updated
 JSON Document Model: data can be updated
Structure Evolution
 Relational Model: uniform and rigid; change is disruptive and manual
 JSON Document Model: flexible; change is seamless and data-driven

SELECT
 Standard SELECT pipeline
 SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, OFFSET
 Queries across relationships
 JOINs
 Subqueries
 NEST — a JOIN that embeds child objects within their parent
 UNNEST — a JOIN that surfaces nested objects as top-level data
 Aggregation
 Set operators
 UNION, INTERSECT, EXCEPT

N1QL Architecture
 Single node installation, services defined dynamically
 The Query service accesses Index and Data to formulate a response
 All queries and direct access are topology aware and dynamically scalable

Indexing
 CREATE / DROP INDEX
 Two types of indexes
 View indexes
 GSI indexes (global secondary indexes—new)
 Can index any data expression
 Nested / complex expressions
 Computed expressions
 EXPLAIN
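
Indexing "any data expression", including nested and computed ones, amounts to evaluating an expression per document and recording which documents produced each value. The sketch below models that idea only; `create_index` is a hypothetical helper and says nothing about how view or GSI indexes are stored.

```python
from collections import defaultdict


def create_index(docs, expression):
    """Build {expression value: [doc IDs]} for every doc the expression applies to.

    `docs` maps doc ID -> document; `expression` is any function of a document.
    Documents where the expression cannot be evaluated are left out of the index.
    """
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        try:
            value = expression(doc)
        except (KeyError, TypeError):
            continue  # field missing or not the expected shape
        index[value].append(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {
        "u::john": {"firstname": "John", "address": {"city": "San Mateo"}},
        "u::dipti": {"firstname": "Dipti", "address": {"city": "san mateo"}},
        "u::ann": {"firstname": "Ann"},
    }
    # Nested and computed expression: lower-cased city inside the address object.
    by_city = create_index(docs, lambda d: d["address"]["city"].lower())
    print(by_city)
```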

Data writes*
 UPDATE … WHERE …
 Partial updates; deep updates
 DELETE … WHERE …
 Deeply nested conditions
 INSERT … VALUES …; INSERT … SELECT …
 Bulk insert; transfer and transformation
 MERGE
 INSERT or UPDATE; ETL support
*Single-document atomicity.
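
Two of the ideas above, a partial deep update and MERGE as "INSERT or UPDATE", can be modeled on an in-memory dictionary of documents. This is an illustration of the semantics only: `deep_update` and `merge` are hypothetical helpers, the dotted path notation is an assumption, and nothing here is N1QL syntax.

```python
def deep_update(doc, path, value):
    """Partial, deep update: set one nested field and leave the rest untouched.

    `path` is a dotted field path such as "address.city" (assumed notation).
    """
    *parents, leaf = path.split(".")
    target = doc
    for name in parents:
        target = target.setdefault(name, {})  # create missing levels
    target[leaf] = value


def merge(bucket, incoming):
    """MERGE as INSERT-or-UPDATE, one document at a time."""
    for key, doc in incoming.items():
        if key in bucket:
            bucket[key].update(doc)   # UPDATE: overwrite only the given fields
        else:
            bucket[key] = dict(doc)   # INSERT: new document


if __name__ == "__main__":
    bucket = {"u::john": {"firstname": "John", "address": {"city": "San Mateo", "zip": 94403}}}
    deep_update(bucket["u::john"], "address.city", "San Francisco")
    merge(bucket, {"u::john": {"age": 22}, "u::dipti": {"firstname": "Dipti"}})
    print(bucket)
```

Each step in `merge` touches a single document, in line with the single-document atomicity noted on the slide.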

Q & A
Thank you.
dipti@couchbase.com
@dborkar
