COUCHBASE 101
Dipti Borkar
Head, WW Solutions Engineering
©2015 Couchbase Inc. 2
Agenda
 Where does Couchbase fit in?
 Key Concepts
 Operations
 Cluster-wide operations
 Look at a Live Cluster
Big Data = Operational + Analytic (NoSQL + Hadoop)
Operational (NoSQL)
 Online
 Web/Mobile/IoT apps
 Millions of customers/consumers
Analytic (Hadoop)
 Offline, batch-oriented
 Analytics apps
 Hundreds of business analysts
Couchbase meets today’s & tomorrow’s requirements
Flexible data model
Consistent performance at scale
High availability
Easy, affordable scalability
24x365
Enterprises use Couchbase to enable key objectives
 360 Degree Customer View
 Profile Management
 Catalog
 Fraud Detection
 Content Management
 Internet of Things
 Digital Communication
 Real Time Big Data
 Mobile Applications
 Personalization
Key Concepts
Couchbase can act as a Key-Value Store or a Document Store
Key-Value Store
2014-06-23-10:15am : 75F
2014-06-23-11:30am : 77F
2014-06-23-02:00pm : 82F
Document Store
0001:
{"firstname": "Dipti",
"lastname": "Borkar",
"language": "English",
"time_zone": "PST",
"zip": 94403
}
Key - UTF-8 string up to 250 bytes
Value - from 0 bytes up to 20 MB (best practice: under 1 MB)
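The key and value limits above can be checked before a write. The following is a minimal sketch, not part of any Couchbase SDK: the function name `check_document` is hypothetical, and treating "20 MB" and "1 MB" as binary megabytes is an assumption.

```python
MAX_KEY_BYTES = 250                    # key limit stated on the slide
MAX_VALUE_BYTES = 20 * 1024 * 1024     # "20 MB" (assumed to mean binary megabytes)
RECOMMENDED_VALUE_BYTES = 1024 * 1024  # "best practice < 1 MB"


def check_document(key: str, value: bytes) -> list[str]:
    """Return a list of problems; an empty list means the pair fits the limits."""
    problems = []
    key_len = len(key.encode("utf-8"))  # the limit is in bytes, not characters
    if key_len > MAX_KEY_BYTES:
        problems.append(f"key is {key_len} bytes; limit is {MAX_KEY_BYTES}")
    if len(value) > MAX_VALUE_BYTES:
        problems.append("value exceeds the 20 MB limit")
    elif len(value) >= RECOMMENDED_VALUE_BYTES:
        problems.append("value is allowed but above the 1 MB best practice")
    return problems


print(check_document("0001", b'{"zip": 94403}'))
```

Measuring the key after UTF-8 encoding matters because a key with non-ASCII characters can exceed 250 bytes while being shorter than 250 characters.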
Fundamentals
Document ID or Key
 Similar to primary keys in relational databases
 Documents are partitioned based on the document ID
 ID-based document lookup is extremely fast
 Must be unique
Value
 JSON
 Binary - integers, strings, booleans
 Common binary values include serialized objects, compressed XML, compressed text, encrypted values
Metadata
 CAS value (unique identifier for concurrency)
 TTL
 Flags (optional client library metadata)
 Revision #
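The CAS value listed under metadata supports optimistic concurrency: a writer passes back the CAS it read, and the write is rejected if the document changed in between. The sketch below illustrates the idea with an in-memory stand-in. `TinyStore` and `CasMismatch` are hypothetical names, and using a simple counter as the CAS is an assumption; this is not the Couchbase client API.

```python
import itertools


class CasMismatch(Exception):
    """Raised when a document changed since the caller last read it."""


class TinyStore:
    """In-memory stand-in for a bucket; not the Couchbase SDK."""

    def __init__(self):
        self._docs = {}                      # key -> (value, cas)
        self._next_cas = itertools.count(1)  # assumption: CAS changes on every write

    def upsert(self, key, value):
        cas = next(self._next_cas)
        self._docs[key] = (value, cas)
        return cas

    def get(self, key):
        return self._docs[key]               # returns (value, cas)

    def replace(self, key, value, cas):
        _, current_cas = self._docs[key]
        if current_cas != cas:               # someone else wrote in between
            raise CasMismatch(key)
        return self.upsert(key, value)


store = TinyStore()
store.upsert("0001", {"zip": 94403})
value, cas = store.get("0001")
store.replace("0001", {**value, "zip": 94404}, cas)  # CAS still matches: accepted
```

A second `replace` with the same stale `cas` would raise `CasMismatch`, which is the signal for the application to re-read and retry.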
Benefits of JSON
 Can represent complex objects and data structures
 Very simple notation, lightweight, compact, readable
 The most common API return type for integrations
 Facebook, Twitter, you name it, return JSON
 Native to JavaScript (can be useful)
 Can be inserted straight into Couchbase (faster development)
 Serialization and deserialization are very fast
Storing and retrieving documents
[Diagram: clients read from and write to documents. Documents are user/application data that live on data buckets; data buckets live on server nodes; server nodes form a Couchbase cluster. The servers are dynamically scalable, based on hash partitioning.]
Objects Serialized to JSON and Back
User Object
string uid
string firstname
string lastname
int age
array favorite_colors
string email
add() serializes the User Object to JSON and stores it under the key u::john@couchbase.com:
{ "uid": 123456,
"firstname": "John",
"lastname": "Smith",
"age": 22,
"favorite_colors": ["blue", "black"],
"email": "john@couchbase.com"
}
get() reads the JSON back by key and deserializes it into the User Object.
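The add()/get() round trip for the User Object can be sketched in a few lines. This is an illustration only: the `bucket` dict stands in for a data bucket, and the `add` and `get` functions here are hypothetical helpers that mimic the slide's labels, not client library calls. The `u::<email>` key pattern follows the slide.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class User:
    uid: str
    firstname: str
    lastname: str
    age: int
    email: str
    favorite_colors: list = field(default_factory=list)


bucket = {}  # stand-in for a data bucket: document ID -> JSON text


def add(user: User) -> str:
    """Serialize the object to JSON and store it; fail if the key exists."""
    key = f"u::{user.email}"
    if key in bucket:
        raise KeyError(f"{key} already exists")
    bucket[key] = json.dumps(asdict(user))
    return key


def get(key: str) -> User:
    """Read the JSON back and rebuild the object."""
    return User(**json.loads(bucket[key]))


john = User("123456", "John", "Smith", 22, "john@couchbase.com", ["blue", "black"])
key = add(john)
print(key, get(key) == john)
```

Because the stored value is plain JSON, any other client that knows the key can read the same document without sharing the `User` class.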
Couchbase provides a complete Data Management solution
 High availability cache
 Key-value store
 Document database
 Embedded database
 Sync management
Multi-purpose capabilities support a broad range of apps and use cases
Enterprises often start with cache, then broaden usage to other apps and use cases
What makes Couchbase unique?
Enterprises choose Couchbase for several key advantages
 Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture
 Multi-purpose: cache, key-value store, document database, and local/mobile database in a single platform
 Simplified administration: easy to deploy & manage; integrated Admin Console, single-click cluster expansion & rebalance
 Always-on availability (24x365): data replication across nodes, clusters, and data centers
Operations
Couchbase Server Architecture
Data Manager
 Query engine
 Object-managed cache
 Storage engine
 Ports 11210 / 11211: data access
 Port 8092: query API
Cluster Manager (Erlang/OTP)
 HTTP REST management API / Web UI
 Replication, rebalance, shard state manager
 Port 8091: Admin Console
Single Node Operations - Write
[Diagram: the App Server writes a Doc to the node's Managed Cache. From the cache the Doc enters the Replication Queue (memory-to-memory replication to another node) and the Disk Queue (persistence to Disk).]
Single Node Operations - Read
[Diagram: the App Server issues Get Doc 1, and Doc 1 is served from the Managed Cache. Also shown: Disk Queue, Disk, and Replication Queue (memory-to-memory replication to another node).]
Single Node Operations – Cache Ejection
[Diagram: Docs 2 through 6 arrive from the App Server and occupy the Managed Cache; Doc 1 is ejected from the cache and remains on Disk. Also shown: Disk Queue and Replication Queue (memory-to-memory replication to another node).]
Single Node Operations – Cache Miss
[Diagram: the App Server issues Get Doc 1. Doc 1 is not in the Managed Cache, which holds Docs 2 through 6, so it is read from Disk into the cache and returned. Also shown: Disk Queue and Replication Queue (memory-to-memory replication to another node).]
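The four single-node slides (write, read, cache ejection, cache miss) can be pictured with a toy model. The sketch below is an illustration under stated assumptions, not how Couchbase Server is implemented: the `Node` class is hypothetical, the ejection order is least-recently-used as a stand-in for whatever policy the server applies, and the replication queue is left out.

```python
from collections import OrderedDict


class Node:
    """Toy single node: a bounded in-memory cache in front of a 'disk' dict."""

    def __init__(self, cache_capacity: int):
        self.cache = OrderedDict()   # managed cache, least recently used first
        self.disk_queue = []         # keys waiting to be persisted
        self.disk = {}               # persisted documents
        self.capacity = cache_capacity

    def write(self, key, doc):
        self.cache[key] = doc        # the write lands in memory first
        self.cache.move_to_end(key)
        self.disk_queue.append(key)  # persisted later, off the write path
        self._eject_if_needed()

    def flush(self):
        """Drain the disk queue, persisting each queued document."""
        for key in self.disk_queue:
            if key in self.cache:
                self.disk[key] = self.cache[key]
        self.disk_queue.clear()
        self._eject_if_needed()

    def _eject_if_needed(self):
        for key in list(self.cache):
            if len(self.cache) <= self.capacity:
                break
            if key not in self.disk_queue:   # only persisted docs are safe to drop
                del self.cache[key]

    def read(self, key):
        if key in self.cache:                # cache hit
            self.cache.move_to_end(key)
            return self.cache[key]
        doc = self.disk[key]                 # cache miss: fetch from disk
        self.cache[key] = doc
        self._eject_if_needed()
        return doc


node = Node(cache_capacity=2)
for n in (1, 2, 3):
    node.write(f"doc{n}", {"n": n})
node.flush()                  # doc1 is now on disk and gets ejected from the cache
print(node.read("doc1"))      # cache miss, served from disk and cached again
```

The point of the model is the ordering: a document is never dropped from memory before it is on disk, so a miss can always be satisfied by a disk read.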
Cluster-wide Operations
Auto sharding – Bucket and vBuckets
 Each bucket has active and replica data sets
 Each data set has 1024 virtual buckets (vBuckets)
 Documents get logically mapped to vBuckets
 A given document ID always hashes to the same vBucket
 vBuckets do not have a fixed physical server location
 The mapping between vBuckets and physical servers is called the cluster map
 Each vBucket contains 1/1024th of the data set
[Diagram: a data bucket divided into virtual buckets vB 1 … 1024]
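The key-to-vBucket-to-server lookup can be sketched as below. The speaker notes mention a CRC32 hash; taking the hash modulo 1024 is a simplification and should not be read as the exact function the client libraries use. The function names, the round-robin assignment, and the 0-based vBucket numbering (the slide counts 1 to 1024) are all assumptions made for the example.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(key: str) -> int:
    """Map a document ID to a vBucket number in the range 0..1023."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers: list[str]) -> list[str]:
    """Assign vBuckets to servers round-robin; index = vBucket, value = server."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for(key: str, cluster_map: list[str]) -> str:
    """Two-step lookup: hash the key to a vBucket, then consult the cluster map."""
    return cluster_map[vbucket_for(key)]


cluster_map = build_cluster_map(["A", "B", "C"])
key = "u::john@couchbase.com"
print(vbucket_for(key), server_for(key, cluster_map))
```

Because the hash depends only on the key, the vBucket for a document never changes; only the cluster map changes when servers are added or removed.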
Cluster Map
[Diagram: documents are read from / written to the cluster by applying a hash function to the KEY, which selects one of the logical partitions vB1 … vB1024. The cluster map assigns each vBucket to one of the physical servers A, B, C. Adding a node to scale out produces a new cluster map.]
Cluster Map
Cluster Map – 2 nodes added
Multi-Node Operations
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library holding the cluster map, read/write/update against Servers 1 to 3. Active shards: Server 1 holds 5, 2, 9; Server 2 holds 4, 7, 8; Server 3 holds 1, 3, 6. Replica shards: Server 1 holds 4, 1, 8; Server 2 holds 6, 3, 2; Server 3 holds 7, 9, 5.]
• Docs distributed evenly across servers
• Each server stores both active and replica docs
  - For a given doc, only one server holds the active copy at a time
• Client library provides app with simple interface to database
• Cluster map tells the client library which server a doc is on
  - App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
Adding Nodes
[Diagram: Servers 4 and 5 join Servers 1 to 3. Some active and replica shards move from the original servers to the new ones; App Server 1 and App Server 2 continue to read/write/update through their Couchbase client libraries and cluster maps.]
• Two servers added with one-click operation
• Docs automatically rebalance across cluster
  - Even distribution of docs
  - Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger number of servers
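"Even distribution" with "minimum doc movement" can be illustrated with a small planning function. This is a sketch of the idea only, not the server's rebalance algorithm: the `rebalance` function is hypothetical, the cluster map is assumed to be a list indexed by vBucket number, and replicas are ignored.

```python
from collections import Counter


def rebalance(cluster_map: list[str], servers: list[str]) -> list[str]:
    """Return a new vBucket->server map spread evenly over `servers`,
    moving a vBucket only when its current owner holds more than its share."""
    base, extra = divmod(len(cluster_map), len(servers))
    current = Counter(cluster_map)
    # Servers that already hold the most keep the leftover vBuckets.
    by_load = sorted(servers, key=lambda s: -current[s])
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}

    held = Counter()
    surplus = []                                   # vBuckets that must move
    for vb, owner in enumerate(cluster_map):
        if owner in target and held[owner] < target[owner]:
            held[owner] += 1                       # stays where it is
        else:
            surplus.append(vb)

    receivers = [s for s in servers for _ in range(target[s] - held[s])]
    new_map = list(cluster_map)
    for vb, server in zip(surplus, receivers):
        new_map[vb] = server
    return new_map


old_map = ["ABC"[vb % 3] for vb in range(1024)]    # three servers, round-robin
new_map = rebalance(old_map, list("ABCDE"))        # two servers added
moved = sum(o != n for o, n in zip(old_map, new_map))
print(Counter(new_map), "moved:", moved)
```

Only vBuckets that an overloaded server can spare are reassigned, so the number of moves equals the number of vBuckets the new servers need to reach their share.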
Managing failures
[Diagram: Servers 1 to 5 with active and replica shards; App Server 1 and App Server 2 access them through the Couchbase client library and cluster map. Server 3 fails, and replica copies of its active shards on other servers are promoted to active.]
• App servers accessing shards
• Requests to Server 3 fail
• Cluster detects server failed
  o Promotes replicas of shards to active
  o Updates cluster map
• Requests for docs now go to appropriate server
• Typically rebalance would follow
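The failover steps (promote replicas, update the cluster map) can be sketched as a pure function over two maps. This is a simplified picture under stated assumptions: `fail_over` is a hypothetical name, each vBucket is assumed to have exactly one replica, and both maps are lists indexed by vBucket number.

```python
def fail_over(active_map: list, replica_map: list, failed: str):
    """Promote replicas for every vBucket whose active copy was on `failed`.

    Returns new (active_map, replica_map). Replica slots lost in the failover
    are marked None until a rebalance recreates them.
    """
    new_active, new_replica = [], []
    for active, replica in zip(active_map, replica_map):
        if active == failed:
            new_active.append(replica)   # replica is promoted to active
            new_replica.append(None)     # no replica until rebalance
        else:
            new_active.append(active)
            new_replica.append(None if replica == failed else replica)
    return new_active, new_replica


active = ["S1", "S2", "S3", "S3"]        # vBuckets 0..3
replica = ["S2", "S3", "S1", "S2"]
active, replica = fail_over(active, replica, failed="S3")
print(active)    # requests for vBuckets 2 and 3 now go to S1 and S2
print(replica)
```

The `None` entries show why a rebalance typically follows: after promotion, some vBuckets are left without a replica.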
A look at a live cluster
Cross Data Center Replication
XDCR
Market leading memory-to-memory replication
[Diagram: replication between clusters in New York and San Francisco]
XDCR: Cross Data Center Replication
 Application can access both clusters (master – master)
 Scales out linearly
 Different from intra-cluster replication (“CP” versus “AP”)
XDCR: Flexible topologies
 One-one, one-many, many-one
 Differently sized and resourced clusters supported
XDCR after Write
[Diagram: inside a Couchbase Server Node, the App Server writes Doc 1 to the Managed Cache. Doc 1 then enters the Replication Queue (memory-to-memory replication to another node), the Disk Queue (persistence to Disk), and the XDCR Queue (memory-to-memory replication to the remote cluster, new in 3.0).]
Indexing and Querying Features
 Index and Query
  - Distributed indexing and querying
  - Secondary indexes of JSON document content
  - Flexible querying of indexes
 Incremental Map-Reduce
  - Distributed simple real-time analytics
  - Only considers changes due to updated data
 Full-Text Search
  - Robust integration with Elasticsearch / Solr cluster
  - Flexible full-text search and faceted search
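"Incremental" map-reduce means a changed document updates the result without reprocessing the whole data set. The sketch below shows that idea for a simple count reduction. It is a conceptual illustration, not how the view engine stores its indexes; the `IncrementalCount` class and its method names are hypothetical.

```python
class IncrementalCount:
    """Keeps a per-key count of emitted rows, updated one document at a time."""

    def __init__(self, map_fn):
        self.map_fn = map_fn      # doc -> iterable of keys to emit
        self.emitted = {}         # doc_id -> keys emitted for the stored version
        self.counts = {}          # reduce result: emitted key -> count

    def _adjust(self, keys, delta):
        for k in keys:
            self.counts[k] = self.counts.get(k, 0) + delta
            if self.counts[k] == 0:
                del self.counts[k]

    def on_change(self, doc_id, doc):
        """Apply one changed document; pass doc=None for a deletion."""
        self._adjust(self.emitted.pop(doc_id, []), -1)   # retract old rows
        if doc is not None:
            keys = list(self.map_fn(doc))
            self.emitted[doc_id] = keys
            self._adjust(keys, +1)                       # add new rows


index = IncrementalCount(lambda doc: doc.get("favorite_colors", []))
index.on_change("u1", {"favorite_colors": ["blue", "black"]})
index.on_change("u2", {"favorite_colors": ["blue"]})
index.on_change("u1", {"favorite_colors": ["red"]})      # only u1 is reprocessed
print(index.counts)
```

Each update touches only the rows that one document emitted, which is what keeps the result cheap to maintain as data changes.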
View processing after write
[Diagram: inside a Couchbase Server Node, the App Server writes Doc 1 to the Managed Cache. Doc 1 goes to the Replication Queue (to other node) and through the Disk Queue to Disk; the View engine then picks up Doc 1.]
Couchbase Server Architecture - Views
[Diagram: App Server 1 and App Server 2, each with a Couchbase client library and cluster map, query Servers 1 to 3; each server holds active and replica shards.]
• Indexing work is distributed amongst nodes
• Large data set possible
• Parallelize the effort
• Each node has index for data stored on it
• Queries combine the results from required nodes
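The last two bullets describe a scatter-gather pattern: each node indexes its own documents, and a query merges the per-node results. A minimal sketch, assuming each local index is a sorted list of (value, document ID) pairs; the function names are hypothetical and nothing here reflects the view engine's on-disk format.

```python
import heapq


def build_local_index(docs: dict, field: str) -> list:
    """Index one node's documents: sorted (field value, doc ID) pairs."""
    return sorted((doc[field], doc_id) for doc_id, doc in docs.items() if field in doc)


def query(node_indexes: list, limit=None) -> list:
    """Merge the per-node sorted indexes into one globally sorted result."""
    rows = list(heapq.merge(*node_indexes))
    return rows if limit is None else rows[:limit]


node1 = build_local_index({"a": {"age": 30}, "b": {"age": 22}}, "age")
node2 = build_local_index({"c": {"age": 25}}, "age")
print(query([node1, node2]))
```

Since every local index is already sorted, the merge step does not need to re-sort the combined rows.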
Couchbase Elastic Search Connector
Couchbase Solr Connector
N1QL
Why SQL for NoSQL?
Why SQL for NoSQL
 JSON document model provides
  - Rich Structure (no assembly)
  - Structure Evolution (flexible schema, seamless change)
 SQL provides
  - Query across relationships
  - Query in general
 Why SQL for JSON?
  - To address all these data concerns
  - N1QL is SQL for JSON
Models for Representing Data
Data Concern        | Relational Model                                        | JSON Document Model (NoSQL)
Rich Structure      | Multiple flat tables; constant assembly and disassembly | Documents; no assembly required!
Relationships       | Represented; queried (SQL)                              | Represented; queried? Not so far…
Value Evolution     | Data can be updated                                     | Data can be updated
Structure Evolution | Uniform and rigid; change is disruptive and manual      | Flexible; change is seamless and data-driven
SELECT
 Standard SELECT pipeline
  - SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT, OFFSET
 Queries across relationships
  - JOINs
  - Subqueries
  - NEST: a JOIN that embeds child objects within their parent
  - UNNEST: a JOIN that surfaces nested objects as top-level data
 Aggregation
 Set operators
  - UNION, INTERSECT, EXCEPT
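NEST and UNNEST are the two operations on the SELECT slide that have no direct SQL counterpart, so a small model may help. The functions below mimic their effect on plain Python dicts. They are a conceptual sketch only: the names, parameters, and matching rule are assumptions, and the real N1QL semantics (join conditions, handling of parents without children) are not reproduced.

```python
def unnest(docs, array_field, alias):
    """One output row per element of doc[array_field], paired with its parent."""
    return [
        {**doc, alias: item}
        for doc in docs
        for item in doc.get(array_field, [])
    ]


def nest(parents, children, parent_key, child_fk, alias):
    """Embed, in each parent, the list of children whose foreign key matches."""
    by_parent = {}
    for child in children:
        by_parent.setdefault(child[child_fk], []).append(child)
    return [{**p, alias: by_parent.get(p[parent_key], [])} for p in parents]


users = [{"uid": "1", "name": "John", "colors": ["blue", "black"]}]
orders = [{"oid": "o1", "uid": "1"}, {"oid": "o2", "uid": "1"}, {"oid": "o3", "uid": "2"}]

print(unnest(users, "colors", "color"))          # two rows, one per color
print(nest(users, orders, "uid", "uid", "orders"))
```

UNNEST turns one document with an array into several flat rows, while NEST goes the other way and gathers related documents into an array on the parent.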
N1QL Architecture
 Single-node installation; services defined dynamically
 Query service accesses the Index and Data services to formulate the response
 All queries and direct access are topology-aware and dynamically scalable
Indexing
 CREATE / DROP INDEX
 Two types of indexes
  - View indexes
  - GSI (global secondary indexes; new)
 Can index any data expression
  - Nested / complex expressions
  - Computed expressions
 EXPLAIN
Data writes*
 UPDATE … WHERE …
  - Partial updates; deep updates
 DELETE … WHERE …
  - Deeply nested conditions
 INSERT … VALUES …; INSERT … SELECT …
  - Bulk insert; transfer and transformation
 MERGE
  - INSERT or UPDATE; ETL support
*Single-document atomicity.
Q & A
Thank you.
dipti@couchbase.com
@dborkar


Editor's Notes

  • #4 KEY POINTS: BIG DATA IS NOT ONE THING – IT’S A COMBINATION OF OPERATIONAL (NOSQL) AND ANALYTICAL DATABASES. YOU NEED BOTH. COUCHBASE PROVIDES THE OPERATIONAL SOLUTION. Big data has two major pieces: operational and analytical. Operational is about real-time, online, interactive, customer/consumer-facing workloads that process data at high velocity. Analytical is about offline analytics: often batch-oriented, time-consuming to process, and directly touching relatively few users (business analysts). These two pieces together form “Big Data”. There is some overlap: NoSQL can deliver some analytics, and Hadoop can deliver some operational capability. But in general each technology is designed for a separate purpose: Couchbase fits on the operational side, Hadoop on the analytics side.
  • #5 KEY POINTS: COUCHBASE DELIVERS ALL THE CAPABILITIES NEEDED TO MEET TODAY’S REQUIREMENTS FOR PERFORMANCE, SCALABILITY, AVAILABILITY, AND DATA MODEL FLEXIBILITY. THESE TRANSLATE INTO MAJOR BENEFITS FOR YOUR BUSINESS. Couchbase was purpose-built to solve today’s requirements for enterprise-class, mission-critical web and mobile applications. Specifically, Couchbase delivers the following capabilities. Fast performance at scale: sub-millisecond latency to enable highly responsive applications, for millions or even hundreds of millions of users. Easy, affordable scalability: Couchbase is a distributed database that scales out on commodity hardware with push-button simplicity. We make it very easy to add or remove capacity on demand with no system downtime, on premises, in the cloud, wherever you want. High availability: Couchbase automatically replicates your data across your servers, clusters, and data centers, so it’s always available, 24x7. And Couchbase doesn’t require any downtime to maintain. Flexible data model: Couchbase gives you complete flexibility to handle any kind of data, and to change your data model on the fly to accommodate new data attributes or new data types. It’s the kind of flexibility that developers love, because it gets rid of the rigid schemas that slow them down, so developers can build applications faster and more easily. All this adds up to powerful benefits for your enterprise: faster development and time to market, better business agility, improved customer experience, increased loyalty and revenue, and lower IT costs with increased efficiency.
  • #6 KEY POINT: ENTERPRISES ARE USING COUCHBASE ACROSS A RANGE OF MISSION CRITICAL USE CASES. As the slide shows, Couchbase supports a wide range of use cases, from Profile Management to Fraud Detection. Each use case has its own set of requirements – some need very high performance, some need very high availability, some need flexibility of the data model. The ability to meet all of these requirements is what has driven adoption of Couchbase.
  • #8 All information that you store in Couchbase Server is stored as documents with keys. Keys are unique identifiers for a document. Values are either JSON documents or, if you choose, binary data such as a byte stream, simple data types (for example integers and strings), or other forms of serialized objects. Keys are also known as document IDs and serve the same function as a SQL primary key. A key in Couchbase Server can be any string, including strings with separators and identifiers, such as ‘person_93679.’ A key is unique. When Couchbase Server is used as a store for JSON documents, the records can be indexed and queried. Couchbase Server provides a JavaScript-based query engine to find records based on field values.
  • #9 Key selection is very important. Keys are hard to change at a later point. IDs are similar to the primary key defined when a table is created. Lookups are extremely fast because clients know exactly which server the document belongs to, based on consistent hashing. An ID can appear only once per bucket. In Couchbase we call them buckets; a bucket is equivalent to a table or a collection. Selecting your ID depends on your document model as well. Questions to consider and options: UUIDs, or hand-crafted IDs. In some NoSQL database systems, data is sorted by ID. If you use prefixes for related objects, you can look up related objects faster. Selecting a clever ID can make your life a lot easier.
  • #13 KEY POINT: COUCHBASE PROVIDES A SET OF MULTI-PURPOSE, CORE CAPABILITIES THAT SUPPORT A BROAD RANGE OF APPLICATIONS AND USE CASES, ALL IN A SINGLE DATA MANAGEMENT PLATFORM. Couchbase provides a set of technology capabilities to support a broad range of applications and use cases: High Availability Cache: Couchbase provides an integrated managed object cache, so you can start out using Couchbase as a high availability cache on top of your existing relational database. For example, you can use Couchbase as a session store in front of your relational database, if your relational DB is struggling to keep up with the load required for online interactive applications. Key-Value Store: Many customers start with Couchbase as a cache and then broaden their usage to other capabilities, like using Couchbase as a Key-Value Store for things like Profile Management. Document Database: From there, you can grow into using Couchbase as a Document Database, where you can do more with capabilities like indexing and Cross Data Center Replication. Embedded Database: Couchbase also provides an embedded database called Couchbase Lite. It’s a purpose-built database for the device, so you can build applications that are always available and always work, whether offline or online. Sync Management: Finally, as part of our solution for mobile applications, we provide Couchbase Sync Gateway, which automatically synchronizes data on the device with Couchbase Server in the cloud so your developer doesn’t have to write code to manage the complex sync process. Starting with cache and then expanding to other capabilities is often a good way to learn the technology and get comfortable with Couchbase for a wider set of use cases.
  • #14 Couchbase has emerged as a leading NoSQL provider for a number of reasons. Best in performance and scalability: we’ve engineered Couchbase from the ground up for high performance and scalability. Couchbase is designed to deliver sub-millisecond responsiveness with very high throughput for both reads and writes. We consistently outperform competitors like MongoDB and DataStax in multiple independent benchmarks. Our performance advantage is driven in large part by our memory-centric architecture, which includes an integrated managed object cache and stream-based replication. Broad use case support: we’re the only NoSQL provider that has consolidated distributed cache, key-value store, and a JSON-based document database in a single platform. This means customers can use Couchbase for a much broader range of applications. Integrated mobile solution: we’re the only vendor that provides an end-to-end NoSQL mobile solution, which allows customers to easily build mobile apps that run great online or offline. It includes a JSON database embedded on the device, along with a prebuilt syncing tier, so apps run great on the device even with a poor network connection or no connectivity at all. Data on the device auto-syncs with the backend server when a connection is available. Simplified administration: we’ve designed Couchbase to be exceptionally easy to deploy and manage. Features such as an integrated Admin Console and single-click cluster expansion & rebalance dramatically increase admin efficiency.
  • #16 Each Couchbase node is exactly the same. All nodes are broken down into two components: A data manager (on the left) and a cluster manager (on the right). It’s important to realize that these are separate processes within the system specifically designed so that a node can continue serving its data even in the face of cluster problems like network disruption. The data manager is written in C and C++ and is responsible both for the object caching layer, persistence layer and querying engine. It is based off of memcached and so provides a number of benefits; -The very low lock contention of memcached allows for extremely high throughput and low latencies both to a small set of documents (or just one) as well as across millions of documents -Being compatible with the memcached protocol means we are not only a drop-in replacement, but inherit support for automatic item expiration (TTL), atomic incrementer. -We’ve increased the maximum object size to 20mb, but still recommend keeping them much smaller -Support for both binary objects as well as natively supporting JSON documents -All of the metadata for the documents and their keys is kept in RAM at all times. While this does add a bit of overhead per item, it also allows for extremely fast “miss” speeds which are critical to the operation of some applications….we don’t have to scan a disk to know when we don’t have some data. The cluster manager is based on Erlang/OTP which was developed by Ericsson to deal with managing hundreds or even thousands of distributed telco switches. This component is responsible for configuration, administration, process monitoring, statistics gathering and the UI and REST interface. Note that there is no data manipulation done through this interface.
  • #24 The application makes a call for a key called NYC MQ1. We run the key through the CRC32 function, and the result of that hash function points to vbucket 3, which in turn points to Couchbase server number 1.
  • #25 We now run a different key through the hash and come up with a different vbucket, vbucket 4, which points to server 3.
  • #26 We now run a different key through the hash and come up with a different vbucket, vbucket 4, which points to server 3.
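The key-to-server lookup described in these notes can be sketched in a few lines of Python. This is a simplified illustration, not the exact function the client libraries use: the vBucket count of 1024, the round-robin map, and the names `vbucket_for_key`, `build_vbucket_map`, and `server_for_key` are assumptions made for the example.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Hash the key with CRC32 and fold the result into a vBucket number.

    Simplification: real clients may use different bits of the CRC32 value.
    """
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_vbucket_map(servers, num_vbuckets: int = NUM_VBUCKETS):
    """Hypothetical map: assign vBuckets to servers round-robin."""
    return [servers[i % len(servers)] for i in range(num_vbuckets)]


def server_for_key(key: str, vbucket_map) -> str:
    """Two-step lookup: key -> vBucket -> server that owns the vBucket."""
    return vbucket_map[vbucket_for_key(key, len(vbucket_map))]


if __name__ == "__main__":
    vbucket_map = build_vbucket_map(["server1", "server2", "server3"])
    for key in ("NYC MQ1", "another-key"):
        print(key, vbucket_for_key(key), server_for_key(key, vbucket_map))
```

Because the client only needs the key and the current map, it can go straight to the right server without asking a coordinator; when the map changes (for example after a rebalance), only the vBucket-to-server step changes, not the hash.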
  • #32 KEY POINT: COUCHBASE’S MEMORY-TO-MEMORY DATA REPLICATION IS MARKET DEFINING AND UNIQUE TO COUCHBASE – IT’S ONE OF THE KEY REASONS ENTERPRISES CHOOSE COUCHBASE OVER RELATIONAL DATABASES AND OTHER NOSQL PRODUCTS. The built-in replication in Couchbase is extremely fast and highly scalable. It’s memory to memory, which means it’s not limited by the slower speed of reading data from a disk, so it’s very, very fast. And it’s extremely scalable: you can achieve very high throughput with large numbers of writes going from one cluster to another. You can have a different topology on each side. Obviously, both sides need to be sized appropriately so they can handle the load. This memory-to-memory replication is market defining and unique to Couchbase. No other solution like this, built into the database, exists in the market today. This is one of the key reasons enterprises choose Couchbase over other NoSQL products like MongoDB and Cassandra, and over relational databases.
  • #33 Every node must be able to talk to every other node in each cluster…this has certain implications for cloud deployments: http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-tasks-xdcr-cloud.html
  • #37 This slide has a click-by-click animation. 1.  (click) A set request comes in from the application. 2.  Couchbase Server responds that the key is written. 3. (click) Couchbase Server then replicates the data out to memory in the other nodes. At the same time, it puts the data into a write queue to be persisted to disk. (click) Once it is on disk, the item is processed by the view engine and sent out over any configured XDCR link to one or more clusters.
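The ordering in this write path (acknowledge from memory first, then replicate and persist asynchronously) can be modeled with a toy in-memory node. This is a sketch under stated assumptions, not Couchbase code: `ToyNode`, its queues, and the method names are hypothetical, and the background work is triggered by explicit method calls instead of real threads.

```python
from collections import deque


class ToyNode:
    """Toy model of one node: RAM cache, replication queue, disk write queue."""

    def __init__(self, name):
        self.name = name
        self.memory = {}            # managed object cache
        self.disk = {}              # stand-in for persistent storage
        self.replication_queue = deque()
        self.disk_queue = deque()
        self.replicas = []          # other ToyNode instances holding replicas

    def set(self, key, value):
        """Write to memory, enqueue the background work, acknowledge at once."""
        self.memory[key] = value
        self.replication_queue.append((key, value))
        self.disk_queue.append(key)
        return "stored"             # the application sees this before disk I/O

    def drain_replication(self):
        """Memory-to-memory replication to the other nodes."""
        while self.replication_queue:
            key, value = self.replication_queue.popleft()
            for replica in self.replicas:
                replica.memory[key] = value

    def flush_to_disk(self):
        """Persist queued items; return the keys now on disk.

        In the flow described above, these persisted items are what the
        view engine and any XDCR link would process next.
        """
        persisted = []
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.memory[key]
            persisted.append(key)
        return persisted
```

The point of the model is the ordering: `set` returns before either queue is drained, which is why write latency is governed by memory speed, and why an item can briefly exist in RAM without yet being replicated or on disk.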
  • #38 Indexing and querying: Distributed indexes are created on the fields in JSON documents; these are called Views in Couchbase Server. Views are queried to find the objects you are interested in, e.g. range queries to find all players that have black sheep on their farm. (If asked: there is no ad-hoc query language. Indexes are described via simple JavaScript.) Incremental Map Reduce: “Normal” map reduce is batch based, i.e. it has to run across all data every time, so you don’t get updated results often, especially over large data sets. Incremental map reduce only considers data that has changed and then calculates the updated result. This happens fast, in near real time. It is distributed across all nodes, so it is able to cope with large amounts of data. It does a single map and reduce step, so it is great for simple analytics like leaderboards, counts, and sums across data having specific attributes/characteristics. Full Text Search: Integration with a separate Elasticsearch cluster, using XDCR technology. It is robust, so it copes efficiently with node failures, rebalances, or interrupted connections to keep the full text index in sync. Elasticsearch is a very fast, JSON document based, open source full text indexing solution built on Apache Lucene (the same library used by Solr, which more people will know). Elasticsearch is also clustered, scales easily, and provides very flexible and powerful full text search capabilities.
  • #39 This slide has a click-by-click animation. 1.  (click) A set request comes in from the application. 2.  Couchbase Server responds that the key is written. 3. (click) Couchbase Server then replicates the data out to memory in the other nodes. At the same time, it puts the data into a write queue to be persisted to disk. (click) Once it is on disk, the item is processed by the view engine and sent out over any configured XDCR link to one or more clusters.
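The difference between batch and incremental map reduce can be illustrated with a small Python sketch. It is not how the view engine is implemented (real views are written in JavaScript and stored in an on-disk index); the class `IncrementalCountView`, the map function `animals_on_farm`, and the document shape are all hypothetical. The idea shown is that an update touches only the rows emitted by the changed document.

```python
from collections import Counter


class IncrementalCountView:
    """Keeps a per-key count that is adjusted only for changed documents."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.emitted = {}        # doc_id -> keys emitted by the last map run
        self.counts = Counter()  # reduced result: key -> number of documents

    def update(self, doc_id, doc):
        """Apply one changed document; pass doc=None for a deletion."""
        # Undo the contribution of the previous version of this document.
        for key in self.emitted.pop(doc_id, []):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
        # Map the new version and add its contribution.
        if doc is not None:
            keys = list(self.map_fn(doc))
            self.emitted[doc_id] = keys
            for key in keys:
                self.counts[key] += 1


def animals_on_farm(doc):
    """Hypothetical map function: emit one key per animal on a player's farm."""
    for animal in doc.get("animals", []):
        yield animal


if __name__ == "__main__":
    view = IncrementalCountView(animals_on_farm)
    view.update("player1", {"animals": ["black sheep", "cow"]})
    view.update("player2", {"animals": ["black sheep"]})
    view.update("player1", {"animals": ["cow"]})  # only player1 is re-mapped
    print(dict(view.counts))
```

A batch job would re-run the map function over every player on each refresh; here the cost of an update is proportional to the size of the changed document, which is what makes near-real-time results feasible for counts, sums, and leaderboards.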
  • #41 http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search An Elasticsearch cluster is fed the documents from the Couchbase Server cluster. Elasticsearch indexes the fields (which ones is configurable) and by default stores only references back to the document ID. The application does document access via the Couchbase Server cluster and uses the views and incremental map reduce on the Couchbase cluster. For full text queries, it queries the Elasticsearch cluster directly (simple HTTP and JSON interface). The full text queries typically return the IDs of the matching documents. Documents are then retrieved from the Couchbase Server cluster. This way the high throughput document access always comes from the high performance Couchbase cluster.
  • #42 http://blog.couchbase.com/couchbase-and-full-text-search-couchbase-transport-elastic-search An Elasticsearch cluster is fed the documents from the Couchbase Server cluster. Elasticsearch indexes the fields (which ones is configurable) and by default stores only references back to the document ID. The application does document access via the Couchbase Server cluster and uses the views and incremental map reduce on the Couchbase cluster. For full text queries, it queries the Elasticsearch cluster directly (simple HTTP and JSON interface). The full text queries typically return the IDs of the matching documents. Documents are then retrieved from the Couchbase Server cluster. This way the high throughput document access always comes from the high performance Couchbase cluster.
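The two-step pattern in these notes (search returns IDs, documents come from the document store) can be sketched with in-memory stand-ins. Nothing here calls Elasticsearch or Couchbase: `ToySearchIndex` is a hypothetical word index playing the role of the search cluster, and a plain dict plays the role of the Couchbase bucket.

```python
class ToySearchIndex:
    """Stand-in for the search cluster: stores only document IDs per token."""

    def __init__(self, fields):
        self.fields = fields      # which document fields get indexed
        self.postings = {}        # token -> set of document IDs

    def index(self, doc_id, doc):
        """Index the configured fields; keep a reference to the ID only."""
        for field in self.fields:
            for token in str(doc.get(field, "")).lower().split():
                self.postings.setdefault(token, set()).add(doc_id)

    def search(self, term):
        """Return the IDs of matching documents, not the documents."""
        return sorted(self.postings.get(term.lower(), set()))


def full_text_lookup(term, search_index, document_store):
    """Step 1: ask the search index for IDs. Step 2: fetch by ID."""
    ids = search_index.search(term)
    return [document_store[doc_id] for doc_id in ids if doc_id in document_store]


if __name__ == "__main__":
    store = {
        "0001": {"firstname": "Dipti", "bio": "works on distributed databases"},
        "0002": {"firstname": "Alex", "bio": "builds mobile apps"},
    }
    index = ToySearchIndex(fields=["bio"])
    for doc_id, doc in store.items():
        index.index(doc_id, doc)   # in the real setup, replication feeds this
    print(full_text_lookup("databases", index, store))
```

Keeping only IDs in the search tier means the index stays small and the authoritative copy of each document is always read from the key-value path, which is the fast path described earlier in the deck.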
  • #47 We now run a different key through the hash and come up with a different vbucket, vbucket 4, which points to server 3.