Couchbase is a popular open source NoSQL platform used by giants like Apple, LinkedIn, Walmart, Visa, and many others, and runs on premises or in public, hybrid, and multi-cloud environments.
Couchbase integrates a sub-millisecond key/value cache with a document-based database, a unique combination, along with many more services and features.
In this session we will talk about the unique architecture of Couchbase; N1QL, its SQL-like, ANSI-compliant query language; and the services and features Couchbase offers, and demonstrate some of them live.
We will also discuss what makes Couchbase different from other popular NoSQL platforms like MongoDB, Cassandra, Redis, and DynamoDB.
At the end we will talk about the next version of Couchbase (6.5), due to be released later this year, and about Couchbase 7.0, due next year.
MongoDB is an open-source document database and the leading NoSQL database, written in C++.
MongoDB has official drivers for a variety of popular programming languages and development environments. There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.
We believe that security *IS* a shared responsibility: when we give developers the power to create infrastructure, security becomes their responsibility, too.
During this meetup, we'd like to share our experience implementing security best practices that development teams can apply directly to build more robust and secure cloud environments. Make cloud security your team's sport!
Cassandra Meetup: Real-time Analytics using Cassandra, Spark and Shark at Ooyala (DataStax Academy)
What You Will Learn At This Meetup:
• Review of Cassandra analytics landscape: Hadoop & HIVE
• Custom input formats to extract data from Cassandra
• How Spark & Shark increase query speed & productivity over standard solutions
Abstract
This session covers our experience using the Spark and Shark frameworks for running real-time queries on top of Cassandra data. We will start by surveying the current Cassandra analytics landscape, including Hadoop and HIVE, and touch on the use of custom input formats to extract data from Cassandra. We will then dive into Spark and Shark, two memory-based cluster computing frameworks, and how they enable often dramatic improvements in query speed and productivity over the standard solutions today.
About Evan Chan
Evan Chan is a Software Engineer at Ooyala. In his own words: I love to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies. I am a big believer in GitHub, open source, and meetups, and have given talks at conferences such as the Cassandra Summit 2013.
South Bay Cassandra Meetup URL: http://www.meetup.com/DataStax-Cassandra-South-Bay-Users/events/147443722/
NoSQL Analytics: JSON Data Analysis and Acceleration in MongoDB World (Ajay Gupte)
In the analytics world, you often need to process many millions or billions of documents to generate a single report. Novel techniques have been developed for exploiting modern processor architectures (larger on-chip caches, SIMD processing, compression, vector processing, a columnar approach), and this technology is now available for processing your large JSON data. This talk will discuss analysis of JSON data using advanced data warehousing techniques, making it simple and seamless for the application/tool developer.
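As a rough, library-free sketch of the columnar idea the abstract alludes to (all field names here are illustrative, not the speaker's actual implementation): instead of scanning each JSON document row by row, the documents are shredded into per-field arrays, so an aggregate touches only the columns it needs.

```python
# Shred JSON documents into per-field columns, then aggregate one column.
docs = [
    {"user": "a", "amount": 10.0, "country": "US"},
    {"user": "b", "amount": 25.5, "country": "DE"},
    {"user": "c", "amount": 4.5,  "country": "US"},
]

# Columnar layout: one list per field instead of one dict per document.
columns = {key: [d[key] for d in docs] for key in docs[0]}

# A report like "total amount for US users" now reads only two columns.
total_us = sum(
    amt for amt, c in zip(columns["amount"], columns["country"]) if c == "US"
)
print(total_us)  # 14.5
```

Real engines add compression and SIMD on top of exactly this layout; the data-access pattern is the same.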
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra (Caserta)
Businesses are generating and ingesting an unprecedented volume of structured and unstructured data to be analyzed. What is needed is a scalable Big Data infrastructure that processes and parses extremely high volumes in real time and calculates aggregations and statistics. Banking trade data, where volumes can exceed billions of messages a day, is a perfect example.
Firms are fast approaching 'the wall' in terms of scalability with relational databases. They must stop imposing relational structure on analytics data, map raw trade data to a data model at low latency, persist the mapped data to disk, and handle ad-hoc requests for data analytics.
Joe discusses and introduces NoSQL databases, describing how they are capable of scaling far beyond relational databases while maintaining performance, and shares a real-world case study that details the architecture and technologies needed to ingest high-volume data for real-time analytics.
For more information, visit www.casertaconcepts.com
Slide deck presented at http://devternity.com/ on MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistence models, as well as the definition of documents and general data structures.
Open Source 101 and Quest InSync presentations, March 30th, 2021, on MySQL Ind... (Dave Stokes)
Speeding up queries on a MySQL server with indexes and histograms is not a mysterious art but simple engineering. This presentation is an in-depth introduction that was presented on March 30th to the Quest InSync and Open Source 101 conferences.
MySQL can now be used as a NoSQL JSON document store, so you get the best of the NoSQL and SQL worlds. This talk covers the features of the X DevAPI and the MySQL Document Store, and how to use relational tables with the new SQL features.
Polyglot Database - Linuxcon North America 2016 (Dave Stokes)
Many relational databases are adding NoSQL features to their products. So what happens when you can get direct access to the data as a key/value pair, or store an entire document in a column of a relational table, and more?
Apache Cassandra is a leading open-source distributed database capable of amazing feats of scale, but its data model requires a bit of planning for it to perform well. Of course, the nature of ad-hoc data exploration and analysis requires that we be able to ask questions we hadn’t planned on asking—and get an answer fast. Enter Apache Spark.
Spark is a distributed computation framework optimized to work in-memory, and heavily influenced by concepts from functional programming languages. It’s exactly what a Cassandra cluster needs to deliver real-time, ad-hoc querying of operational data at scale.
In this talk, we’ll explore Spark and see how it works together with Cassandra to deliver a powerful open-source big data analytic solution.
What Your Database Query is Really Doing (Dave Stokes)
Do you ever wonder what your database server is REALLY doing with that query you just wrote? This is a high-level overview of the process of running a query.
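One concrete way to peek at what the server does with a query is to ask it for its plan. The sketch below uses SQLite (stdlib, purely as a stand-in; the talk itself presumably uses MySQL's `EXPLAIN`) to show whether an index is chosen for a lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_name ON users (name)")

# Ask the engine how it intends to execute the query, without running it.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE name = ?", ("alice",)
).fetchall()
for row in plan:
    print(row)  # the detail column mentions idx_name when the index is used
```

The same habit (read the plan before trusting the query) transfers directly to MySQL, Postgres, and most other engines.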
Hadoop + Cassandra: Fast queries on data lakes, and a Wikipedia search tutorial (Natalino Busa)
Today’s services rely on massive amounts of data to be processed, but are required at the same time to be fast and responsive. Building fast services on big data batch-oriented frameworks is definitely a challenge. At ING, we have worked on a stack that can alleviate this problem. Namely, we materialize data models by map-reducing Hadoop queries from Hive into Cassandra. Instead of sinking the results back to HDFS, we propagate the results into Cassandra key-value tables. Those Cassandra tables are finally exposed via an HTTP API front-end service.
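A toy, in-memory sketch of that materialization pattern (nothing here is ING's actual code; the names are invented): a batch step reduces raw events into a key-value table, and the serving step answers each request with a single lookup instead of re-running the batch query.

```python
from collections import defaultdict

# Raw "data lake" events, as a batch job would read them from Hive/HDFS.
events = [
    ("alice", 3), ("bob", 1), ("alice", 2), ("carol", 5),
]

# Batch step: reduce events into a key-value table (stand-in for Cassandra).
kv_table = defaultdict(int)
for user, count in events:
    kv_table[user] += count

# Serving step: the HTTP front-end answers each request with one lookup.
def get_user_total(user):
    return kv_table.get(user, 0)

print(get_user_total("alice"))  # 5
```

The latency win comes from moving all the heavy computation into the batch step, so the online path is O(1) per request.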
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...) (Michael Rys)
From theory to implementation - follow the steps of implementing an end-to-end analytics solution illustrated with some best practices and examples in Azure Data Lake.
During this full training day we will share the architecture patterns, tooling, learnings and tips and tricks for building such services on Azure Data Lake. We take you through some anti-patterns and best practices on data loading and organization, give you hands-on time and the ability to develop some of your own U-SQL scripts to process your data and discuss the pros and cons of files versus tables.
These were the slides presented at the SQLBits 2018 Training Day on Feb 21, 2018.
N1QL = SQL + JSON. N1QL gives developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. We begin with a brief overview. Couchbase 5.0 has language and performance improvements for pagination, index exploitation, integration, and more. We’ll walk through scenarios, features, and best practices.
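To make the "SQL + JSON" idea concrete, here is a hypothetical N1QL-style query and a plain-Python rendering of its semantics over the same documents (the query text, bucket name, and fields are illustrative only, not from the session):

```python
# Hypothetical N1QL-style query:
#   SELECT name, city FROM travel WHERE country = 'France' ORDER BY name
docs = [
    {"name": "Lyon Hotel", "city": "Lyon", "country": "France"},
    {"name": "Berlin Inn", "city": "Berlin", "country": "Germany"},
    {"name": "Arles B&B", "city": "Arles", "country": "France"},
]

# The same filter / project / order steps expressed over JSON documents.
result = sorted(
    ({"name": d["name"], "city": d["city"]}
     for d in docs if d["country"] == "France"),
    key=lambda r: r["name"],
)
print(result)  # Arles B&B first, then Lyon Hotel
```

The point of N1QL is that the declarative form on top scales to nested documents, joins, and indexes, while keeping familiar SQL semantics.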
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17 (Aaron Benton)
Couchbase Server is a NoSQL document database with a distributed architecture for performance, scalability, and availability. It enables developers to build applications easier and faster by leveraging the power of SQL with the flexibility of JSON.
Aaron Benton is an Applications Architect at SHOP.COM. He has used Couchbase in a number of different applications and will share his experience with the product.
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5 (Keshav Murthy)
N1QL gives developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. We’ll begin this session with a brief overview of N1QL and then explore some key enhancements we’ve made in the latest versions of Couchbase Server. Couchbase Server 5.0 has language and performance improvements for pagination, index exploitation, integration, index availability, and more. Couchbase Server 5.5 will offer even more language and performance features for N1QL and global secondary indexes (GSI), including ANSI joins, aggregate performance, index partitioning, auditing, and more. We’ll give you an overview of the new features as well as practical use case examples.
NoSQL databases are really popular in the Big Data landscape, but SQL semantics are taking their revenge. Instead of learning many DSLs, developers prefer to use the well-known and universal SQL query language, so roughly all big data solutions are being forced to support SQL semantics over their data models.
From Document to Graph DBs, from search to streaming platforms, all the ways to query Big data through SQL.
NewSQL - Deliverance from BASE and back to SQL and ACID (Tony Rogerson)
There are a number of NewSQL products now on the market, such as VoltDB and Postgres-XL. These promise NoSQL performance and scalability but with ACID and relational concepts implemented with ANSI SQL.
This session will cover why NoSQL came about, why it's had its day, and why NewSQL will become the backbone of the Enterprise for OLTP and Analytics.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You'll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform in order to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
Machine Learning Essentials Demystified part 2 | Big Data Demystified (Omid Vahdaty)
Machine Learning Essentials Abstract:
Machine Learning (ML) is one of the hottest topics in the IT world today. But what is it really all about?
In this session we will talk about what ML actually is and in which cases it is useful.
We will talk about a few common algorithms for creating ML models and demonstrate their use with Python. We will also take a peek at Deep Learning (DL) and Artificial Neural Networks, explain how they work (without too much math), and demonstrate a DL model with Python.
The target audience is developers, data engineers, and DBAs who do not have prior experience with ML and want to know how it actually works.
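As a small taste of the kind of model such a session builds (a made-up toy example, not the session's actual demo): fitting y ≈ w·x with plain gradient descent, the core loop behind many ML algorithms.

```python
# Toy training data generated from y = 2x; training should recover w ≈ 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0    # model parameter, starts untrained
lr = 0.01  # learning rate
for _ in range(1000):
    # Gradient of mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # ≈ 2.0
```

Swap the single parameter for a vector and the line for a network, and this same loop is what libraries like scikit-learn and TensorFlow automate.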
Machine Learning Essentials Demystified part 1 | Big Data Demystified (Omid Vahdaty)
The technology of fake news between a new front and a new frontier | Big Dat... (Omid Vahdaty)
My name is Nitzan Or Kadrai, and I stand at the interesting intersection of technology, media, and activism.
For the past four and a half years I have been working at Yedioth Ahronoth, first as the product manager of the ynet app and today as head of innovation.
I was a partner in founding the Start-Ach nonprofit, which provides development and product services for other nonprofits, and recently I have been building a community whose goal is to explore the technological aspects of the fake news phenomenon and to build applicative tools for smart management of the fight against it.
The talk will cover the fake news phenomenon. We will focus on the technology that enables the spread of fake news and look at examples of how this technology is used.
We will examine the scope of the phenomenon on social networks and learn how the tech giants are trying to fight it.
Big Data in 200 km/h | AWS Big Data Demystified #1.3 (Omid Vahdaty)
What we're about
A while ago I entered the challenging world of Big Data. As an engineer, at first I was not so impressed with this field. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article; it covers only a small fraction of the technologies in the Big Data industry…
Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS infrastructure to answer the basic questions of anyone starting their way in the big data world.
• How to transform data (TXT, CSV, TSV, JSON) into Parquet or ORC
• Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL?
• How to handle streaming?
• How to manage costs?
• Performance tips
• Security tips
• Cloud best practices
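The first bullet above can be sketched in miniature with only the standard library: parsing CSV rows into typed JSON records. (Writing actual Parquet or ORC typically goes through pandas/pyarrow or Spark, which is omitted here; the column names are invented.)

```python
import csv
import io
import json

# Stand-in CSV input; in practice this would be a file on S3 or HDFS.
raw = "user,clicks\nalice,3\nbob,7\n"

records = [
    {"user": row["user"], "clicks": int(row["clicks"])}
    for row in csv.DictReader(io.StringIO(raw))
]
print(json.dumps(records))  # typed records ready for a columnar writer
```

The typing step (here, `int(...)`) is the part that matters: columnar formats like Parquet need a schema, so the conversion is where you decide it.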
Some of our online materials:
Website:
https://big-data-demystified.ninja/
Youtube channels:
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber
Meetup:
https://www.meetup.com/AWS-Big-Data-Demystified/
https://www.meetup.com/Big-Data-Demystified
Facebook Group :
https://www.facebook.com/groups/amazon.aws.big.data.demystified/
Facebook page (https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/)
Audience:
Data Engineers
Data Science
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D
Making your analytics talk business | Big Data Demystified (Omid Vahdaty)
MAKING YOUR ANALYTICS TALK BUSINESS
Aligning your analysis to the business is fundamental for all types of analytics (digital or product analytics, business intelligence, etc.) and is vertical- and tool-agnostic. In this talk we will build on the discussion that was started in the previous meetup, and will discuss how analysts can learn to derive their stakeholders' expectations, how to shift from metrics to "real" KPIs, and how to approach an analysis in order to create real impact.
This session is primarily geared towards those starting out into analytics, practitioners who feel that they are still struggling to prove their value in the organization or simply folks who want to power up their reporting and recommendation skills. If you are already a master at aligning your analysis to the business, you're most welcome as well: join us to share your experiences so that we can all learn from each other and improve!
Bios:
Eliza Savov - Eliza is the team lead of the Customer Experience and Analytics team at Clicktale, the worldwide leader in behavioral analytics. She has extensive experience working with data analytics, having previously worked at Clicktale as a senior customer experience analyst, and as a product analyst at Seeking Alpha.
BI STRATEGY FROM A BIRD'S EYE VIEW (How to become a trusted advisor) | Omri H... (Omid Vahdaty)
In the talk we will discuss how to break down the company’s overall goals all the way to your BI team’s daily activities in 3 simple stages:
1. Understanding the path to success - Creating a revenue model
2. Gathering support and strategizing - Structuring a team
3. Executing - Tracking KPIs
Bios:
Omri Halak - Omri is the director of business operations at Logz.io, an intelligent and scalable machine data analytics platform built on ELK & Grafana that empowers engineers to monitor, troubleshoot, and secure mission-critical applications more effectively. In this position, Omri combines actionable business insights from the BI side with fast and effective delivery on the Operations side. Omri has ample experience connecting data with business, with previous positions at SimilarWeb as a business analyst, at Woobi as finance director, and as Head of State Guarantees at the Israel Ministry of Finance.
AI and Big Data in Health Sector Opportunities and challenges | Big Data Demy... (Omid Vahdaty)
The lecturer has deep experience defining cloud computing security models for IaaS, PaaS, and SaaS architectures, specifically as the architecture relates to IAM, and deep experience defining privacy protection policy; a big fan of GDPR interpretation.
Deep experience in information security, defining healthcare security best practices including AI and Big Data, IT security, and ICS security and privacy controls in industrial environments.
Deep knowledge of security frameworks such as the Cloud Security Alliance (CSA), International Organization for Standardization (ISO), National Institute of Standards and Technology (NIST), IBM ITCS104, etc.
What Will You learn:
Every day, the website collects a huge amount of data. The data makes it possible to analyze the behavior of Internet users, their interests, their purchasing behavior, and conversion rates. To grow the business, big data offers the tools to analyze and process data in order to reveal the competitive advantages hidden in it.
What does healthcare have to do with Big Data?
How can AI assist in patient care?
Why are some afraid? Are there any dangers?
Aerospike meetup July 2019 | Big Data Demystified (Omid Vahdaty)
Building a low latency (sub millisecond), high throughput database that can handle big data AND linearly scale is not easy - but we did it anyway...
In this session we will get to know Aerospike, an enterprise distributed primary key database solution.
- We will do an introduction to Aerospike: basic terms, how it works, and why it is widely used in mission-critical system deployments.
- We will understand the 'magic' behind Aerospike's ability to handle small, medium, and even petabyte-scale data while still guaranteeing predictable sub-millisecond latency.
- We will learn how Aerospike DevOps differs from other solutions in the market, and see how easy it is to run it in cloud environments as well as on premises.
We will also run a demo, showing a live example of the performance and self-healing technologies the database has to offer.
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS, by Eyal Stei... (Omid Vahdaty)
ALIGNING YOUR BI OPERATIONS WITH YOUR CUSTOMERS' UNSPOKEN NEEDS
-Learn how to connect BI and product management to solve business problems
-Discover how to lead clients to ask the right questions to get the data and insight they really want
-Get pointers on saving your time and your company's resources by understanding what your customers need, not what they ask for
AWS Big Data Demystified #1.2 | Big Data architecture lessons learned (Omid Vahdaty)
A while ago I entered the challenging world of Big Data. As an engineer, at first I was not so impressed with this field. As time went by, I realised more and more that the technological challenges in this area are too great for one person to master. Just look at the picture in this article; it covers only a small fraction of the technologies in the Big Data industry…
Consequently, I created a meetup detailing all the challenges of Big Data, especially in the world of cloud. I am using AWS & GCP and Data Center infrastructure to answer the basic questions of anyone starting their way in the big data world.
• How to transform data (TXT, CSV, TSV, JSON) into Parquet, ORC, or AVRO
• Which technology should we use to model the data? EMR? Athena? Redshift? Spectrum? Glue? Spark? SparkSQL? GCS? BigQuery? Dataflow? Datalab? TensorFlow?
• How to handle streaming?
• How to manage costs?
• Performance tips
• Security tips
• Cloud best practices
In this meetup we present lecturers working with several cloud vendors, various big data platforms such as Hadoop, data warehouses, and startups working on big data products. Basically, if it is related to big data, this is THE meetup.
Some of our online materials (mixed content from several cloud vendors):
Website:
https://big-data-demystified.ninja (under construction)
Meetups:
https://www.meetup.com/Big-Data-Demystified
https://www.meetup.com/AWS-Big-Data-Demystified/
YouTube channels:
https://www.youtube.com/channel/UCMSdNB0fGmX5dXI7S7Y_LFA?view_as=subscriber
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
Audience:
Data Engineers
Data Science
DevOps Engineers
Big Data Architects
Solution Architects
CTO
VP R&D
AWS Big Data Demystified #3 | Zeppelin + spark sql, jdbc + thrift, ganglia, r... (Omid Vahdaty)
AWS Big Data Demystified is all about knowledge sharing, because knowledge should be given for free. In this lecture we will discuss the advantages of working with Zeppelin + Spark SQL, JDBC + Thrift, Ganglia, R + SparkR + Livy, and a little bit about Ganglia on EMR.
Subscribe to our YouTube channel to see the video of this lecture:
https://www.youtube.com/channel/UCzeGqhZIWU-hIDczWa8GtgQ?view_as=subscriber
Amazon AWS Big Data Demystified | Introduction to streaming and messaging flu... (Omid Vahdaty)
Amazon AWS Big Data Demystified meetup:
https://www.meetup.com/AWS-Big-Data-Demystified/
Introduction to streaming and messaging: Flume, Kafka, SQS, Kinesis
AWS Big Data Demystified #1: Big data architecture lessons learned (Omid Vahdaty)
AWS Big Data Demystified #1: Big data architecture lessons learned. A quick overview of the big data technologies that were selected or disregarded in our company.
The video: https://youtu.be/l5KmaZNQxaU
Don't forget to subscribe to the YouTube channel.
The website: https://amazon-aws-big-data-demystified.ninja/
The meetup : https://www.meetup.com/AWS-Big-Data-Demystified/
The facebook group : https://www.facebook.com/Amazon-AWS-Big-Data-Demystified-1832900280345700/
Immunizing Image Classifiers Against Localized Adversary Attacks (gerogepatton)
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks, and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks, by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of the volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversary training.
HEAP SORT ILLUSTRATED WITH HEAPIFY, BUILD HEAP FOR DYNAMIC ARRAYS.
Heap sort is a comparison-based sorting technique based on Binary Heap data structure. It is similar to the selection sort where we first find the minimum element and place the minimum element at the beginning. Repeat the same process for the remaining elements.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
6th International Conference on Machine Learning & Applications (CMLA 2024)ClaraZara1
6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of on Machine Learning & Applications.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
3. 500+ Digital Businesses Run on Couchbase
• 6 of the Top 10 E-Commerce Companies in the US
• 6 of the Top 10 US & European Broadcast Companies
• 6 of the Top 10 Online Casino Gaming Companies
• The Top 3 Credit Reporting Companies
• The Top 3 GDS Companies
• 3 of the Top 10 Airlines
5. Couchbase = K/V + Document DB + …
• Couchbase is a hybrid engine system:
  • Super-fast K/V engine, based on the Memcached distributed cache
  • Document DB engine that uses the K/V engine for super-fast performance, with an ANSI SQL-like language on JSON data
• A distributed cache & a database – IN ONE PLATFORM
6. Why use a document based DB?
• Flexible schema = faster development
• No code impedance – the data structure in the database matches the data structure in your code
• Easy & fast deployments
• Easy maintenance
• Best fit for microservices architecture
So why is RDBMS still so popular? SQL.
7. The Power of the Flexible JSON Schema
• Ability to store data in multiple ways:
  o A denormalized single document, as opposed to normalizing data across multiple tables
  o A dynamic schema to add new values when needed
8. Efficient Sub-Document Operations
• Document mutations:
  • Atomic operations on individual fields
  • Identical syntax and behavior to regular bucket methods (upsert, insert, get, replace)
  • Support for JSON fragments
  • Support for arrays with uniqueness guarantees and ordinal placement (front/back)
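The semantics of a sub-document mutation can be sketched in plain Python (an illustrative model of the behavior, not the Couchbase SDK API): a dotted path such as "address.city" addresses a single field inside the stored JSON, and only that field is changed.

```python
def subdoc_upsert(doc: dict, path: str, value) -> None:
    """Upsert one field addressed by a dotted path, leaving the rest of
    the document untouched (models sub-document mutation semantics)."""
    parts = path.split(".")
    node = doc
    for key in parts[:-1]:
        node = node.setdefault(key, {})  # create intermediate objects as needed
    node[parts[-1]] = value

profile = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}
subdoc_upsert(profile, "address.city", "Cambridge")
# Only address.city changed; name and address.zip are untouched.
```

In the real server the whole mutation is applied atomically, so two concurrent sub-document writers on different fields never clobber each other's changes.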
9. Nickel (N1QL): SQL-Like Querying Support
• SQL-like query language
• Expressive, familiar, and feature-rich language for querying, transforming, and manipulating JSON data
• ANSI 92 SQL compatible – selects, inserts, updates, GROUP BY, sorting, functions, etc.
• N1QL extends SQL to handle data that is:
  • Nested: contains nested objects and arrays
  • Heterogeneous: schema-optional, non-uniform
  • Distributed: partitioned across a cluster
The flexibility of JSON with the power of SQL.
14. Cache Ejection
[Diagram: the application server talks to the managed cache; documents flow through the disk queue and the replication queue before being persisted to disk.]
• A single node type means easier administration and scaling
• Layer consolidation means a read-through and write-through cache
• Couchbase automatically removes data that has already been persisted from RAM
15. Cache Miss
[Diagram: a GET for DOC 1 misses the managed cache, so the document is fetched from disk into RAM and returned to the application server.]
• A single node type means easier administration and scaling
• Layer consolidation means one single interface for the app to talk to and get its data back as fast as possible
• Separation of cache and disk allows for the fastest access out of RAM while pulling data from disk in parallel
16. Persistence
• Guards against most forms of failure
• Protects against data loss
• Configurable durability
• Always-on availability
17. Auto Sharding – Buckets and vBuckets
• A bucket is a logical, unique key space
• Multiple buckets can exist within a single cluster of nodes
• Each bucket has active and replica data sets (1, 2 or 3 extra copies)
• Each data set has 1024 virtual buckets (vBuckets)
• Each vBucket contains 1/1024th of the data set
• vBuckets do not have a fixed physical server location
• The mapping between vBuckets and physical servers is called the cluster map
• Document IDs (keys) always hash to the same vBucket
• Couchbase SDKs look up the vBucket -> server mapping
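The key-to-vBucket step can be sketched as a stable hash modulo 1024. This is a simplification: the real SDKs use a specific CRC32 variant, and the three-server cluster map below is invented for illustration; in practice the cluster publishes the map and updates it on rebalance and failover.

```python
import zlib

NUM_VBUCKETS = 1024

def vbucket_for_key(key: str) -> int:
    """Hash a document key to one of 1024 vBuckets (illustrative sketch)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

# Toy cluster map: vBucket id -> owning server (hypothetical 3-node cluster).
cluster_map = {vb: f"server-{vb % 3}" for vb in range(NUM_VBUCKETS)}

vb = vbucket_for_key("customer::1234")
server = cluster_map[vb]
# The same key always hashes to the same vBucket, so the SDK can route
# the request directly to the node that currently owns that vBucket.
```

Because only the cluster map changes when nodes are added or removed, clients just refresh the map instead of rehashing any data.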
20. Rebalance
[Diagram: active and replica vBuckets (shards 1–9) spread across Couchbase Servers 1–3; when Servers 4 and 5 are added, shards are redistributed across all five nodes while reads/writes/updates continue.]
21. Fail over
[Diagram: when a node fails, the replica copies of its active vBuckets on the surviving Couchbase Servers are promoted to active, so all shards (1–9) remain available.]
22. Elastic Scalability
• Linear scalability by adding nodes
• Multi-Dimensional Scaling (MDS)
• Extremely easy scaling
23. Multi Dimensional Scaling (MDS)
[Diagram: Nodes 1–14, each running a subset of services under a shared Cluster Manager – Data, Query, Global Index, Full Text, Eventing, Analytics.]
• Data service: managed cache, key-value store, document database, mobile
• Query service: N1QL queries
• Full Text Search and Analytics run as separate, independently scalable services
24. Replication
• High availability from node failures
• Disaster recovery from data center failures with XDCR (Cross Data Center Replication)
• Supports active-active replication between data centers
27. Query
• Using SQL for JSON, called N1QL
• If you know SQL, N1QL will look extremely familiar
• Support for ANSI joins, aggregations, subqueries, ordering, etc.
31. N1QL (Example)
SELECT customers.id,
       customers.NAME.lastname,
       customers.NAME.firstname,
       SUM(orderline.amount)
FROM orders UNNEST orders.lineitems AS orderline
JOIN customers ON KEYS orders.custid
WHERE customers.state = 'NY'
GROUP BY customers.id,
         customers.NAME.lastname
HAVING SUM(orderline.amount) > 10000
ORDER BY SUM(orderline.amount) DESC
Notes: dotted sub-document references; names are CASE-SENSITIVE; UNNEST flattens the arrays; the JOIN uses the document KEY of customers.
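UNNEST turns each element of a document's lineitems array into its own output row, pairing it with its parent document. The effect can be modeled in a few lines of Python (the order documents here are hypothetical, for illustration only):

```python
orders = [
    {"custid": "c1", "lineitems": [{"amount": 7000}, {"amount": 5000}]},
    {"custid": "c2", "lineitems": [{"amount": 100}]},
]

# UNNEST: one output row per (order, lineitem) pair
rows = [(o["custid"], item["amount"])
        for o in orders
        for item in o["lineitems"]]
# rows == [("c1", 7000), ("c1", 5000), ("c2", 100)]
```

After flattening, standard GROUP BY / SUM aggregation applies to the rows exactly as it would to a relational table.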
32. Query Execution Flow
1. Application submits N1QL query
2. Query is parsed and analyzed, and a plan is created
33. Query Execution Flow
3. Query Service makes a request to the Index Service
4. Index Service returns document keys and data
34. Query Execution Flow
5. If using a covering index, skip step 6
6. If filtering is required, fetch documents from the Data Service
35. Query Execution Flow
7. Apply final logic (e.g. SORT, ORDER BY)
8. Return formatted results to the application
36. Data Modification Statements
• UPDATE … SET … WHERE …
• DELETE FROM … WHERE …
• INSERT INTO … ( KEY, VALUE ) VALUES …
• INSERT INTO … ( KEY …, VALUE … ) SELECT …
• MERGE INTO … USING … ON … WHEN [ NOT ] MATCHED THEN …
Note: Couchbase provides per-document atomicity.
37. Data Modification Statements
INSERT INTO ORDERS (KEY, VALUE)
VALUES ("1.ABC.X382", {"O_ID":482, "O_D_ID":3, "O_W_ID":4});

UPDATE ORDERS
SET O_CARRIER_ID = "ABC987"
WHERE O_ID = 482 AND O_D_ID = 3 AND O_W_ID = 4;

DELETE FROM NEW_ORDER
WHERE NO_D_ID = 291 AND NO_W_ID = 3482 AND NO_O_ID = 2483;

JSON literals can be used in any expression.
39. ANSI Window Functions
● Couchbase is the first NoSQL database to support ANSI window functions
● Answer common but complex business queries with minimal lines of code and optimized performance
● Sample use cases:
  ○ revenue growth month over month
  ○ top N sale districts by revenue for a given week
  ○ ranking of salespeople by region based on revenue booked
With ANSI window functions, developers can simplify financial and statistical aggregations in an easy and optimized way.
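The month-over-month use case relies on LAG(), which pairs each row with the value from the previous row in the window's ORDER BY sequence. A minimal Python model of that semantics (revenue figures are hypothetical):

```python
revenue = [("2018-01", 100.0), ("2018-02", 120.0), ("2018-03", 90.0)]

# LAG(value) OVER (ORDER BY month): previous row's value, None (NULL)
# for the first row, then growth = (current - previous) / previous.
growth = []
prev = None
for month, value in sorted(revenue):
    pct = None if prev is None else round((value - prev) / prev, 2)
    growth.append((month, pct))
    prev = value
# growth == [("2018-01", None), ("2018-02", 0.2), ("2018-03", -0.25)]
```

In N1QL the same result comes from a single expression in the SELECT list, without the explicit loop and carried state shown here.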
40. ANSI Common Table Expressions (CTE)
● A CTE allows developers to isolate a SQL statement into a temporary named result set that can be referenced as a source table in the context of a larger query
● Offers the advantages of readability and ease of maintenance for complex queries, without compromising performance
● Couchbase is the first NoSQL database to support ANSI CTEs
With ANSI Common Table Expressions, developers gain ease of maintenance and better readability by naming temporary SQL statements.
WITH current_period_task AS (
  SELECT DATE_TRUNC_STR(a.startDate,'month') AS month,
         COUNT(1) AS current_period_task_count
  FROM crm a
  WHERE a.type = 'activity' AND a.activityType = 'Task'
    AND DATE_PART_STR(a.startDate,'year') = 2018
  GROUP BY DATE_TRUNC_STR(a.startDate,'month')
),
last_period_task AS (
  SELECT x.month, x.current_period_task_count,
         LAG(x.current_period_task_count) OVER (ORDER BY x.month)
           AS last_period_task_count
  FROM current_period_task x
)
SELECT b.month,
       b.current_period_task_count,
       ROUND(((b.current_period_task_count - b.last_period_task_count) /
              b.last_period_task_count), 2)
FROM last_period_task AS b
41. User-Defined Functions (Developer Preview)
CREATE FUNCTION getsalestax(state, city)
DROP FUNCTION getsalestax
EXECUTE FUNCTION getsalestax("CA", "Santa Clara")
SELECT getsalestax(state, city) FROM invoice
● Allow developers to define custom functions in JavaScript (similar to PL/SQL) callable from N1QL queries, with an interactive debugger to simplify development and testing
● Server-side logic that can be reused by any application and microservice, improving code maintenance and developer productivity
● Improve code performance by bringing application logic closer to the data
With user-defined functions, developers define callable custom functions to ease development, simplify reusability and improve code performance.
42. Cost-Based Optimizer (Developer Preview)
● A cost-based optimizer generates the optimal access path based on statistics and metadata collected on the data
● Eliminates the time spent tweaking a query and providing optimizer hints to get the rule-based optimizer to pick the right query plan
● Leverages decades of research and experience in query optimization to collect and use statistics on JSON, arrays, and objects
● Couchbase is the first NoSQL database with a dynamic schema to support a cost-based optimizer
With the cost-based optimizer, queries run faster by using the optimal access path, without the need to provide optimizer hints.
43. Index Advisor (Developer Preview)
● The Index Advisor suggests appropriate indexes to speed up a given query, taking the guesswork out of query tuning
● It can also monitor and analyze the statistics collected from a running workload and suggest the indexes that will speed up the queries in that workload
● It greatly reduces the complexity and effort required for enterprise developers and operations engineers to determine the right indexes to speed up their queries
With the Index Advisor, developers can create better indexes based on its suggestions and speed up queries easily.
45. Indexing
• Quick and efficient access to your data
• No need to scan all the documents
• Support for filtered indexes, composite indexes and covering indexes
46. Index Options
1. Primary Index – Index on the document key over the whole bucket
2. Named Primary Index – A named primary index; allows multiple primary indexes in the cluster
3. Secondary Index – Index on a key-value or the document key
4. Secondary Composite Index – Index on more than one key-value
5. Functional Index – Index on a function or expression over key-values
6. Array Index – Indexes individual elements of arrays
7. Covering Index – The query can be answered using the data from the index alone, skipping retrieval of the item
8. Adaptive Index – A special type of GSI array index that can index all or specified fields of a document
9. Replica Index – Index replication that allows load balancing, providing scale-out, multi-dimensional scaling, performance, and high availability
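The covering-index idea can be modeled in a few lines of Python: if the index entries already store every field the query needs, the document itself is never fetched. The user documents below are hypothetical, for illustration only.

```python
documents = {
    "u1": {"name": "Ada",   "city": "London", "bio": "..."},
    "u2": {"name": "Grace", "city": "NYC",    "bio": "..."},
}

# A "covering" index on (city, name): each entry carries both fields.
index_city_name = {}
for key, doc in documents.items():
    index_city_name.setdefault(doc["city"], []).append((key, doc["name"]))

# SELECT name WHERE city = 'London' is answered from the index alone:
names = [name for _key, name in index_city_name.get("London", [])]
# No lookup into `documents` was needed – the index "covers" the query.
```

This is why step 6 of the query execution flow (fetching documents from the Data Service) can be skipped when a covering index is available.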
48. Full Text Search
• Search within text – extremely fast
• Language aware (supports 19 languages)
• Result-scoring mechanism
• Rich querying capabilities
50. Full Text Search - Capabilities
Query:
• Basic: Match, Match Phrase, Fuzzy, Prefix, Regexp, Wildcard, Boolean Field
• Compound: QueryString, Boolean, Conjunction, Disjunction
• Range: DateRange, NumericRange
• Special purpose: DocID, MatchAll, MatchNone, Phrase, Term, Geospatial
• Scoring (TF/IDF), boosting, field scoping
• New/DP: TermRange, Geospatial
Indexing:
• Real-time indexing (inverted index, auto-updated upon mutation)
• Default map and map by document type
• Dynamic mapping
• Stored fields, term vectors
• Analyzers: tokenization, token filtering (stop-word removal, language-specific stemming)
• Aliasing
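The inverted index at the heart of full text search maps each term to the set of documents containing it, which is what makes term lookups fast. A toy sketch (the corpus is invented, and the whitespace tokenizer stands in for the real analyzers, which also stem and filter stop words):

```python
docs = {
    "b1": "fruity pale ale",
    "b2": "dark stout",
    "b3": "fruity wheat beer",
}

# Build the inverted index: term -> set of document ids.
inverted = {}
for doc_id, text in docs.items():
    for term in text.split():  # trivial tokenizer, for illustration
        inverted.setdefault(term, set()).add(doc_id)

# A term query is now a dictionary lookup, not a scan of every document:
hits = inverted.get("fruity", set())
# hits == {"b1", "b3"}
```

Auto-updating on mutation means each document write also updates the posting sets for its terms, so queries always see a current index.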
51. SEARCH Predicates in Query (Full Text Search)
● Couchbase provides a comprehensive search capability in queries, beyond the simple LIKE() operator found in most databases
● The SEARCH() operator supports keyword and fuzzy matching across multiple document fields
● Developers do not have to write complex code to process and combine the results from separate SQL and search queries
● Better query performance with inverted indexes for search predicates instead of inefficient scans for LIKE()

SELECT * FROM `beer-sample` b
WHERE SEARCH(b.desc, "fruity") AND b.abv < 0.5;

With search predicates, developers can combine SQL and search queries for powerful integration and better query performance.
52. Mobile
• Full-stack platform for mobile and IoT apps
• Real-time automatic sync – built in
• Fully secured
• Data can be accessed when offline
53. Couchbase - The Data Platform for Mobile Engagement
COUCHBASE LITE (Client): a lightweight embedded NoSQL database with full CRUD and query functionality.
SYNC GATEWAY (Middle Tier): a secure web gateway with synchronization, data access, and data integration APIs for accessing, integrating, and synchronizing data over the web.
COUCHBASE SERVER (Storage): a highly scalable, highly available, high-performance NoSQL database server.
Security: built-in enterprise-level security throughout the entire stack, including user authentication, user- and role-based data access control (RBAC), secure transport (TLS), and 256-bit AES full database encryption.
54. Eventing
• Triggers user-defined business logic in real time
• Runs on the cluster
• Logic is written in JavaScript (V8)
55. Analytics
• Powerful parallel query processing over JSON
• Made for long-running, complex SQL-like queries
• Complete workload isolation – does not affect operational data processing
57. Container and Cloud Deployment
• Couchbase replication is zone and region aware – ideal for cloud deployments
• Pre-built modules for all major cloud vendors
• Support for containers
• Automation with the Couchbase Autonomous Operator for Kubernetes
• Support for Red Hat OpenShift
60. Distributed ACID Transactions
WHY
With distributed ACID transactions, developers simplify application logic by relying on all-or-nothing semantics for durably modifying multiple documents distributed across different nodes.
Transition from an RDBMS schema:
● De-normalization has limitations
Multi-asset coordination:
● Transfer from one user to another
● Reservation items such as flights
Business/application-level transactions that need to modify multiple documents (all-or-nothing):
● Microservices SAGAs – event-based orchestration
transactions.run((txnctx) -> {
    // Insert a document
    JsonDocument doc1 = JsonDocument.create("newDoc", JsonObject.create());
    txnctx.insert(bucket, doc1);

    // Replace a document
    TransactionJsonDocument doc2 = txnctx.getOrError(bucket, "doc2");
    doc2.content().put("name", "bob");
    txnctx.replace(doc2);

    // Commit transaction
    txnctx.commit();
});
61. ACID in Couchbase
Couchbase Server 6.5 provides strong ACID guarantees while balancing scalability, availability and performance.
A – Atomicity: guarantees all-or-nothing semantics for updating multiple documents in more than one shard on different nodes.
C – Consistency: replicas are strongly consistent for the chosen durability level; indexes and XDCR clusters are eventually consistent.
I – Isolation: read-committed isolation for concurrent readers, as it provides strong semantics without compromising availability, scalability, and performance. Applications can specify a scan consistency level for performance.
D – Durability: data protection under failures, with 3 levels – replicate to a majority of the nodes; replicate to a majority and persist to disk on the primary; or persist to disk on a majority of the nodes.