Couchbase NoSQL
Platform
Technical overview
Lior King
Sr. Solution Architect
Lior.King@Couchbase.com
About
Couchbase
~400
EMPLOYEES
800+
CUSTOMERS
100%
OPEN SOURCE
500+ Digital Businesses Run on Couchbase
6 of the Top 10
E-Commerce
Companies
in the US
6 of the Top 10
US & European
Broadcast
Companies
6 of the Top 10
Online Casino
Gaming
Companies
The Top 3
Credit Reporting
Companies
The top 3
GDS Companies
3 of the Top 10
Airlines
Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2019. All rights reserved.
Couchbase Data
Platform
• Service-Centric Clustered Data System
- Multi-process Architecture
- Dynamic Distribution of Facilities
- Cluster Map Distribution
- Automatic Failover
- Enterprise Monitoring/Management
- Security
• Offline Mobile Data Integration
• Streaming REST API
• SQL-like Query Engine for JSON
• Clustered* Global Indexes
• Lowest Latency Key-Value API
• Active-Active Inter-DC Replication
• Local Aggregate Indexes
• Full-Text Search*
• Operational Analytics*
Couchbase = K/V + Document DB + ….
• Couchbase is a hybrid engine system:
• Super fast K/V engine
• Based on Memcached distributed cache.
• Document DB engine
• Uses the K/V engine for super fast performance
• ANSI SQL-like language on JSON data.
• A distributed cache & a database –
IN ONE PLATFORM
Why use a document based DB?
• Flexible schema = faster development.
• No impedance mismatch
• The data structure in the database = the data structure in your code.
• Easy & fast deployments.
• Easy maintenance.
• Best fit for a microservices architecture.
So why is the RDBMS still so popular?
SQL
The Power of the Flexible JSON Schema
• Ability to store data in multiple ways
o Denormalized single document, as opposed to normalizing data across multiple tables
o Dynamic schema to add new values when needed
Efficient Sub-Document Operations
• Document Mutations:
• Atomic operations on individual fields
• Syntax and behavior identical to regular bucket methods (upsert, insert, get, replace)
• Support for JSON fragments.
• Support for Arrays with uniqueness guarantees and
ordinal placement (front/back)
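A minimal Python sketch of the idea behind sub-document mutations, using a plain dict in place of a stored document. The helper names (`upsert_path`, `array_add_unique`, `array_prepend`) are hypothetical and are not the Couchbase SDK API; they only illustrate addressing one field by a dotted path instead of rewriting the whole document. Server-side atomicity is not modeled here.

```python
from typing import Any, Tuple


def _walk(doc: dict, path: str) -> Tuple[dict, str]:
    """Return the parent dict and final key for a dotted path, creating parents."""
    *parents, leaf = path.split(".")
    node = doc
    for part in parents:
        node = node.setdefault(part, {})
    return node, leaf


def upsert_path(doc: dict, path: str, value: Any) -> None:
    """Set one field, leaving the rest of the document untouched."""
    parent, leaf = _walk(doc, path)
    parent[leaf] = value


def array_add_unique(doc: dict, path: str, value: Any) -> bool:
    """Append value only if it is not already present; report whether it was added."""
    parent, leaf = _walk(doc, path)
    items = parent.setdefault(leaf, [])
    if value in items:
        return False
    items.append(value)
    return True


def array_prepend(doc: dict, path: str, value: Any) -> None:
    """Ordinal placement: insert the value at the front of the array."""
    parent, leaf = _walk(doc, path)
    parent.setdefault(leaf, []).insert(0, value)


customer = {"name": "Jane Smith", "address": {"city": "NY"}, "tags": ["gold"]}
upsert_path(customer, "address.zip", "10001")           # touch a single nested field
added_again = array_add_unique(customer, "tags", "gold")   # already present
added_new = array_add_unique(customer, "tags", "mobile")   # new value, appended at the back
array_prepend(customer, "tags", "vip")                  # placed at the front
print(customer)
```

Only the addressed path changes in each call, which is the property the slide attributes to sub-document operations.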
N1QL (pronounced "nickel"): SQL-Like Querying Support
• SQL-like Query Language
• Expressive, familiar, and feature-rich language for querying, transforming, and manipulating
JSON data
• ANSI 92 SQL Compatible – Selects, Inserts, Updates, Group By, Sort,
Functions etc.
• N1QL extends SQL to handle data that is:
• Nested: Contains nested objects, arrays
• Heterogeneous: Schema-optional, non-uniform
• Distributed: Partitioned across a cluster
Flexibility of JSON + Power of SQL
Storing And Retrieving Documents
Memory First Architecture
• Sub millisecond speed
• No bottlenecks
• Based on Memcached
Write Operation
[Diagram: APPLICATION SERVER, MANAGED CACHE, DISK QUEUE, REPLICATION QUEUE and DISK, with DOC 1 being written]
Couchbase Read Operation
[Diagram: APPLICATION SERVER, MANAGED CACHE, DISK QUEUE, REPLICATION QUEUE and DISK, with GET DOC 1]
Cache Ejection
[Diagram: APPLICATION SERVER, MANAGED CACHE, DISK QUEUE, REPLICATION QUEUE and DISK, with DOC 1 to DOC 5]
• Single node type means easier administration and scaling
• Layer consolidation means a read-through and write-through cache
• Couchbase automatically removes from RAM data that has already been persisted
Cache Miss
[Diagram: APPLICATION SERVER, MANAGED CACHE, DISK QUEUE, REPLICATION QUEUE and DISK, with DOC 1 to DOC 5 and GET DOC 1]
• Single node type means easier administration and scaling
• Layer consolidation means one single interface for the app to talk to and get its data back as fast as possible
• Separation of cache and disk allows for fastest access out of RAM while pulling data from disk in parallel
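The write, ejection, and cache-miss slides can be sketched as a toy model in Python. This is an illustration under stated assumptions, not Couchbase's implementation: the `ManagedCache` class, its fixed item capacity, and the least-recently-used ejection order are all hypothetical simplifications. What it does mirror from the slides is that writes land in RAM first and are queued for disk, that only already-persisted items are ejected, and that a miss is served from disk through the same interface.

```python
from collections import OrderedDict, deque


class ManagedCache:
    """Toy memory-first store: RAM cache in front of a disk queue and a disk."""

    def __init__(self, ram_capacity: int) -> None:
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()   # key -> value, least recently used first
        self.dirty = set()         # keys written but not yet persisted
        self.disk_queue = deque()  # keys waiting to be persisted
        self.disk = {}             # persisted copies
        self.misses = 0

    def set(self, key, value):
        """Write goes to RAM first, then is queued for persistence."""
        self.ram[key] = value
        self.ram.move_to_end(key)
        self.dirty.add(key)
        self.disk_queue.append(key)
        self._eject()

    def flush(self):
        """Drain the disk queue; persisted items become eligible for ejection."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            if key in self.dirty:
                self.disk[key] = self.ram[key]
                self.dirty.discard(key)
        self._eject()

    def get(self, key):
        """Serve from RAM; on a miss, read through to disk and re-cache."""
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        self.misses += 1
        value = self.disk[key]  # raises KeyError if the key was never written
        self.ram[key] = value
        self._eject()
        return value

    def _eject(self):
        """Remove persisted items from RAM, oldest first, until under capacity."""
        for key in list(self.ram):
            if len(self.ram) <= self.ram_capacity:
                break
            if key not in self.dirty:
                del self.ram[key]


cache = ManagedCache(ram_capacity=2)
for i in range(1, 6):
    cache.set(f"doc{i}", {"n": i})
in_ram_before_flush = len(cache.ram)   # nothing is ejected while unpersisted
cache.flush()
in_ram_after_flush = list(cache.ram)
first = cache.get("doc1")              # cache miss, served from disk
second = cache.get("doc1")             # now a cache hit
print(in_ram_after_flush, cache.misses, list(cache.ram))
```

The application only ever calls `set` and `get`; whether a read is answered from RAM or from disk is invisible to it, which is the "layer consolidation" point on the slide.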
Persistence
• Guards against most forms of failure
• Protects against data loss
• Configurable durability
• Always-on availability
Auto Sharding – Bucket and vBuckets
Virtual buckets
• A bucket is a logical, unique key space
• Multiple buckets can exist within a single cluster of nodes
• Each bucket has active and replica data sets (1, 2 or 3 extra copies)
• Each data set has 1024 virtual buckets (vBuckets)
• Each vBucket contains a 1/1024th portion of the data set
• vBuckets do not have a fixed physical server location
• The mapping between vBuckets and physical servers is called the cluster map
• Document IDs (keys) always get hashed to the same vBucket
• Couchbase SDKs look up the vBucket -> server mapping
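The two-step lookup described above (key to vBucket, then vBucket to server) can be sketched in Python. Treat the details as assumptions: the CRC32-based hash below is illustrative of how a client might map a key to one of 1024 vBuckets, and `build_cluster_map` uses a simple round-robin layout that is hypothetical and not how the cluster manager actually places vBuckets.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for_key(key: str) -> int:
    """Hash a document key to a vBucket id (illustrative CRC32-based hash)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS


def build_cluster_map(servers):
    """Hypothetical cluster map: vBucket id -> server, assigned round-robin."""
    return [servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)]


def server_for_key(key: str, cluster_map) -> str:
    """What an SDK does conceptually: hash the key, then consult the cluster map."""
    return cluster_map[vbucket_for_key(key)]


three_nodes = build_cluster_map(["server1", "server2", "server3"])
five_nodes = build_cluster_map(["server1", "server2", "server3", "server4", "server5"])

key = "customer::XYZ987"
vb = vbucket_for_key(key)
print(key, "-> vBucket", vb)
print("3-node cluster:", server_for_key(key, three_nodes))
print("5-node cluster:", server_for_key(key, five_nodes))
```

The key always hashes to the same vBucket; only the cluster map changes when nodes are added or removed, so clients need a new map rather than a new hash function.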
Virtual Buckets Replication
[Diagram: data buckets split into vBuckets (vB) 1 … 1024, shown as Active virtual buckets and Replica virtual buckets]
The cluster map
Rebalance
[Diagram: Couchbase Servers 1 to 5, each holding ACTIVE and REPLICA shards (shards 1 to 9); READ/WRITE/UPDATE]
Fail over
[Diagram: Couchbase Servers 1 to 5, each holding ACTIVE and REPLICA shards (shards 1 to 9), during failover]
Elastic Scalability
• Linear scalability by adding nodes
• Multi-Dimensional Scalability
(MDS)
• Extremely easy scaling
Multi Dimensional Scaling (MDS)
[Diagram: NODE 1 to NODE 14, each with a Cluster Manager, running the services Data, Query, Global Index, Full Text, Analytics and Eventing. Capabilities shown: Managed Cache, Key-Value Store, Document Database, Mobile, N1QL Query, Full Text Search, Analytics]
Replication
• High availability from node failures
• Disaster recovery from data center
failures with XDCR (Cross Data Center
Replication)
• Supports Active-Active replication
between data centers.
XDCR (Cross Data Center Replication)
[Diagram: Active / Active / Active]
Couchbase Manager
Demo
Query
• Uses a SQL for JSON called N1QL
• If you know SQL – N1QL will look
extremely familiar.
• Support for ANSI joins, aggregations,
subqueries, ordering etc.
N1QL – SQL for JSON
{
  "Name" : "Jane Smith",
  "DOB" : "1990-01-30",
  "Billing" : [
    {
      "type" : "visa",
      "cardnum" : "5827-2842-2847-3909",
      "expiry" : "2019-03"
    },
    {
      "type" : "master",
      "cardnum" : "6274-2842-2847-3909",
      "expiry" : "2019-03"
    }
  ],
  "Connections" : [
    {
      "CustId" : "XYZ987",
      "Name" : "Joe Smith"
    },
    {
      "CustId" : "PQR823",
      "Name" : "Dylan Smith"
    },
    {
      "CustId" : "PQR823",
      "Name" : "Dylan Smith"
    }
  ],
  "Purchases" : [
    { "id": 12, "item": "mac", "amt": 2823.52 },
    { "id": 19, "item": "ipad2", "amt": 623.52 }
  ]
}
N1QL is Declarative: What vs. How
• You specify WHAT; Couchbase Server figures out HOW
• Input: JSON (LoyaltyInfo, Orders, Customer documents)
• Output: JSON (result documents)
N1QL (Example)
SELECT customers.id,
       customers.NAME.lastname,
       customers.NAME.firstname,
       SUM(orderline.amount)
FROM orders UNNEST orders.lineitems AS orderline
  JOIN customers ON KEYS orders.custid
WHERE customers.state = 'NY'
GROUP BY customers.id,
         customers.NAME.lastname,
         customers.NAME.firstname
HAVING SUM(orderline.amount) > 10000
ORDER BY SUM(orderline.amount) DESC
Notes:
• Dotted sub-document references (e.g. customers.NAME.lastname)
• Names are CASE-SENSITIVE
• UNNEST flattens the arrays
• JOIN uses the document KEY of customers
Query Execution Flow
1. Application submits N1QL query
2. Query is parsed, analyzed, and a plan is created
3. Query Service makes a request to the Index Service
4. Index Service returns document keys and data
5. If the index is a covering index, skip step 6
6. If filtering is required, fetch documents from the Data Service
7. Apply final logic (e.g. SORT, ORDER BY)
8. Return formatted results to the application
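The execution flow, and in particular the covering-index shortcut in steps 5 and 6, can be sketched as a toy pipeline in Python. Everything here is a hypothetical stand-in: `DATA` plays the Data Service, `INDEX` plays an index on (`state`, `name`), and `run_query` is not a Couchbase API. The sketch counts document fetches to show when the Data Service is skipped.

```python
from typing import Dict, List, Tuple

# Stand-in for the Data Service: document key -> full document.
DATA: Dict[str, dict] = {
    "c1": {"name": "Ann", "state": "NY", "age": 41},
    "c2": {"name": "Bob", "state": "CA", "age": 35},
    "c3": {"name": "Cy", "state": "NY", "age": 29},
}

# Stand-in for the Index Service: an index that stores these fields per key.
INDEXED_FIELDS = ("state", "name")
INDEX = [
    (key, {field: doc[field] for field in INDEXED_FIELDS})
    for key, doc in sorted(DATA.items())
]


def run_query(select: List[str], where_state: str, order_by: str) -> Tuple[List[dict], int]:
    """Return (result rows, number of documents fetched from the Data Service)."""
    # Steps 3-4: the index scan returns document keys plus the indexed values.
    hits = [(key, fields) for key, fields in INDEX if fields["state"] == where_state]

    # Step 5: the index covers the query if it holds every field the query needs.
    needed = set(select) | {order_by}
    covered = needed <= set(INDEXED_FIELDS)

    fetches = 0
    rows = []
    for key, fields in hits:
        if covered:
            rows.append(fields)
        else:
            rows.append(DATA[key])  # Step 6: fetch the full document.
            fetches += 1

    # Step 7: final logic (ordering); step 8: project and return.
    rows.sort(key=lambda row: row[order_by])
    return [{field: row[field] for field in select} for row in rows], fetches


covered_rows, covered_fetches = run_query(["name"], "NY", "name")
fetched_rows, fetched_fetches = run_query(["name", "age"], "NY", "age")
print(covered_rows, covered_fetches)
print(fetched_rows, fetched_fetches)
```

The first query needs only indexed fields, so no documents are fetched; the second asks for `age`, which the index does not hold, so each matching document is fetched.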
Data Modification Statements
• UPDATE … SET … WHERE …
• DELETE FROM … WHERE …
• INSERT INTO … ( KEY, VALUE ) VALUES …
• INSERT INTO … ( KEY …, VALUE … ) SELECT …
• MERGE INTO … USING … ON …
WHEN [ NOT ] MATCHED THEN …
Note: Couchbase provides per-document atomicity.
Data Modification Statements
INSERT INTO ORDERS (KEY, VALUE)
  VALUES ("1.ABC.X382", {"O_ID":482, "O_D_ID":3, "O_W_ID":4});
UPDATE ORDERS
  SET O_CARRIER_ID = "ABC987"
  WHERE O_ID = 482 AND O_D_ID = 3 AND O_W_ID = 4;
DELETE FROM NEW_ORDER
  WHERE NO_D_ID = 291 AND
        NO_W_ID = 3482 AND
        NO_O_ID = 2483;
Note: JSON literals can be used in any expression.
Full SQL Pipeline
[Diagram: CLIENT REQUEST -> Parse -> Plan -> Scan -> Fetch -> Join -> Filter -> Pre-Aggregate -> Aggregate -> Sort -> Offset -> Limit -> Project -> RESPONSE. Scan reads the Global Index; Fetch uses the Key/Value API.]
• Data-parallel: query is N data streams over N cores*
• Memory-based
• Pluggable architecture: datastore, index…
ANSI Window Functions
● Sample use cases:
○ Revenue growth month over month
○ Top N sales districts by revenue for a given week
○ Ranking of salespeople by region based on revenue booked
● Answer common but complex business queries with minimal lines of code and optimized performance
● Couchbase is the first NoSQL database to support ANSI Window Functions
With ANSI Window Functions, developers can simplify financial and statistical aggregations in an easy and optimized way.
ANSI Common Table Expression (CTE)
● A CTE lets developers isolate a SQL statement into a temporary named result set that can be referenced as a source table in the context of a larger query
● Offers the advantages of readability and ease of maintenance for complex queries without compromising performance
● Couchbase is the first NoSQL database to support ANSI CTE
With ANSI Common Table Expressions, developers gain ease of maintenance and better readability by naming temporary SQL statements.
WITH current_period_task AS (
  SELECT DATE_TRUNC_STR(a.startDate, 'month') AS month,
         COUNT(1) AS current_period_task_count
  FROM crm a
  WHERE a.type = 'activity' AND a.activityType = 'Task'
    AND DATE_PART_STR(a.startDate, 'year') = 2018
  GROUP BY DATE_TRUNC_STR(a.startDate, 'month')
),
last_period_task AS (
  SELECT x.month, x.current_period_task_count,
         LAG(x.current_period_task_count) OVER (ORDER BY x.month)
           AS last_period_task_count
  FROM current_period_task x
)
SELECT b.month,
       b.current_period_task_count,
       ROUND(((b.current_period_task_count - b.last_period_task_count) /
              b.last_period_task_count), 2)
FROM last_period_task AS b
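To make the logic of this query concrete, here is a small Python sketch of the same computation: count tasks per month for one year, then compare each month with the previous one, which is what `LAG(...) OVER (ORDER BY month)` provides. The sample dates and the `month_over_month` function are made up for illustration, and the month is represented as a simple `YYYY-MM` string rather than the value `DATE_TRUNC_STR` would return.

```python
from collections import Counter

# Hypothetical task start dates (ISO strings), standing in for crm documents.
task_start_dates = [
    "2018-01-05", "2018-01-20",
    "2018-02-03", "2018-02-14", "2018-02-28",
    "2018-03-09",
    "2017-12-31",  # filtered out by the year predicate
]


def month_over_month(start_dates, year):
    """Return (month, task_count, growth_vs_previous_month) rows for one year."""
    # First CTE: group by month and count.
    counts = Counter(d[:7] for d in start_dates if d.startswith(f"{year}-"))

    rows = []
    previous = None  # LAG: the previous row's count, missing for the first row
    for month in sorted(counts):
        current = counts[month]
        growth = None if previous is None else round((current - previous) / previous, 2)
        rows.append((month, current, growth))
        previous = current
    return rows


report = month_over_month(task_start_dates, 2018)
for row in report:
    print(row)
```

The first month has no previous period, so its growth is left undefined, just as `LAG` yields no value for the first row of the window.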
User-Defined Functions (Developer Preview)
CREATE FUNCTION getsalestax(state,city)
DROP FUNCTION getsalestax
EXECUTE FUNCTION getsalestax("CA","Santa Clara")
SELECT getsalestax(state,city) FROM invoice
● Allow developers to define custom functions in JavaScript (similar to PL/SQL) that are callable from N1QL queries, with an interactive debugger to simplify development and testing
● Server-side logic can be reused by any application and microservice, improving code maintenance and developer productivity
● Improve code performance by bringing application logic closer to the data
With User-Defined Functions, developers define callable custom functions to ease development, simplify reuse, and improve code performance.
Cost-Based Optimizer (Developer Preview)
[Diagram: statistics & metadata]
● Cost-based optimizer that generates the optimal access path based on statistics collected on the data
● Eliminates time spent tweaking a query and providing optimizer hints to get the rule-based optimizer to pick the right query plan
● Leverages decades of research and experience in query optimization to collect and use statistics on JSON, arrays, and objects
● Couchbase is the first NoSQL database with a dynamic schema to support a cost-based optimizer
With the Cost-Based Optimizer, queries run faster by using the optimal access path without the need to provide optimizer hints.
Index Advisor (Developer Preview)
● The Index Advisor suggests appropriate
indexes to speed up a given query, taking
the guesswork out of query tuning
● The Index Advisor can also monitor and
analyze the statistics collected from
running a workload, and suggest the
indexes that will speed up the queries in
the workload
● The Index Advisor greatly reduces the complexity and effort required for enterprise developers and operations engineers to determine the right indexes to speed up their queries
With the Index Advisor, developers can create better indexes based on its suggestions and speed up queries easily.
Query Monitor UI
Indexing
• Quick and efficient access to your data
• No need to scan all the documents
• Support for filtered indexes, composite indexes and covering indexes.
Index Options
#  Index Type                 Description
1  Primary Index              Index on the document key, over the whole bucket
2  Named Primary Index        Gives the primary index a name; allows multiple primary indexes in the cluster
3  Secondary Index            Index on a key-value or the document key
4  Secondary Composite Index  Index on more than one key-value
5  Functional Index           Index on a function or expression over key-values
6  Array Index                Indexes individual elements of arrays
7  Covering Index             The query can be answered using the data in the index alone, skipping retrieval of the item
8  Adaptive Index             Special type of GSI array index that can index all or specified fields of a document
9  Replica Index              Index replicas allow load balancing, providing scale-out, multi-dimensional scaling, performance, and high availability
Basic N1QL Demo
Full Text Search
• Search within texts – extremely fast
• Language aware (supports 19 languages)
• Scoring results mechanism
• Rich querying capabilities
Underlying Concepts
• Text analysis (stemming): the user searches "Beautiful", which is searched as "Beauti"; the document contains "Beauty", which is indexed as "Beauti". Match! ✔
• Language awareness
• Scoring
• Inverted indexes: each term maps to where it is found
  my: Doc 1, Doc 2, Doc 3
  dog: Doc 1, Doc 2, Doc 81
  has: Doc 1, Doc 2, Doc 3
  fleas: Doc 1, Doc 81
  …
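A minimal Python sketch of the two concepts on this slide: an inverted index (term to the documents where it is found) and stemming applied both at index time and at search time. The `stem` function is a deliberately tiny, hypothetical rule set chosen so that "Beautiful" and "Beauty" both reduce to "beauti"; real analyzers use full language-specific stemmers. The sample documents are made up to reproduce the postings shown on the slide.

```python
from collections import defaultdict


def stem(word: str) -> str:
    """Toy stemmer: just enough rules for the slide's example."""
    word = word.lower()
    if word.endswith("ful") and len(word) > 5:
        word = word[:-3]          # beautiful -> beauti
    if word.endswith("y") and len(word) > 3:
        word = word[:-1] + "i"    # beauty -> beauti
    return word


def build_inverted_index(docs):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index, term):
    """Stem the search term the same way, then look it up directly."""
    return sorted(index.get(stem(term), ()))


docs = {
    1: "My dog has fleas",
    2: "My dog has beauty",
    3: "My cat has naps",
    81: "Dog fleas",
}
index = build_inverted_index(docs)
print("dog:", search(index, "dog"))
print("fleas:", search(index, "fleas"))
print("Beautiful:", search(index, "Beautiful"))
```

Because the lookup goes straight from term to document ids, no document has to be scanned at query time, and because both sides are stemmed, a search for "Beautiful" finds the document containing "beauty".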
Full Text Search - Capabilities
Query
• Basic: Match, Match Phrase, Fuzzy, Prefix, Regexp, Wildcard, Boolean Field
• Compound: QueryString, Boolean, Conjunction, Disjunction
• Range: DateRange, NumericRange
• Special Purpose: DocID, MatchAll, MatchNone, Phrase, Term, Geospatial
• Scoring (TF/IDF), boosting, field scoping
• New/DP: TermRange, Geospatial
Indexing
• Real-time indexing (inverted index, auto-updated upon mutation)
• Default map and map by document type
• Dynamic mapping
• Stored fields, term vectors
• Analyzers: tokenization, token filtering (stop word removal, language-specific stemming)
• Aliasing
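The "Scoring (TF/IDF), boosting" item can be illustrated with the textbook TF-IDF formula in Python. This is a sketch of the general technique only: the exact scoring formula used by the search service is not given in these slides, and the `tf_idf_scores` function, its `boost` parameter, and the sample documents are hypothetical.

```python
import math


def tf_idf_scores(term: str, docs: dict, boost: float = 1.0) -> dict:
    """Score each document containing `term` with classic TF * IDF, times a boost."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    doc_freq = sum(1 for tokens in tokenized.values() if term in tokens)
    if doc_freq == 0:
        return {}
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(len(docs) / doc_freq)
    scores = {}
    for doc_id, tokens in tokenized.items():
        if not tokens:
            continue
        # Term frequency: share of the document's tokens that are the term.
        tf = tokens.count(term) / len(tokens)
        if tf > 0:
            scores[doc_id] = tf * idf * boost
    return scores


beers = {
    "a": "fruity fruity beer",
    "b": "fruity ale",
    "c": "dark stout",
}
scores = tf_idf_scores("fruity", beers)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Documents that mention the term more often relative to their length rank higher, and a boost simply scales a term's contribution to the final score.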
SEARCH Predicates in Query (Full Text Search)
● Couchbase provides a comprehensive search capability in queries, beyond the simple LIKE() operator found in most databases
● The SEARCH() operator supports keyword and fuzzy matching across multiple document fields
● Developers do not have to write complex code to process and combine the results from separate SQL and search queries
● Better query performance, with inverted indexes for search predicates instead of inefficient scans for LIKE()
SELECT * FROM `beer-sample` b
WHERE SEARCH(b.desc, "fruity") AND b.abv < 0.5;
With Search predicates, developers can combine SQL and Search queries for powerful integration and better query performance.
Mobile
• Full stack platform for mobile and IoT
apps.
• Real time automatic sync – built-in
• Fully secured
• Data can be accessed when offline.
Couchbase - The Data Platform for Mobile Engagement
COUCHBASE LITE (Client): Lightweight embedded NoSQL database with full CRUD and query functionality.
SYNC GATEWAY (Middle Tier): Secure web gateway with synchronization, data access, and data integration APIs for accessing, integrating, and synchronizing data over the web.
COUCHBASE SERVER (Storage): Highly scalable, highly available, high performance NoSQL database server.
Security: Built-in enterprise-level security throughout the entire stack includes user authentication, user and role-based data access control (RBAC), secure transport (TLS), and 256-bit AES full database encryption.
Eventing
• Triggering user defined business logic in real
time
• Runs on the cluster.
• Logic is written in JavaScript (V8)
Analytics
• Powerful parallel query processing over
JSON
• Made for long running complex SQL-like
queries
• Complete workload isolation – does not affect
the operational data processing
Distributed
Complex Query
Processing
Parallel Query Processing
• Distributed Massively Parallel Query
Processor (MPP) quickly executes complex
queries on larger datasets
• Comprehensive SQL-like
query language
[Diagram: query takes 1 minute vs. query takes 5 seconds]
Container and Cloud
Deployment
• Couchbase replication is zone and region aware, ideal for cloud deployments.
• Pre-built modules for all major cloud vendors.
• Support for containers
• Automation with Couchbase Autonomous
Operator for Kubernetes.
• Support for Red Hat OpenShift
Architecture
[Diagram callouts: Naming, Image to use, Size, How many]
Distributed ACID Transactions
WHY
With distributed ACID transactions, developers simplify application logic by relying on all-or-
nothing semantics for durably modifying multiple documents distributed on different nodes.
Transition from RDBMS schema:
● De-normalization has limitations
Multi-Asset Coordination:
● Transfer from one user to another
● Reservation items such as flights
Business/Application-level transaction
needs to modify multiple documents (all-or-
nothing):
● Microservices SAGAs - Event based
orchestration
transactions.run((txnctx) -> {
    // Insert a document
    JsonDocument doc1 =
        JsonDocument.create("newDoc", JsonObject.create());
    txnctx.insert(bucket, doc1);

    // Replace a document
    TransactionJsonDocument doc2 =
        txnctx.getOrError(bucket, "doc2");
    doc2.content().put("name", "bob");
    txnctx.replace(doc2);

    // Commit transaction
    txnctx.commit();
});
ACID in Couchbase
Couchbase Server 6.5 provides strong ACID guarantees while balancing scalability, availability
and performance.
A  Atomicity: Guarantees all-or-nothing semantics for updating multiple documents in more than one shard on different nodes.
C  Consistency: Replicas are strongly consistent for the chosen durability level. Indexes and XDCR clusters are eventually consistent.
I  Isolation: Read Committed isolation for concurrent readers, as it provides strong semantics without compromising availability, scalability, and performance. Applications can specify the scan consistency level for performance.
D  Durability: Data protection under failures, with 3 different levels: replicate to a majority of the nodes; replicate to a majority and persist to disk on the primary; or persist to disk on a majority of the nodes.
Thank
you
Lior King
Sr. Solution Architect
Lior.King@Couchbase.com

Couchbase Data Platform | Big Data Demystified

  • 1.
    Couchbase NoSQL Platform Technical overview LiorKing Sr. Solution Architect Lior.King@Couchbase.com
  • 2.
  • 3.
    500+ Digital BusinessesRun on Couchbase 6 of the Top 10 E-Commerce Companies in the US 6 of the Top 10 US & European Broadcast Companies 6 of the Top 10 Online Casino Gaming Companies The Top 3 Credit Reporting Companies The top 3 GDS Companies 3 of the Top 10 Airlines
  • 4.
    Confidential and Proprietary.Do not distribute without Couchbase consent. © Couchbase 2019. All rights reserved. Couchbase Data Platform • Service-Centric Clustered Data System - Multi-process Architecture - Dynamic Distribution of Facilities - Cluster Map Distribution - Automatic Failover - Enterprise Monitoring/Management - Security • Offline Mobile Data Integration • Streaming REST API • SQL-like Query Engine for JSON • Clustered* Global Indexes • Lowest Latency Key-Value API • Active-Active Inter-DC Replication • Local Aggregate Indexes • Full-Text Search* • Operational Analytics*
  • 5.
    Couchbase = K/V+ Document DB + …. • Couchbase is a hybrid engine system: • Super fast K/V engine • Based on Memcached distributed cache. • Document DB engine • Uses the K/V engine for super fast performance • ANSI SQL-like language on JSON data. • A distributed cache & a database – IN ONE PLATFORM
  • 6.
    Why use adocument based DB? • Flexible schema = faster development. • No code impedance • The data structure in the database DB the data structure in your code. • Easy & fast deployments. • Easy maintenance. • Best fit microservices architecture. So why does RDBMS still very popular? SQL
  • 7.
    The Power ofthe Flexible JSON Schema • Ability to store data in multiple ways o Denormalized single document, as opposed to normalizing data across multiple table o Dynamic Schema to add new values when needed
  • 8.
    Efficient Sub-Document Operations •Document Mutations: • Atomic Operate on individual fields • Identical syntax behavior to regular bucket methods (upsert, insert, get, replace) • Support for JSON fragments. • Support for Arrays with uniqueness guarantees and ordinal placement (front/back)
  • 9.
    Nickel (N1QL) :SQL-Like Querying Support • SQL-like Query Language • Expressive, familiar, and feature-rich language for querying, transforming, and manipulating JSON data • ANSI 92 SQL Compatible – Selects, Inserts, Updates, Group By, Sort, Functions etc. • N1QL extends SQL to handle data that is: • Nested: Contains nested objects, arrays • Heterogeneous: Schema-optional, non-uniform • Distributed: Partitioned across a cluster Flexibility of JSON Power of SQL
  • 10.
  • 11.
    Memory First Architecture •Sub millisecond speed • No bottlenecks • Based on Memcached
  • 12.
    Write Operation APPLICATION SERVER MANAGEDCACHE DISK DISK QUEUE REPLICATION QUEUE DOC 1 DOC 1DOC 1
  • 13.
    Couchbase Read Operation APPLICATIONSERVER MANAGED CACHE DISK DISK QUEUE REPLICATION QUEUE DOC 1 GET DOC 1 DOC 1
  • 14.
    Cache Ejection APPLICATION SERVER MANAGEDCACHE DISK DISK QUEUE REPLICATION QUEUE DOC 1 DOC 2DOC 3DOC 4DOC 5 DOC 1 DOC 2 DOC 3 DOC 4 DOC 5 Single-node type means easier administration and scaling  Layer consolidation means read through and write through cache  Couchbase automatically removes data that has already been persisted from RAM
  • 15.
    Cache Miss APPLICATION SERVER MANAGEDCACHE DISK DISK QUEUE REPLICATION QUEUE DOC 1 DOC 2 DOC 3 DOC 4 DOC 5 DOC 2 DOC 3 DOC 4 DOC 5 GET DOC 1 DOC 1 DOC 1 Single-node type means easier administration and scaling  Layer consolidation means 1 single interface for App to talk to and get its data back as fast as possible  Separation of cache and disk allows for fastest access out of RAM while pulling data from disk in parallel
  • 16.
    Persistence • Guards againstmost form of failures • Protects against data loss • Configurable durability • Always on Availability
  • 17.
    Auto Sharding –Bucket and vBuckets Virtual buckets  A bucket is a logical, unique key space  Multiple buckets can exist within a single cluster of nodes  Each bucket has active and replica data sets (1, 2 or 3 extra copies)  Each data set has 1024 Virtual Buckets (vBuckets)  Each vBucket contains 1/1024th portion of the data set  vBuckets do not have a fixed physical server location  Mapping between the vBuckets and physical servers is called the cluster map  Document IDs (keys) always get hashed to the same vbucket  Couchbase SDK’s lookup the vbucket -> server mapping
  • 18.
    Virtual Buckets Replication vB Databuckets vB 1 ….. 1024 Active Virtual buckets vB vB 1 ….. 1024 Replica Virtual buckets
  • 19.
  • 20.
    Rebalance ACTIVE ACTIVE ACTIVE REPLICAREPLICA REPLICA Couchbase Server 1 Couchbase Server 2 Couchbase Server 3 ACTIVE ACTIVE REPLICA REPLICA Couchbase Server 4 Couchbase Server 5 SHARD 5 SHARD 2 SHARD SHARD SHARD 4 SHARD SHARD SHARD 1 SHARD 3 SHARD SHARD SHARD 4 SHARD 1 SHARD 8 SHARD SHARD SHARD SHARD 6 SHARD 3 SHARD 2 SHARD SHARD SHARD SHARD 7 SHARD 9 SHARD 5 SHARD SHARD SHARD SHARD 7 SHARD SHARD 6 SHARD SHARD 8 SHARD 9 SHARD READ/WRITE/UPDATE
  • 21.
    Fail over ACTIVE ACTIVEACTIVE REPLICA REPLICA REPLICA Couchbase Server 1 Couchbase Server 2 Couchbase Server 3 ACTIVE ACTIVE REPLICA REPLICA Couchbase Server 4 Couchbase Server 5 SHARD 5 SHARD 2 SHARD SHARD SHARD 4 SHARD SHARD SHARD 1 SHARD 3 SHARD SHARD SHARD 4 SHARD 1 SHARD 8 SHARD SHARD SHARDSHARD 6 SHARD 2 SHARD SHARD SHARD SHARD 7 SHARD 9 SHARD 5 SHARD SHARD SHARD SHARD 7 SHARD SHARD 6 SHARDSHARD 8 SHARD 9 SHARD SHARD 3 SHARD 1 SHARD 3 SHARD
  • 22.
    Elastic Scalability • Linearscalability by adding nodes • Multi-Dimensional Scalability (MDS) • Extremely easy scaling
  • 23.
    Multi Dimensional Scaling(MDS) NODE 1 NODE 14 Data Full Text AnalyticsGlobal Index Query Eventing Cluster Manager Managed Cache Key-Value Store Document Database Mobile N1QL Query Full Text Search Analytics
  • 24.
    Replication • High availabilityfrom node failures • Disaster recovery from data center failures with XDCR (Cross Data Center Replication) • Supports Active-Active replication between data centers.
  • 25.
  • 26.
  • 27.
    Query • Using SQLfor JSON called N1QL • If you know SQL – N1QL will look extremely familiar. • Support for ANSI joins, aggregations, subqueries, ordering etc.
  • 28.
  • 29.
  • 30.
    { "Name" : "JaneSmith", "DOB" : "1990-01-30", "Billing" : [ { "type" : "visa", "cardnum" : "5827-2842-2847-3909", "expiry" : "2019-03" }, { "type" : "master", "cardnum" : "6274-2842-2847-3909", "expiry" : "2019-03" } ], "Connections" : [ { "CustId" : "XYZ987", "Name" : "Joe Smith" }, { "CustId" : "PQR823", "Name" : "Dylan Smith" } { "CustId" : "PQR823", "Name" : "Dylan Smith" } ], "Purchases" : [ { "id":12, item: "mac", "amt": 2823.52 } { "id":19, item: "ipad2", "amt": 623.52 } ] } LoyaltyInfo Result Documents Orders Customer You specify WHAT Couchbase Server figures out HOW Input: JSON Output: JSON N1QL is Declarative: What Vs How
  • 31.
    N1QL (Example) SELECT customers.id, customers.NAME.lastname, customers.NAME.firstname Sum(orderline.amount) FROMorders UNNEST orders.lineitems AS orderline JOIN customers ON KEYS orders.custid WHERE customers.state = 'NY' GROUP BY customers.id, customers.NAME.lastname HAVING sum(orderline.amount) > 10000 ORDER BY sum(orderline.amount) DESC Dotted sub-document reference Names are CASE-SENSITIVE UNNEST to flatten the arrays JOINS with Document KEY of customers
  • 32.
    Query Execution Flow 1.Application submits N1QL query 2. Query is parsed, analyzed and plan is created 1 2
  • 33.
    Query Execution Flow 3.Query Service makes request to Index Service 4. Index Service returns document keys and data 3 4
  • 34.
    Query Execution Flow 5.If Covering Index, skip step 6 6. If filtering is required, fetch documents from Data Service56
  • 35.
    Query Execution Flow 7.Apply final logic (e.g. SORT, ORDER BY) 8. Return formatted results to application 7 8
  • 36.
    Data Modification Statements •UPDATE … SET … WHERE … • DELETE FROM … WHERE … • INSERT INTO … ( KEY, VALUE ) VALUES … • INSERT INTO … ( KEY …, VALUE … ) SELECT … • MERGE INTO … USING … ON … WHEN [ NOT ] MATCHED THEN … Note: Couchbase provides per-document atomicity.
  • 37.
    Data Modification Statements INSERTINTO ORDERS (KEY, VALUE) VALUES ("1.ABC.X382", {"O_ID":482, "O_D_ID":3, "O_W_ID":4}); UPDATE ORDERS SET O_CARRIER_ID = ”ABC987” WHERE O_ID = 482 AND O_D_ID = 3 AND O_W_ID = 4 DELETE FROM NEW_ORDER WHERE NO_D_ID = 291 AND NO_W_ID = 3482 AND NO_O_ID = 2483 JSON literals can be used in any expression
  • 38.
    Full SQL Pipeline ©2019Couchbase. All rights reserved. 38 Global Index CLIENT Key/Value API FetchScanParse Plan Join Filter Pre- Aggregate Offset Limit Project Data-parallel — Query is N data streams over N cores* Memory-based Pluggable architecture — datastore, index… REQUEST RESPONSE SortAggregate
  • 39.
    ● Use casessamples: ○ revenue growth month over month ○ top N sale districts by revenues for a given week ○ ranking of sales person by region based on revenue booked ● Answer common but complex business queries with minimal lines of code and optimized performance ● Couchbase is the first NoSQL Database to support ANSI Window Functions ANSI Window Functions With ANSI Window Functions, developers can simplify financial and statistical aggregations in an easy and optimized way
  • 40.
    ANSI Common TableExpression (CTE) ● CTE allows developer to isolate SQL statement into temporary named result set that can be referenced as a source table in the context of a larger query ● Offers the advantages of readability and ease of maintenance of complex queries without compromising performance. ● Couchbase is the first NoSQL Database to support ANSI CTE. With ANSI Common Table Expression, developers have ease of maintenance and better readability by naming temporary SQL statements SELECT b.month, b.current_period_task_count, ROUND(((b.current_period_task_count - b.last_period_task_count ) / b.last_period_task_count),2) FROM last_period_task AS b last_period_task AS ( SELECT x.month, x.current_period_task_count, LAG(x.current_period_task_count) OVER (ORDER BY x.month) AS last_period_task_count FROM current_period_task x ) WITH current_period_task AS ( SELECT DATE_TRUNC_STR(a.startDate,'month') AS month, COUNT(1) AS current_period_task_count FROM crm a WHERE a.type='activity' AND a.activityType = 'Task' AND DATE_PART_STR(a.startDate,'year') = 2018 GROUP BY DATE_TRUNC_STR(a.startDate,'month’) ),
  • 41.
    User-Defined Functions (DeveloperPreview) CREATE FUNCTION getsalestax(state,city) DROP FUNCTION getsalestax EXECUTE FUNCTION getsalestax("CA","Santa Clara") SELECT getsalestax(state,city) FROM invoice ● Allow developers to define custom functions in Javascript (similar PL/SQL) callable from N1QL queries. Interactive debugger to simplify development and testing ● Server-side logic that can be reused by any application and micro services; improve code maintenance and developer productivity ● Improve code performance by bringing application logic closer to the data With User-Defined Functions, developer define callable custom functions to ease development, simplify reusability and improve code performance.
  • 42.
    Cost-Based Optimizer (DeveloperPreview) Statistics & metadata ● Cost-based optimizer that generates the optimal access path based on statistics collected on the data ● Eliminates time tweaking a query and providing optimizer hints to get the rule-based optimizer to pick the right query plan ● Leverages decades of research and experience in query optimization to collect and use statistics on JSON, arrays, and objects. ● Couchbase is the first NoSQL database with dynamic schema to support cost-based optimizer With Cost-Based Optimizer, queries will run faster by using optimal access path without the need to provide optimizer hints
  • 43.
    Index Advisor (DeveloperPreview) ● The Index Advisor suggests appropriate indexes to speed up a given query, taking the guesswork out of query tuning ● The Index Advisor can also monitor and analyze the statistics collected from running a workload, and suggest the indexes that will speed up the queries in the workload ● The Index Advisor greatly reduces the complexity and efforts required for enterprise developers and operations engineers to determine the right indexes to speed up their queries With Index Advisor, developer can create better indexes based on the suggestions and speed up queries easily.
  • 44.
  • 45.
    Indexing • Quick andefficient access to your data • No need to scan all the documents • Support for filtered indexes, compounded indexes and covering indexes.
  • 46.
    Index Options Index TypeDescription 1 Primary Index Index on the document key on the whole bucket 2 Named Primary Index Give name for the primary index. Allows multiple primary indexes in the cluster 3 Secondary Index Index on the key-value or document-key 4 Secondary Composite Index Index on more than one key-value 5 Functional Index Index on function or expression on key-values 6 Array Index Index individual elements of the arrays 7 Covering Index Query able to answer using the the data from the index and skips retrieving the item. 8 Adaptive Index Special type of GSI array index that can index all or specified fields of a document. 9 Replica Index The feature of indexing that allows load balancing. Thus providing scale-out, multi-dimensional scaling, performance, and high availability.
  • 47.
  • 48.
    Full Text Search • Search within texts – extremely fast • Language aware (supports 19 languages) • Result scoring mechanism • Rich querying capabilities
  • 49.
    Underlying Concepts • Text analysis (language-aware stemming): the user searches "Beautiful", which is searched as "Beauti"; the document contains "Beauty", which is indexed as "Beauti" ✔ Match! • Inverted indexes map terms to where they are found: my: Doc 1, Doc 2, Doc 3; dog: Doc 1, Doc 2, Doc 81; has: Doc 1, Doc 2, Doc 3; fleas: Doc 1, Doc 81; … • Scoring
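The two concepts on this slide, stemming and the inverted index, can be sketched in a few lines. The stemmer below is deliberately crude and purely illustrative (two hard-coded English rules); real analyzers use language-specific algorithms. The sample documents are made up.

```python
from collections import defaultdict


def stem(word: str) -> str:
    """Crude illustrative stemmer: lowercase, drop a trailing 'ful',
    then turn a trailing 'y' into 'i' (words of 1-2 letters are kept)."""
    word = word.lower()
    if word.endswith("ful"):
        word = word[:-3]
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"
    return word


def build_inverted_index(documents: dict) -> dict:
    """Map each stemmed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[stem(token)].add(doc_id)
    return index


def search(index: dict, query: str) -> list:
    """Stem the query the same way as the documents, then look it up."""
    return sorted(index.get(stem(query), set()))


documents = {
    "Doc 1": "my dog has fleas",
    "Doc 2": "my dog has beauty",
    "Doc 3": "my cat has style",
}
index = build_inverted_index(documents)
hits = search(index, "Beautiful")
```

Because both "Beautiful" and "beauty" are reduced to the same stem, the search finds the document even though the exact word never appears in it.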
  • 50.
    Full Text Search - Capabilities
    Query
    • Basic: Match, Match Phrase, Fuzzy, Prefix, Regexp, Wildcard, Boolean Field
    • Compound: QueryString, Boolean, Conjunction, Disjunction
    • Range: DateRange, NumericRange
    • Special Purpose: DocID, MatchAll, MatchNone, Phrase, Term, Geospatial
    • Scoring (TF/IDF), boosting, field scoping
    • New/DP: TermRange, Geospatial
    Indexing
    • Real time indexing (inverted index, auto-updated upon mutation)
    • Default map and map by document type
    • Dynamic mapping
    • Stored fields, Term vectors
    • Analyzers: Tokenization, Token Filtering (stop word removal, stemming – language specific)
    • Aliasing
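As an illustration of TF/IDF scoring, the sketch below uses a common textbook variant (term frequency times a smoothed inverse document frequency). The exact formula used by Couchbase Full Text Search may differ; the function name and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, documents):
    """Score each document as the sum over query terms of tf * idf.
    query_terms are expected to be lowercase, like the tokenized text."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        if not tokens:
            scores[doc_id] = 0.0
            continue
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            # Document frequency: in how many documents the term occurs.
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0:
                continue
            tf = counts[term] / len(tokens)
            # The +1 keeps a term found in every document from scoring zero.
            idf = math.log(n_docs / df) + 1.0
            score += tf * idf
        scores[doc_id] = score
    return scores


beers = {
    "a": "hoppy fruity ale",
    "b": "dark stout",
    "c": "fruity fruity sour",
}
scores = tf_idf_scores(["fruity"], beers)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Documents that mention the term more often, relative to their length, rank higher; documents without the term score zero.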
  • 51.
    SEARCH Predicates in Query (Full Text Search) ● Couchbase provides a comprehensive search capability in query beyond the simple LIKE() operator in most databases ● The SEARCH() operator supports keyword and fuzzy matching across multiple document fields ● Developers do not have to write complex code to process and combine the results from separate SQL and search queries ● Better query performance with inverted indexes for search predicates instead of inefficient scans for LIKE()
        SELECT * FROM `beer-sample` b WHERE SEARCH(b.description, "fruity") AND b.abv < 0.5;
    With Search predicates, developers can combine SQL and Search queries for powerful integration and better query performance.
  • 52.
    Mobile • Full-stack platform for mobile and IoT apps. • Real-time automatic sync, built in. • Fully secured. • Data can be accessed when offline.
  • 53.
    Couchbase - The Data Platform for Mobile Engagement
    COUCHBASE LITE (Client): Lightweight embedded NoSQL database with full CRUD and query functionality.
    SYNC GATEWAY (Middle Tier): Secure web gateway with synchronization, data access, and data integration APIs for accessing, integrating, and synchronizing data over the web.
    COUCHBASE SERVER (Storage): Highly scalable, highly available, high performance NoSQL database server.
    Security: Built-in enterprise level security throughout the entire stack includes user authentication, user and role based data access control (RBAC), secure transport (TLS), and 256-bit AES full database encryption.
  • 54.
    Eventing • Triggers user-defined business logic in real time • Runs on the cluster • Logic is written in JavaScript (V8)
  • 55.
    Analytics • Powerful parallel query processing over JSON • Made for long-running, complex SQL-like queries • Complete workload isolation – does not affect the operational data processing
  • 56.
    Distributed Complex Query Processing: Parallel Query Processing • Distributed Massively Parallel Query Processor (MPP) quickly executes complex queries on larger datasets • Comprehensive SQL-like query language (Figure labels: "Query takes 1 minute" vs. "Query takes 5 seconds")
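The scatter-gather pattern behind parallel query processing can be sketched as follows. This is only an analogy using threads on one machine, not the Analytics service's implementation; the partition layout and the `amount` field are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(partition):
    """Runs on one (simulated) node: sum and count of 'amount'
    over the documents stored locally."""
    total = sum(doc["amount"] for doc in partition)
    return total, len(partition)


def parallel_average(partitions):
    """Scatter the aggregate to every partition, then gather and
    combine the partial results into a global average."""
    workers = max(len(partitions), 1)  # the pool needs at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


partitions = [
    [{"amount": 10}, {"amount": 20}],  # documents on node 1
    [{"amount": 30}],                  # documents on node 2
]
average = parallel_average(partitions)
```

Each partition computes a small partial result independently, so only two numbers per node travel to the coordinator rather than every document.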
  • 57.
    Container and Cloud Deployment • Couchbase replication is zone and region aware – ideal for cloud deployments. • Prebuilt modules for all major cloud vendors. • Support for containers. • Automation with Couchbase Autonomous Operator for Kubernetes. • Support for Red Hat OpenShift.
  • 58.
  • 59.
  • 60.
    Distributed ACID Transactions
    Why: With distributed ACID transactions, developers simplify application logic by relying on all-or-nothing semantics for durably modifying multiple documents distributed on different nodes.
    Transition from RDBMS schema: ● De-normalization has limitations
    Multi-asset coordination: ● Transfer from one user to another ● Reserving items such as flights
    Business/application-level transaction needs to modify multiple documents (all-or-nothing): ● Microservices SAGAs - event-based orchestration
        transactions.run((txnctx) -> {
            // Insert a document
            JsonDocument doc1 = JsonDocument.create("newDoc", JsonObject.create());
            txnctx.insert(bucket, doc1);

            // Replace a document
            TransactionJsonDocument doc2 = txnctx.getOrError(bucket, "doc2");
            doc2.content().put("name", "bob");
            txnctx.replace(doc2);

            // Commit transaction
            txnctx.commit();
        });
  • 61.
    ACID in Couchbase
    Couchbase Server 6.5 provides strong ACID guarantees while balancing scalability, availability and performance.
    A (Atomicity): Guarantees all-or-nothing semantics for updating multiple documents in more than one shard on different nodes.
    C (Consistency): Replicas are strongly consistent for the chosen durability level. Indexes and XDCR clusters are eventually consistent.
    I (Isolation): Read Committed isolation for concurrent readers, as it provides strong semantics without compromising availability, scalability, and performance. Applications can specify the scan consistency level for performance.
    D (Durability): Data protection under failures, with 3 different levels: replicate to a majority of the nodes; replicate to a majority and persist to disk on the primary; or persist to disk on a majority of the nodes.
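The three durability levels can be modeled as simple acknowledgement rules. The sketch below is an assumption-laden illustration: the enum names, the `is_durable` helper, and the majority arithmetic (more than half of the active copy plus its replicas) are choices made here for clarity, not taken from the slides or from an SDK.

```python
from enum import Enum


class Durability(Enum):
    """Illustrative names for the three levels listed on the slide."""
    MAJORITY = "replicate to a majority of the nodes"
    MAJORITY_AND_PERSIST_TO_ACTIVE = "replicate to a majority and persist on the primary"
    PERSIST_TO_MAJORITY = "persist to disk on a majority of the nodes"


def majority(num_replicas: int) -> int:
    """Assumed rule: more than half of all copies, counting one active
    copy plus its replicas."""
    copies = 1 + num_replicas
    return copies // 2 + 1


def is_durable(level, num_replicas, in_memory_acks, persisted_acks, active_persisted):
    """in_memory_acks and persisted_acks count the copies (active included)
    that hold the write in memory and on disk, respectively."""
    needed = majority(num_replicas)
    if level is Durability.MAJORITY:
        return in_memory_acks >= needed
    if level is Durability.MAJORITY_AND_PERSIST_TO_ACTIVE:
        return in_memory_acks >= needed and active_persisted
    return persisted_acks >= needed


# Two replicas: the write is in memory on two copies, on disk on one.
replicated = is_durable(Durability.MAJORITY, 2, 2, 1, True)
persisted = is_durable(Durability.PERSIST_TO_MAJORITY, 2, 2, 1, True)
```

The same write can satisfy a weaker level while still waiting on a stronger one, which is the trade-off between latency and protection that the levels expose.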
  • 62.
    Thank you Lior King Sr. Solution Architect Lior.King@Couchbase.com