Utilizing Arrays: Modeling, Querying and Indexing

©2016 Couchbase Inc.
{ "Utilizing Arrays" :
["Modeling", "Querying", "Indexing"] }
Keshav Murthy
Director, Couchbase R&D

Agenda
• Introduction to Arrays
• Data Modeling with Arrays
• Query Performance With Arrays
• Array Indexing
• Fun With Arrays
• Query Performance
• Tag Search
• String Search

Introduction To Arrays

Every N1QL query returns Arrays
cbq> select distinct type from `travel-sample`;
{
…
"results": [
{ "type": "route" },
{ "type": "airport" },
{ "type": "hotel" },
{ "type": "airline" },
{ "type": "landmark" }
] ,
"status": "success",
"metrics": {
"elapsedTime": "840.518052ms",
"executionTime": "840.478414ms",
"resultCount": 5,
"resultSize": 202
}
}
The results of every query are returned as an array.
cbq> SELECT * FROM `travel-sample` WHERE type = 'airport' and faa = 'BLR';
{
"results": [],
"metrics": {
"elapsedTime": "9.606755ms",
"executionTime": "9.548749ms",
"resultCount": 0,
"resultSize": 0
}
}

Introduction to Arrays
• An arrangement of quantities or symbols in rows and columns; a matrix
• An indexed set of related elements

JSON Arrays
{
"Name" : "Jane Smith",
"DOB" : "1990-01-30",
"hobbies" : ["lego", "piano", "badminton", "robotics"],
"scores" : [3.4, 2.9, 9.2, 4.1],
"legos" : [
true,
9292,
"fighter 2",
{
"name" : "Millenium Falcon",
"type" : "Starwars"
}
]
}
• Arrays in JSON can
contain simple values,
or any combination of
JSON types within the
same array.
• No type or structure
enforcement within
the array.
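A minimal Python sketch of the point above, assuming a trimmed copy of the sample document: parsing it with the standard json module shows that each element of "legos" keeps its own type. The variable names are illustrative only.

```python
import json

# Trimmed copy of the sample document from the slide.
doc = json.loads("""
{
  "hobbies": ["lego", "piano", "badminton", "robotics"],
  "scores": [3.4, 2.9, 9.2, 4.1],
  "legos": [true, 9292, "fighter 2",
            {"name": "Millenium Falcon", "type": "Starwars"}]
}
""")

# Each element keeps its own JSON type; nothing forces them to agree.
lego_types = [type(item).__name__ for item in doc["legos"]]
print(lego_types)
```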

JSON Arrays
{
"Name": "Jane Smith",
"DOB" : "1990-01-30",
"phones" : [
"+1 510-523-3529", "+1 650-392-4923"
],
"Billing": [
{
"type": "visa",
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03"
},
{
"type": "master",
"cardnum": "6274-2542-5847-3949",
"expiry": "2018-12"
}
]
}
Billing has two credit card
entries, stored as an ARRAY
Two phone number entries

JSON Arrays : Syntax Diagram

Data Modeling with Arrays

Properties of Real-World Data
• Rich structure
• Attributes, Sub-structure
• Relationships
• To other data
• Value evolution
• Data is updated
• Structure evolution
• Data is reshaped
Customer
Name
DOB
Billing
Connections
Purchases

Modeling Data in Relational World
Billing
Connections
Purchases
Contacts
Customer
• Rich structure
• Normalize & JOIN Queries
• Relationships
• JOINS and Constraints
• Value evolution
• INSERT, UPDATE, DELETE
• Structure evolution
• ALTER TABLE
• Application Downtime
• Application Migration
• Application Versioning

Using JSON For Real World Data
CustomerID Name DOB
CBL2015 Jane Smith 1990-01-30
Table: Customer
{
"DOB" : "1990-01-30"
}
• The primary key (CustomerID) becomes the document key
• Each column name and column value becomes a KEY-VALUE pair.
{
"Name" : {
"fname": "Jane",
"lname": "Smith"
},
"DOB" : "1990-01-30"
}
OR
Customer DocumentKey: CBL2015

Using JSON to Store Data
CustomerID Name DOB
Table: Customer
{
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-2847-3909",
"expiry" : "2019-03"
}
]
}
CustomerID Type Cardnum Expiry
CBL2015 visa 5827… 2019-03
Table: Billing
• Rich Structure & Relationships
• Billing information is stored as a sub-document
• There could be more than a single credit card. So, use an array.

CustomerID Name DOB
Table: Customer
{
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-2847-3909",
"expiry" : "2019-03"
},
{
"type" : "master",
"cardnum" : "6274-2542-5847-3949",
"expiry" : "2018-12"
}
]
}
CustomerID Type Cardnum Expiry
CBL2015 visa 5827… 2019-03
CBL2015 master 6274… 2018-12
Table: Billing
Value evolution
• Simply add an additional array element or update a value.

CustomerID ConnId Name
CBL2015 XYZ987 Joe Smith
CBL2015 SKR007 Sam Smith
CBL2015 RGV492 Rav Smith
Table: Connections
{
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-2847-3909",
"expiry" : "2019-03"
},
{
"type" : "master",
"cardnum" : "6274-2542-5847-3949",
"expiry" : "2018-12"
}
],
"Connections" : [
{
"ConnId" : "XYZ987",
"Name" : "Joe Smith"
},
{
"ConnId" : "SKR007",
"Name" : "Sam Smith"
},
{
"ConnId" : "RGV491",
"Name" : "Rav Smith"
}
]
}
Structure evolution
• Simply add new key-value pairs
• No downtime to add new KV pairs
• Applications can validate data
• Structure evolution over time.
Relations via Reference

{
"DOB" : "1990-01-30",
"Billing" : [
{
"type" : "visa",
"cardnum" : "5827-2842-2847-3909",
"expiry" : "2019-03"
},
{
"type" : "master",
"cardnum" : "6274-2842-2847-3909",
"expiry" : "2019-03"
}
],
"Connections" : [
{
"ConnId" : "XYZ987",
"Name" : "Joe Smith"
},
{
"ConnId" : "SKR007",
"Name" : "Sam Smith"
},
{
"ConnId" : "RGV491",
"Name" : "Rav Smith"
}
],
"Purchases" : [
{ "id":12, "item": "mac", "amt": 2823.52 },
{ "id":19, "item": "ipad2", "amt": 623.52 }
]
}
Table: Customer
CustomerID Name DOB
Table: Billing
CustomerID Type Cardnum Expiry
CBL2015 visa 5827… 2019-03
CBL2015 master 6274… 2018-12
Table: Connections
CustomerID ConnId Name
CBL2015 SKR007 Sam Smith
CBL2015 RGV492 Rav Smith
Table: Purchases
CustomerID item amt
CBL2015 mac 2823.52
CBL2015 ipad2 623.52
Contacts
Customer
Billing
Connections
Purchases

Models for Representing Data
Data Concern | Relational Model | JSON Document Model (NoSQL)
Rich Structure | Multiple flat tables; constant assembly / disassembly | Documents; no assembly required!
Relationships | Represented; queried (SQL) | Represented; queried (N1QL, MongoDB, CQL)
Value Evolution | Data can be updated | Data can be updated
Structure Evolution | Uniform and rigid; manual change (disruptive) | Flexible; dynamic change

Querying Arrays

Querying Arrays
• Array Access
• Expressions
• Functions
• Aggregates
• Statements
• Array Clauses

Array Access: Expressions, Functions and Aggregates.
• EXPRESSIONS
• ARRAY
• ANY
• EVERY
• IN
• WITHIN
• Construct [elem]
• Slice array[start:end]
• Selection array[#pos]
• FUNCTIONS
• ISARRAY
• TYPE
• ARRAY_APPEND
• ARRAY_CONCAT
• ARRAY_CONTAINS
• ARRAY_DISTINCT
• ARRAY_IFNULL
• ARRAY_FLATTEN
• ARRAY_INSERT
• ARRAY_INTERSECT
• ARRAY_LENGTH
• ARRAY_POSITION
• AGGREGATES
• ARRAY_AVG
• ARRAY_COUNT
• ARRAY_MIN
• ARRAY_MAX
• ARRAY_SUM
• FUNCTIONS
• ARRAY_PREPEND
• ARRAY_PUT
• ARRAY_RANGE
• ARRAY_REMOVE
• ARRAY_REPEAT
• ARRAY_REPLACE
• ARRAY_REVERSE
• ARRAY_SORT
• ARRAY_STAR

Array access
{
"DOB" : "1990-01-30",
"phones" : [
"+1 510-523-3529", "+1 650-392-4923"
],
"Billing": [
{
"type": "visa",
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03"
},
{
"type": "master",
"cardnum": "6274-2542-5847-3949",
"expiry": "2018-12"
}
]
}
SELECT phones from t;
[
{
"phones": [
"+1 510-523-3529",
"+1 650-392-4923"
]
}
]
SELECT phones[1] from t;
[
{
"$1": "+1 650-392-4923"
}
]
SELECT phones[0:1] from t;
[
{
"$1": [
"+1 510-523-3529"
]
}
]

Array access: Expressions and functions
{
"DOB" : "1990-01-30",
"phones" : [
"+1 510-523-3529", "+1 650-392-4923"
],
"Billing": [
{
"type": "visa",
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03"
},
{
"type": "master",
"cardnum": "6274-2542-5847-3949",
"expiry": "2018-12"
}
]
}
SELECT Billing[0].cardnum from t;
[
{
"cardnum": "5827-2842-2847-3909"
}
]
SELECT Billing[*].cardnum from t;
[
{
"cardnum": [
"5827-2842-2847-3909",
"6274-2542-5847-3949"
]
}
]
SELECT ISARRAY(Name) name, ISARRAY(phones)
phones from t;
[
{
"name": false,
"phones": true
}
]
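For readers more familiar with general-purpose languages, the access expressions above behave much like list indexing. This is a rough Python analogy, not N1QL itself; the variable names are illustrative.

```python
# Sample data copied from the slide's document.
phones = ["+1 510-523-3529", "+1 650-392-4923"]
billing = [
    {"type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03"},
    {"type": "master", "cardnum": "6274-2542-5847-3949", "expiry": "2018-12"},
]

second_phone = phones[1]          # like phones[1]: positions start at 0
first_slice = phones[0:1]         # like phones[0:1]: end position is exclusive
first_card = billing[0]["cardnum"]                   # like Billing[0].cardnum
all_cardnums = [card["cardnum"] for card in billing]  # like Billing[*].cardnum
is_array = isinstance(phones, list)                   # like ISARRAY(phones)
```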

Array access : Functions
{
"DOB" : "1990-01-30",
"phones" : [
"+1 510-523-3529", "+1 650-392-4923"
],
"Billing": [
{
"type": "visa",
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03"
},
{
"type": "master",
"cardnum": "6274-2542-5847-3949",
"expiry": "2018-12"
}
]
}
SELECT ARRAY_CONCAT(phones, ["+1 408-284-2921"]) from t;
[
{
"$1": [
"+1 510-523-3529",
"+1 650-392-4923",
"+1 408-284-2921"
]
}
]
SELECT ARRAY_COUNT(Billing) billing,
ARRAY_COUNT(phones) phones from t;
[
{
"billing": 2,
"phones": 2
}
]

Array access : Functions
SELECT phones, ARRAY_REVERSE(phones) reverse from t;
[
{
"phones": [
"+1 510-523-3529",
"+1 650-392-4923"
],
"reverse": [
"+1 650-392-4923",
"+1 510-523-3529"
]
}
]
SELECT phones, ARRAY_INSERT(phones, 0, "+1 415-439-4923") newlist from t;
[
{
"newlist": [
"+1 415-439-4923",
"+1 510-523-3529",
"+1 650-392-4923"
],
"phones": [
"+1 510-523-3529",
"+1 650-392-4923"
]
}
]
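Note that in the results above the original "phones" value is unchanged: the array functions return a new array. A small Python sketch of that behavior follows; the helper names mirror the N1QL functions but are hypothetical stand-ins written for this example.

```python
def array_concat(a, b):
    """Return a new list with b's elements appended to a's."""
    return a + b

def array_reverse(a):
    """Return a reversed copy, leaving the input untouched."""
    return a[::-1]

def array_insert(a, pos, value):
    """Return a copy of a with value inserted at position pos."""
    return a[:pos] + [value] + a[pos:]

phones = ["+1 510-523-3529", "+1 650-392-4923"]
concatenated = array_concat(phones, ["+1 408-284-2921"])
reverse = array_reverse(phones)
newlist = array_insert(phones, 0, "+1 415-439-4923")
```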

Array access : Aggregates
SELECT ARRAY_MIN(Billing) AS minbill FROM t;
[
{
"minbill": {
"cardnum": "5827-2842-2847-3909",
"expiry": "2019-03",
"type": "visa"
}
}
]
SELECT name,
ARRAY_AVG(reviews[*].ratings[*].Overall) AS
avghotelrating
FROM `travel-sample`
WHERE type = 'hotel'
ORDER BY avghotelrating desc
LIMIT 3;
[
{
"avghotelrating": 5,
"name": "Culloden House Hotel"
},
{
"name": "The Bulls Head"
},
{
"name": "La Pradella"
}
]

SELECT: ARRAY & FIRST Expression
ARRAY: The ARRAY operator lets you map and filter
the elements or attributes of a collection, object, or
objects. It evaluates to an array of the operand
expression, that satisfies the WHEN clause, if provided.
SELECT ARRAY [name, r.ratings.`Value`]
FOR r IN reviews
WHEN r.ratings.`Value` = 4
END
SELECT FIRST [name, r.ratings.`Value`]
FOR r IN reviews
END
FIRST: The FIRST operator enables you to map and
filter the elements or attributes of a collection, object,
or objects. It evaluates to a single element based on
the operand expression that satisfies the WHEN clause,
if provided.
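As a rough analogy, ARRAY ... FOR ... WHEN ... END reads like a list comprehension with a filter, and FIRST reads like taking the first element the same comprehension would produce. The sketch below uses made-up review data and a hypothetical hotel name; None stands in for the "no value" case, which is an assumption of this example.

```python
# Hypothetical data for illustration only.
name = "Example Hotel"
reviews = [
    {"ratings": {"Value": 4}},
    {"ratings": {"Value": 2}},
    {"ratings": {"Value": 4}},
]

# ARRAY [name, r.ratings.Value] FOR r IN reviews WHEN r.ratings.Value = 4 END
array_result = [
    [name, r["ratings"]["Value"]]
    for r in reviews
    if r["ratings"]["Value"] == 4
]

# FIRST [name, r.ratings.Value] FOR r IN reviews END
first_result = next(([name, r["ratings"]["Value"]] for r in reviews), None)
```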

Statements
• INSERT
• INSERT documents with arrays
• INSERT multiple documents with arrays
• INSERT result of documents from SELECT
• UPDATE
• UPDATE specific elements and objects within an array
• DELETE
• DELETE documents based on values within one or more arrays
• MERGE
• MERGE documents to INSERT, UPDATE or DELETE documents.
• SELECT
• Fetch documents given an array of keys
• JOIN based on array of keys
• Predicates (filters) on arrays
• Array expressions, functions and aggregates
• UNNEST, NEST operations

Statements: INSERT
INSERT INTO customer VALUES ("KEY01", { "cid": "ABC01", "orders": ["LG012", "LG482", "LG134"] });
INSERT INTO customer VALUES ("KEY01", { "cid": "XYC21", "orders": ["LG92", "LG859"] }),
VALUES ("KEY04", { "cid": "PQR49", "orders": ["LG47", "LG09", "LG134"] }),
VALUES ("KEY09", { "cid": "KDL29", "orders": ["LG082"] });
INSERT INTO customer
(
KEY uuid(),
value c
)
SELECT mycustomers AS c
FROM newcustomers AS n
WHERE n.type = "premium";

Statements: DELETE
DELETE
FROM customer
WHERE orders = ["LG012", "LG482", "LG134"];
DELETE
FROM customer
WHERE ANY o IN orders SATISFIES o = "LG012" END;
DELETE
FROM customer
WHERE ANY o IN orders SATISFIES o = "LG012" END
RETURNING meta().id, *;

Statements: UPDATE
UPDATE customer USE KEYS ["KEY091"] SET orders = ["LG012", "LG482", "LG134"];
UPDATE customer USE KEYS ["KEY091"]
SET orders = ARRAY_REMOVE(orders, "LG012") ;
UPDATE customer USE KEYS ["KEY091"]
SET orders = ARRAY_APPEND(orders, "LG892") ;

Statements : SELECT
• SELECT
• Array predicates
• NEST, UNNEST
• Fetch documents given an array of keys
• JOIN based on array of keys

SELECT statement
ARRAY PREDICATES

SELECT: Array predicates
• ANY
• EVERY
• SATISFIES
• IN
• WITHIN
• WHEN

• Arrays and Objects: Arrays are compared element-
wise. Objects are first compared by length; objects
of equal length are compared pairwise, with the
pairs sorted by name.
• IN clause: Use this when you want to evaluate based on a specific field.
• WITHIN clause: Use this when you don’t know which
field contains the value you’re looking for. The
WITHIN operator evaluates to TRUE if the right-side
value contains the left-side value as a child or
descendant. The NOT WITHIN operator evaluates to
TRUE if the right-side value does not contain the left-
side value as a child or descendant.
SELECT *
FROM `travel-sample`
WHERE type = 'hotel'
AND ANY r IN reviews
SATISFIES r.ratings.`Value` >= 3
END;
SELECT *
FROM `travel-sample`
WHERE type = 'hotel'
AND ANY r WITHIN reviews
SATISFIES r LIKE '%Ozella%'
END;
• EVERY: EVERY is a range predicate that tests a
Boolean condition over the elements or attributes of
a collection, object, or objects. It uses the IN and
WITHIN operators to range through the collection.
SELECT *
FROM `travel-sample`
WHERE type = 'hotel'
AND EVERY r IN reviews
SATISFIES r.ratings.Cleanliness >= 4
END;
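The difference between IN and WITHIN can be sketched in Python: IN ranges over the direct elements of the array, while WITHIN ranges over every child and descendant. This is a simplified model with made-up review data and hypothetical helper names, not the query engine's implementation.

```python
def descendants(value):
    """Yield every child and deeper descendant of a JSON-like value."""
    if isinstance(value, dict):
        children = list(value.values())
    elif isinstance(value, list):
        children = value
    else:
        return
    for child in children:
        yield child
        yield from descendants(child)

def any_in(items, pred):
    """ANY x IN items SATISFIES pred(x) END"""
    return any(pred(x) for x in items)

def every_in(items, pred):
    """EVERY x IN items SATISFIES pred(x) END"""
    return all(pred(x) for x in items)

def any_within(value, pred):
    """ANY x WITHIN value SATISFIES pred(x) END"""
    return any(pred(x) for x in descendants(value))

# Hypothetical reviews array for illustration.
reviews = [
    {"author": "Ozella Sipes", "ratings": {"Value": 4, "Cleanliness": 5}},
    {"author": "Barton Marks", "ratings": {"Value": 2, "Cleanliness": 4}},
]

value_ok = any_in(reviews, lambda r: r["ratings"]["Value"] >= 3)
all_clean = every_in(reviews, lambda r: r["ratings"]["Cleanliness"] >= 4)
# The string sits one level down, so only WITHIN-style traversal finds it.
mentions_ozella = any_within(reviews, lambda v: isinstance(v, str) and "Ozella" in v)
direct_match = any_in(reviews, lambda v: isinstance(v, str) and "Ozella" in v)
```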

• ARRAY_CONTAINS
• Returns true if the array contains value.
SELECT name, t.public_likes
FROM `travel-sample` t
WHERE type="hotel" AND
ARRAY_CONTAINS(t.public_likes,
"Vallie Ryan") = true;
[
{
"name": "Medway Youth Hostel",
"public_likes": [
"Julius Tromp I",
"Corrine Hilll",
"Jaeden McKenzie",
"Vallie Ryan",
"Brian Kilback",
"Lilian McLaughlin",
"Ms. Moses Feeney",
"Elnora Trantow"
]
}
]

SELECT statement
UNNEST and NEST

Querying Arrays: UNNEST
• UNNEST: If a document or object contains an array, UNNEST performs a join of the nested array with its parent document. Each resulting joined object becomes an input to the query. UNNESTs and JOINs can be chained.
SELECT r.author, COUNT(r.author) AS authcount
FROM `travel-sample` t UNNEST reviews r
WHERE t.type="hotel"
GROUP BY r.author
ORDER BY COUNT(r.author) DESC
LIMIT 5;
[
{
"authcount": 2,
"author": "Anita Baumbach"
},
{
"authcount": 2,
"author": "Uriah Gutmann"
},
{
"authcount": 2,
"author": "Ashlee Champlin"
},
{
"authcount": 2,
"author": "Cassie O'Hara"
},
{
"authcount": 1,
"author": "Zoe Kshlerin"
}
]
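The UNNEST query above can be pictured as a nested loop: each hotel is paired with each of its reviews, and the grouping then runs over those pairs. The Python sketch below uses made-up hotels and authors; it models the join and the GROUP BY count, not the actual execution plan.

```python
from collections import Counter

# Hypothetical hotel documents for illustration.
hotels = [
    {"name": "A", "reviews": [{"author": "Ann"}, {"author": "Bob"}]},
    {"name": "B", "reviews": [{"author": "Ann"}]},
    {"name": "C"},  # no reviews array, so it contributes no joined rows
]

# FROM hotels t UNNEST reviews r: one (t, r) pair per array element.
unnested = [(t, r) for t in hotels for r in t.get("reviews", [])]

# GROUP BY r.author, ORDER BY COUNT(r.author) DESC, LIMIT 5
authcount = Counter(r["author"] for _, r in unnested)
top_authors = authcount.most_common(5)
```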

Querying Arrays: NEST
• NEST is conceptually the inverse of UNNEST.
• Nesting performs a join across two keyspaces. Instead of producing a cross-product of the left and right inputs, it produces a single result for each left input; the corresponding right inputs are collected into an array and nested as a single array-valued field in the result object.
SELECT *
FROM `travel-sample` route
NEST `travel-sample` airline
ON KEYS route.airlineid
WHERE route.type = 'route' LIMIT 1;
[
{
"airline": [
{
"callsign": "AIRFRANS",
"country": "France",
"iata": "AF",
"icao": "AFR",
"id": 137,
"name": "Air France",
"type": "airline"
}
],
"route": {
"airline": "AF",
"airlineid": "airline_137",
"destinationairport": "MRS",
"distance": 2881.617376098415,
"equipment": "320",
"id": 10000,
"schedule": [
{
"day": 0,
"flight": "AF198",
"utc": "10:13:00"
},
{
"day": 0,
"flight": "AF547",
"utc": "19:14:00"
},
{
"day": 0,
"flight": "AF943",

Query Performance with Arrays

Array Indexing
• Before 4.5, creating an index on an array attribute would index the entire array as a single scalar value.
CREATE INDEX i1 ON
`travel-sample`(schedule);
"schedule": [
{
"day" : 0,
"flight" : "AI111",
"utc" : "1:11:11"
},
{
"day": 1,
"flight": "AF552",
"utc": "14:41:00"
},
{
"day": 2,
"flight": "AF166",
"utc": "08:59:00"
}, …
]

Array Indexing - motivation
[
{ "day" : 0,
"special_flights" :
[
{ "flight" : "AI111", "utc" : "1:11:11"},
{ "flight" : "AI222", "utc" : "2:22:22" }
]
},
{
"day": 1,
"flight": "AF552",
"utc": "14:41:00"
}, …
]
"London":[
"London",
"Tokyo",
"NewYork",
…
]

Why array indexing?
• When NoSQL databases asked customers to denormalize, they put the child
table info into arrays in parent tables.
• E.g. Each customer doc had all phone numbers, contacts, orders in arrays.
• Not easy to query - need to specify full array value in where predicates.
• Ex: list of users who purchased a product – Unknown values & large list
• Was not possible to index part of the array with objects
• Bloated index size (indexes whole array value)
• Example: Index just the day field in array of flights in schedule.
• Performance Limitation
• ANY…IN or WITHIN array
• Ease of querying - Must specify full array value in WHERE-clause
• Manageable for Known or handful of values
• Difficult for Unknown or Large list of values.

Who wants array indexing?
• Find my crew based on the airline.
WHERE ANY p IN ods.pilot satisfies p.filen = "XYZ1012" END ;
• Find my customer based on one of the emails on the customer
WHERE ANY a IN u.telecom SATISFIES a.system = 'email' AND a.value = 'a@b.com' END ;
• Find service qualification based on arrays of arrays.
WHERE ANY c_0 IN `item`.`blackoutserviceblocklist` SATISFIES
ANY c_1 IN c_0.`blackoutserviceblock`.`ppvservicelist` SATISFIES
c_1.`ppvservice`.`eventcode` = "E001"
END
END ;

What is Array Indexing?
• Enables visibility into the array structure
• Subset of array elements can be indexed & searched efficiently
schedule =
[
{ "day" : 0,
"special_flights" :
[
{ "flight" : "AI111" , "utc" : "1:11:11"},
{ "flight" : "AI932" , "utc" : "2:22:22"}
]
},
{
"day": 1,
"flight": "AF552",
"utc": "14:41:00"
}, …
]

How Array Indexing Helps?
• Index only the required elements or attributes in the array
• Efficient on index storage & search time
• Benefits are a lot more significant for nested arrays/objects

How Array Indexing Helps -- Example
"schedule":
[ { "day" : 0,
"special_flights" : [
{ "flight" : "AI111", "utc" : "1:11:11"},
{ "flight" : "AI222", "utc" : "2:22:22"}
]
},
{
"day": 1,
"flight": "AF552",
"utc": "14:41:00"
},
{
"day": 2,
"flight": "AF166",
"utc": "08:59:00"
}, …
]
"flight":"AF552",
"flight":"AF166",
…
Array Index in Couchbase

Create Array Index
• No syntax changes to DML statements
• Supports all DML statements with a WHERE-clause
• SELECT, UPDATE, DELETE, etc.
• Array indexes are supported only with GSI.
• Supports both standard secondary and memory-optimized indexes.
CREATE INDEX isched ON `travel-sample`
(DISTINCT ARRAY v.flight FOR v IN schedule END) WHERE type = "route";

Array Index syntax
(ALL ARRAY p FOR p IN public_likes END)
WHERE type = "hotel" ;
"Julius Smith", [DocID]
"Corrine Hill", [DocID]
"Jaeden McKenzie", [DocID]
"Vallie Ryan", [DocID]
"Brian Kilback", [DocID]
"Lilian McLaughlin", [DocID]
"Ms. Moses Feeney", [DocID]
"Elnora Trantow", [DocID]
"public_likes": [
"Julius Smith",
"Corrine Hill",
"Jaeden McKenzie",
"Vallie Ryan",
"Brian Kilback",
"Lilian McLaughlin",
"Ms. Moses Feeney",
"Elnora Trantow"
]
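The picture above (one index entry per array element, each pointing back at its document) can be sketched as a sorted list of (element, DocID) pairs. This Python model is only an illustration under that assumption; the document keys and the lookup helper are made up, and the real GSI storage format is not shown in the slides.

```python
import bisect

# Hypothetical documents keyed by document ID.
docs = {
    "hotel_1": {"type": "hotel", "public_likes": ["Julius Smith", "Vallie Ryan"]},
    "hotel_2": {"type": "hotel", "public_likes": ["Vallie Ryan", "Brian Kilback"]},
    "airline_1": {"type": "airline", "public_likes": ["Vallie Ryan"]},
}

# ALL ARRAY p FOR p IN public_likes END ... WHERE type = "hotel":
# one (element, doc_id) entry per array element of each qualifying document.
entries = sorted(
    (like, doc_id)
    for doc_id, doc in docs.items()
    if doc.get("type") == "hotel"
    for like in doc.get("public_likes", [])
)

def lookup(value):
    """Return the IDs of documents whose array contains value."""
    lo = bisect.bisect_left(entries, (value, ""))
    hi = bisect.bisect_right(entries, (value, "\uffff"))
    return [doc_id for _, doc_id in entries[lo:hi]]
```

A predicate such as ANY p IN public_likes SATISFIES p = "Vallie Ryan" END then corresponds to lookup("Vallie Ryan") in this model.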

Example - Indexing individual attributes/elements
• "Find the total number of flights scheduled on 3rd day"
(DISTINCT ARRAY v.day FOR v IN schedule END) WHERE type = "route" ;
SELECT count(*) FROM `travel-sample`
WHERE type = "route" AND
ANY v IN schedule SATISFIES v.day = 3 END;

Example - Indexing individual attributes/elements
explain SELECT count(1) FROM `travel-sample`
{
"#operator": "DistinctScan",
"scan": {
"#operator": "IndexScan",
"index": "isched",
"index_id": "2b24c681fa54d83f",
"keyspace": "travel-sample",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"3"
],
"Inclusion": 3,
"Low": [
"3"
]

Example - Index with Array Elements and Other Attributes
• "Find all scheduled flights with hops, and group by number of stops"
CREATE INDEX iflight_stops ON `travel-sample`
( stops, DISTINCT ARRAY v.flight FOR v IN schedule END )
WHERE type = "route" ;
SELECT * FROM `travel-sample`
WHERE type = "route"
AND ANY v IN schedule SATISFIES v.flight LIKE 'AA%' END
AND stops >= 0;

Example - Indexing Nested Arrays
"schedule" : [
{"day" : 0,
"special_flights" : [
{"flight" : "AI111",
"utc" : "1:11:11"},
{"flight" : "AI222",
"utc" : "2:22:22" }
]
},
{"day": 1,
"flight": "AF552",
"utc": "14:41:00"
} …
]

Example - Indexing Nested Arrays
• "Find the total number of special flights scheduled"
CREATE INDEX inested ON `travel-sample`
(DISTINCT ARRAY
(DISTINCT ARRAY y.flight
FOR y IN x.special_flights END)
FOR x IN schedule END)
WHERE type = "route" ;
SELECT count(*) FROM `travel-sample`
WHERE type = "route" AND
ANY x IN schedule SATISFIES
(ANY y IN x.special_flights
SATISFIES y.flight IS NOT NULL END)
END ;
"schedule":
[ { "day" : 0,
"special_flights" : [
{ "flight" : "AI111", "utc":"1:11:11"},
{ "flight" : "AI222", "utc":"2:22:22"}
]
},
{
"day": 1,
"flight": "AF552",
"utc": "14:41:00"
},
{
"day": 2,
"flight": "AF166",
"utc": "08:59:00"
}, …
]

Example – UNNEST
• N1QL array indexing supports both collection predicates
• ANY
• ANY AND EVERY
• UNNEST queries can also exploit array indexes
CREATE INDEX idx_crew ON flight
(DISTINCT ARRAY c FOR c IN crew_ids END);
SELECT *
FROM flight UNNEST crew_ids AS c
WHERE c = "Joe Smith" ;

Restrictions in 4.5
Variable names and index keys, such as v and v.day, must be the same in the CREATE INDEX statement and in subsequent SELECT statements.
(DISTINCT ARRAY v.day FOR v IN schedule END) WHERE type = "route" ;

Restrictions in 4.5
• Supported operators:
DISTINCT ARRAY
ALL ARRAY
ARRAY
ANY
ANY AND EVERY
IN, WITHIN
UNNEST
• NOT supported operators: EVERY

Fun with Arrays

SELECT: Fetch Documents
SELECT * FROM customer USE KEYS ["KEY01"] ;
SELECT * FROM customer USE KEYS [ "CUST:09", "CUST:29", "CUST:234", "CUST:852", "CUST:258"] ;
SELECT status, COUNT(status)
FROM customer c USE KEYS [ "CUST:09", "CUST:29", "CUST:234", "CUST:852", "CUST:258" ]
WHERE c.region = 'US'
GROUP BY status;
SELECT product, COUNT(product)
FROM customer c USE KEYS [ "CUST:09", "CUST:29", "CUST:234", "CUST:852", "CUST:258" ]
INNER JOIN
locations ON KEYS c.lid
WHERE c.region = 'US'
GROUP BY product;

SELECT: JOIN
SELECT COUNT(1)
FROM `beer-sample` beer
INNER JOIN
`beer-sample` brewery ON KEYS beer.brewery_id
WHERE state = 'CA'
• The JOIN operation combines documents from two keyspaces
• JOIN criteria are based on the ON KEYS clause
• The outer table uses an index scan, if possible
• The inner table (brewery) is fetched document by document
• Couchbase 4.6 improves this by fetching in batches.

SELECT: JOIN
SELECT COUNT(1)
FROM (
SELECT RAW META().id
FROM `beer-sample`
WHERE state = 'CA') as blist
INNER JOIN
`beer-sample` brewery ON KEYS blist;
SELECT COUNT(1)
FROM (
SELECT ARRAY_AGG(META().id) karray
FROM `beer-sample`
WHERE state = 'CA') as b
INNER JOIN
`beer-sample` brewery ON KEYS b.karray;
• Why not get all of the required document IDs from the index scan, then do one big bulk get on the outer table?
• Two ways to do it:
a) Use the array aggregate (ARRAY_AGG()) to create the list
b) Use RAW to create the array and then use that to JOIN.

Data.gov : New York Names
{
"meta": {
"view": {
"id": "25th-nujf",
"name": "Most Popular Baby Names by Sex and Mother's Ethnic Group, New York City",
"category": "Health",
"createdAt": 1382724894,
"description": "The most popular baby names by sex and mother's ethnicity in New York City.",
"displayType": "table",
…
"columns": [{
"id": -1,
"name": "sid",
"dataTypeName": "meta_data",
"fieldName": ":sid",
"position": 0,
"renderTypeName": "meta_data",
"format": {}
}, {
"id": -1,
"name": "id",
"dataTypeName": "meta_data",
"fieldName": ":id",
"position": 0,
"renderTypeName": "meta_data",
"format": {}
}
...
]
"data": [
[1, "EB6FAA1B-EE35-4D55-B07B-8E663565CCDF", 1, 1386853125, "399231", 1386853125, "399231", "{n}", "2011", "FEMALE",
"HISPANIC", "GERALDINE", "13", "75"],
[2, "2DBBA431-D26F-40A1-9375-AF7C16FF2987", 2, 1386853125, "399231", 1386853125, "399231", "{n}", "2011", "FEMALE",
"HISPANIC", "GIA", "21", "67"],
[3, "54318692-0577-4B21-80C8-9CAEFCEDA8BA", 3, 1386853125, "399231", 1386853125, "399231", "{n}", "2011", "FEMALE",
"HISPANIC", "GIANNA", "49", "42"]
...
]
}

INSERT INTO nynames (KEY UUID(), VALUE kname)
SELECT {":sid":d[0],
":id":d[1],
":position":d[2],
":created_at":d[3],
":created_meta":d[4],
":updated_at":d[5],
":updated_meta":d[6],
":meta":d[7],
"brth_yr":d[8],
"gndr":d[9],
"ethcty":d[10],
"nm":d[11],
"cnt":d[12],
"rnk":d[13]} kname
FROM (SELECT d FROM datagov UNNEST data d) as u1;

INSERT INTO nynames
(
KEY UUID(),
value o
)
SELECT o
FROM (
SELECT meta.`view`.columns[*].fieldName f,
data
FROM datagov) d
UNNEST data d1
LET o = OBJECT p:d1[ARRAY_POSITION(d.f, p)] FOR p IN d.f END ;
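The OBJECT ... FOR ... END expression above builds one object per data row by pairing each field name with the value at the same position. A Python sketch of that pairing follows, using a shortened, illustrative field list and row rather than the full data.gov columns.

```python
# Shortened, illustrative field list and data row.
fields = [":sid", ":id", "brth_yr", "nm"]
row = [1, "EB6FAA1B-EE35-4D55-B07B-8E663565CCDF", "2011", "GERALDINE"]

# OBJECT p: d1[ARRAY_POSITION(d.f, p)] FOR p IN d.f END
# For each field name p, look up its position and take the row value there.
obj = {p: row[fields.index(p)] for p in fields}

# With unique field names this is the same as zipping names with values.
zipped = dict(zip(fields, row))
```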

SPLIT & CONQUER:
SELECT name FROM `travel-sample`
WHERE type = 'hotel' LIMIT 5;
[
{
"name": "Medway Youth Hostel"
},
{
"name": "The Balmoral Guesthouse"
},
{
"name": "The Robins"
},
{
"name": "Le Clos Fleuri"
},
{
"name": "Glasgow Grand Central"
}
]
• Problem: Search for a word within a string

SPLIT & CONQUER:
select name
from `travel-sample`
where type = 'hotel' and
lower(name) LIKE '%grand%';
[
{
},
{
"name": "Horton Grand Hotel"
},
{
"name": "Manchester Grand Hyatt"
},
{
"name": "Grande Colonial Hotel"
},
{
"name": "Grand Hotel Serre Chevalier"
},
{
"name": "The Sheraton Grand Hotel"
}
]
• Use the LIKE predicate
• Runs in about 81 milliseconds to search 917
documents

SPLIT & CONQUER:
CREATE INDEX idxtravelname ON
`travel-sample`
(DISTINCT ARRAY wrd
FOR wrd IN SPLIT(LOWER(name)) END) where type =
'hotel';
SELECT name FROM `travel-sample`
WHERE ANY wrd IN SPLIT(LOWER(name)) satisfies wrd =
'grand' END AND type = 'hotel';
[
{
"name": "The Sheraton Grand Hotel"
},
{
"name": "Horton Grand Hotel"
},
{
"name": "Grand Hotel Serre Chevalier"
},
{
},
{
"name": "Manchester Grand Hyatt"
}
]
• Convert into LOWER case
• Split the name into words.
• SPLIT() returns an ARRAY of these words.
• Create the INDEX on this array.
• Query using the Array predicate.
• Query runs in 10 ms.
• Benefits grow with number of docs.
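The steps above can be sketched in Python as a small inverted index: lower-case the name, split it into words, and map each word to the documents that contain it. The hotel names come from the slides, but the document keys are made up. The sketch also shows why the word search and the LIKE search differ: "grande" is a different word, so only the substring scan matches it.

```python
from collections import defaultdict

# Hotel names from the slides; the document keys are hypothetical.
hotels = {
    "hotel_1": "The Sheraton Grand Hotel",
    "hotel_2": "Grande Colonial Hotel",
    "hotel_3": "Medway Youth Hostel",
}

# DISTINCT ARRAY wrd FOR wrd IN SPLIT(LOWER(name)) END
index = defaultdict(set)
for doc_id, name in hotels.items():
    for wrd in set(name.lower().split()):
        index[wrd].add(doc_id)

# ANY wrd IN SPLIT(LOWER(name)) SATISFIES wrd = 'grand' END: one index lookup.
word_match = sorted(index.get("grand", set()))

# lower(name) LIKE '%grand%': a substring test against every document.
like_match = sorted(d for d, n in hotels.items() if "grand" in n.lower())
```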

Bucket: article
{
{
"tags": "JSON,N1QL,COUCHBASE,BIGDATA,NAME,data.gov,SQL",
"title": "What's in a New York Name? Unlock data.gov Using N1QL "
}, {
"tags": "TWITTER,NOSQL,SQL,QUERIES,ANALYSIS,HASHTAGS,JSON,COUCHBASE,ANALYTICS,INDEX",
"title": "SQL on Twitter: Analysis Made Easy Using N1QL"
}, {
"tags":
"CONCURRENCY,MONGODB,COUCHBASE,INDEX,READ,WRITE,PERFORMANCE,SNAPSHOT,CONSISTENCY",
"title": "Concurrency Behavior: MongoDB vs. Couchbase"
}, {
"tags": "COUCHBASE,N1QL,JOIN,PERFORMANCE,INDEX,DATA MODEL,FLEXIBLE,SCHEMA",
"title": "JOIN Faster With Couchbase Index JOINs"
}, {
"tags":
"NOSQL,NOSQL,BENCHMARK,SQL,JSON,COUCHBASE,MONGODB,YCSB,PERFORMANCE,QUERY,INDEX",
"title": "How Couchbase Won YCSB"
}
}

Questions:
Find all the articles with N1QL in their title
Find all the articles with COUCHBASE in their tags
{
{
}, {
"tags": "TWITTER,NOSQL,SQL,QUERIES,ANALYSIS,HASHTAGS,JSON,COUCHBASE,ANALYTICS,INDEX",
"title": "SQL on Twitter: Analysis Made Easy Using N1QL"
}, {
"tags":
"CONCURRENCY,MONGODB,COUCHBASE,INDEX,READ,WRITE,PERFORMANCE,SNAPSHOT,CONSISTENCY",
"title": "Concurrency Behavior: MongoDB vs. Couchbase"
}, {
"tags": "COUCHBASE,N1QL,JOIN,PERFORMANCE,INDEX,DATA MODEL,FLEXIBLE,SCHEMA",
"title": "JOIN Faster With Couchbase Index JOINs"
}, {
"tags":
"NOSQL,NOSQL,BENCHMARK,SQL,JSON,COUCHBASE,MONGODB,YCSB,PERFORMANCE,QUERY,INDEX",
"title": "How Couchbase Won YCSB"
}
}

Basic Framework
"tags": "JSON,N1QL,COUCHBASE,BIGDATA,NAME,data.gov,SQL"
[
"JSON",
"N1QL",
"COUCHBASE",
"BIGDATA",
"NAME",
"data.gov",
"SQL"
]
SPLIT() into an array
Index this array (Array Index):
DISTINCT ARRAY wrd FOR wrd IN SPLIT(tags, ",") END
N1QL Query Service:
SELECT *
FROM articles
WHERE ANY wrd IN SPLIT(tags, ",")
satisfies wrd = "COUCHBASE"
END

Basic Framework
[
"What's",
"in",
"a",
"New",
"York",
"Name?",
"Unlock",
"data.gov",
"Using",
"N1QL"
]
SPLIT() into an
array
Array
Index
??? N1QL
Query
Service
???

New Function: TOKENS() in Couchbase 4.6 – OUT in DP now.
TOKENS(expression [, parameter])
expression : JSON expression
parameter : options
{"names":true} Include the key names in the JSON "key":value pair.
{"case":"lower"} Return the values in upper/lower case
{"specials":true} Recognize special characters like @, - to form tokens.
select title, tags, tokens(tags, {"case":"lower"}) tagsarray, tokens(title) titlearray from articles limit 1;
"tagsarray": [
"data",
"gov",
"bigdata",
"n1ql",
"couchbase",
"sql",
"json",
"name"
],
"title": "What's in a New York Name? Unlock data.gov Using N1QL ",
"titlearray": [
"s",
"Unlock",
"data",
"N1QL",
"gov",
"in",
"Using",
"New",
"What",
"York",
"a",
"Name"
]
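The output above is consistent with splitting on every non-alphanumeric character and keeping the distinct pieces. The Python sketch below is only an approximation built on that observation; tokens_sketch is a hypothetical helper, not the TOKENS() implementation, and it ignores the "names" and "specials" options.

```python
import re

def tokens_sketch(text, lower=False):
    """Approximate TOKENS() on a string: distinct runs of letters and digits."""
    found = set(re.findall(r"[A-Za-z0-9]+", text))
    return {t.lower() for t in found} if lower else found

title = "What's in a New York Name? Unlock data.gov Using N1QL "
tags = "JSON,N1QL,COUCHBASE,BIGDATA,NAME,data.gov,SQL"

titlearray = tokens_sketch(title)
tagsarray = tokens_sketch(tags, lower=True)
```

The results are sets, so the order of elements is not meaningful, which matches the unordered look of the arrays in the slide.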

Using TOKENS() – Index on title, lower case
create index ititlesearch on articles(distinct array wrd for wrd in tokens(title, {"case":"lower"}) end);
explain select title from articles where any wrd in tokens(title, {"case":"lower"}) satisfies wrd = 'n1ql' end;
{
"scan": {
"index": "ititlesearch",
"index_id": "7a162af1199565b5",
"keyspace": "articles",
"spans": [
{
"Range": {
"High": [
"\"n1ql\""
],
"Inclusion": 3,
"Low": [
"\"n1ql\""
]
}
}
],
"using": "gsi"
}
},

Using TOKENS() Index on the WHOLE document
create index ititlesearch2 on articles
(distinct array wrd for wrd in tokens(articles, { "case":"lower" , "names":true }) end);
explain select title from articles where
any wrd in tokens(articles, {"case":"lower", "names":true }) satisfies wrd = 'title' end;
"scan": {
"index": "ititlesearch2",
"index_id": "c60792ca9f957cfd",
"keyspace": "articles",
"spans": [
{
"Range": {
"High": [
"\"title\""
],
"Inclusion": 3,
"Low": [
"\"title\""
]
}
}
],
"using": "gsi"
}

Keshav Murthy
Director, Couchbase Engineering
keshav@couchbase.com

Goal of N1QL
Give developers and enterprises an expressive,
powerful, and complete language for querying,
transforming, and manipulating JSON data.

Array Indexing – How array is expanded in GSI
Sl. | Create Index Expression | Key versions generated by Projector | Index Entries in GSI storage
1. | age | [K1] | [K1]docid
2. | age, name, children | [K1, K2, [c1, c2, c3]] | [K1, K2, [c1, c2, c3]]docid
3. | ALL ARRAY c FOR c IN cities END | [[K11, K12, K13]] | [K11]docid; [K12]docid; [K13]docid
4. | ALL ARRAY c FOR c IN cities END, age | [[K11, K12, K13], K2] | [K11, K2]docid; [K12, K2]docid; [K13, K2]docid
4.1 | age, ALL ARRAY c FOR c IN cities END, name | [K1, [K21, K22, K23], K3] | [K1, K21, K3]docid; [K1, K22, K3]docid; [K1, K23, K3]docid

Array Indexing – How array is expanded in GSI
Sl. | Create Index Expression | Key versions generated by Projector | Index Entries in GSI storage
5. | ALL ARRAY c FOR c IN cities END, children | [[K11, K12, K13], [c1, c2, c3]] | [K11, [c1, c2, c3]]docid; [K12, [c1, c2, c3]]docid; [K13, [c1, c2, c3]]docid
6. | ALL ARRAY (ALL ARRAY y FOR y IN c END) FOR c IN cities END | [[K1, K2, K3, K4, K5]] | [K1]docid; [K2]docid; [K3]docid; [K4]docid; [K5]docid
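The expansion shown in these tables can be sketched as follows: the array-valued key is replaced by each of its elements in turn, while the other keys (including plain array values such as children) are copied unchanged. This is a simplified Python model of the tables only; the expand function and its array_pos argument are assumptions of this example, not the projector's API.

```python
def expand(key_version, array_pos):
    """Expand one key version into index entries.

    key_version: list of index keys for one document.
    array_pos:   position of the key declared with ALL ARRAY ... END.
    """
    return [
        key_version[:array_pos] + [elem] + key_version[array_pos + 1:]
        for elem in key_version[array_pos]
    ]

# Row 4.1: age, ALL ARRAY c FOR c IN cities END, name
row_4_1 = expand(["K1", ["K21", "K22", "K23"], "K3"], array_pos=1)

# Row 5: ALL ARRAY c FOR c IN cities END, children
# children is indexed as a whole array value and is not expanded.
row_5 = expand([["K11", "K12", "K13"], ["c1", "c2", "c3"]], array_pos=0)
```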

Array Indexing Performance in ForestDB (3.6K sets)
Metrics | KPI | Measured | Comments
Array Q2 (stale=ok) | 13000 | 15140 | Single doc match & fetch
Array Q2 (stale=false) | 700 | 9420 | Same with consistency
Array Q3 (stale=ok) | 1100 | 1435 | 100 doc match and fetch
consistency

Array Indexing Performance in MOI with 30K sets
Metrics | KPI | Measured | Comments
Array Q2 (stale=ok) | 13000 | 15251 | Single doc match & fetch
consistency
Array Q3 (stale=ok) | 1100 | 1371 | 100 doc match and fetch
consistency

UNITED – POC on 4.0
Response times in milliseconds
Query | 1 Thread | 10 Threads | 50 Threads
Q1 | 13 | 35.1 | 197.84
Q2 | 28 | 66.8 | 285.32
Q3 - 7d | 160 | 606 | 2960.2
Q3 - 28d | 1725 | 8240.3 | 41439.86

Query | 1 Thread | 5 Threads
Q1 | 1500 | 31000
Q2 | Timed out.
Q3 | 23000 | 90000
MongoDB Query
Couchbase
Query
Response times in milliseconds

UNITED -- POC
• Query 2 – Get the selected flight using the document key. For each crew member
(pilot and flight attendant) found in the flight details.
• Fetch the previous flight assigned to the crew member
• Fetch the next flight assigned to the crew member
select ods.GMT_EST_DEP_DTM, ods.PRFL_ACT_GMT_DEP_DTM, ods.PRFL_SCHED_GMT_DEP_DTM, ods.GMT_EST_ARR_DTM,
ods.PRFL_ACT_GMT_ARR_DTM, ods.PRFL_SCHED_GMT_ARR_DTM, ods.FLT_LCL_ORIG_DT, ods.PRFL_FLT_NBR,
ods.PRFL_TAIL_NBR, PILOT.PRPS_RSV_IND
from ods unnest ods.PILOT
where ods.TYPE='CREW_ON_FLIGHT' and
((ods.PRFL_ACT_GMT_DEP_DTM is not missing and
ods.PRFL_ACT_GMT_DEP_DTM > "2015-07-15T02:45:00Z") OR
(ods.PRFL_ACT_GMT_DEP_DTM is missing and ods.GMT_EST_DEP_DTM is not null
and ods.GMT_EST_DEP_DTM > "2015-07-15T02:45:00Z"))
and any p in ods.PILOT satisfies p.FILEN = "U110679" end
order by ods.GMT_EST_DEP_DTM limit 1

UNITED – POC Queries on 4.5
• 422,137 documents.
• Query2: BEFORE array indexing
• Primary index scan
• 38.91 seconds.
create index idx_odspilot on ods(DISTINCT ARRAY p.FILEN FOR p IN PILOT END);
• Query2: AFTER array indexing
• Array index scan [DistinctScan]
• 8.51 milliseconds
• Improvement of 4572 TIMES
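The improvement factor follows directly from the two timings quoted above, once both are expressed in milliseconds. A short check in Python:

```python
before_ms = 38.91 * 1000   # 38.91 seconds with the primary index scan
after_ms = 8.51            # 8.51 milliseconds with the array index scan

speedup = before_ms / after_ms
print(round(speedup))
```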

Array Indexing – Size and numbers
• There is no limit on the number of elements in the array.
• The total size of the array index key should not exceed the max_array_seckey_size setting (default = 10K).
CREATE INDEX i1 on default(ALL flights, airlineid);
Let's say a given document is:
{
"flights": ["AF552", "AF166", "AF268", "AF422"],
"airlineid": "airline_137"
}
The indexable array keys for the document are:
[ ["AF552", "airline_137"], ["AF166", "airline_137"], ["AF268", "airline_137"], ["AF422",
"airline_137"] ]
The sum of the lengths of the items above should be < max_array_seckey_size. The setting can be increased but not decreased.

Statements : MERGE
BIG MERGE statement – Use travel-sample
explain merge into b1 using b2 on key "11" when matched then update set b1.o3=1;
merge into b1 using (select id from b2 where x < 10) as b3 on key b3.id when matched then update set b1.o4=1;
merge into `travel-sample` using default on key "2" when matched then update set `travel-sample`.name="aaa";
MERGE into WAREHOUSE using `beer-sample` ON KEY to_string("yakima_brewing_and_malting_grant_s_ales-deep_powder_winter_ale") when matched then delete;
