Arrays can be simple; arrays can be complex. JSON arrays give you a way to collapse the data model while retaining structural flexibility. Arrays of scalars, arrays of objects, and arrays of arrays are common structures in a JSON data model. Once you have such a model, you need to write queries that update and retrieve the data you need efficiently. This talk discusses modeling and querying arrays, and then using array indexes to run those queries faster.
cbq> select distinct type from `travel-sample`;
{
    "requestID": "458b7651-53a3-4a83-9abe-b65959420010",
    "signature": {
        "type": "json"
    },
    "results": [
        {
            "type": "route"
        },
        {
            "type": "airport"
        },
        {
            "type": "hotel"
        },
        {
            "type": "airline"
        },
        {
            "type": "landmark"
        }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "840.518052ms",
        "executionTime": "840.478414ms",
        "resultCount": 5,
        "resultSize": 202
    }
}
An array is a way to hold more than one value at a time. It’s like a list of items.
Think of an array as the columns in a spreadsheet. You can have a spreadsheet with only one column or lots of columns.
An object is an unordered set of name/value pairs. An object begins with { (left brace) and ends with } (right brace). Each name is followed by : (colon) and the name/value pairs are separated by , (comma).
An array is an ordered collection of values. An array begins with [ (left bracket) and ends with ] (right bracket). Values are separated by , (comma).
A value can be a string in double quotes, or a number, or true or false or null, or an object or an array. These structures can be nested.
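These three rules can be checked with Python's standard json module. The document below is made up (it is not from travel-sample) and nests an array of scalars, an array of objects, and an array of arrays inside one object, so each kind of value described above appears at least once.

```python
import json

# A made-up JSON document that uses every structure described above.
text = """
{
  "name": "Jane",
  "active": true,
  "nickname": null,
  "tags": ["gold", "early-adopter"],
  "phones": [{"type": "home", "number": "555-0100"},
             {"type": "work", "number": "555-0199"}],
  "matrix": [[1, 2], [3, 4]]
}
"""

doc = json.loads(text)

# JSON objects become dicts (name/value pairs); arrays become lists (ordered values).
assert isinstance(doc, dict)
assert isinstance(doc["tags"], list)

# Array order is preserved, so positional access is meaningful.
print(doc["tags"][0])              # gold
print(doc["phones"][1]["number"])  # 555-0199
print(doc["matrix"][1][0])         # 3

# true / false / null map to True / False / None.
print(doc["active"], doc["nickname"])  # True None
```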
Let’s look at modeling Customer data.
Rich Structure
In a relational database, this customer data would be stored in five normalized tables.
Each time you want to construct a customer object, you JOIN the data in these tables.
Each time you persist it, you find the appropriate rows in the relevant tables and insert or update them.
Relationship
Relationships are enforced via referential constraints. Objects are constructed by JOINs, EACH time.
Value Evolution
Additional values of the SAME TYPE (e.g. an additional phone or an additional address) are managed by additional ROWS in one of the tables.
Customer:contacts is a 1:n relationship.
Structure Evolution
This is the most difficult part. Changing the structure is difficult, both within a table and across tables.
While you can do this via ALTER TABLE, it requires downtime, data migration, and application versioning.
This is one of the problems document databases try to solve by representing data in JSON.
Let’s see how to represent customer data in JSON.
So, finally, you have a JSON document that represents a CUSTOMER.
In a single JSON document, the relationships between the data are implicit, expressed through sub-structures, arrays, and arrays of sub-structures.
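To make the contrast concrete, here is a small Python sketch. The tables and fields (customers, phones, addresses) are hypothetical stand-ins for the five normalized tables on the slide. The relational side reassembles a customer with a join on every read; the document side stores the same customer once, with its 1:n contacts as arrays, so an extra phone is an extra array element rather than a new row.

```python
# Hypothetical normalized rows, standing in for the relational tables.
customers = [{"cust_id": 1, "name": "Jane Smith"}]
phones = [
    {"cust_id": 1, "type": "home", "number": "555-0100"},
    {"cust_id": 1, "type": "cell", "number": "555-0142"},
]
addresses = [{"cust_id": 1, "city": "San Jose", "zip": "95134"}]


def build_customer(cust_id):
    """Simulate the JOIN that reconstructs a customer object on every read."""
    row = next(c for c in customers if c["cust_id"] == cust_id)
    return {
        "name": row["name"],
        "phones": [{"type": p["type"], "number": p["number"]}
                   for p in phones if p["cust_id"] == cust_id],
        "addresses": [{"city": a["city"], "zip": a["zip"]}
                      for a in addresses if a["cust_id"] == cust_id],
    }


# In a document database, this nested object is the stored record itself.
customer_doc = build_customer(1)

# Value evolution: another phone is just another array element, not a new row.
customer_doc["phones"].append({"type": "work", "number": "555-0199"})

# Structure evolution: a new attribute needs no ALTER TABLE.
customer_doc["loyalty"] = {"tier": "gold"}

print(len(customer_doc["phones"]))  # 3
```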
Before 4.5, the whole array was indexed as one blob of value. Any query had to specify the entire array to find a match, which was not practical.
Let’s see how array indexing helps. First, it provides visibility into the array structure, so an index can be created on a finer-grained subset of array elements or attributes.
With array indexing, a subset of the array elements or attributes can be individually indexed and searched.
We can index only the required subset of the array, which saves index storage and search time.
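The difference can be sketched with a toy in-memory index in Python. This is not how the Couchbase indexer stores entries; it only models the idea. A pre-4.5 style index keys each document by the entire array, so a lookup must supply the whole array, whereas an array index emits one entry per distinct element. The documents are made up.

```python
from collections import defaultdict

# Made-up documents: document key -> array of scalars.
docs = {
    "u1": ["Bombay", "Delhi"],
    "u2": ["Chennai"],
    "u3": ["Delhi", "Pune", "Delhi"],
}

# Whole-array index: the entire array is one opaque key.
whole_array_index = defaultdict(set)
for key, cities in docs.items():
    whole_array_index[tuple(cities)].add(key)

# Array index: one entry per distinct element (like DISTINCT ARRAY ... END).
element_index = defaultdict(set)
for key, cities in docs.items():
    for city in set(cities):
        element_index[city].add(key)

# Finding documents that contain "Delhi":
print(whole_array_index.get(("Delhi",), set()))  # set(): no array is exactly ["Delhi"]
print(sorted(element_index["Delhi"]))           # ['u1', 'u3']

# The whole-array index only helps when the query names the entire array.
print(whole_array_index.get(("Chennai",)))      # {'u2'}
```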
The benefits are even more visible with nested arrays and objects.
For example, an index created in earlier versions would look like the blue triangle, with the whole array indexed.
With array indexes in 4.5, only the flight attributes within the array need to be indexed, which is much more efficient in storage and performance.
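A rough Python illustration of the storage point: the route below is made up but shaped like the travel-sample schedule array (objects with day, utc, and flight). Serializing the whole array stands in for the old whole-array key, while collecting only the distinct flight values stands in for an array index on v.flight. Byte counts of key text are only a crude proxy for real index size.

```python
import json

# A made-up route document shaped like travel-sample's schedule array.
route = {
    "type": "route",
    "schedule": [
        {"day": 0, "utc": "10:13:00", "flight": "AF552"},
        {"day": 3, "utc": "19:41:00", "flight": "AF943"},
        {"day": 3, "utc": "05:27:00", "flight": "AF552"},
    ],
}

# Pre-4.5 style: the whole schedule array becomes a single index key.
whole_key = json.dumps(route["schedule"])

# Array-index style: keep only the flight attribute, one key per distinct value
# (roughly what DISTINCT ARRAY v.flight FOR v IN schedule END selects).
flight_keys = sorted({v["flight"] for v in route["schedule"]})

print(flight_keys)  # ['AF552', 'AF943']
print(len(whole_key), sum(len(k) for k in flight_keys))
```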
In summary, array indexing brings performance and ease of querying to arrays.
For example, this SELECT statement finds the total number of flights scheduled on the 3rd day of the week.
It iterates using the ANY operator to find matching index keys.
Note that the DML statement uses exactly the same array variables and predicates as the CREATE INDEX statement.
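The ANY semantics and the index lookup it turns into can be modeled in Python. This sketch assumes an array index keyed on DISTINCT ARRAY v.day FOR v IN schedule END and a predicate of the form ANY v IN schedule SATISFIES v.day = 3 END; the exact statements from the slide are not reproduced here, and the routes and flights are made up. A full scan and an index lookup return the same routes, and the last line counts individual day-3 flights rather than routes.

```python
from collections import defaultdict

# Made-up route documents with travel-sample-like schedule arrays.
routes = {
    "route_1": {"type": "route", "schedule": [{"day": 3, "flight": "AA101"},
                                              {"day": 5, "flight": "AA102"}]},
    "route_2": {"type": "route", "schedule": [{"day": 1, "flight": "BA200"}]},
    "route_3": {"type": "route", "schedule": [{"day": 3, "flight": "DL300"},
                                              {"day": 3, "flight": "DL301"}]},
}


def any_satisfies(array, predicate):
    """Model of ANY v IN array SATISFIES predicate(v) END."""
    return any(predicate(v) for v in array)


# Full scan: evaluate the ANY predicate against every document.
scan = [k for k, d in routes.items()
        if d["type"] == "route"
        and any_satisfies(d["schedule"], lambda v: v["day"] == 3)]

# Array index on v.day: one entry per distinct day value per document.
day_index = defaultdict(set)
for k, d in routes.items():
    for day in {v["day"] for v in d["schedule"]}:
        day_index[day].add(k)

# Index lookup: the predicate v.day = 3 becomes a lookup on key 3.
indexed = sorted(day_index[3])

print(sorted(scan) == indexed)  # True
print(len(indexed))             # 2 routes with a day-3 flight
print(sum(v["day"] == 3 for k in indexed for v in routes[k]["schedule"]))  # 3 flights
```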
This example creates a composite index on an attribute inside the array, such as v.flight, where v is an array element, and on a non-array attribute such as stops.
The SELECT query finds all scheduled flights with one or more stops and groups the result by number of stops.
Note how the array elements can be iterated in the projection list of the SELECT.
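A composite index pairs the scalar stops with one entry per distinct element of the schedule array. The Python sketch below models that with made-up routes. The key order (stops first) is an assumption chosen so that stops >= 1 becomes a simple range scan; it is not necessarily the key order used on the slide. The grouping step mimics GROUP BY stops with an ARRAY v.flight FOR v IN schedule END projection.

```python
import bisect
from collections import defaultdict

# Made-up route documents: a scalar "stops" plus a schedule array.
routes = [
    {"stops": 0, "schedule": [{"flight": "AA101"}, {"flight": "AA102"}]},
    {"stops": 1, "schedule": [{"flight": "UA500"}]},
    {"stops": 2, "schedule": [{"flight": "DL700"}, {"flight": "DL701"}]},
    {"stops": 1, "schedule": [{"flight": "UA501"}, {"flight": "UA500"}]},
]

# Composite index entries (stops, flight, doc position): the scalar key is
# paired with one entry per distinct v.flight in that document.
entries = sorted(
    (r["stops"], flight, i)
    for i, r in enumerate(routes)
    for flight in {v["flight"] for v in r["schedule"]}
)

# Range scan for stops >= 1 on the leading key.
start = bisect.bisect_left(entries, (1,))
matching_docs = sorted({i for _, _, i in entries[start:]})

# GROUP BY stops, projecting ARRAY v.flight FOR v IN schedule END per route.
by_stops = defaultdict(list)
for i in matching_docs:
    r = routes[i]
    by_stops[r["stops"]].extend(v["flight"] for v in r["schedule"])

print(dict(by_stops))  # {1: ['UA500', 'UA501', 'UA500'], 2: ['DL700', 'DL701']}
```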
Let’s look at an example with nested arrays. Consider the schedule array in travel-sample, with the nested array special-flights.
The CREATE INDEX statement therefore uses a nested DISTINCT ARRAY construct to index each distinct special flight.
Here is a SELECT statement to find the total number of scheduled special flights.
Again, note the nested form of the ANY construct, and that the variable names and index keys match those of the corresponding CREATE INDEX statement.
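The nested case can be modeled the same way in Python: walk the outer schedule array, then each inner special-flights array. The field name special_flights and all data below are hypothetical (the text calls the array special-flights), and the nested predicate is only an assumed shape, not the slide's statement. Note that the index holds one entry per distinct special flight per document, while the count of scheduled special flights includes repeats.

```python
from collections import defaultdict

# Made-up routes; "special_flights" is a hypothetical nested array inside
# each schedule entry (the field name is an assumption for this sketch).
routes = {
    "r1": {"schedule": [
        {"day": 1, "special_flights": [{"flight": "SP1"}, {"flight": "SP2"}]},
        {"day": 2, "special_flights": [{"flight": "SP1"}]},
    ]},
    "r2": {"schedule": [{"day": 4}]},  # no special flights at all
    "r3": {"schedule": [{"day": 6, "special_flights": [{"flight": "SP9"}]}]},
}

# Nested DISTINCT ARRAY: outer array v, inner array w, distinct per document.
special_index = defaultdict(set)
for key, doc in routes.items():
    flights = {w["flight"]
               for v in doc["schedule"]
               for w in v.get("special_flights", [])}
    for flight in flights:
        special_index[flight].add(key)


def has_special(doc):
    """Model of ANY v IN schedule SATISFIES
    (ANY w IN v.special_flights SATISFIES w.flight IS NOT NULL END) END."""
    return any(
        any(w.get("flight") is not None for w in v.get("special_flights", []))
        for v in doc["schedule"]
    )


routes_with_special = sorted(k for k, d in routes.items() if has_special(d))
total_scheduled = sum(len(v.get("special_flights", []))
                      for k in routes_with_special
                      for v in routes[k]["schedule"])
index_entries = sum(len(keys) for keys in special_index.values())

print(routes_with_special)  # ['r1', 'r3']
print(total_scheduled)      # 4
print(index_entries)        # 3 (distinct special flights per document)
```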
This feature has a few limitations. First, the variable names and index keys used in CREATE INDEX and SELECT, such as v and v.day, must match exactly.
The query predicate, which must appear in the WHERE clause of a SELECT, UPDATE, or DELETE statement, must match the format of the array index key exactly, including the variable name, such as v.
Only the operators… are supported.
3. SELECT * FROM default WHERE ANY c IN cities SATISFIES c = "Bombay" END;
4. SELECT * FROM default WHERE ANY c IN cities SATISFIES c = "Bombay" END AND age < 35;
The SELECT in #4 can use the index in #3, but the range low and high bounds differ depending on which index was created.
If the SELECT in #4 is used with the index created in #3, the range is: High = ["\"Bombay\""], Low = ["\"Bombay\""], Inclusion: 3
If the SELECT in #4 is used with the index created in #4, the range is: High = ["\"Bombay\"", "35"], Low = ["\"Bombay\"", "null"], Inclusion: 0
(Inclusion 3 means both bounds are inclusive; Inclusion 0 means both are exclusive.)
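The two span shapes can be modeled with sorted lists and bisect. In this toy, the single-key index from #3 holds one (city) entry per distinct city per document, and the composite index holds (city, age). NULL is modeled as negative infinity because N1QL collates NULL below every number. The documents are made up, and the CREATE INDEX statements themselves are not reproduced here. Both plans return the same documents, but the single-key span scans more entries and must apply age < 35 after fetching.

```python
import bisect

# Made-up documents with a cities array and an age.
docs = {
    "d1": {"cities": ["Bombay", "Delhi"], "age": 30},
    "d2": {"cities": ["Bombay"], "age": 40},
    "d3": {"cities": ["Chennai"], "age": 25},
    "d4": {"cities": ["Bombay", "Bombay"], "age": 20},
}
NULL = float("-inf")  # stand-in: NULL sorts below every number

# Single-key array index: one entry per distinct city per document.
single = sorted((c, k) for k, d in docs.items() for c in set(d["cities"]))
single_keys = [c for c, _ in single]

# Composite index (city, age).
composite = sorted((c, d["age"], k)
                   for k, d in docs.items() for c in set(d["cities"]))
composite_keys = [(c, age) for c, age, _ in composite]

# Span 1: Low = High = "Bombay", both bounds inclusive (Inclusion 3).
lo = bisect.bisect_left(single_keys, "Bombay")
hi = bisect.bisect_right(single_keys, "Bombay")
scanned_single = single[lo:hi]
# age < 35 is not in this index, so it is applied after fetching each document.
result_single = sorted(k for _, k in scanned_single if docs[k]["age"] < 35)

# Span 2: Low = ["Bombay", null], High = ["Bombay", 35], both exclusive (Inclusion 0).
lo = bisect.bisect_right(composite_keys, ("Bombay", NULL))
hi = bisect.bisect_left(composite_keys, ("Bombay", 35))
scanned_composite = composite[lo:hi]
result_composite = sorted(k for _, _, k in scanned_composite)

print(result_single, len(scanned_single))        # ['d1', 'd4'] 3
print(result_composite, len(scanned_composite))  # ['d1', 'd4'] 2
```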
#6:
If two docs are:
D1 =
{
"age": 25,
"cities":[["Bangalore","Mysore"],["Chennai","Ooty"]]
}
D2 =
{
"age": 30,
"cities":[["Siliguri","Kolkata"],["Kohlapur","Mumbai"]]
}
Then the CREATE INDEX statement would be:
CREATE INDEX idcities_nested ON default(ALL ARRAY (ALL ARRAY y FOR y IN c END) FOR c IN cities END);
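What the nested ALL ARRAY key produces for D1 and D2 can be sketched in Python: the outer loop walks each inner array c, the inner loop walks each city y, and ALL keeps duplicates (unlike DISTINCT). The document IDs "D1" and "D2" are used as hypothetical keys, and the nested ANY query in the comment is an assumed example, not a statement from the talk.

```python
from collections import defaultdict

# D1 and D2 from above, keyed by hypothetical document IDs.
docs = {
    "D1": {"age": 25, "cities": [["Bangalore", "Mysore"], ["Chennai", "Ooty"]]},
    "D2": {"age": 30, "cities": [["Siliguri", "Kolkata"], ["Kohlapur", "Mumbai"]]},
}

# ALL ARRAY (ALL ARRAY y FOR y IN c END) FOR c IN cities END:
# flatten the outer array (c) and the inner array (y); ALL keeps duplicates.
index = defaultdict(list)
for key, doc in docs.items():
    for c in doc["cities"]:
        for y in c:
            index[y].append(key)

# An assumed nested query such as
#   ANY c IN cities SATISFIES (ANY y IN c SATISFIES y = "Ooty" END) END
# can then be answered by a single lookup on the flattened key.
print(index["Ooty"])  # ['D1']
print(len(index))     # 8 index keys, one per city
```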
The throughput (queries per second) of the above query was measured for ForestDB with 3.6K set ops per second.
The throughput (queries per second) of the above query was measured for MOI (memory-optimized indexes) with 30K set ops per second.
Q1 took 13 ms, but with the Couchbase query it took about 1500 ms.