Agile Document Models & Data Structures

©2016 Couchbase Inc.

Speaking Your Language
•  Topics for today:
–  Data structures: tie into native language collection interfaces
–  Sub-document: lower-level access with focused power
–  Data modeling with Couchbase
•  Session: “Picking the right API for the right job”
•  SDK goal: complex data access made easy
–  More than just a document storage/retrieval system
–  Tight SDK integration is key
–  Consistent, transparent developer experience across languages

Data Structures API

Couchbase SDK Data Structures API
•  Targeted for SDK release along with Couchbase Server 4.6
•  Builds on awesomeness of the sub-document API
•  Simplified access without touching the whole document
•  Makes JSON data types transparent
•  Native integration of Map, List, Set, Queue…
–  Java Collections Framework
–  .NET System.Collections
–  Python, Node.js, Go

Typical Document Data Access
[Figure: JSON Doc, CB, SDK, JSON Object, Collections Framework, App]

Simplified Data Structure Access
[Figure: JSON Doc, CB, SDK data structures (DS), Collections Framework, App]
"user1": {"name": ..., "address": ..., "favs": [...]},
"user2": {"name": ..., "address": ..., "favs": [...]}
for (String f : favs) {}

Targeted Collection Updates
[Figure: Item From Collection, App, Sub-doc Update, CB]
MapAdd("user1", "favs", "newfav")
"user1": {"name": ..., "address": ..., "favs": [...]}

The Four Data Structures…
Lists (JSON array: [ … , ... ])
-  Append, prepend, insert
-  Size/count
-  Example: [ 1, 2, “abc” ]
Maps (JSON object: { “key”: “value” })
-  Add/remove by key
-  Size/count
-  Example: { “name”: “value” }
Sets (JSON array: [ … , ... ])
-  Specialized add/remove
-  Unique values
-  Size/count
-  Example: [ 1, 3, 6, 8 ]
Queue (JSON array: [ … , ... ])
-  First in, first out
-  Pop: retrieve/remove
-  Size/count
-  Example: [ “task1”, “task2”, “task3” ], remove 1..., [ “task2”, “task3”, “task4” ]
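As a rough illustration of how the four structures sit on top of plain JSON types, the sketch below uses ordinary Python lists and dicts in place of stored documents. The helper names (set_add, queue_push, queue_pop) are hypothetical and are not the SDK's functions; which end of the array a real SDK queue uses is an assumption here.

```python
def set_add(array, value):
    """Append value only if it is absent, so a JSON array behaves like a set."""
    if value in array:
        return False
    array.append(value)
    return True


def queue_push(array, value):
    """Add to the back of the queue (assumed to be the end of the array)."""
    array.append(value)


def queue_pop(array):
    """Retrieve and remove the oldest item (first in, first out)."""
    return array.pop(0)


# List: a JSON array that supports append, prepend/insert and size.
names = [1, 2, "abc"]
names.append("xyz")
names.insert(0, 0)

# Map: a JSON object with add/remove by key.
profile = {"name": "value"}
profile["city"] = "NYC"
del profile["name"]

# Set: a JSON array whose values stay unique.
ids = [1, 3, 6, 8]
added_duplicate = set_add(ids, 3)
added_new = set_add(ids, 9)

# Queue: pop the oldest task, then push a new one.
tasks = ["task1", "task2", "task3"]
first = queue_pop(tasks)
queue_push(tasks, "task4")
```

Each helper touches one element of the array or object, which is the kind of change the data structures functions are meant to send without rewriting the whole document.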

Consistent Access Across Languages
Functions
Lists: ListGet, ListPush, ListShift, ListDelete, ListSet, ListSize
Maps: MapGet, MapRemove, MapSize, MapSet
Sets: SetAdd, SetExists, SetSize, SetRemove
Queue: QueuePush, QueuePop, QueueSize, QueueRemove
Example:
>namesList = bucket.ListGet("key")
>print namesList
['name1','name2','name3']
•  Idiomatic vs. functional
–  Idiomatic: Java Collections Framework, .NET System.Collections
–  As well as a functional approach
* Experimental features alert: functions may be added to or removed from this list. Feedback welcome!

Consistent Access Across Languages
Collections Approach
Lists (Java):
List<String> namesList = new CouchbaseArrayList<String>("key", bucket);
for (String name : namesList) { … }
Maps (.NET):
var namesDict = new CouchbaseDictionary<string, Poco>(_bucket, "key");
namesDict.Add("newkey1", new Poco { Name = "poco1" });
Sets (.NET):
var namesSet = new CouchbaseSet<Poco>(_bucket, "pocos");
namesSet.Add(new Poco { Key = "poco1", Name = "Poco-pica" });
namesSet.Remove(new Poco { Key = "poco1", Name = "Poco-pica" });
foreach (var poco in namesSet) { … }
Queue (.NET):
var namesQueue = new CouchbaseQueue<Poco>(_bucket, key);
namesQueue.Enqueue(new Poco { Name = "pcoco1" });
var item = namesQueue.Peek();
•  Support for advanced capabilities of collection frameworks

Sub-Document API

Sub-Document API
“The sub-document API enables you to access parts of JSON documents (sub-documents) efficiently
without requiring the transfer of the entire document over the network.
This improves performance and brings better efficiency to the network IO path, especially when
working with large JSON documents.”
•  First released in Couchbase Server 4.5, supported across SDKs
•  Efficient document lookup, insert & update
•  Powerful lower level control, focusing on particular elements
•  Keep work on server
•  Two methods available – lookup and mutate/change

Digging Below Data Structures
Data Structures API                      Sub-Document API
MapGet(key, mapkey)                      LookupIn(key).get(mapkey)
MapRemove(key, mapkey)                   MutateIn(key).remove(mapkey)
MapSet(key, mapkey, value, createMap)    MutateIn(key).upsert(mapkey, value, create_doc=createMap)

Sub-Document API
Operations
LookupIn: LookupIn(key, operation(path))
–  Get, Exists, Execute
MutateIn: MutateIn(key, operation(path, value))
–  Counter, Insert, Remove, Replace, Upsert, Execute
–  arrayAddUnique, arrayAppend, arrayInsert, arrayPrepend
Chaining operations:
MutateIn(key, operation(path, value),
              operation(path, value),
              operation(path, value))
Returns:
SubdocResult<rc=0x0, key='map1', cas=0x14b6458980042,
    specs=(Spec<GET, 'subkey1'>, Spec<EXISTS, 'subkey1'>),
    results=[(0, u'subvalue1'), (0, None)]>

Sample Sub-Document Lookup
LookupIn(key, operation(path))
LookupIn('copilotmark')
    .get('phones.number')
    .execute();
LookupIn('copilotmark')
    .exists('phones')
    .get('phones.number')
    .get('gender')
    .execute();
SubdocResult<rc=0x0, key='copilotmark', cas=0x14b6458980042,
    specs=(Spec<EXISTS, 'phones'>, Spec<GET, 'phones.number'>, Spec<GET, 'gender'>),
    results=[(0, None), (0, '212-771-1834'), (0, u'male')]>

Sample Sub-Document Change
MutateIn(key, operation(path, value))
MutateIn('copilotmark')
    .replace('phones.number', '212-787-2212')
    .upsert('nickname', 'Freddie')
    .execute()
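To make the path semantics of the lookup and change samples concrete, here is a minimal sketch that applies the same operations to an in-memory Python dict. It is not the SDK: the helper names (sub_get, sub_exists, sub_replace, sub_upsert) are hypothetical, the document layout is assumed from the paths on the slides, and a real sub-document call runs on the server so that only the addressed fragment crosses the network.

```python
def _walk(doc, path):
    """Return (parent, last_key) for a dotted path such as 'phones.number'."""
    parts = path.split(".")
    node = doc
    for part in parts[:-1]:
        node = node[part]
    return node, parts[-1]


def sub_get(doc, path):
    parent, key = _walk(doc, path)
    return parent[key]


def sub_exists(doc, path):
    try:
        sub_get(doc, path)
        return True
    except (KeyError, TypeError):
        return False


def sub_replace(doc, path, value):
    """Replace fails when the path does not already exist."""
    parent, key = _walk(doc, path)
    if key not in parent:
        raise KeyError(path)
    parent[key] = value


def sub_upsert(doc, path, value):
    """Upsert creates the final path element or overwrites it."""
    parent, key = _walk(doc, path)
    parent[key] = value


# Assumed shape of the 'copilotmark' document.
copilotmark = {"gender": "male", "phones": {"number": "212-771-1834"}}

lookups = [
    sub_exists(copilotmark, "phones"),
    sub_get(copilotmark, "phones.number"),
    sub_get(copilotmark, "gender"),
]

sub_replace(copilotmark, "phones.number", "212-787-2212")
sub_upsert(copilotmark, "nickname", "Freddie")
```

The difference between replace and upsert is the design point worth noting: replace guards against writing to a path that is not there, while upsert is the forgiving variant.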

Data Modeling for Couchbase Server

What is Data Modeling?
•  A data model is a conceptual representation of the data structures that are required by a
database
•  The data structures include the data objects, the associations between data objects, and the
rules which govern operations on the objects.

Data Modeling Approaches
NoSQL: Relaxed Normalization
–  Schema implied by structure
–  Fields may be empty, duplicate, or missing
•  Optimized to planned/actual access patterns
•  Flexibility with software architecture
•  Supports clustered architecture
•  Reduced server overhead
Relational: Required Normalization
–  Schema enforced by db
–  Same fields in all records
•  Minimize data inconsistencies (one item = one location)
•  Reduced update cost (no duplicated data)
•  Preserve storage resources

Modeling Couchbase Documents
•  Couchbase Server is a document database
•  Data is stored in JSON documents, not in tables
•  Relational databases rely on an explicit pre-defined schema to describe the structure of data
•  JSON documents are self-describing

What and Why JSON?
•  What is JSON?
–  Lightweight data interchange format
–  Based on JavaScript
–  Programming language independent
–  Field names must be unique
•  Why JSON?
–  Schema flexibility
–  Less verbose
–  Can represent Objects and Arrays (including nested documents)
There is NO IMPEDANCE MISMATCH between a JSON Document and a Java Object

JSON Design Choices
•  Couchbase Server neither enforces nor validates any particular document structure
•  Choices that impact JSON document design:
–  Single Root Attributes
–  Objects vs. Arrays
–  Array Element Types
–  Timestamp Formats
–  Property Names
–  Empty and Null Property Values
–  JSON Schema

Root Attributes vs. Embedded Attributes
•  Choose between a single root attribute and an embedded “type” attribute

•  Accessing the document with a root attribute
SELECT track.* FROM couchmusic

•  Accessing the document with the “type” attribute
SELECT * FROM couchmusic
WHERE type = 'track'
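The two document shapes behind these queries can be sketched as Python dicts. The field names title and artist are hypothetical placeholders; only the "track" root attribute and the "type" attribute come from the queries themselves.

```python
import json

# Hypothetical track fields, used only to show the two layouts.
track_fields = {"title": "Example Title", "artist": "Example Artist"}

# Style 1: a single root attribute wraps the whole document.
root_style = {"track": dict(track_fields)}

# Style 2: a "type" attribute sits beside the other fields.
type_style = {"type": "track", **track_fields}


def is_track(doc):
    """Recognise a track document in either layout."""
    return "track" in doc or doc.get("type") == "track"


root_json = json.dumps(root_style, sort_keys=True)
type_json = json.dumps(type_style, sort_keys=True)
```

With the root style, the query projects the wrapper (track.*); with the type style, the filter moves into the WHERE clause, which is also easier to back with an index on type.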

Objects vs. Arrays
•  Choose between an object type and an array type

Objects vs. Arrays
•  What would the object look like?
class UserProfile{
    Phone phones;
}
class Phone{
    String cell;
    String landline;
}

Objects vs. Arrays
•  What would the object look like?
class UserProfile{
    List<Phone> phones;
}
class Phone{
    String number;
    String type;
}
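For comparison, the JSON behind the two class layouts might look like the following sketch. The phone numbers are reused from the sub-document samples and their assignment to cell and landline is an assumption made for illustration.

```python
# Object form: one fixed property per kind of phone.
profile_with_object = {
    "phones": {"cell": "212-771-1834", "landline": "212-787-2212"}
}

# Array form: any number of phones, each carrying its own type.
profile_with_array = {
    "phones": [
        {"number": "212-771-1834", "type": "cell"},
        {"number": "212-787-2212", "type": "landline"},
    ]
}


def cell_from_object(profile):
    """Direct property access; returns None when there is no cell phone."""
    return profile["phones"].get("cell")


def numbers_of_type(profile, kind):
    """The array form needs a scan, but copes with several phones per type."""
    return [p["number"] for p in profile["phones"] if p["type"] == kind]
```

The object form gives the shortest access path, while the array form is the one that survives a user adding a second cell phone.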

Array Element Types
•  Array elements can be simple types, objects or arrays:
–  Array of strings
–  Array of objects

Array Element Types
Array of strings
class Playlist{
    List<String> tracks;
}
...
String trackId = tracks.get(1);
JsonDocument trackDocument =
    bucket.get(trackId);
Multiple get() calls to retrieve the documents. Worth it?

Array Element Types
class Playlist{
    List<Track> tracks;
}
...
myPlaylist.getTracks()
    .get(1).getArtistName();
Limited denormalization: commonly needed data (e.g., title) in local object, detail available in referenced foreign document

Timestamp Formats
•  Working with timestamps has always been challenging
•  When storing timestamps, you have at least 3 options:
–  Array of time components
–  String (ISO 8601)
–  Number (Unix style, epoch)
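A small sketch of the three options, using Python's standard datetime module and one sample instant in UTC. The variable names are illustrative only.

```python
from datetime import datetime, timezone

# One instant, expressed three ways.
moment = datetime(2014, 11, 29, 18, 49, 36, tzinfo=timezone.utc)

# Option 1: array of time components (useful as a grouping key).
as_array = [moment.year, moment.month, moment.day,
            moment.hour, moment.minute, moment.second]

# Option 2: ISO 8601 string (human readable, sorts lexically in one time zone).
as_iso = moment.isoformat()

# Option 3: Unix epoch seconds (compact, sorts numerically).
as_epoch = int(moment.timestamp())

# Both the string and the number convert back to the same instant.
from_iso = datetime.fromisoformat(as_iso)
from_epoch = datetime.fromtimestamp(as_epoch, tz=timezone.utc)
```

Whichever format is chosen, storing the time zone explicitly (here UTC) avoids ambiguity when documents are written from several regions.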

Observed Practices with Timestamp Formats
•  Storing timestamps as epoch numbers makes it easy to sort documents
–  For example, to sort documents by their “last update” time:
SELECT * FROM couchmusic WHERE type = 'track'
ORDER BY updates DESC
•  Storing dates in array format helps with grouping

Taking Advantage of Storing Date as an Array
•  Group options can be specified to control the execution of the view
•  The group and group_level options are only useful when a Reduce function has been defined in the corresponding View
•  The group_level option, used when the key is an Array, determines how many elements of the key are used when aggregating the results

Example of View group_level = 1
•  For the data below with Reduce function defined as _sum and group_level = 1
Key Value
[2014,11,29,18,49,36] 3
[2014,12,03,20,11,26] 5
[2014,12,03,23,37,21] 2
[2014,12,06,10,12,19] 8
[2014,12,09,05,01,26] 3
[2014,12,18,01,04,30] 11
[2014,12,26,18,34,44] 4
[2015,01,03,16,48,32] 7
[2015,01,03,20,20,06] 5
[2015,01,15,08,17,28] 8
Execute Reduce
Key Value
[2014] 36
[2015] 20

Example of View group_level = 2
•  For the same data and Reduce function, with group_level = 2
Execute Reduce
Key Value
[2014,11] 3
[2014,12] 33
[2015,01] 20

Example of View group_level = 3
•  For the same data and Reduce function, with group_level = 3
Execute Reduce
Key Value
[2014,11,29] 3
[2014,12,03] 7
[2014,12,06] 8
[2014,12,09] 3
[2014,12,18] 11
[2014,12,26] 4
[2015,01,03] 12
[2015,01,15] 8
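The grouping behaviour can be reproduced outside the server with a few lines of Python. This is a sketch of the idea only, not how views are implemented: reduce_sum is a hypothetical helper that truncates each array key to group_level elements and sums the values, using the ten rows from the example data.

```python
# The (key, value) rows emitted by the view in the example.
ROWS = [
    ([2014, 11, 29, 18, 49, 36], 3),
    ([2014, 12, 3, 20, 11, 26], 5),
    ([2014, 12, 3, 23, 37, 21], 2),
    ([2014, 12, 6, 10, 12, 19], 8),
    ([2014, 12, 9, 5, 1, 26], 3),
    ([2014, 12, 18, 1, 4, 30], 11),
    ([2014, 12, 26, 18, 34, 44], 4),
    ([2015, 1, 3, 16, 48, 32], 7),
    ([2015, 1, 3, 20, 20, 6], 5),
    ([2015, 1, 15, 8, 17, 28], 8),
]


def reduce_sum(rows, group_level):
    """Sum values per key prefix, keeping the first group_level key elements."""
    totals = {}
    for key, value in rows:
        group = tuple(key[:group_level])
        totals[group] = totals.get(group, 0) + value
    return totals


by_year = reduce_sum(ROWS, 1)
by_month = reduce_sum(ROWS, 2)
by_day = reduce_sum(ROWS, 3)
```

Raising group_level from 1 to 3 moves the same data from yearly to monthly to daily totals, which is the reason for storing the date as an array in the first place.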

Empty and Null Property Values
•  Keep in mind that JSON supports optional properties
•  If a property has a null value, consider dropping it from the JSON, unless there's a good reason
not to
•  N1QL makes it easy to test for missing or null property values
•  Be sure your application code handles the case where a property value is missing
SELECT * FROM couchmusic1 WHERE userprofile.address IS NULL;
SELECT * FROM couchmusic1 WHERE userprofile.gender IS MISSING;

Empty, Null and Missing Property Values
{ countryCode: “UK”, currencyCode: “GBP”, region: “Europe” }
WHERE region IS NOT MISSING, IS NOT NULL, IS VALUED
{ region: “” }
WHERE region IS NOT MISSING, IS NOT NULL, IS VALUED (an empty string is still a value)
{ currencyCode: “GBP” }
WHERE region IS MISSING
{ region: null }
WHERE region IS NULL
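Application code has to draw the same distinctions. The sketch below classifies a property of a Python dict, assuming N1QL's definition that IS VALUED means neither missing nor null, so an empty string counts as valued. The classify helper is hypothetical.

```python
_MISSING = object()  # sentinel that can never appear in decoded JSON


def classify(doc, field):
    """Return 'MISSING', 'NULL' or 'VALUED' for one property of a document."""
    value = doc.get(field, _MISSING)
    if value is _MISSING:
        return "MISSING"
    if value is None:
        return "NULL"
    return "VALUED"


documents = [
    {"countryCode": "UK", "currencyCode": "GBP", "region": "Europe"},
    {"region": ""},
    {"currencyCode": "GBP"},
    {"region": None},
]

states = [classify(doc, "region") for doc in documents]
```

A plain doc.get("region") would return None for both the missing and the null case, which is why the sentinel is needed.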

JSON Schema
•  Couchbase Server pays absolutely no attention to the shape of your JSON documents so long
as they are well-formed
•  There are times when it is useful to validate that a JSON document conforms to some
expected shape
•  JSON Schema is a JSON-based format for defining the structure of JSON data
•  There are implementations for most popular programming languages
•  Learn more here: http://json-schema.org

Example of JSON Schema

Example of JSON Schema – Type Specification
Available type specifications include:
•  array
•  boolean
•  integer
•  number
•  object
•  string
•  enum

Example of JSON Schema – Type Specific Validation
Type-specific validations include:
•  minimum
•  maximum
•  minLength
•  maxLength
•  format
•  pattern

Example of JSON Schema – Required Properties
Required properties can be specified for each object

Example of JSON Schema – Additional Properties
Additional properties can be disabled
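Since the schema examples from the slides are not reproduced here, the following sketch shows a hypothetical schema that uses the keywords listed above (type, minLength, maxLength, pattern, minimum, required, additionalProperties). The validate function is a deliberately tiny stand-in that covers only those keywords for flat documents; a real project would use one of the JSON Schema implementations mentioned earlier.

```python
import re

# Hypothetical schema for a small country document.
COUNTRY_SCHEMA = {
    "type": "object",
    "properties": {
        "countryCode": {"type": "string", "minLength": 2, "maxLength": 2,
                        "pattern": "^[A-Z]{2}$"},
        "currencyCode": {"type": "string", "minLength": 3, "maxLength": 3},
        "region": {"type": "string"},
        "population": {"type": "integer", "minimum": 0},
    },
    "required": ["countryCode"],
    "additionalProperties": False,
}

PYTHON_TYPES = {"string": str, "integer": int}


def validate(doc, schema):
    """Return a list of error strings; an empty list means the document passed."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in doc:
            errors.append(f"missing required property: {name}")
    if schema.get("additionalProperties") is False:
        for name in doc:
            if name not in properties:
                errors.append(f"unexpected property: {name}")
    for name, rules in properties.items():
        if name not in doc:
            continue
        value = doc[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[rules["type"]]):
            errors.append(f"{name}: expected {rules['type']}")
            continue
        if "minLength" in rules and len(value) < rules["minLength"]:
            errors.append(f"{name}: shorter than {rules['minLength']}")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append(f"{name}: longer than {rules['maxLength']}")
        if "pattern" in rules and not re.search(rules["pattern"], value):
            errors.append(f"{name}: does not match {rules['pattern']}")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: below minimum {rules['minimum']}")
    return errors


good = {"countryCode": "UK", "currencyCode": "GBP", "region": "Europe"}
bad = {"countryCode": "uk1", "currency": "GBP", "population": -5}
good_errors = validate(good, COUNTRY_SCHEMA)
bad_errors = validate(bad, COUNTRY_SCHEMA)
```

Because Couchbase Server does not check document shape, a validation step like this belongs in the application, typically just before a document is written.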

Data Nesting (aka Denormalization)
•  As you know, relational database design promotes separating data using normalization, which
doesn’t scale
•  For NoSQL systems, we often avoid normalization so that we can scale
•  Nesting allows related objects to be organized into a hierarchical tree structure where you can
have multiple levels of grouping
•  Rule of thumb is to nest no more than 3 levels deep unless there is a very good reason to do so
•  You will often want to include a timestamp in the nested data

Example #1 of Data Nesting
•  Playlist with owner attribute containing username of corresponding userprofile
Document Key: copilotmarks61569

•  Playlist with owner attribute containing a subset of the corresponding userprofile
* Note the inclusion of the updated attribute

•  Playlist with tracks attribute containing an array of track IDs

•  Playlist with tracks attribute containing an array of track objects
* Note the inclusion of the updated attribute
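The four playlist variants can be sketched as Python dicts. Apart from the owner, tracks and updated attributes and the copilotmarks61569 key, every field name and value below is hypothetical, and treating copilotmarks61569 as the owner's username is an assumption.

```python
# Owner as a reference: only the username of the userprofile document.
playlist_owner_ref = {
    "type": "playlist",
    "name": "Road Trip",                      # hypothetical
    "owner": "copilotmarks61569",
}

# Owner as an embedded subset of the userprofile, with its own timestamp.
playlist_owner_subset = {
    "type": "playlist",
    "name": "Road Trip",
    "owner": {
        "username": "copilotmarks61569",
        "firstName": "Mark",                  # hypothetical
        "updated": "2015-01-15T08:17:28Z",    # when this copy was taken
    },
}

# Tracks as an array of IDs versus an array of embedded track objects.
playlist_track_ids = {"tracks": ["track::001", "track::002"]}
playlist_track_objects = {
    "tracks": [
        {"id": "track::001", "title": "First Song", "updated": "2015-01-03T16:48:32Z"},
        {"id": "track::002", "title": "Second Song", "updated": "2015-01-03T20:20:06Z"},
    ]
}


def owner_username(playlist):
    """Work with either owner layout."""
    owner = playlist["owner"]
    return owner["username"] if isinstance(owner, dict) else owner


def track_ids(playlist):
    """Work with either tracks layout."""
    return [t["id"] if isinstance(t, dict) else t for t in playlist["tracks"]]
```

The updated timestamp on each embedded copy lets the application judge how stale the copy may be compared with the source document.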

Key Design

Choices with JSON Key Design
•  A key formed of attributes that exist in the real world:
–  Phone numbers
–  Usernames
–  Social security numbers
–  Account numbers
–  SKU, UPC or QR codes
–  Device IDs
•  Often the first choice for document keys
•  Be careful when working with any personally identifiable information (PII), sensitive personal
information (SPI) or protected health information (PHI)

Surrogate Keys
•  We often use surrogate keys when no obvious natural key exists
•  They are not derived from application data
•  They can be generated values
–  3305311F4A0FAAFEABD001D324906748B18FB24A (SHA-1)
–  003C6F65-641A-4C9A-8E5E-41C947086CAE (UUID)
•  They can be sequential numbers (often implemented using the Counter feature of Couchbase
Server)
–  456789, 456790, 456791, …
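A minimal sketch of the three kinds of surrogate key, using only the Python standard library. The local itertools counter is a stand-in for illustration; in a real deployment the sequence would come from the server-side counter so that all clients share it.

```python
import hashlib
import itertools
import os
import uuid

# Generated value: a random UUID (36 characters including hyphens).
uuid_key = str(uuid.uuid4()).upper()

# Generated value: SHA-1 of random bytes, so it is not derived from application data.
sha1_key = hashlib.sha1(os.urandom(32)).hexdigest().upper()

# Sequential numbers: a local stand-in for a shared counter starting at 456789.
counter = itertools.count(456789)
sequential_keys = [next(counter) for _ in range(3)]
```

UUIDs and hashes can be generated on any client without coordination, whereas sequential keys are shorter and ordered but need a shared counter.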

Key Value Patterns
•  It is common practice for users of Couchbase Server to format key values using delimiters such as
single or double colons
•  DocType::ID
–  userprofile::fredsmith79
–  playlist::003c6f65-641a-4c9a-8e5e-41c947086cae
•  AppName::DocType::ID
–  couchmusic::userprofile::fredsmith79
•  Enables Multi-Tenancy
–  pizza::user::101
–  pizza::user::102
–  burger::user::101
–  burger::user::102
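Building and parsing such keys is worth centralising in one place. The helpers below (make_key, split_key) are hypothetical names for a small sketch that uses the double-colon delimiter from the examples.

```python
SEPARATOR = "::"


def make_key(*parts):
    """Join key segments such as app name, document type and ID."""
    return SEPARATOR.join(str(part) for part in parts)


def split_key(key):
    """Split a key back into its segments."""
    return key.split(SEPARATOR)


profile_key = make_key("userprofile", "fredsmith79")
app_key = make_key("couchmusic", "userprofile", "fredsmith79")

# Multi-tenancy: the same document type and ID under two tenants.
tenant_keys = [make_key(tenant, "user", 101) for tenant in ("pizza", "burger")]
tenant, doc_type, doc_id = split_key(tenant_keys[0])
```

Keeping every segment lowercase, as in these examples, avoids accidental duplicates, since document keys are case sensitive.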

Lookup Key Pattern
•  The purpose of the Lookup Key Pattern is to allow multiple ways to reach the same data,
essentially a secondary index
•  For example, we want to lookup a Userproﬁle by their email address instead of their ID
•  To accomplish this, we create another small document that refers to the Userproﬁle
document we are interested in
•  Implementing this pattern is straightforward, just create an additional document containing a
single property that stores the key to the primary document
•  With the introduction of N1QL, this pattern will be less commonly used

Lookup Key Pattern
[Figure: a lookup document with key andy.bowman@games.com holds the key userprofile::copilotmarks61569, which points to the JSON userprofile document]
•  Lookup document can be JsonDocument or StringDocument
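The pattern can be sketched with a plain Python dict standing in for the bucket. The helper names and the profile fields are hypothetical; with a real bucket each dictionary access below would be a key-value get or upsert.

```python
# A dict stands in for the bucket: key -> document.
bucket = {}


def save_profile(key, profile):
    """Store the primary document plus a lookup document keyed by email."""
    bucket[key] = profile
    # The lookup document is just a string holding the primary key.
    bucket[profile["email"]] = key


def get_profile_by_email(email):
    """Two key-value reads: lookup document first, then the primary document."""
    primary_key = bucket.get(email)
    if primary_key is None:
        return None
    return bucket.get(primary_key)


save_profile(
    "userprofile::copilotmarks61569",
    {"type": "userprofile", "email": "andy.bowman@games.com"},
)
found = get_profile_by_email("andy.bowman@games.com")
not_found = get_profile_by_email("nobody@example.com")
```

The cost of the pattern is a second document to keep in step: if the email address changes, the old lookup document has to be removed and a new one written.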

Trade-offs in Data Modeling

Making Tough Choices
•  We must also make trade-offs in data modeling:
–  Document size
–  Atomicity
–  Complexity
–  Speed

Document Size
•  Couchbase Server supports documents up to 20 MB
•  Larger documents take more disk space, more time to transfer across the network and more
time to serialize/deserialize
•  If you are dealing with documents that are potentially large (greater than 1 MB), you must test
thoroughly to find out if speed of access is adequate as you scale. If not, you will need to break
up the document into smaller ones.
•  You may need to limit the number of dependent child objects you embed

Atomicity
•  Atomicity in Couchbase Server is at the document level
•  Couchbase Server does not support transactions
•  They can be simulated if you are willing to write and maintain additional code to implement
them (generally not recommended)
•  If you absolutely need changes to be atomic, they will have to be part of the same document
•  The maximum document size for Couchbase Server may limit how much data you can store in
a single document

Complexity
•  Complexity affects every area of software systems including data modeling
•  The complexity of queries (N1QL)
•  The complexity of code for updating multiple copies of the same data

Speed
•  As it relates to data modeling, speed of access is critical
•  When using N1QL to access data, keep in mind that query by document key is fastest and
query by secondary index is usually much slower
•  If implementing an interactive use case, you will want to avoid using JOINs
•  You can use data duplication to improve the speed of accessing related data and thus trade
improved speed for greater complexity and larger document size
•  Keep in mind that Couchbase Views can be used when up-to-the-second accuracy is not
required

Remember
•  SDK get() (get by key) is faster than N1QL with MOI
•  N1QL with MOI is faster than N1QL with GSI
•  Model your document key so that, where possible, the document can be retrieved by key rather
than by a N1QL query

Embed vs. Refer
•  All of the previous trade-offs are usually rolled into a single decision: whether to embed or
refer
•  When to embed:
–  Reads greatly outnumber writes
–  You're comfortable with the slim risk of inconsistent data across the multiple copies
–  You're optimizing for speed of access
•  When to refer:
–  Consistency of the data is a priority
–  You want to ensure your cache is used efficiently
–  The embedded version would be too large or complex

Next Steps
•  Flexible data access is key to solutions using document stores
•  Join us for discussion on Forums or discuss with our experts here
•  https://forums.couchbase.com
•  https://developer.couchbase.com/server

Get Trained on Couchbase
http://training.couchbase.com
http://training.couchbase.com/online

CS300: Couchbase NoSQL Server Administration
CD220: Developing Couchbase NoSQL Applications
CD210: Couchbase NoSQL Data Modeling, Querying, and Tuning Using N1QL
CD257: Developing Couchbase Mobile NoSQL Applications

Tyler Mitchell
Senior Product Manager, SDK
tyler@couchbase.com
@1tylermitchell

Clarence J M Tauro, Ph.D.
Senior Instructor

clarence@couchbase.com
@javapsyche
