Introducing Azure DocumentDB - NoSQL, No Problem

{
"name": "Andrew Liu",
"e-mail": "andrl@microsoft.com",
"twitter": "@aliuy8"
}

Item Author Pages Language
Harry Potter and the Sorcerer’s
Stone
J.K. Rowling 309 English
Game of Thrones: A Song of Ice
and Fire
George R.R.
Martin
864 English

Item Author Pages Language
Harry Potter and the Sorcerer’s
Stone
J.K. Rowling 309 English
Game of Thrones: A Song of Ice
and Fire
George R.R.
Martin
864 English
Lenovo Thinkpad X1 Carbon ??? ??? ???

fully managed, scalable, queryable, schemafree JSON
document database service for modern applications
transactional processing
rich query
managed as a service
elastic scale
internet accessible http/rest
schema-free data model
arbitrary data formats

query over
schema-free
JSON
transactional
integrated javascript
tunable
performance
fully managed
as a service

No need to define secondary indices / schema hints for indexing!

-- Nested lookup against index
SELECT Books.Author
FROM Books
WHERE Books.Author.Name = "Leo Tolstoy"
-- Transformation, Filters, Array access
SELECT { Name: Books.Title, Author: Books.Author.Name }
FROM Books
WHERE Books.Price > 10 AND Books.Languages[0] = "English"
-- Joins, User Defined Functions (UDF)
SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
FROM Books
JOIN LanguagesArr IN Books.Languages
WHERE LanguagesArr.Language = "Russian"
SQL Query Grammar

function(playerId1, playerId2) {
var playersToSwap = __.filter (function (document) {
return (document.id == playerId1 || document.id == playerId2);
});
var player1 = playersToSwap[0], player2 = playersToSwap[1];
var player1ItemTemp = player1.item;
player1.item = player2.item;
player2.item = player1ItemTemp;
__.replaceDocument(player1)
.then(function() { return __.replaceDocument(player2); })
.fail(function(error){ throw 'Unable to update players, abort'; });
}
client.executeStoredProcedureAsync
("procs/1234", ["MasterChief", "SolidSnake“])
.then(function (response) {
console.log(“success!");
}, function (err) {
console.log("Failed to swap!", error);
}
);
Client Database

Brewer’s CAP Theorem
Consistency
Availability Partition Tolerance

DocumentDB offers 4 consistency levelsBrewer’s CAP Theorem
Consistency
Availability Partition Tolerance
99.95% Availability SLA

• Predictable Performance
• Hourly Billing
• 99.95% Availability
• Adjustable Performance Levels
S1 S2 S3
I’m not
crying
anymore

“With Azure DocumentDB, we didn’t have to say ‘no’ to
the business, and we weren’t a bottleneck to launching
the promotion — in fact, we came in ahead of schedule.”

{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"addresses": [
{
"line1": "100 Some Street",
"line2": "Unit 1",
"city": "Seattle",
"state": "WA",
"zip": 98012 }
],
"contactDetails": [
{"email: "thomas@andersen.com"},
{"phone": "+1 555 555-5555", "extension": 5555}
]
}
Try model your entity as a self-
contained document
Generally, use embedded data
models when:
contains
one-to-few
changes infrequently
won’t grow
integral
better read performance

In general, use normalized data
models when:
Write performance
one-to-many
many-to-many
changes frequently
{
"id": "xyz",
"username: "user xyz"
}
{
"id": "address_xyz",
"userid": "xyz",
"address" : {
…
}
}
{
"id: "contact_xyz",
"userid": "xyz",
"email" : "user@user.com"
"phone" : "555 5555"
}
Normalizing typically provides better write performance

No magic bullet
Think about how your data is
going to be written, read and
model accordingly
{
"id": "1",
"firstName": "Thomas",
"lastName": "Andersen",
"countOfBooks": 3,
"books": [1, 2, 3],
"images": [
{"thumbnail": "http://....png"}
{"profile": "http://....png"}
]
}
{
"id": 1,
"name": "DocumentDB 101",
"authors": [
{"id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png"},
{"id": 2, "name": "William Wakefield", "thumbnail": "http://....png"}
]
}

Request Unit (RU) is the
normalized currency
% Memory
% IOPS
% CPU
Replica gets a fixed budget
of Request Units
Resource
Resource
set
Resource
Resource
DocumentsSQL
sprocs
args
Resource Resource
Predictable Performance

Operation Request units
(RUs)
consumed*
Reading a single 1KB document 1
Reading a single 2KB document 2
Query with a simple predicate for a 1KB
document
3
Creating a single 1 KB document with 10
JSON properties (consistent indexing)
14
Create a single 1 KB document with 100 JSON
properties (consistent indexing)
20
Replacing a single 1 KB document 28
Execute a stored procedure with two create
documents
30

• Data Size
A single collection holds 10GB
• Throughput
3 Performance tiers with a max of 2,500 RU/sec

Tenant Partition Id
Customer 1
Big Customer 2
Another 3

{
record: "1",
created: {
"date": "6/1/2014",
"epoch": 1401662986
}
},
{
record: "3",
created: {
"date": "9/23/2014"
"epoch": 1411512586
}
} ,
{
record: "123",
created: {
"date": "8/17/2013"
"epoch": 1376779786
}
}
SELECT * FROM root r WHERE r.date.epoch BETWEEN 1376779786 AND 1401662986
{
record: "1",
created: {
"date": "6/1/2014",
"epoch": 1401662986
}
},
{
record: "3",
created: {
"date": "9/23/2014"
"epoch": 1411512586
}
}
{
record: "43233",
created: {
"epoch": 1411512586
}
} ,
{
record: "1123",
created: {
"date": "8/17/2013"
"epoch": 1376779786
}
},
{
record: "43234",
created: {
"epoch": 1376779786
}

Hash sharding
• Examples: Profile data (user ID, app ID), (user ID), Device and vehicle data (device/vin ID),
Catalog data (item ID)
• Pros: balanced, stateless
• Cons: reshuffling is hard
Range sharding
• Examples: Operational data (timestamp), (timestamp, event ID)
• Pros: easy sliding window, range queries
• Cons: stateful
Lookup sharding
• SaaS/multitenant service (tenant ID), Metadata store (type ID)
• Pros: simple, easy to reshuffle, can span accounts
• Cons: stateful, works only on discrete keys

How it works
Automatic indexing of documents
JSON documents are represented as
trees
Structural information and instance
values are normalized into a JSON-Path
Fixed upper bound on index size
(typically 5-10% in real production data)
Example
{"headquarters": "Belgium"}  /"headquarters"/"Belgium"
{"exports": [{"city": “Moscow"}, {"city": Athens"}]}  /"exports"/0/"city"/"Moscow"
and /"exports"/1/"city"/"Athens".

Configuration Level Options
Automatic Per collection True (default) or False
Override with each document write
Indexing Mode Per collection Consistent or Lazy
Lazy for eventual updates/bulk ingestion
Included and excluded
paths
Per path Individual path or recursive includes (? And *)
Indexing Type Per path Support Hash (Default) and Range
Hash for equality, range for range queries
Indexing Precision Per path Supports 3 – 7 per path
Tradeoff storage, query RUs and write RUs

Path Description/use case
/ Default path for collection. Recursive and applies to whole document tree.
/"prop"/? Serve queries like the following (with Hash or Range types respectively):
SELECT * FROM collection c WHERE c.prop = "value"
SELCT * FROM collection c WHERE c.prop > 5
/"prop"/* All paths under the specified label.
/"prop"/"subprop"/ Used during query execution to prune documents that do not have the
specified path.
/"prop"/"subprop"/? Serve queries (with Hash or Range types respectively):
SELECT * FROM collection c WHERE c.prop.subprop = "value"
SELECT * FROM collection c WHERE c.prop.subprop > 5

Introducing Azure DocumentDB - NoSQL, No Problem

Introducing Azure DocumentDB - NoSQL, No Problem

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Introducing Azure DocumentDB - NoSQL, No Problem

Similar to Introducing Azure DocumentDB - NoSQL, No Problem (20)

Recently uploaded

Recently uploaded (20)

Introducing Azure DocumentDB - NoSQL, No Problem

Editor's Notes