Azure Cosmos DB is a globally distributed, multi-model NoSQL database service designed for scalable, high-performance modern applications. Cosmos DB is delivered as a fully managed service with an enterprise-grade SLA. It supports querying documents with a familiar SQL dialect over hierarchical JSON documents. Azure Cosmos DB is a superset of the DocumentDB service and lets you store and query NoSQL data, regardless of schema. In this presentation, you will learn:
• How to get started by provisioning a new database account
• How to index documents
• How to create applications with Cosmos DB (using the REST API or programming libraries for several popular languages)
• Best practices for designing applications with Cosmos DB
• Best practices for creating queries
6. What is Azure Cosmos DB
Global Distribution
Worldwide presence
Automatic multi-region replication
Multi-homing APIs
Manual and automatic failovers
7. What is Azure Cosmos DB
Five Consistency Models
Helps navigate Brewer's CAP theorem
Intuitive Programming
• Tunable well-defined consistency levels
• Override on per-request basis
Clear PACELC tradeoffs
• Partition – Availability vs Consistency
• Else – Latency vs Consistency
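A minimal sketch of how these tunable levels surface in application code, assuming the .NET DocumentDB SDK v2 used later in this deck (account, key, and names are hypothetical):

using System;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

// Account-wide default consistency is chosen when the client is created...
DocumentClient client = new DocumentClient(
    new Uri("https://myaccount.documents.azure.com:443/"), // hypothetical account
    "<auth-key>",
    new ConnectionPolicy(),
    ConsistencyLevel.Session);

// ...and can be overridden on a per-request basis, e.g. an eventually
// consistent read (overrides can only weaken the default, not strengthen it):
ResourceResponse<Document> doc = await client.ReadDocumentAsync(
    UriFactory.CreateDocumentUri("mydb", "mycoll", "doc1"),
    new RequestOptions { ConsistencyLevel = ConsistencyLevel.Eventual });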
8. What is Azure Cosmos DB
Comprehensive SLAs
99.99% availability
Durable quorum committed writes
Latency, consistency, and throughput also covered by financially backed SLAs
Made possible with a highly redundant architecture
Availability SLA (%) by operation type:
Operation type | Single region | Multi-region (single-region writes) | Multi-region (multi-region writes)
Writes | 99.99 | 99.99 | 99.999
Reads | 99.99 | 99.999 | 99.999
11. Cosmos DB Data Formats
• The query syntax (Gremlin, in the graph API) is geared at navigating graphs – you could say e.g. .has('person', 'name', 'Thomas').outE('Knows') to find the people Thomas knows.
25. Cosmos DB Resources
• Invited into an existing Azure account
• Allows you to set permissions on each concept of the database
26. Cosmos DB Resources
• Authorization token
• Associated with a user
• Grants access to a given resource
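A brief sketch of creating these resources with the .NET SDK v2 (same usings and client as the earlier sketch; database, user, and permission names are hypothetical, and collection is assumed to be a previously created DocumentCollection):

// Create a user inside a database...
ResourceResponse<User> user = await client.CreateUserAsync(
    UriFactory.CreateDatabaseUri("mydb"),
    new User { Id = "appUser" });

// ...then grant it read-only access to one collection. The permission
// carries a resource token that can be handed to a client app in place
// of the account's master key.
ResourceResponse<Permission> permission = await client.CreatePermissionAsync(
    UriFactory.CreateUserUri("mydb", "appUser"),
    new Permission
    {
        Id = "readOrders",
        PermissionMode = PermissionMode.Read, // read-only
        ResourceLink = collection.SelfLink    // the resource being shared
    });
string resourceToken = permission.Resource.Token;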
27. Cosmos DB Resources
• Most like a “table”
• Structure is not defined
• Dynamic shapes based on what you put in it
28. Cosmos DB Resources
• A blob of JSON representing your data
• Can be a deeply nested shape
• No specialty types
• No specific encoding types
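For illustration, a minimal sketch of storing such a document (same client as above; collection name and data are hypothetical):

// Any JSON-serializable shape can be stored as-is, however deeply nested.
await client.CreateDocumentAsync(
    UriFactory.CreateDocumentCollectionUri("mydb", "mycoll"),
    new
    {
        id = "order-1001",
        customer = "Contoso",
        shipping = new { city = "Seattle", country = "USA" }, // nested object
        items = new[] { new { sku = "X1", qty = 2 } }         // nested array
    });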
29. Cosmos DB Resources
• Think media – at the document level!
The maximum size for a document and its attachments in Cosmos DB is now 2 MB;
in DocumentDB, the maximum size of a document and its attachments was 512 KB.
Media size – 2 GB
30. Cosmos DB Resources
• Written in JavaScript!
• Is transactional
• Executed by the database engine
• Can live in the store
• Can be sent over the wire
31. Cosmos DB Resources
• Can be Pre or Post (before or after)
• Can operate on the following actions
• Create
• Replace
• Delete
• All
• Also written in JavaScript!
+ Azure Functions
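A hedged sketch of registering and invoking a pre-trigger (the trigger body is JavaScript, as the slide notes; registration uses the .NET SDK v2 with the same client as above and hypothetical names):

// Register a JavaScript pre-trigger that runs before Create operations.
Trigger validate = new Trigger
{
    Id = "validateDocument",
    TriggerType = TriggerType.Pre,              // Pre = before the operation
    TriggerOperation = TriggerOperation.Create, // fire only on document creates
    Body = @"function validate() {
                 var doc = getContext().getRequest().getBody();
                 if (!doc.id) throw new Error('id is required');
             }"
};
await client.CreateTriggerAsync(collection.SelfLink, validate);

// Triggers are opt-in: name them on the request that should fire them.
await client.CreateDocumentAsync(collection.SelfLink, newOrder,
    new RequestOptions { PreTriggerInclude = new List<string> { "validateDocument" } });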
32. Cosmos DB Resources
• Can only be run in a query
• Modifies the result of a given query
• Example: mathSqrt()
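A sketch of registering and calling such a UDF (the body is JavaScript; registration and querying use the .NET SDK v2 with the same client as above, and c.score is a hypothetical property):

// Register the UDF body once per collection...
await client.CreateUserDefinedFunctionAsync(
    collection.SelfLink,
    new UserDefinedFunction
    {
        Id = "mathSqrt",
        Body = "function mathSqrt(x) { return Math.sqrt(x); }"
    });

// ...then call it from SQL with the udf. prefix (ToList needs System.Linq).
var roots = client.CreateDocumentQuery<double>(
    collection.SelfLink,
    "SELECT VALUE udf.mathSqrt(c.score) FROM c",
    new FeedOptions { EnableCrossPartitionQuery = true }).ToList();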
34. Cosmos DB Resources
The JSON array {"exports": [{"city": "Moscow"}, {"city": "Athens"}]} corresponds to the paths /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens".
38. Cosmos DB Resources
Property | User settable or system generated? | Purpose
_rid | System generated | Unique and hierarchical identifier of the resource.
_etag | System generated | etag of the resource, required for optimistic concurrency control.
_ts | System generated | Last updated timestamp of the resource.
_self | System generated | Unique addressable URI of the resource.
id | User settable | User-defined unique name of the resource.
39. Cosmos DB Resources
Value of the _self | Description
/dbs | Feed of databases under a database account.
/dbs/{_rid-db} | Database with the unique id property with the value {_rid-db}.
/dbs/{_rid-db}/colls/ | Feed of collections under a database.
/dbs/{_rid-db}/colls/{_rid-coll} | Collection with the unique id property with the value {_rid-coll}.
/dbs/{_rid-db}/users/ | Feed of users under a database.
/dbs/{_rid-db}/users/{_rid-user} | User with the unique id property with the value {_rid-user}.
/dbs/{_rid-db}/users/{_rid-user}/permissions | Feed of permissions under a user.
/dbs/{_rid-db}/users/{_rid-user}/permissions/{_rid-permission} | Permission with the unique id property with the value {_rid-permission}.
41. Request Units
Request Units (RUs) are a rate-based currency – e.g. 1000 RU/second
Abstracts the physical resources (% IOPS, % CPU, % memory) consumed in serving requests
43. Request Units
Each request (GET, POST, PUT, query, …) consumes a number of RUs:
Approx. 1 RU = 1 read of a 1 KB document
Approx. 5 RU = 1 write of a 1 KB document
Query: depends on the query and the documents involved
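For calibration, a minimal sketch of reading the actual charge off a response (same client as above, hypothetical names):

ResourceResponse<Document> response = await client.ReadDocumentAsync(
    UriFactory.CreateDocumentUri("mydb", "mycoll", "doc1"),
    new RequestOptions { PartitionKey = new PartitionKey("some-pk-value") });

// Every response reports the RUs the operation actually consumed.
Console.WriteLine($"Read consumed {response.RequestCharge} RU");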
44. Request Units – Provisioned Throughput
Provisioned in terms of RU/sec – e.g. 1000 RU/s
Billed for highest RU/s in 1 hour
Easy to increase and decrease on demand
Rate limiting based on amount of throughput provisioned
Background processes like TTL expiration and index transformations are scheduled when the system is quiescent
[Chart: incoming requests vs. provisioned min/max RU/sec – while demand stays below the provisioned rate there is no rate limiting and background operations are processed; above it, requests are rate limited and the SDK retries]
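A sketch of what "SDK retry" means at the client (the SDK already retries throttled requests automatically per its RetryOptions; this manual handler, with hypothetical names, only covers requests that exhaust those retries; requires using System.Net and System.Threading.Tasks):

try
{
    await client.CreateDocumentAsync(collection.SelfLink, order);
}
catch (DocumentClientException ex) when (ex.StatusCode == (HttpStatusCode)429)
{
    // Throttled: the service indicates how long to back off before retrying.
    await Task.Delay(ex.RetryAfter);
    await client.CreateDocumentAsync(collection.SelfLink, order);
}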
45. Cosmos DB Resources
 | Partitioned collection | Single partition collection (only via SDK v.2) | S1 | S2 | S3
Maximum throughput | Unlimited | 10K RU/s | 250 RU/s | 1K RU/s | 2.5K RU/s
Minimum throughput | 2.5K → 400 RU/s | 400 RU/s | 250 RU/s | 1K RU/s | 2.5K RU/s
Maximum storage | Unlimited | 10 GB | 10 GB | 10 GB | 10 GB
Price | Throughput: $6 / 100 RU/s; Storage: $0.25/GB | Throughput: $6 / 100 RU/s; Storage: $0.25/GB | $25 USD | $50 USD | $100 USD
53. Partitioning
Logical partition: stores all data associated with the same partition key value
Physical partition: a fixed amount of reserved SSD-backed storage + compute
Cosmos DB distributes logical partitions among a smaller number of physical partitions.
From the user’s perspective: define 1 partition key per container
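For illustration, a sketch of defining that single partition key when creating a container (.NET SDK v2, same client as above, hypothetical names):

// Create a container partitioned on /deviceId with dedicated throughput.
// Cosmos DB then maps deviceId values (logical partitions) onto physical
// partitions automatically.
DocumentCollection devices = new DocumentCollection { Id = "devices" };
devices.PartitionKey.Paths.Add("/deviceId");

await client.CreateDocumentCollectionAsync(
    UriFactory.CreateDatabaseUri("mydb"),
    devices,
    new RequestOptions { OfferThroughput = 1000 }); // RU/s provisioned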
54. Partition Key Storage Limits
Containers support unlimited storage by dynamically allocating additional physical partitions.
Storage for a single partition key value (logical partition) is capped at 10 GB.
When a partition key reaches its provisioned storage limit, requests to create new resources will return an HTTP status code of 403 (Forbidden).
Azure Cosmos DB will automatically add partitions, and may also return a 403 if:
• An authorization token has expired
• A programmatic element (UDF, stored procedure, trigger) has been flagged for repeated violations
66. Querying Cosmos DB SQL API
// A stored procedure is a JavaScript function that runs inside the
// database engine, scoped to a collection.
var createDocumentStoredProc = {
    id: "createCustomDocument",
    body: function createCustomDocument(documentToCreate) {
        var context = getContext();
        var collection = context.getCollection();
        // Queue the asynchronous create; the callback fires on completion.
        var accepted = collection.createDocument(
            collection.getSelfLink(),
            documentToCreate,
            function (err, documentCreated) {
                if (err) throw new Error('Error ' + err.message);
                // Return the new document's id to the caller.
                context.getResponse().setBody(documentCreated.id);
            });
        // If the request was not accepted (out of time/RU budget), give up.
        if (!accepted) return;
    }
};
67. Querying Cosmos DB SQL API
// Execute the stored procedure from the .NET client, passing the document
// to create as its parameter (documentToCreate is hypothetical here):
StoredProcedureResponse<string> result =
    await client.ExecuteStoredProcedureAsync<string>(
        createdStoredProcedure.SelfLink,
        documentToCreate);
74. Cosmos DB: Table API
 | Azure Table Storage | Azure Cosmos DB: Table storage (preview)
Latency | Fast, but no upper bounds on latency | Single-digit millisecond latency for reads and writes, backed with <10-ms latency reads and <15-ms latency writes at the 99th percentile, at any scale, anywhere in the world
Throughput | Variable throughput model. Tables have a scalability limit of 20,000 operations/s | Highly scalable with dedicated reserved throughput per table, backed by SLAs. Accounts have no upper limit on throughput and support >10 million operations/s per table
Global Distribution | Single region with one optional readable secondary read region for HA. You cannot initiate failover | Turn-key global distribution from one to 30+ regions. Support for automatic and manual failovers at any time, anywhere in the world
Indexing | Only primary index on PartitionKey and RowKey. No secondary indexes | Automatic and complete indexing on all properties, no index management
75. Cosmos DB: Table API
 | Azure Table Storage | Azure Cosmos DB: Table storage (preview)
Query | Query execution uses index for primary key, and scans otherwise | Queries can take advantage of automatic indexing on properties for fast query times. Azure Cosmos DB's database engine is capable of supporting aggregates, geo-spatial, and sorting
Consistency | Strong within primary region, eventual with secondary region | Five well-defined consistency levels to trade off availability, latency, throughput, and consistency based on your application needs
Pricing | Storage-optimized | Throughput-optimized
SLAs | 99.9% availability | 99.99% availability within a single region, and ability to add more regions for higher availability. Industry-leading comprehensive SLAs on general availability
86. Three different ways to use the Change Feed
Implementation | Use Case | Advantages
Azure Functions | Serverless applications | Easy to implement. Used as a trigger, input or output binding to an Azure Function.
Change Feed Processor Library | Distributed applications | Ability to distribute the processing of events across multiple clients. Requires a “leases collection”.
SQL API SDK for .NET or Java | Not recommended | Requires manual implementation in a .NET or Java application.
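As a sketch of the first option, assuming the Cosmos DB trigger binding from the Azure Functions extension (database, collection, and connection names are hypothetical):

using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ChangeFeedListener
{
    // The binding manages the change feed for us; the leases collection
    // tracks how far each consumer has read.
    [FunctionName("ChangeFeedListener")]
    public static void Run(
        [CosmosDBTrigger(
            databaseName: "mydb",
            collectionName: "orders",
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)] IReadOnlyList<Document> changes,
        ILogger log)
    {
        foreach (Document doc in changes)
        {
            log.LogInformation($"Changed document: {doc.Id}");
        }
    }
}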
100. Billing Model
Two components: Consumed Storage + Provisioned Throughput
You are billed on consumed storage and provisioned throughput
Containers in a database can share throughput
Unit | Price (for most Azure regions)
SSD Storage (per GB) | $0.25 per month
Provisioned Throughput (single region writes) | $0.008/hour per 100 RU/s
Provisioned Throughput (multi-region writes) | $0.016/hour per 100 multi-region write RU/s
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-
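As a rough worked example at the single-region rate above: a container provisioned at 1,000 RU/s is 10 units of 100 RU/s, so throughput costs 10 × $0.008 = $0.08 per hour, or about $58.40 over a 730-hour month; adding, say, 100 GB of storage (100 × $0.25 = $25) brings the total to roughly $83 per month.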
101. Billing Model
Automatically configuring provisioned throughput with Autopilot (Preview)
You are billed on consumed storage and provisioned throughput
Containers in a database can share throughput
Autopilot Throughput – Unit (100 RU/s per hour) | Price
100 Autopilot RU/s, single-region account | $0.012/hour
100 Autopilot RU/s, multi-region, single-master account with N regions | N regions x $0.012/hour, where N > 1
100 RU/s, multi-region, multi-master account with N regions | N regions x $0.016/hour, where N > 1
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-
102. Billing Model
Reserved capacity for provisioned throughput – price per 100 RU/s (savings over pay-as-you-go):
Throughput | 1-year, single region write | 1-year, multiple region write | 3-year, single region write | 3-year, multiple region write
First 50K RU/s | $0.0068 (~15%) | $0.0128 (~20%) | $0.006 (~25%) | $0.0112 (~30%)
Next 450K RU/s | $0.006 (~25%) | $0.0112 (~30%) | $0.0052 (~35%) | $0.0096 (~40%)
Next 2.5M RU/s | $0.0056 (~30%) | $0.0104 (~35%) | $0.0044 (~45%) | $0.008 (~50%)
Over 3M RU/s | $0.0044 (~45%) | $0.008 (~50%) | $0.0032 (~60%) | $0.0056 (~65%)
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-
103. Billing Model
Free Cosmos DB Tier
Azure Cosmos DB Free Tier: develop and test applications, or run small production workloads, free within the Azure environment.
Get started: enable Free Tier on a new account to receive 400 RU/s of throughput and 5 GB of storage free each month for the life of your account.
* pricing may vary by region; for up-to-date pricing, see: https://azure.microsoft.com/pricing/details/cosmos-