[db tech showcase Tokyo 2019] Azure Cosmos DB Deep Dive ~ Partitioning, Global Distribution and Indexing ~
https://satonaoki.wordpress.com/2019/09/30/dbts2019-azure-cosmos-db-deep-dive/
NoSQL Strikes Back (An introduction to the dark side of your data)
A long time ago in a database far, far away...
For a long time, SQL was the only option for saving vast amounts of application data. There were always rebel activities trying to overcome the SQL Empire, which brought a new hope, but all other ways of storing data never amounted to more than a phantom menace.
Now Cosmos DB awakens and is ready for the revenge of the NoSQL.
During this talk, we will look at what Azure Cosmos DB is, what you can achieve with it, and how to use it in a galactic environment of data and applications.
Join me and find your way to the right solution for your application.
May the data be with you!
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for 1/10th the traditional cost. This session will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs. We’ll also cover the recently announced Redshift Spectrum, which allows you to query unstructured data directly from Amazon S3.
Getting Started with Amazon Redshift - AWS July 2016 Webinar Series (Amazon Web Services)
Traditional data warehouses become expensive and slow down as the volume of your data grows. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it easy to analyze all of your data using existing business intelligence tools for as low as $1000/TB/year. This webinar will provide an introduction to Amazon Redshift and cover the essentials you need to deploy your data warehouse in the cloud so that you can achieve faster analytics and save costs.
Learning Objectives:
• Get an introduction to Amazon Redshift's massively parallel processing, columnar, scale-out architecture
• Learn how to configure your data warehouse cluster, optimize schema, and load data efficiently
• Get an overview of all the latest features including interleaved sorting and user-defined functions
Explore DynamoDB capabilities and benefits in detail and learn how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about their use case, taking advantage of fast performance on enormous datasets by leveraging economies of scale on the AWS platform.
Speakers:
Ian Meyers, AWS Solutions Architect
Toby Moore, Chief Technology Officer, Space Ape
Data Warehousing in the Era of Big Data: Intro to Amazon Redshift (Amazon Web Services)
An overview of how Amazon Redshift uses columnar technology, massively parallel processing, and other techniques to deliver fast query performance on petabyte-size datasets.
Real-Time Data Exploration and Analytics with Amazon Elasticsearch Service (Amazon Web Services)
Elasticsearch is a fully featured search engine used for real-time analytics, and Amazon Elasticsearch Service makes it easy to deploy Elasticsearch clusters on AWS. With Amazon ES, you can ingest and process billions of events per day, and explore the data using Kibana to discover patterns. In this session, we use Apache web logs as example and show you how to build an end-to-end analytics solution.
AWS December 2015 Webinar Series - Design Patterns using Amazon DynamoDB (Amazon Web Services)
If you’re familiar with relational databases, designing your app to use a NoSQL database like DynamoDB may be new to you. In this webinar, we’ll walk you through common data design patterns for a variety of applications to help you learn how to design a schema, then store and retrieve the data with DynamoDB. We will discuss the benefits of using DynamoDB to develop mobile, web, IoT, and gaming apps.
Learning Objectives:
Learn schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others
Who Should Attend:
Architects, Developers, and SysOps interested in learning how to design NoSQL schemas to support mobile, web, IoT, AdTech, and gaming apps.
Familiarity with DynamoDB is helpful
Data processing and analysis is where big data is most often consumed, driving business intelligence (BI) use cases that discover and report on meaningful patterns in the data. In this session, we will discuss options for processing, analyzing, and visualizing data. We will also look at partner solutions and BI-enabling services from AWS. Attendees will learn about optimal approaches for stream processing, batch processing, and interactive analytics with AWS services such as Amazon Machine Learning, Elastic MapReduce (EMR), and Redshift.
Created by: Jason Morris, Solutions Architect
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing and scale-out architecture to ensure compute resources grow with your dataset size, and columnar, direct-attached storage to dramatically reduce I/O time. Learn how top online retailer RetailMeNot moved their largest Vertica cluster on Amazon EC2 to Amazon Redshift. See how they gain insights from clickstream, location, merchant, marketing, and operational data across desktop and mobile properties.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. You'll also hear from Dan Wagner, CEO at Civis Analytics, as he discusses why the Civis data science platform was designed on top of Amazon Redshift and the AWS platform in order to help smart organizations bridge their data silos, build a 360-degree view of their customer relationships, and identify opportunities for driving their companies forward by leveraging enormous datasets, the power of analytics, and economies of scale on the AWS platform.
(BDT303) Running Spark and Presto on the Netflix Big Data Platform (Amazon Web Services)
In this session, we discuss how Spark and Presto complement the Netflix big data platform stack that started with Hadoop, and the use cases that Spark and Presto address. Also, we discuss how we run Spark and Presto on top of the Amazon EMR infrastructure; specifically, how we use Amazon S3 as our data warehouse and how we leverage Amazon EMR as a generic framework for data-processing cluster management.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and use workload management.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
• Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Who Should Attend:
• Data Warehouse Developers, Big Data Architects, BI Managers, and Data Engineers
During a Big Data Warehousing Meetup in NYC, Elliott Cordo, Chief Architect at Caserta Concepts, discussed emerging trends in real-time data processing. The presentation included processing frameworks such as Spark and Storm, as well as datastore technologies ranging from NoSQL to Hadoop. He also discussed exciting new AWS services such as Lambda, Kinesis, and Kinesis Firehose.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this webinar, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to design schemas and load data efficiently
• Learn best practices for workload management, distribution and sort keys, and optimizing queries
Introduction to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of Spot EC2 instances to reduce costs, and other Amazon EMR architectural best practices.
This 1-day course provides hands-on skills in ingesting, analyzing, transforming, and visualizing data using Amazon Athena, and getting the best performance when using it at scale.
Audience:
This class is intended for data engineers, analysts and data scientists responsible for: analyzing and visualizing big data, implementing cloud-based big data solutions, deploying or migrating big data applications to the public cloud, implementing and maintaining large-scale data storage environments, and transforming/processing big data.
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d... (Amazon Web Services)
Amazon Redshift Spectrum is a new feature that extends Amazon Redshift’s analytics capabilities beyond the data stored in your data warehouse to also query your data in Amazon S3. You can use Amazon Redshift and your existing business intelligence tools to run SQL queries against exabytes of data, and Redshift Spectrum applies sophisticated query optimization, scaling processing across thousands of nodes so results are fast – even with large data sets and complex queries.
Data collection and storage is a primary challenge for any big data architecture. This session will focus on the different types of data that customers are handling to drive high-scale workloads on AWS. Our goal is to help you choose the best approach for your workload. We will dive into optimization techniques that improve performance and reduce the cost of data ingestion and AWS services including Amazon S3, DynamoDB, and Kinesis.
Created by: Mark Korver, Senior Solutions Architect
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. We will also explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service. Learn the fundamentals of DynamoDB and see the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
Amazon DynamoDB is a fast and flexible NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.
Learning Objectives:
Understand the differences between relational and non-relational databases
Learn about common use cases for DynamoDB across gaming, ad tech, IoT, and more
See how DynamoDB helps customers handle spikes in traffic and save development time for new feature launches
Who Should Attend:
Developers, IT Decision Makers, and Executives interested in learning more about Amazon Web Services’ serverless NoSQL service to scale mobile, web, IoT, ad tech, and gaming apps
The event, held on 27th April 2019, was part of the Global Azure Bootcamp and covered Microsoft's Cosmos DB, more specifically:
- Introduction to Cosmos DB, its features, internals, resource models, and request units.
- DEMO: Create an SQL API. Download sample .NET app. Simple queries.
- Covered Change Feed and showcased various use case scenarios.
- Detailed the implications of Global Distribution and Consistency Models.
- DEMO: Mongo - lift and shift. Run simple .NET code against MongoDB (in a Docker container) and Cosmos DB.
- Introduction to TinkerPop graphs.
- DEMO: Graphs API. Download sample .NET app. Simple queries.
https://techspark.mt/global-azure-bootcamp-27th-april-2019/
Modeling data and best practices for Azure Cosmos DB (Mohammad Asif)
Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. In this session we covered data modeling with the NoSQL Cosmos database and how it helps distributed applications maintain high availability, scale across multiple regions, and manage throughput.
This session will begin with an introduction to non-relational (NoSQL) databases and compare them with relational (SQL) databases. Learn the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service, and see the DynamoDB console first-hand. See a walk-through demo of building a serverless web application using this high-performance key-value and JSON document store.
Data collection and storage is a primary challenge for any big data architecture. In this session, we will describe the different types of data that customers are handling to drive high-scale workloads on AWS, and help you choose the best approach for your workload. We will cover optimization techniques that improve performance and reduce the cost of data ingestion. AWS services to be covered include: Amazon S3, DynamoDB, and Kinesis.
Data warehousing is a critical component for analysing and extracting actionable insights from your data. Amazon Redshift allows you to deploy a scalable data warehouse in a matter of minutes and start analysing your data right away using your existing business intelligence tools.
Learn how Amazon Redshift, our fully managed, petabyte-scale data warehouse, can help you quickly and cost-effectively analyze all of your data using your existing business intelligence tools. Get an introduction to how Amazon Redshift uses massively parallel processing, scale-out architecture, and columnar direct-attached storage to minimize I/O time and maximize performance. Learn how you can gain deeper business insights and save money and time by migrating to Amazon Redshift. Take away strategies for migrating from on-premises data warehousing solutions, tuning schema and queries, and utilizing third party solutions.
Building the Perfect SharePoint 2010 Farm - SPS Sacramento (Michael Noel)
Slide deck from Michael Noel's session on Best Practices SharePoint 2010 infrastructure, as presented at SharePoint Saturday Sacramento, 18 June, 2011.
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010 (Ivan Provalov)
Two presentations from the Michigan Information Retrieval Enthusiasts Group Meetup on August 19, by the Cengage Learning search platform development team.
Scaling Performance Tuning With Lucene, by John Nader, discusses the primary performance hot spots related to scaling to a multi-million document collection, including the team's experiences with memory consumption, GC tuning, query expansion, and filter performance. It covers both the tools used to identify issues and the techniques used to address them.
Relevance Tuning Using TREC Dataset, by Rohit Laungani and Ivan Provalov, describes the TREC dataset used by the team to improve the relevance of the Lucene-based search platform. It goes over an IBM paper and describes the approaches tried: Lexical Affinities, Stemming, Pivot Length Normalization, Sweet Spot Similarity, and Term Frequency Average Normalization. It also talks about Pseudo Relevance Feedback.
Stephan Ewen - Experiences Running Flink at Very Large Scale (Ververica)
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk explains which aspects currently make a job particularly demanding, shows how to configure and tune a large-scale Flink job, and outlines what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into analyzing and tuning checkpointing, selecting and configuring state backends, understanding common bottlenecks, and understanding and configuring network parameters.
GECon 2017: High-volume Data Streaming in Azure - Aliaksandr Laisha (GECon Org Team)
The session focuses on solutions that require high-throughput ingestion and streaming of data in real time. You'll get familiar with different business use cases and architecture examples to get a common idea, as well as understand the concepts of stream processing systems. Next, you'll get deep insights into the functional and non-functional capabilities of the Azure Event Hub service to see how it fits into the whole picture. Moreover, we'll take a look at how to leverage Azure Cosmos DB for high-throughput streaming when Event Hub is not suitable for various reasons.
* [Developers Festa Sapporo 2020] Microsoft/GitHubが提供するDeveloper Cloud (Developer Cloud from Microsoft/GitHub) (Naoki (Neo) SATO)
* [Developers Festa Sapporo 2020] Microsoft/GitHubが提供するDeveloper Cloud (Developer Cloud from Microsoft/GitHub)
* https://satonaoki.wordpress.com/2020/12/05/devfesta-microsoft-github/
* https://www.youtube.com/watch?v=sqWnreBtHBg&t=151s
How to work with technology to survive as an engineer (エンジニアとして生き残るためのテクノロジーとの向き合い方) (Naoki (Neo) SATO)
How to work with technology to survive as an engineer (エンジニアとして生き残るためのテクノロジーとの向き合い方)
https://satonaoki.wordpress.com/2019/07/20/how-to-work-with-technology-to-survive-as-an-engineer/
9. Overview of partitioning
The application writes data and provides a partition key value with every item.
[Diagram: a client application (write) and another client application (read) use a container provisioned with 15,000 RUs, spread across physical partition 1 (7,500 RUs) and physical partition 2 (7,500 RUs)]

10. Overview of partitioning
Cosmos DB uses the partition key value to route data to a partition.

11. Overview of partitioning
Every physical partition can store up to 50 GB of data and serve up to 10,000 RU/s.

12. Overview of partitioning
The total throughput for the container will be divided evenly across all partitions.

13. Overview of partitioning
If more data or throughput is needed, Cosmos DB will add a new partition automatically.
[Diagram: the container's 15,000 RUs are now spread across physical partitions 1, 2, and 3 with 5,000 RUs each]

14. Overview of partitioning
The data will be redistributed as a result.

15. Overview of partitioning
And the total throughput capacity will be divided evenly between all partitions.

16. Overview of partitioning
To read data efficiently, the app must provide the partition key of the documents it is requesting.
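
The write and read paths above can be sketched in code. The deck's demos use .NET, but as a compact illustration, here is a minimal sketch with the azure-cosmos Python SDK; the endpoint, key, database/container names, and the /customerId partition key path are assumptions, and keyword arguments can differ between SDK versions.

from azure.cosmos import CosmosClient, PartitionKey

# Hypothetical account endpoint and key.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="appdb")

# The container's provisioned throughput (here 15,000 RU/s) is spread evenly
# across its physical partitions by the service.
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=15000,
)

# Write path: every item carries a partition key value (customerId), which
# Cosmos DB uses to route the item to a physical partition.
container.upsert_item({
    "id": "order-1001",
    "customerId": "customer-42",
    "total": 63.50,
})

# Read path: supplying the partition key enables an efficient point read that
# is routed to a single physical partition.
item = container.read_item(item="order-1001", partition_key="customer-42")
print(item["total"])
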
18. How is data distributed?
[Diagram: data with partition keys flows through a hashing algorithm {#} that maps each key to a range of partition addresses, which in turn map to physical partitions]

19. How is data distributed?
Whenever a document is inserted, the partition key value will be checked and the document assigned to a physical partition.

20. How is data distributed?
The item will be assigned to a partition based on its partition key (e.g. pk = 1).

21. How is data distributed?
All partition key values will be distributed amongst the physical partitions.

22. How is data distributed?
However, items with the exact same partition key value will be co-located on the same physical partition.
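
The routing idea can be illustrated with a toy hash function. This is purely a conceptual sketch of hash partitioning with a made-up partition count; it is not Cosmos DB's actual hashing algorithm or partition-address layout.

import hashlib

NUM_PHYSICAL_PARTITIONS = 3

def route(partition_key_value: str) -> int:
    """Map a partition key value to a physical partition (conceptual only)."""
    digest = hashlib.md5(partition_key_value.encode("utf-8")).hexdigest()
    # Treat the hash as a number and fold it into one of the partition ranges.
    return int(digest, 16) % NUM_PHYSICAL_PARTITIONS

# Distinct key values spread across partitions, while items that share the
# same partition key value always land on the same partition (co-location).
for pk in ["customer-42", "customer-42", "customer-7", "customer-13"]:
    print(pk, "-> physical partition", route(pk))
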
49. An efficient partitioning strategy has a close-to-even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges.

50. An efficient partitioning strategy has a close-to-even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges. A random partition key can provide an even data distribution.
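
One common way to get such a random, even spread is a synthetic partition key, for example an entity id combined with a random suffix. The sketch below is a generic illustration of that pattern (the field names are made up); the trade-off is that reads for one device may need to fan out across all suffixes.

import random
import uuid

def make_item(device_id: str, reading: float) -> dict:
    """Build an item whose partition key is randomized across 10 buckets."""
    return {
        "id": str(uuid.uuid4()),
        "deviceId": device_id,
        "partitionKey": f"{device_id}-{random.randint(0, 9)}",  # synthetic key
        "reading": reading,
    }

print(make_item("thermostat-7", 21.5))
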
55.

Database Account (per tenant)
• Isolation knobs: independent geo-replication knobs; multiple throughput knobs (dedicated throughput, eliminating noisy neighbors); group tenants within database account(s) based on regional needs
• Throughput requirements: >400 RUs per tenant (> $24 per tenant)
• T-shirt size: Large (example: premium offer for B2B apps)

Container w/ Dedicated Throughput (per tenant)
• Isolation knobs: independent throughput knobs (dedicated throughput, eliminating noisy neighbors); easy management of tenants (drop container when tenant leaves)
• Throughput requirements: >400 RUs per tenant (> $24 per tenant)
• T-shirt size: Large (example: premium offer for B2B apps)

Container w/ Shared Throughput (per tenant)
• Isolation knobs: share throughput across tenants grouped by database (great for lowering cost on “spiky” tenants); mitigate noisy-neighbor blast radius (group tenants by database)
• Throughput requirements: >100 RUs per tenant (> $6 per tenant)
• T-shirt size: Medium (example: standard offer for B2B apps)

Partition Key (per tenant)
• Isolation knobs: share throughput across tenants grouped by container (great for lowering cost on “spiky” tenants); enables easy queries across tenants (containers act as boundary for queries); mitigate noisy-neighbor blast radius (group tenants by container)
• Throughput requirements: >0 RUs per tenant (> $0 per tenant)
• T-shirt size: Small (example: B2C apps)
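
For the "Partition Key (per tenant)" option, a single shared container holds every tenant and the tenant id doubles as the partition key. A minimal sketch with the azure-cosmos Python SDK, assuming hypothetical account details and a /tenantId key path:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="saas")

# One container shared by all tenants; each tenant maps to a logical partition.
container = database.create_container_if_not_exists(
    id="tenant-data",
    partition_key=PartitionKey(path="/tenantId"),
    offer_throughput=4000,  # shared across every tenant in the container
)

container.upsert_item({"id": "doc-1", "tenantId": "contoso", "plan": "standard"})

# A tenant-scoped query stays inside that tenant's logical partition.
contoso_docs = container.query_items(
    query="SELECT * FROM c WHERE c.tenantId = @tenant",
    parameters=[{"name": "@tenant", "value": "contoso"}],
    partition_key="contoso",
)
for doc in contoso_docs:
    print(doc["id"])
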
64. In the case of network Partitioning in a distributed computer system, one has to choose between Availability and Consistency, but Else, even when the system is running normally in the absence of partitions, one has to choose between Latency and Consistency.
72. [Diagram: Azure Traffic Manager in front of Region A, Region B, and Region C; a multi-master configuration with a master (read/write) in every region is contrasted with a single master (read/write) plus read replicas in the other regions]
78. Quorum reads and writes per consistency level:

Consistency Level  | Quorum Reads                               | Quorum Writes
Strong             | Local Minority (2 RU)                      | Global Majority (1 RU)
Bounded Staleness  | Local Minority (2 RU)                      | Local Majority (1 RU)
Session            | Single replica using session token (1 RU)  | Local Majority (1 RU)
Consistent Prefix  | Single replica (1 RU)                      | Local Majority (1 RU)
Eventual           | Single replica (1 RU)                      | Local Majority (1 RU)

[Diagram: replica set with a forwarder and followers]
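
The consistency level is set per account and can be relaxed per client. Below is a minimal sketch of setting it on the client with the azure-cosmos Python SDK; the consistency_level keyword is assumed to be available in your SDK version.

from azure.cosmos import CosmosClient

# Session consistency: point reads are served by a single replica using the
# session token (1 RU), whereas Strong and Bounded Staleness read from a
# local minority quorum (2 RU), as in the table above.
client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",
    credential="<primary-key>",
    consistency_level="Session",
)
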
83. [Diagram: a globally distributed application. Devices, mobile apps, and browsers reach Traffic Manager over the Internet; Traffic Manager routes to three regional deployments (West US 2, North Europe, Southeast Asia), each with an Application Gateway, a web tier, a middle tier behind a load balancer, and Cosmos DB]
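
In such a deployment, the application instance in each region typically points its Cosmos DB client at the nearest replica. A sketch with the azure-cosmos Python SDK, assuming the SDK exposes a preferred_locations keyword (check your SDK version) and using the regions from the diagram:

from azure.cosmos import CosmosClient

# The client tries regions in order of preference and fails over down the list.
client = CosmosClient(
    "https://myaccount.documents.azure.com:443/",
    credential="<primary-key>",
    preferred_locations=["North Europe", "West US 2", "Southeast Asia"],
)
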
87. Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing-fast queries.

Item            | Color    | Microwave safe | Liquid capacity | CPU                                 | Memory | Storage
Geek mug        | Graphite | Yes            | 16oz            | ???                                 | ???    | ???
Coffee Bean mug | Tan      | No             | 12oz            | ???                                 | ???    | ???
Surface Book    | Gray     | ???            | ???             | 3.4 GHz Intel Skylake Core i7-6600U | 16GB   | 1 TB SSD

• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model
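
Because every path is indexed automatically, a query can filter on any property of the heterogeneous items above without first declaring a secondary index. A sketch using the azure-cosmos Python SDK, with hypothetical account details and property names taken from the example table:

from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("appdb").get_container_client("catalog")

# Filter on arbitrary properties, including one with a space in its name.
mugs = container.query_items(
    query='SELECT c.Item FROM c WHERE c.Color = @color AND c["Microwave safe"] = @safe',
    parameters=[
        {"name": "@color", "value": "Graphite"},
        {"name": "@safe", "value": "Yes"},
    ],
    enable_cross_partition_query=True,
)
for mug in mugs:
    print(mug)
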
88. Custom Indexing Policies
Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility.
• Define trade-offs between storage, write and query performance, and query consistency
• Include or exclude documents and paths to and from the index
• Configure various index types

{
  "automatic": true,
  "indexingMode": "Consistent",
  "includedPaths": [{
    "path": "/*",
    "indexes": [{
      "kind": "Range",
      "dataType": "String",
      "precision": -1
    }, {
      "kind": "Range",
      "dataType": "Number",
      "precision": -1
    }, {
      "kind": "Spatial",
      "dataType": "Point"
    }]
  }],
  "excludedPaths": [{
    "path": "/nonIndexedContent/*"
  }]
}
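
A custom policy like the one above can be supplied when the container is created. Here is a sketch with the azure-cosmos Python SDK, using the simplified includedPaths/excludedPaths form of the policy and assuming the indexing_policy keyword is supported by your SDK version:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.create_database_if_not_exists(id="appdb")

# Index everything except the /nonIndexedContent subtree, trading a smaller
# index and cheaper writes for no queryability on the excluded paths.
custom_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/nonIndexedContent/*"}],
}

container = database.create_container_if_not_exists(
    id="catalog",
    partition_key=PartitionKey(path="/id"),
    indexing_policy=custom_policy,
)
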
89.
{
  "locations": [
    { "country": "Germany", "city": "Berlin" },
    { "country": "France", "city": "Paris" }
  ],
  "headquarter": "Belgium",
  "exports": [
    { "city": "Moscow" },
    { "city": "Athens" }
  ]
}

[Diagram: the document encoded as a tree of paths: locations/0/country = Germany, locations/0/city = Berlin, locations/1/country = France, locations/1/city = Paris, headquarter = Belgium, exports/0/city = Moscow, exports/1/city = Athens]
91. [Diagram: the tree of a second document (locations/0/country = Germany, locations/0/city = Bonn, locations/0/revenue = 200, headquarter = Italy, exports/0/city = Berlin, exports/1/city = Athens, dealers/0/name = Hans) shown next to the first document's tree]

92. [Diagram: the two document trees merged into a single index; shared paths such as locations/0/country = Germany and exports containing Athens are stored once, while values unique to each document (Berlin, Bonn, Paris, revenue = 200, headquarter Belgium vs. Italy, Moscow, dealers/0/name = Hans) branch off]
94. On-the-fly Index Changes
In Azure Cosmos DB, you can make changes to the indexing policy of a collection on the fly. Changes can affect the shape of the index, including paths, precision values, and its consistency model.
A change in indexing policy effectively requires a transformation of the old index into a new index.
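
A sketch of such an on-the-fly change with the azure-cosmos Python SDK: replace the container definition with a new indexing policy, and the service rebuilds the index in the background. The database/container names and the newly excluded /archived path are hypothetical.

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
database = client.get_database_client("appdb")
container = database.get_container_client("catalog")

# The partition key must stay the same; only the indexing policy changes here.
database.replace_container(
    container,
    partition_key=PartitionKey(path="/id"),
    indexing_policy={
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": "/archived/*"}],
    },
)
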
95. Metrics Analysis
The SQL API provides information about performance metrics, such as the index storage used and the throughput cost (request units) for every operation.
When running a HEAD or GET request against a collection resource, the x-ms-request-quota and x-ms-request-usage headers provide the storage quota and usage of the collection.
You can use this information to compare various indexing policies and for performance tuning.
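
The per-operation request charge is also surfaced through the SDKs. Below is a sketch with the azure-cosmos Python SDK that reads the x-ms-request-charge header after a point read; last_response_headers is an SDK implementation detail that may differ between versions.

from azure.cosmos import CosmosClient

client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<primary-key>")
container = client.get_database_client("appdb").get_container_client("catalog")

item = container.read_item(item="item-1", partition_key="item-1")

# The service reports the RU cost of the operation in x-ms-request-charge.
charge = container.client_connection.last_response_headers.get("x-ms-request-charge")
print(f"Point read consumed {charge} RUs")
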
96.
• Understand query patterns – which properties are being used?
• Understand impact on write cost – index update RU cost scales with the number of properties