SlideShare a Scribd company logo
1 of 40
Download to read offline
Cosmos DB - NoSQL strikes back
Introduction to the dark side of data
Andre Essing
Sponsors
Many thanks to our sponsors, without
whom such an event would not be possible.
Sponsors
Many thanks to our sponsors, without whom such an
event would not be possible.
Andre Essing
Technology Solutions Professional
Microsoft Deutschland GmbH
Andre advises customers in topics all around the
Microsoft Data Platform. Since version 7.0, Andre
gathering experience with the SQL Server product
family. Today Andre concentrates on working with
data in the cloud, like Modern Data Warehouse
architectures, Artificial Intelligence and new scalable
database systems like Azure Cosmos DB.
aessingandre.essing@microsoft.com /aessing @aessingandreessing.de
W H AT I S N O S Q L
NOSQL, BUILT FOR SIMPLE AND FAST APPLICATION
DEVELOPMENT
NoSQL, referring most times to “Non-SQL”, “Not Only SQL” or
also “non-relational” is a kind of database where the data is
modeled differently to relational systems.
• Different kinds available
• Document
• Key/Value
• Columnar
• Graph
• etc.
• Non-Relational
• Schema agnostic
• Built for scale and performance
• Different consistency model
D I F F E R E N T WAY S O F S TO R I N G D ATA W I T H Y O U R
M O D E R N A P P
Come as you are
Data normalization
SQL
MongoDB
Table API
Turnkey global
distribution
Elastic scale out
of storage & throughput
Guaranteed low latency
at the 99th percentile
Comprehensive
SLAs
Five well-defined
consistency models
A Z U R E C O S M O S D B
DocumentColumn-family
Key-value Graph
A globally distributed, massively scalable, multi-model database service
Leveraging Azure Cosmos DB to automatically scale
your data across the globe
This module will reference partitioning in the context
of all Azure Cosmos DB modules and APIs.
R E S O U R C E M O D E L
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
A C C O U N T U R I A N D C R E D E N T I A L S
********.azure.com
IGeAvVUp …
D ATA B A S E R E P R E S E N TAT I O N S
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
DatabaseDatabaseUsers
DatabaseDatabasePermission
C O N TA I N E R R E P R E S E N TAT I O N S
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
= Collection Graph Table
C O N TA I N E R - L E V E L R E S O U R C E S
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem ConflictSproc Trigger UDF
S Y S T E M TO P O LO G Y ( B E H I N D T H E S C E N E S )
Resource
Manager
Language
Runtime(s)
Hosts
Query
Processor
RSM
Index Manager
Bw-tree++/ LLAMA++
Log Manager
IO Manager
Resource Governor
Transport
Database engine
Admission control
…
…
Planet Earth Azure regions Datacenters Stamps Fault domains
Cluster Machine Replica Database engine
Container
Various agents
R E S O U R C E H I E R A R C H Y
CONTAINERS
Logical resources “surfaced” to APIs as tables,
collections or graphs, which are made up of one or
more physical partitions or servers.
RESOURCE PARTITIONS
• Consistent, highly available, and resource-governed
coordination primitives
• Consist of replica sets, with each replica hosting an
instance of the database engine
Containers
Resource Partitions
CollectionsTables Graphs
Tenants
Leader
Follower
Follower
Forwarder
Replica Set
To remote resource partition(s)
Request Units (RUs) is a rate-based currency
Abstracts physical resources for performing requests
Key to multi-tenancy, SLAs, and COGS efficiency
Foreground and background activities
R E Q U E S T U N I T S
% IOPS% CPU% Memory
R E Q U E S T U N I T S
Normalized across various access methods
1 RU = 1 read of 1 KB document
Each request consumes fixed RUs
Applies to reads, writes, queries, and stored procedure
execution
GET
POST
PUT
Query
…
=
=
=
=
R E Q U E S T U N I T S
Normalized across various access methods
1 read of 1 KB document from a single partition
Each request consumes fixed RUs
Applies to reads, writes, queries, and stored procedure
execution
Provisioned in terms of RU/sec
Rate limiting based on amount of throughput provisioned
Can be increased or decreased instantaneously
Metered Hourly
Background processes like TTL expiration, index
transformations scheduled when quiescent
Min RU/sec
Max RU/sec
IncomingRequests
Replica Quiescent
Rate limit
No rate limiting
E L A S T I C S C A L E O U T O F S TO R A G E A N D T H R O U G H P U T
SCALES AS YOUR APPS’ NEEDS CHANGE
Independently and elastically scale storage and
throughput across regions – even during unpredictable
traffic bursts – with a database that adapts to your
app’s needs.
• Elastically scale throughput from 10 to
100s of millions of requests/sec across
multiple regions
• Support for requests/sec for different
workloads
• Pay only for the throughput and
storage you need
Leveraging Azure Cosmos DB to automatically scale
your data across the globe
This module will reference partitioning in the context
of all Azure Cosmos DB modules and APIs.
PA R T I T I O N I N G
PA R T I T I O N S
Cosmos DB Container
(e.g. Collection)
Partition Key: City
Logical Partitioning Abstraction
Behind the Scenes:
Physical Partition Sets
hash(City)
Psuedo-random distribution of data over range of possible hashed values
PA R T I T I O N S
…
Partition 1 Partition 2 Partition n
Frugal # of Partitions based on actual storage and throughput needs
(yielding scalability with low total cost of ownership)
hash(City)
Pseudo-random distribution of data over range of possible hashed values
Cologne
Hamburg
…
Munich
Stuttgart
Berlin
Leipzig
Bremen
Frankfurt
Dresden
…
PA R T I T I O N S
What happens when partitions need to grow?
hash(City)
Pseudo-random distribution of data over range of possible hashed values
…
Partition 1 Partition 2 Partition n
Cologne
Hamburg
…
Munich
Stuttgart
Berlin
Leipzig
Bremen
Frankfurt
Dresden
…
PA R T I T I O N S
Partition Ranges can be dynamically sub-divided to seamlessly
grow database as the application grows while simultaneously
maintaining high availability.
Partition management is fully managed by Azure Cosmos DB,
so you don't have to write code or manage your partitions.
+
Partition x Partition x1 Partition x2
hash(User ID)
Pseudo-random distribution of data over range of possible hashed values
Stuttgart
Berlin
…
Cologne
Hamburg
Stuttgart
Berlin
Leipzig
Dresden
…
Cologne
Hamburg
…
PA R T I T I O N S
Best Practices: Design Goals for Choosing a Good Partition Key
• Distribute the overall request + storage volume
• Avoid “hot” partition keys
Steps for Success
• Ballpark scale needs (size/throughput)
• Understand the workload
• # of reads/sec vs writes per sec
• Use pareto principal (80/20 rule) to help optimize bulk of workload
• For reads – understand top 3-5 queries (look for common filters)
• For writes – understand transactional needs
General Tips
• Build a POC to strengthen your understanding of the workload and
iterate (avoid analyses paralysis)
• Don’t be afraid of having too many partition keys
• Partitions keys are logical
• More partition keys = more scalability
• Partition Key is scope for multi-record transactions and routing queries
• Queries can be intelligently routed via partition key
• Omitting partition key on query requires fan-out
High Availability
• Automatic and Manual Failover
• Multi-homing API removes need for app
redeployment
Low Latency (anywhere in the world)
• Packets cannot move fast than the speed of light
• Sending a packet across the world under ideal
network conditions takes 100’s of milliseconds
• You can cheat the speed of light – using data
locality
• CDN’s solved this for static content
• Azure Cosmos DB solves this for dynamic content
T U R N K E Y G LO B A L
D I S T R I B U T I O N
T U R N K E Y G LO B A L D I S T R I B U T I O N
• Automatic and transparent replication worldwide
• Each partition hosts a replica set per region
• Customers can test end to end application
availability by programmatically simulating failovers
• All regions are hidden behind a single global URI
with multi-homing capabilities
• Customers can dynamically add / remove
additional regions at any time
Writes/
Reads
Reads
"airport" : “AMS" "airport" : “MEL"
West US
Container
"airport" : "LAX"
Local Distribution (via horizontal partitioning)
GlobalDistribution(ofresourcepartitions)
Reads
30K transactions/sec
Writes/
Reads
Reads
Reads
West Europe
30K transactions/sec
Partition-key = "airport"
R E P L I C AT I N G D ATA G LO B A L LY
R E P L I C AT I N G D ATA G LO B A L LY
Strong Bounded-staleness Session Consistent prefix Eventual
F I V E W E L L - D E F I N E D C O N S I S T E N C Y M O D E L S
CHOOSE THE BEST CONSISTENCY MODEL FOR YOUR APP
Five well-defined, consistency models
Overridable on a per-request basis
Provides control over performance-consistency tradeoffs,
backed by comprehensive SLAs.
An intuitive programming model offering low latency and
high availability for your planet-scale app.
CLEAR TRADEOFFS
• Latency
• Availability
• Throughput
Azure Cosmos DB’s schema-less service automatically
indexes all your data, regardless of the data model, to
delivery blazing fast queries.
H A N D L E A N Y D ATA
W I T H N O S C H E M A O R
I N D E X I N G R E Q U I R E D
Item Color
Microwave
safe
Liquid
capacity
CPU Memory Storage
Geek
mug
Graphite Yes 16ox ??? ??? ???
Coffee
Bean
mug
Tan No 12oz ??? ??? ???
Surface
book
Gray ??? ??? 3.4 GHz
Intel
Skylake
Core i7-
6600U
16GB 1 TB SSD
• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model
GEEK
I N D E X I N G J S O N D O C U M E N T S
{
"locations": [
{
"country": "Germany",
"city": "Berlin"
},
{
"country": "France",
"city": "Paris"
}
],
"headquarter": "Belgium",
"exports": [
{ "city": "Moscow" },
{ "city": "Athens" }
]
}
locations headquarter exports
0
country city
Germany Berlin
1
country city
France Paris
0 1
city
Athens
city
Moscow
Belgium
I N D E X I N G J S O N D O C U M E N T S
{
"locations": [
{
"country": "Germany",
"city": "Bonn",
"revenue": 200
}
],
"headquarter": "Italy",
"exports": [
{
"city": "Berlin",
"dealers": [
{ "name": "Hans" }
]
},
{ "city": "Athens" }
]
}
locations headquarter exports
0
country city
Germany Bonn
revenue
200
0 1
citycity
Berlin
Italy
dealers
0
name
Hans
I N D E X I N G J S O N D O C U M E N T S
Athens
locations headquarter exports
0
country city
Germany Bonn
revenue
200
0 1
citycity
Berlin
Italy
dealers
0
name
Hans
locations headquarter exports
0
country city
Germany Berlin
1
country city
France Paris
0 1
city
Athens
city
Moscow
Belgium
I N V E R T E D I N D E X
locations headquarter exports
0
country city
Germany
Berlin
revenue
200
0 1
city
Athens
city
Berlin
Italy
dealers
0
name
Hans
Bonn
1
country city
France Paris
Belgium
Moscow
I N D E X P O L I C I E S
CUSTOM INDEXING POLICIES
Though all Azure Cosmos DB data is indexed by default, you
can specify a custom indexing policy for your collections.
Custom indexing policies allow you to design and customize
the shape of your index while maintaining schema flexibility.
• Define trade-offs between storage, write and query
performance, and query consistency
• Include or exclude documents and paths to and from the
index
• Configure various index types
{
"automatic": true,
"indexingMode": "Consistent",
"includedPaths": [{
"path": "/*",
"indexes": [{
"kind": "Hash",
"dataType": "String",
"precision": -1
}, {
"kind": "Range",
"dataType": "Number",
"precision": -1
}, {
"kind": "Spatial",
"dataType": "Point"
}]
}],
"excludedPaths": [{
"path": "/nonIndexedContent/*"
}]
}
Some data produced by applications are only useful
for a finite period of time:
• Machine-generated event data
• Application log data
• User session information
It is important that the database system systematically
purges this data at pre-configured intervals.
S H O R T - L I F E T I M E D ATA
C H A N G E F E E D S C E N A R I O S
Many thanks to all volunteers!
Ben Kettner
Volker Bachmann
Cornelia Matthesius
Gabi Münster
Kai Gerlach
Alexander Klein
Björn Peters
Christa Kurschat
Christian Gräfe
Dirk Hondong
Dominik Petri
Henrik Schütze
Kai Michael Poppe
Klaus Betzing
Nadine Witthöft
Tobias Blödt
Rafael Dabrowski
Tanja Salwiczek
SQLSaturday #772 - Munich
27.10.2018
http://www.sqlsaturday.com/772
PASS Deutschland e.V.
For further information about future events, visit our
PASS Deutschland e.V. booth in the exhibitor area.

More Related Content

What's hot

Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Summit
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Spark Summit
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarTimothy Spann
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleScyllaDB
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningSpark Summit
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensFive Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensCitus Data
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Juan Pedro Moreno
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengDatabricks
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Josef A. Habdank
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Databricks
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkRahul Kumar
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Sid Anand
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Spark Summit
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...Nathan Bijnens
 

What's hot (20)

Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Spark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike FreedmanSpark Streaming and IoT by Mike Freedman
Spark Streaming and IoT by Mike Freedman
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsar
 
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions ScaleZeotap: Moving to ScyllaDB - A Graph of Billions Scale
Zeotap: Moving to ScyllaDB - A Graph of Billions Scale
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig KerstiensFive Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
Five Data Models for Sharding | Nordic PGDay 2018 | Craig Kerstiens
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui MengChallenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
Challenging Web-Scale Graph Analytics with Apache Spark with Xiangrui Meng
 
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
Extreme Apache Spark: how in 3 months we created a pipeline that can process ...
 
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
Cassandra and SparkSQL: You Don't Need Functional Programming for Fun with Ru...
 
Reactive dashboard’s using apache spark
Reactive dashboard’s using apache sparkReactive dashboard’s using apache spark
Reactive dashboard’s using apache spark
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
 
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
Towards Benchmaking Modern Distruibuted Systems-(Grace Huang, Intel)
 
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
A real-time (lambda) architecture using Hadoop & Storm (NoSQL Matters Cologne...
 

Similar to Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of your data)

Tour de France Azure PaaS 3/7 Stocker des informations
Tour de France Azure PaaS 3/7 Stocker des informationsTour de France Azure PaaS 3/7 Stocker des informations
Tour de France Azure PaaS 3/7 Stocker des informationsAlex Danvy
 
Azure Cosmos DB L100 Pitch Deck
Azure Cosmos DB L100 Pitch DeckAzure Cosmos DB L100 Pitch Deck
Azure Cosmos DB L100 Pitch DeckNicholas Vossburg
 
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseAzure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseBizTalk360
 
Time Series Analytics Azure ADX
Time Series Analytics Azure ADXTime Series Analytics Azure ADX
Time Series Analytics Azure ADXRiccardo Zamana
 
AWS reInvent 2018 Recap - Solutions Updates Part 2
AWS reInvent 2018 Recap - Solutions Updates Part 2AWS reInvent 2018 Recap - Solutions Updates Part 2
AWS reInvent 2018 Recap - Solutions Updates Part 2Amazon Web Services
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFAmazon Web Services
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB OverviewAndrew Liu
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Azure Cosmos DB + Gremlin API in Action
Azure Cosmos DB + Gremlin API in ActionAzure Cosmos DB + Gremlin API in Action
Azure Cosmos DB + Gremlin API in ActionDenys Chamberland
 
Keynote 1: AWS re:Invent 2017 Recap - Solutions Overview
Keynote 1: AWS re:Invent 2017 Recap - Solutions OverviewKeynote 1: AWS re:Invent 2017 Recap - Solutions Overview
Keynote 1: AWS re:Invent 2017 Recap - Solutions OverviewAmazon Web Services
 
Azure CosmosDb - Where we are
Azure CosmosDb - Where we areAzure CosmosDb - Where we are
Azure CosmosDb - Where we areMarco Parenzan
 
Azure CosmosDB the new frontier of big data and nosql
Azure CosmosDB the new frontier of big data and nosqlAzure CosmosDB the new frontier of big data and nosql
Azure CosmosDB the new frontier of big data and nosqlRiccardo Cappello
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Session: Modern Data WareHouse
Session: Modern Data WareHouseSession: Modern Data WareHouse
Session: Modern Data WareHouseKarina Matos
 
Modeling data and best practices for the Azure Cosmos DB.
Modeling data and best practices for the Azure Cosmos DB.Modeling data and best practices for the Azure Cosmos DB.
Modeling data and best practices for the Azure Cosmos DB.Mohammad Asif
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개Ha-Yang(White) Moon
 

Similar to Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of your data) (20)

Technical overview of Azure Cosmos DB
Technical overview of Azure Cosmos DBTechnical overview of Azure Cosmos DB
Technical overview of Azure Cosmos DB
 
Tour de France Azure PaaS 3/7 Stocker des informations
Tour de France Azure PaaS 3/7 Stocker des informationsTour de France Azure PaaS 3/7 Stocker des informations
Tour de France Azure PaaS 3/7 Stocker des informations
 
Azure Cosmos DB L100 Pitch Deck
Azure Cosmos DB L100 Pitch DeckAzure Cosmos DB L100 Pitch Deck
Azure Cosmos DB L100 Pitch Deck
 
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud DatabaseAzure Cosmos DB - The Swiss Army NoSQL Cloud Database
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
 
Azure Comsos DB Use Cases
Azure Comsos DB Use CasesAzure Comsos DB Use Cases
Azure Comsos DB Use Cases
 
Time Series Analytics Azure ADX
Time Series Analytics Azure ADXTime Series Analytics Azure ADX
Time Series Analytics Azure ADX
 
Using Data Lakes
Using Data Lakes Using Data Lakes
Using Data Lakes
 
AWS reInvent 2018 Recap - Solutions Updates Part 2
AWS reInvent 2018 Recap - Solutions Updates Part 2AWS reInvent 2018 Recap - Solutions Updates Part 2
AWS reInvent 2018 Recap - Solutions Updates Part 2
 
Using Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SFUsing Data Lakes: Data Analytics Week SF
Using Data Lakes: Data Analytics Week SF
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB Overview
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Azure Cosmos DB + Gremlin API in Action
Azure Cosmos DB + Gremlin API in ActionAzure Cosmos DB + Gremlin API in Action
Azure Cosmos DB + Gremlin API in Action
 
Keynote 1: AWS re:Invent 2017 Recap - Solutions Overview
Keynote 1: AWS re:Invent 2017 Recap - Solutions OverviewKeynote 1: AWS re:Invent 2017 Recap - Solutions Overview
Keynote 1: AWS re:Invent 2017 Recap - Solutions Overview
 
Azure CosmosDb - Where we are
Azure CosmosDb - Where we areAzure CosmosDb - Where we are
Azure CosmosDb - Where we are
 
Azure CosmosDB the new frontier of big data and nosql
Azure CosmosDB the new frontier of big data and nosqlAzure CosmosDB the new frontier of big data and nosql
Azure CosmosDB the new frontier of big data and nosql
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Session: Modern Data WareHouse
Session: Modern Data WareHouseSession: Modern Data WareHouse
Session: Modern Data WareHouse
 
Modeling data and best practices for the Azure Cosmos DB.
Modeling data and best practices for the Azure Cosmos DB.Modeling data and best practices for the Azure Cosmos DB.
Modeling data and best practices for the Azure Cosmos DB.
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개
 

More from Andre Essing

Ready for take-off - How to get your databases into the cloud
Ready for take-off - How to get your databases into the cloudReady for take-off - How to get your databases into the cloud
Ready for take-off - How to get your databases into the cloudAndre Essing
 
SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...
SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...
SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...Andre Essing
 
SQL Server Best Practices - Install SQL Server like a boss (RELOADED)
SQL Server Best Practices - Install SQL Server like a boss (RELOADED)SQL Server Best Practices - Install SQL Server like a boss (RELOADED)
SQL Server Best Practices - Install SQL Server like a boss (RELOADED)Andre Essing
 
SQL Server und Azure: Impulse für die Modernisierung der IT-Strategie
SQL Server und Azure: Impulse für die Modernisierung der IT-StrategieSQL Server und Azure: Impulse für die Modernisierung der IT-Strategie
SQL Server und Azure: Impulse für die Modernisierung der IT-StrategieAndre Essing
 
SQL Server Monitoring - Piloten fliegen auch nicht blind
SQL Server Monitoring - Piloten fliegen auch nicht blindSQL Server Monitoring - Piloten fliegen auch nicht blind
SQL Server Monitoring - Piloten fliegen auch nicht blindAndre Essing
 
SQL Server Release Management - SPs, CUs und CODs, ich verstehe nur Bahnhof
SQL Server Release Management - SPs, CUs und CODs, ich verstehe nur BahnhofSQL Server Release Management - SPs, CUs und CODs, ich verstehe nur Bahnhof
SQL Server Release Management - SPs, CUs und CODs, ich verstehe nur BahnhofAndre Essing
 

More from Andre Essing (6)

Ready for take-off - How to get your databases into the cloud
Ready for take-off - How to get your databases into the cloudReady for take-off - How to get your databases into the cloud
Ready for take-off - How to get your databases into the cloud
 
SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...
SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...
SQL Server goes Linux - Hello, my name is Tux, I would like to join the #SQLF...
 
SQL Server Best Practices - Install SQL Server like a boss (RELOADED)
SQL Server Best Practices - Install SQL Server like a boss (RELOADED)SQL Server Best Practices - Install SQL Server like a boss (RELOADED)
SQL Server Best Practices - Install SQL Server like a boss (RELOADED)
 
SQL Server und Azure: Impulse für die Modernisierung der IT-Strategie
SQL Server und Azure: Impulse für die Modernisierung der IT-StrategieSQL Server und Azure: Impulse für die Modernisierung der IT-Strategie
SQL Server und Azure: Impulse für die Modernisierung der IT-Strategie
 
SQL Server Monitoring - Piloten fliegen auch nicht blind
SQL Server Monitoring - Piloten fliegen auch nicht blindSQL Server Monitoring - Piloten fliegen auch nicht blind
SQL Server Monitoring - Piloten fliegen auch nicht blind
 
SQL Server Release Management - SPs, CUs und CODs, ich verstehe nur Bahnhof
SQL Server Release Management - SPs, CUs und CODs, ich verstehe nur BahnhofSQL Server Release Management - SPs, CUs und CODs, ich verstehe nur Bahnhof
SQL Server Release Management - SPs, CUs und CODs, ich verstehe nur Bahnhof
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of your data)

  • 1. Cosmos DB - NoSQL strikes back Introduction to the dark side of data Andre Essing
  • 2. Sponsors Many thanks to our sponsors, without whom such an event would not be possible.
  • 3. Sponsors Many thanks to our sponsors, without whom such an event would not be possible.
  • 4. Andre Essing Technology Solutions Professional Microsoft Deutschland GmbH Andre advises customers in topics all around the Microsoft Data Platform. Since version 7.0, Andre gathering experience with the SQL Server product family. Today Andre concentrates on working with data in the cloud, like Modern Data Warehouse architectures, Artificial Intelligence and new scalable database systems like Azure Cosmos DB. aessingandre.essing@microsoft.com /aessing @aessingandreessing.de
  • 5. W H AT I S N O S Q L NOSQL, BUILT FOR SIMPLE AND FAST APPLICATION DEVELOPMENT NoSQL, referring most times to “Non-SQL”, “Not Only SQL” or also “non-relational” is a kind of database where the data is modeled differently to relational systems. • Different kinds available • Document • Key/Value • Columnar • Graph • etc. • Non-Relational • Schema agnostic • Built for scale and performance • Different consistency model
  • 6. D I F F E R E N T WAY S O F S TO R I N G D ATA W I T H Y O U R M O D E R N A P P Come as you are Data normalization
  • 7. SQL MongoDB Table API Turnkey global distribution Elastic scale out of storage & throughput Guaranteed low latency at the 99th percentile Comprehensive SLAs Five well-defined consistency models A Z U R E C O S M O S D B DocumentColumn-family Key-value Graph A globally distributed, massively scalable, multi-model database service
  • 8. Leveraging Azure Cosmos DB to automatically scale your data across the globe This module will reference partitioning in the context of all Azure Cosmos DB modules and APIs. R E S O U R C E M O D E L Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  • 9. Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem A C C O U N T U R I A N D C R E D E N T I A L S ********.azure.com IGeAvVUp …
  • 10. D ATA B A S E R E P R E S E N TAT I O N S Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem DatabaseDatabaseUsers DatabaseDatabasePermission
  • 11. C O N TA I N E R R E P R E S E N TAT I O N S Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem = Collection Graph Table
  • 12. C O N TA I N E R - L E V E L R E S O U R C E S Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem ConflictSproc Trigger UDF
  • 13. S Y S T E M TO P O LO G Y ( B E H I N D T H E S C E N E S ) Resource Manager Language Runtime(s) Hosts Query Processor RSM Index Manager Bw-tree++/ LLAMA++ Log Manager IO Manager Resource Governor Transport Database engine Admission control … … Planet Earth Azure regions Datacenters Stamps Fault domains Cluster Machine Replica Database engine Container Various agents
  • 14. R E S O U R C E H I E R A R C H Y CONTAINERS Logical resources “surfaced” to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers. RESOURCE PARTITIONS • Consistent, highly available, and resource-governed coordination primitives • Consist of replica sets, with each replica hosting an instance of the database engine Containers Resource Partitions CollectionsTables Graphs Tenants Leader Follower Follower Forwarder Replica Set To remote resource partition(s)
  • 15. Request Units (RUs) is a rate-based currency Abstracts physical resources for performing requests Key to multi-tenancy, SLAs, and COGS efficiency Foreground and background activities R E Q U E S T U N I T S % IOPS% CPU% Memory
  • 16. R E Q U E S T U N I T S Normalized across various access methods 1 RU = 1 read of 1 KB document Each request consumes fixed RUs Applies to reads, writes, queries, and stored procedure execution GET POST PUT Query … = = = =
  • 17. R E Q U E S T U N I T S Normalized across various access methods 1 read of 1 KB document from a single partition Each request consumes fixed RUs Applies to reads, writes, queries, and stored procedure execution Provisioned in terms of RU/sec Rate limiting based on amount of throughput provisioned Can be increased or decreased instantaneously Metered Hourly Background processes like TTL expiration, index transformations scheduled when quiescent Min RU/sec Max RU/sec IncomingRequests Replica Quiescent Rate limit No rate limiting
  • 18. E L A S T I C S C A L E O U T O F S TO R A G E A N D T H R O U G H P U T SCALES AS YOUR APPS’ NEEDS CHANGE Independently and elastically scale storage and throughput across regions – even during unpredictable traffic bursts – with a database that adapts to your app’s needs. • Elastically scale throughput from 10 to 100s of millions of requests/sec across multiple regions • Support for requests/sec for different workloads • Pay only for the throughput and storage you need
  • 19. Leveraging Azure Cosmos DB to automatically scale your data across the globe This module will reference partitioning in the context of all Azure Cosmos DB modules and APIs. PA R T I T I O N I N G
  • 20. PA R T I T I O N S Cosmos DB Container (e.g. Collection) Partition Key: City Logical Partitioning Abstraction Behind the Scenes: Physical Partition Sets hash(City) Psuedo-random distribution of data over range of possible hashed values
  • 21. PA R T I T I O N S … Partition 1 Partition 2 Partition n Frugal # of Partitions based on actual storage and throughput needs (yielding scalability with low total cost of ownership) hash(City) Pseudo-random distribution of data over range of possible hashed values Cologne Hamburg … Munich Stuttgart Berlin Leipzig Bremen Frankfurt Dresden …
  • 22. PA R T I T I O N S What happens when partitions need to grow? hash(City) Pseudo-random distribution of data over range of possible hashed values … Partition 1 Partition 2 Partition n Cologne Hamburg … Munich Stuttgart Berlin Leipzig Bremen Frankfurt Dresden …
  • 23. PA R T I T I O N S Partition Ranges can be dynamically sub-divided to seamlessly grow database as the application grows while simultaneously maintaining high availability. Partition management is fully managed by Azure Cosmos DB, so you don't have to write code or manage your partitions. + Partition x Partition x1 Partition x2 hash(User ID) Pseudo-random distribution of data over range of possible hashed values Stuttgart Berlin … Cologne Hamburg Stuttgart Berlin Leipzig Dresden … Cologne Hamburg …
  • 24. PA R T I T I O N S Best Practices: Design Goals for Choosing a Good Partition Key • Distribute the overall request + storage volume • Avoid “hot” partition keys Steps for Success • Ballpark scale needs (size/throughput) • Understand the workload • # of reads/sec vs writes per sec • Use pareto principal (80/20 rule) to help optimize bulk of workload • For reads – understand top 3-5 queries (look for common filters) • For writes – understand transactional needs General Tips • Build a POC to strengthen your understanding of the workload and iterate (avoid analyses paralysis) • Don’t be afraid of having too many partition keys • Partitions keys are logical • More partition keys = more scalability • Partition Key is scope for multi-record transactions and routing queries • Queries can be intelligently routed via partition key • Omitting partition key on query requires fan-out
  • 25. High Availability • Automatic and Manual Failover • Multi-homing API removes need for app redeployment Low Latency (anywhere in the world) • Packets cannot move fast than the speed of light • Sending a packet across the world under ideal network conditions takes 100’s of milliseconds • You can cheat the speed of light – using data locality • CDN’s solved this for static content • Azure Cosmos DB solves this for dynamic content T U R N K E Y G LO B A L D I S T R I B U T I O N
  • 26. T U R N K E Y G LO B A L D I S T R I B U T I O N • Automatic and transparent replication worldwide • Each partition hosts a replica set per region • Customers can test end to end application availability by programmatically simulating failovers • All regions are hidden behind a single global URI with multi-homing capabilities • Customers can dynamically add / remove additional regions at any time Writes/ Reads Reads "airport" : “AMS" "airport" : “MEL" West US Container "airport" : "LAX" Local Distribution (via horizontal partitioning) GlobalDistribution(ofresourcepartitions) Reads 30K transactions/sec Writes/ Reads Reads Reads West Europe 30K transactions/sec Partition-key = "airport"
  • 27. R E P L I C AT I N G D ATA G LO B A L LY
  • 28. R E P L I C AT I N G D ATA G LO B A L LY
  • 29. Strong Bounded-staleness Session Consistent prefix Eventual F I V E W E L L - D E F I N E D C O N S I S T E N C Y M O D E L S CHOOSE THE BEST CONSISTENCY MODEL FOR YOUR APP Five well-defined, consistency models Overridable on a per-request basis Provides control over performance-consistency tradeoffs, backed by comprehensive SLAs. An intuitive programming model offering low latency and high availability for your planet-scale app. CLEAR TRADEOFFS • Latency • Availability • Throughput
  • 30. Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to delivery blazing fast queries. H A N D L E A N Y D ATA W I T H N O S C H E M A O R I N D E X I N G R E Q U I R E D Item Color Microwave safe Liquid capacity CPU Memory Storage Geek mug Graphite Yes 16ox ??? ??? ??? Coffee Bean mug Tan No 12oz ??? ??? ??? Surface book Gray ??? ??? 3.4 GHz Intel Skylake Core i7- 6600U 16GB 1 TB SSD • Automatic index management • Synchronous auto-indexing • No schemas or secondary indices needed • Works across every data model GEEK
  • 31. I N D E X I N G J S O N D O C U M E N T S { "locations": [ { "country": "Germany", "city": "Berlin" }, { "country": "France", "city": "Paris" } ], "headquarter": "Belgium", "exports": [ { "city": "Moscow" }, { "city": "Athens" } ] } locations headquarter exports 0 country city Germany Berlin 1 country city France Paris 0 1 city Athens city Moscow Belgium
  • 32. I N D E X I N G J S O N D O C U M E N T S { "locations": [ { "country": "Germany", "city": "Bonn", "revenue": 200 } ], "headquarter": "Italy", "exports": [ { "city": "Berlin", "dealers": [ { "name": "Hans" } ] }, { "city": "Athens" } ] } locations headquarter exports 0 country city Germany Bonn revenue 200 0 1 citycity Berlin Italy dealers 0 name Hans
  • 33. I N D E X I N G J S O N D O C U M E N T S Athens locations headquarter exports 0 country city Germany Bonn revenue 200 0 1 citycity Berlin Italy dealers 0 name Hans locations headquarter exports 0 country city Germany Berlin 1 country city France Paris 0 1 city Athens city Moscow Belgium
  • 34. I N V E R T E D I N D E X locations headquarter exports 0 country city Germany Berlin revenue 200 0 1 city Athens city Berlin Italy dealers 0 name Hans Bonn 1 country city France Paris Belgium Moscow
  • 35. I N D E X P O L I C I E S CUSTOM INDEXING POLICIES Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility. • Define trade-offs between storage, write and query performance, and query consistency • Include or exclude documents and paths to and from the index • Configure various index types { "automatic": true, "indexingMode": "Consistent", "includedPaths": [{ "path": "/*", "indexes": [{ "kind": "Hash", "dataType": "String", "precision": -1 }, { "kind": "Range", "dataType": "Number", "precision": -1 }, { "kind": "Spatial", "dataType": "Point" }] }], "excludedPaths": [{ "path": "/nonIndexedContent/*" }] }
  • 36. Some data produced by applications are only useful for a finite period of time: • Machine-generated event data • Application log data • User session information It is important that the database system systematically purges this data at pre-configured intervals. S H O R T - L I F E T I M E D ATA
  • 37. C H A N G E F E E D S C E N A R I O S
  • 38. Many thanks to all volunteers! Ben Kettner Volker Bachmann Cornelia Matthesius Gabi Münster Kai Gerlach Alexander Klein Björn Peters Christa Kurschat Christian Gräfe Dirk Hondong Dominik Petri Henrik Schütze Kai Michael Poppe Klaus Betzing Nadine Witthöft Tobias Blödt Rafael Dabrowski Tanja Salwiczek
  • 39. SQLSaturday #772 - Munich 27.10.2018 http://www.sqlsaturday.com/772
  • 40. PASS Deutschland e.V. For further information about future events, visit our PASS Deutschland e.V. booth in the exhibitor area.