Azure Cosmos DB is Microsoft's globally distributed, multi-model database service. In this session we covered how to model data in Azure Cosmos DB (a NoSQL database) and how it helps distributed applications maintain high availability, scale across multiple regions, and scale throughput.
5. Azure Cosmos DB
A globally distributed, massively scalable, multi-model database service
• Turnkey global distribution
• Elastic scale-out of storage & throughput
• Guaranteed low latency at the 99th percentile
• Five well-defined consistency models
• Comprehensive SLAs
• Data models: column-family, document, graph, key-value
• APIs: Table API (key-value), Cosmos DB's API for MongoDB (document)
8. Features
• Multi-model data paradigm: key-value, document, graph, column-family;
• Low latency at the 99th percentile: less than 10 ms for reads and less than 15 ms for (indexed) writes;
• Designed for high throughput;
• SLAs covering availability (up to 99.999%), consistency, throughput, and latency;
• Configurable throughput;
• Automatic replication (leader-follower);
• Automatic indexing of data;
• Configurable consistency: five levels (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual).
11. Resource Hierarchy
CONTAINERS
Logical resources "surfaced" to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers.
RESOURCE PARTITIONS
• Consistent, highly available, and resource-governed coordination primitives
• Consist of replica sets, with each replica hosting an instance of the database engine
[Figure: resource hierarchy showing tenants, containers (collections, tables, graphs), resource partitions, and a replica set made of a leader, followers, and a forwarder to remote resource partition(s)]
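The following is a minimal Python sketch that models the hierarchy as nested data classes. The class and method names (`Container`, `ResourcePartition`, `Replica`, `write`) are hypothetical. Copying every write from the leader to each follower is a simplifying assumption, not Cosmos DB's actual replication protocol. The forwarder to remote resource partitions is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # Each replica hosts one instance of the database engine (modelled here as a dict).
    role: str  # "leader" or "follower", as in the figure
    items: dict = field(default_factory=dict)


@dataclass
class ResourcePartition:
    # A replica set: the first replica is the leader, the rest are followers.
    replicas: list

    @classmethod
    def create(cls, followers: int = 2) -> "ResourcePartition":
        return cls([Replica("leader")] + [Replica("follower") for _ in range(followers)])

    def write(self, key: str, value: dict) -> None:
        # Simplification: the leader applies the write, then each follower copies it.
        for replica in self.replicas:
            replica.items[key] = value


@dataclass
class Container:
    # Surfaced to an API as a table, collection, or graph.
    name: str
    partitions: list


container = Container("users", [ResourcePartition.create() for _ in range(2)])
container.partitions[0].write("user-1", {"name": "John Smith"})
print([r.role for r in container.partitions[0].replicas])
```

In this model, a write touches only the replica set of one resource partition. The other partitions of the container are unaffected.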
19. Ways to Model Your Data
• Normalize everything (store related entities as separate documents)
• Embed as one piece (store related data in a single document)
20. Data Modelling: Relational vs. Document
User Table
UserID  Name        Dob
1       John Smith  8/30/1964

Holdings Table
StockID  UserID  Qty  Symbol
1        1       100  MSFT
2        1       75   WMT

Document
{
  "id": "1",
  "name": "John Smith",
  "dob": "1964-08-30",
  "holdings": [
    { "qty": 100, "symbol": "MSFT" },
    { "qty": 75, "symbol": "WMT" }
  ]
}

Relational Store         Document Store
Rows                     Documents
Columns                  Properties
Strongly-typed schemas   Schema-free
Highly normalized        Typically denormalized
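The step from the two tables to the single document is a join done once, at write time. The Python sketch below shows it. It uses plain dictionaries, not the Cosmos DB SDK, and the helper names are hypothetical. It converts the US-style date to ISO 8601 and emits `id` as a string, because Cosmos DB expects the `id` property to be a string.

```python
import json

# Rows from the User and Holdings tables above, as plain dictionaries.
users = [{"UserID": 1, "Name": "John Smith", "Dob": "8/30/1964"}]
holdings = [
    {"StockID": 1, "UserID": 1, "Qty": 100, "Symbol": "MSFT"},
    {"StockID": 2, "UserID": 1, "Qty": 75, "Symbol": "WMT"},
]


def to_iso_date(us_date: str) -> str:
    """Convert an M/D/YYYY date to ISO 8601 (YYYY-MM-DD)."""
    month, day, year = (int(part) for part in us_date.split("/"))
    return f"{year:04d}-{month:02d}-{day:02d}"


def embed_holdings(users, holdings):
    """Denormalize: embed each user's holdings inside that user's document."""
    documents = []
    for user in users:
        documents.append({
            "id": str(user["UserID"]),  # Cosmos DB expects a string id
            "name": user["Name"],
            "dob": to_iso_date(user["Dob"]),
            "holdings": [
                {"qty": h["Qty"], "symbol": h["Symbol"]}
                for h in holdings
                if h["UserID"] == user["UserID"]
            ],
        })
    return documents


print(json.dumps(embed_holdings(users, holdings)[0], indent=2))
```

The relational foreign key (`UserID` in the Holdings table) disappears from the embedded holdings, because each holding now lives inside its owner's document.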
21. Modelling challenges
• How to denormalize?
• How to normalize?
• To embed or to reference?
• Can I apply joins?
• Should different data types go in the same collection or in different ones?
23. When to embed?
o Data that is queried together should live together.
o Child data is dependent on the parent.
o 1:1 relationships, e.g. every customer has exactly one email address, phone number, and NRIC number.
o Data doesn't change frequently, e.g. email and address rarely change.
o Embedding usually provides better read performance at the cost of write performance, so it suits workloads that are not write-heavy.
24. When to reference?
o One-to-many relationships (unbounded)
o Many-to-many relationships
o Data changes at different rates
o The referenced data is heavily referenced by many other documents
o Typically provides better write performance
o But may require more network calls for reads
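The rules of thumb from the two slides above can be encoded as a small decision helper. The `Relationship` fields and the order in which the rules are checked are assumptions made for this sketch. A real modelling decision should still be checked against the actual workload.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    # Hypothetical description of how child data relates to its parent.
    cardinality: str        # "1:1", "1:many" (unbounded) or "many:many"
    queried_together: bool  # is the child usually read together with the parent?
    changes_often: bool     # does the child change at a different rate than the parent?
    shared_by_many: bool    # is the child heavily referenced by many other documents?
    write_heavy: bool       # is the workload dominated by writes?


def should_embed(rel: Relationship) -> bool:
    """Return True to embed the child data, False to reference it."""
    # Reference unbounded 1:many and many:many relationships.
    if rel.cardinality in ("1:many", "many:many"):
        return False
    # Reference data that is widely shared, changes at its own rate,
    # or lives in a write-heavy workload (referencing favours writes).
    if rel.shared_by_many or rel.changes_often or rel.write_heavy:
        return False
    # Embed data that is queried together; otherwise keep it separate (an assumption).
    return rel.queried_together


contact_details = Relationship("1:1", True, False, False, False)
order_history = Relationship("1:many", True, False, False, False)
print(should_embed(contact_details), should_embed(order_history))  # True False
```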
25. Partitioning: Why is the choice of partition key so important?
o It enables your data in Cosmos DB to scale.
o It has a large impact on the performance of the system.
What can go wrong?
o Hot partitions
o A poor choice forces many cross-partition queries for the workload
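One way to spot a hot partition before committing to a key is to count how requests would spread across partition key values. The sketch below is a hypothetical offline check over a sample of requests. The 2× mean threshold is an arbitrary assumption, not a Cosmos DB limit.

```python
from collections import Counter


def partition_load(requests, key):
    """Count how many requests would hit each partition key value."""
    return Counter(key(request) for request in requests)


def hot_partitions(load: Counter, factor: float = 2.0):
    """Key values receiving more than `factor` times the mean load (arbitrary threshold)."""
    mean = sum(load.values()) / len(load)
    return sorted(value for value, count in load.items() if count > factor * mean)


# Hypothetical sample: 8 of 10 requests come from Seattle.
cities = ["Seattle"] * 8 + ["London", "Tokyo"]
requests = [{"userId": f"user-{i}", "city": city} for i, city in enumerate(cities)]

print(hot_partitions(partition_load(requests, lambda r: r["city"])))    # ['Seattle']
print(hot_partitions(partition_load(requests, lambda r: r["userId"])))  # []
```

In this sample, keying by city concentrates most of the load on one logical partition. Keying by user spreads the load evenly.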
26. Partitioning
• Logical partition: stores all data associated with the same partition key value.
• Physical partition: a fixed amount of reserved SSD-backed storage + compute.
• Cosmos DB distributes logical partitions among a smaller number of physical partitions.
• From your perspective: define one partition key per container.
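A logical partition is simply the set of items that share one partition key value. The Python sketch below groups items that way. The property name `userId` is a hypothetical example. In Cosmos DB, the key is declared per container as a path such as `/userId`.

```python
from collections import defaultdict


def logical_partitions(items, partition_key: str):
    """Group items by partition key value: one group per logical partition."""
    groups = defaultdict(list)
    for item in items:
        groups[item[partition_key]].append(item)
    return dict(groups)


items = [
    {"id": "1", "userId": "john", "type": "profile"},
    {"id": "2", "userId": "john", "type": "order"},
    {"id": "3", "userId": "karen", "type": "profile"},
]
parts = logical_partitions(items, "userId")
print({key: len(group) for key, group in parts.items()})  # {'john': 2, 'karen': 1}
```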
27. Partition Key: User Id
Logical partitioning abstraction: a Cosmos DB container (e.g. a collection) partitioned by User Id.
Behind the scenes (physical partition sets): hash(User Id) gives a pseudo-random distribution of data over the range of possible hashed values.
28. Physical Partition Sets
[Figure: hash(User Id) places users (Melvin, Karen, John, Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …) into hash ranges Range 1, Range 2, … Range n, served by Physical Partition 1, Physical Partition 2, … Physical Partition n]
A frugal number of partitions, based on actual storage and throughput needs, yields scalability with a low total cost of ownership.
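The hash-range scheme can be sketched in two steps: hash the partition key value, then find which contiguous range of the hash space it falls in. The 32-bit hash space, the equal-width ranges, and MD5 as the hash function are assumptions for illustration. Cosmos DB uses its own hash function and manages the ranges itself.

```python
import hashlib
from bisect import bisect_right

HASH_SPACE = 2**32  # assumed size of the hash space for this sketch


def hash_key(value: str) -> int:
    # Stand-in hash (first 4 bytes of MD5); not Cosmos DB's actual hash function.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def make_ranges(n: int):
    """Split the hash space into n contiguous, equal-width ranges; return each start."""
    return [i * HASH_SPACE // n for i in range(n)]


def physical_partition(value: str, range_starts) -> int:
    """Index of the range (physical partition) whose interval contains hash(value)."""
    return bisect_right(range_starts, hash_key(value)) - 1


starts = make_ranges(3)
users = ["Melvin", "Karen", "John", "Dharma", "Shireesh", "Nilesh", "Sukhi", "Bob", "Milton"]
for user in users:
    print(user, "-> physical partition", physical_partition(user, starts))
```

Because the mapping depends only on the hash, every item with the same User Id lands in the same range. All of one logical partition therefore stays on one physical partition.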
31. Partition ranges of hash(User Id) can be dynamically sub-divided
• To seamlessly grow the database as the application grows
• While sedulously maintaining high availability
Best of all: partition management is completely taken care of by the system. You don't have to lift a finger… the database takes care of you.
[Figure: Partition X (Dharma, Shireesh, Nilesh, Sukhi, Bob, Milton, …) is split into Partition X1 (Dharma, Shireesh, …) and Partition X2 (Nilesh, Sukhi, …); the physical partition set's ranges become Range 1, Range 2, … Range X1, Range X2]
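A range split can be sketched the same way: sub-divide one range and move each key to the half that contains its hash. This sketch makes three assumptions. The split happens at the midpoint, the hash is an MD5-based stand-in, and partition X owns the whole hash space for simplicity. As a result, the names will not necessarily land in the same halves as in the figure.

```python
import hashlib


def hash_key(value: str) -> int:
    # Stand-in hash (first 4 bytes of MD5); not Cosmos DB's actual hash function.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


def split_range(start: int, end: int, keys):
    """Split partition range [start, end) at its midpoint into X1 and X2.

    Each key moves to whichever half contains its hash, so no key is lost or duplicated.
    """
    mid = (start + end) // 2
    x1 = [k for k in keys if start <= hash_key(k) < mid]
    x2 = [k for k in keys if mid <= hash_key(k) < end]
    return (start, mid, x1), (mid, end, x2)


partition_x = ["Dharma", "Shireesh", "Nilesh", "Sukhi", "Bob", "Milton"]
(lo1, hi1, x1), (lo2, hi2, x2) = split_range(0, 2**32, partition_x)
print("Partition X1:", x1)
print("Partition X2:", x2)
```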
33. Replication and Consistency
How do you ensure consistent reads across replicas?
- Define a consistency level.
Replication within a region
- Data moves extremely fast (typically within 1 ms) between neighboring racks.
Global replication
- It takes hundreds of milliseconds to move data across continents.
Stronger consistency: higher latency, lower availability.
Weaker consistency: lower latency, higher availability.
34. Well-Defined Consistency Models: Consistency Level Guarantees
Strong: Linearizability (once an operation completes, it is visible to all readers). No stale reads.
Bounded Staleness: Consistent prefix. Reads lag behind writes by at most K versions (updates) or a T time interval, so stale reads are possible but bounded by time and updates. Similar properties to strong consistency (except within the staleness window), while preserving 99.99% availability and low latency.
Session: Consistent prefix. Predictable consistency within a session, with high read throughput and low latency. No stale reads for the writer (read your own writes); stale reads are possible for other users.
Consistent Prefix: Reads never see out-of-order writes (no gaps).
Eventual: Potential for out-of-order reads. Lowest cost for reads of all consistency levels.
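The guarantees in the table can be approximated by a rule that decides whether a given replica may serve a read. This is a deliberately simplified model, not how Cosmos DB is implemented. It makes four assumptions:
- Writes carry increasing version numbers.
- Replicas apply them in order.
- Bounded staleness is checked only against K versions (the T interval is omitted).
- The session check assumes the client remembers the version of its own last write.

The names `can_serve_read`, `max_lag` and `session_version` are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Consistency(IntEnum):
    # Ordered from strongest to weakest, as in the table above.
    STRONG = 0
    BOUNDED_STALENESS = 1
    SESSION = 2
    CONSISTENT_PREFIX = 3
    EVENTUAL = 4


def can_serve_read(level: Consistency, leader_version: int, replica_version: int,
                   max_lag: Optional[int] = None,
                   session_version: Optional[int] = None) -> bool:
    """Decide whether a replica holding writes 1..replica_version may answer a read.

    Simplified model: replicas apply versioned writes in order, so each
    replica holds a prefix of the leader's log.
    """
    if level == Consistency.STRONG:
        # Linearizability: only a fully caught-up replica may answer.
        return replica_version == leader_version
    if level == Consistency.BOUNDED_STALENESS:
        if max_lag is None:
            raise ValueError("bounded staleness needs max_lag (K versions)")
        # Staleness bounded by K versions; the T time bound is not modelled.
        return leader_version - replica_version <= max_lag
    if level == Consistency.SESSION:
        if session_version is None:
            raise ValueError("session consistency needs the session's last version")
        # Read your own writes: the replica must include this session's latest write.
        return replica_version >= session_version
    # Consistent prefix / eventual: any replica may answer. The in-order model
    # already gives every replica a prefix; real eventual consistency does not.
    return True


print(can_serve_read(Consistency.STRONG, 10, 9))                        # False
print(can_serve_read(Consistency.BOUNDED_STALENESS, 10, 7, max_lag=3))  # True
```

The weaker the level, the more replicas qualify to answer. This is the latency and availability trade-off from the previous slide.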