Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Azure Cosmos DB – The Swiss Army
NoSQL Cloud Database
Steef-Jan Wiggers
https://nl.linkedin.com/in/steefjan
What can you expect?
• Introduction of Cosmos DB
• Scenarios
• Some In-depth discussion of capabilities (Scale, Consistenc...
Introduction
Evolution
Service
Customers
Scenarios
Azure Cosmos DB Evolution
Originally started to address the
problems faced by large scale apps
inside Microsoft
• Built fr...
Cosmos DB Service
5
Column-family
Document
Graph
Turnkey global distribution
Elastic scale out
of storage & throughput
Gua...
Customers
Scenario - Telemetry&SensorData
Azure IoT Hub
Apache Storm on
Azure HDInsight
Azure Cosmos DB
(telemetry and
device state)...
Scenario – Order processing
Azure Functions
(E-Commerce Checkout API)
Azure Cosmos DB
(Order Event Store)
Azure Functions
...
Scenario – Multiplayer gaming
Azure Cosmos DB
(game database)
Azure HDInsight
(game analytics)
Azure CDN
Azure API Apps
(g...
Capabilities
Global distribution
Resource model
Scale
Consistency
Indexing
Turnkey Global Distribution
High Availability
• Automatic and Manual Failover
• Multi-homing API removes need for app rede...
Global Distribution
• Automatic and transparent replication
worldwide
• Each partition hosts a replica set per
region
• Cu...
Replication globally
Resource model
• Resources identified by their logical
and stable URI
• Hierarchical overlay over horizontally
partitioned...
Account URI and Credentials
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
********.azure...
Creating Account
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
Database representation
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
DatabaseDatabaseCo...
Container representation
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
= Collection Grap...
Creating collection – SQL API
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem
Container level resources
Account
DatabaseDatabaseDatabase
DatabaseDatabaseContainer
DatabaseDatabaseItem ConflictSproc Tr...
System topology (behind the scenes)
Resource Hierarchy
Containers
Resource Partitions
CollectionsTables Graphs
Tenants
CONTAINERS
Logical resources “surfaced”...
Scale
• Pay as go for storage and throughput
• Elastic scale across regions
• Partitions
Horizontal Scaling
• Containers are horizontally
partitioned
• Each partition made highly available
via a replica set
• Pa...
Elastic Scale Out of Storage and
Throughput
SCALES AS YOUR APPS’ NEEDS CHANGE
• Database elastically scales storage and
th...
Partitions
Cosmos DB Container
(e.g. Collection)
Partition Key: User ID
Logical Partitioning Abstraction
Behind the Scenes...
Partitions
…
Partition 1 Partition 2 Partition n
Frugal # of Partitions based on actual storage and throughput needs
(yiel...
Partitions
Best Practices: Design Goals for Choosing a Good Partition Key
Distribute the overall request + storage volume
...
Consistency
• Global distribution forces us to navigate the CAP
theorem
• Writing correct distributed applications is hard...
Programmable data consistency
Strong consistency
High latency
Eventual consistency,
Low latency
Well-defined consistency models
• Intuitive programming model
• 5 Well-defined, consistency models
• Overridable on a per-...
5 Well-defined, consistency models
PACELC Theorem: In the case of network partitioning (P) in a distributed computer
syste...
Consistency levels
Choose the right consistency
• Five well-defined, consistency models
• Overridable on a per-request basis
• Provides contr...
Handle any data with no schema or
indexing required
Azure Cosmos DB’s schema-less service automatically indexes all your d...
Indexing Json Documents
{
"locations": [
{
"country": "Germany",
"city": "Berlin"
},
{
"country": "France",
"city": "Paris...
Indexing Json Documents
{
"locations": [
{
"country": "Germany",
"city": "Bonn",
"revenue": 200
}
],
"headquarter": "Italy...
Indexing Json Documents
locations headquarter exports
0
country city
Germany Bonn
revenue
200
0 1
citycity
Berlin
Italy
de...
Inverted Index
locations headquarter exports
0
country city
Germany
Berlin
revenue
200
0 1
city
Athens
city
Berlin
Italy
d...
Index Policies
CUSTOM INDEXING POLICIES
Though all Azure Cosmos DB data is indexed by default, you
can specify a custom in...
Swiss Army
Knife
Multi-model + Multi API
Reference Case
Demos
Multi-model + multi-API
• Different models:
• Graph
• Key-Value
• Document DB
• No schema or index management
• Automatic ...
Mimic strategy
• MongoDB
• Cassandra
What model?
Graph model
A graph is a structure that's composed of vertices and
edges. Both vertices and edges can have an arbitrary
nu...
Demo - Graph
Use case – Cosmos DB Graph
https://customers.microsoft.com/en-in/story/reed-business-information-professional-services-azu...
Document (JSON)
• A schema-less JSON database engine with rich SQL querying capabilities.
• Store documents
• Searchable b...
Demo - Home Automation
data
data
data
data
start/stop
data
start/stop
start/stop
Cosmos DB
Event Hubs
IoT Hub
Functions
da...
Logic App Cosmos DB Connector
• Cosmos DB connector is a standard connector.
• Several operation including create and upda...
Demo – Data Ingestion
Convert Epoch to
DateTime
Ingest
Pull Push Pull Push
Function A
Store in Collection
More…
Request Units
Latency
SLA
Request Units
• Request Units (RU) is a rate-based
currency
• Abstracts physical resources for
performing requests
• Key t...
Request Units
• Normalized across various access
methods
• 1 RU = 1 read of 1 KB document
• Each request consumes fixed RU...
Request Units
• Provisioned in terms of RU/sec and RU/min
granularities
• Rate limiting based on amount of throughput
prov...
Latency
• Simultaneously read/write
• Latch free and write-optimized
• 10-ms latency on read
• 15-ms latency on writes
~ 1...
Performance Demo
Rules
Subscription0
Subscription1
Subscription2
Subscription3
Subscription4
Cosmos DB
Database
Document C...
SLA
• Fully managed service
• 99,99% SLA for latency
• Guaranteed throughput,
consistency and high
availability
Key Takeaways
• Multiple options with models and API’s
• Various consistency models
• Global scale
• Flexible through put
...
Call to action
• Documentation: https://docs.microsoft.com/en-us/azure/
• Middleware Friday: https://www.youtube.com/watch...
Thanks for watching!
facebook.com/BizTalk360
twitter.com/BizTalk360
http://www.biztalk360.com
Upcoming SlideShare
Loading in …5
×

Azure Cosmos DB - The Swiss Army NoSQL Cloud Database

228 views

Published on

Microsoft Cosmos DB is the Swiss army NoSQL database in the cloud. It is a multi-model, multi-API, globally-distributed, highly-available, and secure No-SQL database in Azure. In this session, we will explore its capabilities and features through several demos.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Azure Cosmos DB - The Swiss Army NoSQL Cloud Database

  1. 1. Azure Cosmos DB – The Swiss Army NoSQL Cloud Database Steef-Jan Wiggers https://nl.linkedin.com/in/steefjan
  2. 2. What can you expect? • Introduction of Cosmos DB • Scenarios • Some In-depth discussion of capabilities (Scale, Consistency, Indexing, Models, API) • Reference case • RU/s, Latency, SLA • Demos
  3. 3. Introduction Evolution Service Customers Scenarios
  4. 4. Azure Cosmos DB Evolution Originally started to address the problems faced by large scale apps inside Microsoft • Built from the ground up for the cloud • Used extensively inside Microsoft • One of the fastest growing services on Azure 2010 2014 2015 2017 DocumentDB Cosmos DBProject Florence
  5. 5. Cosmos DB Service 5 Column-family Document Graph Turnkey global distribution Elastic scale out of storage & throughput Guaranteed low latency at the 99th percentile Comprehensive SLAs Five well-defined consistency models Table API Key-value MongoDB API A globally distributed, massively scalable, multi-model database service Cassandra API
  6. 6. Customers
  7. 7. Scenario - Telemetry&SensorData Azure IoT Hub Apache Storm on Azure HDInsight Azure Cosmos DB (telemetry and device state) events Azure Web Jobs (Change feed processor) Azure Function latest state Azure Data Lake (archival)
  8. 8. Scenario – Order processing Azure Functions (E-Commerce Checkout API) Azure Cosmos DB (Order Event Store) Azure Functions (Microservice 1: Tax) Azure Functions (Microservice 2: Payment) Azure Functions (Microservice N: Fufillment) . . .
  9. 9. Scenario – Multiplayer gaming Azure Cosmos DB (game database) Azure HDInsight (game analytics) Azure CDN Azure API Apps (game backend) Azure Storage (game files) Azure Notification Hubs (push notifications) Azure Functions Azure Traffic Manager
  10. 10. Capabilities Global distribution Resource model Scale Consistency Indexing
  11. 11. Turnkey Global Distribution High Availability • Automatic and Manual Failover • Multi-homing API removes need for app redeployment Low Latency (anywhere in the world) • Packets cannot move fast than the speed of light • Sending a packet across the world under ideal network conditions takes 100’s of milliseconds • You can cheat the speed of light – using data locality • CDN’s solved this for static content • Azure Cosmos DB solves this for dynamic content
  12. 12. Global Distribution • Automatic and transparent replication worldwide • Each partition hosts a replica set per region • Customers can test end to end application availability by programmatically simulating failovers • All regions are hidden behind a single global URI with multi-homing capabilities • Customers can dynamically add / remove additional regions at any time Writes/ Reads Reads "airport" : “AMS" "airport" : “MEL" West US Container "airport" : "LAX" Local Distribution (via horizontal partitioning) GlobalDistribution(ofresourcepartitions) Reads 30K transactions/sec Writes/ Reads Reads Reads West Europ 30K transactions/sec Partition-key = "airport"
  13. 13. Replication globally
  14. 14. Resource model • Resources identified by their logical and stable URI • Hierarchical overlay over horizontally partitioned entities; spanning machines, clusters and regions • Extensible custom projections based on specific type of API interface • Stateless interaction (HTTP and TCP) Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  15. 15. Account URI and Credentials Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem ********.azure.com IGeAvVUp …
  16. 16. Creating Account Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  17. 17. Database representation Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem DatabaseDatabaseContainer DatabaseDatabaseItem
  18. 18. Container representation Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem = Collection Graph Table
  19. 19. Creating collection – SQL API Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem
  20. 20. Container level resources Account DatabaseDatabaseDatabase DatabaseDatabaseContainer DatabaseDatabaseItem ConflictSproc Trigger UDF
  21. 21. System topology (behind the scenes)
  22. 22. Resource Hierarchy Containers Resource Partitions CollectionsTables Graphs Tenants CONTAINERS Logical resources “surfaced” to APIs as tables, collections or graphs, which are made up of one or more physical partitions or servers. RESOURCE PARTITIONS Consistent, highly available, and resource- governed coordination primitives Consist of replica sets, with each replica hosting an instance of the database engine Leader Follower Follower Forwarder Replica Set To remote resource partition(s)
  23. 23. Scale • Pay as go for storage and throughput • Elastic scale across regions • Partitions
  24. 24. Horizontal Scaling • Containers are horizontally partitioned • Each partition made highly available via a replica set • Partition management is transparent and highly responsive • Partitioning scheme is dictated by a “partition-key” (partition-key = “airport”) Replica set Resource Partitions {"airport":“DUB"} Container = Collection Graph Table
  25. 25. Elastic Scale Out of Storage and Throughput SCALES AS YOUR APPS’ NEEDS CHANGE • Database elastically scales storage and throughput • How? Scale-out! • Collections can span across large clusters of machines • Can start small and seamlessly grow as your app grows
  26. 26. Partitions Cosmos DB Container (e.g. Collection) Partition Key: User ID Logical Partitioning Abstraction Behind the Scenes: Physical Partition Sets hash(User ID) Psuedo-random distribution of data over range of possible hashed values
  27. 27. Partitions … Partition 1 Partition 2 Partition n Frugal # of Partitions based on actual storage and throughput needs (yielding scalability with low total cost of ownership) hash(User ID) Pseudo-random distribution of data over range of possible hashed values Andrew Mike … Bob Dharma Shireesh Karthik Rimma Alice Carol …
  28. 28. Partitions Best Practices: Design Goals for Choosing a Good Partition Key Distribute the overall request + storage volume Avoid “hot” partition keys Steps for Success • Ballpark scale needs (size/throughput) • Understand the workload • # of reads/sec vs writes per sec • Use pareto principal (80/20 rule) to help optimize bulk of workload • For reads – understand top 3-5 queries (look for common filters) • For writes – understand transactional needs General Tips • Build a POC to strengthen your understanding of the workload and iterate (avoid analyses paralysis) • Don’t be afraid of having too many partition keys • Partitions keys are logical • More partition keys more scalability • Partition Key is scope for multi-record transactions and routing queries • Queries can be intelligently routed via partition key • Omitting partition key on query requires fan-out
  29. 29. Consistency • Global distribution forces us to navigate the CAP theorem • Writing correct distributed applications is hard • Five well-defined consistency levels • Intuitive and practical with clear PACELC tradeoffs • Programmatically change at anytime • Can be overridden on a per-request basis
  30. 30. Programmable data consistency Strong consistency High latency Eventual consistency, Low latency
  31. 31. Well-defined consistency models • Intuitive programming model • 5 Well-defined, consistency models • Overridable on a per-request basis Clear tradeoffs • Latency • Availability • Throughput
  32. 32. 5 Well-defined, consistency models PACELC Theorem: In the case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).
  33. 33. Consistency levels
  34. 34. Choose the right consistency • Five well-defined, consistency models • Overridable on a per-request basis • Provides control over performance-consistency tradeoffs, backed by comprehensive SLAs. • An intuitive programming model offering low latency and high availability for your planet-scale app. CLEAR TRADEOFFS • Latency • Availability • Throughput Strong Bounded-staleness Session Consistent prefix Eventual
  35. 35. Handle any data with no schema or indexing required Azure Cosmos DB’s schema-less service automatically indexes all your data, regardless of the data model, to delivery blazing fast queries. • Automatic index management • Synchronous auto-indexing • No schemas or secondary indices needed • Works across every data model
  36. 36. Indexing Json Documents { "locations": [ { "country": "Germany", "city": "Berlin" }, { "country": "France", "city": "Paris" } ], "headquarter": "Belgium", "exports": [ { "city": "Moscow" }, { "city": "Athens" } ] } locations headquarter exports 0 country city Germany Berlin 1 country city France Paris 0 1 city Athens city Moscow Belgium
  37. 37. Indexing Json Documents { "locations": [ { "country": "Germany", "city": "Bonn", "revenue": 200 } ], "headquarter": "Italy", "exports": [ { "city": "Berlin", "dealers": [ { "name": "Hans" } ] }, { "city": "Athens" } ] } locations headquarter exports 0 country city Germany Bonn revenue 200 0 1 citycity Berlin Italy dealers 0 name Hans
  38. 38. Indexing Json Documents locations headquarter exports 0 country city Germany Bonn revenue 200 0 1 citycity Berlin Italy dealers 0 name Hans locations headquarter exports 0 country city Germany Berlin 1 country city France Paris 0 1 city Athens city Moscow Belgium
  39. 39. Inverted Index locations headquarter exports 0 country city Germany Berlin revenue 200 0 1 city Athens city Berlin Italy dealers 0 name Hans Bonn 1 country city France Paris Belgium Moscow
  40. 40. Index Policies CUSTOM INDEXING POLICIES Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility. • Define trade-offs between storage, write and query performance, and query consistency • Include or exclude documents and paths to and from the index • Configure various index types { "automatic": true, "indexingMode": "Consistent", "includedPaths": [{ "path": "/*", "indexes": [{ "kind": "Hash", "dataType": "String", "precision": -1 }, { "kind": "Range", "dataType": "Number", "precision": -1 }, { "kind": "Spatial", "dataType": "Point" }] }], "excludedPaths": [{ "path": "/nonIndexedContent/*" }] }
  41. 41. Swiss Army Knife Multi-model + Multi API Reference Case Demos
  42. 42. Multi-model + multi-API • Different models: • Graph • Key-Value • Document DB • No schema or index management • Automatic indexing • API support: • SQL • JavaScript • Gremlin • MongoDB • Azure Table Storage • Cassandra
  43. 43. Mimic strategy • MongoDB • Cassandra
  44. 44. What model?
  45. 45. Graph model A graph is a structure that's composed of vertices and edges. Both vertices and edges can have an arbitrary number of properties. • Vertices - Vertices denote discrete objects, such as a person, a place, or an event. • Edges - Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, and recently been at a location. • Properties - Properties express information about the vertices and edges.
  46. 46. Demo - Graph
  47. 47. Use case – Cosmos DB Graph https://customers.microsoft.com/en-in/story/reed-business-information-professional-services-azure
  48. 48. Document (JSON) • A schema-less JSON database engine with rich SQL querying capabilities. • Store documents • Searchable by integrating with Azure Search • Easy integration with Azure Functions • Change Feed
  49. 49. Demo - Home Automation data data data data start/stop data start/stop start/stop Cosmos DB Event Hubs IoT Hub Functions data data notify
  50. 50. Logic App Cosmos DB Connector • Cosmos DB connector is a standard connector. • Several operation including create and update document. • Note: The maximum size of a document that is supported by the Document DB (Azure Cosmos DB) connector is 2 MB.
  51. 51. Demo – Data Ingestion Convert Epoch to DateTime Ingest Pull Push Pull Push Function A Store in Collection
  52. 52. More… Request Units Latency SLA
  53. 53. Request Units • Request Units (RU) is a rate-based currency • Abstracts physical resources for performing requests • Key to multi-tenancy, SLAs, and COGS efficiency • Foreground and background activities % IOPS % CPU % Memory
  54. 54. Request Units • Normalized across various access methods • 1 RU = 1 read of 1 KB document • Each request consumes fixed RUs • Applies to reads, writes, queries, and stored procedure execution GET POST PUT Query
  55. 55. Request Units • Provisioned in terms of RU/sec and RU/min granularities • Rate limiting based on amount of throughput provisioned • Can be increased or decreased instantaneously • Metered Hourly • Background processes like TTL expiration, index transformations scheduled when quiescent Min RU/sec Max RU/sec IncomingRequests Replica Quiescent Rate limit No throttling
  56. 56. Latency • Simultaneously read/write • Latch free and write-optimized • 10-ms latency on read • 15-ms latency on writes ~ 100 reads per second ~ 70 writes per second
  57. 57. Performance Demo Rules Subscription0 Subscription1 Subscription2 Subscription3 Subscription4 Cosmos DB Database Document Collection
  58. 58. SLA • Fully managed service • 99,99% SLA for latency • Guaranteed throughput, consistency and high availability
  59. 59. Key Takeaways • Multiple options with models and API’s • Various consistency models • Global scale • Flexible through put • Support for diverse scenario’s
  60. 60. Call to action • Documentation: https://docs.microsoft.com/en-us/azure/ • Middleware Friday: https://www.youtube.com/watch?v=ZplZgOoGUzY • Graph demo: https://github.com/anthonychu/cosmosdb-gremlin-flights • Pluralsight: https://www.pluralsight.com/courses/azure-cosmos-db • Channel 9: https://channel9.msdn.com/Events/Build/2018/BRK3319
  61. 61. Thanks for watching! facebook.com/BizTalk360 twitter.com/BizTalk360 http://www.biztalk360.com

×