[db tech showcase Tokyo 2019] Azure Cosmos DB Deep Dive ~ Partitioning, Global Distribution and Indexing ~

https://satonaoki.wordpress.com/2019/09/30/dbts2019-azure-cosmos-db-deep-dive/

Published in: Software

  1. 1. Azure Cosmos DB Deep Dive ~ Partitioning, Global Distribution and Indexing ~ SATO Naoki (Neo) (@satonaoki) Azure Technologist, Microsoft
  2. 2. Agenda: Overview, Partitioning Strategies, Global Distribution, Indexing
  3. 3. Azure Cosmos DB Overview
  4. 4. Partitioning Strategies
  5. 5. Overview of partitioning
  6. 6. Overview of partitioning. [Diagram: a container with 15,000 RUs split across physical partition 1 (7,500 RUs) and physical partition 2 (7,500 RUs); one client application writes, another client application reads.]
  7. 7. Overview of partitioning. The application writes data and provides a partition key value with every item. [Diagram as on slide 6.]
  8. 8. Overview of partitioning. Cosmos DB uses the partition key value to route data to a partition. [Diagram as on slide 6.]
  9. 9. Overview of partitioning. Every partition can store up to 50 GB of data and serve up to 10,000 RU/s. [Diagram as on slide 6.]
  10. 10. Overview of partitioning. The total throughput for the container will be divided evenly across all partitions. [Diagram as on slide 6.]
  11. 11. Overview of partitioning. If more data or throughput is needed, Cosmos DB will add a new partition automatically. [Diagram: the container's 15,000 RUs are now split across physical partitions 1, 2, and 3 at 5,000 RUs each.]
  12. 12. Overview of partitioning. The data will be redistributed as a result. [Diagram as on slide 11.]
  13. 13. Overview of partitioning. The total throughput capacity will be divided evenly between all partitions. [Diagram as on slide 11.]
  14. 14. Overview of partitioning. To read data efficiently, the app must provide the partition key of the documents it is requesting. [Diagram as on slide 11.]
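The arithmetic behind slides 6 to 14 can be sketched in a few lines. This is a minimal sketch, assuming the per-partition limits quoted on slide 9 (50 GB and 10,000 RU/s) and an even split of throughput; the function names are hypothetical, and the real service decides partition counts internally.

```python
import math

# Limits quoted on slide 9; treat them as assumptions that may change.
MAX_STORAGE_GB_PER_PARTITION = 50
MAX_RU_PER_PARTITION = 10_000


def min_physical_partitions(storage_gb: float, provisioned_ru: int) -> int:
    """Smallest partition count that satisfies both the storage and RU limits."""
    by_storage = math.ceil(storage_gb / MAX_STORAGE_GB_PER_PARTITION)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PARTITION)
    return max(1, by_storage, by_throughput)


def ru_per_partition(provisioned_ru: int, partition_count: int) -> float:
    """Throughput each partition receives when the total is divided evenly."""
    return provisioned_ru / partition_count


if __name__ == "__main__":
    # 15,000 RUs over two partitions, then over three after a partition is added.
    print(ru_per_partition(15_000, 2))
    print(ru_per_partition(15_000, 3))
```

Because the split is even, adding a partition lowers the throughput available to each existing partition unless the container's total is raised as well.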
  15. 15. How is data distributed?
  16. 16. How is data distributed? [Diagram: data with partition keys passes through a hashing algorithm {#} into a range of partition addresses, which map to physical partitions.]
  17. 17. How is data distributed? Whenever a document is inserted, its partition key value (for example, pk = 1) is checked and the document is assigned to a physical partition. [Diagram as on slide 16.]
  18. 18. How is data distributed? The item will be assigned to a partition based on its partition key (pk = 1). [Diagram as on slide 16.]
  19. 19. How is data distributed? All partition key values will be distributed amongst the physical partitions. [Diagram as on slide 16.]
  20. 20. How is data distributed? However, items with the exact same partition key value (pk = 1 and pk = 1) will be co-located. [Diagram as on slide 16.]
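The routing described on slides 16 to 20 can be illustrated with a small hash-range router. This is a sketch under stated assumptions: MD5 stands in for the service's hashing algorithm (which the slides do not name), the address space is an arbitrary 32-bit range, and `build_ranges` and `route` are hypothetical helper names.

```python
import hashlib
from bisect import bisect_right
from typing import List

HASH_SPACE = 2 ** 32  # assumed size of the range of partition addresses


def hash_partition_key(pk: str) -> int:
    """Map a partition key value to an address (MD5 is a stand-in hash)."""
    digest = hashlib.md5(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ranges(partition_count: int) -> List[int]:
    """Exclusive upper bounds of equally sized address ranges, one per partition."""
    step = HASH_SPACE // partition_count
    return [step * (i + 1) for i in range(partition_count - 1)] + [HASH_SPACE]


def route(pk: str, boundaries: List[int]) -> int:
    """Index of the physical partition whose address range contains the key's hash."""
    return bisect_right(boundaries, hash_partition_key(pk))


if __name__ == "__main__":
    bounds = build_ranges(3)
    for key in ["1", "1", "2", "tenant-42"]:
        print(key, "->", route(key, bounds))
```

Two items with the same partition key value always hash to the same address, which is why they end up co-located.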
  21. 21. How are partitions managed?
  22. 22. First scenario: Splitting partitions
  23. 23. Partitioning dynamics, scenario 1. [Diagram labels: Sri, Tim, Thomas; client application (write).]
  24. 24. Partitioning dynamics, scenario 1. All partitions are almost full of data.
  25. 25. Partitioning dynamics, scenario 1. In order to insert this document, we need to increase the total capacity.
  26. 26. Partitioning dynamics, scenario 1. We have added a new empty partition for the new document.
  27. 27. Partitioning dynamics, scenario 1. And now we will take the largest partition and re-balance it with the new one.
  28. 28. Partitioning dynamics, scenario 1. Now that it's re-balanced, we can keep inserting new data.
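The split in scenario 1 can be modeled as taking the fullest partition and handing the upper half of its address range to a new partition. This is only an illustrative sketch: the `Partition` class, the midpoint split, and the use of item count as the measure of "largest" are assumptions, not a description of how the service re-balances internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    lo: int  # inclusive lower bound of the hash range
    hi: int  # exclusive upper bound of the hash range
    items: Dict[int, str] = field(default_factory=dict)  # hash -> document


def split_largest(partitions: List[Partition]) -> Partition:
    """Split the partition holding the most items at the midpoint of its range."""
    idx = max(range(len(partitions)), key=lambda i: len(partitions[i].items))
    old = partitions[idx]
    mid = (old.lo + old.hi) // 2
    # Items whose hash falls in the upper half move to the new partition.
    new = Partition(mid, old.hi, {h: d for h, d in old.items.items() if h >= mid})
    old.items = {h: d for h, d in old.items.items() if h < mid}
    old.hi = mid
    partitions.insert(idx + 1, new)
    return new


if __name__ == "__main__":
    parts = [
        Partition(0, 100, {10: "Sri", 20: "Tim", 60: "Thomas", 90: "Ana"}),
        Partition(100, 200, {150: "Bo"}),
    ]
    split_largest(parts)
    for p in parts:
        print(p.lo, p.hi, sorted(p.items.values()))
```

No item is lost or duplicated by the split; only its owning partition changes, which matches the "re-balance, then keep inserting" flow on the slides.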
  29. 29. Second scenario: Adding more throughput
  30. 30. Cosmos DB Data Explorer
  31. 31. All scale settings can be modified using the Data Explorer.
  32. 32. All scale settings can be modified using the Data Explorer. They can also be modified programmatically via the SDK or Azure CLI.
  33. 33. Throughput has a lower and upper limit.
  34. 34. Throughput has a lower and upper limit. The lower limit is determined by the current number of physical partitions.
  35. 35. Throughput has a lower and upper limit. The lower limit is determined by the current number of physical partitions. Raising the upper limit adds new partitions.
  36. 36. When the limit is set beyond the current capacity, more physical partitions will be added. This process can take a few to several minutes.
  37. 37. Best practices
  38. 38. Best practices
  39. 39. Best practices
  40. 40. Best practices
  41. 41. Best practices
  42. 42. Best practices
  43. 43. Best practices
  44. 44. To do this, go to the Metrics blade in the Azure Portal
  45. 45. Then select the Storage tab and select your desired container
  46. 46. An efficient partitioning strategy has a close to even distribution.
  47. 47. An efficient partitioning strategy has a close to even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges.
  48. 48. An efficient partitioning strategy has a close to even distribution. An inefficient partitioning strategy is the main source of cost and performance challenges. A random partition key can provide an even data distribution.
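One way to reason about "close to even distribution" is to compare the fullest partition with the ideal even share. The sketch below is an assumption-laden illustration: MD5 modulo the partition count stands in for the service's routing, sequential ids stand in for random keys so the result is reproducible, and `skew` is a hypothetical metric rather than anything the portal reports.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_of(pk: str, partition_count: int) -> int:
    """Assign a partition key value to a partition (stand-in hash and modulo)."""
    h = int.from_bytes(hashlib.md5(pk.encode("utf-8")).digest()[:4], "big")
    return h % partition_count


def skew(keys: Iterable[str], partition_count: int) -> float:
    """Ratio of the fullest partition to the ideal even share (1.0 is perfectly even)."""
    keys = list(keys)
    counts = Counter(partition_of(k, partition_count) for k in keys)
    ideal = len(keys) / partition_count
    return max(counts.values()) / ideal


if __name__ == "__main__":
    # Low-cardinality key: most items share one value, so one partition runs hot.
    by_country = ["JP"] * 900 + ["US"] * 100
    # High-cardinality key: one distinct value per item.
    by_item_id = [f"item-{i}" for i in range(1000)]
    print("country key skew:", skew(by_country, 4))
    print("item id key skew:", skew(by_item_id, 4))
```

A high-cardinality key spreads storage and writes well, but, as slide 14 notes, reads are only efficient when the application can supply the partition key it is looking for.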
  49. 49. Best practices
  50. 50. Best practices
  51. 51. Best practices
  52. 52. How to deal with multi-tenancy?
  53. 53. Multi-tenancy options:
      Database Account (per tenant). Isolation knobs: independent geo-replication knobs; multiple throughput knobs (dedicated throughput, eliminating noisy neighbors). Throughput requirements: >400 RUs per tenant (> $24 per tenant). T-shirt size: Large. Example: premium offer for B2B apps.
      Container w/ Dedicated Throughput (per tenant). Isolation knobs: independent throughput knobs (dedicated throughput, eliminating noisy neighbors); group tenants within database account(s) based on regional needs. Throughput requirements: >400 RUs per tenant (> $24 per tenant). T-shirt size: Large. Example: premium offer for B2B apps.
      Container w/ Shared Throughput (per tenant). Isolation knobs: share throughput across tenants grouped by database (great for lowering cost on "spiky" tenants); easy management of tenants (drop container when tenant leaves); mitigate noisy-neighbor blast radius (group tenants by database). Throughput requirements: >100 RUs per tenant (> $6 per tenant). T-shirt size: Medium. Example: standard offer for B2B apps.
      Partition Key (per tenant). Isolation knobs: share throughput across tenants grouped by container (great for lowering cost on "spiky" tenants); enables easy queries across tenants (containers act as boundary for queries); mitigate noisy-neighbor blast radius (group tenants by container). Throughput requirements: >0 RUs per tenant (> $0 per tenant). T-shirt size: Small. Example: B2C apps.
  54. 54. Global Distribution
  55. 55. Consistency, Latency, Availability
  56. 56. ACID: Atomicity, Consistency, Isolation, Durability
  57. 57. Master Replica
  58. 58. Master Replica
  59. 59. PACELC: in the case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C); else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).
  60. 60. Master Replica
  61. 61. Master Replica
  62. 62. Read Latency
  63. 63. Demo: read latency with a single region vs. multi-region
  64. 64. Write Latency
  65. 65. Region A Region B Region C Azure Traffic Manager Master (read/write) Master (read/write) Master (read/write) Master (read/write) Replica (read) Replica (read)
  66. 66. Demo: write latency for single-write vs. multi-write
  67. 67. Consistency
  68. 68. Consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix, Eventual
  69. 69. Quorum reads and quorum writes by consistency level:
      Strong: reads Local Minority (2 RU); writes Global Majority (1 RU)
      Bounded Staleness: reads Local Minority (2 RU); writes Local Majority (1 RU)
      Session: reads Single replica using session token (1 RU); writes Local Majority (1 RU)
      Consistent Prefix: reads Single replica (1 RU); writes Local Majority (1 RU)
      Eventual: reads Single replica (1 RU); writes Local Majority (1 RU)
      [Diagram labels: forwarder, follower, follower]
  70. 70. Demo: consistency vs. latency; consistency vs. throughput
  71. 71. Availability
  72. 72. [Architecture diagram: devices (mobile, browser) reach the application over the Internet through Traffic Manager. Each of three regions (West US 2, North Europe, Southeast Asia) contains an Application Gateway, a web tier, a load balancer, a middle tier, and Cosmos DB.]
  73. 73. [Diagram: a timeline around a disaster. RPO covers the data lost before the disaster; RTO covers the downtime after it.]
  74. 74. RPO and RTO by configuration:
      Region(s) | Mode | Consistency | RPO | RTO
      1 | Any | Any | < 240 minutes | < 1 week
      >1 | Single Master | Session, Consistent Prefix, Eventual | < 15 minutes | < 15 minutes
      >1 | Single Master | Bounded Staleness | K & T* | < 15 minutes
      >1 | Single Master | Strong | 0 | < 15 minutes
      >1 | Multi Master | Session, Consistent Prefix, Eventual | < 15 minutes | 0
      >1 | Multi Master | Bounded Staleness | K & T* | 0
      >1 | Multi Master | Strong | N/A | < 15 minutes
      Partition? Yes: Availability vs. Consistency. No: Latency vs. Consistency.
      *Number of "K" updates of an item or "T" time. In >1 regions, K=100,000 updates or T=5 minutes.
  75. 75. Indexing
  76. 76. Azure Cosmos DB's schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing fast queries.
      • Automatic index management
      • Synchronous auto-indexing
      • No schemas or secondary indices needed
      • Works across every data model
      Item | Color | Microwave safe | Liquid capacity | CPU | Memory | Storage
      Geek mug | Graphite | Yes | 16oz | ??? | ??? | ???
      Coffee Bean mug | Tan | No | 12oz | ??? | ??? | ???
      Surface book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD
  77. 77. Custom Indexing Policies. Though all Azure Cosmos DB data is indexed by default, you can specify a custom indexing policy for your collections. Custom indexing policies allow you to design and customize the shape of your index while maintaining schema flexibility.
      • Define trade-offs between storage, write and query performance, and query consistency
      • Include or exclude documents and paths to and from the index
      • Configure various index types
      {
        "automatic": true,
        "indexingMode": "Consistent",
        "includedPaths": [{
          "path": "/*",
          "indexes": [
            { "kind": "Range", "dataType": "String", "precision": -1 },
            { "kind": "Range", "dataType": "Number", "precision": -1 },
            { "kind": "Spatial", "dataType": "Point" }
          ]
        }],
        "excludedPaths": [{ "path": "/nonIndexedContent/*" }]
      }
  78. 78. { "locations": [ { "country": "Germany", "city": "Berlin" }, { "country": "France", "city": "Paris" } ], "headquarter": "Belgium", "exports": [ { "city": "Moscow" }, { "city": "Athens" } ] }
      [Index tree paths: /locations/0/country/Germany, /locations/0/city/Berlin, /locations/1/country/France, /locations/1/city/Paris, /headquarter/Belgium, /exports/0/city/Moscow, /exports/1/city/Athens]
  79. 79. { "locations": [ { "country": "Germany", "city": "Bonn", "revenue": 200 } ], "headquarter": "Italy", "exports": [ { "city": "Berlin", "dealers": [ { "name": "Hans" } ] }, { "city": "Athens" } ] }
      [Index tree paths: /locations/0/country/Germany, /locations/0/city/Bonn, /locations/0/revenue/200, /headquarter/Italy, /exports/0/city/Berlin, /exports/0/dealers/0/name/Hans, /exports/1/city/Athens]
  80. 80. [Diagram: the index trees of the two documents from slides 78 and 79, shown side by side.]
  81. 81. [Diagram: the two trees merged into one index tree. Shared paths: /locations/0/country/Germany and /exports/1/city/Athens. Shared nodes with several values: /locations/0/city (Berlin, Bonn), /headquarter (Belgium, Italy), /exports/0/city (Moscow, Berlin). Paths from one document only: /locations/0/revenue/200, /locations/1/country/France, /locations/1/city/Paris, /exports/0/dealers/0/name/Hans.]
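The tree-merging idea on slides 78 to 81 can be approximated by flattening each document into root-to-leaf paths and recording which documents contain each path. This is a conceptual sketch, not the service's actual index format; the document ids `doc1` and `doc2` and the helper names are hypothetical, while the two documents are the ones shown on slides 78 and 79.

```python
from typing import Any, Dict, Iterator, Set

DOCS = {
    "doc1": {
        "locations": [
            {"country": "Germany", "city": "Berlin"},
            {"country": "France", "city": "Paris"},
        ],
        "headquarter": "Belgium",
        "exports": [{"city": "Moscow"}, {"city": "Athens"}],
    },
    "doc2": {
        "locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
        "headquarter": "Italy",
        "exports": [
            {"city": "Berlin", "dealers": [{"name": "Hans"}]},
            {"city": "Athens"},
        ],
    },
}


def paths(node: Any, prefix: str = "") -> Iterator[str]:
    """Yield every root-to-leaf path; array positions become path segments."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from paths(value, f"{prefix}/{key}")
    elif isinstance(node, list):
        for position, value in enumerate(node):
            yield from paths(value, f"{prefix}/{position}")
    else:
        yield f"{prefix}/{node}"


def build_index(docs: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Merge all documents into one mapping from path to the ids that contain it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, doc in docs.items():
        for path in paths(doc):
            index.setdefault(path, set()).add(doc_id)
    return index


if __name__ == "__main__":
    for path, ids in sorted(build_index(DOCS).items()):
        print(path, sorted(ids))
```

Paths that both documents share, such as `/exports/1/city/Athens`, collapse into a single entry that points at both documents, which is the effect the merged tree on slide 81 illustrates.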
  82. 82. Two example indexing policies.
      No indexing:
      {
        "indexingMode": "none",
        "automatic": false,
        "includedPaths": [],
        "excludedPaths": []
      }
      Index only /age and /gender, excluding everything else:
      {
        "indexingMode": "consistent",
        "automatic": true,
        "includedPaths": [
          { "path": "/age/?", "indexes": [ { "kind": "Range", "dataType": "Number", "precision": -1 } ] },
          { "path": "/gender/?", "indexes": [ { "kind": "Range", "dataType": "String", "precision": -1 } ] }
        ],
        "excludedPaths": [ { "path": "/*" } ]
      }
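To see how included and excluded paths interact in policies like these, the sketch below evaluates whether a property path would be indexed. It rests on simplifying assumptions: `/x/?` is treated as matching exactly the value at `/x`, `/x/*` as matching `/x` and everything beneath it, and the more precise pattern is taken to win when both lists match. The helper names are hypothetical and the matching is far simpler than the service's real path grammar.

```python
from typing import Any, Dict, Optional


def _match(pattern: str, path: str) -> Optional[float]:
    """Return a precision score if the pattern covers the path, else None."""
    if pattern.endswith("/?"):
        base = pattern[:-2]
        # Exact match on a single value; scored by segment count.
        return float(len(base.split("/"))) if path == base else None
    if pattern.endswith("/*"):
        base = pattern[:-2]
        if path == base or path.startswith(base + "/"):
            # A wildcard is treated as slightly less precise than an exact path.
            return len(base.split("/")) - 0.5
    return None


def is_indexed(path: str, policy: Dict[str, Any]) -> bool:
    """Decide whether a path is indexed under the simplified rules above."""
    if policy.get("indexingMode", "consistent").lower() == "none":
        return False
    best_score, decision = -1.0, False
    for included, key in ((True, "includedPaths"), (False, "excludedPaths")):
        for entry in policy.get(key, []):
            score = _match(entry["path"], path)
            if score is not None and score > best_score:
                best_score, decision = score, included
    return decision


if __name__ == "__main__":
    policy = {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/age/?"}, {"path": "/gender/?"}],
        "excludedPaths": [{"path": "/*"}],
    }
    for p in ["/age", "/gender", "/name"]:
        print(p, is_indexed(p, policy))
```

Under these assumptions, the second policy on slide 82 indexes only `/age` and `/gender`, so queries filtering on other properties would not be served from the index.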
  83. 83. On-the-fly Index Changes In Azure Cosmos DB, you can make changes to the indexing policy of a collection on the fly. Changes can affect the shape of the index, including paths, precision values, and its consistency model. A change in indexing policy effectively requires a transformation of the old index into a new index.
  84. 84. Metrics Analysis. The SQL APIs provide information about performance metrics, such as the index storage used and the throughput cost (request units) for every operation. When running a HEAD or GET request against a collection resource, the x-ms-request-quota and the x-ms-request-usage headers provide the storage quota and usage of the collection. You can use this information to compare various indexing policies, and for performance tuning.
  85. 85. Understand query patterns: which properties are being used? Understand the impact on write cost: the RU cost of index updates scales with the number of properties.
  86. 86. Resources:
      http://cosmosdb.com/
      https://azure.microsoft.com/try/cosmosdb/
      https://docs.microsoft.com/learn/paths/work-with-nosql-data-in-azure-cosmos-db/
  87. 87. © 2018 Microsoft Corporation. All rights reserved. The contents of this information (including attachments and linked content) are current as of the date of creation and are subject to change without notice. © 2019 Microsoft Corporation. All rights reserved.
