
Cosmos DB - Database for Serverless era


Slides from ProgNet2018 workshop

Published in: Software

  1. 1. Cosmos DB Database for Serverless Era Michał Jankowski
  2. 2. about me: Michał Jankowski, technical architect / team leader, traveller / photographer, @JankowskiMichal
  3. 3. aim: Learn to use Cosmos DB in your serverless solutions. Improve the performance of your whole solution.
  4. 4. way of working: A bit of theory, then a lot of demos and practice. I encourage you to work together and exchange your knowledge. We should have fun.
  5. 5. "Great things in business are never done by one person. They're done by a team of people." (Steve Jobs, cofounder of Apple)
  6. 6. presentation agenda
      • Serverless & Cosmos DB: a short theoretical introduction to these topics
      • SQL API: let's learn how to use the main feature of this database through the SQL API
  7. 7. presentation agenda
      • Azure Functions: we will learn how Azure Functions can cooperate with Cosmos DB
      • Graph API: at the end, we will work with the API that allows storing data connected by multiple relations
  8. 8. short survey: Have you worked with Azure? Have you worked with Azure Functions? Have you worked with Cosmos DB?
  9. 9. serverless: the future of compute
  10. 10. serverless characteristics
      • server abstraction: There are no server management tasks.
      • productivity: Fewer infrastructure-related tasks, so you can focus on development activities.
      • event driven: A function does not run when no event triggers it. It can also scale up instantly.
      • focus on features: You are able to focus on the business logic of your app.
      • microbilling: Pay only when there are events. But think about a DDoS attack on your wallet.
      • faster time to market: All the items above together allow you to reduce time to market.
  11. 11. serverless in Azure
      • Azure Functions: An event-based serverless compute experience to accelerate your development. Scale based on demand and pay only for the resources you consume.
      • Event Grid: A single service for managing routing of all events from any source to any destination. Designed for high availability, consistent performance and dynamic scale. Event Grid lets you focus on your app logic rather than infrastructure.
  12. 12. serverless in Azure
      • Logic Apps: Provide a way to simplify and implement scalable integrations and workflows in the cloud. The service provides a visual designer to model and automate your process as a series of steps known as a workflow.
      • Flow: A service that allows you to create automated workflows between your favourite applications and services to synchronize files, get notifications, collect data, and more.
  13. 13. serverless in Azure
      • Cosmos DB: Built from the ground up with global distribution and horizontal scale at its core. It offers turnkey global distribution with multi-master support across any number of Azure regions by transparently scaling and replicating your data wherever your users are.
  14. 14. cosmos db database for applications anywhere in the world
  15. 15. microsoft noSQL database evolution
      • 2010 - 2014: internal Microsoft DocumentDB service (Office, OneNote, Xbox)
      • 21/08/2014: DocumentDB preview
        - SQL grammar over schema-free JSON
        - tuneable throughput, indexing, consistency
        - server-side ACID transactions
        - part of Azure portal
  16. 16. microsoft noSQL database evolution
      • 08/04/2015: DocumentDB GA
        - ORDER BY
        - string range queries
        - geospatial support
        - partitioning support by SDK
      • May 2016: …improvements
        - partitioned collections
        - geo-replication
        - MongoDB wire protocol support
      • 10/05/2017: Cosmos DB
  17. 17. azure cosmos db
      • limitless elastic scale around the globe: With Azure Cosmos DB, you pay only for the throughput and storage you need. Azure Cosmos DB allows you to independently and elastically scale storage and throughput at any time, anywhere across the globe, making it a perfect ally for your serverless applications.
      • multi-model + multi-API: Only Azure Cosmos DB allows you to use key-value, graph, column-family, and document data in one service. Azure Cosmos DB automatically indexes all data and allows you to use your favourite API including SQL, JavaScript, Gremlin, MongoDB, Apache® Cassandra, and Azure Table Storage to access your data.
      • turnkey global distribution: Easily build globally distributed applications without the hassle of complex, multiple-datacenter configurations. Designed as a globally distributed database system, Azure Cosmos DB automatically replicates your data to any number of regions of your choice for fast, responsive access. Azure Cosmos DB supports transparent multi-homing and guarantees 99.999% high availability.
  18. 18. azure cosmos db
      • industry-leading, enterprise-grade SLAs: Rest assured your apps are running on a "battle-tested" database service built on world-class infrastructure. Azure Cosmos DB gives you enterprise-grade security and compliance, and is the first and only service to offer industry-leading comprehensive SLAs for 99.999% high availability, latency at the 99th percentile, guaranteed throughput, and consistency.
      • guaranteed low latency at 99th percentile: Serve read and write requests from the nearest region while simultaneously distributing data across the globe. With its latch-free and write-optimized database engine, Azure Cosmos DB guarantees less than 10-ms latencies on reads and less than 15-ms latencies on (indexed) writes at the 99th percentile.
      • multiple, well-defined consistency choices: Azure Cosmos DB offers five well-defined consistency levels (strong, bounded staleness, consistent prefix, session, and eventual) for an intuitive programming model with low latency and high availability for your planet-scale app.
  19. 19. who uses it?
  20. 20. Skype powers 1M searches per second over conversation data
      Business need
      • Provide search capabilities over TBs-PBs of Skype and Teams conversations
      • Fast ingestion with multiple writes, overlay group memberships
      • Secure & compliant data storage with high privacy requirements
      Key benefits
      • Cosmos DB supports fast ingestion of message data from 1:1 communication and group chats
      • Cosmos DB enables real-time queries over messages and group conversations, with custom filters on when a user enters/leaves a thread
      Data volumes: 44 TB message data, 6 TB user data, 1 TB group data
      [Diagram: Skype ingestion service and Skype query service on top of three Azure Cosmos DB stores: USERS, GROUPS, MESSAGES]
      Source: Building globally distributed applications with Azure Cosmos DB
  21. 21. Toyota drives connected car push forward with Azure Cosmos DB
      Business need
      • Need to ingest massive volumes of diagnostic data from vehicles and take real-time actions as part of a connected car platform
      • Management and operations of database infrastructure to handle exponential growth of data
      Key benefits
      • Cosmos DB can scale elastically without the operational overhead of MongoDB
      • Perform fast queries over events to deliver recommended services and safety notices to vehicles
      • Perform staged migration via MongoDB APIs
      Scale: 8 TB vehicle telemetry, 250K Lexus cars
      [Diagram: Azure Cosmos DB, Azure HDInsight Storm, Azure Storage (archival)]
      Source: Building globally distributed applications with Azure Cosmos DB
  22. 22. Performance at massive scale allows millions to play mobile game
      Business need
      • Handle millions of players on Day 1 due to popularity of the TV series
      • Match-making of players for a competitive and lag-free experience
      • Provide new content weekly, and iterate on social functionality
      Key benefits
      • Cosmos DB provides elastic scalability for millions of users and a flexible schema to support social features and gameplay
      • Global distribution allows for low latency for players spread worldwide
      • Automatic indexing used to build real-time leaderboards
      Scale: 1M peak active, #1 iOS App Store, 1B daily queries
      [Diagram: Azure Traffic Manager, Azure API Apps (game backend), Azure CDN, Azure Cosmos DB, Azure Functions, Azure Notification Hubs (push notifications), Azure Storage (game files)]
      Source: Building globally distributed applications with Azure Cosmos DB
  23. 23. maybe we should start coding
  24. 24. before we start: We will be working with an expensive service. Use it wisely. And afterwards, please remember to clean up your environment.
  25. 25. prepare your environment
  26. 26. multiple APIs
      • document: SQL API (JSON), MongoDB API (BSON)
      • column-family: Cassandra API
      • graph: Gremlin API (graph traversal language)
      • key-value: Table API (potential replacement for Azure Table Storage)
  27. 27. demo 1.
  28. 28. it is time for you: Play with Cosmos DB. It is time for you to start your journey with this product.
  29. 29. 3… 2… 1… lift off!!!!!
  30. 30. data in Cosmos DB
      Cosmos DB (document db)
      • denormalized data
      • referential integrity NOT enforced
      • mixed data in collections
      • flexible schema
      • SQL-like language as well as JavaScript and others
      Relational database
      • normalised data
      • referential integrity enforced by normalisation and relationships
      • uniform data in tables
      • schema is set
      • SQL
  31. 31. document vs relational database [Diagram comparing Cosmos DB (document db) with a relational database]
  32. 32. whole model in one record (complex structure, one-to-many relation, different object schema)
      { "lastName": "Cartwright", "parents": [ { "firstName": "Elvira", "role": "mother", "age": 64 }, { "firstName": "Randolph", "role": "father", "age": 67 } ], "children": [ { "firstName": "Dana", "age": 15, "gender": "female" }, { "firstName": "Pat", "age": 13, "gender": "male", "grade": 7, "pets": [ { "name": "Concepcion", "type": "guinea pig" }, { "name": "Haleigh", "type": "hamster" } ] } ], "address": { "state": "North Dakota", "city": "West Bretthaven", "country": "Guinea" } }
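To make the "whole model in one record" idea concrete, here is a minimal Python sketch that loads the record above and walks its nested one-to-many relations. It uses only the standard library and no Cosmos DB SDK; the helper name `pet_names` is hypothetical and exists only for illustration.

```python
import json

# The family document from the slide, written with straight quotes so it parses as JSON.
FAMILY_JSON = """
{
  "lastName": "Cartwright",
  "parents": [
    {"firstName": "Elvira", "role": "mother", "age": 64},
    {"firstName": "Randolph", "role": "father", "age": 67}
  ],
  "children": [
    {"firstName": "Dana", "age": 15, "gender": "female"},
    {"firstName": "Pat", "age": 13, "gender": "male", "grade": 7,
     "pets": [
       {"name": "Concepcion", "type": "guinea pig"},
       {"name": "Haleigh", "type": "hamster"}
     ]}
  ],
  "address": {"state": "North Dakota", "city": "West Bretthaven", "country": "Guinea"}
}
"""

family = json.loads(FAMILY_JSON)


def pet_names(doc):
    """Collect pet names across all children (hypothetical helper).

    Children may have different shapes: Dana has no "pets" or "grade" field,
    so missing fields are treated as empty rather than as an error.
    """
    return [
        pet["name"]
        for child in doc.get("children", [])
        for pet in child.get("pets", [])
    ]


if __name__ == "__main__":
    print(pet_names(family))
```

No join is needed to reach the pets: the relation lives inside the document, which is the trade-off against the normalised relational layout.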
  33. 33. database creation: we can
      • create a new one or use an existing one
      • set up performance (throughput) at the database level
        - there is no limit for it, but we need to contact support when we need more than 1 000 000 RU/s
        - we need to confirm higher costs
  34. 34. request units: a normalized measure of request processing cost

      Item size | Reads / second | Writes / second | Request units
      1 KB      | 500            | 100             | 1,000 RU/s
      1 KB      | 500            | 500             | 3,000 RU/s
      4 KB      | 500            | 100             | 1,350 RU/s
      4 KB      | 500            | 500             | 4,150 RU/s
      64 KB     | 500            | 100             | 9,800 RU/s
      64 KB     | 500            | 500             | 29,000 RU/s

      The table shows how many request units to provision for items of three different sizes (1 KB, 4 KB, and 64 KB) at two different performance levels (500 reads/second + 100 writes/second and 500 reads/second + 500 writes/second). In this example, the data consistency is set to Session, and the indexing policy is set to None.
      • combines memory, CPU and IOPS into a single currency
      • the same request on the same data always consumes the same number of RUs
      • every response reports the cost of the operation
      • we are paying for some capacity:
        - when it is exhausted, our requests are rate-limited and have to be retried
        - we can increase or decrease the amount of throughput instantaneously
        - without capacity our service will stop
      • a write operation costs more than a read
      • pricing may differ per region
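The arithmetic behind the request-unit table above can be sketched in Python. The per-operation costs below are an assumption: they are back-calculated so that they reproduce the six rows of the table (Session consistency, indexing policy None) and are not taken from any API. The function name `required_throughput` is hypothetical.

```python
# Per-operation cost in RUs by item size in KB.
# ASSUMPTION: values back-calculated from the table above; other item sizes,
# consistency levels or indexing policies would give different numbers.
RU_COST = {
    1: {"read": 1.0, "write": 5.0},
    4: {"read": 1.3, "write": 7.0},
    64: {"read": 10.0, "write": 48.0},
}


def required_throughput(item_kb, reads_per_s, writes_per_s):
    """Estimate the RU/s to provision for a steady read/write mix."""
    if item_kb not in RU_COST:
        raise ValueError(f"no cost data for {item_kb} KB items")
    cost = RU_COST[item_kb]
    total = reads_per_s * cost["read"] + writes_per_s * cost["write"]
    return round(total)


if __name__ == "__main__":
    for size in (1, 4, 64):
        for writes in (100, 500):
            print(size, "KB:", required_throughput(size, 500, writes), "RU/s")
```

The sketch also makes the bullet about writes visible: in every row the write cost per operation is several times the read cost, so the write rate dominates the bill.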
  35. 35.
  36. 36. collection creation
      • an option to start with
      • the direction to take when we think about something serious
  37. 37. partition key: partition key selection is the most important decision
      Remember about some limits:
      • A physical partition can store a maximum of 10 GB of data
      • A physical partition can facilitate at most 10,000 RU/s
      [Diagram: Cosmos DB account > Database > Collections > Physical partitions > Logical partitions > Documents]
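A rough way to reason about the two limits above is to compute a lower bound on the number of physical partitions a collection needs. This is a sketch under the limits quoted on the slide (10 GB and 10,000 RU/s per physical partition); the service decides the real partition count itself, and the function name is hypothetical.

```python
import math

MAX_GB_PER_PHYSICAL_PARTITION = 10       # limit quoted on the slide
MAX_RU_PER_PHYSICAL_PARTITION = 10_000   # limit quoted on the slide


def min_physical_partitions(data_gb, provisioned_ru):
    """Lower bound on physical partitions implied by storage and throughput.

    Whichever limit is hit first drives the count; there is always at least
    one partition.
    """
    if data_gb < 0 or provisioned_ru < 0:
        raise ValueError("sizes must be non-negative")
    by_storage = math.ceil(data_gb / MAX_GB_PER_PHYSICAL_PARTITION)
    by_throughput = math.ceil(provisioned_ru / MAX_RU_PER_PHYSICAL_PARTITION)
    return max(1, by_storage, by_throughput)


if __name__ == "__main__":
    print(min_physical_partitions(data_gb=25, provisioned_ru=5_000))
    print(min_physical_partitions(data_gb=5, provisioned_ru=35_000))
```

Because provisioned throughput is spread across physical partitions, a single hot partition key can exhaust its share while the rest of the collection sits idle, which is why key selection matters so much.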
  38. 38. how does it work? Container with last name as partition key. Each last name is a logical partition, and logical partitions are grouped into physical partitions. [Diagram: Smith, Jones, Williams, Taylor, Davies, Brown, Wilson, Thomas, Johnson, Roberts, Evans]
  39. 39. the Evans logical partition is growing [Diagram: container with Smith, Jones, Williams, Taylor, Davies, Brown, Wilson, Thomas, Johnson, Roberts, Evans]
  40. 40. a new physical partition will be created [Diagram: container with Smith, Jones, Williams, Taylor, Davies, Brown, Wilson, Thomas, Johnson; Roberts and Evans shown separately]
  41. 41. it is still growing [Diagram: container with Smith, Jones, Williams, Taylor, Davies, Brown, Wilson, Thomas, Johnson; Roberts and Evans shown separately]
  42. 42. Evans family as a separate container
      • Container with last name as partition key: Smith, Jones, Williams, Taylor, Davies, Brown, Wilson, Thomas, Johnson, Roberts
      • Container for the Evans family with city as partition key: London, Manchester, Liverpool, Leeds
  43. 43. how to do it right?
      • Choosing a partition key depends purely on the structure of the data.
      • It is important to choose a partition key property that has many distinct values.
      • An ideal partition key is one that appears frequently as a filter in your queries and has sufficient cardinality to ensure your solution is scalable.
      • If the chosen partition key doesn't have many distinct values, then all queries will be sent to a single partition, which may slow down performance.
      • In general: do not be afraid of having too many partition keys. In most cases, more partition keys mean more scalability.
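The advice above about distinct values can be checked on sample data before a collection is created. The following Python sketch counts how documents would spread over logical partitions for a candidate key; the function name `partition_skew` and the sample documents are hypothetical, and the check ignores query patterns, which matter just as much.

```python
from collections import Counter


def partition_skew(documents, key):
    """Summarise how evenly a candidate partition key spreads documents.

    Returns the number of distinct key values and the share of documents
    that fall into the single largest logical partition.
    """
    counts = Counter(doc[key] for doc in documents)
    if not counts:
        raise ValueError("no documents to analyse")
    hottest_key, hottest_count = counts.most_common(1)[0]
    return {
        "distinct_keys": len(counts),
        "hottest_key": hottest_key,
        "hottest_share": hottest_count / sum(counts.values()),
    }


if __name__ == "__main__":
    # Hypothetical sample: the Evans family dominates when lastName is the key.
    people = [
        {"id": "1", "lastName": "Evans", "city": "London"},
        {"id": "2", "lastName": "Evans", "city": "Manchester"},
        {"id": "3", "lastName": "Evans", "city": "Liverpool"},
        {"id": "4", "lastName": "Evans", "city": "Leeds"},
        {"id": "5", "lastName": "Smith", "city": "London"},
        {"id": "6", "lastName": "Jones", "city": "Leeds"},
    ]
    print(partition_skew(people, "lastName"))
    print(partition_skew(people, "id"))
```

A high `hottest_share` with few distinct keys is the warning sign described on the slide: most reads and writes would land on one partition.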
  44. 44. demo 2.
  45. 45. it is time for you: Let's do something more serious.
      1. Familiarise yourself with the presented queries.
      2. Try to develop queries against a bigger database.
  46. 46. demo 3.
  47. 47. what is your opinion?
  48. 48. from my perspective
      • quite a simple product with some limitations and great potential
      • we need to change how we think about data and databases
      • it can be very expensive when it is badly designed or misused
  49. 49. cosmos db & serverless
  50. 50. serverless main benefits
      • cost: can be more cost-effective than renting or purchasing a fixed quantity of servers
      • operations / scalability: a serverless architecture means that developers and operators do not need to spend time setting up and tuning autoscaling policies or systems
      • productivity: simplifies the task of back-end software development
  51. 51. the performance
      • latency: How fast is the response to a given request?
      • throughput: How many requests can be handled in a given period?
  52. 52. know the limits: The speed of light in vacuum is a universal physical constant important in many areas of physics. Its exact value is 299,792,458 metres per second. The speed at which light propagates through transparent materials, such as glass or air, is less than c.
  53. 53. what does it mean for us? ~200 ms vs ~5 ms
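The physical limit above translates into a floor on network latency that no optimisation can remove. A minimal Python sketch, assuming a straight-line path: `medium_factor` is a hypothetical parameter for the fraction of c at which the signal travels (1.0 for vacuum; roughly two thirds is a commonly quoted figure for optical fibre, used here as an assumption).

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # exact value quoted on the slide


def min_round_trip_ms(distance_km, medium_factor=1.0):
    """Lower bound on round-trip time over a straight path of the given length.

    Real round trips are slower: routes are not straight, and switching and
    request processing add their own delays.
    """
    if not 0 < medium_factor <= 1:
        raise ValueError("medium_factor must be in (0, 1]")
    one_way_s = (distance_km * 1000) / (SPEED_OF_LIGHT_M_S * medium_factor)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # 10,000 km is an illustrative intercontinental distance, not a measured route.
    print(round(min_round_trip_ms(10_000), 1), "ms in vacuum")
    print(round(min_round_trip_ms(10_000, medium_factor=2 / 3), 1), "ms in fibre (assumed 2/3 c)")
```

This is the argument for keeping data close to its users: only a shorter distance lowers the floor.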
  54. 54. geo replication. • it has never been so easy • you can replicate your data to as many data centers as you need • you can do it with just a few clicks
  55. 55. data consistency: strong, bounded staleness, session, consistent prefix, eventual
      Moving from strong towards eventual gives:
      • lower latency
      • higher availability
      • better read scalability
  56. 56. bounded staleness: Bounded staleness consistency guarantees that reads may lag behind writes by at most K versions or prefixes of an item, or by a time interval t. The cost of a read operation (in terms of RUs consumed) with bounded staleness is higher than with session and eventual consistency, but the same as with strong consistency.
  57. 57. session: Session consistency is ideal for all scenarios where a device or user session is involved, since it guarantees monotonic reads, monotonic writes, and read your own writes (RYW). Session consistency provides predictable consistency for a session and maximum read throughput, while offering the lowest latency writes and reads.
  58. 58. consistent prefix: Consistent prefix guarantees that in the absence of any further writes, the replicas within the group eventually converge. It also guarantees that reads never see out-of-order writes. If writes were performed in the order A, B, C, then a client sees either A, or A,B, or A,B,C, but never an out-of-order sequence like A,C or B,A,C.
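The consistent prefix guarantee above can be expressed as a small check: whatever a client observes must be a leading slice of the order in which writes were made. This Python sketch is only an illustration of the rule, not of how the database enforces it; the function name is hypothetical.

```python
def is_consistent_prefix(write_order, observed):
    """Return True if `observed` is a prefix of `write_order`.

    An empty observation also counts as a prefix: the client has simply not
    seen any of the writes yet.
    """
    writes = list(write_order)
    seen = list(observed)
    return seen == writes[:len(seen)]


if __name__ == "__main__":
    writes = ["A", "B", "C"]
    for view in (["A"], ["A", "B"], ["A", "B", "C"], ["A", "C"], ["B", "A", "C"]):
        print(view, is_consistent_prefix(writes, view))
```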
  59. 59. eventual: Eventual consistency guarantees that in the absence of any further writes, the replicas within the group eventually converge. It is the weakest form of consistency, where a client may get values that are older than the ones it has seen before. It provides the weakest read consistency but offers the lowest latency for both reads and writes, with the lowest cost of a read operation.
  60. 60. Azure Cosmos DB tenants
      • 73% use session consistency
      • 20% prefer bounded staleness
      • 3% experiment with various consistency levels initially before settling on a specific consistency
  61. 61. regional failover
      • read (nothing to do): In the rare event of an Azure regional outage or data center outage, Cosmos DB automatically triggers failovers of all Cosmos DB accounts with a presence in the affected region.
      • write (automatic failover must be on): If the affected region is the current write region and automatic failover is enabled for the Azure Cosmos DB account, then the region is automatically marked as offline. Then, an alternative region is promoted as the write region for the affected Azure Cosmos DB account.
      • manual failover: Can be used for a follow-the-clock model.
  62. 62. data index customisations (everything is indexed by default)
      • scope: include or exclude documents and paths to and from the index
      • index types:
        - hash: supports efficient equality and JOIN queries
        - range: supports efficient equality queries, range queries (using >, <, >=, <=, !=), and ORDER BY queries
        - spatial: supports efficient spatial (within and distance) queries
      • precision: make trade-offs between index storage overhead and query performance
      • index update mode: consistent, lazy, or none
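The customisation options above end up in a JSON indexing policy attached to the collection. The sketch below builds one as a Python dict and serialises it. The field names (`indexingMode`, `includedPaths`, `excludedPaths`, `indexes`, `kind`, `dataType`, `precision`) are written as I recall them from the documentation of that period and should be treated as an assumption to verify against current docs; the excluded `/address/*` path is purely illustrative.

```python
import json

# ASSUMPTION: field names follow the indexing policy JSON format of the time;
# check the current documentation before using this against a real account.
indexing_policy = {
    "indexingMode": "consistent",        # index update mode: consistent, lazy or none
    "automatic": True,                   # index every document by default
    "includedPaths": [
        {
            "path": "/*",                # scope: everything...
            "indexes": [
                # range index on numbers: equality, range and ORDER BY queries
                {"kind": "Range", "dataType": "Number", "precision": -1},
                # hash index on strings: equality queries, lower storage overhead
                {"kind": "Hash", "dataType": "String", "precision": 3},
            ],
        }
    ],
    "excludedPaths": [
        {"path": "/address/*"}           # ...except this illustrative subtree
    ],
}

if __name__ == "__main__":
    print(json.dumps(indexing_policy, indent=2))
```

Excluding paths that are never queried reduces index storage and the RU cost of writes, at the price of slower queries on those paths.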
  63. 63. azure functions. By definition, synergy happens when the interaction between two elements produces an effect greater than the individual elements’ contribution.
  64. 64. demo 4.
  65. 65. it is time for you: Let's build an API for the application.
      1. Create a function that allows adding new ToDo items to the database.
      2. Create a function that lists all ToDo items assigned to a list.
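One way to approach the exercise above is to write the two handlers against a tiny storage interface first and wire them to Azure Functions and Cosmos DB afterwards. The Python sketch below does only the first part: `InMemoryToDoStore` is a hypothetical stand-in for a collection, `listId` is an assumed partition key, and nothing here uses the Azure Functions or Cosmos DB SDKs.

```python
import uuid


class InMemoryToDoStore:
    """Hypothetical stand-in for a Cosmos DB collection partitioned by listId."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def query_by_list(self, list_id):
        # Filtering on the assumed partition key keeps the query in one partition.
        return [item for item in self._items if item["listId"] == list_id]


def add_todo(store, list_id, title):
    """Handler 1: add a new ToDo item to the given list."""
    if not title.strip():
        raise ValueError("title must not be empty")
    item = {"id": str(uuid.uuid4()), "listId": list_id, "title": title, "done": False}
    store.add(item)
    return item


def list_todos(store, list_id):
    """Handler 2: return all ToDo items assigned to the given list."""
    return store.query_by_list(list_id)


if __name__ == "__main__":
    store = InMemoryToDoStore()
    add_todo(store, "home", "water the plants")
    add_todo(store, "work", "review pull request")
    print([item["title"] for item in list_todos(store, "home")])
```

Keeping the handlers free of SDK calls makes them easy to test locally; the store can later be replaced by a class that talks to the real database.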
  66. 66. graph api
  67. 67. main features
      • many-to-many
      • complex relations
      typical usage of this type of API
      • social networks
      • analysing interconnected data and relationships
      • excessive JOINs
      • recommendation engines
      • knowledge graphs
  68. 68. vertex: denotes discrete objects, such as a person, a place, or an event
      • Id: DEN
      • Label: airport
      • Properties: Code: DEN; City: Denver; Description: Denver International Airport; Elevation: 5443
      edge: denotes relationships between vertices
      • Id: (none shown)
      • Label: route
      • Properties: Distance: 542
  69. 69. sample graph
      • vertex id: United States, label: country, properties: code: US, name: United States
      • vertex id: DEN, label: airport, properties: code: DEN, city: Denver, elevation: 5443
      • vertex id: ATL, label: airport, properties: code: ATL, city: Atlanta, elevation: 1026
      • edge label: contains (shown twice)
      • edge label: route, properties: distance: 2249
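The sample graph above is small enough to model in plain Python, which helps when reasoning about traversals before writing them in Gremlin. This is a sketch with no graph database involved. The edge endpoints are assumptions, since the transcript lists only the edge labels: the country is assumed to "contain" both airports, and the route is assumed to run from DEN to ATL. The helper names are hypothetical.

```python
from collections import deque

# Vertices from the slide: id -> properties (including the label).
VERTICES = {
    "United States": {"label": "country", "code": "US", "name": "United States"},
    "DEN": {"label": "airport", "code": "DEN", "city": "Denver", "elevation": 5443},
    "ATL": {"label": "airport", "code": "ATL", "city": "Atlanta", "elevation": 1026},
}

# Edges as (source, label, target, properties).
# ASSUMPTION: endpoints and direction are inferred, not stated in the transcript.
EDGES = [
    ("United States", "contains", "DEN", {}),
    ("United States", "contains", "ATL", {}),
    ("DEN", "route", "ATL", {"distance": 2249}),
]


def out_neighbours(vertex_id, label):
    """Ids of vertices reachable from vertex_id over outgoing edges with this label."""
    return [dst for src, lbl, dst, _ in EDGES if src == vertex_id and lbl == label]


def find_route(start, goal):
    """Breadth-first search over 'route' edges; returns the airport path or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in out_neighbours(path[-1], "route"):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(VERTICES["DEN"])
    print(find_route("DEN", "ATL"))
```

Breadth-first search returns a path with the fewest hops; choosing the shortest distance instead would mean weighting the edges by their `distance` property.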
  70. 70. demo 5.
  71. 71. it is time for you: Maybe you should plan your holidays.
      1. Get all the details about your favourite airport.
      2. Check how you can get to your vacation location.
  72. 72. summary. We made a brief introduction to serverless and its connections to Cosmos DB. We learned how to use the SQL API and how to connect Cosmos DB with Azure Functions. You should now know how to optimise your environment. We tried the Graph API.
  73. 73. do you have any questions? @JankowskiMichal
  74. 74. more information • cosmos-db/ • db/ • db/sql-api-introduction • functions/ • db/serverless-computing-database • db/graph-introduction
  75. 75. thank you @JankowskiMichal