[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB

Data analysts, data engineers, and application developers are supporting unprecedented rates of change, from tighter latency requirements to an expanding range of data usage scenarios. While technology must evolve rapidly to meet customer needs and respond to competitive pressures, how can we enhance the data platform to help manage this unpredictability?

To help address these realities, data practitioners from a diverse set of backgrounds are increasingly relying on schema-free, distributed, scalable, and high-performance data storage (also known as NoSQL databases). In this session, we will showcase a wide variety of customer scenarios, business goals, and technical challenges faced by real-world customers. More importantly, we will show how adding Azure DocumentDB to a data practitioner's arsenal within the Microsoft/Azure data ecosystem helps you solve these complex design patterns at massive scale.

Published in: Data & Analytics

  1. 1. Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB Denny Lee Program Manager Azure DocumentDB @dennylee Andrew Liu Program Manager Azure DocumentDB @aliuy8
  2. 2. A Brief Overview...
  3. 3. Elastically Scalable Throughput + Storage
  4. 4. Guaranteed low latency Reads <10ms @ P99 Writes <15ms @ P99
  5. 5. Globally distributed
  6. 6. Speaks your language
  7. 7. Azure DocumentDB
  8. 8. { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] } Perfect for these Documents schema-agnostic JSON store for hierarchical and de-normalized data at scale
  9. 9. Not these documents
  10. 10. { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] } Perfect for these Documents schema-agnostic JSON store for hierarchical and de-normalized data at scale
  11. 11. “If all you have is a hammer, everything looks like a nail” - Abraham Maslow
  12. 12. Choose the right tools for the right job. Microsoft Data Platform: SQL Server 2016, SQL Database, Azure DocumentDB, Azure Search, Azure HDInsight, Azure Data Lake, Azure DW, APS, Azure Stream Analytics, Azure Data Factory, Azure ML, Azure Data Catalog, Power BI
  13. 13. 3 V’s of data: Endless possibilities. Learning, Gaming, Retail, Telematics, Mobile Apps, IoT
  14. 14. Let’s talk about scale. Problem 1: Volume and Velocity
  15. 15. More users, more problems
  16. 16. The Walking Dead, results: <10ms P99 query latency; >1M game downloads; ~1B requests / day
  17. 17. How ? Just throw some data in a database!
  18. 18. Not that easy…
  19. 19. The right tool for the job ?
  20. 20. The answer for low latency @ massive scale
  21. 21. Fact: Managing shards is really painful. Managing shards or partitions Good news: DocumentDB has done all the heavy lifting.
  22. 22. Elastic scale
  23. 23. Request units: Request Unit (RU) is the normalized currency (% Memory, % IOPS, % CPU). Replica gets a fixed budget of Request Units. Predictable performance. Diagram labels: resource set, resources, documents, SQL, sprocs, args
  24. 24. Creating partitioned collections
  25. 25. Scale Demo Code: https://aka.ms/docdb-benchmark
  26. 26. Configured @10,100 RUs ~940 writes / second Writing @ ~9800 RUs
  27. 27. Configured @250,000 RUs ~12,100 writes / second Writing @ ~128,800 RUs VM @ 99% CPU
  28. 28. Globally Distributed: Azure DocumentDB gives you the ability to cheat the speed of light!
  29. 29. … with well-defined consistency models! Strong, Bounded Staleness, Session, Eventual (left to right: relaxed consistency => better performance and availability)
      | Consistency Level | Strong | Bounded Staleness | Session | Eventual |
      | Total global order | Yes | Yes, outside of the “staleness window” | No, partial “session” order | No |
      | Consistent prefix guarantee | Yes | Yes | Yes | Yes |
      | Monotonic reads | Yes | Yes, across regions outside of the staleness window and within a region all the time | Yes, for the given session | No |
      | Monotonic writes | Yes | Yes | Yes | Yes |
      | Read your writes | Yes | Yes (in the write region) | Yes | No |
      Observed distribution (chart values as extracted): 27%, 3%, 54%, 16%. Chart legend: Bounded Staleness, Eventual, Session, Strong
  30. 30. App defined regional preferences
  31. 31. Global Distribution Demo Code: https://aka.ms/docdb-latency-script-nodejs
  32. 32. Let’s talk about schema-freedom. Problem 2: Variety
  33. 33. Problem 2: Variety
      | Item | Color | Microwave Safe | Liquid Capacity |
      | Geek Mug | Graphite | Yes | 16oz |
      | Coffee Bean Mug | Tan | No | 12oz |
  34. 34. Variety : Different attributes
      | Item | Color | Microwave Safe | Liquid Capacity |
      | Geek Mug | Graphite | Yes | 16oz |
      | Coffee Bean Mug | Tan | No | 12oz |
      | Surface Book | Gray | ??? | ??? |
  35. 35. Variety : Different attributes
  36. 36. Variety : More columns ?
      | Item | Color | Microwave Safe | Liquid Capacity | CPU | Memory | Storage |
      | Geek Mug | Graphite | Yes | 16oz | ??? | ??? | ??? |
      | Coffee Bean Mug | Tan | No | 12oz | ??? | ??? | ??? |
      | Surface Book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD |
  37. 37. Variety : More tables ?
      | Item | Color | Microwave Safe | Liquid Capacity |
      | Geek Mug | Graphite | Yes | 16oz |
      | Coffee Bean Mug | Tan | No | 12oz |

      | Item | CPU | Memory | Storage |
      | Surface Book | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD |
  38. 38. Variety : Master data ?
      | ProductId | Name |
      | 1 | Geek Mug |
      | 2 | Coffee Bean Mug |
      | 3 | Surface Book |

      | ProductId | Attribute | Value |
      | 1 | Microwave Safe | Yes |
      | 1 | Liquid Capacity | 16oz |
      | … | … | … |
      | 2 | Microwave Safe | No |
      | 2 | Liquid Capacity | 12oz |
      | … | … | … |
      | 3 | CPU | 3.4 GHz Intel Skylake Core i7-6600U |
      | 3 | Memory | 16GB |
      | … | … | … |
  39. 39. 2.4 GHz Core i5-6300U 3.4 GHz Core i7-6600U Variety : JSON is beautiful
  40. 40. Retail • Product Catalog • Product Recommendations + Personalization Gaming • Multiplayer + Social Gameplay IoT / Sensor Data • Telemetry + Event Store • Device Registry Social Analytics + Ad Technology • User behavior telemetry • 3rd-Party Data from Web Crawlers Common scenarios
  41. 41. IoT / Sensor Data • Telemetry + Event Store • Device Registry Common scenarios IoT / Sensor Data Challenges: • Hardware is relatively hard to update • Different generations of devices => different schema (Variety) • Lots of sensors emitting telemetry => high rate of ingestion (Volume + Velocity)
  42. 42. IoT : Vehicle Telematics
  43. 43. IoT : Vehicle Telematics Ingress API HOT Warm Cold
  44. 44. Common Scenarios Social Analytics + Ad Technology: • Ingest + Analyze 3rd-Party Data => Who dictates schema? How do you index? (Variety) • Lots of social / user profiles => high rate of ingestion (Volume + Velocity) Social Analytics + Ad Technology • User behavior telemetry • 3rd-Party Data from Web Crawlers
  45. 45. Social Analytics + Ad Technology >1B Social Media Profiles >50M Tweets per Day
  46. 46. Social Analytics + Ad Technology >1B Social Media Profiles >50M Tweets per Day Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them. -Stephen Hankinson, CTO, Affinio
  47. 47. Data Science Demo
  48. 48. Example: Graph Structures
  49. 49. Example: Graph Structures
  50. 50. Classic Graph Scenario: Flights vertex = airports edges = flights
  51. 51. Flight Graph with Spark and DocumentDB Notebook View: https://aka.ms/docdb-spark-graph Code: https://aka.ms/docdb-spark-graph-code Demo
  52. 52. Graph Calculations: Degrees, PageRank. Understanding most important airport (most flights in / out): tripGraph.inDegrees.sort(desc("inDegree")).limit(10)
  53. 53. Advantages of DocumentDB in Data Science Scenarios • Blazing Fast IoT Scenarios • Updateable columns • Push-down predicate filtering
  54. 54. Advantages: Blazing Fast IoT Scenarios. Flight information, global safety alerts, weather. Data Science Scenarios, Device Notifications, Web / REST API
  55. 55. Advantages: Updateable Columns. Flight information. Data Science Scenarios, Device Notifications, Web / REST API. { tripid: “100100”, delay: -5, time: “01:00:01” } { tripid: “100100”, delay: -30, time: “01:00:01” } {delay:-30} {delay:-30} {delay:-30}
  56. 56. Advantages: Pushdown Predicate Filtering. Data Science Scenarios. {city:SEA} locations headquarter exports 0 1 country Germany city Seattle country France city Paris city Moscow city Athens Belgium 0 1 {city:SEA, dst: POR, ...}, {city:SEA, dst: JFK, ...}, {city:SEA, dst: SFO, ...}, {city:SEA, dst: YVR, ...}, {city:SEA, dst: YUL, ...}, ...
  57. 57. Azure DocumentDB
  58. 58. More Resources / Coming Soon Want to know more about Spark-to-DocumentDB Connector? Have any other questions?
  59. 59. Session Evaluations: ways to access. Go to passSummit.com. Download the GuideBook App and search: PASS Summit 2016. Follow the QR code link displayed on session signage throughout the conference venue and in the program guide. Submit by 5pm Friday November 6th to WIN prizes. Your feedback is important and valuable.
  60. 60. Thank You Learn more from Azure DocumentDB askdocdb@microsoft.com or follow @DocumentDB
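
The "Variety" slides (33 to 39) argue that products with different attributes fit more naturally into JSON documents than into wider tables, extra tables, or attribute/value master data. The sketch below illustrates that idea in plain Python, using the three products from the slides. The property names (`microwaveSafe`, `liquidCapacity`, and so on) and the `find` helper are assumptions made for illustration. They are not part of the deck or of any DocumentDB SDK, and the in-memory filter only stands in for a real query.

```python
import json

# Product documents built from the catalog example on the slides.
# Each document carries only the attributes that apply to it, so there
# are no "???" placeholders. Property names are illustrative assumptions.
products = [
    {"id": "1", "name": "Geek Mug", "color": "Graphite",
     "microwaveSafe": True, "liquidCapacity": "16oz"},
    {"id": "2", "name": "Coffee Bean Mug", "color": "Tan",
     "microwaveSafe": False, "liquidCapacity": "12oz"},
    {"id": "3", "name": "Surface Book", "color": "Gray",
     "cpu": "3.4 GHz Intel Skylake Core i7-6600U",
     "memory": "16GB", "storage": "1 TB SSD"},
]


def find(docs, **criteria):
    """Return the documents whose fields equal every given criterion.

    A document that lacks a requested field simply does not match.
    This is a hypothetical in-memory helper, not a database query.
    """
    return [d for d in docs
            if all(k in d and d[k] == v for k, v in criteria.items())]


# Filter on an attribute that only the mugs have.
microwave_safe = find(products, microwaveSafe=True)

# Select documents by the presence of an attribute only laptops have.
laptops = [d for d in products if "cpu" in d]

print(json.dumps(microwave_safe, indent=2))
```

Adding a new product type here means adding a document with its own fields. The existing documents and the filter logic stay unchanged, which is the point the "JSON is beautiful" slide appears to make.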

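
The scale demo slides (26 and 27) report configured throughput, consumed throughput, and writes per second for two runs. A small calculation, sketched below, turns those figures into an approximate request-unit cost per write and a utilization percentage. The slide figures are approximate, and the dictionary keys and the `summarize` helper are names chosen for this sketch only.

```python
# Figures read off slides 26 and 27 of the deck (all approximate).
runs = [
    {"configured_ru": 10_100, "consumed_ru": 9_800, "writes_per_s": 940},
    {"configured_ru": 250_000, "consumed_ru": 128_800, "writes_per_s": 12_100},
]


def summarize(run):
    """Return (RUs per write, percent of configured throughput used)."""
    ru_per_write = run["consumed_ru"] / run["writes_per_s"]
    utilization = run["consumed_ru"] / run["configured_ru"]
    return round(ru_per_write, 1), round(utilization * 100, 1)


summaries = [summarize(r) for r in runs]

for run, (ru_per_write, pct) in zip(runs, summaries):
    print(f"{run['configured_ru']:>7} RUs configured: "
          f"~{ru_per_write} RUs per write, ~{pct}% of budget used")
```

Under these numbers, each write costs a little over 10 RUs in both runs. The first run uses nearly all of its configured throughput, while the second uses roughly half. Since slide 27 also notes the VM at 99% CPU, one plausible reading is that the client machine, not the collection, was the limit in the second run.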