
Introduction to Azure DocumentDB



This presentation introduces Azure DocumentDB. Topics include elastic scale, global distribution, and guaranteed low latencies (with SLAs), all in a managed document store that you can query using SQL and JavaScript. We also review common scenarios and advanced data science scenarios.

Published in: Data & Analytics


  1. 1. Introduction to Azure DocumentDB Denny Lee, Principal Program Manager, Azure DocumentDB
  2. 2. Denny Lee • Principal Program Manager for Azure DocumentDB • 20+ years of experience in databases, distributed systems, data sciences, and software development at Microsoft, Concur, and Databricks • Notable Projects: • Project Isotope: Incubation team for HDInsight • Yahoo! 24TB cube: Largest SSAS cube in production @dennylee
  3. 3. A Brief Overview...
  4. 4. { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "", "blog_url": "", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] } Perfect for these Documents schema-agnostic JSON store for hierarchical and de-normalized data at scale
  5. 5. Not these documents
  6. 6. { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "", "blog_url": "", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] } Perfect for these Documents schema-agnostic JSON store for hierarchical and de-normalized data at scale
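To make "hierarchical and de-normalized" concrete, here is a minimal sketch that loads the SmugMug document from the slide with Python's standard `json` module and walks its nested `offices` array. The helper name `office_cities` is hypothetical and not part of any DocumentDB SDK; the point is only that the offices live inside the company document, so no join is needed.

```python
import json

# Document copied from the slide above.
company_json = """
{
  "name": "SmugMug",
  "permalink": "smugmug",
  "homepage_url": "",
  "blog_url": "",
  "category_code": "photo_video",
  "products": [{"name": "SmugMug", "permalink": "smugmug"}],
  "offices": [{
    "description": "", "address1": "67 E. Evelyn Ave", "address2": "",
    "zip_code": "94041", "city": "Mountain View", "state_code": "CA",
    "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692
  }]
}
"""

company = json.loads(company_json)


def office_cities(doc):
    """Return (company name, city, state) for each nested office.

    Hypothetical helper: a missing "offices" field is treated as an
    empty list, which is how a schema-agnostic reader has to behave.
    """
    return [
        (doc["name"], office["city"], office["state_code"])
        for office in doc.get("offices", [])
    ]


print(office_cities(company))
```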
  7. 7. Elastically Scalable Throughput + Storage
  8. 8. Guaranteed low latency Reads <10ms @ P99 Writes <15ms @ P99
  9. 9. Globally Distributed
  10. 10. Speaks your language
  11. 11. DocumentDB Query Playground Demo Code:
  12. 12. A Primer on Scale...
  13. 13. The 4 Vs of Big Data • Volume: Exceeds physical limits of vertical scalability • Variety: Many different formats making integration expensive • Velocity: Small decision window compared to data change rate • Variability: Many options or variables confounding analysis
  14. 14. The 4 Vs of Big Data Volume Variety Velocity Variability Mobile Apps Retail Learning Telematics IoT Gaming
  15. 15. Let’s talk about scale. Volume and Velocity
  16. 16. Ability to Scale from Day 1 • Bursty • Unpredictable traffic Gaming + Social Experience • Lag-free • Responsive experiences Move fast without breaking things • Iterative development needs More users, more problems
  17. 17. The Walking Dead, results • Game scores, guilds and social membership • Leaderboards by country and social • Guild management and messaging • #1 in Apple app store for free apps • <10ms P99 query latency • >1M game downloads • ~1B requests / day
  18. 18. The right tool for the job? • Caches: scores are continuously updated; write heavy without locality • RDBMS: scale-out requires partitioning; schema and index management • Other NoSQL stores: longer tail on latencies; need to specify secondary indexes for lookups
  19. 19. Fully managed NoSQL database Horizontal scaling for TB and RPS High performance, write optimized Schema agnostic indexing + Azure DocumentDB The answer for low latency @ massive scale
  20. 20. Fact: Managing shards is really painful. Managing shards or partitions Good news: DocumentDB has done all the heavy lifting.
  21. 21. Elastic scale
  22. 22. Measuring Throughput (Request Units) • Request Unit/sec (RU) is the normalized currency across % IOPS, % CPU, and % memory • Each replica gets a fixed budget of request units • Operations consume request units (RUs): READ (GET Document), INSERT (POST Documents), REPLACE (PUT Document), Query (POST Documents), … • Requests get rate limited if they exceed the SLA • Customers pay for reserved request units by the hour • [Diagram labels: Incoming Requests, Replica, Min RU/sec, Max RU/sec, Quiescent, No throttling, Rate limit]
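The request-unit model on this slide can be sketched as a per-second budget that each operation draws from. This is an illustrative simulation only: the per-operation charges in `RU_COST` and the `RuBudget` class are assumptions made up for the example, not DocumentDB's actual charges or API.

```python
from dataclasses import dataclass

# Hypothetical per-operation RU charges, for illustration only.
RU_COST = {"read": 1.0, "insert": 10.0, "replace": 10.0, "query": 5.0}


@dataclass
class RuBudget:
    """Simplified per-second request-unit budget for one replica."""

    ru_per_sec: float
    used: float = 0.0

    def try_consume(self, op: str) -> bool:
        """Charge the operation if budget remains; otherwise rate limit."""
        cost = RU_COST[op]
        if self.used + cost > self.ru_per_sec:
            return False  # request would exceed the reserved budget
        self.used += cost
        return True

    def next_second(self) -> None:
        """Start a new one-second window with a fresh budget."""
        self.used = 0.0


budget = RuBudget(ru_per_sec=25.0)
results = [budget.try_consume(op) for op in ["insert", "insert", "read", "query"]]
print(results)  # the final query does not fit in the 25 RU window
```

In this sketch a rejected request simply returns `False`; a real client would typically back off and retry in a later window.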
  23. 23. Elastic Scale Demo Code:
  24. 24. Configured @10,100 RUs ~940 writes / s ~9800 RUs
  25. 25. Configured @250,000 RUs ~12,100 writes / s ~128,800 RUs VM @ 99% CPU
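The two demo results above can be turned into a rough cost per write with simple arithmetic. The sketch below uses only the figures shown on the slides; the function name `ru_per_write` is a hypothetical helper, and the results are approximations because the slide values are themselves rounded.

```python
def ru_per_write(ru_per_sec: float, writes_per_sec: float) -> float:
    """Average request units consumed by a single write."""
    return ru_per_sec / writes_per_sec


# Figures taken from the two demo slides above.
small = ru_per_write(9_800, 940)        # collection configured at 10,100 RUs
large = ru_per_write(128_800, 12_100)   # collection configured at 250,000 RUs

# Share of the configured throughput that was actually consumed.
small_utilization = 9_800 / 10_100
large_utilization = 128_800 / 250_000

print(f"RU per write: {small:.1f} and {large:.1f}")
print(f"Utilization: {small_utilization:.0%} and {large_utilization:.0%}")
```

Both runs come out at roughly 10 RUs per write. The second run uses only about half of its configured throughput, which is consistent with the slide's note that the client VM was at 99% CPU.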
  26. 26. A Global Distribution Primer…
  27. 27. Globally Distributed • Azure DocumentDB gives you the ability to circumvent the speed of light! • High availability and disaster recovery • Replicate to any number of regions • Global low latency access • Dynamically configure write and read regions
  28. 28. … with well-defined consistency models! Consistency levels, from stronger consistency to faster performance: Strong, Bounded Staleness, Session, Eventual • Total global order: Strong = Yes; Bounded Staleness = Yes (outside of the “staleness window”); Session = No, partial “session” order; Eventual = No • Consistent prefix guarantee: Yes at all four levels • Monotonic reads: Strong = Yes; Bounded Staleness = Yes (within region and across regions outside of the staleness window); Session = Yes (for the given session); Eventual = No • Monotonic writes: Yes at all four levels • Read your writes: Strong = Yes; Bounded Staleness = Yes (in the write region); Session = Yes; Eventual = No
  29. 29. Global Distribution Demo Code:
  30. 30. Common Scenarios
  31. 31. Common scenarios Retail Gaming IoT Social Product Catalog Recommendations Personalization User Store Recommendations Personalization Event Store Device Registry Telemetry Store User Behavior Telemetry Personalization
  32. 32. Common scenarios IoT Event Store Device Registry Telemetry Store IoT / Sensor Data Challenges: • Hardware is relatively hard to update • Different generation of devices => different schemas (variety) • Many sensors emitting telemetry => high rate of ingestion (volume + variety)
  33. 33. IoT: Vehicle Telematics • Top 5 automotive manufacturer in the world • Telematics services include: safety service, diagnostic service, remote service • Ingest and query 100+ TB of semi-structured data
  34. 34. IoT : Vehicle Telematics Ingress API Inbound Interface (Web API) Raw Event Store (HOT) (DocumentDB) Aggregated Event Store (Warm) (DocumentDB) Aggregated Event Store (Cold) (Blob Storage) Outbound Interface (Web API) Message Queue (Event Hubs) Stream Processor (Stream Analytics)
  35. 35. Common scenarios Social + AdTech Challenges: • Ingest + Analyze Third Party Data => Who dictates schema? (variety) => How do you index? • A lot of social and user data => high rate of ingestion (volume + variety) Social User Behavior Telemetry Personalization
  36. 36. • Startup - Advanced Marketing Intelligence Platform • Utilizes deep learning to analyze billions of relational network connections to build a social fingerprint for each user • Extracts knowledge and cultural insights by analyzing what people choose to follow Social Analytics + Ad Technology >1B Social Media Profiles >50M Tweets per Day
  37. 37. • Store tweets, geo-location data, and ML results in DocumentDB • Data from each social media producer has its own schema that evolves independently • Need to iterate rapidly… no time for managing VMs Social Analytics + Ad Technology >1B Social Media Profiles >50M Tweets per Day
  38. 38. Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them. Stephen Hankinson, CTO, Affinio Quote
  39. 39. Geospatial Support including polygons Demo Want to try? Go to DocumentDB Query Playground
  40. 40. Polygon Query Example Polygon of coordinates -124.630000, 48.360000 -123.870000, 46.140000 -122.230000, 45.540000 -119.170000, 45.950000 -116.920000, 45.960000 -116.990000, 49.000000 -123.050000, 49.020000 -123.150000, 48.310000 -124.630000, 48.360000
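To show what a polygon query decides for each document, here is a minimal local sketch of a point-in-polygon test (ray casting) over the exact coordinates on the slide. DocumentDB evaluates this kind of test server-side (its SQL dialect exposes `ST_WITHIN`); the function `point_in_polygon` and the two sample points are assumptions for illustration, with approximate coordinates for Mount Rainier and Portland.

```python
# Polygon ring from the slide as (longitude, latitude) pairs.
# The ring is closed: the last point repeats the first.
POLYGON = [
    (-124.63, 48.36), (-123.87, 46.14), (-122.23, 45.54),
    (-119.17, 45.95), (-116.92, 45.96), (-116.99, 49.00),
    (-123.05, 49.02), (-123.15, 48.31), (-124.63, 48.36),
]


def point_in_polygon(lon: float, lat: float, ring) -> bool:
    """Ray casting: count edge crossings of a ray going east from the point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (x2 - x1) * (lat - y1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Approximate sample locations (assumptions, not from the slides).
mount_rainier = (-121.76, 46.85)
portland = (-122.68, 45.52)

print(point_in_polygon(*mount_rainier, POLYGON))
print(point_in_polygon(*portland, POLYGON))
```

This planar test ignores the curvature of the Earth, so it is only a sketch of the idea rather than a replacement for a geospatial index.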
  41. 41. Finding Volcanos with DocumentDB
  42. 42. Data Sciences: Apache Spark + DocumentDB
  43. 43. Example: Graph Structures
  44. 44. Example: Graph Structures
  45. 45. Classic Graph Scenario: Flights vertex = airports edges = flights
  46. 46. Data Sciences: Apache Spark + DocumentDB Demo Notebook View: pyView: Code:
  47. 47. Graph Calculations: Degrees, PageRank • What is the most important airport (most flights in / out)? tripGraph.inDegrees.sort(desc("inDegree")).limit(10)
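The in-degree ranking on this slide can be illustrated without a Spark cluster. The sketch below counts incoming edges with `collections.Counter` over a small, made-up list of flights; it is a plain-Python stand-in for the idea, not the GraphFrames implementation, and the flight data is hypothetical.

```python
from collections import Counter

# Hypothetical flights (edges) as (origin, destination) airport codes.
# Airports are the vertices of the graph.
flights = [
    ("SEA", "SFO"), ("SEA", "JFK"), ("PDX", "SFO"),
    ("JFK", "SFO"), ("SFO", "SEA"), ("JFK", "SEA"),
]


def in_degrees(edges):
    """Count incoming edges (arriving flights) per vertex (airport)."""
    return Counter(dst for _, dst in edges)


# Equivalent in spirit to sorting by inDegree descending and taking the top N.
top = in_degrees(flights).most_common(2)
print(top)
```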
  48. 48. Advantages Data Science Scenarios • Blazing Fast IoT Scenarios • Updateable columns • Push-down predicate filtering
  49. 49. Advantages Blazing Fast IoT Scenarios Flight information global safety alerts weather Data Science Scenarios Device Notifications Web / REST API
  50. 50. Advantages Updateable Columns Flight information Data Science Scenarios Device Notifications Web / REST API { tripid: “100100”, delay: -5, time: “01:00:01” } { tripid: “100100”, delay: -30, time: “01:00:01” } {delay:-30} {delay:-30} {delay:-30}
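The "updateable columns" idea on this slide, where a trip's delay changes from -5 to -30 and every consumer then sees the new value, can be sketched with an in-memory store keyed by `tripid`. The `upsert` helper and the dictionary store are assumptions for illustration and do not represent the DocumentDB SDK.

```python
# In-memory stand-in for a document collection, keyed by tripid.
store = {}


def upsert(collection, doc, key="tripid"):
    """Insert the document, or merge its fields into the existing one."""
    existing = collection.setdefault(doc[key], {})
    existing.update(doc)
    return existing


# Initial flight record, then a later update that changes only the delay.
upsert(store, {"tripid": "100100", "delay": -5, "time": "01:00:01"})
upsert(store, {"tripid": "100100", "delay": -30})

# Every reader of the store now sees the updated delay.
print(store["100100"])
```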
  51. 51. Advantages: Pushdown Predicate Filtering • Data Science Scenarios • The predicate {city:SEA} is sent to the store • Only matching documents come back: {city:SEA, dst: POR, ...}, {city:SEA, dst: JFK, ...}, {city:SEA, dst: SFO, ...}, {city:SEA, dst: YVR, ...}, {city:SEA, dst: YUL, ...}, ... • [Diagram: index tree with paths such as locations, headquarter, exports, country, and city, and values such as Germany, France, Belgium, Seattle, Paris, Moscow, Athens]
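To illustrate why pushing a predicate down to the store matters, the sketch below compares filtering after transfer with filtering at the source, and counts how many documents cross the wire in each case. The `Source` class, the helper functions, and the sample flights are all hypothetical; a real connector would translate the predicate into a query rather than pass a Python function.

```python
# Hypothetical flight documents held by the data source.
FLIGHTS = [
    {"city": "SEA", "dst": "POR"}, {"city": "SEA", "dst": "JFK"},
    {"city": "SFO", "dst": "SEA"}, {"city": "SEA", "dst": "SFO"},
    {"city": "JFK", "dst": "YVR"},
]


class Source:
    """Toy data source that tracks how many documents it sends out."""

    def __init__(self, docs):
        self.docs = docs
        self.transferred = 0

    def scan(self, predicate=None):
        """Return documents, applying the predicate at the source if given."""
        out = [d for d in self.docs if predicate is None or predicate(d)]
        self.transferred += len(out)
        return out


def client_side(source, city):
    """No pushdown: transfer everything, then filter in the client."""
    return [d for d in source.scan() if d["city"] == city]


def pushed_down(source, city):
    """Pushdown: the source filters, so only matches are transferred."""
    return source.scan(lambda d: d["city"] == city)


without_pushdown, with_pushdown = Source(FLIGHTS), Source(FLIGHTS)
rows_a = client_side(without_pushdown, "SEA")
rows_b = pushed_down(with_pushdown, "SEA")
print(without_pushdown.transferred, with_pushdown.transferred)
```

Both approaches return the same rows; the difference is only in how much data leaves the store.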
  52. 52. References Get direct access to the engineering team -> Resources • Schema Agnostic Indexing with DocumentDB, VLDB 2015 • Consistency Levels in DocumentDB • SQL Queries with DocumentDB • Language Integrated JavaScript queries and transactions with DocumentDB • Distribute your data globally with DocumentDB
  53. 53. More Resources AskDocDB@microsoft Follow @DocumentDB Use #DocumentDB #azure-documentDB