[PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB

Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB
Denny Lee, Program Manager, Azure DocumentDB (@dennylee)
Andrew Liu, Program Manager, Azure DocumentDB (@aliuy8)

Elastically Scalable Throughput + Storage

Guaranteed low latency
Reads <10ms @ P99
Writes <15ms @ P99

{
"name": "SmugMug",
"permalink": "smugmug",
"homepage_url":
"http://www.smugmug.com",
"blog_url":
"http://blogs.smugmug.com/",
"category_code": "photo_video",
"products": [
{
"name": "SmugMug",
"permalink": "smugmug"
}
],
"offices": [
{
"description": "",
"address1": "67 E. Evelyn Ave",
"address2": "",
"zip_code": "94041",
"city": "Mountain View",
"state_code": "CA",
"country_code": "USA",
"latitude": 37.390056,
"longitude": -122.067692
}
]
}
Perfect for these documents: a schema-agnostic JSON store for hierarchical and de-normalized data at scale
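"De-normalized" here means related data is embedded in one document instead of being joined at read time. The sketch below shows the idea by folding hypothetical normalized rows (a company table and an offices table, modeled on the SmugMug example above) into one nested document per company. The row shapes and the `denormalize` helper are illustrative assumptions, not part of any DocumentDB API.

```python
import json

# Hypothetical normalized rows, modeled on the SmugMug document above.
companies = [{"id": 1, "name": "SmugMug", "permalink": "smugmug"}]
offices = [
    {"company_id": 1, "city": "Mountain View", "state_code": "CA",
     "country_code": "USA", "zip_code": "94041"},
]

def denormalize(companies, offices):
    """Embed each company's offices as a nested array: one document per company."""
    docs = []
    for company in companies:
        doc = {k: v for k, v in company.items() if k != "id"}
        doc["offices"] = [
            {k: v for k, v in office.items() if k != "company_id"}
            for office in offices
            if office["company_id"] == company["id"]
        ]
        docs.append(doc)
    return docs

docs = denormalize(companies, offices)
print(json.dumps(docs[0], indent=2))
```

A single read of the resulting document returns the company together with its offices, with no join.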

“If all you have is a hammer, everything looks like a nail”
-Abraham Maslow

Choose the right tools for the right job
[Diagram: Microsoft Data Platform: SQL Server 2016, SQL Database, Azure DocumentDB, Azure Search, Azure HDInsight, Azure Data Lake, Azure DW / APS, Azure Stream Analytics, Azure Data Factory, Azure ML, Azure Data Catalog, Power BI]

3 V’s of data: Endless possibilities
Learning, Gaming, Retail, Telematics, Mobile Apps, IoT

Let’s talk about scale.
Problem 1: Volume and Velocity

The Walking Dead: results
<10ms P99 query latency
>1M game downloads
~1B requests/day

How? Just throw some data in a database!

The answer for low latency @ massive scale

Fact: Managing shards is really painful.
Managing shards or partitions
Good news: DocumentDB has done all the heavy lifting.

Request Unit (RU) is the normalized currency, abstracting % memory, % IOPS, and % CPU
Each replica gets a fixed budget of Request Units
[Diagram: operations on resources (documents, SQL queries, stored procedures with arguments) are charged in Request Units]
Predictable Performance
Request Units
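A fixed RU budget behaves roughly like a per-second allowance: requests are charged in RUs, and once the provisioned rate is exhausted the service throttles further requests (DocumentDB answers with HTTP 429, "request rate too large"). The toy model below is a minimal sketch of that idea only, assuming discrete one-second windows and a hypothetical cost of 10 RU per write; the `RuBudget` class is invented for illustration and is not how the service meters internally.

```python
from dataclasses import dataclass

@dataclass
class RuBudget:
    """Toy model of a fixed Request Unit budget per one-second window (hypothetical)."""
    ru_per_second: float
    window: int = 0       # index of the current one-second window
    used: float = 0.0     # RUs already consumed in that window

    def try_consume(self, second: int, cost: float) -> int:
        """Charge `cost` RUs; return 200 if within budget, 429 (throttled) otherwise."""
        if second != self.window:
            # A new window starts with a fresh budget.
            self.window, self.used = second, 0.0
        if self.used + cost > self.ru_per_second:
            return 429
        self.used += cost
        return 200

budget = RuBudget(ru_per_second=100)
# Hypothetical cost of ~10 RU per write: 11 writes land in the same second.
statuses = [budget.try_consume(0, 10) for _ in range(11)]
print(statuses.count(200), statuses.count(429))  # 10 1
print(budget.try_consume(1, 10))                 # 200 (new window)
```

Because every operation is priced in the same unit, throughput can be reasoned about as "RU/s provisioned divided by RU per operation", whatever the mix of reads, writes, and queries.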

Creating partitioned collections
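A partitioned collection spreads documents across partitions by hashing a partition key chosen when the collection is created; documents sharing a key value always land together. The sketch below illustrates plain hash partitioning under stated assumptions: SHA-256 plus a modulo over a fixed partition count, and a hypothetical `deviceId` partition key. DocumentDB's actual hash function and its range-based partition management (including splits as data grows) differ.

```python
import hashlib
from collections import Counter

def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key value to a partition index by hashing it (illustrative)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Hypothetical documents partitioned on a "deviceId" property.
docs = [{"id": str(i), "deviceId": f"device-{i % 50}"} for i in range(1000)]
placement = Counter(partition_for(d["deviceId"], 4) for d in docs)
print(sorted(placement.items()))
```

The design point the slide makes is that this routing is the service's job: the application supplies the partition key, not the shard map.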

Scale
Demo
Code: https://aka.ms/docdb-benchmark

Configured at 10,100 RU/s
~940 writes/second
Consuming ~9,800 RU/s

Configured at 250,000 RU/s
~12,100 writes/second
Consuming ~128,800 RU/s
VM at 99% CPU
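The two demo runs can be checked with simple arithmetic: dividing consumed RU/s by writes per second gives the cost per write, and dividing consumed by configured throughput gives utilization. The snippet uses only the numbers reported above.

```python
# Numbers reported for the two demo runs above.
runs = {
    "10,100 RU/s": {"configured": 10_100, "consumed": 9_800, "writes_per_sec": 940},
    "250,000 RU/s": {"configured": 250_000, "consumed": 128_800, "writes_per_sec": 12_100},
}

results = {}
for name, r in runs.items():
    ru_per_write = r["consumed"] / r["writes_per_sec"]
    utilization = r["consumed"] / r["configured"]
    results[name] = (ru_per_write, utilization)
    print(f"{name}: ~{ru_per_write:.1f} RU/write, {utilization:.0%} of provisioned throughput")
# 10,100 RU/s: ~10.4 RU/write, 97% of provisioned throughput
# 250,000 RU/s: ~10.6 RU/write, 52% of provisioned throughput
```

The cost per write stays roughly constant (~10.5 RU), while the larger run used only about half its provisioned throughput with the VM at 99% CPU, which suggests the load-generating VM, not the collection, was the limit in that run.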

Globally Distributed
Azure DocumentDB gives you the ability to cheat the speed of light!

… with well-defined consistency models!
Strong, Bounded Staleness, Session, Eventual
Left to right: relaxed consistency => better performance and availability
Consistency Level | Strong | Bounded Staleness | Session | Eventual
Total global order | Yes | Yes, outside of the “staleness window” | No, partial “session” order | No
Consistent prefix guarantee | Yes | Yes | Yes | Yes
Monotonic reads | Yes | Yes, across regions outside of the staleness window and within a region all the time | Yes, for the given session | No
Monotonic writes | Yes | Yes | Yes | Yes
Read your writes | Yes | Yes (in the write region) | Yes | No
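Session consistency gives read-your-writes and monotonic reads within one client session. A common way to get this, and the general idea behind DocumentDB's session tokens, is for the client to remember the highest log sequence number (LSN) it has seen and to read only from a replica that has caught up to it. The sketch below is a simplified, hypothetical model (the `Replica` and `Session` classes are invented here), not the service's replication protocol.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    lsn: int = 0                                  # last log sequence number applied
    data: dict = field(default_factory=dict)

@dataclass
class Session:
    """Toy session: carries the highest LSN it has written or read (its 'token')."""
    token: int = 0

    def write(self, primary: Replica, key, value):
        primary.lsn += 1
        primary.data[key] = value
        self.token = max(self.token, primary.lsn)

    def read(self, replicas, key):
        # Only a replica that has caught up to the session token may serve the read,
        # which yields read-your-writes and monotonic reads for this session.
        for r in replicas:
            if r.lsn >= self.token:
                self.token = max(self.token, r.lsn)
                return r.name, r.data.get(key)
        raise RuntimeError("no replica has caught up to the session token; retry")

primary, secondary = Replica("primary"), Replica("secondary")
s = Session()
s.write(primary, "delay", -30)
# The secondary has not replicated the write yet, so it is skipped.
print(s.read([secondary, primary], "delay"))  # ('primary', -30)
```

Other sessions carry their own tokens, so they may still see older data from lagging replicas; that is the "partial session order" in the table.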
Observed distribution of consistency levels: Bounded Staleness 27%, Eventual 3%, Session 54%, Strong 16%

App-defined regional preferences
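With app-defined regional preferences, the application lists the regions it would like to read from in order, and the client uses the first one currently available. The function below is a minimal sketch of that selection logic under an assumed fallback policy (use any available region if none of the preferred ones is up); the region names and `pick_read_region` are hypothetical, and the SDKs' own failover behavior may differ.

```python
from typing import List, Optional

def pick_read_region(preferred: List[str], available: List[str]) -> Optional[str]:
    """Return the first app-preferred region that is currently available.

    Hypothetical fallback: if no preferred region is available, use the first
    available region, or None if there are none.
    """
    available_set = set(available)
    for region in preferred:
        if region in available_set:
            return region
    return available[0] if available else None

# Hypothetical deployment: the app prefers West US, then East US, then North Europe.
preferred = ["West US", "East US", "North Europe"]
print(pick_read_region(preferred, ["East US", "North Europe", "Southeast Asia"]))  # East US
print(pick_read_region(preferred, ["Southeast Asia"]))                              # Southeast Asia
```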

Global Distribution
Demo
Code: https://aka.ms/docdb-latency-script-nodejs

Let’s talk about schema-freedom.
Problem 2: Variety

Item | Color | Microwave Safe | Liquid Capacity
Geek Mug | Graphite | Yes | 16oz
Coffee Bean Mug | Tan | No | 12oz
Problem 2: Variety

Item | Color | Microwave Safe | Liquid Capacity
Surface Book | Gray | ??? | ???
Variety: Different attributes


Item | Color | Microwave Safe | Liquid Capacity | CPU | Memory | Storage
Geek Mug | Graphite | Yes | 16oz | ??? | ??? | ???
Coffee Bean Mug | Tan | No | 12oz | ??? | ??? | ???
Surface Book | Gray | ??? | ??? | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD
Variety: More columns?

Item | Color | Microwave Safe | Liquid Capacity

Item | CPU | Memory | Storage
Surface Book | 3.4 GHz Intel Skylake Core i7-6600U | 16GB | 1 TB SSD
Variety: More tables?

ProductId | Name
1 | Geek Mug
2 | Coffee Bean Mug
3 | Surface Book
Variety: Master data?
ProductId | Attribute | Value
1 | Microwave Safe | Yes
1 | Liquid Capacity | 16oz
… | … | …
2 | Microwave Safe | No
2 | Liquid Capacity | 12oz
… | … | …
3 | CPU | 3.4 GHz Intel Skylake Core i7-6600U
3 | Memory | 16GB
… | … | …

2.4 GHz Core i5-6300U
3.4 GHz Core i7-6600U
Variety: JSON is beautiful
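The master-data (entity-attribute-value) tables above can be pivoted into one JSON document per product, where each product carries only the attributes it actually has and no `???` placeholders remain. The sketch uses only the rows shown above (the elided `…` rows are left out), and `to_documents` is a hypothetical helper written for this illustration.

```python
import json
from collections import defaultdict

# Rows from the master-data tables above (elided rows omitted).
products = {1: "Geek Mug", 2: "Coffee Bean Mug", 3: "Surface Book"}
attributes = [
    (1, "Microwave Safe", "Yes"), (1, "Liquid Capacity", "16oz"),
    (2, "Microwave Safe", "No"), (2, "Liquid Capacity", "12oz"),
    (3, "CPU", "3.4 GHz Intel Skylake Core i7-6600U"), (3, "Memory", "16GB"),
]

def to_documents(products, attributes):
    """Pivot EAV rows into one JSON-ready dict per product."""
    by_product = defaultdict(dict)
    for product_id, attr, value in attributes:
        by_product[product_id][attr] = value
    return [{"id": str(pid), "name": name, **by_product[pid]}
            for pid, name in products.items()]

docs = to_documents(products, attributes)
for doc in docs:
    print(json.dumps(doc))
```

Each document is self-describing, so a mug and a laptop can live in the same collection without a shared column list.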

Retail
• Product Catalog
• Product Recommendations + Personalization
Gaming
• Multiplayer + Social Gameplay
IoT / Sensor Data
• Telemetry + Event Store
• Device Registry
Social Analytics + Ad Technology
• User behavior telemetry
• 3rd-Party Data from Web Crawlers
Common scenarios

IoT / Sensor Data Challenges:
• Hardware is relatively hard to update
• Different generations of devices
=> different schema
(Variety)
• Lots of sensors emitting telemetry
=> high rate of ingestion
(Volume + Velocity)
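When device generations emit different schemas, a schema-agnostic store can ingest every payload as-is, and readers tolerate the variation instead of requiring a migration. The sketch below assumes two hypothetical device generations (one reporting Fahrenheit, the other Celsius plus an extra field); all field names are invented for illustration.

```python
# Hypothetical payloads from two device generations, stored as-is.
readings = [
    {"deviceId": "truck-01", "gen": 1, "tempF": 98.6},
    {"deviceId": "truck-02", "gen": 2, "tempC": 37.0, "tirePressurePsi": 34},
]

def temperature_c(reading: dict):
    """Read temperature in Celsius regardless of schema generation; None if absent."""
    if "tempC" in reading:
        return reading["tempC"]
    if "tempF" in reading:
        return round((reading["tempF"] - 32) * 5 / 9, 1)
    return None

for r in readings:
    print(r["deviceId"], temperature_c(r))
```

New fields such as `tirePressurePsi` simply appear on newer documents; older devices never need to be updated to match.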

IoT: Vehicle Telematics
[Diagram: Ingress API; hot, warm, and cold data paths]

Common Scenarios
Social Analytics + Ad Technology:
• Ingest + Analyze 3rd-Party Data
=> Who dictates schema? How do you index?
(Variety)
• Lots of social / user profiles
=> high rate of ingestion
(Volume + Velocity)
• User behavior telemetry
• 3rd-Party Data from Web Crawlers
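One answer to "How do you index?" when a third party dictates the schema is to index every path automatically, which is what DocumentDB does by default. The sketch below only illustrates that idea with a simple inverted index from (JSON path, value) to document ids; the path notation, positional array handling, and the `paths`/`build_index` helpers are assumptions of this sketch, and DocumentDB's own index format is different.

```python
from collections import defaultdict

def paths(value, prefix=""):
    """Yield (path, leaf_value) for every leaf of a JSON-like document."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from paths(v, f"{prefix}/{k}")
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from paths(v, f"{prefix}/{i}")
    else:
        yield prefix, value

def build_index(docs):
    """Inverted index (path, value) -> set of document ids, built with no schema."""
    index = defaultdict(set)
    for doc in docs:
        for path, leaf in paths(doc):
            index[(path, leaf)].add(doc["id"])
    return index

# Hypothetical third-party profiles with different shapes.
profiles = [
    {"id": "a", "handle": "@alice", "location": {"city": "Seattle"}},
    {"id": "b", "handle": "@bob", "interests": ["coffee", "hiking"]},
]
index = build_index(profiles)
print(index[("/location/city", "Seattle")])  # {'a'}
```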

>1B Social Media Profiles
>50M Tweets per Day

Before moving to DocumentDB, my developers would
need to come to me to confirm that our Elasticsearch
deployment would support their data or if I would need
to scale things to handle it. DocumentDB removed me
as a bottleneck, which has been great for me and them.
-Stephen Hankinson, CTO, Affinio

Classic Graph Scenario: Flights
vertex = airports
edges = flights

Flight Graph with
Spark and DocumentDB
Notebook
View: https://aka.ms/docdb-spark-graph
Code: https://aka.ms/docdb-spark-graph-code
Demo

Understanding the most important airports (most flights in / out), e.g., the top 10 by inbound flights:
tripGraph.inDegrees.sort(desc("inDegree")).limit(10)
Graph Calculations: Degrees, PageRank
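In the notebook these calculations run on a Spark GraphFrame (`inDegrees` above; GraphFrames also provides PageRank). For readers without a Spark cluster, the sketch below computes the same two measures on a tiny hypothetical flight list in plain Python: degrees by counting edge endpoints, and PageRank by standard power iteration with damping 0.85. The airport edges are invented for illustration.

```python
from collections import Counter

# Hypothetical sample: one (origin, destination) edge per flight.
flights = [("SEA", "SFO"), ("SEA", "JFK"), ("SFO", "SEA"), ("JFK", "SEA"),
           ("PDX", "SEA"), ("SFO", "JFK"), ("YVR", "SEA")]

in_degree = Counter(dst for _, dst in flights)
out_degree = Counter(src for src, _ in flights)
print(in_degree.most_common(3))  # [('SEA', 4), ('JFK', 2), ('SFO', 1)]

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

ranks = pagerank(flights)
print(max(ranks, key=ranks.get))  # SEA
```

Degree only counts flights; PageRank also weighs where flights come from, so a hub fed by other hubs ranks higher.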

• Blazing Fast IoT Scenarios
• Updateable columns
• Push-down predicate filtering
Advantages of DocumentDB in Data Science Scenarios

Advantages: Blazing Fast IoT Scenarios
[Diagram: flight information, global safety alerts, and weather data connected to data science scenarios, device notifications, and Web / REST API]

Advantages: Updateable Columns
[Diagram: flight information flowing to device notifications and Web / REST API]
Before:
{ "tripid": "100100", "delay": -5, "time": "01:00:01" }
After:
{ "tripid": "100100", "delay": -30, "time": "01:00:01" }
Change delivered to each consumer: { "delay": -30 }
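The point of "updateable columns" is that a single field of a stored record (here the trip's `delay`) can change in place, and only the change needs to reach downstream consumers. The sketch below models that with an in-memory store keyed by `tripid`; the `apply_update` helper is hypothetical and does not depict how DocumentDB applies updates on the server.

```python
# Latest state per trip, mirroring the "before" document above.
trips = {"100100": {"tripid": "100100", "delay": -5, "time": "01:00:01"}}

def apply_update(store: dict, tripid: str, changes: dict) -> dict:
    """Merge changed fields into the stored record and return only what changed."""
    doc = store[tripid]
    changed = {k: v for k, v in changes.items() if doc.get(k) != v}
    doc.update(changed)
    return changed

delta = apply_update(trips, "100100", {"delay": -30, "time": "01:00:01"})
print(delta)            # {'delay': -30}
print(trips["100100"])  # {'tripid': '100100', 'delay': -30, 'time': '01:00:01'}
```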

Advantages: Pushdown Predicate Filtering
Filter: {city: SEA}
[Diagram: nested JSON documents (locations, headquarter, and exports arrays with country and city fields)]
{city:SEA, dst: POR, ...},
{city:SEA, dst: JFK, ...},
{city:SEA, dst: SFO, ...},
{city:SEA, dst: YVR, ...},
{city:SEA, dst: YUL, ...},
...
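Predicate pushdown means the filter (here city = SEA, which in DocumentDB SQL would read SELECT * FROM c WHERE c.city = 'SEA') runs at the data source, so only matching documents travel to the client or Spark workers. The toy `Store` below is a hypothetical stand-in for a data source that counts how many documents it ships, comparing client-side filtering with pushed-down filtering on invented route documents.

```python
# Hypothetical route documents.
routes = [
    {"city": "SEA", "dst": "POR"}, {"city": "SEA", "dst": "JFK"},
    {"city": "SFO", "dst": "SEA"}, {"city": "SEA", "dst": "SFO"},
    {"city": "LAX", "dst": "SEA"},
]

class Store:
    """Toy data source that counts how many documents it ships to the caller."""
    def __init__(self, docs):
        self.docs = docs
        self.shipped = 0

    def scan(self, predicate=None):
        for d in self.docs:
            if predicate is None or predicate(d):
                self.shipped += 1
                yield d

# Without pushdown: ship everything, filter on the client.
no_pushdown = Store(routes)
client_side = [d for d in no_pushdown.scan() if d["city"] == "SEA"]

# With pushdown: the source applies the predicate before shipping.
pushdown = Store(routes)
server_side = list(pushdown.scan(lambda d: d["city"] == "SEA"))

print(no_pushdown.shipped, pushdown.shipped)  # 5 3
```

Both paths return the same three documents; the difference is how much data crosses the wire, which grows with collection size.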

More Resources / Coming Soon
Want to know more about Spark-to-DocumentDB
Connector?
Have any other questions?

Session Evaluations
Your feedback is important and valuable.
3 ways to access:
• Go to passSummit.com
• Download the GuideBook App and search: PASS Summit 2016
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
Submit by 5pm Friday November 6th to WIN prizes

Thank You
Learn more about Azure DocumentDB: email askdocdb@microsoft.com or follow @DocumentDB
