DotNetLombardia
Milano Fiori, Italy
 www.slideshare.net/marco.parenzan
 www.github.com/marcoparenzan
 marco [dot] parenzan [at] 1nn0va [dot] it
 www.1nnova.it
 @marco_parenzan
Training, outreach, and consulting with 1nn0va
Microsoft MVP 2015 for Microsoft Azure
Cloud Architect, .NET developer
Loves Functional Programming, HTML5 Game Programming and Internet of Things
AZURE COMMUNITY BOOTCAMP 2015
IoT Day - 08/05/2015
@1nn0va
#microservicesconf2015
9 May 2015
Classic MVC
Business Logic
Contract BL/P
View
Controller
CQRS for IoT (Service Bus Powered)
Event Handler
UI
Event
Command Handler
Event
Device
Queue
Topics/Subscription
Event Hub
Write
Model
Read
/Search
Model
The traditional world
http://azure.microsoft.com/en-us/documentation/infographics/cloud-design-patterns/
IoT day 2015
Business, no longer data, is the foundation of software design
DDD!=OOP
Don’t start from Data
Data are not unique
No more ACID… ACID transactions are not useful with a model distributed over different storage systems
Key/Value
Table
Blob
Queue
Graph
Document
Try to treat your entities as self-contained documents represented in JSON.
When working with relational databases, we've been taught for years to normalize, normalize, normalize.
Embed related data in the same document when:
There are "contains" relationships between entities.
There are one-to-few relationships between entities.
There is embedded data that changes infrequently.
There is embedded data that won't grow without bound.
There is embedded data that is integral to data in a document.
Embedding typically provides better read performance.
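A minimal sketch of a self-contained document, in plain Python with no DocumentDB client involved. The order entity and its field names are hypothetical, chosen only to show data that is integral, one-to-few, and bounded living inside the parent document so that a single read is enough.

```python
import json

# Hypothetical order entity; all field names are illustrative assumptions.
order = {
    "id": "order-1",
    # Integral data that changes infrequently: embedded.
    "customer": {"name": "Contoso", "city": "Milano"},
    # One-to-few, bounded list: embedded.
    "lines": [
        {"sku": "sensor-a", "qty": 2},
        {"sku": "gateway-b", "qty": 1},
    ],
}

def total_quantity(doc):
    """Everything needed is inside the one document, so one read suffices."""
    return sum(line["qty"] for line in doc["lines"])

# Round-trip through JSON to show the entity is a plain JSON document.
serialized = json.dumps(order)
restored = json.loads(serialized)
print(total_quantity(restored))
```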
Reference related documents (normalize) when:
Representing one-to-many relationships.
Representing many-to-many relationships.
Related data changes frequently.
Referenced data could be unbounded.
Referencing provides more flexibility than embedding, at the cost of more round trips to read data.
Normalizing typically provides better write performance.
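The trade-off of referencing can be sketched with an in-memory dictionary standing in for a document store. The documents, ids, and the `read` helper are hypothetical; the point is only that following references costs extra reads, while a change to a referenced document is a single write.

```python
# In-memory stand-in for a document store, keyed by id (hypothetical data).
store = {
    "order-1": {"id": "order-1", "product_ids": ["p1", "p2"]},
    "p1": {"id": "p1", "name": "Sensor", "price": 10.0},
    "p2": {"id": "p2", "name": "Gateway", "price": 50.0},
}

reads = 0

def read(doc_id):
    """Simulate one round trip to the store."""
    global reads
    reads += 1
    return store[doc_id]

def load_order_with_products(order_id):
    """One read for the order plus one read per referenced product."""
    order = read(order_id)
    products = [read(pid) for pid in order["product_ids"]]
    return order, products

order, products = load_order_with_products("order-1")

# A price change touches exactly one document, however many orders reference it.
store["p1"]["price"] = 12.0
```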
Promotes code-first development (mapping objects to JSON)
Resilient to iterative schema changes
Richer query and indexing (compared to KV stores)
Low impedance as object / JSON store; no ORM required
It just works
It’s fast
A collection is a container of JSON documents and the associated JavaScript application logic.
JSON docs inside a collection can vary dramatically.
A unit of scale for transaction and query throughput (capacity units allocated uniformly across all collections)
A unit of scale for capacity
A unit of replication
Collections in DocumentDB are not just logical containers, but also physical containers.
They are the transaction boundary for stored procedures and triggers.
They are the entry point for queries and CRUD operations.
Each collection is assigned a reserved amount of throughput, which is not shared with other collections in the same account.
Collections do not enforce schema.
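As a rough illustration of a schema-free collection, the sketch below keeps documents of different shapes in one Python list and filters them by field. The document shapes and the `query` helper are assumptions made for this example and are not the DocumentDB query API.

```python
# A toy "collection": documents of different shapes stored side by side.
collection = [
    {"id": "1", "type": "device", "name": "Thermostat"},
    {"id": "2", "type": "reading", "device_id": "1", "value": 20.5, "unit": "C"},
    {"id": "3", "type": "alert", "device_id": "1", "message": "High temperature"},
]

def query(docs, **criteria):
    """Return documents matching all criteria; a missing field never matches."""
    return [
        d for d in docs
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]

readings = query(collection, type="reading")
for_device = query(collection, device_id="1")
```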
Partitioning
In hash partitioning, partitions are assigned based on the value
of a hash function, allowing you to evenly distribute requests
and data across a number of partitions. This is commonly used
to partition data produced or consumed from a large number
of distinct clients, and is useful for storing user profiles, catalog
items, and IoT ("Internet of Things") telemetry data.
Evenly distribute across n number of partitions (algorithmic) ….
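A minimal sketch of hash partitioning, assuming a SHA-256 digest reduced modulo the partition count. The function name and the device-id keys are hypothetical, and this is not how any particular DocumentDB SDK resolves partitions. Note that with plain modulo hashing, changing the partition count remaps most keys.

```python
import hashlib

def hash_partition(key: str, partition_count: int) -> int:
    """Map a partition key to a partition index in [0, partition_count)."""
    # A stable hash is used (not Python's built-in hash(), which varies per process).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count

# Telemetry from many distinct devices spreads across 4 partitions.
counts = [0] * 4
for i in range(1000):
    counts[hash_partition(f"device-{i}", 4)] += 1
print(counts)
```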
In range partitioning, partitions are assigned based on whether the partition key is within a certain range.
This is commonly used for partitioning with timestamp properties.
Keep current data hot, warm historical data, scale down older data, purge / archive.
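Range partitioning on a timestamp can be sketched with a sorted list of boundaries and a binary search. The boundary dates and partition names below are made up for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical boundaries: each date is the first day of the next partition.
BOUNDARIES = [date(2015, 1, 1), date(2015, 4, 1)]
PARTITIONS = ["archive", "warm-2015-q1", "hot-current"]

def range_partition(day: date) -> str:
    """Pick the partition whose date range contains the given day."""
    return PARTITIONS[bisect_right(BOUNDARIES, day)]

print(range_partition(date(2015, 5, 8)))
```

Moving older data to cheaper storage then amounts to adding a boundary and a partition, without touching the keys already routed to the current one.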
In lookup partitioning, partitions are assigned based on a lookup map that assigns discrete partition values to specific partitions (a.k.a. a partition map or shard map).
This is commonly used for partitioning by region
Home tenant / user to a specific partition. Use "master" lookup.
Cache this shard map to avoid making the lookup the bottleneck
Tenant          Partition Id
Customer        1
Big Customer    2
Another         3
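A minimal sketch of a cached shard map, using the tenant values listed above. The `ShardMapCache` class and its counter are hypothetical; the dictionary stands in for the "master" lookup, and the counter only shows that repeated requests do not hit it again.

```python
# Stand-in for the "master" lookup (tenant -> partition id).
MASTER_SHARD_MAP = {"Customer": 1, "Big Customer": 2, "Another": 3}

class ShardMapCache:
    """Resolve a tenant's partition, asking the master map only on a cache miss."""

    def __init__(self, master):
        self._master = master
        self._cache = {}
        self.master_lookups = 0  # counts simulated round trips to the master

    def partition_for(self, tenant):
        if tenant not in self._cache:
            self.master_lookups += 1
            self._cache[tenant] = self._master[tenant]
        return self._cache[tenant]

cache = ShardMapCache(MASTER_SHARD_MAP)
first = cache.partition_for("Big Customer")
second = cache.partition_for("Big Customer")
```

A real cache would also need an invalidation strategy for when a tenant is moved to another partition, which this sketch leaves out.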
Consistency
Query / transaction throughput (and reliability – i.e., hardware
failure) depend on replication!
All writes to the primary are replicated across two secondary replicas
All reads are distributed across three copies
“Scalability of throughput” – allowing different clients to read from different replicas
helps prevent bottlenecks
BUT replication takes time!
Potential scenario: some clients are
reading while another is writing
Now, the data is out-of-date, inconsistent!
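The scenario above can be reproduced with a toy model: one primary, two secondaries, and replication that is applied only when `replicate()` is called. The class and its methods are assumptions for illustration and do not reflect DocumentDB internals.

```python
class ReplicatedStore:
    """Toy model: writes go to the primary and reach secondaries later."""

    def __init__(self):
        self.primary = {}
        self.secondaries = [{}, {}]
        self._pending = []  # writes not yet copied to the secondaries

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        """Apply pending writes to every secondary, in the order received."""
        for key, value in self._pending:
            for replica in self.secondaries:
                replica[key] = value
        self._pending.clear()

    def read(self, key, replica_index):
        return self.secondaries[replica_index].get(key)

store = ReplicatedStore()
store.write("temp", 20)
store.replicate()
store.write("temp", 25)        # another client writes...
stale = store.read("temp", 0)  # ...while this client reads a secondary
store.replicate()
fresh = store.read("temp", 0)
```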
Trade-off: speed (performance & availability) or consistency
(data correctness)?
“Does every read need the MOST current data?”
“Or do I need every request to be handled and handled quickly?”
No “one size fits all” answer … so it’s up to you!
4 options …
For the entire Db…
…In a future release, we intend to support overriding the default consistency level
on a per collection basis.
Strong: the client always sees completely consistent data
Slowest reads / writes
Mission critical: e.g. stock market, banking, airline reservation
Session (default): an even trade-off between performance & availability vs. data correctness
The client reads its own writes, but other clients reading this same data might see older values
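One way to picture "reads its own writes" is a client that remembers the version of its last write and only reads from a replica that has caught up to it. This is a simplified sketch under that assumption; the classes and the `token` field are hypothetical and not the actual DocumentDB session mechanism.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

def replicate(primary, secondary):
    """Copy the primary's state to the secondary."""
    secondary.data = dict(primary.data)
    secondary.version = primary.version

class SessionClient:
    """Toy client that tracks the version of its own last write."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.token = 0  # version this client must observe at minimum

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        self.token = self.primary.version

    def read(self, key):
        # Use the secondary only if it has caught up with this client's writes.
        caught_up = self.secondary.version >= self.token
        source = self.secondary if caught_up else self.primary
        return source.data.get(key)

primary, secondary = Replica(), Replica()
writer = SessionClient(primary, secondary)
other = SessionClient(primary, secondary)

writer.write("temp", 20)
replicate(primary, secondary)
writer.write("temp", 25)          # not yet replicated

own_view = writer.read("temp")    # the writer sees its own write
other_view = other.read("temp")   # another client may see the older value
```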
Bounded staleness: the client might see old data, but it can specify a limit for how old that data can be (e.g. 2 seconds)
Updates happen in the order received
Similar to Session consistency, but speeds up reads while still preserving the order of updates
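A staleness limit can be sketched as a check on how long ago a replica was last synchronized: within the limit the replica may serve the read, otherwise the read falls back to the primary. The function, the logical clock values, and the use of "time since last sync" as a conservative measure of staleness are all assumptions of this toy model.

```python
class TimedReplica:
    def __init__(self, data, last_sync):
        self.data = data
        self.last_sync = last_sync  # logical time of the last replication

def bounded_read(key, primary_data, replica, now, max_staleness):
    """Serve from the replica only if it was synced within max_staleness."""
    if now - replica.last_sync <= max_staleness:
        return replica.data.get(key)
    return primary_data.get(key)

primary_data = {"temp": 25}
replica = TimedReplica({"temp": 20}, last_sync=10.0)

within_bound = bounded_read("temp", primary_data, replica, now=11.0, max_staleness=2.0)
beyond_bound = bounded_read("temp", primary_data, replica, now=13.0, max_staleness=2.0)
```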
Eventual: the client might see old data for as long as it takes a write to propagate to all replicas
High performance & availability, but a client might sometimes read out-of-date information or see updates out of order
At the database level (see preview portal)
On a per-read or per-query basis (optional parameter on
CreateDocumentQuery method)
Use Weaker Consistency Levels for better Read latencies
IoT
Data Analysis
http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
https://github.com/marcoparenzan/CSharpDay2015
Azure Document Db
