This document summarizes a presentation about DocumentDB on Azure. It discusses what DocumentDB is, how it works as a fully managed NoSQL database, and some key features for developers. DocumentDB allows storing and querying JSON documents, offers tunable consistency levels, and exposes APIs for common languages like .NET, Node.js, and Python. The presentation provides an overview of DocumentDB's capabilities and when it would be a good fit compared to relational databases or other document stores.
1. Cool NoSQL on Azure with DocumentDB
Azure User Group Belgium
2. Who am I
• Jan Hentschel
• Senior Software Development Lead – Ultra Tendency UG
• @Horizon_Net
• http://janatdevelopment.com
• Microsoft MVP for Azure
3. What Microsoft says about DocumentDB
Fully managed, scalable, queryable, schemafree JSON document
service for modern application …
What?
4. What DocumentDB really is
• Fully managed = Work on JSON data without managing VM or cluster
infrastructure
• Scalable = Runs on Azure
• Queryable = JavaScript as a modern T-SQL
• Schemafree = Document Store
5. What you need to know about document stores
• It’s all about collections and documents
• A collection stores a bunch of documents
• Documents are schema-free
• You can put any kind of documents into one collection
• It was never easier to store your cooking recipes together with your financial data
• And don’t try to use your “relational mind”!!!
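To make the collection/document idea concrete, here is a minimal sketch in plain Python. It does not use the DocumentDB SDK: the collection is modelled as a list and each document as a dict, and the names `collection`, `recipe` and `invoice` are made up for illustration.

```python
import json

# A "collection" modelled as a plain list (an assumption for illustration,
# not how DocumentDB stores data). Each document is a JSON-serializable dict.
collection = []

# Two documents with completely different shapes live side by side.
recipe = {"id": "1", "type": "recipe", "title": "Waffles",
          "ingredients": ["flour", "eggs", "milk"]}
invoice = {"id": "2", "type": "invoice", "amount": 129.95, "currency": "EUR"}
collection.extend([recipe, invoice])

# No schema is enforced, so readers have to cope with missing fields.
titles = [doc["title"] for doc in collection if "title" in doc]

print(json.dumps(collection, indent=2))
```

The price of being schema-free shows up in the last step: the reading code, not the store, decides what to do when a field is absent.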
6. What you also need to know
• Transactional support with ACID semantics
• API exposed as REST over HTTP
• All entities uniquely addressable by a logical URI
• Tunable consistency
• Tune and trade off consistency through well defined levels to suit application
scenarios and performance needs
• Consistency level can be weakened per read/query request
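The "weaken per request" rule can be sketched as a small check. This is a toy model in Python, not SDK code: `ConsistencyLevel` and `effective_level` are hypothetical names, and the ordering of the levels is taken from the table on the next slide.

```python
from enum import IntEnum


class ConsistencyLevel(IntEnum):
    # Ordered from weakest to strongest (hypothetical model, not an SDK type).
    EVENTUAL = 0
    SESSION = 1
    BOUNDED = 2
    STRONG = 3


def effective_level(account_default, requested=None):
    """Return the level a single read would run at.

    A request may ask for the default or for something weaker,
    but never for something stronger than the configured default.
    """
    if requested is None:
        return account_default
    if requested > account_default:
        raise ValueError("a request can only weaken the consistency level")
    return requested


# A read that accepts session consistency on a strongly consistent setup.
level = effective_level(ConsistencyLevel.STRONG, ConsistencyLevel.SESSION)
print(level.name)
```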
7. What … tunable consistency?
Level      Writes               Reads
Strong     Sync quorum writes   Quorum reads
Bounded    Async replication    Quorum reads
Session    Async replication    Session-bound replica
Eventual   Async replication    Any replica
8. Some theory … with BASE in mind
• Eventual consistency
• All changes will be propagated at some point in the future
• Quorum
• Response after data is written on floor(replication_factor / 2) + 1 nodes, i.e. a majority
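The quorum formula is easy to check with a few numbers. A minimal Python sketch; `quorum_size` and `write_acknowledged` are illustrative names, and the division is assumed to be integer division.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Integer division: 3 replicas -> 2, 4 replicas -> 3, 5 replicas -> 3.
    return replication_factor // 2 + 1


def write_acknowledged(acks: int, replication_factor: int) -> bool:
    """True once enough replicas have confirmed the write."""
    return acks >= quorum_size(replication_factor)


for n in (3, 4, 5):
    print(n, "replicas need", quorum_size(n), "acknowledgements")
```

Because any two majorities overlap in at least one replica, a quorum read always touches a replica that saw the latest quorum write.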
9. What’s important for a developer?
• JavaScript UDFs, Triggers, Stored Procedures
• Language integrated transactions
• The entire procedure is wrapped in an implicit ACID transaction
• A JavaScript exception aborts the transaction
• “document oriented” SQL grammar
• REST/HTTP APIs and client SDKs
• .NET, Node.js, JavaScript, Python
• C++ and Java planned
• Asynchronous support for all operations
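The "implicit transaction" behaviour can be imitated in a few lines. Real stored procedures are written in JavaScript and run inside the database engine; the Python sketch below only models the commit-or-abort idea, and `run_procedure` and the two transfer functions are hypothetical.

```python
import copy


def run_procedure(collection, procedure):
    """Run `procedure` against a private copy and commit only on success.

    Returns (committed, result). Any exception aborts the whole procedure
    and leaves `collection` exactly as it was.
    """
    working_copy = copy.deepcopy(collection)
    try:
        result = procedure(working_copy)
    except Exception:
        return False, None          # abort: nothing is written back
    collection[:] = working_copy    # commit: all changes become visible at once
    return True, result


docs = [{"id": "a", "balance": 10}]


def transfer_ok(c):
    c[0]["balance"] -= 5
    c.append({"id": "b", "balance": 5})
    return len(c)


def transfer_fails(c):
    c[0]["balance"] -= 100          # this change must never become visible
    raise RuntimeError("insufficient funds")


first_committed, count = run_procedure(docs, transfer_ok)
second_committed, _ = run_procedure(docs, transfer_fails)
print(first_committed, second_committed, docs)
```

The second call changes the working copy before it fails, yet `docs` still holds only the result of the first, successful procedure.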
10. What’s the resource model like?
src: http://azure.microsoft.com/en-us/documentation/articles/documentdb-interactions-with-resources/
12. Current quotas
• # of stored procedures, triggers and UDFs per collection = 25
• # of AND clauses per query = 5
• # of OR clauses per query = 5
• request size of document = 256 KB
• request size of stored procedure, trigger and UDF = 256 KB
• For more see http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/
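A client can check the document size quota before sending a request. A rough Python sketch: the 256 KB figure comes from the slide, but measuring the size as the UTF-8 length of the serialized JSON is an assumption about how the service counts, and the function names are made up.

```python
import json

MAX_DOCUMENT_BYTES = 256 * 1024  # 256 KB, the preview quota quoted above


def document_size_bytes(doc) -> int:
    # Assumption: size is taken as the UTF-8 length of the serialized JSON.
    return len(json.dumps(doc).encode("utf-8"))


def fits_document_quota(doc) -> bool:
    return document_size_bytes(doc) <= MAX_DOCUMENT_BYTES


small = {"id": "1", "title": "Waffles"}
too_big = {"id": "2", "blob": "x" * MAX_DOCUMENT_BYTES}

print(fits_document_quota(small), fits_document_quota(too_big))
```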
13. What’s more important for a C# developer?
• Supports gateway and direct connectivity
• Async APIs for all operations
• HTTP and TCP transports available
• POCOs, inherited document types and dynamics
LINQ!!!
14. The small print
It’s all sandboxed so …
… no imports are allowed
… eval() is disallowed
… execution is time boxed
… resource governed for CPU, IO and memory
16. When should you use DocumentDB
In General
• You don’t want to do replication and scale-out by yourself
• You want to have tunable consistency
• You want to do rapid development
Compared to relational databases
• You don’t want predefined columns
Compared to other document stores
• You want to use a SQL-like grammar
17. Last words
DocumentDB is still in preview …
… expect some things to change
… give feedback
… SDKs open sourced through GitHub
Samples available here!
Watch out for CloudBrew on November 29th
Editor's Notes
And don’t try to use your “relational mind”
Don’t try to put the metaphors you know from the SQL world into a document store
STRONG - all writes are visible to all readers. Writes synchronously committed by a majority quorum of replicas and reads are
acknowledged by the majority read quorum
BOUNDED STALENESS - guaranteed ordering of writes, reads adhere to minimum freshness. Writes are propagated asynchronously,
reads are acknowledged by majority quorum lagging by at most K prefixes
SESSION - read your own writes. Writes are propagated asynchronously while reads for a session are issued against the replica that can
serve the requested version
EVENTUAL - reads eventually converge with writes. Writes are propagated asynchronously while reads can be acknowledged by any
replica. Readers may view older data than previously observed.
Session - ideal consistency and performance tradeoff for many application scenarios. High performance writes and reads with predictable consistency
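The difference between SESSION and EVENTUAL reads can be shown with a toy replica model. This is a Python sketch under simplifying assumptions (one version counter per replica, no networking); `Replica`, `session_read` and `eventual_read` are illustrative names, not part of any SDK.

```python
class Replica:
    """Toy replica: holds data plus the version of the last applied write."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version


def session_read(replicas, session_version, key):
    # Read-your-own-writes: only a replica that has caught up with the
    # session's last write is allowed to answer.
    for replica in replicas:
        if replica.version >= session_version:
            return replica.data.get(key)
    raise LookupError("no replica has caught up with this session yet")


def eventual_read(replicas, key, pick=0):
    # Any replica may answer, even one that has not seen the write yet.
    return replicas[pick].data.get(key)


primary, lagging = Replica(), Replica()
primary.apply(1, "greeting", "hello")   # replication to `lagging` still pending
session_version = 1                     # the session remembers its last write

fresh = session_read([lagging, primary], session_version, "greeting")
stale = eventual_read([lagging, primary], "greeting")
print(fresh, stale)
```

The session read skips the lagging replica and returns the value just written, while the eventual read may land on the replica that has not received it.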
BASE
basically available = a version can be asked for at any time
soft state = version and state of the data is volatile
eventual consistency = all changes will be propagated at some point in the future
Database Account – A database account is associated with one or more capacity units representing provisioned document storage and throughput, a set of databases and blob storage. You can create one or more database accounts using your Azure subscription.
Database - A database is a logical container of document storage partitioned across collections. It is also a container for users.
User - The logical namespace for scoping/partitioning permissions.
Permission - An authorization token associated with a user for authorized access to a specific resource.
Collection - A collection is a container of JSON documents and associated JavaScript application logic.
Stored Procedure - Application logic written in JavaScript which is registered with a collection and transactionally executed within the database engine.
Trigger - Application logic written in JavaScript modeling side effects associated with insert, replace or delete operations.
UDF - Side-effect-free application logic written in JavaScript. UDFs enable you to model a custom query operator and thereby extend the core DocumentDB query language.
Document - User defined (arbitrary) JSON content. By default, no schema needs to be defined or secondary indices need to be provided for all the documents added to a collection.
Attachment - Attachments are special documents containing references and associated metadata to an external blob/media. The developer can choose to have the blob managed by DocumentDB or store it with an external blob service provider such as OneDrive, Dropbox etc.
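Since every resource in this hierarchy is addressable by a logical URI, the nesting can be sketched as a path builder. This is a Python illustration only: the segment names `dbs`, `colls` and `docs` are assumptions here and should be checked against the resource model article, and `resource_path` is a hypothetical helper.

```python
def resource_path(database, collection=None, document=None):
    """Build a logical path that mirrors database > collection > document.

    The segment names below are assumed for illustration.
    """
    if document is not None and collection is None:
        raise ValueError("a document always lives inside a collection")
    parts = ["dbs", database]
    if collection is not None:
        parts += ["colls", collection]
    if document is not None:
        parts += ["docs", document]
    return "/".join(parts)


print(resource_path("shop"))
print(resource_path("shop", "orders"))
print(resource_path("shop", "orders", "42"))
```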