DocumentDB 
Managed NoSQL database 
RADU PINTILIE 
LIVIU MAZILU
DocumentDB October 25, 2014 
Previous subjects 
CODECAMP 
Challenges in distributed applications 
SQL Azure Federation 
HDInsight 
© EXPERT NETWORK
Agenda 
DocumentDB 
The need for storage 
DocumentDB Overview 
Development 
Case Scenarios 
The need for storage 
Why do we store data? 
How do we store it? 
What’s important? 
What are the options 
Flat files 
Relational 
Non-relational 
Key-value 
Tabular 
Document 
What’s important 
CAP theorem 
Consistency – each unit always has the same view of the 
data 
Availability – all units can always read or write 
Partition tolerance – system works well across physical 
network partitions 
Plot twist : you can choose only two 
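As a toy illustration of the trade-off (not a model of any real database), the sketch below uses a hypothetical two-node `Cluster` class. During a partition, a CP-style cluster refuses writes, while an AP-style cluster accepts them and lets the nodes diverge. All names here are made up for the example.

```python
class Cluster:
    """Two-node toy cluster; `mode` is "CP" or "AP" (hypothetical names)."""

    def __init__(self, mode):
        self.mode = mode
        self.x = {}  # data held by node X
        self.y = {}  # data held by node Y
        self.partitioned = False

    def write(self, node, key, value):
        """Write through node "x" or "y"."""
        local = self.x if node == "x" else self.y
        if not self.partitioned:
            # Healthy network: both nodes receive the update.
            self.x[key] = value
            self.y[key] = value
        elif self.mode == "CP":
            # Stay consistent by giving up availability.
            raise RuntimeError("cluster unavailable during partition")
        else:
            # Stay available by giving up consistency: only one node updates.
            local[key] = value


cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write("x", "k", 1)
    cp_accepted = True
except RuntimeError:
    cp_accepted = False

ap = Cluster("AP")
ap.partitioned = True
ap.write("x", "k", 1)
print(cp_accepted, ap.x, ap.y)
```

The CP cluster rejects the write, and the AP cluster ends up with node X holding the value while node Y does not.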
Consistent, Available (CA) Systems 
CA systems have trouble with partitions and typically deal 
with them through replication. Examples of CA systems include: 
Traditional RDBMSs like Postgres, MySQL, etc (relational) 
Vertica (column-oriented) 
Aster Data (relational) 
Greenplum (relational) 
Consistent, Partition-Tolerant (CP) Systems 
CP Systems have trouble with availability while keeping 
data consistent across partitioned nodes. Examples of CP 
systems include: 
BigTable (column-oriented/tabular) 
Hypertable (column-oriented/tabular) 
HBase (column-oriented/tabular) 
MongoDB (document-oriented) 
Terrastore (document-oriented) 
Redis (key-value) 
Scalaris (key-value) 
MemcacheDB (key-value) 
Berkeley DB (key-value) 
Available, Partition-Tolerant (AP) Systems 
AP Systems achieve "eventual consistency" through 
replication and verification. Examples of AP systems 
include: 
Dynamo (key-value) 
Voldemort (key-value) 
Tokyo Cabinet (key-value) 
KAI (key-value) 
Cassandra (column-oriented/tabular) 
CouchDB (document-oriented) 
SimpleDB (document-oriented) 
Riak (document-oriented) 
Features 
DocumentDB 
Fully managed 
Schema-less, NoSQL document database 
Stored entities are JSON documents 
Tunable consistency 
Designed to scale into petabytes 
Databases in Azure 
Relational 
SQL Database (PaaS) 
SQL Server (IaaS) 
NoSQL 
Azure Tables – structured, non-relational data 
DocumentDB – document database 
Resource Model 
Database Account 
Database 
Collection 
Document 
Attachment 
Stored Procedure 
Trigger 
User-defined functions 
User 
Permission 
Media
Resource Addressing 
Interface is RESTful 
Each resource has a unique ID 
API URL : 
codecamp.documents.azure.com 
Document path : 
/dbs/{database id}/colls/{collection id}/docs/{document id} 
Example URL : 
dbs/Cv8kAA==/colls/Cv8kAMUKpAA=/docs/Cv8kAMUKpAACAAAAAAAAAA==/ 
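A minimal sketch of how the path pattern above is assembled. The function names and the placeholder IDs are hypothetical, and the `https://` scheme is an assumption (the slide only gives the host name).

```python
def document_path(database_id, collection_id, document_id):
    """Build the document path from the pattern shown on the slide."""
    return f"/dbs/{database_id}/colls/{collection_id}/docs/{document_id}"


def document_url(host, database_id, collection_id, document_id):
    """Prefix the path with the account host (scheme assumed to be https)."""
    return "https://" + host + document_path(database_id, collection_id, document_id)


# Placeholder IDs for illustration; real IDs look like the example on the slide.
path = document_path("DB", "COLL", "DOC")
url = document_url("codecamp.documents.azure.com", "DB", "COLL", "DOC")
print(path)
print(url)
```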
Operations 
For each resource 
Create 
Replace 
Delete 
Read 
Query 
Read – GET Operation on a specified ID, returns a single 
resource. 
Query – POST Operation on a collection with a request 
containing DocumentDB SQL text, returning a set of matching resources 
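The difference between Read and Query can be sketched as two request descriptions. This only builds the method, path and body; nothing is sent. The `Request` type and function names are hypothetical, the query is assumed to be posted to the collection's `docs` path, and authentication and DocumentDB-specific headers are deliberately left out.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """Hypothetical description of an HTTP request (nothing is sent)."""
    method: str
    path: str
    body: Optional[str]


def read_document(db, coll, doc):
    """Read: GET on one specific resource ID, returns a single resource."""
    return Request("GET", f"/dbs/{db}/colls/{coll}/docs/{doc}", None)


def query_documents(db, coll, sql):
    """Query: POST on the collection's documents, carrying the SQL text."""
    return Request("POST", f"/dbs/{db}/colls/{coll}/docs", sql)


read = read_document("DB", "COLL", "DOC")
query = query_documents("DB", "COLL", 'SELECT * FROM c WHERE c.id = "DOC"')
print(read)
print(query)
```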
DocumentDB SQL 
SELECT <select-list> 
FROM <from-specification> 
WHERE <filter-condition> 
Similar to normal SQL 
Ability to reach into JSON tree to: 
Access values for filter condition 
Shape select list 
User-defined functions 
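To make "reaching into the JSON tree" concrete, here is a sketch with made-up sample documents. The query string shows the SQL form; the Python function mimics, roughly, what the filter and the select list do on in-memory dictionaries. It is an illustration of the semantics, not a query engine.

```python
# Hypothetical sample documents with a nested "address" object.
families = [
    {"id": "AndersenFamily", "address": {"state": "WA", "city": "Seattle"}},
    {"id": "WakefieldFamily", "address": {"state": "NY", "city": "NYC"}},
]

# The filter and the select list both navigate into the nested object.
QUERY = 'SELECT f.id, f.address.city FROM Families f WHERE f.address.state = "NY"'


def run_equivalent(documents):
    """Rough in-memory equivalent of QUERY: filter, then shape the result."""
    return [
        {"id": d["id"], "city": d["address"]["city"]}
        for d in documents
        if d.get("address", {}).get("state") == "NY"
    ]


result = run_equivalent(families)
print(result)
```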
Consistency Levels 
Strong - the operation will not return until the write has been made durable; reads always see the latest acknowledged write 
Bounded Staleness - guarantees the order of propagation of writes but with 
reads potentially lagging behind the writes - useful for applications dealing 
with time and ordered operations 
Session - strong consistency scoped to a single client session. This consistency 
level is usually sufficient 
Eventual - the weakest form of consistency: a client may read values that are 
older than ones it has already seen, although replicas converge over time. Lowest 
latency for reads and writes 
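The levels can be compared with a toy model: one primary copy and one replica that lags until `sync()` is called. This is a simplified sketch for intuition only, not how DocumentDB implements consistency; the class and method names are hypothetical, and bounded staleness is left out.

```python
class ToyReplicaSet:
    """One primary and one lagging replica; values carry a version number."""

    def __init__(self):
        self.primary = {}   # key -> (version, value)
        self.replica = {}   # stale copy until sync() is called
        self.version = 0

    def write(self, key, value):
        """Apply a write on the primary and return a session token."""
        self.version += 1
        self.primary[key] = (self.version, value)
        return self.version

    def sync(self):
        """Propagate all writes to the replica."""
        self.replica = dict(self.primary)

    def read_strong(self, key):
        """Always reflects the latest acknowledged write."""
        return self.primary.get(key, (0, None))[1]

    def read_eventual(self, key):
        """May return a stale value from the lagging replica."""
        return self.replica.get(key, (0, None))[1]

    def read_session(self, key, token):
        """Read-your-writes: use the replica only if it has caught up."""
        version, value = self.replica.get(key, (0, None))
        if version >= token:
            return value
        return self.primary.get(key, (0, None))[1]


store = ToyReplicaSet()
store.write("a", 1)
store.sync()
token = store.write("a", 2)   # the replica has not seen this write yet
print(store.read_strong("a"), store.read_eventual("a"), store.read_session("a", token))
```

Before the replica syncs, the eventual read still returns the old value, while the strong and session reads return the new one.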
Indexing Policy 
Specified at the collection level 
Automatic indexing 
By default all properties are indexed automatically. This is tunable for individual 
documents and paths within a document – either inclusion or exclusion of a 
path 
Index precision can be specified for strings and numbers 
Indexing mode 
Consistent (default) – indexes are updated synchronously on insert, replace or 
delete 
Lazy – asynchronous index update (targeted at bulk ingestion) 
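A sketch of what excluding a path means in practice: the document is flattened into property paths, and anything under an excluded path is skipped. The path notation and the function names are hypothetical simplifications, not the actual indexing policy format.

```python
def flatten(doc, prefix=""):
    """Yield (path, value) for every leaf property of a nested dict."""
    for key, value in doc.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def indexed_paths(doc, excluded=()):
    """Paths that would be indexed: everything except excluded subtrees."""
    return sorted(
        path
        for path, _ in flatten(doc)
        if not any(path == e or path.startswith(e + "/") for e in excluded)
    )


doc = {"id": "1", "name": "Ana", "payload": {"raw": "...", "size": 3}}
print(indexed_paths(doc))
print(indexed_paths(doc, excluded=["/payload"]))
```

Excluding paths that are never searched reduces the indexing work done on every create or replace.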
Performance 
Capacity Unit 
Specified amount of storage capacity and operational throughput 
Collection quota per capacity unit 
Provisioning unit for scaleout for both performance and storage 
Configured at the database account level 
Preview limit is 10GB, 3 collections per capacity unit 
Storage is SSD backed 
Microsoft has used databases with terabytes of storage 
(designed for petabytes) 
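A back-of-the-envelope sizing sketch based on the preview limits quoted above (10 GB and 3 collections per capacity unit). It assumes both quotas scale linearly with the number of capacity units, which the slide implies but does not state; the function name is hypothetical.

```python
import math


def capacity_units_needed(storage_gb, collections,
                          gb_per_unit=10, collections_per_unit=3):
    """Smallest number of capacity units covering both quotas (assumed linear)."""
    by_storage = math.ceil(storage_gb / gb_per_unit)
    by_collections = math.ceil(collections / collections_per_unit)
    return max(1, by_storage, by_collections)


print(capacity_units_needed(25, 4))
print(capacity_units_needed(5, 7))
```

In the first call storage is the limiting factor; in the second it is the number of collections.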
Stored Procedures, Triggers and UDFs 
DocumentDB supports server-side JavaScript 
Stored Procedures: 
Registered at collection level 
Operate on any document in the collection 
Invoked inside transaction 
Triggers: 
Pre- or Post: create, replace or delete operations 
Invoked inside transaction 
User-Defined Functions 
Scalar functions invoked only inside queries 
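"Invoked inside transaction" means all-or-nothing: if the procedure fails, none of its changes are kept. Real stored procedures are JavaScript running on the server; the Python sketch below only imitates the commit-or-rollback behaviour on an in-memory dictionary, with hypothetical names throughout.

```python
import copy


def run_in_transaction(collection, procedure):
    """Run `procedure` on a copy; keep the changes only if it succeeds."""
    working = copy.deepcopy(collection)
    result = procedure(working)      # an exception here leaves `collection` untouched
    collection.clear()
    collection.update(working)       # commit
    return result


def transfer(amount):
    """Build a procedure that moves `amount` from document "a" to document "b"."""
    def procedure(docs):
        docs["a"]["balance"] -= amount
        if docs["a"]["balance"] < 0:
            raise ValueError("insufficient funds")
        docs["b"]["balance"] += amount
        return docs["b"]["balance"]
    return procedure


accounts = {"a": {"balance": 10}, "b": {"balance": 0}}
run_in_transaction(accounts, transfer(4))        # succeeds and commits
try:
    run_in_transaction(accounts, transfer(100))  # fails, nothing is applied
except ValueError:
    pass
print(accounts)
```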
Libraries 
.NET API 
Node.js 
JavaScript client 
JavaScript server 
Python 
RESTful API 
Core interface to DocumentDB 
Used by all client libraries 
Standard HTTP operations against all DocumentDB resources: 
POST (create, query), GET (read), PUT (replace), DELETE (delete) 
Returns permanent resource URL on creation 
DocumentDB request headers 
DEMO 
USE CASE 
SCENARIOS 
Good for unstructured data 
Denormalized schema 
Need to scale 
Hybrid solutions (RDBMS + NoSQL) 
Conclusions 
The need for storage 
DocumentDB Overview 
Development 
Case Scenarios 
Questions 
? 
Feedback 
Please complete the feedback forms 
THANK YOU


Editor's Notes

  • #3 Preparation: speaker introductions. Agenda: we will discuss the DocumentDB service.
  • #4 Objective: We support CodeCamp, and this is our fourth time taking part in the event. Previous topics were: challenges in building distributed applications, SQL Azure Federation (horizontal partitioning of tables), and HDInsight (big data processing). We like to try out new Microsoft technologies. Transition: DocumentDB is in preview and we wanted to test it. Next, we look at the agenda of the presentation.
  • #5 Agenda overview.
  • #6 Here we ask the audience to answer the questions. What’s important: development speed, scalability, performance, cost, uptime.
  • #7 We discuss data storage: relational storage vs NoSQL. Relational storage: vertical and horizontal partitioning, fixed schema, need for joins. NoSQL: many types. Document databases: MongoDB, CouchDB, RavenDB. What they have in common: access over HTTP, JSON storage, multiple APIs. Concerns: simplicity, speed, scalability.
  • #8 CAP theorem: in a distributed system it is impossible to provide all three attributes. In order to get both availability and partition tolerance, you have to give up consistency. Consider two nodes, X and Y. If network communication between X and Y breaks, they can't sync updates. At this point you can either: A) allow the nodes to get out of sync (giving up consistency), or B) consider the cluster to be "down" (giving up availability).
  • #9 CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same, but if you ever develop a partition between nodes, the data will be out of sync (and won't re-sync once the partition is resolved).
  • #10 CP - data is consistent between all nodes, and maintains partition tolerance (preventing data desync) by becoming unavailable when a node goes down.
  • #11 AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same view on data (either during or after the partition). The CAP theorem is not so black and white. Where’s DocumentDB?
  • #12 Fully managed: the hardware and software infrastructure is provided as a service. A database that stores documents; the schema is not fixed. Stored entities are JSON documents. Tunable consistency: depending on the scenario we can choose between consistency and availability. Designed to scale.
  • #13 SQL Database: Platform as a Service (hardware, networking, software). SQL Server: Infrastructure as a Service (hardware). Azure Tables: structured, non-relational data. DocumentDB: NoSQL document database.
  • #14 Describe the resource model.
  • #15 How do we address resources? Through the RESTful interface.
  • #16 Describe the operations.
  • #17 Syntax similar to ordinary SQL. We can navigate the JSON tree for filtering and selection. User-defined functions are available.
  • #18 Strong: always guaranteed to read the latest acknowledged write. Bounded Staleness: useful for applications dealing with time and ordered operations. Session: the default one. Eventual: lowest latency for reads and writes.
  • #19 Indexes: hash indexes by default. An indexed document is found by queries; a non-indexed document is not found by queries, but can still be read by its resource ID. Use case: disable indexing on paths you won't search on, for faster creates. Lazy mode: faster bulk inserts, but reads can become inconsistent.
  • #20 The capacity unit concept: a well-defined amount of storage capacity and compute power is provided. There is a maximum number of collections per capacity unit.
  • #22 There are a number of libraries on top of the REST API for managing data.
  • #23 REST API: the interface to DocumentDB. Headers: used to configure the request.
  • #24 Demo in Fiddler: create a resource, simple query by ID, query with WHERE, UDF, stored procedure.
  • #25 -NoSQL is typically good for unstructured/"schemaless" data - usually, you don't need to explicitly define your schema up front and can just include new fields without any ceremony -NoSQL typically favours a denormalised schema due to no support for JOINs per the RDBMS world. So you would usually have a flattened, denormalized representation of your data. -It's often very easy to scale out NoSQL solutions. Adding more nodes to replicate data to is one way to a) offer more scalability and b) offer more protection against data loss if one node goes down. -It doesn't have to be a 1 or the other choice. My experience has been using RDBMS in conjunction with NoSQL for certain use cases.