Schema Agnostic Indexing with Azure DocumentDB
@dharmashukla, DocumentDB
Presented at VLDB 2015

Sudipta Sengupta, Justin Levandoski, David Lomet
Microsoft Research

Dharma Shukla, Shireesh Thota, Karthik Raman, Madhan Gajendran, Ankur Shah, Sergii Ziuzin, Krishnan Sundaram, Miguel Gonzalez Guajardo, Anna Wawrzyniak, Samer Boshra, Renato Ferreira, Mohamed Nassar, Michael Koltachev, Ji Huang
Microsoft Corporation

Outline
• Overview of DocumentDB
• Schema Agnostic Indexing
• Logical Index Organization
• Physical Index Organization
• Summary

What is DocumentDB?
• Fully managed, multi-tenant, geo-distributed document database service on Azure
• Born out of the needs of internal Microsoft applications; GA since April 2015
• Built from the ground up with resource governance
• Provisioned throughput, performance isolation, OPEX efficiency
• Well-defined consistency levels with predictable performance
• Consistency levels: Strong, Bounded Staleness, Session, Eventual
• Database engine built for JSON and JavaScript
• Automatic indexing of JSON values and rich (SQL and JavaScript) query
• JavaScript language-integrated transactions and query directly inside the database engine

Architecture
[Figure: DocumentDB resource model with the labels Account, Database, Collection, Document, User, Permission]

JSON document as tree

JavaScript Object Literals: JSON serializable values (aka JSON Infoset)

{
  "locations":
  [
    { "country": "Germany", "city": "Berlin" },
    { "country": "France", "city": "Paris" }
  ],
  "headquarter": "Belgium",
  "exports": [{ "city": "Moscow" }, { "city": "Athens" }]
}

[Figure: the document drawn as a tree. The root has the children locations, headquarter and exports; locations has the array elements 0 (country: Germany, city: Berlin) and 1 (country: France, city: Paris); headquarter has the leaf Belgium; exports has the array elements 0 (city: Moscow) and 1 (city: Athens)]

Schema-agnostic indexing
• Automatic indexing of document trees without requiring schema or secondary indices
• SQL and JavaScript query processing on the trees
• Lazy materialization of JavaScript values from the instances of trees
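
To make the tree view concrete, here is a minimal sketch that walks the example document and lists every root-to-leaf path. The helper name `leafPaths` and the `/`-joined path format are assumptions made for illustration, not DocumentDB's internal representation.

```javascript
// Walk a JSON value as a tree and list every root-to-leaf path.
// Property names and array indices are interior nodes; scalars are leaves.
// (leafPaths is a hypothetical helper, not a DocumentDB API.)
function leafPaths(value, prefix = []) {
  if (value !== null && typeof value === "object") {
    // For arrays, Object.entries yields the indices as "0", "1", ...
    return Object.entries(value).flatMap(([label, child]) =>
      leafPaths(child, [...prefix, label])
    );
  }
  return [[...prefix, String(value)].join("/")];
}

const company = {
  locations: [
    { country: "Germany", city: "Berlin" },
    { country: "France", city: "Paris" },
  ],
  headquarter: "Belgium",
  exports: [{ city: "Moscow" }, { city: "Athens" }],
};

const paths = leafPaths(company);
console.log(paths.join("\n"));
```

Each printed line corresponds to one leaf of the tree in the figure, for example `locations/0/country/Germany` or `headquarter/Belgium`. No schema is consulted at any point: the paths come entirely from the document instance.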

Logical Index Organization
• Index is a union of all the document trees
• Structural information and instance values are normalized into a unifying concept of JSON-Path

Terms                   Postings List/Values
$/location/0/           1, 2
location/0/country/     1, 2
location/0/city/        1, 2
0/country/Germany       1, 2
1/country/France        2
…                       …
0/city/Moscow           2
0/dealers/0             2

[Figure: the common structure shared by the document trees, and term encodings built from path segments (location, 0, country, Germany) for three query classes: Range (>, <, !=) and ORDER BY queries, Wildcard queries, and Spatial queries (coordinates). Postings lists use dynamic encoding (E-WAH/differential).]
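
The term table can be approximated with a small inverted index. The sketch below assumes that each term is a window of three consecutive path segments, which is what the table's entries suggest, and assigns document ids 1 and 2 in input order. Both choices are assumptions, so the postings will not necessarily match the slide's table; `leafPaths`, `termsOf` and `buildIndex` are hypothetical names.

```javascript
// Enumerate root-to-leaf paths as arrays of segments.
function leafPaths(value, prefix) {
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([label, child]) =>
      leafPaths(child, [...prefix, label])
    );
  }
  return [[...prefix, String(value)]];
}

// Assumption: a term is a sliding window of `width` path segments.
function termsOf(doc, width = 3) {
  const terms = new Set();
  for (const segments of leafPaths(doc, ["$"])) {
    const lastStart = Math.max(segments.length - width, 0);
    for (let i = 0; i <= lastStart; i++) {
      terms.add(segments.slice(i, i + width).join("/"));
    }
  }
  return terms;
}

// The index is the union of all document trees: term -> list of doc ids.
function buildIndex(docsById) {
  const index = new Map();
  for (const [id, doc] of docsById) {
    for (const term of termsOf(doc)) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(id);
    }
  }
  return index;
}

const index = buildIndex([
  [1, {
    locations: [
      { country: "Germany", city: "Berlin" },
      { country: "France", city: "Paris" },
    ],
    headquarter: "Belgium",
    exports: [{ city: "Moscow" }, { city: "Athens" }],
  }],
  [2, {
    locations: [{ country: "Germany", city: "Bonn", revenue: 200 }],
    headquarter: "Italy",
    exports: [{ city: "Berlin", dealers: [{ name: "Hans" }] }, { city: "Athens" }],
  }],
]);

for (const [term, postings] of index) {
  console.log(term.padEnd(28), postings.join(", "));
}
```

Structure (`$/locations/0`) and values (`0/country/Germany`) end up in the same term space, which is the normalization the slide describes. A production index would also compress the postings lists; this sketch keeps them as plain arrays.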

Query

Input documents

{ "locations":
  [ { "country": "Germany", "city": "Berlin" },
    { "country": "France", "city": "Paris" }
  ],
  "headquarter": "Belgium",
  "exports": [{ "city": "Moscow" }, { "city": "Athens" }]
}

{ "locations": [{ "country": "Germany", "city": "Bonn", "revenue": 200 }],
  "headquarter": "Italy",
  "exports": [{ "city": "Berlin", "dealers": [{ "name": "Hans" }] }, { "city": "Athens" }]
}

SQL

SELECT C.locations
FROM company C
WHERE C.headquarter = "Belgium"

JavaScript

function businessLogic() {
  var country = "Belgium";
  __.filter(function(x) { return x.headquarter === country; });
}

Query result

{
  "results":
  [
    {
      "locations":
      [
        { "country": "Germany", "city": "Berlin" },
        { "country": "France", "city": "Paris" }
      ]
    }
  ]
}

[Figure: the two input documents and the query result drawn as trees. The result tree has the root results, one element 0, and below it the locations subtree of the first document.]
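
The query can be tried outside the database with a plain in-memory stand-in. In this sketch `Array.prototype.filter` replaces the server-side `__.filter`, the collection is passed in as a parameter (unlike the slide's stored procedure), and the projection to `locations` mirrors the SQL `SELECT C.locations`. It illustrates the query semantics only, not how the engine evaluates the query against the index.

```javascript
// The two input documents from the slide.
const company = [
  {
    locations: [
      { country: "Germany", city: "Berlin" },
      { country: "France", city: "Paris" },
    ],
    headquarter: "Belgium",
    exports: [{ city: "Moscow" }, { city: "Athens" }],
  },
  {
    locations: [{ country: "Germany", city: "Bonn", revenue: 200 }],
    headquarter: "Italy",
    exports: [{ city: "Berlin", dealers: [{ name: "Hans" }] }, { city: "Athens" }],
  },
];

// Stand-in for the stored procedure: the array's own filter plays the
// role of the server-side `__.filter` (an assumption for illustration).
function businessLogic(collection) {
  const country = "Belgium";
  return collection.filter((x) => x.headquarter === country);
}

// Project each match to its `locations` property, as SELECT C.locations does.
const results = businessLogic(company).map((c) => ({ locations: c.locations }));
console.log(JSON.stringify({ results }, null, 2));
```

Only the first document has `headquarter` equal to "Belgium", so the output contains a single entry with the Berlin and Paris locations, matching the query result on the slide.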

System model for writes and queries

Path/Posting List updates sent to the Index (doc_id = 5):
  key: “age/22”        payload: +doc5
  key: “age/21”        payload: -doc5
  key: “city/seattle”  payload: +doc5
  key: “zip/98103”     payload: +doc5
  …

Query Processor against the Index:
  Indexscan > “age/30”, < “age/32”
  doc1, doc5, doc7
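
The write side of this model can be sketched as a diff between the old and new versions of a document, producing `+doc` and `-doc` payloads per term, while the query side is a range scan over ordered keys. The flat `property/value` term format follows the slide; the document contents, the starting index, and the helper names are made up for illustration.

```javascript
// Simplified terms: one "property/value" key per top-level property.
function termsOf(doc) {
  return new Set(Object.entries(doc).map(([k, v]) => `${k}/${v}`));
}

// Diff two versions of a document into posting-list deltas.
function postingDeltas(docId, oldDoc, newDoc) {
  const before = termsOf(oldDoc);
  const after = termsOf(newDoc);
  const deltas = [];
  for (const key of after) {
    if (!before.has(key)) deltas.push({ key, payload: `+${docId}` });
  }
  for (const key of before) {
    if (!after.has(key)) deltas.push({ key, payload: `-${docId}` });
  }
  return deltas;
}

// Apply deltas to an index of term -> Set of doc ids.
function applyDeltas(index, deltas) {
  for (const { key, payload } of deltas) {
    const postings = index.get(key) || new Set();
    const docId = payload.slice(1);
    if (payload[0] === "+") postings.add(docId);
    else postings.delete(docId);
    index.set(key, postings);
  }
}

// Range scan over sorted keys. Plain string comparison is only
// order-preserving here because the numeric parts have equal width.
function indexScan(index, low, high) {
  const hits = new Set();
  for (const key of [...index.keys()].sort()) {
    if (key > low && key < high) {
      for (const id of index.get(key)) hits.add(id);
    }
  }
  return [...hits];
}

// Hypothetical starting state and update for doc5.
const index = new Map([
  ["age/21", new Set(["doc5"])],
  ["age/31", new Set(["doc1", "doc7"])],
]);
const deltas = postingDeltas(
  "doc5",
  { age: 21 },
  { age: 22, city: "seattle", zip: "98103" }
);
applyDeltas(index, deltas);

console.log(deltas);
console.log(indexScan(index, "age/30", "age/32"));
```

The update yields the same four deltas as the slide, in a different order. The scan result differs from the slide's `doc1, doc5, doc7` because the starting index here is invented.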

Index Maintenance Requirements
• Support a sustained volume of rapid writes without any term locality
• Queries should honor the various consistency levels
• Index maintenance must operate within a frugal resource budget
• Low write, read and space amplification

[Figure: index stack with the layers B-Tree, Cache, Log Structured Store]

Highly concurrent index updates
[Figure: the Mapping Table maps Page ID P to the Physical Address of Page P. Updates are attached to the page as delta records (Δ: Insert record 50, Δ: Delete record 48, Δ: Update record 35, Δ: Insert record 60), and the chain is later replaced by Consolidated Page P.]
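
The delta-record scheme in the figure can be modeled roughly as follows. This is a sketch under stated assumptions: the mapping table is a plain `Map`, the compare-and-swap is emulated (JavaScript runs this single-threaded, so only the retry protocol is shown), and all names and record values are hypothetical.

```javascript
// Mapping table: page id -> head of that page's chain (delta or base page).
const mappingTable = new Map();

// Emulated compare-and-swap on one mapping-table slot.
function compareAndSwap(pageId, expected, next) {
  if (mappingTable.get(pageId) !== expected) return false;
  mappingTable.set(pageId, next);
  return true;
}

// Prepend a delta record instead of modifying the page in place.
function prependDelta(pageId, op, record) {
  for (;;) {
    const head = mappingTable.get(pageId);
    const delta = { kind: "delta", op, record, next: head };
    if (compareAndSwap(pageId, head, delta)) return;
    // On failure another writer won the race; retry against the new head.
  }
}

// Fold the delta chain into a new base page and install it with one CAS.
function consolidate(pageId) {
  const head = mappingTable.get(pageId);
  const deltas = [];
  let node = head;
  while (node.kind === "delta") {
    deltas.push(node);
    node = node.next;
  }
  const records = new Map(node.records);
  // The chain is newest-first, so replay it in reverse (oldest first).
  for (const d of deltas.reverse()) {
    if (d.op === "delete") records.delete(d.record.key);
    else records.set(d.record.key, d.record.value);
  }
  const page = {
    kind: "base",
    records: [...records].sort((a, b) => a[0] - b[0]),
  };
  return compareAndSwap(pageId, head, page) ? page : null;
}

// Hypothetical page P holding records 35 and 48.
mappingTable.set("P", { kind: "base", records: [[35, "old"], [48, "x"]] });
prependDelta("P", "insert", { key: 50, value: "new" });
prependDelta("P", "delete", { key: 48 });
prependDelta("P", "update", { key: 35, value: "updated" });

const consolidated = consolidate("P");
console.log(consolidated.records);
```

Writers never take a latch on the page: each update is a single pointer swap in the mapping table, and consolidation is one more swap that replaces the whole chain.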

Write optimized storage organization
[Figure: in RAM, the Mapping table and a (latch-free) Flush Buffer (8 MB) holding base pages and Δ-records; on SSD, the Log-structured Store holding base pages and Δ-records, with an arrow labeled "Write ordering in log"]

Lack of term locality
• Little to no term locality on the index write path
• Unable to keep a “hot set” of leaf pages cached in memory
• Performing a read to modify each leaf node leads to very high I/O overhead
• Requires a method to maintain an efficient write path for sustained term ingestion with predictable performance

Example sequence of term updates:
  update term t1
  delete term t58
  insert term t109
  update term t179
  update term t568
  delete term t732

Blind updates and value merge
[Figure: two update paths for term T, each showing the Address Mapping Table, Page Stub P and the Log Structured Store (LSS).
Regular update ("Add doc5 to posting list for term T"): a Read I/O fetches Page P with T → {doc1, doc2, doc3}, giving T → {doc1, doc2, doc3, doc5}.
Blind update for term T: "Term T → +doc5" and "Term T → -doc2" are recorded against Page Stub P (the stub is also labeled T->+doc2, T->-doc2). On term lookup or full page consolidate, {doc1, doc2, doc3}, {+doc5} and {-doc2} are merged into Consolidated Page P with T → {doc1, doc3, doc5}.]

[Chart: Number of I/Os (y-axis, 0 to 18000) against Index Size (MB) (x-axis, 0 to 10000); series: Update, Blind Update]
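
A minimal model of the blind-update path, assuming an in-memory `Map` as a stand-in for the log-structured store and a counter for read I/Os. All names here are hypothetical; the point is only that writes append deltas without fetching the page, and the merge happens later at lookup or consolidation time.

```javascript
// `lss` stands in for the log-structured store; `readIOs` counts page fetches.
const lss = new Map([["P", { T: ["doc1", "doc2", "doc3"] }]]);
let readIOs = 0;

// Page stubs kept in memory: per page, the deltas not yet merged.
const pageStubs = new Map([["P", []]]);

function readPage(pageId) {
  readIOs += 1;
  return lss.get(pageId);
}

// Blind update: record the delta against the stub, never read the base page.
function blindUpdate(pageId, term, delta) {
  pageStubs.get(pageId).push({ term, delta });
}

// Value merge: apply "+doc" / "-doc" deltas, in order, to a base posting list.
function mergePostings(base, deltas) {
  const merged = new Set(base);
  for (const d of deltas) {
    const docId = d.slice(1);
    if (d[0] === "+") merged.add(docId);
    else merged.delete(docId);
  }
  return [...merged].sort();
}

// Term lookup: only now is the base page read and merged with the deltas.
function lookup(pageId, term) {
  const base = readPage(pageId)[term] || [];
  const deltas = pageStubs
    .get(pageId)
    .filter((entry) => entry.term === term)
    .map((entry) => entry.delta);
  return mergePostings(base, deltas);
}

blindUpdate("P", "T", "+doc5");
blindUpdate("P", "T", "-doc2");
const readsBeforeLookup = readIOs;
const postings = lookup("P", "T");

console.log(readsBeforeLookup, postings, readIOs);
```

Both writes complete with zero reads, and the single read is deferred to the lookup, which returns doc1, doc3 and doc5 as in the consolidated page on the slide. A read-modify-write update would instead pay one read per update.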
Summary
