© 2013 triAGENS GmbH | 2013-06-18
Query Languages for Document Stores
2013-06-18
Jan Steemann
me
- I'm a software developer
- working at triAGENS GmbH, CGN
- on ArangoDB, a document store
Documents
- documents are self-contained, aggregate data structures...
- ...consisting of named and typed attributes, which can be nested / hierarchical
- documents can be used to model complex business objects
Example order document
{
  "id": "abc-100-22",
  "date": "2013-04-26",
  "customer": {
    "id": "c-199-023",
    "name": "acme corp."
  },
  "items": [ {
      "id": "p-123",
      "quantity": 1,
      "price": 25.13
  } ]
}
Document stores
- document stores are databases specialised in handling documents
- they've been around for a while
- they got really popular with the NoSQL buzz (CouchDB, MongoDB, ...)
Why use Document Stores?
Saving programming language data
- document stores allow saving a programming language object as a whole
- your programming language object becomes a document in the database, without the need for much transformation
- compare this to saving data in a relational database...
Persistence the relational way

orders
id  date        customer
1   2013-04-20  c1
2   2013-04-21  c2
3   2013-04-21  c1
4   2013-04-22  c3

customers
id  name
c1  acme corp.
c2  sample.com
c3  abc co.

orderitems
order  item  price  quantity
1      1     23.25  1
Benefits of document stores
- no impedance mismatch, no complex object-relational mapping, no normalisation requirements
- querying documents is often easier and faster than querying highly normalised relational data
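To make the contrast concrete, here is a minimal JavaScript sketch. It stores one order object as a single document and, for comparison, splits the same object into rows for three relational tables. The function names `toDocument` and `toRelationalRows` and the row layout are assumptions made for illustration; they are not part of any database API.

```javascript
// Hypothetical order object, modelled on the example order document.
const order = {
  id: "abc-100-22",
  date: "2013-04-26",
  customer: { id: "c-199-023", name: "acme corp." },
  items: [{ id: "p-123", quantity: 1, price: 25.13 }]
};

// Document store: the object is serialised as a whole.
function toDocument(obj) {
  return JSON.stringify(obj);
}

// Relational store: the same object has to be split into rows
// for several tables, linked by keys (illustrative layout only).
function toRelationalRows(obj) {
  return {
    orders: [{ id: obj.id, date: obj.date, customer: obj.customer.id }],
    customers: [{ id: obj.customer.id, name: obj.customer.name }],
    orderitems: obj.items.map(function (item) {
      return { order: obj.id, item: item.id, price: item.price, quantity: item.quantity };
    })
  };
}

const doc = toDocument(order);
const rows = toRelationalRows(order);
console.log(JSON.parse(doc).customer.name); // nested data survives the round trip
console.log(Object.keys(rows));             // one object ends up in three tables
```

Reading the order back from the relational layout would need the reverse mapping as well, which is the object-relational mapping effort the document approach avoids.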
Schema-less
- in document stores, there is no "table" schema as in the relational world
- each document can have different attributes
- there is no such thing as ALTER TABLE
- that's why document stores are called schema-less or schema-free
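A small JavaScript sketch of what schema-less means in practice. The collection is modelled as a plain array and the `users` documents are invented for illustration; a real document store would hold them on disk, but the shape of the data is the same.

```javascript
// A "collection" modelled as a plain array (an assumption for illustration).
const users = [
  { id: 1, name: "alice" },
  { id: 2, name: "bob", age: 41 },                    // extra attribute, no ALTER TABLE needed
  { id: 3, name: "carol", address: { city: "CGN" } }  // nested attribute
];

// Queries have to tolerate attributes that are missing in some documents.
const withAge = users.filter(function (u) {
  return typeof u.age === "number";
});

console.log(withAge.map(function (u) { return u.name; }));
```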
Querying Document Stores
Querying by document id is easy
- every document store allows querying a single document at a time
- accessing documents by their unique ids is almost always dead-simple
Complex queries?
- what if you want to run complex queries (e.g. projections, filters, aggregations, transformations, joins, ...)?
- let's check the available options in some of the popular document stores
CouchDB: map-reduce
- querying by anything other than the document key / id requires writing a view
- views are JavaScript functions that are stored inside the database
- views are populated by incremental map-reduce
map-reduce
- the map function is applied to each document (that changed)
- map can filter out non-matching documents
- or emit modified or unmodified versions of them
- emitted documents can optionally be passed into a reduce function
- reduce is called with groups of similar documents and can thus perform aggregation
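The flow described above can be tried out with a minimal in-memory driver. This is a sketch under stated assumptions, not CouchDB's implementation: `mapReduce` is a hypothetical helper, `emit` is passed in as a second argument (in CouchDB it is a global function), and there is no incremental update or rereduce step.

```javascript
// Minimal in-memory map-reduce driver (illustration only).
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key (as JSON string) -> emitted values

  docs.forEach(function (doc) {
    // map may call emit zero or more times per document
    map(doc, function emit(key, value) {
      const k = JSON.stringify(key);
      if (!groups.has(k)) {
        groups.set(k, []);
      }
      groups.get(k).push(value);
    });
  });

  // reduce is called once per group of values that share a key
  const rows = [];
  groups.forEach(function (values, k) {
    rows.push({ key: JSON.parse(k), value: reduce(JSON.parse(k), values) });
  });
  return rows;
}

// Hypothetical sample data.
const orders = [
  { id: 1, status: "processed", items: ["p-1", "p-2"] },
  { id: 2, status: "open", items: ["p-1"] },
  { id: 3, status: "processed", items: ["p-1"] }
];

// map: skip non-matching documents, emit one (key, value) pair per item
function map(doc, emit) {
  if (doc.status !== "processed") {
    return;
  }
  doc.items.forEach(function (item) {
    emit(item, 1);
  });
}

// reduce: aggregate all values emitted for one key
function reduce(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

const counts = mapReduce(orders, map, reduce);
console.log(counts);
```

The result holds one row per distinct item id with the number of processed orders that contain it.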
CouchDB map-reduce example
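// map: called for each new or changed document
map = function (doc) {
  var i, n = doc.orderItems.length;
  for (i = 0; i < n; ++i) {
    // emit one (key, value) pair per order item
    emit(doc.orderItems[i], 1);
  }
};
// reduce: counts the values emitted for each key
reduce = function (keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls
    return sum(values);
  }
  return values.length;
};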
map-reduce
- map-reduce is generic and powerful
- it provides a full programming language
- views need to be created for everything that is queried
- access is limited to a single "table" at a time (no cross-"table" views)
- a bit clumsy for ad-hoc exploratory queries
MongoDB: find()
- ad-hoc queries in MongoDB are much easier
- filters can be applied directly on collections, which makes it easy to find specific documents:
mongo> db.orders.find({
  "customer": {
    "id": "c1",
    "name": "acme corp."
  }
});
MongoDB: complex filters
- can filter on any document attribute or sub-attribute
- indexes will automatically be used if present
- nesting filters allows complex queries
- quite flexible and powerful, but tends to be hard to use and read for more complex queries
MongoDB: complex filtering
mongo> db.users.find({ 
  "$or": [ 
    { 
      "active": true 
    }, 
    { 
      "age": { 
        "$gte": 40 
      } 
    } 
  ]
});
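To show what such a nested filter expresses, here is a tiny JavaScript matcher for a subset of the filter syntax (equality, `$gte` and `$or`). It is a sketch for illustration only: the `matches` function and the sample `users` are assumptions, and this is not how MongoDB evaluates queries internally.

```javascript
// Tiny matcher for a subset of MongoDB-style filters: equality, $gte and $or.
function matches(doc, filter) {
  return Object.keys(filter).every(function (key) {
    const condition = filter[key];
    if (key === "$or") {
      // at least one of the nested filters must match
      return condition.some(function (sub) { return matches(doc, sub); });
    }
    if (condition !== null && typeof condition === "object" && "$gte" in condition) {
      return doc[key] >= condition.$gte;
    }
    return doc[key] === condition; // plain equality on scalar values
  });
}

// Hypothetical sample data.
const users = [
  { name: "a", active: true, age: 25 },
  { name: "b", active: false, age: 45 },
  { name: "c", active: false, age: 30 }
];

const filter = { $or: [{ active: true }, { age: { $gte: 40 } }] };
const found = users.filter(function (u) { return matches(u, filter); });
console.log(found.map(function (u) { return u.name; }));
```

Written as a boolean expression, the same filter is simply `active == true || age >= 40`, which hints at why deeply nested filter objects become hard to read.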
MongoDB: more options
- can also use JavaScript functions for filtering, or JavaScript map-reduce
- several aggregation functions are also provided
- none of these options allows running cross-"table" queries
Why not use a Query Language?
Query languages
- a good query language should
  - allow writing both simple and complex queries, without having to switch the methodology
  - provide the required features for filtering, aggregation, joining etc.
  - hide the database internals
SQL
- in the relational world, there is one accepted general-purpose query language: SQL
- it is quite well-known and mature:
  - 35+ years of experience
  - many developers and established tools around it
  - standardised (but mind the "dialects"!)
SQL in document stores?
- SQL is good at handling relational data
- it is not good at handling multi-valued or hierarchical attributes, which are common in documents
- (too) powerful: SQL provides features many document stores intentionally lack (e.g. joins, transactions)
- SQL has not been adopted by document stores yet
Query Languages for Document Stores
XQuery?
- XQuery is a query and programming language
- targeted mainly at processing XML data
- can process hierarchical data
- very powerful and extensible
- W3C recommendation
XQuery
- XQuery has found most adoption in the area of XML processing
- today people want to use JSON, not XML
- XQuery is not available in popular document stores
ArangoDB Query Language (AQL)
- ArangoDB provides AQL, a query language made for JSON document processing
- it allows running complex queries on documents, including joins and aggregation
- the language syntax was inspired by XQuery and provides similar concepts, such as FOR, LET, RETURN, ...
- the language integrates JSON "naturally"
AQL example
FOR order IN orders
  FILTER order.status == "processed"
  LET itemsValue = SUM((
    FOR item IN order.items
      FILTER item.status == "confirmed"
      RETURN item.price * item.quantity
  ))
  FILTER itemsValue >= 500
  RETURN {
    "items"      : order.items,
    "itemsValue" : itemsValue,
    "itemsCount" : LENGTH(order.items)
  }
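For readers who do not know AQL, the query above corresponds roughly to the following JavaScript over in-memory arrays. This is only a sketch to explain the semantics: the sample `orders` are invented, and the database would of course evaluate the query itself rather than run code like this.

```javascript
// Hypothetical sample data.
const orders = [
  { status: "processed", items: [
    { status: "confirmed", price: 300, quantity: 2 },
    { status: "cancelled", price: 50, quantity: 1 }
  ] },
  { status: "processed", items: [
    { status: "confirmed", price: 10, quantity: 3 }
  ] },
  { status: "open", items: [
    { status: "confirmed", price: 900, quantity: 1 }
  ] }
];

const result = orders
  // FILTER order.status == "processed"
  .filter(function (order) { return order.status === "processed"; })
  .map(function (order) {
    // LET itemsValue = SUM((subquery over order.items))
    const itemsValue = order.items
      .filter(function (item) { return item.status === "confirmed"; })
      .reduce(function (sum, item) { return sum + item.price * item.quantity; }, 0);
    return { order: order, itemsValue: itemsValue };
  })
  // FILTER itemsValue >= 500
  .filter(function (row) { return row.itemsValue >= 500; })
  // RETURN { ... }
  .map(function (row) {
    return {
      items: row.order.items,
      itemsValue: row.itemsValue,
      itemsCount: row.order.items.length
    };
  });

console.log(result);
```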
AQL: some features
- queries can combine data from multiple "tables"
- this allows joins using any document attributes or sub-attributes
- indexes will be used if present
AQL: join example
FOR user IN users
  FILTER user.id == 1234
  RETURN {
    "user"  : user,
    "posts" : (FOR post IN blogPosts
      FILTER post.userId == user.id &&
             post.date >= '2013-06-13'
      RETURN post
    )
  }
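The join above can be read as a nested loop: for each matching user, a subquery collects that user's recent posts. A rough JavaScript equivalent over in-memory arrays, with invented sample data and no claim about how ArangoDB executes the query:

```javascript
// Hypothetical sample data.
const users = [{ id: 1234, name: "jan" }, { id: 5678, name: "other" }];
const blogPosts = [
  { userId: 1234, date: "2013-06-14", title: "new" },
  { userId: 1234, date: "2013-06-01", title: "old" },
  { userId: 5678, date: "2013-06-15", title: "foreign" }
];

const result = users
  // FILTER user.id == 1234
  .filter(function (user) { return user.id === 1234; })
  .map(function (user) {
    return {
      user: user,
      // subquery: posts of this user since the given date
      // (ISO 8601 date strings compare correctly as plain strings)
      posts: blogPosts.filter(function (post) {
        return post.userId === user.id && post.date >= "2013-06-13";
      })
    };
  });

console.log(JSON.stringify(result, null, 2));
```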
AQL: additional features
- AQL provides basic functionality to query graphs, too
- the language can be extended with user-defined JavaScript functions
JSONiq
- JSONiq is a data processing and query language for handling JSON data
- it is based on XQuery, thus provides the same FLWOR expressions: FOR, LET, WHERE, ORDER, ...
- JSON is integrated "naturally"
- most of the XML handling is removed
JSONiq: example
for $order in collection("orders")
  where $order.customer.id eq "abc-123"
  return {
    customer : $order.customer,
    items    : $order.items
  }
JSONiq: join example
for $post in collection("posts")
  let $postId := $post.id
  for $comment in collection("comments")
    where $comment.postId eq $postId
    group by $postId
    order by count($comment) descending
    return {
      id       : $postId,
      comments : count($comment)
    }
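In plain terms, the query above counts the comments per post and sorts the posts by that count. The following JavaScript sketch mirrors it over in-memory arrays; the sample `posts` and `comments` are invented, and a JSONiq engine is free to evaluate the query differently.

```javascript
// Hypothetical sample data.
const posts = [{ id: "p1" }, { id: "p2" }, { id: "p3" }];
const comments = [
  { postId: "p1", text: "a" },
  { postId: "p2", text: "b" },
  { postId: "p2", text: "c" }
];

// for $post ... for $comment ... where: count matching comments per post id
const countsByPost = new Map();
posts.forEach(function (post) {
  comments.forEach(function (comment) {
    if (comment.postId === post.id) {
      countsByPost.set(post.id, (countsByPost.get(post.id) || 0) + 1);
    }
  });
});

// group by / order by count descending / return
const result = Array.from(countsByPost, function (entry) {
  return { id: entry[0], comments: entry[1] };
}).sort(function (a, b) { return b.comments - a.comments; });

console.log(result);
```

Posts without any comments do not appear in the result, because the inner loop never produces a pair for them.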
JSONiq
- JSONiq is a generic, database-agnostic language
- it can be extended with user-defined XQuery functions
- JSONiq is currently not implemented inside any document database...
JSONiq
- ...but it can be used via a service (at 28.io)
- the service provides the JSONiq query language and implements functionality not provided by a specific database
- such features are implemented client-side, e.g. joins for MongoDB
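What a client-side join can look like in general is sketched below in JavaScript. This is not the 28.io implementation, whose internals are not described here; it is a generic hash join over two arrays that stand in for collections fetched separately. The helper name `joinClientSide` and the sample data are assumptions, and the sketch assumes the join key is unique on the indexed side.

```javascript
// Stand-ins for two collections fetched separately from a store without joins.
const customers = [
  { id: "c1", name: "acme corp." },
  { id: "c2", name: "sample.com" }
];
const orders = [
  { id: 1, customer: "c1" },
  { id: 2, customer: "c2" },
  { id: 3, customer: "c1" }
];

// Hash join: index one side by the join key,
// then probe the index for each document of the other side.
function joinClientSide(left, right, leftKey, rightKey) {
  const index = new Map();
  right.forEach(function (doc) {
    index.set(doc[rightKey], doc);
  });
  return left
    .filter(function (doc) { return index.has(doc[leftKey]); })
    .map(function (doc) {
      return { left: doc, right: index.get(doc[leftKey]) };
    });
}

const joined = joinClientSide(orders, customers, "customer", "id");
console.log(joined.map(function (row) {
  return row.left.id + " -> " + row.right.name;
}));
```

The price of doing this on the client is that both collections have to be transferred before they can be combined.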
Summary
- today's document stores provide different, proprietary mechanisms for querying data
- there is currently no standard query mechanism for document stores as there is in the relational world (SQL)
Summary
- you CAN use query languages in document stores today, e.g. AQL and JSONiq
- if you like the idea, give them a try, provide feedback and contribute!
