SlideShare a Scribd company logo
1 of 67
Download to read offline
© 2013 triAGENS GmbH | 2013-08-24
Query mechanisms
for NoSQL databases
FrOSCon, 2013-08-24
Jan Steemann
© 2013 triAGENS GmbH | 2013-08-24
Me
 I'm a software developer,
working at triAGENS GmbH, CGN
 I work a lot on ,
a NoSQL document database
 I like databases in general
© 2013 triAGENS GmbH | 2013-08-24
How to save this programming
language user object in a database?
{
  "id" : 1234,
  "name" : {
    "first" : "foo",
    "last" : "bar"
  },
  "topics": [
    "skating",
    "music"
  ]
}
© 2013 triAGENS GmbH | 2013-08-24
Relational
Databases
© 2013 triAGENS GmbH | 2013-08-24
Relational databases – tables
 data are stored in tables with typed columns
 all records in a table are homogenously
structured and have the same columns and
data types
 tables are flat (no hierchical data in a table)
 columns have primitive data types:
multi-valued data are not supported
© 2013 triAGENS GmbH | 2013-08-24
Relational databases – schemas
 relational databases have a schema that
defines which tables, columns etc. there are
 users are required to define the schema
elements before data can be stored
 inserted data must match the schema or the
database will reject it
© 2013 triAGENS GmbH | 2013-08-24
Saving the user object in a
relational database
 we cannot store the object as it is in a
relational table, we must first normalise
 for the example, we end up with 3 database
tables (user, topic, and an n:m mapping table
between them)
 note that the object in the programming
language now has a different schema than
we have in the database
© 2013 triAGENS GmbH | 2013-08-24
Schema we may have come to
CREATE TABLE `user` (
  id INTEGER NOT NULL,
  firstName VARCHAR(40) NOT NULL,
  lastName VARCHAR(40) NOT NULL,
  PRIMARY KEY(id)
);
CREATE TABLE `topic` (
  id INTEGER NOT NULL auto_increment,
  name VARCHAR(40) NOT NULL,
  PRIMARY KEY(id),
  UNIQUE KEY(name)
);
CREATE TABLE `userTopic` (
  userId INTEGER NOT NULL,
  topicId INTEGER NOT NULL,
  PRIMARY KEY(userId, topicId),
  FOREIGN KEY(userId) REFERENCES user(id),
  FOREIGN KEY(topicId) REFERENCES topic(id)
);
user
id
firstName
lastName
topic
id
name
userTopic
userId
topicId
© 2013 triAGENS GmbH | 2013-08-24
Now we can save the user object
BEGIN;
­­ insert the user
INSERT INTO `user` (id, firstName, lastName) 
VALUES (1234, "foo", "bar");
­­ insert topics (must ignore duplicate keys)
INSERT INTO `topic` (name) VALUES ("skating");
INSERT INTO `topic` (name) VALUES ("music");
­­ insert user­to­topics mapping
INSERT INTO `userTopic` (userId, topicId) 
SELECT 1234, id FROM `topic` 
WHERE name IN ("skating", "music");
COMMIT;
© 2013 triAGENS GmbH | 2013-08-24
Joins, ACID, and transactions
 to get our data back, we need to read from
multiple tables, either with or without joins
 to make multi-table (or other multi-record)
operations behave predictably in concurrency
situations, relational databases provide
transactions and control over the ACID
properties (atomicity, consistency, isolation,
durability)
© 2013 triAGENS GmbH | 2013-08-24
The ubiquity of SQL
 note that all we did (schema setup, data
manipulation/selection, transactions &
concucrrency control) can be accomplished
with SQL queries
 note: some of the SQL work may be hidden
by object-relational mappers (ORMs)
 SQL is the standard means to query and
administer relational databases
© 2013 triAGENS GmbH | 2013-08-24
NoSQL
Databases
© 2013 triAGENS GmbH | 2013-08-24
Relational databases criticisms (I)
 lots of new databases have emerged in the
past few years, often because...
 ...object-relational mapping can be
complex or costly
 ...relational databases do not play well with
dynamically structured data and often-
varying schemas
© 2013 triAGENS GmbH | 2013-08-24
Relational databases criticisms (II)
 lots of new databases have emerged in the
past few years, often because...
 ...overhead of SQL parsing and full-blown
query engines may be significant for
simple access patterns (primary key
access, BLOB storage etc.)
 ...scaling to many servers with the ACID
guarantees provided by relational databases
is hard
© 2013 triAGENS GmbH | 2013-08-24
NoSQL and NewSQL databases
 many of the recent databases are labelled
 NoSQL (the non-relational ones) or
 NewSQL (the relational ones)
 because they provide alternative solutions
for some of the mentioned problems
 especially the NoSQL ones often sacrifice
features that relational databases have in
their DNA
© 2013 triAGENS GmbH | 2013-08-24
Example NoSQL databases
© 2013 triAGENS GmbH | 2013-08-24
NoSQL database characteristics
 NoSQL databases have multiple (but not
necessarily all) of these characteristics:
 non-relational
 schema-free
 open source
 simple APIs
 several, but not all of them, are distributed
and eventually consistent
© 2013 triAGENS GmbH | 2013-08-24
Non-relational
 NoSQL databases are generally non-
relational, meaning they do not follow the
relational model
 they do not provide tables with flat fixed-
column records
 instead, it is common to work with self-
contained aggregates (which may include
hierarchical data) or even BLOBs
© 2013 triAGENS GmbH | 2013-08-24
Non-relational
 this eliminates the need for complex
object-relational mapping and many data
normalisation requirements
 working on aggregates and BLOBs also led
to sacrificing complex and costly features,
such as query languages, query planners,
referential integrity, joins, ACID guarantees for
cross-record operations etc. in many of these
databases
© 2013 triAGENS GmbH | 2013-08-24
Schema-free
 most NoSQL databases are schema-free
(or at least are very relaxed about schemas)
 there is often no need to define any sort of
schema for the data
 being schema-free allows different records in
the same domain (e.g. "user") to have
heterogenous structures
 this allows a gentle migration of data
© 2013 triAGENS GmbH | 2013-08-24
Simple APIs
 NoSQL databases often provide simple
interfaces to store and query data
 in many cases, the APIs offer access to low-
level data manipulation and selection
methods
 queries capabilities are often limited so
queries can be expressed in a simple way
 SQL is not widely used
© 2013 triAGENS GmbH | 2013-08-24
Simple APIs
 many NoSQL databases have simple text-
based protocols or HTTP REST APIs with
JSON inside
 databases with HTTP APIs are web-enabled
and can be run as internet-facing services
 several vendors provide database-as-a-
service offers
© 2013 triAGENS GmbH | 2013-08-24
Distributed
 several NoSQL databases (not all!) can be
run in a distributed fashion, providing auto-
scalability and failover capabilities
 in a distributed setup, ACID features are often
sacrificed for scalability and throughput
 replication between distributed nodes is
often lazy, meaning the database is
eventually consistent
© 2013 triAGENS GmbH | 2013-08-24
NoSQL databases variety
 there are 100+ NoSQL databases around
 they are often categorised based on the data
model they support, for example:
 document stores
 key-value stores
 wide column/column family stores
 graph databases
 NoSQL databases are typically very different
from each other
© 2013 triAGENS GmbH | 2013-08-24
Document
stores
© 2013 triAGENS GmbH | 2013-08-24
Documents – principle
 documents are self-contained, aggregate
data structures
 they consist of attributes (name-value pairs)
 attribute values have data types, which can
also be nested/hierarchical
© 2013 triAGENS GmbH | 2013-08-24
Example document (JSON)
{
  "id" : 1234,
  "name" : {
    "first" : "foo",
    "last" : "bar"
  },
  "topics": [
    "skating",
    "music"
  ]
}
© 2013 triAGENS GmbH | 2013-08-24
Objects vs. documents
 programming language objects can often
be stored easily in documents
 lists/arrays, and sub-objects from
programming language objects do not need
to be normalised and re-assembled later
 one programming language object is often
one document in the database
© 2013 triAGENS GmbH | 2013-08-24
Document stores
 document stores have a type system, so
they can perform some basic validation on
data
 as each document carries an implicit
schema, document stores can access all
document attributes and sub-attributes
individually, offering lots of query power
 today will look at document stores CouchDB,
MongoDB, ArangoDB
© 2013 triAGENS GmbH | 2013-08-24
Document stores – CouchDB
 CouchDB is a document store with a JSON
type system
 similar documents are organised in
databases
 the server functionality is exposed via an
HTTP REST API
 to communicate with the CouchDB server,
use curl or the browser
© 2013 triAGENS GmbH | 2013-08-24
Saving the user object in CouchDB
 to create a database "user" for storing
documents, send an HTTP PUT request to the
server:
> curl ­X PUT
  http://couchdb:5984/user
 to save the user object as a document, send
its JSON representation to the server:
> curl ­X POST 
  ­d '{"_id":"1234", ...}'
  http://couchdb:5984/user
© 2013 triAGENS GmbH | 2013-08-24
Querying the user object in CouchDB
 to retrieve the object using its unique
document id, send an HTTP GET request:
> curl ­X GET        
  http://couchdb:5984/user/1234
© 2013 triAGENS GmbH | 2013-08-24
Views in CouchDB
 querying documents by anything else than
their id attributes requires creating a view
 views are populated with user-defined
JavaScript map-reduce functions
 views are normally populated lazily (when
the view is queried) and incrementally
 view results are persisted so views are
persistent secondary indexes
© 2013 triAGENS GmbH | 2013-08-24
Generic map-reduce algorithm
 map-reduce is a general framework, present
in many databases
 map-reduce requires at least a map function
 map is applied on each (changed)
document to filter out irrelevant documents,
and to emit data for all documents of interest
 the emitted data is sorted and passed in
groups to reduce for aggregation, or, if no
reduce, is the final result
© 2013 triAGENS GmbH | 2013-08-24
Filtering with map
map = function (doc) {
  for (i = 0; 
       i < doc.topics.length; i++) {
    if (doc.topics[i] === 'music') {
      emit(null, doc);
      return; // done
    }
  }
};
[ null, { "_id" : 1234, .... } ]
...
© 2013 triAGENS GmbH | 2013-08-24
Counting with map
map = function (doc) {
  for (i = 0; i < doc.topics.length; ++i) {
    // emit [ name, 1 ] for each topic
    emit(doc.topics[i], 1);
  }
};
[ "skating", 1 ]
[ "skating", 1 ]
[ "music", 1 ]
...
© 2013 triAGENS GmbH | 2013-08-24
Aggregating with reduce
reduce = function (keys, values, rereduce) {
  if (rereduce) {
    // reducing a reduce result
    return sum(values);
  }
  // return number of values in group
  return values.length;
};
[ "skating", 2 ]
[ "music", 1 ]
...
© 2013 triAGENS GmbH | 2013-08-24
Map-reduce
 map-reduce functionality is available in many
NoSQL databases
 it got popular because map can be run fully
distributed, thus allowing the analysis of big
datasets
 it is actual programming, not writing queries!
© 2013 triAGENS GmbH | 2013-08-24
Document stores – MongoDB
 MongoDB is a document store with a BSON
(a binary superset of JSON) type system
 similar documents are organised in
databases with collections
 to connect to a MongoDB server, use the
mongo client (no HTTP)
© 2013 triAGENS GmbH | 2013-08-24
Saving the user object in MongoDB
 to store the user object, use save:
mongo> db.user.save({
  "_id" : 1234,
  "name" : {
    "first" : "foo",
    "last" : "bar"
  },
  "topics" : [ "skating", "music" ]
});
© 2013 triAGENS GmbH | 2013-08-24
Querying the user object in MongoDB
 use find to filter on any attribute or sub-
attribute(s):
mongo> db.user.find({ 
  "_id" : 1234
});
mongo> db.user.find({ 
  "name.first" : "foo"
});
© 2013 triAGENS GmbH | 2013-08-24
Querying using $query $operators
mongo> db.user.find({ 
  "$or" : [ 
    { "name.first" : "foo"},
    { 
      "topics" : { 
        "$in" : [ "skating" ] 
      }
    }
  ]
});
© 2013 triAGENS GmbH | 2013-08-24
Querying in MongoDB: more options
 find queries can be combined with count(),
limit(), skip(), sort() etc. functions
 secondary indexes can be created on
attributes or sub-attributes to speed up
searches
 several aggregation functions are also
provided
 no joins or cross-collection queries are
possible
© 2013 triAGENS GmbH | 2013-08-24
Querying in MongoDB: more options
 find queries can be combined with count(),
limit(), skip(), sort() etc. functions
 secondary indexes can be created on
attributes or sub-attributes to speed up
searches
 several aggregation functions are also
provided
 no joins or cross-collection queries are
possible
© 2013 triAGENS GmbH | 2013-08-24
Document stores – ArangoDB
 ArangoDB is a document store that uses a
JSON type system
 similar documents are organised in
collections
 server functionality is exposed via HTTP
REST API
 to connect, use curl, the arangosh client or
the browser
© 2013 triAGENS GmbH | 2013-08-24
Saving the user object in ArangoDB
arangosh> db._create("user");
arangosh> db.user.save({
  "_key" : "1234",
  "name" : {
    "first" : "foo",
    "last" : "bar"
  },
  "topics": [
    "skating",
    "music"
  ]
});
© 2013 triAGENS GmbH | 2013-08-24
Querying the user object in ArangoDB
 to get the object back, query it by its unique
key:
arangosh> db.user.document("1234");
 to retrieve document(s) provide some
example values:
arangosh> db.user.byExample({ 
  "name.first": "foo" 
});
© 2013 triAGENS GmbH | 2013-08-24
ArangoDB Query Language (AQL)
 in addition to the low-level access methods,
ArangoDB also provides a high-level query
language, AQL
 the language integrates JSON naturally
 AQL allows running complex queries,
including aggregation and joins
 indexes on the filter conditions and join
attributes will be used if present
© 2013 triAGENS GmbH | 2013-08-24
Querying with AQL
to query all users with at least 3 topics
(including topic "skating") with topic counts:
FOR u IN user
  FILTER "skating" IN u.topics &&
         LENGTH(u.topics) >= 3
  RETURN {
    "name" : u.name,
    "topics" : u.topics,
    "count" : LENGTH(u.topics)
  }
© 2013 triAGENS GmbH | 2013-08-24
Aggregation using AQL
to count the frequencies of all topics:
FOR u IN user
  FOR t IN u.topics
    COLLECT topicName = t INTO g
    RETURN {
      "name" : topicName,
      "count" : LENGTH(g)
    }    
© 2013 triAGENS GmbH | 2013-08-24
Key-value
stores
© 2013 triAGENS GmbH | 2013-08-24
Key-value stores – principle
 in a key-value store, a value is mapped to a
unique key
 to store data, supply both key and value:
> store.set("user­1234", "..."); 
 to retrieve a value, supply its key:
> value = store.get("user­1234");
 keys are organised in databases, buckets,
keyspaces etc.
© 2013 triAGENS GmbH | 2013-08-24
Key-value stores – values
 key-value stores treat value data as
indivisible BLOBs by default (some
operations will treat values as numeric)
 for the store, the values do not have a
known structure and will not be validated
 as no structure is known, values can only be
queried via their keys, not by values or sub-
parts of values
© 2013 triAGENS GmbH | 2013-08-24
Key-value stores – basic operations
 key-value stores are very efficient for basic
operations on keys, such as set, get, del,
replace, incr, decr
 many stores also provide automatic ttl-
based expiration of values (useful for
caches)
 some provide key enumeration to retrieve
the full or a restricted list of keys
© 2013 triAGENS GmbH | 2013-08-24
Saving the user object in Redis
 Redis is a (single server) key-value store
 to connect, use redis­cli (or telnet)
 to store the user object in Redis:
redis> set user­1234 
       <serialized object 
        representation> 
© 2013 triAGENS GmbH | 2013-08-24
Querying the user object from Redis
 to retrieve the user object, supply the key:
redis> get user­1234
<serialized object representation> 
 to query the list of users, we can use key
enumeration using a prefix:
redis> keys user­*
1) "user­1234"
 that's about what we can do with BLOB
values
© 2013 triAGENS GmbH | 2013-08-24
Additional querying in Redis
 Redis provides extra commands to work on
data structures (sets, lists, hashes)
 these commands allow to Redis to be used
for some extra use cases
© 2013 triAGENS GmbH | 2013-08-24
Mapping users to topics in Redis
 we can use Redis sets to map users to topics
 each topic gets its own set
 and user ids are added to all sets they have
topics for:
redis> sadd topic­skating 1234
redis> sadd topic­music 1234
redis> sadd topic­skating 2345
redis> sadd topic­running 3456
© 2013 triAGENS GmbH | 2013-08-24
Querying users for topics in Redis
 which users have topic "skating" assigned?
redis> smembers topic­skating
1) "1234"
2) "2345"
 which users have both topics "skating" and
"music" assigned (intersection)?
redis> sinter topic­skating
              topic­music
1) "1234"
© 2013 triAGENS GmbH | 2013-08-24
Querying distinct values in Redis
 using the sets and key enumeration, we can
also answer the question "what distinct topics
are there?":
redis> keys topic­*
1) "topic­skating"
2) "topic­music"
3) "topic­running"
© 2013 triAGENS GmbH | 2013-08-24
Data structure commands in Redis
 there is no general-purpose query language
so querying is rather limited
 in general, data must be made to fit the
commands
 the special commands are very useful to
implement counters, queues, and
publish/subscribe
© 2013 triAGENS GmbH | 2013-08-24
Other key-value stores
 other key-value stores use the memcache
protocol or provide an HTTP API
 some allow users to maintain secondary
indexes
 these indexes can be used for equality and
range queries on the index data
 some key-value stores also provide map-
reduce for arbitrary queries
© 2013 triAGENS GmbH | 2013-08-24
Summary
© 2013 triAGENS GmbH | 2013-08-24
Summary – non-relational
 NoSQL databases are very different from
relational databases and do not follow the
relational model
 instead of working on fixed column tables, they
work on aggregates or BLOBs
 they often intentionally lack features that
relational databases have
 SQL is not widely used to query and administer
© 2013 triAGENS GmbH | 2013-08-24
Summary – categories
 there are different categories of NoSQL
databases, with different use cases and
limitations each
 key-value stores normally focus on high
throughput and/or scalability, and often allow
limited querying only
 document stores try to be more general
purpose and often allow more complex
queries
© 2013 triAGENS GmbH | 2013-08-24
Summary – usage
 the APIs of NoSQL databases are often
simple, so it is easy to get started with
them
 providing database access via HTTP REST
APIs is quite common in the NoSQL world
 this allows querying the database directly
from any HTTP-enabled clients (browsers,
mobile devices etc.)
© 2013 triAGENS GmbH | 2013-08-24
Summary – variety
 NoSQL databases are very different from
each other
 there are yet no standards such as SQL is in
the relational world
 there is an interesting attempt to establish a
cross-database query language (JSONiq)

More Related Content

What's hot

What's hot (20)

Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Elasticsearch for beginners
Elasticsearch for beginnersElasticsearch for beginners
Elasticsearch for beginners
 
What's New in MySQL 5.7 InnoDB
What's New in MySQL 5.7 InnoDBWhat's New in MySQL 5.7 InnoDB
What's New in MySQL 5.7 InnoDB
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
database migration simple, cross-engine and cross-platform migrations with ...
database migration   simple, cross-engine and cross-platform migrations with ...database migration   simple, cross-engine and cross-platform migrations with ...
database migration simple, cross-engine and cross-platform migrations with ...
 
Spark
SparkSpark
Spark
 
ProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQLProxySQL - High Performance and HA Proxy for MySQL
ProxySQL - High Performance and HA Proxy for MySQL
 
Azure purview
Azure purviewAzure purview
Azure purview
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
Improving PySpark performance: Spark Performance Beyond the JVM
Improving PySpark performance: Spark Performance Beyond the JVMImproving PySpark performance: Spark Performance Beyond the JVM
Improving PySpark performance: Spark Performance Beyond the JVM
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 
Oracle RAC features on Exadata
Oracle RAC features on ExadataOracle RAC features on Exadata
Oracle RAC features on Exadata
 
Modern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data CaptureModern ETL Pipelines with Change Data Capture
Modern ETL Pipelines with Change Data Capture
 
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An OverviewOracle MAA (Maximum Availability Architecture) 18c - An Overview
Oracle MAA (Maximum Availability Architecture) 18c - An Overview
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best PracticesOracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
Oracle Real Application Clusters (RAC) 12c Rel. 2 - Operational Best Practices
 
Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 

Similar to Query mechanisms for NoSQL databases

Query Languages for Document Stores
Query Languages for Document StoresQuery Languages for Document Stores
Query Languages for Document Stores
InteractiveCologne
 

Similar to Query mechanisms for NoSQL databases (20)

Query Languages for Document Stores
Query Languages for Document StoresQuery Languages for Document Stores
Query Languages for Document Stores
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Challenges Management and Opportunities of Cloud DBA
Challenges Management and Opportunities of Cloud DBAChallenges Management and Opportunities of Cloud DBA
Challenges Management and Opportunities of Cloud DBA
 
Artigo no sql x relational
Artigo no sql x relationalArtigo no sql x relational
Artigo no sql x relational
 
Report 2.0.docx
Report 2.0.docxReport 2.0.docx
Report 2.0.docx
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
 
No sq lv2
No sq lv2No sq lv2
No sq lv2
 
Beyond Relational Databases
Beyond Relational DatabasesBeyond Relational Databases
Beyond Relational Databases
 
Nosql
NosqlNosql
Nosql
 
Nosql
NosqlNosql
Nosql
 
Architecture et modèle de données Cassandra
Architecture et modèle de données CassandraArchitecture et modèle de données Cassandra
Architecture et modèle de données Cassandra
 
Introduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSEIntroduction to Cassandra and datastax DSE
Introduction to Cassandra and datastax DSE
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Solution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big DataSolution Use Case Demo: The Power of Relationships in Your Big Data
Solution Use Case Demo: The Power of Relationships in Your Big Data
 
MySQL Connector/Node.js and the X DevAPI
MySQL Connector/Node.js and the X DevAPIMySQL Connector/Node.js and the X DevAPI
MySQL Connector/Node.js and the X DevAPI
 
No sql database
No sql databaseNo sql database
No sql database
 
nosql.pptx
nosql.pptxnosql.pptx
nosql.pptx
 
No sql
No sqlNo sql
No sql
 
Azure SQL DWH
Azure SQL DWHAzure SQL DWH
Azure SQL DWH
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 

More from ArangoDB Database

More from ArangoDB Database (20)

ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 
ArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at ScaleArangoDB 3.9 - Further Powering Graphs at Scale
ArangoDB 3.9 - Further Powering Graphs at Scale
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale Webinar: ArangoDB 3.8 Preview - Analytics at Scale
Webinar: ArangoDB 3.8 Preview - Analytics at Scale
 
Graph Analytics with ArangoDB
Graph Analytics with ArangoDBGraph Analytics with ArangoDB
Graph Analytics with ArangoDB
 
Getting Started with ArangoDB Oasis
Getting Started with ArangoDB OasisGetting Started with ArangoDB Oasis
Getting Started with ArangoDB Oasis
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
Hacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge GraphsHacktoberfest 2020 - Intro to Knowledge Graphs
Hacktoberfest 2020 - Intro to Knowledge Graphs
 
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release WebinarA Graph Database That Scales - ArangoDB 3.7 Release Webinar
A Graph Database That Scales - ArangoDB 3.7 Release Webinar
 
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
ArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at ScaleArangoDB 3.7 Roadmap: Performance at Scale
ArangoDB 3.7 Roadmap: Performance at Scale
 
Webinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB OasisWebinar: What to expect from ArangoDB Oasis
Webinar: What to expect from ArangoDB Oasis
 
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
ArangoDB 3.5 Feature Overview Webinar - Sept 12, 2019
 
3.5 webinar
3.5 webinar 3.5 webinar
3.5 webinar
 
Webinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDBWebinar: How native multi model works in ArangoDB
Webinar: How native multi model works in ArangoDB
 
An introduction to multi-model databases
An introduction to multi-model databasesAn introduction to multi-model databases
An introduction to multi-model databases
 
Running complex data queries in a distributed system
Running complex data queries in a distributed systemRunning complex data queries in a distributed system
Running complex data queries in a distributed system
 

Recently uploaded

Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
FIDO Alliance
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
FIDO Alliance
 

Recently uploaded (20)

TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 

Query mechanisms for NoSQL databases

  • 1. © 2013 triAGENS GmbH | 2013-08-24 Query mechanisms for NoSQL databases FrOSCon, 2013-08-24 Jan Steemann
  • 2. © 2013 triAGENS GmbH | 2013-08-24 Me  I'm a software developer, working at triAGENS GmbH, CGN  I work a lot on , a NoSQL document database  I like databases in general
  • 3. © 2013 triAGENS GmbH | 2013-08-24 How to save this programming language user object in a database? {   "id" : 1234,   "name" : {     "first" : "foo",     "last" : "bar"   },   "topics": [     "skating",     "music"   ] }
  • 4. © 2013 triAGENS GmbH | 2013-08-24 Relational Databases
  • 5. © 2013 triAGENS GmbH | 2013-08-24 Relational databases – tables  data are stored in tables with typed columns  all records in a table are homogenously structured and have the same columns and data types  tables are flat (no hierchical data in a table)  columns have primitive data types: multi-valued data are not supported
  • 6. © 2013 triAGENS GmbH | 2013-08-24 Relational databases – schemas  relational databases have a schema that defines which tables, columns etc. there are  users are required to define the schema elements before data can be stored  inserted data must match the schema or the database will reject it
  • 7. © 2013 triAGENS GmbH | 2013-08-24 Saving the user object in a relational database  we cannot store the object as it is in a relational table, we must first normalise  for the example, we end up with 3 database tables (user, topic, and an n:m mapping table between them)  note that the object in the programming language now has a different schema than we have in the database
  • 8. © 2013 triAGENS GmbH | 2013-08-24 Schema we may have come to CREATE TABLE `user` (   id INTEGER NOT NULL,   firstName VARCHAR(40) NOT NULL,   lastName VARCHAR(40) NOT NULL,   PRIMARY KEY(id) ); CREATE TABLE `topic` (   id INTEGER NOT NULL auto_increment,   name VARCHAR(40) NOT NULL,   PRIMARY KEY(id),   UNIQUE KEY(name) ); CREATE TABLE `userTopic` (   userId INTEGER NOT NULL,   topicId INTEGER NOT NULL,   PRIMARY KEY(userId, topicId),   FOREIGN KEY(userId) REFERENCES user(id),   FOREIGN KEY(topicId) REFERENCES topic(id) ); user id firstName lastName topic id name userTopic userId topicId
  • 9. © 2013 triAGENS GmbH | 2013-08-24 Now we can save the user object BEGIN; ­­ insert the user INSERT INTO `user` (id, firstName, lastName)  VALUES (1234, "foo", "bar"); ­­ insert topics (must ignore duplicate keys) INSERT INTO `topic` (name) VALUES ("skating"); INSERT INTO `topic` (name) VALUES ("music"); ­­ insert user­to­topics mapping INSERT INTO `userTopic` (userId, topicId)  SELECT 1234, id FROM `topic`  WHERE name IN ("skating", "music"); COMMIT;
  • 10. © 2013 triAGENS GmbH | 2013-08-24 Joins, ACID, and transactions  to get our data back, we need to read from multiple tables, either with or without joins  to make multi-table (or other multi-record) operations behave predictably in concurrency situations, relational databases provide transactions and control over the ACID properties (atomicity, consistency, isolation, durability)
  • 11. © 2013 triAGENS GmbH | 2013-08-24 The ubiquity of SQL  note that all we did (schema setup, data manipulation/selection, transactions & concucrrency control) can be accomplished with SQL queries  note: some of the SQL work may be hidden by object-relational mappers (ORMs)  SQL is the standard means to query and administer relational databases
  • 12. © 2013 triAGENS GmbH | 2013-08-24 NoSQL Databases
  • 13. © 2013 triAGENS GmbH | 2013-08-24 Relational databases criticisms (I)  lots of new databases have emerged in the past few years, often because...  ...object-relational mapping can be complex or costly  ...relational databases do not play well with dynamically structured data and often- varying schemas
  • 14. © 2013 triAGENS GmbH | 2013-08-24 Relational databases criticisms (II)  lots of new databases have emerged in the past few years, often because...  ...overhead of SQL parsing and full-blown query engines may be significant for simple access patterns (primary key access, BLOB storage etc.)  ...scaling to many servers with the ACID guarantees provided by relational databases is hard
  • 15. © 2013 triAGENS GmbH | 2013-08-24 NoSQL and NewSQL databases  many of the recent databases are labelled  NoSQL (the non-relational ones) or  NewSQL (the relational ones)  because they provide alternative solutions for some of the mentioned problems  especially the NoSQL ones often sacrifice features that relational databases have in their DNA
  • 16. © 2013 triAGENS GmbH | 2013-08-24 Example NoSQL databases
  • 17. © 2013 triAGENS GmbH | 2013-08-24 NoSQL database characteristics  NoSQL databases have multiple (but not necessarily all) of these characteristics:  non-relational  schema-free  open source  simple APIs  several, but not all of them, are distributed and eventually consistent
  • 18. © 2013 triAGENS GmbH | 2013-08-24 Non-relational  NoSQL databases are generally non- relational, meaning they do not follow the relational model  they do not provide tables with flat fixed- column records  instead, it is common to work with self- contained aggregates (which may include hierarchical data) or even BLOBs
  • 19. © 2013 triAGENS GmbH | 2013-08-24 Non-relational  this eliminates the need for complex object-relational mapping and many data normalisation requirements  working on aggregates and BLOBs also led to sacrificing complex and costly features, such as query languages, query planners, referential integrity, joins, ACID guarantees for cross-record operations etc. in many of these databases
  • 20. © 2013 triAGENS GmbH | 2013-08-24 Schema-free  most NoSQL databases are schema-free (or at least are very relaxed about schemas)  there is often no need to define any sort of schema for the data  being schema-free allows different records in the same domain (e.g. "user") to have heterogenous structures  this allows a gentle migration of data
  • 21. © 2013 triAGENS GmbH | 2013-08-24 Simple APIs  NoSQL databases often provide simple interfaces to store and query data  in many cases, the APIs offer access to low- level data manipulation and selection methods  queries capabilities are often limited so queries can be expressed in a simple way  SQL is not widely used
  • 22. © 2013 triAGENS GmbH | 2013-08-24 Simple APIs  many NoSQL databases have simple text- based protocols or HTTP REST APIs with JSON inside  databases with HTTP APIs are web-enabled and can be run as internet-facing services  several vendors provide database-as-a- service offers
  • 23. © 2013 triAGENS GmbH | 2013-08-24 Distributed  several NoSQL databases (not all!) can be run in a distributed fashion, providing auto- scalability and failover capabilities  in a distributed setup, ACID features are often sacrificed for scalability and throughput  replication between distributed nodes is often lazy, meaning the database is eventually consistent
  • 24. © 2013 triAGENS GmbH | 2013-08-24 NoSQL databases variety  there are 100+ NoSQL databases around  they are often categorised based on the data model they support, for example:  document stores  key-value stores  wide column/column family stores  graph databases  NoSQL databases are typically very different from each other
  • 25. © 2013 triAGENS GmbH | 2013-08-24 Document stores
  • 26. © 2013 triAGENS GmbH | 2013-08-24 Documents – principle  documents are self-contained, aggregate data structures  they consist of attributes (name-value pairs)  attribute values have data types, which can also be nested/hierarchical
  • 27. © 2013 triAGENS GmbH | 2013-08-24 Example document (JSON) {   "id" : 1234,   "name" : {     "first" : "foo",     "last" : "bar"   },   "topics": [     "skating",     "music"   ] }
  • 28. © 2013 triAGENS GmbH | 2013-08-24 Objects vs. documents  programming language objects can often be stored easily in documents  lists/arrays, and sub-objects from programming language objects do not need to be normalised and re-assembled later  one programming language object is often one document in the database
  • 29. © 2013 triAGENS GmbH | 2013-08-24 Document stores  document stores have a type system, so they can perform some basic validation on data  as each document carries an implicit schema, document stores can access all document attributes and sub-attributes individually, offering lots of query power  today will look at document stores CouchDB, MongoDB, ArangoDB
  • 30. © 2013 triAGENS GmbH | 2013-08-24 Document stores – CouchDB  CouchDB is a document store with a JSON type system  similar documents are organised in databases  the server functionality is exposed via an HTTP REST API  to communicate with the CouchDB server, use curl or the browser
  • 31. © 2013 triAGENS GmbH | 2013-08-24 Saving the user object in CouchDB  to create a database "user" for storing documents, send an HTTP PUT request to the server: > curl ­X PUT   http://couchdb:5984/user  to save the user object as a document, send its JSON representation to the server: > curl ­X POST    ­d '{"_id":"1234", ...}'   http://couchdb:5984/user
  • 32. © 2013 triAGENS GmbH | 2013-08-24 Querying the user object in CouchDB  to retrieve the object using its unique document id, send an HTTP GET request: > curl ­X GET           http://couchdb:5984/user/1234
  • 33. © 2013 triAGENS GmbH | 2013-08-24 Views in CouchDB  querying documents by anything else than their id attributes requires creating a view  views are populated with user-defined JavaScript map-reduce functions  views are normally populated lazily (when the view is queried) and incrementally  view results are persisted so views are persistent secondary indexes
  • 34. © 2013 triAGENS GmbH | 2013-08-24 Generic map-reduce algorithm  map-reduce is a general framework, present in many databases  map-reduce requires at least a map function  map is applied on each (changed) document to filter out irrelevant documents, and to emit data for all documents of interest  the emitted data is sorted and passed in groups to reduce for aggregation, or, if no reduce, is the final result
  • 35. © 2013 triAGENS GmbH | 2013-08-24 Filtering with map map = function (doc) {   for (i = 0;         i < doc.topics.length; i++) {     if (doc.topics[i] === 'music') {       emit(null, doc);       return; // done     }   } }; [ null, { "_id" : 1234, .... } ] ...
  • 36. © 2013 triAGENS GmbH | 2013-08-24 Counting with map map = function (doc) {   for (i = 0; i < doc.topics.length; ++i) {     // emit [ name, 1 ] for each topic     emit(doc.topics[i], 1);   } }; [ "skating", 1 ] [ "skating", 1 ] [ "music", 1 ] ...
  • 37. © 2013 triAGENS GmbH | 2013-08-24 Aggregating with reduce reduce = function (keys, values, rereduce) {   if (rereduce) {     // reducing a reduce result     return sum(values);   }   // return number of values in group   return values.length; }; [ "skating", 2 ] [ "music", 1 ] ...
  • 38. © 2013 triAGENS GmbH | 2013-08-24 Map-reduce  map-reduce functionality is available in many NoSQL databases  it got popular because map can be run fully distributed, thus allowing the analysis of big datasets  it is actual programming, not writing queries!
  • 39. © 2013 triAGENS GmbH | 2013-08-24 Document stores – MongoDB  MongoDB is a document store with a BSON (a binary superset of JSON) type system  similar documents are organised in databases with collections  to connect to a MongoDB server, use the mongo client (no HTTP)
  • 40. © 2013 triAGENS GmbH | 2013-08-24 Saving the user object in MongoDB  to store the user object, use save: mongo> db.user.save({   "_id" : 1234,   "name" : {     "first" : "foo",     "last" : "bar"   },   "topics" : [ "skating", "music" ] });
  • 41. © 2013 triAGENS GmbH | 2013-08-24 Querying the user object in MongoDB  use find to filter on any attribute or sub- attribute(s): mongo> db.user.find({    "_id" : 1234 }); mongo> db.user.find({    "name.first" : "foo" });
  • 42. © 2013 triAGENS GmbH | 2013-08-24 Querying using $query $operators mongo> db.user.find({    "$or" : [      { "name.first" : "foo"},     {        "topics" : {          "$in" : [ "skating" ]        }     }   ] });
  • 43. © 2013 triAGENS GmbH | 2013-08-24 Querying in MongoDB: more options  find queries can be combined with count(), limit(), skip(), sort() etc. functions  secondary indexes can be created on attributes or sub-attributes to speed up searches  several aggregation functions are also provided  no joins or cross-collection queries are possible
  • 44. © 2013 triAGENS GmbH | 2013-08-24 Querying in MongoDB: more options  find queries can be combined with count(), limit(), skip(), sort() etc. functions  secondary indexes can be created on attributes or sub-attributes to speed up searches  several aggregation functions are also provided  no joins or cross-collection queries are possible
  • 45. © 2013 triAGENS GmbH | 2013-08-24 Document stores – ArangoDB  ArangoDB is a document store that uses a JSON type system  similar documents are organised in collections  server functionality is exposed via HTTP REST API  to connect, use curl, the arangosh client or the browser
  • 46. © 2013 triAGENS GmbH | 2013-08-24 Saving the user object in ArangoDB arangosh> db._create("user"); arangosh> db.user.save({   "_key" : "1234",   "name" : {     "first" : "foo",     "last" : "bar"   },   "topics": [     "skating",     "music"   ] });
  • 47. © 2013 triAGENS GmbH | 2013-08-24 Querying the user object in ArangoDB  to get the object back, query it by its unique key: arangosh> db.user.document("1234");  to retrieve document(s) provide some example values: arangosh> db.user.byExample({    "name.first": "foo"  });
  • 48. © 2013 triAGENS GmbH | 2013-08-24 ArangoDB Query Language (AQL)  in addition to the low-level access methods, ArangoDB also provides a high-level query language, AQL  the language integrates JSON naturally  AQL allows running complex queries, including aggregation and joins  indexes on the filter conditions and join attributes will be used if present
  • 49. © 2013 triAGENS GmbH | 2013-08-24 Querying with AQL to query all users with at least 3 topics (including topic "skating") with topic counts: FOR u IN user   FILTER "skating" IN u.topics &&          LENGTH(u.topics) >= 3   RETURN {     "name" : u.name,     "topics" : u.topics,     "count" : LENGTH(u.topics)   }
  • 50. © 2013 triAGENS GmbH | 2013-08-24 Aggregation using AQL to count the frequencies of all topics: FOR u IN user   FOR t IN u.topics     COLLECT topicName = t INTO g     RETURN {       "name" : topicName,       "count" : LENGTH(g)     }    
  • 51. © 2013 triAGENS GmbH | 2013-08-24 Key-value stores
  • 52. © 2013 triAGENS GmbH | 2013-08-24 Key-value stores – principle  in a key-value store, a value is mapped to a unique key  to store data, supply both key and value: > store.set("user­1234", "...");   to retrieve a value, supply its key: > value = store.get("user­1234");  keys are organised in databases, buckets, keyspaces etc.
  • 53. © 2013 triAGENS GmbH | 2013-08-24 Key-value stores – values  key-value stores treat value data as indivisible BLOBs by default (some operations will treat values as numeric)  for the store, the values do not have a known structure and will not be validated  as no structure is known, values can only be queried via their keys, not by values or sub- parts of values
  • 54. © 2013 triAGENS GmbH | 2013-08-24 Key-value stores – basic operations  key-value stores are very efficient for basic operations on keys, such as set, get, del, replace, incr, decr  many stores also provide automatic ttl- based expiration of values (useful for caches)  some provide key enumeration to retrieve the full or a restricted list of keys
  • 55. © 2013 triAGENS GmbH | 2013-08-24 Saving the user object in Redis  Redis is a (single server) key-value store  to connect, use redis­cli (or telnet)  to store the user object in Redis: redis> set user­1234         <serialized object          representation> 
  • 56. © 2013 triAGENS GmbH | 2013-08-24 Querying the user object from Redis  to retrieve the user object, supply the key: redis> get user­1234 <serialized object representation>   to query the list of users, we can use key enumeration using a prefix: redis> keys user­* 1) "user­1234"  that's about what we can do with BLOB values
  • 57. © 2013 triAGENS GmbH | 2013-08-24 Additional querying in Redis  Redis provides extra commands to work on data structures (sets, lists, hashes)  these commands allow to Redis to be used for some extra use cases
  • 58. © 2013 triAGENS GmbH | 2013-08-24 Mapping users to topics in Redis  we can use Redis sets to map users to topics  each topic gets its own set  and user ids are added to all sets they have topics for: redis> sadd topic­skating 1234 redis> sadd topic­music 1234 redis> sadd topic­skating 2345 redis> sadd topic­running 3456
  • 59. © 2013 triAGENS GmbH | 2013-08-24 Querying users for topics in Redis  which users have topic "skating" assigned? redis> smembers topic­skating 1) "1234" 2) "2345"  which users have both topics "skating" and "music" assigned (intersection)? redis> sinter topic­skating               topic­music 1) "1234"
  • 60. © 2013 triAGENS GmbH | 2013-08-24 Querying distinct values in Redis  using the sets and key enumeration, we can also answer the question "what distinct topics are there?": redis> keys topic­* 1) "topic­skating" 2) "topic­music" 3) "topic­running"
  • 61. © 2013 triAGENS GmbH | 2013-08-24 Data structure commands in Redis  there is no general-purpose query language so querying is rather limited  in general, data must be made to fit the commands  the special commands are very useful to implement counters, queues, and publish/subscribe
  • 62. © 2013 triAGENS GmbH | 2013-08-24 Other key-value stores  other key-value stores use the memcache protocol or provide an HTTP API  some allow users to maintain secondary indexes  these indexes can be used for equality and range queries on the index data  some key-value stores also provide map- reduce for arbitrary queries
  • 63. © 2013 triAGENS GmbH | 2013-08-24 Summary
  • 64. © 2013 triAGENS GmbH | 2013-08-24 Summary – non-relational  NoSQL databases are very different from relational databases and do not follow the relational model  instead of working on fixed column tables, they work on aggregates or BLOBs  they often intentionally lack features that relational databases have  SQL is not widely used to query and administer
  • 65. © 2013 triAGENS GmbH | 2013-08-24 Summary – categories  there are different categories of NoSQL databases, with different use cases and limitations each  key-value stores normally focus on high throughput and/or scalability, and often allow limited querying only  document stores try to be more general purpose and often allow more complex queries
  • 66. © 2013 triAGENS GmbH | 2013-08-24 Summary – usage  the APIs of NoSQL databases are often simple, so it is easy to get started with them  providing database access via HTTP REST APIs is quite common in the NoSQL world  this allows querying the database directly from any HTTP-enabled clients (browsers, mobile devices etc.)
  • 67. © 2013 triAGENS GmbH | 2013-08-24 Summary – variety  NoSQL databases are very different from each other  there are yet no standards such as SQL is in the relational world  there is an interesting attempt to establish a cross-database query language (JSONiq)