Oslo baksia2014

Domain Driven Design,
Multi-Model Databases and
ArangoDB
Baksia Meetup, Oslo, 23 October 2014
Max Neunhöffer
www.arangodb.com

Max Neunhöffer
I am a mathematician
“Earlier life”: Research in Computer Algebra
(Computational Group Theory)
Always juggled with big data
Now: working in database development, NoSQL, ArangoDB
I like:
research,
hacking,
teaching,
tickling the highest performance out of computer systems.

ArangoDB GmbH
triAGENS GmbH has offered consulting services since 2004:
software architecture
project management
software development
business analysis
a lot of experience with specialised database systems,
did NoSQL before the term was even coined.
2011/2012, an idea emerged:
to build the database one had wished to have all those years!
development of ArangoDB as open source software since 2012
ArangoDB GmbH: spin-off to take care of ArangoDB (2014)

A typical Project: a Web Shop
The Specification Workshop
(need recommendation engine, need statistics, etc.)
The Developers get to work . . .
(tables, relations, normalisation, schemas, queries, front-ends, etc.)
HANDOVER
(Why can I not . . . ? This is unusable!)

Solution: Agile Approach and Domain Driven Design
These days, many use (or try to use):
agile methods (Scrum, sprints, rapid prototyping)
with continuous feedback from product owners to developers
promising fewer surprises in deployment and high flexibility.
Domain Driven Design (Eric Evans, 2004):
identify a Domain (area in which software is applied)
make a Model (abstract description of situation)
use a Ubiquitous Language (that all team members speak)
clearly define the Context in which the model applies.
Model your data as close to the domain as possible.
Example: object oriented programming

Fundamental Problem: need a ubiquitous Language
Listening to team members, you hear completely different things:
Product Managers talk about
customers “browsing” through the shop,
powerful search for products (with the “good ones” on top),
“useful” recommendations.
Developers talk about
tables, normalisation, queries and joins
secondary indexes, front-end pages
object oriented, model view controller, responsive design
=> both groups think the others are morons

The problem is rooted very deeply
functionality not gathered methodically
=> “obvious” functions are missing
no common language
=> misunderstandings about details

NoSQL: Richer Data Models are closer to the Domain
Some terms used by Evans as part of the ubiquitous language:
Entity: has an identity and mutable state (e.g. a person)
Value object: is identified by its attributes and immutable
(e.g. an address)
Aggregate: is a combination of entities and value objects into one
transactional unit (e.g. a customer with its orders)
Association: is a relation between entities and value objects, can
have attributes, usually immutable
Consequences
These terms coming from the Domain must be present in
the Design. The whole team must understand the same
thing when talking about them.
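The four terms above can be sketched in code. This is only an illustration in Python; the class names (`Address`, `Order`, `Customer`, `Recommends`) and all sample values are made up for this sketch and do not come from Evans or from the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Address:
    """Value object: identified by its attributes and immutable."""
    street: str
    city: str


@dataclass
class Order:
    """Entity: has an identity (order_id) and mutable state (items)."""
    order_id: str
    items: list = field(default_factory=list)


@dataclass
class Customer:
    """Aggregate: a customer with its orders, changed as one unit."""
    customer_id: str
    name: str
    address: Address
    orders: list = field(default_factory=list)

    def place_order(self, order: Order) -> None:
        # All changes to the orders go through the aggregate.
        self.orders.append(order)


@dataclass(frozen=True)
class Recommends:
    """Association: a relation between two entities, with an attribute."""
    from_customer: str
    to_product: str
    weight: float


alice = Customer("c1", "Alice", Address("Storgata 1", "Oslo"))
alice.place_order(Order("o1", ["book"]))
# Two value objects with equal attributes are the same value.
same_address = Address("Storgata 1", "Oslo")
```

Because `Address` is frozen, two addresses with the same attributes compare equal and cannot be modified in place, which matches the "value object" notion; the customer and its orders are modified together through `place_order`.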

Polyglot Persistence
Idea
Use the right data model for each part of a system.
For an application, persist
an object or structured data as a JSON document,
a hash table in a key/value store,
relations between objects in a graph database,
a homogeneous array in a relational DBMS.
If the table has many empty cells or inhomogeneous rows, use
a column-based database.
Take scalability needs into account!

Document and key/value stores
Document store
A document store stores sets of documents, which usually
means JSON data; these sets are called collections. The
database has access to the contents of the documents.
each document in the collection has a unique key
secondary indexes possible, leading to more powerful queries
different documents in the same collection: structure can vary
no schema is required for a collection
database normalisation can be relaxed
“Special case”: key/value store
Opaque values, restrict to key lookup without secondary
indexes:
=> high performance and perfect scalability
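A toy in-memory model may make the distinction concrete. The sketch below is plain Python and not ArangoDB code; the class `Collection`, its methods and the sample documents are assumptions made for illustration. It shows unique keys, varying document structure, an optional secondary index, and the key/value "special case" of pure key lookup.

```python
from collections import defaultdict


class Collection:
    """Toy collection: documents are dicts with a unique "_key"."""

    def __init__(self):
        self.docs = {}      # primary index: key -> document
        self.indexes = {}   # attribute -> {value -> set of keys}

    def insert(self, doc):
        key = doc["_key"]
        if key in self.docs:
            raise KeyError(f"duplicate key: {key}")
        self.docs[key] = doc
        # Keep every existing secondary index up to date.
        for attr, index in self.indexes.items():
            if attr in doc:
                index[doc[attr]].add(key)

    def ensure_index(self, attr):
        """Build a secondary index on one attribute (hashable values only)."""
        index = defaultdict(set)
        for key, doc in self.docs.items():
            if attr in doc:
                index[doc[attr]].add(key)
        self.indexes[attr] = index

    def get(self, key):
        """Key/value style access: lookup by primary key only."""
        return self.docs[key]

    def find(self, attr, value):
        """Query on document contents, via an index if one exists."""
        if attr in self.indexes:
            keys = sorted(self.indexes[attr].get(value, ()))
            return [self.docs[k] for k in keys]
        # No index: fall back to scanning the whole collection.
        return [d for d in self.docs.values() if d.get(attr) == value]


products = Collection()
products.insert({"_key": "p1", "type": "book", "title": "DDD", "pages": 560})
products.insert({"_key": "p2", "type": "shirt", "size": "L"})  # other structure
products.ensure_index("type")
products.insert({"_key": "p3", "type": "book", "title": "NoSQL"})
books = products.find("type", "book")
```

If the store treated values as opaque, only `get` would remain, which is exactly the key/value case.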

Graph databases
Graph database
A graph database stores a labelled graph. Vertices and
edges can be documents. Graphs are good to model
relations.
graphs often describe data very naturally (e.g. the Facebook
friendship graph)
graphs can be stored using tables; however, graph queries
then notoriously lead to expensive joins
there are interesting and useful graph algorithms like “shortest
path” or “neighbourhood”
need a good query language to reap the benefits
horizontal scalability is troublesome
graph databases vary widely in scope and usage, no standard
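The two algorithms named above, "neighbourhood" and "shortest path", can be sketched on a small friendship graph. This is a generic breadth-first search in Python, not the implementation of any particular graph database; the names in `friends` are invented sample data.

```python
from collections import deque

# Undirected friendship edges (sample data for this sketch).
friends = [("alice", "bob"), ("bob", "carol"),
           ("carol", "dave"), ("alice", "erin")]


def build_adjacency(edges):
    """Turn an edge list into vertex -> set of neighbouring vertices."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def neighbourhood(adjacency, start, depth):
    """All vertices within `depth` hops of start, excluding start."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {n for v in frontier for n in adjacency.get(v, ())} - seen
        seen |= frontier
    return seen - {start}


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a list of vertices, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbour in sorted(adjacency.get(vertex, ())):
            if neighbour not in previous:
                previous[neighbour] = vertex
                queue.append(neighbour)
    return None


adjacency = build_adjacency(friends)
path = shortest_path(adjacency, "alice", "dave")
close_friends = neighbourhood(adjacency, "alice", 2)
```

With tables, each hop of such a traversal would typically be one more self-join on the edge table, which is where the expensive joins come from.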

Massively parallel: map-reduce and friends
Massively parallel databases
A massively parallel database can use thousands of servers
distributed all over the world and still appears as a single
service.
Humongous data capacity and very high read/write
performance
examples are Apache Cassandra, Apache Hadoop, Google’s
Spanner, Riak and others
these systems have important use cases, in particular in the
analytic domain
query capabilities are somewhat limited, for example to
“map/reduce” only
=> good horizontal scalability at the cost of reduced query flexibility
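To show what a "map/reduce"-only query looks like, here is a minimal single-process sketch in Python. The function `map_reduce` and the sales records are illustrative assumptions; a real system would run the map and reduce phases in parallel across many servers.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Sample data for this sketch.
sales = [
    {"product": "book", "amount": 20},
    {"product": "shirt", "amount": 15},
    {"product": "book", "amount": 30},
]


def emit_amount(sale):
    # Map phase: emit one (key, value) pair per record.
    yield sale["product"], sale["amount"]


def total(product, amounts):
    # Reduce phase: combine all values that share a key.
    return sum(amounts)


revenue = map_reduce(sales, emit_amount, total)
```

Each mapper call sees only one record and each reducer call only one key, which is what makes the scheme easy to distribute and also what limits the kinds of queries it can express directly.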

A typical Use Case — an Online Shop
We need to hold
customer data: usually homogeneous, but still variations
=> use a document store:
product data: even for a specialised business quite
inhomogeneous
shopping carts: need very fast lookup by session key
=> use a key/value store:
order and sales data: relate customers and products
recommendation engine data: links between different entities
=> use a graph database:

Polyglot Persistence is nice, but . . .
Consequence: One needs multiple database systems in the persistence
layer of a single project!
Polyglot persistence introduces some friction through
data synchronisation,
data conversion,
increased installation and administration effort,
more training needs.
Wouldn’t it be nice, . . .
. . . to enjoy the benefits without paying with the
disadvantages?

The Multi-Model Approach
Multi-model database
A multi-model database combines a document store with a
graph database and is at the same time a key/value store.
Vertices are documents in a vertex collection,
edges are documents in an edge collection.
a single, common query language for all three data models
is able to compete with specialised products on their turf
allows for polyglot persistence using a single database
queries can mix the different data models
can replace an RDBMS in many cases

Why is this possible at all?
Document stores and key/value stores
Document stores: have primary key, are key/value stores.
Without using secondary indexes, performance is nearly as
good as with opaque data instead of JSON.
Good horizontal scalability can be achieved for key lookups.
Document stores and graph databases
graph database: would like to associate arbitrary data with
vertices and edges, so JSON documents are a good choice.
A good edge index, giving fast access to neighbours.
This can be a secondary index.
Graph support in the query language.
Implementations of graph algorithms in the database engine.
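The argument can be sketched in a few lines of Python: vertices and edges are ordinary documents, and an edge index is just a secondary index on the two endpoint attributes. The attribute names `_from` and `_to` follow ArangoDB's convention for edge documents; everything else (`EdgeIndex`, `lookup`, `also_bought`, the sample data) is a hypothetical illustration, not ArangoDB's implementation.

```python
from collections import defaultdict

# Vertex collections: plain documents, addressable by key.
collections = {
    "customers": {"c1": {"name": "Alice"}, "c2": {"name": "Bob"},
                  "c3": {"name": "Carol"}},
    "products": {"p1": {"title": "DDD"}, "p2": {"title": "NoSQL"},
                 "p3": {"title": "Graphs"}},
}

# Edge collection: documents that carry their two endpoints.
purchases = [
    {"_from": "customers/c1", "_to": "products/p1", "quantity": 1},
    {"_from": "customers/c2", "_to": "products/p1", "quantity": 2},
    {"_from": "customers/c2", "_to": "products/p2", "quantity": 1},
    {"_from": "customers/c3", "_to": "products/p3", "quantity": 4},
]


class EdgeIndex:
    """Secondary index on _from and _to, giving fast neighbour access."""

    def __init__(self, edges):
        self.outbound = defaultdict(list)
        self.inbound = defaultdict(list)
        for edge in edges:
            self.outbound[edge["_from"]].append(edge)
            self.inbound[edge["_to"]].append(edge)

    def neighbours(self, vertex_id, direction="outbound"):
        if direction == "outbound":
            return [e["_to"] for e in self.outbound.get(vertex_id, [])]
        return [e["_from"] for e in self.inbound.get(vertex_id, [])]


def lookup(doc_id):
    """Key/value style lookup of a document by "collection/key"."""
    name, key = doc_id.split("/")
    return collections[name][key]


def also_bought(index, customer_id):
    """Products bought by customers who share a purchase with customer_id."""
    mine = set(index.neighbours(customer_id))
    others = {c for p in mine
              for c in index.neighbours(p, "inbound")} - {customer_id}
    return sorted({p for c in others for p in index.neighbours(c)} - mine)


index = EdgeIndex(purchases)
suggestions = [lookup(p)["title"] for p in also_bought(index, "customers/c1")]
```

The last line mixes the models in one step: a graph traversal over the edge index followed by key lookups of the resulting documents.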

A Map of the NoSQL Landscape
[Figure: map of the NoSQL landscape. Labels: Transaction Processing DBs, Analytic processing DBs, Map/reduce, Column Stores, Extensibility, Complex queries, Documents, Massively distributed, Graphs, Structured Data, Key/Value]

ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
is memory efficient through shape detection,
uses JavaScript throughout (Google’s V8 built into server),
API extensible by JavaScript code in the Foxx framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation,
enjoys good professional as well as community support
and has sharding since Version 2.0.

A Map of the NoSQL Landscape
[Figure: the same map, reduced to the labels Map/reduce, Column Stores, Extensibility, Complex queries, Documents, Graphs, Structured Data, Key/Value]

The ArangoDB Territory
[Figure: the ArangoDB territory on the map. Labels: Map/reduce, Column Stores, Extensibility, Complex queries, Documents, Graphs, Structured Data, Key/Value]

Strong Consistency
ArangoDB offers
atomic and isolated CRUD operations for single documents,
transactions spanning multiple documents and multiple
collections,
snapshot semantics for complex queries,
very secure durable storage using append-only writes and
multiple stored revisions,
all this for documents as well as for graphs.
In the (not too distant) future, ArangoDB will
offer the same ACID semantics even with sharding,
implement complete MVCC semantics to allow for lock-free
concurrent transactions.

Replication and Sharding — horizontal scalability
Right now, ArangoDB provides
easy setup of (asynchronous) replication,
which allows read access parallelisation (master/slaves setup),
sharding with automatic data distribution to multiple servers.
Very soon, ArangoDB will feature
fault tolerance by automatic failover and synchronous
replication in cluster mode,
zero administration by a self-repairing and self-balancing
cluster architecture.
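As a rough idea of what "automatic data distribution" can mean, the following Python sketch assigns each document to a shard by hashing its key. This is a generic hash-sharding scheme chosen for illustration; it is not claimed to be the scheme ArangoDB uses, and `shard_for` and `ShardedCollection` are hypothetical names.

```python
import hashlib


def shard_for(key, number_of_shards):
    """Map a document key to a shard number in a deterministic way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


class ShardedCollection:
    """Toy collection whose documents are spread over several shards."""

    def __init__(self, number_of_shards):
        # Each dict stands in for the storage of one server.
        self.shards = [{} for _ in range(number_of_shards)]

    def insert(self, doc):
        shard = self.shards[shard_for(doc["_key"], len(self.shards))]
        shard[doc["_key"]] = doc

    def get(self, key):
        # A key lookup touches exactly one shard.
        return self.shards[shard_for(key, len(self.shards))][key]


carts = ShardedCollection(4)
for number in range(100):
    carts.insert({"_key": f"session{number}", "items": []})
cart = carts.get("session42")
```

Key lookups scale well under such a scheme because each one touches a single shard; queries on other attributes have to consult all shards.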

Powerful query language: AQL
The built-in Arango Query Language AQL allows
complex, powerful and convenient queries,
with transaction semantics,
allowing joins,
with user-definable functions (in JavaScript).
AQL is independent of the driver used and
offers protection against injections by design.
For Version 2.3, we are reengineering the AQL query engine:
use a C++ implementation for high performance,
optimise distributed queries in the cluster.
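One mechanism that helps against injection is the use of bind parameters, which AQL writes as `@name`. The Python sketch below only builds the query text and the bind variables; it does not talk to a server, and the collection name `customers` and both helper functions are assumptions made for illustration.

```python
def build_query(user_input):
    """Keep the query text fixed and pass user input as a bind variable."""
    query = "FOR c IN customers FILTER c.name == @name RETURN c"
    bind_vars = {"name": user_input}
    return query, bind_vars


def build_query_unsafely(user_input):
    """Anti-pattern: splice user input into the query text."""
    return 'FOR c IN customers FILTER c.name == "' + user_input + '" RETURN c'


# Input crafted to change the meaning of a concatenated query.
malicious = '" || true || "'

safe_query, safe_vars = build_query(malicious)
unsafe_query = build_query_unsafely(malicious)
```

With bind parameters the query text is the same for every input, so the input can only ever be compared as a value; in the concatenated version the input becomes part of the filter expression itself.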

Extensible through JavaScript and Foxx
The HTTP API of ArangoDB
can be extended by user-defined JavaScript code,
that is executed in the DB server for high performance.
This is formalised by the Foxx framework,
which allows implementing complex, user-defined APIs with
direct access to the DB engine.
Very flexible and secure authentication schemes can be
implemented conveniently by the user in JavaScript.
Because JavaScript runs everywhere (in the DB server as well
as in the browser), one can use the same libraries in the
back-end and in the front-end.
=> implement your own micro services
