Max Neunhöffer presents on the future of NoSQL databases and argues multi-model databases will become the standard. He discusses different NoSQL data models like document stores, key-value stores, graph databases and column-oriented databases. He advocates the benefits of a polyglot persistence approach but notes the disadvantages of managing multiple databases. Max introduces ArangoDB as a multi-model database that supports documents, graphs and key-value in a single database to provide the benefits of polyglot persistence without the disadvantages. He provides examples of how ArangoDB has been used and outlines its features including queries, extensibility and horizontal scalability. Max predicts that in five years, the default approach will be to use a
Introduction to ArangoDB (nosql matters Barcelona 2012) - ArangoDB Database
ArangoDB is a universal open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript/Ruby extensions.
The video is also available online:
http://2012.nosql-matters.org/bcn/speakers/
Jan Steemann gave a talk at JavaScript Everywhere in Paris in 2012 on JavaScript in ArangoDB, an open-source NoSQL database. With ArangoDB you can use JavaScript and/or Ruby (mruby) as an embedded language.
This is the RFC for AvocadoDB's query language. AvocadoDB is an open-source NoSQL database (see www.avocadodb.org) offering a mixture of data models such as key/value pairs, documents and graphs.
The REST API for AvocadoDB is already available and stable, and people are writing APIs using it. Awesome. As AvocadoDB offers more complex data structures like graphs and lists, REST is not enough. We implemented a first version of a query language some time ago which is very similar to SQL and UNQL.
Then we realized that this approach was not completely satisfying, as some queries cannot be expressed very well with it, especially those involving multi-valued attributes/lists. UNQL addresses this partly, but does not go far enough. Another issue is graphs: AvocadoDB supports querying graphs, but neither SQL nor UNQL offers any "natural" graph traversal facilities.
As we did not find any existing query language that addresses the problems we found, we had to define a new query language, which is described in this presentation.
Have some feedback on this? Come to www.avocadodb.org and tell us what you think about it. :-)
Domain Driven Design is a software development process that focuses on finding a common language for the involved parties. This language and the resulting models are taken from the domain rather than the technical details of the implementation. The goal is to improve the communication between customers, developers and all other involved groups. Even though Eric Evans's book about this topic was written almost ten years ago, the topic remains important because a lot of projects fail for communication reasons.
Relational databases have their own language and push the design of software in a direction further away from the domain: entities have to be created for the sole purpose of adhering to the best practices of relational databases. Two kinds of NoSQL databases are changing that: document stores and graph databases. In a document store you can model a "contains" relation in a more natural way and thereby express whether an entity can exist outside of its surrounding entity. A graph database allows you to model relationships between entities in a straightforward way that can be expressed in the language of the domain.
In this talk I want to look at how a multi-model database that combines a document store and a graph database can help you to model your problems in a way that is understandable for all parties involved, and explain the benefits of this approach for the software development process.
ArangoDB is a native multi-model database system developed by triAGENS GmbH. The database system supports three important data models (key/value, documents, graphs) with one database core and a unified query language, AQL (ArangoDB Query Language). ArangoDB is a NoSQL database system, but AQL is similar in many ways to SQL.
In this hotcode 2013 talk Lucas and Frank gave an overview of NoSQL and explained why it is a good idea to use JavaScript in the database environment as well.
As more businesses realise that data, in all forms and sizes, is critical to making the best possible decisions, we see the continued growth of systems that support massive volumes of non-relational or unstructured data. Nothing shows the picture more starkly than the Gartner Magic Quadrant for operational database management systems, which assumes that, by 2017, all leading operational DBMSs will offer multiple data models, relational and NoSQL, in a single DBMS platform. Having a single data platform for managing both well-structured data and NoSQL data is beneficial to users; this approach significantly reduces integration, migration, development, maintenance, and operational issues. Therefore, a challenging research question is how to develop an efficient, consolidated single data management platform covering both relational data and NoSQL to reduce integration issues, simplify operations, and eliminate migration issues.
In this tutorial, we review the previous work on multi-model data management and provide insights into the research challenges and directions for future work.
Papers and more materials on this tutorial can be found at: http://udbms.cs.helsinki.fi/?tutorials
DBPedia past, present and future - Dimitris Kontokostas. Reveals recent developments in the Linked Data and knowledge graphs field and how DBPedia progresses with Wikipedia data.
In this talk I will explain the motivation behind the multi-model database approach, discuss its advantages and limitations, and keep the presentation concrete and practice-oriented by showing usage examples from node.js.
guacamole: an Object Document Mapper for ArangoDB - Max Neunhöffer
In this talk I will give a brief introduction to and overview of guacamole, showing how easy it is to get started with using ArangoDB as the persistence layer for a Rails app. I will also explain the philosophy behind ArangoDB's "multi-model approach", but still show concrete code examples, and all of this in 15 minutes.
3.Implementation with NOSQL databases Document Databases (Mongodb).pptx - RushikeshChikane2
This chapter gives information about document-based and graph-based databases. It covers their basic structures, features, applications, limitations and use cases.
Comparative study of no sql document, column store databases and evaluation o... - ijdms
In the last decade, the rapid growth of mobile applications, web technologies and social media generating unstructured data has led to the advent of various NoSQL data stores. The demands of web scale are increasing every day, and NoSQL databases are evolving to meet stern big data requirements. The purpose of this paper is to explore NoSQL technologies and present a comparative study of document and column store NoSQL databases such as Cassandra, MongoDB and HBase with respect to various attributes of relational and distributed database system principles. The architecture and internal workings of Cassandra, MongoDB and HBase are studied and analysed theoretically, and the core concepts are described. The paper also presents an evaluation of Cassandra for an industry-specific use case and publishes the results.
Alexander Aldev - Co-founder and CTO of MammothDB, currently focused on the architecture of the distributed database engine. Notable achievements in the past include managing the launch of the first triple-play cable service in Bulgaria and designing the architecture and interfaces from legacy systems of DHL Global Forwarding's data warehouse. Has lectured on Hadoop at AUBG and MTel.
"The future of Big Data tooling" will briefly review the architectural concepts of current Big Data tools like Hadoop and Spark. It will make the argument, from the perspective of both technology and economics, that the future of Big Data tools is in optimizing local storage and compute efficiency.
This presentation is about NoSQL, which stands for "Not Only SQL". It covers the aspects of using NoSQL for Big Data and the differences from RDBMSs.
We will take a deep dive into ArangoDB (https://www.arangodb.com/) together with Max (https://www.linkedin.com/in/maxneunhoeffer), one of the core developers of the product.
ArangoDB is a multi-model database, which means that it is a document store, a key/value store and a graph database, all in one engine and with a query language that supports all three data models, as well as joins and transactions. Queries can use a single data model or can even mix them.
ArangoDB scales out horizontally with convenient cluster deployment using Apache Mesos. Furthermore, the HTTP API can easily be extended by server-side JavaScript code using high performance access to the C++ database core.
During the talk I will show all these features using several different cloud deployments, since in most projects one will not deploy an ArangoDB monolith, but rather multiple instances, each either a possibly replicated single server or a cluster. This demonstrates that all these properties together make ArangoDB a very useful and valuable tool in modern microservice-oriented architectures.
Recently, ArangoDB integrated its cluster management with Apache Mesos. This makes it now possible to launch an ArangoDB cluster on a Mesos cluster with a single, albeit complex shell command. In a DCOS-enabled Mesosphere cluster this is even easier, because one can use the dcos subcommand for ArangoDB, which essentially turns a Mesosphere cluster into a single, large computer.
In this talk I explain the whole setup and show (live on stage) how to deploy ArangoDB clusters on Amazon Web Services, and how we used this to scale ArangoDB up until it could sustain 1000000 document writes per second.
Recently, ArangoDB integrated its cluster management with Apache Mesos. This makes it now possible to launch an ArangoDB cluster on a Mesos cluster with a single, albeit complex shell command. In a DCOS-enabled Mesosphere cluster this is even easier, because one can use the dcos subcommand for ArangoDB, which essentially turns a Mesosphere cluster into a single, large computer.
In this talk I explain the whole setup and show (live on stage) how to deploy ArangoDB clusters on Google Compute Engine, and how we used this to scale ArangoDB up until it could sustain 1000000 document writes per second.
Processing large-scale graphs with Google Pregel - Max Neunhöffer
Graphs are a very popular data structure to store relations like
friendship or web pages and their links. Therefore graph databases
have become popular recently and some of them even allow sharding,
i.e. automatic distribution of the data across multiple machines.
On the other hand, very computation-intensive algorithms for graphs are known and used in practice, and they often access very large data sets, which leads to heavy communication loads.
Therefore, it is an obvious idea to run such graph algorithms on the database servers, close to the data, making use of the computational power of the storage nodes.
Google's Pregel framework makes it possible to implement many graph algorithms in one general system and plays a role similar to the map-reduce skeleton, but for graphs.
In this talk I will explain the framework and describe its implementation in the multi-model database ArangoDB.
Backbone using Extensible Database APIs over HTTP - Max Neunhöffer
These days, more and more software applications are designed using a microservices architecture, that is, as suites of independently deployable services, talking to each other through well-defined interfaces. This approach is helped by the fact that many NoSQL databases expose their API through HTTP, which makes it particularly easy to define the interfaces.
The multi-model NoSQL database ArangoDB embeds Google's V8 JavaScript engine and features the Foxx framework, which allows the developer to extend ArangoDB's API with user-defined JavaScript code that runs on the database server.
In this talk I will explain the benefits of this approach to the software architecture and development process. I will keep the presentation practice-oriented by showing concrete examples in ArangoDB and JavaScript, using Backbone.js.
Complex queries in a distributed multi-model database - Max Neunhöffer
A multi-model database is a document store, a graph database and a key/value store at the same time. To allow for convenient and powerful querying, such a database needs a query language that understands all three data models and allows mixing them in a single query. For example, it should be possible to find some documents in a collection according to some criteria, then follow some edges in a graph in which the documents represent vertices, and finally join the results with documents from yet another collection.
In this talk I will explain how a query engine for such a language works and give an overview of the life of a query, from parsing, through translation into an execution plan and the optimisation phase, to the execution. I will show what distributed query execution plans look like, how the query optimiser reasons about them and how the distributed execution works.
In 2014 we had to do a major overhaul of ArangoDB's database engine, because we wanted to introduce a write-ahead log. Since for a database this change is similar in nature to the proverbial open-heart surgery for humans, it was clear from day one that this would be a difficult endeavour with a lot of risk of breaking things. Rather fundamental changes were needed in nearly all places of the kernel code, and it seemed impossible to serialise the work to keep the system in a working state. As usual, time was at a premium, since the next major release had to go out of the door in two months' time.
In this talk I will tell the story of this overhaul, explain the role of unit tests and continuous integration, and describe the challenges we faced and how we finally overcame them.
ArangoDB is an open source, multi-model NoSQL database that is written in C++ and embeds Google's V8 engine to implement the higher levels of its functionality in JavaScript. Recently we decided to switch from C++03 to C++11 for the database kernel. In this talk I will first give a short overview of the software architecture of ArangoDB and proceed to tell you about our practical experiences with the switch to C++11. I will explain which of the parts of the "new" standard have been more important and which have been less useful, and I will report about the difficulties we encountered.
Extensible Database APIs and their role in Software Architecture - Max Neunhöffer
This event will start with a presentation on “Extensible database APIs and their role in software architecture”, centered around JavaScript. This will be followed by a hands-on interactive workshop. Participants with their own computers will learn how to create a small web application with a database backend, within the session, using only JavaScript. This will be a guided hands-on session using the multi-model NoSQL database ArangoDB and its Foxx JavaScript extension framework. Presenting this workshop will be Max Neunhöffer from https://www.arangodb.com/.
1. Is multi-model the future of
NoSQL?
Max Neunhöffer
SouthBay.NET Meetup, 5 March 2015
www.arangodb.com
2. Max Neunhöffer
I am a mathematician
“Earlier life”: Research in Computer Algebra
(Computational Group Theory)
Always juggled with big data
Now: working in database development, NoSQL, ArangoDB
I like:
research,
hacking,
teaching,
tickling the highest performance out of computer systems.
3. ArangoDB GmbH
triAGENS GmbH has offered consulting services since 2004:
software architecture
project management
software development
business analysis
a lot of experience with specialised database systems
have done NoSQL, before the term was coined at all
2011/2012, an idea emerged:
to build the database one had wished to have all those years!
development of ArangoDB as open source software since 2012
ArangoDB GmbH: spin-off to take care of ArangoDB (2014)
4. Document and Key/Value Stores
Document store
A document store stores sets of documents, which usually
means JSON data; these sets are called collections. The
database has access to the contents of the documents.
each document in the collection has a unique key
secondary indexes possible, leading to more powerful queries
different documents in the same collection: structure can vary
no schema is required for a collection
database normalisation can be relaxed
Key/value store
Opaque values, only key lookup without secondary indexes:
⇒ high performance and perfect scalability
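The contrast between the two store types on this slide can be sketched in a few lines of Python. This is only a minimal in-memory illustration and not how ArangoDB or any other product implements it; the class names `DocumentCollection` and `KeyValueStore`, their methods and the sample data are all hypothetical.

```python
import json
from collections import defaultdict


class DocumentCollection:
    """Toy document collection: the store can look inside the documents."""

    def __init__(self):
        self.docs = {}       # unique key -> document (a dict)
        self.indexes = {}    # attribute -> {value -> set of keys}

    def insert(self, key, doc):
        if key in self.docs:
            raise KeyError(f"duplicate key: {key}")
        self.docs[key] = doc
        for attr, index in self.indexes.items():
            if attr in doc:
                index[doc[attr]].add(key)

    def ensure_index(self, attr):
        # Secondary index on one attribute; documents lacking it are skipped.
        index = defaultdict(set)
        for key, doc in self.docs.items():
            if attr in doc:
                index[doc[attr]].add(key)
        self.indexes[attr] = index

    def find_by(self, attr, value):
        if attr in self.indexes:
            keys = self.indexes[attr].get(value, set())
            return [self.docs[k] for k in sorted(keys)]
        # No index available: fall back to scanning every document.
        return [d for _, d in sorted(self.docs.items()) if d.get(attr) == value]


class KeyValueStore:
    """Toy key/value store: values are opaque bytes, lookup is by key only."""

    def __init__(self):
        self.data = {}

    def put(self, key, value: bytes):
        self.data[key] = value

    def get(self, key):
        return self.data[key]


products = DocumentCollection()
products.insert("p1", {"name": "bike", "colour": "red", "gears": 21})
products.insert("p2", {"name": "kettle", "colour": "red"})   # different structure
products.insert("p3", {"name": "lamp", "watts": 40})
products.ensure_index("colour")
red_names = [d["name"] for d in products.find_by("colour", "red")]

carts = KeyValueStore()
carts.put("session-42", json.dumps({"items": ["p1"]}).encode())
cart = json.loads(carts.get("session-42"))   # only the client interprets the value
```

The document collection can answer "all red products" because it sees the attributes; the key/value store never inspects the bytes it holds, which is what keeps it simple to scale.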
5. Graph databases
Graph database
A graph database stores a labelled graph. Vertices and
edges can be documents. Graphs are good to model
relations.
graphs often describe data very naturally (e.g. the Facebook
friendship graph)
graphs can be stored using tables, however, graph queries
notoriously lead to expensive joins
there are interesting and useful graph algorithms like “shortest
path” or “neighbourhood”
need a good query language to reap the benefits
horizontal scalability is troublesome
graph databases vary widely in scope and usage, no standard
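To make "neighbourhood" and "shortest path" concrete, here is a small Python sketch over a labelled graph whose vertices and edges are plain documents. It is an illustration under stated assumptions: the data, the attribute names `from`, `to` and `label`, and both function names are hypothetical, and edges are treated as undirected for simplicity.

```python
from collections import deque

# Vertices and edges are documents (plain dicts); each edge carries a label.
vertices = {
    "alice": {"name": "Alice"},
    "bob": {"name": "Bob"},
    "carol": {"name": "Carol"},
    "dave": {"name": "Dave"},
}
edges = [
    {"from": "alice", "to": "bob", "label": "friend"},
    {"from": "bob", "to": "carol", "label": "friend"},
    {"from": "carol", "to": "dave", "label": "friend"},
    {"from": "alice", "to": "dave", "label": "colleague"},
]


def neighbours(vertex, label=None):
    """Direct neighbours of a vertex, optionally restricted to one edge label."""
    result = set()
    for e in edges:
        if label is not None and e["label"] != label:
            continue
        if e["from"] == vertex:
            result.add(e["to"])
        elif e["to"] == vertex:
            result.add(e["from"])
    return result


def shortest_path(start, goal, label=None):
    """Breadth-first search; returns one shortest list of vertex keys, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for n in sorted(neighbours(current, label)):
            if n not in previous:
                previous[n] = current
                queue.append(n)
    return None


any_path = shortest_path("alice", "dave")
friend_path = shortest_path("alice", "dave", label="friend")
```

Each call to `neighbours` scans the whole edge list here; a graph database would typically use an index on the edge endpoints instead, which is what avoids the expensive joins mentioned on the slide.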
6. Column-oriented data stores
Column-oriented data stores
A column-oriented database stores tables but “keeps
columns together” rather than rows.
access to a whole column is fast
sparse rows are handled efficiently
particularly good for certain types of data analysis
often implemented in a key/value-like fashion
row access can be slow
columns have homogeneous data, so compression works well
prominent examples: C-Store and Cassandra
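The trade-offs listed on this slide can be illustrated with a toy table in Python. This is a sketch only: the table contents and the helper names `run_length_encode` and `get_row` are hypothetical, and run-length encoding stands in for whatever compression scheme a real column store uses.

```python
from itertools import groupby

# The same table in row-oriented form ...
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 25},
    {"id": 3, "country": "DE", "amount": 5},
    {"id": 4, "country": "US", "amount": 40},
]

# ... and in column-oriented form: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Compress runs of equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def get_row(index):
    """Reassembling one row has to touch every column."""
    return {name: column[index] for name, column in columns.items()}


total = sum(columns["amount"])                      # reads a single column
encoded = run_length_encode(columns["country"])     # homogeneous data compresses well
second_row = get_row(1)
```

Summing `amount` never looks at the other columns, while `get_row` has to visit all of them, which mirrors the "fast column access, slower row access" points above.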
7. Massively parallel: map-reduce and friends
The area of massively parallel
A massively parallel database can use thousands of servers
distributed all over the world and still appears as a single
service.
Humongous data capacity and very high read/write
performance
examples are Apache Cassandra, Apache Hadoop, Google’s
Spanner, Riak and others
these systems have important use cases, in particular in the
analytic domain
query capabilities are somewhat limited, for example to
“map/reduce” only
⇒ good horizontal scalability at the cost of reduced query flexibility
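To illustrate what a query restricted to “map/reduce” looks like, here is a single-process Python sketch of the pattern. It is a toy under stated assumptions: the `map_reduce` helper, the mapper and reducer, and the order records are invented, and a real system would run the map and reduce phases on many servers.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper on every record, group emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical order records; prices are whole cents to avoid float rounding.
orders = [
    {"product": "lamp", "quantity": 2, "price": 1500},
    {"product": "novel", "quantity": 1, "price": 900},
    {"product": "lamp", "quantity": 1, "price": 1500},
]


def revenue_mapper(order):
    # Emit one (key, value) pair per record.
    yield order["product"], order["quantity"] * order["price"]


def sum_reducer(key, values):
    return sum(values)


revenue = map_reduce(orders, revenue_mapper, sum_reducer)
```

Each record is mapped independently and each key is reduced independently, which is what makes the pattern easy to distribute and also what limits the kinds of queries it can express.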
8. Polyglot Persistence
Idea
Use the right data model for each part of a system.
For an application, persist
an object or structured data as a JSON document,
a hash table in a key/value store,
relations between objects in a graph database,
a homogeneous array in a relational DBMS.
If the table has many empty cells or inhomogeneous rows, use
a column-oriented database.
Take scalability needs into account!
9. A typical Use Case — an Online Shop
We need to hold
customer data: usually homogeneous, but still variations
=⇒ use a relational DB: MySQL
product data: even for a specialised business quite
inhomogeneous
=⇒ use a document store:
shopping carts: need very fast lookup by session key
=⇒ use a key/value store:
order and sales data: relate customers and products
=⇒ use a document store:
recommendation engine data: links between different entities
=⇒ use a graph database:
10. Polyglot Persistence is nice, but . . .
Consequence: One needs multiple database systems in the
persistence layer of a single project!
Polyglot persistence introduces some friction through
data synchronisation,
data conversion,
increased installation and administration effort,
more training needs.
Wouldn’t it be nice, . . .
. . . to enjoy the benefits without the disadvantages?
11. The Multi-Model Approach
Multi-model database
A multi-model database combines a document store with a
graph database and is at the same time a key/value store.
Vertices are documents in a vertex collection,
edges are documents in an edge collection.
a single, common query language for all three data models
is able to compete with specialised products on their turf
allows for polyglot persistence using a single database
queries can mix the different data models
can replace an RDBMS in many cases
12. Why is this possible at all?
Document stores and key/value stores
Document stores: have primary key, are key/value stores.
Without using secondary indexes, performance is nearly as
good as with opaque data instead of JSON.
Good horizontal scalability can be achieved for key lookups.
13. Why is this possible at all?
Document stores and graph databases
graph database: would like to associate arbitrary data with
vertices and edges, so JSON documents are a good choice.
A good edge index, giving fast access to neighbours.
This can be a secondary index.
Graph support in the query language.
Implementations of graph algorithms in the DB engine.
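The idea that edges are just documents with an edge index on top can be sketched as follows. This is a simplified in-memory illustration, not ArangoDB's implementation: the attribute names `_from` and `_to` follow ArangoDB's edge documents, while the `EdgeIndex` class, the keys and the containment data are assumptions made for the example.

```python
from collections import defaultdict

# Vertices are ordinary documents, looked up by key as in a key/value store.
vertices = {
    "parts/aircraft-1": {"kind": "aircraft", "name": "Aircraft 1"},
    "parts/engine-1": {"kind": "component", "name": "Left engine"},
    "parts/engine-2": {"kind": "component", "name": "Right engine"},
    "parts/blade-7": {"kind": "part", "name": "Fan blade 7"},
}

# Edges are documents too; they carry arbitrary data plus the two endpoints.
edges = [
    {"_from": "parts/aircraft-1", "_to": "parts/engine-1", "since": 2013},
    {"_from": "parts/aircraft-1", "_to": "parts/engine-2", "since": 2014},
    {"_from": "parts/engine-1", "_to": "parts/blade-7", "since": 2014},
]


class EdgeIndex:
    """Secondary index over the endpoint attributes of the edge documents."""

    def __init__(self, edge_documents):
        self.outbound = defaultdict(list)
        self.inbound = defaultdict(list)
        for edge in edge_documents:
            self.outbound[edge["_from"]].append(edge)
            self.inbound[edge["_to"]].append(edge)

    def neighbours(self, vertex_key, direction="outbound"):
        # One index lookup instead of a scan over all edges.
        if direction == "outbound":
            return [e["_to"] for e in self.outbound.get(vertex_key, [])]
        if direction == "inbound":
            return [e["_from"] for e in self.inbound.get(vertex_key, [])]
        raise ValueError(f"unknown direction: {direction}")


index = EdgeIndex(edges)
contained = [vertices[key]["name"] for key in index.neighbours("parts/aircraft-1")]
```

Since the neighbours come back as document keys, the result of a graph step can be fed straight into document lookups or attribute filters.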
14. A Map of the NoSQL Landscape
[Figure: map of the NoSQL landscape. Labels: Transaction processing DBs, Analytic processing DBs, Map/reduce, Column stores, Extensibility, Documents, Massively distributed, Graphs, Structured data, Key/Value, Complex queries]
15. Use case: Aircraft fleet management
One of our customers uses ArangoDB to
store each part, component, unit or aircraft as a document
model containment as a graph
thus can easily find all parts of some component
keep track of maintenance intervals
perform queries orthogonal to the graph structure
thereby getting good efficiency for all needed queries
16. Use case: Family tree management
For genealogy, the natural object is a family tree.
data naturally comes as a (directed) graph
many queries are traversals or shortest path
but not all, for example:
“all people with name James” in a family tree, sorted by birthday
“all family members who studied at Berkeley”, sorted by
number of children
quite often, queries mixing the different models are useful
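A query that mixes the models, such as the first example above, can be sketched in Python as a graph traversal followed by a document-style filter and sort. The family data, the `parent_of` edge list and the function names are invented for illustration; in a multi-model database the same combination would be written in the query language.

```python
# Hypothetical person documents, keyed by a unique key.
people = {
    "mary": {"name": "Mary", "birthday": "1901-05-02"},
    "james1": {"name": "James", "birthday": "1925-11-30"},
    "anna": {"name": "Anna", "birthday": "1928-01-15"},
    "james2": {"name": "James", "birthday": "1950-07-04"},
    "peter": {"name": "Peter", "birthday": "1955-09-09"},
    "james3": {"name": "James", "birthday": "1940-02-20"},  # not in Mary's tree
}

# Directed edges (parent, child).
parent_of = [
    ("mary", "james1"),
    ("mary", "anna"),
    ("james1", "james2"),
    ("anna", "peter"),
]


def descendants(root, edges):
    """Graph part: traverse the parent-child edges downwards from root."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    found, stack, seen = [], [root], {root}
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


def named_in_tree(root, name):
    """Document part: filter the traversal result by attribute, sort by birthday."""
    members = [root] + descendants(root, parent_of)
    matches = [key for key in members if people[key]["name"] == name]
    # ISO dates sort correctly as plain strings.
    return sorted(matches, key=lambda key: people[key]["birthday"])


james_in_marys_tree = named_in_tree("mary", "James")
```

The traversal restricts the result to one family tree, and the filter and sort then work on document attributes that have nothing to do with the graph structure.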
17. Use case: knowledge bases
encode nearly arbitrary knowledge
often produced by machine learning
queried in very complex ways by expert systems
often in connection to an inference engine
need linked data with lots of associations
typical queries have unpredictable path length, thus graph
queries shine
nevertheless, often queries orthogonal to the links are needed
18. Recently: Key/Value stores adding other models
Riak (by Basho), originally a key/value store, adds support for
documents with their 2.0 version (late 2014)
Redis (sponsored by Pivotal), originally an in-memory
key/value store, has over time added more data types and
more complex operations
FoundationDB (by FoundationDB) is a key/value store, but is
now marketed as a multi-model database by adding additional
layers on top
OrientDB (by Orient Technologies) started as an object
database and nowadays calls itself a multi-model database
19. Recently: DataStax acquired Aurelius
In February 2015, DataStax (vendor of a commercialised version of
the column-oriented Cassandra) announced the acquisition of
Aurelius, the company behind TitanDB (a distributed graph database
on top of Cassandra).
In their own words:
“Bringing Graph Database Technology To Cassandra.”
“Will deliver massively scalable, always-on graph database
technology.”
“Will simplify the adoption of leading NoSQL technologies to
support multi-model use case environments.”
20. Recently: MongoDB 3.0 adds pluggable DB engine
MongoDB is one of the most popular document stores.
In February 2015, they announced their 3.0 version, to be released
in March, featuring
a pluggable storage engine layer
transparent on-disk compression
etc.
This indicates their interest in supporting more data models than
“just documents”.
It will be very interesting indeed to see if and how they extend
their query language . . .
21. ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
is memory efficient thanks to shape detection,
uses JavaScript throughout (Google’s V8 built into server),
API extensible by JavaScript code in the Foxx framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation,
enjoys good professional as well as community support
and has sharding since Version 2.0.
22. Configurable consistency
ArangoDB offers
atomic and isolated CRUD operations for single documents,
transactions spanning multiple documents and multiple
collections,
snapshot semantics for complex queries,
very secure durable storage using append-only writes and
storing multiple revisions,
all this for documents as well as for graphs.
In the near future, ArangoDB will
implement complete MVCC semantics to allow for lock-free
concurrent transactions
and offer the same ACID semantics even with sharding.
23. Replication and Sharding — horizontal scalability
Right now, ArangoDB provides
easy setup of (asynchronous) replication,
which allows read access parallelisation (master/slaves setup),
sharding with automatic data distribution to multiple servers.
Very soon, ArangoDB will feature
fault tolerance by automatic failover and synchronous
replication in cluster mode,
zero administration by a self-repairing and self-balancing
cluster architecture,
full integration with Apache Mesos and Mesosphere.
24. Powerful query language: AQL
The built-in Arango Query Language (AQL) allows
complex, powerful and convenient queries,
with transaction semantics,
with support for joins,
with user-definable functions (in JavaScript).
AQL is independent of the driver used and
offers protection against injections by design.
For Version 2.3, we have reengineered the AQL query engine:
use a C++ implementation for high performance,
optimise distributed queries in the cluster.
25. Extensible through JavaScript and Foxx
The HTTP API of ArangoDB
can be extended by user-defined JavaScript code,
that is executed in the DB server for high performance.
This is formalised by the Foxx microservice framework,
which allows implementing complex, user-defined APIs with
direct access to the DB engine.
Very flexible and secure authentication schemes can be
implemented conveniently by the user in JavaScript.
Because JavaScript runs everywhere (in the DB server as well
as in the browser), one can use the same libraries in the
back-end and in the front-end.
=⇒ implement your own micro services
26. The Future of NoSQL: My Observations
I observe
2 decades ago the most versatile solutions eventually
dominated the relational DB market
(Oracle, MySQL, PostgreSQL),
the rise of the polyglot persistence idea
a trend towards multi-model databases
specialised products broadening their scope
even relational systems add support for JSON documents
devOps gaining influence (Docker phenomenon)
27. The Future of NoSQL: My Predictions
In 5 years' time . . .
the default approach is to use a multi-model database,
the big vendors will all add other data models,
the NoSQL solutions will conquer a sizable portion
of what is now dominated by the relational model,
specialised products will only survive if they find a niche.