In 2014 we had to do a major overhaul of ArangoDB's database engine, because we wanted to introduce a write-ahead log. Since for a database this change is similar in nature to the proverbial open-heart surgery for humans, it was clear from day one that this would be a difficult endeavour with a lot of risk of breaking things. Rather fundamental changes were needed in nearly all places of the kernel code, and it seemed impossible to serialise the work to keep the system in a working state. As usual, time was at a premium, since the next major release had to go out of the door in two months' time.
In this talk I will tell the story of this overhaul, explain the role of unit tests and continuous integration, and describe the challenges we faced and how we finally overcame them.
Processing large-scale graphs with Google Pregel – Max Neunhöffer
Graphs are a very popular data structure to store relations like friendship or web pages and their links. Therefore graph databases have become popular recently, and some of them even allow sharding, i.e. automatic distribution of the data across multiple machines.
On the other hand, very computation-intensive algorithms for graphs are known and used in practice, and they often access very large data sets, which leads to heavy communication loads.
Therefore, it is an obvious idea to run such graph algorithms on the database servers, close to the data, making use of the computational power of the storage nodes.
Google's Pregel framework makes it possible to implement many graph algorithms in one general system; it plays a role similar to the map-reduce skeleton, but for graphs.
In this talk I will explain the framework and describe its implementation in the multi-model database ArangoDB.
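The vertex-centric model behind Pregel can be sketched in a few lines of plain JavaScript (an illustrative toy, not ArangoDB's actual Pregel implementation; all names here are made up): each vertex keeps a value, receives messages, and sends messages to its neighbours in synchronised supersteps until no vertex changes. Propagating the minimum vertex id like this computes connected components:

```javascript
// Toy vertex-centric ("think like a vertex") superstep loop.
// Propagating the minimum vertex id along edges labels each vertex
// with the smallest id in its connected component.
function connectedComponents(vertices, edges) {
  const label = new Map(vertices.map((v) => [v, v])); // start with own id
  const nbrs = new Map(vertices.map((v) => [v, []]));
  for (const [a, b] of edges) {
    nbrs.get(a).push(b); // treat edges as undirected
    nbrs.get(b).push(a);
  }
  // Superstep 0: every vertex announces its own id to its neighbours.
  let inbox = new Map(vertices.map((v) => [v, []]));
  for (const v of vertices) {
    for (const n of nbrs.get(v)) inbox.get(n).push(v);
  }
  let anyChanged = true;
  while (anyChanged) {
    anyChanged = false;
    const outbox = new Map(vertices.map((v) => [v, []]));
    for (const v of vertices) {
      const incoming = inbox.get(v);
      if (incoming.length === 0) continue; // no messages: vertex halts
      const min = Math.min(label.get(v), ...incoming);
      if (min < label.get(v)) {
        label.set(v, min); // improved: stay active, notify neighbours
        anyChanged = true;
        for (const n of nbrs.get(v)) outbox.get(n).push(min);
      }
    }
    inbox = outbox; // messages become visible in the next superstep
  }
  return label; // Map: vertex id -> component label
}
```

For vertices [1, 2, 3, 4] with edges [[1, 2], [2, 3]], vertices 1 to 3 all end up labelled 1, while the isolated vertex 4 keeps its own label.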
In this talk we introduce the term polyglot persistence, give a brief introduction to the world of NoSQL databases, and point out the benefits and costs of polyglot persistence. Thereafter we present the idea of a multi-model database that reduces the costs of polyglot persistence but keeps its benefits. Finally, we present ArangoDB as a multi-model database.
Extensible Database APIs and their role in Software Architecture – Max Neunhöffer
This event will start with a presentation on “Extensible database APIs and their role in software architecture”, centered around JavaScript. This will be followed by a hands-on interactive workshop. Participants who bring their own computers will learn how to create a small web application with a database backend within the session, using only JavaScript. This will be a guided hands-on session using the multi-model NoSQL database ArangoDB and its Foxx JavaScript extension framework. The workshop will be presented by Max Neunhöffer from https://www.arangodb.com/.
In this talk I will explain the motivation behind the multi-model database approach, discuss its advantages and limitations, and keep the presentation concrete and practice-oriented by showing concrete usage examples from Node.js.
Backbone using Extensible Database APIs over HTTP – Max Neunhöffer
These days, more and more software applications are designed using a microservices architecture, that is, as suites of independently deployable services talking to each other through well-defined interfaces. This approach is helped by the fact that many NoSQL databases expose their API over HTTP, which makes it particularly easy to define the interfaces.
The multi-model NoSQL database ArangoDB embeds Google's V8 JavaScript engine and features the Foxx framework, which allows the developer to extend ArangoDB's API with user-defined JavaScript code that runs on the database server.
In this talk I will explain the benefits of this approach for the software architecture and development process. I will keep the presentation practice-oriented by showing concrete examples in ArangoDB and JavaScript, using Backbone.js.
ArangoDB is an open source, multi-model NoSQL database that is written in C++ and embeds Google's V8 engine to implement the higher levels of its functionality in JavaScript. Recently we decided to switch from C++03 to C++11 for the database kernel. In this talk I will first give a short overview of the software architecture of ArangoDB and proceed to tell you about our practical experiences with the switch to C++11. I will explain which of the parts of the "new" standard have been more important and which have been less useful, and I will report about the difficulties we encountered.
Following classical software architecture patterns, we tend to design large, monolithic software applications.
These monoliths are typically quite difficult to scale, as they often require powerful machines, making scaling out very expensive.
In most cases such monoliths are designed to run on a single machine only, hence scaling out is complicated or even impossible without refactoring large portions of the application.
Therefore a new design pattern called microservices has arisen.
The microservices pattern keeps the need for a clustered server setup in mind and helps to keep the application very modular.
This simplifies scaling out your application and even makes it possible to scale only its bottlenecks, thereby reducing the total cost of a scale-out approach.
In this talk I will introduce the concept of microservices, how they are defined, and how to design an application with them.
Furthermore, I will show how to scale the application properly and why this is only possible thanks to the use of microservices.
We will also have a look at Node.js and why it is a perfect, though not the only, fit for this design strategy.
However, scaling is not the only purpose of microservices: they also increase the flexibility and maintainability of applications, which will also be discussed in the talk.
guacamole: an Object Document Mapper for ArangoDB – Max Neunhöffer
In this talk I will give a brief introduction and overview for guacamole, showing how easy it is to get started with using ArangoDB as the persistence layer for a Rails app. I will also explain the philosophy behind ArangoDB's "multi-model approach", but still show concrete code examples, and all of this in 15 minutes.
An E-commerce App in action built on top of a Multi-model Database – ArangoDB Database
This talk presents a genuine use case of ArangoDB's native multi-model approach, by means of the example of an e-commerce app. First the main advantages of a "multi-model" database are explained. Then we dive deep into the native multi-model database ArangoDB and its query language - AQL. We give an introduction to the three data-models ArangoDB covers (Documents, Graphs and Key-Values), and explain that AQL is a uniform query language that can cover all three data-models of ArangoDB, so no context switches are necessary.
The major part of the talk will explain the data model and show concrete AQL queries that would occur in an e-commerce platform. Max will demonstrate the multi-model advantages of AQL and how they lead to better performance and to a simpler life for developers.
Video: https://youtu.be/9MUhdPpPpPc
A Graph Database That Scales - ArangoDB 3.7 Release Webinar – ArangoDB Database
Jörg Schad (Head of Engineering and ML) and Chris Woodward (Developer Relations Engineer) introduce the new capabilities for working with graphs in a distributed setting. In addition, they explain and showcase the new fuzzy search within ArangoDB's search engine, as well as JSON schema validation.
Get started with ArangoDB: https://www.arangodb.com/arangodb-tra...
Explore ArangoDB Cloud for free with 1-click demos: https://cloud.arangodb.com/home
ArangoDB is a native multi-model database written in C++, supporting graph, document, and key/value needs with one engine and one query language. Fulltext search and ranking are supported via ArangoSearch, the fully integrated C++-based search engine in ArangoDB.
Performance comparison: Multi-Model vs. MongoDB and Neo4j – ArangoDB Database
Native multi-model databases combine different data models, such as documents or graphs, in one tool and even allow mixing them in a single query. How can this concept compete with a pure document store like MongoDB or a graph database like Neo4j? I asked myself that question, as did a lot of folks in the community.
So here are some benchmark results.
Presentation given at the Hadoop User Group France on 14 January 2016.
Real-time analytics with Riak and Spark, by Michael Carney (Basho) and Olivier Girardot of Lateral Thoughts.
According to a Salesforce report, the number of data sources analysed by companies will grow by 83% over the next five years, and organisations now want to deliver real-time insights even on mobile devices. Real-time processing is therefore the future of big data analytics.
This talk will present the latest developments in real-time analytics around the Riak family of databases and Spark.
Michael Carney is Basho's Sales Director for Southern Europe. A founder of MySQL France and of MariaDB, Michael joined Basho in January 2015 to explore the world of data without tables!
Olivier Girardot is the CTO of Lateral Thoughts; he is a developer and trainer on Spark, and also a Java/Python specialist in the field of market finance.
In-memory data grids (IMDGs) are widely used as distributed, key-value stores for serialized objects, providing fast data access, location transparency, scalability, and high availability. With its support for built-in data structures, such as hashed sets and lists, Redis has demonstrated the value of enhancing standard create/read/update/delete (CRUD) APIs to provide extended functionality and performance gains. This talk describes new techniques which can be used to generalize this concept and enable the straightforward creation of arbitrary, user-defined data structures both within single objects and sharded across the IMDG.
A key challenge for IMDGs is to minimize network traffic when accessing and updating stored data. Standard CRUD APIs place the burden of implementing data structures on the client and require that full objects move between client and server on every operation. In contrast, implementing data structures within the server streamlines communication, since only incremental changes to stored objects or requested subsets of this data need to be transferred. However, building extended data structures within IMDG servers creates several challenges, including how to extend this mechanism, how to efficiently implement data-parallel operations spanning multiple shards, and how to protect the IMDG from errors in user-defined extensions.
This talk will describe two techniques which enable IMDGs to be extended to implement user-defined data structures. One technique, called single method invocation (SMI), allows users to define a class which implements a user-defined data structure stored as an IMDG object and then remotely execute a set of class methods within the IMDG. This enables IMDG clients to pass parameters to the IMDG and receive a result from method execution.
A second technique, called parallel method invocation (PMI), extends this approach to execute a method in parallel on multiple objects sharded across IMDG servers. PMI also provides an efficient mechanism for combining the results of method execution and returning a single result to the invoking client. In contrast to client-based techniques, this combining mechanism is integrated into the IMDG and completes in O(logN) time, where N is the number of IMDG servers.
The talk will describe how user-defined data structures can be implemented within the IMDG to run in a separate process (e.g., a JVM) to ensure that execution errors do not impair the stability of the IMDG. It will examine the associated performance trade-offs and techniques that can be used to minimize overhead.
Lastly, the talk will describe how popular Redis data structures, such as hashed sets, can be implemented as a user-defined data structure using SMI and then extended using both SMI and PMI to build a scalable hashed set that spans multiple shards. It will also examine other examples of user-defined data structures that can be built using these techniques.
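As an illustration of the combining idea behind PMI, here is a minimal sketch in plain JavaScript (hypothetical names, not a real IMDG API): a method runs once per shard, and the per-shard results are merged pairwise in a combining tree, which takes O(log N) merge rounds for N shard results:

```javascript
// Illustrative sketch of parallel method invocation (PMI): run a method
// on every shard, then merge the per-shard results pairwise in a
// combining tree, i.e. O(log N) merge rounds for N shard results.
function invokeParallel(shards, method, combine) {
  // A real IMDG would execute `method` on the servers owning the shards;
  // here we just map over them locally.
  let results = shards.map(method);
  while (results.length > 1) {
    const next = [];
    for (let i = 0; i < results.length; i += 2) {
      // Merge neighbouring pairs; an odd leftover passes through unchanged.
      next.push(i + 1 < results.length
        ? combine(results[i], results[i + 1])
        : results[i]);
    }
    results = next;
  }
  return results[0]; // single combined result for the invoking client
}

// Example: count the members of a hashed set sharded across three servers.
const shards = [new Set(['a', 'b']), new Set(['b', 'c']), new Set(['e'])];
const total = invokeParallel(shards, (s) => s.size, (x, y) => x + y);
```

The same `combine` hook would implement set union, top-k selection, or any other associative merge of per-shard results.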
ELK Stack (Elasticsearch, Logstash, Kibana) as a log-management solution for Microsoft developers, presented at the .NET Usergroup in Munich in June 2015.
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow – Treasure Data, Inc.
In this hands-on webinar we'll explore the data warehousing concept of Slowly Changing Dimensions (SCDs) and common use cases for managing SCDs when dealing with customer data. This webinar will demonstrate different methods for tracking SCDs in a data warehouse, and how Treasure Data Workflow can be used to create robust data pipelines to handle these processes.
Tapad's data pipeline is an elastic combination of technologies (Kafka, Hadoop, Avro, Scalding) that forms a reliable system for analytics, realtime and batch graph-building, and logging. In this talk, I will speak about the creation and evolution of the pipeline, and a concrete example – a day in the life of an event tracking pixel. We'll also talk about common challenges that we've overcome such as integrating different pieces of the system, schema evolution, queuing, and data retention policies.
Complex queries in a distributed multi-model database – Max Neunhöffer
A multi-model database is a document store, a graph database, and a key/value store all at once. To allow for convenient and powerful querying, such a database needs a query language that understands all three data models and allows mixing them in queries. For example, it should be possible to find some documents in a collection according to some criteria, then follow some edges in a graph in which the documents represent vertices, and finally join the results with documents from yet another collection.
In this talk I will explain how a query engine for such a language works, giving an overview of the life of a query from parsing, through translation into an execution plan and the optimisation phase, to the final execution. I will show what distributed query execution plans look like, how the query optimiser reasons about them, and how the distributed execution works.
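The kind of mixed-model query described above can be sketched against plain in-memory JavaScript arrays (hypothetical data and function names; a real multi-model database such as ArangoDB would express all three steps in a single query): filter a document collection, follow edges from the matches, then join the reached vertices with a third collection:

```javascript
// Hypothetical in-memory data standing in for three collections.
const users = [
  { id: 1, name: 'alice', city: 'Cologne' },
  { id: 2, name: 'bob', city: 'Berlin' },
];
const friendEdges = [{ from: 1, to: 2 }]; // documents act as graph vertices
const orders = [
  { userId: 2, item: 'book' },
  { userId: 1, item: 'pen' },
];

function ordersOfFriendsIn(city) {
  // 1. Document step: filter a collection by some criterion.
  const locals = users.filter((u) => u.city === city);
  // 2. Graph step: follow edges starting at the matched vertices.
  const friendIds = new Set();
  for (const u of locals) {
    for (const e of friendEdges) {
      if (e.from === u.id) friendIds.add(e.to);
    }
  }
  // 3. Join step: combine the reached vertices with a third collection.
  return orders.filter((o) => friendIds.has(o.userId));
}
```

A query engine's job is to turn a declarative description of exactly this pipeline into an execution plan, and, in a cluster, to decide on which server each of the three steps runs.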
In this talk the general idea of graph databases is presented and the execution of a graph traversal is shown.
This talk was given at the NoSQL User Group Cologne in November 2013.
In this hotcode 2013 talk, Lucas and Frank gave an overview of NoSQL and explained why it is a good idea to use JavaScript in the database environment as well.
ArangoDB – Polyglot Persistence and a Multi-Model Database – Helder Santana
In this talk, I give a brief introduction to the world of NoSQL databases, pointing out the benefits of polyglot persistence. I present ArangoDB, an open-source universal database with a flexible data model for documents, graphs, and key-value pairs. Build high-performance applications using an SQL-like query language or JavaScript/Ruby extensions.
Jan Steemann: Modelling data in a schema-free world (Talk held at Froscon, 2...) – ArangoDB Database
Even though most NoSQL databases follow the "schema-free" data paradigm, it is still important to choose the right data model to make the best of the underlying database technology. This talk provides an overview of the different data storage models available in popular NoSQL databases. It also introduces some best practices on how to model your data for both best performance and best querying.
Processing large-scale graphs with Google(TM) Pregel by Michael Hackstein at... – Big Data Spain
This talk will give a good overview of the complex architecture of the Pregel framework and some insights into where potential bottlenecks lie when writing a Pregel algorithm.
Recently a new breed of "multi-model" databases has emerged. They are a document store, a graph database and a key/value store combined in one program. Therefore they are able to cover a lot of use cases which otherwise would need multiple different database systems.
This approach promises a boost to the idea of "polyglot persistence", which has become very popular in recent years even though it creates some friction in the form of data conversion and synchronisation between different systems. With a multi-model database, one can enjoy the benefits of polyglot persistence without these disadvantages.
In this talk I will explain the motivation behind the multi-model approach, discuss its advantages and limitations, and will then risk making some predictions about the NoSQL database market in five years' time, which I shall only reveal during the talk.
Jan Steemann gave a talk at JavaScript Everywhere in Paris 2012 on JavaScript in ArangoDB, an open-source NoSQL database. With ArangoDB you can use JavaScript and/or Ruby (mruby) as an embedded language.
What alternative databases offer beyond scaling. NoSQL databases are an obvious choice for building highly scalable applications on hundreds of nodes. True. But that is not all. Beyond performance, non-relational databases offer interesting features that can also be used in projects that never reach the user numbers of Twitter or Facebook. This talk picks up aspects such as schema freedom, multi-model data modelling, and "database as application server", and shows how to use them sensibly in projects and what to watch out for.
FOXX - a JavaScript application framework on top of ArangoDB – ArangoDB Database
Foxx allows you to build APIs in JavaScript directly on top of the database ArangoDB, and therefore skip the middleman (Rails, Django, Symfony, or whatever your favourite web framework is). Foxx is designed with simplicity and the specific use case of modern client-side MVC frameworks in mind.
Extending DevOps to Big Data Applications with Kubernetes – Nicola Ferraro
DevOps, continuous delivery and modern architectural trends can incredibly speed up the software development process. Big Data applications cannot be an exception and need to keep the same pace.
It is official: MySQL listens to HTTP and speaks JSON. MySQL got a new plugin that lets HTTP clients and JavaScript users connect to MySQL using HTTP. The development preview brings three APIs: key-document for nested JSON documents, CRUD for JSON-mapped SQL tables, and plain SQL with JSON replies. What is more: MySQL 5.7.4 has SQL functions for modifying JSON, for searching documents, and new indexing methods!
OpenStack Preso: DevOps on Hybrid Infrastructurerhirschfeld
Discusses the approach for making hybrid DevOps workable including what obstacles must be overcome. Includes demo of multiple OpenStack clouds & Kubernetes deploy on AWS, Google and OpenStack
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Fwdays
Here in DS team in WIX we want to help to create stunning sites by applying recent achievement of AI research to production. Since Data Science engineering practices are still not fully shaped we found out that it is crucial to bring the best practices from software engineering - give Data Scientist ability to deliver models fast without loss in quality and computation efficiency to stay competitive in this overhyped market. To achieve this we are developing our own infrastructure for creating pipelines and deploying them to production with minimum (to none) engineer involvement.
This talk will cover initial motivation, solved technical issues and lessons learned while building such ML delivery system.
Website: https://fwdays.com/en/event/data-science-fwdays-2019/review/continuous-delivery-of-ml-pipelines-to-production
Kiedy aplikacja napisana w Serverless Frameworku jest mała, można zamieść niektóre rzeczy pod dywan. Ale co, kiedy po kilku miesiącach zaczyna wychodzić spod niego prawdziwy potwór? Co, kiedy musisz przetestować jedną lambdę na środowisku, a deploy całego stacka trwa 20 minut? No i jak przeorganizować aplikację wiedząc, że ciągle będzie rosła? Dowiedz się, jak rozbiliśmy naszą hurtownię danych wykorzystując Serverless Compose. Jakie przyniosło nam to efekty i o czym dowiedzieliśmy się w trakcie.
My Gluecon presentation about hybrid infrastructure and container orchestration deployment. I talk about why composability matters and how AWS sets the standard.
How the Automation of a Benchmark Famework Keeps Pace with the Dev Cycle at I...DevOps.com
The team at InfluxData needed to review benchmark data on InfluxDB during their development cycle to ensure any new changes continue to improve performance. However, using the existing benchmarking framework, InfluxDB-comparison, was manual and time consuming and not used consistently. To change this, InfluxData asked the team at Bonitoo to enhance the benchmark framework to easily incorporate new use cases, add new versions of the software quickly and easily and provide benchmark data on a cadence that works for the development (daily, weekly, monthly) cycle.
In this webinar the team from Bonitoo will share how they were able to accomplish this as well as build automation into the existing framework. In addition, they will share the benchmark results generated from the framework that highlights how performant a time series database like InfluxDB is compared to the latest versions of products like MongoDB, Cassandra, Elasticsearch, and OpenTSDB.
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
We created a plugin-based data collection tool that can read any chaotically formatted files called "CSV" by guessing its schema automatically
Talked at csv,conf,v2 in Berlin
http://csvconf.com/
A fresh look at Google’s Cloud by Mandy Waite Codemotion
Google, one of the early PaaS (Platform as a Service) pionneers, has recently substantially improved AppEngine, expanded its Cloud Platform to include CloudStorage, BigQuery and soon Google Compute Engine (still in early access as of this writing).
OS for AI: Elastic Microservices & the Next Gen of MLNordic APIs
AI has been a hot topic lately, with advances being made constantly in what is possible, there has not been as much discussion of the infrastructure and scaling challenges that come with it. How do you support dozens of different languages and frameworks, and make them interoperate invisibly? How do you scale to run abstract code from thousands of different developers, simultaneously and elastically, while maintaining less than 15ms of overhead?
At Algorithmia, we’ve built, deployed, and scaled thousands of algorithms and machine learning models, using every kind of framework (from scikit-learn to tensorflow). We’ve seen many of the challenges faced in this area, and in this talk I’ll share some insights into the problems you’re likely to face, and how to approach solving them.
In brief, we’ll examine the need for, and implementations of, a complete “Operating System for AI” – a common interface for different algorithms to be used and combined, and a general architecture for serverless machine learning which is discoverable, versioned, scalable and sharable.
OpenSource API Server based on Node.js API framework built on supported Node.js platform with Tooling and DevOps. Use cases are Omni-channel API Server, Mobile Backend as a Service (mBaaS) or Next Generation Enterprise Service Bus. Key functionality include built in enterprise connectors, ORM, Offline Sync, Mobile and JS SDKs, Isomorphic JavaScript and Graphical API creation tool.
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
Jan 22nd, 2010 Hadoop meetup presentation on project voldemort and how it plays well with Hadoop at linkedin. The talk focus on Linkedin Hadoop ecosystem. How linkedin manage complex workflows, data ETL , data storage and online serving of 100GB to TB of data.
2. Max Neunhöffer
I am a mathematician
“Earlier life”: Research in Computer Algebra
(Computational Group Theory)
Always juggled with big data
Now: working in database development, NoSQL, ArangoDB
I like:
research,
hacking,
teaching,
tickling the highest performance out of computer systems.
3. ArangoDB GmbH
triAGENS GmbH has offered consulting services since 2004:
software architecture
project management
software development
business analysis
a lot of experience with specialised database systems.
did NoSQL before the term was even coined
2011/2012, an idea emerged:
to build the database one had wished to have all those years!
development of ArangoDB as open source software since 2012
ArangoDB GmbH: spin-off to take care of ArangoDB (2014)
4. is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
including joins between different collections,
offers configurable consistency guarantees using transactions,
is memory efficient by shape detection,
uses JavaScript throughout (Google’s V8 built into server),
API extensible by JS code in the Foxx Microservice Framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation,
and enjoys good community as well as professional support.
6. ArangoDB in numbers
DB engine written in C++
embeds Google’s V8 (∼ 130 000 lines of code)
mostly in memory, using memory mapped files
processes JSON data, schema-less but “shapes”
library: ∼ 128 000 lines (C++)
DB engine: ∼ 210 000 lines (C++, including 12 000 for utilities)
JavaScript layer: ∼ 1 232 000 lines of code
∼ 85 000 standard API implementation
∼ 592 000 Foxx apps (API extensions, web front end)
∼ 298 000 unit tests
∼ 327 000 node.js modules
further unit tests: ∼ 10 000 C++ and ∼ 24 000 Ruby for HTTP
plus documentation
and drivers (in other repositories)
7. The Task
It is March 2014, we have just released V2.0.
V2.1 is scheduled for end of May, V2.2 is scheduled for July
V2.1 is incremental, V2.2 is “Write-Ahead-Log”
work for V2.2 started in March
unfortunately, introducing a WAL is akin to open heart surgery,
→ essentially need to reengineer the database engine
do not have the capacity to assign 10 developers to the job
8. The old setup
(Diagram.) New data, e.g. { name: "watch", price: 99 }, is appended directly to each collection's data files (Collection: products and Collection: sales, both append only).
For transactions: locks and commit markers on all collections are necessary.
9. The new setup
"Collector" (later)
Collection: sales
Collection: products (append only)
(append only)
Data files:
New data:
Write Ahead Log (WAL) (append only)
{ name: "watch", price: 99 }
For transactions:
Less locks and commit markers only in WAL.
10. Advantages of a WAL
have a single history of events
have a single place to note the commit of a transaction
easy asynchronous replication
efficient sync to disk
uncommitted stuff does not hit the data files at all
better support for transactions → much higher performance
better crash recovery
deterministic, well-defined behaviour
11. Challenges
need fundamental change in the storage engine
the collector changes stuff that is potentially being read
need to be careful not to create a bottleneck
need to get locking right
crash scenarios are hard to test
if possible, users must not notice the change
(except better performance)
12. Our testing setup
We run continuous tests after every push to GitHub, plus a nightly run.
We have separate test suites for single server and cluster.
Different types of test for good coverage:
low-level C++ library unit tests (10000 LOC C++)
JS tests (separately on server and JS shell, 290000 LOC)
AQL query engine (230000 LOC of the above)
HTTP interface (TCP and SSL, 24000 LOC Ruby)
dump/restore and bulk import
benchmarks (3000 LOC C++)
user interface (phantomjs, comparing screenshots)
run tests with valgrind
check coverage
13. Test methodologies
Tests can have different aims:
Unit tests:
ensure that individual components work according to
specifications
Integration tests:
ensure that multiple components work together correctly
Benchmark tests:
ensure performance
End to End tests:
ensure that complex systems as a whole do their job
User interface tests:
ensure that the user interface works and behaves as
documented/specified
14. Test characteristics needed for our task
Characteristics of our task
well-defined, deterministic behaviour
if possible, no observable change in functionality
changes relatively far down in the software stack
but reach wide
temporary breakage expected and accepted
⇒ For our task, we needed:
unit tests,
integration tests and
benchmarks.
Fortunately, we had all these in place!
15. Approach — preparation phase
1. design work:
WAL, markers, collections, compaction, where to lock what
2. implement infrastructure for WAL:
mmapped files, append op, marker format
3. add “write to WAL” to write operations
4. test filling of WAL
16. Approach — breaking phase
5. remove old write operations
6. implement collector thread and adjust compactor thread
7. repair by adjusting read operations for WAL/data files