In these slides, Jan Steemann, core member of the ArangoDB project, introduced to the idea of native multi-model databases and how this approach can provide much more flexibility for developers, software architects & data scientists.
We all know good training data is crucial for data scientists to build quality machine learning models. But when productionizing Machine Learning, Metadata is equally important. Consider for example:
- Provenance of model allowing for reproducible builds
- Context to comply with GDPR, CCPA requirements
- Identifying data shift in your production data
This is the reason we built ArangoML Pipeline, a flexible Metadata store which can be used with your existing ML Pipeline.
Today we are happy to announce a release of ArangoML Pipeline Cloud. Now you can start using ArangoML Pipeline without having to even start a separate docker container.
In this webinar, we will show how to leverage ArangoML Pipeline Cloud with your Machine Learning Pipeline by using an example notebook from the TensorFlow tutorial.
Find the video here: https://www.arangodb.com/arangodb-events/arangoml-pipeline-cloud/
Find the recording of this webinar here: https://www.arangodb.com/arangodb-events/3-7-roadmap-performance-at-scale/
After the release of ArangoDB 3.6 we are starting to work on the next version with even more exciting features. As an open-source project we would love to hear your ideas and discuss the roadmap with our community.
Would you like to learn more about Satellite Graphs, Schema Validation, a number of performance and security improvements?
Than join Jörg Schad, Head of Engineering and Machine Learning at ArangoDB, who will share the latest plans for the upcoming ArangoDB 3.7 release as well as the long term roadmap.
Running complex data queries in a distributed systemArangoDB Database
With the always-growing amount of data, it is getting increasingly hard to store and get it back efficiently. While the first versions of distributed databases have put all the burden of sharding on the application code, there are now some smarter solutions that handle most of the data distribution and resilience tasks inside the database.
This poses some interesting questions, e.g.
- how are other than by-primary-key queries actually organized and executed in a distributed system, so that they can run most efficiently?
- how do the contemporary distributed databases actually achieve transactional semantics for non-trivial operations that affect different shards/servers?
This talk will give an overview of these challenges and the available solutions that some open source distributed databases have picked to solve them.
Guacamole Fiesta: What do avocados and databases have in common?ArangoDB Database
First, our CTO, Frank Celler, does a quick overview of the latest feature developments and what is new with ArangoDB.
Then, Senior Graph Specialist, Michael Hackstein talks about multi-model database movement, diving deeper into main advantages and technological benefits. He introduces three data-models of ArangoDB (Documents, Graphs and Key-Values) and the reasons behind the technology. We have a look at the ArangoDB Query language (AQL) with hands-on examples. Compare AQL to SQL, see where the differences are and what makes AQL better comprehensible for developers. Finally, we touch the Foxx Microservice framework which allows to easily extend ArangoDB and include it in your microservices landscape.
Webinar: ArangoDB 3.8 Preview - Analytics at Scale ArangoDB Database
The ArangoDB community and team are proud to preview the next version of ArangoDB, an open-source, highly scalable graph database with multi-model capabilities. Join our CTO, Jörg Schad, Ph.D. and Developer Relation Engineer Chris Woodward in this webinar to learn more about ArangoDB 3.8 and the roadmap for upcoming releases.
These are the slides from the webinar, where Chris & Jan walked through the basic concepts, key features and query options you have within ArangoDB as well as discuss scalability considerations for different data models. Chris is the hands-on guy and will showcase a variety of query options you have with a native multi-model database like ArangoDB
These are the slides to the webinar about Custom Pregel algorithms in ArangoDB https://youtu.be/DWJ-nWUxsO8. It provides a brief introduction to the capabilities and use cases for Pregel.
Different applications need different performance guarantees. Some applications need fastest possible ingest of a dataset (a hare). Other applications need write rate guarantees for every single record (a tortoise). The choice between the two becomes critical as CPU cores, memory size, and disk throughput decrease to save money on cloud virtual machines / AWS instances. This presentation demonstrates how to convert RocksDB's natural "hare mode" into "tortoise mode" for timing sensitive applications and/or lower cost, lower capability hardware. RocksDB's stalls and stops are explained.
We all know good training data is crucial for data scientists to build quality machine learning models. But when productionizing Machine Learning, Metadata is equally important. Consider for example:
- Provenance of model allowing for reproducible builds
- Context to comply with GDPR, CCPA requirements
- Identifying data shift in your production data
This is the reason we built ArangoML Pipeline, a flexible Metadata store which can be used with your existing ML Pipeline.
Today we are happy to announce a release of ArangoML Pipeline Cloud. Now you can start using ArangoML Pipeline without having to even start a separate docker container.
In this webinar, we will show how to leverage ArangoML Pipeline Cloud with your Machine Learning Pipeline by using an example notebook from the TensorFlow tutorial.
Find the video here: https://www.arangodb.com/arangodb-events/arangoml-pipeline-cloud/
Find the recording of this webinar here: https://www.arangodb.com/arangodb-events/3-7-roadmap-performance-at-scale/
After the release of ArangoDB 3.6 we are starting to work on the next version with even more exciting features. As an open-source project we would love to hear your ideas and discuss the roadmap with our community.
Would you like to learn more about Satellite Graphs, Schema Validation, a number of performance and security improvements?
Than join Jörg Schad, Head of Engineering and Machine Learning at ArangoDB, who will share the latest plans for the upcoming ArangoDB 3.7 release as well as the long term roadmap.
Running complex data queries in a distributed systemArangoDB Database
With the always-growing amount of data, it is getting increasingly hard to store and get it back efficiently. While the first versions of distributed databases have put all the burden of sharding on the application code, there are now some smarter solutions that handle most of the data distribution and resilience tasks inside the database.
This poses some interesting questions, e.g.
- how are other than by-primary-key queries actually organized and executed in a distributed system, so that they can run most efficiently?
- how do the contemporary distributed databases actually achieve transactional semantics for non-trivial operations that affect different shards/servers?
This talk will give an overview of these challenges and the available solutions that some open source distributed databases have picked to solve them.
Guacamole Fiesta: What do avocados and databases have in common?ArangoDB Database
First, our CTO, Frank Celler, does a quick overview of the latest feature developments and what is new with ArangoDB.
Then, Senior Graph Specialist, Michael Hackstein talks about multi-model database movement, diving deeper into main advantages and technological benefits. He introduces three data-models of ArangoDB (Documents, Graphs and Key-Values) and the reasons behind the technology. We have a look at the ArangoDB Query language (AQL) with hands-on examples. Compare AQL to SQL, see where the differences are and what makes AQL better comprehensible for developers. Finally, we touch the Foxx Microservice framework which allows to easily extend ArangoDB and include it in your microservices landscape.
Webinar: ArangoDB 3.8 Preview - Analytics at Scale ArangoDB Database
The ArangoDB community and team are proud to preview the next version of ArangoDB, an open-source, highly scalable graph database with multi-model capabilities. Join our CTO, Jörg Schad, Ph.D. and Developer Relation Engineer Chris Woodward in this webinar to learn more about ArangoDB 3.8 and the roadmap for upcoming releases.
These are the slides from the webinar, where Chris & Jan walked through the basic concepts, key features and query options you have within ArangoDB as well as discuss scalability considerations for different data models. Chris is the hands-on guy and will showcase a variety of query options you have with a native multi-model database like ArangoDB
These are the slides to the webinar about Custom Pregel algorithms in ArangoDB https://youtu.be/DWJ-nWUxsO8. It provides a brief introduction to the capabilities and use cases for Pregel.
Different applications need different performance guarantees. Some applications need fastest possible ingest of a dataset (a hare). Other applications need write rate guarantees for every single record (a tortoise). The choice between the two becomes critical as CPU cores, memory size, and disk throughput decrease to save money on cloud virtual machines / AWS instances. This presentation demonstrates how to convert RocksDB's natural "hare mode" into "tortoise mode" for timing sensitive applications and/or lower cost, lower capability hardware. RocksDB's stalls and stops are explained.
A Graph Database That Scales - ArangoDB 3.7 Release WebinarArangoDB Database
örg Schad (Head of Engineering and ML) and Chris Woodward (Developer Relations Engineer) introduce the new capabilities to work with graph in a distributed setting. In addition explain and showcase the new fuzzy search within ArangoDB's search engine as well as JSON schema validation.
Get started with ArangoDB: https://www.arangodb.com/arangodb-tra...
Explore ArangoDB Cloud for free with 1-click demos: https://cloud.arangodb.com/home
ArangoDB is a native multi-model database written in C++ supporting graph, document and key/value needs with one engine and one query language. Fulltext search and ranking is supported via ArangoSearch the fully integrated C++ based search engine in ArangoDB.
In these slides, Jan Steemann, core member of the ArangoDB project, introduced to the idea of native multi-model databases and how this approach can provide much more flexibility for developers, software architects & data scientists.
The is the RFC for AvocadoDB's query language. AvocadoDB is an open source nosql database (see www.avocadodb.org) offering a mixture of data models like key value pairs, documents and graphs.
The REST API for AvocadoDB is already available and stable and people are writing APIs using it. Awesome. As AvocacoDB offers more complex data structures like graphs and lists REST is not enough. We implemented a first version of a query language some time ago which is very similar to SQL and UNQL.
Then we realized that this approach was not completely satisfying as some queries cannot expressed very well with it, especially multi-valued attributes/lists. UNQL addresses this partly, but does not go far enough. Another issue are graphs. AvocadoDB supports querying graphs, neither SQL nor UNQL offer any "natural" graph traversal facilities.
As we did not find any existing query language that addresses the problems we found we had to define a new query language which is presented in the presentation.
Have some feedback on this? Come to www.avocadodb.org and tell us what you think about it. :-)
Introduction to ArangoDB (nosql matters Barcelona 2012)ArangoDB Database
ArangoDB is a universal open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript/Ruby extensions.
The video is also available online:
http://2012.nosql-matters.org/bcn/speakers/
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
A talk from given by Julian Hyde and Tomer Shiran at Hadoop Summit, Dublin.
Data scientists and analysts want the best API, DSL or query language possible, not to be limited by what the processing engine can support. Polyalgebra is an extension to relational algebra that separates the user language from the engine, so you can choose the best language and engine for the job. It also allows the system to optimize queries and cache results. We demonstrate how Ibis uses Polyalgebra to execute the same Python-based machine learning queries on Impala, Drill and Spark. And we show how to build Polyalgebra expressions in Calcite and how to define optimization rules and storage handlers.
GraphTech Ecosystem - part 2: Graph AnalyticsLinkurious
The graph ecosystem presentation lists and introduces a vast majority of graph analytics actors: graph analytics frameworks; graph processing engines; graph analytics libraries and toolkits; graph query languages and projects.
In this talk, we’ll discuss technical designs of support of HBase as a “native” data source to Spark SQL to achieve both query and load performance and scalability: near-precise execution locality of query and loading, fine-tuned partition pruning, predicate pushdown, plan execution through coprocessor, and optimized and fully parallelized bulk loader. Point and range queries on dimensional attributes will benefit particularly well from the techniques. Preliminary test results vs. established SQL-on-HBase technologies will be provided. The speaker will also share the future plan and real-world use cases, particularly in the telecom industry.
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that's a lot like SQL. With PGQL, it's easy to get going writing graph analysis queries to the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
A concentrated look at Apache Spark's library Spark SQL including background information and numerous Scala code examples of using Spark SQL with CSV, JSON and databases such as mySQL.
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
Graph data and graph analytics are increasingly important in data science and engineering. Cypher is an open language used for querying and updating graph databases and analytics platforms, which is now available in the Apache Spark environment. Neo4j Morpheus leverages the open source graph language project to integrate data from Neo4j operational graph databases with Hive and JDBC SQL data sources, using new Cypher features like the Property Graph Catalog, named graphs, graph projection, parameterized graph view functions, and graph/table views. Input and output graphs can be loaded and stored as structured collections of DataFrames with strong graph schemas to ensure data consistency and graph query optimization. Property graphs can also be analyzed and transformed using graph algorithms such as those in the GraphFrames project. Besides describing and demonstrating these capabilities, this talk also discusses the Spark Project Improvement Proposal to bring Cypher into Spark 3.0, and outlines current work to unify Cypher with other graph query languages to form a new ISO standard Graph Query Language.
Speakers: Alastair Green, Martin Junghanns
ArangoDB is a native multi-model database system developed by triAGENS GmbH. The database system supports three important data models (key/value, documents, graphs) with one database core and a unified query language AQL (ArangoDB Query Language). ArangoDB is a NoSQL database system but AQL is similar in many ways to SQL
Domain Driven Design is a software development process that focuses on finding a common language for the involved parties. This language and the resulting models are taken from the domain rather than the technical details of the implementation. The goal is to improve the communication between customers, developers and all other involved groups. Even if Eric Evan's book about this topic was written almost ten years ago, this topic remains important because a lot of projects fail for communication reasons.
Relational databases have their own language and influence the design of software into a direction further away from the Domain: Entities have to be created for the sole purpose of adhering to best practices of relational database. Two kinds of NoSQL databases are changing that: Document stores and graph databases. In a document store you can model a "contains" relation in a more natural way and thereby express if this entity can exist outside of its surrounding entity. A graph database allows you to model relationships between entities in a straight forward way that can be expressed in the language of the domain.
In this talk I want to look at the way a multi model database that combines a document store and a graph database can help you to model your problems in a way that is understandable for all parties involved, and explain the benefits of this approach for the software development process.
A Graph Database That Scales - ArangoDB 3.7 Release WebinarArangoDB Database
örg Schad (Head of Engineering and ML) and Chris Woodward (Developer Relations Engineer) introduce the new capabilities to work with graph in a distributed setting. In addition explain and showcase the new fuzzy search within ArangoDB's search engine as well as JSON schema validation.
Get started with ArangoDB: https://www.arangodb.com/arangodb-tra...
Explore ArangoDB Cloud for free with 1-click demos: https://cloud.arangodb.com/home
ArangoDB is a native multi-model database written in C++ supporting graph, document and key/value needs with one engine and one query language. Fulltext search and ranking is supported via ArangoSearch the fully integrated C++ based search engine in ArangoDB.
In these slides, Jan Steemann, core member of the ArangoDB project, introduced to the idea of native multi-model databases and how this approach can provide much more flexibility for developers, software architects & data scientists.
The is the RFC for AvocadoDB's query language. AvocadoDB is an open source nosql database (see www.avocadodb.org) offering a mixture of data models like key value pairs, documents and graphs.
The REST API for AvocadoDB is already available and stable and people are writing APIs using it. Awesome. As AvocacoDB offers more complex data structures like graphs and lists REST is not enough. We implemented a first version of a query language some time ago which is very similar to SQL and UNQL.
Then we realized that this approach was not completely satisfying as some queries cannot expressed very well with it, especially multi-valued attributes/lists. UNQL addresses this partly, but does not go far enough. Another issue are graphs. AvocadoDB supports querying graphs, neither SQL nor UNQL offer any "natural" graph traversal facilities.
As we did not find any existing query language that addresses the problems we found we had to define a new query language which is presented in the presentation.
Have some feedback on this? Come to www.avocadodb.org and tell us what you think about it. :-)
Introduction to ArangoDB (nosql matters Barcelona 2012)ArangoDB Database
ArangoDB is a universal open-source database with a flexible data model for documents, graphs, and key-values. Build high performance applications using a convenient sql-like query language or JavaScript/Ruby extensions.
The video is also available online:
http://2012.nosql-matters.org/bcn/speakers/
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Julian Hyde
A talk from given by Julian Hyde and Tomer Shiran at Hadoop Summit, Dublin.
Data scientists and analysts want the best API, DSL or query language possible, not to be limited by what the processing engine can support. Polyalgebra is an extension to relational algebra that separates the user language from the engine, so you can choose the best language and engine for the job. It also allows the system to optimize queries and cache results. We demonstrate how Ibis uses Polyalgebra to execute the same Python-based machine learning queries on Impala, Drill and Spark. And we show how to build Polyalgebra expressions in Calcite and how to define optimization rules and storage handlers.
GraphTech Ecosystem - part 2: Graph AnalyticsLinkurious
The graph ecosystem presentation lists and introduces a vast majority of graph analytics actors: graph analytics frameworks; graph processing engines; graph analytics libraries and toolkits; graph query languages and projects.
In this talk, we’ll discuss technical designs of support of HBase as a “native” data source to Spark SQL to achieve both query and load performance and scalability: near-precise execution locality of query and loading, fine-tuned partition pruning, predicate pushdown, plan execution through coprocessor, and optimized and fully parallelized bulk loader. Point and range queries on dimensional attributes will benefit particularly well from the techniques. Preliminary test results vs. established SQL-on-HBase technologies will be provided. The speaker will also share the future plan and real-world use cases, particularly in the telecom industry.
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that's a lot like SQL. With PGQL, it's easy to get going writing graph analysis queries to the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
A concentrated look at Apache Spark's library Spark SQL including background information and numerous Scala code examples of using Spark SQL with CSV, JSON and databases such as mySQL.
Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apac...Databricks
Graph data and graph analytics are increasingly important in data science and engineering. Cypher is an open language used for querying and updating graph databases and analytics platforms, which is now available in the Apache Spark environment. Neo4j Morpheus leverages the open source graph language project to integrate data from Neo4j operational graph databases with Hive and JDBC SQL data sources, using new Cypher features like the Property Graph Catalog, named graphs, graph projection, parameterized graph view functions, and graph/table views. Input and output graphs can be loaded and stored as structured collections of DataFrames with strong graph schemas to ensure data consistency and graph query optimization. Property graphs can also be analyzed and transformed using graph algorithms such as those in the GraphFrames project. Besides describing and demonstrating these capabilities, this talk also discusses the Spark Project Improvement Proposal to bring Cypher into Spark 3.0, and outlines current work to unify Cypher with other graph query languages to form a new ISO standard Graph Query Language.
Speakers: Alastair Green, Martin Junghanns
ArangoDB is a native multi-model database system developed by triAGENS GmbH. The database system supports three important data models (key/value, documents, graphs) with one database core and a unified query language AQL (ArangoDB Query Language). ArangoDB is a NoSQL database system but AQL is similar in many ways to SQL
Domain Driven Design is a software development process that focuses on finding a common language for the involved parties. This language and the resulting models are taken from the domain rather than the technical details of the implementation. The goal is to improve the communication between customers, developers and all other involved groups. Even if Eric Evan's book about this topic was written almost ten years ago, this topic remains important because a lot of projects fail for communication reasons.
Relational databases have their own language and influence the design of software into a direction further away from the Domain: Entities have to be created for the sole purpose of adhering to best practices of relational database. Two kinds of NoSQL databases are changing that: Document stores and graph databases. In a document store you can model a "contains" relation in a more natural way and thereby express if this entity can exist outside of its surrounding entity. A graph database allows you to model relationships between entities in a straight forward way that can be expressed in the language of the domain.
In this talk I want to look at the way a multi model database that combines a document store and a graph database can help you to model your problems in a way that is understandable for all parties involved, and explain the benefits of this approach for the software development process.
Recently a new breed of "multi-model" databases has emerged. They are a document store, a graph database and a key/value store combined in one program. Therefore they are able to cover a lot of use cases which otherwise would need multiple different database systems.
This approach promises a boost to the idea of "polyglot persistence", which has become very popular in recent years although it creates some friction in the form of data conversion and synchronisation between different systems. This is, because with a multi-model database one can enjoy the benefits of polyglot persistence without the disadvantages.
In this talk I will explain the motivation behind the multi-model approach, discuss its advantages and limitations, and will then risk to make some predictions about the NoSQL database market in five years time, which I shall only reveal during the talk.
3.Implementation with NOSQL databases Document Databases (Mongodb).pptxRushikeshChikane2
this Chapter gives information about Document Based Database and Graph based Database. It gives their basic structures, Features,applications ,Limitations and use cases
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing Postgr...Trivadis
TechEvent 2019: Status of the partnership Trivadis and EDB - Comparing PostgreSQL to Oracle, the best kept secrets; Konrad Häfeli, Jan Karremans - Trivadis
An E-commerce App in action built on top of a Multi-model DatabaseArangoDB Database
This talk presents a genuine use case of ArangoDB's native multi-model approach, by means of the example of an e-commerce app. First the main advantages of a "multi-model" database are explained. Then we dive deep into the native multi-model database ArangoDB and its query language - AQL. We give an introduction to the three data-models ArangoDB covers (Documents, Graphs and Key-Values), and explain that AQL is a uniform query language that can cover all three data-models of ArangoDB, so no context switches are necessary.
The major part of the talk will explain the data model and show concrete AQL queries that would occur in an e-commerce platform. Max will demonstrate the multi-model advantages of AQL and how they lead to better performance and to a simpler life for developers.
Video: https://youtu.be/9MUhdPpPpPc
In this talk I will explain the motivation behind the multi model database approach, discuss its advantages and limitations, and will keep the presentation concrete and practice oriented by showing concrete usage examples from node.js .
MongoDB's flexible schema makes it a great fit for your next content management application as its data model makes it easy to catalog multiple content types with diverse meta data. In this session, we'll review schema design for content management, using GridFS for storing binary files, and how you can leverage MongoDB's auto-sharding to partition your content across multiple servers.
guacamole: an Object Document Mapper for ArangoDBMax Neunhöffer
In this talk I will give a brief introduction and overview for guacamole, showing how easy it is to get started with using ArangoDB as the persistence layer for a Rails app. I will also explain the philosophy behind ArangoDB's "multi-model approach", but still show concrete code examples, and all of this in 15 minutes.
Similar to An introduction to multi-model databases (20)
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ArangoDB Database
Note: You have to download the slides and use either powerpoint or google slides to make the links clickable.
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3).pptx
Note: You have to download the slides and use either powerpoint or google slides to make the links clickable.
Machine Learning + Graph Databases for Better Recommendations
Presented by Chris Woodward
Note: You have to download the slides and use either powerpoint or google slides to make the links clickable.
Machine Learning + Graph Databases for Better Recommendations
Presented by Chris Woodward
The ArangoML Group had a detailed discussion on the topic "GraphSage Vs PinSage" where they shared their thoughts on the difference between the working principles of two popular Graph ML algorithms. The following slidedeck is an accumulation of their thoughts about the comparison between the two algorithms.
These are the slides from the Getting Started with ArangoDB Oasis webinar: https://www.arangodb.com/events/getting-started-with-arangodb-oasis/
Get your own Oasis with a free 14-day trial (no credit card required) at https://cloud.arangodb.com/home.
Hacktoberfest 2020 'Intro to Knowledge Graph' with Chris Woodward of ArangoDB and reKnowledge. Accompanying video is available here: https://youtu.be/ZZt6xBmltz4
gVisor, Kata Containers, Firecracker, Docker: Who is Who in the Container Space?ArangoDB Database
View the video of this webinar here: https://www.arangodb.com/arangodb-events/gvisor-kata-containers-firecracker-docker/
Containers* have revolutionized the IT landscape and for a long time. Docker seemed to be the default whenever people were talking about containerization technologies**. But traditional container technologies might not be suitable if strong isolation guarantees are required. So recently new technologies such as gVisor, Kata Container, or firecracker have been introduced to close the gap between the strong isolation of virtual machines and the small resource footprint of containers.
In this talk, we will provide an overview of the different containerization technologies, discuss their tradeoffs, and provide guidance for different use cases.
* We will define the term container in more detailed during the talk
** and yes we will also cover some of the pre-docker container space!
The long-awaited Managed Service for ArangoDB is finally here! Users have a fully managed document, graph, and key/value store, plus a search engine, in one place. As we thought of such a powerful service — something that gives you room to breathe, relax, and having someone else taking care of everything —, we called it Oasis.
In this live webinar, Ewout Prangsma, Architect & Teamlead of ArangoDB Oasis, walks you through all the main capabilities of the new service, including high availability, elastic scalability, enterprise-grade security, and also demo the different deployment modes you have at your fingertips.
Before the Q&A part, Ewout also shares what you will be capable of in the future.
The new ArangoDB 3.5 release is here and includes a number of minor and major new features. For example, the ability to perform distributed JOIN operations with SmartJoins, new text search features in ArangoSearch, new consistent backup mechanism, and extended graph database features including k-shortest path queries and the new PRUNE keyword for more efficient queries. Jörg Schad, our Head of Engineering and Machine Learning, will discuss these new features and provide a hands-on demo on how to leverage them for your use case.
Associated webinar recording: https://youtu.be/sTWVmw4GT9A
The new ArangoDB 3.5 release is here and includes a number of minor and major new features. For example, the ability to perform distributed JOIN operations with SmartJoins, new text search features in ArangoSearch, new consistent backup mechanism, and extended graph database features including k-shortest path queries and the new PRUNE keyword for more efficient queries. Jörg Schad, our Head of Engineering and Machine Learning, will discuss these new features and provide a hands-on demo on how to leverage them for your use case.
The Computer Science Behind a modern Distributed DatabaseArangoDB Database
What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are several different necessary components which are anything but trivial to combine, and, of course, even more challenging when attempting to optimize for performance. Over the past years there has been significant progress in both the science and practical implementations of such data stores. In this talk Dan Larkin-York will introduce the audience to some of the challenges, address the difficulties of their interplay, and cover key approaches taken by some of the industry’s leaders (ArangoDB, Cassandra, CockroachDB, MarkLogic, and more).
Hadoop clusters can store nearly everything in a cheap and blazingly fast way to your data lake. Answering questions and gaining insights out of this ever growing stream becomes the decisive part for many businesses. Increasingly data has a natural structure as a graph, with vertices linked by edges, and many questions arising about the data involve graph traversals or other complex queries, for which one does not have an a priori given bound on the length of paths.
Fault-tolerant and thus distributed applications are hard, but Mesos is the ultimate breeding ground for them. Rule number one for designing such beasts is to use well-known, tried and tested, and in particular proven methods and tools as building blocks.
Video recording of the talk: https://www.youtube.com/watch?v=6O_Wuc9FUXg&index=1&list=PLbzoR-pLrL6pLSHrXSg7IYgzSlkOh132K
By Michael Hackstein (@mchacki)
The complexity and amount of data rises. Modern graph databases are designed to handle the complexity but still not for the amount of data. When hitting a certain size of a graph many dedicated graph databases reach their limits in vertical or most common horizontal scalability. In this talk I´ll provide a brief overview about current approaches and their limits towards scalability. Dealing with complex data in a complex system doesn't make things easier... but more fun finding a solution. Join me on my journey to handle billions of edges in a graph database.
Polyglot Persistence & Multi-Model Databases by Michael Hackstein @mchacki
In many modern applications the database side is realized using polyglot persistence – store each data format (graphs, documents, etc.) in an appropriate separate database. This approach yields several benefits, databases are optimized for their specific duty, however there are also drawbacks:
- keep all databases in sync
- queries might require data from several databases
- experts needed for all used systems
A multi-model database is not restricted to one data format, but can cope with several of them. In this talk I will present how a multi-model database can be used in a polyglot persistence setup and how it will reduce the effort drastically.
http://www.jugh.de/termine/polyglot-persistence-multi-model-nosql-databases-69.html
How to design software and APIs for parallelism on modern hardware
- Richard Parker, mathematician and freelance computer programmer in Cambridge, England
I see only three techniques capable of speeding up well written but old software by a factor of ten or more, namely
- Using multiple cores
- Making good use of cache memory
- Using the full power of a single processor.
So far, commercial software seems to have ignored recent changes in hardware. Often, performance does not really matter, but when it does matter, it requires changes to the design and coding of the computer software.
I have been looking into the changes needed for several years now, and it is gradually emerging that one aspect sticks out as being more important than all the others.
If functions, subroutines, methods etc. do a large number of whatever they do, rather than just one, it becomes possible to write clever implementations that run much faster. Several examples of this will be discussed.
I suggest that software designed and written today should take this into account, making it easier to speed it up if and when it becomes necessary to do so.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.