Graph Databases try to make it easy for developers to leverage huge amounts of connected information for everything from routing to recommendations. Doing that poses a number of challenges on the implementation side. In this talk we want to look at the different storage, query and consistency approaches that are used behind the scenes. We’ll check out current and future solutions used in Neo4j and other graph databases for addressing global consistency, query and storage optimization, indexing and more, and see which papers and research database developers take inspiration from.
This developer-focused webinar will explain how to use the Cypher graph query language. Cypher, a query language designed specifically for graphs, allows for expressing complex graph patterns using simple ASCII art-like notation and offers a simple but expressive approach for working with graph data.
During this webinar you'll learn:
-Basic Cypher syntax
-How to construct graph patterns using Cypher
-Querying existing data
-Data import with Cypher
-Using aggregations such as statistical functions
-Extending the power of Cypher using procedures and functions
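To make the ASCII-art idea concrete, here is a minimal Python sketch of what a two-hop graph pattern means. The toy data, the names and the `friends_of_friends` helper are illustrative assumptions and not part of the webinar; the Cypher in the docstring is the kind of query the sketch mimics.

```python
# Hypothetical toy graph: (start, relationship type, end) triples.
EDGES = [
    ("Ann", "KNOWS", "Bob"),
    ("Bob", "KNOWS", "Cid"),
    ("Bob", "KNOWS", "Dee"),
    ("Ann", "LIKES", "Cid"),
]


def friends_of_friends(edges, name):
    """Mimic the Cypher pattern

        MATCH (a {name: $name})-[:KNOWS]->(b)-[:KNOWS]->(c)
        RETURN DISTINCT c.name ORDER BY c.name

    by following two KNOWS relationships in a row.
    """
    knows = [(start, end) for start, rel, end in edges if rel == "KNOWS"]
    first_hop = {end for start, end in knows if start == name}
    return sorted({end for start, end in knows if start in first_hop})


print(friends_of_friends(EDGES, "Ann"))
```

The arrows and brackets in the pattern correspond directly to the two filtering steps: each `-[:KNOWS]->` is one hop over relationships of that type.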
An overview of two types of graph databases: property databases and knowledge/RDF databases, together with their dominant respective query languages, Cypher and SPARQL. Also a quick look at some property DB frameworks, including TinkerPop and its query language, Gremlin.
These webinar slides are an introduction to Neo4j and Graph Databases. They discuss the primary use cases for Graph Databases and the properties of Neo4j which make those use cases possible. They also cover the high-level steps of modeling, importing, and querying your data using Cypher and touch on RDBMS to Graph.
The openCypher Project - An Open Graph Query Language | Neo4j
We want to present the openCypher project, whose purpose is to make Cypher available to everyone – every data store, every tooling provider, every application developer. openCypher is a continual work in progress. Over the next few months, we will move more and more of the language artifacts over to GitHub to make them available for everyone.
openCypher is an open source project that delivers four key artifacts released under a permissive license: (i) the Cypher reference documentation, (ii) a Technology Compatibility Kit (TCK), (iii) a reference implementation (a fully functional implementation of key parts of the stack needed to support Cypher inside a data platform or tool) and (iv) the Cypher language specification.
We are also seeking to make the process of specifying and evolving the Cypher query language as open as possible, and are actively seeking comments and suggestions on how to improve the Cypher query language.
The purpose of this talk is to provide more details regarding the above-mentioned aspects.
An introduction to Neo4j and Graph Databases. Learn about the primary use cases for Graph Databases and explore the properties of Neo4j that make those use cases possible.
Neo4j is a powerful and expressive tool for storing, querying and manipulating data. However, modeling data as graphs is quite different from modeling data for a relational database. In this talk, Michael Hunger will cover modeling business domains using graphs and show how they can be persisted and queried in Neo4j. We'll contrast this approach with the relational model, and discuss the impact on complexity, flexibility and performance.
Optimizing Your Supply Chain with the Neo4j Graph | Neo4j
With the world’s supply chain system in crisis, it’s clear that better solutions are needed. Digital twins built on knowledge graph technology allow you to achieve an end-to-end view of the process, supporting real-time monitoring of critical assets.
Complex hierarchical relationships between entities can only be mapped with difficulty in a relational database and demanding queries are usually quite slow.
Graph databases are optimized for exactly these kinds of relationships and can provide high-performance results even with huge amounts of data. Moreover, not only the entities stored in the database have attributes, but so do their relationships. Queries can look at entities as well as their relationships.
Get to know the basics of graph databases, using Neo4j as an example, and see how it is used in C# projects.
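The point that relationships carry attributes of their own can be illustrated with a small in-memory model. This is a sketch in Python rather than C# or Neo4j itself, and every name in it (`Node`, `Relationship`, `REPORTS_TO`, the `since` property, the people) is a hypothetical example chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    rel_type: str
    end: Node
    # Relationships have their own attributes, just like nodes.
    properties: dict = field(default_factory=dict)


# Hypothetical hierarchy: two people reporting to the same manager.
grace = Node("Person", {"name": "Grace"})
ada = Node("Person", {"name": "Ada"})
alan = Node("Person", {"name": "Alan"})
relationships = [
    Relationship(ada, "REPORTS_TO", grace, {"since": 2019}),
    Relationship(alan, "REPORTS_TO", grace, {"since": 2022}),
]


def reports_since(rels, manager_name, year):
    """Names of people whose REPORTS_TO relationship started in `year` or earlier."""
    return [
        r.start.properties["name"]
        for r in rels
        if r.rel_type == "REPORTS_TO"
        and r.end.properties["name"] == manager_name
        and r.properties["since"] <= year
    ]


print(reports_since(relationships, "Grace", 2020))
```

The query filters on a node property (`name`) and a relationship property (`since`) in the same pass, which is what the abstract means by queries looking at entities as well as their relationships.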
Relational databases were conceived to digitize paper forms and automate well-structured business processes, and still have their uses. But an RDBMS cannot model or store data and its relationships without complexity, which means performance degrades with the increasing number and levels of data relationships and data size. Additionally, new types of data and data relationships require schema redesign that increases time to market.
A native graph database like Neo4j naturally stores, manages, analyzes, and uses data within the context of connections, meaning Neo4j provides faster query performance and vastly more flexibility in handling complex hierarchies than SQL.
GraphQL is a query language for APIs and a runtime for fulfilling those queries. It gives clients the power to ask for exactly what they need, which makes it a great fit for modern web and mobile apps. In this talk, we explain why GraphQL was created, introduce you to the syntax and behavior, and then show how to use it to build powerful APIs for your data. We will also introduce you to AWS AppSync, a GraphQL-powered serverless backend for apps, which you can use to host GraphQL APIs and also add real-time and offline capabilities to your web and mobile apps. You can follow along if you have an AWS account – no GraphQL experience required!
Level: Beginner
Speaker: Rohan Deshpande - Sr. Software Dev Engineer, AWS Mobile Applications
Combine Spring Data Neo4j and Spring Boot to quickl | Neo4j
Speakers: Michael Hunger (Neo Technology) and Josh Long (Pivotal)
Spring Data Neo4j 3.0 is here and it supports Neo4j 2.0. Neo4j is a tiny graph database with a big punch. Graph databases are eminently suited to asking interesting questions and doing analysis. Want to load the Facebook friend graph? Build a recommendation engine? Neo4j's just the ticket. Join Spring Data Neo4j lead Michael Hunger (@mesirii) and Spring Developer Advocate Josh Long (@starbuxman) for a look at how to build smart, graph-driven applications with Spring Data Neo4j and Spring Boot.
Find out how NoSQL can help your application with practical examples and use-cases from our Cloud Data Services Developer Advocate Glynn Bird. This webinar won't dwell on the science behind the database, but will walk you through real-life use-cases for NoSQL technologies that you can start using today.
Webinar: https://youtu.be/M_Jqw
Graph Databases in the Microsoft Ecosystem | Marco Parenzan
With SQL Server and Cosmos DB we now have graph databases broadly available, after decades of study in database theory and of being a niche approach in open source with Neo4j. And then there are services like Microsoft Graph and Azure Digital Twins that give us vertical implementations of graphs. So let's take a tour of graphs in the Microsoft ecosystem.
How To Model and Construct Graphs with Oracle Database (AskTOM Office Hours p... | Jean Ihm
2nd in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
With property graphs in Oracle Database, you can perform powerful analysis on big data such as social networks, financial transactions, sensor networks, and more.
To use property graphs, first, you’ll need a graph model. For a new user, modeling and generating a suitable graph for an application domain can be a challenge. This month, we’ll describe key steps required to construct a meaningful graph, and offer a few tips on validating the generated graph.
Albert Godfrind (EMEA Solutions Architect), Zhe Wu (Architect), and Jean Ihm (Product Manager) walk you through, and take your questions.
Build an Open Source Data Lake For Data Scientists | Shawn Zhu
This is a talk I presented at the 2019 ICSA (International Chinese Statistical Association) Applied Statistics Symposium, in the session "How Data Science Drives Success in Enterprises".
Ingesting Over Four Million Rows Per Second With QuestDB Timeseries Database ... | javier ramirez
How would you build a database to support sustained ingestion of several hundred thousand rows per second while running near real-time queries on top?
In this session I will go over some of the technical decisions and trade-offs we applied when building QuestDB, an open source time-series database developed mainly in Java, and how we can achieve over four million row writes per second on a single instance without blocking or slowing down the reads. There will be code and demos, of course.
We will also review the history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
2021 04-20 apache arrow and its impact on the database industry.pptx | Andrew Lamb
The talk will motivate why Apache Arrow and related projects (e.g. DataFusion) are a good choice for implementing modern analytic database systems. It reviews the major components in most databases, explains where Apache Arrow fits in, and describes the additional integration benefits of using Arrow.
MySQL Day Paris 2018 - MySQL JSON Document Store | Olivier DASINI
NoSQL + SQL = MySQL
MySQL Document Store allows developers to work with SQL relational tables and schema-less JSON collections. To make that possible MySQL has created the X Dev API, which puts a strong focus on CRUD by providing a fluent API allowing you to work with JSON documents in a natural way. The X Protocol is highly extensible and is optimized for CRUD as well as SQL API operations.
MySQL Document store gives users maximum flexibility developing traditional SQL relational applications and NoSQL schema-free document database applications. This eliminates the need for a separate NoSQL document database. Developers can mix and match relational data and JSON documents in the same database as well as the same application. For example, both data models can be queried in the same application and results can be in table, tabular or JSON formats.
The MySQL Document Store architecture consists of the following components:
Native JSON Document Storage - MySQL provides a native JSON datatype that is efficiently stored in binary, with the ability to create virtual columns that can be indexed. JSON Documents are automatically validated.
X Plugin - The X Plugin enables MySQL to use the X Protocol and uses Connectors and the Shell to act as clients to the server.
X Protocol - The X Protocol is a new client protocol based on top of the Protobuf library, and works for both CRUD and SQL operations.
X DevAPI - The X DevAPI is a new, modern, async developer API for CRUD and SQL operations on top of X Protocol. It introduces Collections as new Schema objects. Documents are stored in Collections and have their dedicated CRUD operation set.
MySQL Shell - The MySQL Shell is an interactive JavaScript, Python, or SQL interface supporting development and administration for the MySQL Server. You can use the MySQL Shell to perform data queries and updates as well as various administration operations.
MySQL Connectors - The following MySQL Connectors support the X Protocol and enable you to use X DevAPI in your chosen language.
MySQL Connector/Node.js
MySQL Connector/PHP
MySQL Connector/Python
MySQL Connector/J
MySQL Connector/NET
MySQL Connector/C++
Big Data Day LA 2015 - How to model anything in Redis by Josiah Carlson of Ze... | Data Con LA
Data modeling can be a challenge for any transition to using Redis. Other databases rely on indexes and rich query languages to resolve limitations in your data modeling options, but this doesn't always work with Redis. I will discuss a data modeling technique that I use to solve my volunteer, personal, and professional data modeling challenges.
Looming Marvelous - Virtual Threads in Java Javaland.pdf | jexp
Nowadays we have 2 options for concurrency in Java:
* simple, synchronous, blocking code with limited scalability that tracks well linearly at runtime, or
* complex, asynchronous libraries with high scalability that are harder to handle.
Project Loom aims to bring together the best aspects of these two approaches and make them available to developers.
In the talk, I'll briefly cover the history and challenges of concurrency in Java before we dive into Loom's approaches and look at some of the implementation behind the scenes. Managing so many threads in a reasonable way needs some structure; for this there are proposals for "Structured Concurrency", which we will also look at. Some examples and comparisons to test Loom will round off the talk.
Project Loom is included in Java 19 and 20 as a preview feature, so we can already test how well it works with our applications and libraries.
Spoiler: Pretty good.
Easing the daily grind with the awesome JDK command line tools | jexp
Included in the JDK installation are a lot of handy tools for Java developers, from java, jshell and jcmd to jfr and jdeprscan. These allow you to analyze a running JVM, generate JREs, run Java source code and much more. In this talk I would like to present a number of these tools with practical examples and thus expand the toolbox of the participants. With the command line tools, many tasks can be automated and executed more efficiently, leaving more time for the exciting things in developer life.
Today, we have 2 options for concurrency in Java:
* simple, synchronous, blocking code with limited scalability that tracks well linearly at runtime, or
* complex, asynchronous libraries with high scalability, which are harder to handle.
Project Loom aims to bring together the best aspects of these two approaches and make them available to developers.
In the talk, I'll briefly discuss the history and challenges of concurrency in Java before we dive into Loom's approaches and look a bit behind the scenes.
Project Loom has been included as a preview feature since Java 19, so it can already be tested to see how well it works with our applications and libraries. Spoiler: Pretty good.
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx | jexp
I was there when Cypher was invented in 2012 and have been using it ever since. The language is extremely powerful and easy to learn. But to truly master it, you need to understand how it works internally and how the database executes your queries. In this session, you'll learn to look behind the scenes at execution plans with PROFILE and EXPLAIN and which specific clauses, expressions, structures, and operations help you minimize Cypher and database operations. After this talk, you should be able to speed up your Cypher statements quite a bit.
The newly released Neo4j Connector for Apache Spark can be used to read and write data between the two systems.
In this demo I show how to use the investigative data from the FinCEN Files to get a full pipeline up and running.
Notebook is in https://github.com/jexp/fincen
How Graphs Help Investigative Journalists to Connect the Dots | jexp
The journalists of the ICIJ used graph technology to understand the relationships between the leaked pieces of information in the Panama and Paradise Papers.
NBC News applied graph algorithms to the messages and follower networks of Russian Twitter trolls to gain further insights.
The Trumpworld organizational data correlated with US bills and government contracts offers starting points for further investigations.
New tools like graph databases allow data journalists to better understand the intricate networks of the criminal, economic and political world, as these three examples show. Each journalist adding new connections helps others to validate their stories. They say "It's like magic".
Join Michael for a look behind the scenes of graph based data ingestion, analysis and investigation.
We will use the open source graph database Neo4j, data visualization and graph algorithms to read between the lines.
Who doesn't know him, the office hero, who sat in the office late into the evening and repaired production? The fact that perhaps another colleague sat on the sofa at home and had an equal share in this success is unfortunately not so appreciated in most company cultures. But why is that? Because we are not used to working at home? Because we think that you are not so productive at home? Because you have family, garden or other activities at home? Michael has been working for distributed companies for a long time, but has also worked in offices for a long time. He will take you on his journey through different working environments and tell you what worked well for him.
The JVM is already a runtime for many languages. With the optimizing Graal compiler added to Java 11 and the language implementations in Truffle for Ruby, Python, JavaScript, and R it becomes possible to run them natively on the JVM, even exchanging data between them.
Michael Hunger explains the concepts behind Truffle and Graal and uses a practical example to show how you can use Python and JavaScript for “stored procedures” in a JVM-based database.
He demonstrates how to optimize the startup time of your application and container images by precompiling it to machine code and examines its limits and the difference it makes. But nothing is perfect: Michael discusses the limitations and compares performance for the full picture.
Presentation at OSCON, PDX 2019.
https://conferences.oreilly.com/oscon/oscon-or/public/schedule/detail/76092
Neo4j Graph Streaming Services with Apache Kafka | jexp
In this presentation we give a high-level overview of the Neo4j-Kafka integration and the Confluent partnership.
Providing change-data-capture and ingestion capabilities as Neo4j Extension and the Kafka Connect Neo4j Sink on Confluent Hub allows you to integrate real-time streaming with graph querying and analytics.
APOC Pearls - Whirlwind Tour Through the Neo4j APOC Procedures Library | jexp
APOC has become the de-facto standard utility library for Neo4j. In this talk, I will demonstrate some of the lesser known but very useful components of APOC that will save you a lot of work. You will also learn how to combine individual functions into powerful constructs to achieve impressive feats.
This will be a fast-paced demo/live-coding talk.
Video: https://neo4j.com/graphconnect-2018/session/neo4j-utility-library-apoc-pearls
Unicorn images by TeeTurtle.com (Unstable Unicorns is a fun game & cool t-shirts)
Code we've written once has to be kept readable, maintainable, understandable and extensible for many years. Good code is not self-serving but the foundation for working together.
Refactoring can help you to keep the quality of the relevant parts of our systems high.
The technique is really easy (almost too easy) - improve the naming, structure, and responsibility in small steps that don't change behavior and run your tests after each step.
18 years ago I got hooked on Refactoring when Martin Fowler's first book came out. I've been using it since then on a daily basis on many different projects. Since then a lot has changed, especially with the help of modern IDEs with their automated refactorings and intentions.
Now he asked me to help review the 2nd edition. Our discussions reminded me that each generation of developers should be taught this crucial skill. That's why I want to give an overview of core refactorings and code-smells but also demonstrate the tips and tricks of today's tools that make this task so much easier.
Plus a sneak preview of the upcoming book.
New Features in Neo4j 3.4 / 3.3 - Graph Algorithms, Spatial, Date-Time & Visu... | jexp
Highlighting the progress in Neo4j 3.3 and 3.4, especially Neo4j Desktop, Graph Algorithms, NLP, Date-Time, Geospatial, and performance. Also featuring the new visualization tool Neo4j Bloom.
GraphQL - The new "Lingua Franca" for API-Development | jexp
Three years ago, with the release of the GraphQL specification, Facebook took a fresh stab at the topic of "API design between remote services and applications." The key aspects of GraphQL provide a common, schema-based, domain-specific language and flexible, dynamic queries at interface boundaries.
In the talk, I'd like to compare GraphQL and REST and showcase benefits for developers and architects using a concrete example in application and API development, data source and system integration.
Our data is getting not just more complex but also more connected. In order not to lose sight of the web of information, but to use it as a source of new insights and opportunities, technologies such as graph databases can help.
For both analytical and transactional use cases, they allow efficient storage, retrieval, and processing of networked data without loss of detail. In this talk, we want to get to know existing tools and techniques for graph data processing.
We recently released the Neo4j graph algorithms library.
You can use these graph algorithms on your connected data to gain new insights more easily within Neo4j. You can use these graph analytics to improve results from your graph data, for example by focusing on particular communities or favoring popular entities.
We developed this library as part of our effort to make it easier to use Neo4j for a wider variety of applications. Many users expressed interest in running graph algorithms directly on Neo4j without having to employ a secondary system.
We also tuned these algorithms to be as efficient as possible with regard to resource utilization, and streamlined them for later management and debugging.
In this session we'll look at some of these graph algorithms and the types of problems that you can use them for in your applications.
Despite the “Graph” in the name, GraphQL is mostly used to query relational databases, object models or APIs. But it is really easy to support GraphQL endpoints from graph databases too. In this talk, I’ll demonstrate how we implemented a GraphQL extension for the Neo4j graph database. It uses the GraphQL schema definition to map arbitrary GraphQL queries into single graph queries and runs them against the data in the graph database. Using directives in the schema, we added some cool features that are transparent to the end user, like computed fields and auto-generated mutations and query types. That allows you to create GraphQL APIs of some complexity without writing a single line of code.
I will show how to use the Neo4j-GraphQL extension, by creating an endpoint for the Game of Thrones dataset, and how we then can use our well-known tools (GraphiQL, apollo-client, graphql-cli, voyager) to interact with it.
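The translation idea described above, one nested GraphQL query becoming a single graph query, can be sketched in a few lines of Python. This is only an illustration under assumptions: the `SCHEMA` table, the `Person` and `Movie` types, the `ACTED_IN` relationship and both function names are hypothetical, the GraphQL selection is supplied as an already-parsed nested dict, and the generated Cypher is not claimed to be what the Neo4j-GraphQL extension actually emits.

```python
# Hypothetical schema: for each type, which fields are relationships
# and how to traverse them (relationship type, target type).
SCHEMA = {
    "Person": {"movies": ("ACTED_IN", "Movie")},
    "Movie": {},
}


def projection(type_name, var, selection):
    """Build a Cypher map projection for one level of the selection."""
    parts = []
    for field_name, sub_selection in selection.items():
        if sub_selection is None:
            # Scalar field: read the property straight off the node.
            parts.append(f".{field_name}")
        else:
            # Nested field: follow the relationship with a pattern comprehension.
            rel_type, target = SCHEMA[type_name][field_name]
            child = f"{var}_{field_name}"
            inner = projection(target, child, sub_selection)
            parts.append(
                f"{field_name}: [({var})-[:{rel_type}]->({child}:{target}) | {inner}]"
            )
    return f"{var} {{ " + ", ".join(parts) + " }"


def to_cypher(type_name, selection):
    """Translate a whole nested selection into one Cypher statement."""
    var = type_name.lower()
    return f"MATCH ({var}:{type_name}) RETURN {projection(type_name, var, selection)} AS {var}"


# Stands in for the GraphQL query: { Person { name movies { title } } }
query = {"name": None, "movies": {"title": None}}
print(to_cypher("Person", query))
```

However deep the selection nests, the result stays a single statement, because each nested field becomes a pattern comprehension inside the parent's map projection instead of a separate round trip.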
Despite the “Graph” in the name, GraphQL is mostly used to query relational databases or object models. But it is really well suited to querying graph databases too. In this talk, I’ll demonstrate how I implemented a GraphQL endpoint for the Neo4j graph database and how you would use it in your app.
The world around us is full of connected information. Neo4j was originally developed to solve two complex "network" problems in a document management system, as it was too hard to manage rich connection information efficiently in traditional and new "NOSQL" databases.
During this meetup, we will talk about the technology, and about the journey that a couple of technologists from Malmö took. You will learn:
* how Neo Technology grew from just the three founders into a global database company with use-cases in every domain imaginable.
* how focusing on customer and community feedback allows us to provide a solution for managing connected data to everyone, not just the large internet companies.
Of course we will also introduce the graph model, its whiteboard friendliness and how you get started with Neo4j and its easy and powerful query language Cypher. We'll also compare the graph and relational data model to see how they differ in shape and capabilities. Finally we discuss the foundations that enable Graph databases to provide higher join performance, faster development processes and more inclusive software for all stakeholders. With use-cases from Gaming, Dating and Finance we'll see how to apply the graph capabilities to these domains to realize new functionality or opportunities that were not possible before.
Finally, if there's a question you've always wanted to ask/discuss, we'll have plenty of time for that at the end of Michael's presentation.
Each of the files or classes of a project's source code represents a tree (AST). Looking at dependencies on other classes besides inheritance creates a graph, though. Field types and method parameters are also implicit dependencies. Storing this information in a graph database like Neo4j allows for interesting queries and insights. Class-Graph provides that and is available as an open-source GitHub project.
In this talk, Michael Hunger is going to shed some light on the new High Availability architecture for the popular Neo4j Graph Database. We are going to look at the different variants of the Paxos protocol, master failover strategies and cluster management state handling. This piece of infrastructure poses non-trivial challenges to distributed consensus-finding, making it an interesting session for anyone into scalable systems.
2. @mesirii
Graph Databases try to make it easy for developers to leverage huge amounts
of connected information for everything from routing to recommendations.
Doing that poses a number of challenges on the implementation side. In this
talk we want to look at the different storage, query and consistency
approaches that are used behind the scenes. We’ll check out current and
future solutions used in Neo4j and other graph databases for addressing
global consistency, query and storage optimization, indexing and more and
see which papers and research database developers take inspiration from.
11. @mesirii
A short history
There was a CMS/DMS in Sweden
Which had two big issues
Language independent Keywords
Complex Access Control for SaaS
RDBMS failed
In Memory Graph was cool
Dot Com Bubble burst
A new star was born
29. @mesirii
Implementation Designs
● Adjacency List
● Adjacency Matrix (compressed)
● Sparse Matrices
● Column Store
● HexaStore
● Hash Index
● Document Store
● Object Storage
30. @mesirii
Adjacency List
● pre-materialize connections
● store "neighbours" with each node
● direct memory pointer
● cheap O(1) lookup, O(n) scan
● random memory access!
● Properties on Relationships
● Grouped by Type & Direction
● Neo4j
(Diagram: nodes, each linked to its own chain of relationship records)
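To make the adjacency-list idea concrete, here is a minimal in-memory sketch, assuming a hypothetical `Node` class and `connect`/`neighbours` helpers that are not part of Neo4j. Each node keeps its own relationships, grouped by type and direction, so finding a group is a dictionary lookup and walking it is a linear scan. Neo4j's on-disk record format is considerably more involved.

```python
from collections import defaultdict


class Node:
    """A node that stores its own relationships, grouped by (type, direction)."""

    def __init__(self, name):
        self.name = name
        # (rel_type, "OUT" or "IN") -> list of (neighbour, relationship properties)
        self.rels = defaultdict(list)


def connect(start, rel_type, end, **props):
    """Materialize the connection on both endpoints at write time."""
    start.rels[(rel_type, "OUT")].append((end, props))
    end.rels[(rel_type, "IN")].append((start, props))


def neighbours(node, rel_type, direction):
    """O(1) lookup of the group, then an O(n) scan over its n entries."""
    return [other for other, _ in node.rels.get((rel_type, direction), [])]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
connect(alice, "KNOWS", bob, since=2010)
connect(alice, "KNOWS", carol, since=2015)
connect(bob, "WORKS_WITH", carol)

print([n.name for n in neighbours(alice, "KNOWS", "OUT")])
print([n.name for n in neighbours(carol, "KNOWS", "IN")])
```

The object references play the role of the "direct memory pointer": following a relationship needs no index lookup, at the price of random memory access.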
31. @mesirii
Adjacency Matrix
● matrix with nodes as
○ row and column
○ cell is relationship
○ can contain weight
○ 0 … no relationship
● matrix operations as
graph operations
● size is a problem (N^2)
● need to compress
● e.g. bitsets (SparkSee)
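As a small illustration of "matrix operations as graph operations", the sketch below squares a tiny adjacency matrix to count two-hop paths, then packs each row into an integer bitset. The graph, the `matmul` helper and the bitset packing are illustrative assumptions and do not represent Sparksee's actual storage format.

```python
def matmul(a, b):
    """Plain O(n^3) product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Nodes 0..3 with edges 0->1, 0->2, 1->3, 2->3 (0 means "no relationship").
adjacency = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

# Entry [i][j] of A*A is the number of distinct two-hop paths from i to j.
two_hop = matmul(adjacency, adjacency)
print(two_hop[0][3])

# Compression for unweighted graphs: one integer per row, bit j set if i->j exists.
row_bits = [sum(1 << j for j, cell in enumerate(row) if cell) for row in adjacency]
print(row_bits)
```

The dense form needs N^2 cells no matter how few relationships exist, which is why compression matters as soon as the graph grows.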
32. @mesirii
Sparse Matrix
● linear algebra
● GraphBLAS
○ research & development from Texas A&M University
● efficient sparse matrices on CPU & GPU
● matrix operations (and filters) as graph
operations
● RedisGraph
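A rough sketch of the sparse-matrix approach, assuming a Compressed Sparse Row (CSR) layout and hypothetical helpers `to_csr` and `expand`: one traversal step becomes a boolean vector-matrix product. This is plain Python for illustration only and does not use the GraphBLAS API.

```python
def to_csr(n, edges):
    """Build CSR arrays (row_ptr, col_idx) for a directed graph with n nodes."""
    counts = [0] * n
    for src, _ in edges:
        counts[src] += 1
    row_ptr = [0] * (n + 1)
    for i in range(n):
        row_ptr[i + 1] = row_ptr[i] + counts[i]
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr stays untouched
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def expand(row_ptr, col_idx, frontier):
    """One traversal step: boolean product of the frontier vector with the matrix."""
    n = len(row_ptr) - 1
    reached = [False] * n
    for src in range(n):
        if frontier[src]:
            for k in range(row_ptr[src], row_ptr[src + 1]):
                reached[col_idx[k]] = True
    return reached


row_ptr, col_idx = to_csr(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
step1 = expand(row_ptr, col_idx, [True, False, False, False])
step2 = expand(row_ptr, col_idx, step1)
print(step1, step2)
```

Only existing relationships are stored, so memory grows with the number of edges rather than with N^2.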
33. @mesirii
Column Store
● sort by "natural ids"
● all properties and relationships as very wide columns
● need fixed schema
34. @mesirii
Hash / Hybrid Index
● Nodes and Relationships are documents
● Additional HashIndex(Source, Target) -> Linked List of Rels
● Used in ArangoDB
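The hash/hybrid design can be sketched as plain documents plus one extra index keyed by (source, target). The helper names and the `_from`/`_to` field names below are illustrative assumptions, and a Python list stands in for the linked list of relationships. This is not ArangoDB's implementation.

```python
from collections import defaultdict

nodes = {}                      # node id -> document
rels = {}                       # relationship id -> document
edge_index = defaultdict(list)  # (source, target) -> relationship ids


def add_node(node_id, **doc):
    nodes[node_id] = doc


def add_rel(rel_id, source, target, **doc):
    """Store the relationship document and register it in the hash index."""
    rels[rel_id] = {"_from": source, "_to": target, **doc}
    edge_index[(source, target)].append(rel_id)


def rels_between(source, target):
    """Hash lookup on the (source, target) pair, then follow the list."""
    return [rels[rel_id] for rel_id in edge_index.get((source, target), [])]


add_node("alice", name="Alice")
add_node("bob", name="Bob")
add_rel("r1", "alice", "bob", type="KNOWS")
add_rel("r2", "alice", "bob", type="WORKS_WITH")

print([r["type"] for r in rels_between("alice", "bob")])
```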
36. @mesirii
Hexa-Store
● Used by TripleStores
● Backed by Key-Value Store
● Store all combinations of triples
○ S-P-O
○ S-O-P
○ P-S-O
○ P-O-S
○ O-P-S
○ O-S-P
● And use prefix search for lookups/expand
● JS - GunDB, DGraph, Cayley
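The following sketch shows the hexastore idea under simplifying assumptions: a sorted Python list stands in for the ordered key-value store, and the `HexaStore` class with its `add` and `match` methods is hypothetical. Every triple is written under all six orderings, and a lookup picks the ordering whose leading components are the bound ones, then does a prefix scan.

```python
from bisect import bisect_left
from itertools import permutations


class HexaStore:
    # The six orderings: spo, sop, pso, pos, osp, ops
    ORDERS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        self.index = []  # sorted keys; stands in for an ordered key-value store

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            key = (order,) + tuple(triple[c] for c in order)
            pos = bisect_left(self.index, key)
            if pos == len(self.index) or self.index[pos] != key:
                self.index.insert(pos, key)

    def match(self, s=None, p=None, o=None):
        bound = {k: v for k, v in (("s", s), ("p", p), ("o", o)) if v is not None}
        # Choose the ordering that starts with exactly the bound components.
        order = next(c for c in self.ORDERS if set(c[:len(bound)]) == set(bound))
        prefix = (order,) + tuple(bound[c] for c in order[:len(bound)])
        pos = bisect_left(self.index, prefix)
        results = []
        while pos < len(self.index) and self.index[pos][:len(prefix)] == prefix:
            found = dict(zip(order, self.index[pos][1:]))
            results.append((found["s"], found["p"], found["o"]))
            pos += 1
        return results


store = HexaStore()
store.add("alice", "knows", "bob")
store.add("alice", "knows", "carol")
store.add("bob", "knows", "carol")

print(store.match(s="alice"))
print(store.match(o="carol"))
```

The price of answering any pattern with a single prefix scan is a sixfold write and storage amplification.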
38. @mesirii
Native Database
• Each database is native to its core model
• e.g. relational, column
• so optimized for that model in storage and operations
• And non-native to other models that you put on top
• which causes lack of safety, performance, expressiveness
47. @mesirii
File System
Record based files
Fixed Size Records
ID = Record ID
Offset = ID * Block Size
Pointer = Memory + Offset
Nodes
Relationships
Properties
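The offset arithmetic for fixed-size records can be sketched as follows. The record layout used here (an in-use flag plus two 4-byte ids) is a made-up example and not Neo4j's actual store format. A `bytearray` stands in for the memory-mapped file.

```python
import struct

# Hypothetical layout: in-use flag (1 byte), first relationship id (4), first property id (4).
RECORD = struct.Struct(">BII")
RECORD_SIZE = RECORD.size  # 9 bytes, no padding with ">"

store = bytearray(RECORD_SIZE * 4)  # room for record ids 0..3


def offset_of(record_id):
    """Offset = ID * record size, so no index is needed to find a record."""
    return record_id * RECORD_SIZE


def write_record(record_id, in_use, first_rel, first_prop):
    RECORD.pack_into(store, offset_of(record_id), in_use, first_rel, first_prop)


def read_record(record_id):
    return RECORD.unpack_from(store, offset_of(record_id))


write_record(2, 1, 7, 42)
print(offset_of(2), read_record(2))
```

Because every record has the same size, the record id doubles as its address, which is what makes pointer-chasing between nodes, relationships and properties cheap.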
48. @mesirii
Page Cache
● OS Memory Mapping insufficient
● Which pages are important when (LRU-K)
● Transactional Guarantees / Isolation
● Concurrency
● Use for other types (indexes)
○ Generational Datastructures
● Seed cache
50. @mesirii
DB Engine
● Low Level Kernel SPI for common operations
● Only works with primitives / arrays
● Off Heap (tx, index, metadata, next: query state)
● Record Access
● Transaction Layer (Isolation)
● reusable Cursors (Prefetching)
● soon: Store Abstraction
53. @mesirii
Neo4j Type System (Cypher, Drivers, Browser)
Null: missing or unknown value
Boolean: true or false
Integer: 64-bit signed integer
Float: double-precision floating point
Spatial: different 2D and 3D coordinate systems
Bytes: raw octet stream
String: Unicode text
List: ordered collection
Map: keyed collection
Temporal: (local)date(time), duration
Structure: Node, Relationship, Path
54. @mesirii
Why the hell - 4j?
Good
Founders were Java Developers
Easier to hire
Java has memory management (GC)
Java NIO
Portability
JVM got way faster/better over the years
Extensibility in all JVM Languages
Can utilize GraalVM
Bad
Little Access to low level system capabilities
(Cache, Memory, Network)
Need to use Unsafe
Garbage Collection (unpredictable pauses)
No value types
Scala runtime behavior
C-Libraries are harder to integrate
59. @mesirii
SQL
SELECT DISTINCT c.CompanyName
FROM customers AS c
  JOIN orders AS o
    ON (c.CustomerID = o.CustomerID)
  JOIN order_details AS od
    ON (o.OrderID = od.OrderID)
  JOIN products AS p
    ON (od.ProductID = p.ProductID)
WHERE p.ProductName = 'Chocolat'
65. @mesirii
A (real) Question
Find all Actors and Movies they acted in
Whose name contains the letter "a"
Aggregate the frequency and movie titles
Filter by who acted in more than 5 movies
Return their name, birth year and movie titles
Ordered by number of movies
Limited to top 10
66. @mesirii
A (real) Cypher Query
MATCH (a:Person)-[:ACTED_IN]->(m:Movie)
WHERE a.name CONTAINS "a"
WITH a,
count(m) AS cnt,
collect(m) AS movies
WHERE cnt > 5
RETURN a.name, a.born,
[m IN movies | m.title] as titles
ORDER BY size(movies) DESC
LIMIT 10
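To show what each clause of this query contributes, here is a rough in-memory equivalent over a flat list of (name, born, title) rows. The function name `top_actors`, the row format and the sample data are assumptions for illustration. Unlike Cypher, which groups by the node `a`, the sketch groups by name and so assumes names are unique.

```python
from collections import defaultdict


def top_actors(acted_in, min_movies=5, limit=10):
    """acted_in: iterable of (actor_name, birth_year, movie_title) rows."""
    titles_by_actor = defaultdict(list)
    born = {}
    # MATCH (a:Person)-[:ACTED_IN]->(m:Movie) WHERE a.name CONTAINS "a"
    for name, year, title in acted_in:
        if "a" in name:
            # WITH a, count(m) AS cnt, collect(m) AS movies
            titles_by_actor[name].append(title)
            born[name] = year
    # WHERE cnt > 5
    rows = [(name, born[name], titles)
            for name, titles in titles_by_actor.items()
            if len(titles) > min_movies]
    # ORDER BY size(movies) DESC LIMIT 10
    rows.sort(key=lambda row: len(row[2]), reverse=True)
    return rows[:limit]


sample = (
    [("Meg Ryan", 1961, f"Film {i}") for i in range(6)]
    + [("Tom Hanks", 1956, f"Movie {i}") for i in range(7)]
    + [("Bob Odenkirk", 1962, f"Show {i}") for i in range(8)]  # no "a" in name
    + [("Al Pacino", 1940, f"Drama {i}") for i in range(3)]    # too few movies
)

for name, year, titles in top_actors(sample):
    print(name, year, len(titles))
```

The `WITH` clause marks the pipeline boundary: everything before it produces rows, the aggregation collapses them per actor, and the second `WHERE` filters on the aggregate.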
68. @mesirii
Query Planning
● cost-based planner
○ e.g. index selectivity, db-statistics
● IDP (Iterative Dynamic Programming)
● Loads of papers on query planning
71. @mesirii
openCypher
● open-source query language spec
● implementers group
● publishes artifacts
● reference implementation
● open collaboration
● toward a new standard
○ fun with standards orgs
74. @mesirii
Architecture & Data Flow
Application → Cypher Bolt Driver → Cypher Bolt Server → Neo4j
Request (parameterised Cypher):
MATCH (a:Person)
WHERE a.name = 'Alice'
RETURN a.surname, a.age
Response (result stream plus metadata):
{surname: 'Smith', age: 33}
75. @mesirii
Driver Implementation
● Versioned Protocol (Handshake)
● Packstream Protocol based on MessagePack
● Asynchronous w/ sync APIs
● Uses Netty on Server
● Reactive w/ backpressure in v2 next year
77. @mesirii
Driver Concepts
Driver
Top-level object for all Neo4j interaction
Session
Logical context for sequence of transactions
Transaction
Unit of work
Statement Result
Stream of records plus metadata
79. @mesirii
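Python (blocking)
from neo4j import GraphDatabase

uri = "bolt://localhost:7687"
driver = GraphDatabase.driver(uri, auth=("neo4j", "p4ssw0rd"))

def print_names(tx):
    # Unit of work: runs inside a managed read transaction
    result = tx.run("MATCH (a:Person) RETURN a.name")
    for record in result:
        print(record["a.name"])

# read_transaction is the API as shown on the slide; 5.x drivers name it execute_read
with driver.session() as session:
    session.read_transaction(print_names)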
80. @mesirii
Driver Implementation
● Versioned Protocol (Handshake)
● Packstream Protocol based on MessagePack
● Asynchronous w/ sync APIs
● Uses Netty on Server
● Reactive w/ backpressure in v2 next year
● Same architecture across languages
81. @mesirii
Transaction Routing
(Diagram: a Session draws connections from the Load Balancing Connection Pool, which holds connections to readers and to the writer.)
driver.session() opens the session and session.close() ends it.
Each session.read_transaction(...) ACQUIREs a connection to a reader and RELEASEs it when done.
session.write_transaction(...) ACQUIREs a connection to the writer and RELEASEs it when done.
84. @mesirii
Server Selection Strategy
Scenario: one server starts to run slow.
The Round Robin strategy (prior to 1.5) continues to try all servers in turn, leading to a severe backlog of work and a dramatically lower overall throughput.
The Least Connected strategy (introduced in 1.5) leads to only a proportional drop in throughput under the same circumstances, as the misbehaving server is avoided.
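The difference between the two strategies can be sketched with two small selector classes. These are hypothetical and simplified, not the driver's actual code. `in_use` is assumed to map each server address to its number of currently borrowed connections, which piles up on a slow server.

```python
import itertools


class RoundRobin:
    """Hands out servers in turn, regardless of how busy they are."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def select(self, in_use):
        return next(self._cycle)


class LeastConnected:
    """Picks the server with the fewest connections currently in use."""

    def __init__(self, servers):
        self._servers = list(servers)

    def select(self, in_use):
        return min(self._servers, key=lambda server: in_use.get(server, 0))


servers = ["core-a", "core-b", "core-c"]
# core-b has become slow: its connections are held longer, so more are in use.
in_use = {"core-a": 1, "core-b": 9, "core-c": 2}

round_robin = RoundRobin(servers)
least_connected = LeastConnected(servers)
print([round_robin.select(in_use) for _ in range(3)])
print([least_connected.select(in_use) for _ in range(3)])
```

Round robin keeps feeding the slow server a third of the work, while the least-connected choice steers new work away from it as long as its connection count stays high.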
86. @mesirii
Clustering History
1. Zookeeper
2. Paxos (v1)
3. Paxos (v2)
4. Raft
"Raft is a consensus algorithm that is designed to be easy
to understand. It's equivalent to Paxos in fault-tolerance
and performance." raft.github.io
93. @mesirii
Clustering (next)
● Analytics on Reporting Instances
● Cluster member integration with Spark
● Distributed linear Transactions
● Sharding
● Workload based sharding
95. @mesirii
Past Graph Compute Options
● Data Processing
○ Spark with GraphX, Flink with Gelly
○ Gremlin Graph Computer
● Dedicated Graph Processing
○ Urika, GraphLab, Giraph, Mosaic,
GPS, Signal-Collect, Gradoop
● Data Scientist Toolkit
○ igraph, NetworkX, Boost in Python, R, C
96. @mesirii
Pregel - Bulk Synchronous Parallel (BSP)
The map-reduce for graph compute.
Node-Centric Processing
1. Each node sends a message about its own state
2. Each node receives messages from neighbours
3. Updates its own state
4. Global Compute Superstep
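The four steps can be sketched as a single-process loop that propagates the maximum value through a graph, a common teaching example for this model. The function name `pregel_max` and its inputs are assumptions for illustration. A real BSP system would distribute the nodes over workers and synchronize them at the barrier.

```python
def pregel_max(values, edges, max_supersteps=50):
    """values: node -> initial value; edges: node -> list of out-neighbours."""
    state = dict(values)
    active = set(state)  # nodes whose state changed in the previous superstep
    for _ in range(max_supersteps):
        if not active:
            break  # nobody has news to send: the computation has converged
        # 1. Each active node sends a message about its own state.
        inbox = {node: [] for node in state}
        for node in active:
            for neighbour in edges.get(node, []):
                inbox[neighbour].append(state[node])
        # 2. + 3. Each node receives its messages and updates its own state.
        active = set()
        for node, messages in inbox.items():
            if messages and max(messages) > state[node]:
                state[node] = max(messages)
                active.add(node)
        # 4. The end of the loop body is the global superstep barrier.
    return state


ring = {0: [1], 1: [2], 2: [3], 3: [0]}
result = pregel_max({0: 3, 1: 6, 2: 2, 3: 1}, ring)
print(result)
```

Messages are collected into `inbox` before any state is updated, so every node sees only values from the previous superstep, which is the defining property of the synchronous model.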
100. @mesirii
How does it work?
(Diagram: Procedures use a Graph Loader to move data between Neo4j and an in-memory graph.)
1. Read projected graph
2. Load projected graph
3. Execute algorithm
4. Store results
Every operation is concurrent
101. @mesirii
How do you use it?
1. Call as Cypher procedure
2. Pass in specification (Label, Prop, Query) and configuration
3. The ~.stream variant returns (a lot of) results
CALL algo.<name>.stream('Label','TYPE',{conf})
YIELD nodeId, score
4. non-stream variant writes results to graph; returns statistics
CALL algo.<name>('Label','TYPE',{conf})