- The in-memory workspace of the Graph Data Science library allows efficient reshaping and subsetting of transactional graphs for analytical algorithms.
- Named graphs stored in the catalog can be reused, mutated, and modified for different algorithm runs or workflows. The catalog also allows managing and dropping graphs.
- Projections, such as native and Cypher projections, determine how nodes and relationships are shaped from the transactional graph into the in-memory workspace, including controlling labels, types, orientations, aggregations, and properties.
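As a rough illustration of the catalog-and-projection idea described above, here is a hedged sketch in plain Python (not the actual GDS API): a "transactional" graph is filtered by node label and relationship type into a named in-memory graph. The labels, names, and filter logic are invented for the example.

```python
# Hedged sketch: mimics the *idea* of a named graph catalog and a filtered
# projection using plain Python dicts -- not the actual GDS API.

# A "transactional" graph: nodes with labels, edges with types.
nodes = {
    "alice": "Person", "bob": "Person", "acme": "Company",
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
]

catalog = {}  # named in-memory graphs, keyed by graph name

def project(name, node_labels, rel_types):
    """Keep only the requested labels/types, like a native projection."""
    kept_nodes = {n for n, lbl in nodes.items() if lbl in node_labels}
    kept_edges = [(s, t, d) for s, t, d in edges
                  if t in rel_types and s in kept_nodes and d in kept_nodes]
    catalog[name] = {"nodes": kept_nodes, "rels": kept_edges}
    return catalog[name]

g = project("people", {"Person"}, {"KNOWS"})
print(sorted(g["nodes"]))   # ['alice', 'bob']
print(g["rels"])            # [('alice', 'KNOWS', 'bob')]
```

Because the projected graph lives in the catalog under a name, it can be reused or dropped across algorithm runs, which is the workflow the bullets above describe.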
Max De Marzi is a Neo4j Field Engineer who provides tips and tricks for graph data modeling. The document discusses modeling concepts like:
- Modeling with a property graph data model where nodes can have labels, properties, and relationships to other nodes.
- How traditional relational databases struggle with relationships due to joins slowing performance as data grows, whereas graph databases can traverse relationships in real-time.
- Examples of modeling different domains as graphs including acting, flights, twitter, forms, and supply chains.
- Key advantages of graph databases over relational and other NoSQL databases for modeling flexible, high performance relationships between entities.
This document provides an introduction and overview of Neo4j and graph databases. It begins with an explanation of the limitations of relational databases in modeling relationships and includes slides on Neo4j's native graph data model and architecture. Additional slides cover Neo4j use cases, modeling with graphs, the Neo4j platform and features like the cloud, drivers, and visualization tools. The document concludes with examples of recommender systems queries in Cypher.
This document provides an overview of graph databases and Neo4j. It begins with an introduction to graph databases and their advantages over relational databases for modeling connected data. Examples of real-world use cases that are well-suited for graph databases are given. The document then describes the core components of the graph data model including nodes, relationships, properties, and labels. It provides examples of how to model data as a graph and query graphs using Cypher, the query language for Neo4j. The document concludes by discussing Neo4j as an example of a graph database and its key features and capabilities.
Applying graph analytics on data stored in relational databases can provide tremendous value in many application domains. We discuss the importance of leveraging these analyses, and the challenges in enabling them. We present a tool, called GraphGen, that allows users to visually explore, and rapidly analyze (using NetworkX) different graph structures present in their databases.
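GraphGen's own interface is not shown here, but the general idea it builds on can be sketched with stdlib Python standing in for NetworkX: a graph is latent in relational rows, and extracting it enables graph analyses. The authorship table below is invented for the example.

```python
# Hedged sketch of extracting a graph hidden in relational data.
# Relational rows: (author, paper) -- a bipartite relationship.
from collections import defaultdict

authorship = [
    ("ana", "p1"), ("ben", "p1"),
    ("ben", "p2"), ("cara", "p2"),
]

# Group authors by paper.
by_paper = defaultdict(set)
for author, paper in authorship:
    by_paper[paper].add(author)

# Derive a co-authorship graph: two authors are linked if they share a paper.
coauthors = defaultdict(set)
for members in by_paper.values():
    for a in members:
        coauthors[a] |= members - {a}

print(sorted(coauthors["ben"]))  # ['ana', 'cara']
```

Once the co-authorship structure is materialized like this, standard graph algorithms (degree, centrality, community detection) can run on data that the relational schema never exposed as a graph.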
WebGL and COLLADA were discussed as technologies for 3D rendering and asset interchange on the web. The presentation covered the history and capabilities of both standards. It also described approaches for loading COLLADA assets into WebGL, such as preprocessing COLLADA into JSON or loading XML directly and parsing it with JavaScript. Optimizing COLLADA assets for WebGL rendering through techniques like quantization and compression was also mentioned.
The magic of (data parallel) distributed systems and where it all breaks - Re... - Holden Karau
Distributed systems can seem magical, and sometimes all of the magic works and our job succeeds. However, if you've worked with them for long enough, you've found a few places where the magic starts to break down, revealing that it's actually a collection of several hundred garden gnomes* rather than a single large garden gnome.
This talk will use Apache Spark, Beam, Flink, Kafka, and MapReduce to explore the world of data parallel distributed systems. We'll start with some happy pieces of magic, like how we can combine different transformations into a single pass over the data, working between different languages, data partitioning, and lambda serialization. After each new piece of magic is introduced we'll look at how it breaks in one (or two) of the systems.
Come to be told it's not your fault everything is broken, or, if your distributed software still works, to get an exciting preview of everything that's going to go wrong. Don't work with distributed systems? Come to be reassured you've made good life choices.
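One of the "happy pieces of magic" mentioned above, fusing several transformations into a single pass over the data, can be sketched with Python generators, which are lazy the same way pipelined narrow operations are in Spark, Beam, or Flink. The data and stages are invented for the example.

```python
# Hedged sketch of "fusing" transformations into one pass over the data:
# Python generators are lazy, so chaining them still reads each element
# only once, like a pipelined stage in a data parallel engine.

data = range(10)

# Each stage wraps the previous one; nothing runs until we consume the chain.
doubled  = (x * 2 for x in data)
big_only = (x for x in doubled if x > 10)

result = list(big_only)   # single pass over `data`
print(result)  # [12, 14, 16, 18]
```

The place this magic breaks, as the talk abstract hints, is when one stage needs all the data at once (a shuffle or aggregation), which forces materialization.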
A fast introduction to PySpark with a quick look at Arrow-based UDFs - Holden Karau
This talk will introduce Apache Spark (one of the most popular big data tools), the different built-ins (from SQL to ML), and, of course, everyone's favorite wordcount example. Once we've got the nice parts out of the way, we'll talk about some of the limitations and the work being undertaken to address them. We'll also look at the cases where using Spark is more like trying to hammer in a screw. Since we want to finish on a happy note, we will close out by looking at the new vectorized UDFs in PySpark 2.3.
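The reason vectorized (Arrow-based) UDFs help can be sketched without Spark at all: the engine hands the UDF a whole batch instead of one row at a time, so per-call overhead is paid once per batch. Plain lists stand in for Arrow batches here; the real `pandas_udf` API is not shown.

```python
# Hedged sketch of row-at-a-time vs vectorized UDF calling conventions.

rows = list(range(8))

def per_row_udf(x):
    """Called once per row -- Python call overhead on every element."""
    return x + 1

def vectorized_udf(batch):
    """Called once per batch -- overhead amortized across the batch."""
    return [x + 1 for x in batch]

# Both produce the same answer; the vectorized form makes far fewer calls.
assert [per_row_udf(x) for x in rows] == vectorized_udf(rows)
print(vectorized_udf(rows))  # [1, 2, 3, 4, 5, 6, 7, 8]
```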
Computer Graphics - Lecture 01 - 3D Programming I - Anton Gerdelan
Here are a few key points about adding vertex colors to the example:
- Storing the color data in a separate buffer is cleaner than concatenating or interleaving it with the position data. This keeps the data layout simple.
- The vertex shader now has inputs for both the position (vp) and color (vc) attributes.
- The color is passed through as an output (fcolour) to the fragment shader.
- The position is still used to set gl_Position for transformation.
- The color input has to start in the vertex shader because per-vertex attributes are only visible there. The vertex shader passes the color through, and the rasterizer interpolates it across the primitive before the fragment shader reads it.
GraphFrames Access Methods in DSE Graph - Jim Hatcher
GraphFrames is a powerful feature in Spark that allows you to harness Spark's distributed computing framework to operate on your Graph. Tasks like data ingestion, schema migrations, and analytical jobs can all be run against your Graph. In DSE Graph, there are several methods to leverage GraphFrames including Gremlin, Spark SQL, and Motif. This presentation walks through the basics of using GraphFrames with DSE Graph; then shows how these different methods can be used and how you can evaluate which one is the best for your use case.
Strategies for refactoring and migrating a big old project to be multilingual... - benjaoming
The document discusses strategies for refactoring a large project to support multiple databases and languages. It describes migrating the project to use PostgreSQL schemas to separate data for different games into different schemas. It also explains refactoring the code base to split it into separate applications and using django-parler to add multilingual support.
The document discusses modeling considerations for graph databases using the Game of Thrones database as an example. Some key points made include questioning whether the current model captures all necessary information like individuals involved in battles. It is also noted that modeling statuses as labels may not accurately represent changing states over time. The document advocates for a question-driven and iterative approach to modeling to refine the model as understanding improves.
Glen Smith discusses ways to reduce duplication in Grails user interfaces using Grails resources, Bootstrap, and Less CSS. Resources allow bundling and minimizing JavaScript and CSS, improving performance. Bootstrap provides pre-built HTML and CSS components. Less CSS extends CSS with features like variables, mixins, and nesting to reduce duplication. The talk demonstrates using these techniques and plugins to standardize fonts, layouts, forms, and navigation across a Grails application.
Fancy is a JavaScript library that combines Underscore functions with functional programming concepts to allow for writing functional code in a more readable way. It constructs FancyArrays and FancyObjects from normal arrays and objects that allow chaining of Underscore functions. This functional approach to problems like evaluating poker hands avoids side effects, combines functions, and solves problems conceptually rather than using loops. While it may not implement all aspects of functional programming, Fancy emphasizes the use of functions and avoids state to make code more reusable, abstracted, and easier to write and extend.
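Fancy's actual JavaScript API is not reproduced here, but the chaining style it enables can be sketched in Python: wrap a collection once, then compose side-effect-free transformations instead of writing loops. The `Chain` class and the data are invented for the example.

```python
# Hedged Python sketch of the chaining style described above (not Fancy's
# real API): each method returns a new wrapper, so transformations compose
# without mutating state or using explicit loops.

class Chain:
    def __init__(self, items):
        self._items = list(items)

    def map(self, fn):
        return Chain(fn(x) for x in self._items)

    def filter(self, pred):
        return Chain(x for x in self._items if pred(x))

    def value(self):
        return self._items

scores = [3, 7, 2, 9]
best = Chain(scores).filter(lambda s: s > 2).map(lambda s: s * 10).value()
print(best)  # [30, 70, 90]
```

Each step reads as a declaration of intent rather than loop bookkeeping, which is the readability benefit the summary describes.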
A super fast introduction to Spark and glance at BEAM - Holden Karau
Apache Spark is one of the most popular general purpose distributed systems, with built-in libraries to support everything from ML to SQL. Spark has APIs across languages including Scala, Java, Python, and R -- with more 3rd party language support (like Julia & C#). Apache BEAM is a cross-platform tool for building on top of different distributed systems, but it's in its early stages. This talk will introduce the core concepts of Apache Spark, and look to the potential future of Apache BEAM.
Apache Spark has two core abstractions for representing distributed data and computations. This talk will introduce the basics of RDDs and Spark DataFrames & Datasets, and Spark's method for achieving resiliency. Since it's a big data talk, we will include the almost required wordcount example, and end the Spark part with follow-up pointers on Spark's new ML APIs. For folks who are interested we'll then talk a bit about portability, and how Apache BEAM aims to improve portability (as well as its unique approach to cross-language support).
Slides from Holden's talk at https://www.meetup.com/Wellington-Data-Scaling-Chats/events/mdcsdpyxcbxb/
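The "almost required" wordcount example mentioned above can be sketched on a single machine in the shape of the usual RDD chain (flatMap, then map + reduceByKey), with stdlib Python standing in for a cluster. The input lines are invented for the example.

```python
# Hedged, single-machine sketch of Spark-style wordcount:
#   flatMap  -> split lines into words
#   map/reduceByKey -> count occurrences per word
from collections import Counter
from itertools import chain

lines = ["to be or not to be", "to think"]

words = chain.from_iterable(line.split() for line in lines)  # flatMap
counts = Counter(words)                                      # map + reduceByKey

print(counts["to"])  # 3
print(counts["be"])  # 2
```

In Spark the same chain is distributed: the split runs per partition, and the per-key counting forces a shuffle to bring each word's counts together.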
This document contains notes from a presentation given by Neal Ford on productivity techniques for programmers. Some key topics covered include: accelerating work by using keyboard shortcuts, search over navigation, reducing distractions, applying DRY principles, and automating repetitive tasks. Ford advocates focusing on acceleration, focus, and automation to work more efficiently. He provides many examples of tools and techniques to improve productivity.
This document provides an overview of graphs for artificial intelligence and machine learning. It discusses definitions of machine learning and AI, as well as common techniques like predictive analytics, transfer learning, and human-like AI. It then covers how graph databases and graph algorithms can be applied to domains like social networks, knowledge graphs, and recommender systems. Specific graph algorithms like triadic closure, structural balance, and graph partitioning are examined. The document also explores emerging areas like graph neural networks, graph convolutional networks, and using graph structures for causal models. It argues that representing data as graphs and applying graph algorithms can provide intelligent behavior without needing general human-level artificial intelligence.
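Triadic closure, one of the graph heuristics named above, is simple enough to sketch directly: if A-B and B-C are edges, predict the candidate edge A-C. The tiny friendship graph below is invented for the example.

```python
# Hedged sketch of triadic closure for link prediction:
# recommend pairs that share a neighbour but are not yet connected.
from itertools import combinations

edges = {("ana", "ben"), ("ben", "cara"), ("ana", "dave")}

# Build an undirected adjacency map.
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def closure_candidates(adj):
    """Pairs with a common neighbour but no direct edge."""
    out = set()
    for node, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                out.add((a, b))
    return out

print(sorted(closure_candidates(adj)))  # [('ana', 'cara'), ('ben', 'dave')]
```

This is the kind of structural rule that, as the summary argues, yields useful "intelligent" behavior (here, friend recommendation) without any general AI.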
Soft Shake Event / A soft introduction to Neo4J - Florent Biville
The document discusses graph databases and the Neo4j graph database. It begins with an introduction to graphs and property graphs. It then covers topics like the Neo4j data model, core API, querying with Cypher, object graph mapping with Spring Data, and example use cases for graph databases like recommendations, fraud detection, and network analysis.
DevFest Istanbul - a free guided tour of Neo4J - Florent Biville
2013-11-02: DevFest Türkiye, Istanbul.
Slightly modified version of my previous Neo4J introduction talk, given at the Soft-Shake Event in Geneva, Switzerland.
The document discusses approaches for reducing driver overhead in OpenGL applications. It introduces several OpenGL APIs that can be used to achieve this, including persistent mapped buffers for dynamic geometry, multi-draw indirect for batching draw calls, and packing 2D textures into arrays. Speakers then provide details on implementing these techniques and the performance improvements they provide, such as reducing overhead by 5-10x and allowing an order of magnitude more unique objects per frame. Bindless textures and sparse textures are also covered as advanced methods for further optimizing texture handling and memory usage.
Keeping the fun in functional w/ Apache Spark @ Scala Days NYC - Holden Karau
Apache Spark has been a great driver of not only Scala adoption, but introducing a new generation of developers to functional programming concepts. As Spark places more emphasis on its newer DataFrame & Dataset APIs, it’s important to ask ourselves how we can benefit from this while still keeping our fun functional roots. We will explore the cases where the Dataset APIs empower us to do cool things we couldn’t before, what the different approaches to serialization mean, and how to figure out when the shiny new API is actually just trying to steal your lunch money (aka CPU cycles).
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J... - Databricks
Watch video at: http://youtu.be/Wg2boMqLjCg
Want to learn how to write faster and more efficient programs for Apache Spark? Two Spark experts from Databricks, Vida Ha and Holden Karau, provide performance tuning and testing tips for your Spark applications.
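A classic shuffle tip from talks like this one is to prefer reduceByKey-style pre-aggregation over groupByKey, because combining values per key before the shuffle moves far less data across the network. A hedged sketch with plain dicts standing in for partitions (the data is invented):

```python
# Hedged sketch of map-side combine: aggregate within each partition first,
# so only one record per key crosses the "shuffle" boundary.

partition1 = [("a", 1), ("b", 1), ("a", 1)]
partition2 = [("a", 1), ("b", 1)]

def pre_aggregate(partition):
    """Map-side combine: one record per key leaves the partition."""
    acc = {}
    for k, v in partition:
        acc[k] = acc.get(k, 0) + v
    return acc

def merge(*partials):
    """Reduce side: combine the small per-partition summaries."""
    total = {}
    for p in partials:
        for k, v in p.items():
            total[k] = total.get(k, 0) + v
    return total

print(merge(pre_aggregate(partition1), pre_aggregate(partition2)))
# {'a': 3, 'b': 2}
```

Without pre-aggregation, all five raw records would be shuffled; with it, only four partial sums are, and the gap widens with skewed or high-volume keys.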
Data Science Challenge presentation given to the CinBITools Meetup Group - Doug Needham
The document describes the Cloudera Data Science Challenge, which involves solving three data science problems using large datasets. For the first problem, Smartfly, the goal is to predict flight delays using historical flight data and machine learning algorithms like logistic regression and SVM. The second problem, Almost Famous, involves statistical analysis of web log data and filtering for spam. The third problem, Winklr, requires social network analysis to recommend users to follow on a social media platform based on click data. The document discusses the approaches, tools, and algorithms used to solve each problem at scale using Apache Spark and Hadoop technologies.
Workshop - Graph Application Architecture - GraphSummit Paris - Neo4j
Workshop - Graph Application Architecture
Join this hands-on workshop led by Neo4j experts, who will guide you in discovering contextual intelligence. Using a real dataset, we will build a graph solution step by step, from constructing the graph data model to running queries and visualizing the data. The approach will be applicable to many use cases and industries.
Workshop - Innovating with Generative AI and Knowledge Graphs - Neo4j
Workshop - Innovating with Generative AI and Knowledge Graphs
Go beyond the AI hype and discover practical techniques for using AI responsibly across your organization's data. Explore how knowledge graphs can increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships with LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will walk you through setting up your own generative AI stack, with practical, coded examples to get you started in minutes.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris - Neo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest Neo4j innovations, including the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
SOPRA STERIA - GraphRAG: pushing past the limitations of RAG through the use of ... - Neo4j
Romain CAMPOURCY, Solution Architect, Sopra Steria
Patrick MEYER, Group AI Architect, Sopra Steria
Retrieval-Augmented Generation (RAG) lets large language models answer user questions about a business domain. The technique works well when the documentation is simple, but runs into limitations as soon as the sources are complex. Drawing on a project we carried out, we will present GraphRAG, a new approach that uses a generated Neo4j database to improve document understanding and information synthesis. This method outperforms plain RAG by providing more holistic and precise answers.
ADEO - Knowledge Graphs for e-commerce: challenges and opportunities ... - Neo4j
Charles Gouwy, Business Product Leader, Adeo Services (Leroy Merlin Group)
With their Knowledge Graph already integrated across all the purchase experiences of their e-commerce platform for more than 3 years, we will look at the new opportunities and challenges still opening up to them thanks to their use of a graph database and the rise of AI.
GraphSummit Paris - The art of the possible with Graph Technology - Neo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
GraphAware - Transforming policing with graph-based intelligence analysisNeo4j
Petr Matuska, Sales & Sales Engineering Lead, GraphAware
Western Australia Police Force’s adoption of Neo4j and the GraphAware Hume graph analytics platform marks a significant advancement in data-driven policing. Facing the challenges of growing volumes of valuable data scattered in disconnected silos, the organisation successfully implemented Neo4j database and Hume, consolidating data from various sources into a dynamic knowledge graph. The result was a connected view of intelligence, making it easier for analysts to solve crime faster. The partnership between Neo4j and GraphAware in this project demonstrates the transformative impact of graph technology on law enforcement’s ability to leverage growing volumes of valuable data to prevent crime and protect communities.
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesNeo4j
David Pond, Lead Product Manager, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
Generative Classifiers: Classifying with Bayesian decision theory, Bayes’ rule, Naïve Bayes classifier.
Discriminative Classifiers: Logistic Regression, Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Attribute selection measures- Gini impurity; Entropy, Regularization Hyperparameters, Regression Trees, Linear Support vector machines.
2. ● What is the Graph Catalog?
● Named graphs versus Anonymous graphs
● Native projection versus Cypher projection
● Mutability
● Graph Catalog management
3. We are still on the gameofthrones database and you can either
run the following guide inside the Neo4j Browser
:play http://neo4jguides.tomgeudens.io/gdscatalog.html
(note that this requires a neo4j.conf setting to whitelist the host)
or open a regular browser session, go to
https://bit.ly/neo4j-gds-catalog
and copy-and-paste the commands from there
4.
5.
6. The shape of the graph you use for analytics (and algorithms) is
significantly different from the one you use to run complex
business queries in real time and do the transactional work. To
reiterate the technical terms …
7. ● a monopartite graph is a single set of nodes that are interconnected
● it is what you need for the majority of graph algorithms
If you ever wondered why Facebook (or people leveraging Facebook
data) is so - notoriously - good at analytics … think about what the core
Facebook graph is like ...
8. ● a bipartite graph is two sets of nodes that are connected, but the
sets themselves are not interconnected
● great as input for algorithms (such as node similarity) that are
used to create a monopartite graph
If you've done basic Neo4j training … the Movie graph is also a
bipartite graph.
9. ● a multipartite graph has lots of sets of nodes and lots of types of
relationships between them
● ideal for describing a domain or business and for real time
complex queries
This is how we teach you to model in graph modeling classes … did I
drive the point home enough now?
10. Procedures (part of the GDS library) that let you reshape and
subset your transactional graph so you have the right data in the
right shape to run analytical algorithms.
This is what you
already know ...
Native Graph Storage
Page Cache
11. Procedures (part of the GDS library) that let you reshape and
subset your transactional graph so you have the right data in the
right shape to run analytical algorithms.
Mutable In-Memory
Workspace
12. While the in-memory workspace disappears when the database is
stopped (it's ephemeral, to use a fancy word), it is also not just a
one-reshape, one-algorithm-run, do-it-all-over-again setup. You can
re-use previous reshapes, mutate them, and name them.
It's a catalog.
In order to fully grasp that, we'll shortly list all the modes in which
you can run Graph Data Science and then explore them in detail
...
13. Rather than give you some dry explanation … try it out. I (or rather
PageRank) give(s) you … Jon Snow!
CALL gds.pageRank.stream({
nodeProjection: "Person",
relationshipProjection: "INTERACTS"
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
LIMIT 20;
Bummer, that didn't work out ...
14. ● I can't even show you the real Graph Catalog stuff here
(although it is used under the hood) because this really is the
one-shot-fire-and-forget-doing-the-algorithm method.
● Which is relatively easy to learn.
● And as the Person, INTERACTS subgraph is a monopartite
graph, a native projection (aka Look ma, no hands) was possible
● ...
You're not remembering the series or the books wrong though, Jon
Snow should have come out on top … so something was wrong!
15. This time we're going for those that are most prominent in the
battles ...
CALL gds.pageRank.stream({
nodeQuery: "MATCH (p:Person) RETURN id(p) AS id",
relationshipQuery: "MATCH (p1:Person)-[]->(:Battle)<-[]-(p2:Person)
RETURN id(p1) AS source, id(p2) AS target"
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
LIMIT 10;
16. ● Again, not a lot of Graph Catalog stuff to show, the
monopartite graph is shaped on the fly …
● While somewhat more complex (you need to write the queries
to do the projection), the results should immediately be more
relevant (as you're in control) … a great approach for proof of
concepts!
● ...
17. Exactly the same question as we had in Mode II, but this time we're
going to name the graph.
CALL gds.graph.create.cypher(
"gds-brutes",
"MATCH (p:Person) RETURN id(p) AS id",
"MATCH (p1:Person)-[]->(:Battle)<-[]-(p2:Person) RETURN id(p1) AS source,
id(p2) AS target"
) YIELD graphName, nodeCount, relationshipCount
RETURN *;
18. Wait … we haven't actually done the algorithm yet ...
CALL gds.pageRank.stream('gds-brutes') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;
CALL gds.betweenness.stream('gds-brutes') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
LIMIT 10;
But now … we can just keep going ...
19. ● Now we're getting somewhere … a named graph remains
available in between runs of (potentially) different algorithms.
● Rather than going for an ad-hoc fire-and-forget, this moves the
ball more towards flexible workflows.
● While Cypher projection is a great tool, it comes with the
downside of being - relatively - slow for huge workloads, …
● ...
Don't get impatient, we'll dig deeper into Catalog management in a
minute … allow me to finish the Fab Four first though … also, did you
notice the difference in who came out on top?
20. Exactly the same question as we had in Mode I, but this time we're
going to name the graph.
CALL gds.graph.create(
"gds-interaction",
"Person",
"INTERACTS"
) YIELD graphName, nodeCount, relationshipCount
RETURN *;
21. And run ...
CALL gds.pageRank.stream('gds-interaction') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
LIMIT 10;
And keep going ...
CALL gds.betweenness.stream('gds-interaction') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
LIMIT 10;
22. ● So this is the whole nine yards. And it runs at huge scale
(which you can't see here so you'll have to take my word for it)
● There's a chicken and egg problem though, the monopartite
graph must be in the database already.
● ...
So we finally did get Jon Snow, but pagerank should also have gotten
him. Can anybody venture a guess by now on what we're doing wrong
there?
26. ● The in-memory workspace is the secret sauce of the Graph
Data Science library and is super-efficient. It can handle huge
graph projections.
● It does however require memory and you will quickly run out if
you don't manage it properly.
● Also … you will forget what you put in there if you look at it as a
bottomless pit, thus creating overhead for yourself.
● ...
27. There's a very interesting tool that gives you an overview of the
in-memory workspace. Try it
CALL gds.graph.list();
If you followed along so far, you should get two results …
gds-brutes and gds-interaction. You can also examine them
individually. Try it
CALL gds.graph.list('gds-brutes');
Btw, a CALL requires a YIELD … except when it is a statement by itself.
Hence the missing YIELD and RETURN (for brevity) here ...
28. Done with a named graph? Drop it! As there is something not right
with our interactions one, let's get rid of it
CALL gds.graph.drop('gds-interaction');
And verify with the list command that it's indeed gone ...
CALL gds.graph.list();
29. By popular request the engineering team has been working on a
way to actually persist the complete named projection. And as of
the very latest GDS that tool is there (unpolished for now though) ...
CALL gds.graph.export('gds-brutes',{dbName:"brutes"});
WARNING
You will not find this in the guides and I do not want you to try it
now as the steps will confuse a lot of people. Do try this (and
everything else) at home though!
30. I'm not really supposed to show you this one and there's no
guarantee it will stay in the future, but I find this one extremely
useful myself ...
CALL gds.debug.sysInfo();
Very useful for say … quickly figuring out how low you are on heap and
such ...
31.
32. No, not really
● Unless you're improvising a one-shot thing and even then … the
syntax of these things (unless you're doing a trivial demo) is not
easy, you should follow a workflow and use a Named graph.
● Unless you're using an algorithm that hasn't been converted to
using the workspace yet … well … you don't really have a choice
then … (Pathfinding comes to mind)
33. I tried all of the syntax for all of my presentations during these two
days … as you would/should …
● The original decks still had 3.5.x syntax, Emil Eifrem (our CEO)
has sworn to shoot everybody that still shows 3.5.x stuff
● Obviously I also want to show you the latest GDS library
● There are subtle differences about how to write the projections
in the named syntax versus those in the anonymous syntax
● ...
So spare yourself the frustration and pain and learn the syntax you'll
be using for production. Named graphs. Thank me later!
34.
35. Jon Snow didn't show up as the top dog based on the PageRank
algorithm. And I actually showed you earlier what the issue is ...
A person's interaction with another person is obviously undirected
(or bi-directional, whichever you prefer), but the Property Graph
is directed, and in modeling trainings you'll be told not to create a
second relationship (as that would duplicate data).
36. However, how would an algorithm know that the domain implies
an undirected relationship, given that the Property Graph has no
schema that specifies or enforces such information?
The algorithm makes the reasonable (default) assumption that
INTERACTS is a directed relationship. Persons that are on the target
end of them are thus not considered in the pagerank. And it turns
out (and this is purely based on how the data was loaded) that Jon
Snow is frequently the target, rarely the source.
38. ● It takes the Person nodes and puts them in the workspace
(again as Person and note that it didn't have to be).
● It takes the INTERACTS relationships and puts them in the
workspace (again as INTERACTS … idem). Because we specify
the orientation as undirected this will effectively result in
doubling the number of them in the workspace ...
I don't always find all this reshaping that obvious myself. Planning
upfront what you are aiming for is a good idea!
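The actual projection call from the missing slide is not in this extract; as a sketch, in the GDS 1.x syntax used throughout this deck, an anonymous native projection that declares INTERACTS as undirected would look roughly like this (the exact call here is an assumption, not the original slide):

```cypher
// Sketch (GDS 1.x syntax, assumed): anonymous native projection
// declaring INTERACTS as UNDIRECTED so PageRank counts both
// ends of every interaction, not just the source side.
CALL gds.pageRank.stream({
    nodeProjection: "Person",
    relationshipProjection: {
        INTERACTS: { type: "INTERACTS", orientation: "UNDIRECTED" }
    }
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
LIMIT 10;
```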
39. I just showed you how to fix the problem for an Anonymous graph,
but now we want it as a Named graph …
● Take the syntax from the Mode IV example and create the
named graph again, this time as gds-interaction-natural
● Try to modify the syntax and create a second named graph,
gds-interaction-undirected
● Using gds.graph.list on both named graphs, can you recognize
the difference? Note it down!
When you are ready (give everybody a bit of a chance though), paste
your solution (to second and third bulletpoint) in the chat ...
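For readers following along outside the live session, one possible solution to the second and third bullet points (a sketch in GDS 1.x syntax; the graph names are the ones the exercise asks for):

```cypher
// Natural orientation: relationships exactly as stored in the database.
CALL gds.graph.create('gds-interaction-natural', 'Person',
  { INTERACTS: { type: 'INTERACTS', orientation: 'NATURAL' } });

// Undirected orientation: gds.graph.list should show roughly double
// the relationship count for this one compared to the natural variant.
CALL gds.graph.create('gds-interaction-undirected', 'Person',
  { INTERACTS: { type: 'INTERACTS', orientation: 'UNDIRECTED' } });
```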
41. Nodes
● label(s)
● properties
Relationships
● type(s)
● orientation
● aggregation
● properties
And all of those can (but also must) be
controlled with either a Native or a Cypher
projection.
● Cypher gives you complete flexibility,
Native gives you complete
performance.
● Cypher leaves your original graph
standing as is, Native may require
additional constructs in the database.
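As a sketch of what controlling several of those knobs at once looks like in a single native projection (GDS 1.x syntax; the graph name and the property names here are illustrative assumptions, not from the deck):

```cypher
// Sketch: native projection controlling node properties, relationship
// orientation, and property aggregation in one call (GDS 1.x, assumed
// names). Parallel INTERACTS relationships are collapsed and their
// weights summed in the in-memory workspace.
CALL gds.graph.create('example-projection',
  { Person: { label: 'Person', properties: ['age'] } },
  { INTERACTS: {
      type: 'INTERACTS',
      orientation: 'UNDIRECTED',
      properties: { weight: { property: 'weight', aggregation: 'SUM' } }
  } });
```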
44. Instead of going to jail for 25 years, Dewey, Cheatum and Howe avoided
the law for another 10 years of money laundering. False names, true
story ...
Because … while aggregation is great for most analytics use cases,
it also destroyed the clear 1% mule-kickback scheme that you could
almost literally see with the naked eye … Transactional fraud
detection.
If only there was a way to shape data efficiently - depending on the
use case - without destroying the more expressive set that describes our
business ...
45. If you remember one thing (ok, one thing + the puppies) of this
session about the Graph Catalog, that is it. That is the purpose of it
and that's why Neo4j can rightfully claim a prominent place in this
game.
And as an aside … the Native Projection can very efficiently (much more
efficiently than the Cypher Projection) do aggregations for analytical
purposes.
46. Yes, I know it's an empty slide … how could I possibly fit all of it on such
a thing … allow me to swap to my code editor for a second ...
48. Who cares as long as we all agree that this and not Jon Snow is the top
dog!
49.
50. Each of the algorithms comes with eight procedures.
Try typing
CALL gds.wcc
in the browser without completing the line (or entering) and see
what you get ...
51. Algorithm                   Task
gds.wcc.stats                   statistics about the run
gds.wcc.write                   writes the result back to the database
gds.wcc.mutate                  writes the result back to the in-memory graph
gds.wcc.stream                  streams the result
gds.wcc.stats.estimate          estimated memory usage for stats
gds.wcc.write.estimate          estimated memory usage for write
gds.wcc.mutate.estimate         estimated memory usage for mutate
gds.wcc.stream.estimate         estimated memory usage for stream
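The estimate variants take the same arguments as the procedure they estimate, so you can check the memory footprint before committing to a run. A sketch (GDS 1.x syntax; the graph name is the one created earlier in this deck, the property name is an assumption):

```cypher
// Sketch: check the projected memory footprint of a WCC write
// before actually running it against the named graph.
CALL gds.wcc.write.estimate('gds-brutes', { writeProperty: 'component' })
YIELD requiredMemory, nodeCount, relationshipCount;
```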
52. A result-stream out of an algorithm
is quite like the printouts we used
to get at work. Nobody ever looked
at the things and they ended up as
drawing paper for the kids … ok, the
similarity stopped a bit before that
point, but you get what I mean.
53. Yes, that is how that is spelled, it's not Segway, that's one of those weird
electrical devices that has you balance on two wheels ...
Any-way … have you ever wondered how underused the
results of a machine learning pipeline often are? You've spent tons
of energy learning something and then … it ends up as a
four-coloured bar chart in Tableau?
So while we're on the topic … there's this thing called a Property
Graph that allows very flexible modeling of your data and would
happily take good care of your newly learned fact ...
54. One of the reasons I've been using the Graph Data Science library
right from the start (back when it was still called algo) is that it can
write back the results to the database.
Unsure who originally thought of that (I suspect it was by accident),
but it was a stroke of genius. And in order to corroborate that, I
have to talk about ...
55. Did you know about this monopartite and bipartite stuff? And how
it relates to analytics? I mean, know before you heard about it
today and had it spelled out to you?
All of you did? Wow … I'm super impressed now ...
What has been impressing customers ever since we have Graph
Data Science is the unfailing (golden) combination of similarity
followed by community detection.
Similarity turns bipartite subgraphs into monopartite graphs.
Community detection then segments <whatever it is you want to
segment>. Kerching!
56. Has that become a not-PC sentence yet? It will soon no doubt ...
Writing similarity back (as a relationship) to a graph has some other
nice effects. Suddenly doing recommendations becomes a whole
lot easier. If you know (with a simple pointerhop) who is similar to
me … I'm sure you can find ways to tell me what I like.
Those relationships do clutter up the graph though. Wouldn't it be nice
if I could do the golden combination and only get the communities back
as properties?
57. It has taken a while to make my point but I wanted you to fully
understand why being able to mutate the in-memory workspace is
so useful. Now let us finish this session by putting it in practice ...
CALL gds.graph.create('house-bipartite',
['House','Person'],
{ BELONGS_TO: { type: 'BELONGS_TO', orientation: 'REVERSE'}});
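The follow-up slides are not in this extract; as a sketch of where this is heading (GDS 1.x syntax, assumed calls): similarity mutates the in-memory workspace, community detection then runs over the mutated relationships, and only the final communities ever need to be written back, so the transactional graph stays clean.

```cypher
// Sketch: the "golden combination" on the named bipartite graph.
// Step 1 - similarity turns the bipartite graph into a monopartite
// one inside the workspace, as SIMILAR relationships:
CALL gds.nodeSimilarity.mutate('house-bipartite', {
  mutateRelationshipType: 'SIMILAR',
  mutateProperty: 'score'
});

// Step 2 - community detection, restricted to the mutated
// SIMILAR relationships only:
CALL gds.louvain.stream('house-bipartite', {
  relationshipTypes: ['SIMILAR']
}) YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS name, communityId;
```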