Neo4j: What's Under the Hood & How Knowing This Can Help You

Speaker: Philip Rathle, VP of Products, Neo4j

  1. Neo4j: What’s Under the Hood. How knowing this can help you. Philip Rathle, VP of Product Management (@prathle)
  2. Today’s Takeaways: (1) choose the right technology tool for the job; (2) solve intractable problems: (Business) <--> (IT); (3) identify new business opportunities.
  3. (Perspectives)-[:Shape]->(Understanding)
  4. #1: A Historical Perspective
  5. The RDBMS Era: data management in 1979. Paper forms; tiny RAM; spinning platters (low capacity, slow sequential I/O); the RDBMS relational model. Confidential - Neo4j, Inc.
  6. A New Graph Era Emerging: data management today. Dynamic real-world systems; abundant RAM; flash and I/O co-processors (high-capacity storage and ultra-fast random I/O); the Neo4j property graph model; real-time connected data.
  7. #2: An IT Portfolio Perspective
  8. IT Portfolio Perspective. Traditional databases: store and retrieve data; real-time storage & retrieval; max # of hops: up to 3.
  9. IT Portfolio Perspective. Traditional databases: store and retrieve data; real-time storage & retrieval; max # of hops: up to 3. Big data technology: aggregate and filter data; long-running queries; aggregation & filtering; max # of hops: 1.
  10. IT Portfolio Perspective. Traditional databases: store and retrieve data; real-time storage & retrieval; max # of hops: up to 3. Big data technology: aggregate and filter data; long-running queries; aggregation & filtering; max # of hops: 1. Connections in data: real-time connected insights; max # of hops: millions. “Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code.” (Volker Pacher, Senior Developer)
  11. Graph database & graph compute engine (graph transactions & analytics) spanning the range between RDBMS & aggregate-oriented NoSQL and Hadoop / EDW / columnar RDBMS. Illustration by David Somerville based on the original by Hugh McLeod (@gapingvoid).
  12. #3: A Technical Architecture Perspective: Core Technology Differences
  13. What Makes Neo4j Different? Index-Free Adjacency
  14. This Enables: “Minutes to Milliseconds” Real-Time Query Performance. (Chart: response time versus connectedness and size of data set.) Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections. Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections; a 1000x advantage, from “minutes to milliseconds”.
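A toy, in-memory sketch of what index-free adjacency means on slides 13 and 14, assuming each node object simply holds direct references to its neighbours. It is not Neo4j's on-disk record format, and the `Node` class and `within_hops` helper are hypothetical names. A multi-hop query follows stored pointers, so its cost tracks the neighbourhood it touches rather than the total size of the data set, whereas a relational engine resolves each hop through a join or an index lookup.

```python
class Node:
    """A node that stores direct references to its neighbours (index-free adjacency)."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct pointers to neighbouring Node objects, no global index

    def connect(self, other):
        self.out.append(other)


def within_hops(start, max_hops):
    """Return names of nodes first reached from `start` within 1..max_hops hops.

    Each hop only follows pointers stored on the nodes in the current frontier,
    so the work depends on the neighbourhood explored, not on how many nodes
    exist elsewhere in the graph.
    """
    seen = {start}
    frontier = [start]
    found = []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in node.out:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
                    found.append(neighbour.name)
        frontier = next_frontier
    return found


a, b, c, d = (Node(n) for n in "abcd")
a.connect(b)
a.connect(c)
b.connect(c)
c.connect(d)
print(within_hops(a, 2))  # ['b', 'c', 'd']
```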
  15. And Is Supported By: ACID Graph Writes, a Requirement for Graph Transactions. With ACID consistency: guaranteed graph consistency that maintains integrity over time. Non-‘graph-ACID’ DBMSs: not ‘good enough’ for graphs; they become corrupt over time.
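Why slide 15 stresses atomic graph writes can be sketched with a toy store that records each relationship on both endpoints; writing only one side would leave a dangling, one-directional link. This is a minimal illustration assuming an undo log for rollback. It covers atomicity only (no durability, isolation, or concurrency) and is not how Neo4j implements transactions; `GraphStore` and its methods are hypothetical.

```python
class GraphStore:
    """Toy store that records every relationship on BOTH endpoints."""

    def __init__(self):
        self.out = {}  # node -> set of nodes it points to
        self.inc = {}  # node -> set of nodes pointing to it

    def add_node(self, node):
        self.out.setdefault(node, set())
        self.inc.setdefault(node, set())

    def write_relationships(self, rels):
        """Create all relationships in `rels` or none of them (atomicity via an undo log)."""
        undo = []
        try:
            for src, dst in rels:
                if src not in self.out or dst not in self.inc:
                    raise KeyError(f"unknown endpoint in {src}->{dst}")
                if dst not in self.out[src]:
                    self.out[src].add(dst)  # outgoing side
                    self.inc[dst].add(src)  # incoming side, written in the same step
                    undo.append((src, dst))
        except Exception:
            # Roll back every relationship this call created, newest first
            for src, dst in reversed(undo):
                self.out[src].discard(dst)
                self.inc[dst].discard(src)
            raise

    def is_consistent(self):
        """True when every outgoing pointer has a matching incoming pointer and vice versa."""
        forward = all(src in self.inc[dst] for src, dsts in self.out.items() for dst in dsts)
        backward = all(dst in self.out[src] for dst, srcs in self.inc.items() for src in srcs)
        return forward and backward


store = GraphStore()
for n in ("x", "y", "z"):
    store.add_node(n)
store.write_relationships([("x", "y")])
try:
    store.write_relationships([("y", "z"), ("z", "missing")])  # second relationship fails
except KeyError:
    pass
print(store.out["y"], store.is_consistent())  # set() True
```

Because the failed call is rolled back as a whole, the half-finished `y -> z` relationship never becomes visible.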
  16. A Language for Connected Data: Cypher Query Language

      MATCH (boss)-[:MANAGES*0..3]->(sub),
            (sub)-[:MANAGES*1..3]->(report)
      WHERE boss.name = "John Doe"
      RETURN sub.name AS Subordinate, count(report) AS Total

      Project impact:
      • Less time writing queries: more time understanding the answers, leaving time to ask the next question.
      • Less time debugging queries: more time writing the next piece of code, and improved quality of the overall code base.
      • Code that’s easier to read: faster ramp-up for new project members, and improved maintainability & troubleshooting.
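To make the semantics of the slide 16 query concrete, here is a rough Python emulation on a hypothetical, tree-shaped org chart. `*0..3` includes the boss itself as a "subordinate"; `count(report)` counts matched rows, one per path, grouped by subordinate; subordinates with no reports produce no rows and therefore do not appear. The `MANAGES` data and helper names are illustrative assumptions. On graphs with cycles, real Cypher also enforces relationship uniqueness within a match, which this sketch does not.

```python
from collections import Counter

# Hypothetical org chart: manager -> direct reports (tree-shaped, no cycles)
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cal", "Dee"],
    "Cal": ["Eve"],
}


def path_ends(start, min_hops, max_hops):
    """Yield the end node of every MANAGES path from `start` with min_hops..max_hops hops.

    Mirrors a variable-length pattern such as [:MANAGES*1..3]: one value is
    yielded per path, just as Cypher produces one row per matched path.
    """
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        if depth >= min_hops:
            yield node
        if depth < max_hops:
            for report in MANAGES.get(node, []):
                stack.append((report, depth + 1))


def subordinate_totals(boss):
    """Rough equivalent of the slide's query for one boss name."""
    totals = Counter()
    for sub in path_ends(boss, 0, 3):           # (boss)-[:MANAGES*0..3]->(sub)
        for _report in path_ends(sub, 1, 3):    # (sub)-[:MANAGES*1..3]->(report)
            totals[sub] += 1                    # count(report), grouped by sub
    return dict(totals)


print(subordinate_totals("John Doe"))
```

For this data the result maps John Doe to 5, Ann to 3 and Cal to 1; Bob, Dee and Eve manage nobody, so they are absent, matching how the MATCH clause drops them.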
  17. Where in the Organization Do Graphs Add Value?
  18. The Connected Enterprise: Consumers of Connected Data (data scientists, business users, applications). AI & graph analytics: sentiment analysis; customer segmentation; machine learning; cognitive computing; community detection. Transactional graphs: fraud detection; real-time recommendations; network and IT operations management; knowledge graphs; master data management. Discovery & visualization: fraud detection; network and IT operations; product information management; risk and portfolio analysis.
  19. Neo4j Graph Database Platform
  20. Neo4j Graph Platform: development & administration; analytics tooling; graph analytics; graph transactions; data integration; discovery & visualization; drivers & APIs; AI.
  21. Neo4j Database Anatomy: a full-stack, native graph DB. Cypher query language (e.g. MATCH (a)-->(b)); procedures; programmatic language drivers; binary wire protocol; cost-based optimizer; native graph engine; role-based security; transaction logging / backup / recovery; management & monitoring; clustering; integrations.
  22. Neo4j Graph Database: Enterprise Features. Neo4j security foundation; multi-clustering support for global internet apps; rolling upgrades; schema constraints; concurrent/transactional write performance; auto cache reheating for restarts, restores and cluster expansion. Neo4j 3.4 now supports rolling upgrades (illustrated as 3.4 to 3.5): upgrade older instances while keeping other members stable and without requiring a restart of the environment.
  23. Neo4j 4.0: What’s Coming. Neo4j Fabric; schema-based security; multi-database; reactive drivers.
  24. Neo4j Desktop: A Neo4j Developer’s Toolchest • Mission control for developers • Connect to both local and remote Neo4j servers • Includes development license for Neo4j Enterprise Edition • Manages updates, graph apps, and add-ons • Free with registration: https://neo4j.com/download
  25. Working with Graphs
  26. Graphs & Data Science/AI
  27. Strictly Confidential. Graph Data Science Maturity Model (data science complexity versus predictive accuracy & richness): knowledge graphs (query-based knowledge graph); graph feature engineering (query-based feature engineering, graph algorithm feature engineering); graph native learning (graph embeddings, graph neural networks).
  28. #1: Knowledge Graphs. Maturity path (enterprise maturity versus data science complexity): query-based knowledge graph → query-based feature engineering → graph algorithm feature engineering → graph embeddings → graph neural networks.
  29. Query-Based Knowledge Graphs: Connecting the Dots. Multiple graph layers of financial information, including corporate data with cross-relationships, external news, and customized weighting. Dashboards and tools: • Credit risk • Investment risk • Portfolio news recommendations ... has become...
  30. Graph Data Science Maturity Levels 2 & 3: Graph-Enhanced Feature Engineering
  31. Definitions. Feature engineering: developing machine learning inputs that have predictive value. Graph-enhanced features: ML inputs that express information about the connections. Features can describe facts, or they can describe relationships.
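A minimal sketch of graph-enhanced features from slide 31: instead of facts about a node, each value describes the node's relationships. The three features chosen here (degree, triangle count, number of nodes exactly two hops away) are illustrative assumptions, not a prescribed feature set, and the function names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def relationship_features(adj, node):
    """A few ML inputs that describe a node's connections rather than its own facts."""
    neighbours = adj[node]
    ordered = sorted(neighbours)
    # Triangles: pairs of neighbours that are also connected to each other
    triangles = sum(
        1 for i, u in enumerate(ordered) for v in ordered[i + 1:] if v in adj[u]
    )
    # Nodes exactly two hops away: neighbours of neighbours, minus the node and its neighbours
    two_hop = set().union(*(adj[n] for n in neighbours)) - neighbours - {node}
    return {"degree": len(neighbours), "triangles": triangles, "two_hop": len(two_hop)}


adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(relationship_features(adj, "a"))  # {'degree': 2, 'triangles': 1, 'two_hop': 1}
```

Such a dictionary can be appended to the existing per-node feature row that an ML pipeline already consumes.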
  32. #2: Query-Based Feature Engineering (same maturity path as slide 28)
  33. Query-Based Feature Engineering: Mining Data for Drug Discovery. het.io - HetioNet: a knowledge graph integrating 50+ years of biomedical data, leveraged to predict new uses for drugs by using the graph topology to create features that predict new links.
  34. Query-Based Feature Engineering: Mining Data for Drug Discovery
  35. Query-Based Feature Engineering: Mining Data for Drug Discovery. Returning “all paths” reveals new avenues of study.
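One simple way to turn "all paths" into link-prediction features is to count the simple paths between a compound and a disease, grouped by length. The sketch below uses a hypothetical miniature graph. The HetioNet work used more refined, metapath-specific path metrics, so treat this plain path count as a simplified stand-in rather than that method.

```python
from collections import defaultdict


def count_paths(adj, source, target, max_len):
    """Count simple paths from source to target, grouped by length 1..max_len."""
    counts = {length: 0 for length in range(1, max_len + 1)}

    def walk(node, visited, length):
        if length == max_len:
            return
        for nxt in adj.get(node, ()):
            if nxt in visited:
                continue
            if nxt == target:
                counts[length + 1] += 1  # a path ends as soon as it reaches the target
            else:
                walk(nxt, visited | {nxt}, length + 1)

    walk(source, {source}, 0)
    return counts


# Hypothetical mini biomedical graph: compounds (C*), genes (G*), disease (D1)
edges = [("C1", "G1"), ("G1", "D1"), ("C1", "C2"), ("C2", "D1"), ("C1", "G2"), ("G2", "G1")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(count_paths(adj, "C1", "D1", 3))  # {1: 0, 2: 2, 3: 1}
```

The per-length counts form a small feature vector for the (C1, D1) pair; pairs with many short paths are natural candidates for a predicted "treats" link.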
  36. #3: Graph Algorithm Based Feature Engineering (same maturity path as slide 28)
  37. Graph Algorithms: Basis for Connected Features • Pathfinding and search: finds the optimal paths or evaluates route availability and quality • Centrality / importance: determines the importance of distinct nodes in the network • Community detection: detects group clustering or partition options • Heuristic link prediction: estimates the likelihood of nodes forming a relationship • Similarity: evaluates how alike nodes are • Embeddings: learned representations of connectivity or topology
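As one example of the centrality/importance family on slide 37, here is textbook PageRank by power iteration. It is a sketch under stated assumptions: a damping factor of 0.85, a fixed iteration count, dangling nodes spreading their rank evenly, and scores normalised to sum to 1. Neo4j's own PageRank procedure may scale its scores differently.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; scores sum to 1."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over every node
                for t in nodes:
                    new_rank[t] += damping * rank[v] / n
        rank = new_rank
    return rank


scores = pagerank({"A": ["B"], "B": ["A"], "C": ["B"]})
print(max(scores, key=scores.get))  # B
```

B ranks highest because it receives links from both A and C; C, with no incoming links, keeps only the baseline score.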
  38. Graph-Enhanced Feature Engineering Example: Detecting Financial Fraud. Add connected features to existing pipelines to increase detection accuracy & reduce false positives. Graph algorithms that might add value in this situation: • Connected components to identify disjointed graphs sharing identifiers • PageRank to measure influence and transaction volumes • Louvain to identify communities that frequently interact • Jaccard to measure account similarity
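Two of the fraud-detection ideas on slide 38 can be sketched together: connected components (here via union-find) group accounts that share identifiers, and Jaccard similarity, |A ∩ B| / |A ∪ B|, scores how alike two accounts' identifier sets are. The accounts, phones and emails below are hypothetical data, and the helper names are illustrative.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components with union-find (path halving)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical accounts and the identifiers (phones, emails) they use
identifiers = {
    "acct1": {"phone1", "email1"},
    "acct2": {"phone1", "email2"},
    "acct3": {"email3"},
}
edges = [(acct, ident) for acct, ids in identifiers.items() for ident in ids]
print(sorted(len(c) for c in connected_components(edges)))  # [2, 5]
print(jaccard(identifiers["acct1"], identifiers["acct2"]))    # 0.3333333333333333
```

acct1 and acct2 land in one component because they share phone1, and their Jaccard score of 1/3 could become one more input to an existing fraud model.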
  39. Graph Algorithms Available Today in Neo4j (https://neo4j.com/docs/graph-algorithms/current/)
      Pathfinding & Search: parallel breadth-first search; parallel depth-first search; shortest path; single-source shortest path; all pairs shortest path; minimum spanning tree; A* shortest path; Yen’s K shortest paths; K-spanning tree (MST); random walk
      Centrality / Importance: degree centrality; closeness centrality; closeness centrality variations (Harmonic, Dangalchev, Wasserman & Faust); betweenness centrality; approximate betweenness centrality; PageRank; personalized PageRank; ArticleRank; eigenvector centrality
      Community Detection: triangle count; clustering coefficients; connected components (Union Find); strongly connected components; label propagation; Louvain modularity (1-step & multi-step); balanced triad (identification)
      Similarity: Euclidean distance; cosine similarity; Jaccard similarity; overlap similarity; Pearson similarity
      Link Prediction: Adamic Adar; common neighbors; preferential attachment; resource allocation; same community; total neighbors
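The neighbourhood-based link prediction scores listed on slide 39 follow standard formulas, sketched here for an undirected graph, where N(x) is the neighbour set of x. Adamic Adar sums 1 / ln(degree(z)) over common neighbours z; resource allocation sums 1 / degree(z); common neighbors is |N(x) ∩ N(y)|; preferential attachment is |N(x)| · |N(y)|; total neighbors is |N(x) ∪ N(y)|. "Same community" is omitted because it needs community labels from a separate algorithm. The graph and function names are hypothetical, and this is not Neo4j's implementation.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def link_prediction_scores(adj, x, y):
    """Classic neighbourhood-based link prediction scores for distinct nodes x and y."""
    n_x, n_y = adj[x], adj[y]
    common = n_x & n_y
    return {
        # Any common neighbour of two distinct nodes has degree >= 2, so ln(degree) > 0
        "adamic_adar": sum(1.0 / math.log(len(adj[z])) for z in common),
        "common_neighbors": len(common),
        "preferential_attachment": len(n_x) * len(n_y),
        "resource_allocation": sum(1.0 / len(adj[z]) for z in common),
        "total_neighbors": len(n_x | n_y),
    }


adj = build_adjacency([("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")])
scores = link_prediction_scores(adj, "a", "b")
print(scores["common_neighbors"], scores["preferential_attachment"])  # 2 4
```

Higher scores suggest that a and b are more likely to become connected; in practice these values are fed to a classifier as features rather than thresholded directly.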
  40. Data Engineering: Bringing It All Together. Data lake; graph DB platform (graph transactions, graph analytics); machine learning platform. Morpheus: explore with Cypher; reshape & ship tables into graphs; more tools coming in Spark 3.0 with SparkCypher. Graph DB platform: persist relevant data as a graph; manage knowledge graphs; run connected feature extraction; carry out graph analytics; bring query-based graph features to the ML pipeline.
  41. Learning Resources: Neo4j Desktop; Neo4j Browser; Graph Algo Book (neo4j.com/graph-algorithms-book/); Cypher & algo documentation and examples.
  42. Neo4j Community & Ecosystem
  43. GraphTour Chicago Sponsors! Neo4j Community & Ecosystem
  44. Thanks!
  45. Four Pillars of Graph-Enhanced AI: 1. Knowledge graphs: context for decisions. 2. Connected feature extraction: context for accuracy. 3. Graph-accelerated AI: context for efficiency. 4. AI explainability: context for credibility.
  46. Machine Learning Example: Spark and Neo4j Workflow. Spark Graph: Cypher 9 in Spark 3.0 to create non-persistent graphs (Morpheus); MLlib to train models. Native graph platform: native graph algorithms, processing, and storage (graph transactions, graph analytics).
  47. Explore graphs: massively scalable; powerful data pipelining; robust ML libraries; non-persistent, non-native graphs. Build graph solutions: persistent, dynamic graphs; graph-native query and algorithm performance; a constantly growing list of graph algorithms and embeddings.
  48. The Path to Graph Data Science (same maturity path as slide 28)
  49. Graph Embeddings. Embedding transforms graphs into a feature vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph. • Vertex/node embeddings: describe the connectivity of each node • Path embeddings: traversals across the graph • Graph embeddings: encode an entire graph into a single vector. (Figure: phases of the DeepWalk approach.)
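The first phase of a DeepWalk-style node embedding generates truncated random walks from every node. The second phase, not shown here, treats each walk as a "sentence" and trains a skip-gram (word2vec-style) model so that nodes appearing in similar walks get similar vectors. This sketch assumes uniform neighbour sampling, a fixed walk length and a seeded generator for reproducibility; the ring graph and function name are hypothetical.

```python
import random


def random_walks(adj, walks_per_node, walk_length, seed=0):
    """Phase 1 of a DeepWalk-style embedding: truncated uniform random walks."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(adj[walk[-1]])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical 4-node ring: a - b - c - d - a
ring = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
for walk in random_walks(ring, walks_per_node=2, walk_length=5)[:3]:
    print(walk)
```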
  50. Graph Embeddings for Recommendations: Explainable Reasoning over Knowledge Graphs for Recommendations. (Figure: recommendations for Derek from a knowledge graph linking users Derek and Tony, the songs Castle on the Hill, I See Fire and Shape of You, the album ÷, Ed Sheeran, and the genres Pop and Folk through Interact, SungBy, IsSingerOf, Produce and WrittenBy relationships.)
  51. Deep Learning: training multi-layer neural networks using gradient descent. (Figure: input, hidden layers, output.)
  52. Graph Native Learning refers to deep learning models that take a graph as input, perform computations, and return a graph (Battaglia et al., 2018).
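A minimal "graph in, graph out" computation in the spirit of slide 52 is one message-passing layer. Each node's features are combined with the mean of its neighbours' features and passed through a ReLU, and the graph structure is returned alongside the updated features. This is a simplified sketch: the two weights are fixed scalars chosen for illustration, whereas real graph neural networks learn weight matrices by gradient descent. Battaglia et al.'s graph network framework can also update edge and global attributes, which this sketch does not. All names are hypothetical.

```python
def message_passing_layer(adj, features, w_self=0.5, w_neigh=0.5):
    """One graph-in, graph-out layer.

    For every node v: h'_v = ReLU(w_self * h_v + w_neigh * mean(h_u for u in N(v))).
    The adjacency is passed through unchanged; only node features are updated.
    """
    new_features = {}
    for v, h in features.items():
        neighbours = adj.get(v, set())
        if neighbours:
            mean = [sum(features[u][i] for u in neighbours) / len(neighbours) for i in range(len(h))]
        else:
            mean = [0.0] * len(h)
        new_features[v] = [max(0.0, w_self * x + w_neigh * m) for x, m in zip(h, mean)]
    return adj, new_features


# Hypothetical path graph a - b - c with one feature per node
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
features = {"a": [1.0], "b": [0.0], "c": [3.0]}
print(message_passing_layer(adj, features)[1])  # {'a': [0.5], 'b': [1.0], 'c': [1.5]}
```

Stacking several such layers lets information flow across multiple hops, which is what makes the output depend on graph topology and not only on each node's own features.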
  53. The Path to Graph Data Science (enterprise delivery versus data science complexity): knowledge graphs (query-based knowledge graph); graph feature engineering (query-based feature engineering, graph algorithm feature engineering); graph native learning (graph embeddings, graph neural networks); graph persistence.
  54. Thank You! @prathle
  55. A Word on Graph Query Languages. openCypher: most graph database applications use Cypher; an industry-shared open initiative since 2015 that makes Cypher available to databases & tools (http://www.opencypher.org). ISO GQL: a formal ISO language standard, in progress; a sibling to SQL, expected to be highly compatible with Cypher (https://www.gqlstandards.org).
