Transforming AI with Graphs: Real World Examples using Spark and Neo4j

Graphs, which capture the relationships, connections, and topology of data points, are transforming machine learning. We’ll walk through real-world examples of how to transform your tabular data into a graph and how to get started with graph AI. This talk gives an overview of how to incorporate graph-based features into traditional machine learning pipelines, how to create graph embeddings that better describe your graph topology, and a preview of approaches to graph-native learning using graph neural networks. We’ll cover relevant real-world case studies in financial crime detection, recommendations, and drug discovery. The talk is intended to introduce graph-based AI to beginners and to help practitioners understand new techniques and applications. Key takeaways: how graph data can improve machine learning, when graphs are relevant to data science applications, and what graph-native learning is and how to get started.


  2. 2. Alicia Frame, PhD Senior Data Scientist, Neo4j Transforming AI with Graphs: Real World Examples with Spark & Neo4j #UnifiedDataAnalytics #SparkAISummit
  3. 3. Graph Data Science Applications: Financial Services, Drug Discovery, Recommendations, Cybersecurity, Predictive Maintenance, Customer Segmentation, Churn Prediction, Search/MDM.
  4. 4. Labeled Property Graphs. Nodes: • Can have labels to classify nodes • Labels have native indexes. Relationships: • Relate nodes by type and direction. Properties: • Attributes of nodes and relationships • Stored as name/value pairs • Can have indexes and composite indexes. Example diagram: two PERSON nodes (name: “Dan”, born: May 29, 1970, twitter: “@dan”; name: “Ann”, born: Dec 5, 1975) and a CAR node (brand: “Volvo”, model: “V70”; Latitude: 37.5629900°, Longitude: -122.3255300°), with relationships MARRIED TO, LIVES WITH, DRIVES, and OWNS, and a relationship property since: Jan 10, 2011.
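As a rough illustration of the labeled property graph model on this slide, here is a minimal in-memory sketch in plain Python. The `Node`, `Relationship`, and `outgoing` names are hypothetical helpers, not Neo4j or Spark APIs, and which relationship connects which pair of nodes (and that `since` belongs to DRIVES) is an assumption, since the transcript does not preserve the diagram layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    labels: Set[str]                                   # labels classify the node
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                        # relationships are directed
    type: str                                          # and typed
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


def outgoing(node: Node, rel_type: str, rels: List[Relationship]) -> List[Node]:
    """Return the end nodes of relationships of one type leaving `node`."""
    return [r.end for r in rels if r.start is node and r.type == rel_type]


dan = Node({"PERSON"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"PERSON"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"CAR"}, {"brand": "Volvo", "model": "V70"})

# Assumed wiring of the slide's relationship types (illustrative only).
relationships = [
    Relationship(dan, "MARRIED_TO", ann),
    Relationship(ann, "LIVES_WITH", dan),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]

driven_by_dan = outgoing(dan, "DRIVES", relationships)
```

Properties live on both nodes and relationships as name/value pairs, which is the part of the model that a plain edge list cannot express.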
  5. 5. • Current data science models ignore network structure • Graphs add highly predictive features to existing ML models • Otherwise unattainable predictions based on relationships Novel & More Accurate Predictions with the Data You Already Have Machine Learning Pipeline
  6. 6. “The idea is that graph networks are bigger than any one machine-learning approach. Graphs bring an ability to generalize about structure that the individual neural nets don't have.” “Where do the graphs come from that graph networks operate over?”
  7. 7. Building a Graph ML Model. Data Sources (Parquet, JSON, and more…): Aggregate Disparate Data and Cleanse. Native Graph Platform: Unify Graphs and Engineer Features. Machine Learning (MLlib and more…): Build Predictive Models.
  8. 8. Example: Spark & Neo4j Workflow. Spark Graph: Cypher 9 in Spark 3.0 to create non-persistent graphs. Native Graph Platform (Graph Transactions, Graph Analytics): Native Graph Algorithms, Processing, and Storage. Machine Learning: MLlib to Train Models.
  9. 9. Explore Graphs: • Massively scalable • Powerful data pipelining • Robust ML libraries • Non-persistent, non-native graphs. Build Graph Solutions: • Persistent, dynamic graphs • Graph native query and algorithm performance • Constantly growing list of graph algorithms and embeddings.
  10. 10. The Steps of Graph Data Science: Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks. Axis: Data Science Complexity. Groups: Knowledge Graphs, Graph Feature Engineering, Graph Native Learning, Graph Persistence.
  11. 11. Steps Forward in Graph Data Science: Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks. Axes: Enterprise Maturity, Data Science Complexity.
  12. 12. Query based knowledge graphs: Connecting the Dots at NASA “Using Neo4j someone from our Orion project found information from the Apollo project that prevented an issue, saving well over two years of work and one million dollars of taxpayer funds.”
  13. 13. Steps Forward in Graph Data Science: Query Based Knowledge Graph, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks, Query Based Feature Engineering. Axes: Enterprise Maturity, Data Science Complexity.
  14. 14. Query-Based Feature Engineering: Telecom churn prediction. Telecommunication networks are easily represented as graphs. Churn prediction research has found that simple hand-engineered features are highly predictive: • How many calls/texts has an account made? • How many of their contacts have churned?
  15. 15. Query-Based Feature Engineering: Telecom churn prediction. Add connected features based on graph queries to tabular data (Khan et al, 2015).
  16. 16. Knowledge Graphs: Getting Started Example with Spark. Spark Graph: • Merge distributed data into DataFrames • Reshape your tables into graphs • Explore Cypher queries. Native Graph Platform (Graph Transactions, Graph Analytics): • Move to Neo4j to build expert queries • Persist your graph. Machine Learning: • Bring query based graph features to ML pipeline.
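The two churn questions from the query-based feature engineering slides (how many calls an account has made, and how many of its contacts have churned) can be sketched in plain Python as follows. This is an illustrative stand-in for a graph query, not the Cypher or Spark code used in the talk; the call records, account names, and the `connected_features` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical call records as (caller, callee) pairs.
calls = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
# Hypothetical set of accounts that have already churned.
churned = {"c", "d"}


def connected_features(calls, churned):
    """Build one feature row per account from its call relationships."""
    calls_made = defaultdict(int)
    contacts = defaultdict(set)
    for caller, callee in calls:
        calls_made[caller] += 1
        # A contact is anyone the account called or was called by.
        contacts[caller].add(callee)
        contacts[callee].add(caller)
    return {
        account: {
            "calls_made": calls_made.get(account, 0),
            "churned_contacts": len(neighbours & churned),
        }
        for account, neighbours in contacts.items()
    }


features = connected_features(calls, churned)
```

Each row of `features` can then be joined to the existing tabular data by account id, which is the "add connected features to tabular data" step the slide describes.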
  17. 17. Steps Forward in Graph Data Science: Query Based Feature Engineering, Graph Embeddings, Graph Neural Networks, Query Based Knowledge Graph, Graph Algorithm Feature Engineering. Axes: Enterprise Maturity, Data Science Complexity.
  18. 18. Feature Engineering is how we combine and process the data to create new, more meaningful features, such as clustering or connectivity metrics. Graph Feature Engineering Add More Descriptive Features: - Influence - Relationships - Communities
  19. 19. Graph Feature Categories & Algorithms. Pathfinding & Search: finds the optimal paths or evaluates route availability and quality. Centrality / Importance: determines the importance of distinct nodes in the network. Community Detection: detects group clustering or partition options. Heuristic Link Prediction: estimates the likelihood of nodes forming a relationship. Similarity: evaluates how alike nodes are. Embeddings: learned representations of connectivity or topology.
  20. 20. Financial Crime: Detecting Fraud. Large financial institutions already have existing pipelines to identify fraud via heuristics and models. Graph based features improve accuracy: • Connected components to identify disjointed graphs sharing identifiers • PageRank to measure influence and transaction volumes • Louvain to identify communities that frequently interact • Jaccard to measure account similarity based on relationships
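Two of the fraud features listed on this slide are small enough to sketch in plain Python: connected components (here via union-find) to group accounts that share identifiers, and Jaccard similarity between two accounts' sets of related nodes. The account and identifier names are made up for illustration, and this is not the Neo4j implementation.

```python
def connected_components(edges):
    """Group nodes into components using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we walk up
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical account-to-identifier links.
links = [
    ("acct1", "phone:555"),
    ("acct2", "phone:555"),
    ("acct2", "email:x"),
    ("acct3", "email:y"),
]
components = connected_components(links)
similarity = jaccard({"acct2", "acct3"}, {"acct3", "acct4"})
```

Here `acct1` and `acct2` fall into the same component because they share a phone number, which is the kind of signal a row-by-row tabular model does not see.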
  21. 21. +142,000 peer-reviewed publications on graph fraud / anomaly detection in the last 10 years
  22. 22. Graph Feature Engineering: Getting Started Example with Spark. Spark Graph: • Merge distributed data into DataFrames • Reshape your tables into graphs • Explore Cypher queries and simple algorithms. Native Graph Platform (Graph Transactions, Graph Analytics): • Persist your graph • Create rule based features • Run native graph algorithms and write to graph or stream. Machine Learning: • Bring graph features to ML pipeline for training.
  23. 23. Graph Algorithms in Neo4j (neo4j.com/docs/graph-algorithms/current/). Pathfinding & Search: • Parallel Breadth First Search • Parallel Depth First Search • Shortest Path • Single-Source Shortest Path • All Pairs Shortest Path • Minimum Spanning Tree • A* Shortest Path • Yen’s K Shortest Path • K-Spanning Tree (MST) • Random Walk. Centrality / Importance: • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality. Community Detection: • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity – 1 Step & Multi-Step • Balanced Triad (identification). Similarity: • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity. Link Prediction: • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors.
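To give a feel for what one of the centrality algorithms in this list computes, here is a minimal PageRank sketch using power iteration in plain Python. It is a textbook formulation under assumed defaults (damping 0.85, 50 iterations, dangling rank spread evenly), not the Neo4j library's implementation, and the three-node edge list is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                # Each node splits its rank equally among its out-links.
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical graph: a and b both point at c, and c points back at a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
```

The resulting score per node is a single numeric column, which is why algorithm outputs like this slot easily into an existing feature table.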
  25. 25. Steps Forward in Graph Data Science: Query Based Knowledge Graph, Graph Algorithm Feature Engineering, Graph Neural Networks, Query Based Feature Engineering, Graph Embeddings. Axes: Enterprise Maturity, Data Science Complexity.
  26. 26. Graph Embeddings. Embedding transforms graphs into a vector, or set of vectors, describing topology, connectivity, or attributes of nodes and edges in the graph. • Vertex embeddings: describe connectivity of each node • Path embeddings: traversals across the graph • Graph embeddings: encode an entire graph into a single vector
  27. 27. Explainable Reasoning over Knowledge Graphs for Recommendation Graph Embeddings - Recommendations
  29. 29. Graph Feature Engineering: Getting Started Example with Spark. Spark Graph: • Merge distributed data into DataFrames • Reshape your tables into graphs • Explore Cypher queries and simple algorithms. Native Graph Platform (Graph Transactions, Graph Analytics): • Move to Neo4j to build expert queries • Write to persist • Stay tuned for DeepWalk and DeepGL algorithms. Machine Learning: • Bring graph features to ML pipeline for training.
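The slides mention DeepWalk without showing how it works. As a hedged sketch, the first stage of a DeepWalk-style vertex embedding is generating short random walks from every node; those walks are then treated like sentences and fed to a skip-gram model, which is omitted here. The adjacency list, parameter defaults, and `random_walks` helper below are illustrative assumptions, not the Neo4j implementation.

```python
import random


def random_walks(adjacency, walk_length=5, walks_per_node=2, seed=0):
    """Generate fixed-length uniform random walks starting from every node."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical undirected graph stored as adjacency lists.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(adjacency)
```

Nodes that appear in similar walk contexts end up with similar vectors once a skip-gram model is trained on `walks`, which is how a vertex embedding comes to describe the connectivity of each node.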
  30. 30. Steps Forward in Graph Data Science: Query Based Knowledge Graph, Graph Algorithm Feature Engineering, Query Based Feature Engineering, Graph Neural Networks, Graph Embeddings. Axes: Enterprise Maturity, Data Science Complexity.
  31. 31. Deep Learning refers to training multi-layer neural networks using gradient descent Graph Native Learning
  32. 32. Graph Native Learning. Graph Native Learning refers to deep learning models that take a graph as input, perform computations, and return a graph (Battaglia et al, 2018).
  33. 33. Graph Native Learning. Example: electron path prediction (Bradshaw et al, 2019). Given reactants and reagents, what will the products be?
  34. 34. Example: electron path prediction Graph Native Learning
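The "graph in, computation, graph out" shape of graph native learning can be illustrated with a single message-passing step in plain Python. This is a deliberately simplified sketch: a fixed neighbour average stands in for the learned update functions that a real graph neural network would train with gradient descent, and the `message_passing_step` helper, `self_weight` parameter, and toy graph are all assumptions for illustration.

```python
def message_passing_step(features, edges, self_weight=0.5):
    """One round of neighbour averaging: takes a graph, returns a graph."""
    neighbours = {node: [] for node in features}
    for a, b in edges:  # edges are treated as undirected
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, own in features.items():
        if neighbours[node]:
            # Aggregate: mean of neighbour feature vectors, per dimension.
            aggregated = [
                sum(features[m][i] for m in neighbours[node]) / len(neighbours[node])
                for i in range(len(own))
            ]
        else:
            aggregated = own
        # Update: blend the node's own features with the aggregated message.
        updated[node] = [
            self_weight * o + (1 - self_weight) * a
            for o, a in zip(own, aggregated)
        ]
    return updated, edges


# Hypothetical path graph a - b - c with a one-dimensional feature per node.
features = {"a": [1.0], "b": [0.0], "c": [0.0]}
edges = [("a", "b"), ("b", "c")]

step1, _ = message_passing_step(features, edges)
step2, _ = message_passing_step(step1, edges)
```

After one step the signal on `a` reaches `b`, and after two steps it reaches `c`, which shows how stacking layers lets information travel further across the graph structure.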
  35. 35. Progressing in Graph Data Science: Query Based Knowledge Graph, Query Based Feature Engineering, Graph Algorithm Feature Engineering, Graph Embeddings, Graph Neural Networks. Axes: Enterprise Maturity, Data Science Complexity. Groups: Knowledge Graphs, Graph Feature Engineering, Graph Native Learning, Graph Persistence.
  36. 36. Resources. Business: • neo4j.com/use-cases/artificial-intelligence-analytics/ Data Scientists/Developers: • neo4j.com/sandbox • neo4j.com/developer/ • community.neo4j.com Contact: alicia.frame@neo4j.com, @aliciaframe1. #UnifiedAnalytics #SparkAISummit