Graph Analytics with ArangoDB


  1. 1. http://bit.ly/ArangoDBGraphAnalytics
  2. 2. tl;dr ● Graph Analytics: Answer questions from Graph Data ● Graph Embeddings and Graph Neural Networks: Learning Graphs ● Graph-based Machine Learning Metadata: Utilizing Graphs for Operating ML Infrastructure https://dzone.com/articles/graph-databases-machine-learning
  3. 3. Challenge...
  4. 4. Agenda ML Infrastructure & Metadata Graphs Graph Database Graph Analytics Graph Embeddings Graph Neural Networks Part 2
  5. 5. Jörg Schad, PhD ● @joerg_schad
  7. 7. This workshop... … is for you! Please share ● Expectations ● Questions ● Feedback ● Ask for breaks if needed ● … … is also virtual! ● Let us work together in these times!
  8. 8. Who are you? ● Background ● Expectations ● ...
  9. 9. This workshop... https://github.com/joerg84/Graph_Powered_ML_Workshop
  10. 10. Why should you care? https://towardsdatascience.com/predictions-and-hopes-for-graph-ml-in-2021-6af2121c3e3d
  11. 11. What problems can we solve? ● Graph Analytics: Answer questions from Graph - Community Detection - Recommendations - Centrality - Path Finding - Fraud Detection - Permission Management - ... ● Graph Embeddings and Graph Neural Networks: Learning Graphs - Node/Link Classification - Link Prediction - Classification of Graphs - ... ● Graph-based Machine Learning Metadata: Utilizing Graphs for Operating ML Infrastructure - Data Provenance - Audit Trails - Privacy (GDPR/CCPA) - ...
  12. 12. Agenda ML Infrastructure & Metadata Graphs Graph Database Graph Analytics Graph Embeddings I Graph Neural Networks
  13. 13. Graph Analytics with ArangoDB Graph Data Model ● Connections are first-class citizens ● Vertices and Edges ● Native or built on top of other data models
  14. 14. Graph Analytics with ArangoDB Graph Properties ● (un)directed ○ Facebook (undirected friendships) vs Twitter (directed follows) ● weighted ● Sparse/Dense ● (a)cyclic Graph Queries ● Traversals ● Search ● Graph Algorithms
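The properties on the slide above can be illustrated with a small sketch in plain Python. The edge list and the helper names `build_adjacency` and `density` are assumptions made for this example; they are not taken from the workshop notebooks.

```python
def build_adjacency(edges, directed):
    """Return an adjacency map: vertex -> {neighbor: weight}."""
    adj = {}
    for src, dst, weight in edges:
        adj.setdefault(src, {})[dst] = weight
        adj.setdefault(dst, {})  # make sure pure sinks show up as vertices
        if not directed:
            adj[dst][src] = weight  # undirected: store both directions
    return adj


def density(adj):
    """Share of possible edges that exist (0 = empty, 1 = complete).

    Undirected edges are stored twice, so the same formula covers both cases.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    stored = sum(len(neighbors) for neighbors in adj.values())
    return stored / (n * (n - 1))


# Made-up weighted edges: (source, target, weight).
edges = [("alice", "bob", 1.0), ("bob", "carol", 2.0)]

follows = build_adjacency(edges, directed=True)    # Twitter-style
friends = build_adjacency(edges, directed=False)   # Facebook-style

print("alice" in follows["bob"])   # a follow is not automatically mutual
print("alice" in friends["bob"])   # a friendship is
print(density(follows), density(friends))
```

A sparse graph has a density close to 0, a dense one close to 1; which representation is efficient in practice depends on that ratio.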
  15. 15. Optional Lab: Graphs & Properties https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graph_properties.ipynb
  16. 16. Graph Analytics with
  17. 17. Graph Databases
  18. 18. AQL - A Query Language That Feels Like Coding ● Common query language for all data models ● Aims to be human-readable ● Same language for all clients, no matter what programming language people use ● Easy to understand for anyone with an SQL background
      FOR c IN company
        FILTER c.name == @companyName
        FOR department IN 1..6 INBOUND c isPartOf
          RETURN {
            c: c.name,
            department: department.name,
            ordered: (
              FOR o IN orders
                FILTER o.contact == department.contact
                RETURN {date: o.date, amount: o.amount}
            )
          }
  19. 19. ArangoSearch ArangoSearch is a powerful search and similarity ranking engine natively integrated into ArangoDB. Combine search with any other data model.
      FOR d IN v_imdb
        SEARCH ANALYZER(d.description IN TOKENS('amazing action world alien sci-fi science documental', 'text_en') || BOOST(d.description IN TOKENS('galaxy', 'text_en'), 5), 'text_en')
        SORT BM25(d) DESC
        LIMIT 10
        FOR vertex, edge, path IN 1..1 INBOUND d imdb_edges
          FILTER path.edges[0].$label == "DIRECTED"
          RETURN DISTINCT { "director" : vertex.name, "movie" : d.title }
  20. 20. Property-Graph-Model ● Languages ○ Tinkerpop/Gremlin ○ Cypher ○ AQL ○ ... ● Example: Person name: Max, City location:, born_in year: 1984 --- RDF Triple Store ● subject, predicate, and object ● No internal structure of nodes/edges ● Languages ○ SPARQL ● Ontologies & Logic for Inference
  21. 21. https://w3c.github.io/rdf-star/ <<:bob foaf:age 23>> ex:certainty 0.9 . SELECT ?p ?a ?c WHERE { <<?p foaf:age ?a>> ex:certainty ?c . } Support - Convert to plain RDF (tool) - Optimized storage/processing - Conversion to PG (tool) Example: Max, Job1, start, end, employer
  22. 22. Lab: SPARQL https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Sparql.ipynb
  23. 23. Graph Modelling ● Edge Attribute: Person (name: Max), rated (rating: 5), Movie: Free Solo --- ● Vertex Attribute: Person (name: Max), gave, Rating (rating: 5), rated_by, Movie: Free Solo
  24. 24. Lab: Property Graph Queries https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graphs_Queries.ipynb
  25. 25. Graph Analytics with ArangoDB http://btimmermans.com/2017/12/11/machine-learning-overview/
  26. 26. (Graph) Analytics https://research.aimultiple.com/graph-analytics/
  27. 27. Why Graph?
  28. 28. Knowledge Graphs and Machine Learning
  29. 29. Graph Algorithms ● Search/Traversal ○ Find a node/edge ○ BFS/DFS (already covered) ● Pathfinding ○ How to get from a to b ● Centrality ○ What are the important nodes (e.g., influencers) in a network? ● Cycle Detection ○ Deadlock Detection ○ Network Analysis ● Community Detection ○ Are there subgroups?
  30. 30. Shortest Path ● Shortest Path ○ Dijkstra ○ Bellman-Ford ● K Shortest Paths ● Single-Source Shortest Path ● All-Pairs Shortest Path https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3
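As an illustration of single-source shortest paths, here is a minimal sketch of Dijkstra's algorithm using only the standard library. The example graph and the function name `dijkstra` are assumptions for this sketch; it requires non-negative edge weights (Bellman-Ford would be needed otherwise).

```python
import heapq


def dijkstra(adj, source):
    """Shortest distance from source to every reachable vertex.

    adj maps vertex -> list of (neighbor, non-negative weight).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for w, weight in adj.get(v, []):
            candidate = d + weight
            if candidate < dist.get(w, float("inf")):
                dist[w] = candidate
                heapq.heappush(heap, (candidate, w))
    return dist


# Made-up directed, weighted graph.
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
}
print(dijkstra(graph, "a"))
```

Here the direct edge a to c (weight 4) loses against the detour via b (1 + 2 = 3), which is exactly the situation a plain breadth-first search would get wrong on weighted graphs.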
  31. 31. Minimal Spanning Tree ● Network Broadcast/routing ● Image segmentation ● Algorithms ○ Prim’s algorithm ■ Extend from random start vertex ○ Kruskal’s algorithm ■ Keep choosing the cheapest edge as long as it doesn’t create a cycle https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3
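Kruskal's rule from the slide above ("cheapest edge that doesn't create a cycle") can be sketched with a union-find structure. The function name `kruskal` and the sample edges are assumptions for this illustration.

```python
def kruskal(vertices, edges):
    """Spanning tree (or forest) of minimal total weight.

    edges is a list of (weight, u, v) tuples of an undirected graph.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the set representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different components: no cycle is created
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Made-up example graph.
edges = [(1, "a", "b"), (3, "a", "c"), (2, "b", "c"), (4, "c", "d")]
mst = kruskal(["a", "b", "c", "d"], edges)
print(mst, sum(weight for weight, _, _ in mst))
```

The edge (3, a, c) is skipped because a and c are already connected via b when it is considered.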
  32. 32. Minimal Spanning Tree https://amortizedminds.wordpress.com/tag/algorithm-2/
  33. 33. Minimal Spanning Tree https://amortizedminds.wordpress.com/tag/algorithm-2/
  34. 34. Cycle Detection ● Deadlock Detection ● Network Analysis ● Algorithms ○ DFS ○ Floyd’s algorithm ■ tortoise and the hare algorithm ○ Brent’s algorithm ○ Johnson’s algorithm https://towardsdatascience.com/10-graph-algorithms-visually-explained-e57faa1336f3
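The DFS variant of cycle detection can be sketched as follows: a cycle exists if the search reaches a vertex that is still on the current search path. The function name `has_cycle` and the wait-for graph are made up for this example; the recursive form is only suitable for small graphs because of Python's recursion limit.

```python
def has_cycle(adj):
    """Detect a cycle in a directed graph with depth-first search.

    adj maps vertex -> list of successors.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(v):
        color[v] = GRAY
        for w in adj.get(v, []):
            state = color.get(w, WHITE)
            if state == GRAY:  # back edge to the current path: cycle
                return True
            if state == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color.get(v, WHITE) == WHITE and visit(v) for v in list(adj))


# Hypothetical wait-for graph: each process waits for a lock held by the next.
waits_for = {"p1": ["p2"], "p2": ["p3"], "p3": ["p1"]}
print(has_cycle(waits_for))  # a cycle here means a deadlock
```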
  35. 35. Community Detection ● Triangle Count ● (Strongly) Connected Components ○ Kosaraju’s algorithm ○ Tarjan’s algorithm ● Label Propagation ● Application ○ Social Networks ○ Clustering ○ … https://networkx.github.io/documentation/stable/reference/algorithms/community.html
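A compact sketch of Kosaraju's algorithm for strongly connected components: one DFS to record completion order, then a second DFS on the reversed graph. Function and variable names are assumptions for this example, and the recursion limits it to small graphs.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm. adj maps vertex -> list of successors."""
    vertices = set(adj) | {w for nbrs in adj.values() for w in nbrs}
    reverse = {v: [] for v in vertices}
    for v, nbrs in adj.items():
        for w in nbrs:
            reverse[w].append(v)

    # Pass 1: record vertices in order of DFS completion.
    order, seen = [], set()

    def finish(v):
        seen.add(v)
        for w in adj.get(v, []):
            if w not in seen:
                finish(w)
        order.append(v)

    for v in sorted(vertices):
        if v not in seen:
            finish(v)

    # Pass 2: DFS on the reversed graph, latest-finished vertices first.
    components, assigned = [], set()

    def collect(v, component):
        assigned.add(v)
        component.add(v)
        for w in reverse[v]:
            if w not in assigned:
                collect(w, component)

    for v in reversed(order):
        if v not in assigned:
            component = set()
            collect(v, component)
            components.append(component)
    return components


# Made-up graph with two cycles joined by a one-way edge c -> d.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
print(strongly_connected_components(graph))
```

The one-way edge from c to d keeps the two groups apart: d and e can be reached from a, b, c, but not the other way round.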
  36. 36. Topological Sort ● Applications ○ Dependencies ○ Scheduling ■ E.g., Makefiles
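For the Makefile use case, a topological order can be computed with Kahn's algorithm: repeatedly emit a target whose prerequisites are all done. The function name `topological_order` and the file names are hypothetical.

```python
from collections import deque


def topological_order(dependencies):
    """Kahn's algorithm. dependencies maps target -> list of prerequisites."""
    targets = set(dependencies) | {
        p for prerequisites in dependencies.values() for p in prerequisites
    }
    remaining = {t: len(set(dependencies.get(t, []))) for t in targets}
    dependents = {t: [] for t in targets}
    for target, prerequisites in dependencies.items():
        for p in set(prerequisites):
            dependents[p].append(target)

    # Start with everything that has no prerequisites.
    ready = deque(sorted(t for t in targets if remaining[t] == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for d in sorted(dependents[t]):
            remaining[d] -= 1
            if remaining[d] == 0:  # all prerequisites of d are built
                ready.append(d)
    if len(order) != len(targets):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical build rules in the style of a Makefile.
rules = {"app": ["main.o", "util.o"], "main.o": ["main.c"], "util.o": ["util.c"]}
print(topological_order(rules))
```

A topological order only exists for acyclic graphs, which is why the sketch raises an error when some targets can never become ready. Python 3.9+ also ships `graphlib.TopologicalSorter` for the same purpose.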
  37. 37. Maximum Flow
  38. 38. Centrality ● Degree Centrality ○ How many in/outgoing connections ● Closeness Centrality ○ Average closeness to all nodes ● Betweenness Centrality ○ Connecting subgroups ○ How often is a node on a shortest path ● PageRank ○ Transitive Influence https://www.arangodb.com/docs/stable/graphs-pregel.html#vertex-centrality
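Two of the centrality measures above can be sketched in a few lines for an unweighted graph. The normalizations used here (divide by n - 1; closeness as reachable vertices over summed distances) are one common convention and an assumption of this sketch; libraries and the ArangoDB Pregel implementations may normalize differently.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbors, normalized by the maximum possible (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / max(n - 1, 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Inverse of the average shortest-path distance to reachable vertices."""
    result = {}
    for source in adj:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Undirected path a - b - c, stored with both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(degree_centrality(graph))
print(closeness_centrality(graph))
```

In this path graph the middle vertex b scores highest on both measures, since it has the most connections and the shortest average distance to the others.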
  39. 39. https://networkx.github.io/ Graph ToolBox ● Load and store graphs ● Analyze network structure ● Build network models ● Design new network algorithms ● Visualize ● ...
  40. 40. (Optional) Lab: NetworkX https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/NetworkX.ipynb
  41. 41. Lab: Graph Algorithms https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graph_properties.ipynb
  42. 42. Graph Analytics with ArangoDB ● Fraud Detection ● Panama Papers ● Enterprise Hierarchies ● Permission Management ● Internet of Things ● Bill of Materials ● Representation Learning ● ...
  43. 43. https://blog.dgraph.io/post/recommendation/
  44. 44. https://www.independent.co.uk/arts-entertainment/films/features/films-best-watch-coronavirus-isolation-quarantine-movies-classic-greatest-essential-list-a9394006.html
  45. 45. User -Rates-> Movie
  46. 46. Collaborative Filtering: User -Rates-> Movie “Find highly rated movies, by people who also like movies I rated highly” 1. Find movies I rated with 5 stars 2. Find users who also rated these movies with 5 stars 3. Find additional movies rated 5 stars by those users
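The three steps above can be sketched over a plain list of rating edges. The function name `recommend`, the sample ratings, and the choice to skip movies the user has already rated are assumptions for this illustration; the lab notebook expresses the same idea as a graph query.

```python
def recommend(ratings, me, best=5):
    """ratings is a list of (user, movie, stars) edges."""
    my_movies = {m for u, m, s in ratings if u == me}
    # 1. Movies I rated with 5 stars.
    my_favorites = {m for u, m, s in ratings if u == me and s == best}
    # 2. Other users who also rated those movies with 5 stars.
    peers = {
        u for u, m, s in ratings
        if u != me and m in my_favorites and s == best
    }
    # 3. Additional movies those users rated 5 stars, counted per peer.
    counts = {}
    for u, m, s in ratings:
        if u in peers and s == best and m not in my_movies:
            counts[m] = counts.get(m, 0) + 1
    # Most frequently recommended first; ties broken alphabetically.
    return sorted(counts, key=lambda m: (-counts[m], m))


# Made-up ratings.
ratings = [
    ("me", "Free Solo", 5), ("me", "Alien", 3),
    ("ann", "Free Solo", 5), ("ann", "Meru", 5), ("ann", "Alien", 5),
    ("bob", "Free Solo", 5), ("bob", "Meru", 5), ("bob", "Dune", 5),
    ("cat", "Alien", 5), ("cat", "Heat", 5),
]
print(recommend(ratings, "me"))
```

In graph terms this is a three-hop traversal: from me along Rates edges to movies, back to users, and out again to their other movies.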
  47. 47. Lab: Graph Analytics https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Graph_Analytics.ipynb
  48. 48. Fraud Detection ● Bank Collection ● Branch Collection ● Customer Vertex Collection ● Account Vertex Collection ● Transaction Edge Collection ● AccountHolder Edge Collection
  49. 49. Lab: Fraud Detection https://colab.research.google.com/github/joerg84/Graph_Powered_ML_Workshop/blob/master/Fraud_Detection.ipynb
  50. 50. PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. https://en.wikipedia.org/wiki/PageRank
  51. 51. Goal: How likely is it that a random surfer ends up at a page? - Random walk across the link graph - Iteratively distributing rank to neighbouring nodes https://en.wikipedia.org/wiki/PageRank https://stanford.edu/~rezab/classes/cme323/S15/notes/lec8.pdf
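The iterative rank distribution can be sketched as a simple power iteration. The damping factor 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the tiny example web are assumptions of this sketch, not details from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration. links maps page -> list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # the surfer starts anywhere
    for _ in range(iterations):
        # Rank sitting on pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Every page passes an equal share of its rank along each link.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Made-up link graph: page c receives links from a, b and d.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The ranks form a probability distribution over pages, matching the random-surfer reading: with probability `damping` the surfer follows a link, otherwise jumps to a random page. Each iteration only needs a vertex's rank and its outgoing links, so the computation also maps onto vertex-centric frameworks.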
  52. 52. https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
  53. 53. https://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
  54. 54. Lab: Pregel https://colab.research.google.com/github/arangodb/interactive_tutorials/blob/master/notebooks/Pregel.ipynb
  55. 55. Thanks for listening! Reach out with Feedback/Questions! • @arangodb • https://www.arangodb.com/ • docker pull arangodb • https://www.udemy.com/course/getting-started-with-arangodb/
