Presentation of Neo4j and Graph Algorithms.

Published in:
Engineering

- 1. About Me: Max De Marzi, Neo4j Field Engineer • github.com/maxdemarzi (about 200 public repositories) • maxdemarzi.com • @maxdemarzi
- 2. Technical Experience Doesn’t Matter 75% 50% 95%
- 3. All that matters: You go home thinking about graphs
- 4. Property Graph Model: It’s super simple. All you get is: • Nodes • Relationships • Properties
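
The three building blocks named on this slide can be sketched in a few lines of Python. This is only an illustration of the model, not how Neo4j stores data: the class names `Node` and `Relationship` are hypothetical, and the sample data (Dan, Ann, LOVES) is borrowed from the Cypher slide later in the deck.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """A node carries zero or more labels and a bag of properties."""
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A relationship has a type, a direction (start -> end), and properties."""
    start: Node
    end: Node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
loves = Relationship(dan, ann, "LOVES")
```

Everything else in the deck (traversals, recommendations, fraud rings) is built from these three pieces.
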
- 5. What you (probably) already know:
- 6. The Problem • Joins are executed every time you query the relationship • Executing a Join means searching for a key • B-Tree Index: O(log(n)) • More Data = More Searches, Slower Performance • Your data grows by 10x, your time goes up by one step on each Join
- 7. Same Data, Different Layout No more Tables, no more Foreign Keys, no more Joins
- 8. Relational Databases can’t handle Relationships • Degraded Performance: Speed plummets as data grows and as the number of joins grows • Wrong Language: SQL was built with Set Theory in mind, not Graph Theory • Not Flexible: New types of data and relationships require schema redesign • Wrong Model: They cannot model or store relationships without complexity
- 9. NoSQL Databases can’t handle Relationships • Degraded Performance: Speed plummets as you try to join data together in the application • Wrong Languages: Lots of wacky “almost SQL” languages that are terrible at “joins” • Not ACID: Eventually Consistent means Eventually Corrupt • Wrong Model: They cannot model or store relationships without complexity
- 10. What does this mean?
- 11. Doubly Linked List Relationships
- 12. What’s Our Secret Sauce?
- 13. • Fixed Sized Records • “Joins” on Creation • Spin Spin Spin through this data structure • Pointers instead of Lookups
- 14. Real Time Query Performance: Remains steady as database grows • Neo4j is 1000x faster • Reduces minutes to milliseconds [Chart: Response Time versus Connectedness and Size of Data Set, comparing Neo4j with Relational and Other NoSQL Databases, ranging from “0 to 2 hops, 0 to 3 degrees, thousands of connections” up to “tens to hundreds of hops, thousands of degrees, billions of connections”]
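
The four points on this slide can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions, not Neo4j's actual store format: the names `NodeRecord`, `RelRecord`, and `Store` are hypothetical, and the chains here are singly linked where the deck describes doubly linked lists. What it tries to show is that each relationship is wired into the chains of both of its nodes at creation time, so a traversal follows stored offsets instead of searching an index.

```python
from dataclasses import dataclass
from typing import Iterator, List

NO_REL = -1  # sentinel: end of a relationship chain


@dataclass
class NodeRecord:
    first_rel: int = NO_REL  # offset of the most recently added relationship


@dataclass
class RelRecord:
    start: int
    end: int
    rel_type: str
    start_next: int = NO_REL  # next relationship in the start node's chain
    end_next: int = NO_REL    # next relationship in the end node's chain


class Store:
    """Records live in plain lists, so an id is simply an offset."""

    def __init__(self) -> None:
        self.nodes: List[NodeRecord] = []
        self.rels: List[RelRecord] = []

    def add_node(self) -> int:
        self.nodes.append(NodeRecord())
        return len(self.nodes) - 1

    def add_rel(self, start: int, end: int, rel_type: str) -> int:
        # The "join" happens here, once: the new record is pushed onto the
        # front of both nodes' chains.
        rel_id = len(self.rels)
        self.rels.append(RelRecord(
            start, end, rel_type,
            start_next=self.nodes[start].first_rel,
            end_next=self.nodes[end].first_rel,
        ))
        self.nodes[start].first_rel = rel_id
        self.nodes[end].first_rel = rel_id
        return rel_id

    def relationships(self, node: int) -> Iterator[RelRecord]:
        # "Spin" through the chain: each step is a direct offset, no lookup.
        rel_id = self.nodes[node].first_rel
        while rel_id != NO_REL:
            rel = self.rels[rel_id]
            yield rel
            rel_id = rel.start_next if rel.start == node else rel.end_next

    def degree(self, node: int) -> int:
        return sum(1 for _ in self.relationships(node))
```

In this sketch the cost of visiting a node's relationships depends on how many relationships that node has, not on how many records the store holds overall.
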
- 15. But not for every query: I don’t know the average height of all Hollywood actors, but I do know the Six Degrees of Kevin Bacon
- 16. Reimagine your Data as a Graph: Agile, High Performance and Scalable without Sacrifice • Better Performance: Query relationships in real time • Right Language: Cypher was purpose built for Graphs • Flexible and Consistent: Evolve your schema seamlessly while keeping transactions • Right Model: Graphs simplify how you think
- 17. Modeling
- 18. Graphs are Whiteboard Friendly: Just draw stuff and “voilà”, there is your data model
- 19. Movie Property Graph Some Models are Easy
- 20. Should Roles be their own Node? Some Models are Easy but not for all Questions
- 21. How do you model Flight Data?
- 22. Airport Nodes with Flying To Relationships How do you model Flight Data?
- 23. Maybe Flight should be its own Node? How do you model Flight Data?
- 24. Don’t we care about Flights only on particular Days? How do you model Flight Data?
- 25. What is this trick with the date in the relationship type? How do you model Flight Data?
- 26. We don’t need Airports if we model this way! How do you model Flight Data?
- 27. Let’s get Creative
- 28. Group Destinations together! How do you model Flight Data?
- 29. OMG WAT! How do you model Flight Data?
- 30. Do not try and bend the data. That’s impossible.
- 31. If they can do it, you can do it! How do you model Comic Books?
- 32. More Modeling
- 33. Cloning Twitter: Building a News Feed • 9:00 am @hipster This is what I had for breakfast! <Insert Image of squirrel food> • 8:30 am @neo4j Automated tweet telling me about Graph Connect 2017 in NYC on Oct 23-24 • 8:12 am @ex-coworker Stuff I no longer care about. • 8:03 am @someguy Inspirational Quote of the Day
- 34. How do others do it? Cloning Twitter
- 35. How do others do it? Cloning Twitter
- 36. The Wrong Way Modeling a Twitter Feed
- 37. A Better Way Modeling a Twitter Feed
- 38. Bigger Model Modeling a Twitter Feed
- 39. Neo4j Secret Sauce Yet Again • Fixed Sized Records • “Joins” on Creation • Spin Spin Spin through this data structure • Pointers instead of Lookups
- 40. MAKE THE QUERIES SCALE …and the database scales with them. …and that’s why we don’t make any money.
- 41. SCALING OUT IS IN FASHION But when your model and your query match you don’t have to.
- 42. getDegree is your Friend
- 43. This is Java. What happened to Cypher?
- 44. Java Core API
- 45. Easy to Learn (no really) Java Core API • Step by Step from GraphDatabaseService • Start a transaction (reads and writes) • findNode(Label, Property, Value) • findNodes(Label, Property, Value) • findNodes(Label) • getNodeById(Long) • getRelationships(Direction, Type) • getProperty(Property, (optional) Default Value)
- 46. Get friends of a User Java Core API
- 47. Traversal API
- 48. Interesting to Learn Traversal API • Start with the Simple Defaults (order, relationships, depth, uniqueness, etc) • Custom Expanders • Where should I go next • Custom Evaluators • I’ve gone there… should I accept this path?
- 49. Example Traversal API
- 50. Cypher
- 51. Cypher: Powerful and Expressive Query Language MATCH (:Person {name:"Dan"})-[:LOVES]->(:Person {name:"Ann"}) [Diagram: two nodes, each with the label Person and a name property (Dan, Ann), joined by a LOVES relationship]
- 52. Express Complex Queries Easily with Cypher: Find all direct reports and how many people they manage, up to 3 levels down MATCH (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report) WHERE boss.name = "John Doe" RETURN sub.name AS Subordinate, count(report) AS Total [Side-by-side comparison: Cypher Query and SQL Query]
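
To make the variable-length pattern in this query concrete, here is a rough Python equivalent over a plain dictionary. It is a sketch under assumptions: `manages` is a hypothetical mapping from a manager's name to a list of direct reports, the hierarchy is assumed to be a tree (no cycles), and the helper names `within` and `report_counts` are made up for illustration. As in the Cypher version, a subordinate with nobody beneath them produces no row.

```python
from typing import Dict, List


def within(manages: Dict[str, List[str]], start: str,
           min_hops: int, max_hops: int) -> List[str]:
    """People reachable from start in min_hops..max_hops MANAGES steps."""
    found: List[str] = []
    frontier = [start]
    for hops in range(max_hops + 1):
        if hops >= min_hops:
            found.extend(frontier)
        # Expand one level further down the hierarchy.
        frontier = [r for m in frontier for r in manages.get(m, [])]
    return found


def report_counts(manages: Dict[str, List[str]], boss: str) -> Dict[str, int]:
    """Mirror of the query: subordinate name -> number of reports below them."""
    totals: Dict[str, int] = {}
    for sub in within(manages, boss, 0, 3):      # [:MANAGES*0..3]
        count = len(within(manages, sub, 1, 3))  # [:MANAGES*1..3]
        if count:
            totals[sub] = totals.get(sub, 0) + count
    return totals
```

The `*0..3` bound is why the boss appears in the result as their own "subordinate": zero hops matches the starting node itself.
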
- 53. Cypher Stored Procedures
- 54. Combine any APIs Cypher Stored Procedures
- 55. Switch to Decision Tree Deck
- 56. Use Cases
- 57. Understanding User Behavior • Events • Metrics • Targeting • Searching • Purchase History
- 58. Learn from the Experts • Alex Beutel, CMU • Leman Akoglu, Stony Brook • Christos Faloutsos, CMU • Graph-Based User Behavior Modeling: From Prediction to Fraud Detection • http://www.cs.cmu.edu/~abeutel/kdd2015_tutorial/
- 59. User Behavior Challenges • How can we understand normal user behavior?
- 60. User Behavior Challenges • How can we understand normal user behavior? • How can we find suspicious behavior?
- 61. User Behavior Challenges • How can we understand normal user behavior? • How can we find suspicious behavior? • How can we distinguish the two?
- 62. Does your little girl like Rambo?
- 63. Demographics: Age
- 64. Demographics: Gender
- 65. Do Little Girls like Movies other Little Girls Like?
- 66. Yes! Little Girls like Movies other Little Girls Like
- 67. What do Little Girls Like? MATCH (u:User)-[r:RATED]->(m:Movie) WHERE u.age = 1 AND u.gender = "F" AND r.stars > 3 RETURN m.title, COUNT(r) AS cnt ORDER BY cnt DESC LIMIT 10
- 68. What do Little Girls Like?
- 69. What do Men 25-34 Like? MATCH (u:User)-[r:RATED]->(m:Movie) WHERE u.age = 25 AND u.gender = "M" AND r.stars > 3 RETURN m.title, COUNT(r) AS cnt ORDER BY cnt DESC LIMIT 10
- 70. What do Men 25-34 Like?
- 71. Modeling “Normal” Behavior • Predict Edges (Similar Users)
- 72. Modeling “Normal” Behavior • Predict Edges (Movies I should Watch)
- 73. What Rating should I give 101 Dalmatians? MATCH (me:User {id:1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(:User)-[r3:RATED]->(m2:Movie {title:"101 Dalmatians"}) WHERE ABS(r1.stars-r2.stars) <= 1 RETURN AVG(r3.stars)
- 74. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes • Clustering and Community Detection
- 75. Predict a Star Rating purely on Demographics MATCH (u:User)-[r:RATED]->(m:Movie {title:"Toy Story"}) WHERE u.age = 1 AND u.gender = "F" RETURN AVG(r.stars)
- 76. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes • Clustering and Community Detection • Fraud Detection
- 77. First-Party Fraud
- 78. First-Party Fraud • Fraudster’s aim: apply for lines of credit, act normally, extend credit, then…run off with it • Fabricate a network of synthetic IDs, aggregate smaller lines of credit into substantial value • Often a hidden problem since only banks are hit • Whereas third-party fraud involves customers whose identities are stolen • More on that later…
- 79. So what? • Tens of billions of dollars lost by banks every year • 25% of the total consumer credit write-offs in the USA • Around 20% of unsecured bad debt in E.U. and N.A. is misclassified • In reality it is first-party fraud
- 80. Fraud Ring
- 81. Then the fraud happens… • Revolving doors strategy • Money moves from account to account to provide legitimate transaction history • Banks duly increase credit lines • Observed responsible credit behavior • Fraudsters max out all lines of credit and then bust out
- 82. … and the Bank loses • Collections process ensues • Real addresses are visited • Fraudsters deny all knowledge of synthetic IDs • Bank writes off debt • Two fraudsters can easily rack up $80k • Well organized crime rings can rack up many times that
- 83. Probable Cohabiters Query MATCH (p1:Person)-[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p2:Person) WHERE p1 <> p2 RETURN DISTINCT p1
- 84. Probably Non-Fraudulent Cohabiters
- 85. Probable Cohabiters Query MATCH (p1:Person)-[:HOLDS|LIVES_AT*]->() <-[:HOLDS|LIVES_AT*]-(p2:Person) WHERE p1 <> p2 RETURN DISTINCT p1 The Star (*) means keep going.
- 86. Dodgy-Looking Chain
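
The "keep going" idea behind the starred query can be sketched as a breadth-first search over people and the identifiers they share. This is an illustration under assumptions, not the deck's implementation: `links` is a hypothetical list of `(person, identifier)` pairs (an address, an account, a phone number), links are treated as undirected, and `find_rings` is a made-up name.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Tuple


def find_rings(links: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group people who are chained together through shared identifiers."""
    adj = defaultdict(set)
    people = set()
    for person, identifier in links:
        p, i = ("person", person), ("id", identifier)
        adj[p].add(i)
        adj[i].add(p)
        people.add(p)

    seen = set()
    rings = []
    for start in sorted(people):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = set()
        while queue:
            kind, name = current = queue.popleft()
            if kind == "person":
                members.add(name)
            for nxt in adj[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(members) > 1:  # a lone person is not a ring
            rings.append(sorted(members))
    return rings
```

Two people sharing one address are probably just cohabiters; a long chain of people linked through a mix of addresses, phones, and accounts is the pattern worth a second look.
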
- 87. How does Neo4j fit with traditional fraud prevention? http://www.gartner.com/newsroom/id/1695014 Gartner’s Layered Fraud Prevention Approach
- 88. Two Sides of the Same Coin Recommendations • Add the relationship that does not exist Fraud Detection • Find the relationships that should not exist
- 89. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior
- 90. Recommendation Engines
- 91. Hello World Recommendation
- 92. Hello World Recommendation
- 93. Movie Data Model
- 94. Cypher Query: Movie Recommendation MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- (p2) -[r2:RATED]-> (unseen:Movie) WHERE r1.rating > 7 AND r2.rating > 7 AND p2.gender = "female" AND p2.age < 35 AND watched.genres = unseen.genres AND NOT( (p:Person) -[:RATED|WATCHED]-> (unseen) ) AND p.username IN ["maxdemarzi","janedoe","jamesdean"] RETURN unseen.title, COUNT(*) ORDER BY COUNT(*) DESC LIMIT 25 What are the Top 25 Movies • that I haven't seen • with the same genres as Toy Story • given high ratings • by women under 35 who liked Toy Story
- 95. Let’s try k-nearest neighbors (k-NN) Cosine Similarity
- 96. Cypher Query: Ratings of Two Users MATCH (p1:Person {name:'Michael Sherman'}) -[r1:RATED]-> (m:Movie), (p2:Person {name:'Michael Hunger'}) -[r2:RATED]-> (m:Movie) RETURN m.name AS Movie, r1.rating AS `M. Sherman's Rating`, r2.rating AS `M. Hunger's Rating` What are the Movies these 2 users have both rated
- 97. Cypher Query: Ratings of Two Users Calculating Cosine Similarity
- 98. Cypher Query: Cosine Similarity MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person) WITH SUM(x.rating * y.rating) AS xyDotProduct, SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength, SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength, p1, p2 MERGE (p1)-[s:SIMILARITY]-(p2) SET s.similarity = xyDotProduct / (xLength * yLength) Calculate it for all Person nodes with at least one Movie between them
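
The arithmetic inside this query is easier to follow outside Cypher. The sketch below assumes each person's ratings are a hypothetical `{movie: rating}` dictionary and, like the query, uses only the movies both people have rated for the dot product and for both vector lengths. The function name `cosine_similarity` is made up for illustration.

```python
import math
from typing import Dict, Optional


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> Optional[float]:
    """Cosine of the angle between two rating vectors over co-rated movies."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # no movie in common, so no SIMILARITY relationship
    dot = sum(a[m] * b[m] for m in shared)                # xyDotProduct
    len_a = math.sqrt(sum(a[m] ** 2 for m in shared))     # xLength
    len_b = math.sqrt(sum(b[m] ** 2 for m in shared))     # yLength
    if len_a == 0 or len_b == 0:
        return None  # all-zero ratings leave the angle undefined
    return dot / (len_a * len_b)
```

Because ratings are non-negative, the result falls between 0 and 1, with 1 meaning the two people rated their shared movies in exactly the same proportions.
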
- 99. Movie Data Model
- 100. Cypher Query: k-NN Recommendation MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'}) WHERE NOT( (p) -[:RATED|WATCHED]-> (m) ) WITH m, s.similarity AS similarity, r.rating AS rating ORDER BY m.name, similarity DESC WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation ORDER BY recommendation DESC RETURN movie, recommendation LIMIT 25 What are the Top 25 Movies • that Zoltan Varju has not seen • using the average rating • by my top 3 neighbors
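
The same k-nearest-neighbors steps can be written out in Python. This is a sketch under assumptions: `ratings` maps each person to `{movie: rating}`, `similarity` maps a person to `{neighbor: score}` (standing in for the SIMILARITY relationships), and ties in the final ordering are broken by title, which the original query does not specify. The function name `knn_recommend` is hypothetical.

```python
from typing import Dict, List, Tuple


def knn_recommend(ratings: Dict[str, Dict[str, float]],
                  similarity: Dict[str, Dict[str, float]],
                  user: str, k: int = 3, limit: int = 25) -> List[Tuple[str, float]]:
    """Score unseen movies by the mean rating of the k most similar raters."""
    seen = ratings.get(user, {})
    candidates: Dict[str, List[Tuple[float, float]]] = {}
    for neighbor, score in similarity.get(user, {}).items():
        for movie, rating in ratings.get(neighbor, {}).items():
            if movie not in seen:  # WHERE NOT (p)-[:RATED|WATCHED]->(m)
                candidates.setdefault(movie, []).append((score, rating))

    scored = []
    for movie, pairs in candidates.items():
        pairs.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        top = [rating for _, rating in pairs[:k]]           # COLLECT(rating)[0..3]
        scored.append((movie, sum(top) / len(top)))

    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]
```
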
- 101. Graph Algorithms
- 102. Centralities • PageRank • ArticleRank • Betweenness Centrality (a) • Closeness Centrality (b) • Harmonic Centrality (e) • Eigenvector Centrality (c) • Degree Centrality (d)
- 103. Community Detection • Louvain • Label Propagation • Connected Components • Strongly Connected Components • Triangle Counting/Clustering Coefficient • Balanced Triads
- 104. Louvain
- 105. Path Finding • Minimum Weight Spanning Tree • Shortest Path • Single Source Shortest Path • All Pairs Shortest Path • A* • Yen’s K-Shortest Paths • Random Walk
- 106. Similarity • Jaccard Similarity • Cosine Similarity • Pearson Similarity • Euclidean Distance • Overlap Similarity
- 107. Link Prediction • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocation • Same Community • Total Neighbors
- 108. Common Neighbors
- 109. Adamic Adar • Builds on Common Neighbors • Instead of just Count…compute: the Sum of the Inverse Log of the degree of each Neighbor • See “Friends and neighbors on the Web” by Lada A. Adamic and Eytan Adar
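
Both scores from the last two slides fit in a few lines of Python. The sketch assumes an undirected graph given as a hypothetical `adj` dictionary mapping each node to the set of its neighbors, and uses the natural logarithm; the function names are made up for illustration.

```python
import math
from typing import Dict, Set


def common_neighbors(adj: Dict[str, Set[str]], x: str, y: str) -> Set[str]:
    """Nodes adjacent to both x and y."""
    return adj.get(x, set()) & adj.get(y, set())


def adamic_adar(adj: Dict[str, Set[str]], x: str, y: str) -> float:
    """Sum of 1 / log(degree) over the common neighbors of x and y."""
    score = 0.0
    for z in common_neighbors(adj, x, y):
        degree = len(adj[z])
        if degree > 1:  # log(1) == 0 would divide by zero
            score += 1.0 / math.log(degree)
    return score
```

A shared neighbor with few connections contributes more than a hub, which is the refinement over simply counting common neighbors.
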
- 110. Sub Graph Features
- 111. Ego-net Patterns
- 112. Ego-net Patterns • Ni: number of neighbors of ego i • Ei: number of edges in egonet i • Wi: total weight of egonet i • λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
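
The first two features defined on this slide can be computed directly from an adjacency structure. The sketch below assumes an undirected, unweighted graph given as a hypothetical `adj` dictionary of neighbor sets with symmetric entries; it returns Ni and Ei only, leaving out Wi and the eigenvalue, which need edge weights and a linear algebra library. The function name `egonet_features` is made up for illustration.

```python
from typing import Dict, Set, Tuple


def egonet_features(adj: Dict[str, Set[str]], ego: str) -> Tuple[int, int]:
    """Return (N_i, E_i): neighbor count and edge count of the ego's egonet."""
    neighbors = set(adj.get(ego, ())) - {ego}
    members = neighbors | {ego}
    # Every edge inside the egonet is seen from both of its endpoints.
    endpoint_count = sum(
        len((set(adj.get(node, ())) - {node}) & members) for node in members
    )
    return len(neighbors), endpoint_count // 2
```

Ei always lies between Ni (a star, where neighbors do not know each other) and Ni(Ni+1)/2 (a clique, where everyone is connected), so the ratio between the two features says something about how tightly knit a node's surroundings are.
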
- 113. Power Law Density [Plot: reference lines with slope=1, slope=1.35, and slope=2]
- 114. Power Law Weight
- 115. Power Law Eigenvalue
- 116. Why? Why are we doing all this?
- 117. Extract Features from Graph • One of the first steps in machine learning from graphs is to extract graph features. Structure of Data >> Data
- 118. Deep Neural Networks for Bank Fraud (2015) https://www.youtube.com/watch?v=TAer-PeIypI Fraud Detection starts about half-way (after intro)
- 119. What else?
- 120. Graph Sage http://snap.stanford.edu/graphsage/
- 121. SEAL: Link Prediction Based on Graph Neural Networks
- 122. Motifs Link Prediction via Higher-Order Motif Features
- 123. Motifs Link Prediction via Higher-Order Motif Features
- 124. Knowledge Graphs Explainable Reasoning over Knowledge Graphs for Recommendation
- 125. Knowledge Graphs Explainable Reasoning over Knowledge Graphs for Recommendation
- 126. Knowledge Graphs Explainable Reasoning over Knowledge Graphs for Recommendation
- 127. Indirect Relationships
- 128. Connecting unconnected Things indirectly
- 129. What are the Top 10 Jobs for me • that are in the same location I’m in • for which I have the necessary qualifications
- 130. Partial Subgraph Search
- 131. Graph Search
- 132. Multiple Dimensions: Don’t use SOLR Facets for this! • Age • Size • Features • Property • Cost
- 133. Multiple Dimensions Java Audio Book! What about Publisher? What about Author? What about Publication Year? What about Java Version? What About…. Left parentheses, n, right parentheses, semi-colon!
- 134. Bucket or Group Values if you have to Discrete Values for Each Dimension
- 135. Nodes for Discrete Dimensional Values Dimensional Model *Use Named Relationship Types instead of HAS
- 136. Catalogs
- 137. Who remembers this? • Stupid Glasses • Loud Pants • Skate Boards • Neon Colors
- 138. Look at how thick they were, even back in 1902! It’s a Sears Catalogue!
- 139. Ares Predator Street Samurai Catalog
- 140. With free two day shipping! Cypher Version of the Catalog
- 141. A tree is a simple graph A Tree of Data
- 142. Promotions: So fast, it’s not even funny. • About 2-4M traversals per second per core • Traversing a 50-level tree UP costs practically nothing.
- 143. and many more use cases!
- 144. Thank You!
