Fraud Detection Class Slides

Neo4j Fraud Detection Training Class

Published in: Technology

  1. 1. Graphs in Fraud Detection Max De Marzi Field Engineer, Neo4j @maxdemarzi
  2. 2. About Me • Max De Marzi - Neo4j Field Engineer • My Blog: http://maxdemarzi.com • Find me on Twitter: @maxdemarzi • Email me: maxdemarzi@gmail.com • GitHub: http://github.com/maxdemarzi
  3. 3. Overview • Types of Fraud: Credit Card Fraud, First-Party Fraud, Synthetic Identities and Fraud Rings, Insurance Fraud • Types of Analysis: Traditional Analysis, Graph-Based Analysis • Fraud Detection and Prevention • Common Questions
  4. 4. …but before we get into that … • What isn’t Fraud?
  5. 5. I don’t know, but I know who does • Alex Beutel, CMU • Leman Akoglu, Stony Brook • Christos Faloutsos, CMU • Graph-Based User Behavior Modeling: From Prediction to Fraud Detection • http://www.cs.cmu.edu/~abeutel/kdd2015_tutorial/
  6. 6. User Behavior Challenges • How can we understand normal user behavior?
  7. 7. User Behavior Challenges • How can we understand normal user behavior? • How can we find suspicious behavior?
  8. 8. User Behavior Challenges • How can we understand normal user behavior? • How can we find suspicious behavior? • How can we distinguish the two?
  9. 9. Users
  10. 10. Does your little girl like Rambo?
  11. 11. Personalization
  12. 12. Understanding our Users • What do we know about them?
  13. 13. Demographics: Age
  14. 14. Demographics: Gender
  15. 15. Understanding our Users MATCH (u:User)-[r:RATED]->(m:Movie)
 RETURN u.gender, u.age, 
 COUNT (DISTINCT u) AS user_cnt, 
 COUNT (DISTINCT m) AS mov_cnt, 
 COUNT(r) AS rtg_cnt
  16. 16. Understanding our Users
  17. 17. Understanding our Users MATCH (me:User {id:1}) -[r1:RATED]-> (m:Movie) 
 <-[r2:RATED]- (similar_users:User)
 WHERE ABS(r1.stars-r2.stars) <= 1 
 RETURN similar_users.gender, 
 similar_users.age, 
 COUNT(DISTINCT similar_users) AS user_cnt, 
 COUNT(r2) AS rtg_cnt
  18. 18. Understanding our Users
  19. 19. Little Girls like Movies other Little Girls Like
  20. 20. Little Girls like Movies other Little Girls Like
  21. 21. What do Little Girls Like? MATCH (u:User)-[r:RATED]->(m:Movie)
 WHERE u.age = 1 AND u.gender = "F" AND r.stars > 3
 RETURN m.title, COUNT(r) AS cnt
 ORDER BY cnt DESC
 LIMIT 10
  22. 22. What do Little Girls Like?
  23. 23. What do Men 25-34 Like? MATCH (u:User)-[r:RATED]->(m:Movie)
 WHERE u.age = 25 AND u.gender = "M" AND r.stars > 3
 RETURN m.title, COUNT(r) AS cnt
 ORDER BY cnt DESC
 LIMIT 10
  24. 24. What do Men 25-34 Like?
  25. 25. Modeling “Normal” Behavior • Predict Edges
 (Similar Users)
  26. 26. Modeling “Normal” Behavior • Predict Edges
 (Movies I should Watch)
  27. 27. Recommendation Engine with Neo4j Recommendation
  28. 28. Content Based Recommendations • Step 1: Collect Item Characteristics • Step 2: Find similar Items • Step 3: Recommend Similar Items • Example: Similar Movie Genres
  29. 29. There is more to life than Romantic Zombie-coms
  30. 30. Collaborative Filtering Recommendations • Step 1: Collect User Behavior • Step 2: Find similar Users • Step 3: Recommend Behavior taken by similar users • Example: People with similar musical tastes
  31. 31. You are so original!
  32. 32. Using Relationships for Recommendations • Content-based filtering: recommend items based on what users have liked in the past • Collaborative filtering: predict what users like based on the similarity of their behaviors, activities and preferences to others • (diagram labels: Movie, Person, Person, RATED rating: 7, SIMILARITY value: .92)
  33. 33. Hybrid Recommendations • Combine the two for better results • Like Peanut Butter and Jelly
  34. 34. Hello World Recommendation
  35. 35. Hello World Recommendation X
  36. 36. Movie Data Model
  37. 37. Cypher Query: Movie Recommendation
 MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
 WHERE r1.rating > 7 AND r2.rating > 7
 AND watched.genres = unseen.genres
 AND NOT( (:Person {username:"maxdemarzi"}) -[:RATED]-> (unseen) )
 RETURN unseen.title, COUNT(*)
 ORDER BY COUNT(*) DESC
 LIMIT 25
 What are the Top 25 Movies • that I haven't seen • with the same genres as Toy Story • given high ratings • by people who liked Toy Story
  38. 38. Let’s try k-nearest neighbors (k-NN) Cosine Similarity
  39. 39. Cypher Query: Ratings of Two Users
 MATCH (p1:Person {name:'Michael Sherman'}) -[r1:RATED]-> (m:Movie),
 (p2:Person {name:'Michael Hunger'}) -[r2:RATED]-> (m:Movie)
 RETURN m.name AS Movie,
 r1.rating AS `M. Sherman's Rating`, r2.rating AS `M. Hunger's Rating`
 Which Movies have these 2 users both rated?
  40. 40. Cypher Query: Ratings of Two Users Calculating Cosine Similarity
  41. 41. Cypher Query: Cosine Similarity
 MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person)
 WITH SUM(x.rating * y.rating) AS xyDotProduct,
 SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
 SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
 p1, p2
 MERGE (p1)-[s:SIMILARITY]-(p2)
 SET s.similarity = xyDotProduct / (xLength * yLength)
 Calculate it for all Person nodes with at least one Movie between them
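The cosine similarity query above works only on the movies two people have both rated. A minimal Java sketch of the same arithmetic outside the database might look like the following. It assumes each person's ratings are available as a map from movie title to rating; the class and method names (`CosineSimilaritySketch`, `cosine`) are illustrative and not part of Neo4j or the original deck.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: cosine similarity over co-rated movies. */
final class CosineSimilaritySketch {

    /**
     * Returns dot(a, b) / (|a| * |b|), using only movies rated by both users,
     * which mirrors the MATCH pattern of the Cypher query.
     * Returns 0.0 when there is no overlap or a vector has zero length.
     */
    static double cosine(Map<String, Double> ratingsA, Map<String, Double> ratingsB) {
        double dot = 0.0;
        double sumSqA = 0.0;
        double sumSqB = 0.0;
        for (Map.Entry<String, Double> entry : ratingsA.entrySet()) {
            Double other = ratingsB.get(entry.getKey());
            if (other == null) {
                continue; // movie not rated by both users
            }
            double a = entry.getValue();
            double b = other;
            dot += a * b;
            sumSqA += a * a;
            sumSqB += b * b;
        }
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
    }

    public static void main(String[] args) {
        // Hypothetical ratings, invented for this example.
        Map<String, Double> a = new HashMap<>();
        a.put("Toy Story", 8.0);
        a.put("Heat", 6.0);
        Map<String, Double> b = new HashMap<>();
        b.put("Toy Story", 4.0);
        b.put("Heat", 3.0);
        b.put("Casino", 9.0); // ignored: only b rated it
        System.out.println(cosine(a, b)); // prints 1.0 (ratings are proportional)
    }
}
```

Because only shared movies enter the calculation, two people with a single movie in common always score 1.0, so a minimum-overlap threshold may be worth considering before trusting the score.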
  42. 42. Movie Data Model (v2)
  43. 43. Cypher Query: Your nearest neighbors
 MATCH (p1:Person {name:'Grace Andrews'}) -[s:SIMILARITY]- (p2:Person)
 WITH p2, s.similarity AS sim
 RETURN p2.name AS Neighbor, sim AS Similarity
 ORDER BY sim DESC
 LIMIT 5
 Who are the • top 5 Persons and their similarity score • ordered by similarity in descending order • for Grace Andrews
  44. 44. Your nearest neighbors
  45. 45. Cypher Query: k-NN Recommendation
 MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
 WHERE NOT( (p) -[:RATED]-> (m) )
 WITH m, s.similarity AS similarity, r.rating AS rating
 ORDER BY m.name, similarity DESC
 WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
 WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / SIZE(ratings) AS recommendation
 ORDER BY recommendation DESC
 RETURN movie, recommendation
 LIMIT 25
 What are the Top 25 Movies • that Zoltan Varju has not seen • using the average rating • by his top 3 neighbors
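The k-NN query above can be read as: for each unseen movie, take the ratings from the most similar neighbors who rated it, keep up to three, and average them. A rough Java sketch of that step follows, under the assumption that neighbors and their similarity scores are already known. `KnnRecommendationSketch`, `Neighbor` and `predict` are hypothetical names for illustration, and sorting the result and limiting it to 25 is left to the caller.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: average the ratings of the k most similar neighbors. */
final class KnnRecommendationSketch {

    /** A neighbor's similarity to the target user and that neighbor's ratings by movie title. */
    static final class Neighbor {
        final double similarity;
        final Map<String, Double> ratings;

        Neighbor(double similarity, Map<String, Double> ratings) {
            this.similarity = similarity;
            this.ratings = ratings;
        }
    }

    /**
     * For every movie the target user has not seen, averages the ratings given
     * by up to k of the most similar neighbors who rated that movie.
     */
    static Map<String, Double> predict(List<Neighbor> neighbors, Set<String> seen, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<Neighbor> bySimilarity = new ArrayList<>(neighbors);
        bySimilarity.sort(Comparator.comparingDouble((Neighbor n) -> n.similarity).reversed());

        // Ratings per unseen movie, collected in descending similarity order and capped at k.
        Map<String, List<Double>> topRatings = new HashMap<>();
        for (Neighbor neighbor : bySimilarity) {
            for (Map.Entry<String, Double> entry : neighbor.ratings.entrySet()) {
                if (seen.contains(entry.getKey())) {
                    continue;
                }
                List<Double> ratings =
                        topRatings.computeIfAbsent(entry.getKey(), key -> new ArrayList<>());
                if (ratings.size() < k) {
                    ratings.add(entry.getValue());
                }
            }
        }

        Map<String, Double> predictions = new HashMap<>();
        for (Map.Entry<String, List<Double>> entry : topRatings.entrySet()) {
            double sum = 0.0;
            for (double rating : entry.getValue()) {
                sum += rating;
            }
            predictions.put(entry.getKey(), sum / entry.getValue().size());
        }
        return predictions;
    }
}
```

Calling `predict(neighbors, seenTitles, 3)` corresponds to the `[0..3]` slice in the query; a movie rated by fewer than three neighbors is averaged over however many ratings exist.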
  46. 46. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes
 (Age, Gender, etc) Age: 35 Age: ?
  47. 47. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes
 (Rating)
  48. 48. What Rating should I give 101 Dalmatians? MATCH (me:User {id:1})-[r1:RATED]->(m:Movie)
 <-[r2:RATED]-(:User)-[r3:RATED]->
 (m2:Movie {title:"101 Dalmatians"})
 WHERE ABS(r1.stars-r2.stars) <=1
 RETURN AVG(r3.stars)
  49. 49. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes • Clustering and Community Detection
  50. 50. Predict a Star Rating purely on Demographics MATCH (u:User)-[r:RATED]->(m:Movie {title:"Toy Story"})
 WHERE u.age = 1 AND u.gender = "F" 
 RETURN AVG(r.stars)
  51. 51. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes • Clustering and Community Detection • Fraud Detection
  52. 52. Two Sides of the Same Coin Recommendations • Add the relationship that does not exist Fraud Detection • Find the relationships that should not exist
  53. 53. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior
  54. 54. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior • Rough Model of normal vs outlier
  55. 55. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior • Fine-grained models can find more subtle outliers
  56. 56. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior • Complex models can capture normal and abnormal patterns
  57. 57. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior • Known fraudulent patterns can be searched for directly
  58. 58. Credit Card Fraud
  59. 59. Cross Reference
  60. 60. Find the Nodes
 // card, phone, address and ip hold the values being cross-referenced
 ArrayList<Node> nodes = new ArrayList<Node>();
 nodes.add(db.findNode(Labels.CC, "number", card));
 nodes.add(db.findNode(Labels.Phone, "number", phone));
 nodes.add(db.findNode(Labels.Email, "address", address));
 nodes.add(db.findNode(Labels.IP, "address", ip));
  61. 61. Add the Crosses
 for (Node node : nodes) {
   // One counter per kind of identifier linked to this node
   HashMap<String, AtomicInteger> crosses = new HashMap<String, AtomicInteger>();
   crosses.put("CCs", new AtomicInteger(0));
   crosses.put("Phones", new AtomicInteger(0));
   crosses.put("Emails", new AtomicInteger(0));
   crosses.put("IPs", new AtomicInteger(0));
   for (Relationship relationship : node.getRelationships(RELATED, Direction.BOTH)) {
     Node thing = relationship.getOtherNode(node);
     // Label name plus "s" selects the counter, e.g. "Phone" -> "Phones"
     String type = thing.getLabels().iterator().next().name() + "s";
     crosses.get(type).getAndIncrement();
   }
   results.add(crosses);
 }
  62. 62. Examine Results
 [{"ips":4,"emails":7,"ccs":0,"phones":4}, -- cc returned 4 ips, 7 emails, and 3 phones.
 {"ips":1,"emails":1,"ccs":1,"phones":0}, -- phone returned just 1 item for each cross reference check.
 {"ips":2,"emails":0,"ccs":4,"phones":3}, -- email returned 2 ips, 4 credit cards and 3 phones.
 {"ips":0,"emails":1,"ccs":3,"phones":2}] -- ip returned 3 credit cards and 2 phones.
  63. 63. What is a subgraph? (KDD 2015)
  64. 64. Subgraphs
  65. 65. What is a subgraph? • A subset of nodes and the edges between them
  66. 66. What are some useful subgraphs? Largest dense subgraph (Greatest average degree)
  67. 67. What are some useful subgraphs? Ego-network: the subgraph among a node and its neighbors
  68. 68. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern
  69. 69. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern MATCH (a)--(b)--(c)--(a)
 RETURN *
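The triangle pattern `(a)--(b)--(c)--(a)` can also be illustrated without a database. The Java sketch below enumerates triangles in a small in-memory undirected graph; it assumes the graph fits in memory as adjacency sets, and `TrianglePatternSketch`, `addEdge` and `triangles` are hypothetical names. Unlike the Cypher pattern, which matches the same triangle once per ordering of a, b and c, the sketch reports each triangle once by requiring a < b < c.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch only: enumerate triangles in a small undirected graph. */
final class TrianglePatternSketch {

    /** Adds an undirected edge by recording it in both adjacency sets. */
    static void addEdge(Map<String, Set<String>> adjacency, String a, String b) {
        adjacency.computeIfAbsent(a, key -> new TreeSet<>()).add(b);
        adjacency.computeIfAbsent(b, key -> new TreeSet<>()).add(a);
    }

    /** Returns every triangle once, as a sorted triple [a, b, c] with a < b < c. */
    static List<List<String>> triangles(Map<String, Set<String>> adjacency) {
        List<List<String>> found = new ArrayList<>();
        for (String a : new TreeSet<>(adjacency.keySet())) {
            for (String b : adjacency.get(a)) {
                if (b.compareTo(a) <= 0) {
                    continue; // enforce a < b
                }
                for (String c : adjacency.get(b)) {
                    if (c.compareTo(b) <= 0) {
                        continue; // enforce b < c
                    }
                    if (adjacency.get(a).contains(c)) {
                        found.add(Arrays.asList(a, b, c)); // closing edge a--c exists
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new TreeMap<>();
        addEdge(graph, "A", "B");
        addEdge(graph, "B", "C");
        addEdge(graph, "A", "C");
        addEdge(graph, "C", "D"); // dangling edge, not part of any triangle
        System.out.println(triangles(graph));
    }
}
```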
  70. 70. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern
  71. 71. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern
  72. 72. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern
  73. 73. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern
  74. 74. What are some useful subgraphs? Graph queries: find subgraphs of particular pattern MATCH (a)--(b)--(c)--(d)--(a)--(c), (d)--(b)
 RETURN *
  75. 75. Graphs as Matrices
  76. 76. Clustering gives Clarity Link
  77. 77. Ego-net Patterns
  78. 78. Ego-net Patterns • Ni: number of neighbors of ego i • Ei: number of edges in egonet i • Wi: total weight of egonet i • λw,i: principal eigenvalue of the weighted adjacency matrix of egonet i
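As a small illustration of the first two ego-net features, the Java sketch below computes Ni and Ei for a node in an in-memory undirected graph. It assumes an unweighted graph without self-loops, stored as symmetric adjacency sets; `EgoNetSketch` and its method names are hypothetical. Wi and the principal eigenvalue are not covered here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: neighbor and edge counts of an ego-net. */
final class EgoNetSketch {

    /** Adds an undirected edge by recording it in both adjacency sets. */
    static void addEdge(Map<String, Set<String>> adjacency, String a, String b) {
        adjacency.computeIfAbsent(a, key -> new HashSet<>()).add(b);
        adjacency.computeIfAbsent(b, key -> new HashSet<>()).add(a);
    }

    /** Ni: the number of neighbors of the ego. */
    static int neighborCount(Map<String, Set<String>> adjacency, String ego) {
        return adjacency.getOrDefault(ego, Collections.<String>emptySet()).size();
    }

    /** Ei: edges from the ego to its neighbors plus edges among those neighbors. */
    static int edgeCount(Map<String, Set<String>> adjacency, String ego) {
        Set<String> neighbors = adjacency.getOrDefault(ego, Collections.<String>emptySet());
        int neighborToNeighbor = 0;
        for (String u : neighbors) {
            for (String v : adjacency.getOrDefault(u, Collections.<String>emptySet())) {
                if (neighbors.contains(v)) {
                    neighborToNeighbor++; // each such edge is seen once from each endpoint
                }
            }
        }
        return neighbors.size() + neighborToNeighbor / 2;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        addEdge(graph, "A", "B");
        addEdge(graph, "A", "C");
        addEdge(graph, "A", "D");
        addEdge(graph, "B", "C"); // edge between two neighbors of A
        addEdge(graph, "D", "E"); // E is outside A's ego-net
        System.out.println(neighborCount(graph, "A") + " neighbors, "
                + edgeCount(graph, "A") + " edges");
    }
}
```

Two extremes bound the relationship between the counts: a pure star has Ei = Ni, while a clique has Ei = Ni(Ni + 1)/2. On a log-log plot of Ei against Ni these correspond to slopes of 1 and roughly 2, which is likely what the slope annotations on the Power Law Density slide refer to.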
  79. 79. Power Law Density (plot annotations: slope=2, slope=1, slope=1.35)
  80. 80. Power Law Weight
  81. 81. Power Law Eigenvalue
  82. 82. Find Groups within Ego-Nets
  83. 83. Find Groups within Ego-Nets Link
  84. 84. First-Party Fraud
  85. 85. First-Party Fraud • Fraudster’s aim: apply for lines of credit, act normally, extend credit, then…run off with it • Fabricate a network of synthetic IDs, aggregate smaller lines of credit into substantial value • Often a hidden problem since only banks are hit • Whereas third-party fraud involves customers whose identities are stolen • More on that later…
  86. 86. So what? • Tens of billions of dollars lost by banks every year • 25% of the total consumer credit write-offs in the USA • Around 20% of unsecured bad debt in the E.U. and North America is misclassified • In reality it is first-party fraud
  87. 87. Fraud Ring
  88. 88. Then the fraud happens… • Revolving doors strategy • Money moves from account to account to provide legitimate transaction history • Banks duly increase credit lines • Observed responsible credit behavior • Fraudsters max out all lines of credit and then bust out
  89. 89. … and the Bank loses • Collections process ensues • Real addresses are visited • Fraudsters deny all knowledge of synthetic IDs • Bank writes off debt • Two fraudsters can easily rack up $80k • Well organized crime rings can rack up many times that
  90. 90. Discrete Analysis Fails to predict…
  91. 91. …and Makes it Hard to React • When the bust out starts to happen, how do you know what to cancel? • And how do you do it faster than the fraudster to limit your losses? • A graph, that’s how!
  92. 92. Probably Non-Fraudulent Cohabiters
  93. 93. Probable Cohabiters Query MATCH (p1:Person)-[:HOLDS|LIVES_AT*]->()
 <-[:HOLDS|LIVES_AT*]-(p2:Person) WHERE p1 <> p2 RETURN DISTINCT p1
  94. 94. Dodgy-Looking Chain
  95. 95. Risky People MATCH (p1:Person)-[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p2:Person) -[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p3:Person) WHERE p1 <> p2 AND p2 <> p3 AND p3 <> p1 WITH collect (p1.name) + collect(p2.name) + collect(p3.name) AS names UNWIND names AS fraudster RETURN DISTINCT fraudster
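The Risky People query looks for a chain of three distinct people in which the middle person shares one item (an account or address) with the first and another item with the third. The Java sketch below transliterates that idea onto an in-memory map. It assumes each person is attached to an item by a single relationship, so the two shared items must differ (Cypher does not reuse a relationship within one pattern). `RiskyPeopleSketch`, `riskyPeople` and the sample names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only: find people on a p1 - x - p2 - y - p3 sharing chain. */
final class RiskyPeopleSketch {

    /** Input maps each person to the items (accounts, addresses) they hold or live at. */
    static Set<String> riskyPeople(Map<String, Set<String>> itemsByPerson) {
        // Invert the input: item -> people attached to it.
        Map<String, Set<String>> peopleByItem = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : itemsByPerson.entrySet()) {
            for (String item : entry.getValue()) {
                peopleByItem.computeIfAbsent(item, key -> new TreeSet<>()).add(entry.getKey());
            }
        }

        Set<String> risky = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : itemsByPerson.entrySet()) {
            String p2 = entry.getKey(); // candidate middle of the chain
            for (String x : entry.getValue()) {
                for (String y : entry.getValue()) {
                    if (x.equals(y)) {
                        continue; // the two shared items must be different
                    }
                    for (String p1 : peopleByItem.get(x)) {
                        if (p1.equals(p2)) {
                            continue;
                        }
                        for (String p3 : peopleByItem.get(y)) {
                            if (p3.equals(p2) || p3.equals(p1)) {
                                continue;
                            }
                            risky.add(p1);
                            risky.add(p2);
                            risky.add(p3);
                        }
                    }
                }
            }
        }
        return risky;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> holdings = new HashMap<>();
        holdings.put("Ann", new HashSet<>(Arrays.asList("acct1", "addr1")));
        holdings.put("Bob", new HashSet<>(Arrays.asList("addr1", "phone1")));
        holdings.put("Cy", new HashSet<>(Arrays.asList("phone1")));
        holdings.put("Eve", new HashSet<>(Arrays.asList("addr2")));
        holdings.put("Fay", new HashSet<>(Arrays.asList("addr2")));
        System.out.println(riskyPeople(holdings));
    }
}
```

In the sample data Eve and Fay share an address but nothing chains them to a third person, so they resemble the probable cohabiters case rather than the dodgy-looking chain.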
  96. 96. Pretty quick… Number of people: [5163] Number of fraudsters: [40] Time taken: [100] ms
  97. 97. Localize the focus MATCH (p1:Person {name:'Sol'})-[:HOLDS|LIVES_AT]-()… Number of fraudsters: [5] Time taken: [13] ms
  98. 98. Stop a bust-out
 in ms.
  99. 99. Quickly Revoke Cards in Bust-Out
 MATCH (p1:Person)-[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p2:Person) -[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p3:Person)
 WHERE p1 <> p2 AND p2 <> p3 AND p3 <> p1
 WITH collect(p1) + collect(p2) + collect(p3) AS people
 UNWIND people AS fraudster
 MATCH (fraudster)-[:OWNS]->(card:CreditCard)
 DETACH DELETE card
  100. 100. Auto Fraud
  101. 101. Whiplash http://georgia-clinic.com/blog/wp-content/uploads/2013/10/whiplash.jpg
  102. 102. Whiplash for Cash http://georgia-clinic.com/blog/wp-content/uploads/2013/10/whiplash.jpg http://cdn2.holytaco.com/wp-content/uploads/2012/06/lottery-winner.jpg
  103. 103. Whiplash for Cash Example (diagram labels: Accidents, Cars, Doctor, Attorney, People, Drives, Is Passenger; groups: Drivers, Passengers, Witnesses)
  104. 104. Risk • $80,000,000,000 annually on auto insurance fraud and growing • Even small % reductions are worthwhile! • British policyholders pay ~£100 per year to cover fraud • US drivers pay $200-$300 per year according to US National Insurance Crime Bureau
  105. 105. Regular Drivers
  106. 106. Regular Drivers Query MATCH (p:Person)-[:DRIVES]->(c:Car) WHERE NOT (p)<-[:BRIEFED]-(:Lawyer) AND NOT (p)<-[:EXAMINED]-(:Doctor) AND NOT (p)-[:WITNESSED]->(:Car) AND NOT (p)-[:PASSENGER_IN]->(:Car) RETURN p,c LIMIT 100
  107. 107. Genuine Claimants
  108. 108. Genuine Claimants Query
 MATCH (p:Person)-[:DRIVES]->(:Car), (p)<-[:BRIEFED]-(:Lawyer), (p)<-[:EXAMINED]-(:Doctor)
 OPTIONAL MATCH (p)-[w:WITNESSED]->(:Car)
 OPTIONAL MATCH (p)-[pi:PASSENGER_IN]->(:Car)
 RETURN p, count(DISTINCT w) AS noWitnessed,
 count(DISTINCT pi) AS noPassengerIn
  109. 109. Fraudsters
  110. 110. Fraudsters
 MATCH (p:Person)-[:DRIVES]->(:Car), (p)<-[:BRIEFED]-(:Lawyer), (p)<-[:EXAMINED]-(:Doctor), (p)-[w:WITNESSED]->(:Car), (p)-[pi:PASSENGER_IN]->(:Car)
 WITH p, count(DISTINCT w) AS noWitnessed,
 count(DISTINCT pi) AS noPassengerIn
 WHERE noWitnessed > 1 OR noPassengerIn > 1
 RETURN p
  111. 111. Auto-fraud Graph • Once you have the fraudsters, finding their support team is easy. • (fraudster)<-[:EXAMINED]-(d:Doctor) • (fraudster)<-[:BRIEFED]-(l:Lawyer) • And it’s also easy to find their passengers • (fraudster)-[:DRIVES]->(:Car)<-[:PASSENGER_IN]-(p) • And easy to find other places where they’ve maybe committed fraud • (fraudster)-[:WITNESSED]->(:Car) • (fraudster)-[:PASSENGER_IN]->(:Car) • And you can see this in milliseconds!
  112. 112. It’s all about the patterns
  113. 113. Phony Persona
  114. 114. Online Payments Fraud (First-Party) • Stealing credentials is commonplace • Phishing, malware, simple naïve users • Buying stolen credit card numbers is easy • How should one protect against seemingly fine credentials? • And valid credit card numbers?
  115. 115. We are all little stars • Usernames and passwords • Two-factor auth • IP addresses, cookies • Credit card, PayPal account • Some gaming sites already do some of this • Arts and crafts platform Etsy has already embraced the idea of a graph of identity
  116. 116. An Individual Identity Subgraph 128.240.229.18 fred@rbs.co.uk 1234LOL
  117. 117. We are all made of stars…
  118. 118. Specific Weighted Identity Query
 MATCH (u:User {username:'Jim', password: 'secret'})
 OPTIONAL MATCH (u)-[cookie:PROVIDED]->(:Cookie {id:'1234'})
 OPTIONAL MATCH (u)-[address:FROM]->(:IP {network:'128.240.0.0'})
 RETURN SUM(cookie.weighting) + SUM(address.weighting) AS score
 Callouts: Bare Minimum • Other Specific Considerations • Final Decision
  119. 119. General Weighted Identity Query
 MATCH (u:User {username:'Jim', password: 'secret'})
 OPTIONAL MATCH (u)-[rel]->()
 WHERE rel.weighting IS NOT NULL
 RETURN SUM(rel.weighting) AS score
 Callouts: Bare Minimum • All Available Weightings • Final Decision
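The weighted identity queries add up the weightings of whichever known signals (cookie, network, and so on) accompany a login, and leave a final decision to be made on the resulting score. A minimal Java sketch of that scoring step is shown below. The signal keys, the weights and the acceptance threshold are all assumptions made for illustration; the slides do not specify them, and `IdentityScoreSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only: sum the weights of recognized login signals. */
final class IdentityScoreSketch {

    /**
     * Adds the weight of every presented signal that is already known for the user.
     * Unknown signals contribute nothing, like an OPTIONAL MATCH that finds no row.
     */
    static double score(Map<String, Double> knownWeights, Set<String> presented) {
        double total = 0.0;
        for (String signal : presented) {
            Double weight = knownWeights.get(signal);
            if (weight != null) {
                total += weight;
            }
        }
        return total;
    }

    /** The final decision: the threshold is an assumed policy value, not from the slides. */
    static boolean accept(double score, double threshold) {
        return score >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical weights for signals previously seen with this user.
        Map<String, Double> known = new HashMap<>();
        known.put("cookie:1234", 0.4);
        known.put("network:128.240.0.0", 0.3);

        Set<String> presented = new HashSet<>();
        presented.add("cookie:1234");
        presented.add("network:10.0.0.0"); // never seen for this user

        double score = score(known, presented);
        System.out.println(score + " accepted=" + accept(score, 0.5));
    }
}
```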
  120. 120. An Individual Login History fred@rbs.co.uk 1234LOL
  121. 121. From 1st to 3rd Party • The 1st party identity graph can easily be extended to 3rd party fraud • Like in the bank fraud ring, fraudsters can mix-n-match claims • Start with a few phished accounts and expand from there!
  122. 122. Shared Connections 128.240.229.18 fred@rbs.co.uk 1234LOL nick@bearings.com Ca$hMon£y
  123. 123. Graphing Shared Connections Hmm….
  124. 124. Scan for Potential Fraudsters
 MATCH (u1:User)--(x)--(u2:User)
 WHERE u1 <> u2 AND NOT (x:IP)
 RETURN x
 Network in common is OK
  125. 125. Stop specific fraudster network, quickly MATCH path = 
 (u1:User {username: 'Jim'})-[*]-(x)-[*]-(u2:User) WHERE u1<>u2 AND NOT (x:IP) AND NOT (x:User) RETURN path
  126. 126. How do these fit with traditional fraud prevention? http://www.gartner.com/newsroom/id/1695014 Gartner’s Layered Fraud Prevention Approach
  127. 127. Demo Time
  128. 128. Bank Fraud http://gist.neo4j.org/?dfdfbddfdc63f4858f80
  129. 129. Credit Card Fraud Detection http://gist.neo4j.org/?3ad4cb2e3187ab21416b
  130. 130. Whiplash for Cash http://gist.neo4j.org/?6bae1e799484267e3c60
  131. 131. Ask for help if you get stuck • Online training - http://neo4j.com/graphacademy/ • Videos - http://vimeo.com/neo4j/videos • Use cases - http://www.neotechnology.com/industries-and-use-cases/ • Meetups • Books to get you started • http://www.graphdatabases.com • http://neo4j.com/book-learning-neo4j/
  132. 132. Deep Neural Networks for Bank Fraud https://www.youtube.com/watch?v=TAer-PeIypI Fraud Detection starts about half-way (after intro)
  133. 133. Thanks for listening @maxdemarzi
