
Fraud Detection and Neo4j

Fraud Rings and Credit Card Fraud in Neo4j

Published in: Engineering

  1. 1. Stop Complex Fraud in its Tracks with Neo4j
  2. 2. About Me: Max De Marzi, Neo4j Field Engineer • github.com/maxdemarzi • About 200 public repositories • maxdemarzi.com • About 160 blog posts • @maxdemarzi
  3. 3. Who Are Today’s Fraudsters?
  4. 4. Who Are Today’s Fraudsters? • Organized in groups • Synthetic Identities • Stolen Identities • Hijacked Devices
  5. 5. Types of Fraud • Credit Card Fraud • Rogue Merchants • Fraud Rings • Insurance Fraud • eCommerce Fraud • Fraud we don’t know about yet…
  6. 6. “Don’t consider traditional technology adequate to keep up with criminal trends” Market Guide for Online Fraud Detection, April 27, 2015
  7. 7. Fraud Detection
  8. 8. Traditional Fraud Detection Methods 1. Endpoint-Centric: analysis of users and their end-points 2. Navigation-Centric: analysis of navigation behavior and suspect patterns 3. Account-Centric: analysis of anomaly behavior by channel • Data points shown: PCs, Mobile Phones, IP addresses, User IDs, Comparing Transaction, Identity Vetting
  9. 9. Traditional Fraud Detection Methods (DISCRETE ANALYSIS) 1. Endpoint-Centric: analysis of users and their end-points 2. Navigation-Centric: analysis of navigation behavior and suspect patterns 3. Account-Centric: analysis of anomaly behavior by channel Weaknesses: • Fraud rings • Fake IP addresses • Hijacked devices • Synthetic Identities • Stolen Identities • And more…
  10. 10. Augmented Fraud Detection DISCRETE ANALYSIS: 1. Endpoint-Centric: analysis of users and their end-points 2. Navigation-Centric: analysis of navigation behavior and suspect patterns 3. Account-Centric: analysis of anomaly behavior by channel CONNECTED ANALYSIS: 4. Cross Channel: analysis of anomaly behavior correlated across channels 5. Entity Linking: analysis of relationships to detect organized crime and collusion
  11. 11. Fraud Detection with Discrete Analysis [Chart: Revolving Debt vs. Number of Accounts; regions labeled Normal behavior and INVESTIGATE]
  12. 12. Fraud Detection with Connected Analysis [Chart: Revolving Debt vs. Number of Accounts; regions labeled Normal behavior and Fraudulent pattern]
  13. 13. Fraud Detection with Graphs
  14. 14. Subgraph Patterns • N_i: number of neighbors of a node • E_i: number of relationships in a subgraph • W_i: total “weight” of a subgraph • λ_{w,i}: largest variability of the “weights” of a subgraph
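The neighbor count, relationship count, and total weight listed above can be computed directly from an edge list. The following is a minimal Python sketch, assuming the "subgraph" is a node's egonet (the node, its neighbors, and every relationship among them) and that the graph is undirected and weighted. The toy edges and the `egonet_features` helper are illustrative and do not come from the slides; the variability measure is left out because it needs a numerical routine.

```python
from collections import defaultdict


def egonet_features(edges, node):
    """Return (N_i, E_i, W_i) for the egonet of `node`.

    edges: list of (u, v, weight) tuples for an undirected graph.
    Assumption: the egonet is the node, its neighbors, and every
    relationship whose two endpoints both lie in that set.
    """
    neighbors = defaultdict(set)
    for u, v, _ in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    members = neighbors[node] | {node}
    inside = [(u, v, w) for u, v, w in edges if u in members and v in members]

    n_i = len(neighbors[node])          # number of neighbors
    e_i = len(inside)                   # relationships inside the egonet
    w_i = sum(w for _, _, w in inside)  # total weight inside the egonet
    return n_i, e_i, w_i


# Hypothetical toy graph: (source, target, weight)
EDGES = [
    ("a", "b", 1.0),
    ("a", "c", 2.0),
    ("b", "c", 0.5),
    ("c", "d", 4.0),
    ("d", "e", 1.0),
]

for name in ("a", "c"):
    print(name, egonet_features(EDGES, name))
```

Plotting one of these quantities against another on log-log axes for every node is one way to look for the power-law relationships the next slides refer to; nodes far from the fitted line would be the candidates to inspect.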
  15. 15. Power Law Density [Plot with reference lines: slope=1, slope=1.35, slope=2]
  16. 16. Power Law Weight
  17. 17. Power Law Weight Variability
  18. 18. Which Nodes are Important? • Influence • Power • Status • Control • Independence • Information • Graph Famous
  19. 19. Centrality • PageRank • ArticleRank • Betweenness Centrality • Closeness Centrality • Eigenvector Centrality • Degree Centrality • Harmonic Centrality
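To make two of the measures above concrete, here is a small Python sketch of Degree Centrality and a plain power-iteration PageRank. It is an illustration under stated assumptions, not the implementation any graph library uses: the damping factor of 0.85, the fixed 50 iterations, and the toy edge list are all choices made for this example, and the scores are normalized to sum to 1, so they may not match the scale a library reports.

```python
def degree_centrality(edges):
    """Count the relationships touching each node (direction ignored)."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    return degree


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Hypothetical toy graph: three nodes point at "a", and "a" points back at "b".
EDGES = [("b", "a"), ("c", "a"), ("d", "a"), ("a", "b")]

degrees = degree_centrality(EDGES)
ranks = pagerank(EDGES)
print(degrees)
print(max(ranks, key=ranks.get))
```

Degree only counts direct relationships, while PageRank also rewards being pointed at by well-ranked nodes, which is why "b" scores well here despite having a degree of only 2.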
  20. 20. Add Graph Features to your existing Fraud Detection Models • Influence • Relationships • Communities
 Account ID | Community Size | Degree | PageRank
 1          | 31             | 15     | 10.7
 3          | 4              | 12     | 3.4
 5          | 98             | 9      | 11.2
  21. 21. Fraud Rings
  22. 22. Let’s create some Users
 CREATE (john:User {name:"John"})
 CREATE (sheila:User {name:"Sheila"})
 CREATE (karen:User {name:"Karen"})
  23. 23. Modeling a fraud ring as a graph [Diagram: User nodes John, Sheila, Karen]
  24. 24. John has a few accounts
 CREATE (cc1:Card {number:"4012888888881881", balance: 493.23})
 CREATE (ba1:Account {number:"85474584", balance:1322.30, type:"Checking"})
 CREATE (us1:Loan {number:"63493639", balance:5000.00, type:"Loan"})
 CREATE (john)-[:HAS_ACCOUNT]->(cc1)
 CREATE (john)-[:HAS_ACCOUNT]->(ba1)
 CREATE (john)-[:HAS_ACCOUNT]->(us1)
  25. 25. Sheila also has an Identification Number
 CREATE (ba2:Account {number:"25384738", balance:2983.99, type:"Checking"})
 CREATE (cc2:Card {number:"5105105105105100", balance: 893.11})
 CREATE (ssn2:Identification {number:"000-42-4329", type:"SSN"})
 CREATE (sheila)-[:HAS_ACCOUNT]->(ba2)
 CREATE (sheila)-[:HAS_ACCOUNT]->(cc2)
 CREATE (sheila)-[:HAS_ID]->(ssn2)
  26. 26. Karen has a phone number
 CREATE (ba3:Account {number:"63493639", balance:3204.83, type:"Checking"})
 CREATE (us2:Loan {number:"28372342", balance:5000.00, type:"Loan"})
 CREATE (phone2:Phone {number:"312-606-0842"})
 CREATE (karen)-[:HAS_ACCOUNT]->(ba3)
 CREATE (karen)-[:HAS_ACCOUNT]->(us2)
 CREATE (karen)-[:HAS_PHONE]->(phone2)
  27. 27. Nothing suspicious yet [Diagram: Sheila, John, Karen; CREDIT CARD ×2, BANK ACCOUNT ×3, UNSECURED LOAN ×2, PHONE NUMBER, SSN 2]
  28. 28. John and Sheila are sharing a phone number
 CREATE (phone1:Phone {number:"312-876-5309"})
 CREATE (john)-[:HAS_PHONE]->(phone1)
 CREATE (sheila)-[:HAS_PHONE]->(phone1)
  29. 29. John and Karen are sharing an Identification Number
 CREATE (ssn1:Identification {number:"000-91-7434", type:"SSN"})
 CREATE (john)-[:HAS_ID]->(ssn1)
 CREATE (karen)-[:HAS_ID]->(ssn1)
  30. 30. They all share the same address
 CREATE (ad:Address {line1:"175 N. Harbor Drive", city:"Chicago", state:"IL", zip:"60601"})
 CREATE (john)-[:HAS_ADDRESS]->(ad)
 CREATE (karen)-[:HAS_ADDRESS]->(ad)
 CREATE (sheila)-[:HAS_ADDRESS]->(ad)
  31. 31. Starting to connect the dots… [Diagram: Sheila, John, Karen; CREDIT CARD ×2, BANK ACCOUNT ×3, UNSECURED LOAN ×2, ADDRESS, PHONE NUMBER ×2, SSN 1, SSN 2]
  32. 32. Let’s add Robert
 CREATE (robert:User {name:"Robert"})
 CREATE (ba4:Account {number:"8374927", balance:1273.39, type:"Checking"})
 CREATE (cc3:Card {number:"378282246310005", balance: 134.72})
 CREATE (robert)-[:HAS_ACCOUNT]->(ba4)
 CREATE (robert)-[:HAS_ACCOUNT]->(cc3)
  33. 33. Graph Algorithms
 Community Detection: • Triangle Count • Clustering Coefficients • Connected Components (Union Find) • Strongly Connected Components • Label Propagation • Louvain Modularity • Balanced Triad (identification)
 Centrality / Importance: • Degree Centrality • Closeness Centrality • CC Variations: Harmonic, Dangalchev, Wasserman & Faust • Betweenness Centrality • Approximate Betweenness Centrality • PageRank • Personalized PageRank • ArticleRank • Eigenvector Centrality
 Similarity: • Euclidean Distance • Cosine Similarity • Jaccard Similarity • Overlap Similarity • Pearson Similarity
 Link Prediction: • Adamic Adar • Common Neighbors • Preferential Attachment • Resource Allocations • Same Community • Total Neighbors
  34. 34. Union Find Graph Algorithm Finds sets where all nodes can reach all other nodes •Fraud Detection •Deduplication •Entity Resolution See “The Real Property Graph” blog post:
  35. 35. Union Find Graph Algorithm
 CALL algo.unionFind.stream(
   'MATCH (p:User) RETURN id(p) as id',
   'MATCH (p1:User)-->()<--(p2:User)
    RETURN id(p1) as source, id(p2) as target',
   {graph:'cypher'}
 )
 YIELD nodeId, setId
 RETURN algo.asNode(nodeId).name AS user, setId
  36. 36. Union Find Graph Algorithm
  37. 37. Connected Components: Set 3, Set 0
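Union Find groups nodes into sets in which every node can reach every other node. The Python sketch below shows the idea on the users and shared identifiers from the fraud-ring slides (the address is left out). It is a minimal illustration, not the Neo4j procedure: the `UnionFind` class, the `rings` helper, and the string labels for phones and SSNs are assumptions made for this example.

```python
class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited item straight at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def rings(users, links):
    """Group users that are connected through any shared identifier."""
    uf = UnionFind()
    for user, identifier in links:
        uf.union(user, identifier)
    groups = {}
    for user in users:
        groups.setdefault(uf.find(user), []).append(user)
    return sorted(sorted(group) for group in groups.values())


USERS = ["John", "Sheila", "Karen", "Robert"]
LINKS = [
    ("John", "phone:312-876-5309"),
    ("Sheila", "phone:312-876-5309"),
    ("John", "ssn:000-91-7434"),
    ("Karen", "ssn:000-91-7434"),
    ("Sheila", "ssn:000-42-4329"),
    ("Karen", "phone:312-606-0842"),
]

print(rings(USERS, LINKS))
```

Sheila and Karen share nothing directly, yet they land in the same set because each shares an identifier with John. Robert shares nothing and stays in a set of his own.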
  38. 38. Let’s pretend they aren't that stupid [Diagram: Sheila, John, Karen; CREDIT CARD ×2, BANK ACCOUNT ×3, UNSECURED LOAN ×2, PHONE NUMBER, SSN 2]
  39. 39. They called from the same number
 MATCH (john:User {name:"John"}), (sheila:User {name:"Sheila"})
 CREATE (ani:ANI {number:"312-666-1234"})
 CREATE (ani)-[:CALLED]->(john)
 CREATE (ani)-[:CALLED]->(sheila)
  40. 40. They logged on using the same browser
 MATCH (john:User {name:"John"}), (robert:User {name:"Robert"})
 CREATE (fg:Browser {fingerprint:"asdf7373jsdf3rw"})
 CREATE (fg)-[:ACCESSED]->(john)
 CREATE (fg)-[:ACCESSED]->(robert)
  41. 41. Connected by their “connections” [Diagram: Sheila, John, Robert; CREDIT CARD ×2, BANK ACCOUNT ×3, UNSECURED LOAN ×2, BROWSER, ANI NUMBER, PHONE NUMBER, SSN 2]
  42. 42. Union Find Graph Algorithm again
 CALL algo.unionFind.stream(
   'MATCH (p:User) RETURN id(p) as id',
   'MATCH (p1:User)<-[:ACCESSED]-()-[:ACCESSED]->(p2:User)
    RETURN id(p1) as source, id(p2) as target',
   {graph:'cypher'}
 )
 YIELD nodeId, setId
 RETURN algo.asNode(nodeId).name AS user, setId
  43. 43. Union Find Graph Algorithm
  44. 44. Connected Components: Set 0, Set 2
  45. 45. Why don’t we do both?
  46. 46. Store the results of Union Find
 CALL algo.unionFind(
   'MATCH (p:User) RETURN id(p) as id',
   'MATCH (p1:User)--()--(p2:User)
    RETURN id(p1) as source, id(p2) as target',
   {graph:'cypher'}
 )
 YIELD setCount
  47. 47. Let’s see these partitions
 MATCH (n:User)
 RETURN n.partition, COUNT(*) AS members, COLLECT(n.name) AS names
 ORDER BY members DESC
  48. 48. Our fraudsters are all connected
  49. 49. Our fraudsters are all connected
  50. 50. Bad News: Not all people are supposed to be loyal to you. Some are meant to come along as a reminder to watch the company you keep.
  51. 51. Good News:
  52. 52. Credit Card Fraud
  53. 53. Credit Card Information gets stolen all the time • Manual skimming of an ATM • Sophisticated Data Breaches • Rogue Merchant
  54. 54. [Diagram: Card Issuer, Card Holder, Terminal, Fraudster (ATM skimming, Data Breach), Testing Merchants; relationships ISSUES, USE, MAKES, AT; transaction (Tx) amounts $2, $5, $10, $2000]
  55. 55. Credit Card Transactions as a Graph
 CREATE (john:User {name:"John"})
 CREATE (m1:Merchant {name:"Computer Store"})
 CREATE (m2:Merchant {name:"Gas Station"})
 CREATE (m3:Merchant {name:"Jewelry Store"})
 CREATE (m4:Merchant {name:"Furniture Store"})
 CREATE (tx1:Transaction:Fraudulent {amount: 2000.00, date:datetime()})
 CREATE (tx2:Transaction {amount: 35.00, date:datetime() - duration('P1D')})
 CREATE (tx3:Transaction {amount: 25.00, date:datetime() - duration('P2D')})
 CREATE (tx4:Transaction {amount: 12.00, date:datetime() - duration('P3D')})
 CREATE (tx1)-[:AT_MERCHANT]->(m1)
 CREATE (tx2)-[:AT_MERCHANT]->(m2)
 CREATE (tx3)-[:AT_MERCHANT]->(m3)
 CREATE (tx4)-[:AT_MERCHANT]->(m4)
  56. 56. Credit Card Transactions as a Graph
 CREATE (john)-[:MAKES]->(tx1)
 CREATE (john)-[:MAKES]->(tx2)
 CREATE (john)-[:MAKES]->(tx3)
 CREATE (john)-[:MAKES]->(tx4)
  57. 57. John’s Transactions last week
 // The last week of John's transactions
 MATCH p = (n:User {name:"John"})-[:MAKES]->(tx)
 WHERE tx.date > datetime() - duration('P7D')
 RETURN p
  58. 58. John’s Transactions last week
  59. 59. Credit Card Transactions as a List
 // Users that have transactions but no PREV_TX chain yet
 MATCH (u:User)
 WHERE SIZE((u)-[:PREV_TX]->()) = 0 AND SIZE((u)-[:MAKES]->()) > 0
 WITH u LIMIT 100
 // Collect each user's transactions, newest first
 MATCH (u)-[r:MAKES]->(tx)
 WITH u, tx ORDER BY tx.date DESC
 WITH u, COLLECT(tx) AS transactions, HEAD(COLLECT(tx)) AS last
 // Point the user at the newest transaction, then link each transaction to the one before it
 CREATE (u)-[:PREV_TX]->(last)
 FOREACH (n IN RANGE(0, SIZE(transactions)-2) |
   FOREACH (next IN [transactions[n]] |
     FOREACH (prev IN [transactions[n+1]] |
       CREATE (next)-[:PREV_TX]->(prev) )))
  60. 60. John’s Transactions last week
 // The last week of John's transactions
 MATCH p = (n:User {name:"John"})-[:PREV_TX*]->(tx)
 WHERE NONE (tx IN tail(nodes(p))
             WHERE tx.date <= datetime() - duration('P7D'))
 RETURN p
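The list model above chains each user's transactions from newest to oldest, so a "last week" query can stop walking as soon as it reaches a transaction outside the window instead of checking every transaction. Here is a rough Python sketch of that idea. The dictionary layout, the helper names, and the fixed `NOW` timestamp are assumptions made for this illustration only.

```python
from datetime import datetime, timedelta


def prev_tx_links(user, transactions):
    """Return (from, to) pairs for the chain: user -> newest tx -> ... -> oldest tx."""
    ordered = sorted(transactions, key=lambda tx: tx["date"], reverse=True)
    if not ordered:
        return []
    links = [(user, ordered[0]["id"])]
    for newer, older in zip(ordered, ordered[1:]):
        links.append((newer["id"], older["id"]))
    return links


def recent_chain(user, transactions, now, days=7):
    """Walk the chain from the user and stop at the first transaction outside the window."""
    cutoff = now - timedelta(days=days)
    by_id = {tx["id"]: tx for tx in transactions}
    next_of = dict(prev_tx_links(user, transactions))
    chain, current = [], next_of.get(user)
    while current is not None and by_id[current]["date"] > cutoff:
        chain.append(current)
        current = next_of.get(current)
    return chain


# Hypothetical data: four recent transactions and one from ten days ago.
NOW = datetime(2020, 1, 15, 12, 0)
TRANSACTIONS = [
    {"id": "tx3", "date": NOW - timedelta(days=2)},
    {"id": "tx1", "date": NOW},
    {"id": "tx5", "date": NOW - timedelta(days=10)},
    {"id": "tx2", "date": NOW - timedelta(days=1)},
    {"id": "tx4", "date": NOW - timedelta(days=3)},
]

print(prev_tx_links("John", TRANSACTIONS))
print(recent_chain("John", TRANSACTIONS, NOW))
```

The walk touches five items here, but the same loop would also stop after the first out-of-window transaction for a user with thousands of older ones.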
  61. 61. John’s Transactions last week
  62. 62. [Diagram: John and his chain of transactions (Tx)]
  63. 63. [Diagram: John's transaction chain; labels: $2000, Computer Store]
  64. 64. [Diagram: John's transaction chain; labels: $2000, Computer Store, $25, $10, $4]
  65. 65. Following the footsteps
 // All the transactions marked fraudulent in the last week
 // and the transactions that came before them
 // up to two weeks ago.
 MATCH p = (fraud:Fraudulent)-[:PREV_TX*]->(tx)
 WHERE fraud.date > datetime() - duration('P7D')
   AND NONE (tx IN tail(nodes(p))
             WHERE tx.date <= datetime() - duration('P14D'))
 RETURN p
  66. 66. Following the footsteps
  67. 67. [Diagram: transaction chains for John and Sheila; labels: $2000 Computer Store, $25, $10, $4, $2, $3000 Jewelry Store, $3]
  68. 68. [Diagram: transaction chains for John, Sheila, and Robert; labels: $2000 Computer Store, $25, $10, $4, $2, $3000 Jewelry Store, $3]
  69. 69. [Diagram: transaction chains for John, Sheila, Robert, and Karen; labels: $2000 Computer Store, $25, $10, $4, $2, $3, $3000 Jewelry Store, $3, $8, $12, $1500 Furniture Store]
  70. 70. Find the Suspect Merchants
 // Top 5 common merchants from fraudulent transaction chains up to two weeks ago.
 MATCH p = (fraud:Fraudulent)-[:PREV_TX*]->(tx)
 WHERE fraud.date > datetime() - duration('P7D')
   AND NONE (tx IN tail(nodes(p))
             WHERE tx.date <= datetime() - duration('P14D'))
 WITH nodes(p) AS transactions
 UNWIND transactions AS tx
 WITH DISTINCT tx
 MATCH (tx)-[:AT_MERCHANT]->(merchant)
 RETURN merchant.name, COUNT(*) AS txCount
 ORDER BY txCount DESC
 LIMIT 5
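The same reasoning can be followed outside the database. The Python sketch below starts from each transaction marked fraudulent in the last week, walks back through the earlier transactions in that user's chain up to two weeks ago, counts every distinct transaction once, and ranks the merchants. The data layout, the `tx` helper, the fixed `NOW` value, and the sample chains are all hypothetical; only the 7-day and 14-day windows and the top-5 limit come from the query above.

```python
from collections import Counter
from datetime import datetime, timedelta


def suspect_merchants(chains, now, top=5):
    """Rank merchants seen in transaction chains that lead up to recent fraud."""
    fraud_cutoff = now - timedelta(days=7)
    history_cutoff = now - timedelta(days=14)
    seen = {}  # transaction id -> merchant, so each transaction counts once
    for chain in chains:  # each chain is ordered newest first
        for index, current in enumerate(chain):
            if not current["fraudulent"] or current["date"] <= fraud_cutoff:
                continue
            earlier = []
            for previous in chain[index + 1:]:
                if previous["date"] <= history_cutoff:
                    break  # everything after this is older still
                earlier.append(previous)
            if earlier:  # a chain needs at least one earlier transaction
                for item in [current] + earlier:
                    seen[item["id"]] = item["merchant"]
    return Counter(seen.values()).most_common(top)


NOW = datetime(2020, 1, 15, 12, 0)


def tx(tx_id, merchant, days_ago, fraudulent=False):
    """Build one hypothetical transaction record."""
    return {"id": tx_id, "merchant": merchant,
            "date": NOW - timedelta(days=days_ago), "fraudulent": fraudulent}


CHAINS = [
    [tx("j1", "Computer Store", 0, True), tx("j2", "Gas Station", 1), tx("j3", "Grocery", 2)],
    [tx("s1", "Jewelry Store", 1, True), tx("s2", "Gas Station", 3), tx("s3", "Cafe", 20)],
    [tx("k1", "Furniture Store", 2, True), tx("k2", "Gas Station", 4)],
]

print(suspect_merchants(CHAINS, NOW))
```

In this made-up data each fraudulent purchase happens at a different store, but every victim used the same gas station shortly beforehand, so it rises to the top of the list.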
  71. 71. Find the Suspect Merchants
  72. 72. [Diagram: transaction chains for John, Sheila, Robert, and Karen, now including Gas Station; labels: $2000 Computer Store, $25, $10, $4, $2, $3, $3000 Jewelry Store, $3, $8, $12, $1500 Furniture Store]
  73. 73. How Neo4j fits in
  74. 74. Relational database [Diagram: Money Transferring, Purchases, Bank Services; Relational database; Data Science team: Develop Patterns] + Good for Discrete Analysis – No Holistic View of Data-Relationships – Slow query speed for connections
  75. 75. Data Lake [Diagram: Money Transferring, Purchases, Bank Services; Relational database; Data Lake; Merchant Data, Credit Score Data, Other 3rd Party Data; Data Science team: Develop Patterns] + Good for Map Reduce + Good for Analytical Workloads – No holistic view – Non-operational workloads – Weeks-to-months processes
  76. 76. Neo4j powers 360° view of transactions in real-time [Diagram: Money Transferring, Purchases, Bank Services; Neo4j Cluster; SENSE: Transaction stream; RESPOND: Alerts & notification; LOAD RELEVANT DATA: Relational database, Data Lake; Visualization UI: Fine Tune Patterns; Data Science team: Develop Patterns; Merchant Data, Credit Score Data, Other 3rd Party Data]
  77. 77. Neo4j powers 360° view of transactions in real-time [Diagram: Money Transferring, Purchases, Bank Services; Neo4j Cluster; SENSE: Transaction stream; RESPOND: Alerts & notification; LOAD RELEVANT DATA: Relational database, Data Lake; Visualization UI: Fine Tune Patterns; Data Science team: Develop Patterns; Merchant Data, Credit Score Data, Other 3rd Party Data; Data-set used to explore new insights]
  78. 78. Summary
  79. 79. We talked about… • Finding Fraud with Graphs • Examples of different types of Fraud: Fraud Rings, Credit Card Testing, Fraud Origination • How Neo4j Fits in an Architecture
  80. 80. Why Neo4j? Who’s using it? Financial institutions use Neo4j to: • Detect & prevent fraud in real-time • Faster credit risk analysis and transactions • Reduce chargebacks • Quickly adapt to new methods of fraud Sectors: FINANCE, Government, Online Retail (Names redacted to protect the innocent and conceal the guilty)
  81. 81. No really, why Neo4j?
  82. 82. 1. Fixed Sized Records 2. “Joins” on Creation 3. Spin Spin Spin through this data structure 4. Pointers instead of Lookups
  83. 83. Partitions Each Node’s relationships are partitioned by type and direction.
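One way to picture this partitioning is an adjacency structure keyed by relationship type and direction, so a traversal that wants only one kind of relationship never has to scan the others. The Python sketch below illustrates that idea only. It is not Neo4j's storage format, and the `Node` class and the `connect` and `neighbors` helpers are hypothetical.

```python
from collections import defaultdict


class Node:
    def __init__(self, name):
        self.name = name
        # (relationship type, direction) -> list of neighboring nodes
        self.relationships = defaultdict(list)


def connect(source, rel_type, target):
    """Record the relationship once on each side, under its own partition."""
    source.relationships[(rel_type, "OUT")].append(target)
    target.relationships[(rel_type, "IN")].append(source)


def neighbors(node, rel_type, direction):
    """Read a single partition without touching any other relationship type."""
    return [n.name for n in node.relationships.get((rel_type, direction), [])]


john = Node("John")
cc1 = Node("cc1")
ba1 = Node("ba1")
phone1 = Node("phone1")

connect(john, "HAS_ACCOUNT", cc1)
connect(john, "HAS_ACCOUNT", ba1)
connect(john, "HAS_PHONE", phone1)

print(neighbors(john, "HAS_ACCOUNT", "OUT"))
print(neighbors(phone1, "HAS_PHONE", "IN"))
```

Looking up John's accounts reads one partition and skips his phone relationship entirely, which matters when a node has many relationships of other types.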
  84. 84. Valuable Resources! • Sandbox: neo4jsandbox.com • Fraud Detection: https://neo4j.com/use-cases/fraud-detection/ • Product: neo4j.com/product
  85. 85. Q&A
