
VAT fraud detection: the mysterious case of the missing trader



How to use graphs and Neo4j to detect tax fraud? A concrete example.

Published in: Data & Analytics


  1. VAT fraud: the mysterious case of the missing trader. SAS founded in 2013 in Paris | | @linkurious
  2. Introduction. A mix of fraud and graph expertise. Jean Villedieu: Co-Founder of Linkurious, >5 years in consulting, MSc Political sciences and Competitive Intelligence. Scott Mongeau: Data Scientist @ SARK7, Fraud Expert, PhD in Business Analytics Mgmt and MBA.
  3. What is a graph? This is a graph. (Diagram labels: Father Of, Father Of, Siblings.)
  4. What is a graph? Nodes and relationships. This is a node. This is a relationship. A graph is a set of nodes linked by relationships. (Diagram labels: Father Of, Father Of, Siblings.)
  5. Different domains where graphs are important. Some of the domains in which our customers use graphs. Supply chains: suppliers, roads, warehouses, products… Diminish transportation cost, optimize delivery. Social networks: people, objects, movies, restaurants, music… Suggest new contacts, help discover new music. Communications: antennas, servers, phones, people… Diminish network outages.
  6. A very profitable business. £176 million. In 2012 in the UK, a fraud ringleader was found guilty of defrauding £176m in a VAT scam. Source :
  7. How VAT fraud works. Step 1: Company A (US) sells €10M worth of phones to Company B (Europe). Step 2: Company B sells the phones to Company C. It charges €10M + €1M for the VAT. Step 3: Company B sells the phones to Company D (US) and claims a VAT refund. Step 4: The directors of A and D disappear with €2M in stolen taxes. (Diagram labels: A, €10M, B, €10M + €1M VAT, C, €1M VAT refund, €10M, Tax Agency, D, €1M for A and €1M for B.)
  8. Why it is so hard to catch the fraud. The 3 challenges all tax authorities face. Appearances: the companies and transactions used for the fraud appear legitimate. Speed: the execution of the fraud can take place in just a few weeks. Silos: the tax agencies have data, but it exists in silos, making it hard to piece together.
  9. How to make sense of complex data. How can graph technologies help?
  10. Different data sources. Company registry, transaction history, financial criminals list, tax claims...
  11. Graphs help make sense of complex data. A graph model helps reveal the connections in the data. Nodes: Paul (Person), Nicole (Person), Company A (Company), Company B (Company), Company C (Company). Person properties shown: country: Italy, age: 29, criminal_status: unknown; country: USA, age: 53, criminal_status: unknown. Company properties shown: country: USA, type: LLC, creation_date: 08/10/1983; country: Italy, type: SRL, creation_date: 04/09/1984; country: Italy, type: SRL, creation_date: 18/04/1990. Relationship types: SELLS_TO, COLLECTS_VAT, PARENT_OF, DIRECTOR_OF. Relationship properties shown: item: phones, date: 05/08/2014, amount: 1M.
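The property graph on this slide can be approximated in a few lines of Python. This is only a minimal sketch: the `Node` and `Rel` classes and the `targets` helper are hypothetical names, and because the flattened slide does not say which properties belong to which node or which way each relationship points, the assignments below are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    label: str                      # e.g. "Person" or "Company"
    props: dict = field(default_factory=dict)


@dataclass
class Rel:
    start: str                      # name of the source node
    type: str                       # e.g. "DIRECTOR_OF", "SELLS_TO"
    end: str                        # name of the target node
    props: dict = field(default_factory=dict)


# Assumption: which properties belong to which node is guessed from the slide.
nodes = {
    "Paul": Node("Paul", "Person", {"country": "Italy", "age": 29}),
    "Nicole": Node("Nicole", "Person", {"country": "USA", "age": 53}),
    "Company A": Node("Company A", "Company", {"country": "USA", "type": "LLC"}),
    "Company B": Node("Company B", "Company", {"country": "Italy", "type": "SRL"}),
    "Company C": Node("Company C", "Company", {"country": "Italy", "type": "SRL"}),
}

# Assumption: relationship directions and endpoints are illustrative only.
rels = [
    Rel("Nicole", "DIRECTOR_OF", "Company A"),
    Rel("Paul", "DIRECTOR_OF", "Company B"),
    Rel("Paul", "DIRECTOR_OF", "Company C"),
    Rel("Company A", "SELLS_TO", "Company B", {"item": "phones"}),
    Rel("Company B", "SELLS_TO", "Company C", {"item": "phones"}),
]


def targets(source: str, rel_type: str) -> list:
    """Return the names of nodes reached from `source` via `rel_type`."""
    return [r.end for r in rels if r.start == source and r.type == rel_type]


print(targets("Paul", "DIRECTOR_OF"))
print(targets("Company A", "SELLS_TO"))
```

Keeping people, companies and transactions in one structure is what lets a single traversal answer questions such as "which companies does this director control?", which would otherwise require joining several silos.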
  12. Can we use the data to detect fraud cases? How to use the information.
  13. Designing a fraud detection pattern. A fraud expert designs a fraud detection pattern. I know what to look for. Usually my fraud cases involve: ● a set of at least three transactions that includes companies from two different countries; ● a company in the middle that was created less than 90 days ago; ● transactions that occur in less than 15 days.
  14. Designing a fraud detection pattern. The pattern is translated into a graph language. MATCH p=(a:Company)-[rs:SELLS_TO*]->(c:Company) WHERE <> WITH p, a, c, rs, nodes(p) AS ns WITH p, a, c, rs, filter(n IN ns WHERE n.epoch - 1383123473 < (90*60*60*24)) AS bs WITH p, a, c, rs, head(bs) AS b WHERE NOT b IS NULL WITH p, a, b, c, head(rs) AS r1, last(rs) AS rn WITH p, a, b, c, r1, rn, rn.epoch - r1.epoch AS d WHERE d < (15*60*60*24) RETURN a, b, c, d, r1, rn
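The same three rules can be sketched in plain Python to make the logic of the pattern explicit. This is not the deck's query: it is an illustrative re-implementation under stated assumptions. The `Company` and `Sale` classes, the `now` reference date and the sample data are all hypothetical, and "companies from two different countries" is interpreted here as "the chain touches at least two countries".

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds per day


@dataclass(frozen=True)
class Company:
    name: str
    country: str
    created: int  # Unix epoch seconds


@dataclass(frozen=True)
class Sale:
    seller: str
    buyer: str
    epoch: int    # Unix epoch seconds


def sale_chains(sales, min_len=3):
    """Yield every chain of consecutive sales (no company repeated) with >= min_len sales."""
    by_seller = {}
    for s in sales:
        by_seller.setdefault(s.seller, []).append(s)

    def extend(chain, visited):
        if len(chain) >= min_len:
            yield list(chain)
        for nxt in by_seller.get(chain[-1].buyer, []):
            if nxt.buyer not in visited:
                chain.append(nxt)
                visited.add(nxt.buyer)
                yield from extend(chain, visited)
                visited.discard(nxt.buyer)
                chain.pop()

    for first in sales:
        yield from extend([first], {first.seller, first.buyer})


def suspicious_chains(companies, sales, now, max_age_days=90, max_span_days=15):
    """Return company-name chains matching the three rules of the fraud pattern."""
    hits = []
    for chain in sale_chains(sales):
        names = [chain[0].seller] + [s.buyer for s in chain]
        countries = {companies[n].country for n in names}
        # Rule 2: some company in the middle of the chain is younger than 90 days.
        young = [n for n in names[1:-1]
                 if now - companies[n].created < max_age_days * DAY]
        # Rule 3: first and last transaction are less than 15 days apart.
        span = chain[-1].epoch - chain[0].epoch
        if len(countries) >= 2 and young and 0 <= span < max_span_days * DAY:
            hits.append(names)
    return hits


# Hypothetical sample data, not taken from the deck's dataset.
NOW = 1_400_000_000
companies = {
    "A": Company("A", "USA", NOW - 4000 * DAY),
    "B": Company("B", "Italy", NOW - 30 * DAY),   # recently created
    "C": Company("C", "Italy", NOW - 3000 * DAY),
    "D": Company("D", "USA", NOW - 5000 * DAY),
}
sales = [
    Sale("A", "B", NOW - 10 * DAY),
    Sale("B", "C", NOW - 7 * DAY),
    Sale("C", "D", NOW - 4 * DAY),
]

print(suspicious_chains(companies, sales, now=NOW))
```

Enumerating chains in application code scales poorly compared with a graph database, which is why the deck expresses the pattern as a query; the sketch is only meant to show what each rule checks.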
  15. Graph databases can tackle big datasets. A graph database handles the data analysis at scale. Graph databases help store the data from various sources and analyse it in real time to identify potential fraud cases. (Diagram labels: Traditional databases, ETL, Graph database.)
  16. An analyst examines the potential fraud cases. A fraud analyst investigates the potential fraud cases. I need to make sure the alerts detected by our detection system are legitimate. If they are, I need to understand which companies and which individuals are involved.
  17. Visualization transforms alerts into actions. Graph visualization facilitates data investigation. Graph visualization solutions like Linkurious help data analysts investigate graph data faster. (Diagram labels: Traditional database, ETL, Graph database, API, Graph visualization.)
  18. Visualizing the results of our pattern. Two suspicious chains of transactions. Companies detected by our query: US companies in dark green, Italian companies in orange and UK companies in light green.
  19. Looking at the full VAT fraud scheme. The transactions are connected in a larger scheme. The people and companies connected to our initial transactions: companies in pink, holdings in purple and people in green.
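Widening the view from the flagged transactions to the surrounding people, companies and holdings amounts to a bounded neighborhood expansion. The sketch below shows one way this could be done with a breadth-first search; the `expand` function, the hop limit and the sample names are hypothetical and do not come from the deck's dataset or from Linkurious itself.

```python
from collections import deque


def expand(edges, seeds, max_hops=2):
    """Return {node: hop distance} for every node within max_hops of a seed.

    Edges are treated as undirected, since an investigator wants to follow
    a connection regardless of which way the relationship points.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


# Hypothetical connections around two flagged companies.
edges = [
    ("Company A", "Company B"),
    ("Director X", "Company A"),
    ("Holding H", "Company B"),
    ("Director X", "Company Z"),
    ("Company Z", "Company Y"),
]

print(expand(edges, ["Company A", "Company B"], max_hops=2))
```

A small hop limit keeps the result readable: each extra hop can multiply the number of entities on screen, so analysts typically expand step by step from the entities they find interesting.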
  20. Zooming in on a potential criminal. We can focus on key individuals. Looking at Cletis Bysshe, the man at the start of the transaction chain.
  21. Graphs can improve your fraud detection system. Graphs and fraud detection. Detect fraud cases: graph databases can find suspicious patterns hidden in big data. Accelerate the investigations: Linkurious allows the fraud teams to go deep in the data and build cases against fraud rings. Save money: the fraud teams act faster and more fraud cases can be avoided.
  22. Try Linkurious. You can do it too!
  23. Conclusion. Contact us to discuss your projects at
  24. Additional resources. GraphGist : Blog post on the carousel fraud : trader/ Article on fraud and network analysis : Sample dataset :