Data Analysis for Anti-Fraud Software. Roberto Marmo, Università di Pavia


Data Driven Economy. May 18th 2018. Data Driven Innovation 2018. Department of Engineering, Università degli Studi Roma Tre

  1. Data Analysis and Anti-Fraud Software. Roberto Marmo, Università di Pavia, Faculty of Engineering
  2. Association of Certified Fraud Examiners (ACFE), ACFE Central Italy Chapter • a non-profit association that connects professionals actively involved in anti-fraud work, including preventing, detecting, and investigating fraud
  3. Fraud • a knowing misrepresentation of the truth, or concealment of a material fact, to induce another to act to his or her detriment; it includes any intentional or deliberate act to deprive another of property or money by guile, deception, or other unfair means • an act or omission that causes another person to err and produces an undue profit and a loss, even a potential one; it tends to overlap with improper and unfair conduct; the perpetrator's failure to report a pre-existing error made by the victim is enough to qualify an act as fraudulent
  4. Fraud • fraud detection: methodologies to detect and prevent fraudulent activities • red flag: a signal that something is out of the ordinary and may need further investigation
  5. Fraud • forensic fraud: when forensic examiners try to reconstruct the fraud scenario • fraud analytics: the process of analyzing data to identify anomalies, trends, and patterns • transactional data: can be used to identify fraud in financial transactions, for example to reconstruct payment fraud
  6. Right fraud solution. A fraud detection tool must be able to: • establish information credibility by integrating disparate data sources • detect fraud faster through real-time integration with IT systems • reduce false positives by using all available data • uncover hidden relationships, detect subtle patterns of behavior, and prioritize suspicious cases • report on the fraud it finds
  7. Right fraud solution. A fraud detection tool must also protect itself against: • denial of service (attempts to put it out of action) • reverse engineering (attempts to understand how it works) • data manipulation (tampering with the data entered)
  8. Methodology 1. team 2. fraud scenario (how the fraud takes place) 3. risk assessment 4. data sampling (choosing representative data) 5. data manipulation (creating specific data) 6. data analysis 7. results discussion
  9. 1. team • internal audit • IT database specialist • data analyst • fraud specialist • fraud auditor • forensic accountant • legal team
  10. 2. fraud scenario • the context and manner in which the fraudulent act is executed and concealed • why the fraud occurs • who commits the fraud: the people involved and their characteristics • how the scheme works and how the perpetrator obtains the financial benefit • establish normal behaviors and abnormal patterns
  11. 3. risk assessment. Assess the risk and identify the potential targets of fraud: • inherent vulnerabilities • inventories • cash • reputation • others. Losses Matrix
  12. 4. data sampling • statistical analysis technique used to select, manipulate, and analyze a representative subset of data points • time series analysis: the analysis of a series of data points in time order
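
One way to keep a subset representative when fraud is rare is stratified sampling, which draws the same fraction from each class. The sketch below is only an illustration under stated assumptions: the `stratified_sample` helper, the record layout, and the 2% fraud share are hypothetical and do not come from the slides.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum, keeping at least one record each."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for group in strata.values():
        size = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, size))
    return sample


# Hypothetical data: 1000 payments, 20 of them (2%) labeled as fraud.
transactions = [
    {"id": i, "label": "fraud" if i % 50 == 0 else "legit"}
    for i in range(1000)
]

subset = stratified_sample(transactions, key="label", fraction=0.1)
fraud_in_subset = sum(1 for r in subset if r["label"] == "fraud")
```

A plain random sample of 100 records could easily contain no fraud at all; sampling per stratum keeps the rare class present in the same proportion as in the full data.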
  13. 5. data manipulation • synthetic data set: created when no data on real fraud events is available, so the frauds it contains are simulated rather than real • take care when interpreting the results • 00848343/document Synthetic logs generator for fraud detection in mobile transfer services • Synthetic Financial Datasets For Fraud Detection testimon/paysim1
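
As a rough illustration of what a synthetic data set can look like, the sketch below generates transactions and injects a small share of simulated frauds. Everything in it is an assumption made for the example: the `make_synthetic_transactions` name, the log-normal amounts, and the rule that frauds are inflated amounts. It is not the method used by the generators cited on the slide.

```python
import random


def make_synthetic_transactions(n, fraud_rate, seed=42):
    """Generate n fake transactions, marking roughly fraud_rate of them as fraud."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    rows = []
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption: legitimate amounts follow a log-normal distribution.
        amount = rng.lognormvariate(3.5, 0.8)
        if is_fraud:
            # Assumption: simulated frauds are inflated by a factor of 5 to 20.
            amount *= rng.uniform(5, 20)
        rows.append({"id": i, "amount": round(amount, 2), "is_fraud": is_fraud})
    return rows


data = make_synthetic_transactions(5000, fraud_rate=0.02)
fraud_share = sum(r["is_fraud"] for r in data) / len(data)
```

Because the fraud pattern is chosen by whoever writes the generator, a detector that scores well on such data has only learned that invented pattern, which is why the results need careful interpretation.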
  14. 6. data analysis • statistical data analysis – averages, quantiles, performance metrics, probability distributions, and so on – models and probability distributions – time-series analysis – clustering and classification to find patterns and associations among groups of data – anomaly and outlier detection • artificial intelligence – neural networks, Bayesian networks – machine learning – deep learning
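
The quantile and outlier-detection items above can be combined in a very small example: flag every value that lies outside the interquartile-range fences. This is a minimal sketch, assuming the common 1.5 × IQR rule; the `iqr_outliers` helper and the sample amounts are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical payment amounts with one unusually large value.
amounts = [12.5, 14.0, 13.2, 15.1, 12.9, 14.4, 13.8, 250.0, 13.5, 14.9]
flagged = iqr_outliers(amounts)
```

With this sample only the 250.0 payment is flagged. A flagged value is a red flag to investigate, not proof of fraud.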
  15. 6. data analysis • social network analysis – graph visualization – social media mining • information visualization – visualizations that reveal useful patterns in the data
  16. 6. data analysis • 9.pdf A Comprehensive Survey of Data Mining-based Fraud Detection Research • 0856_A_Survey_of_Credit_Card_Fraud_Detection_Techniques_Data_and_Technique_Oriented_Perspective A Survey of Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective
  17. 6. data analysis - social network analysis • knowledge discovery from graphs related to fraud detection • detection and investigation of groups of collaborating automobile insurance fraudsters, by finding links between the people involved in a car accident
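
One simple way to look for collaborating groups is to link claims that share a participant and then take the connected components of the resulting graph. The sketch below does this with a union-find structure; the `find_rings` helper, the claim identifiers, and the names are all hypothetical, and real investigations would use richer link types than a shared name.

```python
from collections import defaultdict


def find_rings(claims, min_claims=2):
    """Group claims that share at least one participant (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every claim node to the person nodes that appear in it.
    for claim_id, people in claims.items():
        for person in people:
            union(("claim", claim_id), ("person", person))

    # Collect the claims of each component, keyed by the component root.
    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "claim":
            groups[find(node)].add(node[1])
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_claims)


# Hypothetical accident claims and the people involved in each.
claims = {
    "C1": ["anna", "bruno"],
    "C2": ["bruno", "carla"],
    "C3": ["dario"],
    "C4": ["carla", "elena"],
    "C5": ["franco"],
}
rings = find_rings(claims)
```

Here C1, C2, and C4 end up in one group because bruno and carla connect them, while C3 and C5 stay isolated and are not reported.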
  18. 6. data analysis - social network analysis • Are fraudsters friends on social media? • Open Source INTelligence (OSINT) • Social media mining: the same email address linked to different user accounts
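
The shared-email check can be sketched as a grouping step: normalize each address, then report addresses used by more than one account. The helper names and sample accounts below are hypothetical, and treating a `+tag` suffix as an alias of the same mailbox is an assumption that holds only for some mail providers.

```python
from collections import defaultdict


def normalize_email(email):
    """Lower-case the address and drop a '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]  # assumption: '+tag' is an alias of the same mailbox
    return f"{local}@{domain}"


def shared_emails(accounts):
    """Map each normalized email to its account ids, keeping only shared ones."""
    by_email = defaultdict(set)
    for account_id, email in accounts:
        by_email[normalize_email(email)].add(account_id)
    return {e: sorted(ids) for e, ids in by_email.items() if len(ids) > 1}


# Hypothetical user accounts as (account id, email) pairs.
accounts = [
    ("u1", "Mario.Rossi@example.com"),
    ("u2", "mario.rossi+promo@example.com"),
    ("u3", "lucia@example.com"),
    ("u4", " mario.rossi@example.com "),
]
suspicious = shared_emails(accounts)
```

Accounts u1, u2, and u4 collapse onto the same address and are reported together; u3 is left out because its address is unique.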
  19. 6. data analysis – neural network 1. SOM neural network: visualization of the multidimensional data reflecting the users' sequential activities 2. fraud detection based on a threshold-type binary classification algorithm (figure labels: clusters of similar user profiles; suspicious users)
  20. 6. data analysis – deep learning • Deep learning is a subset of AI and machine learning that uses multi-layered artificial neural networks • using-deep-learning Credit Card Fraud Detection using Restricted Boltzmann Machine in TensorFlow and Python language
  21. 6. data analysis – information visualization: parallel coordinates, a chart for visualizing high-dimensional geometry and analyzing multivariate data
  22. 6. data analysis – information visualization: Sankey diagrams, in which the width of each arrow is proportional to the cash flow quantity; they show the flow of payments so that losses can be found
  23. 6. data analysis – image forgery detection • digital forensics aims to detect alterations made to images in order to falsify the information they contain • example: the human face on an identity card, document check stamp fraud detection • approaches: – image search of the human face on Google – a set of image forensic techniques – GIMP, a cross-platform image editor – fotomontaggio/
  24. 6. data analysis – programming languages • Python – Credit card transactions fraud detection model – Credit Card Fraud Detection using Machine Learning • R – Sales fraud detection – Credit card fraud dataset • MATLAB cles/systematic-fraud-detection-through-automated-data-analytics-in-matlab.html • Java, C, etc.
  25. 7. results discussion • reporting and monitoring dashboard: a summary screen showing a percentage score with details and an explanation of the insights
  26. 7. results discussion • confusion matrix: visualizes the performance of fraud detection and shows how accurate the results are • a+d = correct predictions, b+c = wrong predictions • Key Fraud Indicators (KFI) related to financial indicators and fraud amount
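
The four cells of the confusion matrix can be counted directly from labeled predictions. The slide does not say which cell each letter stands for, so the sketch below assumes a = fraud correctly flagged, b = fraud missed, c = legitimate case wrongly flagged, d = legitimate case correctly passed, which is consistent with a+d being the correct predictions. The helper names and the sample labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count the cells a, b, c, d of a 2x2 confusion matrix (1 = fraud, 0 = legitimate)."""
    a = b = c = d = 0
    for truth, guess in zip(actual, predicted):
        if truth and guess:
            a += 1  # assumed: fraud, flagged (true positive)
        elif truth and not guess:
            b += 1  # assumed: fraud, missed (false negative)
        elif not truth and guess:
            c += 1  # assumed: legitimate, flagged (false positive)
        else:
            d += 1  # assumed: legitimate, passed (true negative)
    return a, b, c, d


def summary_metrics(a, b, c, d):
    """Derive accuracy, precision, and recall from the four cells."""
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total if total else 0.0,
        "precision": a / (a + c) if (a + c) else 0.0,
        "recall": a / (a + b) if (a + b) else 0.0,
    }


# Hypothetical ground truth and model output for ten transactions.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

cells = confusion_counts(actual, predicted)
scores = summary_metrics(*cells)
```

Because fraud is usually rare, accuracy alone can look high even when most frauds are missed, so precision and recall are worth reporting alongside it.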
  27. The end! Thank you!