Data Analysis for Anti-Fraud Software. Roberto Marmo, Università di Pavia
1. Data Analysis and Anti-Fraud Software
Roberto Marmo
Università di Pavia, Faculty of Engineering
www.robertomarmo.net www.socialmediamining.it
info@robertomarmo.net
https://it.linkedin.com/in/robertomarmo
2. Association of Certified Fraud Examiners (ACFE)
ACFE Central Italy Chapter info@acfecentral.it www.acfecentral.it
• non-profit association that connects professionals who are
actively involved in anti-fraud activities, including preventing,
detecting, and investigating fraud http://www.acfe.com/
3. Fraud
• a knowing misrepresentation of the truth or concealment of a
material fact to induce another to act to his or her detriment;
it includes any intentional or deliberate act to deprive another
of property or money by guile, deception, or other unfair means
• an act or omission that causes another person's error and
results in an undue profit and a loss, even a potential one; it
tends to overlap with improper and unfair conduct. The
perpetrator's failure to report a pre-existing error made by the
victim is sufficient to qualify an act as fraudulent
• http://www.acfe.com/fraud-101.aspx
4. Fraud
• fraud detection: methodologies to detect and
prevent fraudulent activities
• red flag: a signal that something is out of the
ordinary and may need to be investigated further
5. Fraud
• forensic fraud: when forensic examiners try to
reconstruct the fraud scenario
• fraud analytics: the process of analyzing data
to identify anomalies, trends, and patterns
• transactional data: can be used to identify
fraud in financial transactions and to
reconstruct payment fraud
6. Right fraud solution
A fraud detection tool must be able to:
• establish information credibility by integrating disparate
data sources
• detect fraud faster through real-time integration with IT
systems
• reduce false positives by using all available data
• uncover hidden relationships, detect subtle patterns of
behavior, and prioritize suspicious cases
• report fraud: describe the fraud that was found
7. Right fraud solution
A fraud detection tool must also provide:
• anti denial of service: protection against being put
out of service
• anti reverse engineering: protection against those who
want to understand how it works
• anti data manipulation: protection against
modification of the entered data
8. Methodology
1. team
2. fraud scenario: how the fraud happens
3. risk assessment
4. data sampling: selection of representative data
5. data manipulation: creation of specific data
6. data analysis
7. results discussion
9. 1. team
• internal audit
• IT database specialist
• data analyst
• fraud specialist
• fraud auditor
• forensic accountant
• legal team
10. 2. fraud scenario
• the fraud scenario is how the fraudulent act is
executed and concealed: the context and manner
in which the fraud can occur
• why the fraud occurs
• who commits the fraud: the people involved and
their characteristics
• how the scheme occurs and how the perpetrator of
the fraud obtains the financial benefit
• establish normal behaviors and abnormal
patterns
11. 3. risk assessment
assess the risk and identify the potential targets
at risk of fraud:
• inherent vulnerabilities
• inventories
• cash
• reputation
• others
[Figure: Losses Matrix]
12. 4. data sampling
• statistical analysis technique used to select,
manipulate, and analyze a representative subset
of data points
• time series analysis: analysis of a series of
data points in time order
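To make the sampling step concrete, here is a minimal sketch using only the Python standard library. Stratified sampling is just one possible way to obtain a representative subset; the slides do not prescribe a specific technique. The `transactions` records and their `channel` and `day` fields are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every group so the sample keeps the
    group proportions of the full data set (at least one record per group)."""
    rng = random.Random(seed)  # fixed seed: the sample is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical transactions: 90 card payments and 10 wire transfers.
transactions = (
    [{"id": i, "channel": "card", "day": i % 30} for i in range(90)]
    + [{"id": 90 + i, "channel": "wire", "day": i} for i in range(10)]
)

sample = stratified_sample(transactions, key="channel", fraction=0.2)

# Sort by day so the sample can also be read as a time-ordered series.
sample_by_time = sorted(sample, key=lambda rec: rec["day"])
```

Sampling per group keeps rare categories (here, wire transfers) from disappearing, which a plain random sample of a small size could easily cause.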
13. 5. data manipulation
• synthetic data set, for when no data on real fraud
events is available: data are generated containing
simulated (not real) frauds
• pay attention when interpreting the results
• https://hal.archives-ouvertes.fr/hal-00848343/document
Synthetic logs generator for fraud detection in
mobile transfer services
• Synthetic Financial Datasets For Fraud Detection
https://www.kaggle.com/ntnu-testimon/paysim1
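As an illustration of what a synthetic data set can look like, the sketch below generates labeled transactions with the Python standard library. It is not the method of the generators linked above: the function name, the fraud rate, and both amount distributions are assumptions chosen only for the example.

```python
import random

def make_synthetic_transactions(n, fraud_rate=0.02, seed=42):
    """Generate n labeled transactions; a small share is marked as fraud."""
    rng = random.Random(seed)  # fixed seed: the data set is reproducible
    rows = []
    for tx_id in range(n):
        is_fraud = rng.random() < fraud_rate
        if is_fraud:
            # Assumed fraud pattern: unusually large amounts.
            amount = rng.uniform(2_000, 10_000)
        else:
            # Assumed normal spending: right-skewed, mostly small amounts.
            amount = rng.lognormvariate(4.0, 1.0)
        rows.append({"id": tx_id, "amount": round(amount, 2), "is_fraud": is_fraud})
    return rows

data = make_synthetic_transactions(1_000)
fraud_share = sum(row["is_fraud"] for row in data) / len(data)
```

Because the fraud pattern is built in by hand, a detector that performs well on such data has only learned the generator's assumptions, which is why the results need careful interpretation.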
14. 6. data analysis
• statistical data analysis
– averages, quantiles, performance metrics, and so on
– models and probability distributions
– time-series analysis
– clustering and classification to find patterns and
associations among groups of data
– anomaly and outlier detection
• artificial intelligence
– neural networks, Bayesian networks
– machine learning
– deep learning
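A small example of the statistical approach: the sketch below uses quantiles to flag outliers with the common 1.5 × IQR rule (Tukey's fences). The rule and the sample `amounts` are assumptions for illustration; the slides only name quantiles and outlier detection in general.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical payment amounts; one of them is far from the rest.
amounts = [12, 15, 14, 13, 16, 15, 14, 500]
flagged = iqr_outliers(amounts)
```

Quartiles are barely affected by the extreme value itself, whereas a mean and standard deviation computed on the same data would be pulled toward it.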
15. 6. data analysis
• social network analysis
– graph visualization
– social media mining
• information visualization
– visualizations that reveal useful patterns in the data
16. 6. data analysis
• https://arxiv.org/ftp/arxiv/papers/1009/1009.6119.pdf
A Comprehensive Survey of Data Mining-based Fraud
Detection Research
• https://www.researchgate.net/publication/310610856_A_Survey_of_Credit_Card_Fraud_Detection_Techniques_Data_and_Technique_Oriented_Perspective
A Survey of Credit Card Fraud Detection Techniques:
Data and Technique Oriented Perspective
17. 6. data analysis - social network analysis
• knowledge discovery from graphs related to fraud
detection
• detection and investigation of groups of
collaborating automobile insurance fraudsters:
finding links between the people involved in a
car accident
18. 6. data analysis - social network analysis
• Are fraudsters friends on social media?
• Open Source INTelligence (OSINT)
• Social Media Mining
[Figure: the same email address related to different
user accounts]
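The idea of one email address related to several user accounts can be sketched as a simple grouping step that also yields graph edges for visualization. The account records, the field names, and the email normalization (trimming and lowercasing) are assumptions made for this example.

```python
from collections import defaultdict

def accounts_sharing_email(accounts):
    """Map each email used by more than one account to its sorted user names."""
    by_email = defaultdict(set)
    for acc in accounts:
        # Normalize so that case or stray spaces do not hide a match.
        by_email[acc["email"].strip().lower()].add(acc["user"])
    return {email: sorted(users) for email, users in by_email.items() if len(users) > 1}

# Hypothetical account records.
accounts = [
    {"user": "alice01", "email": "a.rossi@example.com"},
    {"user": "mario_b", "email": "m.bianchi@example.com"},
    {"user": "al1ce", "email": "A.Rossi@example.com "},
    {"user": "rossi_a", "email": "a.rossi@example.com"},
]

shared = accounts_sharing_email(accounts)

# One undirected edge per pair of accounts that share an email.
edges = [
    (u, v)
    for users in shared.values()
    for i, u in enumerate(users)
    for v in users[i + 1:]
]
```

The resulting `edges` list can be passed to any graph tool to draw the links between accounts; a shared email is a red flag to investigate, not proof of fraud.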
19. 6. data analysis – neural network
1. SOM neural network: visualization of the
multidimensional data reflecting the users'
sequential activities
2. fraud detection based on a threshold-type
binary classification algorithm
[Figure labels: clusters of similar user profiles;
suspicious users]
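The second step, threshold-type binary classification, can be sketched as follows. This example does not train a SOM: the `prototypes` are hypothetical vectors standing in for already-trained SOM units, and the user profiles and the threshold value are likewise assumptions. A user is labeled suspicious when the distance to the closest prototype exceeds the threshold.

```python
import math

# Hypothetical prototype vectors, standing in for trained SOM units.
prototypes = [(1.0, 0.2), (0.8, 0.5), (0.3, 0.9)]

def distance_to_nearest(profile, units):
    """Euclidean distance from a profile to its closest prototype."""
    return min(math.dist(profile, unit) for unit in units)

def classify(profile, units, threshold):
    """Binary decision: far from every known cluster means suspicious."""
    if distance_to_nearest(profile, units) > threshold:
        return "suspicious"
    return "normal"

# Hypothetical user profiles as two-dimensional feature vectors.
users = {"u1": (0.9, 0.3), "u2": (0.35, 0.85), "u3": (3.0, 3.0)}
labels = {name: classify(vec, prototypes, threshold=0.5) for name, vec in users.items()}
```

The threshold controls the trade-off between missed frauds and false positives, so in practice it would be tuned on labeled cases rather than fixed by hand.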
20. 6. data analysis – deep learning
• Deep learning is a subset of AI and machine
learning that uses multi-layered artificial neural
networks
• https://github.com/aaxwaz/Fraud-detection-using-deep-learning
Credit Card Fraud Detection using Restricted
Boltzmann Machine in TensorFlow and Python language
21. 6. data analysis - information visualization
parallel coordinates: a chart for visualizing
high-dimensional geometry and analyzing multivariate
data with many dimensions
22. 6. data analysis - information visualization
Sankey diagrams: the width of each arrow is
proportional to the cash flow quantity; the diagram
shows the flow of payments to find losses
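Before a Sankey diagram can be drawn, the payments have to be aggregated into one total per link. The sketch below shows that preparation step only, with no plotting; the `payments` list and the node names are hypothetical.

```python
from collections import Counter

# Hypothetical payments as (source, target, amount).
payments = [
    ("customers", "company", 1000),
    ("company", "supplier_a", 400),
    ("company", "supplier_b", 300),
    ("company", "supplier_a", 100),
]

# Total cash flow per link: this is what the arrow width encodes.
link_totals = Counter()
for source, target, amount in payments:
    link_totals[(source, target)] += amount

# Arrow widths relative to the largest flow (1.0 = widest arrow).
max_total = max(link_totals.values())
widths = {link: total / max_total for link, total in link_totals.items()}

# Money entering and leaving each node.
inflow, outflow = Counter(), Counter()
for (source, target), total in link_totals.items():
    outflow[source] += total
    inflow[target] += total

# Cash that entered "company" but is not explained by outgoing payments.
unaccounted = inflow["company"] - outflow["company"]
```

A gap between inflow and outflow at a node is the kind of loss the diagram makes visible as arrows that do not add up.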
23. 6. data analysis - image forgery detection
• digital forensics aims to detect alterations made
to images in order to falsify the information
they contain
• examples: human face on an identity card,
fraud detection on document check stamps
• approaches:
– image search of human face on Google
– set of image forensic techniques
– GIMP, a cross-platform image editor
– http://fotoautentica.wordpress.com/tag/scoprire-un-fotomontaggio/
24. 6. data analysis - programming languages
• Python
– https://github.com/cloudacademy/fraud-detection Credit card
transactions fraud detection model
– https://github.com/yazanobeidi/fraud-detection Credit
Card Fraud Detection using Machine Learning
• R
– https://github.com/z-o-e/sales-fraud-detection Sales fraud
detection
– https://github.com/Davidovich4/fraud-detection Credit card
fraud dataset
• MATLAB
– https://www.mathworks.com/company/newsletters/articles/systematic-fraud-detection-through-automated-data-analytics-in-matlab.html
• Java, C, etc.
25. 7. results discussion
• reporting and monitoring dashboard: a summary
screen with a percentage score and an explanation
of the insights
26. 7. results discussion
• confusion matrix: visualizes the performance of
fraud detection and shows how accurate the results are
• with matrix cells a, b, c, d: a+d = correct
predictions (diagonal), b+c = wrong predictions
(off-diagonal)
• Key Fraud Indicators (KFI) related to financial
indicators and fraud amount
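The confusion matrix cells can be counted and summarized as in the sketch below. The slide does not say which of b and c is which kind of error, so the assignment used here is an assumption: a = frauds correctly flagged, b = frauds missed, c = legitimate cases wrongly flagged, d = legitimate cases correctly passed. The `actual` and `predicted` labels are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count the four cells (a, b, c, d) of a binary confusion matrix."""
    a = b = c = d = 0
    for is_fraud, flagged in zip(actual, predicted):
        if is_fraud and flagged:
            a += 1  # fraud, correctly flagged
        elif is_fraud:
            b += 1  # fraud, missed
        elif flagged:
            c += 1  # legitimate, wrongly flagged (false positive)
        else:
            d += 1  # legitimate, correctly passed
    return a, b, c, d

# Hypothetical ground truth and detector output (True = fraud).
actual    = [True, True, False, False, False, True, False, False]
predicted = [True, False, False, True, False, True, False, False]

a, b, c, d = confusion_counts(actual, predicted)

accuracy = (a + d) / (a + b + c + d)  # share of good predictions
precision = a / (a + c)               # flagged cases that were real fraud
recall = a / (a + b)                  # real frauds that were flagged
```

Since fraud is usually rare, accuracy alone can look high even when most frauds are missed, so precision and recall are worth reporting next to it.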