Detection of fraud in financial blockchain-based transactions through big data analytics
Jessica Páez Bonilla
Director: Jose Maria Álvarez Rodríguez
Universidad Carlos III de Madrid
Master in Big Data Analytics
2017-2018
July 11, 2018
Jessica Páez Bonilla (UC3M) Master Thesis July 11, 2018 1 / 27
Overview
1 Introduction
2 Project Objectives
3 System Design
4 Implementation
5 Experiment
6 Project Budget and Plan
7 Legal Framework and socio-economic environment
8 Conclusions and Future works
Introduction
Using analytical techniques (data gathering, preprocessing and model building), it could be possible to detect and prevent financial fraud.
The aim is to describe complex fraud in terms of patterns suitable for system-driven detection and analysis.
Network analysis can provide useful insight into large datasets based
on the interconnectedness of the agents in the network being
analyzed.
Introduction
Network: shows the relationships among blockchain users and the flow of money, which enables the discovery of fraud patterns.
Network graph analysis offers a method for capturing the context of fraud in a standard, machine-readable and transferable format.
Associations learned from visually observing fraudulent transactions could be used as knowledge input to create analytical models.
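To make the idea concrete, here is a minimal sketch, assuming each transfer has already been reduced to a hypothetical (sender, receiver, amount) record: addresses become nodes and money flows become weighted directed edges. Real Bitcoin transactions have several inputs and outputs, so mapping them to address-to-address edges needs an extra modeling choice that this sketch does not make. The record layout and function names are illustrative, not the pipeline used in the thesis.

```python
from collections import defaultdict

def build_transaction_graph(records):
    """Aggregate (sender, receiver, amount) records into a weighted directed graph.

    Returns a dict mapping sender -> {receiver: total amount sent}.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sender, receiver, amount in records:
        graph[sender][receiver] += amount
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {s: dict(targets) for s, targets in graph.items()}

def net_flow(graph):
    """Net flow of money per address: total received minus total sent."""
    flow = defaultdict(float)
    for sender, targets in graph.items():
        for receiver, amount in targets.items():
            flow[sender] -= amount
            flow[receiver] += amount
    return dict(flow)

# Hypothetical toy records; addresses and amounts are made up.
records = [
    ("addr_A", "addr_B", 1.5),
    ("addr_A", "addr_B", 0.5),
    ("addr_B", "addr_C", 1.0),
    ("addr_C", "addr_A", 0.25),
]
graph = build_transaction_graph(records)
print(graph)            # {'addr_A': {'addr_B': 2.0}, 'addr_B': {'addr_C': 1.0}, 'addr_C': {'addr_A': 0.25}}
print(net_flow(graph))  # {'addr_A': -1.75, 'addr_B': 1.0, 'addr_C': 0.75}
```

Repeated transfers between the same pair of addresses are merged into one edge whose weight is the total amount, which keeps the graph compact while preserving the flow of money.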
Project Objectives
1 Research techniques used for fraud detection and explore blockchain
data.
2 Design a system that could take into account the patterns
surrounding the fraudulent transactions.
3 Implement a system using big data analytic tools like R and Python.
4 Experiment and validate the designed system.
System Overview
System Design - Network Metrics
Metric        Interpretation
Degree        Influence on the network
Closeness     How quickly a node can reach the other nodes in the network
Betweenness   Node location: does the node lie on the shortest paths between other nodes?
Density       Level of linkage among the nodes
Modularity    How strongly the network divides into modules (communities)
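These metrics are provided by igraph and networkx, the libraries used in this project. To make their definitions explicit, here is a dependency-free Python sketch on a hypothetical toy graph (two triangles joined by a bridge) stored as an adjacency dict of sets. The normalizations are chosen to match networkx's conventions for connected, undirected graphs; the function names are illustrative.

```python
from collections import deque

# Hypothetical undirected toy graph: two triangles joined by the bridge 3-4.
ADJ = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}

def degree_centrality(adj):
    """Degree divided by n - 1, the maximum possible degree."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    """(n - 1) / sum of distances to all other nodes; assumes a connected graph."""
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = (len(adj) - 1) / total if total else 0.0
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs, normalized for undirected graphs."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair is counted from both ends, so divide by (n-1)(n-2).
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()} if n > 2 else bc

def density(adj):
    """Edges present divided by the n(n-1)/2 possible undirected edges."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return m / (n * (n - 1) / 2)

def modularity(adj, communities):
    """Newman modularity Q of a partition given as a list of node sets."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    q = 0.0
    for c in communities:
        internal = sum(1 for v in c for w in adj[v] if w in c) / 2
        degree_sum = sum(len(adj[v]) for v in c)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q

print(degree_centrality(ADJ)[3])                          # 0.6
print(round(closeness_centrality(ADJ)[3], 3))             # 0.714
print(betweenness_centrality(ADJ))                        # {1: 0.0, 2: 0.0, 3: 0.6, 4: 0.6, 5: 0.0, 6: 0.0}
print(round(density(ADJ), 3))                             # 0.467
print(round(modularity(ADJ, [{1, 2, 3}, {4, 5, 6}]), 3))  # 0.357
```

Nodes 3 and 4 get the highest betweenness because every path between the two triangles crosses the bridge between them, which is the broker position the table describes.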
Implementation - Technology used
BigQuery, R (igraph) and Python have been used in the development of this system.
Table 1: Versions of the packages used
Package      Version
matplotlib   1.5.1
pandas       0.19.2
networkx     1.11
community    0.9
numpy        1.11.3
scipy        0.18.1
Experiment - Steps
1 Data Exploration.
2 Network metrics and extraction of communities.
3 Features and ML algorithms selection.
4 Performance Measures.
5 Execution.
6 Analysis of Results.
7 Experiment Limitations.
Experiment - 1. Data Exploration
Bitcoin blockchain data was explored using BigQuery. A data segment containing fraudulent movements was chosen as the sample for analysis in this project.
Figure 1: Blocks over time
Figure 2: Transactions in the sample
Experiment - 2. Network Metrics and extraction of communities
Communities
1 Network modeling
2 Clustering
3 Giant Component
Figure 3: Communities extraction
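The giant-component step can be sketched as follows, assuming an undirected adjacency dict of sets (for a directed transaction graph this corresponds to weakly connected components). The toy addresses are hypothetical. With networkx, the same result is obtained with `G.subgraph(max(nx.connected_components(G), key=len))`.

```python
from collections import deque

def connected_components(adj):
    """Yield the node set of each connected component of an undirected graph."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        yield component

def giant_component(adj):
    """Adjacency dict of the subgraph induced by the largest connected component."""
    largest = max(connected_components(adj), key=len)
    return {v: adj[v] & largest for v in largest}

# Hypothetical address graph: a 4-node cluster, a 2-node pair and an isolated address.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"},
    "x": {"y"}, "y": {"x"},
    "z": set(),
}
gc = giant_component(adj)
print(sorted(gc))  # ['a', 'b', 'c', 'd']
```

Restricting the analysis to the giant component discards small, isolated groups of addresses whose network metrics carry little information.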
Experiment - 3. Features and ML algorithms selection
Figure 4: Selected features
ML Algorithms
1 Decision Tree
   1 White-box model: it can be interpreted.
   2 Performs well on imbalanced datasets.
2 Random Forest
   1 Ensemble: combines the predictions of several base estimators in order to improve robustness over a single estimator.
   2 Each tree in the ensemble is built from a sample drawn with replacement (a bootstrap sample) from the training set.
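To illustrate the two Random Forest ingredients named above (bootstrap samples drawn with replacement, and combining several base estimators), here is a dependency-free sketch of bagging with one-split decision stumps. It is a simplified stand-in, not the model used in the thesis: a real random forest (for example scikit-learn's `RandomForestClassifier`) grows full trees and draws a random subset of features at every split, whereas here each stump uses one randomly chosen feature. The toy data and function names are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """One-split decision stump on a single feature, chosen to minimize training errors.

    Returns (feature, threshold, left_label, right_label).
    """
    best_errors, best_stump = None, None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best_errors is None or errors < best_errors:
            best_errors, best_stump = errors, (feature, threshold, left_label, right_label)
    return best_stump

def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label

def fit_bagged_stumps(rows, labels, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (n draws with replacement) and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(n_features)         # random feature per estimator
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return ensemble

def predict_ensemble(ensemble, row):
    """Combine the base estimators by majority vote."""
    return Counter(predict_stump(s, row) for s in ensemble).most_common(1)[0][0]

# Hypothetical two-feature toy data (e.g. two network metrics per transaction).
rows = [(1, 2), (2, 1), (3, 3), (8, 9), (9, 7), (7, 8)]
labels = ["non-suspicious"] * 3 + ["suspicious"] * 3
forest = fit_bagged_stumps(rows, labels)
print(predict_ensemble(forest, (9, 9)), predict_ensemble(forest, (1, 1)))
```

An odd number of estimators avoids ties in the two-class vote; individual stumps trained on unlucky single-class bootstrap samples are outvoted by the rest.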
Experiment - 4. Performance Measures
Classification Accuracy
It gives the percentage of correct predictions.
Confusion Matrix
It is a 2x2 matrix of true/false positives and negatives that shows which types of errors the classifier is making.
AUC - Area Under the (ROC) Curve
It is a single-number summary of classifier performance, useful even when there is class imbalance.
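The three measures can be written out directly. The sketch below is a dependency-free version, assuming binary labels with 1 = Suspicious and hypothetical scores; predictions are obtained by thresholding the scores at 0.5. AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count one half), which equals the area under the ROC curve. In practice `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score` provide the same quantities (note that scikit-learn lays out the confusion matrix as [[TN, FP], [FN, TP]]).

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Metrics derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate (recall)
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # 1 - specificity
        "precision": tp / (tp + fp),
    }

def auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = Suspicious) and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [int(s >= 0.5) for s in scores]
print(confusion_matrix(y_true, y_pred))  # (3, 1, 1, 3)
print(classification_metrics(y_true, y_pred))
print(auc(y_true, scores))               # 0.9375
```

Accuracy and the confusion-matrix metrics depend on the 0.5 threshold, while AUC uses only the ranking of the scores, which is why it stays informative under class imbalance.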
Experiment - 5. Execution
Once the features (transaction network metrics) have been obtained, and the ML algorithms and their performance metrics have been defined, two main tasks need to be run before fitting the system.
Observation Labeling
Analysis of a real fraudulent transaction.
Dataset Balancing
Once labeled, the dataset contained many more observations of one class, so an oversampling technique was applied to balance it.
Experiment - 5.1. Analysis of a fraudulent transaction
Figure 5: Fraudster Neighbours
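Figure 5 shows the neighbours of a known fraudulent address. As a hedged sketch of how such a neighbourhood can be pulled out of a transaction graph, the breadth-first search below collects every address within k hops of a given address. The graph, the address names and the choice of k are hypothetical, and this is not necessarily the labeling rule used in the thesis.

```python
from collections import deque

def k_hop_neighbourhood(adj, source, k):
    """Map every node within k hops of source to its hop distance (BFS, undirected)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if hops[v] == k:
            continue  # do not expand past the k-th ring
        for w in adj[v]:
            if w not in hops:
                hops[w] = hops[v] + 1
                queue.append(w)
    return hops

# Hypothetical graph around a known fraudulent address "F".
adj = {
    "F": {"a", "d"}, "a": {"F", "b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"F"},
}
print(k_hop_neighbourhood(adj, "F", 2))
```

Here `c` is three hops away from `F`, so it is excluded for k = 2 and only appears once k is raised to 3.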
Experiment - 5.2. Dataset Balancing
The dataset used has around 30k observations in the training set and around 7k in the test set.
The Python package imbalanced-learn was used. It applies oversampling to the minority class.
Table 2: Proportion of classes
Dataset  Class           Proportion
Train    Suspicious      0.498627
Train    Non-suspicious  0.501373
Test     Suspicious      0.500343
Test     Non-suspicious  0.499657
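The slides do not say which imbalanced-learn oversampler was applied. As one possibility, random oversampling (the strategy of imbalanced-learn's `RandomOverSampler`; `SMOTE` is a common alternative that synthesizes new points instead of copying existing ones) duplicates minority-class rows drawn with replacement until the classes have the same size. A stdlib sketch with hypothetical data:

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Append randomly chosen copies (drawn with replacement) of the rows of every
    smaller class until all classes match the size of the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        members = [row for row, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced sample: 8 non-suspicious vs 2 suspicious observations.
rows = [(i, i % 3) for i in range(10)]
labels = ["non-suspicious"] * 8 + ["suspicious"] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))  # Counter({'non-suspicious': 8, 'suspicious': 8})
```

Because the added rows are exact copies, random oversampling changes class proportions without creating new information, which is why the resulting proportions sit at roughly 0.5 as in the table.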
Experiment - 6. Analysis of Results
The metrics obtained by the selected ML algorithms are summarized in the table below:
Table 3: Classification Metrics Comparison
Model          Class. Accuracy  Sensitivity  AUC
Decision Tree  0.9989           0.9979       0.9994
Random Forest  0.9619           0.9752       0.9974
The Random Forest was selected because it distributed weight across more of the network metrics while still achieving a high accuracy.
Experiment - 6. Analysis of Results
The weight (importance) that the Random Forest gives to each feature is presented in this bar chart.
Experiment - 6. Analysis of Results
Table 4: Classification Metrics - Random Forest
Metric                   Value
Classification accuracy  0.9619
Classification error     0.0380
Sensitivity              0.9752
Specificity              0.9487
False positive rate      0.0512
Precision                0.9502
AUC                      0.9974
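Because the test set is almost exactly balanced, the figures in Table 4 can be cross-checked against each other. Taking Suspicious as the positive class (an assumption) with proportion p = 0.500343 from Table 2, accuracy is the class-weighted mean of sensitivity and specificity, the error and false positive rate are complements, and precision follows from Bayes' rule:

```latex
\begin{aligned}
\text{Accuracy} &= p\,\text{Sens} + (1-p)\,\text{Spec}
  = 0.500343(0.9752) + 0.499657(0.9487) \approx 0.9620 \\
\text{Error} &= 1 - \text{Accuracy} \approx 0.0380 \\
\text{FPR} &= 1 - \text{Spec} = 0.0513 \\
\text{Precision} &= \frac{p\,\text{Sens}}{p\,\text{Sens} + (1-p)\,\text{FPR}}
  \approx \frac{0.4879}{0.4879 + 0.0256} \approx 0.950
\end{aligned}
```

These values agree with Table 4 to within one unit in the last reported digit, so the reported metrics are mutually consistent.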
Experiment - 6. Analysis of Results
ROC Curve obtained:
Experiment - 7. Limitations
Studying more known cases of fraud within the Bitcoin blockchain could expand the set of known fraudulent transaction patterns.
Having more data would also help prevent overfitting in the decision trees, as a tree would no longer be able to fit every training observation exactly.
Project Budget
A summary of the project budget is presented in the table.
Cost            Total (€)
Direct Costs    8,827.5
Indirect Costs  882.75
Total Costs     9,710.25
Profit (10%)    971.025
Cost + Profit   10,681.275
IVA (21%)       2,243.06
TOTAL + IVA     12,924.343
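The budget follows a simple chain: indirect costs equal 10% of direct costs (882.75 / 8,827.5, inferred from the table), profit is 10% of total costs and IVA is 21% of cost plus profit. A short sketch with Python's `decimal` module reproduces the chain exactly; the function name and parameters are illustrative. The exact IVA and final total come out as 2,243.06775 and 12,924.34275, of which the table shows shortened forms.

```python
from decimal import Decimal

def budget(direct_costs, indirect_rate="0.10", profit_rate="0.10", vat_rate="0.21"):
    """Apply indirect costs, profit and IVA (VAT) as percentages of the running total.

    Decimal avoids binary floating-point rounding in currency arithmetic.
    """
    direct = Decimal(direct_costs)
    indirect = direct * Decimal(indirect_rate)
    total = direct + indirect
    profit = total * Decimal(profit_rate)
    with_profit = total + profit
    vat = with_profit * Decimal(vat_rate)
    return {
        "indirect": indirect,
        "total": total,
        "profit": profit,
        "with_profit": with_profit,
        "vat": vat,
        "grand_total": with_profit + vat,
    }

for name, value in budget("8827.50").items():
    print(f"{name:>12}: {value}")
```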
Project Plan
Legal Framework and socio-economic environment
Legal Framework: The Bitcoin blockchain data is now available for exploration in BigQuery through Google Cloud services. The data is public and no license is required.
Socio-economic environment: Blockchain technology is evolving rapidly and is expected to be widely used in the finance world in the coming years.
According to some forecasts, 10% of world GDP will be stored in blockchains by 2020.
The IoT era is also driving the Fintech revolution.
This creates the challenge of developing and applying new sets of techniques to detect fraud on these digital platforms.
Conclusions
1 Business: Detecting and flagging activity suspected of fraud before it actually takes place could save billions annually in both developed and developing economies.
2 Technical: The proposed system can flag suspicious blockchain transactions with high accuracy (0.9619 for the Random Forest on the test set), using network metrics obtained by modeling the giant components of the transaction networks.
3 Personal: Learning about a growing sector (Fintech) that combines finance and technology, and about how analytic techniques can be applied to it.
Future works
1 Create a software platform that integrates both the R and Python environments.
2 This platform could run continuously and, through a UI, flag whenever the model classifies a new observation as Suspicious.
3 Knowing more patterns of fraudulent transactions can help avoid overfitting in the models.
4 Try other network metrics (such as mean neighbour degree or node correlation similarity) as features for the classification model.
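As an example of one of these candidate features, the mean neighbour degree of a node is the average degree of the nodes it is connected to; networkx provides it as `average_neighbor_degree`. A dependency-free sketch for an undirected adjacency dict, using a hypothetical star-shaped graph:

```python
def average_neighbor_degree(adj):
    """Mean degree of each node's neighbours in an undirected graph (0.0 if isolated)."""
    return {
        v: sum(len(adj[w]) for w in nbrs) / len(nbrs) if nbrs else 0.0
        for v, nbrs in adj.items()
    }

# Hypothetical star: a hub address connected to three leaf addresses.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(average_neighbor_degree(star))  # {'hub': 1.0, 'a': 3.0, 'b': 3.0, 'c': 3.0}
```

A leaf attached to a busy hub gets a high value even though its own degree is 1, so the feature captures the kind of connections an address has, not only how many.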
Thank you for your attention

 
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
 
Systems and Software Architecture: an introduction to architectural modelling
Systems and Software Architecture: an introduction to architectural modellingSystems and Software Architecture: an introduction to architectural modelling
Systems and Software Architecture: an introduction to architectural modelling
 
News headline generation with sentiment and patterns: A case study of sports ...
News headline generation with sentiment and patterns: A case study of sports ...News headline generation with sentiment and patterns: A case study of sports ...
News headline generation with sentiment and patterns: A case study of sports ...
 
Blockchain y la industria musical
Blockchain y la industria musicalBlockchain y la industria musical
Blockchain y la industria musical
 
Preparing your Big Data start-up pitch
Preparing your Big Data start-up pitchPreparing your Big Data start-up pitch
Preparing your Big Data start-up pitch
 
Internet of Things (IoT) in a nutshell
Internet of Things (IoT) in a nutshellInternet of Things (IoT) in a nutshell
Internet of Things (IoT) in a nutshell
 

Recently uploaded

Bridging the gap: Online job postings, survey data and the assessment of job ...
Bridging the gap: Online job postings, survey data and the assessment of job ...Bridging the gap: Online job postings, survey data and the assessment of job ...
Bridging the gap: Online job postings, survey data and the assessment of job ...
Labour Market Information Council | Conseil de l’information sur le marché du travail
 
Tumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdf
Tumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdfTumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdf
Tumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdf
Henry Tapper
 
1.2 Business Ideas Business Ideas Busine
1.2 Business Ideas Business Ideas Busine1.2 Business Ideas Business Ideas Busine
1.2 Business Ideas Business Ideas Busine
Lawrence101
 
一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查
一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查
一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查
taqyea
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
egoetzinger
 
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptxSWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
Godwin Emmanuel Oyedokun MBA MSc PhD FCA FCTI FCNA CFE FFAR
 
Money20/20 and EU Networking Event of 20/24!
Money20/20 and EU Networking Event of 20/24!Money20/20 and EU Networking Event of 20/24!
Money20/20 and EU Networking Event of 20/24!
FinTech Belgium
 
Instant Issue Debit Cards
Instant Issue Debit CardsInstant Issue Debit Cards
Instant Issue Debit Cards
egoetzinger
 
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
qntjwn68
 
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
bbeucd
 
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
obyzuk
 
5 Tips for Creating Standard Financial Reports
5 Tips for Creating Standard Financial Reports5 Tips for Creating Standard Financial Reports
5 Tips for Creating Standard Financial Reports
EasyReports
 
快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样
快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样
快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样
rlo9fxi
 
Economic Risk Factor Update: June 2024 [SlideShare]
Economic Risk Factor Update: June 2024 [SlideShare]Economic Risk Factor Update: June 2024 [SlideShare]
Economic Risk Factor Update: June 2024 [SlideShare]
Commonwealth
 
Who Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdfWho Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdf
Price Vision
 
一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理
一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理
一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理
otogas
 
快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样
快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样
快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样
5spllj1l
 
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
sameer shah
 
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt FinancingHow Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
Vighnesh Shashtri
 
BONKMILLON Unleashes Its Bonkers Potential on Solana.pdf
BONKMILLON Unleashes Its Bonkers Potential on Solana.pdfBONKMILLON Unleashes Its Bonkers Potential on Solana.pdf
BONKMILLON Unleashes Its Bonkers Potential on Solana.pdf
coingabbar
 

Recently uploaded (20)

Bridging the gap: Online job postings, survey data and the assessment of job ...
Bridging the gap: Online job postings, survey data and the assessment of job ...Bridging the gap: Online job postings, survey data and the assessment of job ...
Bridging the gap: Online job postings, survey data and the assessment of job ...
 
Tumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdf
Tumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdfTumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdf
Tumelo-deep-dive-into-pass-through-voting-Feb23 (1).pdf
 
1.2 Business Ideas Business Ideas Busine
1.2 Business Ideas Business Ideas Busine1.2 Business Ideas Business Ideas Busine
1.2 Business Ideas Business Ideas Busine
 
一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查
一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查
一比一原版美国新罕布什尔大学(unh)毕业证学历认证真实可查
 
Instant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School SpiritInstant Issue Debit Cards - High School Spirit
Instant Issue Debit Cards - High School Spirit
 
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptxSWAIAP Fraud Risk Mitigation   Prof Oyedokun.pptx
SWAIAP Fraud Risk Mitigation Prof Oyedokun.pptx
 
Money20/20 and EU Networking Event of 20/24!
Money20/20 and EU Networking Event of 20/24!Money20/20 and EU Networking Event of 20/24!
Money20/20 and EU Networking Event of 20/24!
 
Instant Issue Debit Cards
Instant Issue Debit CardsInstant Issue Debit Cards
Instant Issue Debit Cards
 
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
1:1制作加拿大麦吉尔大学毕业证硕士学历证书原版一模一样
 
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB毕业证)圣芭芭拉分校毕业证如何办理
 
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
一比一原版(GWU,GW毕业证)加利福尼亚大学|尔湾分校毕业证如何办理
 
5 Tips for Creating Standard Financial Reports
5 Tips for Creating Standard Financial Reports5 Tips for Creating Standard Financial Reports
5 Tips for Creating Standard Financial Reports
 
快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样
快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样
快速制作美国迈阿密大学牛津分校毕业证文凭证书英文原版一模一样
 
Economic Risk Factor Update: June 2024 [SlideShare]
Economic Risk Factor Update: June 2024 [SlideShare]Economic Risk Factor Update: June 2024 [SlideShare]
Economic Risk Factor Update: June 2024 [SlideShare]
 
Who Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdfWho Is the Largest Producer of Soybean in India Now.pdf
Who Is the Largest Producer of Soybean in India Now.pdf
 
一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理
一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理
一比一原版(UCL毕业证)伦敦大学|学院毕业证如何办理
 
快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样
快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样
快速办理(SMU毕业证书)南卫理公会大学毕业证毕业完成信一模一样
 
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
STREETONOMICS: Exploring the Uncharted Territories of Informal Markets throug...
 
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt FinancingHow Non-Banking Financial Companies Empower Startups With Venture Debt Financing
How Non-Banking Financial Companies Empower Startups With Venture Debt Financing
 
BONKMILLON Unleashes Its Bonkers Potential on Solana.pdf
BONKMILLON Unleashes Its Bonkers Potential on Solana.pdfBONKMILLON Unleashes Its Bonkers Potential on Solana.pdf
BONKMILLON Unleashes Its Bonkers Potential on Solana.pdf
 

Detection of fraud in financial blockchain-based transactions through big data analytics

  • 1. Detection of fraud in financial blockchain-based transactions through big data analytics
    Jessica Páez Bonilla
    Director: Jose Maria Álvarez Rodríguez
    Universidad Carlos III de Madrid
    Master in Big Data Analytics, 2017-2018
    July 11, 2018
  • 2. Overview
    1 Introduction
    2 Project Objectives
    3 System Design
    4 Implementation
    5 Experiment
    6 Project Budget and Plan
    7 Legal Framework and socio-economic environment
    8 Conclusions and Future works
  • 3. Introduction
    Using analytical techniques (data gathering, preprocessing, and model building), it may be possible to detect and prevent financial fraud.
    The aim is to describe complex fraud in terms of patterns suitable for system-driven detection and analysis.
    Network analysis can provide useful insight into large datasets based on the interconnectedness of the agents in the network being analyzed.
  • 4. Introduction
    Network: shows the relationships among blockchain users and the flow of money, which enables the discovery of fraud patterns.
    Network graph analysis offers a method for capturing the context of fraud in a standard, machine-readable and transferable format.
    Associations learned from visually observing fraudulent transactions could be used as knowledge input to build analytical models.
  • 5. Project Objectives
    1 Research the techniques used for fraud detection and explore blockchain data.
    2 Design a system that takes into account the patterns surrounding fraudulent transactions.
    3 Implement the system using big data analytics tools such as R and Python.
    4 Experiment with and validate the designed system.
  • 6. System Overview
  • 7. System Design - Network Metrics
    Metric        Interpretation
    Degree        Influence on the network (number of direct connections)
    Closeness     How quickly a node can reach the other nodes in the network
    Betweenness   Node location: does the node lie on the shortest paths between other nodes?
    Density       Level of linkage among the nodes (existing edges / possible edges)
    Modularity    How strongly the network divides into communities (modules)
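    To make the interpretations in the table concrete, the sketch below computes degree, density, closeness and betweenness in pure Python on an undirected graph stored as a dict mapping each node (for example a Bitcoin address) to the set of its neighbours. The representation and all function names are hypothetical illustrations, not the project's code; the project used igraph and networkx, where `nx.degree_centrality`, `nx.closeness_centrality`, `nx.betweenness_centrality` and `nx.density` provide these metrics ready-made (the centrality functions also normalise degree and betweenness by default, which this sketch does not).

```python
from collections import deque


def degree(graph, node):
    """Number of direct neighbours of `node` (undirected graph)."""
    return len(graph[node])


def density(graph):
    """Existing edges over possible edges, 2m / (n(n-1)), for a simple undirected graph."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def bfs_distances(graph, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(graph, node):
    """(reachable nodes - 1) / sum of distances to them; higher means faster access."""
    dist = bfs_distances(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def betweenness(graph):
    """Unnormalised betweenness for every node, using Brandes' algorithm."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # accumulate dependencies, farthest nodes first
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # in an undirected graph every pair was counted once from each endpoint
    return {v: c / 2 for v, c in score.items()}
```

    On the path a-b-c-d, for example, the inner nodes b and c each get betweenness 2 and closeness 0.75, while the end nodes get 0 and 0.5.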
  • 8. Implementation - Technology used
    BigQuery, R (igraph) and Python were used to develop this system.
    Table 1: Versions of the Python packages used
    Package       Version
    matplotlib    1.5.1
    pandas        0.19.2
    networkx      1.11
    community     0.9
    numpy         1.11.3
    scipy         0.18.1
  • 9. Experiment - Steps
    1 Data Exploration.
    2 Network metrics and extraction of communities.
    3 Features and ML algorithms selection.
    4 Performance Measures.
    5 Execution.
    6 Analysis of Results.
    7 Experiment Limitations.
  • 10. Experiment - 1. Data Exploration
    The Bitcoin blockchain data was explored using BigQuery. A data segment containing fraudulent movements was chosen as the sample for analysis in this project.
    Figure 1: Blocks over time
    Figure 2: Transactions in the sample
  • 11. Experiment - 2. Network Metrics and extraction of communities
    Communities:
    1 Network modeling
    2 Clustering
    3 Giant Component
    Figure 3: Communities extraction
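    The network modeling and giant-component steps, together with the modularity score used to judge a clustering, can be sketched in pure Python as follows. It assumes each transaction contributes an undirected, unweighted edge between a sender and a receiver address; that modeling choice and all names are hypothetical. The clustering itself is not reproduced: the slides list the `community` package (python-louvain), whose Louvain method searches for a partition with high modularity, so here the partition is simply passed in by hand.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected address graph from (sender, receiver) pairs; self-loops are dropped."""
    graph = defaultdict(set)
    for u, v in edges:
        if u != v:
            graph[u].add(v)
            graph[v].add(u)
    return dict(graph)


def giant_component(graph):
    """Node set of the largest connected component (found by BFS)."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def induced_subgraph(graph, nodes):
    """Keep only `nodes` and the edges among them."""
    return {u: graph[u] & nodes for u in nodes}


def modularity(graph, communities):
    """Newman modularity Q = sum over communities of L_c/m - (d_c / 2m)^2."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in graph[u] if v in comm) / 2  # L_c
        total_degree = sum(len(graph[u]) for u in comm)  # d_c
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q
```

    For two triangles joined by a single edge, splitting the giant component into the two triangles gives Q = 6/7 - 1/2 ≈ 0.357, a clearly modular structure.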
  • 12. Experiment - 3. Features and ML algorithms selection
    Figure 4: Selected features
    ML Algorithms:
    1 Decision Tree
      1 White-box model: it can be interpreted.
      2 Performs well on imbalanced datasets.
    2 Random Forest
      1 Ensemble: combines the predictions of several base estimators to improve robustness over a single estimator.
      2 Each tree in the ensemble is built from a sample drawn with replacement (a bootstrap sample) from the training set.
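    The slides do not show the training code. To illustrate the "sample drawn with replacement" idea, the sketch below bags decision stumps (one-split trees) in pure Python and combines them by majority vote. It is a simplified stand-in, not a full random forest: real forests grow deeper trees and also pick a random subset of features at each split (as scikit-learn's `RandomForestClassifier` does). Labels are assumed to be 0/1, and all names are hypothetical.

```python
import random
from collections import Counter


def predict_stump(stump, row):
    """Apply a (feature, threshold, polarity, majority) rule to one row."""
    f, t, polarity, majority = stump
    if f is None:  # constant rule
        return majority
    return 1 if polarity * (row[f] - t) > 0 else 0


def fit_stump(X, y):
    """Single-split rule with the fewest training errors (constant rule as baseline)."""
    majority = Counter(y).most_common(1)[0][0]
    best_errors = sum(label != majority for label in y)
    best = (None, None, None, majority)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for polarity in (1, -1):
                stump = (f, t, polarity, majority)
                errors = sum(predict_stump(stump, row) != label for row, label in zip(X, y))
                if errors < best_errors:
                    best_errors, best = errors, stump
    return best


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Fit each stump on a bootstrap sample: n rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_ensemble(ensemble, row):
    """Majority vote over the stumps (an odd n_estimators avoids ties for 0/1 labels)."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]
```

    Because each stump sees a slightly different bootstrap sample, individual stumps can misplace the boundary, while the vote is more stable; that variance reduction is the robustness gain the slide refers to.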
  • 13. Experiment - 4. Performance Measures
    Classification accuracy: the percentage of correct predictions.
    Confusion matrix: a 2x2 matrix that shows the types of errors the classifier is making.
    AUC (Area Under the ROC Curve): a single-number summary of classifier performance, useful even when there is class imbalance.
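    The three measures can be computed in a few lines of pure Python, as sketched below; scikit-learn offers the same through `accuracy_score`, `confusion_matrix` and `roc_auc_score`. The function names here are illustrative, labels are assumed to be 0/1, and the AUC uses its rank interpretation: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative.

```python
def confusion_matrix(y_true, y_pred):
    """2x2 counts as [[TN, FP], [FN, TP]]: rows are the true class, columns the prediction."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    The pairwise AUC costs O(P x N) comparisons, which is fine for illustration; library implementations sort the scores instead. Note that AUC is threshold-free, whereas accuracy and the confusion matrix depend on the cut-off used to turn scores into labels.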
  • 14. Experiment - 5. Execution
    Once the features (transaction network metrics) are obtained, and the ML algorithms and their performance metrics are defined, two main tasks must be carried out before fitting the system:
    Observations labeling: analysis of a real fraudulent transaction.
    Dataset balancing: once the dataset was labeled, one class had many more observations than the other, so an oversampling technique was applied to balance it.
  • 15. Experiment - 5.1. Analysis of a fraudulent transaction
    Figure 5: Fraudster neighbours
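    The slides show the fraudster's neighbourhood but do not state the exact labeling rule. One possible reading, offered purely as a hypothetical illustration, is to mark the known fraudulent address and every address within k hops of it as suspicious and everything else as non-suspicious. The sketch below implements that assumed rule with a depth-limited BFS over a dict-of-neighbour-sets graph; `k` and the function names are assumptions, not the author's parameters.

```python
from collections import deque


def neighbours_within(graph, source, k):
    """Nodes reachable from `source` in at most k hops, excluding `source` itself."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:  # do not expand beyond the k-hop frontier
            continue
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist) - {source}


def label_suspicious(graph, fraudster, k=1):
    """HYPOTHETICAL rule: the fraudster and its k-hop neighbours -> 1, all others -> 0."""
    flagged = neighbours_within(graph, fraudster, k) | {fraudster}
    return {node: int(node in flagged) for node in graph}
```

    A rule of this kind naturally yields few positives relative to the rest of the network, which is consistent with the class imbalance the next step has to correct.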
  • 16. Experiment - 5.2. Dataset Balancing
    The dataset used has around 30k observations in the training set and around 7k in the test set. The Python package imbalanced-learn was used to oversample the minority class.
    Table 2: Proportion of classes
    Dataset   Class            Proportion
    Train     Suspicious       0.498627
    Train     Non-suspicious   0.501373
    Test      Suspicious       0.500343
    Test      Non-suspicious   0.499657
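    The slide does not say which imbalanced-learn sampler was used. The simplest option is random oversampling, which imbalanced-learn exposes as `RandomOverSampler` (with `fit_resample` in recent releases); SMOTE is the usual alternative that synthesises new points instead of copying rows. The pure-Python sketch below shows the random variant; the function name and seed handling are illustrative assumptions.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Append randomly chosen copies of minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)  # the original rows are kept, in order
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out
```

    Oversampling is usually applied to the training split only, after splitting, so that copies of the same row cannot end up in both the training and the test set.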
  • 17. Experiment - 6. Analysis of Results
    The metrics obtained for the selected ML algorithms are summarized in the table below:
    Table 3: Classification Metrics Comparison
    Model           Classification accuracy   Sensitivity   AUC
    Decision Tree   0.9989                    0.9979        0.9994
    Random Forest   0.9619                    0.9752        0.9974
    Random Forest was selected because it gave more weight to the different network metrics while still achieving high accuracy.
  • 18. Experiment - 6. Analysis of Results
    The bar chart shows the weight (feature importance) that the Random Forest assigns to each feature.
  • 19. Experiment - 6. Analysis of Results
    Table 4: Classification Metrics - Random Forest
    Metric                    Value
    Classification accuracy   0.9619
    Classification error      0.0380
    Sensitivity               0.9752
    Specificity               0.9487
    False positive rate       0.0512
    Precision                 0.9502
    AUC                       0.9974
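    The rows of Table 4 are linked by the standard confusion-matrix definitions below, taking Suspicious as the positive class (an assumption, since the slides do not name it). Writing accuracy in terms of the test-set prevalence of Suspicious, π ≈ 0.5003 from Table 2, gives a quick consistency check on the reported values.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP+FN}, \qquad
\text{Specificity} = \frac{TN}{TN+FP}, \qquad
\text{Precision} = \frac{TP}{TP+FP},\\
\text{FPR} &= \frac{FP}{FP+TN} = 1 - \text{Specificity} = 1 - 0.9487 = 0.0513 \approx 0.0512,\\
\text{Accuracy} &= \frac{TP+TN}{TP+TN+FP+FN}
  = \pi\,\text{Sensitivity} + (1-\pi)\,\text{Specificity}\\
  &\approx 0.5003 \cdot 0.9752 + 0.4997 \cdot 0.9487 \approx 0.962 \approx 0.9619,\\
\text{Error} &= 1 - \text{Accuracy} \approx 0.038.
\end{aligned}
```

    The small differences in the last digit come from the rounding of the reported inputs.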
  • 20. Experiment - 6. Analysis of Results
    ROC curve obtained.
  • 21. Experiment - 7. Limitations
    By studying more known cases of fraud within the Bitcoin blockchain, it could be possible to expand the set of known fraudulent transaction patterns.
    More data would also help prevent overfitting in the decision trees, since a tree would no longer be able to fit all of the training data.
  • 22. Project Budget
    A summary of the project budget is presented in the table.
    Cost               Total (€)
    Direct Costs        8,827.5
    Indirect Costs        882.75
    Total Costs         9,710.25
    Profit (10%)          971.025
    Cost + Profit      10,681.275
    IVA (21%)           2,243.068
    TOTAL + IVA        12,924.343
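    The budget lines follow a simple cost chain, sketched below so the totals can be recomputed from the direct costs. The 10% rate for indirect costs is inferred from the figures (882.75 is 10% of 8,827.5) rather than stated on the slide; the profit and IVA rates are the ones shown. The function name and keys are illustrative.

```python
def project_budget(direct, overhead=0.10, profit=0.10, vat=0.21):
    """Recompute the budget table from the direct costs."""
    indirect = direct * overhead  # inferred 10% indirect costs
    total = direct + indirect
    profit_amount = total * profit
    subtotal = total + profit_amount  # "Cost + Profit"
    vat_amount = subtotal * vat  # IVA is applied to cost + profit
    return {
        "indirect": indirect,
        "total": total,
        "profit": profit_amount,
        "cost_plus_profit": subtotal,
        "vat": vat_amount,
        "total_with_vat": subtotal + vat_amount,
    }
```

    With `direct=8827.5` this yields an IVA of 2,243.06775 and a grand total of 12,924.34275, i.e. 2,243.068 and 12,924.343 at three decimals.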
  • 23. Project Planning
  • 24. Legal Framework and socio-economic environment
    Legal framework: the Bitcoin blockchain data is now available for exploration in BigQuery, through Google Cloud services. The data is public and no licensing is required.
    Socio-economic environment: blockchain technology is evolving rapidly and is expected to be widely used in finance in the coming years; it has been forecast that 10% of world GDP will be stored in blockchains by 2020. The IoT era is also driving the Fintech revolution. This creates the challenge of developing and applying new sets of techniques to detect fraud on these digital platforms.
  • 25. Conclusions
    1 Business: detecting and flagging activity suspected of fraud before it actually takes place could save billions annually in both developed and developing economies.
    2 Technical: the proposed system can flag a suspicious blockchain transaction with high accuracy, using network metrics obtained by modeling the giant components of the transaction network.
    3 Personal: learning about a growing sector (Fintech) that combines finance and technology, and about how analytic techniques can be applied to it.
  • 26. Future works
    1 Create a software platform that can access and integrate both the R and Python environments.
    2 This platform could run continuously and flag, through a UI, whenever the model classifies a new observation as Suspicious.
    3 Knowing more patterns of fraudulent transactions can help to avoid overfitting in the models.
    4 Try other network metrics (such as mean neighbour degree, node correlation similarity, etc.) as features for the classification model.
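    As an example of the additional metrics suggested in item 4, mean neighbour degree averages the degrees of a node's neighbours, which separates a hub surrounded by leaves from a leaf attached to a hub. The pure-Python sketch below assumes the same hypothetical dict-of-neighbour-sets graph used in the earlier sketches; networkx provides an equivalent as `average_neighbor_degree`.

```python
def mean_neighbour_degree(graph):
    """For each node, the average degree of its neighbours (0.0 for isolated nodes)."""
    return {
        u: sum(len(graph[v]) for v in nbrs) / len(nbrs) if nbrs else 0.0
        for u, nbrs in graph.items()
    }
```

    In a star with centre c and three leaves, the centre scores 1.0 (all its neighbours have degree 1) while each leaf scores 3.0, so the metric captures a node's position relative to hubs that plain degree does not.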
  • 27. Thank you for your attention