COMPARATIVE STUDY OF VARIOUS APPROACHES
FOR TRANSACTION FRAUD DETECTION USING
DATA MINING
Submitted By
PRATIBHA SINGH
M.Tech(CS)/4505/14
Guided By :
Dr. B. B. SAGAR (Asst. Prof., Dept. of CSE)
Mrs. S. MALLIKA (Asst. Prof., Dept. of CSE)
CONTENTS
• AIM
• INTRODUCTION
• LITERATURE SURVEY
• DEFINITION OF DATA MINING
• ORIGIN OF DATA MINING
• DATA MINING PROCESS
• APPROACHES TO USING DATA MINING
• DATA MINING TASKS
• WHY PREPROCESSING OF DATA IS REQUIRED
• DATA MINING TECHNIQUES AND TYPES OF FRAUD
• DESCRIPTION OF THE DATASET USED IN OUR STUDY
• IMPLEMENTATION OF NN, LR AND KNN
• PERFORMANCE EVALUATION OF VARIOUS MODELS
• RESULT
• CONCLUSION
• REFERENCES
• PAPER PUBLISHED
AIM
Our aim is to compare three predictive data-mining techniques (Neural Network, Logistic Regression, and K-Nearest Neighbour) on a dataset taken from a large Brazilian bank, with records in the time window from 14 July 2004 through 12 September 2004. Each record represents a credit card authorization; the dataset contains only approved transactions and excludes denied ones. The data-mining predictions are simulated in the R console, and the results are visualized as ROC curves, lift charts, PR curves, confusion matrices, and other skill scores.
INTRODUCTION
• Fraud detection is the process of identifying fraudulent activity, here through data mining and machine learning approaches.
• In this research we compare several approaches to transaction fraud detection based on data mining and machine learning techniques. The comparison shows the results of each technique when applied to a real dataset and measured against a common set of parameters. These results show how accurately each method works on real data. Our aim is to find the most suitable method: one that helps catch fraud, is cost sensitive, and reduces the rate of false predictions.
LITERATURE SURVEY
• In paper [5], S. Bhattacharyya et al. carried out a comparative study of several approaches on a single synthetic dataset and analysed the results of logistic regression, random forest, and SVM. The results show that LR performed best in that study.
• I took paper [5] as the base paper and carried out a comparative study of LR, NN, and KNN on a real dataset, which should be helpful for further research.
DATA MINING
• The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets.
• The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
TERMS
• Data
• Pattern
• Attribute
• Interestingness
ORIGINS OF DATA MINING
[Diagram: Data Mining draws on AI / Machine Learning, Statistics, and Database systems]
KNOWLEDGE DISCOVERY
APPROACH TO USE DATA MINING
• Identify the problem
• Use data mining techniques to transform the data into information
• Act on the information
• Measure the results
General Approach
• Understand the domain
• Create a dataset
• Select the interesting attributes
• Data cleaning and preprocessing
• Choose the data mining task and the specific algorithm
• Interpret the results
Data Mining Tasks
• Classification: learning a function that maps an item into one of a set of predefined classes.
• Regression: learning a function that maps an item to a real value, or an independent variable to a dependent variable.
• Clustering: identifying a set of groups of similar items.
• Dependencies and associations: identifying significant dependencies between data attributes.
• Summarization: finding a compact description of the dataset or a subset of the dataset.
Why is Preprocessing of Data Required?
• The available data does not fulfil the input requirements of the data mining process.
• The attributes are not properly defined for the data mining process.
• Incomplete data: missing or unknown attribute values, missing attributes of interest (e.g., customer information for sales transaction data), or only aggregate data.
• Noisy data: containing errors or outliers.
• If the data quality is poor, the results of data mining are also poor and inconsistent.
• Quality decisions must be based on quality data.
So, data cleaning is required.
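As a rough illustration of the cleaning step, the sketch below counts missing values per attribute and drops incomplete records. It is written in Python rather than the R used in the study, and the records, the use of `None` for a missing value, and the helper names are all hypothetical.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"Amount": 120.0, "MCC": "5411", "Flag": "N"},
    {"Amount": None, "MCC": "5812", "Flag": "N"},
    {"Amount": 75.5, "MCC": None, "Flag": "S"},
    {"Amount": 30.0, "MCC": "5411", "Flag": "N"},
]


def missing_counts(rows):
    """Count missing (None) values per attribute."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts


def drop_incomplete(rows):
    """Keep only the rows in which every attribute has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


counts = missing_counts(records)
clean = drop_incomplete(records)
print(counts)
print(len(clean), "complete records out of", len(records))
```

Dropping rows is only one option; imputing a value may be preferable when incomplete records are too numerous to discard.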
DATA MINING TECHNIQUES
• Classification
• Regression
• Clustering
• Visualisation
• Outlier detection
• Prediction
TYPES OF FRAUD
DESCRIPTION OF THE DATASET TAKEN FOR THE STUDY
Header attributes:
Symbol  Attribute          Description
A       MCC                Merchant Category Code
B       MCC_P              Merchant Category Code from the previous transaction of the same credit card
C       Post_zip_code      Post/zip code
D       Post_zip_code_P    Post/zip code from the previous transaction
E       Amount             Amount of money of the current transaction
F       Amount_P           Amount of money of the previous transaction
G       Type_T             Type of transaction: card present (normal transaction), Internet, telephone, direct debit, etc.
H       Credit_limit       Credit limit of the account
I       Brand_scheme       Brand/scheme: Visa, MasterCard, Diners, JCB, etc.
J       Variant            Variant: Local, International, Gold, Platinum
K       F_score            Fraud score of the previous transaction (a particularly important attribute)
L       No_of_instalments  Number of instalments of the current transaction
M       Time_Last_T        Time in minutes since the last transaction
N       Diff_F_Score       Difference between the fraud score of the previous transaction and the fraud score of the one before it
O       M_T_Limit          Merchant transaction limit: the maximum amount of money allowed for a transaction in that specific type of business
P       Flag               Fraud transaction flag (N = no fraud, S = fraud)
NUMBER OF FRAUDULENT AND LEGITIMATE TRANSACTIONS IN EACH SPLIT OF THE DATASET
FRAUD DETECTION METHOD
Structure of the dataset:
Attribute            Class
MCC                  33
MCC_P                33
Post_zip_code        10
Post_zip_code_P      10
Amount               10
Amount_P             10
Type_T               10
Credit_limit         10
Brand_scheme         6
Variant              6
F_score              10
VIP_Common           2
Local_International  2
No_of_instalments    4
Time_Last_T          8
Diff_F_Score         6
M_T_Limit            9
Flag                 2
[Workflow diagram: the dataset is split into a training set and a test set; a classifier model (NN, K-NN, or LR) is learned on the training set and validated on the test set using a confusion matrix and ROC, PR, and lift curves]
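The split into a training set and a test set can be sketched as follows. This is a Python illustration, not the study's R script; the 30% test fraction, the seed, and the toy records are assumptions, since the deck does not state how the split was made.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def class_counts(rows, label="Flag"):
    """Count how many rows carry each value of the label attribute."""
    counts = {}
    for row in rows:
        counts[row[label]] = counts.get(row[label], 0) + 1
    return counts


# Hypothetical data: 100 transactions, 10 of them flagged as fraud ("S").
data = [{"Amount": float(i), "Flag": "S" if i % 10 == 0 else "N"}
        for i in range(100)]

train, test = train_test_split(data)
print("train:", class_counts(train))
print("test: ", class_counts(test))
```

Printing the class counts per split makes it easy to check that both splits contain fraudulent and legitimate transactions.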
FRAUD DETECTION USING NEURAL NETWORK
[Diagram: the dataset is split into training and testing sets, which are passed to an R script]
R library: neuralnet
Algorithm: resilient backpropagation
Hidden layer: c(4,2) or 8
Activation function: logistic
Error function: 'sse' (sum of squared errors)
R Code for Implementation of Neural Network
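The original R code is not reproduced in this text. As a rough stand-in, the Python sketch below builds a network with two hidden layers of 4 and 2 units and a logistic activation, then evaluates the sum of squared errors. It covers only the forward pass and the error function; resilient backpropagation training is deliberately left out, and the random weights, toy inputs, and function names are assumptions.

```python
import math
import random


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, hidden=(4, 2), seed=0):
    """Random weights for each layer; every neuron stores its bias first."""
    rng = random.Random(seed)
    sizes = [n_inputs, *hidden, 1]
    return [[[rng.uniform(-1, 1) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]


def forward(network, x):
    """Propagate one input vector through the network; returns a value in (0, 1)."""
    activations = list(x)
    for layer in network:
        activations = [
            logistic(w[0] + sum(wi * ai for wi, ai in zip(w[1:], activations)))
            for w in layer
        ]
    return activations[0]


def sse(network, X, y):
    """Sum of squared errors between network outputs and targets."""
    return sum((forward(network, x) - t) ** 2 for x, t in zip(X, y))


# Hypothetical normalized inputs and 0/1 fraud targets.
X = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
y = [0, 1]
net = init_network(n_inputs=3)
print("untrained SSE:", sse(net, X, y))
```

A training algorithm would adjust the weights to reduce this error; here the weights stay at their random starting values.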
FRAUD DETECTION USING K-NEAREST NEIGHBOUR
[Diagram: the dataset is normalized and split into trainset and testset; the categorical class supplies the trainset target and testset target; the R script selects the best K]
Normalization: (x - min(x)) / (max(x) - min(x))
R library: class
Algorithm: knn
k = 1, 2, 3, 4, 5
R CODE FOR IMPLEMENTATION OF K-NEAREST NEIGHBOR
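The original R code is not reproduced in this text. The Python sketch below illustrates the same steps under stated assumptions: min-max normalization with (x - min(x)) / (max(x) - min(x)), a majority vote among the k nearest neighbours, and a search for the best k. The toy data are hypothetical, the test rows are scaled with the training minima and maxima (a choice the deck does not specify), and k is tried only from 1 to 3 because the toy training set is tiny.

```python
import math
from collections import Counter


def column_ranges(rows):
    """Per-column minimum and maximum of a list of numeric rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(row, lows, highs):
    """Min-max scaling: (x - min(x)) / (max(x) - min(x))."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training points closest to the query."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k):
    hits = sum(knn_predict(train_X, train_y, x, k) == t
               for x, t in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical (amount, minutes since last transaction) rows.
raw_train = [[10, 1], [20, 2], [200, 30], [220, 28]]
train_y = ["N", "N", "S", "S"]
raw_test = [[210, 29], [15, 1]]
test_y = ["S", "N"]

lows, highs = column_ranges(raw_train)
train_X = [normalize(r, lows, highs) for r in raw_train]
test_X = [normalize(r, lows, highs) for r in raw_test]

best_k = max(range(1, 4),
             key=lambda k: accuracy(train_X, train_y, test_X, test_y, k))
print("best k:", best_k)
```

Without normalization, the attribute with the largest numeric range would dominate the distance, which is why the scaling step comes first.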
FRAUD DETECTION USING LOGISTIC REGRESSION
[Diagram: the dataset is split into training and testing sets, which are passed to an R script]
R library: linear model (built-in)
Function: glm
Family: binomial
R CODE FOR IMPLEMENTATION OF LOGISTIC REGRESSION
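The original R code is not reproduced in this text. As an illustration of the model itself, the Python sketch below fits a binomial logistic regression by plain batch gradient descent. This is a swapped-in fitting method chosen for brevity, not the procedure R's `glm` uses internally, and the learning rate, epoch count, and toy data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """P(class = 1 | x); weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(X, y, learning_rate=0.5, epochs=2000):
    """Batch gradient descent on the mean negative log-likelihood."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, target in zip(X, y):
            error = predict_proba(weights, x) - target
            gradient[0] += error
            for j, value in enumerate(x, start=1):
                gradient[j] += error * value
        weights = [w - learning_rate * g / n
                   for w, g in zip(weights, gradient)]
    return weights


# Hypothetical one-feature data: a higher score means more likely fraud (1).
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]

weights = fit_logistic(X, y)
print("P(fraud | 0.1) =", round(predict_proba(weights, [0.1]), 3))
print("P(fraud | 0.9) =", round(predict_proba(weights, [0.9]), 3))
```

The predicted probabilities can be thresholded to get a class label, or used directly as scores for ROC, PR, and lift curves.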
FRAUD DETECTION MODEL VALIDATION
ROC Analysis
• ROC stands for "Receiver Operating Characteristic".
• It comes from signal processing: the trade-off between hit rate and false alarm rate over a noisy channel.
• Compute FPR and TPR and plot them in ROC space.
• Every classifier is a point in ROC space.
• For probabilistic algorithms, varying the decision threshold traces a curve in ROC space.
                actual +              actual -
predicted +     TP (true positive)    FP (false positive)
predicted -     FN (false negative)   TN (true negative)
total           TP+FN                 FP+TN
TP Rate (Sensitivity) = TP / (TP + FN)
FP Rate (fall-out) = FP / (FP + TN)
[Figure: example ROC curve with Area Under Curve (AUC) = 0.75]
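To make the ROC construction concrete, the Python sketch below sweeps a threshold over classifier scores, records an (FPR, TPR) point at each step, and integrates the curve with the trapezoid rule to get the AUC. The scores and labels are hypothetical, and the code is an illustration rather than the R routine used in the study.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the threshold score by score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items that share a score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores (higher = more likely positive) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print("AUC =", round(auc(points), 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.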
FRAUD DETECTION MODEL VALIDATION (CONT.)
Confusion Matrix
Recall = TP / (TP + FN)
Positive Predictive Value (PPV) = TP / (TP + FP)
P(TP): % True Positives: Sensitivity
P(FP): % False Positives: 1 - Specificity
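The measures derived from a confusion matrix can be computed directly from its four cells, as in the Python sketch below. The counts are hypothetical and are not taken from the study's results.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard rates derived from the four confusion-matrix cells."""
    return {
        "sensitivity": tp / (tp + fn),   # recall, TP rate
        "fall_out": fp / (fp + tn),      # FP rate = 1 - specificity
        "specificity": tn / (fp + tn),
        "precision": tp / (tp + fp),     # positive predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=90, fn=10, fp=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With heavily imbalanced classes, accuracy alone can look high even when one class is rarely recognized, which is why sensitivity and specificity are reported alongside it.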
PERFORMANCE MEASURE CALCULATED FROM CONFUSION MATRIX OF NEURAL NETWORK (NN), K-NEAREST NEIGHBOUR (KNN) AND LOGISTIC REGRESSION (LR)
Performance Measure   Neural Network   K-NN        LR
Accuracy              96.2 %           97.14 %     96.19 %
AUC                   0.856            0.77        0.86
Execution Time        17 seconds       3 seconds   Instant
Detection rate        96.06 %          95.03 %     95.7 %
Sensitivity           99.7 %           98.7 %      99.4 %
Specificity           4 %              56.7 %      12.2 %
Precision             96.4 %           98.3 %      96.7 %
RMSE                  0.194            0.16        0.17
RSquare               0.14             0.33        0.14
RESULTS (ROC)
[Figure: ROC curves for NN, K-NN, and LR]
RESULTS (LIFT CHART)
Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model.
[Figure: lift charts for NN, K-NN, and LR]
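One common way to compute lift, sketched below in Python, is to rank cases by model score and divide the positive rate in the top-scored fraction by the positive rate in the whole set (the rate expected without a model). The scores, labels, and function name are hypothetical, and the study's R code may compute its lift chart differently.

```python
def lift_at(scores, labels, fraction):
    """Positive rate in the top-scored fraction divided by the overall rate."""
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(ranked[:n_top]) / n_top
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Hypothetical scores and labels: 3 positives among 10 cases.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.5, 1.0):
    print(f"lift at top {int(fraction * 100)}%:",
          round(lift_at(scores, labels, fraction), 2))
```

A lift of 1 means the model does no better than selecting cases at random; the lift always falls to 1 when the whole set is selected.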
RESULTS (PR CHART)
[Figure: PR curves for NN, K-NN, and LR]
CONCLUSION
• We compared the performance of three data mining algorithms. NN has the highest fraud detection rate of the three, together with the lowest specificity, which we take to indicate that NN is a better predictive model than the other two. On execution time and RMSE, however, KNN is better than NN. Overall, KNN and NN performed better than Logistic Regression, but both take noticeable execution time on large data. Better choices of arguments and functions, such as the backpropagation variant and the number of hidden nodes, reduce the execution time for NN; likewise, the value of K and other function choices affect the execution time of the KNN model.
• We believe these results are very promising and supportive of a multi-algorithmic approach to classifying and assessing large, noisy, real-time data sets. Future work will focus on testing the algorithms and resolution strategies on similarly complex data sets from other real-world domains.
REFERENCES
[1] S. Benson Edwin Raj, A. Annie Portia, "Analysis on Credit Card Fraud Detection Methods", IEEE International Conference on Computer, Communication and Electrical Technology, pp. 152-156, 2011.
[2] C. Haruna, S. Abdul-Kareem, A. Abubakar, "A Framework for selecting the optimal technique suitable for application in data mining task", Future Information Technology, pp. 163-169, 2014.
[3] Manoel Fernando Alonso Gadi, Xidi Wang, Alair Pereira do Lago, "Credit card fraud detection with artificial immune system", in ICARIS 2008: 7th International Conference, Phuket, Thailand, August 10-13, 2008, Proceedings, Lecture Notes in Computer Science, vol. 5132, pp. 119-131, Springer, 2008.
[4] E.W.T. Ngai, Yong Hu, Y.H. Wong, Yijun Chen, Xin Sun, "The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature", Elsevier Decision Support Systems, vol. 50, pp. 559-569, 2011.
[5] S. Bhattacharyya, S. Jha, K. Tharakunnel, J.C. Westland, "Data mining for credit card fraud: A comparative study", Elsevier Decision Support Systems, 2011.
[6] Usama M. Fayyad et al., Advances in Knowledge Discovery and Data Mining, Cambridge, Mass.: MIT Press, 1996.
[7] Jun Han, Claudio Moraga, "The influence of the sigmoid function parameters on the speed of backpropagation learning", in José Mira, Francisco Sandoval (eds.), From Natural to Artificial Neural Computation, pp. 195-201, 1995.
[8] Neda S. Halvaiee, M. Kazem Akbari, "A novel model for credit card fraud detection using Artificial Immune Systems", Elsevier Applied Soft Computing, vol. 24, pp. 40-49, 2014.
[9] F. Campos, S. Cavalcante, "An extended approach for Dempster-Shafer theory", in Proceedings of the IEEE International Conference on Information Reuse and Integration, pp. 338-344, 2003.
RESEARCH PAPER PUBLISHED
IEEE CONFERENCE ID: 37465
Comparative study of various approaches for transaction Fraud Detection using Machine Learning Algorithms

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 

Comparative study of various approaches for transaction Fraud Detection using Machine Learning Algorithms

  • 1. COMPARATIVE STUDY OF VARIOUS APPROACHES FOR TRANSACTION FRAUD DETECTION USING DATA MINING. Submitted by: Pratibha Singh, M.Tech (CS)/4505/14. Guided by: Dr. B. B. Sagar (Asst. Prof., Dept. of CSE) and Mrs. S. Mallika (Asst. Prof., Dept. of CSE).
  • 2. CONTENT: Aim; Introduction; Literature survey; Definition of data mining; Origin of data mining; Data mining process; Approaches to using data mining; Data mining tasks; Why preprocessing of data is required; Data mining techniques and types of fraud; Description of the dataset used in our study; Implementation of NN, LR and KNN; Performance evaluation of the various models; Result; Conclusion; References; Paper published.
  • 3. AIM: Our aim is to compare three predictive data mining techniques (Neural Network, Logistic Regression and K-Nearest Neighbour) on a dataset taken from a large Brazilian bank, with records dated from Jul/14/2004 through Sep/12/2004. Each record represents a credit card authorization; only approved transactions are included, and denied transactions are excluded. The data mining predictions are simulated in the R console, and the results are visualized as ROC curves, lift charts, PR curves, confusion matrices and other skill scores.
  • 4. INTRODUCTION: Fraud detection is the process of detecting fraud through various data mining and machine learning approaches. In this research we compare several approaches to transaction fraud detection using data mining and machine learning techniques. The comparison shows the results of these techniques applied to a real dataset and evaluated on a fixed set of parameters. With these results we show how accurately each method works on real data. Our aim is to find the most suitable method: one that helps catch fraud, is cost sensitive and reduces the false alarm rate.
  • 5. LITERATURE SURVEY: In paper [5], S. Bhattacharyya et al. carried out a comparative study of several approaches on one synthetic dataset and analysed the results of logistic regression, random forest and SVM; the results show that LR performed best in that study. I took paper [5] as the base paper and carried out a comparative study of LR, NN and KNN on a real dataset, which should be helpful for further research.
  • 6. DATA MINING. Definition 1: the efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets. Definition 2: the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner. Terms: data, pattern, attribute, interestingness.
  • 7. ORIGINS OF DATA MINING: AI / machine learning, statistics and database systems.
  • 9. APPROACH TO USING DATA MINING
      Approach: (1) Identify the problem. (2) Use data mining techniques to transform the data into information. (3) Act on the information. (4) Measure the results.
      General approach: (1) Understand the domain. (2) Create a dataset. (3) Select the interesting attributes. (4) Data cleaning and preprocessing. (5) Choose the data mining task and the specific algorithm. (6) Interpret the results.
  • 10. DATA MINING TASKS
      - Classification: learning a function that maps an item into one of a set of predefined classes.
      - Regression: learning a function that maps an item to a real value, that is, an independent variable to a dependent variable.
      - Clustering: identifying a set of groups of similar items.
      - Dependencies and associations: identifying significant dependencies between data attributes.
      - Summarization: finding a compact description of the dataset or of a subset of the dataset.
  • 11. WHY IS PREPROCESSING OF DATA REQUIRED?
      - The available data does not fulfil the input requirements of the data mining process.
      - The attributes are not properly defined for the data mining process.
      - Incomplete data: missing or unknown attribute values, attributes of interest that are not available (e.g., customer information for sales transaction data), or only aggregate data.
      - Noisy data: data containing errors or outliers.
      - When data quality is poor, the results of data mining are also poor and unstable. Quality decisions must be based on quality data, so data cleaning is required.
  • 13. DESCRIPTION OF THE DATASET TAKEN FOR THE STUDY (header attributes; columns: Symbol | Attribute | Description)
      A | MCC | Merchant Category Code
      B | MCC_P | Merchant Category Code from the previous transaction of the same credit card
      C | Post_zip_code | Post/zip code
      D | Post_zip_code_P | Post/zip code from the previous transaction
      E | Amount | Amount of money of the current transaction
      F | Amount_P | Amount of money of the previous transaction
      G | Type_T | Type of transaction: card present (normal transaction), Internet, telephone, direct debit, etc.
      H | Credit_limit | Credit limit of the account
      I | Brand_scheme | Brand/scheme: Visa, MasterCard, Diners, JCB, etc.
      J | Variant | Variant: Local, International, Gold, Platinum
      K | F_score | Fraud score of the previous transaction (this is really important to know)
      L | No_of_instalments | Number of instalments of the current transaction
      M | Time_Last_T | Time in minutes since the last transaction
      N | Diff_F_Score | Difference between the fraud score of the previous transaction and the fraud score of the one before it
      O | M_T_Limit | Merchant transaction limit: the maximum amount of money allowed for a transaction in that specific type of business
      P | Flag | Fraud transaction flag (N = no fraud, S = fraud)
  • 15. NUMBER OF FRAUDULENT AND LEGITIMATE TRANSACTIONS IN EACH SPLIT OF THE DATASET
  • 16. FRAUD DETECTION METHOD
      Structure of the dataset (attribute: number of classes): MCC: 33; MCC_P: 33; Post_zip_code: 10; Post_zip_code_P: 10; Amount: 10; Amount_P: 10; Type_T: 10; Credit_limit: 10; Brand_scheme: 6; Variant: 6; F_score: 10; VIP_Common: 2; Local_Interantional: 2; No_of_instalments: 4; Time_Last_T: 8; Diff_F_Score: 6; M_T_Limit: 9; Flag: 2.
      [Figure: the training set is used to learn a classifier model (NN, K-NN, LR); the test set is used for validation through curves (ROC, PR, lift) and the confusion matrix.]
  • 17. FRAUD DETECTION USING NEURAL NETWORK. Workflow: dataset, split into training and testing sets, R script. R library: neuralnet. Algorithm: resilient backpropagation. Hidden layer: c(4,2) or 8. Activation function: logistic. Error function: 'sse' (sum of squared errors).
  • 18. R Code for Implementation of Neural Network
  • 19. FRAUD DETECTION USING K-NEAREST NEIGHBOUR. Workflow: dataset with a categorical class, normalization with (x - min(x)) / (max(x) - min(x)), split into trainset and testset with their targets, R script, selection of the best K. R library: class. Algorithm: knn. k = 1, 2, 3, 4, 5.
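The normalization and nearest-neighbour vote on this slide can be sketched as follows. This is a minimal illustration in plain Python, not the author's R script (which is not reproduced in this transcript); the function names `min_max_normalize` and `knn_predict` and the toy transactions are assumptions made for the example.

```python
import math
from collections import Counter


def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def knn_predict(train_rows, train_labels, query, k):
    """Majority vote among the k training rows closest to `query` (Euclidean distance)."""
    ranked = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: amount and minutes since the last transaction,
# with the Flag column as target (N = no fraud, S = fraud).
amounts = [20.0, 35.0, 900.0, 950.0]
minutes = [600.0, 480.0, 2.0, 5.0]
labels = ["N", "N", "S", "S"]

rows = list(zip(min_max_normalize(amounts), min_max_normalize(minutes)))

# The query is assumed to be already expressed on the normalized scale.
prediction = knn_predict(rows, labels, (0.95, 0.01), k=3)
print(prediction)
```

Normalizing first matters because the distance would otherwise be dominated by whichever attribute has the largest numeric range.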
  • 20. R CODE FOR IMPLEMENTATION OF K-NEAREST NEIGHBOR
  • 21. FRAUD DETECTION USING LOGISTIC REGRESSION. Workflow: dataset, split into training and testing sets, R script. R library: linear model (built in). Function: glm. Family: binomial.
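To make the model behind this slide concrete, here is a minimal sketch of a one-variable logistic regression. Note the swap: R's glm with the binomial family fits the model by iteratively reweighted least squares, whereas this plain-Python illustration uses simple batch gradient descent on the log-loss. The function names, learning rate, epoch count and toy data are all assumptions for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of the log-loss w.r.t. the linear term
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical toy data: one normalized feature, target 1 = fraud, 0 = no fraud.
xs = [0.0, 0.1, 0.2, 0.6, 0.4, 0.8, 0.9, 1.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
p_low = sigmoid(b0 + b1 * 0.1)   # predicted fraud probability for a low feature value
p_high = sigmoid(b0 + b1 * 0.9)  # predicted fraud probability for a high feature value
print(round(p_low, 3), round(p_high, 3))
```

The fitted probabilities are what a ROC or lift analysis would then rank; a class label only appears once a threshold is chosen.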
  • 22. R CODE FOR IMPLEMENTATION OF LOGISTIC REGRESSION
  • 23. FRAUD DETECTION MODEL VALIDATION: ROC ANALYSIS
      - ROC stands for "Receiver Operating Characteristic".
      - It comes from signal processing: the trade-off between hit rate and false alarm rate over a noisy channel.
      - Compute FPR and TPR and plot them in ROC space.
      - Every classifier is a point in ROC space.
      - For probabilistic algorithms, varying the decision threshold traces a curve, summarized by the Area Under the Curve (AUC).
      [Figure: example ROC curve with AUC = 0.75.]
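The ROC construction described above can be sketched in a few lines of plain Python. This is an illustration only, not the R code used in the study; the function names `roc_points` and `auc`, the trapezoidal-rule integration and the toy scores are assumptions.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold over the scores.

    labels: 1 = positive class, 0 = negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.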
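  • 24. FRAUD DETECTION MODEL VALIDATION (CONT.): CONFUSION MATRIX
      Rows are the actual class and columns are the predicted class:
      actual + | TP (true positive) | FN (false negative) | row total TP+FN
      actual - | FP (false positive) | TN (true negative) | row total FP+TN
      Predicted-positive total: TP+FP.
      TP rate (sensitivity, recall) = TP / (TP+FN). FP rate (fall-out) = FP / (FP+TN).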
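The quantities on this slide follow directly from the four cells of the confusion matrix. The sketch below is a plain-Python illustration under stated assumptions: the function names are hypothetical, and treating "S" (fraud) as the positive class is a choice made for the example, since the transcript does not say which class the study treated as positive.

```python
def confusion_counts(actual, predicted, positive="S"):
    """Count TP, FN, FP and TN, treating `positive` as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def rates(tp, fn, fp, tn):
    """Standard rates derived from the confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # TP rate, recall
        "fall_out": fp / (fp + tn),      # FP rate = 1 - specificity
        "specificity": tn / (fp + tn),
        "precision": tp / (tp + fp),     # positive predictive value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical actual and predicted flags (N = no fraud, S = fraud).
actual = ["S", "S", "S", "N", "N", "N", "N", "N"]
predicted = ["S", "S", "N", "S", "N", "N", "N", "N"]

counts = confusion_counts(actual, predicted)
metrics = rates(*counts)
print(counts, metrics)
```

On heavily imbalanced fraud data, accuracy alone can look high while sensitivity or specificity is poor, which is why the rates are reported side by side.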
  • 25. FRAUD DETECTION MODEL VALIDATION (CONT.): Positive Predictive Value (PPV). P(TP): percentage of true positives = sensitivity. P(FP): percentage of false positives = 1 – specificity.
  • 26. PERFORMANCE MEASURES CALCULATED FROM THE CONFUSION MATRICES OF NEURAL NETWORK (NN), K-NEAREST NEIGHBOUR (KNN) AND LOGISTIC REGRESSION (LR)
      Performance measure | Neural Network | K-NN | LR
      Accuracy | 96.2 % | 97.14 % | 96.19 %
      AUC | 0.856 | 0.77 | 0.86
      Execution time | 17 seconds | 3 seconds | Instant
      Detection rate | 96.06 % | 95.03 % | 95.7 %
      Sensitivity | 99.7 % | 98.7 % | 99.4 %
      Specificity | 4 % | 56.7 % | 12.2 %
      Precision | 96.4 % | 98.3 % | 96.7 %
      RMSE | 0.194 | 0.16 | 0.17
      R-square | 0.14 | 0.33 | 0.14
  • 28. RESULTS (LIFT CHART): Lift is a measure of the effectiveness of a predictive model, calculated as the ratio between the results obtained with and without the predictive model. [Figure: lift charts for NN, K-NN and LR.]
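The lift definition above can be worked through on a small example. This plain-Python sketch is illustrative only: the function name `lift_at`, the choice of measuring lift over the top-scored fraction of transactions, and the toy data are assumptions, not details taken from the study.

```python
def lift_at(scores, labels, fraction):
    """Lift in the top `fraction` of cases ranked by score.

    Lift = (positive rate among the top-scored cases) / (overall positive rate),
    i.e. results with the model divided by results without it.
    """
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical scores for ten transactions, two of which are fraudulent (label 1).
scores = [0.95, 0.9, 0.6, 0.5, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

lift_top_20 = lift_at(scores, labels, 0.2)
print(lift_top_20)
```

A lift of 1 means the model does no better than picking transactions at random; evaluating it at several fractions gives the points of a lift chart.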
  • 30. CONCLUSION
      - The performance of the three data mining algorithms was compared. The fraud detection rate of NN is the highest of the three algorithms in the study and its specificity is the lowest, which we take to show that NN is a better predictive model than the other two. In terms of execution time and RMSE, however, KNN is better than NN. Overall, the results show that KNN and NN perform better than Logistic Regression, although both take some execution time to process large data. Choosing better arguments and functions for the model, such as the backpropagation variant and the number of hidden nodes, reduces the execution time of NN; likewise, the value of K and other function choices affect the execution time of the KNN model.
      - We believe that these results are very promising and supportive of a multi-algorithmic approach to classifying and assessing large, noisy, real-time data sets. Future work will focus on testing the algorithms and resolution strategies on similarly complex data sets from other real-world domains.
  • 31. REFERENCES
      [1] S. Benson Edwin Raj, A. Annie Portia, "Analysis on Credit Card Fraud Detection Methods", IEEE International Conference on Computer, Communication and Electrical Technology, pp. 152–156, 2011.
      [2] Haruna, C., Abdul-Kareem, S., Abubakar, A., "A Framework for selecting the optimal technique suitable for application in data mining task", Future Information Technology, pp. 163–169, 2014.
      [3] Manoel Fernando Alonso Gadi, Xidi Wang, Alair Pereira do Lago, "Credit card fraud detection with artificial immune system", in ICARIS 2008, 7th International Conference, Phuket, Thailand, August 10-13, 2008, Proceedings, Lecture Notes in Computer Science, vol. 5132, pp. 119–131, Springer, 2008.
      [4] E.W.T. Ngai, Yong Hu, Y.H. Wong, Yijun Chen, Xin Sun, "The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature", Elsevier, Decision Support Systems, vol. 50, pp. 559–569, 2011.
      [5] S. Bhattacharyya, S. Jha, K. Tharakunnel, J.C. Westland, "Data mining for credit card fraud: A comparative study", Elsevier, Decision Support Systems, 2011.
      [6] Usama M. Fayyad, et al., Advances in Knowledge Discovery and Data Mining, Cambridge, Mass.: MIT Press, 1996.
      [7] Jun Han, Claudio Moraga, "The influence of the sigmoid function parameters on the speed of backpropagation learning", in José Mira, Francisco Sandoval (eds.), From Natural to Artificial Neural Computation, pp. 195–201, 1995.
      [8] Neda S. Halvaiee, M. Kazem Akbari, "A novel model for credit card fraud detection using Artificial Immune Systems", Elsevier, Applied Soft Computing, vol. 24, pp. 40–49, 2014.
      [9] F. Campos, S. Cavalcante, "An extended approach for Dempster–Shafer theory", in Proceedings of the IEEE International Conference on Information Reuse and Integration, pp. 338–344, 2003.
  • 32. RESEARCH PAPER PUBLISHED IEEE CONFERENCE ID: 37465