Data mining for credit card fraud: A comparative study
Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunne,
Christopher Westland (2011)
DOR BAHDUR BUDHATHOKI SID:45057
NARENDRA SHARMA SID:45040
Abstract
• This paper evaluates two advanced data mining approaches, support
vector machines and random forests, together with the well-known
logistic regression, as part of an attempt to better detect (and thus control
and prosecute) credit card fraud. The study is based on real-life data of
transactions from an international credit card operation.
Introduction
• Data mining: the practice of examining large pre-existing databases in order to
generate new information. It helps turn raw data into useful information and
supports knowledge discovery and predictive analysis (applying past outcomes
to predict future ones).
• Credit card fraud: there are two types of credit card fraud:
1) Application fraud: obtaining a new card from an issuing company using false
information.
2) Behavioral fraud: includes mail theft, stolen cards, counterfeit cards, and
cardholder-not-present fraud.
Methods
• Three data mining techniques are used to predict credit card fraud:
1) Logistic Regression (LR): appropriate when the dependent variable is
categorical; here the dependent variable, fraud, is binary.
2) Support Vector Machines (SVM): statistical learning techniques that have
been found very successful in a variety of tasks. SVMs are linear classifiers
that work in a high-dimensional feature space (reached implicitly through kernel
functions) without incurring additional computational complexity.
3) Random Forests (RF): an ensemble of classification tree models. Ensembles
work well when the individual members are dissimilar, and random forests obtain
this variation among individual trees by growing each tree on a bootstrap sample
and choosing among a random subset of attributes at each split.
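As a minimal sketch of the logistic regression idea above, the code below fits a binary fraud classifier by plain batch gradient descent on the log-loss, in pure Python. The toy features (a scaled amount and a foreign-merchant flag), the learning rate and the epoch count are hypothetical illustrations; the slides do not describe how the study fit its model or which transaction attributes it used.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    # Estimated P(fraud | x) = sigmoid(w . x + b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X: list of feature vectors; y: list of 0/1 labels (1 = fraud).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log-loss)/d(linear score)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: [scaled amount, foreign-merchant flag]; label 1 = fraud
X = [[0.1, 0], [0.2, 0], [0.3, 0], [0.9, 1], [0.8, 1], [0.7, 1]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(round(predict_proba(w, b, [0.85, 1]), 3), round(predict_proba(w, b, [0.15, 0]), 3))
```

The model outputs a fraud probability; turning it into a fraud/legitimate decision requires a threshold, and 0.5 is only the simplest choice.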
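The claim that SVMs work in a high-dimensional feature space at no extra computational cost usually refers to the kernel trick. The sketch below illustrates only that identity and does not train an SVM. It assumes a degree-2 polynomial kernel chosen for illustration (the slides do not say which kernel the study used): the kernel value (x . z)^2, computed in the original d-dimensional space, equals the dot product of explicit d*d-dimensional feature vectors.

```python
import itertools
import math

def phi_degree2(x):
    # Explicit degree-2 feature map: every ordered product x_i * x_j (d*d features)
    return [xi * xj for xi, xj in itertools.product(x, repeat=2)]

def poly2_kernel(x, z):
    # Homogeneous polynomial kernel k(x, z) = (x . z)^2, computed in O(d) time
    return sum(a * c for a, c in zip(x, z)) ** 2

x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]
explicit = sum(a * c for a, c in zip(phi_degree2(x), phi_degree2(z)))  # O(d^2) work
implicit = poly2_kernel(x, z)                                          # O(d) work
print(explicit, implicit, math.isclose(explicit, implicit))
```

An SVM that only ever evaluates the kernel therefore behaves as a linear classifier in the larger space without ever building those feature vectors.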
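A minimal sketch of how a random forest introduces variation among its trees: each tree is trained on a bootstrap sample of the rows and may split only on a randomly chosen subset of features, and predictions are combined by majority vote. As a simplifying assumption, each "tree" here is a depth-1 decision stump (so its one random feature subset is also its per-split subset); the helper names and toy data are hypothetical, and real random forests grow much deeper trees.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of 0/1 labels
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1 - p) * (1 - p)

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Best single split (feature, threshold) among the candidate features, by weighted Gini."""
    fallback = majority(y)  # label used when one side of a split is empty
    best = None
    for f in feature_ids:
        for t in sorted(set(row[f] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        majority(left) if left else fallback,
                        majority(right) if right else fallback)
    return best[1:]  # (feature, threshold, left_label, right_label)

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: n rows with replacement
        feats = rng.sample(range(d), max_features)    # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    # Majority vote over all stumps
    votes = [(left if x[f] <= t else right) for f, t, left, right in forest]
    return majority(votes)

# Hypothetical toy data: [scaled amount, foreign-merchant flag]; label 1 = fraud
X = [[0.1, 0], [0.2, 0], [0.3, 0], [0.4, 0], [0.9, 1], [0.8, 1], [0.7, 1], [0.6, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.95, 1]), predict_forest(forest, [0.05, 0]))
```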
Results
• This section presents results from the experiments comparing the
performance of Logistic Regression (LR), Random Forests (RF) and Support
Vector Machines (SVM) models developed from training data carrying
varying levels of fraud cases.
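The slides do not list the measures used to compare the models. As an assumption, the sketch below computes several standard confusion-matrix measures for a binary fraud label and shows why accuracy alone says little when fraud is rare: on a hypothetical test set with a 0.5% fraud rate, a model that never flags fraud still scores 99.5% accuracy.

```python
import math

def metrics(y_true, y_pred):
    """Confusion-matrix measures for a binary fraud label (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of frauds caught
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of legitimate txns passed
    precision = tp / (tp + fp) if tp + fp else 0.0    # share of alerts that are fraud
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical test set with a 0.5% fraud rate: 5 frauds in 1,000 transactions
y_true = [1] * 5 + [0] * 995
always_legit = [0] * 1000                               # never raises an alert
catches_most = [1] * 4 + [0] + [1] * 20 + [0] * 975     # 4 frauds caught, 20 false alarms
print(metrics(y_true, always_legit))
print(metrics(y_true, catches_most))
```

The second hypothetical classifier has lower accuracy (0.979) yet catches 4 of the 5 frauds, which sensitivity, F1 and G-mean reflect and accuracy does not.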
Results (contd.)
Discussion
• This paper examined the performance of two advanced data mining
techniques, random forests and support vector machines, together with
logistic regression, for credit card fraud detection.
• A real-life dataset of credit card transactions from the January 2006–January
2007 period was used in the evaluation.
• Random forests and SVMs are two approaches that have gained prominence
in recent years, with noted superior performance across a range of
applications. To date, however, their use for credit card fraud prediction has
been limited.
Discussion
• The authors use data undersampling, a simple approach that has been noted to
perform well, and examine the performance of the three techniques with
varying levels of undersampling. For performance assessment, they use a test
dataset with a much lower fraud rate (0.5%) than the training datasets built
with the different levels of undersampling.
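Undersampling, as described above, keeps every fraud case in the training data and randomly discards legitimate transactions until frauds reach a chosen proportion. A minimal sketch follows; the helper name `undersample`, the pool of 10,000 transactions and the target rates (2%, 10%, 25%) are hypothetical, since the slides do not state the levels used. Only training data is resampled; the test set keeps its original, much lower fraud rate.

```python
import random

def undersample(rows, labels, target_fraud_rate, seed=0):
    """Randomly drop legitimate (label 0) rows so that frauds make up
    target_fraud_rate of the returned sample. All fraud rows are kept."""
    rng = random.Random(seed)
    fraud = [(r, l) for r, l in zip(rows, labels) if l == 1]
    legit = [(r, l) for r, l in zip(rows, labels) if l == 0]
    # Legitimate rows needed so that n_fraud / (n_fraud + n_legit) == rate
    n_legit = round(len(fraud) * (1 - target_fraud_rate) / target_fraud_rate)
    if n_legit > len(legit):
        raise ValueError("not enough legitimate rows for this fraud rate")
    sample = fraud + rng.sample(legit, n_legit)
    rng.shuffle(sample)
    return [r for r, _ in sample], [l for _, l in sample]

# Hypothetical pool: 50 frauds among 10,000 transactions (rows are just indices here)
rows = list(range(10_000))
labels = [1 if i < 50 else 0 for i in rows]
for rate in (0.02, 0.1, 0.25):  # hypothetical undersampling levels
    _, y = undersample(rows, labels, rate)
    print(rate, len(y), sum(y) / len(y))
```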
Thank You
