Data mining for credit card fraud: A comparative study
Siddhartha Bhattacharyya, Sanjeev Jha, Kurian Tharakunnel,
Christopher Westland (2011)
DOR BAHDUR BUDHATHOKI SID:45057
NARENDRA SHARMA SID:45040

Abstract
• This paper evaluates two advanced data mining approaches, support
vector machines and random forests, together with the well-known
logistic regression, as part of an attempt to better detect (and thus control
and prosecute) credit card fraud. The study is based on real-life data of
transactions from an international credit card operation.

Introduction
• Data mining: the practice of examining large pre-existing databases in order to
generate new information. It helps turn raw data into useful
information and supports knowledge discovery and predictive analysis (applying past
outcomes to predict future ones).
• Credit card fraud: there are two types of credit card fraud.
1) Application fraud: obtaining a new card from an issuing company using false
information.
2) Behavioral fraud: includes mail theft, stolen cards, counterfeit cards and
"card holder not present" fraud.

Methods
• Three data mining techniques are used to predict credit card fraud.
1) Logistic regression (LR): appropriate when the dependent variable is
categorical; here the dependent variable, fraud, is binary.
2) Support vector machines (SVM): statistical learning techniques that have
been found very successful in a variety of tasks. SVMs are linear classifiers
that work in a high-dimensional feature space without incurring any
additional computational complexity.
3) Random forest (RF): an ensemble of classification tree models.
Ensembles work well when the individual members are dissimilar, and random
forests obtain this variation among the individual trees.
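To make the first of these techniques concrete, here is a minimal sketch of logistic regression for a binary fraud label, fitted by batch gradient descent using only the Python standard library. It is not the implementation used in the paper: the function names (`fit_logistic`, `predict_proba`), the learning rate, the epoch count and the two-feature toy data are all assumptions made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss.

    lr and epochs are illustrative defaults, not values from the paper.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this record: p(fraud | xi) - label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that record x is fraudulent."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data invented for illustration: two numeric features per transaction,
# legitimate records centred at (0, 0) and fraudulent ones at (3, 3).
rng = random.Random(0)
legit = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
fraud = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(100)]
X = legit + fraud
y = [0] * 100 + [1] * 100

w, b = fit_logistic(X, y)
print("P(fraud) near the fraud cluster:", predict_proba(w, b, [3.0, 3.0]))
print("P(fraud) near the legit cluster:", predict_proba(w, b, [0.0, 0.0]))
```

The output is a probability, so a fraud/no-fraud decision still needs a threshold; 0.5 is a common default, but the right value depends on the cost of missed fraud versus false alarms.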

Results
• This section presents results from the experiments comparing the
performance of logistic regression (LR), random forest (RF) and support
vector machine (SVM) models developed from training data carrying
varying levels of fraud cases.

Discussion
• This paper examined the performance of two advanced data mining
techniques, random forests and support vector machines, together with
logistic regression, for credit card fraud detection.
• A real-life dataset on credit card transactions from the January 2006–
January 2007 period was used in their evaluation.
• Random forests and SVMs are two approaches that have gained prominence
in recent years, with noted superior performance across a range of
applications. To date, their use for credit card fraud prediction has been
limited.

• The authors use data undersampling, a simple approach that has been noted to
perform well, and examine the performance of the three techniques with
varying levels of undersampling. For performance assessment, they
use a test dataset with a much lower fraud rate (0.5%) than the training
datasets, which have different levels of undersampling.
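The undersampling idea can be sketched as follows: keep every fraud case and randomly discard legitimate transactions until fraud reaches a chosen share of the training set. This is only an illustration using the Python standard library. The helper name `undersample`, the toy data and the target rates in the loop are assumptions, not the paper's exact procedure or settings; only the 0.5% starting rate echoes the test-set figure quoted above.

```python
import random


def undersample(records, labels, fraud_rate, seed=0):
    """Keep all fraud cases (label 1) and randomly drop legitimate ones
    (label 0) until fraud makes up roughly `fraud_rate` of the result."""
    if not 0.0 < fraud_rate < 1.0:
        raise ValueError("fraud_rate must be strictly between 0 and 1")
    rng = random.Random(seed)
    fraud = [r for r, l in zip(records, labels) if l == 1]
    legit = [r for r, l in zip(records, labels) if l == 0]
    # Number of legitimate records needed to hit the target fraud share
    n_legit = min(len(legit), round(len(fraud) * (1 - fraud_rate) / fraud_rate))
    kept = rng.sample(legit, n_legit)
    data = [(r, 1) for r in fraud] + [(r, 0) for r in kept]
    rng.shuffle(data)
    return [r for r, _ in data], [l for _, l in data]


# Toy data invented for illustration: 10,000 transaction ids, 0.5% fraud.
records = list(range(10_000))
labels = [1 if i < 50 else 0 for i in records]

# Illustrative target fraud rates for the training sets.
for rate in (0.02, 0.05, 0.10, 0.15):
    _, y_sampled = undersample(records, labels, rate)
    print(rate, len(y_sampled), sum(y_sampled) / len(y_sampled))
```

Only the training data is resampled in this scheme; the test set keeps its original low fraud rate, so the reported performance reflects realistic conditions.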
