Application of Data Mining and Machine Learning techniques for Fraud Detection_v.1.0

Application of Data Mining and Machine Learning for Credit Card Fraud Detection
A Comparative Analysis of Two Academic Papers
Author: Christian Adom
Computer Science Department, City University London
26th April 2015
Abstract
The use of credit cards as a payment method has become increasingly popular in recent years. As advancements in e-commerce technologies continue to emerge, consumers are increasingly taking advantage of the convenience and flexibility offered by credit card purchases. This change in consumer behaviour has given rise to an unprecedented increase in cases of credit card fraud and has subsequently led to substantial financial losses for consumers, issuing banks and merchants in the payments industry. In this paper we compare and discuss two academic research papers that apply data mining and machine learning techniques to the problem of credit card fraud detection and prevention. Our aim is to critically assess the methodologies, techniques and results presented in both papers for countering credit card fraud. We then identify the similarities in the approaches and methodologies used, while clearly delineating the differences in the techniques implemented. Finally, we provide a brief overview of the use of these techniques in industry.
1 Introduction
The use of credit cards as the primary payment method has become increasingly popular with consumers in recent years. A study conducted in 2014 by the UK Cards Association revealed that there were approximately 175.6 million cards in issue (55.4 million of which were credit cards) and that card expenditure rose by £0.6 billion, amounting to a total of £49.0 billion [1].
This growing trend in consumer behaviour appears to be driven by advancements in e-commerce technologies, as consumers increasingly take advantage of the convenience and flexibility offered by credit card purchases.
While this is a prosperous time for the e-commerce market, issuing banks, merchants and payment providers are now facing an unprecedented rise in the number of credit card fraud cases as a result of this growth. A study conducted by Financial Fraud Action UK (FFA) estimates that fraud losses on UK cards totalled £450.4 million in 2013, a 16% increase from £388.3 million in 2012 [2].
In an effort to address this growing problem, a number of fraud detection
models and techniques have been proposed by researchers within both the
payments industry and academia. A survey of the current literature on fraud
detection methods reveals a number of approaches to tackling this problem.
Below are a few of the research areas: [3] [4] [5]
• Bayesian Network
• Genetic Algorithm
• Neural Network
• Support Vector Machine
• Decision Tree
• Fuzzy Logic Based System
• Hidden Markov Model
• Meta Learning Strategy
For the purpose of this paper, we will discuss two academic research papers
that address the issue of credit card fraud detection and prevention, namely:
1. Credit Card Fraud detection using Hidden Markov Model [6]
2. Neural data mining for Credit card Fraud Detection [7]
The aim of this paper is to critically assess the methodologies, techniques and results presented by the researchers in countering the problem of credit card fraud. Areas of similarity in the approaches and methodologies used are identified, while clearly delineating the differences in the techniques implemented.
Generally, credit card fraud can be divided into two types: off-line and on-line fraud. Off-line fraud is committed by using a stolen physical card to make (usually face-to-face) unauthorized transactions. On-line fraud is committed by stealing the card details (without the knowledge of the legitimate card-holder) and then making unauthorized transactions through the internet, by phone or through other card-holder not present (CNP) channels [8].
In order to successfully detect this kind of fraud it is necessary to develop
methods to analyse the spending pattern on a card and find inconsistencies
with respect to the normal spending patterns of the card-holder.
Generally, humans exhibit specific behaviour in their spending habits, so every card-holder can be represented by a set of patterns containing distinctive information such as purchase category, time and location of purchase, transaction amount, etc. A deviation from these known patterns is treated by the model as a potential sign of fraud.
The principles introduced above are the key insights into successful fraud detection presented in both papers, and it is on this common ground that we address the research. The remainder of this paper is organised as follows: Sections 2 and 3 present a summary and review of the two papers under comparison, and Section 4 presents a comparative analysis.
2 Credit Card Fraud Detection Using Hidden Markov Model
Credit card fraud detection for CNP transactions using a Hidden Markov Model (HMM) is based on analysing the spending pattern of a card-holder. The key approach is to model the sequence of operations in credit card transaction processing using an HMM.
The details of the items purchased in individual transactions (which are not known to the fraud detection system) are represented as the underlying finite Markov chain, which is not observable. The transactions can only be observed through a stochastic process that produces the sequence of amounts of money spent in each transaction.
The HMM is trained on the normal behaviour of a card-holder using synthetically generated data; artificial data is needed because researchers face difficulties in obtaining real credit card data sets owing to security, privacy and cost issues. Once training is complete, the fraud detection system (FDS) is tested by running transactions through it: incoming transactions that are not accepted by the HMM with sufficiently high probability are marked as fraudulent.
2.1 HMM Background
In probability theory, a Markov Model is a stochastic model used to model
randomly changing systems where it is assumed that future states depend
only on the present state and not on the sequence of events that precede
it. [9] When the system being modelled is assumed to be a Markov process
with unobserved (hidden) states, we can represent it as a Hidden Markov
Model.
An HMM has a finite set of states governed by a set of transition probabilities; in any particular state, an outcome or observation can be generated according to an associated probability distribution. Only the outcome, and not the state, is visible to an external observer (hence the term "hidden").
2.1.1 Characteristics of an HMM
To lay the foundation for understanding how HMMs are applied to the problem of fraud detection, it is necessary to provide a formal description of the characteristics of an HMM.
Each Hidden Markov Model is defined by the following elements: states, observation symbols, state transition probabilities, observation symbol probabilities and initial state probabilities.
1. The set of $N$ states of the model is defined as:
$$S = \{S_1, S_2, \ldots, S_N\} \tag{1}$$
2. The set of $M$ distinct observation symbols per state is defined as:
$$V = \{V_1, V_2, \ldots, V_M\} \tag{2}$$
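3. The state transition probability distribution $A = \{a_{ij}\}$ is given by:
$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i) \tag{3}$$
where $q_t$ denotes the current state (the state at time $t$). The transition probabilities satisfy the stochastic constraints:
$$a_{ij} \geq 0, \quad 1 \leq i, j \leq N \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \leq i \leq N$$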
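4. The observation symbol probability distribution $B = \{b_j(k)\}$ in each state is given by:
$$b_j(k) = P(V_k \text{ at } t \mid q_t = S_j), \quad 1 \leq j \leq N, \ 1 \leq k \leq M \tag{4}$$
The observation symbol probabilities satisfy the stochastic constraints:
$$b_j(k) \geq 0, \quad 1 \leq j \leq N, \ 1 \leq k \leq M \quad \text{and} \quad \sum_{k=1}^{M} b_j(k) = 1, \quad 1 \leq j \leq N$$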
5. The initial state probability vector $\pi$ gives the probability that the model is in state $S_i$ at the first time step ($t = 1$) and is defined as:
$$\pi_i = P(q_1 = S_i), \quad 1 \leq i \leq N, \quad \text{such that} \quad \sum_{i=1}^{N} \pi_i = 1 \tag{5}$$
From the above definitions, the complete specification of an HMM requires the choice of two model parameters, $N$ and $M$, and the estimation of three probability distributions, $A$, $B$ and $\pi$. This is represented by the compact notation:
$$\lambda = (A, B, \pi) \tag{6}$$
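To make the notation concrete, the following Python sketch stores λ = (A, B, π) as plain nested lists and checks the shape and stochastic constraints listed above. It is an illustration only, not code from either paper; the class name `HMM`, the method `validate` and any example probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class HMM:
    """Discrete HMM lambda = (A, B, pi) with N hidden states and M observation symbols."""
    A: Matrix        # N x N, A[i][j] = a_ij (Eq. 3)
    B: Matrix        # N x M, B[j][k] = b_j(k) (Eq. 4)
    pi: List[float]  # length N, pi[i] = pi_i (Eq. 5)

    @property
    def N(self) -> int:
        return len(self.pi)

    @property
    def M(self) -> int:
        return len(self.B[0]) if self.B else 0

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError if a shape or stochastic constraint is violated."""
        if len(self.A) != self.N or any(len(row) != self.N for row in self.A):
            raise ValueError("A must be an N x N matrix")
        if len(self.B) != self.N or any(len(row) != self.M for row in self.B):
            raise ValueError("B must be an N x M matrix")
        rows = [("A", r) for r in self.A] + [("B", r) for r in self.B] + [("pi", self.pi)]
        for name, row in rows:
            # Every probability is non-negative and every distribution sums to 1.
            if any(p < 0 for p in row):
                raise ValueError(f"{name} contains a negative probability")
            if abs(sum(row) - 1.0) > tol:
                raise ValueError(f"a row of {name} does not sum to 1")
```

A model over the three purchase types of Figure 1 (GR, EL, MI) and the symbols l, m and h would, for instance, use 3 × 3 matrices for both A and B.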
2.2 Implementing the HMM for Credit Card Fraud Detection
With the mathematical characteristics of the HMM defined, we now give a high-level outline of the fraud detection process using an HMM, which can be summarised in three steps:
1. Each incoming transaction is submitted to the FDS (usually running
at an issuing bank) for verification.
2. The FDS tries to find an anomaly in the transaction based on the
spending profile of the card-holder.
3. If the FDS confirms the transaction to be fraudulent, it raises an alert
and the issuing bank declines the transaction.
This general process raises a number of questions, such as:
• How are the credit card transaction processing operations mapped onto an HMM?
• How are the spending profiles of the card-holders determined and categorised?
We discuss these and other issues in the next section.
2.2.1 HMM for Credit Card Transaction Processing
The process of mapping credit card transaction processing onto an HMM can be enumerated in six steps:
1. Decide on the observation symbols in the model.
2. Quantize the purchase values $x$ into $M$ price ranges $V_1, V_2, \ldots, V_M$, forming the observation symbols.
3. Define the observation symbols as $V = \{l, m, h\}$, making $M = 3$, where l = low, m = medium and h = high.
4. The transition in purchase type is used as the state transition in the
model.
5. The set of all possible types of purchases forms the set of hidden states
of the HMM.
6. Compute the probability matrices A, B and π.
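Steps 2 and 3 can be sketched in Python as follows. The two cut-off values separating the low, medium and high ranges are hypothetical parameters here: the text does not say how the price ranges are chosen, and in practice they would presumably be derived from the card-holder's own spending history.

```python
from bisect import bisect_right
from typing import List, Sequence

SYMBOLS = ("l", "m", "h")  # low, medium, high, so M = 3


def quantize(amounts: Sequence[float], cutoffs: Sequence[float]) -> List[str]:
    """Map each purchase amount x to an observation symbol.

    With cutoffs (c1, c2): x < c1 -> 'l', c1 <= x < c2 -> 'm', x >= c2 -> 'h'.
    """
    if len(cutoffs) != len(SYMBOLS) - 1 or list(cutoffs) != sorted(cutoffs):
        raise ValueError("expected M - 1 cut-off values in ascending order")
    # bisect_right counts how many cut-offs are <= x, which is the range index.
    return [SYMBOLS[bisect_right(cutoffs, x)] for x in amounts]
```

For example, with hypothetical cut-offs of 50 and 200, the amounts 12.5, 80.0, 450.0 and 50.0 map to l, m, h and m.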
We can make some general comments about the steps defined above. In step 4, the transition in purchase type is used as the state transition in the model. This choice may seem counter-intuitive: since a card-holder makes different kinds of purchases of different amounts over a period of time, the natural choice would appear to be the sequence of transaction amounts. However, the sequence of purchase types is considered more reliable than the sequence of transaction amounts, because a card-holder makes purchases according to his or her need for different types of items over a period of time. This spending behaviour in turn generates a sequence of transaction amounts. Furthermore, an individual transaction amount generally depends on the associated type of purchase.
Figure 1: Special case of fully connected HMM
In step 6, the optimal values of these parameters are determined in the training phase using the Baum-Welch (forward-backward) algorithm.
A graphical representation of the resulting HMM is shown in Figure 1. This is a special case of a fully connected HMM, in which every state of the model can be reached in a single step from every other state. In this figure, GR, EL and MI represent Groceries, Electronics and Miscellaneous purchases respectively.
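The Baum-Welch re-estimation used in step 6 can be sketched as below for a single training sequence of integer symbols (for example 0 = l, 1 = m, 2 = h). This is a textbook scaled forward-backward implementation written for illustration, not the researchers' code; the function names and the fixed iteration count are assumptions, and a real system would train on many sequences and add a convergence test.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_backward(obs: Sequence[int], A: Matrix, B: Matrix, pi: List[float]):
    """Scaled forward-backward pass; the product of the scale factors equals P(O | lambda)."""
    N, T = len(pi), len(obs)
    alpha: Matrix = []
    scale: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            row = [pi[i] * B[i][o] for i in range(N)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)]
        c = sum(row)
        if c == 0.0:
            raise ValueError(f"symbol at t={t} has zero probability under the model")
        alpha.append([a / c for a in row])
        scale.append(c)
    beta: Matrix = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        o_next = obs[t + 1]
        beta[t] = [sum(A[i][j] * B[j][o_next] * beta[t + 1][j] for j in range(N)) / scale[t + 1]
                   for i in range(N)]
    return alpha, beta, scale


def baum_welch(obs: Sequence[int], A: Matrix, B: Matrix, pi: List[float],
               n_iter: int = 20) -> Tuple[Matrix, Matrix, List[float], List[float]]:
    """Re-estimate lambda = (A, B, pi) from one sequence of symbols 0..M-1.

    Also returns log P(O | lambda) measured before each update; EM guarantees
    that these values do not decrease (up to rounding).
    """
    N, M, T = len(pi), len(B[0]), len(obs)
    history: List[float] = []
    for _ in range(n_iter):
        alpha, beta, scale = forward_backward(obs, A, B, pi)
        history.append(sum(math.log(c) for c in scale))
        # gamma[t][i] = P(q_t = S_i | O, lambda)
        gamma = [[alpha[t][i] * beta[t][i] for i in range(N)] for t in range(T)]
        # xi_sum[i][j] = expected number of transitions from S_i to S_j
        xi_sum = [[0.0] * N for _ in range(N)]
        for t in range(T - 1):
            o_next = obs[t + 1]
            for i in range(N):
                for j in range(N):
                    xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][o_next]
                                     * beta[t + 1][j] / scale[t + 1])
        new_pi = gamma[0][:]
        new_A, new_B = [], []
        for i in range(N):
            denom = sum(gamma[t][i] for t in range(T - 1))
            new_A.append([x / denom for x in xi_sum[i]] if denom > 0 else A[i][:])
            counts = [0.0] * M
            for t, o in enumerate(obs):
                counts[o] += gamma[t][i]
            total = sum(gamma[t][i] for t in range(T))
            new_B.append([c / total for c in counts] if total > 0 else B[i][:])
        A, B, pi = new_A, new_B, new_pi
    return A, B, pi, history
```

Because the log-likelihood history is returned, a caller can stop once successive values change by less than some tolerance instead of running a fixed number of iterations.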
2.3 Process flow of FDS
Figure 2: Process flow of the FDS
After obtaining an estimate of the HMM parameters in the training phase, Abhinav et al. form an initial sequence of $R$ symbols from the card-holder's transactions. This sequence is then given as input to the HMM to compute its probability of acceptance:
$$\alpha_1 = P(O_1, O_2, \ldots, O_R \mid \lambda) \tag{7}$$
They then form another sequence of length $R$ by discarding $O_1$ and appending $O_{R+1}$, where $O_{R+1}$ is the symbol generated by a new transaction at time $t + 1$. This new sequence is given as input to the HMM and its probability of acceptance is computed as:
$$\alpha_2 = P(O_2, O_3, \ldots, O_{R+1} \mid \lambda) \tag{8}$$
The metric for accepting or declining the new transaction is defined as:
$$\Delta\alpha = \alpha_1 - \alpha_2 \tag{9}$$
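If $\Delta\alpha > 0$, the new sequence is accepted by the HMM with lower probability than the previous one, so the new transaction is potentially fraudulent. It is flagged as fraudulent if the following additional condition also holds:
$$\frac{\Delta\alpha}{\alpha_1} \geq \text{Threshold} \tag{10}$$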
Otherwise, the transaction is accepted as genuine, $O_{R+1}$ is permanently added to the sequence (with $O_1$ discarded), and the new sequence is used to determine the validity of the next transaction.
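The detection step of Equations (7) to (10) can be sketched as follows. The sketch assumes a trained model given as nested lists A, B and π, symbols encoded as integers, and a hypothetical `threshold` parameter. It uses the plain (unscaled) forward algorithm, which is adequate for short windows but would normally be replaced by a scaled or log-space version for long sequences.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def forward_prob(obs: Sequence[int], A: Matrix, B: Matrix, pi: List[float]) -> float:
    """P(O_1, ..., O_R | lambda) computed with the forward algorithm."""
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o] for j in range(N)]
    return sum(alpha)


def check_transaction(window: List[int], new_symbol: int, A: Matrix, B: Matrix,
                      pi: List[float], threshold: float) -> Tuple[bool, List[int]]:
    """Apply Eqs. (7)-(10) to one new transaction.

    Returns (is_fraud, window to use for the next transaction).
    """
    alpha1 = forward_prob(window, A, B, pi)      # Eq. (7)
    candidate = window[1:] + [new_symbol]         # drop O_1, append O_{R+1}
    alpha2 = forward_prob(candidate, A, B, pi)    # Eq. (8)
    delta = alpha1 - alpha2                       # Eq. (9)
    # delta > 0 implies alpha1 > 0, so the division below is safe.
    is_fraud = delta > 0 and delta / alpha1 >= threshold   # Eq. (10)
    # A genuine transaction's symbol is kept; a flagged one is not added.
    return is_fraud, (window if is_fraud else candidate)
```

The caller keeps the returned window and passes it back in for the next transaction, which mirrors the sliding-window update described above.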
The complete process flow of the FDS is shown in Figure 2, where the process is divided into two separate phases: training and detection.
2.4 Results and Analysis
In the final stage of testing and analysis, large-scale simulations were carried out to test the effectiveness of the FDS. Abhinav et al. used the True Positive (TP), False Positive (FP), TP-FP spread and Accuracy metrics to measure the capability of the system. In this context, TP represents the fraction of fraudulent transactions correctly classified as fraudulent, whereas FP is the fraction of genuine transactions incorrectly classified as fraudulent. The complementary quantities, True Negative (TN) and False Negative (FN), were also used as part of the accuracy calculation.
To measure the performance of the FDS, the difference between TP and FP, called the TP-FP spread, was used as a metric. Accuracy represents the fraction of the total number of transactions (both genuine and fraudulent) that are classified correctly; with TP, TN, FP and FN here denoting the corresponding numbers of transactions, it is given by:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
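The four metrics can be computed from raw counts as in the sketch below. The class and function names are hypothetical, and any counts used with it are illustrative rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    tp: int  # fraudulent transactions flagged as fraudulent
    fp: int  # genuine transactions flagged as fraudulent
    tn: int  # genuine transactions accepted as genuine
    fn: int  # fraudulent transactions accepted as genuine


def evaluate(c: Counts) -> Dict[str, float]:
    """TP and FP as fractions, their spread, and the accuracy of Eq. (11)."""
    tp_rate = c.tp / (c.tp + c.fn)  # share of fraudulent transactions caught
    fp_rate = c.fp / (c.fp + c.tn)  # share of genuine transactions flagged
    accuracy = (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn)
    return {"TP": tp_rate, "FP": fp_rate, "TP-FP": tp_rate - fp_rate, "Accuracy": accuracy}
```

With made-up counts of 40 TP, 30 FP, 920 TN and 10 FN, TP = 0.8 and accuracy = 0.96 even though 30 genuine transactions were flagged: because genuine transactions dominate, accuracy alone says little about fraud coverage.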
Lastly, experiments were carried out to determine suitable values for the HMM design parameters, namely the number of states, the sequence length and the threshold value. After obtaining these parameters, a comparative study with another fraud detection system was carried out to benchmark the system.
2.5 Performance Comparison
The performance of the proposed system was measured while varying the number of fraudulent transactions and the spending profile of the card-holder. The performance was then compared with the credit card fraud detection technique proposed by S. J. Stolfo et al. in the paper “Credit Card Fraud Detection Using Meta-Learning” [10].
Abhinav et al. carried out experiments with four profiles, one of which is a mixed profile in which the card-holder's spending profile is not taken into account. The other three profiles are (55 35 10), (70 20 10) and (95 3 2). Here, an (x, y, z) profile represents a card-holder who carries out x% of their transactions in the low, y% in the medium and z% in the high price range. The goal was to determine how the system performs for different mixes of transaction amount ranges.
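An (x, y, z) profile can be read as a categorical distribution over the symbols l, m and h. A minimal sketch of drawing synthetic purchase symbols for such a profile follows; the function name and the use of Python's `random` module are our assumptions, not the simulator used in the paper.

```python
import random
from typing import List, Tuple


def simulate_profile(profile: Tuple[int, int, int], n: int, seed: int = 0) -> List[str]:
    """Draw n purchase symbols for an (x, y, z) profile: x% low, y% medium, z% high."""
    x, y, z = profile
    if min(profile) < 0 or x + y + z != 100:
        raise ValueError("profile must be three non-negative percentages summing to 100")
    rng = random.Random(seed)  # seeded so that test runs are reproducible
    return rng.choices(["l", "m", "h"], weights=[x, y, z], k=n)
```

Averaging detection metrics over many seeded runs, as the 100 test runs below do, reduces the influence of any single random draw.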
For every combination of spending profile and malicious transaction distribution, they carried out 100 test runs and recorded the average result. For consistency, the same data set was used to evaluate both the approach of Abhinav et al. and that of S. J. Stolfo et al. (denoted "OA" and "ST" respectively for convenience).
Figure 3a shows the variation of TP and FP for the two approaches using the spending profile (95 3 2), and Figure 3b shows the corresponding variation of the TP-FP spread and Accuracy. The graphs show that the TP of the researchers' approach is very close to that of Stolfo et al.'s approach, and that both approaches have similar FP values. The researchers concluded that the two systems had comparable accuracies and average TP-FP spreads, and showed a similar trend with variation in µ.
Figure 3: Performance variation of the two systems (OA and ST) for the
spending profile (95 3 2)
From the test results, Abhinav et al. concluded that the proposed system has an overall accuracy of 80%, even under large variations in the input conditions, which they report as much higher than the overall accuracy of the method proposed by Stolfo et al. On this basis, they argue that their system correctly detects most fraudulent transactions.
3 Neural Data Mining for Credit Card Fraud Detection
This paper applies a combination of data mining techniques and a neural network algorithm to the fraud detection problem. The aim is to obtain high fraud coverage combined with a low false alarm rate. The general approach is to model the sequence of operations in credit card transaction processing using a confidence-based neural network, and to apply receiver operating characteristic (ROC) analysis as a means of measuring, and thereby ensuring, the accuracy and effectiveness of the model.
A neural network is first trained with synthetic data; an incoming credit card transaction that the trained neural network model (NNM) does not accept with sufficient confidence is then considered to be fraudulent. The paper shows how confidence values, a neural network algorithm and ROC analysis can be combined successfully to perform credit card fraud detection.
3.1 Background on Neural Networks
Fraud detection using artificial neural networks takes inspiration from the biological nervous system and brain function of human beings. The approach is based on the brain's ability to learn from past experience and to apply this knowledge in decision making and problem solving. The goal of researchers has been to apply these principles to credit card fraud detection [11].
The neural network is usually trained with both personal and historical transaction data of the card-holder, such as occupation, income, transaction amount, purchase location, and the frequency and time period of purchases. In addition, it is common to include examples of the types of credit card fraud faced by a particular issuing bank in the training data.
A neural network is typically depicted as having three types of interconnected layers (as shown in Figure 4):
• Input Layer: receives input from an external source such as a database
• Hidden Layer: is hidden from external observers and receives input from the input layer or from another hidden layer
• Output Layer: exposes the network to external observers and provides the final output of the network
Figure 4: Multi-layered neural network
The output of the neural network generally takes real values between 0 and 1. If the output is below a specified threshold value (for example 0.6), the transaction is classified as genuine; if it is above the threshold, the "probability" that the transaction is fraudulent is considered to be high. The distinction is subtle: the output is a score in [0, 1] that is interpreted as, but is not necessarily, a true probability.
3.2 Data Set
As discussed in previous sections, a critical part of designing an effective NNM is to supply the model with realistic transaction data. However, because security, privacy and cost issues make real credit card data sets difficult to obtain, the typical approach taken by researchers has been to generate "synthetic" data to facilitate the development and testing of the model.
In generating the data, it is necessary to provide the neural network with a mix of genuine and fraudulent transactions to train the classifiers. Tao et al. specify a ratio of approximately 100 genuine transactions to each fraudulent transaction in the training data set in order to simulate real customer transactions accurately.
In designing the data set for the model, it is important to clearly specify the credit card payment-related training data attributes for the NNM. Tao et al. specify key attributes such as:
• time of transaction
• location of transaction
• type of merchandise
• business code for merchandise
• business type for merchandise
• transaction amount
Furthermore, "actual target values" are used for classification to guide the neural network learning process, where a target value of 1 represents abnormal and 0 represents normal behaviour.
3.3 Calculation of confidence value
The distinctive approach introduced by Tao et al. in applying neural network techniques to fraud detection is to convert both the training and the testing data into confidence values before feeding them into the NNM. These values are scaled to the range [0.0, 1.0], and each input captures the card-holder's historical information up to that point in time. Furthermore, they categorise the input attributes into discrete and continuous values: attributes such as time, location and type of merchandise are treated as discrete, whilst an attribute such as the transaction amount belongs to the continuous category. Two separate methods are therefore proposed for calculating confidence values, depending on the category of the input attribute.
To illustrate the calculation of confidence values for discrete and continuous attributes, they consider the transaction location and the transaction amount, respectively, as examples:
Given a sequence of transactions
$$X = \{x_1, x_2, \ldots, x_n\},$$
the confidence for the transaction location is given by:
$$C(x_i) = \frac{m_{x_i}}{n} \tag{12}$$
where:
$n$ is the number of uses of the credit card
$x_i$, for $i = 1, 2, \ldots, n$, is the location of the $i$-th use of the credit card
$m_{x_i}$ denotes the number of uses of the credit card at location $x_i$
The confidence for the transaction amount is given by:
$$C(x_i) = e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2} \tag{13}$$
where:
$n$ is the number of uses of the credit card
$x_i$, for $i = 1, 2, \ldots, n$, is the transaction amount of the $i$-th use of the credit card
$\sigma$ is the standard deviation of the transaction amounts
$\mu$ is the average transaction amount
The calculation of confidence values serves a twofold purpose. Firstly, the values are tested against a threshold that enables the researchers to determine whether a transaction is genuine or fraudulent. Secondly, the confidence calculation scales every neural network input to the range [0.0, 1.0], and this formatted data helps to speed up the neural network learning process.
3.4 Back Propagation and Receiver operating characteristic
A brief overview of the theory and operation of the NNM was given in Sections 3 and 3.1. In this section we look at the methods applied by Tao et al. in further detail.
3.4.1 Back Propagation
In the proposed system, the researchers use a multi-layer neural network model trained with the backpropagation (BP) algorithm.
The BP algorithm is a common approach to training artificial neural networks. It computes the gradient of a cost function with respect to all the weights in the network. The gradient is passed to an optimization method (such as steepest descent), which updates the weights with the aim of minimizing the cost function [12].
In this study, the BP algorithm learns by iteratively processing a data set of training "tuples" (finite ordered lists of elements):
$$X = \{x_1, x_2, \ldots, x_n\}$$
The algorithm compares the network's prediction for each tuple with the actual known target value. For each training tuple, the weights are adjusted to minimize the mean squared error between the network's prediction and the actual target value. The modifications are made in the backwards direction, from the output layer $Y = \{y_1, y_2, \ldots, y_n\}$ through each hidden layer down to the first hidden layer.
For this study, the researchers used the sigmoid function
$$S(t) = \frac{1}{1 + e^{-t}} \tag{14}$$
as the activation function for the nodes in the hidden layers and the output layer.
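A minimal sketch of tuple-by-tuple backpropagation with sigmoid units, following the description above, is given below: one hidden layer, a squared-error cost and plain gradient-descent updates. The network size, learning rate, weight initialisation and any toy data are assumptions for illustration; the paper's actual architecture and training settings are not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Eq. (14): S(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + math.exp(-t))


class TinyMLP:
    """Hypothetical one-hidden-layer network with sigmoid hidden and output units."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Each weight row has one extra entry used as a bias (its input is fixed at 1.0).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        inputs = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in self.w_hidden]
        y = sigmoid(sum(w * v for w, v in zip(self.w_out, h + [1.0])))
        return h, y

    def train_step(self, x: Sequence[float], target: float, lr: float) -> float:
        """One backpropagation update for a single tuple; returns its squared error."""
        h, y = self.forward(x)
        # Error signal at the output for the cost 0.5 * (y - target)^2.
        delta_out = (y - target) * y * (1.0 - y)
        # Propagate backwards to the hidden layer using the not-yet-updated weights.
        delta_hidden = [delta_out * self.w_out[k] * h[k] * (1.0 - h[k]) for k in range(len(h))]
        for k, v in enumerate(h + [1.0]):
            self.w_out[k] -= lr * delta_out * v
        inputs = list(x) + [1.0]
        for k, row in enumerate(self.w_hidden):
            for i, v in enumerate(inputs):
                row[i] -= lr * delta_hidden[k] * v
        return (y - target) ** 2


def train(model: TinyMLP, data: Sequence[Tuple[Sequence[float], float]],
          epochs: int = 3000, lr: float = 0.5) -> float:
    """Average per-tuple squared error over the last epoch (measured before each update)."""
    mse = float("nan")
    for _ in range(epochs):
        mse = sum(model.train_step(x, t, lr) for x, t in data) / len(data)
    return mse
```

Here the inputs would be confidence values in [0, 1] and the targets the actual target values (1 = abnormal, 0 = normal).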
3.4.2 Receiver operating characteristic (ROC)
In this paper, ROC analysis is used to ensure that the accuracy and performance of the model are adequate. This is achieved by obtaining an optimal threshold for determining whether a transaction is genuine or fraudulent. The threshold value is tested against the output of the NNM, which takes the form of confidence values $Y = \{y_1, y_2, \ldots, y_n\}$; each transaction is then classified as genuine or fraudulent according to whether its confidence value is higher or lower than the threshold.
A crucial part of ROC analysis is the specification of the confusion matrix, whose layout is shown in Table 1. The confusion matrix compares the actual classifications (rows) against the model's predictions (columns). If the model predicted fraud with perfect accuracy, all observations in the confusion matrix would reside in the two cells labelled "True Positive" and "True Negative". The objective is to maximize correct predictions while keeping the increase in false alarms under control [13].
In this context, the False Positive Rate is the ratio of normal spending patterns incorrectly detected as abnormal to the total number of normal spending patterns, and is given by:
$$FPR = \frac{FP}{FP + TN} \tag{15}$$
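Similarly, the True Positive Rate is the ratio of abnormal spending patterns correctly detected as abnormal to the total number of abnormal spending patterns:
$$TPR = \frac{TP}{TP + FN} \tag{16}$$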
Table 1: Confusion matrix
                            Predicted: Y       Predicted: N
Actual Classification: Y    True Positive      False Negative
Actual Classification: N    False Positive     True Negative
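The confusion-matrix counts and the two rates of Equations (15) and (16) can be computed from labels and model outputs as in the sketch below. It assumes, as in Section 3.1, that label 1 means abnormal and that an output at or above the threshold predicts abnormal; the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_matrix(labels: Sequence[int], outputs: Sequence[float],
                     threshold: float) -> Dict[str, int]:
    """Fill the four cells of Table 1 (label 1 = abnormal, output >= threshold predicts abnormal)."""
    cm = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for actual, y in zip(labels, outputs):
        predicted_abnormal = y >= threshold
        if actual == 1:
            cm["TP" if predicted_abnormal else "FN"] += 1
        else:
            cm["FP" if predicted_abnormal else "TN"] += 1
    return cm


def rates(cm: Dict[str, int]) -> Tuple[float, float]:
    """Return (TPR, FPR) as defined in Eqs. (16) and (15)."""
    tpr = cm["TP"] / (cm["TP"] + cm["FN"])
    fpr = cm["FP"] / (cm["FP"] + cm["TN"])
    return tpr, fpr
```

Repeating this for every candidate threshold and plotting the resulting (FPR, TPR) pairs traces out the ROC curve.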
With the FPR and TPR metrics defined, we introduce another important metric, the Youden Index. In the medical and biological sciences, the Youden Index (referred to as the Youden exponent by Tao et al.) is typically used as a summary measure of the ROC curve. It both measures the effectiveness of a "diagnostic marker" and enables the selection of an optimal threshold value (cut-off point) for the marker. It is defined as $J = \text{sensitivity} + \text{specificity} - 1$ [14]. Since sensitivity equals the TPR and specificity equals $1 - FPR$, this is equivalent to $J = TPR - FPR$. Its value ranges from 0 to 1, where a value of 1 indicates that there are no false positives or false negatives, i.e. the test is perfect, and a value of 0 indicates that the test performs no better than chance.
In the context of this paper, Tao et al. define the Youden exponent $E$, whose maximum over the candidate thresholds identifies the optimal point on the ROC curve, as:
$$E = TPR - FPR \tag{17}$$
Then, taking into consideration the costs of false negatives and false positives, the cost-weighted exponent (CE) is defined as:
$$CE = \frac{FNC}{FPC + FNC} \cdot TPR - \frac{FPC}{FPC + FNC} \cdot FPR \tag{18}$$
where FNC is the cost of a false negative and FPC is the cost of a false positive, satisfying the constraints:
$$0 \leq FPC \leq 1, \quad 0 \leq FNC \leq 1, \quad FPC + FNC \neq 0 \tag{19}$$
Consider the case of equal costs:
$$FPC = FNC \neq 0 \tag{20}$$
When Equation (20) holds, Equation (18) reduces to:
$$CE = \frac{1}{2}(TPR - FPR) = \frac{1}{2}E \tag{21}$$
The crucial point is that the cost-weighted exponent overcomes the inadequacy of setting a threshold without considering the cost of errors, since the optimal threshold can shift as the relative costs of errors change.
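The threshold selection described above can be sketched by sweeping candidate thresholds and keeping the one that maximises CE, computed here as (FNC · TPR − FPC · FPR) / (FPC + FNC), so that FNC weights the TPR and FPC weights the FPR. With equal costs this maximises E, as Equation (21) shows. The sketch assumes the labelling convention 1 = abnormal with an output at or above the threshold predicting abnormal, uses each observed output as a candidate threshold, and uses hypothetical names.

```python
from typing import Optional, Sequence, Tuple


def best_threshold(labels: Sequence[int], scores: Sequence[float],
                   fpc: float = 0.5, fnc: float = 0.5) -> Tuple[Optional[float], float]:
    """Return (threshold, CE) for the candidate threshold that maximises the cost-weighted exponent.

    labels: 1 = abnormal, 0 = normal; an output >= threshold predicts abnormal.
    fpc, fnc: costs of a false positive and a false negative, as in Eq. (19).
    """
    if not (0 <= fpc <= 1 and 0 <= fnc <= 1) or fpc + fnc == 0:
        raise ValueError("costs must lie in [0, 1] and must not both be zero")
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both normal and abnormal examples")
    best: Tuple[Optional[float], float] = (None, float("-inf"))
    for thr in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y != 1 and s >= thr)
        tpr, fpr = tp / pos, fp / neg
        # Cost-weighted exponent: FNC weights TPR, FPC weights FPR.
        ce = (fnc * tpr - fpc * fpr) / (fpc + fnc)
        if ce > best[1]:
            best = (thr, ce)
    return best
```

For example, with outputs [0.9, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1] and labels [1, 1, 1, 0, 0, 0, 0], equal costs select a threshold of 0.4 (CE = 0.375), whereas FPC = 0.9 and FNC = 0.1 move it up to 0.7.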
3.5 Results and Analysis
In this section we summarise the experimental results and the concluding analysis obtained by the researchers after running test transactions through the FDS.
Firstly, 7000 records of synthetic data were used for training the NNM and 3000 for testing.
Figure 5: Calculation of confidence and classification values per record
The table in Figure 5 shows 10 records of card-holder behaviour attributes (time of transaction, merchant type, business code, etc.) with their associated values (0.56, 0.83, 0.55, etc.) after the confidence values were calculated. For each record, a target value of 0 (normal) or 1 (abnormal) was assigned by testing the output $y_i$ against a specified threshold value (see Section 3.4.2).
We now consider the ROC curve shown in Figure 6, from which the researchers attempt to show that setting the threshold value at 0.4 improves the detection accuracy of the model. Without considering the cost factor (CE), the model reaches its optimal point where the TP rate is 91.2% and the FP rate is 13.55%, providing a reasonably good ratio of true positives to false positives. The cost can then be factored in by computing the threshold value according to Equation (18), noting that the optimal threshold value is adjusted when the relative cost changes.
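As a quick check of this operating point (our own arithmetic from the reported rates, not a figure given by the researchers), Equations (17) and (21) give:

$$E = TPR - FPR = 0.912 - 0.1355 = 0.7765, \qquad CE = \tfrac{1}{2}E \approx 0.388 \ \text{(equal costs)}$$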
Figure 6: ROC curve
4 Comparative Analysis
In this section we compare and contrast the two papers by identifying the similarities in the approaches used, while clearly delineating the differences. Figure 7 illustrates the key areas discussed in this section; we attempt to show that there is general concord in the scope, motivation and methodologies of both papers, whilst expounding on the primary differences in the implementation and techniques applied to the fraud detection problem.
4.1 Areas of concordance
4.1.1 Scope
As an introduction to this section, it should be stated that these two papers were carefully selected from the available literature on fraud detection because both approach the issue from a perspective that incorporates data mining, machine learning and statistical techniques within the context of credit card fraud detection. The scopes of the two studies are therefore closely aligned, making for an interesting exposition and comparison.
Figure 7: Illustration of various components of both papers
4.1.2 Motivation
As discussed in Section 1, e-commerce in the retail market has grown rapidly and gained popularity owing to its ability to facilitate instantaneous transactions. With the rapid global development of information technology, credit card payment has consequently become one of the most important means of payment. However, as credit card usage increases, fraudulent practices are also increasing substantially. Both Abhinav et al. and Tao et al. are acutely aware of this problem and its implications for card-holders and especially for issuing banks, who risk losing millions in fraud compensation and fines. It is this common concern that motivates the research presented in both papers.
4.1.3 Methodology
The use of transaction data to understand the spending patterns of card-holders and to detect credit card fraud is not a new concept, and it has been widely recognised by researchers in this area (as evidenced by the extensive literature on fraud detection techniques) as the most effective means of solving the fraud detection problem. It is therefore not surprising to find this methodological approach adopted by both Abhinav et al. and Tao et al. in their work.
The common approach adopted by both is to model the sequence of operations in credit card transaction processing and then to test the model by running transactions through it. This leads both Abhinav et al. and Tao et al. to introduce TP and FP metrics as a means of ensuring the accuracy and effectiveness of their fraud detection models. Furthermore, both groups analyse historical transaction data and attempt to find inconsistencies in spending patterns as a means of detecting fraudulent transactions.
To ensure that the fraud detection model is successful in its primary purpose, Abhinav et al. and Tao et al. place a strong emphasis on training their models with realistic data containing a good mix of fraudulent and genuine transactions. However, both have to deal with the security, privacy and cost issues of obtaining real transaction data, hence the need to generate synthetic data.
Lastly, both groups of researchers run test transactions against their trained models, accepting or rejecting each transaction based on a specified threshold value.
4.2 Implementation and Technical differences
4.2.1 Modelling the sequence of operations in credit card transaction processing
In the approach proposed by Abhinav et al., the key idea is to model the sequence of operations in credit card transaction processing using an HMM, where the details of the items purchased in individual transactions are represented as the underlying finite Markov chain, which is not observable. The transactions can only be observed through a stochastic process that produces the sequence of amounts of money spent in each transaction. In the work of Tao et al., by contrast, the sequence of operations in credit card transaction processing is modelled using a confidence-based neural network and ROC analysis, with confidence values calculated separately for discrete and continuous input attributes.
4.3 Training Data and Learning Time
The distinctive approach introduced by Tao et al in applying neural network
techniques to fraud detection is to convert both the training and the testing
data into confidence values before feeding them into the neural network
model (NNM). These values are scaled to the range [0.0, 1.0], and each input
encodes the historical information available at the time of the
transaction. In Abhinav et al’s approach, by contrast, once the observation
sequence has been formed from the cardholder’s transactions, it is passed
directly into the HMM without any such transformation.
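The exact confidence-value formulas of Tao et al are not reproduced in this comparison, so the following is only one plausible definition, assumed for illustration: a discrete attribute value is mapped to the smoothed share of past transactions with that value that were fraudulent, and a continuous attribute is first discretised into bins with hypothetical edges. The helper names `discrete_confidence` and `continuous_confidence` are hypothetical.

```python
import bisect
from typing import Hashable, Sequence, Tuple


def discrete_confidence(history: Sequence[Tuple[Hashable, bool]], value: Hashable) -> float:
    """Smoothed share of past transactions with this attribute value that were fraudulent.

    Hypothetical definition: (frauds + 1) / (occurrences + 2), which always lies in
    (0, 1) and falls back to 0.5 for a value never seen before.
    """
    occurrences = sum(1 for v, _ in history if v == value)
    frauds = sum(1 for v, is_fraud in history if v == value and is_fraud)
    return (frauds + 1) / (occurrences + 2)


def continuous_confidence(history: Sequence[Tuple[float, bool]], value: float,
                          edges: Sequence[float]) -> float:
    """Discretise a continuous attribute into bins (hypothetical edges), then apply the discrete rule."""
    def bin_of(x: float) -> int:
        return bisect.bisect_right(edges, x)

    binned = [(bin_of(x), is_fraud) for x, is_fraud in history]
    return discrete_confidence(binned, bin_of(value))


merchants = [("grocery", False), ("grocery", False), ("electronics", True), ("electronics", False)]
amounts = [(12.0, False), (35.0, False), (480.0, True), (520.0, True), (150.0, False)]
print(discrete_confidence(merchants, "electronics"))               # 0.5
print(discrete_confidence(merchants, "grocery"))                   # 0.25
print(continuous_confidence(amounts, 700.0, edges=[50.0, 200.0]))  # 0.75
```

Whatever the precise formula, the design goal is the same: every input to the network lands in [0.0, 1.0] and summarises past behaviour rather than raw attribute values.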
With regard to training and learning time, it is important to note the
differences in the training approaches employed by the two groups. In
Abhinav’s implementation, although training is done offline, the learning
time can have a strong impact on the scalability of the system, because a
separate HMM has to be trained for every cardholder. Given that an issuing
bank such as HSBC processes millions of transactions for an equally large
number of cardholders, this could lead to performance and scalability
issues. In Tao’s approach, the challenge to performance and scalability lies
instead in finding an efficient optimization algorithm for minimizing the
cost function via its gradient and subsequently updating the weights.
Fortunately, optimization algorithms are well studied, and a large body of
techniques is available to address this problem.
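To make the optimization step concrete, the sketch below trains a single sigmoid unit (equivalent to logistic regression) by batch gradient descent on the mean cross-entropy cost. This is a deliberate simplification and not Tao et al’s network: a multilayer model would propagate the same error term backwards through its hidden layers [12]. The inputs are hypothetical confidence values, and the learning rate and epoch count are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Output in (0, 1) of a single sigmoid unit."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train(X: Sequence[Sequence[float]], y: Sequence[int],
          lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean cross-entropy cost."""
    w, b, m = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            # For a sigmoid output with cross-entropy cost, d(cost)/d(pre-activation) = p - target.
            err = predict(w, b, x) - target
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Step each weight against the gradient of the averaged cost.
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Hypothetical confidence-value inputs (two attributes); labels: 1 = fraud, 0 = genuine.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.8, 0.9], [0.9, 0.7], [0.85, 0.8]]
y = [0, 0, 0, 1, 1, 1]
w, b = train(X, y)
print([round(predict(w, b, x)) for x in X])  # [0, 0, 0, 1, 1, 1]
```

Swapping plain gradient descent for a faster optimizer changes only the update lines inside the loop, which is why the scalability question for this approach reduces largely to the choice of optimization algorithm.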
4.4 Threshold values
In Tao et al’s work, the key insight is to take the costs of false negatives
and false positives into account. Rather than setting the threshold without
regard to error costs, the approach adjusts the optimal threshold value
whenever the relative cost changes. Furthermore, ROC analysis is used to show
that setting the threshold at this optimal value improves the detection
accuracy of the model. In Abhinav’s implementation, by contrast, the HMM
parameters are estimated during the training stage using the Baum-Welch
algorithm, while the threshold value is learnt empirically and then
effectively fixed for the cardholder’s Markov model.
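The cost-sensitive idea can be illustrated with a brute-force sweep: every distinct score is a candidate threshold and corresponds to one point on the empirical ROC curve, and the sketch keeps the threshold with the lowest total error cost. This is an assumed, simplified procedure rather than Tao et al’s exact method; the scores, labels and the helpers `error_counts` and `min_cost_threshold` are hypothetical.

```python
from typing import Sequence, Tuple


def error_counts(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int]:
    """Return (false positives, false negatives) when score >= threshold means 'fraud'."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def min_cost_threshold(scores: Sequence[float], labels: Sequence[int],
                       cost_fp: float, cost_fn: float) -> Tuple[float, float]:
    """Sweep every empirical ROC operating point and keep the cheapest one."""
    # Each distinct score is a candidate cut-off; +inf means "flag nothing".
    candidates = sorted(set(scores)) + [float("inf")]
    best_threshold, best_cost = candidates[0], float("inf")
    for t in candidates:
        fp, fn = error_counts(scores, labels, t)
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:  # strict '<' keeps the lowest threshold on ties
            best_threshold, best_cost = t, cost
    return best_threshold, best_cost


scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]  # hypothetical network outputs
labels = [0, 0, 1, 0, 1, 1]               # 1 = fraud
print(min_cost_threshold(scores, labels, cost_fp=1, cost_fn=1))  # (0.35, 1)
print(min_cost_threshold(scores, labels, cost_fp=5, cost_fn=1))  # (0.7, 1)
```

Making a false positive five times as expensive moves the chosen threshold from 0.35 to 0.7: the same scores yield a different operating point once the relative cost changes.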
4.5 Acceptance and Declining of incoming transactions
In Tao’s implementation, the output of the neural network generally takes
real values between 0 and 1. If the output is below a specified threshold
value, the transaction is classified as genuine; if it is above the
threshold, the "probability" that the transaction is fraudulent is
considered to be high.
In Abhinav’s work, by comparison, once the HMM parameters have been estimated
in the training phase, an initial sequence of observation symbols is taken
from the cardholder’s past transactions and passed to the HMM to compute its
probability of acceptance, α1. When a new transaction arrives, its symbol is
appended to the sequence and the oldest symbol is dropped, giving a second
sequence with acceptance probability α2. If Δα = α1 − α2 > 0, the new
sequence is accepted by the HMM with lower probability, so the transaction
is potentially fraudulent; it is flagged as fraud if, in addition, the
relative drop Δα/α1 reaches the threshold value.
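The acceptance test described above can be sketched as follows, with the forward algorithm computing each acceptance probability. The three price-range symbols, the cut-offs in the hypothetical `quantize` helper and all model parameters are illustrative assumptions standing in for values that would be derived from each cardholder’s history and from Baum-Welch training.

```python
from dataclasses import dataclass
from typing import List, Sequence

LOW, MEDIUM, HIGH = 0, 1, 2  # observation symbols: price ranges of a purchase


@dataclass
class HMM:
    pi: List[float]        # pi[i]   = P(first hidden state is i)
    A: List[List[float]]   # A[i][j] = P(next hidden state j | current state i)
    B: List[List[float]]   # B[j][k] = P(observation symbol k | hidden state j)


def quantize(amount: float, low_cut: float = 50.0, high_cut: float = 200.0) -> int:
    """Map a purchase amount to a price-range symbol (hypothetical cut-offs)."""
    if amount < low_cut:
        return LOW
    if amount < high_cut:
        return MEDIUM
    return HIGH


def acceptance_probability(model: HMM, obs: Sequence[int]) -> float:
    """Forward algorithm: probability that the HMM generates the sequence obs."""
    n = len(model.pi)
    alpha = [model.pi[i] * model.B[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * model.A[i][j] for i in range(n)) * model.B[j][symbol]
                 for j in range(n)]
    return sum(alpha)


def is_suspicious(model: HMM, window: Sequence[int], new_symbol: int,
                  threshold: float) -> bool:
    """Slide the window by one transaction and test the relative drop in acceptance."""
    alpha1 = acceptance_probability(model, window)
    shifted = list(window[1:]) + [new_symbol]  # drop the oldest symbol, append the newest
    alpha2 = acceptance_probability(model, shifted)
    delta = alpha1 - alpha2
    return delta > 0 and delta / alpha1 >= threshold


# Hypothetical trained parameters for three hidden purchase categories.
model = HMM(
    pi=[0.5, 0.3, 0.2],
    A=[[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]],
    B=[[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.6, 0.3, 0.1]],
)
window = [quantize(a) for a in [12.0, 30.5, 8.99, 25.0, 40.0]]  # all LOW
print(is_suspicious(model, window, quantize(20.0), threshold=0.5))   # False
print(is_suspicious(model, window, quantize(950.0), threshold=0.5))  # True
```

Because every state in this hypothetical model emits a high-value symbol less often than a low-value one, appending a HIGH symbol to a run of LOW symbols always lowers the acceptance probability, whereas another LOW purchase leaves it unchanged.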
5 Usage of HMM and NNM in Industry
By way of contextualising these papers in terms of the application of the
methods discussed, it is helpful to examine one case of general application
of these techniques and another related directly to credit card fraud
detection. Considering HMMs first, they have applications in a broad range
of scientific and mathematical fields where the goal is to recover a data
sequence that is not immediately observable (but other data that depend on
the sequence are).
For example, HMMs are used in automatic speech recognition, where the model
is trained to recognize speech utterances from given observations [15]. They
have also been used extensively in biological sequence analysis, where they
can solve various sequence analysis problems such as pairwise and multiple
sequence alignment, gene annotation and classification [16].
Neural network models are an immensely popular technique for fraud
detection and are used by some of the world’s largest banks. For example,
Santander uses a fraud detection system called Falcon Fraud Manager from
FICO, which is based heavily on neural models and leverages adaptive
analytics. The adaptive model adjusts the base neural network "Falcon score"
in response to real-time fraud tactics that were not present when the neural
network model was trained [17][18]. FICO analytics software and tools are
used across multiple industries to manage risk, fight fraud, build more
profitable customer relationships, optimize operations and meet strict
government regulations [19].
References
[1] UK Cards Association
http://www.theukcardsassociation.org.uk/wm documents/December
2014 Full Report.pdf
[2] Financial Fraud Action UK
http://www.financialfraudaction.org.uk/downloads.asp?genre=consumer
[3] S. Benson Edwin Raj, A. Annie Portia, Analysis on Credit Card Fraud
Detection Methods. International Conference on Computer, Communication
and Electrical Technology – ICCCET2011, 18th, 19th March, 2011
[4] Masoumeh Zareapoor, Seeja K.R., and M. Afshar Alam, Analysis of Credit
Card Fraud Detection Techniques: based on Certain Design Criteria.
International Journal of Computer Applications (0975 – 8887), Volume 52,
No. 3, August 2012
[5] Khyati Chaudhary, Jyoti Yadav, Bhawna Mallick, A review of Fraud
Detection Techniques: Credit Card. International Journal of Computer
Applications (0975 – 8887), Volume 45, No. 1, May 2012
[6] Abhinav Srivastava, Amlan Kundu, Shamik Sural and Arun K. Majumdar,
Credit Card Fraud Detection using Hidden Markov Model. IEEE Transactions on
Dependable and Secure Computing, Vol. 5, No. 1, January-March 2008
[7] Tao Guo, Gui-Yang Li, Neural Data Mining for Credit Card Fraud
Detection. Proceedings of the Seventh International Conference on Machine
Learning and Cybernetics, Kunming, 12-15 July 2008
[8] Yufeng Kou, Chang-Tien Lu, Sirirat Sinvongwattana, Survey of Fraud
Detection Techniques. Proceedings of the 2004 IEEE International Conference
on Networking, Sensing & Control, Taipei, Taiwan, March 21-23, 2004
[9] Markov Model
http://en.wikipedia.org/wiki/Markov_model
[10] S.J. Stolfo, D.W. Fan, W. Lee, A.L. Prodromidis, and P.K. Chan, Credit
Card Fraud Detection Using Meta-Learning: Issues and Initial Results.
Proc. AAAI Workshop AI Methods in Fraud and Risk Management, pp. 83-90, 1997
[11] Khyati Chaudhary, Jyoti Yadav, Bhawna Mallick, A review of Fraud
Detection Techniques: Credit Card. International Journal of Computer
Applications (0975 – 8887), Volume 45, No. 1, May 2012
[12] David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, Learning
representations by back-propagating errors. Nature 323 (6088): 533–536,
8 October 1986
[13] SAS Institute, Using Data Mining Techniques for Fraud Detection. A SAS
Institute Best Practices Paper
http://www.ag.unr.edu/gf/dm/dmfraud.pdf
[14] Ronen Fluss, David Faraggi, and Benjamin Reiser, Estimation of the
Youden index and its associated cutoff point. Biometrical Journal, 2005
[15] HMM Speech Recognition
http://www.fysiskplanering.se/fou/cuppsats.nsf/all/e156a6197d8b0678c1256bbb003f62
[16] Byung-Jun Yoon, Hidden Markov Models and their Applications in
Biological Sequence Analysis
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/
[17] FICO Analytics
http://www.fico.com/en/node/8140?file=5380
[18] FICO Analytics
http://www.fico.com/en/blogs/tag/score-performance/page/5/
[19] FICO Analytics
http://www.fico.com/en/about-us