Fraud Detection using Data Mining Project

DATA MINING PROJECT
Fraud Detention using Data Mining
JUNE 7, 2015
NORTHWESTERN UNIVERSITY, MSIS 435
ALBERT KENNEDY

1
TABLE OF CONTENTS
Abstract............................................................................................................................................... 2
Introduction........................................................................................................................................ 3
Data Mining Applications.................................................................................................................... 4
Data Mining Themes........................................................................................................................... 4
CRISP-DM Methodology...................................................................................................................... 5
Data Understanding............................................................................................................................ 7
Data Preparation................................................................................................................................. 9
Data Mining Algorithm...................................................................................................................... 10
Experimental Results and Analysis................................................................................................ 11
Conclusion..................................................................................................................................... 14
Future Work.................................................................................................................................. 15
References .................................................................................................................................... 16

2
Fraud Detention
1 ABSTRACT
The Adoption of data mining can be great for many use cases and organization that have a special need
and understanding of what can be done with existing data. Many organizations don’t understand the
power and value they have in what you control. With this, the benefits of using a process for making
smarter decision will be discussed in this paper. For the purpose of explaining not yet a data mining
topic and its benefits but also to address common problems in the Fraud and identity sector where
many businesses and individuals can take advantage of.
Fraud detention should be more important today than ever before. With the growing e-commence
business that moving rapidly and people having more access to important, financial institutions need to
be more aware of ways to detect possible fraudulent acts. We can achieve this goal with the use of data
mining.
This document will show case a typical problem with sample data taken from borrowers and their
information that related to credit approval. Another data set is similar but present typical account
holders and a common profile of related attributes that make up a “type” of customer. We will go into
detail and the proper process of how to solve this problem using the outline:
 Defining the business problem
 Collection the data and enhancing that data
 Choosing a model strategy and algorithm that fixes the business need
 Executing the model through a training set then test that model
 Evaluating the results of the model
 Decide for the model or make any changes
 Deploy the model into an actionable project
The above outline is based on a framework that many data scientist use today called “The Cross Industry
Standard Process for Data Mining” (CRISP-DM)1
. This foundation is what will be use to analyze the fraud
detention use case of both the German Credit fraud data and the Give Me Some Credit data set to
compare against using two separate techniques. This two datasets have been thoroughly cleanse and
checked for correctly and free of any bias input to shew any result toward any direction. In order to
become successful in building a fraud detention system, it is important to understand the data mining
tools applications used in the industry. Second, it’s important to know the themes of data mining and
most importantly the CRISP-DM methodology that will be used to support our business problem and
data mining design for a reliable fraud detention system.
1
Information related to CRISP-DM, see CRISP-DM 1.0, Step-by-Step data mining guide, SPSS,

3
2 INTRODUCTION
In America and many other others of the world, crime appears to be relevant in today’s society. As the
government and we as people continue to find ways to prevent and coach individuals to shrey away
from unlawful event, people find alternative and intelligent ways to be corrupted. Using our government
law enforcement has only become as good as to catch perpetrators after a particular wrong doing event
has been either reported or caught. Not all actions we can’t control and stop before happening that’s
within any person’s private setting. How about we can help prevent those crimes that can be warned if
not stop prior to happing? This type of crime that we people such as businesses and individual victims
can have control over with the used of data analysis. The crimes of identity thief and Credit approval
can be defended.
The purpose of this study is to empower financial businesses and individuals with the ability to combat
potential thief of personal and company information to avoid misuse for another person’s gain.
Description of Problem: In today’s fast growing technology sociality, individuals are completing more
and more online transactions and sending data to multiple sources of businesses using the same vital
information to identify a person.
Examples of this could be applying for credit via an online store. So what information is needed for this?
In many cases
 Full name
 Telephone number or street address
 Social security number (SSN)
The above information is all that is needed for a creditors to approve someone for a line of credit or
account under a person’s name. There’s an issue here; anyone that doesn’t know you can obtain this
information easily. The only harder bit of information to obtain is an SSN. The best way for someone to
get a person’s SSN, and through personal work records; a person that has access to this who handle’s
administrative tasks for employees. This is a problem, because a telephone number and street address
can be any number or address that the creditor does not care for other than a place to mail bills to.
Typically a creditor will validate this through mailing information. Say a different phone number was
given and validated with a phone call that could also be possible. We can use smarter ways to combat
this, which some companies do with multiple levels on validating (through phone call, matching current
address and identification).
Our Objective: We can help fix these problems through the use of data mining and making sound
business decisions in order to complete a credit transition.
 First we need we can identify the types of data needed to solve an identity or fraud similar
action. We will use pervious data from users of different credit accounts to do the analysis work
on.
 Then, we need to choose one or many different data mining algorithms to test this data for an
outcome of what we are analyzing to make a recommendation, prediction, classifications,
and/or description of for better business decision.

4
So, let’s explore the many different data mining applications that can be used that’s related to this
problem
3 DATA MINING APPLICATIONS
There are many related data mining applications that can be used for the purpose of detecting
fraudulent activities. This is a growing study that many established organizations have seek out and
completed in depth research that more so expose the issues and weakness more so than real world
working applications that actual identity the issue and combat the problem in a defense matter.
A company called Morpho, has a mission where they are the market leader in security solutions who are
the pioneer in identification and detection systems. They deliver many products that many target
government and national agencies with dedication tools and systems to safeguard sensitive information.
They completed a study using data mining and the relation to identity fraud as an application to prevent
and or warn businesses, government organizations and individuals of a possible fraudulent act. From
there “Fighting Identity Fraud with Data Mining,” paper on Safran product2
, they speak about a
comprehensive fraud-proof process.
A second company that should be worth mention who use data mining methods to conducted a similar
study was Federal Data Corporation and the SASA institute Inc. These two completed a thorough study
on “Using Data Mining Techniques for Fraud Detection.” This was solved in conjunction with using the
SAS Enterprise Miner software. The two use cases presented where 1) Health Care Fraud Detection and
2) Purchase Card Fraud detection. Both have similar if not the same business problems and ending goals.
The first case, the FDC and SAS used Decision Trees to group all the nominal values of input into smaller
group that will in turn give a predictive target outcome. The second case study, they used a modeling
strategy that was clustering. Their analysis included three clusters that help explain the cluster analysis
efficiently segments data into groups of similar cases.
The overall conclusions for both unveiled unknown patterns and regularities in their data.
4 DATA MINING THEMES
The study of data mining to best explained and organized into different themes. These different area are
better described by the four core data mining tasks. According to “Introduction to Data Mining,” by
Pang-Ning Tan, these themes are covered under the four core tasks; Predictive modeling, cluster
analysis, Association Analysis Anomaly detection.3
To briefly describe each theme of data mining, it is best to show by examples. These themes in detailed
are:
 Classification
 Clustering
 Anomaly
2
Product from the Morpho Inc., Safran, “Fighting Identity Fraud With Data Mining”
3
See more information on themes from Assignment 1: Data Science Applicaiton, Kennedy, Albert

5
First off, predictive modeling is split into two types 1) Classification which is used for discrete target
variables and 2) regression, which is used for continuous target variables” Tan. The Classification type is
mostly common for making a prediction for an outcome which is the target variable of a single action.
Whereas the regression, may analyze the cost that a consumer may spend monthly on an e-commerce
website. These two types of variables (classification and regression) are what help define predictive
model.
The second them, Cluster analysis “seeks to find groups of closely related observations so that
observations that belong to the same cluster are more similar to each other than observations that
belong to other cluster”, Tan. Good examples of this would be grouping different customers’ purchasing
behaviors. Doing this helps define the types of customers and their purchases a clothing retail store
have for analysis.
Then lastly, there is the Association Analysis them. This theme is used “to discover patterns that
describe strongly associated features in the data” Tan. The association theme is commonly used in the
retailer or grocery market businesses for analysis. We can group liked things together that have
similarities based on related attributes and/or pair transitions from users.
These three themes all have its purpose to help solve particular data science problems. With solving
these problems, businesses can make decisions using these techniques. Understanding its definition is
key to ensure the right solution/theme is being utilized for the correct problem. Once the understanding
is complete, analyst can make use of the right tool for apply for the most appropriate case. In many uses,
not just one theme may apply to a case, but multiple themes can be applied for better analysis and
comparisons for the best results.
5 CRISP-DM METHODOLOGY
For many of these data mining techniques, we don’t want to apply the wrong or less effective solution.
Lucky, there is a well-organized methodology that gives businesses the steps and processes to handle
out these type of data mining project. We use what’s called the “Cross Industry Standard Process for
Data Mining” or CRIPS-DM for short. The CRIPS-DM is a structured framework with hierarchical steps to
follow in order to help guide through a proper data mining problem and solution. The CRISP-DM include
six phases:4
4
Information related to CRISP-DM, see CRISP-DM 1.0, Step-by-Step data mining guide, SPSS

6
Figure 5-1: CRISP-DM diagram
1) Business Understanding – this makes sense as an initial step. Any data mining problem has an
business need and problem that needs to be understood. This stage “represents a part of the
craft where the analysts’ creativity plays a large role…the design team should think carefully
about the use scenario” Provost. In this stage questions such as what needs to be done and how
it needs to be done are asked.
2) Data Understanding – this phase is a self-explanatory step yet could take a lot of time. Making
sure as an analyst you become knowledgeable of the data can make for an easier process. This
phase enables you to become “familiar with the data, identify data quality problems, discover
first insights into data, and/or detect interesting subsets to form hypotheses regarding hidden
information”, SPSS.
3) Data Preparation – in order for us to make a good analysis, we need tools that will enable us to
process the data in the best matter suitable for the necessary model. Examples of this phase
may require converting data in a simple tabular format, removed pointless attributes not
relevant to the data problem and/or converting a data file to a particular file format in order to
operate in the chosen data mining tools.

7
4) Modeling – “The modeling stage is the primary place where data mining techniques are applied
to the data” Provost. Simply put, this is where the magic happens and the actual data mining
craft and chosen algorithm(s) are put into work.
5) Evaluation – at the evaluation phase, we take time to access the results of the outcomes from
the models that were built from our data. The most important aspect of this phase going
through this evaluation is to gain confidence in the model’s outcome. We would like to analysis
the results and understand its outcome to ensure it’s reliable for meet the original business
problem’s needs.
6) Deployment – Now the model has been created and test, now we can make use of this reliable
model into a real life production case. What can we do with it…“the knowledge gained will need
to be organized and presented in a way that the customer can use it”, SPSS. Depending on the
businesses need for the data mining model that was craft, the deployment can be simple or
complex. Simple being as creating the results to report to managers or taking that model and
actually implementing in to the business in need.
Within each phase, there are a set of tasks that the business will help generate for both the data mining
team and business to complete in order to complete through the CRISP-DM cycle successfully.
6 DATA UNDERSTANDING
For the purpose of proving the data mining algorithms used for the case of fraud detention, we will
examine two different data sets. The first is the German Credit fraud data from Dr. Hans Hofmann, the
University of Hamburg in Germany. This particular data has 1000 instances of customer information for
preparation of understanding approval of credit. There are 20 attributes used to help describe each
instance and there uniqueness.
German Creadit Data Definition:
Variable Name Description Type
over_draft Status of existing checking account qualitative
credit_usage real Duration in month numerical
credit_history A30 : no credits taken/
all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account/
other credits existing (not at this bank)
qualitative
purpose Type of credit/loan needed for (new car, used car,
furniture/equipment, radio/tv, repairs, education,
vacation, business,other
qualitative
current_balance real Credit amount numerical
Average_Credit_Balance Savings account/bonds qualitative
employment Present employement since a date qualitative

8
Location real Installment rate in percentage of disposble income numerical
Personal_status Personal status and sex qualitative
other_parties Other debtors / guarantor qualitative
residence_since real Present residence since a date qualitative
property_magnitude Property qualitative
cc_age real Age count in months numerical
other_payment_plans Other installment plans qualitative
housing Housing type - rent, own, for free qualitative
existing_Credits real Number of existing credits at this bank numerical
job Job and type Unemployed/unskilled, skilled,
management/self-employed, highly qualified
qualitative
num_dependents real Number of people being liable to provide maintenance numerical
own_telephone Telephone qualitative
foreign_worker Indicate Yes or No if Foreign worker qualitative
class the cost matrix used for indicating is customer is Good
or Bad
qualitative
The above information is financial data taken from year 1994 of customer that submitted for credit.
Details of the attributes used are of types in integers and categorical formats. This data set is best used
as a Classification data mining task.
The second data source comes from Kaggle, Give Me Some Credit competition.
According to Kaggle the purpose of this second is to so “state of the art in credit scoring by predicting
the probability that somebody will experience financial distress in the next two years.” This dataset uses
less attributes and input for descripting the customers. However this dataset has 4000 instances
Credit distress probability Data Definition:
Variable Name Description Type
SeriousDlqin2yrs Person experienced 90 days past due delinquency or
worse
Y/N
RevolvingUtilizationOfUnsecuredLines Total balance on credit cards and personal lines of credit
except real estate and no installment debt like car loans
divided by the sum of credit limits
percentage
age Age of borrower in years integer
NumberOfTime30-
59DaysPastDueNotWorse
Number of times borrower has been 30-59 days past due
but no worse in the last 2 years.
integer
DebtRatio Monthly debt payments, alimony,living costs divided by
monthy gross income
percentage
MonthlyIncome Monthly income real
NumberOfOpenCreditLinesAndLoans Number of Open loans (installment like car loan or
mortgage) and Lines of credit (e.g. credit cards)
integer
NumberOfTimes90DaysLate Number of times borrower has been 90 days or more
past due.
integer
NumberRealEstateLoansOrLines Number of mortgage and real estate loans including
home equity lines of credit
integer
NumberOfTime60-
89DaysPastDueNotWorse
Number of times borrower has been 60-89 days past due
but no worse in the last 2 years.
integer

9
NumberOfDependents Number of dependents in family excluding themselves
(spouse, children etc.)
integer
The goal of this particular data is to help build a model to help those that are borrowing make better
financial decisions. This will be under a Classification task type as well.
7 DATA PREPARATION
The process used to was very simple for the German Credit fraud data. Decided to use an ARFF file
format due to the source of the data from UC Irvine, Machine Learning Repository to organize. Had
collect the attributes needed for the analysis and paste into notepad application with the @ beginning
symbols to denote that these variables are the attributes. Then taking the @data field and pasting
below the raw data. As long as the raw data that’s divided by commas has the same number of values
to match the number of attributes given, the file will be accurate.
Here’s a snapshot of what the inside of an ARFF file will contains:
@relation german_credit
@attribute over_draft { '<0', '0<=X<200', '>=200', 'no checking'}
@attribute credit_usage real
@attribute credit_history { 'no credits/all paid', 'all paid', 'existing paid', 'delayed
previously', 'critical/other existing credit'}
@attribute purpose { 'new car', 'used car', furniture/equipment, radio/tv, 'domestic appliance',
repairs, education, vacation, retraining, business, other}
@attribute current_balance real
@attribute Average_Credit_Balance { '<100', '100<=X<500', '500<=X<1000', '>=1000', 'no known
savings'}
@attribute employment { unemployed, '<1', '1<=X<4', '4<=X<7', '>=7'}
@attribute location real
@attribute personal_status { 'male div/sep', 'female div/dep/mar', 'male single', 'male mar/wid',
'female single'}
@attribute other_parties { none, 'co applicant', guarantor}
@attribute residence_since real
@attribute property_magnitude { 'real estate', 'life insurance', car, 'no known property'}
@attribute cc_age real
@attribute other_payment_plans { bank, stores, none}
@attribute housing { rent, own, 'for free'}
@attribute existing_credits real
@attribute job { 'unemp/unskilled non res', 'unskilled resident', skilled, 'high qualif/self
emp/mgmt'}
@attribute num_dependents real
@attribute own_telephone { none, yes}
@attribute foreign_worker { yes, no}
@attribute class { good, bad}
@data
'<0',6,'critical/other existing credit',radio/tv,1169,'no known savings','>=7',4,'male
single',none,4,'real estate',67,none,own,2,skilled,1,yes,yes,good
'0<=X<200',48,'existing paid',radio/tv,5951,'<100','1<=X<4',2,'female div/dep/mar',none,2,'real
estate',22,none,own,1,skilled,1,none,yes,bad
A compiled version of this can be viewed here 
http://weka.8497.n7.nabble.com/file/n23121/credit_fruad.arff
The second data source is a CSV file that required some modification. The original data set file had
150,000 instances which was too large for the WEKA data mining tool’s heap size to handle. I manually
removed sections of the data set by dividing the total 150,000 into four sections and removing 3,750

10
from bottom half of the sections. I did this so I have an even amount of distributed data instead of only
taking the top 4000 instances. This file was then saved as cs-traning.csv file for WEKA input.
For both files, I created Data definitions to elaborate on their chosen attributes. The purpose of this is to
explain what inputs are used and its purpose for a clearer business understanding when we need to
make a decision after reviewing the results.
8 DATA MINING ALGORITHM
Select data mining algorithm for your project, elaborate chosen algorithm in detail with reason why
algorithm was chosen over other algorithms.
German Credit Fraud Use case:
The first dataset using the German credit fraud data was ideal for the use of Decision tree algorithm.
This type of classification algorithm, the decision tree is a very first good technique for this particular use
case. Because this is a very common algorithm to use, we have many reason of a benefit to do an
analysis with this method. First, it’s relatively simple approach for classification type of data. It gives the
ability to take sample data with known attributes and place them into categories. Second, it help you
visualize the workflow of how the data is been broken down into sections that make decisions. Last, you
can determine a predictive outcome from the results.
Using Decision Tree, we need to explain more in depth how this method works and its structure. When
data is ran through this type of analysis, it structures the data into these three areas called nodes:
 The root node is an attribute used to question the initial question if or if not something is in a
particular group or not to start off with. This can only be a single node item where the groups
are then branched off from it much like a tree from the bottom.
 Then there grows the internal nodes which is a title of the proceeding branch off the root node.
The purpose of the internal node is to give information only pertaining to that group.
 Lastly, there’s the leaf node(s). Think of these are individual leafs as answers to the internal
nodes. There can be multiple leads branching off an internal node. These finalize the answer of
the item in a particular internal node group.

11
Figure 8-1 Decision tree
The simple method works each time for data that has multiple attributes that has classification of types
in them. For the purposes of our fraudulent credit problem, out root node answers the major questions
of those that will over draft or not. In this cases, I wouldn’t care to use the nodes that are derived, but
more so of the confusion matrix.
The Confusion Matrix helps classifies the actual classes from those that are potentially negative classed
numbers.5
We use the confusion matrix to help spate the decisions made by the classifier making
explicit how one class is being confused for another. This way error can be handled separately. We do
this be looking at the True class items and the predicted class items in a matrix box.
TP = true positive
FN = false negative
PREDICTED CLASS
ACTUAL
CLASS
Yes No
Yes a=TP b=FN
Yes c=FN d=TN
Figure 8.2 – Confusion Matrix
The goal here is to have the model obtain the highest possible accuracy rate or lowest error rate.
**Confusion matrix
9 EXPERIMENTAL RESULTS AND ANALYSIS
To test our fraudulent data, the test was executed with two different data mining techniques. Decision
tree and the Simple K-means algorithms to generate the results.
German Credit Data Use:
Examining the first data set of the German credit fraud data, we analyze this data using the decision tree
algorithm. The outcome below shows the summary of the results.
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 705 70.5 %
Incorrectly Classified Instances 295 29.5 %
Kappa statistic 0.2467
Mean absolute error 0.3467
Root mean squared error 0.4796
Relative absolute error 82.5233 %
Root relative squared error 104.6565 %
Total Number of Instances 1000
=== Detailed Accuracy By Class ===
5
See more information from Assignment 2: Marketing Campaign Effectiveness, Kennedy, Albert

12
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.84 0.61 0.763 0.84 0.799 0.639 good
0.39 0.16 0.511 0.39 0.442 0.639 bad
Weighted Avg. 0.705 0.475 0.687 0.705 0.692 0.639
=== Confusion Matrix ===
a b <-- classified as
588 112 | a = good
183 117 | b = bad
What we are looking for here is a way to determine a good enough model for use. Based on the results,
we have a 70% accuracy of correct classified instances in the data set. This is decent for justification.
However, the ROC area curve is at 64%, which isn’t bad nor near perfect or ideal. How about we
consider the confusion matrix for deeper analysis. Remember from below the confusion matrix is
separated on four different section to help determine where our good and bad classifiers fall into best.
A = True Positive
B = False Negatives
Our matrix has an overwhelming amount, 588 instances that fall under the True Positive section where
it also equals true for the predicted and actual class. Say we take the percentage for the A class
588 + 183 = 771 (total)
588 (TP) / 771(total) = 76% accuracy
In order to generate this results, the data mining tool used was Weka v3-6-12
Give Me Some Credit Use:
Using the data set from the Kapple competition, we have a difference approach of how we want to view
the results. Initially, trying the decision tree algorithm for this dataset did not yield any solid enough
results for analysis purposes or to make any business sense. So, it was best to do a Cluster algorithm
type and view the results.
Choosing the Simple K Means was the algorithm used for this.
kMeans
======
Number of iterations: 10
Within cluster sum of squared errors: 5429.211249046848
Missing values globally replaced with mean/mode
Cluster centroids:

13
Time taken to build model (full training data) : 0.05 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 401 ( 40%)
1 266 ( 27%)
2 333 ( 33%)
To help explain the results, it will help to define the K-means method and its use. The K-means is a
commonly used cluster algorithm which is “simple, iterative way to approach and divide a data set into
specific number of cluster”- Manning. This process run through any dataset and uses a “closeness” as a
way to measure the distance of items in a dataset. This is called the Euclidean distance. To help explain
how the Euclidean distance works there’s a center point that’s created which is the centroid as the
cluster. Any number of items of K, that is within that distance of the centroid is defined as a cluster that
it is assigned to. If one item is closer in distance to another centroid than another centroid point, that
closer item is part of a different cluster.
In the above analysis, we have three defined clusters separated into their groups based on like
attributes their share.
Cluster 0:
This cluster has the strongest poll of instances. The results would suggest that in this cluster more
individuals that has an existing paid credit history and happen to be in the younger age group (31) ,
females and are most likely to be requesting a NEW CAR for credit.

14
Cluster 1:
This cluster has the weakness amount of instances. The results would suggest that customers in this
group are the oldest (age 40) are seeking for credit for USED CARS.
Cluster 2:
This cluster has a distribution of 33%. The results would suggest these are SINGLE MALE customers
seeking credit for RADIO/TV.
10 CONCLUSION
The completed analysis drawn from two different dataset would yield two different ending business
decision results. If visually inspecting the first dataset using the decision tree algorithm, I would initial
conclude this is not a valid enough test to make as a solid choice for fraud detention. We have to
remember the goal is to define the problem where fraud used is being done where customer
information is being misuse in place to benefit another person by creating credit accounts. However,
the confusion matrix does present some promising results to consider. What a financial institution can
take from that is a starter point of where to predict. However it gives about a 76% accuracy of the
changes of this results be true for prediction.
The second analysis is to be view from a different approach not so much as for predictive, but for a clear
view of where customer land is their relative attributes that tie to them. Using the Simple K-Mean
algorithm and picking 3 clusters, we can analysis a few important things:
1) where the most important group of customers are at
2) the related attributes in comparison to other groups
3) purpose of line of credit
4) Instances or distribution percentage of which group has the highest activity and means to apply
for credit.
With the clustering results, a business can answer the above questions ahead of time. We can pick and
choose the determinant for our analysis. For our purpose we wish to detent the reason for a line of
credit.

15
11 FUTURE WORK
Describe next steps to continue work on this project
There are organizations that already taking the necessary actions to benefit from these findings.
Companies like Morpho with their Safran product. Here’s a list of procedures and steps that should be
consider in order for this study to become successful.
1) Ensure those data scientist and analyst placed with the responsibility to craft these results to
use proper data mining practices and methodologies like CRISP-DM.
2) Take risk: I believe there are enough intelligent groups of people that understand the worth and
capable of drafting similar fraud detention analysis. There needs to more action on the
deployment phase.
3) Implement a production system where this process is automated.
Based on the analysis and results we have a plausible solution to solve a business problem with fraud
due to credit applications.

16
12 REFERENCES
Tan, Pang and Michel Steinbach. Introduction to Data Mining. Boston: Pearson Addison Wesley, 2005.
Print (Chp 1, Chp 4).
Provost, Foster, and Fawcett, Tom, Data Science for Business, Sebastopol, CA, 2014. Print (Chp 2, Chp 7)
Morpho, Safran, Fighting Identity Fraud with Data Mining, Groundbreaking means to prevent fraud in
identity management solutions, France, Print (page 4, and page 7)
Federal Data Corporation and SAS, Using Data Mining Techniques for Fraud Detection, Solving Business
Problems using SAS Enterprise Minder Software, Cary, NC, Print (page 1, page 15 and page 20)
Dr. Hofmann, Hans, University at Hamburg, UCI Machine Learning Reposity, CA, 2000,
http://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
Kaggle, Give Me Some Credit, 201, https://www.kaggle.com/c/GiveMeSomeCredit

Fraud Detection using Data Mining Project

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Fraud Detection using Data Mining Project

Similar to Fraud Detection using Data Mining Project (20)

Fraud Detection using Data Mining Project