CREDIT_CARD.ppt

CREDIT CARD FRAUD DETECTION
USING RANDOM FOREST
ALGORITHM
GUIDED BY
TEAM MEMBERS

ABSTRACT
❖Credit card fraud is increasing considerably with the
development of modern technology.
❖Here we focus on fraudulent credit card transactions in the real
world. First we collect a credit card dataset, which is then
analysed and pre-processed.
❖After that, the random forest algorithm is applied to classify the
transactions, and the number of fraudulent transactions
present in the dataset is identified.
❖The performance of the technique is evaluated based on
accuracy, sensitivity, specificity and precision. The accuracy
obtained is about 98%.

EXISTING SYSTEM
❖In the existing system, methods such as Cluster Analysis, SVM,
Bayesian networks, Logistic Regression, Naïve Bayes and Hidden
Markov models are used to detect fraudulent credit card
transactions.
❖The methods used in the existing system are based on
unsupervised learning and the accuracy obtained by these
methods is about 60-70%.

PROPOSED SYSTEM
❖The proposed system overcomes the above-mentioned issue in
an efficient way. It aims at identifying the number of fraudulent
transactions that are present in the dataset.
❖In the proposed system, we use the Random Forest algorithm to
classify the credit card dataset. Random Forest is an algorithm for
classification and regression.
❖The dataset is split into a training set and a test set, so that the
model is trained and tested on separate data. The Random Forest
algorithm can process large amounts of data.
❖Even for a large dataset this algorithm is fast and can
give an accuracy of about 98%. Finally, the fraudulent
transactions are identified and the results are presented in the form of a
confusion matrix.
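
The confusion matrix mentioned above holds four counts, from which accuracy, sensitivity, specificity and precision follow directly. The sketch below is only an illustration, not the authors' code: the function name `confusion_metrics` and the example counts are made up.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive standard metrics from binary confusion-matrix counts.

    tp / fn: fraud transactions predicted as fraud / as genuine.
    tn / fp: genuine transactions predicted as genuine / as fraud.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all predictions that are correct
        "sensitivity": tp / (tp + fn),      # share of real fraud that was caught
        "specificity": tn / (tn + fp),      # share of genuine transactions left alone
        "precision": tp / (tp + fp),        # share of fraud alerts that were real fraud
    }


# Illustrative counts only (not results from the slides).
metrics = confusion_metrics(tp=80, fp=10, tn=900, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because fraud is rare, a model that labels every transaction as genuine would still score a high accuracy, which is why sensitivity and precision are worth reporting alongside it.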

LITERATURE SURVEY
1. Title: Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy
   Year: 2017
   Authors: Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, Gianluca Bontempi
   Techniques used: Cluster analysis, Artificial Neural Network (ANN)
   Demerits: Accuracy of the algorithm is only around 90%.

2. Title: Credit card fraud detection using Machine Learning Techniques
   Year: 2017
   Authors: John O. Awoyemi, Adebayo O. Adetunmbi, Samuel A. Oluwadare
   Techniques used: Naïve Bayes, K-Nearest Neighbour, Logistic Regression
   Demerits: Imbalanced data set. Accuracy of the algorithm is only around 71% to 80% (Naïve Bayes). KNN degrades with high-dimensional data, as there is little difference between the nearest and farthest neighbour.

3. Title: Analysis on Credit Card Fraud identification techniques based on KNN and outlier detection
   Year: 2017
   Authors: N. Malini, M. Pushpa
   Techniques used: K-Nearest Neighbour (KNN), outlier detection technique
   Demerits: Could not handle large datasets of more than 1.5 lakh transactions. Accuracy is less than 80% (degrades with high-dimensional data).

4. Title: Credit card fraud detection: A Hybrid approach using fuzzy clustering and Neural Networks
   Year: 2015
   Authors: Tanumay Kumar Behera, Suvasini Panigrahi
   Techniques used: Fuzzy C-means clustering, Artificial Neural Networks (ANN)
   Demerits: Accuracy is about 60% to 80% for the individual classifiers; for the ensemble result it is 90%.

RANDOM FOREST ALGORITHM
❖Random Forest is a supervised learning algorithm
used for classification and regression.
❖Random Forest is a tree-based algorithm which builds
several decision trees and then combines their outputs to improve
the predictive ability of the model.
❖It can identify the important features in the training dataset
and can handle thousands of input variables.
❖Random Forest is widely used because each decision tree
can be trained independently and combining the trees reduces overfitting.
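
The points above can be sketched with scikit-learn's `RandomForestClassifier`. This is a minimal illustration under stated assumptions, not the code behind the slides: the data is synthetic (generated with `make_classification`), and the number of trees and the class balance are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 1000 samples, 10 features, roughly 5% positives.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)

# n_estimators is the number of decision trees in the ensemble;
# n_jobs=-1 lets the independent trees be fitted in parallel.
forest = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
forest.fit(X, y)

print("trees in the forest:", len(forest.estimators_))

# Importance scores are normalised to sum to 1; a larger score means
# the feature contributed more to the splits across all trees.
for i, score in enumerate(forest.feature_importances_):
    print(f"feature {i}: {score:.3f}")
```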

REQUIREMENTS
HARDWARE
❖RAM – 4GB
SOFTWARE
❖Anaconda
PROGRAMMING LANGUAGE
❖Python

SYSTEM ARCHITECTURE
[Block diagram: Dataset, Pre-processing, Feature extraction, Machine learning model, Test data, Classifier section, Result, Performance analysis]

MODULES DESCRIPTION
MODULE 1: DATA COLLECTION
The data used in this work is a set of credit card transaction
records. This step is concerned with selecting
the subset of all available data that you will be working with. ML
problems start with data, preferably lots of data (examples or
observations) for which you already know the target answer. Data for
which you already know the target answer is called labelled data.

MODULE 2: DATA PRE-PROCESSING
Pre-processing refers to the transformations applied to the credit card
dataset before feeding it to the algorithm. In Python, the scikit-learn
library provides pre-built functionality under
sklearn.preprocessing. Three common data pre-processing steps for the
credit card dataset are as follows:
1. Formatting
2. Cleaning
3. Sampling
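
The three steps could look roughly like the following pandas sketch. The slides do not show the dataset's schema, so the column names `amount` and `is_fraud` and all values are hypothetical, and random undersampling is just one possible sampling strategy.

```python
import pandas as pd

# Tiny made-up table; "amount" and "is_fraud" are hypothetical column names.
raw = pd.DataFrame({
    "amount":   ["12.50", "300.00", "12.50", None, "45.10", "9.99", "800.00", "20.00"],
    "is_fraud": [0, 1, 0, 0, 0, 0, 1, 0],
})

# 1. Formatting: convert text amounts to numbers (None becomes NaN).
df = raw.copy()
df["amount"] = pd.to_numeric(df["amount"])

# 2. Cleaning: drop rows with missing values and exact duplicate rows.
df = df.dropna().drop_duplicates()

# 3. Sampling: undersample the majority (genuine) class to match the fraud count.
fraud = df[df["is_fraud"] == 1]
genuine = df[df["is_fraud"] == 0].sample(n=len(fraud), random_state=0)
balanced = pd.concat([fraud, genuine]).reset_index(drop=True)

print(balanced)
```

When a separate test set is used, resampling is normally applied to the training portion only, so that the test data keeps its original class balance.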

MODULE 3 : FEATURE EXTRACTION
Feature extraction is an attribute reduction process.
Unlike feature selection, which ranks the existing attributes according
to their predictive significance, feature extraction actually transforms
the attributes. The transformed attributes, or features, are linear
combinations of the original attributes. Finally, our models are trained
using a classifier algorithm. We use the classify module of the Natural
Language Toolkit (NLTK) library in Python. Part of the labelled dataset
gathered is used for training, and the rest of our labelled data is used
to evaluate the models. The pre-processed data is classified with a
machine learning algorithm; the chosen classifier is Random Forest. This
algorithm is also very popular in text classification tasks.
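
The idea of features as linear combinations of the original attributes can be illustrated with principal component analysis (PCA). The slides do not say which extraction method was used, so PCA is swapped in here purely as a common example, and the data is randomly generated.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Made-up data: 200 rows, 5 attributes, two of which are strongly correlated.
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)

# Reduce the 5 original attributes to 2 extracted features.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Each row of components_ holds the weights of one linear combination.
print("weights:\n", pca.components_)
print("explained variance ratio:", pca.explained_variance_ratio_)

# The transform is "centre the data, then take the weighted sums".
manual = (X - pca.mean_) @ pca.components_.T
print("matches manual projection:", np.allclose(X_reduced, manual))
```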

MODULE 4: EVALUATION MODEL
➢ Model evaluation is an integral part of the model
development process. It helps to find the best model that represents
our data and to estimate how well the chosen model will work in the future.
Evaluating model performance with the data used for training is not
acceptable in data science because it can easily produce
overoptimistic and overfitted models. There are two methods of
evaluating models in data science:
1. Hold-Out
2. Cross-Validation
➢ To avoid overfitting, both methods use a test set (not seen by the
model) to evaluate model performance. The performance of each
classification model is estimated based on its averaged score. The
results are presented in visual form, with the classified data shown as graphs.
Accuracy is defined as the percentage of correct predictions for the
test data. It is calculated by dividing the number of
correct predictions by the number of total predictions.
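
Both evaluation methods, and the accuracy formula, can be sketched with scikit-learn as follows. This is an illustrative example on synthetic data, not the authors' experiment; the 75/25 split and the 5 folds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for a labelled transaction dataset.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=30, random_state=0)

# 1. Hold-out: train on 75% of the data, evaluate on the unseen 25%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
holdout_acc = accuracy_score(y_test, y_pred)

# Accuracy = number of correct predictions / number of total predictions.
manual_acc = (y_pred == y_test).sum() / len(y_test)

# 2. Cross-validation: 5 folds, each used exactly once as the test set.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

print(f"hold-out accuracy: {holdout_acc:.3f}")
print(f"5-fold accuracy:   {cv_scores.mean():.3f} (mean of {len(cv_scores)} folds)")
```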

SCREENSHOTS
IMPORT PACKAGE AND READ DATASET:

RANDOM FOREST ALGORITHM AND ACCURACY

ACCURACY COMPARISON
METRICS       SVM     NAIVE BAYES   K-MEANS CLUSTERING   LR      RF
ACCURACY      85.05   83.50         78.62                96.82   98.60
SENSITIVITY   84.06   78.00         69.93                95.68   98.87

GRAPHICAL REPRESENTATION
[Bar chart: ACCURACY by ALGORITHM for SVM, NB, KMEANS, LR and RF]

CONCLUSION & FUTURE ENHANCEMENT
The Random Forest algorithm performs better with a larger amount of training data, and
applying more pre-processing techniques would also help. It is one of the most
accurate learning algorithms available and, for many datasets, produces a highly accurate
classifier. It runs efficiently on large databases and can handle thousands of input variables
without variable deletion. Thus the Random Forest algorithm produces more accurate
results in credit card fraud detection; it can also estimate missing data
and can handle a large proportion of missing data.
Results show that Random Forest continues to perform well as the imbalance ratio in the
data gradually increases. Since Random Forest gave better results, it should be
explored further with larger datasets. A further extension of this
work is to apply different resampling techniques to the data, to gain more insight
into imbalanced credit card data and obtain improved results.

REFERENCES
1. "Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning
Strategy", Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi,
Gianluca Bontempi, IEEE Transactions on Neural Networks and Learning Systems, 2018.
2. "A new Credit card fraud detecting method based on behavior certificate",
Lutao Zheng, Guanjun Liu, Wenjing Luan, Zhengchuan Li, Yuwei Zhang,
Chungang Yan, Changjun Jiang, 2018 IEEE 15th International Conference on
Networking, Sensing and Control (ICNSC).
3. "Supervised Machine Learning Algorithms for Credit Card Fraudulent
Transaction Detection: A Comparative Study", Sahil Dhankhad, Emad
Mohammed, Behrouz Far, 2018 IEEE International Conference on Information
Reuse and Integration (IRI).
4. "Credit Card Fraud Detection using Machine Learning Models and Collating
Machine Learning models", Navanshu Khare and Saad Yunus Sait, International
Journal of Pure and Applied Mathematics, Volume 118, No. 20, pp. 825-838, 2018.
5. "Credit Card Fraud Detection using learning to Rank Approach", N. Kalaiselvi,
S. Rajalakshmi, J. Padmavathi, Joyce B. Karthiga, 2018 International Conference
on Computation of Power, Energy, Information and Communication (ICCPEIC).
