STUDENT NAME : SANGEETHA T
REGISTER NUMBER : 422323106022
INSTITUTION : TCET - VANDAVASI
DEPARTMENT : ECE – II ND YEAR
DATE OF SUBMISSION : 15-05-2025
GITHUB REPOSITORY LINK:
https://github.com/Sangee56/sangeetha.a.git
AI-POWERED CREDIT CARD FRAUD
DETECTION AND PREVENTION
PROBLEM STATEMENT
The increasing prevalence of credit card fraud in the digital era necessitates the development
of robust and efficient fraud detection systems.
This project aims to develop a machine-learning model to detect credit card fraud. The
model will be trained on a dataset of historical credit card transactions and evaluated on a
holdout dataset of unseen transactions.
ABSTRACT
● AI-Powered Credit Card Fraud Detection and Prevention
● Credit card fraud poses a significant threat to financial institutions and consumers,
resulting in substantial financial losses and eroding customer trust. Traditional rule-
based fraud detection systems, while effective to some extent, often struggle to keep
up with evolving fraud tactics. This paper presents an AI-powered approach to credit
card fraud detection and prevention, leveraging machine learning algorithms and real-
time data analysis to improve detection accuracy and reduce false positives. The
proposed system utilizes a combination of supervised learning for known fraud
patterns and anomaly detection for identifying previously unseen threats. Key
techniques include data preprocessing, feature engineering, and the application of
advanced models such as random forests, gradient boosting, and deep learning neural
networks. This AI-driven framework not only enhances security but also provides a
scalable, adaptive solution for mitigating financial fraud in a rapidly changing digital
landscape.
SYSTEM REQUIREMENTS
• Operating System – Windows 8/10/11
• JupyterLab
• Visual Studio Code (VS Code)
• Python
• Processor : Intel Core i3 or above
• CPU : 2.0 GHz or above
• RAM : 4 GB or more
• Hard Disk : 500 GB
OBJECTIVES
● This project tackles the critical challenge of credit card fraud
detection and prevention.
● Our goal is to develop effective methods using machine learning,
anomaly detection, and deep learning to identify fraudulent
activities.
● Objective : Enhancing financial transaction security and
minimizing fraudulent losses.
FLOWCHART OF PROJECT WORKFLOW
[Figures: Genre Distribution; Number of ratings per user]
DATASET DESCRIPTION
● Data Description: The dataset was retrieved from the open data website Kaggle.com. It contains
transactions made by European credit card holders over two days in 2013.
The dataset consists of 31 attributes and 284,807 rows.
○ V1–V28: twenty-eight numeric variables that were transformed with PCA to protect the confidentiality and privacy of the customers.
○ Time: the seconds elapsed between each transaction and the first transaction in the dataset.
○ Amount: the amount of each transaction.
○ Class: a binary variable where 1 is a fraudulent transaction and 0 is a non-fraudulent transaction.
● Dataset : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
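A first check on a dataset like this is how rare the positive class is. The sketch below counts labels and reports percentages. The toy label list is an assumption standing in for the Class column; it is not the real data.

```python
from collections import Counter


def class_balance(labels):
    """Return {label: (count, percentage)} for an iterable of class labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Toy labels standing in for the Class column (0 = legitimate, 1 = fraud).
toy_labels = [0] * 995 + [1] * 5
balance = class_balance(toy_labels)
for label, (n, pct) in sorted(balance.items()):
    print(f"Class {label}: {n} rows ({pct:.2f}%)")
```

With the real file, the same function could be applied to the Class column after loading it with the csv module or pandas.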
DATA PREPROCESSING
Encoding:
• Converted categorical variables into numerical variables.
• Binary Encoding : Gender
• One-Hot Encoding : Transaction Category
Standard Scaling:
• Performed standard scaling to normalize numerical features.
• Ensures all variables are on a similar scale, preventing features with larger magnitudes from dominating the model.
Oversampling:
• Used to handle the imbalance of the dataset by adding more copies of the minority class.
• SMOTE (Synthetic Minority Over-sampling Technique): a smarter way to oversample that creates synthetic samples similar to the existing minority class samples.
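The three preprocessing steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, and `smote_like` implements only the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours). In practice a library implementation would normally be preferred.

```python
import random
from statistics import mean, pstdev


def one_hot(values):
    """Map each category to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values], categories


def standard_scale(column):
    """Rescale a numeric column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy inputs (assumed values, for illustration only).
encoded, cats = one_hot(["grocery", "travel", "grocery"])
scaled = standard_scale([10.0, 20.0, 30.0])
fraud_points = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.3]]
new_points = smote_like(fraud_points, n_new=6)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region already occupied by the minority class instead of being exact copies.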
EDA (Exploratory Data Analysis)
Data Cleaning:
• Removed the columns that are not required for model building.
• No nulls were present; rectified inappropriate datatypes.
Feature Engineering:
• Created some new features as required.
• For categorical analysis: 'is_fraud_cat'.
• For numerical analysis: 'age', 'trans_month', 'trans_year', 'month_name', etc.
Categorical Variable Analysis:
• Visualized transaction categories and gender distribution, both for the entire dataset and specifically for fraudulent transactions.
• Visualized the top 10 fraudulent transactions by job, city, and state.
Numerical Variable Analysis:
• Visualized overall skewness.
• Class balance: Not Fraud (99.4%), Fraud (0.6%).
Bivariate Analysis:
• Visualization with 'is_fraud' against age groups, latitudinal & longitudinal distance, and month & year.
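The engineered features listed above could be derived along the following lines. This is a sketch under assumptions: the input names (`trans_time`, `dob`, `lat`, `lon`, `merch_lat`, `merch_lon`), the "Fraud"/"Not Fraud" labels, and the reading of latitudinal and longitudinal distance as absolute coordinate differences are all hypothetical, since the slides do not show the original column names or formulas.

```python
from datetime import date, datetime


def derive_features(trans_time, dob, is_fraud, lat, lon, merch_lat, merch_lon):
    """Build the derived features for a single transaction record."""
    # Age in whole years on the transaction date.
    had_birthday = (trans_time.month, trans_time.day) >= (dob.month, dob.day)
    age = trans_time.year - dob.year - (0 if had_birthday else 1)
    return {
        "age": age,
        "trans_month": trans_time.month,
        "trans_year": trans_time.year,
        "month_name": trans_time.strftime("%B"),
        # Readable label for categorical plots (label text is an assumption).
        "is_fraud_cat": "Fraud" if is_fraud == 1 else "Not Fraud",
        # Coordinate gaps between cardholder and merchant (assumed definition).
        "lat_dist": abs(merch_lat - lat),
        "long_dist": abs(merch_lon - lon),
    }


features = derive_features(
    trans_time=datetime(2020, 6, 21, 12, 14),
    dob=date(1988, 9, 3),
    is_fraud=1,
    lat=36.0,
    lon=-81.0,
    merch_lat=36.5,
    merch_lon=-82.0,
)
print(features)
```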
FEATURE ENGINEERING
Credit card fraud involves unauthorized use of credit cards to obtain goods, services, or
funds. It affects both individuals and businesses, leading to financial losses and compromised
personal information. Some common types of credit card fraud are:
Card-Not-Present Fraud:
● Occurs when the physical card isn’t present during a transaction (common in online or
over-the-phone purchases). In 2023, card-not-present fraud accounted for an estimated
$9.49 billion in losses.
Account Takeover Fraud:
● Fraudsters gain access to a victim’s account to make unauthorized transactions. In
2023, account takeover attacks increased by 354% year-over-year, resulting in almost
$13 billion in losses.
MODEL BUILDING
Data collection:
The first phase will involve collecting a dataset of historical credit card transactions. The
data will be collected from various sources, including banks, credit card companies, and
merchants.
Data Cleaning:
• Impute the missing values with the column's mean, median, or mode.
• Drop the rows with missing values.
• Use a machine learning model to predict the missing values. Functions like isnull() and
heatmap() help locate and visualize the missing values first.
Normalize the data:
Normalization is scaling the data so that all features have similar ranges of values. This can
improve the performance of machine learning models by making the features more
comparable.
Model training:
The second phase will involve training the machine learning model on the collected data.
The model will be trained using a supervised learning algorithm such as a Support Vector Machine (SVM).
Model evaluation:
The third phase will involve evaluating the machine learning model's performance on a
holdout dataset of unseen transactions. The model's performance will be evaluated using
accuracy, precision, and recall metrics.
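The cleaning and holdout steps described above can be sketched as follows. The helper names and the 30% test fraction are assumptions made for illustration; the slides do not state the split that was used.

```python
import random
from statistics import mean, median, mode


def impute(column, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [x for x in column if x is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if x is None else x for x in column]


def drop_missing(rows):
    """Keep only the rows that contain no None values."""
    return [row for row in rows if all(v is not None for v in row)]


def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


amounts = [10.0, None, 30.0, 20.0, None]
filled = impute(amounts, "mean")  # the two gaps are filled with 20.0
train, test = holdout_split(list(range(10)))
print(filled, len(train), len(test))
```

Splitting before any oversampling or scaling is fitted keeps the holdout set genuinely unseen, which is what the evaluation phase relies on.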
MODEL BUILDING
Machine Learning Technique
• Logistic Regression:
    ○ Interpretability: Provides straightforward interpretations of coefficients
      for understanding feature impact on fraud likelihood.
    ○ Simplicity: Easy implementation and understanding facilitate
      communication with stakeholders.
• Random Forest:
    ○ Complex Relationship Capture: Excels at capturing complex data
      relationships to detect subtle fraud patterns.
    ○ Minimal Feature Engineering: Requires minimal feature manipulation,
      suitable for challenging feature selection scenarios.
Anomaly Detection Technique
• Isolation Forest:
    ○ Efficient Anomaly Detection: Efficiently isolates anomalies (fraudulent
      transactions) in high-dimensional data.
    ○ Distribution Agnostic: Robust against various fraud patterns without
      assuming specific data distributions.
Deep Learning Technique
• Neural Network (MLP Classifier):
    ○ Nonlinear Pattern Detection: Captures nonlinear data relationships for
      sophisticated fraud detection.
    ○ Scalability: Handles large data volumes and adapts to real-time fraud
      detection needs.
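To make the interpretability point for logistic regression concrete, here is a from-scratch sketch trained by batch gradient descent. It is not the project's implementation: the learning rate, epoch count, and toy data are assumptions. The sign and size of each learned weight show how a feature pushes the predicted fraud probability up or down.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that x belongs to the positive (fraud) class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: one scaled feature, where larger values are labelled fraud.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
print("weight:", w[0], "bias:", b)
print("P(fraud | x=2.0):", predict_proba(w, b, [2.0]))
```

A positive weight here means that higher values of the feature raise the estimated fraud probability, which is the kind of statement that is easy to communicate to stakeholders.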
Data Analysis
[Figures: Check Null Data; Data Correlation]
MODEL EVALUATION
K-Nearest Neighbor (KNN):
Two values of K were compared to find the best KNN model: K = 3 and K = 7.
● K = 3
Figure 5 shows the model created in Jupyter Notebook. Its accuracy rounds to 100%:
it identified 85,443 transactions correctly and missed 131.
● K = 7
The accuracy of the model created in Jupyter Notebook decreased only slightly and
still rounds to 100%, the same as for K = 3. The misclassification counts reported
for this model are 131 fraudulent transactions classified as non-fraudulent and
52 misclassified transactions.
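The effect of K can be illustrated with a small majority-vote KNN written from scratch. The toy points below are assumptions chosen to show one mechanism, not the project's data: when the fraud class has fewer training points than K requires for a majority, a larger K lets legitimate neighbours outvote them.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_X, train_y, test_X, test_y, k):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Six legitimate points near 0 and only three fraud points near 5 (toy values).
train_X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [5.1], [5.2]]
train_y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
test_X = [[0.25], [5.05]]
test_y = [0, 1]

acc_k3 = knn_accuracy(train_X, train_y, test_X, test_y, k=3)
acc_k7 = knn_accuracy(train_X, train_y, test_X, test_y, k=7)
print("K=3 accuracy:", acc_k3)
print("K=7 accuracy:", acc_k7)
```

In this toy setup the fraud test point is classified correctly with K = 3, but with K = 7 its three fraud neighbours are outvoted by four legitimate ones.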
Logistic Regression (L.R.):
○ The last model created using Jupyter Notebook is Logistic Regression. It scored an accuracy of
93.51% on the training data and 91.88% on the test data, as presented in the figure below.
Support Vector Machine (SVM):
The Support Vector Machine model, as shown in the figure below, scored an accuracy of 97.59%.
[Figure: SVM Confusion Matrix]
Decision Tree (D.T.):
Table of Accuracy
DEPLOYMENT
Integrate the trained machine learning models into the retail
organization’s fraud detection system. Ensure seamless
interoperability with existing infrastructure and workflows.
Provide ongoing support and maintenance to monitor model
performance, address emerging fraud threats, and fine-tune
algorithms as necessary.
Card Skimming:
● Fraudsters use devices to capture card information from ATMs or point-of-sale
terminals. Card skimming costs consumers and financial institutions over $1 billion
annually.
Phishing Scams:
● Trick victims into providing card information through fake emails, texts, or websites.
VISUALIZATION OF RESULTS & MODEL INSIGHTS
Inferences :
• Achieves a perfect accuracy (1.00), indicating it classified all transactions correctly (might be due to overfitting on the training data).
• Both precision and recall are high for both fraudulent and non-fraudulent transactions.
• F1-scores are also high for both classes.
• ROC-AUC score (0.9930) suggests excellent discriminative ability between classes.
• ROC Curve: Close to top-left corner, indicating good TPR-FPR trade-off.
• Precision-Recall Curve: Fairly close to top-right corner, indicating good precision-recall balance.
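Because fraud is so rare, accuracy on its own can look excellent while the model catches little or no fraud. The sketch below computes accuracy, precision, recall, and F1 from predictions. The toy labels are assumptions: 1,000 transactions with 6 frauds, roughly mirroring the 0.6% fraud share reported in the EDA.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) with `positive` treated as the fraud class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def report(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the fraud class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1] * 6 + [0] * 994
always_legit = [0] * 1000                     # never flags anything
catches_half = [1] * 3 + [0] * 3 + [0] * 994  # finds 3 of the 6 frauds

baseline = report(y_true, always_legit)
partial = report(y_true, catches_half)
print(baseline)
print(partial)
```

The model that never flags fraud still reaches 99.4% accuracy with zero recall, which is why precision, recall, and F1 are reported alongside accuracy.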
SOURCE CODE
FUTURE SCOPE
1. Evolution of AI/ML Techniques
Generative AI: This cutting-edge approach revolutionizes fraud
prevention. By combining adaptive learning, large dataset
handling, improved anomaly detection, and reduced false
positives, generative AI enhances our ability to stay ahead of
fraudsters.
Explainable AI (XAI): Researchers are working on making
complex AI models more interpretable. XAI ensures that we
understand why a model makes specific predictions, which is
crucial for trust and accountability.
Hybrid Models: Combining different ML techniques—such as
neural networks, decision trees, and clustering—allows us to
leverage their strengths and mitigate their weaknesses.
TEAM MEMBERS AND CONTRIBUTIONS
HEMALATHA S : PROBLEM STATEMENT & ABSTRACT, OBJECTIVE, FEATURE
ENGINEERING, DEPLOYMENT, SOURCE CODE
SANGEETHA T : DATASET DESCRIPTION & PREPROCESSING, EDA, MODEL
BUILDING, FLOWCHART OF THE PROJECT WORKFLOW
MAGESHWARAN P : MODEL BUILDING & FUTURE SCOPE, SYSTEM
REQUIREMENTS, MODEL EVALUATION
