CONFIDENTIAL: The information in this document belongs to Boston Institute of Analytics LLC. Any unauthorized sharing of this
material is prohibited and subject to legal action under breach of IP and confidentiality clauses.
Fraud Detection Analysis
VARSHA
• Introduction
• Data collection & cleaning
• Exploratory data analysis
• Data pre-processing
• Model selection
• Model evaluation
• Conclusion
Introduction
Block Fraud is a company that specializes in
identifying and preventing fraud in mobile financial
transactions. It plans to enter the Brazilian
market with a competitive pricing
strategy built on the high accuracy of its fraud
detection.
Key metrics
(TP, FP, and FN denote true positives, false positives, and false negatives.)
• Precision: the share of transactions
flagged as fraud that are actually
fraudulent, TP / (TP + FP).
• Recall: the share of actual fraudulent
transactions that the model
flags, TP / (TP + FN).
• F1 score: the harmonic mean of
precision and recall, which summarizes
both in a single number.
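The three definitions above can be sketched in a few lines of plain Python. The counts used here are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the project's dataset.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged transactions that really were fraud."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual fraud cases that the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn = 8, 2, 4
p = precision(tp, fp)   # 8 / 10
r = recall(tp, fn)      # 8 / 12
f1 = f1_score(p, r)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is often reported alongside accuracy for fraud problems.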
Data cleaning
• Encoded categorical features with one-hot
encoding.
• Checked for null values and ensured all
features used for modeling are numeric.
• Together, these steps leave the dataset in a
consistent, fully numeric form that the model
can be trained on.
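A minimal sketch of these cleaning steps with pandas follows. The toy data and the `type` and `amount` column names are assumptions made for illustration; only `isFraud` is named in the slides.

```python
import pandas as pd

# Hypothetical toy data; "type" and "amount" are assumed column names.
df = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER"],
    "amount": [120.0, 5000.0, 35.5, 980.0],
    "isFraud": [0, 1, 0, 1],
})

# Count missing values per column.
null_counts = df.isnull().sum()

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["type"], dtype=int)

# Confirm that every remaining column is numeric.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in encoded.dtypes)

print(null_counts)
print(encoded.columns.tolist())
print("all numeric:", all_numeric)
```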
Outliers:
Exploratory Data Analysis
Data pre-processing
• Standardized the numerical features with
StandardScaler so that each has a mean of 0
and a standard deviation of 1. This step helps
model performance, especially for
models sensitive to feature scale.
• Identified outliers that could
skew the model’s predictions and addressed
them by capping or removing extreme values.
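The two pre-processing steps can be sketched as follows. The slides do not say how outliers were capped, so the 1st and 99th percentile thresholds here are an assumption, and the skewed amounts are synthetic.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic, right-skewed "amount" feature (illustrative only).
rng = np.random.default_rng(0)
amounts = rng.lognormal(mean=5.0, sigma=1.0, size=(1000, 1))

# Cap extreme values at the 1st and 99th percentiles (assumed thresholds).
low, high = np.percentile(amounts, [1, 99])
capped = np.clip(amounts, low, high)

# Standardize to mean 0 and standard deviation 1.
scaler = StandardScaler()
scaled = scaler.fit_transform(capped)

print("max before/after capping:", amounts.max(), capped.max())
print("mean and std after scaling:", scaled.mean(), scaled.std())
```

Capping before scaling keeps a handful of very large values from inflating the standard deviation that the scaler divides by.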
Model training and Evaluation
• Chose Logistic Regression to predict fraud, a suitable
choice given that the target variable (isFraud) is binary.
• Split the dataset into training and testing sets using train_test_split,
so that the model is trained on one portion of the data and
evaluated on another to gauge its performance on unseen data.
• Feature scaling: Applied StandardScaler to the features in both the training
and testing sets so that no feature dominates the model merely because of
its scale. The scaler should be fit on the training set only and then used to
transform both sets, so that no information from the test set leaks into training.
Model Training:
Trained the Logistic Regression model on the preprocessed and
scaled training data by fitting it to X_train and y_train.
Accuracy Score: Evaluated the model’s performance by calculating
its accuracy on the test set, comparing the predicted labels with
the actual labels in y_test.
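The split, scale, train, and score workflow described above could look roughly like the sketch below. It runs on synthetic data from `make_classification` rather than the project's dataset, and the split ratio, random seeds, and `max_iter` value are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced binary data standing in for the fraud dataset.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.9, 0.1], random_state=42,
)

# Hold out 20% for testing; stratify to keep the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Fit the scaler on the training data only, then transform both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the classifier and score it on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"test accuracy: {accuracy:.4f}")
```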
• Confusion Matrix: Computed the confusion matrix to
count the true positives, true negatives,
false positives, and false negatives.
• Precision, Recall, F1-Score: Assessed precision, recall, and
F1-score to get a balanced view of model performance,
which matters most under class imbalance, where accuracy alone can be misleading.
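As an illustration of these evaluation steps, the sketch below uses scikit-learn on ten hand-made labels. The labels are hypothetical; in the project they would be `y_test` and the model's predictions.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

# Hypothetical labels: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision, recall, and F1 for the positive (fraud) class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary",
)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```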
Results with the Logistic Regression model
• Confusion Matrix:
  True Negatives (TN): 1999
  False Positives (FP): 0
  False Negatives (FN): 27
  True Positives (TP): 203
• Accuracy: 98.79%
• Classification Report:
  Precision for Class 1 (Fraud): 1.00
  Recall for Class 1 (Fraud): 0.88
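The reported accuracy, precision, and recall can be checked directly from the four confusion-matrix counts. The short sketch below does that arithmetic; the F1 score and the fraud share of the test set are derived from the same counts and are not stated on the slide.

```python
# Confusion-matrix counts as reported on the slide.
tn, fp, fn, tp = 1999, 0, 27, 203

total = tn + fp + fn + tp                 # number of test transactions
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
fraud_share = (tp + fn) / total           # share of fraud cases in the test set

print(f"total={total}")
print(f"accuracy={accuracy:.2%} precision={precision:.2f} recall={recall:.2f}")
print(f"f1={f1:.2f} fraud_share={fraud_share:.1%}")
```

The counts reproduce the reported 98.79% accuracy, 1.00 precision, and 0.88 recall. They also show that roughly one in ten test transactions is fraudulent, so recall is the more informative figure here.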
Financial Impact Analysis
1. Revenue from True Positives (Correctly Identified Fraudulent
Transactions):
Revenue per Correctly Identified Fraudulent Transaction: 25% of the transaction value
True Positives (TP): 203
Total Revenue: 203 × 25% × V, where V is the average fraud transaction value
2. Costs from False Negatives (Missed Fraudulent Transactions):
Cost per Missed Fraudulent Transaction: 100% refund of the transaction value
False Negatives (FN): 27
Total Cost: 27 × 100% × V
3. Costs from False Positives (Incorrectly Identified Legitimate Transactions):
Cost per Incorrectly Identified Legitimate Transaction: 5% charge
False Positives (FP): 0, so this cost is $0 in this case
4. Net Financial Impact:
The net financial impact is the total revenue (from true positives) minus the total costs
(from false negatives and false positives).
Net Financial Impact = (203 × 0.25 × V) − (27 × 1 × V) = 50.75 × V − 27 × V = 23.75 × V
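The calculation above can be wrapped in a small function so that different values of V can be tried. This is a sketch under stated assumptions: the function name, its parameters, the example value V = 1000, and the `avg_legit_value` argument for the false-positive term are all hypothetical. The slides give only the 25%, 100%, and 5% rates and the confusion-matrix counts.

```python
def net_financial_impact(
    avg_fraud_value: float,
    tp: int = 203,
    fn: int = 27,
    fp: int = 0,
    revenue_rate: float = 0.25,    # earned on each correctly detected fraud
    refund_rate: float = 1.00,     # refunded on each missed fraud
    fp_charge_rate: float = 0.05,  # charged on each wrongly blocked transaction
    avg_legit_value: float = 0.0,  # hypothetical: average legitimate transaction value
) -> float:
    """Revenue from true positives minus costs of false negatives and false positives."""
    revenue = tp * revenue_rate * avg_fraud_value
    fn_cost = fn * refund_rate * avg_fraud_value
    fp_cost = fp * fp_charge_rate * avg_legit_value
    return revenue - fn_cost - fp_cost


# Hypothetical average fraud value V = 1000 (currency units).
impact = net_financial_impact(1000.0)
print(f"net impact for V=1000: {impact:.2f}")
```

With no false positives, the result is positive whenever 0.25 × TP exceeds FN, which holds for the reported counts (50.75 versus 27).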
Conclusion
• In this project, we developed a model to detect
fraudulent transactions, focusing on thorough data
preparation and feature engineering.
• The net financial impact is positive, which implies that the
model has a beneficial effect on the company's
revenue: the revenue earned on correctly detected
fraudulent transactions exceeds the cost of
refunding the missed ones.
• Given the average value of fraudulent
transactions (V), substituting it into the formula
gives the exact financial impact.
Thank You!

Fortifying Fraud Detection: Advanced Data Analysis Techniques for Enhanced Security
