CONFIDENTIAL: The information in this document belongs to Boston Institute of Analytics LLC. Any unauthorized sharing of this
material is prohibited and subject to legal action under breach of IP and confidentiality clauses.
E-commerce Product Delivery Prediction
Submitted by: Aanchal Agrawal
AGENDA
• Project Objective
• Dataset Overview
• Data Preprocessing
• Exploratory Data Analysis
• Correlation Analysis
• Model Training
• Model Performance
• Conclusion
Project Overview
Objective:
• Conduct exploratory data analysis (EDA) on the dataset to improve the accuracy of fraud detection in mobile financial transactions.
• Use machine learning to predict fraudulent transactions with high precision.
• Develop a robust machine learning model that accurately identifies fraudulent transactions in real time, enabling the company to improve security, reduce financial losses, and gain insights into the factors contributing to transaction fraud.
End-to-End Deployment Flow
• Data Collection, Exploration & Cleaning
• Data Pre-Processing & Feature Engineering
• Data Modeling Selection & Training
• Evaluation of Model & Optimization
• Model Deployment
• Other diagram nodes: User, Data Set
Dataset Overview
• Dataset Summary:
• Total Records: 11,143 rows.
• Features: 10 columns: Step, Type, Amount, NameOrig, oldbalanceOrg, newbalanceOrig, nameDest, oldbalanceDest, newbalanceDest, and isFraud.
• Target Variable: isFraud (1: Yes, 0: No).
• Key Attributes:
• Type: Cash-In, Cash-Out, Debit, Payment, and Transfer.
• Amount: Amount of the transaction in local currency.
• isFraud: Marks transactions made by fraudulent agents.
• isFlaggedFraud: Flags illegal attempts at massive transfers from one account to another.
Data Preprocessing
Steps Followed:
• Checked for missing values: No missing data was found.
• Converted the binary values (0 and 1) to No/Yes labels for analysis.
• Split data into training and testing sets for model building.
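The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the toy records, the helper names (`count_missing`, `add_yes_no_label`, `stratified_split`), the 80/20 split ratio, and the choice to stratify by the target are all assumptions made for the example.

```python
import random
from collections import defaultdict


def count_missing(rows):
    """Return {column: number of rows where the value is None}."""
    missing = defaultdict(int)
    for row in rows:
        for column, value in row.items():
            missing[column] += 1 if value is None else 0
    return dict(missing)


def add_yes_no_label(rows, column="isFraud"):
    """Return copies of the rows with an extra Yes/No label column."""
    return [
        {**row, column + "Label": "Yes" if row[column] == 1 else "No"}
        for row in rows
    ]


def stratified_split(rows, column="isFraud", test_fraction=0.2, seed=42):
    """Split rows into train/test while keeping the class ratio in both parts."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    train, test = [], []
    for group in groups.values():
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy data: 100 rows, 10 of them fraudulent.
rows = [
    {
        "type": "TRANSFER" if i % 2 else "CASH_OUT",
        "amount": 100.0 * (i + 1),
        "isFraud": 1 if i % 10 == 0 else 0,
    }
    for i in range(100)
]

missing = count_missing(rows)
labeled = add_yes_no_label(rows)
train, test = stratified_split(labeled)
print("missing values per column:", missing)
print("train rows:", len(train), "test rows:", len(test))
```

Stratifying is one reasonable choice when the positive class is a small minority, because a plain random split can leave the test set with too few fraud cases to evaluate on.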
Exploratory Data Analysis
1. Class balance (isFraud):
• Yes: 10.2%, No: 89.8%.
2. Bivariate Analysis (hypotheses tested, with the outcome in brackets):
• The majority of fraudulent transactions occur for the same user [True].
• All fraudulent amounts are greater than 10,000 [True].
• 60% of fraudulent transactions use the cash-out type [False].
• Values greater than 100,000 occur using the transfer type [False].
• Fraudulent transactions occur in at least 3 days [True].
3. Multivariate Analysis:
• Numerical analysis represented through a heatmap.
• Categorical analysis represented through a heatmap.
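Checks like the class balance and the bracketed True/False statements above can be computed directly from the records. The sketch below is an assumption-laden illustration: the ten toy rows are invented, the column names follow the dataset overview, and the helper functions are hypothetical, so the values it produces say nothing about the real dataset.

```python
from collections import Counter


def class_balance(rows, column="isFraud"):
    """Return the percentage of rows per class label."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def fraud_share_by_type(rows):
    """Return the fraction of fraudulent rows per transaction type."""
    fraud_types = Counter(row["type"] for row in rows if row["isFraud"] == 1)
    total = sum(fraud_types.values())
    return {t: n / total for t, n in fraud_types.items()}


def all_fraud_above(rows, threshold):
    """Check whether every fraudulent amount exceeds the threshold."""
    return all(row["amount"] > threshold for row in rows if row["isFraud"] == 1)


# Hypothetical toy data: 2 fraudulent rows and 8 legitimate ones.
rows = [
    {"type": "CASH_OUT", "amount": 25000.0, "isFraud": 1},
    {"type": "TRANSFER", "amount": 40000.0, "isFraud": 1},
] + [{"type": "PAYMENT", "amount": 120.0 + i, "isFraud": 0} for i in range(8)]

balance = class_balance(rows)
shares = fraud_share_by_type(rows)
cash_out_is_60_percent = shares.get("CASH_OUT", 0.0) >= 0.6
amounts_above_10k = all_fraud_above(rows, 10_000)

print("class balance (%):", balance)
print("fraud share by type:", shares)
print("60% of fraud is cash-out:", cash_out_is_60_percent)
print("all fraud amounts > 10,000:", amounts_above_10k)
```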
Machine Learning Models
Models Used:
I. Logistic Regression:
• Interpreted linear relationships between features.
II. Random Forest Classifier:
• Explored ensemble learning for improved accuracy.
• Selected optimal parameters via grid search.
III. XGBoost Classifier:
• Achieved the highest accuracy: 99.7%.
IV. K-Nearest Neighbors:
• Examined proximity-based decision-making.
V. LightGBM:
• Light Gradient Boosting Machine that offers faster training.
VI. Support Vector Machine:
• Classified data points using a maximum-margin decision boundary.
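To make the proximity-based idea behind K-Nearest Neighbors concrete, here is a minimal sketch in plain Python. It is not the project's implementation: the two-dimensional feature points, the labels, and the function name `knn_predict` are hypothetical, and the features are assumed to be already scaled to a comparable range, which distance-based methods generally need.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled features, e.g. (amount, origin balance), with 1 = fraud.
train_points = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.95),
]
train_labels = [0, 0, 0, 1, 1, 1]

near_fraud_cluster = knn_predict(train_points, train_labels, (0.82, 0.88))
near_legit_cluster = knn_predict(train_points, train_labels, (0.12, 0.18))
print(near_fraud_cluster, near_legit_cluster)
```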
Model Performance
• Performance Summary:
• Logistic Regression: Balanced accuracy at 95.4%.
• K-Nearest Neighbors: Balanced accuracy at 95.9%.
• Support Vector Machine: Balanced accuracy at 94.8%.
• Random Forest: Second highest accuracy at 99.4%.
• XGBoost Classifier: Highest accuracy at 99.7%.
• Metrics Used:
• Precision: Share of transactions flagged as fraud that are actually fraudulent.
• Recall: Share of actual fraud cases that the model captures (sensitivity).
• F1-Score: Balance between precision and recall.
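The metrics above follow directly from the confusion-matrix counts. The sketch below computes them in plain Python on invented predictions; the function names and the two example prediction vectors are assumptions for illustration. It also shows why plain accuracy can mislead when roughly 10% of transactions are fraudulent: a model that never flags fraud still scores 90% accuracy on this toy data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 as the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1, accuracy, and balanced accuracy."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / len(y_true),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical test set: 10 fraud cases followed by 90 legitimate ones.
y_true = [1] * 10 + [0] * 90

# A model that never flags fraud.
never_flags = classification_metrics(y_true, [0] * 100)

# A model that catches 8 of 10 fraud cases and raises 2 false alarms.
decent = classification_metrics(y_true, [1] * 8 + [0] * 2 + [1] * 2 + [0] * 88)

print(never_flags)
print(decent)
```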
Conclusion
• The project's objective was to improve the accuracy of detecting fraud in mobile financial transactions by using machine learning to predict fraudulent transactions with high precision. The goal was a robust model that accurately identifies fraudulent transactions in real time, enabling the company to improve security, reduce financial losses, and gain insights into the factors contributing to transaction fraud.
• Thorough EDA with visualization techniques was conducted to understand transaction patterns and fraud indicators.
• Among the machine learning models, the XGBoost classifier performed best with 99.7% accuracy, followed closely by the Random Forest classifier at 99.4%. The Support Vector Machine had the lowest accuracy at 94.8%.
• The company receives 25% of the value of each transaction correctly detected as fraud.
• The company receives 5% of the value of each transaction flagged as fraud that is actually legitimate.
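The two payout rules above translate into a simple calculation over flagged transactions. The following is a sketch under stated assumptions: the transaction tuples are invented, the function name `company_revenue` is hypothetical, and transactions that the model does not flag are assumed to earn nothing, since the slides do not state a rule for them.

```python
def company_revenue(transactions, true_rate=0.25, false_rate=0.05):
    """Sum the payout over transactions given as (amount, is_fraud, flagged).

    Flagged and truly fraudulent: 25% of the amount.
    Flagged but legitimate: 5% of the amount.
    Not flagged: no payout (an assumption, not stated in the source).
    """
    revenue = 0.0
    for amount, is_fraud, flagged in transactions:
        if not flagged:
            continue
        revenue += amount * (true_rate if is_fraud else false_rate)
    return revenue


# Hypothetical examples.
transactions = [
    (10_000.0, True, True),    # correctly detected fraud
    (2_000.0, False, True),    # false alarm on a legitimate transaction
    (5_000.0, True, False),    # missed fraud, assumed to pay nothing
]

total = company_revenue(transactions)
print(f"revenue: {total:.2f}")
```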
Questions?
Thank You!

Fraud Detection in Cybersecurity: Advanced Techniques for Safeguarding Digital Assets
