E-commerce Product
Delivery Prediction
Made by: Akshay Jambekar
Agenda
 PROJECT OVERVIEW
 Dataset Overview
 Data Pre-Processing
 Label Encoding
 Normalization
 EDA
 Model Building and Evaluation
 Hyperparameter Tuning in Machine Learning
 Data Augmentation
 Observation and Conclusion
PROJECT OVERVIEW
E-commerce Product Delivery Prediction
Classification Task
 Objective: Enhance delivery predictions for
an international e-commerce company
specializing in electronics.
 Approach: Leverage machine learning
models to predict on-time delivery of
products.
 Impact:
• Improve customer satisfaction
• Optimize logistics operations
• Gain insights into factors affecting
delivery performance
Dataset Overview
• Our dataset consists of 10,999 records with the following attributes:
• Customer ID: Unique identifier for customers
• Warehouse Block: Warehouse sections (A, B, C, D, F)
• Mode of Shipment: Shipping methods (Ship, Flight, Road)
• Customer Care Calls: Number of shipment inquiry calls
• Customer Rating: Customer rating (1 = Worst, 5 = Best)
• Cost of Product: Product cost in US dollars
• Prior Purchases: Number of previous purchases
• Product Importance: Product importance (Low, Medium, High)
• Gender: Customer gender (Male, Female)
• Discount Offered: Discount on the product
• Weight in Grams: Product weight in grams
• Reached on Time (Y/N): Target variable (1 = Not on time, 0 = On time)
Data Pre-Processing
Label Encoding for Categorical Attributes
Label Encoding: Converts categorical data into numerical values for machine learning models.
The following columns were encoded:
• Warehouse_block: A = 0, B = 1, C = 2, D = 3, F = 4
• Mode_of_Shipment: Flight = 0, Road = 1, Ship = 2
• Product_importance: High = 0, Low = 1, Medium = 2
• Gender: F = 0, M = 1
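The encodings above can be sketched in plain Python. This is a minimal stand-in; the project likely used sklearn's `LabelEncoder`, which assigns codes in alphabetical order and would produce the same tables:

```python
# Explicit label-encoding maps matching the tables above.
encodings = {
    "Warehouse_block": {"A": 0, "B": 1, "C": 2, "D": 3, "F": 4},
    "Mode_of_Shipment": {"Flight": 0, "Road": 1, "Ship": 2},
    "Product_importance": {"High": 0, "Low": 1, "Medium": 2},
    "Gender": {"F": 0, "M": 1},
}

def encode(column, value):
    """Map a categorical value to its integer code."""
    return encodings[column][value]
```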
Data Pre-Processing
 Min-Max Scaling
Min-Max Scaling is used to normalize the numerical features of the dataset.
It transforms the values to a range between 0 and 1,
which helps improve the performance of machine learning models.
 Formula: X_scaled = (X - X_min) / (X_max - X_min)
where X_scaled is the scaled value, X is the original value,
X_min is the minimum value of the feature, and X_max is the maximum value.
 Numeric Columns Scaled:
• Customer_care_calls
• Customer_rating
• Cost_of_the_Product
• Prior_purchases
• Discount_offered
• Weight_in_gms
This transformation ensures that all these features are on the same scale, eliminating any biases due to different ranges.
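A minimal pure-Python sketch of the min-max formula (in practice the project would more likely apply sklearn's `MinMaxScaler` to the whole DataFrame):

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

# The smallest value maps to 0.0 and the largest to 1.0.
scaled = min_max_scale([10, 20, 30, 40])
```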
Data Pre-Processing
 Min-Max Scaling
 What is Min-Max Scaling?
 A normalization technique that scales
the data to a fixed range, typically [0, 1].
 Ensures all features contribute equally
to model performance.
 Formula: X_scaled = (X - X_min) / (X_max - X_min)
 Where X is the original feature value, X_min is the minimum value of the feature, and X_max is the maximum value.
 Standardization
 What is Standardization?
 A scaling technique that transforms data
to have a mean of 0 and a standard
deviation of 1.
 Ensures that each feature contributes
equally to model performance.
 Formula: Z = (X - μ) / σ
 Where X is the original feature value, μ is the mean of the feature, and σ is the standard deviation.
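The standardization formula can likewise be sketched in a few lines (sklearn's `StandardScaler` does the same per column):

```python
from statistics import mean, pstdev

def standardize(values):
    """Transform values to mean 0 and std 1 via z = (x - mu) / sigma."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

z = standardize([1, 2, 3, 4, 5])
```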
EDA
Interpretation from the correlation matrix:
• Discount Offered has a positive correlation of 0.40 with Reached on Time.
• Weight in grams has a negative correlation of -0.27 with Reached on Time.
• Discount Offered and Weight in grams have a negative correlation of -0.38.
• Customer care calls and Weight in grams have a negative correlation of -0.28.
• Customer care calls and Cost of the Product have a positive correlation of 0.32.
• Prior Purchases and Customer care calls have a slightly positive correlation.
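A toy example of how such a correlation matrix is produced with pandas. The numbers below are made up for illustration, not the project's data; they merely mimic the sign of the relationships described above:

```python
import pandas as pd

# Synthetic stand-in data: discount rises and weight falls as
# Reached_on_time flips from 0 (on time) to 1 (late).
df = pd.DataFrame({
    "Discount_offered": [1, 5, 10, 25, 40, 60],
    "Weight_in_gms": [5500, 5000, 4500, 3500, 2500, 1800],
    "Reached_on_time": [0, 0, 0, 1, 1, 1],
})
corr = df.corr()  # Pearson correlation matrix, as plotted in the heatmap
```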
EDA
From the above plots, we can conclude the following:
Warehouse block F has more records than all the other warehouse blocks.
In the Mode of Shipment column, we can clearly see that ship delivers most of the products to customers.
Most customers call the customer care center 3 or 4 times.
Customer Rating does not show much variation.
EDA
From these plots, we can conclude the following:
 Most customers have 3 prior purchases.
 Most products are of low importance.
 The Gender column does not have much variance.
 More products fail to reach on time than reach on time.
EDA
Warehouse F:
 The majority of products did not reach on
time (higher red bars).
 This warehouse has the highest overall
count of deliveries, with a significant number
of late deliveries.
Warehouse D:
 Similar numbers of on-time and late
deliveries, but slightly more late deliveries.
Warehouses A, B, and C:
 In all three blocks, there are more late
deliveries than on-time deliveries.
 The pattern suggests that on-time delivery is
a general issue across these blocks, though
they handle fewer shipments than
Warehouse F.
EDA
Gender:
There are similar numbers of on-time and late deliveries, with slightly more late deliveries for both male and female customers.
EDA

Low Product Importance:
 A significant number of products were delivered
on time (purple bar).
 However, even more products did not reach on
time (red bar), indicating a higher failure rate
for low-importance products.

Medium Product Importance:
 Similar to low importance, most products did
not reach on time, though the gap between late
and on-time deliveries is more pronounced here.
This category has a higher count of delayed
deliveries.

High Product Importance:
 The majority of deliveries reached on time.
There are fewer late deliveries in this category,
suggesting better performance for high-
importance products.
EDA
Mode of shipment:
Shipments by Ship: Most shipments made by ship do not reach on time.
Shipments by Flight and Road:
While a significant number of
shipments reached on time, there
were also notable instances where
they did not.
EDA
Customer care calls
 A higher number of customer care calls
(particularly 3 or 4) is associated with a
higher likelihood of deliveries not
reaching on time.
 Fewer calls (2) and more than 6 calls
show a more balanced distribution, but
there's still a slight trend towards
deliveries not reaching on time.
EDA
• Prior Purchases
• Customers with 2 to 4 prior purchases seem to experience a higher likelihood of late deliveries.
• As the number of prior purchases increases (beyond 5), the outcomes start to balance, but there is still a slight tendency for deliveries to be late.
EDA
 Discount offered
 Higher Discounts and Late
Deliveries: There is a clear
trend where higher discounts are
associated with deliveries that
did not reach on time. The
larger variability in discounts for
late deliveries suggests that
offering higher discounts may be
related to logistical challenges
or delays.
 Lower Discounts and On-Time
Deliveries: Deliveries that
reached on time are associated
with consistently lower
discounts, with very little
variation in the discount
offered.
EDA
Weight in gms
 For deliveries that were on
time (Reached.on.Time_Y.N =
0), the weight distribution is
relatively compact, mostly
between 4000 and 5000 grams,
with a few outliers on the lower
end.
 For deliveries that were not on
time (Reached.on.Time_Y.N =
1), the weight distribution is
much wider, ranging from
approximately 2000 to 4500
grams.
Model Building and Evaluation
 We split the data 70/30:
70% of the data is used for training,
30% of the data is used for testing.
 We trained models using the following ML algorithms:
 Logistic Regression
 KNN Classification
 Random Forest
 Support Vector Machine
 Gradient Boosting Classification
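The split and training loop above can be sketched as follows. This uses a synthetic stand-in dataset from `make_classification` (the real features and target come from the e-commerce CSV):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic placeholder data; swap in the preprocessed e-commerce features.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

# 70% train / 30% test, as described on the slide.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
# Fit each model and record its test-set accuracy.
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
```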
Model Comparison - Base
• The graph shows the comparison between Logistic Regression, KNN, Random Forest, SVM, and Gradient Boosting for the baseline models, without any optimization.
• Random Forest, SVM, and Gradient Boosting each give 67% accuracy.
Hyperparameter Tuning in Machine Learning
 Definition:
 Hyperparameters are configuration settings used to control the learning process of machine learning models.
 Unlike model parameters, they are not learned from data.
 Importance:
 Hyperparameters significantly influence the model's performance.
 Proper tuning can improve accuracy, generalization, and reduce overfitting.
 Methods for Hyperparameter Tuning:
 Grid Search: Exhaustive search over a manually specified subset of hyperparameters.
 Random Search: Random combinations of hyperparameters are tried.
 Bayesian Optimization: Probabilistic model used to select the next set of hyperparameters based on prior
performance.
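A minimal grid-search sketch for the best-performing model. The parameter grid here is illustrative; the slides do not list the exact grid the project searched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic placeholder data standing in for the preprocessed dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hypothetical grid: every combination is tried with 3-fold CV.
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
# search.best_params_ holds the winning combination,
# search.best_score_ its cross-validated accuracy.
```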
Model Comparison – Hyperparameter Tuning
• The graph shows the comparison between Logistic Regression, KNN, Random Forest, SVM, and Gradient Boosting for models optimized with grid-search hyperparameter tuning.
• Gradient Boosting and Random Forest give the best accuracy, 69%.
Data Augmentation
 Problem: Imbalanced Data
 Imbalance in the target variable can lead to biased model performance.
 Example: In a classification problem, one class may dominate others.
 SMOTE (Synthetic Minority Over-sampling Technique)
Generates synthetic samples by interpolating between existing minority class instances.
 Advantage:
 Introduces new, varied data points rather than simply duplicating, reducing the risk of overfitting.
 Use Case:
 Effective for balancing datasets in machine learning tasks like classification.
 Outcome:
 Improved balance in the target variable.
 Enhanced model performance by addressing class imbalance and reducing bias towards the majority
class.
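The interpolation at the heart of SMOTE can be sketched in a few lines. This is a toy version: real SMOTE (e.g. imbalanced-learn's `SMOTE`) interpolates toward one of the k nearest neighbours, whereas this sketch picks any other minority sample:

```python
import random

def smote_like(minority, n_new, seed=0):
    """Create n_new synthetic minority samples by linear interpolation
    between pairs of existing minority samples (toy SMOTE sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)      # pick two distinct samples
        t = rng.random()                    # interpolation factor in [0, 1]
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

new_points = smote_like([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]], n_new=5)
```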
Model Comparison – Data Augmentation
• The graph shows the comparison between Logistic Regression, KNN, Random Forest, SVM, and Gradient Boosting on the augmented data.
• There is no significant difference after data augmentation; Gradient Boosting still gives the best accuracy, 69%.
Observation and Conclusion
 Weight in grams, Cost of the Product, and Discount Offered are the features that contribute most to predicting whether a product was delivered on time.
 The highest test accuracy observed is 69% using gradient boosting.
THANK YOU

Enhancing E-Commerce Efficiency: Predicting Delivery Times with Machine Learning
