THESIS DEFENSE
APPLICATIONS OF MACHINE LEARNING IN
PREDICTING SUPPLY RISKS – A CASE STUDY
OF AN E-COMMERCE ENABLER
Presenter: Nguyen Minh Tuan
Advisor: Dr. Nguyen Van Hop
1
Thesis Presentation
Outline
1. Problem Statement 03
2. Objectives, scope, limitations 04
3. Literature review 05
4. Methodology 06
5. Modeling & initial results 09
6. Proposed solution 19
7. Results validation 22
8. Conclusion & recommendations 28
Reference
2
1. Problem statement

Supply chain risk management
Identify, assess, mitigate and monitor unexpected events that negatively affect any part of the supply chain (Baryannis et al., 2018)

Potential risks in E-commerce supply chains
• Delayed deliveries => out of stock, drops in sales
• Differences in quantity => rejected goods => second delivery attempt

Applications of AI techniques in SCRM
Applying machine learning techniques at the SKU level could predict supply chain risks & mitigate their impacts
3
2. Objectives, scope & limitations
Objectives
• A risk prediction model using selected
metrics & ML algorithms (SVM, DT, RF)
in predicting supply chain risks
• A two-phase pruning approach for the Decision Tree model to tackle its main weakness, overfitting
• A demonstration of the model’s application
within a real-world E-commerce enabler.
Scope
• Focus on risks of delayed
deliveries
• Data: Group L’Oreal CPD
(Maybelline, L’Oreal Paris)
from May 2019 – Apr 2020
Limitations
• Data collection challenges
• Lack of historical data
4
3. Literature review
S. Ye, et al.
(2015)
Support vector machines (SVM) can be applied to determine whether a company displays a financial profile similar to those of companies that suffered disruptions in the past
A.
Bruzzone,
A. Orsoni
(2003)
Artificial Neural Networks (ANNs) are used to assess production losses
& calculate cost estimates for
different scenarios.
Cavalcante
et al.
(2019)
Supervised ML such as k-Nearest
Neighbors (k-NN) can be
advantageous in decision-making
process of resilient supplier
selection, leading to more
predictable delivery from suppliers
Baryannis
et al
(2019)
- Key
Reference
SVM and Decision Trees are used
to predict delivery delays in a
manufacturing supply chain. DT
achieves better interpretability than
SVM.
5
4. Methodology – Choosing ML algorithms

1. Support Vector Machines (SVM)
• High predictive accuracy, good generalization
• Poor interpretability, slow training time
2. Decision Trees (DT)
• High explanation ability, high training speed
• Complex trees, overfitting, poor generalization capabilities
3. Random Forest (RF)
• Low variance and bias
• Can reduce overfitting
• RF performs better than DT in most cases
=> Random Forest (Ensemble Learning method)

Table 4.1: Comparison of five learning algorithms (5: the best, 1: the worst) (Kotsiantis, 2007)
Factors | Decision Trees | Neural Network | Naïve Bayes | k-NN | SVM
Accuracy in general | 2 | 3 | 1 | 2 | 4
Speed of learning | 3 | 1 | 4 | 4 | 1
Speed of classification | 4 | 4 | 4 | 1 | 4
Tolerance to noise | 2 | 2 | 3 | 1 | 2
Tolerance to irrelevant attributes | 3 | 1 | 2 | 2 | 4
Dealing with risk of overfitting | 2 | 1 | 3 | 3 | 2
Explanation ability | 4 | 1 | 4 | 2 | 1
6
4. Methodology – Choosing metrics

Confusion matrix:
 | Actual: Positive | Actual: Negative
Predicted as Positive | True Positive (TP) | False Positive (FP)
Predicted as Negative | False Negative (FN) | True Negative (TN)

Accuracy $= \frac{TP + TN}{TP + TN + FP + FN}$
Precision $= \frac{TP}{TP + FP}$ | Recall $= \frac{TP}{TP + FN}$
=> might cause unjustified bias

Balanced metrics
• AUC (Area Under the Curve)
• F1 score $= 2 \cdot \frac{precision \cdot recall}{precision + recall}$
• Average precision (AP) $= \sum_n (R_n - R_{n-1}) \cdot P_n$
• MCC $= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
7
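A minimal sketch of how these metrics can be computed with scikit-learn (the library implied by the modelling slides); the labels, predictions and scores below are placeholder values for illustration, not thesis data.

```python
# Hedged sketch: computing the evaluation metrics above with scikit-learn.
# y_true, y_pred and y_score are placeholder values, not the thesis data.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, roc_auc_score,
                             average_precision_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # ground-truth labels (1 = late delivery)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hard predictions from a classifier
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]   # decision scores / probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))            # area under the ROC curve
print("AP       :", average_precision_score(y_true, y_score))  # average precision
```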
4. Methodology – Conceptual design framework
• Internal & External Data Collection: data is collected through internal & external sources (brand)
• Data Preprocessing: redundant variable removal, conversion to numerical form, checking outliers, normalization
• Feature Engineering/Selection: features are selected and extracted to keep the meaningful ones
• SVM, DT, RF: the three algorithms selected for the models, based on an analysis of the strengths/weaknesses of different ML methods and the key reference
• Improved solution development: a solution to improve performance and deal with the weaknesses of the initial results
• Results Validation: comparison of the initial models and the proposed solution based on metric scores & different training sizes
• Conclusion & Recommendation: recommend a machine learning technique and future research
8
5. Modelling – Data processing
~7,200 SKUs, 8-month period
• Product: barcode, sapcode, product name, purchase price, retail sale price
• Orders: purchase order number, platform purchase number, brand, platform
• Deliveries: requested quantity; requested amount; delivered quantity; delivered amount; actual quantity inbound at platforms; actual amount inbound at platforms; service level
Preprocessing flow: raw data (CSV format) => remove redundant variables => convert to numerical form => check outliers => normalization
9
5. Modelling – Feature Engineering and Selection
Goal: predict whether a delivery will be late or not
• 1: late deliveries (4 or more business days)
• 0: on-time deliveries (3 or fewer business days)
=> binary classification problem
Feature Engineering => Feature Selection:
• SelectKBest + statistical test (mutual information)
• t-test analysis based on p-values
=> 8-feature dataset & 5-feature dataset
10
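A minimal sketch of the SelectKBest step with the mutual-information score named on this slide; the synthetic data and generic feature names are stand-ins for the engineered delivery features, and k = 8 / k = 5 mirror the two datasets.

```python
# Hedged sketch of mutual-information feature selection with SelectKBest.
# The synthetic data stands in for the engineered delivery features.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X_arr, y = make_classification(n_samples=1000, n_features=12, n_informative=6,
                               random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(12)])

def select_top_features(X, y, k):
    """Keep the k features with the highest mutual information with the label."""
    selector = SelectKBest(score_func=mutual_info_classif, k=k)
    selector.fit(X, y)
    return X.loc[:, selector.get_support()]

X8 = select_top_features(X, y, k=8)   # analogue of the 8-feature dataset
X5 = select_top_features(X, y, k=5)   # analogue of the 5-feature dataset
print(list(X8.columns))
print(list(X5.columns))
```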
5. Modelling – Model generation
• Support Vector Machine: grid search => optimal parameters C and γ (penalty parameter and kernel coefficient)
• Decision Tree: Gini index, two experiments (unlimited tree, limited tree)
• Random Forest: run the model with default parameters => tune parameters
Data split: 80% training set – 20% test set, 5-fold cross-validation
11
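A minimal sketch of this setup: an 80/20 split and a 5-fold cross-validated grid search over the SVM parameters C and gamma. The synthetic data and the grid values are illustrative assumptions, not the exact data or grid used in the thesis.

```python
# Hedged sketch of the 80/20 split and 5-fold grid search over C and gamma.
# Synthetic data and grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1, 1]}  # assumed grid
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)

print("Best C and gamma:", grid.best_params_)
print("Test AUC:", grid.score(X_test, y_test))   # scored with the same AUC metric
```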
5. Modelling – Initial results – Support Vector Machine
Table 5.1 Prediction scores using SVM with 8 features
C | γ | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN | CT (min)
10^5 | 10^5 | 0.805 | 0.801 | 0.664 | 0.821 | 0.705 | 0.928 | 0.824 | 499 | 639 | 39 | 209 | 3.3
10^5 | 10^4 | 0.737 | 0.682 | 0.545 | 0.744 | 0.538 | 0.932 | 0.748 | 381 | 650 | 28 | 327 | 0.48
10^4 | 10^5 | 0.743 | 0.688 | 0.554 | 0.748 | 0.542 | 0.939 | 0.753 | 384 | 653 | 25 | 324 | 0.06
Figure 5.1 Indicative grid search results for SVM with 8 features: (a) Average Precision with 8 features; (b) F1 score with 8 features
12
5. Modelling – Initial results – Support Vector Machine
Table 5.2 Prediction scores using SVM with 5 features
C | γ | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN | CT (min)
10^5 | 10^5 | 0.751 | 0.814 | 0.613 | 0.807 | 0.828 | 0.801 | 0.806 | 586 | 532 | 146 | 122 | 1.60
10^5 | 10^4 | 0.766 | 0.812 | 0.626 | 0.812 | 0.792 | 0.832 | 0.813 | 561 | 565 | 113 | 147 | 0.08
10^4 | 10^5 | 0.769 | 0.825 | 0.640 | 0.820 | 0.826 | 0.823 | 0.820 | 585 | 552 | 126 | 123 | 0.06
Table 5.3 Best prediction scores using SVM with 8 and 5 features
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC
8 | 0.805 | 0.801 | 0.664 | 0.821 | 0.705 | 0.939 | 0.824
5 | 0.769 | 0.825 | 0.640 | 0.820 | 0.828 | 0.832 | 0.820
Computational time: 8-feature: 3.3 minutes; 5-feature: < 0.5 minutes
13
5. Modelling – Initial results – Decision Trees
Table 5.4 Prediction scores using unrestricted decision trees with default parameters
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.744 | 0.683 | 0.555 | 0.747 | 0.534 | 0.947 | 0.751 | 378 | 657 | 21 | 330
5 | 0.842 | 0.885 | 0.765 | 0.882 | 0.886 | 0.884 | 0.882 | 627 | 596 | 82 | 81
Tree size (unrestricted): 8-feature DT – maximum depth 51, total nodes 551; 5-feature DT – maximum depth 41, total nodes 1297
Tree size (restricted): 8-feature DT – maximum depth 6, total nodes 15; 5-feature DT – maximum depth 6, total nodes 15
14
5. Modelling – Initial results – Decision Trees
Figure 5.2 Decision Tree classifier using restricted parameters with 8 features
Table 5.5 Prediction scores using DT with max_depth = 6 and max_leaf_nodes = 15
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.623 | 0.419 | 0.347 | 0.616 | 0.271 | 0.923 | 0.624 | 192 | 662 | 16 | 516
5 | 0.610 | 0.581 | 0.299 | 0.639 | 0.490 | 0.714 | 0.643 | 611 | 281 | 430 | 124
15
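A minimal sketch of the restricted tree in Table 5.5, using the Gini criterion with max_depth=6 and max_leaf_nodes=15 as named on the slide; X_train, y_train, X_test, y_test are assumed to come from the earlier split sketch.

```python
# Hedged sketch of the restricted decision tree (max_depth=6, max_leaf_nodes=15).
# X_train, y_train, X_test, y_test come from the earlier split sketch.
from sklearn.tree import DecisionTreeClassifier, export_text

restricted_dt = DecisionTreeClassifier(criterion="gini", max_depth=6,
                                       max_leaf_nodes=15, random_state=0)
restricted_dt.fit(X_train, y_train)

print("Test accuracy:", restricted_dt.score(X_test, y_test))
print(export_text(restricted_dt))   # small, human-readable tree for interpretation
```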
5. Modelling – Initial results – Random Forest
Table 5.6 Prediction scores using RF with default parameters
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.767 | 0.709 | 0.592 | 0.766 | 0.559 | 0.968 | 0.770 | 396 | 665 | 13 | 312
5 | 0.829 | 0.875 | 0.744 | 0.872 | 0.877 | 0.873 | 0.872 | 621 | 588 | 90 | 87
Figure 5.3 AUC score of n_estimators using RF with 8 features
Figure 5.4 AUC score of n_estimators using RF with 5 features
16
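A minimal sketch of the kind of n_estimators sweep behind Figures 5.3/5.4: train a random forest for several tree counts and record the test AUC. The grid of tree counts is an illustrative assumption; X_train, y_train, X_test, y_test are assumed from the earlier split sketch.

```python
# Hedged sketch: sweep n_estimators and record the test AUC for each forest size.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

for n in [10, 50, 100, 200, 400]:                     # illustrative grid
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    rf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    print(f"n_estimators={n}: test AUC={auc:.3f}")
```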
5. Modelling – Initial results – Random Forest
Table 5.7 Comparison of prediction scores between the base model and the model using parameter tuning with RF for 8 & 5 features
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.767 | 0.709 | 0.592 | 0.766 | 0.559 | 0.968 | 0.770 | 396 | 665 | 13 | 312
8 (initial) | 0.802 | 0.764 | 0.650 | 0.802 | 0.629 | 0.974 | 0.805 | 445 | 666 | 12 | 263
Improvement | 4.56% | 7.76% | 9.80% | 4.70% | 12.52% | 0.62% | 4.55%
5 | 0.829 | 0.875 | 0.744 | 0.872 | 0.877 | 0.873 | 0.872 | 621 | 588 | 90 | 87
5 (initial) | 0.843 | 0.887 | 0.768 | 0.884 | 0.890 | 0.884 | 0.884 | 630 | 595 | 83 | 78
Improvement | 1.69% | 1.37% | 3.23% | 1.38% | 1.48% | 1.26% | 1.38%
17
5. Modelling – Initial results summary and conclusion
Table 5.8 Summary of best prediction scores using SVM, DT, RDT, RF
Classifier | AP | F1 | MCC | Acc | Recall | Precision | AUC
SVM | 0.805 | 0.825 | 0.664 | 0.821 | 0.828 | 0.939 | 0.824
DT | 0.842 | 0.885 | 0.765 | 0.882 | 0.886 | 0.947 | 0.882
RDT | 0.623 | 0.581 | 0.347 | 0.639 | 0.490 | 0.923 | 0.643
RF | 0.843 | 0.887 | 0.768 | 0.884 | 0.890 | 0.974 | 0.884
1. Result #1: Random Forest – RF outperformed the other algorithms; drawback: complexity
2. Result #2: Support Vector Machine – SVM achieves results comparable to DT; disadvantage: computational time
3. Result #3: Restricted Decision Trees – limiting the tree gives an informative, visualizable tree that is easy to interpret, at the expense of predictive performance
4. Result #4: Feature dataset – the 5-feature dataset outperforms the 8-feature dataset for SVM and RF; DT results fluctuate, leaving open which features give the maximum prediction performance
18
6. Proposed solution – RFECV using RF (improved feature selection)
• Recursive feature elimination (Guyon et al., 2002): build a model on the set of predictors and calculate an importance score for each predictor => the least essential features are cut => recursively repeated to reach the final selected features
• Cross-validation: train models on subsets of the available input data and evaluate them on the complementary subsets; k-fold cross-validation, k = number of splits (5 or 10)
• RFECV using Random Forest (Kuhn and Johnson, 2019):
1. Random Forest, as an ensemble model, does not exclude variables from the prediction equation
2. It provides a measure of feature importance => feature rankings => find the features with the best predictive performance
The initial approach used the SelectKBest class with the mutual information metric & statistical analysis based on p-values => 8-feature & 5-feature datasets => are these the most optimal features?
19
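A minimal sketch of RFECV with a random-forest estimator; the forest size, step, fold count and scoring metric are assumptions for illustration, and the exact settings in the thesis may differ. X_train, y_train are assumed from the earlier split sketch.

```python
# Hedged sketch of recursive feature elimination with cross-validation (RFECV)
# using a random forest to rank features by importance.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

rfecv = RFECV(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
              step=1, cv=5, scoring="accuracy")
rfecv.fit(X_train, y_train)

print("Optimal number of features:", rfecv.n_features_)
print("Selected feature mask:", rfecv.support_)   # boolean mask over the columns
print("Feature rankings:", rfecv.ranking_)        # 1 = selected feature
X_train_selected = rfecv.transform(X_train)       # reduced training matrix
```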
6. Proposed solution – Two-phase cost-complexity pruning
Table 6.1 Evaluation of Decision Tree pruning techniques
Pruning technique | Advantages | Disadvantages
Reduced Error Pruning (REP) (Quinlan, 1999) | Linear computational complexity; good accuracy | Bias towards over-pruning
Pessimistic Error Pruning (PEP) (Quinlan, 1999) | High run speed; does not require a test data set | Does not produce a selection of trees; each node will still be visited once
Cost-Complexity Pruning (CCP) (Breiman et al., 1984) | High accuracy when the best tree is selected; produces a selection of trees for experts to study | Can only choose a tree in the set of subtrees
Minimum Error Pruning (MEP) (Niblett and Bratko, 1991) | Can reach an ideal solution in theory | Assumption is seldom true; produces only a single tree; unstable results
TWO-PHASE CCP
Phase 1 – Survey: survey subtrees with relatively good accuracy performance for pruning
Phase 2 – CCP: use the CCP algorithm to choose the weakest link of the resultant tree to prune, by seeking the most optimal ccp_alpha
20
6. Proposed solution framework
Figure 6.1: Flowchart of the proposed solution
Decision Tree => RFECV using RF => 2-phase CCP pruning => Results Validation
• RFECV using RF: recursive feature elimination with cross-validation (RFECV) using Random Forest => choose the features with the maximum potential
• 2-phase CCP pruning: 1) search the subtrees with a relatively good accuracy performance in terms of max depth and max leaf nodes of the tree; 2) find the tree with the best predictive accuracy by seeking the most optimal value of ccp_alpha
21
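A minimal sketch of the two-phase pruning idea in Figure 6.1, not the thesis code: Phase 1 surveys max_depth / max_leaf_nodes combinations with reasonable accuracy, and Phase 2 searches the cost-complexity pruning path of the surveyed tree for the most accurate ccp_alpha. The grids are illustrative assumptions; X_train, y_train, X_test, y_test are assumed from the earlier split sketch.

```python
# Hedged sketch of two-phase cost-complexity pruning for a decision tree.
from sklearn.tree import DecisionTreeClassifier

# Phase 1: survey subtrees with relatively good accuracy (illustrative grid).
best_cfg, best_acc = (None, None), 0.0
for depth in [10, 20, 40, None]:
    for leaves in [50, 200, 1000, None]:
        dt = DecisionTreeClassifier(max_depth=depth, max_leaf_nodes=leaves,
                                    random_state=0).fit(X_train, y_train)
        acc = dt.score(X_test, y_test)
        if acc > best_acc:
            best_cfg, best_acc = (depth, leaves), acc

# Phase 2: prune the surveyed tree by seeking the most accurate ccp_alpha
# along its cost-complexity pruning path.
survey_tree = DecisionTreeClassifier(max_depth=best_cfg[0],
                                     max_leaf_nodes=best_cfg[1], random_state=0)
alphas = survey_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas
best_alpha, best_pruned_acc = 0.0, 0.0
for a in alphas:
    a = max(float(a), 0.0)   # guard against tiny negative alphas from float error
    pruned = DecisionTreeClassifier(max_depth=best_cfg[0],
                                    max_leaf_nodes=best_cfg[1],
                                    ccp_alpha=a, random_state=0)
    acc = pruned.fit(X_train, y_train).score(X_test, y_test)
    if acc > best_pruned_acc:
        best_alpha, best_pruned_acc = a, acc

print("Most optimal ccp_alpha:", best_alpha, "| test accuracy:", best_pruned_acc)
```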
7. Results – Improved feature selection solution
Optimal number of features: 9
Figure 7.1 Accuracy score of all features using RFECV with Random Forest
Figure 7.2 Feature importances using RFECV with Random Forest
Table 7.1 Prediction scores of DT model after using RFECV with RF
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
9 | 0.830 | 0.871 | 0.739 | 0.860 | 0.860 | 0.881 | 0.870 | 609 | 596 | 82 | 99
8 | 0.972 | 0.982 | 0.962 | 0.981 | 0.983 | 0.980 | 0.981 | 696 | 664 | 14 | 12
7 | 0.974 | 0.982 | 0.964 | 0.982 | 0.983 | 0.982 | 0.982 | 696 | 665 | 13 | 12
6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 708 | 678 | 0 | 0
22
7. Results – 2-phase cost-complexity pruning (CCP)

Table 7.2: Comparison of ccp_alpha of 9-feature DT after 2-phase CCP
 | Initial tree | alpha = 0.0001 | 0.00015 | 0.0002 | 0.00025 | 0.0003
Training accuracy | 0.969 | 0.911 | 0.909 | 0.903 | 0.900 | 0.893
Test accuracy | 0.869 | 0.838 | 0.839 | 0.842 | 0.841 | 0.833
Difference (overfitting) | 9.95% | 7.29% | 6.96% | 6.10% | 5.97% | 5.93%
Max depth | 62 | 46 | 46 | 46 | 46 | 46
Total nodes | 1649 | 983 | 935 | 821 | 775 | 699

Table 7.2: Comparison of ccp_alpha of 7-feature DT after 2-phase CCP
 | Initial tree | alpha = 0.0005 | 0.0010 | 0.0015 | 0.002 | 0.0025
Training accuracy | 1.000 | 0.996 | 0.995 | 0.994 | 0.991 | 0.988
Test accuracy | 0.982 | 0.981 | 0.981 | 0.982 | 0.978 | 0.973
Difference (overfitting) | 1.80% | 1.44% | 1.42% | 1.18% | 1.30% | 1.55%
Max depth | 12 | 12 | 12 | 11 | 10 | 10
Total nodes | 211 | 119 | 115 | 103 | 89 | 83

Figure 7.3: 9-feature Decision Tree after 2-phase cost-complexity pruning
Figure 7.4: 7-feature Decision Tree after 2-phase cost-complexity pruning
23
7. Results – Summary results
Table 7.3 Prediction scores of the initial models and the proposed solution model
Classifier | AP | F1 | MCC | Acc | Recall | Precision | AUC
Initial results:
SVM (key reference) | 0.805 | 0.825 | 0.664 | 0.821 | 0.828 | 0.939 | 0.824
DT (key reference) | 0.842 | 0.885 | 0.765 | 0.882 | 0.886 | 0.947 | 0.882
RDT (key reference) | 0.623 | 0.581 | 0.347 | 0.639 | 0.490 | 0.923 | 0.643
RF | 0.843 | 0.887 | 0.768 | 0.884 | 0.890 | 0.974 | 0.884
Improved solution:
2-phase CCP DT | 0.974 | 0.982 | 0.964 | 0.982 | 0.983 | 0.982 | 0.982
24
7. Results – Validation
15 trials
Table 7.4: Prediction scores of initial models and 2-phase CCP DT on 15 trials (total average)
Classifier | AP | F1 | MCC | Accuracy | Recall | Precision | AUC
SVM | 0.804 | 0.820 | 0.662 | 0.821 | 0.842 | 0.922 | 0.823
DT | 0.838 | 0.880 | 0.756 | 0.870 | 0.875 | 0.935 | 0.878
RDT | 0.624 | 0.588 | 0.349 | 0.640 | 0.506 | 0.919 | 0.644
RF | 0.844 | 0.886 | 0.767 | 0.883 | 0.884 | 0.887 | 0.883
2-phase CCP DT | 0.980 | 0.986 | 0.971 | 0.986 | 0.985 | 0.986 | 0.986
25
7. Results – Features sensitivity analysis
Figure content (three charts over 7–16 features): accuracy of the original DT vs the 2-phase CCP DT; total nodes of the original DT vs the 2-phase CCP DT; and the most optimal CCP alpha for each number of features.
26
7. Results – Social, economic and environmental impacts
Social impacts
• Helping companies
maximize profits
• Reducing costs of second
delivery attempts
• Making predictions of
deliveries
• Managing supplier
performance
• Allowing governments to
allocate expenditure on this
research
Economic impacts Environmental impacts
• Reducing carbon dioxide
emissions due to second
deliveries
• Diminishing missed
deliveries can alleviate
environmental impacts
27
8. Conclusions & Recommendations
• A risk prediction model: 3 algorithms (SVM, DT, RF) with different evaluation measures to predict supply risks, focusing on risks of delayed deliveries
• A solution to improve feature selection using RFECV and a two-phase CCP pruning technique
• A case study of an E-commerce enabler
• Results: the proposed solution correctly predicts 98% of late deliveries, a 12% improvement over the initial models. Prioritizing performance over overfitting requires a compromise on CCP alpha values.
• Considering larger datasets and more AI techniques (for example, gradient boosting or deep learning)
• Investigating pruning techniques for Random Forest
• Studying other supply chain risks, such as risks of short-quantity deliveries or product returns
28
References
1. Baryannis, G.; Validi, S.; Dani, S.; Antoniou, G. Supply chain risk management and artificial intelligence: state of the art and future research directions. 2018.
2. Bruzzone, A.; Orsoni, A. AI and simulation-based techniques for the assessment of supply chain logistic performance. In: 36th Annual Simulation Symposium, IEEE, Orlando, FL, USA, 2003, pp. 154–164.
3. Cavalcante, I.; Frazzon, E.; Forcellini, F.; Ivanov, D. A supervised machine learning approach to data-driven simulation of resilient supplier selection in digital manufacturing. 2019.
4. Ye, S.; Xiao, Z.; Zhu, G. Identification of supply chain disruptions with economic performance of firms using multi-category support vector machines. Int. J. Prod., 2015.
5. Baryannis, G.; Dani, S.; Antoniou, G. Predicting supply chain risks using machine learning: the trade-off between performance and interpretability. 2019.
6. Kotsiantis, S.B. "Supervised Machine Learning: A Review of Classification Techniques". Informatica 31 (2007), 249–268.
7. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. "Gene Selection for Cancer Classification Using Support Vector Machines". Machine Learning 46 (1) (2002), 389–422.
8. Kuhn, M.; Johnson, J. Feature Engineering and Selection: A Practical Approach for Predictive Models. 2019.
9. Quinlan, J. Simplifying decision trees. Int. J. Human-Computer Studies 51 (1999).
10. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees. Chapman and Hall, New York, 1984.
11. Cestnik, B.; Bratko, I. On Estimating Probabilities in Tree Pruning. EWSL, 1991, pp. 138–150.
29
Q&A Section
THANK YOU FOR
YOUR TIME!
30

Editor's Notes

  1. Thank you first of all for being here for my presentation.
  2. Supply chains, particularly in E-commerce businesses, have always been influenced by unpredictable risks that harm their operational activities, and researchers and supply chain practitioners have tried to minimize the effects of these events. So what is supply chain risk management? As stated in (Baryannis et al., 2018), supply chain risk management (SCRM) includes various strategies aiming to identify, assess, mitigate and monitor unexpected events or conditions which might have a negative impact on any part of a supply chain. In terms of E-commerce supply chains, one challenge E-commerce businesses face is the risk of delayed deliveries. Any delay in the delivery lead time causes drops in sales, because out-of-stock items are passed over by customers. Other potential risks occur when there is a difference between the characteristics or quantity of the goods in purchase orders and those actually delivered. This results in rejected and returned products, and it can take considerable time and effort from the relevant stakeholders to make another delivery. At the same time, the application of AI techniques to SCRM has received increasing interest. Applying machine learning techniques at the SKU level appears to be crucial for protecting the delivery of goods from disruptions by predicting supply chain risks and mitigating their adverse impacts.
  3. The key objective of this paper is to investigate a risk prediction model using selected metrics and ML algorithms (SVM, DT, RF) to predict supply chain risks. We also propose a two-phase pruning approach for the Decision Tree classifier, ensuring that overfitting of the model remains small. The models are then implemented in a case study with data from a real-world E-commerce company. Scope: this study focuses on risks of late delivery of goods. Data are collected at the SKU level from the group brand L'Oreal Consumer Product Division (Maybelline, L'Oreal Paris), delivered to two primary E-commerce platforms (Shopee, Lazada) from May 2019 to April 2020. Other risks, or risks from other brands, are not within the scope of this research. Limitations: data collection is one of the challenges, since data must be obtained from different sources before being used in machine learning algorithms. The dataset is also expected to be large in order to provide an accurate analysis, which is a limitation because there has been a lack of historical data from the chosen brand.
  4. - One of the earliest efforts to apply machine learning techniques to supply chain risk assessment comes from the use of Artificial Neural Networks (ANNs) by Bruzzone and Orsoni [14] with regard to production losses. Based on the training results, the ANNs learn how to map inputs to outputs, allowing cost estimates to be calculated for various scenarios. - Ye et al. (2015) identify supply chain disruptions by applying support vector machine techniques to the economic performance of listed firms in China. The resulting model is able to determine whether a particular company displays a financial profile equal to ones that suffered a disruption. - Cavalcante et al. (2019) introduce a supervised machine learning approach to resilient supplier selection in digital manufacturing. The results indicate that supervised machine learning algorithms such as k-NN can be advantageous in the decision-making process of resilient supplier selection. - The most recent study uses AI techniques in SCRM to predict delivery delays in a manufacturing supply chain. Two ML algorithms are selected: support vector machines and decision trees. The results show that DT delivers better interpretability than SVM, and giving interpretability priority over performance may require a level of compromise. => Key Reference
  5. In this research, the strengths and weaknesses of some well-known classification algorithms are considered for comparison, including DT, NN, NB, k-NN, and SVM. Kotsiantis (2007) compared these techniques based on important features (from the evidence of existing empirical and theoretical studies); the results are given in the table. We first choose Support Vector Machines, as an example of an algorithm with high predictive accuracy and good generalization performance. On the strength of high explanation ability and classification speed, Decision Trees are chosen, since classification trees have high training speed and are generally easily understood by non-experts. In order to overcome the limitations of the Decision Tree, Random Forest is used, since it is less reliant on a single tree. Random Forest is based on the bagging technique, one of the Ensemble Learning methods. An important feature of RF is its low variance and bias. Compared to decision trees, which often suffer from high variance, RF tends to address this problem effectively, because RF is essentially a collection of multiple decision trees whose results are aggregated into one final result. RF can also reduce overfitting, the error due to variance: it reduces variance by training the model on different samples and using a random subset of predictor variables. In most cases, Random Forest performs better than decision trees. => For these reasons, Random Forest is chosen as the third approach in this paper.
  6. Equally essential to the choice of algorithms is the choice of metrics. If a classification method is used, the prediction model can result in one of four outcomes: TP, FP, FN, TN. We use a standard metric like accuracy. Metrics that prioritize one outcome over another, such as precision or recall, are not recommended on their own, since they might cause unjustified bias. Other metrics that can balance between different outcomes are considered: F1 score, AP, MCC (Matthews correlation coefficient), and AUC (the closer it comes to 1, the better the model is).
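  The note above lists the evaluation metrics used throughout the thesis. As a minimal sketch (not the thesis code), the snippet below shows how these metrics could be computed with scikit-learn; the label and score arrays are placeholders standing in for real delivery data and model outputs.
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef,
                             average_precision_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                    # actual: late (1) / on-time (0)
y_score = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.6, 0.3]   # predicted probabilities of "late"
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # thresholded class predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))                  # 2*P*R / (P+R)
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("AP       :", average_precision_score(y_true, y_score))  # sum_n (R_n - R_{n-1}) * P_n
print("AUC      :", roc_auc_score(y_true, y_score))
```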
  7. 1/ Data is collected through internal and external sources (from the brand). 2/ Data goes through preprocessing. 3/ In terms of feature engineering and selection, features are selected and extracted based on several methods to maintain a meaningful set of features. 4/ Based on the strengths and weaknesses of different classification machine learning techniques, Support Vector Machines, Decision Trees, and Random Forest are selected as the model's three algorithms. Then, new approaches can be implemented to improve their performance and overcome the weaknesses revealed by the initial results. 5/ A comparison of the initial models and the proposed solution model is made in terms of metric scores and different training sizes to validate the results. 6/ Finally, the paper recommends the machine learning techniques best able to predict supply risks, with an additional discussion on future research directions for the thesis.
  8. We settled on a dataset containing information on approximately 7,200 SKUs delivered in an 8-month period (September 2019 – April 2020). The data covers products, orders, and deliveries. There are a number of stages in pre-processing the data. First, raw data in CSV format were imported using the Pandas library. Next, we removed redundant variables (SAP code, product name, purchase order number). All data were also converted to numerical form. Whether the data were normalized depended on the algorithm (DT and RF did not require normalization, as they use rule-based approaches instead of distance calculations).
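  As a hedged illustration of the preprocessing steps described in this note (not the thesis code), the sketch below uses a tiny stand-in DataFrame with hypothetical column names; the real dataset and its schema are internal to the case study.
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Tiny stand-in frame; column names (sap_code, po_number, ...) are hypothetical
df = pd.DataFrame({
    "sap_code": ["A1", "A2", "A3"],
    "product_name": ["x", "y", "z"],
    "po_number": [101, 102, 103],
    "platform": ["Shopee", "Lazada", "Shopee"],
    "purchase_price": [10.0, 12.5, 9.0],
    "delivery_status": [0, 1, 0],                     # 1 = late, 0 = on time
})

# Remove redundant identifier variables
df = df.drop(columns=["sap_code", "product_name", "po_number"])

# Convert remaining text columns to numerical form
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

# Normalize only where the algorithm needs it (e.g. SVM); DT/RF are rule-based
X = df.drop(columns=["delivery_status"])
X_svm = MinMaxScaler().fit_transform(X)
```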
  9. Feature engineering: the chosen objective is to predict, for a supplier of the brand, whether a delivery will be late or not. We select the delivery status variable: 1 denotes deliveries that are 4 or more business days late and 0 denotes on-time deliveries (up to 3 business days late). Therefore, predicting whether a delivery was late or not is a binary classification problem. Feature selection: the SelectKBest class provided by the scikit-learn library was used with a well-known statistical measure, mutual information, which is calculated between two variables and measures the reduction in uncertainty of one variable given a known value of the other. This yields 8 variables: platform, purchase price, the actual amount of the order delivered, the day the purchase order was raised, the week the purchase order was raised, the day the order was delivered, the week the order was delivered, and transit time (the difference between the day the order was raised and the day it was delivered to the warehouse). Another feature selection method was t-test analysis, considering the p-value of each coefficient (a p-value < 0.05 shows that developing a model using this feature is significant). As a result, we obtained a dataset of 5 variables (purchase price, retail purchase price, the day the purchase order was raised, the week the purchase order was raised, and the week the order was delivered).
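  The note above describes filter-based feature selection with SelectKBest and mutual information. A minimal sketch of that step is shown below; the synthetic arrays stand in for the preprocessed feature matrix and the binary delivery-status target, and k=8 mirrors the 8-variable subset mentioned in the note.
```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in data: 200 samples, 16 candidate features, binary target
rng = np.random.default_rng(0)
X = rng.random((200, 16))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)

# Keep the 8 features with the highest mutual information with the target
selector = SelectKBest(score_func=mutual_info_classif, k=8)
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))   # indices of the retained features
```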
  10. The dataset is next split into training and test sets with a ratio of 80% and 20%, respectively. For the training process, stratified 5-fold cross-validation is used to ensure that each class is correctly represented in all folds (StratifiedKFold is a variation of k-fold that returns stratified folds: each set includes around the same percentage of samples of each target class as the complete set). In the case of SVM, a grid search is implemented to identify the optimal parameters C and γ (penalty parameter and kernel coefficient). In terms of DT, the Gini index is used to create split points. Two experiments are run: one with default parameters that allow an arbitrarily large tree, and one where the maximum depth of the tree and the maximum number of leaf nodes are limited to ensure that the tree does not grow too large. For Random Forest, we also first run the model with default parameters; parameter tuning is then implemented to see if we can improve the prediction scores.
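  As a rough sketch of the split and SVM grid search described above (not the thesis code), the example below uses a synthetic dataset in place of the real one; the C/γ grid and the average-precision scoring are illustrative choices.
```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data for the 8-feature set
rng = np.random.default_rng(0)
X = rng.random((500, 8))
y = (X[:, 0] > 0.5).astype(int)

# 80/20 split, stratified on the late/on-time target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Grid search over C and gamma with stratified 5-fold cross-validation
param_grid = {"C": [10**k for k in range(-2, 6)],
              "gamma": [10**k for k in range(-5, 6)]}
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid,
                      scoring="average_precision",
                      cv=StratifiedKFold(n_splits=5))
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```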
  11. Table 5.1 indicates the test scores using 8 features based on the optimal C and γ and their computation time (CT), while Figure 5.1 represents the results of the grid search process, using average precision and F1 score as metrics. For average precision, the highest test scores are obtained at C = 10^5 and γ = 10^5, with 86.3% precision, and the results fall steadily as the values of C or γ decrease. Results based on the F1 score share the same pattern.
  12. Results of prediction using 5 features are slightly better than those using 8 features. In terms of balanced metrics, the MCC score improves the most, by over 5%, followed by the F1 score with almost the same percentage, while there is not much difference between the AP scores of the two feature sets. The primary drawback of this model is the computational time required to find the optimal values, which takes almost 9 minutes. Finally, parameters C = 10^5 and γ = 10^3 lead to the best possible compromise between the different classifications with a reasonable computation time, at less than half a minute.
  13. In terms of DT, we first run the decision tree classifier in scikit-learn with default parameters. This classifier is an optimized version of the CART algorithm that uses the Gini impurity measure to determine the quality of each split in the tree, and the size and structure of the resulting tree are not limited. Table 5.4 illustrates prediction scores using default parameters with both feature sets. The results of the dataset with 5 features are slightly better than those with 8 features. However, the resulting trees are extremely large: for 8 features the tree has a maximum depth of 64 and 1,881 total nodes, whereas for 5 features the maximum depth is 41 and the total node count is 1,411. Since the model's explanation ability is limited when trees grow too large, we restrict the classifier to, for example, a maximum depth of 6 and a maximum of 15 leaf nodes, which leads to trees with 29 nodes in total.
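  The two decision-tree experiments described above (default parameters versus a restricted tree) could be sketched as follows; this is a minimal illustration on synthetic stand-in data, not the thesis code, and the restriction values mirror the ones quoted in the note.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and an 80/20 stratified split
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Experiment 1: default parameters, arbitrarily large tree
unrestricted = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)
print("Unrestricted tree:", unrestricted.tree_.node_count, "nodes, depth", unrestricted.get_depth())

# Experiment 2: tree limited to depth 6 and 15 leaf nodes for interpretability
restricted = DecisionTreeClassifier(criterion="gini", max_depth=6,
                                    max_leaf_nodes=15, random_state=0).fit(X_train, y_train)
print("Restricted tree :", restricted.tree_.node_count, "nodes, test accuracy",
      round(restricted.score(X_test, y_test), 3))
```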
  14. In the tree shown in Figure 5.2, the feature that contributes to the best split (X7) is the transit time (the difference between the day the order was raised and the day it was delivered to the warehouse). This results in around 50% of the training samples being classified as not late if the day the purchase order was raised is less than or equal to 3.5 (the value corresponding to 0.0 before normalization), meaning the order was raised on Monday or Tuesday; this is incorrect for 2,407 samples. The following splits require combinations of additional features. For example, if we continue to follow the left branch at the node where X3 ≤ 0.0, the right branch at the node X7 ≤ 0.0, and finally the left branch at X4 ≤ 0.0, then the delivery is predicted to be late, which is correct for 374 of the 390 samples. For decision trees restricted to a maximum depth of 6 and a maximum of 15 leaf nodes, results are generally worse than those of unrestricted decision trees for both 8 and 5 features. For the 8-feature set, only 18 samples are misclassified as late deliveries, leading to a precision of almost 93%. However, this comes at the expense of other metrics, with AP and accuracy falling to around 64% and marked fluctuations among the other scores. Results are summarized in Table 5.5.
  15. For RF, we first fit a random forest with default parameters to get a baseline idea of the model's performance. Table 5.6 shows prediction scores with both feature sets. The results are more balanced and better than those of SVM or DT, with only 90 samples misclassified as late deliveries and above 88% of late deliveries predicted correctly in both feature sets. Generally, we only have a vague idea of the good parameters, so we narrow our search by evaluating a wide range of values for each parameter. In this part, we visualize a number of important Random Forest parameters based on the AUC score to see how they impact our models in terms of overfitting and underfitting, thereby selecting the most optimized parameters to run the model. n_estimators is the number of trees in the forest; its default value is 100. Normally, the higher the number of trees, the better the model learns the data and the less likely it is to overfit. However, adding many trees can slow down the training process considerably, so we do a parameter search to find the optimal value (see the sketch after this note). Figures 4.4 and 4.5 show the AUC score against n_estimators for the datasets of 8 and 5 features, respectively. For the dataset of 8 features, we can stop at 150 trees, as increasing the number of trees decreases the test performance; with the dataset of 5 features, we can stop at 200 trees. We do the same for the other parameters.
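  A minimal sketch of the n_estimators sweep described above, using AUC on synthetic stand-in data; the candidate values are examples, not the thesis grid.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and an 80/20 stratified split
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

for n in [50, 100, 150, 200, 300]:               # candidate numbers of trees
    rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    # A large train/test gap hints at overfitting; pick n where test AUC plateaus
    print(n, round(auc_train, 3), round(auc_test, 3))
```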
  16. As can be observed, by tuning parameters, both feature sets see a steady improvement in all metric scores in comparison to the runs with default parameters, notably the model with 5 features. MCC achieves the most improvement (12%), followed by AP (7%) and the other metrics (5-6%). For the 8-feature set, only 69 samples are misclassified as late deliveries, leading to accuracy and precision scores of about 90%. This shows how proper parameter tuning can have a positive effect on the performance of the model. However, since the prediction performance of both datasets (8 and 5 features) increases less dramatically compared to the base model, we take a look at the earlier steps of the model to see if there is still room for performance improvement.
  17. To summarize, Table 5.8 sums up the best scores from each classifier, including Support Vector Machines (SVM), Decision Trees (DT), Restricted Decision Trees (RDT), and Random Forest (RF). Overall, it is clear that Random Forest outperforms the other algorithms in almost all metric scores, with precision of around 92% and accuracy and average precision of around 90%. However, a drawback of RF is its complexity, which results in slower training speed and lower interpretability than DT, because RF generates a forest of trees (unlike the single tree in the case of DT). Although SVM achieves comparable results to DT regarding recall and precision, balanced scores such as average precision, F1 score, and MCC are lower for SVM than for the DT model. One of the disadvantages of SVM lies in its computational time, which takes almost 9 minutes in the case of the 8-feature set. In terms of the Restricted Decision Tree, visualizing a decision tree with restricted parameters can deliver a good interpretation, whereas this is not the case with SVM. However, it comes at the expense of the model's predictive performance when we stop growing the tree early; therefore, we need an approach to prune the tree while still ensuring high predictive accuracy. Besides, the summary results also show that the 5-feature dataset yields better performance than the 8-feature dataset with the SVM algorithm, and the same holds for the RF algorithm. Decision Trees, on the other hand, show sharp fluctuations in the metric scores between the two datasets. Hence, it is not necessarily true that the fewer features the model has, the better results it can generate. => Therefore, we need a feature selection solution to find the features with better prediction results.
  18. Initially, we applied several feature selection techniques, including the SelectKBest class with the mutual information metric, recursive feature elimination using SVM, and statistical analysis based on p-values, to select the 8-feature and 5-feature datasets. However, these might not be the most optimal features, because irrelevant or partially relevant features can negatively impact model performance. Thus, in this section, we adopt a more pragmatic approach to feature selection. As noted earlier, recursive feature elimination (RFE, Guyon et al. (2002) [21]) is a backward selection of the predictors. This technique starts by building a model on the whole set of predictors and computing an importance score for each predictor. The least important features are then removed from the current set, and this process is recursively repeated on the pruned set until the desired number of selected features is reached. Cross-validation is a method for testing ML models by training several models on subsets of the available input data and evaluating them on the complementary subsets. The technique has a single parameter, k, indicating the number of groups into which a given data sample should be split, so the procedure is often known as k-fold cross-validation. Generally, k is chosen as 5 or 10, but there is no formal rule. Backward selection is frequently used with random forest models for two reasons [Kuhn, Johnson, 2019]. First, the random forest tends not to exclude variables from the prediction equation. This is due to the nature of model ensembles: improved performance in ensembles is associated with the variety of the constituent models, and averaging models that are effectively the same does not reduce the variation in the model predictions; for this reason, by using a random selection of predictors, the random forest forces the trees to contain sub-optimal splits of the predictors. The second reason random forest is used with RFE is that this model has a well-known internal method for measuring feature importance, which can be used with the first model fit within RFE, where the entire predictor set is used to compute the feature rankings. In this case, recursive feature elimination with cross-validation (RFECV) using Random Forest will be used to rank the features of the initial dataset and select the ones that give the best predictive performance.
  19. Overfitting happens when models fit noise or misleading points in the input distribution, harming generalization and predictive performance. To address this problem, researchers in the field have taken considerable interest in tree pruning. Cost-complexity pruning (CCP) is considered here, as it can deliver good results if the best trees are selected. The method is at a disadvantage compared with the REP method because it can only select a tree from the set of subtrees {T0, T1, T2, ..., TL}, rather than from the set of all possible subtrees. To tackle this weakness, we modify the CCP technique in a more rational way, ensuring that appropriate subtrees are chosen for pruning. In this study, we adopt a two-phase approach. First, we survey the subtrees that have relatively good accuracy performance for pruning; the purpose of this stage is to quickly prune subtrees that are unimportant to model performance. In the second phase, we use cost-complexity pruning to choose the weakest link of the resulting tree to prune by calculating ccp_alpha. This calculation is repeated until we find the most optimal value of alpha that gives the maximum accuracy. The cost-complexity pruning technique is discussed in detail in our report.
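  The second phase described above (searching the cost-complexity pruning path for the alpha that maximizes accuracy) can be sketched with scikit-learn as below. This is a simplified stand-in for the thesis's two-phase procedure: the first, coarse pruning phase is only approximated here by a max_depth restriction, and the data is synthetic.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and an 80/20 stratified split
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Phase 1 (approximated): limit the tree coarsely before the alpha search
base = DecisionTreeClassifier(random_state=0, max_depth=20)
path = base.cost_complexity_pruning_path(X_train, y_train)

# Phase 2: walk the pruning path and keep the alpha with the best test accuracy
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, max_depth=20, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    acc = tree.score(X_test, y_test)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
print("best ccp_alpha:", best_alpha, "test accuracy:", round(best_acc, 3))
```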
  20. In the proposed solution, recursive feature elimination with cross-validation (RFECV) using Random Forest is implemented to choose the features of the model that provide the maximum potential for prediction. After that, a two-phase cost-complexity pruning technique is performed to prune the tree and reduce overfitting. The results are then validated across different training sizes of the dataset.
  21. As mentioned earlier, RFECV using Random Forest is used to determine the features yielding the maximum prediction performance. We first make an instance of the Random Forest algorithm. Next, we create an instance of RFECV with step = 1 (removing one feature at each iteration), cv = StratifiedKFold(5) (5-fold stratified cross-validation), and accuracy as the score metric to optimize. As a result, we obtain the optimal number of features: 11. After that, we plot the accuracy score obtained with every number of features used; Figure 7.1 illustrates that 11 features give accuracy scores above 91%. Next, we draw a bar chart of feature importances for better visualization; Figure 7.2 shows these results. It is clear that the feature that contributes most significantly is the difference between the day the order was raised and the day it was delivered to the warehouse, followed by… Table 7.1 illustrates the prediction scores of the DT model with default parameters after feature reselection. The results show that the dataset of 7 features allows the tree algorithm to achieve a marked improvement in prediction performance compared to the initial model, while that of 6 features is clearly an overfitting case.
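  A minimal sketch of the RFECV step as configured in the note above (step=1, stratified 5-fold CV, accuracy scoring, Random Forest as the estimator); the synthetic arrays stand in for the full preprocessed dataset.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the full candidate feature set
rng = np.random.default_rng(0)
X = rng.random((300, 16))
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)

rfecv = RFECV(estimator=RandomForestClassifier(random_state=0),
              step=1,                          # remove one feature per iteration
              cv=StratifiedKFold(5),
              scoring="accuracy")
rfecv.fit(X, y)
print("Optimal number of features:", rfecv.n_features_)
print("Feature ranking:", rfecv.ranking_)      # 1 marks a selected feature
```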
  22. We found that after pruning, the tree performed best, with an accuracy of 84%, when ccp_alpha = 0.0001. Increasing alpha beyond that, the performance started to decline, indicating a trade-off between predictive performance, the CCP alpha, and overfitting: a lower difference between training and test accuracy requires a higher alpha, which means a simpler, more heavily pruned tree. Compared to the initial tree, we succeeded in pruning the trees to reduce overfitting by 2% while keeping a relatively good accuracy of 84%. Finally, we visualize the tree with ccp_alpha = 0.0001, given in Figure 7.3. The same process applies to the dataset of 7 features. The tree in this case performed best with ccp_alpha in the vicinity of 0.0004, and it also showed a 1% reduction in overfitting, which remained nearly stable as the value of alpha varied in the range 0.0003-0.0007. It is clear that the prediction scores of the 7-feature dataset far exceed those of the 8-feature dataset, reaching a peak of almost 98% for almost all metric scores.
  23. In this part, we first summarize the results of the initial models and the proposed two-phase CCP solution; Table 7.3 compares these results. The results show that the 2-phase CCP pruned decision tree, after reselecting features using RFECV, achieves a balanced and marked improvement in all metric scores compared to the other three algorithms, including the Decision Trees and SVM of the key reference. The model correctly predicts almost 98% of late deliveries, improving accuracy by 11.6% compared to the initial decision tree algorithm. A small reduction in overfitting is also achieved for both the 8- and 7-feature sets with the 2-phase CCP DT, at the expense of the CCP alpha values.
  24. Next, we take a variety of training sizes between 0.55 and 0.9 to validate the results of the initial models and the proposed solution model. We run 15 trials and report the average results in Table 7.4. As can be seen, Decision Trees without restriction achieve better results than SVM while performing slightly worse than Random Forest in all metric scores. The two-phase CCP Decision Tree with the 7-feature dataset, on the other hand, outperforms the other models with the highest figures, at around 98%. It shows a significant improvement compared to the Decision Trees or SVM of the key reference, with 32% higher MCC, 19% higher average precision, and 13% higher accuracy, as well as higher scores on the other metrics.
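  A hedged sketch of the validation procedure described above: repeating trials with training sizes spread between 0.55 and 0.9 and averaging the test scores. The single decision-tree classifier, the F1 metric, and the synthetic data are placeholders; the thesis compares all five models across the full metric set.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Synthetic stand-in for the 7-feature dataset
rng = np.random.default_rng(0)
X = rng.random((1000, 7))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

scores = []
for trial in range(15):
    size = 0.55 + 0.35 * trial / 14            # training sizes between 0.55 and 0.9
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=size, stratify=y, random_state=trial)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    scores.append(f1_score(y_te, model.predict(X_te)))
print("Average F1 over 15 trials:", round(float(np.mean(scores)), 3))
```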
  25. In this part, the social, economic, and environmental impacts of the supply chain risk prediction models are considered. First, the use of predictive models may benefit researchers, enterprises, and society as a whole. Machine learning algorithms such as Decision Trees allow supply chain practitioners to predict which deliveries could be delayed or short in quantity, so that they can be more active in identifying the causes of these risks and minimizing their impacts. Furthermore, businesses can use these techniques to better manage supplier performance, in which on-time delivery can be considered an important KPI. Thanks to the application of machine learning predictive models, governments can make well-informed decisions on how to prioritize expenditure on this area of research, taking into consideration the needs of society, both present and future. From an economic perspective, effective prediction of supply risks during delivery can help E-commerce enablers such as Onpoint operate more smoothly and maximize profits: if companies succeed in forecasting the risk of delayed deliveries, or the SKUs likely to be in short supply, they can act promptly and take a flexible approach in their sales campaign plans to fulfill customers' orders, which helps avoid drops in sales. It can also save businesses a great deal of money on second delivery attempts. In addition, plenty of evidence suggests that predicting supply chain risks using machine learning could make a positive impact on the environment. Managing the risks of missed deliveries can reduce the rate of second deliveries, which in turn reduces the amount of carbon dioxide emitted by vehicles. A study conducted by Heriot-Watt University has shown that if a van is out for delivery and half of the items in the van fail to be delivered, the carbon footprint of the van is 50% higher than if all the deliveries were successful; the study assumed the van attempts a second delivery in the same working shift, increasing its mileage and emissions. This is also the case for client suppliers of Onpoint such as DHL, when deliveries fail to be accepted by E-commerce warehouses due to delay, leading to subsequent delivery attempts; these deliveries contribute significant amounts of carbon emissions. By applying the supply chain risk prediction framework in this study, the number of missed deliveries can diminish, and so does the impact on the environment.
  26. In this study, we presented a risk prediction model based on three algorithms, Support Vector Machines, Decision Trees, and Random Forest, using different evaluation measures, from accuracy, recall, and precision to balanced metrics such as average precision, F1 score, MCC, and AUC, to predict supply risks, especially the risks related to delayed deliveries. The thesis also focuses on proposing a new approach to enhance the decision tree model's performance and tackle its overfitting weakness: a feature selection method based on RFECV, along with a pruning technique, is suggested to both improve predictive performance and reduce overfitting. The applicability of the model is then demonstrated through a real-world case study of an E-commerce enabler. The results show the advantage of the proposed RFECV feature selection in improving predictive performance and of the 2-phase cost-complexity pruning approach in solving the overfitting problem specifically and tree-based algorithm problems in general. On average, the proposed approach is able to predict 98% of late deliveries correctly, improving predictive performance by over 13% compared to the initial decision tree. Prioritizing performance over overfitting requires a compromise in terms of the CCP alpha values. Future research directions on the application of machine learning in predicting supply risks include (a) considering larger datasets and a wider variety of AI techniques, including, for example, gradient boosting or deep learning; (b) investigating pruning techniques for Random Forest to tackle the problem of overfitting; and (c) studying whether similar approaches can be applied to other supply chain risks, such as risks of quantity-short deliveries or risks of product returns.