THESIS DEFENSE
APPLICATIONS OF MACHINE LEARNING IN
PREDICTING SUPPLY RISKS – A CASE STUDY
OF AN E-COMMERCE ENABLER
Presenter: Nguyen Minh Tuan
Advisor: Dr. Nguyen Van Hop
1
Thesis Presentation
Outline
1. Problem Statement 03
2. Objectives, scope, limitations 04
3. Literature review 05
4. Methodology 06
5. Modeling & initial results 09
6. Proposed solution 19
7. Results validation 22
8. Conclusion & recommendations 28
Reference
2
1. Problem statement

Supply chain risk management
Identify, assess, mitigate and monitor unexpected events that negatively affect any part of the supply chain (Baryannis et al., 2018)

Potential risks in E-commerce supply chains
• Delayed deliveries => out of stock, drops in sales
• Differences in quantity => rejected goods => second delivery attempt

Applications of AI techniques in SCRM
Applying machine learning techniques at the SKU level could predict supply chain risks & mitigate their impacts
3
2. Objectives, scope & limitations
Objectives
• A risk prediction model using selected
metrics & ML algorithms (SVM, DT, RF)
in predicting supply chain risks
• A two-phase pruning approach for the Decision Tree model to tackle its main weakness, overfitting
• A demonstration of the model’s application
within a real-world E-commerce enabler.
Scope
• Focus on risks of delayed
deliveries
• Data: Group L’Oreal CPD
(Maybelline, L’Oreal Paris)
from May 2019 – Apr 2020
Limitations
• Data collection challenges
• Lack of historical data
4
3. Literature review
S. Ye, et al.
(2015)
Support vector machines (SVM) can be applied to determine whether a company displays a financial profile similar to those of companies that suffered disruptions in the past
A.
Bruzzone,
A. Orsoni
(2003)
Artificial Neural Networks (ANNs) are used to assess production losses
& calculate cost estimates for
different scenarios.
Cavalcante
et al.
(2019)
Supervised ML such as k-Nearest
Neighbors (k-NN) can be
advantageous in decision-making
process of resilient supplier
selection, leading to more
predictable delivery from suppliers
Baryannis
et al
(2019)
- Key
Reference
SVM and Decision Trees are used
to predict delivery delays in a
manufacturing supply chain. DT
achieves better interpretability than
SVM.
5
4. Methodology – Choosing ML algorithms

1. Support Vector Machines (SVM)
• High predictive accuracy, good generalization
• Poor interpretability, slow training time
2. Decision Trees (DT)
• High explanation ability, high training speed
• Complex trees, overfitting, poor generalization capabilities
3. Random Forest (RF)
• Low variance and bias
• Can reduce overfitting
• RF performs better than DT in most cases
=> Random Forest (Ensemble Learning method)

Table 4.1: Comparison of five learning algorithms (5: the best, 1: the worst) (Kotsiantis, 2007)
Factors | Decision Trees | Neural Network | Naïve Bayes | k-NN | SVM
Accuracy in general | 2 | 3 | 1 | 2 | 4
Speed of learning | 3 | 1 | 4 | 4 | 1
Speed of classification | 4 | 4 | 4 | 1 | 4
Tolerance to noise | 2 | 2 | 3 | 1 | 2
Tolerance to irrelevant attributes | 3 | 1 | 2 | 2 | 4
Dealing with risk of overfitting | 2 | 1 | 3 | 3 | 2
Explanation ability | 4 | 1 | 4 | 2 | 1
6
4. Methodology – Choosing metrics

Confusion matrix:
 | Actual: Positive | Actual: Negative
Predicted as Positive | True Positive (TP) | False Positive (FP)
Predicted as Negative | False Negative (FN) | True Negative (TN)

Accuracy $= \frac{TP + TN}{TP + TN + FP + FN}$
Precision $= \frac{TP}{TP + FP}$ | Recall $= \frac{TP}{TP + FN}$
=> might cause unjustified bias

Balanced metrics
• AUC (Area Under the Curve)
• F1 score $= 2 \cdot \frac{precision \cdot recall}{precision + recall}$
• Average precision (AP) $= \sum_n (R_n - R_{n-1}) \cdot P_n$
• MCC $= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
7
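A minimal sketch of how these metrics can be computed with scikit-learn (the library implied by the modelling slides); the labels, predictions and scores below are placeholder values for illustration, not thesis data.

```python
# Hedged sketch: computing the evaluation metrics above with scikit-learn.
# y_true, y_pred and y_score are placeholder values, not the thesis data.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, roc_auc_score,
                             average_precision_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # ground-truth labels (1 = late delivery)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hard predictions from a classifier
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]   # decision scores / probabilities

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))            # area under the ROC curve
print("AP       :", average_precision_score(y_true, y_score))  # average precision
```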
4. Methodology – Conceptual design framework
• Internal & External Data Collection: data is collected through internal & external sources (brand)
• Data Preprocessing: redundant variable removal, conversion to numerical form, checking outliers, normalization
• Feature Engineering/Selection: features are selected and extracted to keep the meaningful ones
• SVM, DT, RF: the three algorithms selected for the models, based on an analysis of the strengths/weaknesses of different ML methods and the key reference
• Improved solution development: a solution to improve performance and deal with the weaknesses of the initial results
• Results Validation: comparison of the initial models and the proposed solution based on metric scores & different training sizes
• Conclusion & Recommendation: recommend a machine learning technique and future research
8
5. Modelling – Data processing
~7,200 SKUs, 8-month period
• Product: barcode, sapcode, product name, purchase price, retail sale price
• Orders: purchase order number, platform purchase number, brand, platform
• Deliveries: requested quantity; requested amount; delivered quantity; delivered amount; actual quantity inbound at platforms; actual amount inbound at platforms; service level
Preprocessing flow: raw data (CSV format) => remove redundant variables => convert to numerical form => check outliers => normalization
9
5. Modelling – Feature Engineering and Selection
Goal: predict whether a delivery will be late or not
• 1: late deliveries (4 or more business days)
• 0: on-time deliveries (3 or fewer business days)
=> binary classification problem
Feature Engineering => Feature Selection:
• SelectKBest + statistical test (mutual information)
• t-test analysis based on p-values
=> 8-feature dataset & 5-feature dataset
10
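A minimal sketch of the SelectKBest step with the mutual-information score named on this slide; the synthetic data and generic feature names are stand-ins for the engineered delivery features, and k = 8 / k = 5 mirror the two datasets.

```python
# Hedged sketch of mutual-information feature selection with SelectKBest.
# The synthetic data stands in for the engineered delivery features.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X_arr, y = make_classification(n_samples=1000, n_features=12, n_informative=6,
                               random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(12)])

def select_top_features(X, y, k):
    """Keep the k features with the highest mutual information with the label."""
    selector = SelectKBest(score_func=mutual_info_classif, k=k)
    selector.fit(X, y)
    return X.loc[:, selector.get_support()]

X8 = select_top_features(X, y, k=8)   # analogue of the 8-feature dataset
X5 = select_top_features(X, y, k=5)   # analogue of the 5-feature dataset
print(list(X8.columns))
print(list(X5.columns))
```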
5. Modelling – Model generation
• Support Vector Machine: grid search => optimal parameters C and γ (penalty parameter and kernel coefficient)
• Decision Tree: Gini index, two experiments (unlimited tree, limited tree)
• Random Forest: run the model with default parameters => tune parameters
Data split: 80% training set – 20% test set, 5-fold cross-validation
11
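A minimal sketch of this setup: an 80/20 split and a 5-fold cross-validated grid search over the SVM parameters C and gamma. The synthetic data and the grid values are illustrative assumptions, not the exact data or grid used in the thesis.

```python
# Hedged sketch of the 80/20 split and 5-fold grid search over C and gamma.
# Synthetic data and grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1, 1]}  # assumed grid
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)

print("Best C and gamma:", grid.best_params_)
print("Test AUC:", grid.score(X_test, y_test))   # scored with the same AUC metric
```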
5. Modelling – Initial results – Support Vector Machine
Table 5.1 Prediction scores using SVM with 8 features
C | γ | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN | CT (min)
10^5 | 10^5 | 0.805 | 0.801 | 0.664 | 0.821 | 0.705 | 0.928 | 0.824 | 499 | 639 | 39 | 209 | 3.3
10^5 | 10^4 | 0.737 | 0.682 | 0.545 | 0.744 | 0.538 | 0.932 | 0.748 | 381 | 650 | 28 | 327 | 0.48
10^4 | 10^5 | 0.743 | 0.688 | 0.554 | 0.748 | 0.542 | 0.939 | 0.753 | 384 | 653 | 25 | 324 | 0.06
Figure 5.1 Indicative grid search results for SVM with 8 features: (a) Average Precision with 8 features; (b) F1 score with 8 features
12
5. Modelling – Initial results – Support Vector Machine
Table 5.2 Prediction scores using SVM with 5 features
C | γ | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN | CT (min)
10^5 | 10^5 | 0.751 | 0.814 | 0.613 | 0.807 | 0.828 | 0.801 | 0.806 | 586 | 532 | 146 | 122 | 1.60
10^5 | 10^4 | 0.766 | 0.812 | 0.626 | 0.812 | 0.792 | 0.832 | 0.813 | 561 | 565 | 113 | 147 | 0.08
10^4 | 10^5 | 0.769 | 0.825 | 0.640 | 0.820 | 0.826 | 0.823 | 0.820 | 585 | 552 | 126 | 123 | 0.06
Table 5.3 Best prediction scores using SVM with 8 and 5 features
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC
8 | 0.805 | 0.801 | 0.664 | 0.821 | 0.705 | 0.939 | 0.824
5 | 0.769 | 0.825 | 0.640 | 0.820 | 0.828 | 0.832 | 0.820
Computational time: 8-feature: 3.3 minutes; 5-feature: < 0.5 minutes
13
5. Modelling – Initial results – Decision Trees
Table 5.4 Prediction scores using unrestricted decision trees with default parameters
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.744 | 0.683 | 0.555 | 0.747 | 0.534 | 0.947 | 0.751 | 378 | 657 | 21 | 330
5 | 0.842 | 0.885 | 0.765 | 0.882 | 0.886 | 0.884 | 0.882 | 627 | 596 | 82 | 81
Tree size (unrestricted): 8-feature DT – maximum depth 51, total nodes 551; 5-feature DT – maximum depth 41, total nodes 1297
Tree size (restricted): 8-feature DT – maximum depth 6, total nodes 15; 5-feature DT – maximum depth 6, total nodes 15
14
5. Modelling – Initial results – Decision Trees
Figure 5.2 Decision Tree classifier using restricted parameters with 8 features
Table 5.5 Prediction scores using DT with max_depth = 6 and max_leaf_nodes = 15
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.623 | 0.419 | 0.347 | 0.616 | 0.271 | 0.923 | 0.624 | 192 | 662 | 16 | 516
5 | 0.610 | 0.581 | 0.299 | 0.639 | 0.490 | 0.714 | 0.643 | 611 | 281 | 430 | 124
15
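A minimal sketch of the restricted tree in Table 5.5, using the Gini criterion with max_depth=6 and max_leaf_nodes=15 as named on the slide; X_train, y_train, X_test, y_test are assumed to come from the earlier split sketch.

```python
# Hedged sketch of the restricted decision tree (max_depth=6, max_leaf_nodes=15).
# X_train, y_train, X_test, y_test come from the earlier split sketch.
from sklearn.tree import DecisionTreeClassifier, export_text

restricted_dt = DecisionTreeClassifier(criterion="gini", max_depth=6,
                                       max_leaf_nodes=15, random_state=0)
restricted_dt.fit(X_train, y_train)

print("Test accuracy:", restricted_dt.score(X_test, y_test))
print(export_text(restricted_dt))   # small, human-readable tree for interpretation
```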
5. Modelling – Initial results – Random Forest
Table 5.6 Prediction scores using RF with default parameters
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.767 | 0.709 | 0.592 | 0.766 | 0.559 | 0.968 | 0.770 | 396 | 665 | 13 | 312
5 | 0.829 | 0.875 | 0.744 | 0.872 | 0.877 | 0.873 | 0.872 | 621 | 588 | 90 | 87
Figure 5.3 AUC score of n_estimators using RF with 8 features
Figure 5.4 AUC score of n_estimators using RF with 5 features
16
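A minimal sketch of the kind of n_estimators sweep behind Figures 5.3/5.4: train a random forest for several tree counts and record the test AUC. The grid of tree counts is an illustrative assumption; X_train, y_train, X_test, y_test are assumed from the earlier split sketch.

```python
# Hedged sketch: sweep n_estimators and record the test AUC for each forest size.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

for n in [10, 50, 100, 200, 400]:                     # illustrative grid
    rf = RandomForestClassifier(n_estimators=n, random_state=0)
    rf.fit(X_train, y_train)
    auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    print(f"n_estimators={n}: test AUC={auc:.3f}")
```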
5. Modelling – Initial results – Random Forest
Table 5.7 Comparison of prediction scores between the base model and the model using parameter tuning with RF for 8 & 5 features
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
8 | 0.767 | 0.709 | 0.592 | 0.766 | 0.559 | 0.968 | 0.770 | 396 | 665 | 13 | 312
8 (initial) | 0.802 | 0.764 | 0.650 | 0.802 | 0.629 | 0.974 | 0.805 | 445 | 666 | 12 | 263
Improvement | 4.56% | 7.76% | 9.80% | 4.70% | 12.52% | 0.62% | 4.55%
5 | 0.829 | 0.875 | 0.744 | 0.872 | 0.877 | 0.873 | 0.872 | 621 | 588 | 90 | 87
5 (initial) | 0.843 | 0.887 | 0.768 | 0.884 | 0.890 | 0.884 | 0.884 | 630 | 595 | 83 | 78
Improvement | 1.69% | 1.37% | 3.23% | 1.38% | 1.48% | 1.26% | 1.38%
17
5. Modelling – Initial results summary and conclusion
Table 5.8 Summary of best prediction scores using SVM, DT, RDT, RF
Classifier | AP | F1 | MCC | Acc | Recall | Precision | AUC
SVM | 0.805 | 0.825 | 0.664 | 0.821 | 0.828 | 0.939 | 0.824
DT | 0.842 | 0.885 | 0.765 | 0.882 | 0.886 | 0.947 | 0.882
RDT | 0.623 | 0.581 | 0.347 | 0.639 | 0.490 | 0.923 | 0.643
RF | 0.843 | 0.887 | 0.768 | 0.884 | 0.890 | 0.974 | 0.884
1. Result #1: Random Forest – RF outperformed the other algorithms; drawback: complexity
2. Result #2: Support Vector Machine – SVM achieves results comparable to DT; disadvantage: computational time
3. Result #3: Restricted Decision Trees – limiting the tree gives an informative, visualizable tree that is easy to interpret, at the expense of predictive performance
4. Result #4: Feature dataset – the 5-feature dataset outperforms the 8-feature dataset for SVM and RF; DT results fluctuate, leaving open which features give the maximum prediction performance
18
6. Proposed solution – RFECV using RF (improved feature selection)
• Recursive feature elimination (Guyon et al., 2002): build a model on the set of predictors and calculate an importance score for each predictor => the least essential features are cut => recursively repeated to reach the final selected features
• Cross-validation: train models on subsets of the available input data and evaluate them on the complementary subsets; k-fold cross-validation, k = number of splits (5 or 10)
• RFECV using Random Forest (Kuhn and Johnson, 2019):
1. Random Forest, as an ensemble model, does not exclude variables from the prediction equation
2. It provides a measure of feature importance => feature rankings => find the features with the best predictive performance
The initial approach used the SelectKBest class with the mutual information metric & statistical analysis based on p-values => 8-feature & 5-feature datasets => are these the most optimal features?
19
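A minimal sketch of RFECV with a random-forest estimator; the forest size, step, fold count and scoring metric are assumptions for illustration, and the exact settings in the thesis may differ. X_train, y_train are assumed from the earlier split sketch.

```python
# Hedged sketch of recursive feature elimination with cross-validation (RFECV)
# using a random forest to rank features by importance.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

rfecv = RFECV(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
              step=1, cv=5, scoring="accuracy")
rfecv.fit(X_train, y_train)

print("Optimal number of features:", rfecv.n_features_)
print("Selected feature mask:", rfecv.support_)   # boolean mask over the columns
print("Feature rankings:", rfecv.ranking_)        # 1 = selected feature
X_train_selected = rfecv.transform(X_train)       # reduced training matrix
```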
6. Proposed solution – Two-phase cost-complexity pruning
Table 6.1 Evaluation of Decision Tree pruning techniques
Pruning technique | Advantages | Disadvantages
Reduced Error Pruning (REP) (Quinlan, 1999) | Linear computational complexity; good accuracy | Bias towards over-pruning
Pessimistic Error Pruning (PEP) (Quinlan, 1999) | High run speed; does not require a test data set | Does not produce a selection of trees; each node will still be visited once
Cost-Complexity Pruning (CCP) (Breiman et al., 1984) | High accuracy when the best tree is selected; produces a selection of trees for experts to study | Can only choose a tree in the set of subtrees
Minimum Error Pruning (MEP) (Niblett and Bratko, 1991) | Can reach an ideal solution in theory | Assumption is seldom true; produces only a single tree; unstable results
TWO-PHASE CCP
Phase 1 – Survey: survey subtrees with relatively good accuracy performance for pruning
Phase 2 – CCP: use the CCP algorithm to choose the weakest link of the resultant tree to prune, by seeking the most optimal ccp_alpha
20
6. Proposed solution framework
Figure 6.1: Flowchart of the proposed solution
Decision Tree => RFECV using RF => 2-phase CCP pruning => Results Validation
• RFECV using RF: recursive feature elimination with cross-validation (RFECV) using Random Forest => choose the features with the maximum potential
• 2-phase CCP pruning: 1) search the subtrees with a relatively good accuracy performance in terms of max depth and max leaf nodes of the tree; 2) find the tree with the best predictive accuracy by seeking the most optimal value of ccp_alpha
21
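A minimal sketch of the two-phase pruning idea in Figure 6.1, not the thesis code: Phase 1 surveys max_depth / max_leaf_nodes combinations with reasonable accuracy, and Phase 2 searches the cost-complexity pruning path of the surveyed tree for the most accurate ccp_alpha. The grids are illustrative assumptions; X_train, y_train, X_test, y_test are assumed from the earlier split sketch.

```python
# Hedged sketch of two-phase cost-complexity pruning for a decision tree.
from sklearn.tree import DecisionTreeClassifier

# Phase 1: survey subtrees with relatively good accuracy (illustrative grid).
best_cfg, best_acc = (None, None), 0.0
for depth in [10, 20, 40, None]:
    for leaves in [50, 200, 1000, None]:
        dt = DecisionTreeClassifier(max_depth=depth, max_leaf_nodes=leaves,
                                    random_state=0).fit(X_train, y_train)
        acc = dt.score(X_test, y_test)
        if acc > best_acc:
            best_cfg, best_acc = (depth, leaves), acc

# Phase 2: prune the surveyed tree by seeking the most accurate ccp_alpha
# along its cost-complexity pruning path.
survey_tree = DecisionTreeClassifier(max_depth=best_cfg[0],
                                     max_leaf_nodes=best_cfg[1], random_state=0)
alphas = survey_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas
best_alpha, best_pruned_acc = 0.0, 0.0
for a in alphas:
    a = max(float(a), 0.0)   # guard against tiny negative alphas from float error
    pruned = DecisionTreeClassifier(max_depth=best_cfg[0],
                                    max_leaf_nodes=best_cfg[1],
                                    ccp_alpha=a, random_state=0)
    acc = pruned.fit(X_train, y_train).score(X_test, y_test)
    if acc > best_pruned_acc:
        best_alpha, best_pruned_acc = a, acc

print("Most optimal ccp_alpha:", best_alpha, "| test accuracy:", best_pruned_acc)
```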
7. Results – Improved feature selection solution
Optimal number of features: 9
Figure 7.1 Accuracy score of all features using RFECV with Random Forest
Figure 7.2 Feature importances using RFECV with Random Forest
Table 7.1 Prediction scores of DT model after using RFECV with RF
Feature | AP | F1 | MCC | Acc | Recall | Precision | AUC | TP | TN | FP | FN
9 | 0.830 | 0.871 | 0.739 | 0.860 | 0.860 | 0.881 | 0.870 | 609 | 596 | 82 | 99
8 | 0.972 | 0.982 | 0.962 | 0.981 | 0.983 | 0.980 | 0.981 | 696 | 664 | 14 | 12
7 | 0.974 | 0.982 | 0.964 | 0.982 | 0.983 | 0.982 | 0.982 | 696 | 665 | 13 | 12
6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 708 | 678 | 0 | 0
22
7. Results – 2-phase cost-complexity pruning (CCP)

Table 7.2: Comparison of ccp_alpha of 9-feature DT after 2-phase CCP
 | Initial tree | alpha = 0.0001 | 0.00015 | 0.0002 | 0.00025 | 0.0003
Training accuracy | 0.969 | 0.911 | 0.909 | 0.903 | 0.900 | 0.893
Test accuracy | 0.869 | 0.838 | 0.839 | 0.842 | 0.841 | 0.833
Difference (overfitting) | 9.95% | 7.29% | 6.96% | 6.10% | 5.97% | 5.93%
Max depth | 62 | 46 | 46 | 46 | 46 | 46
Total nodes | 1649 | 983 | 935 | 821 | 775 | 699

Table 7.2: Comparison of ccp_alpha of 7-feature DT after 2-phase CCP
 | Initial tree | alpha = 0.0005 | 0.0010 | 0.0015 | 0.002 | 0.0025
Training accuracy | 1.000 | 0.996 | 0.995 | 0.994 | 0.991 | 0.988
Test accuracy | 0.982 | 0.981 | 0.981 | 0.982 | 0.978 | 0.973
Difference (overfitting) | 1.80% | 1.44% | 1.42% | 1.18% | 1.30% | 1.55%
Max depth | 12 | 12 | 12 | 11 | 10 | 10
Total nodes | 211 | 119 | 115 | 103 | 89 | 83

Figure 7.3: 9-feature Decision Tree after 2-phase cost-complexity pruning
Figure 7.4: 7-feature Decision Tree after 2-phase cost-complexity pruning
23
7. Results – Summary results
Table 7.3 Prediction scores of the initial models and the proposed solution model
Classifier | AP | F1 | MCC | Acc | Recall | Precision | AUC
Initial results:
SVM (key reference) | 0.805 | 0.825 | 0.664 | 0.821 | 0.828 | 0.939 | 0.824
DT (key reference) | 0.842 | 0.885 | 0.765 | 0.882 | 0.886 | 0.947 | 0.882
RDT (key reference) | 0.623 | 0.581 | 0.347 | 0.639 | 0.490 | 0.923 | 0.643
RF | 0.843 | 0.887 | 0.768 | 0.884 | 0.890 | 0.974 | 0.884
Improved solution:
2-phase CCP DT | 0.974 | 0.982 | 0.964 | 0.982 | 0.983 | 0.982 | 0.982
24
7. Results – Validation
15 trials
Table 7.4: Prediction scores of initial models and 2-phase CCP DT on 15 trials (total average)
Classifier | AP | F1 | MCC | Accuracy | Recall | Precision | AUC
SVM | 0.804 | 0.820 | 0.662 | 0.821 | 0.842 | 0.922 | 0.823
DT | 0.838 | 0.880 | 0.756 | 0.870 | 0.875 | 0.935 | 0.878
RDT | 0.624 | 0.588 | 0.349 | 0.640 | 0.506 | 0.919 | 0.644
RF | 0.844 | 0.886 | 0.767 | 0.883 | 0.884 | 0.887 | 0.883
2-phase CCP DT | 0.980 | 0.986 | 0.971 | 0.986 | 0.985 | 0.986 | 0.986
25
7. Results – Features sensitivity analysis
Figure content (three charts over 7–16 features): accuracy of the original DT vs the 2-phase CCP DT; total nodes of the original DT vs the 2-phase CCP DT; and the most optimal CCP alpha for each number of features.
26
7. Results – Social, economic and environmental impacts
Social impacts
• Helping companies
maximize profits
• Reducing costs of second
delivery attempts
• Making predictions of
deliveries
• Managing supplier
performance
• Allowing governments to
allocate expenditure on this
research
Economic impacts Environmental impacts
• Reducing carbon dioxide
emissions due to second
deliveries
• Diminishing missed
deliveries can alleviate
environmental impacts
27
8. Conclusions & Recommendations
• A risk prediction model: 3 algorithms (SVM, DT, RF) with different evaluation measures to predict supply risks, focusing on risks of delayed deliveries
• A solution to improve feature selection using RFECV and a two-phase CCP pruning technique
• A case study of an E-commerce enabler
• Results: the proposed solution correctly predicts 98% of late deliveries, a 12% improvement over the initial models. Prioritizing performance over overfitting requires a compromise on CCP alpha values.
• Considering larger datasets and more AI techniques (for example, gradient boosting or deep learning)
• Investigating pruning techniques for Random Forest
• Studying other supply chain risks, such as risks of short-quantity deliveries or product returns
28
References
1. Baryannis, G.; Validi, S.; Dani, S.; Antoniou, G. Supply chain risk management and artificial intelligence: state of the art and future research directions. 2018.
2. Bruzzone, A.; Orsoni, A. AI and simulation-based techniques for the assessment of supply chain logistic performance. In: 36th Annual Simulation Symposium, IEEE, Orlando, FL, USA, 2003, pp. 154–164.
3. Cavalcante, I.; Frazzon, E.; Forcellini, F.; Ivanov, D. A supervised machine learning approach to data-driven simulation of resilient supplier selection in digital manufacturing. 2019.
4. Ye, S.; Xiao, Z.; Zhu, G. Identification of supply chain disruptions with economic performance of firms using multi-category support vector machines. Int. J. Prod., 2015.
5. Baryannis, G.; Dani, S.; Antoniou, G. Predicting supply chain risks using machine learning: the trade-off between performance and interpretability. 2019.
6. Kotsiantis, S.B. "Supervised Machine Learning: A Review of Classification Techniques". Informatica 31 (2007), 249–268.
7. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. "Gene Selection for Cancer Classification Using Support Vector Machines". Machine Learning 46 (1) (2002), 389–422.
8. Kuhn, M.; Johnson, J. Feature Engineering and Selection: A Practical Approach for Predictive Models. 2019.
9. Quinlan, J. Simplifying decision trees. Int. J. Human-Computer Studies 51 (1999).
10. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees. Chapman and Hall, New York, 1984.
11. Cestnik, B.; Bratko, I. On Estimating Probabilities in Tree Pruning. EWSL, 1991, pp. 138–150.
29
Q&A Section
THANK YOU FOR
YOUR TIME!
30

Editor's Notes

  1. Thank you first of all for being here for my presentation.
  2. Supply chains, particularly in E-commerce businesses, have always been influenced by unpredictable risks that harm their operational activities, and researchers and supply chain practitioners have tried to minimize the effects of these events. So what is supply chain risk management? As stated in (Baryannis et al., 2018), supply chain risk management (SCRM) includes various strategies aiming to identify, assess, mitigate and monitor unexpected events or conditions which might have a negative impact on any part of a supply chain. In terms of E-commerce supply chains, one challenge E-commerce businesses face is the risk of delayed deliveries. Any delay in the delivery lead time causes drops in sales, because out-of-stock items are passed over by customers. Other potential risks occur when there is a difference between the characteristics or quantity of the goods in purchase orders and those actually delivered. This results in rejected and returned products, and it can take considerable time and effort from the relevant stakeholders to make another delivery. At the same time, the application of AI techniques to SCRM has received increasing interest. Applying machine learning techniques at the SKU level appears to be crucial for protecting the delivery of goods from disruptions by predicting supply chain risks and mitigating their adverse impacts.
  3. The key objective of this paper is to investigate a risk prediction model using selected metrics and ML algorithms (SVM, DT, RF) to predict supply chain risks. We also propose a two-phase pruning approach for the Decision Tree classifier, ensuring that overfitting of the model remains small. The models are then implemented in a case study with data from a real-world E-commerce company. Scope: this study focuses on risks of late delivery of goods. Data are collected at the SKU level from the group brand L'Oreal Consumer Product Division (Maybelline, L'Oreal Paris), delivered to two primary E-commerce platforms (Shopee, Lazada) from May 2019 to April 2020. Other risks, or risks from other brands, are not within the scope of this research. Limitations: data collection is one of the challenges, since data must be obtained from different sources before being used in machine learning algorithms. The dataset is also expected to be large in order to provide an accurate analysis, which is a limitation because there has been a lack of historical data from the chosen brand.
  4. - One of the earliest efforts to apply machine learning techniques to supply chain risk assessment comes from the use of Artificial Neural Networks (ANNs) by Bruzzone and Orsoni [14] with regard to production losses. Based on the training results, the ANNs learn how to map inputs to outputs, allowing cost estimates to be calculated for various scenarios. - Ye et al. (2015) identify supply chain disruptions by applying support vector machine techniques to the economic performance of listed firms in China. The resulting model is able to determine whether a particular company displays a financial profile equal to ones that suffered a disruption. - Cavalcante et al. (2019) introduce a supervised machine learning approach to resilient supplier selection in digital manufacturing. The results indicate that supervised machine learning algorithms such as k-NN can be advantageous in the decision-making process of resilient supplier selection. - The most recent study uses AI techniques in SCRM to predict delivery delays in a manufacturing supply chain. Two ML algorithms are selected: support vector machines and decision trees. The results show that DT delivers better interpretability than SVM, and giving interpretability priority over performance may require a level of compromise. => Key Reference
  5. In this research, the strengths and weaknesses of some well-known classification algorithms are considered for comparison, including DT, NN, NB, k-NN, and SVM. Kotsiantis (2007) compared these techniques based on important features (from the evidence of existing empirical and theoretical studies); the results are given in the table. We first choose Support Vector Machines, as an example of an algorithm with high predictive accuracy and good generalization performance. On the strength of high explanation ability and classification speed, Decision Trees are chosen, since classification trees have high training speed and are generally easily understood by non-experts. In order to overcome the limitations of the Decision Tree, Random Forest is used, since it is less reliant on a single tree. Random Forest is based on the bagging technique, one of the Ensemble Learning methods. An important feature of RF is its low variance and bias. Compared to decision trees, which often suffer from high variance, RF tends to address this problem effectively, because RF is essentially a collection of multiple decision trees whose results are aggregated into one final result. RF can also reduce overfitting, the error due to variance: it reduces variance by training the model on different samples and using a random subset of predictor variables. In most cases, Random Forest performs better than decision trees. => For these reasons, Random Forest is chosen as the third approach in this paper.
  6. Equally essential to the choice of algorithms is the choice of metrics. If a classification method is used, the prediction model can result in one of four outcomes: TP, FP, FN, TN. We use a standard metric like accuracy. Metrics that prioritize one outcome over another, such as precision or recall, are not recommended on their own, since they might cause unjustified bias. Other metrics that can balance between different outcomes are considered: F1 score, AP, MCC (Matthews correlation coefficient), and AUC (the closer it comes to 1, the better the model is).
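  The note above lists the evaluation metrics used throughout the thesis. As a minimal sketch (not the thesis code), the snippet below shows how these metrics could be computed with scikit-learn; the label and score arrays are placeholders standing in for real delivery data and model outputs.
```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef,
                             average_precision_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                    # actual: late (1) / on-time (0)
y_score = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.6, 0.3]   # predicted probabilities of "late"
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # thresholded class predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))                  # 2*P*R / (P+R)
print("MCC      :", matthews_corrcoef(y_true, y_pred))
print("AP       :", average_precision_score(y_true, y_score))  # sum_n (R_n - R_{n-1}) * P_n
print("AUC      :", roc_auc_score(y_true, y_score))
```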
  7. 1/ Data is collected through internal and external sources (from the brand). 2/ Data goes through preprocessing. 3/ In terms of feature engineering and selection, features are selected and extracted based on several methods to maintain a meaningful set of features. 4/ Based on the strengths and weaknesses of different classification machine learning techniques, Support Vector Machines, Decision Trees, and Random Forest are selected as the model's three algorithms. Then, new approaches can be implemented to improve their performance and overcome the weaknesses revealed by the initial results. 5/ A comparison of the initial models and the proposed solution model is made in terms of metric scores and different training sizes to validate the results. 6/ Finally, the paper recommends the machine learning techniques best able to predict supply risks, with an additional discussion on future research directions for the thesis.
  8. We settled on a dataset containing information on approximately 7,200 SKUs delivered in an 8-month period (September 2019 – April 2020). The data covers products, orders, and deliveries. There are a number of stages in pre-processing the data. First, raw data in CSV format were imported using the Pandas library. Next, we removed redundant variables (SAP code, product name, purchase order number). All data were also converted to numerical form. Whether the data were normalized depended on the algorithm (DT and RF did not require normalization, as they use rule-based approaches instead of distance calculations).
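  As a hedged illustration of the preprocessing steps described in this note (not the thesis code), the sketch below uses a tiny stand-in DataFrame with hypothetical column names; the real dataset and its schema are internal to the case study.
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Tiny stand-in frame; column names (sap_code, po_number, ...) are hypothetical
df = pd.DataFrame({
    "sap_code": ["A1", "A2", "A3"],
    "product_name": ["x", "y", "z"],
    "po_number": [101, 102, 103],
    "platform": ["Shopee", "Lazada", "Shopee"],
    "purchase_price": [10.0, 12.5, 9.0],
    "delivery_status": [0, 1, 0],                     # 1 = late, 0 = on time
})

# Remove redundant identifier variables
df = df.drop(columns=["sap_code", "product_name", "po_number"])

# Convert remaining text columns to numerical form
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

# Normalize only where the algorithm needs it (e.g. SVM); DT/RF are rule-based
X = df.drop(columns=["delivery_status"])
X_svm = MinMaxScaler().fit_transform(X)
```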
  9. Feature engineering: the chosen objective is to predict, for a supplier of the brand, whether a delivery will be late or not. We select the delivery status variable: 1 denotes deliveries that are 4 or more business days late and 0 denotes on-time deliveries (up to 3 business days late). Therefore, predicting whether a delivery was late or not is a binary classification problem. Feature selection: the SelectKBest class provided by the scikit-learn library was used with a well-known statistical measure, mutual information, which is calculated between two variables and measures the reduction in uncertainty of one variable given a known value of the other. This yields 8 variables: platform, purchase price, the actual amount of the order delivered, the day the purchase order was raised, the week the purchase order was raised, the day the order was delivered, the week the order was delivered, and transit time (the difference between the day the order was raised and the day it was delivered to the warehouse). Another feature selection method was t-test analysis, considering the p-value of each coefficient (a p-value < 0.05 shows that developing a model using this feature is significant). As a result, we obtained a dataset of 5 variables (purchase price, retail purchase price, the day the purchase order was raised, the week the purchase order was raised, and the week the order was delivered).
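  The note above describes filter-based feature selection with SelectKBest and mutual information. A minimal sketch of that step is shown below; the synthetic arrays stand in for the preprocessed feature matrix and the binary delivery-status target, and k=8 mirrors the 8-variable subset mentioned in the note.
```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic stand-in data: 200 samples, 16 candidate features, binary target
rng = np.random.default_rng(0)
X = rng.random((200, 16))
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)

# Keep the 8 features with the highest mutual information with the target
selector = SelectKBest(score_func=mutual_info_classif, k=8)
X_selected = selector.fit_transform(X, y)
print(selector.get_support(indices=True))   # indices of the retained features
```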
  10. The dataset is next split into training and test sets with a ratio of 80% and 20%, respectively. For the training process, stratified 5-fold cross-validation is used to ensure that each class is correctly represented in all folds (StratifiedKFold is a variation of k-fold that returns stratified folds: each set includes around the same percentage of samples of each target class as the complete set). In the case of SVM, a grid search is implemented to identify the optimal parameters C and γ (penalty parameter and kernel coefficient). In terms of DT, the Gini index is used to create split points. Two experiments are run: one with default parameters that allow an arbitrarily large tree, and one where the maximum depth of the tree and the maximum number of leaf nodes are limited to ensure that the tree does not grow too large. For Random Forest, we also first run the model with default parameters; parameter tuning is then implemented to see if we can improve the prediction scores.
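  As a rough sketch of the split and SVM grid search described above (not the thesis code), the example below uses a synthetic dataset in place of the real one; the C/γ grid and the average-precision scoring are illustrative choices.
```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data for the 8-feature set
rng = np.random.default_rng(0)
X = rng.random((500, 8))
y = (X[:, 0] > 0.5).astype(int)

# 80/20 split, stratified on the late/on-time target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Grid search over C and gamma with stratified 5-fold cross-validation
param_grid = {"C": [10**k for k in range(-2, 6)],
              "gamma": [10**k for k in range(-5, 6)]}
search = GridSearchCV(SVC(kernel="rbf"),
                      param_grid,
                      scoring="average_precision",
                      cv=StratifiedKFold(n_splits=5))
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```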
  11. Table 5.1 indicates the test scores using 8 features based on the optimal C and γ and their computation time (CT), while Figure 5.1 represents the results of the grid search process, using average precision and F1 score as metrics. For average precision, the highest test scores are obtained at C = 10^5 and γ = 10^5, with 86.3% precision, and the results fall steadily as the values of C or γ decrease. Results based on the F1 score share the same pattern.
  12. Results of prediction using 5 features are slightly better than those using 8 features. In terms of balanced metrics, the MCC score improves the most, by over 5%, followed by the F1 score with almost the same percentage, while there is not much difference between the AP scores of the two feature sets. The primary drawback of this model is the computational time required to find the optimal values, which takes almost 9 minutes. Finally, parameters C = 10^5 and γ = 10^3 lead to the best possible compromise between the different classifications with a reasonable computation time, at less than half a minute.
  13. In terms of DT, we first run the decision tree classifier in scikit-learn with default parameters. This classifier is an optimized version of the CART algorithm that uses the Gini impurity measure to determine the quality of each split in the tree, and the size and structure of the resulting tree are not limited. Table 5.4 illustrates prediction scores using default parameters with both feature sets. The results of the dataset with 5 features are slightly better than those with 8 features. However, the resulting trees are extremely large: for 8 features the tree has a maximum depth of 64 and 1,881 total nodes, whereas for 5 features the maximum depth is 41 and the total node count is 1,411. Since the model's explanation ability is limited when trees grow too large, we restrict the classifier to, for example, a maximum depth of 6 and a maximum of 15 leaf nodes, which leads to trees with 29 nodes in total.
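  The two decision-tree experiments described above (default parameters versus a restricted tree) could be sketched as follows; this is a minimal illustration on synthetic stand-in data, not the thesis code, and the restriction values mirror the ones quoted in the note.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and an 80/20 stratified split
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Experiment 1: default parameters, arbitrarily large tree
unrestricted = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)
print("Unrestricted tree:", unrestricted.tree_.node_count, "nodes, depth", unrestricted.get_depth())

# Experiment 2: tree limited to depth 6 and 15 leaf nodes for interpretability
restricted = DecisionTreeClassifier(criterion="gini", max_depth=6,
                                    max_leaf_nodes=15, random_state=0).fit(X_train, y_train)
print("Restricted tree :", restricted.tree_.node_count, "nodes, test accuracy",
      round(restricted.score(X_test, y_test), 3))
```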
  14. In the tree shown in Figure 5.2, the feature that contributes to the best split (X7) is the transit time (the difference between the day the order was raised and the day it was delivered to the warehouse). This results in around 50% of the training samples being classified as not late if the day the purchase order was raised is less than or equal to 3.5 (the value corresponding to 0.0 before normalization), meaning the order was raised on Monday or Tuesday; this is incorrect for 2,407 samples. The following splits require combinations of additional features. For example, if we continue to follow the left branch at the node where X3 ≤ 0.0, the right branch at the node X7 ≤ 0.0, and finally the left branch at X4 ≤ 0.0, then the delivery is predicted to be late, which is correct for 374 of the 390 samples. For decision trees restricted to a maximum depth of 6 and a maximum of 15 leaf nodes, results are generally worse than those of unrestricted decision trees for both 8 and 5 features. For the 8-feature set, only 18 samples are misclassified as late deliveries, leading to a precision of almost 93%. However, this comes at the expense of other metrics, with AP and accuracy falling to around 64% and marked fluctuations among the other scores. Results are summarized in Table 5.5.
  15. For RF, we first fit a random forest with default parameters to get a baseline idea of the model's performance. Table 5.6 shows prediction scores with both feature sets. The results are more balanced and better than those of SVM or DT, with only 90 samples misclassified as late deliveries and above 88% of late deliveries predicted correctly in both feature sets. Generally, we only have a vague idea of the good parameters, so we narrow our search by evaluating a wide range of values for each parameter. In this part, we visualize a number of important Random Forest parameters based on the AUC score to see how they impact our models in terms of overfitting and underfitting, thereby selecting the most optimized parameters to run the model. n_estimators is the number of trees in the forest; its default value is 100. Normally, the higher the number of trees, the better the model learns the data and the less likely it is to overfit. However, adding many trees can slow down the training process considerably, so we do a parameter search to find the optimal value (see the sketch after this note). Figures 4.4 and 4.5 show the AUC score against n_estimators for the datasets of 8 and 5 features, respectively. For the dataset of 8 features, we can stop at 150 trees, as increasing the number of trees decreases the test performance; with the dataset of 5 features, we can stop at 200 trees. We do the same for the other parameters.
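  A minimal sketch of the n_estimators sweep described above, using AUC on synthetic stand-in data; the candidate values are examples, not the thesis grid.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data and an 80/20 stratified split
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

for n in [50, 100, 150, 200, 300]:               # candidate numbers of trees
    rf = RandomForestClassifier(n_estimators=n, random_state=0).fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, rf.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
    # A large train/test gap hints at overfitting; pick n where test AUC plateaus
    print(n, round(auc_train, 3), round(auc_test, 3))
```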
  16. As can be observed, by tuning parameters, both feature sets see a steady improvement in all metric scores in comparison to the runs with default parameters, notably the model with 5 features. MCC achieves the most improvement (12%), followed by AP (7%) and the other metrics (5-6%). For the 8-feature set, only 69 samples are misclassified as late deliveries, leading to accuracy and precision scores of about 90%. This shows how proper parameter tuning can have a positive effect on the performance of the model. However, since the prediction performance of both datasets (8 and 5 features) increases less dramatically compared to the base model, we take a look at the earlier steps of the model to see if there is still room for performance improvement.
  17. To summarize, Table 5.8 sums up the best scores from each classifier, including Support Vector Machines (SVM), Decision Trees (DT), Restricted Decision Trees (RDT), and Random Forest (RF). Overall, it is clear that Random Forest outperforms the other algorithms in almost all metric scores, with precision of around 92% and accuracy and average precision of around 90%. However, a drawback of RF is its complexity, which results in slower training speed and lower interpretability than DT, because RF generates a forest of trees (unlike the single tree in the case of DT). Although SVM achieves comparable results to DT regarding recall and precision, balanced scores such as average precision, F1 score, and MCC are lower for SVM than for the DT model. One of the disadvantages of SVM lies in its computational time, which takes almost 9 minutes in the case of the 8-feature set. In terms of the Restricted Decision Tree, visualizing a decision tree with restricted parameters can deliver a good interpretation, whereas this is not the case with SVM. However, it comes at the expense of the model's predictive performance when we stop growing the tree early; therefore, we need an approach to prune the tree while still ensuring high predictive accuracy. Besides, the summary results also show that the 5-feature dataset yields better performance than the 8-feature dataset with the SVM algorithm, and the same holds for the RF algorithm. Decision Trees, on the other hand, show sharp fluctuations in the metric scores between the two datasets. Hence, it is not necessarily true that the fewer features the model has, the better results it can generate. => Therefore, we need a feature selection solution to find the features with better prediction results.
  18. Initially, we applied several feature selection techniques, including the SelectKBest class with the mutual information metric, recursive feature elimination using SVM, and statistical analysis based on p-values, to select the 8-feature and 5-feature datasets. However, these might not be the most optimal features, because irrelevant or partially relevant features can negatively impact model performance. Thus, in this section, we adopt a more pragmatic approach to feature selection. As noted earlier, recursive feature elimination (RFE, Guyon et al. (2002) [21]) is a backward selection of the predictors. This technique starts by building a model on the whole set of predictors and computing an importance score for each predictor. The least important features are then removed from the current set, and this process is recursively repeated on the pruned set until the desired number of selected features is reached. Cross-validation is a method for testing ML models by training several models on subsets of the available input data and evaluating them on the complementary subsets. The technique has a single parameter, k, indicating the number of groups into which a given data sample should be split, so the procedure is often known as k-fold cross-validation. Generally, k is chosen as 5 or 10, but there is no formal rule. Backward selection is frequently used with random forest models for two reasons [Kuhn, Johnson, 2019]. First, the random forest tends not to exclude variables from the prediction equation. This is due to the nature of model ensembles: improved performance in ensembles is associated with the variety of the constituent models, and averaging models that are effectively the same does not reduce the variation in the model predictions; for this reason, by using a random selection of predictors, the random forest forces the trees to contain sub-optimal splits of the predictors. The second reason random forest is used with RFE is that this model has a well-known internal method for measuring feature importance, which can be used with the first model fit within RFE, where the entire predictor set is used to compute the feature rankings. In this case, recursive feature elimination with cross-validation (RFECV) using Random Forest will be used to rank the features of the initial dataset and select the ones that give the best predictive performance.
  19. Overfitting happens when models fit noise or misleading points in the input distribution, harming generalization and predictive performance. To address this problem, researchers in the field have taken considerable interest in tree pruning. Cost-complexity pruning (CCP) is considered here, as it can deliver good results if the best trees are selected. The method is at a disadvantage compared with the REP method because it can only select a tree from the set of subtrees {T0, T1, T2, ..., TL}, rather than from the set of all possible subtrees. To tackle this weakness, we modify the CCP technique in a more rational way, ensuring that appropriate subtrees are chosen for pruning. In this study, we adopt a two-phase approach. First, we survey the subtrees that have relatively good accuracy performance for pruning; the purpose of this stage is to quickly prune subtrees that are unimportant to model performance. In the second phase, we use cost-complexity pruning to choose the weakest link of the resulting tree to prune by calculating ccp_alpha. This calculation is repeated until we find the most optimal value of alpha that gives the maximum accuracy. The cost-complexity pruning technique is discussed in detail in our report.
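  The second phase described above (searching the cost-complexity pruning path for the alpha that maximizes accuracy) can be sketched with scikit-learn as below. This is a simplified stand-in for the thesis's two-phase procedure: the first, coarse pruning phase is only approximated here by a max_depth restriction, and the data is synthetic.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and an 80/20 stratified split
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Phase 1 (approximated): limit the tree coarsely before the alpha search
base = DecisionTreeClassifier(random_state=0, max_depth=20)
path = base.cost_complexity_pruning_path(X_train, y_train)

# Phase 2: walk the pruning path and keep the alpha with the best test accuracy
best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, max_depth=20, ccp_alpha=alpha)
    tree.fit(X_train, y_train)
    acc = tree.score(X_test, y_test)
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
print("best ccp_alpha:", best_alpha, "test accuracy:", round(best_acc, 3))
```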
  20. In the proposed solution, recursive feature elimination with cross-validation (RFECV) using Random Forest is implemented to choose the features of the model that provide the maximum potential for prediction. After that, a two-phase cost-complexity pruning technique is performed to prune the tree and reduce overfitting. The results are then validated across different training sizes of the dataset.
  21. As mentioned earlier, RFECV using Random Forest is used to determine the features yielding the maximum prediction performance. We first make an instance of the Random Forest algorithm. Next, we create an instance of RFECV with step = 1 (removing one feature at each iteration), cv = StratifiedKFold(5) (5-fold stratified cross-validation), and accuracy as the score metric to optimize. As a result, we obtain the optimal number of features: 11. After that, we plot the accuracy score obtained with every number of features used; Figure 7.1 illustrates that 11 features give accuracy scores above 91%. Next, we draw a bar chart of feature importances for better visualization; Figure 7.2 shows these results. It is clear that the feature that contributes most significantly is the difference between the day the order was raised and the day it was delivered to the warehouse, followed by… Table 7.1 illustrates the prediction scores of the DT model with default parameters after feature reselection. The results show that the dataset of 7 features allows the tree algorithm to achieve a marked improvement in prediction performance compared to the initial model, while that of 6 features is clearly an overfitting case.
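  A minimal sketch of the RFECV step as configured in the note above (step=1, stratified 5-fold CV, accuracy scoring, Random Forest as the estimator); the synthetic arrays stand in for the full preprocessed dataset.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the full candidate feature set
rng = np.random.default_rng(0)
X = rng.random((300, 16))
y = (X[:, 0] + X[:, 5] > 1.0).astype(int)

rfecv = RFECV(estimator=RandomForestClassifier(random_state=0),
              step=1,                          # remove one feature per iteration
              cv=StratifiedKFold(5),
              scoring="accuracy")
rfecv.fit(X, y)
print("Optimal number of features:", rfecv.n_features_)
print("Feature ranking:", rfecv.ranking_)      # 1 marks a selected feature
```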
  22. We found that after pruning, the tree performed best, with an accuracy of 84%, when ccp_alpha = 0.0001. Increasing alpha beyond that, the performance started to decline, indicating a trade-off between predictive performance, the CCP alpha, and overfitting: a lower difference between training and test accuracy requires a higher alpha, which means a simpler, more heavily pruned tree. Compared to the initial tree, we succeeded in pruning the trees to reduce overfitting by 2% while keeping a relatively good accuracy of 84%. Finally, we visualize the tree with ccp_alpha = 0.0001, given in Figure 7.3. The same process applies to the dataset of 7 features. The tree in this case performed best with ccp_alpha in the vicinity of 0.0004, and it also showed a 1% reduction in overfitting, which remained nearly stable as the value of alpha varied in the range 0.0003-0.0007. It is clear that the prediction scores of the 7-feature dataset far exceed those of the 8-feature dataset, reaching a peak of almost 98% for almost all metric scores.
  23. In this part, we first summarize the results of the initial models and the proposed two-phase CCP solution; Table 7.3 compares these results. The results show that the 2-phase CCP pruned decision tree, after reselecting features using RFECV, achieves a balanced and marked improvement in all metric scores compared to the other three algorithms, including the Decision Trees and SVM of the key reference. The model correctly predicts almost 98% of late deliveries, improving accuracy by 11.6% compared to the initial decision tree algorithm. A small reduction in overfitting is also achieved for both the 8- and 7-feature sets with the 2-phase CCP DT, at the expense of the CCP alpha values.
  24. Next, we take a variety of training sizes between 0.55 and 0.9 to validate the results of the initial models and the proposed solution model. We run 15 trials and report the average results in Table 7.4. As can be seen, Decision Trees without restriction achieve better results than SVM while performing slightly worse than Random Forest in all metric scores. The two-phase CCP Decision Tree with the 7-feature dataset, on the other hand, outperforms the other models with the highest figures, at around 98%. It shows a significant improvement compared to the Decision Trees or SVM of the key reference, with 32% higher MCC, 19% higher average precision, and 13% higher accuracy, as well as higher scores on the other metrics.
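  A hedged sketch of the validation procedure described above: repeating trials with training sizes spread between 0.55 and 0.9 and averaging the test scores. The single decision-tree classifier, the F1 metric, and the synthetic data are placeholders; the thesis compares all five models across the full metric set.
```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Synthetic stand-in for the 7-feature dataset
rng = np.random.default_rng(0)
X = rng.random((1000, 7))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

scores = []
for trial in range(15):
    size = 0.55 + 0.35 * trial / 14            # training sizes between 0.55 and 0.9
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=size, stratify=y, random_state=trial)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    scores.append(f1_score(y_te, model.predict(X_te)))
print("Average F1 over 15 trials:", round(float(np.mean(scores)), 3))
```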
  25. In this part, the social, economic, and environmental impacts of the supply chain risk prediction models are considered. First, the use of predictive models may benefit researchers, enterprises, and society as a whole. Machine learning algorithms such as Decision Trees allow supply chain practitioners to predict which deliveries could be delayed or short in quantity, so that they can be more active in identifying the causes of these risks and minimizing their impacts. Furthermore, businesses can use these techniques to better manage supplier performance, in which on-time delivery can be considered an important KPI. Thanks to the application of machine learning predictive models, governments can make well-informed decisions on how to prioritize expenditure on this area of research, taking into consideration the needs of society, both present and future. From an economic perspective, effective prediction of supply risks during delivery can help E-commerce enablers such as Onpoint operate more smoothly and maximize profits: if companies succeed in forecasting the risk of delayed deliveries, or the SKUs likely to be in short supply, they can act promptly and take a flexible approach in their sales campaign plans to fulfill customers' orders, which helps avoid drops in sales. It can also save businesses a great deal of money on second delivery attempts. In addition, plenty of evidence suggests that predicting supply chain risks using machine learning could make a positive impact on the environment. Managing the risks of missed deliveries can reduce the rate of second deliveries, which in turn reduces the amount of carbon dioxide emitted by vehicles. A study conducted by Heriot-Watt University has shown that if a van is out for delivery and half of the items in the van fail to be delivered, the carbon footprint of the van is 50% higher than if all the deliveries were successful; the study assumed the van attempts a second delivery in the same working shift, increasing its mileage and emissions. This is also the case for client suppliers of Onpoint such as DHL, when deliveries fail to be accepted by E-commerce warehouses due to delay, leading to subsequent delivery attempts; these deliveries contribute significant amounts of carbon emissions. By applying the supply chain risk prediction framework in this study, the number of missed deliveries can diminish, and so does the impact on the environment.
  26. In this study, we presented a risk prediction model based on three algorithms, Support Vector Machines, Decision Trees, and Random Forest, using different evaluation measures, from accuracy, recall, and precision to balanced metrics such as average precision, F1 score, MCC, and AUC, to predict supply risks, especially the risks related to delayed deliveries. The thesis also focuses on proposing a new approach to enhance the decision tree model's performance and tackle its overfitting weakness: a feature selection method based on RFECV, along with a pruning technique, is suggested to both improve predictive performance and reduce overfitting. The applicability of the model is then demonstrated through a real-world case study of an E-commerce enabler. The results show the advantage of the proposed RFECV feature selection in improving predictive performance and of the 2-phase cost-complexity pruning approach in solving the overfitting problem specifically and tree-based algorithm problems in general. On average, the proposed approach is able to predict 98% of late deliveries correctly, improving predictive performance by over 13% compared to the initial decision tree. Prioritizing performance over overfitting requires a compromise in terms of the CCP alpha values. Future research directions on the application of machine learning in predicting supply risks include (a) considering larger datasets and a wider variety of AI techniques, including, for example, gradient boosting or deep learning; (b) investigating pruning techniques for Random Forest to tackle the problem of overfitting; and (c) studying whether similar approaches can be applied to other supply chain risks, such as risks of quantity-short deliveries or risks of product returns.