Predicting Customer’s Next Orders
Erdi Güngör
Content
• Define Objective: Business objective
• Prepare Data: Data preparation, Feature creation
• Algorithms: Classification, Predictive models
• Interpret Results
Objective
• The online shopping mall problem
Make it easy for customers to fill their baskets with their personal favorites.
What will be in the next market basket?
Objective
• Supervised Learning
• Unsupervised Learning (Apriori, Collaborative Filtering)
• Deep Learning Approach
• Markov Chain
Prepare Data / Source Data
• Open source data
• Aisle: 135 records
• Department: 21 records
• Order Products Prior: 1048576 records
• Order Products Train: 1048576 records
• Orders: 3421083 records
• Products: 49689 records
Prepare Data / Base Data
• Data Preparation
• Train set: 13863746 - 23
• Test set: 4833292 - 24

Order_eval_set  User  Order  Order_2  Order_2_eval_set  Products_2  Reordered
Test            1234  4      1        Prior             A           0
Test            1234  4      1        Prior             B           0
Test            1234  4      1        Prior             C           0
Test            1234  4      2        Prior             D           0
Test            1234  4      2        Prior             B           1
Test            1234  4      2        Prior             E           0
Test            1234  4      3        Prior             F           0
Test            1234  4      3        Prior             G           1
Test            1234  4      3        Prior             E           1

Order_eval_set  User  Order  Products  Reordered
Prior           5678  10     X         0
Prior           5678  10     Y         0
Prior           5678  10     Z         0
Prior           5678  11     W         0
Prior           5678  11     Q         0
Prior           5678  11     Z         1
Train           5678  12     W         1
Train           5678  12     Q         1
Train           5678  12     Z         1
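
The first table above appears to pair a target (Test) order with every prior order of the same user and the products in those prior orders. A minimal sketch of such a join in plain Python follows. The field names, the `build_base_rows` helper and the miniature input tables are assumptions made for illustration; they are not taken from the original pipeline.

```python
# Hypothetical miniature of the orders table (one row per order).
orders = [
    {"order": 1, "user": 1234, "eval_set": "Prior"},
    {"order": 2, "user": 1234, "eval_set": "Prior"},
    {"order": 3, "user": 1234, "eval_set": "Prior"},
    {"order": 4, "user": 1234, "eval_set": "Test"},
]

# Hypothetical miniature of the order-products table (one row per basket item).
order_products = [
    {"order": 1, "product": "A", "reordered": 0},
    {"order": 1, "product": "B", "reordered": 0},
    {"order": 1, "product": "C", "reordered": 0},
    {"order": 2, "product": "D", "reordered": 0},
    {"order": 2, "product": "B", "reordered": 1},
    {"order": 2, "product": "E", "reordered": 0},
    {"order": 3, "product": "F", "reordered": 0},
    {"order": 3, "product": "G", "reordered": 1},
    {"order": 3, "product": "E", "reordered": 1},
]


def build_base_rows(orders, order_products, target_set="Test"):
    """Pair each target order with the items of the same user's prior orders."""
    # Index basket items by order so each prior order is looked up once.
    items_by_order = {}
    for item in order_products:
        items_by_order.setdefault(item["order"], []).append(item)

    rows = []
    for target in orders:
        if target["eval_set"] != target_set:
            continue
        for prior in orders:
            same_user = prior["user"] == target["user"]
            if not same_user or prior["eval_set"] != "Prior":
                continue
            for item in items_by_order.get(prior["order"], []):
                rows.append((
                    target["eval_set"], target["user"], target["order"],
                    prior["order"], prior["eval_set"],
                    item["product"], item["reordered"],
                ))
    return rows


base_rows = build_base_rows(orders, order_products)
for row in base_rows:
    print(row)
```

With real data volumes the same join would normally be done with a DataFrame or SQL engine; the nested loops here only make the pairing logic explicit.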
Prepare Data / Features
Customer Based
• Total number of orders / products / unique products / unique departments / unique aisles / unique order days
• Sum / avg. of reordered products
• The customer period
• Avg. time between orders
• Rate of reordered products in total products
• The order count after the related product
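
A few of the customer-based features above can be sketched as simple aggregations. The input layout, the `customer_features` helper and the feature names below are hypothetical and only illustrate the idea; the original feature code is not shown in the slides.

```python
from collections import defaultdict

# Hypothetical order lines: (user, order_number, product, reordered).
order_lines = [
    (5678, 10, "X", 0), (5678, 10, "Y", 0), (5678, 10, "Z", 0),
    (5678, 11, "W", 0), (5678, 11, "Q", 0), (5678, 11, "Z", 1),
]

# Hypothetical days elapsed before each order (None for a first order).
days_since_prior = {(5678, 10): None, (5678, 11): 7}


def customer_features(order_lines, days_since_prior):
    """Aggregate per-customer counts, reorder rate and average order gap."""
    per_user = defaultdict(list)
    for user, order, product, reordered in order_lines:
        per_user[user].append((order, product, reordered))

    features = {}
    for user, lines in per_user.items():
        order_ids = {order for order, _, _ in lines}
        products = [product for _, product, _ in lines]
        reordered_sum = sum(flag for _, _, flag in lines)
        # Keep only the gaps that are actually known.
        gaps = [days_since_prior.get((user, order)) for order in order_ids]
        gaps = [gap for gap in gaps if gap is not None]
        features[user] = {
            "total_orders": len(order_ids),
            "total_products": len(products),
            "unique_products": len(set(products)),
            "reordered_sum": reordered_sum,
            "reordered_rate": reordered_sum / len(products),
            "avg_days_between_orders": sum(gaps) / len(gaps) if gaps else None,
        }
    return features


features = customer_features(order_lines, days_since_prior)
print(features[5678])
```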
Customer – Product Based
• Avg. sequence in market basket
• Sum of reordered information of product
• The max. / avg. hour to order product
• The min. / max. order number of product
• The most popular hour for product
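
The customer–product features can be sketched the same way, grouping by the (customer, product) pair. The tuple layout and the `customer_product_features` helper are assumptions for illustration; in particular, computing the most popular hour per pair rather than per product overall is a guess at what the slide intends.

```python
from collections import Counter, defaultdict

# Hypothetical order lines:
# (user, order_number, order_hour, product, cart_position, reordered)
order_lines = [
    (5678, 10, 9, "Z", 3, 0),
    (5678, 11, 14, "Z", 3, 1),
    (5678, 12, 9, "Z", 3, 1),
    (5678, 11, 14, "W", 1, 0),
    (5678, 12, 9, "W", 1, 1),
]


def customer_product_features(order_lines):
    """Aggregate basket position, reorders, hours and order numbers per pair."""
    grouped = defaultdict(list)
    for user, order, hour, product, position, reordered in order_lines:
        grouped[(user, product)].append((order, hour, position, reordered))

    features = {}
    for key, lines in grouped.items():
        order_numbers = [order for order, _, _, _ in lines]
        hours = [hour for _, hour, _, _ in lines]
        features[key] = {
            "avg_cart_position": sum(p for _, _, p, _ in lines) / len(lines),
            "reordered_sum": sum(r for _, _, _, r in lines),
            "max_order_hour": max(hours),
            "avg_order_hour": sum(hours) / len(hours),
            "min_order_number": min(order_numbers),
            "max_order_number": max(order_numbers),
            # Most frequent ordering hour for this customer-product pair.
            "most_popular_hour": Counter(hours).most_common(1)[0][0],
        }
    return features


pair_features = customer_product_features(order_lines)
print(pair_features[(5678, "Z")])
```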
Clustering Label
• K-means, 5 clusters
• Clustered with respect to the percentage usage of each sub-segment in the overall total
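
One way to read the clustering label: each customer is described by the share of purchases falling in each sub-segment, and K-means groups customers with similar share profiles. The sketch below is a plain Lloyd's algorithm written for illustration. The toy purchase counts, the `usage_shares` and `kmeans` helpers, and the use of 2 clusters instead of the 5 on the slide are all assumptions.

```python
import random


def usage_shares(counts):
    """Turn purchase counts per sub-segment into shares that sum to 1."""
    total = sum(counts)
    return tuple(count / total for count in counts)


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical purchase counts per customer across three sub-segments.
purchase_counts = [
    (18, 1, 1), (16, 2, 2), (17, 2, 1),   # mostly the first sub-segment
    (2, 2, 16), (1, 3, 16), (2, 1, 17),   # mostly the third sub-segment
]
points = [usage_shares(counts) for counts in purchase_counts]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The resulting cluster index per customer can then be attached to the feature table as a categorical label.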
Algorithms
• Decision Trees, Naive Bayes, Instance Based, Logistic Regression, Support Vector Machine, Regression
• Ensemble Learning: Random Forest, AdaBoost, Gradient Boosting
• Neural Networks
Algorithms
• Train-test split (k-fold cross-validation)
• Extreme Gradient Boosting (XGBoost)*
• Random Forest
• Logistic Regression
• Decision Tree
• Gradient Boosting
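
For reference, the k-fold split used to compare these models can be sketched as follows. This is a minimal illustration with contiguous, unshuffled folds; the `k_fold_indices` helper is hypothetical, and the slides do not say whether the data was shuffled or grouped by customer.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each model is trained on the train indices and scored on the test indices of every fold, and the k scores are averaged.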
Algorithms
Extreme Gradient Boosting (XGBoost)
• Implementation of gradient boosted decision trees
• Regularization
• Parallel processing
• High flexibility (all types of data)
• Handling of missing values and tree pruning
Algorithms
Extreme Gradient Boosting (XGBoost) Parameters
• Objective (reg:logistic): Learning objective
• Eval_metric (logloss): Evaluation metric, negative log-likelihood
• Eta (0.1): Step-size shrinkage applied to the feature weights to prevent overfitting
• Max_depth (7): Maximum depth of a tree
• Min_child_weight (9): Minimum sum of instance weight needed in a child
• Gamma (0.80): Minimum loss reduction required to make a further partition
• Subsample (0.76): Subsample ratio of the training instances
• Colsample_bytree (0.95): Subsample ratio of columns when constructing each tree
• Alpha (0): L1 regularization term on weights
• Lambda (1): L2 regularization term on weights
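
Written as a Python dictionary, the parameter values above might look like the sketch below. The keys use XGBoost's lowercase parameter names, and a dictionary of this shape is what `xgboost.train` takes as its first argument. The `log_loss` helper is a hand-written illustration of the negative log-likelihood metric, not XGBoost's own implementation.

```python
import math

# Parameter values as listed on the slide, keyed by XGBoost's lowercase names.
xgb_params = {
    "objective": "reg:logistic",
    "eval_metric": "logloss",
    "eta": 0.1,
    "max_depth": 7,
    "min_child_weight": 9,
    "gamma": 0.80,
    "subsample": 0.76,
    "colsample_bytree": 0.95,
    "alpha": 0,
    "lambda": 1,
}


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical labels and predictions, for illustration only.
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```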
Algorithms
Extreme Gradient Boosting (XGBoost)
• DMatrix: data structure
• Training the model
• Label prediction -> probabilistic values
• Threshold (0.20)
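
The last step, turning predicted probabilities into a predicted basket with the 0.20 threshold, can be sketched as below. The probabilities are made up for illustration, the `select_products` helper is hypothetical, and treating a value exactly at the threshold as selected is an assumption.

```python
def select_products(probabilities, threshold=0.20):
    """Keep the products whose predicted reorder probability reaches the threshold."""
    return [product for product, p in probabilities.items() if p >= threshold]


# Hypothetical model output for one customer's candidate products.
predicted = {"W": 0.62, "Q": 0.35, "Z": 0.81, "X": 0.07, "Y": 0.19}
next_basket = select_products(predicted)
print(next_basket)
```

A threshold well below 0.5 favors recall: more candidate products make it into the predicted basket, at the cost of more false positives.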
Interpret Results
• 5-fold cross-validation model performance: over 90% accuracy
THANK YOU
• Andrew Ng – Geoffrey Hinton
• https://www.kdnuggets.com/
• https://www.kaggle.com/
