7. Step 1: Reading & Filtering
Take out:
• The non-adherent clients
• Clients who did not buy “Films Action et Policier” during the observation period or before, and whose eclecticism level is lower than 10
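A minimal sketch of the two row filters above, in plain Python. The field names (`adherent`, `bought_film`, `eclecticism`) and the sample records are assumptions for illustration, and the second rule is coded under one reading of the slide (both conditions must hold for a client to be dropped).

```python
# Hypothetical client records; every field name and value is an assumption.
clients = [
    {"id": 1, "adherent": True,  "bought_film": True,  "eclecticism": 4},
    {"id": 2, "adherent": False, "bought_film": True,  "eclecticism": 25},
    {"id": 3, "adherent": True,  "bought_film": False, "eclecticism": 6},
    {"id": 4, "adherent": True,  "bought_film": False, "eclecticism": 15},
]


def keep(client):
    """Return True when the client survives both filters."""
    # Rule 1: non-adherent clients are taken out.
    if not client["adherent"]:
        return False
    # Rule 2 (one reading of the slide): take out clients who never bought
    # the category AND whose eclecticism level is below 10.
    if not client["bought_film"] and client["eclecticism"] < 10:
        return False
    return True


filtered = [c for c in clients if keep(c)]
kept_ids = [c["id"] for c in filtered]
```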
Convert the target variables from ‘Int’ to ‘String’ so they can be used later in the DT
Take out variables with low variance (<1), since low variance means less differentiation
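The low-variance filter can be sketched as follows. The column names and values are made up, and the use of the sample variance (rather than the population variance or a variance on normalized values) is an assumption, since the slide does not say which one was used.

```python
from statistics import variance

# Hypothetical feature columns; names and values are made up.
columns = {
    "nb_orders": [1, 5, 9, 2, 12],
    "store_visits": [2, 2, 3, 2, 2],
    "basket_size": [3, 3, 4, 3, 3],
}

VARIANCE_THRESHOLD = 1.0  # from the slide: variance < 1 counts as "low"


def drop_low_variance(cols, threshold=VARIANCE_THRESHOLD):
    """Keep only the columns whose sample variance reaches the threshold."""
    return {
        name: values
        for name, values in cols.items()
        if variance(values) >= threshold
    }


kept = drop_low_variance(columns)
```

With these sample values only `nb_orders` varies enough to be kept.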
Compute the correlation between the variables, then take out variables with high correlation (>0.7), since two highly correlated variables give more or less the same information (redundant)
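A possible sketch of the correlation filter. Assumptions: Pearson correlation, the threshold applied to the absolute value, and a greedy rule that keeps the first column of each correlated pair. The column names and values are made up, and constant columns are assumed to have been removed already (they would cause a division by zero here).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def drop_correlated(cols, threshold=0.7):
    """Greedy filter: keep a column only if its absolute correlation with
    every column kept so far does not exceed the threshold."""
    kept = {}
    for name, values in cols.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return kept


# Hypothetical feature columns; names and values are made up.
columns = {
    "nb_orders": [1, 5, 9, 2, 12],
    "total_spent": [10, 52, 88, 19, 125],  # nearly proportional to nb_orders
    "days_since_last": [30, 2, 45, 7, 20],
}
kept = drop_correlated(columns)
```

Which column of a correlated pair gets dropped depends on the iteration order, so the order of the input columns matters in this sketch.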
11. Step 2: Partitioning, Modelling, Testing
We decided to use a Tree Ensemble to create a relevant model because:
• It can handle categorical features, which is the case here (binary)
• It can deal with high-dimensional spaces and large numbers of training examples, which is also the case here
• It does not require as much CPU capacity as other models
• The results with this model are quite satisfying
12. Step 2: Partitioning, Modelling, Testing
The Learner creates the model from the 80% sample
• The input table is split into two partitions: 80%/20%
• The first 80% of the input goes into the Tree Ensemble Learner (creates the model) and the 20% into the Predictor (to test the model)
The Predictor tests the model on the 20% and shows the propensity of interest for each client
We import the Webshop_Test file to test the model on the test data
We join the “Manga” and “Film_Action et Policier” data
We test the model computed in the Learner on the “Webshop_Test” file
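The 80%/20% partitioning described on this slide can be sketched in plain Python. The function name `partition`, the fixed seed and the optional shuffle are assumptions; the slide does not say whether the rows were drawn randomly or taken from the top of the table.

```python
import random


def partition(rows, train_fraction=0.8, shuffle=True, seed=42):
    """Split rows into (train, test); optionally shuffle a copy first."""
    ordered = list(rows)
    if shuffle:
        # Seeded generator so the split is reproducible.
        random.Random(seed).shuffle(ordered)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


rows = list(range(100))  # stand-in for 100 client rows
train, test = partition(rows)                          # random 80/20 split
top_train, top_test = partition(rows, shuffle=False)   # take-from-top split
```

The 80% part would feed the learner and the 20% part the predictor; the model itself is not reproduced here.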
16. Step 3: Computing Margin & Splitting Promotions
Table 1: Probability to accept a film promo per client
Table 2: Probability to accept a manga promo per client
Expected Margin = Probability to buy * Margin
Join the margins of each promo to the respective client
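A short sketch of the expected-margin formula and the join per client. The client identifiers, the probabilities and the unit margins are all made-up values, used only to show the arithmetic.

```python
# Hypothetical inputs: probabilities per client and unit margins are made up.
film_proba = {"c1": 0.80, "c2": 0.10}
manga_proba = {"c1": 0.30, "c2": 0.60}
MARGIN = {"film": 2.0, "manga": 5.0}

# Expected Margin = Probability to buy * Margin, joined on the client id.
expected = {
    client: {
        "film": film_proba[client] * MARGIN["film"],
        "manga": manga_proba[client] * MARGIN["manga"],
    }
    for client in film_proba.keys() & manga_proba.keys()
}
```

Only clients present in both probability tables appear in the result, which mirrors an inner join.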
18. Step 3: Computing Margin & Splitting Promotions
Creates a new column which contains the film margin when it is higher than the manga margin (for a given customer)
… and does the same when the manga margin is higher
Here we know the promo with the best margin per client, which will help us decide which item to promote for each client
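The comparison of the two expected margins can be sketched like this. The function name `best_promo`, the sample margins and the tie-breaking rule (ties go to film) are assumptions; the slide does not say what happens when both margins are equal.

```python
def best_promo(film_margin, manga_margin):
    """Return (promo, margin) for the higher expected margin.

    Ties go to film, which is an arbitrary choice for this sketch.
    """
    if film_margin >= manga_margin:
        return "film", film_margin
    return "manga", manga_margin


# Hypothetical (film, manga) expected margins per client.
expected = {"c1": (1.6, 1.5), "c2": (0.2, 3.0)}
choice = {client: best_promo(f, m) for client, (f, m) in expected.items()}
```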
20. Step 3: Computing Margin & Splitting Promotions
Creates a new C4 (film) column in which:
1) Margins below 0.5 are replaced by 0
2) Margins greater than or equal to 0.5 are replaced by 1
=> We know whether or not we will offer a film promotion to a given customer
Same process for Mangas
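The thresholding rule above, as a minimal sketch. The function name `promo_flag` and the sample margins are assumptions; only the 0.5 cut-off comes from the slide.

```python
def promo_flag(margin, threshold=0.5):
    """Return 1 when the margin reaches the threshold, else 0."""
    return 1 if margin >= threshold else 0


# Hypothetical film margins per client.
film_margins = {"c1": 1.6, "c2": 0.0, "c3": 0.5, "c4": 0.49}
c4_film = {client: promo_flag(m) for client, m in film_margins.items()}
```

A margin of exactly 0.5 maps to 1, matching rule 2.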
Margin per product
BD Mangas 427.901 €
Films actions et policiers 52.223 €
Total 480.124 €
25. Overfitting
• We used cross-validation to check that our model is robust enough and is not overfitted
• We noticed that the error rates given by the cross-validation are quite low for both categories (<10%)
• Film error % variance: 0.437
• Manga error % variance: 0.152
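A rough sketch of how per-fold error rates could be summarized in a cross-validation. Assumptions: contiguous folds, the sample variance, and a reading of "error % variance" as the variance of the error rate across folds. The error values in the example are made up and are not the results reported above.

```python
from statistics import mean, variance


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size


def summarize(error_rates_pct):
    """Return (mean, sample variance) of the per-fold error rates in %."""
    return mean(error_rates_pct), variance(error_rates_pct)


folds = list(k_fold_indices(10, 5))
# Made-up per-fold error rates in percent, only to exercise summarize().
avg_error, error_variance = summarize([8.0, 9.0, 7.0, 8.0, 8.0])
```

A low mean error together with a low variance across folds is what would suggest that the model behaves consistently on unseen partitions.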