Assignment 2: KNIME
Özcan Oguzhan, Shoaib Aiham, Phetsarath Soulyvanh,
Smine Walid, Bertiau Gary
Business Intelligence and Data Science
Step 1
Reading and Filtering
Step 1: Reading & Filtering
Import the file containing all of the inputs.
Correct the variable names by replacing each ‘ç’ with a ‘c’.
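The renaming step can be sketched in Python, with a pandas DataFrame standing in for the KNIME table; the column names here are illustrative, not taken from the assignment's file:

```python
import pandas as pd

# Illustrative columns only; the real input file is read in KNIME.
df = pd.DataFrame({"Garçon": [1, 0], "Films_Action": [0, 1]})

# Replace every 'ç' with 'c' in the column names, as the renaming
# step in the workflow does.
df.columns = [c.replace("ç", "c") for c in df.columns]

print(list(df.columns))  # ['Garcon', 'Films_Action']
```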
Filter out:
• The non-adherent clients
• The clients who did not buy “Films Action et Policier” during or before the observation period and whose eclecticism level is lower than 10
Convert the target variables from ‘Int’ to ‘String’ so they can be used later in the decision tree (DT).
Remove the variables with low variance (<1), since low variance means little differentiation between clients.
Compute the correlation between the different variables, then remove the variables with high correlation (>0.7): two highly correlated variables carry more or less the same information (redundant).
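The two filtering rules can be sketched as follows, assuming the numeric features sit in a pandas DataFrame; the thresholds match the slides (variance < 1, correlation > 0.7), but the data is made up:

```python
import pandas as pd

# Toy feature table (made-up values).
X = pd.DataFrame({
    "a": [0, 0, 0, 1],      # variance well below 1 -> dropped
    "b": [10, 20, 30, 40],
    "c": [11, 21, 29, 41],  # highly correlated with b -> dropped
})

# 1) Drop low-variance columns (variance < 1).
X = X.loc[:, X.var() >= 1]

# 2) Drop one column of each highly correlated pair (|corr| > 0.7).
corr = X.corr().abs()
cols = list(X.columns)
to_drop = set()
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if corr.iloc[i, j] > 0.7:
            to_drop.add(cols[j])
X = X.drop(columns=to_drop)
print(list(X.columns))
```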
Step 2
Partitioning, Modelling, Testing
Step 2: Partitioning, Modelling, Testing
We decided to use a Tree Ensemble to create a relevant model because:
• It can handle categorical features, which is the case here (binary)
• It can deal with high-dimensional spaces and large numbers of training examples, which is also the case here
• It does not require as much CPU capacity as other models
• The results with this model are quite satisfying
• The input table is split into two partitions: 80%/20%
• The 80% partition goes into the Tree Ensemble Learner (which creates the model) and the 20% partition into the Predictor (which tests it)
The Learner builds the model from the 80% sample.
The Predictor tests the model on the remaining 20% and shows the propensity of interest for each client.
We import the Webshop_Test file to apply the model to the test data.
We join the “Manga” and “Film_Action et Policier” data.
We apply the model computed in the Learner to the “Webshop_Test” file.
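The partition/learn/predict pipeline can be sketched with scikit-learn's random forest standing in for KNIME's Tree Ensemble nodes; the data is synthetic and the parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the webshop data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 80%/20% split, as in the Partitioning node.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# Learner: fit the ensemble on the 80% sample.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predictor: class probabilities ("propensity of interest") on the 20% sample.
propensity = model.predict_proba(X_test)[:, 1]
print(propensity.shape)  # one probability per test-set client
```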
Step 3
Computing Margin & Splitting Promotions
Step 3: Computing Margin & Splitting Promotions
Table 1: Probability that each client accepts a film promo
Table 2: Probability that each client accepts a manga promo
Expected Margin = Probability to buy × Margin
Join the margin of each promo to the respective client
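A minimal sketch of the expected-margin step; the probabilities and the per-sale margins are made up for illustration:

```python
import pandas as pd

# Hypothetical acceptance probabilities per client (from the predictor).
clients = pd.DataFrame({
    "client_id": [1, 2, 3],
    "p_film":  [0.8, 0.2, 0.5],   # probability of accepting a film promo
    "p_manga": [0.1, 0.9, 0.5],   # probability of accepting a manga promo
})
FILM_MARGIN, MANGA_MARGIN = 4.0, 6.0  # hypothetical per-sale margins

# Expected Margin = Probability to buy * Margin, per promo and per client.
clients["film_margin"] = clients["p_film"] * FILM_MARGIN
clients["manga_margin"] = clients["p_manga"] * MANGA_MARGIN
print(clients[["client_id", "film_margin", "manga_margin"]])
```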
Creates a new column which contains the film margin when it is higher than the manga margin (for a given customer)…
…and does the same when the manga margin is higher.
We now know the promo with the best margin per client, which helps us decide which item to promote for each client.
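The best-promo selection can be sketched like this; the numbers are illustrative, and the `where`/`idxmax` calls mirror the two rule-based columns described above:

```python
import pandas as pd

# Expected margins per client (made-up values).
df = pd.DataFrame({
    "client_id": [1, 2],
    "film_margin":  [3.2, 0.8],
    "manga_margin": [0.6, 5.4],
})

# Keep the film margin only where it beats the manga margin, and vice versa.
df["best_film"] = df["film_margin"].where(df["film_margin"] > df["manga_margin"])
df["best_manga"] = df["manga_margin"].where(df["manga_margin"] >= df["film_margin"])

# The column with the higher expected margin tells us which item to promote.
df["best_promo"] = df[["film_margin", "manga_margin"]].idxmax(axis=1)
print(df["best_promo"].tolist())
```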
Creates a new C4 (film) column in which:
1) Margins below 0.5 are replaced by 0
2) Margins greater than or equal to 0.5 are replaced by 1
=> We know whether or not we will offer a film promotion to a given customer.
Same process for Mangas.

Margin per product:
BD Mangas: 427.901 €
Films actions et policiers: 52.223 €
Total: 480.124 €
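The 0.5 cut-off can be sketched as follows; the scores are illustrative, and `C4_film` plays the role of the slide's C4 column:

```python
import pandas as pd

# Made-up per-client scores for the film promotion.
df = pd.DataFrame({"film_score": [0.3, 0.5, 0.9]})

# Below 0.5 -> 0 (no promotion), 0.5 or above -> 1 (promotion).
df["C4_film"] = (df["film_score"] >= 0.5).astype(int)
print(df["C4_film"].tolist())  # [0, 1, 1]
```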
Additional information
ROC and AUC for Films_Actions et Policier: AUC = 0.8762
ROC and AUC for BD_Mangas: AUC = 0.9491
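The reported AUCs come from comparing the true labels with the predicted propensities; a minimal sketch with six made-up points:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted propensities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# AUC: probability that a random positive outranks a random negative.
auc = roc_auc_score(y_true, y_score)

# The ROC curve itself: false-positive vs true-positive rates.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(round(auc, 4))
```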
Measures (Sensitivity, Precision, Accuracy)
Manga:
Films d’action et policiers:
Overfitting
• We used a cross-validation test to verify that our model is robust enough and not overfitted
• The error rates given by the cross-validation are quite low for both categories (<10%)
• Film error % variance: 0.437
• Manga error % variance: 0.152
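The cross-validation check can be sketched as follows; the data is synthetic and the fold count and model parameters are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the assignment's data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# 10-fold cross-validated accuracy; error rate = 1 - accuracy per fold.
scores = cross_val_score(model, X, y, cv=10)
errors = 1 - scores
print(errors.mean(), errors.var())  # mean error rate and its variance
```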