Assignment 2: KNIME
Özcan Oguzhan, Shoaib Aiham, Phetsarath Soulyvanh,
Smine Walid, Bertiau Gary
Business Intelligence and Data Science
Step 1
Reading and Filtering
Step 1: Reading & Filtering
Import the file containing all of the inputs.
Correct the variable names by replacing each ‘ç’ with a ‘c’.
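The renaming step can be sketched in Python, with a pandas DataFrame standing in for the KNIME table; the column names here are illustrative, not taken from the assignment's file:

```python
import pandas as pd

# Illustrative columns only; the real input file is read in KNIME.
df = pd.DataFrame({"Garçon": [1, 0], "Films_Action": [0, 1]})

# Replace every 'ç' with 'c' in the column names, as the renaming
# step in the workflow does.
df.columns = [c.replace("ç", "c") for c in df.columns]

print(list(df.columns))  # ['Garcon', 'Films_Action']
```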
Filter out:
• The non-adherent clients
• The clients who did not buy “Films Action et Policier” during or before the observation period and whose eclecticism level is lower than 10
Convert the target variables from ‘Int’ to ‘String’ so they can be used later in the decision tree (DT).
Remove the variables with low variance (<1), since low variance means little differentiation between clients.
Compute the correlation between the different variables, then remove the variables with high correlation (>0.7): two highly correlated variables carry more or less the same information (redundant).
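The two filtering rules can be sketched as follows, assuming the numeric features sit in a pandas DataFrame; the thresholds match the slides (variance < 1, correlation > 0.7), but the data is made up:

```python
import pandas as pd

# Toy feature table (made-up values).
X = pd.DataFrame({
    "a": [0, 0, 0, 1],      # variance well below 1 -> dropped
    "b": [10, 20, 30, 40],
    "c": [11, 21, 29, 41],  # highly correlated with b -> dropped
})

# 1) Drop low-variance columns (variance < 1).
X = X.loc[:, X.var() >= 1]

# 2) Drop one column of each highly correlated pair (|corr| > 0.7).
corr = X.corr().abs()
cols = list(X.columns)
to_drop = set()
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if corr.iloc[i, j] > 0.7:
            to_drop.add(cols[j])
X = X.drop(columns=to_drop)
print(list(X.columns))
```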
Step 2
Partitioning, Modelling, Testing
Step 2: Partitioning, Modelling, Testing
We decided to use a Tree Ensemble to create a relevant model because:
• It can handle categorical features, which is the case here (binary)
• It can deal with high-dimensional spaces and large numbers of training examples, which is also the case here
• It does not require as much CPU capacity as other models
• The results with this model are quite satisfying
• The input table is split into two partitions: 80%/20%
• The 80% partition goes into the Tree Ensemble Learner (which creates the model) and the 20% partition into the Predictor (which tests it)
The Learner builds the model from the 80% sample.
The Predictor tests the model on the remaining 20% and shows the propensity of interest for each client.
We import the Webshop_Test file to apply the model to the test data.
We join the “Manga” and “Film_Action et Policier” data.
We apply the model computed in the Learner to the “Webshop_Test” file.
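The partition/learn/predict pipeline can be sketched with scikit-learn's random forest standing in for KNIME's Tree Ensemble nodes; the data is synthetic and the parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the webshop data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# 80%/20% split, as in the Partitioning node.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)

# Learner: fit the ensemble on the 80% sample.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predictor: class probabilities ("propensity of interest") on the 20% sample.
propensity = model.predict_proba(X_test)[:, 1]
print(propensity.shape)  # one probability per test-set client
```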
Step 3
Computing Margin & Splitting Promotions
Step 3: Computing Margin & Splitting Promotions
Table 1: Probability that each client accepts a film promo
Table 2: Probability that each client accepts a manga promo
Expected Margin = Probability to buy × Margin
Join the margin of each promo to the respective client
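A minimal sketch of the expected-margin step; the probabilities and the per-sale margins are made up for illustration:

```python
import pandas as pd

# Hypothetical acceptance probabilities per client (from the predictor).
clients = pd.DataFrame({
    "client_id": [1, 2, 3],
    "p_film":  [0.8, 0.2, 0.5],   # probability of accepting a film promo
    "p_manga": [0.1, 0.9, 0.5],   # probability of accepting a manga promo
})
FILM_MARGIN, MANGA_MARGIN = 4.0, 6.0  # hypothetical per-sale margins

# Expected Margin = Probability to buy * Margin, per promo and per client.
clients["film_margin"] = clients["p_film"] * FILM_MARGIN
clients["manga_margin"] = clients["p_manga"] * MANGA_MARGIN
print(clients[["client_id", "film_margin", "manga_margin"]])
```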
Creates a new column which contains the film margin when it is higher than the manga margin (for a given customer)…
…and does the same when the manga margin is higher.
We now know the promo with the best margin per client, which helps us decide which item to promote for each client.
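The best-promo selection can be sketched like this; the numbers are illustrative, and the `where`/`idxmax` calls mirror the two rule-based columns described above:

```python
import pandas as pd

# Expected margins per client (made-up values).
df = pd.DataFrame({
    "client_id": [1, 2],
    "film_margin":  [3.2, 0.8],
    "manga_margin": [0.6, 5.4],
})

# Keep the film margin only where it beats the manga margin, and vice versa.
df["best_film"] = df["film_margin"].where(df["film_margin"] > df["manga_margin"])
df["best_manga"] = df["manga_margin"].where(df["manga_margin"] >= df["film_margin"])

# The column with the higher expected margin tells us which item to promote.
df["best_promo"] = df[["film_margin", "manga_margin"]].idxmax(axis=1)
print(df["best_promo"].tolist())
```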
Creates a new C4 (film) column in which:
1) Margins below 0.5 are replaced by 0
2) Margins greater than or equal to 0.5 are replaced by 1
=> We know whether or not we will offer a film promotion to a given customer.
Same process for Mangas.

Margin per product:
BD Mangas: 427.901 €
Films actions et policiers: 52.223 €
Total: 480.124 €
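The 0.5 cut-off can be sketched as follows; the scores are illustrative, and `C4_film` plays the role of the slide's C4 column:

```python
import pandas as pd

# Made-up per-client scores for the film promotion.
df = pd.DataFrame({"film_score": [0.3, 0.5, 0.9]})

# Below 0.5 -> 0 (no promotion), 0.5 or above -> 1 (promotion).
df["C4_film"] = (df["film_score"] >= 0.5).astype(int)
print(df["C4_film"].tolist())  # [0, 1, 1]
```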
Additional information
ROC and AUC for Films_Actions et Policier: AUC = 0.8762
ROC and AUC for BD_Mangas: AUC = 0.9491
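The reported AUCs come from comparing the true labels with the predicted propensities; a minimal sketch with six made-up points:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted propensities.
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# AUC: probability that a random positive outranks a random negative.
auc = roc_auc_score(y_true, y_score)

# The ROC curve itself: false-positive vs true-positive rates.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(round(auc, 4))
```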
Measures (Sensitivity, Precision, Accuracy)
Manga:
Films d’action et policiers:
Overfitting
• We used a cross-validation test to verify that our model is robust enough and not overfitted
• The error rates given by the cross-validation are quite low for both categories (<10%)
• Film error % variance: 0.437
• Manga error % variance: 0.152
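The cross-validation check can be sketched as follows; the data is synthetic and the fold count and model parameters are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the assignment's data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# 10-fold cross-validated accuracy; error rate = 1 - accuracy per fold.
scores = cross_val_score(model, X, y, cv=10)
errors = 1 - scores
print(errors.mean(), errors.var())  # mean error rate and its variance
```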