Using Teradata SAS Visual Analytics to Analyze Bio Organics Data
Exercise 1:
1) How many observations are there in the BIGORGANICS dataset?
● 111,115 Observations
2) How many variables (or columns) are there? Which variables are Measures, and which are
Categories?
● There are 13 Variables
● The following Variables are Measures: Affluence Grade, Age, Loyalty Card
Tenure, Organics Purchase Count, Organics Purchase Indicator, Total Spend
● The following Variables are Categories: Customer Loyalty ID, Gender,
Geographic Region, Loyalty Status, Neighborhood Cluster- 55 Level,
Neighborhood Cluster- 7 Level, Television Region
3) Quickly verify the distributions and the main statistics of each measurable variable.
What are their average values? Which variables present missing values?
● Variables presenting missing values are: Television Region, Gender, Geographic
Region, Neighborhood Cluster-55 Level and Neighborhood Cluster-7 Level
4) Using the visualizations you have already created, or by creating new ones, which
categorical variables present missing values?
Exercise 2:
Do the imputation for the other categorical variables:
1. Geographic Region - substitute the missing values with “South East”
2. Television Region - substitute the missing values with “London”
3. Neighborhood Cluster-55 Level - substitute the missing values with “U”
4. Neighborhood Cluster-7 Level - substitute the missing values with “U”
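The substitutions listed above are done through the Visual Analytics interface, but the same rule can be sketched in a few lines of Python for readers who want to check it outside the tool. This is only an illustration: the column keys and the sample rows are hypothetical, and only the replacement values come from the exercise.

```python
# Replacement values come from Exercise 2; the column keys are hypothetical names.
FILL_VALUES = {
    "geographic_region": "South East",
    "television_region": "London",
    "neighborhood_cluster_55": "U",
    "neighborhood_cluster_7": "U",
}


def impute_categories(rows, fill_values=FILL_VALUES):
    """Return copies of the rows with missing or blank categories replaced."""
    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the original row is left untouched
        for column, fill in fill_values.items():
            value = new_row.get(column)
            if value is None or str(value).strip() == "":
                new_row[column] = fill
        imputed.append(new_row)
    return imputed


# Made-up sample rows, not taken from the BIGORGANICS data.
sample_rows = [
    {"geographic_region": None, "television_region": "Midlands",
     "neighborhood_cluster_55": "", "neighborhood_cluster_7": "C"},
    {"geographic_region": "North", "television_region": None,
     "neighborhood_cluster_55": "12", "neighborhood_cluster_7": None},
]
imputed_rows = impute_categories(sample_rows)
print(imputed_rows)
```

Existing values are kept as they are; only missing or blank entries receive the substitute value.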
Exercise 3:
Do the imputation for the other interval variables (verify exercise 3 on page 9) and check their
new distributions.
1) Age – substitute the missing values with 53
2) Loyalty Card Tenure – substitute the missing values with 0
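As a rough illustration of why the new distributions should be checked, the sketch below fills missing values with a constant and compares the mean before and after. The sample values are made up; only the constants 53 and 0 come from the exercise.

```python
from statistics import mean


def impute_constant(values, fill):
    """Replace missing entries (None) with a fixed value."""
    return [fill if v is None else v for v in values]


# Made-up sample values, not taken from the BIGORGANICS data.
ages = [34, None, 61, 47, None]
tenures = [5, None, 12]

ages_imputed = impute_constant(ages, 53)       # Age: fill with 53
tenures_imputed = impute_constant(tenures, 0)  # Loyalty Card Tenure: fill with 0

# The mean of the observed values versus the mean after imputation.
observed_mean = mean(v for v in ages if v is not None)
imputed_mean = mean(ages_imputed)
print(observed_mean, imputed_mean)
```

Filling with a constant pulls the mean toward that constant and adds a spike at that value, which is what the distribution check is meant to reveal.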
Exercise 4:
1) Create a Box Plot that displays the Affluence Grade and Age by Organics Purchase
Indicator. Show the averages (include a screenshot of the output as Answer)
2) What conclusions can you make about those who purchase organic products?
● Those who purchase organic products have a higher Affluence Grade than those who
don’t.
Exercise 5:
1) Which cluster appears to have the youngest customers?
It appears that Cluster 1 has the youngest customers, with a minimum age of 18.
2) Which cluster appears to have the customers with highest Loyalty Card Tenure?
Cluster 4 has the highest Loyalty Card Tenure.
3) Which cluster appears to have the customers with highest Affluence?
Cluster 2 has the highest Affluence.
4) Which cluster appears to have the customers with the highest Organics Purchase?
Clusters 3 and 2 have the highest Organics Purchase.
5) Which cluster appears to have the customers with the highest Total Spend?
Cluster 1 has the highest Total Spend.
6) Which cluster appears to have more customers with Loyalty Status: Gold?
Exercise 6:
1) Which variables are actually being used by the Tree?
It is using the Age, Gender, and Affluence variables.
2) Which variable is the first split? On which Value?
Age
3) How many leaves do we have? Which is the purest leaf?
We had 10 leaves, but we changed it to 20.
Exercise 7:
1) What is the cumulative lift for the model at the 5th percentile? And at the 20th percentile?
At the 5th percentile the lift is 3.5, and at the 20th percentile it is 2.5.
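For reference, cumulative lift at a percentile can be sketched as the event rate among the top-scored rows divided by the overall event rate. The function below is a minimal illustration of that definition, not the calculation Visual Analytics performs internally; the scores and labels are made up.

```python
import math


def cumulative_lift(scores, labels, percentile):
    """Event rate in the top `percentile` percent of rows (ranked by score)
    divided by the event rate over all rows."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, math.ceil(len(ranked) * percentile / 100))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Made-up example: 20 customers, 5 of whom bought organics (label 1).
scores = [i / 20 for i in range(20, 0, -1)]  # 1.00, 0.95, ..., 0.05
labels = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0] + [0] * 10

lift_5 = cumulative_lift(scores, labels, 5)
lift_20 = cumulative_lift(scores, labels, 20)
print(lift_5, lift_20)
```

A lift of 3.5 at the 5th percentile would therefore mean that the top 5% of scored customers contain events at 3.5 times the overall rate.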
2) View the ROC chart. What is the KS Statistic?
The KS Statistic is .4511.
3) View the Misclassification Chart. How many false negatives are there?
4) Display the details table. Which node provides the highest gain?
North provides the highest gain.
5) What are the splitting rules applied to this node?
It is using the following variables as nodes:
Exercise 8:
1) Which variables were not considered important to the model? What would be an underlying
cause?
From the graph below, we can tell that the model did not accept the variables Geo_IMP and NC-7
Level_IMP. There might not be enough significant data in these variables to change the value.
2) What is the R-Square for the model? What is the number of Observations used?
The R-Square is .2239 and the number of Observations is 111,115.
3) What is the cumulative lift for the model at the 5th percentile? And at the 20th percentile?
At the 5th percentile the lift is 3.5, and at the 20th percentile it is 2.5.
4) View the ROC Chart. What is the KS Statistic?
The KS Statistic is .4511.
ROC CHART:
5) View the Misclassification Chart. How many false negatives are there?
6) Select some data points with residuals greater than 25, right-click in the Residual Plot, and
select Exclude Selected. What is the number of Observations used now? What is the R-Square
now? How many false negatives are there now?
After doing that, the new R-Square value is .2161 and the number of Observations changed to
104,545. Please see the image below:
Exercise 9:
1) For which Age group did we get the highest R-Square? And which group has more
observations?
2) Is the Total Spend variable relevant for all the models? For which models is it relevant?
3) How many false negatives are there for the Young group? And for Middle?
The false negative rate is higher for the Young group than for the Middle group.
Exercise 10:
1) Which model was selected by default?
Decision Tree
2) Which model provides the best FPR? And FDR?
Logistic Regression provides the best FPR, and the Decision Tree provides the best FDR.
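For readers unfamiliar with the two rates, the sketch below computes them from confusion-matrix counts using the standard definitions: FPR = FP / (FP + TN) and FDR = FP / (FP + TP). The counts are made up for illustration and are not results from this exercise.

```python
def false_positive_rate(fp, tn):
    """Share of actual non-events that the model flags as events."""
    return fp / (fp + tn)


def false_discovery_rate(fp, tp):
    """Share of predicted events that are actually non-events."""
    return fp / (fp + tp)


# Made-up confusion-matrix counts, not taken from the exercise output.
tp, fp, tn, fn = 40, 10, 90, 60

fpr = false_positive_rate(fp, tn)
fdr = false_discovery_rate(fp, tp)
print(fpr, fdr)
```

For both rates lower is better, so the "best" model in the question is the one with the smallest value.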
3) Which model provides the best KS Statistic?
The Kolmogorov-Smirnov statistic is a goodness-of-fit metric that represents the maximum
separation between the model ROC curve and the baseline.
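Read this way, the KS statistic is the largest gap between the true positive rate and the false positive rate over all score thresholds. The following is a minimal sketch of that calculation under the assumption that both events and non-events are present; the scores and labels are made up.

```python
def ks_statistic(scores, labels):
    """Largest gap between the true positive rate and the false positive rate
    as the score threshold is lowered one distinct score at a time."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Rows sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, tp / positives - fp / negatives)
    return best


# Made-up scores and outcomes (1 = event, 0 = non-event).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
ks = ks_statistic(scores, labels)
print(ks)
```

A model that separates events from non-events perfectly reaches 1, and a model no better than the baseline stays near 0.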
4) Suppose we are going to have a new campaign promoting the Organics products and the
budget for the campaign allows us to have only 14,000 customer contacts (around 15% of the
customer base). So, we want to apply this model to select a target population for this campaign,
composed of the 15,000 customers with the greatest propensity to buy Organic products.
Which model would be selected?
5) Suppose the same campaign scenario as the previous, but with 20,000 customers (around
20% of the customer base). Which model would be selected?
6) TIP: Cumulative % of events – cumulative number of events observed up to and including
the specified percentile bin divided by the total number of observations, sorted in descending
order of the predicted event probabilities.
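One way to reason about the campaign questions is to count, for each model, how many actual events fall among the top-N customers ranked by predicted probability, and to prefer the model that captures the most. The sketch below illustrates this with made-up scores; the model names and numbers are hypothetical, and the comparison in the exercise itself is read from the Visual Analytics model comparison charts.

```python
def events_in_top_n(scores, labels, n):
    """Number of actual events among the n rows with the highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n])


def pick_model(model_scores, labels, n):
    """Name of the model whose top-n rows capture the most events."""
    return max(model_scores,
               key=lambda name: events_in_top_n(model_scores[name], labels, n))


# Made-up outcomes and predicted probabilities for two hypothetical models.
labels = [1, 0, 1, 0, 1, 0]
model_scores = {
    "tree": [0.9, 0.8, 0.7, 0.2, 0.1, 0.3],
    "logistic": [0.9, 0.2, 0.8, 0.3, 0.7, 0.1],
}

budget = 3  # number of customers the campaign can contact
best_model = pick_model(model_scores, labels, budget)
print(best_model)
```

Because the ranking of models can change with the contact budget, the same comparison is repeated for each campaign size.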
