SlideShare a Scribd company logo
Santander Product Recommendation
&
Product Prediction
Srivarsini Rangarajan
John Erickson
Problem Overview
● This is a Kaggle competition problem
(https://www.kaggle.com/c/santander-product-recommendation)
● 1.5 years of customers behavior data from Santander bank to predict what new
products customers will purchase.
● Columns 1 - 24 are customer information; 25 - 48 are products purchased
● The test and train sets are split by time, and public and private leaderboard
sets are split randomly.
● Training data over 13 million rows; Test data about 1M
● GOAL: predict which products existing customers will buy but here we have
considered product prediction based on features of existing customers.
Problem - Solving Strategy
Exploratory Analysis (John)
● Market Basket Analysis of training data (20K rows)
● Concept: use arules package in R to use apriori algorithm for association
rules (example: {item 1, item 2} → {item 3} )
Prediction (Sri)
● Use boosting to predict product purchases
● In R, we use XGBoost package, which is good for product
recommendations (has won many Kaggle competitions before)
Exploratory Analysis (Market Basket Analysis)
Step 1: Convert Data Frame into a
Sparse Matrix (I did this in Python).
This allows the arules package to work
Exploratory Analysis (Market Basket Analysis)
Bank2 <- read.transactions("santander3.csv",sep=",")
summary(Bank2)
Exploratory Analysis (Market Basket Analysis)
Exploratory Analysis (Market Basket Analysis)
● This means plot items with
support >= 0.01
Exploratory Analysis (Market Basket Analysis)
Exploratory Analysis (Market Basket Analysis)
Support: the proportion of item(s) in the dataset.
Equation: number of occurences of item X / number of items in dataset
Confidence: the likelihood of an item Z is purchased given items X,Y purchased
Equation: conf (X -> Y) = support (x U y) / support (X)
Lift: how much more likely an item is to be purchased with these other items than by itself
Exploratory Analysis (Market Basket Analysis)
Prediction with Xgboost - eXtreme Gradient Boosting
● Supervised Learning Technique
● Prediction based on set of weak ensemble models to get a strong model.
● Tries to optimize the differentiable loss function(cost associated with each
misclassification)
Objective: Prediction of a customer opening a Current Account based on the
existing customer features (Test 10K rows and Train 10K rows)
Features:
Sex,Age,RelAtBegMOnth,ResIndex,ForIndex,Channel,ProvinceCode,ActivityIndex,In
come,Segment
Dependent Variable: CurrentAccount
XGBoost on Santander Data
1. Data Cleanup & Convert all the features into a numeric matrix
santanderTrain$Sex <- as.numeric(santanderTrain$Sex)
2. Tuning Parameters
3. Predictor Selection & Numeric Label
4. Used Cross Validation to identify the best minimal loss
5. Train the model 6. Predict the results
Conclusion(s)
From market basket analysis / association rules, we learned that
● if a Santander customer has Direct Debit, e-account, and pensions, they will also likely have a Payroll
Account.
● Clearly, these people are employed, and likely to use Santander in the future, so long as they are
happy customers.
From xgboost, we learned that
● Efficient in predicting the customer product with the set of provided features.
● With the results of XGBoost, we found that Gross Income is a major feature that is used in classifying
Cluster 1(No Current Account) and Channel, Age and Province Code are important in classifying
Cluster 2(Potential Customers)
● Validation(sum(abs(pred-orig) upon train data provided an accuracy of 80% in prediction.

More Related Content

What's hot

Bpr bayesian personalized ranking from implicit feedback
Bpr bayesian personalized ranking from implicit feedbackBpr bayesian personalized ranking from implicit feedback
Bpr bayesian personalized ranking from implicit feedback
Park JunPyo
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
Pranov Mishra
 
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)
Amol Patil
 
MLR Walmart Dataset
MLR Walmart DatasetMLR Walmart Dataset
Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical data
Paul Skeie
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
mrizwan969
 
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNetIntroduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
Amazon Web Services
 
(Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning (Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning
Omkar Rane
 
Detecting malaria using a deep convolutional neural network
Detecting malaria using a deep  convolutional neural networkDetecting malaria using a deep  convolutional neural network
Detecting malaria using a deep convolutional neural network
Yusuf Brima
 
Churn modelling
Churn modellingChurn modelling
Churn modelling
Yogesh Khandelwal
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
BU - PG Master Computing Conference
 
Shap
ShapShap

What's hot (12)

Bpr bayesian personalized ranking from implicit feedback
Bpr bayesian personalized ranking from implicit feedbackBpr bayesian personalized ranking from implicit feedback
Bpr bayesian personalized ranking from implicit feedback
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs)
 
MLR Walmart Dataset
MLR Walmart DatasetMLR Walmart Dataset
MLR Walmart Dataset
 
Entity embeddings for categorical data
Entity embeddings for categorical dataEntity embeddings for categorical data
Entity embeddings for categorical data
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNetIntroduction to Generative Adversarial Networks (GAN) with Apache MXNet
Introduction to Generative Adversarial Networks (GAN) with Apache MXNet
 
(Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning (Machine Learning) Ensemble learning
(Machine Learning) Ensemble learning
 
Detecting malaria using a deep convolutional neural network
Detecting malaria using a deep  convolutional neural networkDetecting malaria using a deep  convolutional neural network
Detecting malaria using a deep convolutional neural network
 
Churn modelling
Churn modellingChurn modelling
Churn modelling
 
Customer churn prediction in banking
Customer churn prediction in bankingCustomer churn prediction in banking
Customer churn prediction in banking
 
Shap
ShapShap
Shap
 

Viewers also liked

Product recommendation for Santander Bank customers
Product recommendation for Santander Bank customersProduct recommendation for Santander Bank customers
Product recommendation for Santander Bank customersSumit Saini
 
Letter of Recommendation
Letter of RecommendationLetter of Recommendation
Letter of RecommendationSumit Saini
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket AnalysisOlaseni Fadipe
 
Review of a paper on market basket analysis
Review of a paper on market basket analysisReview of a paper on market basket analysis
Review of a paper on market basket analysisDev Karan Singh Maletia
 
Market Basket Analysis in SAS
Market Basket Analysis in SASMarket Basket Analysis in SAS
Market Basket Analysis in SAS
Andrew Kramer
 
Transactions / Basket Analysis
Transactions / Basket AnalysisTransactions / Basket Analysis
Transactions / Basket Analysis
Philippe Nemery
 
Market basketanalysis using r
Market basketanalysis using rMarket basketanalysis using r
Market basketanalysis using r
Yogesh Khandelwal
 
Data mining- Association Analysis -market basket
Data mining- Association Analysis -market basketData mining- Association Analysis -market basket
Data mining- Association Analysis -market basketSwapnil Soni
 
Market basket analysis
Market basket analysisMarket basket analysis
Market basket analysis
tsering choezom
 
Market analysis and the buying behavior of buyers of paper industry
Market analysis and the buying behavior of  buyers of paper industryMarket analysis and the buying behavior of  buyers of paper industry
Market analysis and the buying behavior of buyers of paper industry
AbhisheK Kumar Rajoria
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
Mahendra Gupta
 
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
Jo-fai Chow
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
Derek Kane
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Derek Kane
 

Viewers also liked (14)

Product recommendation for Santander Bank customers
Product recommendation for Santander Bank customersProduct recommendation for Santander Bank customers
Product recommendation for Santander Bank customers
 
Letter of Recommendation
Letter of RecommendationLetter of Recommendation
Letter of Recommendation
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
Review of a paper on market basket analysis
Review of a paper on market basket analysisReview of a paper on market basket analysis
Review of a paper on market basket analysis
 
Market Basket Analysis in SAS
Market Basket Analysis in SASMarket Basket Analysis in SAS
Market Basket Analysis in SAS
 
Transactions / Basket Analysis
Transactions / Basket AnalysisTransactions / Basket Analysis
Transactions / Basket Analysis
 
Market basketanalysis using r
Market basketanalysis using rMarket basketanalysis using r
Market basketanalysis using r
 
Data mining- Association Analysis -market basket
Data mining- Association Analysis -market basketData mining- Association Analysis -market basket
Data mining- Association Analysis -market basket
 
Market basket analysis
Market basket analysisMarket basket analysis
Market basket analysis
 
Market analysis and the buying behavior of buyers of paper industry
Market analysis and the buying behavior of  buyers of paper industryMarket analysis and the buying behavior of  buyers of paper industry
Market analysis and the buying behavior of buyers of paper industry
 
Market Basket Analysis
Market Basket AnalysisMarket Basket Analysis
Market Basket Analysis
 
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
 
Data Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation EnginesData Science - Part VI - Market Basket and Product Recommendation Engines
Data Science - Part VI - Market Basket and Product Recommendation Engines
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
 

Similar to Santander

DA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle Contest
DA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle ContestDA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle Contest
DA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle Contest
Berker Kozan
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
Harshavardhan851231
 
BIG MART SALES.pptx
BIG MART SALES.pptxBIG MART SALES.pptx
BIG MART SALES.pptx
LSURYAPRAKASHREDDY
 
BIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptxBIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptx
LSURYAPRAKASHREDDY
 
Customer analytics for e commerce
Customer analytics for e commerceCustomer analytics for e commerce
Customer analytics for e commerce
Alok Tayal (PMP, PMI-ACP, TOGAF)
 
Erdi güngör bbs
Erdi güngör bbsErdi güngör bbs
Erdi güngör bbs
Erdi Güngör
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
Aseda Owusua Addai-Deseh
 
EE660 Project_sl_final
EE660 Project_sl_finalEE660 Project_sl_final
EE660 Project_sl_finalShanglin Yang
 
Bimbo Final Project Presentation
Bimbo Final Project PresentationBimbo Final Project Presentation
Bimbo Final Project PresentationCan Köklü
 
ContentsPhase 1 Design Concepts2Project Description2Use.docx
ContentsPhase 1 Design Concepts2Project Description2Use.docxContentsPhase 1 Design Concepts2Project Description2Use.docx
ContentsPhase 1 Design Concepts2Project Description2Use.docx
maxinesmith73660
 
Data science guide
Data science guideData science guide
Data science guide
gokulprasath06
 
CapstoneProjectPresentation.pdf
CapstoneProjectPresentation.pdfCapstoneProjectPresentation.pdf
CapstoneProjectPresentation.pdf
VineetGupta76987
 
Tarun datascientist affle
Tarun datascientist affleTarun datascientist affle
Tarun datascientist affle
Tarun Aditya
 
Wooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersWooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit Customers
Lucinda Linde
 
Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...
Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...
Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...
Paris Women in Machine Learning and Data Science
 
Building ML models for smart retail
Building ML models for smart retailBuilding ML models for smart retail
Building ML models for smart retail
Albert Y. C. Chen
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
PATHALAMRAJESH
 
Resume xiaodan(vinci)
Resume xiaodan(vinci)Resume xiaodan(vinci)
Resume xiaodan(vinci)
vinci105
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
Daniel Hen
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
DataBench
 

Similar to Santander (20)

DA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle Contest
DA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle ContestDA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle Contest
DA 592 - Term Project Presentation - Berker Kozan Can Koklu - Kaggle Contest
 
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptxbigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
bigmartsalespridictionproject-220813050638-8e9c4c31 (1).pptx
 
BIG MART SALES.pptx
BIG MART SALES.pptxBIG MART SALES.pptx
BIG MART SALES.pptx
 
BIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptxBIG MART SALES PRIDICTION PROJECT.pptx
BIG MART SALES PRIDICTION PROJECT.pptx
 
Customer analytics for e commerce
Customer analytics for e commerceCustomer analytics for e commerce
Customer analytics for e commerce
 
Erdi güngör bbs
Erdi güngör bbsErdi güngör bbs
Erdi güngör bbs
 
Day 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business AnalyticsDay 1 (Lecture 2): Business Analytics
Day 1 (Lecture 2): Business Analytics
 
EE660 Project_sl_final
EE660 Project_sl_finalEE660 Project_sl_final
EE660 Project_sl_final
 
Bimbo Final Project Presentation
Bimbo Final Project PresentationBimbo Final Project Presentation
Bimbo Final Project Presentation
 
ContentsPhase 1 Design Concepts2Project Description2Use.docx
ContentsPhase 1 Design Concepts2Project Description2Use.docxContentsPhase 1 Design Concepts2Project Description2Use.docx
ContentsPhase 1 Design Concepts2Project Description2Use.docx
 
Data science guide
Data science guideData science guide
Data science guide
 
CapstoneProjectPresentation.pdf
CapstoneProjectPresentation.pdfCapstoneProjectPresentation.pdf
CapstoneProjectPresentation.pdf
 
Tarun datascientist affle
Tarun datascientist affleTarun datascientist affle
Tarun datascientist affle
 
Wooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit CustomersWooing the Best Bank Deposit Customers
Wooing the Best Bank Deposit Customers
 
Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...
Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...
Data Scientist in Data Science Forecasting Team by Minhui Wu, Data Scientist ...
 
Building ML models for smart retail
Building ML models for smart retailBuilding ML models for smart retail
Building ML models for smart retail
 
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC                           ...
Copy of CRICKET MATCH WIN PREDICTOR USING LOGISTIC ...
 
Resume xiaodan(vinci)
Resume xiaodan(vinci)Resume xiaodan(vinci)
Resume xiaodan(vinci)
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
 
CoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core OperationsCoreBigBench: Benchmarking Big Data Core Operations
CoreBigBench: Benchmarking Big Data Core Operations
 

Recently uploaded

一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 

Recently uploaded (20)

一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 

Santander

  • 1. Santander Product Recommendation & Product Prediction Srivarsini Rangarajan John Erickson
  • 2. Problem Overview ● This is a Kaggle competition problem (https://www.kaggle.com/c/santander-product-recommendation) ● 1.5 years of customers behavior data from Santander bank to predict what new products customers will purchase. ● Columns 1 - 24 are customer information; 25 - 48 are products purchased ● The test and train sets are split by time, and public and private leaderboard sets are split randomly. ● Training data over 13 million rows; Test data about 1M ● GOAL: predict which products existing customers will buy but here we have considered product prediction based on features of existing customers.
  • 3. Problem - Solving Strategy Exploratory Analysis (John) ● Market Basket Analysis of training data (20K rows) ● Concept: use arules package in R to use apriori algorithm for association rules (example: {item 1, item 2} → {item 3} ) Prediction (Sri) ● Use boosting to predict product purchases ● In R, we use XGBoost package, which is good for product recommendations (has won many Kaggle competitions before)
  • 4. Exploratory Analysis (Market Basket Analysis) Step 1: Convert Data Frame into a Sparse Matrix (I did this in Python). This allows the arules package to work
  • 5. Exploratory Analysis (Market Basket Analysis) Bank2 <- read.transactions("santander3.csv",sep=",") summary(Bank2)
  • 6. Exploratory Analysis (Market Basket Analysis)
  • 7. Exploratory Analysis (Market Basket Analysis) ● This means plot items with support >= 0.01
  • 8. Exploratory Analysis (Market Basket Analysis)
  • 9. Exploratory Analysis (Market Basket Analysis) Support: the proportion of item(s) in the dataset. Equation: number of occurences of item X / number of items in dataset Confidence: the likelihood of an item Z is purchased given items X,Y purchased Equation: conf (X -> Y) = support (x U y) / support (X) Lift: how much more likely an item is to be purchased with these other items than by itself
  • 10. Exploratory Analysis (Market Basket Analysis)
  • 11. Prediction with Xgboost - eXtreme Gradient Boosting ● Supervised Learning Technique ● Prediction based on set of weak ensemble models to get a strong model. ● Tries to optimize the differentiable loss function(cost associated with each misclassification) Objective: Prediction of a customer opening a Current Account based on the existing customer features (Test 10K rows and Train 10K rows) Features: Sex,Age,RelAtBegMOnth,ResIndex,ForIndex,Channel,ProvinceCode,ActivityIndex,In come,Segment Dependent Variable: CurrentAccount
  • 12. XGBoost on Santander Data 1. Data Cleanup & Convert all the features into a numeric matrix santanderTrain$Sex <- as.numeric(santanderTrain$Sex) 2. Tuning Parameters
  • 13. 3. Predictor Selection & Numeric Label 4. Used Cross Validation to identify the best minimal loss
  • 14. 5. Train the model 6. Predict the results
  • 15.
  • 16. Conclusion(s) From market basket analysis / association rules, we learned that ● if a Santander customer has Direct Debit, e-account, and pensions, they will also likely have a Payroll Account. ● Clearly, these people are employed, and likely to use Santander in the future, so long as they are happy customers. From xgboost, we learned that ● Efficient in predicting the customer product with the set of provided features. ● With the results of XGBoost, we found that Gross Income is a major feature that is used in classifying Cluster 1(No Current Account) and Channel, Age and Province Code are important in classifying Cluster 2(Potential Customers) ● Validation(sum(abs(pred-orig) upon train data provided an accuracy of 80% in prediction.