SlideShare a Scribd company logo
Project – 3.
DATA MINING
Thera Bank - Loan Purchase Modeling
By,
Saleesh Satheeshchandran, PGP BABI 2019-20 (G6)
Objective of the project - Thera Bank - Loan Purchase
Modeling
The Objective of the project is to build the best model which can classify the right customers
who have a higher probability of purchasing the loan.
The dataset has data on 5000 customers. The data include customer demographic information
(age, income, etc.), the customer's relationship with the bank (mortgage, securities account,
etc.), and the customer response to the last personal loan campaign (Personal Loan). Among
these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them
in the earlier campaign.
Assumptions
There are no particular assumptions
Exploratory Data Analysis
1. The dimension of the data shows 5000 observations.
2. Some of the categorical variables are converted to factors.
3. Boxplots of the following variables are done.
- Age - No outliers
- Experience - No outliers
- Income – Numerous outliers
- Zipcode – 1 outlier
- Family Members – No outliers
Age - No outliers
- Experience - No outliers
- Income – Numerous outliers
- Zipcode – 1 outlier
Family Members – No outliers
Correlation
The following variables have high correlation.
- Age / Experience
- Income/CCAvg
R Studio Screenshots
Clustering
Following clustering models were applied.
1. K-Means
2. Hierarchical
The K-means algorithm is executed in the following steps:
• Partition of objects into k non-empty subsets
• Identifying the cluster centroids (mean point) of the current partition.
• Assigning each point to a specific cluster
• Compute the distances from each point and allot points to the cluster where
the distance from the centroid is minimum.
• After re-allotting the points, find the centroid of the new cluster formed.
• Three clusters are obtained by the k-means clustering.
• These are not very distinctive clusters.
• In this project the k-means clustering has not been very effective.
The Hierarchical clustering build a cluster hierarchy that is commonly displayed as a
tree diagram called a dendrogram. They begin with each object in a separate cluster.
At each step, the two clusters that are most similar are joined into a single new
cluster. Once fused, objects are never separated.
The horizontal axis represents the clusters. The vertical scale on the dendrogram
represent the distance or dissimilarity. Each joining (fusion) of two clusters is
represented on the diagram by the splitting of a vertical line into two vertical lines.
The vertical position of the split, shown by a short bar gives the distance
(dissimilarity) between the two clusters.
CART Model
The algorithm of decision tree models works by repeatedly partitioning the data into multiple
sub-spaces, so that the outcomes in each final sub-space is as homogeneous as possible. This
approach is technically called recursive partitioning.
The resulting tree is composed of decision nodes, branches and leaf nodes. The tree is placed
from upside to down, so the root is at the top and leaves indicating the outcome is put at the
bottom.
The tree will stop growing by the following three criteria
all leaf nodes are pure with a single class;
a pre-specified minimum number of training observations that cannot be assigned to each leaf
nodes with any splitting methods;
The number of observations in the leaf node reaches the pre-specified minimum one.
However, this full tree including all predictor appears to be very complex and cisdifficult to
interpret in the situation where we have a large data sets with multiple predictors.
Additionally, it is easy to see that, a fully grown tree will overfit the training data and might
lead to poor test set performance.
Unpruned Tree
PRUNING
Unpruned Tree
Briefly, our goal here is to see if a smaller subtree can give us comparable results to the fully
grown tree. If yes, we should go for the simpler tree because it reduces the likelihood of
overfitting.
One possible robust strategy of pruning the tree (or stopping the tree to grow) consists of
avoiding splitting a partition if the split does not significantly improves the overall quality of
the model.
Pruned Tree
Random Forest
Random forest is a tree-based algorithm which involves building several trees (decision
trees), then combining their output to improve generalization ability of the model. The
method of combining trees is known as an ensemble method. Ensembling is a combination of
weak learners (individual trees) to produce a strong learner.
A Random Forest model is created with default parameters and then fine tuned the model by
changing ‘mtry’. We can tune the random forest model by changing the number of trees
(ntree) and the number of variables randomly sampled at each stage (mtry).
Ntree: Number of trees to grow. This should not be set to too small a number, to ensure that
every input row gets predicted at least a few times.
Mtry: Number of variables randomly sampled as candidates at each split. The default values
are different for classification (sqrt(p) where p is number of variables in x) and regression
(p/3)
Testing
After doing the Random Forest in train data. The next step is to apply the same model on the
test data and see its validity.
Model Performance
The following Performance Measure are used to test the robustness of the model in the real
world.
1. Confusion Matrix
2. ROC Curves, Gini Coefficient
3. Concordance-Discordance ratio
Confusion Matrix:
A confusion matrix is a summary of prediction results on a classification problem.
The number of correct and incorrect predictions are summarized with count values and
broken down by each class
ROC Curve
A ROC curve plots the performance of a binary classifier under various threshold
settings; this is measured by true positive rate and false positive rate. If your
classifier predicts “true” more often, it will have more true positives (good) but also
more false positives (bad). If your classifier is more conservative, predicting “true”
less often, it will have fewer false positives but fewer true positives as well. The ROC
curve is a graphical representation of this tradeoff.
Concordance-Discordance ratio
Used concordant and discordant pairs to describe the relationship between pairs of
observations. To calculate the concordant and discordant pairs, the data are treated
as ordinal, so ordinal data should be appropriate for your application. The number of
concordant and discordant pairs are used in calculations for Kendall's tau, which
measures the association between two ordinal variables.
Bank loan purchase modeling

More Related Content

What's hot

Open06
Open06Open06
Open06butest
 
Data Science: Prediction analysis for houses in Ames, Iowa.
Data Science: Prediction analysis for houses in Ames, Iowa.Data Science: Prediction analysis for houses in Ames, Iowa.
Data Science: Prediction analysis for houses in Ames, Iowa.
ASHISH MENKUDALE
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep Learning
Pranov Mishra
 
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data SetData Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
IJCERT
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
Derek Kane
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
IRJET Journal
 
Predicting house prices_Regression
Predicting house prices_RegressionPredicting house prices_Regression
Predicting house prices_Regression
Sruti Jain
 
Forecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A ReviewForecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A Review
Dr. Sudhir Kumar Srivastava
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
ijcseit
 
Chapter 18,19
Chapter 18,19Chapter 18,19
Chapter 18,19
heba_ahmad
 
Dmml report final
Dmml report finalDmml report final
Dmml report final
sarthakkhare3
 
Machine learning session1
Machine learning   session1Machine learning   session1
Machine learning session1
Abhimanyu Dwivedi
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Rebecca Bilbro
 
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...
Madhav Mishra
 
Use of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for RankingUse of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for Ranking
ijsrd.com
 
IRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET- House Rent Price Prediction
IRJET- House Rent Price Prediction
IRJET Journal
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
Abhimanyu Dwivedi
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
Galit Shmueli
 

What's hot (19)

Open06
Open06Open06
Open06
 
Data Science: Prediction analysis for houses in Ames, Iowa.
Data Science: Prediction analysis for houses in Ames, Iowa.Data Science: Prediction analysis for houses in Ames, Iowa.
Data Science: Prediction analysis for houses in Ames, Iowa.
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep Learning
 
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data SetData Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 
A Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data MiningA Comparative Study for Anomaly Detection in Data Mining
A Comparative Study for Anomaly Detection in Data Mining
 
Predicting house prices_Regression
Predicting house prices_RegressionPredicting house prices_Regression
Predicting house prices_Regression
 
Forecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A ReviewForecasting Methodology Used in Restructured Electricity Market: A Review
Forecasting Methodology Used in Restructured Electricity Market: A Review
 
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
THE IMPLICATION OF STATISTICAL ANALYSIS AND FEATURE ENGINEERING FOR MODEL BUI...
 
Chapter 18,19
Chapter 18,19Chapter 18,19
Chapter 18,19
 
Dmml report final
Dmml report finalDmml report final
Dmml report final
 
Machine learning session1
Machine learning   session1Machine learning   session1
Machine learning session1
 
final paper1
final paper1final paper1
final paper1
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
 
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 2 Semester 3 MSc IT Part 2 Mumbai Univer...
 
Use of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for RankingUse of Linear Regression in Machine Learning for Ranking
Use of Linear Regression in Machine Learning for Ranking
 
IRJET- House Rent Price Prediction
IRJET- House Rent Price PredictionIRJET- House Rent Price Prediction
IRJET- House Rent Price Prediction
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Statistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, DescribingStatistical Modeling in 3D: Explaining, Predicting, Describing
Statistical Modeling in 3D: Explaining, Predicting, Describing
 

Similar to Bank loan purchase modeling

Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
Palin analytics
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Parth Khare
 
QQ Plot.pptx
QQ Plot.pptxQQ Plot.pptx
QQ Plot.pptx
Rahul Borate
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
SowmyaJyothi3
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
AmAn Singh
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
Vikash Kumar
 
Applied machine learning: Insurance
Applied machine learning: InsuranceApplied machine learning: Insurance
Applied machine learning: Insurance
Gregg Barrett
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
IJMER
 
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdfTop Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Artificial Intelligence Board of America
 
copy for Gary Chin.
copy for Gary Chin.copy for Gary Chin.
copy for Gary Chin.Teng Xiaolu
 
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
INFOGAIN PUBLICATION
 
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
IOSR Journals
 
A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...Yao Wu
 
AI Algorithms
AI AlgorithmsAI Algorithms
AI Algorithms
Dr. C.V. Suresh Babu
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
uetian12
 
PPT s10-machine vision-s2
PPT s10-machine vision-s2PPT s10-machine vision-s2
PPT s10-machine vision-s2
Binus Online Learning
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
digitalzombie
 
Model Selection Techniques
Model Selection TechniquesModel Selection Techniques
Model Selection Techniques
Swati .
 
Ijetr021251
Ijetr021251Ijetr021251

Similar to Bank loan purchase modeling (20)

Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
QQ Plot.pptx
QQ Plot.pptxQQ Plot.pptx
QQ Plot.pptx
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
Applied machine learning: Insurance
Applied machine learning: InsuranceApplied machine learning: Insurance
Applied machine learning: Insurance
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdfTop Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
Top Machine Learning Algorithms Used By AI Professionals ARTiBA.pdf
 
copy for Gary Chin.
copy for Gary Chin.copy for Gary Chin.
copy for Gary Chin.
 
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
 
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
Efficient Disease Classifier Using Data Mining Techniques: Refinement of Rand...
 
A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...
 
AI Algorithms
AI AlgorithmsAI Algorithms
AI Algorithms
 
2018 p 2019-ee-a2
2018 p 2019-ee-a22018 p 2019-ee-a2
2018 p 2019-ee-a2
 
PPT s10-machine vision-s2
PPT s10-machine vision-s2PPT s10-machine vision-s2
PPT s10-machine vision-s2
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
Model Selection Techniques
Model Selection TechniquesModel Selection Techniques
Model Selection Techniques
 
Ijetr021251
Ijetr021251Ijetr021251
Ijetr021251
 

More from Saleesh Satheeshchandran

Medilogistics - Ranbaxy
Medilogistics - RanbaxyMedilogistics - Ranbaxy
Medilogistics - Ranbaxy
Saleesh Satheeshchandran
 
Ceat
CeatCeat
Retail logistics solutions for Aditya Birla Retail
Retail logistics solutions for Aditya Birla RetailRetail logistics solutions for Aditya Birla Retail
Retail logistics solutions for Aditya Birla Retail
Saleesh Satheeshchandran
 
Aibono ops framework compressed
Aibono ops framework compressedAibono ops framework compressed
Aibono ops framework compressed
Saleesh Satheeshchandran
 
Voltas compressed
Voltas compressedVoltas compressed
Voltas compressed
Saleesh Satheeshchandran
 
Presentation
PresentationPresentation
Whirlpool rdc & cfa compressed
Whirlpool rdc & cfa compressedWhirlpool rdc & cfa compressed
Whirlpool rdc & cfa compressed
Saleesh Satheeshchandran
 
Aquastar amway sakinaka v 2.0
Aquastar   amway sakinaka v 2.0Aquastar   amway sakinaka v 2.0
Aquastar amway sakinaka v 2.0
Saleesh Satheeshchandran
 
Aquastar corporate presentation ver 8[1].0 draft
Aquastar corporate presentation ver 8[1].0 draftAquastar corporate presentation ver 8[1].0 draft
Aquastar corporate presentation ver 8[1].0 draft
Saleesh Satheeshchandran
 
Dabur distrb final
Dabur distrb finalDabur distrb final
Dabur distrb final
Saleesh Satheeshchandran
 
Tsg introduction rev
Tsg introduction revTsg introduction rev
Tsg introduction rev
Saleesh Satheeshchandran
 
Mahindra retail cases
Mahindra retail casesMahindra retail cases
Mahindra retail cases
Saleesh Satheeshchandran
 
Mahindra logistics presentation
Mahindra logistics presentationMahindra logistics presentation
Mahindra logistics presentation
Saleesh Satheeshchandran
 
Dell project client presentation
Dell project client presentationDell project client presentation
Dell project client presentation
Saleesh Satheeshchandran
 
Coffee Cafe
Coffee CafeCoffee Cafe
Time series forecasting
Time series forecastingTime series forecasting
Time series forecasting
Saleesh Satheeshchandran
 
Cold storage case study
Cold storage case studyCold storage case study
Cold storage case study
Saleesh Satheeshchandran
 

More from Saleesh Satheeshchandran (17)

Medilogistics - Ranbaxy
Medilogistics - RanbaxyMedilogistics - Ranbaxy
Medilogistics - Ranbaxy
 
Ceat
CeatCeat
Ceat
 
Retail logistics solutions for Aditya Birla Retail
Retail logistics solutions for Aditya Birla RetailRetail logistics solutions for Aditya Birla Retail
Retail logistics solutions for Aditya Birla Retail
 
Aibono ops framework compressed
Aibono ops framework compressedAibono ops framework compressed
Aibono ops framework compressed
 
Voltas compressed
Voltas compressedVoltas compressed
Voltas compressed
 
Presentation
PresentationPresentation
Presentation
 
Whirlpool rdc & cfa compressed
Whirlpool rdc & cfa compressedWhirlpool rdc & cfa compressed
Whirlpool rdc & cfa compressed
 
Aquastar amway sakinaka v 2.0
Aquastar   amway sakinaka v 2.0Aquastar   amway sakinaka v 2.0
Aquastar amway sakinaka v 2.0
 
Aquastar corporate presentation ver 8[1].0 draft
Aquastar corporate presentation ver 8[1].0 draftAquastar corporate presentation ver 8[1].0 draft
Aquastar corporate presentation ver 8[1].0 draft
 
Dabur distrb final
Dabur distrb finalDabur distrb final
Dabur distrb final
 
Tsg introduction rev
Tsg introduction revTsg introduction rev
Tsg introduction rev
 
Mahindra retail cases
Mahindra retail casesMahindra retail cases
Mahindra retail cases
 
Mahindra logistics presentation
Mahindra logistics presentationMahindra logistics presentation
Mahindra logistics presentation
 
Dell project client presentation
Dell project client presentationDell project client presentation
Dell project client presentation
 
Coffee Cafe
Coffee CafeCoffee Cafe
Coffee Cafe
 
Time series forecasting
Time series forecastingTime series forecasting
Time series forecasting
 
Cold storage case study
Cold storage case studyCold storage case study
Cold storage case study
 

Recently uploaded

standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
StarCompliance.io
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
correoyaya
 

Recently uploaded (20)

standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_Crimes
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 

Bank loan purchase modeling

  • 1. Project – 3. DATA MINING Thera Bank - Loan Purchase Modeling By, Saleesh Satheeshchandran, PGP BABI 2019-20 (G6)
  • 2. Objective of the project - Thera Bank - Loan Purchase Modeling The Objective of the project is to build the best model which can classify the right customers who have a higher probability of purchasing the loan. The dataset has data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (= 9.6%) accepted the personal loan that was offered to them in the earlier campaign. Assumptions There are no particular assumptions Exploratory Data Analysis 1. The dimension of the data shows 5000 observations. 2. Some of the categorical variables are converted to factors. 3. Boxplots of the following variables are done. - Age - No outliers - Experience - No outliers - Income – Numerous outliers - Zipcode – 1 outlier - Family Members – No outliers
  • 3. Age - No outliers - Experience - No outliers
  • 4. - Income – Numerous outliers - Zipcode – 1 outlier
  • 5. Family Members – No outliers Correlation The following variables have high correlation. - Age / Experience - Income/CCAvg
  • 7. Clustering Following clustering models were applied. 1. K-Means 2. Hierarchical The K-means algorithm is executed in the following steps: • Partition of objects into k non-empty subsets • Identifying the cluster centroids (mean point) of the current partition. • Assigning each point to a specific cluster • Compute the distances from each point and allot points to the cluster where the distance from the centroid is minimum. • After re-allotting the points, find the centroid of the new cluster formed. • Three clusters are obtained by the k-means clustering. • These are not very distinctive clusters. • In this project the k-means clustering has not been very effective. The Hierarchical clustering build a cluster hierarchy that is commonly displayed as a tree diagram called a dendrogram. They begin with each object in a separate cluster. At each step, the two clusters that are most similar are joined into a single new cluster. Once fused, objects are never separated. The horizontal axis represents the clusters. The vertical scale on the dendrogram represent the distance or dissimilarity. Each joining (fusion) of two clusters is represented on the diagram by the splitting of a vertical line into two vertical lines. The vertical position of the split, shown by a short bar gives the distance (dissimilarity) between the two clusters.
  • 8.
  • 9.
  • 10. CART Model The algorithm of decision tree models works by repeatedly partitioning the data into multiple sub-spaces, so that the outcomes in each final sub-space is as homogeneous as possible. This approach is technically called recursive partitioning. The resulting tree is composed of decision nodes, branches and leaf nodes. The tree is placed from upside to down, so the root is at the top and leaves indicating the outcome is put at the bottom. The tree will stop growing by the following three criteria all leaf nodes are pure with a single class; a pre-specified minimum number of training observations that cannot be assigned to each leaf nodes with any splitting methods; The number of observations in the leaf node reaches the pre-specified minimum one. However, this full tree including all predictor appears to be very complex and cisdifficult to interpret in the situation where we have a large data sets with multiple predictors. Additionally, it is easy to see that, a fully grown tree will overfit the training data and might lead to poor test set performance.
  • 12. PRUNING Unpruned Tree Briefly, our goal here is to see if a smaller subtree can give us comparable results to the fully grown tree. If yes, we should go for the simpler tree because it reduces the likelihood of overfitting. One possible robust strategy of pruning the tree (or stopping the tree to grow) consists of avoiding splitting a partition if the split does not significantly improves the overall quality of the model. Pruned Tree
  • 13.
  • 14. Random Forest Random forest is a tree-based algorithm which involves building several trees (decision trees), then combining their output to improve generalization ability of the model. The method of combining trees is known as an ensemble method. Ensembling is a combination of weak learners (individual trees) to produce a strong learner. A Random Forest model is created with default parameters and then fine tuned the model by changing ‘mtry’. We can tune the random forest model by changing the number of trees (ntree) and the number of variables randomly sampled at each stage (mtry). Ntree: Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. Mtry: Number of variables randomly sampled as candidates at each split. The default values are different for classification (sqrt(p) where p is number of variables in x) and regression (p/3)
  • 15. Testing After doing the Random Forest in train data. The next step is to apply the same model on the test data and see its validity.
  • 16. Model Performance The following Performance Measure are used to test the robustness of the model in the real world. 1. Confusion Matrix 2. ROC Curves, Gini Coefficient 3. Concordance-Discordance ratio Confusion Matrix: A confusion matrix is a summary of prediction results on a classification problem. The number of correct and incorrect predictions are summarized with count values and broken down by each class ROC Curve A ROC curve plots the performance of a binary classifier under various threshold settings; this is measured by true positive rate and false positive rate. If your classifier predicts “true” more often, it will have more true positives (good) but also more false positives (bad). If your classifier is more conservative, predicting “true” less often, it will have fewer false positives but fewer true positives as well. The ROC curve is a graphical representation of this tradeoff. Concordance-Discordance ratio Used concordant and discordant pairs to describe the relationship between pairs of observations. To calculate the concordant and discordant pairs, the data are treated as ordinal, so ordinal data should be appropriate for your application. The number of concordant and discordant pairs are used in calculations for Kendall's tau, which measures the association between two ordinal variables.