Identifying Default of Credit Card Payments
- Vikas Virani (s3715555)
- Salina Bharthu (s3736867)
• Banks face the risk that customers will not repay their credit card bills.
• This study aims to provide a model that predicts the likelihood of a default payment.
• Default payments are classified using the k-nearest neighbours (KNN) and decision tree algorithms.
Data Description:
• Data source: UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/machine-learning-databases/00350/)
• The dataset contains 30,000 observations and 24 variables.
• It holds customers' personal details (age, education, gender, marital status).
• It holds financial details (balance limit, previous payment status, previous bill amounts, previous amounts paid, default payment status).
Data Preparation:
• The data is checked for missing values, impossible values and outliers:
  - No missing values are present.
  - Outliers are removed using the Z-score approach.
  - Impossible values are replaced by the nearest possible value.
• Appropriate data types are assigned to categorical variables.
• Summaries of the nominal and numerical variables are displayed.
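The preparation steps above can be sketched in pandas as follows. This is a minimal illustration, not the authors' code: the column names (`LIMIT_BAL`, `EDUCATION`, `SEX`) are taken to match the UCI file, while the helper names, the |z| > 3 cut-off, the toy rows, and the reading of "nearest possible value" as "closest documented category code" (assumed here to be 1 to 4 for `EDUCATION`) are all assumptions.

```python
import pandas as pd


def remove_outliers_zscore(df: pd.DataFrame, cols, threshold: float = 3.0) -> pd.DataFrame:
    """Drop every row in which any column in `cols` has |z-score| > threshold."""
    z = (df[cols] - df[cols].mean()) / df[cols].std()
    # A constant column yields 0/0 = NaN; treat such values as non-outliers.
    z = z.fillna(0.0)
    keep = (z.abs() <= threshold).all(axis=1)
    return df.loc[keep].copy()


def replace_with_nearest(series: pd.Series, valid) -> pd.Series:
    """Map each value to the closest allowed value (ties go to the smaller one)."""
    allowed = sorted(valid)
    return series.apply(lambda v: min(allowed, key=lambda c: abs(c - v)))


# Hypothetical toy rows standing in for the real file.
toy = pd.DataFrame({
    "LIMIT_BAL": [50_000] * 19 + [5_000_000],
    "EDUCATION": [1, 2, 3, 4, 0, 5, 6, 2, 2, 1, 3, 3, 2, 1, 4, 2, 2, 1, 3, 2],
    "SEX": [1, 2] * 10,
})

# 1. Missing values: count per column (all zeros for this toy frame).
missing = toy.isna().sum()

# 2. Outliers: the 5,000,000 limit has |z| of about 4.25, so its row is dropped.
clean = remove_outliers_zscore(toy, ["LIMIT_BAL"])

# 3. Impossible values: assumed-undocumented codes 0, 5, 6 -> nearest of 1..4.
clean["EDUCATION"] = replace_with_nearest(clean["EDUCATION"], [1, 2, 3, 4])

# 4. Categorical dtypes, then separate summaries for nominal and numeric columns.
clean = clean.astype({"SEX": "category", "EDUCATION": "category"})
print(clean.describe(include="category"))
print(clean.describe())
```

With pandas' default sample standard deviation, a single extreme value among n rows can reach at most |z| = (n - 1)/sqrt(n), so a cut-off of 3 only bites once a group has more than about ten rows; on the full 30,000-row file this is not a concern.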
Data Exploration:
• Individual variables are visualized to identify trends.
• Pairs of variables are visualized to explore the following hypotheses:
  - Do personal details (age, sex, education, marital status), balance limit or past payment status affect the chances of default?
    - Age: The chances of default are relatively similar across all age groups.
    - Sex: Female customers are more likely to default than male customers.
    - Education: University graduates have a higher chance of default than graduate-school and high-school graduates.
    - Marital status: The chances of default are similar for married, single and other customers.
    - Balance limit: The bill amount is slightly higher for non-defaulting cards than for defaulting cards.
    - Previous payment status: The chances of default increase when a previous payment is delayed by even one month.
  - Do spending habits vary by age group, sex or marital status?
    - Age: People under 24 tend to spend less overall. Spending is roughly evenly distributed across the other age groups, although people aged 55 to 65 tend to spend more.
    - Sex: Spending habits do not vary by gender.
    - Marital status: Married people tend to spend more than others.
  - Do payment delays depend on a person's age, gender or marital status?
    - Age: Payment status follows a similar trend in all age groups.
    - Gender: Women have a higher proportion of payment delays than men.
    - Marital status: More single cardholders have payment delays than married and other cardholders.
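Each pairwise question above comes down to comparing a default (or delay) rate across the levels of one variable. A minimal pandas sketch, assuming a 0/1 target column under the hypothetical short name `default` and hypothetical age bands; the toy rows are invented and do not reproduce the findings above:

```python
import pandas as pd


def default_rate_by(df: pd.DataFrame, key: str, target: str = "default") -> pd.DataFrame:
    """Per group: number of customers, number of defaults, and share defaulting."""
    return df.groupby(key, observed=True)[target].agg(
        customers="count", defaults="sum", default_rate="mean"
    )


toy = pd.DataFrame({
    "SEX": [1, 1, 2, 2, 2, 2],          # UCI coding: 1 = male, 2 = female
    "AGE": [22, 35, 45, 58, 30, 61],
    "default": [0, 0, 1, 0, 1, 1],      # hypothetical target column name
})

# Hypothetical age bands; right=False makes the first band "under 24".
toy["AGE_BAND"] = pd.cut(toy["AGE"], bins=[0, 24, 35, 45, 55, 65, 120], right=False)

print(default_rate_by(toy, "SEX"))
print(default_rate_by(toy, "AGE_BAND"))
```

Reporting both the counts and the rates is deliberate: a larger group can contribute more defaulters in absolute terms even when its default rate is lower, so a comparison should state which of the two it is based on.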
Feature Selection:
• MinMaxScaler() is used to bring all features onto the same scale.
• An F1-score-based technique is applied to select the best features of the dataset.
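The slide does not say how the F1 score was used to pick features. One plausible reading, sketched below purely as an assumption, is a wrapper method: scikit-learn's `SequentialFeatureSelector` greedily adds the feature that most improves the cross-validated F1 score of a chosen classifier. Note that F1 is a classification metric; if the ANOVA F-score was meant instead, `SelectKBest(f_classif, k=...)` is the corresponding filter method. The synthetic data, the KNN estimator and the choice of three features are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the explanatory variables (hypothetical data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

# Rescale every feature to [0, 1]; this matters for distance-based KNN.
# In a full workflow, fit the scaler on the training split only.
X_scaled = MinMaxScaler().fit_transform(X)

# Greedy forward selection: at each step, add the feature that gives the
# best 3-fold cross-validated F1 score for the KNN classifier.
selector = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),
    n_features_to_select=3,
    direction="forward",
    scoring="f1",
    cv=3,
)
X_selected = selector.fit_transform(X_scaled, y)
print("selected feature indices:", selector.get_support(indices=True))
```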
Data Modelling:
• The data is split into training and test sets.
• KNN and decision tree classifiers are applied.
• A resampling technique is applied to balance the classes of the target variable.
• GridSearchCV is used to find the best combination of hyperparameter values for each classifier.
• Models are validated using the confusion matrix, classification report, accuracy score and error rate.
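A compact scikit-learn sketch of this modelling workflow follows. It is illustrative only: the synthetic data and its hypothetical 78/22 class imbalance, the choice of random oversampling with `sklearn.utils.resample` (the slide says only "a resampling technique"), and the hyperparameter grids are all assumptions. Resampling is done after the split so that no duplicated row reaches the test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Imbalanced synthetic stand-in (class 0 is about 78% of rows, hypothetical).
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           weights=[0.78], random_state=0)

# Stratified 80% / 20% split keeps the class ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Random oversampling of the minority class (1) on the training split only.
is_minority = y_train == 1
X_up, y_up = resample(X_train[is_minority], y_train[is_minority], replace=True,
                      n_samples=int((~is_minority).sum()), random_state=0)
X_bal = np.vstack([X_train[~is_minority], X_up])
y_bal = np.concatenate([y_train[~is_minority], y_up])

# Hypothetical hyperparameter grids for the two classifiers.
models = {
    "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 11, 21]}),
    "Decision tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [3, 5, 8, None], "min_samples_leaf": [1, 5, 20]}),
}

results = {}
for name, (estimator, grid) in models.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_bal, y_bal)                 # refits the best model on all of X_bal
    y_pred = search.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    results[name] = acc
    print(name, search.best_params_)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))
    print(f"accuracy = {acc:.3f}, error rate = {1 - acc:.3f}")
```

Because oversampled duplicates can fall into both the training and validation folds of the grid search, the cross-validation scores here are optimistic; resampling inside each fold (for example with the `Pipeline` from the separate imbalanced-learn package) avoids that, while the held-out test score is unaffected.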
Conclusion:
• The decision tree classifier classifies the default payment status of unseen data more effectively than KNN, with higher accuracy at every training-test proportion tested:
| Classification algorithm | Training-test proportion | Accuracy score | Error rate |
|---|---|---|---|
| K-nearest neighbours | 80% / 20% | 0.742 | 0.258 |
| K-nearest neighbours | 60% / 40% | 0.726 | 0.274 |
| K-nearest neighbours | 50% / 50% | 0.732 | 0.268 |
| Decision tree | 80% / 20% | 0.800 | 0.200 |
| Decision tree | 60% / 40% | 0.803 | 0.197 |
| Decision tree | 50% / 50% | 0.789 | 0.211 |
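The error rate reported in the table is the complement of accuracy. Writing TP, TN, FP and FN for the true-positive, true-negative, false-positive and false-negative counts of the confusion matrix:

```latex
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{error rate} = 1 - \text{accuracy} = \frac{FP + FN}{TP + TN + FP + FN}
```

For example, the decision tree at the 60% / 40% split gives 1 - 0.803 = 0.197, and every row of the table satisfies the same identity.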
