ASSIGNMENT 4 - CLUSTERING
PROBLEM 1 - CREDIT CARD DATASET FOR CLUSTERING
 Steps performed:
 EDA
 Handling missing values
 Building K-means clustering model
 Hyperparameter tuning of the model
 Characterize customers who
(a) make frequent purchases using the credit card
(b) have high balances in their account
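The steps above can be sketched as follows. This is a minimal sketch, not the assignment's actual code: it assumes scikit-learn is used, that missing values are filled with the column median, that features are standardized, and that the number of clusters is tuned by silhouette score. The slides do not state which imputation strategy or tuning criterion was used, and the synthetic data and the names `prepare_features` and `choose_k` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def prepare_features(X):
    """Fill missing values with the column median (assumed strategy), then standardize."""
    X_filled = SimpleImputer(strategy="median").fit_transform(X)
    return StandardScaler().fit_transform(X_filled)


def choose_k(X_scaled, k_values, random_state=0):
    """Fit K-means for each candidate k and return the k with the best silhouette score."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X_scaled)
        scores[k] = silhouette_score(X_scaled, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in for the credit card data: three well-separated groups
# and one missing value. These are made-up numbers, not the real dataset.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
X[0, 1] = np.nan

X_scaled = prepare_features(X)
best_k, scores = choose_k(X_scaled, range(2, 7))
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)
```

An elbow plot of `inertia_` against k is a common alternative to the silhouette score; either could stand behind the "hyperparameter tuning" step.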
CHARACTERIZE CUSTOMERS – BALANCE, PURCHASE FREQUENCY
Cluster 2 groups customers with:
High balance and high purchase frequency
High amount of installment purchases
High amount of cash advance transactions
The highest credit limit
Cluster 1 groups customers with:
Low balance but moderate purchase frequency
Low amount of cash advance transactions, installment purchases, and credit limit
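A cluster profile like the one above is typically read off per-cluster feature averages. The sketch below assumes pandas and a `cluster` column holding the K-means labels; the column names `BALANCE`, `PURCHASES_FREQUENCY`, and `CREDIT_LIMIT` are assumptions about the dataset, and the values are made-up toy numbers, not the assignment's results.

```python
import pandas as pd


def profile_clusters(df, cluster_col, feature_cols):
    """Return the mean of each feature per cluster (one row per cluster)."""
    return df.groupby(cluster_col)[feature_cols].mean()


# Toy records with hypothetical column names and values.
toy = pd.DataFrame({
    "cluster": [1, 1, 2, 2],
    "BALANCE": [100.0, 300.0, 5000.0, 7000.0],
    "PURCHASES_FREQUENCY": [0.4, 0.6, 0.9, 1.0],
    "CREDIT_LIMIT": [1000.0, 2000.0, 12000.0, 15000.0],
})

profile = profile_clusters(
    toy, "cluster", ["BALANCE", "PURCHASES_FREQUENCY", "CREDIT_LIMIT"]
)
# Cluster with the highest average credit limit.
highest_limit_cluster = profile["CREDIT_LIMIT"].idxmax()
```

Comparing rows of `profile` side by side is what supports statements such as "high balance, high purchase frequency" for one cluster versus another.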
PROBLEM 2 - SOUTH GERMAN CREDIT DATA FOR CLUSTERING
 Steps performed:
 EDA
 Checking for missing values and dropping the target column
 Building K-means clustering model
 Hyperparameter tuning of the model
 Predicting good/bad credits
 Characterize debtors who are bad credits
PREDICTING GOOD/BAD CREDITS
The target column is added back to the cluster output dataset.
Records are grouped by cluster and, by majority vote, the most common target value in each cluster is assigned as the predicted value for every record in that cluster.
The predicted value is compared against the actual target value, and accuracy is calculated as correct predictions / total records.
True positives and false negatives are also calculated by comparing the predicted and actual target values.
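The majority-vote procedure above can be sketched in plain Python. This is an illustration under assumptions, not the assignment's code: the function names are hypothetical, the labels are toy values, and which target value counts as the "positive" class is passed in as a parameter because the slides do not say.

```python
from collections import Counter, defaultdict


def majority_vote_predict(clusters, targets):
    """Assign every record the most common target value within its cluster."""
    by_cluster = defaultdict(list)
    for cluster_id, target in zip(clusters, targets):
        by_cluster[cluster_id].append(target)
    # On a tie, Counter.most_common keeps the value seen first in the cluster.
    winners = {c: Counter(vals).most_common(1)[0][0] for c, vals in by_cluster.items()}
    predictions = [winners[c] for c in clusters]
    return predictions, winners


def evaluate(predictions, targets, positive_label):
    """Accuracy = correct predictions / total records, plus TP and FN counts."""
    pairs = list(zip(predictions, targets))
    correct = sum(p == t for p, t in pairs)
    true_pos = sum(p == positive_label and t == positive_label for p, t in pairs)
    false_neg = sum(p != positive_label and t == positive_label for p, t in pairs)
    return {
        "accuracy": correct / len(pairs),
        "true_positives": true_pos,
        "false_negatives": false_neg,
    }


# Toy cluster assignments and target values (made up for illustration).
clusters = [0, 0, 0, 1, 1, 1, 1, 2]
targets = [1, 1, 0, 0, 0, 1, 0, 1]

predictions, winners = majority_vote_predict(clusters, targets)
metrics = evaluate(predictions, targets, positive_label=1)
```

Because every record in a cluster receives the same prediction, accuracy here measures how pure the clusters are with respect to the target, which tends to rise as the number of clusters grows.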
CHARACTERIZING DEBTORS WHO ARE BAD CREDITS
Clusters 7 and 10 are predicted as bad credits.
Based on the statistics for these clusters, bad credits tend to have:
A credit duration of around 40 months and a high credit amount
Another debtor or guarantor for the credit
No valuable property
An age of around 35 years
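Statistics of this kind can be taken by filtering to the clusters predicted as bad and summarizing each column. The sketch below assumes pandas, uses the mean for numeric columns and the mode for categorical ones, and runs on made-up toy rows; the column names `duration` and `other_debtors` and the helper `summarize_clusters` are hypothetical.

```python
import pandas as pd


def summarize_clusters(df, cluster_col, cluster_ids, numeric_cols, categorical_cols):
    """Mean of numeric columns and most frequent value of categorical columns
    for the records that fall in the given clusters."""
    subset = df[df[cluster_col].isin(cluster_ids)]
    summary = {col: float(subset[col].mean()) for col in numeric_cols}
    for col in categorical_cols:
        summary[col] = subset[col].mode().iloc[0]
    return summary


# Toy rows only; these are not the South German Credit records.
toy = pd.DataFrame({
    "cluster": [7, 7, 10, 10, 3, 3],
    "duration": [36, 42, 40, 42, 12, 18],
    "other_debtors": ["guarantor", "guarantor", "co-applicant",
                      "guarantor", "none", "none"],
})

bad_summary = summarize_clusters(
    toy,
    cluster_col="cluster",
    cluster_ids=[7, 10],
    numeric_cols=["duration"],
    categorical_cols=["other_debtors"],
)
```

Running the same summary on the remaining clusters gives a baseline, so that a value such as "around 40 months" can be judged against the good-credit groups rather than in isolation.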

Assignment4_Clustering.pptx
