BANK CHURN CUSTOMER ANALYSIS
DOMAIN - BFSI
PRESENTED BY
HARSH PAKHARE
PROJECT CONTENTS
1. INTRODUCTION
2. PROBLEM IDENTIFICATION
3. ATTRIBUTE/FEATURE DESCRIPTION
4. EXPLORATORY DATA ANALYSIS
5. MODEL BUILDING
6. BUILDING CLASSIFICATION MODEL
7. RESULT AND CONCLUSION
INTRODUCTION
• In the dynamic landscape of the banking industry,
retaining customers is paramount for sustained success.
• Customer churn, or the loss of customers, poses
challenges that this project aims to address through
data-driven insights and proactive strategies.
• This presentation outlines our approach to identifying,
predicting, and mitigating customer churn for the
benefit of our bank and its valued customers.
PROBLEM IDENTIFICATION
• Inadequate Customer Insights
• Data Quality Issues
• Dynamic Market Conditions
• Resource Allocation
• Limited Personalization
• Customer Communication Gaps
ATTRIBUTE/FEATURE DESCRIPTION
CustomerId: Unique ID assigned to each customer
Surname: Customer's last name
Geography: Country where the customer is located
Gender: Customer's gender
Age: Customer's age
Tenure: Number of years the customer has been with the bank
Balance: Amount remaining in the customer's account
Exited: Target variable; whether the customer left the bank (churned)
EXPLORATORY DATA ANALYSIS
• IMPORT DATA:
import pandas as pd
data = pd.read_csv('/content/drive/MyDrive/Classroom/BIA/ML/Churn_Modelling.csv')
• FIND MISSING VALUES:
No missing values were found in any column.
• FINDING FEATURES WITH ONE VALUE:
No feature contains only a single value, so there are no constant columns to drop.
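Both checks above can be sketched in pandas as follows. This is a minimal illustration on a small hypothetical DataFrame, not the real Churn_Modelling.csv; the column subset and values are made up.

```python
import pandas as pd

# Hypothetical stand-in for the churn data (made-up values, not the real file)
data = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Age": [42, 41, 45, 39],
    "Balance": [0.0, 83000.0, 125000.0, 0.0],
    "Exited": [1, 0, 1, 0],
})

# Number of missing values in each column
missing = data.isnull().sum()
print(missing)

# Columns that hold only one distinct value (NaN ignored by nunique)
single_value_cols = [col for col in data.columns if data[col].nunique() == 1]
print(single_value_cols)
```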
• CHECKING WHETHER THE TARGET IS BALANCED
• The data is highly imbalanced: far fewer customers churned (Exited = 1) than stayed (Exited = 0).
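A minimal sketch of the balance check, assuming the target is the 0/1 `Exited` column. The 8-to-2 split below is hypothetical and only illustrates how the class shares and the majority/minority ratio are read off.

```python
import pandas as pd

# Hypothetical target values: 1 = churned (Exited), 0 = retained
y = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="Exited")

counts = y.value_counts()                # absolute count per class
shares = y.value_counts(normalize=True)  # proportion per class (sums to 1)
print(counts)
print(shares)

# Majority-to-minority ratio; values well above 1 indicate imbalance
imbalance_ratio = counts.max() / counts.min()
print(imbalance_ratio)
```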
• FINDING CATEGORICAL FEATURE DISTRIBUTION USING COUNTPLOT
• FINDING NUMERICAL FEATURE DISTRIBUTION USING COUNTPLOT
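A countplot simply draws one bar per value, with height equal to that value's count (for example, seaborn's `countplot(data=..., x="Gender", hue="Exited")`). The sketch below computes those same counts in pandas on hypothetical data. For a continuous feature such as Age, a countplot draws one bar per distinct value, so the sketch bins the values first; the bin edges are an illustrative assumption.

```python
import pandas as pd

# Hypothetical sample with one categorical and one numerical feature
data = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Age": [23, 35, 47, 52, 61, 38],
    "Exited": [0, 0, 1, 1, 0, 1],
})

# Categorical feature: counts per category and target class,
# i.e. the bar heights of a countplot with hue="Exited"
gender_counts = pd.crosstab(data["Gender"], data["Exited"])
print(gender_counts)

# Numerical feature: bin first so each bar covers a range of ages
age_bins = pd.cut(data["Age"], bins=[18, 30, 40, 50, 60, 70])
age_counts = age_bins.value_counts().sort_index()
print(age_counts)
```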
• CHECKING OUTLIERS USING BOXPLOT
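By default, matplotlib and seaborn boxplots flag as outliers the points lying more than 1.5 × IQR beyond the first or third quartile. The sketch below applies the same rule numerically; the Age values are hypothetical.

```python
import pandas as pd

# Hypothetical ages with one extreme value
age = pd.Series([30, 32, 35, 36, 38, 40, 41, 44, 92], name="Age")

q1, q3 = age.quantile(0.25), age.quantile(0.75)
iqr = q3 - q1
# Points outside these limits are drawn as outliers (fliers) by a default boxplot
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = age[(age < lower) | (age > upper)]
print(outliers.tolist())
```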
• DROP UNWANTED COLUMNS:
X = data.drop(['CustomerId', 'Surname', 'Exited', 'RowNumber'], axis=1)
y = data['Exited']
RowNumber, CustomerId and Surname are identifiers, so they carry no useful signal
for model building and are dropped. Exited is dropped from the features because it
is the target variable; it is kept separately as y.
• STANDARDIZATION:
Standardization is a preprocessing method that rescales each numerical feature
to a mean of zero and a standard deviation of one, using
z = (x - mean) / standard deviation.
Applying it to all features puts them on the same scale, which prevents features
with larger magnitudes (such as Balance) from dominating scale-sensitive learning
algorithms. Tree-based models, such as the three used in this project, are largely
unaffected by feature scaling.
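A minimal sketch with scikit-learn's `StandardScaler` on hypothetical Age/Balance values. Fitting the scaler on the training rows only and reusing those statistics on the test rows is a common practice assumed here; the deck does not say where the scaler was fitted.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical features: column 0 ~ Age, column 1 ~ Balance
X_train = np.array([[25.0, 0.0], [40.0, 50000.0], [55.0, 100000.0]])
X_test = np.array([[30.0, 75000.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training rows
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
```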
• LABEL ENCODER:
As identified in the EDA, the data has 3 categorical features including the target
column: Geography, Gender and Exited. Before model building these are converted into
numerical features with a label encoder. Exited is already stored as 0/1, so in
practice only Geography and Gender need encoding.
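A sketch of the encoding step with scikit-learn's `LabelEncoder`, one encoder per column, on hypothetical rows. `LabelEncoder` numbers the classes in sorted order, so France = 0, Germany = 1, Spain = 2. scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` performs the same mapping for feature columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows with the two categorical features
data = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Female", "Male"],
})

encoders = {}
for col in ["Geography", "Gender"]:
    enc = LabelEncoder()
    data[col] = enc.fit_transform(data[col])  # classes numbered in sorted order
    encoders[col] = enc                       # kept so codes can be decoded later

print(data)
print(list(encoders["Geography"].classes_))
```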
MODEL BUILDING
• THE DATA IS HIGHLY IMBALANCED, SO WE USED OVERSAMPLING:
The minority class (churned customers) was oversampled to reduce the imbalance before training.
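The deck does not name the oversampling method. Common options are random oversampling or SMOTE from the imbalanced-learn library. The sketch below assumes plain random oversampling written in pandas: minority rows are drawn with replacement until both classes have the same size. It is applied to a hypothetical training split only. Oversampling after the train/test split is generally recommended, so that copies of the same churned customer cannot appear in both sets.

```python
import pandas as pd

# Hypothetical training split: 6 retained (0) vs 2 churned (1) customers
train = pd.DataFrame({
    "Age": [25, 31, 38, 44, 50, 57, 46, 62],
    "Exited": [0, 0, 0, 0, 0, 0, 1, 1],
})

majority = train[train["Exited"] == 0]
minority = train[train["Exited"] == 1]

# Random oversampling: resample minority rows with replacement up to the majority size
minority_up = minority.sample(n=len(majority), replace=True, random_state=42)
balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)  # shuffle rows

print(balanced["Exited"].value_counts())
```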
• SPLITTING DATASET:
The dataset is split into training and test sets in an 80% / 20% ratio,
where x = independent variables (the features)
and y = dependent variable (the target, Exited).
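An 80/20 split can be sketched with scikit-learn's `train_test_split`, using hypothetical random features. `stratify=y` and `random_state=42` are illustrative choices not stated in the deck: stratification keeps the churn share equal in both parts, and the fixed seed makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 3 features, 20% churned
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = np.array([1] * 20 + [0] * 80)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y  # 80/20 split, churn share preserved
)
print(x_train.shape, x_test.shape)
print(y_test.mean())
```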
BUILDING CLASSIFICATION MODEL
• WE COMPARED 3 ALGORITHMS TO FIND THE BEST ACCURACY:
• DECISION TREE
• XGBOOST CLASSIFIER
• RANDOM FOREST CLASSIFIER
• DECISION TREE:
Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but it is mostly used for classification.
It splits the data through a sequence of if/else tests on feature values, and each
leaf assigns a predicted class.
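A minimal sketch of fitting scikit-learn's `DecisionTreeClassifier`. The data is a synthetic, imbalanced stand-in generated with `make_classification`, not the churn dataset. `max_depth=5` is an illustrative cap that limits overfitting, not a value taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the churn features (not the real data)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

tree = DecisionTreeClassifier(max_depth=5, random_state=0)  # depth cap limits overfitting
tree.fit(X_train, y_train)

test_accuracy = tree.score(X_test, y_test)  # mean accuracy on the held-out rows
print(tree.get_depth(), test_accuracy)
```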
• RANDOM FOREST CLASSIFIER:
Random Forest is a popular supervised machine learning algorithm that can be used
for both classification and regression problems. It is based on ensemble learning:
it trains many decision trees, each on a bootstrap sample of the data with a random
subset of features considered at each split, and combines their predictions.
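The same synthetic setup with scikit-learn's `RandomForestClassifier`. `n_estimators=200` is an illustrative choice, not a value from the project. Each fitted tree is available in `estimators_`, and the forest's prediction combines them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn features (not the real data)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 200 trees, each grown on a bootstrap sample with random feature subsets per split
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

forest_accuracy = forest.score(X_test, y_test)
print(len(forest.estimators_), forest_accuracy)  # number of trees, test accuracy
```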
• XGBOOST CLASSIFIER:
XGBoost is an optimized distributed gradient boosting library designed for efficient
and scalable training of machine learning models. It is an ensemble learning method
that combines the predictions of many weak models (decision trees) into a stronger
one: trees are added one at a time, each fitted to correct the errors of the trees
before it.
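A sketch using the `xgboost` package's scikit-learn-style `XGBClassifier` on the same synthetic stand-in data. The package is treated as an optional dependency: the sketch skips training if it is not installed. `n_estimators`, `max_depth` and `learning_rate` are illustrative values, not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier  # optional dependency: pip install xgboost
except ImportError:
    XGBClassifier = None

# Synthetic, imbalanced stand-in for the churn features (not the real data)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

xgb_accuracy = None
if XGBClassifier is not None:
    # Boosting: trees are added sequentially, each correcting the current ensemble;
    # learning_rate shrinks each new tree's contribution
    model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    model.fit(X_train, y_train)
    xgb_accuracy = model.score(X_test, y_test)
print(xgb_accuracy)
```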
RESULT AND CONCLUSION
Accuracy by model:
• RANDOM FOREST: 92.39% (best)
• XGBOOST: 91.65%
• DECISION TREE: 88.38%
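Because the target is imbalanced, accuracy alone can look high even for a model that never predicts churn. The hypothetical example below shows this with scikit-learn's metric functions. Reporting recall for the churn class next to accuracy would make a comparison like the one above more informative.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical test labels: 8 retained (0), 2 churned (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A trivial model that predicts "retained" for every customer
y_pred = [0] * 10

acc = accuracy_score(y_true, y_pred)  # share of correct predictions
rec = recall_score(y_true, y_pred)    # share of actual churners that were caught
print(acc, rec)
```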
THANK YOU!
