Employee Churn Prediction: Artificial Intelligence Project Presentation

BANK CHURN CUSTOMER ANALYSIS
DOMAIN - BFSI
PRESENTED BY
HARSH PAKHARE

PROJECT CONTENTS
1. INTRODUCTION
2. PROBLEM IDENTIFICATION
3. ATTRIBUTE/FEATURE DESCRIPTION
4. EXPLORATORY DATA ANALYSIS
5. MODEL BUILDING
6. BUILDING CLASSIFICATION MODEL
7. RESULT AND CONCLUSION

INTRODUCTION
• In the dynamic landscape of the banking industry,
retaining customers is paramount for sustained success.
• Customer churn, or the loss of customers, poses
challenges that this project aims to address through
data-driven insights and proactive strategies.
• This presentation outlines our approach to identifying,
predicting, and mitigating customer churn for the
benefit of our bank and its valued customers.

PROBLEM IDENTIFICATION
• Inadequate Customer Insights
• Data Quality Issues
• Dynamic Market Conditions
• Resource Allocation
• Limited Personalization
• Customer Communication Gaps

ATTRIBUTE/FEATURE DESCRIPTION
CustomerId: Unique ID assigned to the customer
Surname: Customer's last name
Geography: The location the customer belongs to
Gender: Customer's gender
Age: Customer's age
Tenure: How long the customer has been with the bank
Balance: The amount remaining in the account

EXPLORATORY DATA ANALYSIS
• IMPORT DATA:
df=pd.read_csv('/content/drive/MyDrive/Classroom/BIA/ML/Churn_Modelling.csv')
• FIND MISSING VALUES:
No Missing Values
• FINDING FEATURES WITH ONE VALUE:
No features with one value
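
The two checks above can be sketched with pandas. The frame below is a hypothetical toy example, not the project's data (which reported no missing values and no single-value features), and the helper names `missing_counts` and `constant_columns` are illustrative.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
toy = pd.DataFrame({
    "Age": [42.0, 41.0, None, 39.0],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Flag": ["A", "A", "A", "A"],
})

def missing_counts(frame):
    """Number of missing values in each column."""
    return frame.isnull().sum()

def constant_columns(frame):
    """Names of columns that hold a single distinct value."""
    return [col for col in frame.columns if frame[col].nunique(dropna=False) == 1]

print(missing_counts(toy))
print(constant_columns(toy))
```

On the real data, both helpers would be called on the loaded DataFrame instead of `toy`.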

• CHECKING WHETHER THE TARGET IS BALANCED:
The data is highly imbalanced.
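
One way to check the balance of the target is to look at the share of rows in each class. This is a minimal sketch on a made-up target column; the name `class_share` and the toy values are assumptions, not the project's figures.

```python
import pandas as pd

def class_share(target):
    """Fraction of rows in each class of the target column."""
    return target.value_counts(normalize=True).sort_index()

# Hypothetical toy target: 1 marks a customer who left.
toy_target = pd.Series([0, 0, 0, 0, 1], name="Exited")
share = class_share(toy_target)
print(share)
```

A share far from an even split, as in this toy series, is the kind of imbalance the slide refers to.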

• FINDING CATEGORICAL FEATURE DISTRIBUTION
USING COUNTPLOT

• FINDING NUMERICAL FEATURE DISTRIBUTION USING
COUNTPLOT

• CHECKING OUTLIERS USING BOXPLOT
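
A box plot conventionally draws its whiskers at 1.5 times the interquartile range (IQR) beyond the quartiles and marks points outside them as outliers. The same rule can be applied numerically. This is a sketch on made-up values, and `iqr_outliers` is an illustrative name.

```python
import numpy as np

def iqr_outliers(values, whisker=1.5):
    """Return the values lying outside the box-plot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical toy column with one extreme value.
toy_values = [1, 2, 3, 4, 100]
print(iqr_outliers(toy_values))
```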

• DROP UNWANTED COLUMNS:
data=data.drop(['CustomerId','Surname','Exited','RowNumber'],axis=1)
We dropped CustomerId, Surname, and RowNumber because they have little impact
on model building, and Exited because it is the target variable.
• STANDARDIZATION:
Standardization is a preprocessing method used to transform numerical
data by scaling it to have a mean of zero and a standard deviation of one.
This transformation is applied to all features ensuring that they have the
same scale, thus preventing features with larger magnitudes from
dominating the learning algorithm.
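
The transformation described above is z = (x - mean) / std, applied per column. A minimal sketch with scikit-learn's `StandardScaler` follows; the input values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for two numerical columns on very different scales.
toy_features = np.array([
    [25.0, 1000.0],
    [35.0, 50000.0],
    [45.0, 120000.0],
    [55.0, 0.0],
])

scaler = StandardScaler()
scaled = scaler.fit_transform(toy_features)  # z = (x - mean) / std, per column

print(scaled.mean(axis=0))  # each column mean is approximately 0
print(scaled.std(axis=0))   # each column standard deviation is approximately 1
```

A common practice is to fit the scaler on the training portion only and reuse it to transform the test portion.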
• LABEL ENCODER:
As analyzed in EDA, we have a total of 3 categorical features, including
the target column. Before model building, we convert them into
numerical features with the help of a label encoder.
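
Label encoding can be sketched with scikit-learn's `LabelEncoder`, which numbers the classes in sorted order. The category values below are hypothetical examples.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values for a categorical column.
toy_geography = ["France", "Spain", "Germany", "France"]

encoder = LabelEncoder()
codes = encoder.fit_transform(toy_geography)

print(list(encoder.classes_))  # classes in sorted order
print(codes.tolist())          # integer code for each row
```

The fitted encoder can map codes back to labels with `inverse_transform`, which helps when reading model output.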

MODEL BUILDING
• DATA IS HIGHLY IMBALANCED, SO WE HAVE USED OVERSAMPLING
• SPLITTING DATASET:
We split the dataset in an 80% - 20% train-test ratio,
where x = independent variables
y = dependent variable
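
The slides do not say which oversampling method was used. The sketch below assumes plain random oversampling (duplicating minority-class rows) followed by an 80% - 20% split. The helper `random_oversample` and the toy arrays are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def random_oversample(x, y, random_state=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    x = np.asarray(x)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target_size = counts.max()
    keep = [np.arange(len(y))]  # start with every original row
    for label, count in zip(classes, counts):
        if count < target_size:
            idx = np.flatnonzero(y == label)
            keep.append(rng.choice(idx, size=target_size - count, replace=True))
    keep = np.concatenate(keep)
    return x[keep], y[keep]

# Hypothetical imbalanced toy data: 10 rows of class 0, 5 rows of class 1.
x = np.arange(30).reshape(15, 2)
y = np.array([0] * 10 + [1] * 5)

x_bal, y_bal = random_oversample(x, y)
x_train, x_test, y_train, y_test = train_test_split(
    x_bal, y_bal, test_size=0.2, random_state=0, stratify=y_bal
)
print(np.bincount(y_bal), len(x_train), len(x_test))
```

This follows the order given on the slide. A common alternative is to split first and oversample only the training portion, so that duplicated rows cannot appear in the test set.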

BUILDING CLASSIFICATION MODEL
• WE HAVE USED 3 ALGORITHMS TO FIND THE BEST
ACCURACY:
• DECISION TREE
• XGBOOST CLASSIFIER
• RANDOM FOREST CLASSIFIER

• DECISION TREE:
Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems.

• RANDOM FOREST CLASSIFIER:
Random Forest is a popular supervised machine learning algorithm. It can be
used for both Classification and Regression problems in ML, and it is based
on the concept of ensemble learning.

• XGBOOST CLASSIFIER:
XGBoost is an optimized distributed gradient boosting library designed
for efficient and scalable training of machine learning models. It is an
ensemble learning method that combines the predictions of multiple weak
models to produce a stronger prediction.
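
Comparing the three classifiers by accuracy can be sketched as follows. The data is synthetic (`make_classification`), the hyperparameters are arbitrary assumptions, and XGBoost is included only if the `xgboost` package is installed, so the printed scores will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; the project used the bank churn dataset instead.
x, y = make_classification(n_samples=500, n_features=8, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
try:
    # Optional dependency: skipped when xgboost is not installed.
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(n_estimators=50, random_state=0)
except ImportError:
    pass

# Fit each model on the training split and score it on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    accuracy[name] = accuracy_score(y_test, model.predict(x_test))

for name, score in sorted(accuracy.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2%}")
```

On an imbalanced target, accuracy alone can look favourable even when the minority class is predicted poorly, so precision and recall are often reported alongside it.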

RESULT AND CONCLUSION
RANDOM FOREST gave the best accuracy: 92.39%
XGBOOST: 91.65%
DECISION TREE: 88.38%
