- This case study aims to identify patterns that indicate whether a client will have difficulty paying their instalments; such patterns may be used to take actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate.
- This will ensure that consumers capable of repaying the loan are not rejected.
- Identifying such applicants using EDA is the aim of this case study.
2. Business Understanding
• If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
• If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.
This case study aims to identify patterns that indicate whether a client will have difficulty paying their instalments. These patterns may be used to take actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate. This will ensure that consumers capable of repaying the loan are not rejected. Identifying such applicants using EDA is the aim of this case study.
3. What are the Datasets provided for Analysis?
❑ There are two major datasets provided:
1. Application Data
2. Previous Application Data
❑ These files are provided in Comma-Separated Values (.csv) format.
❑ Another file with column descriptions is provided for defining and understanding each column's contribution.
❑ Prerequisites:
1. Programming Language: Python
2. Platform: Jupyter Notebook
3. Libraries: Pandas, NumPy, Matplotlib, Seaborn, itertools, and warnings
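As a minimal sketch of this setup: in the project the full files would be read with `pd.read_csv("Application_data.csv")` and `pd.read_csv("previous_application.csv")`; here a tiny hypothetical two-row sample (column names assumed from the dataset) stands in so the snippet is self-contained:

```python
import io
import pandas as pd

# In the project:
#   ap_dt = pd.read_csv("Application_data.csv")
#   pr_ap_dt = pd.read_csv("previous_application.csv")
# Hypothetical two-row stand-in for Application_data.csv:
sample_csv = io.StringIO(
    "SK_ID_CURR,TARGET,AMT_INCOME_TOTAL\n"
    "100001,0,202500.0\n"
    "100002,1,135000.0\n"
)
ap_dt = pd.read_csv(sample_csv)
print(ap_dt.shape)  # (rows, columns)
```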
4. Assigning Variables:

Description | Assigned Variable
Data Set 1: "Application_data.csv" | ap_dt
Data Set 2: "previous_application.csv" | pr_ap_dt
Null values | nulls
Total null-value counts | mis_val
Columns in ap_dt with > 50% null values | nul_50
Columns in ap_dt with > 15% null values | nul_15
Relevant values | nrel
Flag columns | col_flag
All flag columns and the Target column | dt_flg

Note: Many other variables used in the data analysis process are mentioned in the Jupyter Notebook.
5. Data Understanding:

1. Application_data.csv [ap_dt]
➢ Number of Columns: 122
➢ Number of Rows: 307511
➢ Data Types: Integer (int64: 41), Float (float64: 65), String (object: 16)
➢ Descriptive view of the data file: there were anomalies such as negative numbers, null values, and days and years not in a proper format.

2. Previous_Application_data.csv [pr_ap_dt]
➢ Number of Columns: 37
➢ Number of Rows: 1670214
➢ Data Types: Integer (int64: 6), Float (float64: 15), String (object: 16)
➢ Descriptive view of the data file: the same anomalies appear here, including negative numbers, null values, and days and years not in a proper format.
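The counts above come from the standard pandas inspection calls; a sketch on a small hypothetical frame with one integer, one float, and one object column:

```python
import pandas as pd

# Hypothetical mini-frame: one int64, one float64, one object column
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})
print(df.shape)                  # (rows, columns)
print(df.dtypes.value_counts())  # how many columns of each dtype
```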
6. Data Cleaning & Manipulation for Application Data:
• Rectify the null values.
• Filter out unwanted data columns.
• Fill the missing values.
• Sort the data.
• Fix the datatypes.
How we did that? To remove unwanted or irrelevant columns:
• First, I calculated the null values with "nulls(ap_dt)".
• Then I expressed those counts as percentages.
• I found that more than 41 columns consist of over 50% null values.
• By comparing each column's contribution against the provided Columns_Description.csv file, I removed the irrelevant columns.
• Similarly, after removing those columns, 10 columns remained with more than 15% null values.
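The null-percentage step can be sketched as below, using the deck's variable names (`nulls`, `mis_val`, `nul_50`, `nul_15`); the toy frame is hypothetical, standing in for ap_dt:

```python
import numpy as np
import pandas as pd

def nulls(df):
    """Null count (mis_val) and null percentage for every column."""
    mis_val = df.isnull().sum()
    pct = 100 * mis_val / len(df)
    return pd.DataFrame({"mis_val": mis_val, "pct": pct})

# Hypothetical toy frame: 'b' is 60% null, 'c' is 20% null
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [np.nan, np.nan, np.nan, 4, 5],
    "c": [1, np.nan, 3, 4, 5],
})
report = nulls(df)
nul_50 = report.index[report["pct"] > 50].tolist()  # columns to drop outright
nul_15 = report.index[(report["pct"] > 15) & (report["pct"] <= 50)].tolist()
print(nul_50, nul_15)  # ['b'] ['c']
```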
7. Data Cleaning & Manipulation for Application Data: Correlation & Causation
• After double-checking the columns with more than 15% null values, some turned out to be data columns sourced externally.
• Source columns: EXT_SOURCE_2 & EXT_SOURCE_3.
• What is the relation between these two values? As per the column-description data file, these are normalized values from an external data source.
8. Data Cleaning & Manipulation for Application Data: Analysing the EXT_SOURCE_2 & EXT_SOURCE_3, Flag & Target columns
• From the correlation heatmap above, we found that these columns have no meaningful relation to, and contribute little toward, the target.
• They do not indicate causation either.
• So, on this basis, I removed the EXT_SOURCE_2 & EXT_SOURCE_3 columns.
• After removing all these columns, we are left with 71 columns.
• These 71 columns include 28 flag columns, in which email, phone, car, work, and other important data are stored.
• To analyse the flag data, I combined all the flag columns in one variable, "col_flag".
• This includes the "Target" variable, defined as: 1 - client with payment difficulties (he/she had a late payment of more than X days on at least one of the first Y instalments of the loan in our sample); 0 - all other cases.
• For the analysis we need to distinguish repayers from defaulters, so I changed the values from 1's and 0's to "Defaulter" and "Repayer".
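The flag-gathering and relabelling steps can be sketched as follows; the three-row frame is a hypothetical stand-in for ap_dt:

```python
import pandas as pd

# Hypothetical mini-frame with a TARGET column and two flag columns
ap_dt = pd.DataFrame({
    "TARGET": [1, 0, 0],
    "FLAG_MOBIL": [1, 1, 0],
    "FLAG_OWN_REALTY": [0, 1, 1],
})
# Gather every FLAG_* column (the deck's col_flag) plus TARGET into dt_flg
col_flag = [c for c in ap_dt.columns if c.startswith("FLAG_")]
dt_flg = ap_dt[col_flag + ["TARGET"]].copy()
# Relabel 1/0 as Defaulter/Repayer so plots read naturally
dt_flg["TARGET"] = dt_flg["TARGET"].map({1: "Defaulter", 0: "Repayer"})
print(dt_flg["TARGET"].tolist())  # ['Defaulter', 'Repayer', 'Repayer']
```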
9. Analyzing Flag Columns & Target Column:
Bar Graph Analysis:
• Observing the graph, the flag columns most associated with defaulters are:
• FLAG_OWN_REALTY,
• FLAG_MOBIL,
• FLAG_EMP_PHONE,
• FLAG_CONT_MOBILE,
• FLAG_DOCUMENT_3
• Of these, the following show enough relation to the target to retain:
• FLAG_DOCUMENT_3,
• FLAG_OWN_REALTY,
• FLAG_MOBIL
• All the other FLAG columns can be removed.
10. Imputing Values:
• Among those 10 columns, "OCCUPATION_TYPE", which describes the applicant's occupation, had 31% null values.
• I filled those null values with the label "Unknown".
• Highest percentage: "Unknown"
• Second-highest percentage: "Laborers"
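The "Unknown" imputation can be sketched as below; the occupation sample is hypothetical.

```python
import pandas as pd
import numpy as np

# Hypothetical sample of the OCCUPATION_TYPE column with nulls.
occ = pd.Series(["Laborers", np.nan, "Drivers", np.nan, "Laborers", np.nan])

# Fill nulls with an explicit "Unknown" category instead of dropping rows.
occ = occ.fillna("Unknown")

# Category shares, largest first.
shares = occ.value_counts(normalize=True)
print(shares.to_dict())
```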
11. Standardize the Values:
Columns with very large values:
AMT_INCOME_TOTAL,
AMT_CREDIT,
AMT_GOODS_PRICE
We convert these numerical columns into categorical ranges for easier interpretation.
Columns with negative values:
DAYS_BIRTH,
DAYS_EMPLOYED,
DAYS_REGISTRATION,
DAYS_ID_PUBLISH,
DAYS_LAST_PHONE_CHANGE.
These need correcting: convert DAYS_BIRTH to AGE in years and DAYS_EMPLOYED to YEARS_EMPLOYED.
12. Standardizing the AMT_INCOME_TOTAL, AMT_CREDIT, AMT_GOODS_PRICE columns:
• Values run from 0 into lakhs, so we bucket them into ranges (bin edges expressed in lakhs).
• Income range, 0 to 10 lakhs and above:
bins = [0,1,2,3,4,5,6,7,8,9,10,11]
slots = ['0-1L','1L-2L','2L-3L','3L-4L','4L-5L','5L-6L','6L-7L','7L-8L','8L-9L','9L-10L','10L Above']
• Credit range, 0 to 10 lakhs and above:
bins = [0,1,2,3,4,5,6,7,8,9,10,100]
slots = ['0-1L','1L-2L','2L-3L','3L-4L','4L-5L','5L-6L','6L-7L','7L-8L','8L-9L','9L-10L','10L Above']
• Price of goods, 0 to 10 lakhs and above:
bins = [0,1,2,3,4,5,6,7,8,9,10,100]
slots = ['0-1L','1L-2L','2L-3L','3L-4L','4L-5L','5L-6L','6L-7L','7L-8L','8L-9L','9L-10L','10L Above']
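The binning above maps directly onto pandas' `pd.cut`; the income values below are hypothetical.

```python
import pandas as pd

# Hypothetical incomes in rupees; divide by 1e5 to express them in lakhs.
income = pd.Series([50_000, 250_000, 1_200_000, 750_000])

bins = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]   # bin edges in lakhs
slots = ['0-1L', '1L-2L', '2L-3L', '3L-4L', '4L-5L', '5L-6L',
         '6L-7L', '7L-8L', '8L-9L', '9L-10L', '10L Above']

# pd.cut maps each value to its labelled range.
income_range = pd.cut(income / 1e5, bins=bins, labels=slots)
print(income_range.tolist())
```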
13. Standardizing Negative Columns
As mentioned above, the columns with negative values are:
◦ DAYS_BIRTH,
◦ DAYS_EMPLOYED,
◦ DAYS_REGISTRATION,
◦ DAYS_ID_PUBLISH,
◦ DAYS_LAST_PHONE_CHANGE.
Using the absolute-value function, we convert the negative values to positive values.
Before: negative values
14. Standardizing Negative Columns
After applying the absolute-value function, the same columns now hold positive values:
◦ DAYS_BIRTH,
◦ DAYS_EMPLOYED,
◦ DAYS_REGISTRATION,
◦ DAYS_ID_PUBLISH,
◦ DAYS_LAST_PHONE_CHANGE.
After: positive values
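The absolute-value conversion and the derived year columns can be sketched as below; the DAYS_* values are hypothetical.

```python
import pandas as pd

# Hypothetical DAYS_* values, stored as negative day counts
# relative to the application date.
df = pd.DataFrame({"DAYS_BIRTH": [-12000, -15000],
                   "DAYS_EMPLOYED": [-1200, -365]})

day_cols = ["DAYS_BIRTH", "DAYS_EMPLOYED"]
df[day_cols] = df[day_cols].abs()              # negative -> positive

# Derive year-based columns for readability.
df["AGE"] = (df["DAYS_BIRTH"] / 365).round(1)
df["YEARS_EMPLOYED"] = (df["DAYS_EMPLOYED"] / 365).round(1)
print(df)
```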
15. Finding the Outliers
• Most outliers: AMT_ANNUITY, AMT_CREDIT, AMT_GOODS_PRICE, CNT_CHILDREN
• Few outliers: AMT_INCOME_TOTAL
• No outliers: DAYS_BIRTH
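One common way to flag these outliers is the IQR rule, sketched below on a hypothetical annuity sample.

```python
import pandas as pd

# Hypothetical annuity amounts with one extreme value.
s = pd.Series([10_000, 12_000, 11_500, 13_000, 250_000])

# Classic IQR rule: anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
# counts as an outlier (the same points a boxplot would flag).
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
outliers = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]
print(outliers.tolist())
```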
16. ▪ Summary of Application_Data.csv:
▪ 307,511 rows & 53 columns.
▪ Datatypes present:
▪ integers,
▪ floats,
▪ strings.
▪ Found the null values and filled them with the "Unknown" label.
▪ Removed unwanted and irrelevant columns.
▪ Converted the negative values in some columns to positive values.
▪ Converted values into a proper format.
▪ The file is now clean for further processing.
Summary on Datasets: Application_Data.csv
17. ▪ Summary of Previous_Application_Data.csv:
▪ 1,670,214 rows & 37 columns.
▪ Datatypes present:
▪ integers,
▪ floats,
▪ strings.
▪ Found the null values and filled them with the "Unknown" label.
▪ Removed unwanted and irrelevant columns.
▪ Converted the negative values in some columns to positive values.
▪ Converted values into a proper format.
▪ The file is now clean for further processing.
Summary on Datasets: Previous_Application_Data.csv
18. 1. Plotting a KDE plot for "AMT_GOODS_PRICE" to understand its distribution.
Data Set Analysis using Graphical Representation.
Analyzing the Data using KDE plots:
• There are several peaks along the distribution. Let's impute using the mode, mean, and median and see whether the distribution stays about the same.
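The mode/mean/median comparison can be set up as below; the price sample is hypothetical, and each variant would get its own KDE overlay (e.g. seaborn's `sns.kdeplot`).

```python
import pandas as pd
import numpy as np

# Hypothetical goods-price sample with missing values.
s = pd.Series([100_000, 150_000, 150_000, 300_000, np.nan, np.nan])

# Candidate imputations to compare against the original distribution.
imputed = {
    "mean":   s.fillna(s.mean()),
    "median": s.fillna(s.median()),
    "mode":   s.fillna(s.mode()[0]),
}
print({k: v.iloc[-1] for k, v in imputed.items()})
```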
19. 2. Plotting a KDE plot to understand the distribution of "AMT_ANNUITY".
* There is a single peak at the left side of the distribution, indicating the presence of outliers; imputing with the mean would therefore not be the right approach, so we impute with the median.
20. The original distribution is closest to the distribution of data imputed with the mode in this case, so we will impute the missing values with the mode.
22. Repayers & Defaulters
• Repayer percentage: 91.93%
• Defaulter percentage: 8.07%
• The imbalance ratio of repayers to defaulters is approximately 11.39 : 1.
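The percentages and imbalance ratio follow directly from the class counts; the counts below are illustrative figures chosen to be consistent with the reported 91.93% / 8.07% split.

```python
import pandas as pd

# Class counts consistent with the reported split (illustrative).
counts = pd.Series({"Repayer": 282_686, "Defaulter": 24_825})

pct = counts / counts.sum() * 100
imbalance = counts["Repayer"] / counts["Defaulter"]
print(pct.round(2).to_dict(), round(imbalance, 2))
```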
23. Gender-wise Analysis
Based on the percentage of defaulted credits, males have a higher chance of not returning their loans compared with women.
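The gender-wise default rate behind this claim can be computed with a groupby; the sample frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample of gender vs. the recoded target.
df = pd.DataFrame({
    "CODE_GENDER": ["M", "F", "M", "F", "F", "M", "F", "F"],
    "TARGET": ["Defaulter", "Repayer", "Repayer", "Repayer",
               "Repayer", "Defaulter", "Defaulter", "Repayer"],
})

# Default rate per gender: share of defaulters within each group.
rate = (df.groupby("CODE_GENDER")["TARGET"]
          .apply(lambda s: (s == "Defaulter").mean() * 100))
print(rate.round(1).to_dict())
```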
Analyzing Univariate, Bivariate, Multivariate:
Categorical Univariate Variables Analysis
24. Education-wise Analysis
The majority of clients have Secondary/secondary special education, followed by clients with Higher education. Very few clients have an academic degree. The Lower secondary category has the highest defaulter rate, while people with an Academic degree are the least likely to default.
25. Income-wise Analysis
Most loan applicants have the income type Working, followed by Commercial associate, Pensioner, and State servant.
Applicants on Maternity leave have the highest defaulting percentage (40%), followed by the Unemployed (37%). The rest average around 10% defaulters.
Students and Businessmen, though few in number, have no default record, making them the two safest categories for granting loans.
26. Contract-wise Analysis
Contract type: Revolving loans are just a small fraction (10%) of the total number of loans. Around 8-9% of Cash loan applicants and 5-6% of Revolving loan applicants are defaulters.
27. Real Estate Analysis
Clients who own real estate are more than double those who don't. The defaulting rates of the two categories are about the same (~8%), so we can infer that there is no correlation between owning real estate and defaulting on the loan.
28. Family Analysis
Most of the people who have taken a loan are married, followed by Single/not married and Civil marriage. Among defaulters, Civil marriage has the highest percentage and Widow the lowest.
29. Occupation Analysis
The categories with the highest percentage of defaulters are Low-skill Laborers (above 17%), followed by Drivers, Waiters/barmen staff, Security staff, Laborers, and Cooking staff. IT staff are the least likely to apply for a loan.
30. No. of Family Members Analysis
31. Numerical Univariate Analysis
When the credit amount goes beyond 30 lakhs, there is an increase in defaulters.
32. 90% of the previously cancelled clients have actually repaid the loan; revising the interest rates could increase business opportunity with these clients. 88% of the clients who were previously refused a loan have paid back the loan in the current application. The refusal reason should be recorded for further analysis, as these clients could turn into potential repaying customers.
33. Clients whose average DEF_60_CNT_SOCIAL_CIRCLE score is 0.13 or higher tend to default more, so analysing the client's social circle could help in deciding loan disbursement.