- The document discusses exploratory data analysis performed on loan application data to identify patterns that can help predict the likelihood of default.
- Univariate analysis was conducted by segmenting categorical and numeric variables based on loan repayment difficulties. Significant variables for rejection/approval were identified.
- Bivariate analysis showed strong correlations between loan amount, income, price of goods, and previous loan amounts - suggesting these should be considered carefully.
- The analysis provides insights to help the company better target loan approval towards applicants able to repay, while avoiding losses from defaulters.
This case study aims to identify patterns that indicate if a client has difficulty paying their installments which may be used for taking actions such as denying the loan, reducing the amount of the loan, lending (to risky applicants) at a higher interest rate, etc. This will ensure that the consumers capable of repaying the loan are not rejected. Identification of such applicants using EDA is the aim of this case study.
In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilize this knowledge for its portfolio and risk assessment.
Credit EDA Case Study
2. BUSINESS OBJECTIVE
To approve the loan applications of the clients who are capable of repaying the loans.
In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilise this knowledge for its portfolio and risk assessment.
EDA on the available data to understand how customer attributes and loan attributes influence the tendency of default.
Identifying the patterns which indicate if a client has difficulty paying their installments, to take different actions like denying the loan, reducing the loan amount, or lending at higher interest rates to risky clients.
3. TYPES OF DECISION ON LOAN APPLICATIONS
Approved: The company has approved the loan application.
Cancelled: The client cancelled the application sometime during approval. Either the client changed her/his mind about the loan or, in some cases, the client was offered worse pricing because of a higher risk and did not want it.
Refused: The company rejected the loan (because the client does not meet its requirements, etc.).
Unused offer: The loan has been cancelled by the client, but at a different stage of the process.
4. Risks Associated with decision
If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.
If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
5. Business understanding
Loan providing companies find it hard to give loans to people who have insufficient or non-existent credit history. Because of that, some consumers use it to their advantage by becoming defaulters.
EDA is used to analyse the patterns present in the data, to ensure that the applicants capable of repaying the loan are not rejected and those who are unlikely to pay are not approved.
7. Results of analysis – Significant CATEGORICAL Variables – For rejecting or approving the client
Based on the EDA so far and the results discussed above, these are the categorical variables to be considered when making the decision on a new client.
Application Data Variables
NAME_CONTRACT_TYPE
NAME_EDUCATION_TYPE
NAME_HOUSING_TYPE
NAME_TYPE_SUITE
FAMILY_STATUS
OCCUPATION_TYPE
NAME_INCOME_TYPE
FLAG_OWN_REALTY
FLAG_OWN_CAR
Merged data variables: The previous application data also has a lot of influence on the decision. Although this is not a complete analysis, these variables are more significant than others.
NAME_PORTFOLIO
NAME_TYPE_SUITE_PREV
NAME_CLIENT_TYPE
NAME_PRODUCT_TYPE
NAME_SELLER_INDUSTRY
NAME_INCOME_TYPE
The insights and results of the analysis of each variable are mentioned along with the plots in the next pages.
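The "_PREV" suffix on the merged-data variables suggests that overlapping column names were disambiguated when the previous-application data was joined to the current applications. A minimal sketch of such a merge with pandas follows. The join key `SK_ID_CURR`, the toy rows, and the frame names `app_df` and `prev_df` are assumptions for illustration and are not taken from the slides.

```python
import pandas as pd

# Hypothetical toy rows; the key name SK_ID_CURR is an assumption.
app_df = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "TARGET": [0, 1, 0],
    "NAME_TYPE_SUITE": ["Unaccompanied", "Family", "Unaccompanied"],
})
prev_df = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "NAME_TYPE_SUITE": ["Family", "Unaccompanied", "Family"],
    "NAME_PORTFOLIO": ["POS", "Cards", "Cash"],
})

# Inner join: keep only clients that have at least one previous application.
# Columns present in both frames keep their name on the application side and
# get the "_PREV" suffix on the previous-application side.
merged_df = pd.merge(
    app_df, prev_df, on="SK_ID_CURR", how="inner", suffixes=("", "_PREV")
)

print(merged_df.columns.tolist())
print(len(merged_df))
```

Because a client can have several previous applications, the merged frame can contain more than one row per client, so counts taken on it are counts of applications rather than of clients.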
8. Univariate Analysis Segmented over TARGET variable: 0 for no payment difficulties and 1 for defaults, in:
- Application Data
- Application and Previous Merged Data
[Plots: Imbalance of data for TARGET variable 0 and 1 in Application Data; Imbalance of data for TARGET variable 0 and 1 in Previous Data]
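The imbalance between TARGET 0 and TARGET 1 can be quantified before any segmented analysis. The sketch below is one possible way to do it with pandas; the toy data and the helper name `target_imbalance` are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical toy data: TARGET is 1 for clients with payment difficulties
# and 0 for all other clients.
app_df = pd.DataFrame({"TARGET": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})


def target_imbalance(df, target_col="TARGET"):
    """Return the share of each TARGET class and the majority/minority ratio."""
    shares = df[target_col].value_counts(normalize=True).sort_index()
    ratio = shares.max() / shares.min()
    return shares, ratio


shares, ratio = target_imbalance(app_df)
print(shares)
print(f"Imbalance ratio: {ratio:.1f} : 1")
```

With a strongly imbalanced target, raw counts of defaulters per category are hard to compare, which is why the later slides are easier to read in terms of percentages within each segment.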
9. Univariate Analysis Segmented over TARGET 0 and 1
Contract Type: Consumer, Cash and Revolving Loans
Application Data: percentage of defaulters (TARGET = 1)
In Cash Loans – 8.35%
In Revolving Loans – 5.48%
Contract type of previous applications: percentage of defaulters (TARGET = 1)
In Cash Loans – 9.12%
In Revolving Loans – 10.46%
In Consumer Loans – 7.70%
On merging the two datasets: Revolving Loans are slightly better in terms of the number of defaulters turning up.
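Percentages like the ones above can be obtained by taking the mean of TARGET within each category, since TARGET is coded 0/1. The following is a minimal sketch under that assumption; the toy rows and the helper name `default_rate_by` are hypothetical, and the numbers it prints are not the figures from the slides.

```python
import pandas as pd

# Hypothetical toy data, not the real application data.
app_df = pd.DataFrame({
    "NAME_CONTRACT_TYPE": ["Cash loans"] * 6 + ["Revolving loans"] * 4,
    "TARGET": [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
})


def default_rate_by(df, column, target_col="TARGET"):
    """Percentage of TARGET = 1 rows within each category of `column`."""
    return (df.groupby(column)[target_col].mean() * 100).round(2)


rates = default_rate_by(app_df, "NAME_CONTRACT_TYPE")
print(rates)
```

The same helper could be applied to any of the categorical variables listed earlier, for example `default_rate_by(app_df, "NAME_EDUCATION_TYPE")`, provided the column exists in the frame.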
10. Univariate Analysis Segmented over TARGET 0 and 1
[Plots: Applicant Owns Car or Not; Applicant Owns Realty or Not]
RESULTS OF ANALYSIS:
FLAG_OWN_CAR: Among the TARGET 1 applicants (those with payment difficulties), more applicants do not have a car.
FLAG_OWN_REALTY: For applicants who own realty, there is not a major difference in ratio.
11. Univariate Analysis Segmented over TARGET 0 and 1
[Plots: Applicant's Suite Type; Applicant's Family Status]
RESULTS OF ANALYSIS:
Type_Suite: Unaccompanied applicants show more defaulters.
Family_Status: Among married applicants, on the other hand, fewer defaults are found.
12. Univariate Analysis Segmented over TARGET 0 and 1
[Plots: Applicant's Housing Type; Applicant's Education Type]
RESULTS OF ANALYSIS:
Applicant's Housing Type: Those who own a house turn into defaulters less often. Those who live with parents are more at risk of being defaulters.
Applicant's Education Type: Applicants with secondary education are at higher risk of turning into defaulters as per the current data. Applicants with higher education could potentially be better repayers.
13. Univariate Analysis Segmented over TARGET 0 and 1
[Plot: Applicant's Occupation Type]
RESULTS OF ANALYSIS:
Laborers, Sales Staff, Core Staff, Managers and Drivers account for the highest numbers of applicants.
Laborers also turn out to have the highest number of defaulters, and Sales Staff are also high in count of defaulters.
Managers, on the other hand, are slightly lower in count among defaulters than among repayers.
According to this information, Managers can be preferred over Laborers and Sales Staff, but this needs to be further analysed.
14. Univariate Analysis Segmented over TARGET 0 and 1
[Plots: Applicant's Loan Application Process Start Day; Applicant's Gender]
RESULTS OF ANALYSIS:
CODE_GENDER: We cannot say much about gender because it shows almost no bias in the segmentation on TARGET.
WEEKDAY_APPR_PROCESS_START: Not significant, but there are slightly fewer defaulters on TUESDAY.
16. Results of analysis – previous and current application merged data
NAME_PORTFOLIO: POS seems to be better in terms of repayers, while XNA and CARDS lean more towards defaults.
NAME_TYPE_SUITE_PREV: In the previous data, the same pattern is seen: Unaccompanied applicants are more often defaulters.
17. Results of analysis – previous and current application merged data
NAME_CLIENT_TYPE: Compared to the Repeater and New client types, the Refreshed client type repays better, but only a slight difference is observed.
NAME_PRODUCT_TYPE: The walk-in type is observed to turn into defaulters more often than XNA and x-sell, which are more of the repaying type.
18. Results of analysis – previous and current application merged data
NAME_SELLER_INDUSTRY: Consumer electronics is better at repaying, while XNA is a major defaulter category (XNA is an unknown industry here).
NAME_INCOME_TYPE:
Working: This is the most risky category, as defaulter turn-up is highest.
Commercial Assoc.: slightly lower default turn-up.
Pensioner: lower defaults observed, and hence could be taken into consideration.
State servant: this category has fewer defaults.
Unemployed:
Student: negligible number of applicants but no defaulters. Could be a good source of business for the company, and also safe clients.
Businessman: safe clients as there are no defaulters, but negligible applicants.
19. Insights from the Correlation
Significant NUMERIC Variables – For rejecting or approving the client
Based on the EDA so far and the results discussed above, these are the numeric variables which have high correlation in the Application Data:
• AMT_INCOME_TOTAL
• AMT_CREDIT
• AMT_ANNUITY
• AMT_GOODS_PRICE
In the previous and current application analysis, the following variables are found to be highly correlated:
• AMT_ANNUITY_PREV
• AMT_APPLICATION
• AMT_CREDIT_PREV
• AMT_DOWN_PAYMENT
• AMT_GOODS_PRICE_PREV
• CNT_PAYMENT
The results of the analysis of each of these variables are mentioned along with the plots in the next pages.
24. Insights from correlation
Top 3 correlations of appdf, TARGET 0:
• 1. AMT_GOODS_PRICE and AMT_CREDIT: 0.912
• 2. AMT_ANNUITY and AMT_CREDIT: 0.643
• 3. AMT_ANNUITY and AMT_INCOME_TOTAL: 0.345
Top 3 correlations of appdf, TARGET 1:
• 1. AMT_GOODS_PRICE and AMT_CREDIT: 0.890
• 2. AMT_ANNUITY and AMT_CREDIT: 0.621
• 3. AMT_ANNUITY and AMT_INCOME_TOTAL: 0.305
Top 3 correlations of prev_app_df, TARGET 0:
• 1. AMT_APPLICATION and AMT_GOODS_PRICE_PREV: 0.999
• 2. AMT_GOODS_PRICE_PREV and AMT_CREDIT_PREV: 0.932
• 3. AMT_CREDIT_PREV and AMT_APPLICATION: 0.878
Top 3 correlations of prev_app_df, TARGET 1:
• 1. AMT_APPLICATION and AMT_GOODS_PRICE_PREV: 0.999
• 2. AMT_GOODS_PRICE_PREV and AMT_CREDIT_PREV: 0.932
• 3. AMT_CREDIT_PREV and AMT_APPLICATION: 0.889
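A "top 3 correlations per TARGET segment" listing like the one above could be produced roughly as follows. This is a sketch, not the deck's actual code: the helper name `top_correlations` and the toy values are hypothetical, Pearson correlation (the pandas default) is assumed, and pairs are ranked by absolute value.

```python
import numpy as np
import pandas as pd


def top_correlations(df, columns, n=3):
    """Return the n most strongly correlated column pairs (absolute Pearson r)."""
    corr = df[columns].corr().abs()
    # Keep only the strict upper triangle so that each pair appears once and
    # the diagonal (a column with itself) is excluded.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


# Hypothetical toy data; far too small for real conclusions.
app_df = pd.DataFrame({
    "TARGET": [0, 0, 0, 1, 1, 1],
    "AMT_GOODS_PRICE": [90, 185, 265, 370, 445, 545],
    "AMT_CREDIT": [100, 200, 300, 400, 500, 600],
    "AMT_ANNUITY": [10, 15, 30, 28, 45, 50],
})
numeric_cols = ["AMT_GOODS_PRICE", "AMT_CREDIT", "AMT_ANNUITY"]

# Compute the ranking separately for repayers (0) and defaulters (1).
top_by_target = {}
for target_value in (0, 1):
    segment = app_df[app_df["TARGET"] == target_value]
    top_by_target[target_value] = top_correlations(segment, numeric_cols)
    print(f"TARGET {target_value}")
    print(top_by_target[target_value])
```

Comparing the two rankings side by side shows whether a pair of variables is related differently for defaulters than for repayers; in the figures above the rankings are the same in both segments and the values differ only slightly.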