SlideShare a Scribd company logo
Loan Prediction using
Machine Learning
Content
 Introduction
 The classification problem
 Steps involved in machine learning
 Features
 Labels
 Visualizing data using Google Colab
 Explanation of the Code using Google Colab
 Models of training and testing the dataset
1. Loan prediction using logistic regression
2. Loan prediction using random forest classification
3. Loan prediction using decision tree classification
 Loan Prediction models Comparison
INTRODUCTION
 Loan-Prediction
 Understanding the problem statement is the first and foremost step. This
would help you give an intuition of what you will face ahead of time. Let
us see the problem statement.
 Dream Housing Finance company deals in all home loans. They have
presence across all urban, semi urban and rural areas. Customer first apply
for home loan after that company validates the customer eligibility for
loan. Company wants to automate the loan eligibility process (real time)
based on customer detail provided while filling online application form.
These details are Gender, Marital Status, Education, Number of
Dependents, Income, Loan Amount, Credit History and others. To automate
this process, they have given a problem to identify the customers
segments, those are eligible for loan amount so that they can specifically
target these customers.
The Classification problem
 It is a classification problem where we have to predict whether a loan
would be approved or not. In a classification problem, we have to
predict discrete values based on a given set of independent variable(s).
Classification can be of two types:
 Binary Classification : In this classification we have to predict either of
the two given classes. For example: classifying the gender as male or
female, predicting the result as win or loss, etc. Multiclass
Classification : Here we have to classify the data into three or more
classes. For example: classifying a movie's genre as comedy, action or
romantic, classify fruits as oranges, apples, or pears, etc.
 Loan prediction is a very common real-life problem that each retail
bank faces atleast once in its lifetime. If done correctly, it can save a
lot of man hours at the end of a retail bank.
Steps involved in machine learning
1 - Data Collection
 The quantity & quality of your data dictate how accurate our model is
 The outcome of this step is generally a representation of data (Guo simplifies to
specifying a table) which we will use for training
 Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into
this step
2 - Data Preparation
 Wrangle data and prepare it for training
 Clean that which may require it (remove duplicates, correct errors, deal with
missing values, normalization, data type conversions, etc.)
 Randomize data, which erases the effects of the particular order in which we
collected and/or otherwise prepared our data.
Steps involved in machine learning
3 - Choose a Model
 Different algorithms are for different tasks; choose the right one

4 - Train the Model
 The goal of training is to answer a question or make a prediction
correctly as often as possible
 Linear regression example: algorithm would need to learn values
for m (or W) and b (x is input, y is output)
 Each iteration of process is a training step
Steps involved in machine learning
5 - Evaluate the Model
 Uses some metric or combination of metrics to "measure" objective
performance of model
 Test the model against previously unseen data
 This unseen data is meant to be somewhat representative of model
performance in the real world, but still helps tune the model (as
opposed to test data, which does not)
 Good train/evaluate split 80/20, 70/30, or similar, depending on
domain, data availability, dataset particulars, etc.
Steps involved in machine learning
6 - Parameter Tuning
 This step refers to hyper-parameter tuning, which is an "art form" as
opposed to a science
 Tune model parameters for improved performance
 Simple model hyper-parameters may include: number of training
steps, learning rate, initialization values and distribution, etc.
7 - Make Predictions
 Using further (test set) data which have, until this point, been
withheld from the model (and for which class labels are known), are
used to test the model; a better approximation of how the model will
perform in the real world.
DATASETS
 Here we have two datasets. First is train_dataset.csv, test_dataset.csv.
 These are datasets of loan approval applications which are featured with
annual income, married or not, dependents are there or not, educated or
not, credit history present or not, loan amount etc.
 The outcome of the dataset is represented by loan status in the train
dataset.
 This column is absent in test_dataset.csv as we need to assign loan status
with the help of training dataset.
FEATURES PRESENT IN LOAN PREDICTION
 Loan_ID – The ID number generated by the bank which is giving loan.
 Gender – Whether the person taking loan is male or female.
 Married – Whether the person is married or unmarried.
 Dependents – Family members who stay with the person.
 Education – Educational qualification of the person taking loan.
 Self_Employed – Whether the person is self-employed or not.
 ApplicantIncome – The basic salary or income of the applicant per month.
 CoapplicantIncome – The basic income or family members.
 LoanAmount – The amount of loan for which loan is applied.
 Loan_Amount_Term – How much time does the loan applicant take to pay the loan.
 Credit_History – Whether the loan applicant has taken loan previously from same bank.
 Property_Area – This is about the area where the person stays ( Rural/Urban).
Labels
 LOAN_STATUS – Based on the mentioned features, the machine learning
algorithm decides whether the person should be give loan or not.
Visualizing data using google Colab
Visualizing data using google Colab
Visualizing data using google Colab
Visualizing data using google Colab
Visualizing data using google Colab
Visualizing data using google Colab
Explanation of the Code using Google Colab
 The dataset is trained and tested with 3 methods
1. Loan prediction using logistic regression
2. Loan prediction using random forest classification
3. Loan prediction using decision tree classification
Loan prediction using Logistic Regression
 # take a look at the top 5 rows of the train set, notice the column "Loan_Status"
 train.head()
Loan_ID Gender Married
Depende
nts
Educatio
n
Self_Emp
loyed
Applicant
Income
Coapplica
ntIncome
LoanAmo
unt
Loan_Am
ount_Ter
m
Credit_Hi
story
Property
_Area
Loan_Sta
tus
LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y
LP001003 Male Yes 1 Graduate No 4583 1508.0 128.0 360.0 1.0 Rural N
LP001005 Male Yes 0 Graduate Yes 3000 0.0 66.0 360.0 1.0 Urban Y
LP001006 Male Yes 0
Not
Graduate
No 2583 2358.0 120.0 360.0 1.0 Urban Y
LP001008 Male No 0 Graduate No 6000 0.0 141.0 360.0 1.0 Urban Y
Loan prediction using Logistic Regression
Loan_ID Gender Married
Dependen
ts
Education
Self_Empl
oyed
ApplicantI
ncome
Coapplica
ntIncome
LoanAmo
unt
Loan_Am
ount_Ter
m
Credit_Hi
story
Property_
Area
LP001015 Male Yes 0 Graduate No 5720 0 110.0 360.0 1.0 Urban
LP001022 Male Yes 1 Graduate No 3076 1500 126.0 360.0 1.0 Urban
LP001031 Male Yes 2 Graduate No 5000 1800 208.0 360.0 1.0 Urban
LP001035 Male Yes 2 Graduate No 2340 2546 100.0 360.0 NaN Urban
LP001051 Male No 0
Not
Graduate
No 3276 0 78.0 360.0 1.0 Urban
• # take a look at the top 5 rows of the test set, notice the absense of "Loan_Status"
that we will predict
• test.head()
Loan prediction using Logistic Regression
 # Printing values of whether loan is accepted or rejected
 y_pred [:100]
Loan prediction using Logistic Regression
 Confusion Matrix
Loan prediction using Logistic Regression
# Check Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
0.8373983739837398
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
accuracies.mean()
# accuracies.std()
0.8024081632653062
Loan prediction using random forest classification
 # Printing values of whether loan is accepted or rejected
 y_pred [:100]
Loan prediction using random forest classification
 Confusion matrix
Loan prediction using random forest classification
# Check Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
0.6910569105691057
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
accuracies.mean()
# accuracies.std()
0.7148163265306122
Loan Prediction using Decision Tree Classification
 # Printing values of whether loan is accepted or rejected
 y_pred[:100]
Loan Prediction using Decision Tree Classification
 Confusion Matrix
Loan Prediction using Decision Tree Classification
# Check Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred)
0.8292682926829268
# Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
accuracies.mean()
# accuracies.std()
0.7922448979591836
Loan prediction models comparison
Loan Prediction Accuracy Accuracy using K-fold Cross
Validation
Using Logistic Regression 0.8373983739837398 0.8024081632653062
Using Random Forest
Classification
0.6910569105691057 0.7148163265306122
Using Decision Tree Classification 0.8292682926829268 0.7922448979591836
This means that from the above accuracy table, we can conclude that logistic regression is best
model for the loan prediction problem.
THANK YOU

More Related Content

What's hot

Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
Pradeep Redddy Raamana
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Md. Main Uddin Rony
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
Samra Shahzadi
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning ProjectAbhishek Singh
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
AmAn Singh
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
zekeLabs Technologies
 
Loan Default Prediction with Machine Learning
Loan Default Prediction with Machine LearningLoan Default Prediction with Machine Learning
Loan Default Prediction with Machine Learning
Alibaba Cloud
 
Lecture 9&10 computer vision segmentation-no_task
Lecture 9&10 computer vision segmentation-no_taskLecture 9&10 computer vision segmentation-no_task
Lecture 9&10 computer vision segmentation-no_task
cairo university
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
Predictive Analysis PowerPoint Presentation Slides
Predictive Analysis PowerPoint Presentation SlidesPredictive Analysis PowerPoint Presentation Slides
Predictive Analysis PowerPoint Presentation Slides
SlideTeam
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
Supervised Machine Learning Techniques
Supervised Machine Learning TechniquesSupervised Machine Learning Techniques
Supervised Machine Learning Techniques
Tara ram Goyal
 
Knn Algorithm
Knn AlgorithmKnn Algorithm
Knn Algorithm
Kamal Gupta Roy
 
[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification
[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification
[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification
Seulki Park
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKAbutest
 
Data mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetData mining and analysis of customer churn dataset
Data mining and analysis of customer churn dataset
Rohan Choksi
 
ADABoost classifier
ADABoost classifierADABoost classifier
ADABoost classifier
SreerajVA
 
AI - Introduction to Bellman Equations
AI - Introduction to Bellman EquationsAI - Introduction to Bellman Equations
AI - Introduction to Bellman Equations
Andrew Ferlitsch
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
DataminingTools Inc
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
SOUMIT KAR
 

What's hot (20)

Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Types of Machine Learning
Types of Machine LearningTypes of Machine Learning
Types of Machine Learning
 
Machine Learning Project
Machine Learning ProjectMachine Learning Project
Machine Learning Project
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Loan Default Prediction with Machine Learning
Loan Default Prediction with Machine LearningLoan Default Prediction with Machine Learning
Loan Default Prediction with Machine Learning
 
Lecture 9&10 computer vision segmentation-no_task
Lecture 9&10 computer vision segmentation-no_taskLecture 9&10 computer vision segmentation-no_task
Lecture 9&10 computer vision segmentation-no_task
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning Algorithms
 
Predictive Analysis PowerPoint Presentation Slides
Predictive Analysis PowerPoint Presentation SlidesPredictive Analysis PowerPoint Presentation Slides
Predictive Analysis PowerPoint Presentation Slides
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Supervised Machine Learning Techniques
Supervised Machine Learning TechniquesSupervised Machine Learning Techniques
Supervised Machine Learning Techniques
 
Knn Algorithm
Knn AlgorithmKnn Algorithm
Knn Algorithm
 
[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification
[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification
[CVPR 22] Context-rich Minority Oversampling for Long-tailed Classification
 
Data Mining with WEKA WEKA
Data Mining with WEKA WEKAData Mining with WEKA WEKA
Data Mining with WEKA WEKA
 
Data mining and analysis of customer churn dataset
Data mining and analysis of customer churn datasetData mining and analysis of customer churn dataset
Data mining and analysis of customer churn dataset
 
ADABoost classifier
ADABoost classifierADABoost classifier
ADABoost classifier
 
AI - Introduction to Bellman Equations
AI - Introduction to Bellman EquationsAI - Introduction to Bellman Equations
AI - Introduction to Bellman Equations
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Overfitting & Underfitting
Overfitting & UnderfittingOverfitting & Underfitting
Overfitting & Underfitting
 

Similar to scrib.pptx

Loan Eligibility Checker
Loan Eligibility CheckerLoan Eligibility Checker
Loan Eligibility Checker
KiranVodela
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Boston Institute of Analytics
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
Boston Institute of Analytics
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
Souma Maiti
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
Vasudev pendyala
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
loanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptxloanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptx
081JeetPatel
 
Supervised learning
Supervised learningSupervised learning
Supervised learning
Johnson Ubah
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data mining
Er. Nawaraj Bhandari
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment
Kunal Kashyap
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting Defaulters
IRJET Journal
 
Credit Card Marketing Classification Trees Fr.docx
 Credit Card Marketing Classification Trees Fr.docx Credit Card Marketing Classification Trees Fr.docx
Credit Card Marketing Classification Trees Fr.docx
ShiraPrater50
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
Mihai Enescu
 
End to-end machine learning project for beginners
End to-end machine learning project for beginnersEnd to-end machine learning project for beginners
End to-end machine learning project for beginners
Sharath Kumar
 
SURVEY ON SENTIMENT ANALYSIS
SURVEY ON SENTIMENT ANALYSISSURVEY ON SENTIMENT ANALYSIS
SURVEY ON SENTIMENT ANALYSIS
IRJET Journal
 
Data mining - Machine Learning
Data mining - Machine LearningData mining - Machine Learning
Data mining - Machine Learning
RupaDutta3
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
Marc Berman
 

Similar to scrib.pptx (20)

Loan Eligibility Checker
Loan Eligibility CheckerLoan Eligibility Checker
Loan Eligibility Checker
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
LOAN APPROVAL PRDICTION SYSTEM USING MACHINE LEARNING.
 
Credit iconip
Credit iconipCredit iconip
Credit iconip
 
Telecom Churn Analysis
Telecom Churn AnalysisTelecom Churn Analysis
Telecom Churn Analysis
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
loanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptxloanpredictionsystem-210808032534.pptx
loanpredictionsystem-210808032534.pptx
 
Supervised learning
Supervised learningSupervised learning
Supervised learning
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data mining
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment
 
Loan Analysis Predicting Defaulters
Loan Analysis Predicting DefaultersLoan Analysis Predicting Defaulters
Loan Analysis Predicting Defaulters
 
Credit Card Marketing Classification Trees Fr.docx
 Credit Card Marketing Classification Trees Fr.docx Credit Card Marketing Classification Trees Fr.docx
Credit Card Marketing Classification Trees Fr.docx
 
Credit iconip
Credit iconipCredit iconip
Credit iconip
 
Credit iconip
Credit iconipCredit iconip
Credit iconip
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
 
End to-end machine learning project for beginners
End to-end machine learning project for beginnersEnd to-end machine learning project for beginners
End to-end machine learning project for beginners
 
SURVEY ON SENTIMENT ANALYSIS
SURVEY ON SENTIMENT ANALYSISSURVEY ON SENTIMENT ANALYSIS
SURVEY ON SENTIMENT ANALYSIS
 
Data mining - Machine Learning
Data mining - Machine LearningData mining - Machine Learning
Data mining - Machine Learning
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 

Recently uploaded

Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Crescat
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
abdulrafaychaudhry
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 

Recently uploaded (20)

Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Game Development with Unity3D (Game Development lecture 3)
Game Development  with Unity3D (Game Development lecture 3)Game Development  with Unity3D (Game Development lecture 3)
Game Development with Unity3D (Game Development lecture 3)
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket ManagementUtilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
Utilocate provides Smarter, Better, Faster, Safer Locate Ticket Management
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
 

scrib.pptx

  • 2. Content  Introduction  The classification problem  Steps involved in machine learning  Features  Labels  Visualizing data using Google Colab  Explanation of the Code using Google Colab  Models of training and testing the dataset 1. Loan prediction using logistic regression 2. Loan prediction using random forest classification 3. Loan prediction using decision tree classification  Loan Prediction models Comparison
  • 3. INTRODUCTION  Loan-Prediction  Understanding the problem statement is the first and foremost step. This would help you give an intuition of what you will face ahead of time. Let us see the problem statement.  Dream Housing Finance company deals in all home loans. They have presence across all urban, semi urban and rural areas. Customer first apply for home loan after that company validates the customer eligibility for loan. Company wants to automate the loan eligibility process (real time) based on customer detail provided while filling online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, they have given a problem to identify the customers segments, those are eligible for loan amount so that they can specifically target these customers.
  • 4. The Classification problem  It is a classification problem where we have to predict whether a loan would be approved or not. In a classification problem, we have to predict discrete values based on a given set of independent variable(s). Classification can be of two types:  Binary Classification : In this classification we have to predict either of the two given classes. For example: classifying the gender as male or female, predicting the result as win or loss, etc. Multiclass Classification : Here we have to classify the data into three or more classes. For example: classifying a movie's genre as comedy, action or romantic, classify fruits as oranges, apples, or pears, etc.  Loan prediction is a very common real-life problem that each retail bank faces atleast once in its lifetime. If done correctly, it can save a lot of man hours at the end of a retail bank.
  • 5. Steps involved in machine learning 1 - Data Collection  The quantity & quality of your data dictate how accurate our model is  The outcome of this step is generally a representation of data (Guo simplifies to specifying a table) which we will use for training  Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this step 2 - Data Preparation  Wrangle data and prepare it for training  Clean that which may require it (remove duplicates, correct errors, deal with missing values, normalization, data type conversions, etc.)  Randomize data, which erases the effects of the particular order in which we collected and/or otherwise prepared our data.
  • 6. Steps involved in machine learning 3 - Choose a Model  Different algorithms are for different tasks; choose the right one  4 - Train the Model  The goal of training is to answer a question or make a prediction correctly as often as possible  Linear regression example: algorithm would need to learn values for m (or W) and b (x is input, y is output)  Each iteration of process is a training step
  • 7. Steps involved in machine learning 5 - Evaluate the Model  Uses some metric or combination of metrics to "measure" objective performance of model  Test the model against previously unseen data  This unseen data is meant to be somewhat representative of model performance in the real world, but still helps tune the model (as opposed to test data, which does not)  Good train/evaluate split 80/20, 70/30, or similar, depending on domain, data availability, dataset particulars, etc.
  • 8. Steps involved in machine learning 6 - Parameter Tuning  This step refers to hyper-parameter tuning, which is an "art form" as opposed to a science  Tune model parameters for improved performance  Simple model hyper-parameters may include: number of training steps, learning rate, initialization values and distribution, etc. 7 - Make Predictions  Using further (test set) data which have, until this point, been withheld from the model (and for which class labels are known), are used to test the model; a better approximation of how the model will perform in the real world.
  • 9. DATASETS  Here we have two datasets. First is train_dataset.csv, test_dataset.csv.  These are datasets of loan approval applications which are featured with annual income, married or not, dependents are there or not, educated or not, credit history present or not, loan amount etc.  The outcome of the dataset is represented by loan status in the train dataset.  This column is absent in test_dataset.csv as we need to assign loan status with the help of training dataset.
  • 10. FEATURES PRESENT IN LOAN PREDICTION  Loan_ID – The ID number generated by the bank which is giving loan.  Gender – Whether the person taking loan is male or female.  Married – Whether the person is married or unmarried.  Dependents – Family members who stay with the person.  Education – Educational qualification of the person taking loan.  Self_Employed – Whether the person is self-employed or not.  ApplicantIncome – The basic salary or income of the applicant per month.  CoapplicantIncome – The basic income or family members.  LoanAmount – The amount of loan for which loan is applied.  Loan_Amount_Term – How much time does the loan applicant take to pay the loan.  Credit_History – Whether the loan applicant has taken loan previously from same bank.  Property_Area – This is about the area where the person stays ( Rural/Urban).
  • 11. Labels  LOAN_STATUS – Based on the mentioned features, the machine learning algorithm decides whether the person should be give loan or not.
  • 12. Visualizing data using google Colab
  • 13. Visualizing data using google Colab
  • 14. Visualizing data using google Colab
  • 15. Visualizing data using google Colab
  • 16. Visualizing data using google Colab
  • 17. Visualizing data using google Colab
  • 18. Explanation of the Code using Google Colab  The dataset is trained and tested with 3 methods 1. Loan prediction using logistic regression 2. Loan prediction using random forest classification 3. Loan prediction using decision tree classification
  • 19. Loan prediction using Logistic Regression  # take a look at the top 5 rows of the train set, notice the column "Loan_Status"  train.head() Loan_ID Gender Married Depende nts Educatio n Self_Emp loyed Applicant Income Coapplica ntIncome LoanAmo unt Loan_Am ount_Ter m Credit_Hi story Property _Area Loan_Sta tus LP001002 Male No 0 Graduate No 5849 0.0 NaN 360.0 1.0 Urban Y LP001003 Male Yes 1 Graduate No 4583 1508.0 128.0 360.0 1.0 Rural N LP001005 Male Yes 0 Graduate Yes 3000 0.0 66.0 360.0 1.0 Urban Y LP001006 Male Yes 0 Not Graduate No 2583 2358.0 120.0 360.0 1.0 Urban Y LP001008 Male No 0 Graduate No 6000 0.0 141.0 360.0 1.0 Urban Y
  • 20. Loan prediction using Logistic Regression Loan_ID Gender Married Dependen ts Education Self_Empl oyed ApplicantI ncome Coapplica ntIncome LoanAmo unt Loan_Am ount_Ter m Credit_Hi story Property_ Area LP001015 Male Yes 0 Graduate No 5720 0 110.0 360.0 1.0 Urban LP001022 Male Yes 1 Graduate No 3076 1500 126.0 360.0 1.0 Urban LP001031 Male Yes 2 Graduate No 5000 1800 208.0 360.0 1.0 Urban LP001035 Male Yes 2 Graduate No 2340 2546 100.0 360.0 NaN Urban LP001051 Male No 0 Not Graduate No 3276 0 78.0 360.0 1.0 Urban • # take a look at the top 5 rows of the test set, notice the absense of "Loan_Status" that we will predict • test.head()
  • 21. Loan prediction using Logistic Regression  # Printing values of whether loan is accepted or rejected  y_pred [:100]
  • 22. Loan prediction using Logistic Regression  Confusion Matrix
  • 23. Loan prediction using Logistic Regression # Check Accuracy from sklearn.metrics import accuracy_score accuracy_score(y_test,y_pred) 0.8373983739837398 # Applying k-Fold Cross Validation from sklearn.model_selection import cross_val_score accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10) accuracies.mean() # accuracies.std() 0.8024081632653062
  • 24. Loan prediction using random forest classification  # Printing values of whether loan is accepted or rejected  y_pred [:100]
  • 25. Loan prediction using random forest classification  Confusion matrix
  • 26. Loan prediction using random forest classification # Check Accuracy from sklearn.metrics import accuracy_score accuracy_score(y_test,y_pred) 0.6910569105691057 # Applying k-Fold Cross Validation from sklearn.model_selection import cross_val_score accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10) accuracies.mean() # accuracies.std() 0.7148163265306122
  • 27. Loan Prediction using Decision Tree Classification  # Printing values of whether loan is accepted or rejected  y_pred[:100]
  • 28. Loan Prediction using Decision Tree Classification  Confusion Matrix
  • 29. Loan Prediction using Decision Tree Classification # Check Accuracy from sklearn.metrics import accuracy_score accuracy_score(y_test,y_pred) 0.8292682926829268 # Applying k-Fold Cross Validation from sklearn.model_selection import cross_val_score accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10) accuracies.mean() # accuracies.std() 0.7922448979591836
  • 30. Loan prediction models comparison Loan Prediction Accuracy Accuracy using K-fold Cross Validation Using Logistic Regression 0.8373983739837398 0.8024081632653062 Using Random Forest Classification 0.6910569105691057 0.7148163265306122 Using Decision Tree Classification 0.8292682926829268 0.7922448979591836 This means that from the above accuracy table, we can conclude that logistic regression is best model for the loan prediction problem.