This machine learning project, loan prediction, is implemented using several different machine learning algorithms, and the accuracy of each algorithm is calculated. The dataset was downloaded from Google and is split into 90% for training and 10% for testing; more data is used for training to increase efficiency, and the system gives accurate predictions based on the data it was trained on. Before the algorithms are implemented, the steps of data collection, data analysis and data cleaning have to be performed.
2. Content
Introduction
The classification problem
Steps involved in machine learning
Features
Labels
Visualizing data using Google Colab
Explanation of the Code using Google Colab
Models for training and testing the dataset
1. Loan prediction using logistic regression
2. Loan prediction using random forest classification
3. Loan prediction using decision tree classification
Comparison of the loan prediction models
3. INTRODUCTION
Loan-Prediction
Understanding the problem statement is the first and foremost step, as it gives an intuition of what lies ahead. Let us look at the problem statement.
Dream Housing Finance deals in all kinds of home loans and has a presence across urban, semi-urban and rural areas. A customer first applies for a home loan, after which the company validates the customer's eligibility. The company wants to automate the loan eligibility process in real time, based on the details the customer provides while filling in the online application form: Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, the task is to identify the customer segments that are eligible for a loan, so that the company can specifically target these customers.
4. The Classification problem
This is a classification problem: we have to predict whether a loan will be approved or not. In a classification problem, we predict discrete values based on a given set of independent variables. Classification can be of two types:
Binary classification: we have to predict one of two given classes. For example: classifying gender as male or female, or predicting a result as win or loss.
Multiclass classification: we have to classify the data into three or more classes. For example: classifying a movie's genre as comedy, action or romance, or classifying fruits as oranges, apples or pears.
Loan prediction is a very common real-life problem that every retail bank faces at least once in its lifetime. Done correctly, it can save a retail bank a lot of man-hours.
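As a minimal sketch of the binary case, a classifier can be fit on a handful of illustrative loan records. The toy data and feature names below are assumptions, not the project's actual dataset; scikit-learn is assumed to be available:

```python
from sklearn.linear_model import LogisticRegression

# Toy loan records: [applicant_income, credit_history]; label 1 = approved, 0 = rejected.
X = [[5000, 1], [3000, 0], [8000, 1], [2500, 0], [6000, 1], [2000, 0]]
y = [1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Binary classification: the prediction is one of exactly two discrete classes.
print(model.predict([[7000, 1]]))
```

The model outputs a discrete class (0 or 1) rather than a continuous value, which is what distinguishes classification from regression.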
5. Steps involved in machine learning
1 - Data Collection
The quantity and quality of your data dictate how accurate the model can be.
The outcome of this step is generally a representation of the data (Guo simplifies this to specifying a table) which we will use for training.
Using pre-collected data, by way of datasets from Kaggle, UCI, etc., still fits into this step.
2 - Data Preparation
Wrangle the data and prepare it for training.
Clean whatever requires it: remove duplicates, correct errors, deal with missing values, normalize, convert data types, etc.
Randomize the data, which erases the effects of the particular order in which the data was collected and/or otherwise prepared.
6. Steps involved in machine learning
3 - Choose a Model
Different algorithms are for different tasks; choose the right one
4 - Train the Model
The goal of training is to answer a question or make a prediction
correctly as often as possible.
Linear regression example: the algorithm would need to learn values
for m (the slope, sometimes written W) and b (the intercept), where x is
the input and y is the output.
Each iteration of the process is a training step.
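As a sketch of what "learning values for m and b" looks like, here is plain-Python gradient descent on made-up data generated from y = 2x + 1; the learning rate and step count are illustrative choices:

```python
# Learn y = m*x + b by gradient descent on toy data drawn from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = 0.0, 0.0
lr = 0.01  # learning rate (a hyper-parameter)
for step in range(5000):  # each loop iteration is one training step
    # gradients of mean squared error with respect to m and b
    grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    m -= lr * grad_m
    b -= lr * grad_b

print(round(m, 2), round(b, 2))  # approaches m = 2, b = 1
```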
7. Steps involved in machine learning
5 - Evaluate the Model
Uses some metric, or combination of metrics, to "measure" the objective
performance of the model.
Test the model against previously unseen data.
This unseen data is meant to be somewhat representative of model
performance in the real world, but it still helps tune the model (as
opposed to test data, which does not).
A good train/evaluate split is 80/20, 70/30, or similar, depending on the
domain, data availability, dataset particulars, etc.
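An 80/20 train/evaluate split can be sketched with scikit-learn's train_test_split; synthetic data stands in here for a real loan table:

```python
# Sketch of an 80/20 train/evaluate split and a held-out accuracy check
# (synthetic classification data, not the actual loan dataset).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80% train, 20% evaluate

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {acc:.2f}")
```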
8. Steps involved in machine learning
6 - Parameter Tuning
This step refers to hyper-parameter tuning, which is an "art form" as
opposed to a science.
Tune the model's hyper-parameters for improved performance.
Simple hyper-parameters may include: the number of training steps, the
learning rate, initialization values and distributions, etc.
7 - Make Predictions
Further (test set) data, which have until this point been withheld from
the model (and for which class labels are known), are used to test the
model; this gives a better approximation of how the model will perform
in the real world.
9. DATASETS
Here we have two datasets: train_dataset.csv and test_dataset.csv.
These are datasets of loan approval applications, featuring annual
income, marital status, number of dependents, education, credit history,
loan amount, etc.
The outcome of the dataset is represented by the loan status column in
the train dataset.
This column is absent in test_dataset.csv, as we need to predict the loan
status with the help of a model built on the training dataset.
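Loading the two files with pandas might look like the following sketch; the in-memory CSV stand-ins (and their rows) are only illustrative, and in practice you would pass the file names directly to pd.read_csv:

```python
# Sketch of loading the train and test datasets. Tiny in-memory CSVs
# stand in for train_dataset.csv / test_dataset.csv here.
import io
import pandas as pd

train_csv = io.StringIO(
    "Loan_ID,ApplicantIncome,LoanAmount,Loan_Status\n"
    "LP001002,5849,,Y\n"
    "LP001003,4583,128.0,N\n")
test_csv = io.StringIO(
    "Loan_ID,ApplicantIncome,LoanAmount\n"
    "LP001015,5720,110.0\n")

train = pd.read_csv(train_csv)  # in practice: pd.read_csv("train_dataset.csv")
test = pd.read_csv(test_csv)    # in practice: pd.read_csv("test_dataset.csv")

print("Loan_Status" in train.columns)  # True: the label is present in train
print("Loan_Status" in test.columns)   # False: it must be predicted for test
```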
10. FEATURES PRESENT IN LOAN PREDICTION
Loan_ID – The ID number generated by the bank giving the loan.
Gender – Whether the person taking the loan is male or female.
Married – Whether the person is married or unmarried.
Dependents – The family members who stay with the person.
Education – The educational qualification of the person taking the loan.
Self_Employed – Whether the person is self-employed or not.
ApplicantIncome – The basic salary or income of the applicant per month.
CoapplicantIncome – The basic income of the co-applicant (e.g., family members).
LoanAmount – The amount of the loan applied for.
Loan_Amount_Term – The time the loan applicant takes to repay the loan.
Credit_History – Whether the loan applicant has previously taken a loan from the same bank.
Property_Area – The area where the person stays (Rural/Urban).
11. Labels
LOAN_STATUS – Based on the features mentioned above, the machine learning
algorithm decides whether the person should be given a loan or not.
18. Explanation of the Code using Google Colab
The dataset is trained and tested with three methods:
1. Loan prediction using logistic regression
2. Loan prediction using random forest classification
3. Loan prediction using decision tree classification
19. Loan prediction using Logistic Regression
# take a look at the top 5 rows of the train set, notice the column "Loan_Status"
train.head()
Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status
LP001002 | Male | No | 0 | Graduate | No | 5849 | 0.0 | NaN | 360.0 | 1.0 | Urban | Y
LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508.0 | 128.0 | 360.0 | 1.0 | Rural | N
LP001005 | Male | Yes | 0 | Graduate | Yes | 3000 | 0.0 | 66.0 | 360.0 | 1.0 | Urban | Y
LP001006 | Male | Yes | 0 | Not Graduate | No | 2583 | 2358.0 | 120.0 | 360.0 | 1.0 | Urban | Y
LP001008 | Male | No | 0 | Graduate | No | 6000 | 0.0 | 141.0 | 360.0 | 1.0 | Urban | Y
20. Loan prediction using Logistic Regression
# take a look at the top 5 rows of the test set, notice the absence of "Loan_Status" that we will predict
test.head()
Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area
LP001015 | Male | Yes | 0 | Graduate | No | 5720 | 0 | 110.0 | 360.0 | 1.0 | Urban
LP001022 | Male | Yes | 1 | Graduate | No | 3076 | 1500 | 126.0 | 360.0 | 1.0 | Urban
LP001031 | Male | Yes | 2 | Graduate | No | 5000 | 1800 | 208.0 | 360.0 | 1.0 | Urban
LP001035 | Male | Yes | 2 | Graduate | No | 2340 | 2546 | 100.0 | 360.0 | NaN | Urban
LP001051 | Male | No | 0 | Not Graduate | No | 3276 | 0 | 78.0 | 360.0 | 1.0 | Urban
21. Loan prediction using Logistic Regression
# Printing values of whether the loan is accepted or rejected
y_pred[:100]
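The logistic-regression pipeline behind these predictions can be sketched as follows; synthetic features stand in for the encoded loan columns, so the numbers will not match the slides:

```python
# Rough sketch of the logistic-regression workflow: split, fit, predict.
# Synthetic data replaces the encoded loan dataset, so results differ
# from the actual notebook.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print(y_pred[:100])  # array of 0/1 class predictions
```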
29. Loan Prediction using Decision Tree Classification
# Check accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
0.8292682926829268

# Applying k-fold cross-validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
accuracies.mean()
# accuracies.std()
0.7922448979591836
30. Loan prediction models comparison

Loan Prediction | Accuracy | Accuracy using K-fold Cross Validation
Using Logistic Regression | 0.8373983739837398 | 0.8024081632653062
Using Random Forest Classification | 0.6910569105691057 | 0.7148163265306122
Using Decision Tree Classification | 0.8292682926829268 | 0.7922448979591836

From the accuracy table above, we can conclude that logistic regression is the best
model for this loan prediction problem.
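The comparison above can be reproduced with a single loop; this is only a sketch on synthetic data, so the printed accuracies will not match the table:

```python
# Sketch: compare the three models on one held-out split and with
# 10-fold cross-validation (synthetic data; accuracies are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=11, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    cv = cross_val_score(model, X_train, y_train, cv=10).mean()
    print(f"{name}: test acc {acc:.3f}, 10-fold CV {cv:.3f}")
```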