Home Loan eligibility Big Data Analysis, using
Python
CHAPTER 1 - Introduction of the topic
This report proceeds in a step by step manner. Let us therefore start with
the first part of the topic at hand, which is what the researcher means by
a Home Loan.
Home Loan :- Let’s have a look at this in a very basic and simple way.
When it comes to survival there are three primary needs - food, clothing and
shelter. In this report we are talking about shelter.
There was a day and age when survival was tough, and housing was therefore
called shelter. As we progressed to the present day, we came to call that
basic need a house or a home.
Now, when we say Home Loan or House Loan, we mean a sum of money which has
been borrowed from a bank or other financial institution with the intention
of purchasing a home.
Presently, when we say home, it can mean a variety of different things
because of the options that are available to us now: a plot of land, a
villa, a flat etc. Today loans are granted not just for purchase but also
for house repairs, reconstruction, demolition and renovation of an existing
home.
Let us go a level deeper and understand the conditions under which this
monetary transaction takes place. The lender, which can be a bank or another
financial institution, gives the money under a set of mutually agreed
conditions. In general these basic conditions cover the rate of interest to
be paid, the duration, and an agreement stating that the property belongs to
the lender until the final amount, including the interest, has been paid by
the borrower.
The interest rate on a home loan can be fixed, floating, or partly fixed and
partly floating.
The Government also provides certain tax benefits on a home loan under
Section 80EE of the Income Tax Act. However, this income tax deduction can
be claimed only by first-time home buyers.
As per the Income Tax Act, 1961, borrowers can avail home loan tax
benefits under different sections and save considerable outflow in the form
of tax annually.
Savings are a very critical part of home loans, for a buyer and even for a
non-buyer. So, let us look into the tax savings in a little more detail.
An individual can claim tax benefit on home loan in various ways under the
following sections :-
Table showing the Tax benefits provided by the Government of India. Source - www.BajajFinserv.in
The Government of India extends these benefits as a form of relief to
borrowers, making housing affordable for all the Indian Citizens.
Elaborating the Home Loan Tax Sections in Detail
On availing a home loan, you need to make monthly repayments as EMIs,
which include two primary components – principal amount and interest
payable. The IT Act enables borrowers to enjoy tax benefits on both these
components individually.
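To make the two EMI components concrete, here is a minimal Python sketch. It assumes the standard reducing-balance EMI formula, which the source does not state, and the loan figures are hypothetical and chosen only for illustration.

```python
def emi(principal, annual_rate_pct, months):
    """Monthly instalment under the reducing-balance formula (an assumption)."""
    r = annual_rate_pct / 12 / 100  # monthly interest rate as a fraction
    if r == 0:
        return principal / months
    factor = (1 + r) ** months
    return principal * r * factor / (factor - 1)


def first_month_split(principal, annual_rate_pct, months):
    """Split the first EMI into its principal and interest components."""
    payment = emi(principal, annual_rate_pct, months)
    interest = principal * annual_rate_pct / 12 / 100
    return payment - interest, interest


# Hypothetical loan: Rs. 30 Lakh at 8.5% per year for 20 years.
principal_part, interest_part = first_month_split(3_000_000, 8.5, 240)
print(f"EMI: {principal_part + interest_part:.2f}")
print(f"Principal component: {principal_part:.2f}")
print(f"Interest component: {interest_part:.2f}")
```

In the early months most of the EMI goes towards interest, which is why the interest and principal components are treated separately for tax purposes.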
1. Section 80C
• Claim a maximum home loan tax deduction of up to Rs. 1.5 Lakh from
your taxable income on the principal repayment.
• This may include stamp duty and registration charges as well but can be
claimed only once.
2. Section 24
• Enjoy maximum deductions of up to Rs. 2 Lakh on the interest amount
payable.
• These deductions apply only to a property whose construction is
finished within 5 years. If it doesn’t finish within this time frame, you can
claim only up to Rs. 30,000.
3. Section 80EE
• First-time home buyers can claim an additional Rs. 50,000 on the
payable interest every financial year.
• The Home Loan amount must not be more than Rs. 35 Lakh.
• The property’s value must be within Rs. 50 Lakh. [Source: Bajaj Finserv]
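The limits listed above can be sketched as a small Python calculation. This is only an illustration and not tax advice: the function name and the sample figures are hypothetical, and it makes the simplifying assumption that Section 80EE is applied to the interest left over after the Section 24 limit has been used.

```python
# Limits in rupees, taken from the sections listed above.
CAP_80C = 150_000           # principal repayment
CAP_24 = 200_000            # interest, construction finished within 5 years
CAP_24_DELAYED = 30_000     # interest, construction not finished in time
CAP_80EE = 50_000           # additional interest for first-time buyers
MAX_LOAN_80EE = 3_500_000   # loan must not exceed Rs. 35 Lakh
MAX_VALUE_80EE = 5_000_000  # property value must be within Rs. 50 Lakh


def home_loan_deductions(principal_paid, interest_paid, loan_amount,
                         property_value, first_time_buyer,
                         built_within_5_years=True):
    """Return the deduction claimed under each section for one year."""
    d_80c = min(principal_paid, CAP_80C)
    d_24 = min(interest_paid,
               CAP_24 if built_within_5_years else CAP_24_DELAYED)
    eligible_80ee = (first_time_buyer
                     and loan_amount <= MAX_LOAN_80EE
                     and property_value <= MAX_VALUE_80EE)
    # Assumption: 80EE covers only interest not already claimed under Section 24.
    d_80ee = min(interest_paid - d_24, CAP_80EE) if eligible_80ee else 0
    return {"80C": d_80c, "24": d_24, "80EE": d_80ee,
            "total": d_80c + d_24 + d_80ee}


# Hypothetical first-time buyer: Rs. 30 Lakh loan on a Rs. 45 Lakh property.
result = home_loan_deductions(principal_paid=180_000, interest_paid=260_000,
                              loan_amount=3_000_000, property_value=4_500_000,
                              first_time_buyer=True)
print(result)
```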
Conditions which are important while taking a Home Loan
1. The tax exemption is applicable only when construction of the
property is complete, or you purchase a ready-to-move-in house.
2. Enjoy these tax benefits every year and save significant amounts.
3. If you sell off the property within 5 years of its possession, the claimed
benefits shall get reversed and added to your income.
4. You may purchase the property and let it out on rent. In that case, no
maximum amount is applicable to claim as home loan tax exemption.
5. When availing the home loan, if you continue to rent another house
where you presently reside, you can claim tax benefits against HRA as well.
[ Source: Bajaj Finserv ]
Home Loan Market in India
The total home loan market in India is valued at around Rs. 3 Lakh Crore.
When we look at this humongous figure, we start to get a feel for how big
and important this financing sector is with respect to the Indian markets.
In the researcher’s opinion, the monetary size of the sector, or the total
market valuation, which in this case is around 3 Lakh Crore, represents the
significance of the market.
A market operating at this valuation clearly shows the importance of this
sector in the daily life of Indian citizens and to the Indian economy as a
whole. Housing, as we all know, is one of the basic needs of each and every
individual, and now we have the data as well to back that claim.
Among the latest moves of the Government related to the home loan sector,
on August 23, 2019 the Finance Minister Nirmala Sitharaman announced a list
of benefits as part of the stimulus package that is to be released by the
R.B.I. Apart from this, the R.B.I. this year further reduced the repo rate
by 35 basis points, which brings the rate at which banks take loans from
the R.B.I. to 5.4%. All in all, banks are required to pass on the benefits
to borrowers at the earliest. As noted, S.B.I. and H.D.F.C. were the fastest
to respond to this and have decreased their interest rates accordingly,
which is a very good sign. As a future measure to further support and
strengthen this sector, the Government also proposed to establish an
organization to improve credit lending infrastructure in India.
Mortgage penetration in India as a percentage of GDP
Fig :- Home Loan penetration per country as percentage of their country’s GDP.
With such a huge opportunity, it is quite predictable that many companies
would like to have a piece of this cake. At present the home loan market in
India has 80-plus players. However, two large companies, HDFC and LIC,
individually have a market share of over Rs. 1.5 lakh crore, which adds up
to 57 per cent for just these 2 companies.
[Source - rating agency ICRA]
What we can extract from the graph above is yet another interesting fact:
home loans are at present being availed by a very small percentage of the
population. This means that this sector, even at that valuation, is still
in its infancy and there is a huge opportunity for all the companies to
grow in this sector, since loan penetration has reached just 9% of the
whole population of India.
Market Share of Home Loan providers in India
Fig 2 : A pie chart representation of all the players in the Indian Home Loan Market.
There are other players too, which hold a place in this market. Some others
with notably good market shares are SBI and ICICI Bank. The Pie-Chart
below represents the market share which is held by these companies.
They say growth comes at a cost, and that applies very well to this
financing sector. It is estimated that these companies will require Rs
9,000-16,000 crore of external capital, in other words external funding, to
continue with their industry average growth rate of 20-22 per cent in the
coming years.
Data Analysis :- In the present day and age we have reached a very
remarkable point when compared to our past. This is being said in the
context of the technological prowess that is now available to all of
humanity.
It is important that we give due credit to technology because it is the
major enabler here. It is because of technology that we are able to
collect data.
When we say data, this can mean a whole variety of different things
depending on the situation we are dealing with; it is a word with a very
vast scope. The data that is collected as feedback from customers at the
billing counter in a shopping mall, by restaurants, by e-commerce companies
for their products and also by other service providers are all different in
nature but come within this domain.
The examples above were intended to bring to light what we mean when we
say data, and to show the diversity that comes into the picture when we
talk about data.
In simpler words, to conclude what data means in the context of this
report and in the context of Data Analysis in general, it is a collection of
facts, figures, details, features, information and evidence with respect to
a particular task or operation.
Now, once data is available, the next step is to take this data and analyse
it. We do analysis to extract insights from the data. An example will make
the significance of the analysis aspect of data clear to us.
Here, we are taking an example which most people would already have
experienced. When we have to travel from one place to another and are not
familiar with the route, we tend to use Google Maps to guide us to the
destination.
When we open the app, the map generally shows us at least 3 possible
alternative routes to our destination. At that very moment a lot of
analysis is taking place.
When we look at the map, we can see that there are portions of the route
which are marked in red. Here, the machine is calculating the traffic
condition based on the movement of the Android users who are present in
that location.
The server analyzes this data and, based on it, shows the red mark.
Here, data was taken from the users, it was analyzed and then it was made
available to other users, so that they can plan and take the best routes
accordingly.
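The collect, analyse and share loop described above can be sketched in a few lines of Python. This is a toy illustration and not how Google Maps is actually implemented: the road names, the speed readings and the 15 km/h threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

SLOW_KMPH = 15  # hypothetical threshold below which a road is marked red


def congested_segments(pings, threshold=SLOW_KMPH):
    """Return road segments whose average reported speed is below threshold.

    pings is a list of (segment_name, speed_kmph) readings from user devices.
    """
    speeds = defaultdict(list)
    for segment, speed in pings:
        speeds[segment].append(speed)
    return sorted(seg for seg, values in speeds.items()
                  if mean(values) < threshold)


# Hypothetical readings collected from phones on three roads.
pings = [
    ("MG Road", 8), ("MG Road", 12),
    ("Ring Road", 45), ("Ring Road", 50),
    ("Park Street", 14), ("Park Street", 20),
]
print(congested_segments(pings))  # ['MG Road']
```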
The Data Analysis is being done here in real time in the Google Maps
app. This is an example wherein we see Data Analysis being used to make
transportation much easier.
In very much the same way, Data Analysis today is being used in multiple
places to take decisions so as to make the lives of people easier and
better.
Usage and importance of Big Data Analysis in Business :- This
brings us to the core part of our report, because it is here that the
researcher will shed light on how companies are able to cut down on their
losses, improve productivity, increase their sales and provide customized
offers to their loyal customers, all of this and much more with the help of
Big Data Analytics.
Right above we have listed many of the benefits which are provided by
Big Data Analytics. Let’s now look into the benefits one by one and
understand them better. The researcher would like to begin with cutting
down the losses for the company. To explain how this is achieved using Big
Data, we will take an example of a business use case.
Sprinklers Pvt. Ltd. was using multiple marketing methodologies for the
promotion of its products. Its marketing strategy involved distribution of
pamphlets, field marketing, door-to-door marketing, stand-up display
counters in malls and digital marketing.
The management took a decision to tone down the marketing and to focus
on the channels which were turning out to be effective. To find out which
channel was leading to higher product purchase, the marketing team
collected the final sales reports from all the individual marketing
channels.
After reviewing all the reports, it was found that digital marketing and
door-to-door marketing were the top 2 channels when it came to final sale
conversions.
After reviewing this report, the management team, to trim down the
losses, took a decision to focus only on the top 2 marketing channels and
discontinue the others for a specific period of time. This way they
were able to meet their projected targets and at the same time make
savings in terms of man-power and the cash burn that was taking place.
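The channel comparison in this use case amounts to ranking channels by final sale conversions and keeping the best ones. A minimal Python sketch follows; the conversion counts are hypothetical, since the report does not give the actual figures.

```python
def top_channels(sales_by_channel, keep=2):
    """Rank marketing channels by final sale conversions, highest first."""
    ranked = sorted(sales_by_channel.items(),
                    key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:keep]]


# Hypothetical final sale conversions reported by each channel.
sales = {
    "Pamphlets": 120,
    "Field marketing": 310,
    "Door-to-door marketing": 540,
    "Mall display counters": 260,
    "Digital marketing": 890,
}
print(top_channels(sales))  # ['Digital marketing', 'Door-to-door marketing']
```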
The second case the researcher will discuss is how Big Data Analytics is
being used in the industry to improve overall productivity. A United States
based logistics company, UPS (United Parcel Service), which has global
operations, started off by collecting data on the trucks it was using for
item deliveries. It had listed a set of parameters like the routes taken by
the trucks, performance, braking, weather conditions and average truck
speed.
The data was collected and, after the analysis, changes were made to the
routes taken by the trucks. By implementing the changes the company was
able to save 85 million miles per year, which meant it saved 8 million
gallons of fuel on the daily routes of the trucks.
It was found from the analysis that by saving just 20 miles/driver, the
company was able to make a saving of $30 Million. The changes which
were made possible, led to huge savings for the company.
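As a quick arithmetic check on the figures quoted above, the two savings can be related to each other in Python. The resulting miles-per-gallon value is derived here from the quoted numbers and is not a figure reported by the source.

```python
MILES_SAVED_PER_YEAR = 85_000_000    # figure quoted in the text
GALLONS_SAVED_PER_YEAR = 8_000_000   # figure quoted in the text

# Derived value (not stated in the source): miles saved per gallon saved.
implied_miles_per_gallon = MILES_SAVED_PER_YEAR / GALLONS_SAVED_PER_YEAR
print(f"Implied fuel economy: {implied_miles_per_gallon:.3f} miles per gallon")
```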
The logistics company United Parcel Service has since realized the
significance of Big Data Analytics and has made an effort to optimize
its aircraft deliveries as well using Big Data Analytics.
Let’s now have a look at the last point we mentioned when it comes to the
benefits of using Big Data Analytics, which is providing customized offers
to loyal customers.
Before the researcher sheds some light on a company which offers product
or service customization, there was an interesting insight the researcher
came across. Bain & Company, a global management consulting firm, had
conducted a survey of more than 1000 online shoppers and it was found
that 25-30% of these customers were looking for customized products and
services online.
The same research also revealed that people looking for customized
services were willing to spend more money and were also more engaged
with the retailer.
The Bain survey clearly reveals that the buying behaviour of customers is
progressively changing. It is therefore important that companies
integrate this new customer expectation into their business model.
There is a company which has been at the forefront when it comes to the
customization experience, and that is Amazon. The researcher has picked
this company because of its sheer size and presence.
It is immensely huge and it is very probable that a lot of people would
have had the experience of shopping on this online platform. It can
be noted that when a customer opens the Amazon website, they are shown a
list of items under "recommended for you".
What is happening here is that the data collected from previous search
history, recent browsing patterns and the cache stored in the browser on
the device is being analyzed, and customized products based on the
customer’s choices are being shown to the customer.
If the customer wants to re-purchase a certain item, they can now do so
within just a few clicks, and if they were looking for a specific item they
will instantly be able to see that item when they log in to the platform.
All this and more such personalized features are offered to the customer
based on analysis of a lot of different data sets.
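The idea of turning browsing history into recommendations can be illustrated with a deliberately simple Python sketch. It is not Amazon’s actual method, which the source does not describe: it merely suggests unseen items from the category the customer viewed most, and the catalogue and history below are hypothetical.

```python
from collections import Counter


def recommend(history, catalogue, limit=3):
    """Suggest unseen items from the customer's most viewed category.

    history is a list of (item, category) pairs the customer has viewed;
    catalogue maps every item name to its category.
    """
    if not history:
        return []
    seen = {item for item, _ in history}
    favourite, _ = Counter(cat for _, cat in history).most_common(1)[0]
    return [item for item, cat in catalogue.items()
            if cat == favourite and item not in seen][:limit]


# Hypothetical catalogue and browsing history.
catalogue = {
    "running shoes": "footwear", "sandals": "footwear",
    "sneakers": "footwear", "novel": "books", "cookbook": "books",
}
history = [("running shoes", "footwear"), ("novel", "books"),
           ("sandals", "footwear")]
print(recommend(history, catalogue))  # ['sneakers']
```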
So, by having a look at all these different scenarios, we can now begin to
understand the importance of Big Data Analytics in the business
environment.
Chapter 2 : Introduction to PCS Global
Fig 1 :- PCS Global Pvt. Limited, company logo.
INTRODUCTION
PCS Global is a tech-based company which started out from Calcutta, where
the headquarters of the company is still located. Its area of
operations revolves around Information Technology services like software
testing and development, web development, app development, SEO and
web hosting, Enterprise Resource Planning and a few more similar services.
Information Technology services have in the present day and age become
more and more important for all types of business. Every company has
certain sectors where it specializes when it comes to the services offered.
In the same context, the primary clientele base for PCS Global is
Banking and Financial services, Telecommunication, Media and
Entertainment, Travel and Tourism and many others.
MAJOR OPERATIONS
The goal of the company has been to increase the operational efficiency of
the client, to increase their productivity, and to modernize the technology
being used at the enterprise level while customizing it to their needs and
requirements.
To provide these I.T. services and solutions the company brings to the table
offerings and capabilities ranging from Systems Integration, Infra Services,
Software development and maintenance to High-end server technology.
COMPANY MISSION
To direct all our organizational efforts at building upon the existing
organizational strengths and brand recognition to achieve enhanced levels
of profitable growth in the core business and diversify into new areas that
complement and supplement the core business, with the diversification
aimed at achieving excellence and industry leader status in the new areas.
The PCS Global people will however be encouraged to be open to
unconventional ideas and services and recognize new trends at very early
stages.
COMPANY VISION
PCS Global will be recognized and respected as professional, innovative,
profitable information, and knowledge based IT enterprise. PCS Global
embeds internet based technologies into its internal operating structures
and as business solutions for customers; with customer, employee and
shareholder interests at the core of its operations; demonstrating a clear
concern for ethical conduct and good corporate citizenship; with the
objective of growing into a regional and global player.
AWARDS & RECOGNITION
1. Promoted to Pvt. Limited in 2010.
2. Received BOPT accreditation in 2017.
3. Got accreditation from HRD ministry in 2017.
4. Recognized as most effective training partner and awarded by various
Colleges for their best training services.
5. Got opportunities to open innovation labs in several Government
Engineering colleges with the help of I.T. ministry.
ORGANIZATIONAL STRUCTURE
PCS Global is a registered private organization under the Ministry of
Corporate Affairs (MCA). MCA is a government body which supervises all
corporate affairs in India through the Companies Acts of 1956 and 2013 and
other allied Acts, Bills and Rules. MCA, with the help of the Bills and Rules,
also protects investors and offers many important services and rights to the
stakeholders of the company. PCS Global follows the organizational
structure as prescribed by the MCA for private companies.
DEPARTMENTS IN THE ORGANIZATION
The Departments that are in PCS Global are as follows :-
• Software Development
• Digital Marketing
• Finance and Accounting
• Marketing
• Training
• Human Resource
• Operations Management
COMPETITION & CLIENTS
PCS Global is an I.T. company, and I.T. is a field in which there is tough
competition due to the presence of a high number of companies which
operate in the I.T. products and services space. PCS Global operates both
nationally and internationally. Therefore, given below are the names of
companies that give PCS Global competition nationally and then
internationally.
National :-
1) Eometric Software Solution
2) Sasken
3) Infotech Enterprises
4) Mastek
5) Polaris
6) Sapient
7) KPIT Cummins
8) Rolta India
9) L&T Infotech
10) NIIT
International :-
1) Infosys
2) TCS
3) Wipro
4) HCL
5) Mphasis (HP subsidiary)
6) Oracle Financial (earlier known as iFlex, subsidiary of Oracle)
7) Financial Technologies
8) Patni Computers
9) Tech Mahindra (now owns Satyam, which used to be a Tier 1 company)
10) Mindtree
PRODUCTS & SERVICES OFFERED
PCS Global has a wide variety of Product and Service offerings for its
customer base. Since the researcher is going to shed some light on
management systems, let’s gain a better understanding of this topic before
we go ahead.
Fig :- Diagram to explain how integration of I.T system and Business Management gives us Business
Management System. The info-graphic was made by researcher for better visual representation.
Since each category here has a vast set of options one can choose from,
let’s look at each Product category 1st and then move on to the next one,
which is the Services.
So, now we have an info-graphic which tells us how these modern business
management systems are made.
Taking a look at this from a technical and accurate standpoint, any use of
information technology system for the administration and management
purpose would come under the domain of Information Technology as a
Service(ITaaS). It helps in managing the day-to-day operations of the
business. Let’s have a look at the IT services being offered by PCS Global.
• Product Offering by PCS Global
Fig :- The 1st four items that are offered in the Product category by PCS Global.
Let’s now take a brief look at each product offering. The very 1st one we
will be looking at is the hospital management system (HMS).
In this fast-paced world, managing the operations of a hospital can surely
be a very difficult task. Even today we can see many hospitals still
following the traditional route wherein all the tasks, like checking if the
doctor is available, registration, billing and waiting in queue before the
consultation, all this and more, are being managed manually.
This traditional management system has now been replaced by a hospital
management system (HMS), a computer based system which facilitates
managing all the functions of the hospital. It comes with a lot of
benefits: customers are now easier to manage, the availability of the
doctor is easily known, registration and billing have become faster with
the help of a computer etc.
The 2nd product category is School Management System. Schools and
Colleges today, as all of us would have experienced, have grown really big.
In general we would be able to see that the number of students in any school
or college would on average be around 1000 - 2000. This is just the
number of students; then come the teachers, the administrative
staff, the accounts section, the facilities staff and others.
Managing all these verticals can become difficult. But, with the help of
School management software managing these elements can be made
easier. It is specifically designed in a way such that all the operations and
the administration activity of the school or the educational institution is run
efficiently and smoothly.
The 3rd product category is Banking Management System. It is considered
to be one of the most complex systems of all because of the vast variety of
things that are covered under this one roof.
The aspects covered here range from managing and protecting customer
information to handling the transactions that are happening every
moment, recording the details of all such transactions and generating
tabulated reports for recording and reference purposes. All this and much
more comes within the daily operation of a bank.
Managing these events can be complex. Therefore a Banking Management
System is used to reduce the dependency on manual labour. The tasks which
are automated will also be error free, as they will only work as they
are programmed, whereas doing the work manually carries the possibility of
slight human error.
The 4th product category is Office Management System. In simple terms, it
can be defined as a computer based system which assists in office
administration. Office administration is a very vast area; it covers
aspects like the multiple levels of administration, such as clerical,
secretarial, senior/top management, chairman etc.
Offices have different departments based on the company’s objectives.
Coming to one of the next aspects of Office administration, we deal with
departments and their associated function.
Here, with the help of ITaaS, we strive to achieve a structured method of
control over the daily operations, framed around the objective of the
company.
Fig :- The 2nd list of items that are offered in the Product category section by PCS Global.
The researcher will now turn to the 2nd list of product offerings by PCS
Global. The first one here is Asset Management System. By Asset
Management System, the researcher means an I.T. application which is used
to record and track an asset throughout its life cycle, right from the
purchase of the asset to its sale.
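The idea of recording and tracking an asset from purchase to sale can be sketched as a small Python data structure. This is a minimal illustration, not PCS Global’s product: the class, its method names and the sample asset are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Asset:
    """A tracked asset with a dated log of life cycle events."""
    name: str
    events: list = field(default_factory=list)

    def record(self, when, event):
        """Append a (date, event) entry such as 'purchased' or 'sold'."""
        self.events.append((when, event))

    def status(self):
        """The most recent event, or 'unrecorded' if nothing is logged."""
        return self.events[-1][1] if self.events else "unrecorded"

    def life_in_days(self):
        """Days between the first and the last recorded event."""
        if len(self.events) < 2:
            return 0
        return (self.events[-1][0] - self.events[0][0]).days


# Hypothetical asset followed from purchase to sale.
laptop = Asset("Office laptop")
laptop.record(date(2018, 4, 1), "purchased")
laptop.record(date(2019, 6, 15), "repaired")
laptop.record(date(2021, 4, 1), "sold")
print(laptop.status(), laptop.life_in_days())
```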
In the 2nd list, the next product category is HR Management System. Every
single institution, organization or company that is present today requires a
Human Resource (H.R.) department.
The HR department is entrusted with a wide range of responsibilities which
revolve around a core objective which is taking care of all types of needs,
which includes, emotional, professional and physical well-being of the
employee.
HR functionalities have over a period of time grown to include more aspects
like induction of new entrants, grievance redressal, employee payroll
management, talent acquisition and management, workforce
analytics, performance management, benefits administration and many
more. All these corporate HR operations are now managed with the
assistance of an HR Management System.
In the list, the next product category is Transport Management System
(TMS). In the corporate scenario, TMS is viewed as a subset of Supply Chain
Management (SCM), which in turn at times may be a subset of the
company’s Enterprise Resource Planning (ERP) system.
Venn-diagram representing Mgmt. Systems in a company
Fig :- A visual representation of Transport Management System which is a subset of Supply Chain Mgmt. System,
which in-turn is a subset of the company’s ERP system. Here, the Other Dept. System represents all the other mgmt.
systems being used by the company like Finance, Marketing etc. Source - designed by researcher.
The visual representation shows that the Transport Management
System (TMS) is within the Supply Chain Management (SCM) System. By
this the researcher is trying to indicate the vast nature of the SCM system.
When it comes to SCM, there are many verticals, for example inventory
management, supply and demand forecasting, inventory maintenance,
fulfillment of orders and supplier relations. Since in the present day and
age we have given our end-users the facility to return goods if they
don’t like them, supply chain therefore also includes Returns Management.
In the list, the next product category is Open Source Portal Applications. To
understand this, let’s divide this term into two halves. The 1st one is
Open Source and the 2nd one is Portal Applications.
The researcher would like to explain it part-by-part. The 1st part is Open
Source, which means that the source for the execution of a particular piece
of work is available or open to all people. To gain a better understanding,
note that Source here means an application or a tool.
Let’s take an example to understand this better. Openshot is an open
source Video editing software. Here, this application was made by the
developers and then it was offered to the whole public for Video editing for
free. So, this becomes a source for editing and it is Open, meaning
available for everyone to use and work.
Let’s now go for the next bit which is Portal software. Portal software
essentially means a gateway to a service which is provided via intranet or
internet facility.
Let’s understand this better. Imagine a space which is enclosed from all
sides, like a huge sphere. Now, this sphere has an entry point. We can
enter through this entry point, access the services which we need, and
then, when the work is done, come out of the portal.
It is important to note that the service can only be availed within the limits of
the portal. Once we come outside the portal we cannot access the service.
In the corporate environment we can come across many such services.
A company can have an Open Source portal for all of its company
employees, which means all the employees of the company can access the
portal to do the specific task. The portal can be for email, messaging,
calling, Customer relationship management(CRM), work-flow maintenance
and management etc.
Fig :- The 3rd list of items that are offered in the Product category section by PCS Global.
In the 3rd and final list, the 1st product category is Publication
Management System. Publication essentially means a business which involves
distribution of content. The content we are trying to publish can be
of various types like advertisement, information, news, or details about the
sale of a product, service or even a real estate property. Apart from the
types mentioned here, it can be used for any other purpose deemed
suitable by the publisher as well.
This was about the content which is to be published. Now, when we are
working on any one of the types of content mentioned above, there are other
functionalities that come into the picture. Let’s take one type to understand
this aspect better. Let’s assume that we are publishing news content.
This requires communication from multiple sources to one single point.
Then the content has to be transferred securely, such that it is not
leaked or hacked by any other party. The next step would be secure storage
of this content.
Then comes the challenge of making this content available to the various
stakeholders within the organization. Again, this sharing is preferably done
over a secure internal communication tool. The next step would involve
audio, video or text editing of the content. Finally, after all these layers of
refinement, the content is published.
To manage all these tasks a Publication Management software package is
developed which helps the organization to perform their tasks with
efficiency, security and in an organized manner.
In the 3rd list, the 2nd product category is Store Management System. Today,
Store Management System has become a critical component of every retail
business. It is very common to have seen this management system at work.
All of us would have gone to a physical store to purchase some product,
grocery, shoes, medicine, books or any other item and we would have seen
that they enter all the details in the computer to generate the bill.
The Store Management System that is being used makes a record of all
the items that have been sold. This software can be accessed by the
management to understand how many items were purchased and how
many were sold, which category of item is selling more, which
category of item is selling less, which items have not yet sold and so on.
It provides data on all these parameters to the management. After viewing
this data, the management can decide for which categories they should
provide more offers and discounts to improve their business. All this is
made possible by using a Store Management System.
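The kind of report described above (units sold per category, items not yet sold, stock remaining) can be sketched in Python as follows. This is an illustrative sketch only; the function name, the data layout and the sample stock are hypothetical.

```python
from collections import Counter


def sales_report(stock, sales):
    """Summarise billing records for the management.

    stock maps item -> (category, units_purchased);
    sales lists one item name per unit billed at the counter.
    """
    sold = Counter(sales)
    by_category = Counter()
    for item, count in sold.items():
        by_category[stock[item][0]] += count
    unsold = sorted(item for item in stock if sold[item] == 0)
    remaining = {item: units - sold[item]
                 for item, (_, units) in stock.items()}
    return by_category, unsold, remaining


# Hypothetical stock and one day's billing records.
stock = {"rice": ("grocery", 50), "shoes": ("footwear", 10),
         "notebook": ("books", 30)}
sales = ["rice"] * 12 + ["notebook"] * 3

by_category, unsold, remaining = sales_report(stock, sales)
print(by_category.most_common())  # best-selling category first
print(unsold)                     # candidates for offers and discounts
print(remaining)
```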
The next product category is Financial Management System (FMS). Every
company has to manage its financial activities, which include keeping a
record of the salaries to be paid to the employees, paying taxes based
on the revenue, savings, contingency funds, pre-paid bills to clients,
purchase of products and services, all this and more.
To manage such a wide range of Financial operations listed above by the
researcher, we make use of Financial Management System which helps in
effective utilization and management of the monetary asset that the
company has at its disposal.
Venn-diagram representing Mgmt. Systems in a company
Fig :- A visual representation of company’s ERP System which holds all the different subsets. Here, we focus on two
subsets, SCM system and Material Management both of which come under Supply Chain Mgmt. System. The Other
Dept. System represents all the other mgmt. systems being used by the company. Source - designed by researcher.
In the 3rd list, the final product category is Material Management System.
At its core, this domain takes care of the proper supply of materials so
that the manufacturing of the product takes place in an efficient manner.
“Material management is the planning, directing, controlling
and co-ordination of all those activities concerned with
material and inventory requirements, from the point of their
inception to their introduction into manufacturing process”
- L. J. De Rose
De Rose summarizes the activities that come under the domain of Material
Management very well in the above words. He covers everything from the
procurement of materials to the final manufacturing of the product; all
activities that fall between these two points are a part of material
management.
 Service Offering by PCS Global
The researcher will now shed light on the services offered by PCS Global
Pvt. Ltd. The first list of services is shown in the Figure below.
Fig :- The 1st list of services that are offered by PCS Global.
The 1st in the list of services offered by PCS Global is Software
development. As discussed in detail in the previous Product section,
software is today deeply integrated into the processes and operations of
companies.
Fig :- The info-graphic represents the cycle which is followed in the software development process.
Source - Wikipedia. Edited - by the researcher.
To increase efficiency and profit margins, companies today make sure that
the most advanced software packages are being used for their operations.
In the Figure above, we see the step-by-step process that is followed
during development. PCS Global provides this service, and its speciality is
that it involves the client at every step of the development process.
PCS Global believes in delivering value and understands that the needs of
companies differ; therefore making the client a part of the complete
development process is important. This helps in delivering a final product
which adds the maximum value to the client company.
The next in the list of services offered by PCS Global is Software Testing.
Here, we see a similar approach being applied to test the software. Once
software is being built, or has already been built, the next phase in the
process is testing.
Fig :- The info-graphic represents the cycle which is followed in the Software Testing process. Source - designed
by the researcher.
In the Figure above we see the Software Testing Life Cycle (STLC), which
is followed by PCS Global to deliver software products which are fail-proof.
It shows the step-by-step process used in testing. Here, we start by
understanding the objective, do the planning, start with the development,
and put the developed code in an environment similar to the actual
environment, which comprises hardware, software and network components.
After the code has been tested in this simulated environment, we move to
Test Execution, where we run the code in the actual environment. Once the
results are obtained, the Test Cycle comes to a closure.
Data Science is a huge umbrella which includes a vast number of services
like Machine Learning, Big Data analytics, Database management,
Business Intelligence, Natural Language Processing, Data extraction,
transformation and loading (ETL), Visualization of Big Data and Predictive
analytics.
PCS Global provides all these services, which help companies run their
businesses in a better way.
Web Development is becoming more and more important. With the
digitization wave, it is important that all businesses today have a good
online presence. PCS Global offers services in this area and has a good
number of experts within the company who handle all areas of Web
development, including HTML, PHP and Graphic Designing.
Fig :- The 2nd list of services that are offered by PCS Global.
The Figure above shows the 2nd list of services offered by PCS Global, the
first of which is Application Design and Development. Here, the App
Development team of PCS Global understands the business model of the
client and then designs and builds an application solution which meets the
requirements of the business.
The next in the list of services offered by PCS Global is SEO and Web
Hosting. Here, the company provides facilities to a company, startup or
even an individual who would like to have a website and increase their
online presence.
This includes a vast number of services like website designing and building,
determining the type of hosting that would be the best fit for the client,
and estimating the technical resources that the website would require, such
as storage, RAM, bandwidth, data transfer rate and uptime, all of which
contribute towards making the best website.
Enterprise Resource Planning (ERP) refers to a software solution that is
used to manage all the operations taking place within an organization.
Most often ERP packages are custom built as per the requirements of the
company. The package can be used for one department or for several.
When it is used for more than one department, ERP packages allow all of
those departments to be controlled and monitored within a single software
framework; this is a very important feature of ERP packages.
Education and Corporate Training: under this, PCS Global provides
Technical Training to students and interested individuals who would like to
learn and gain experience working in technical domains like Java, R,
Python, SQL etc.
Chapter 3 : Internship Methodology
1. Internship problem
The Bank was facing several challenges in its daily operations. Manual
analysis of the large Home Loan Dataset, which comprises the details of the
Home Loan applicants, was time consuming and prone to human error. It left
room for partiality towards specific applicants, carried a high cost for
the employees engaged in the manual analysis of the records, and produced
inconsistent final reports over the same Dataset or Applicant Records. As
the Data grew over time to become Big Data, it was becoming impossible to
manage it and meet the deadlines set by the Bank.
The internship problem was therefore to understand the impact of using
Python, an object-oriented programming language, to address all these
complications.
2. Significance of the research
The primary aim was to understand to what degree modern technology is
capable of impacting a business, when it comes to running it in a better
and more efficient manner.
The researcher was given an opportunity by PCS Global to work on a
real-time Banking Dataset. The Dataset is a list of details provided by
applicants who were interested in taking Home Loans from the bank.
For security and confidentiality reasons, the name of the bank has not been
disclosed. Home Loan is a very nascent sector and has a huge business
potential. The researcher has shed light on this aspect and given more
details on its importance in the introduction section of the research.
Now, with the help of the software applications that are available today, we
will understand how the operations of the bank have benefited from
integrating them into its daily operations.
For this purpose of modernization and automation of the banking process,
the researcher has chosen Python, which is an object-oriented
programming language. It is very versatile and easy to understand, which
is the reason for its use in this task.
Let's understand how Python helped make the operations of the bank faster
and more efficient. We have chosen to work on the 1st layer of operations
when it comes to the issuing of the Home Loan.
Here, the applicant is an individual who is in need of a Home Loan from the
bank to purchase a home. The researcher is defining the terms so that there
is complete clarity about the context and the words being used in the
report.
The 1st step is that the applicant provides the required details to the
bank. This is carried out in a number of ways: the details can be entered
by the bank staff, written on a form given to the applicant, and so on. The
main purpose here is to gather the details of the applicant, which are used
to understand whether the applicant is eligible for the Home Loan as per
the bank's terms and conditions.
To give an idea, the applicant is asked for details such as applicant
income, co-applicant income, dependents of the applicant, credit history,
marital status and so on. Analysis of these factors gives the bank an idea
of whether the loan can be issued to the applicant or not.
Now, coming back to the initial question: how can technology help us in
running businesses in a better way? The researcher has described how the
details were collected from the applicants. The next step was the analysis
of these factors.
Until now, this Home Loan Data was being manually analyzed by designated
banking staff. With the help of Python, all this Data can be analyzed and
the results obtained within a few seconds. As the number of applicants was
growing exponentially, the Bank was finding it really hard to manage and
meet the deadlines.
With the help of Python, the entire primary applicant analysis task of the
Bank was automated, and the number of applications coming in no longer
mattered: Python was able to analyze the data and give results almost
instantly, whether for 1000 applicants or for 100,00,000 applicants.
With the help of Python, the researcher created a model which acts as a
filter. We 1st take the previous records of the company, which comprise
all the details plus one additional column, the Loan Status of the
applicant.
We use this as the Train Home Loan Dataset, meaning we run Python on it to
understand which combinations of details were eligible for the Home Loan.
The result is the model, which is available in the form of an equation.
We run this model on the Test Home Loan Dataset, which has all the details
of the applicants but without the Loan Status. Here, we create the column,
but since we do not know the status at the start, the entire Loan Status
column is empty.
Once we run the model on the Test Home Loan Dataset, we get the results in
the final Loan Status column. By analyzing the previous Dataset, Python was
able to learn the way the Bank was issuing the Loans, and this operation no
longer needed to be done manually. This automation provides enormous
benefits to the Bank.
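The train-then-predict workflow described above can be sketched in Python as follows. This is only a minimal illustration and not the researcher's actual notebook: the two small tables are made-up stand-ins for the confidential records, the use of scikit-learn's LogisticRegression as the regression model is an assumption, and so is the choice of the two input columns.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the Train Home Loan Dataset:
# past records for which the Loan Status is already known.
train = pd.DataFrame({
    "Applicant_Income": [5000, 3000, 8000, 2000, 6000, 2500, 7000, 1800],
    "Credit_History":   [1, 1, 1, 0, 1, 0, 1, 0],
    "Loan_Status":      ["Y", "Y", "Y", "N", "Y", "N", "Y", "N"],
})

# Hypothetical stand-in for the Test Home Loan Dataset:
# the same details, but no Loan Status yet.
test = pd.DataFrame({
    "Applicant_Income": [5500, 2200],
    "Credit_History":   [1, 0],
})

features = ["Applicant_Income", "Credit_History"]  # assumed input columns

# Learn the relationship between the details and the known Loan Status.
model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["Loan_Status"])

# Fill the empty Loan Status column of the test set with the model's output.
test["Loan_Status"] = model.predict(test[features])
print(test)
```

The fitted model plays the role of the filter: any number of new applicant rows can be passed to `predict` in one call.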
3. Objectives
1. To understand the various benefits gained by the integration of
Python with the daily Banking operations.
2. To find the pattern of applicants applying for the Home Loan.
3. To identify and build a model using Regression analysis in Python, which
allows for the transition from manual analysis to automated analysis.
4. Hypotheses
1) The higher the applicant income, the higher the probability of that
individual getting the Home Loan from the Bank.
2) Urban applicants will have the highest number of loans sanctioned
followed by Semi-Urban applicants and lastly Rural applicants.
3) Applicants who are married have a better chance of being eligible for the
Home Loan.
5. Scope of Study
The study was done at PCS Global Pvt. Ltd., which is a tech company based
out of Calcutta, with a branch in Bangalore. The company provides Data
Science based services to various companies. In this study we
have taken a Live Home Loan Dataset of a Bank, which is of a .
The study was done on a fixed Dataset. From the basic dynamics of the
Banking process we know that the number of applicants goes on increasing
with time, so the actual Dataset is in fact variable in nature.
The other limitation is a fixed time frame. Banking operations run 24 hours
a day, 365 days a year. This means that the Data is continuously in a state
of change. Applications are accepted, some are rejected, some are deemed
repetitions and are removed, and many similar changes happen to the Data
around the clock.
Therefore, for the purpose of our study we have chosen a Dataset of a fixed
size and a fixed time frame. While it is possible to do the analysis on the
Live Dataset, Live Data Analysis lies beyond the scope of this study and is
very much a part of the daily operations of the bank.
6. Methodology
As mentioned previously, PCS Global is a software solutions company, and
Data Science services are one amongst the many services it offers to its
clients. The researcher was a part of the Data Science team at the company
which provides this facility.
The Home Loan Data was provided by the Bank to the company. This Data is
collected by the bank in a number of ways: the applicant is seated in a
cabin, where the bank staff ask the related questions and fill in the data;
a form is given to the applicant, who is requested to fill in the details
and submit it at the counter; or the applicant applies online, where the
form comprises the same questions as the physical form.
Due to the privacy and security concerns of the bank, the questionnaire
cannot be reproduced here. The Dataset, which is a refined final
representation of the same, has been made available after securing all the
necessary permissions from the bank authorities.
The Data is therefore available to the company as Secondary Data. The
Data is viewed and analyzed using Python. The Graphical Visualizations
are also prepared using Python, with the help of libraries which are a part
of the Python ecosystem.
The sample size of the Home Loan Dataset which has been provided is
614 rows and 13 columns. All further study using this Dataset has been
done using Python.
7. Sample Design
The sample that has been taken for the study is a Probability Sample,
meaning every record has an equal chance of being selected, so that every
possible value is likely to appear proportionately in the sample that the
researcher is examining.
One other reason for taking a Probability Sample is that one of the
researcher's objectives is the automation of the primary analysis, which
helps in deciding the Loan Status of the applicants by analyzing the Data
that is available.
The efficiency and the accuracy of the generated model, which will be used
in the automation process, depend a lot on the diversity of the input data
that is provided. For this purpose as well, it is important that the
sample we take for our study be diverse. Therefore, it is essential that we
obtain the Data by Probability Sampling, which ensures to a large degree
that there is diversity in the Dataset.
Sampling Method :- Probability Sampling, under that Simple Random
Sampling.
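As a rough illustration of Simple Random Sampling, the sketch below draws 614 records (the sample size used in this study) from a made-up population. The population size of 1000, the ID format and the seed are assumptions for the example only; the report does not state how the bank actually drew the sample.

```python
import random

# Hypothetical population of application IDs; the real records are confidential.
population = [f"APP{n:04d}" for n in range(1, 1001)]

rng = random.Random(42)                  # fixed seed so the draw can be repeated
sample = rng.sample(population, k=614)   # each record is equally likely; no repeats

print(len(sample), "records drawn,", len(set(sample)), "of them distinct")
```

Because `sample` draws without replacement, no applicant can appear twice in the selected set.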
Tools and Techniques used in the study
A. Tools used in the study :-
PYTHON :- It is an object-oriented programming language. The
researcher opted to use this platform for the Data Analysis of the Home
Loan Dataset that has been made available. One reason for using Python
is the vast number of libraries that have been built which can be used
along with Python for Data Analysis.
All these are also open-source resources, i.e. made available for everyone to
use. These factors help in making Python a right choice for Data Analysis.
JUPYTER :- As mentioned above, the researcher opted to use Python.
Jupyter is a free and open-source environment for running Python.
There are a number of benefits of running Python in this coding
environment: the interface is simple and intuitive, libraries are readily
usable, narrative text and imagery are better supported, visualization is
better and so on.
Packages required within Python
· NumPy :- It is a library in Python programming language, which is
primarily used when there is a need for working with multi-dimensional arrays
and matrices. It also provides the complex mathematical functions that are
used during Data Analysis.
· Pandas :- It is a library in Python programming language, which is
primarily used for Data manipulation and Data analysis. This package is
built upon the NumPy package. Since in our study we have a Dataset which
comprises Rows and Columns, this package will be very useful, as this
library unlocks many functionalities specifically for this type of Dataset.
· Matplotlib :- It is a plotting library meant for Python programming
language. In our study we will be analyzing the Data with the help of graphs.
Therefore, this package provides us with all the provisions required for
plotting of the Data.
· Sklearn :- It is an open-source machine learning library meant for Python
programming language. This library is used as it provides many statistical
capabilities like regression, classification and clustering, which we will
use in our study for the purpose of Data Analysis.
Data Analysis
Introduction :-
Here, the researcher will explain the manner in which he will be performing
the Data Analysis on the Home Loan Dataset. Firstly, we will open Jupyter
Notebook, which as mentioned before is the environment in which we will be
running Python.
Once Python is up and ready, the researcher will import the Dataset, which
is in CSV (comma separated values) format, into the Python environment.
This makes it possible to access and view the Dataset.
After importing the Dataset into the environment, the first action that is
performed is viewing the Dataset. Viewing the Dataset in a tabulated way
provides a basic understanding of the Data that we will be analyzing.
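The import-and-view step described above could look roughly like the following in pandas. The CSV text here is a made-up three-row stand-in so that the example runs on its own, and the column names are assumptions based on the columns mentioned in this report; with the real file, the path to the CSV would be passed to `read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the confidential Home Loan file.
csv_text = """Loan_ID,Gender,Married,Dependents,Loan_Status
A001,Male,No,0,Y
A002,Male,Yes,1,N
A003,Female,Yes,0,Y
"""

# With a real file this would be pd.read_csv("<path to the CSV file>").
train = pd.read_csv(io.StringIO(csv_text))

print(train.shape)   # (number of rows, number of columns)
print(train.head())  # the first rows (up to 5) in tabular form
```

Checking `shape` first is a quick way to confirm that the whole file was read, for example 614 rows and 13 columns for the Dataset used in this study.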
Dataframe 1 - The first 4 columns of the Home Loan Dataset
Table 4.1
Fig :- The table shows the 1st 5 rows and 4 columns of the Home Loan Dataset.
The researcher shall now begin with the analysis of the Data. In order to
have a deeper understanding of the data, Univariate analysis is undertaken.
Graph 4.1
Fig :- Gender distribution in the Dataset.
As we can see, the researcher has selected the 1st column of the
Dataframe, Gender, and plotted it. From the bar-graph it is very evident that
the number of male applicants is much higher than the number of female
applicants.
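A bar graph of this kind can be produced from a single column in a few lines. The sketch below is an illustration only: the seven Gender values are invented, and the plotting calls are one common way of doing it with pandas and Matplotlib, not necessarily the calls used by the researcher.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; inside Jupyter this line is not needed
import pandas as pd

# Hypothetical Gender column; the real values are confidential.
gender = pd.Series(
    ["Male", "Male", "Female", "Male", "Male", "Female", "Male"], name="Gender"
)

counts = gender.value_counts()        # applicants per category, largest first
ax = counts.plot.bar(title="Gender")  # one bar per category
ax.set_ylabel("Number of applicants")
print(counts)
```

The same two steps, counting the values and then plotting the counts, apply to any categorical column in the Dataset.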
A business use case of this Data arises when the bank is preparing its
marketing strategy: it gives an insight into the audience the marketing
team can focus on.
With the help of Python, it is possible to extract the number of applicants
in percentage format. The researcher understands that representing this
Data in percentage form would be further helpful.
Fig :- Shows the Gender of applicants in percentage format.
Here, we can see that Males are 81% and Females are 18% of the overall
applicants for the bank. As mentioned, this figure would help in building
products and services which take this split into account.
For the purpose of analysis and to extract the model, the Dataset that we
have received is of 2 types. The current Dataset, which is being used in the
Data Analysis, is the training Dataset. The other part is the testing Dataset.
The difference between these Datasets is that the 2nd Dataset, which is the
test Dataset, does not have the final column, Loan Status. This column
records whether a particular individual, who can be uniquely identified
using the Loan Id, was eligible for the Home Loan or not.
The reason it is absent is that we will be extracting the model from the
training Dataset and then running the fitted model on the test Dataset. The
model is a filter which has been created by analyzing the previous Data, so
we can obtain the final Loan Status for any number of applicants by just
entering their data and running the model.
We have obtained this model by a statistical technique called Regression
analysis. Here, the researcher has given a brief idea. As we go further, a
more detailed explanation will be given of all these elements.
Graph 4.2
Fig :- Marital Status of the applicants
Here, the researcher has chosen the next column to perform the Univariate
analysis, and the graph above shows the results. From the graph we can
infer that the number of applicants seeking a Home Loan is higher among
married individuals.
This data point again reflects something quite inherent in the community,
where it is seen that people get married and then aspire to own their own
home. The Data quantifies this belief.
Fig :- Marital Status in percentage format.
As the distribution is easier to understand when the data is represented
in percentage format, the researcher has again shown marital status in
percentage format. It can be seen that Married applicants are at 65% and
Non-Married applicants are at 34%. The reason behind this pattern is
understandable: more married individuals are looking towards owning a home,
and for that purpose they are taking a Home Loan.
Graph 4.3
Fig :- Distribution of dependents of the applicant.
Here, the researcher has chosen to represent the Dependent column in the
Dataset. Dependents here means people who rely on the applicant to fulfil
their needs. In simpler terms, it means that the applicant is a caretaker
for that individual.
The Bank has very stringent rules when it comes to qualifying an individual
as a dependent. The Government of India has made it mandatory to handle
the case of dependents with vigilance, as this is related to the Tax
benefits that can be availed by the applicant.
Dependents can include the applicant's wife, child, parents etc. There is a
set of qualifications which one has to fulfil to qualify as a dependent.
The bank has verified these qualification parameters, compiled them and
presented them in the Dataset. It is now understood what Dependents mean
in the context of the graph and the Dataset. With this understanding, when
we take a look at the graph, we can infer that most of the applicants here
have zero dependents.
As a business case this is a rather positive figure for the bank, as it
indicates fewer liabilities or expenses for the applicant, which means the
applicant is more likely to be capable of paying the EMIs on a regular
basis, without any hassle.
Graph 4.4
Fig :- Diversity of applicants, in terms of education, which is represented in percentage format.
The researcher has now shown the graph in percentage format, which is
better at giving an insight into the distribution in the Dataset. In the
graph above, the researcher has chosen to represent the Education column in
the Dataset. From the graph, we can infer that 75%+ of applicants are
Graduates.
The percentage of Non-Graduates here is at 20%. When the Bank is making a
marketing strategy, these percentages will help the team to have a clearer
picture of the scenario the bank is facing. By looking at the figures, it
is evident that the bank is doing more business with Graduates. A business
use case of this Data would be to understand the challenges the Bank is
facing when it comes to doing business with Non-Graduates.
Now, India is a country wherein the average literacy rate is at 75% (as of
2019), which leaves us with 25% who come under our category of
Non-Graduates. Taking an estimate of the number of people, 25% of a 136
Crore population, we get 34 Crore people who come under this bracket. Out of
these 34 Crore people, there are many who are financially well-off through
their businesses. These people may not be graduates, but they have
businesses which take care of their needs.
It therefore definitely becomes a possibility that the bank can tap into
this category of people. Provision should be made to understand this sector
and to provide for it, which will help to improve the business of the bank
and at the same time help people who are Non-Graduates own a home.
Graph 4.5
Fig :- Employment distribution, Self-Employed vs not Self-Employed applicants
The researcher has chosen to represent the Self_Employed column in the
Dataset. The bank has made an attempt to understand how many of the
applicants have their own businesses and how many are employed.
From the graph, we can infer that most applicants are employed with an
organization. This can be viewed as a positive sign, as there would be a
stable and consistent flow of income for the applicant, except in rare
scenarios which lead to unemployment.
Here again, as the researcher observed in the previous graph, the business
engagement with Self-Employed applicants is seen to be low. At this point
we will have to verify whether the policies of the bank are coming in the
way of development in this category. This pattern will have to be reported
to the bank, as it holds scope for modification.
Graph 4.6
Fig :- Credit history of the applicants. Here, 1.0 indicates all dues paid and 0.0 indicates all dues not paid
or dues to the bank not paid on time.
The researcher has chosen to represent the Credit_History column in the
Dataset. The bank would like to understand the previous loan behaviour of
the applicants. Home Loans are high value loans, therefore before one is
lent out, every precaution should be taken to understand whether the
applicant is eligible or not.
From the graph, we can infer that most applicants have successfully repaid
all their past dues. In percentage terms, 1.0 (all dues successfully paid)
is at 84.219% and 0.0 is at 15.78%. This is a positive sign for the bank,
as it means there is a good probability that there will be few defaulters.
It would be possible to filter out all the applicants who have a previous
history of not having paid their dues or of having paid them late. Extra
caution would be suggested when dealing with these applicants.
If a consistent pattern of non-payment is noted, it would be best not to
process the applicant for the Home Loan. This would help the bank in safely
utilizing its funds.
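Filtering out such applicants is a one-line selection in pandas. The sketch below uses four invented records and assumes the Credit_History coding given in the figure caption above (1.0 for all dues paid, 0.0 otherwise); the Loan_ID values are hypothetical.

```python
import pandas as pd

# Hypothetical records; Credit_History is coded 1.0 (dues paid) or 0.0 (not paid / paid late).
applicants = pd.DataFrame({
    "Loan_ID": ["A001", "A002", "A003", "A004"],
    "Credit_History": [1.0, 0.0, 1.0, 0.0],
})

# Keep only the applicants whose past dues were not paid or were paid late.
needs_review = applicants[applicants["Credit_History"] == 0.0]
print(needs_review["Loan_ID"].tolist())
```

The resulting list can then be handed to the staff for the extra caution suggested above.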
Graph 4.7
Fig :- Property Distribution amongst the applicants.
The researcher has chosen to represent the Property_Area column in the
Dataset. This gives us an idea of the location (Rural, Semi-Urban or Urban)
of the property the individual is looking to purchase.
There are multiple business use cases of this data. The bank can understand
in which locations it is doing more business and in which it is doing less.
This would help it to allocate its resources accordingly to improve the
business.
Urban areas have proved to need high investments, and they also have the
highest Return on Investment (RoI) when compared to the other 2 locations.
So, with the help of this Data, the management can strategize and take
actions to improve sales for Urban properties. From the graph we can also
infer that sales in the Rural areas are the lowest when compared with the
other 2 locations.
It is important to understand why they are the lowest. Is it that awareness
of the provisions of the bank is low amongst the rural folk? If so, the
bank should take steps to strengthen the marketing team deployed in the
rural areas. Overall, this data helps in understanding the areas where the
company is doing well and the areas where it can deploy resources to make
the necessary improvements.
Graph 4.8
Fig :- Distribution of Applicant Income.
The researcher has chosen to represent the Applicant_Income column in
the Dataset. This column, as the name suggests, is a record of the monthly
salary of the applicant. Here, the researcher has used a distribution plot
(density curve) to represent the applicant's income distribution.
The researcher has also used the box-plot method to represent the
applicant's income distribution. These methods were used because almost
every individual has a unique income.
Making a bar graph representing each and every income would be difficult
and unfruitful. Therefore, we have used these 2 methods, the distribution
plot and the Box-plot.
From Graph 4.8, we can see that there is a high peak within the income
range of 0-20,000. This indicates that the number of applicants whose
income is below Rs. 20,000 is high.
The peak is one measure which can be extracted from the graph. The other
measure is the width. It can be seen that the curve is widest at the
0.00005 level on the y-axis.
Graph 4.9
Fig :- Distribution of Applicant Income in Box-plot method.
This indicates that there are a lot of applicants in the group whose monthly
salary is around Rs. 5,000. Let's now take a look at the next one, which is
Graph 4.9.
Here also the researcher has represented the applicant's income
distribution, but using the Box-plot method. This style gives us a very
clear idea of the distribution of income amongst the applicants.
Here, we can see there is a base line, which represents the start. Then
there is a wide box at the 5,000 mark, which represents the maximum
concentration. The next, shorter line is at 10,000, after which there is a
series of dots.
Here, the width indicates the concentration of applicants in the graph.
The level indicates the income at which the concentration is noted in the
graph.
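To make the parts of a box plot concrete, the sketch below computes them by hand. It assumes the usual box-plot convention (which is also Matplotlib's default): the box runs from the first to the third quartile, the line inside it is the median, and points lying more than 1.5 times the box height beyond the box are drawn as separate dots. The ten incomes are invented for illustration.

```python
import statistics

# Hypothetical monthly incomes in rupees, not the bank's data.
incomes = [2500, 3000, 3800, 4200, 5000, 5400, 6000, 7500, 9000, 40000]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(incomes, n=4, method="inclusive")

iqr = q3 - q1                   # height of the box
lower_fence = q1 - 1.5 * iqr    # points below this are drawn as dots
upper_fence = q3 + 1.5 * iqr    # points above this are drawn as dots
outliers = [x for x in incomes if x < lower_fence or x > upper_fence]

print("box from", q1, "to", q3, "with median", median)
print("dots:", outliers)
```

Read this way, the series of dots above the 10,000 mark in Graph 4.9 would correspond to a small number of applicants whose incomes are far higher than those of the main group.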
Graph 4.10
Fig :- Distribution of the term of Home Loan.
The researcher has chosen to represent the Loan_Amount_Term column in
the Dataset. This column represents the duration of the Home Loan taken
by the applicant, i.e. the time the applicant will take to repay the bank
for the sum borrowed. In the graph the numbers indicate the total number of
months. To calculate the duration in terms of years, the number can be
divided by 12.
So, 100 months would come to 8.3 years, 400 would come to 33.3 years and so
on. From the graph we can infer that there is a peak in the 300-400 months
range, which in terms of years is a repayment duration of roughly 25-33
years. The graph also shows a short peak near the 200 months mark, which
indicates that a few applicants plan to repay their loan in around 16-17
years. A business use case here can be to make an attractive Home Loan
offering for applicants who choose the 25-30 years repayment duration and
market it widely. Applicants already prefer this duration, so adding more
value by giving special interest rates will help boost the business.
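The conversion from months to years used above is a simple division by 12, which can be checked with a short helper. The function name is hypothetical and only serves this example.

```python
def months_to_years(months):
    """Convert a loan term in months to years, rounded to one decimal place."""
    return round(months / 12, 1)

# Loan terms taken from the discussion of the graph.
for term in (100, 300, 400):
    print(term, "months =", months_to_years(term), "years")
```

The same helper could be applied to the whole Loan_Amount_Term column if the axis is to be labelled in years instead of months.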
Graph 4.11
Fig :- Loan Amount distribution, represented in Box-plot format.
The researcher has chosen to represent the Loan_Amount column in the
Dataset. This column represents the Loan Amount taken by the applicant.
The Y-axis shows the Loan amount; here 200 represents 20 Lakhs.
The scale has been chosen for easier representation on paper. From the
graph we can infer that the majority of the loans are in the range of
100-200, in actual terms 10-20 Lakhs.
The next such concentration is seen near the 300 mark, which represents
30 Lakhs. The concentration then goes on decreasing as the Home Loan
Amount increases.
We have reasonable variation till the 50 Lakhs mark, after which there is a
very steep decrease in the number of applicants, and as we go above the 60
Lakhs mark the number becomes single digit.
A business use case here can be to make policies which help in sales of
higher value Home Loans, above the 50 Lakhs mark. On one hand it would be
advisable to be cautious, but at the same time the higher the Loan amount
taken, the higher will be the return the bank is able to generate.
Graph 4.12
Fig :- The Target variable - Loan Status. Here, “Yes”, indicates the applicant was eligible for the Home
Loan as per the primary analysis and “No” indicates that the individual will not be further processed for the
Home Loan.
The researcher has chosen to represent the Loan_Status column in the
Dataset. This is the most important column in the Dataset. This is the final
outcome of the analysis which has been done by the bank for that particular
applicant.
As mentioned, we have 2 sets of Data, the Training set and the Test set.
In the Training set, on which this analysis has been done and which has 614
rows and 13 columns, the Loan_Status has been manually determined by
the banking staff.
The researcher will be running the Regression Test on the Training set to
determine the model. Once we run the model on the Test Data set, we will
get the Loan Status of the Test set, which does not have the Loan Status
manually determined.
Since the Loan Status is the Target variable, the researcher will be
presenting here all the details related to this variable.
Fig :- The figure here is a snippet taken from Jupyter Notebook. Here, “train” indicates the
Training Dataset. The column chosen is “Loan_Status”. The next element, “value_counts”, is a pandas
method which counts how many times each distinct value occurs in the column.
Here, the researcher has shared a snippet to present the counts for
Loan_Status. From the figure we can see that out of the total 614
applicants, 422 are eligible as per the primary analysis and 192
applicants are not eligible.
Fig :- The data has been represented in percentage form. Here, “Yes”, indicates the applicant was
eligible for the Home Loan as per the primary analysis and “No” indicates that the individual will not be
further processed for the Home Loan
In this figure, the same split is shown again, this time as percentages.
From the figure, it is seen that 68.73% of the applicants are eligible and
31.27% are not eligible as per the manual analysis done on the training
Dataset.
The reason for presenting more detail w.r.t. the target variable
Loan_Status is to get a better understanding of the most important column
in the Dataset.
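The counts and percentages quoted above can be reproduced with pandas. This is a minimal sketch, not the researcher's notebook: the `train` frame below is a stand-in that rebuilds only the Loan_Status column from the figures given in the text (422 "Y" and 192 "N").

```python
import pandas as pd

# Stand-in for the Training Dataset: only Loan_Status is rebuilt here,
# from the counts quoted in the text (422 eligible, 192 not eligible).
train = pd.DataFrame({"Loan_Status": ["Y"] * 422 + ["N"] * 192})

# Absolute number of applicants per outcome.
counts = train["Loan_Status"].value_counts()

# Same split as a share of all rows; normalize=True divides by the total.
shares = train["Loan_Status"].value_counts(normalize=True)

print(counts)
print((shares * 100).round(2))
```

Passing `normalize=True` avoids dividing by the row count by hand and keeps the two views consistent with each other.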
Graph 4.13
Fig :- Bivariate analysis. Loan Status and the Gender of applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and the Gender column. The Y-axis shows the number
of applicants, in percentage format for better understanding.
From the graph we can infer that the approval rates for male and female
applicants are quite proportional to each other; both are at about the
same level.
Viewed in terms of the percentage of applicants who were eligible for the
Home Loan, it can be seen from the graph 4.12 that both male and female
applicants have performed fairly well, with close to or more than 75% of
the applicants eligible as per the primary analysis
for the Home Loan.
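A bivariate table of this kind can be built with `pandas.crosstab`. The sketch below uses a handful of made-up rows, not the report's data, and assumes the column names `Gender` and `Loan_Status`; it shows both the raw counts and the row-wise percentages that such a chart is drawn from.

```python
import pandas as pd

# Illustrative rows only; these are NOT the report's data.
train = pd.DataFrame({
    "Gender": ["Male", "Male", "Male", "Male",
               "Female", "Female", "Female", "Female"],
    "Loan_Status": ["Y", "Y", "Y", "N", "Y", "Y", "N", "Y"],
})

# Absolute counts for every (Gender, Loan_Status) pair.
counts = pd.crosstab(train["Gender"], train["Loan_Status"])

# Row-wise shares: each row sums to 1, so groups of different
# sizes can be compared directly.
shares = pd.crosstab(train["Gender"], train["Loan_Status"], normalize="index")

print(counts)
print((shares * 100).round(1))
```

Keeping the counts next to the percentages matters for the later charts, where a category with very few applicants can show an extreme percentage.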
Graph 4.14
Fig :- Bivariate analysis. Loan Status and the Marital Status of applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and the Married column. The Y-axis shows the number
of applicants, in percentage format for better understanding.
From the graph we can infer that, when it comes to eligibility status, the
married applicants have performed better than their counterparts. One
reason for the difference can be that married applicants often have a
co-applicant income, which adds to the stability quotient of the applicant,
as in the present day and age both partners are often earning members of
the family. Although there is a difference, it is rather small. A business
use case here can be that the bank analyzes its previous defaulter records
and finds out which category has defaulted more often, the married or the
non-married applicants.
Based on the analyzed figure, we can understand which category carries more
risk and, guided by the report, make plans to promote Home Loans to the
category which has lesser risk, for better business. At the same time it is
important that we don’t put all the eggs in one basket. Therefore, it is
advised to follow the right proportion based on the current findings.
The researcher had hypothesized that the married applicants have a fairer
chance of performing better at the Loan Eligibility status. With the help of
the Data representation above it is seen that this hypothesis holds true.
Graph 4.15
Fig :- Bivariate analysis. Loan Status and the number of Dependents.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and the number of Dependents. The Y-axis shows the
number of applicants, in percentage format.
From the graph we can infer that, when it comes to eligibility status by
number of dependents, there is an inconsistency.
Fig :- Percentage calculation with exact number of applicants shown for 0,1,2 and 3+ dependents.
For 0 dependents the eligibility status is at 31.04%. It increases as the
number goes up by 1, and as the number goes up again by 1, making it 2
dependents, the non-eligibility status decreases to 32.8%.
From the data we can see that the eligibility status percentage is the highest
for 1 dependent and for 3+ dependents. When we view the data in terms of
numbers and not as percentages, it is very evident that the 0 dependents
group is the largest when compared to all the others.
Therefore, the increase and decrease in eligibility status here can be said to
be inconsistent. One more interesting aspect we can note from the figure is
that the non-eligibility percentage for 3 dependents is 100%.
This is possible because we have a finite sample under analysis. With a
bigger and more diverse Dataset this would not be the case; otherwise it
would mean that a person with 3 dependents would never be given a Home
Loan, which is not true.
What we can conclude from this analysis is that the number of dependents
has less priority when it comes to finalizing the loan eligibility status
for the home loan applicants.
Graph 4.16
Fig :- Bivariate analysis. Loan Status and Education of applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and Education status of the applicants. The Y-axis
shows the number of applicants, in percentage format.
From the graph we can infer that, in the context of the Education status of
the applicants, the graduates have a comparatively better Home Loan
eligibility percentage.
Fig :- Loan Status w.r.t Education of applicants in percentage format.
In the figure above the researcher has shown the exact percentages. 70.83%
is the percentage of graduates who are eligible for the Loan. The second
calculation gives the same figure for the non-graduates.
Graph 4.17
Fig :- Bivariate analysis. Loan Status and Self-Employed applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and Self-Employment status of the applicants. The
Y-axis shows the number of applicants, in percentage format.
From the graph above and the snippet below we can infer that, in the
context of the Self-Employment status of the applicants, there is a
negligible difference in Home Loan eligibility between the two groups.
Fig :- Loan Status w.r.t Self-Employment status of applicants in percentage format.
Although the percentages seen in the graph are similar, when viewed in
terms of numbers there is definitely a big difference. The total number of
applicants who are not self-employed is 500 and that of self-employed
applicants is 82.
Graph 4.18
Fig :- Bivariate analysis. Loan Status and Credit history of applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and Credit History of the applicants. The Y-axis
shows the number of applicants, in percentage format.
Here, in the graph, “0.0” represents applicants who have either not paid
their dues at all or have not paid them on time. “1.0” indicates the
applicants who paid all their dues on time.
Fig :- Loan Status w.r.t Credit history of the applicants in percentage format.
From the graph we can infer that, for the sake of precaution, the bank has
sanctioned fewer Home Loans to previous defaulters, as there is a
probability that they might default once again. If a loan is sanctioned to
such an applicant, then all the necessary documents and formalities are to
be followed.
Graph 4.19
Fig :- Bivariate analysis. Loan Status and Property Area of applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and Property Area of the applicants. The Y-axis
shows the number of applicants, in percentage format.
From the graph we can infer that, in the context of the Property Area of
the applicants, the bank has approved the highest share of Home Loans for
Semi-Urban areas.
Fig :- Loan Status w.r.t Property Area of the applicants in percentage format.
The percentage and the total number of applicants are both noted to be
highest for Semi-Urban applicants. The other two categories are very close
to each other, in terms of both percentage and number.
Graph 4.20
Fig :- Bivariate analysis. Loan Status and income of applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and income of the applicants. The Y-axis shows the
number of applicants, in percentage format.
The researcher has made use of ranges in this graph. Almost every applicant
has a different income, so grouping the incomes into ranges lets us cover
all the unique entries.
Fig :- Snippet of Python code, which makes use of bins to make ranges for the applicants.
Here, it can be seen that in the 1st line of the code we made use of numbers
to define the range of each bin. Low, Average, High and Very High are all
bins, and each bin represents a range. The range of the 1st bin is
0 - 2500 Rupees, the 2nd one, which is Average, has a range of 2500 - 4000
Rupees, and the other 2 bins are defined in the same way.
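Binning of this kind is commonly done with `pandas.cut`. The following is a sketch under stated assumptions rather than the researcher's exact snippet: the column name `ApplicantIncome`, the new column `Income_bin` and the sample incomes are illustrative, and because the text does not state the upper edge of the Very High bin, infinity is used for it here.

```python
import numpy as np
import pandas as pd

# Illustrative incomes in Rupees; the column name is an assumption.
train = pd.DataFrame({"ApplicantIncome": [1800, 2500, 3200, 4500, 5900, 12000]})

# Bin edges in Rupees. The text gives 0-2500, 2500-4000 and 4000-6000;
# the upper edge of the last bin is not stated, so infinity is assumed.
bins = [0, 2500, 4000, 6000, np.inf]
labels = ["Low", "Average", "High", "Very High"]

# By default each bin includes its right edge, e.g. 2500 falls in "Low".
train["Income_bin"] = pd.cut(train["ApplicantIncome"], bins=bins, labels=labels)

print(train)
```

With the default settings an income of exactly 0 would fall outside the first bin; `include_lowest=True` can be passed to `pd.cut` if such rows need to be kept.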
Now, there is an understanding of the bins used. The next part is to
understand the Loan Status in context with the income of the applicant.
Fig :- Loan Status w.r.t income of the applicant in percentage format.
Please do note that the snippet shown above displays 2 important pieces of
information. On the right hand side we have the number of applicants,
arranged as per the range and their eligibility status.
In the right hand picture, “Y” and “N” represent yes and no. On the left
hand side we have worked out the percentages using the numbers shown on the
right. This is done for better understanding of the scenario.
The highest number of applicants fall in the Average bin (2500-4000
Rupees). The next point that we can extract is that the High bin (4000-6000
Rupees) has the highest eligibility status.
The researcher had hypothesized that the higher the applicant income, the
higher will be the probability of the individual getting the Home Loan.
With the help of the Data representation above it is seen that this
hypothesis does not hold true. The income range categorized as High is in
1st place, and the 2nd spot goes to the income range categorized as Average,
at 70%.
Graph 4.21
Fig :- Bivariate analysis. Loan Status and Loan amount taken by the applicants.
In the figure above the researcher has represented the Bivariate analysis
done on the Loan Status and Loan amount taken by the applicants. The Y-axis
shows the number of applicants, in percentage format.
The researcher has used bins here to make ranges for the Loan Amount
taken from the bank. This was done because the Loan Amounts are different
for almost every applicant, and ranges let us capture all of them.
Let’s now define the bins we have used: Low, Average and High. Low
represents the range 0 to 100, Average represents the range 100 to 200 and
so on.
These values are in units of 10,000 Rupees. The Low range of 0-100 is
therefore actually 0-10,00,000 Rupees, i.e. 0 - 10 Lakhs. Similarly, for
the Average bin the range is 10 - 20 Lakhs and so on.
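The scale and the bins described above can be sketched as follows. The column names `LoanAmount`, `LoanAmount_bin` and `LoanAmount_lakhs`, the helper `units_to_lakhs` and the sample values are assumptions for illustration; the bin edges 0, 100, 200 and 700 follow the ranges given in this section (0-10, 10-20 and 20-70 Lakhs).

```python
import pandas as pd

UNIT_RUPEES = 10_000        # one unit of the Loan Amount column, per the text
RUPEES_PER_LAKH = 100_000   # 1 Lakh = 1,00,000 Rupees


def units_to_lakhs(units):
    """Convert Loan Amount units (of 10,000 Rupees) to Lakhs."""
    return units * UNIT_RUPEES / RUPEES_PER_LAKH


# Illustrative loan amounts in dataset units; NOT the report's data.
train = pd.DataFrame({"LoanAmount": [66, 100, 128, 200, 315, 600]})

bins = [0, 100, 200, 700]            # i.e. 0-10, 10-20 and 20-70 Lakhs
labels = ["Low", "Average", "High"]

train["LoanAmount_bin"] = pd.cut(train["LoanAmount"], bins=bins, labels=labels)
train["LoanAmount_lakhs"] = units_to_lakhs(train["LoanAmount"])

print(train)
```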
Having understood the bins and the associated ranges, the next part is
the Bivariate analysis.
Fig :- Loan Status w.r.t Loan Amount depicted in percentage format.
From the figure we can note that the highest number of applicants is in
the Average bin, which has a range of 10-20 Lakhs. It so happens that the
Average bin has the highest eligibility status as well.
Having reviewed all the above data, when the researcher now goes through
the graph 4.20, it is easier to understand the data depicted by the graph.
We can infer that, in the context of the Loan Amount, the Average bin has
the highest Home Loan eligibility status, followed by the Low bin (0 - 10
Lakhs) and lastly the High bin (20 - 70 Lakhs).
The researcher will now start with the regression testing of the Home Loan
Dataset. For this test we will be using 2 statistical measures: Linear
Regression and Logistic Regression.
Graph 4.22
Fig :- Scatter plot of Applicant Income vs Loan Amount
The researcher has enclosed a highly concentrated section within a red
box. Within the box the applicant income range is 0-20,000 Rupees and, on
the Y-axis, the Loan Amount range is 0-40,00,000 Rupees.
It is very evident that close to 85% of our applicants lie within this range.
This is a very important insight that the researcher was able to extract
with the help of the scatter plot.
The distribution which lies outside the highlighted area is seen to be very
diverse. It can therefore be predicted that, because of the huge variation
in the data, it would be extremely difficult to find a good best-fit line.
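The share of applicants inside such a box can also be computed directly instead of being read off the plot. The sketch below uses ten made-up rows, so it does not reproduce the 85% figure; the column names `ApplicantIncome` and `LoanAmount` are assumptions, and the loan amounts are in units of 10,000 Rupees, so 400 stands for 40,00,000 Rupees.

```python
import pandas as pd

# Illustrative rows only. LoanAmount is in units of 10,000 Rupees.
train = pd.DataFrame({
    "ApplicantIncome": [2500, 4800, 6000, 15000, 19500,
                        23000, 3900, 51000, 8000, 3000],
    "LoanAmount": [110, 128, 150, 320, 390,
                   370, 95, 600, 450, 66],
})

# True for applicants inside the box: income up to 20,000 Rupees AND
# loan amount up to 400 units (40,00,000 Rupees). between() is inclusive.
in_box = (train["ApplicantIncome"].between(0, 20_000)
          & train["LoanAmount"].between(0, 400))

# The mean of a boolean column is the fraction of True values.
share = in_box.mean()

print(f"{in_box.sum()} of {len(train)} applicants ({share:.0%}) lie inside the box")
```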
Graph 4.23
Fig :- Scatter plot of Applicant Income vs Loan Amount with all possible best fit lines.
Before the researcher performs the automation process on the Dataset using
a Machine Learning technique, namely Logistic Regression, we will take a
selected Dataframe from our Dataset to understand why it will not be
possible to use Linear Regression in place of Logistic Regression.
The researcher performed Linear Regression on the Dataset, and it can be
seen from the graph that the variance in the Dataset is very high: a large
number of candidate best-fit lines can be drawn, all with a similar error.
This highlights the Non-Linear nature of our Home Loan Dataset.
Therefore, we can conclude that it will not be feasible to use Linear
Regression for the further analysis of our Dataset. This is often the case,
as most real-life scenarios are Non-Linear in nature.
A solution to this challenge is to use a non-linear approach for the
analysis of this Dataset. The researcher will use Logistic Regression,
which works on a logarithmic (log-odds) scale instead of fitting a straight
line directly to the outcome. This will be used to solve our final
objective, which is to determine the Loan Eligibility Status.
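To make the non-linear step concrete: Logistic Regression passes a linear score through the logistic (sigmoid) function, which squeezes any real number into the range 0 to 1 so that it can be read as a probability of eligibility. The sketch below is a generic illustration of that function and its inverse, the log-odds; the function names are illustrative and nothing here is taken from the researcher's notebook.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of the sigmoid: the log-odds (logit) of a probability p."""
    return math.log(p / (1.0 - p))


# A score of 0 gives a probability of 0.5; large positive or negative
# scores approach 1 and 0 without ever reaching them.
for z in (-4, -1, 0, 1, 4):
    print(f"score {z:+d} -> probability {sigmoid(z):.3f}")
```

This is why a Yes/No target suits Logistic Regression: the output always stays between 0 and 1, whereas a straight line would run below 0 and above 1.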
Table 4.2
Fig :- Rows and Columns we selected to be used in Logistic Regression model building.
Here, to perform the Logistic Regression, we have selected a few columns
and dropped all the other columns, like applicant income, co-applicant
income, Loan Amount etc. Out of a set of 13 columns we are using 8 columns
to begin with for our model.
Table 4.3
Fig :- Rows and Columns we selected to be used in Logistic Regression model building for variable “X” .
Here, we have defined a new variable, “X”, and we are storing the input
columns of our Dataset in “X”. It can be seen that we have dropped the
Loan Status column from the Dataframe we are storing in “X”. The final
Dataframe stored in “X” can be seen.
Table 4.4
Fig :- Rows and Columns selected for Y-Dataframe.
Here, we have defined a new variable, “Y”, to hold the target. It can be
seen that we have used only the Loan Status column from the Dataframe for
storing in “Y”. The final Dataframe stored in “Y” can be seen.
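The separation into “X” and “Y” described above can be sketched in pandas as follows. The frame `data`, its four already-encoded columns and their values are illustrative assumptions; the report's own selection has 8 columns.

```python
import pandas as pd

# Illustrative, already-encoded rows; NOT the report's selected columns.
data = pd.DataFrame({
    "Gender": [1, 0, 1, 1],
    "Married": [1, 0, 1, 0],
    "Credit_History": [1.0, 0.0, 1.0, 1.0],
    "Loan_Status": [1, 0, 1, 1],
})

# X holds every input column: the frame with the target dropped.
X = data.drop(columns=["Loan_Status"])

# Y holds only the target column.
Y = data["Loan_Status"]

print(X.shape, Y.shape)
```

`drop` returns a new frame, so `data` itself still contains the Loan_Status column afterwards.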
Fig :- Usage of train_test_split function in the Dataset.
We make use of a very important function, train_test_split. This function
helps us to divide the Dataset into a training set and a test set. The
Training Data will be used by the Machine Learning algorithm to learn from
the Data. The Test Data set will later be used to check how well the model
predicts the Loan Eligibility status.
The split also helps us to compare the algorithm generated output with the
manually derived Loan Eligibility status, and based on this we get the
accuracy score of the model.
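A minimal sketch of such a split with scikit-learn's `train_test_split` is shown below. The arrays are synthetic, and the `test_size` and `random_state` values are assumptions; the text does not state which values the researcher used.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 10 applicants with 2 features each, and 10 labels.
X = np.arange(20).reshape(10, 2)
Y = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

# Hold back 30% of the rows for testing (an assumed proportion);
# random_state fixes the shuffle so the split is repeatable.
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
```

The rows of `x_train` and `y_train` stay aligned, as do those of `x_test` and `y_test`, because both inputs are shuffled with the same permutation.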
Table 4.4
Fig :- Rows and Columns selected under x_train after train_test_split function.
It can be seen that the Dataset has been sliced by the function. The same
is done for all of the other variables returned by the train_test_split
function.
Table 4.5
Fig :- Rows and Columns selected under y_train after train_test_split function.
Here we see that the Dataset has been sliced and stored under y_train. The
Data is split using the train_test_split function in order to perform
model building using Logistic Regression.
Fig :- We derive the final model using Logistic Regression function.
Here, at the bottom of the snippet, we can see the final model which the
researcher was able to derive. With this model in hand, we can be
presented with any number of applicants and it will be possible to get the
preliminary analysis within seconds.
Fig :- We derive the final model using Logistic Regression function.
The score function, as mentioned, helps us to compare the algorithm
generated output with the manually derived Loan Eligibility status. It can
be seen that the accuracy of this model is 79%, which is a fairly good
accuracy score. We can work further to improve the accuracy of the model.
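The fit-and-score step can be sketched with scikit-learn's `LogisticRegression`. Everything in the data below is synthetic and assumed for illustration (one binary feature that loosely behaves like a credit history flag, plus a noise feature), so the accuracy it prints has no bearing on the 79% reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic features: a binary flag and an unrelated noise column.
flag = rng.integers(0, 2, size=n)
noise = rng.normal(size=n)
X = np.column_stack([flag, noise])

# Synthetic target: follows the flag, with about 15% of labels flipped.
flipped = rng.random(n) < 0.15
Y = np.where(flipped, 1 - flag, flag)

x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Learn the model on the training rows only.
model = LogisticRegression()
model.fit(x_train, y_train)

# score() returns the fraction of test rows whose predicted label
# matches the known label, i.e. the accuracy.
accuracy = model.score(x_test, y_test)
print(f"accuracy on held-out rows: {accuracy:.2f}")
```

Scoring on rows the model has not seen during fitting is what makes the accuracy a fair estimate of how it would behave on new applicants.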
Chapter 5 : Findings from the study
Fig :- Visual representation of the 3 categorical variables.
When the researcher factored in 3 variables, an interesting finding
emerged. The diagram is a visual representation of the 3 variables. On the
right-hand side we see the 3 categorical variables. Within the Venn diagram
are the highest categories w.r.t. each variable mentioned on the right. For
example, in location, out of Urban, Semi-Urban and Rural, Semi-Urban had the
highest percentage of applicants.
When we are working out a strategy in the context of the Bank, Location is
an important starting factor, the next factor to be checked is the
Education of the applicant and finally the individual’s Employment Status.
This analysis helped us to understand that the segment of applicants which
gave the bank the maximum business was Graduate employees who are
Semi-Urban residents. Segmentation is very critical, as primarily it
helps the Bank understand who its customers are.
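A segment of this kind can be found by grouping on the three categorical variables together and counting the rows in each combination. The sketch below uses six made-up rows, and the column names and category spellings (`Property_Area`, `Education`, `Self_Employed`, "Semiurban" and so on) are assumptions for illustration.

```python
import pandas as pd

# Illustrative rows only; NOT the report's data.
train = pd.DataFrame({
    "Property_Area": ["Semiurban", "Semiurban", "Urban",
                      "Rural", "Semiurban", "Urban"],
    "Education": ["Graduate", "Graduate", "Graduate",
                  "Not Graduate", "Graduate", "Not Graduate"],
    "Self_Employed": ["No", "No", "Yes", "No", "No", "No"],
})

# Count applicants in every (area, education, employment) combination
# and list the largest segments first.
segment_sizes = (
    train.groupby(["Property_Area", "Education", "Self_Employed"])
    .size()
    .sort_values(ascending=False)
)

largest_segment = segment_sizes.index[0]
print(segment_sizes)
print("largest segment:", largest_segment)
```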
The next important detail that can be derived by segmentation is the list
of needs that are most important, or which add the most value, to the
individuals of that segment. It is when we focus on and fulfil the needs
segment-wise that it is possible to add maximum value to the customers.
One more critical finding was the scatter plot of Loan Amount vs the
Applicant Income. It once again helped us to segment the customers of the
bank.
The researcher was able to understand that there was a very high
concentration of applicants whose income lay in the range of 0-20,000
Rupees and who were aspiring towards taking a Home Loan in the range of
0-40,00,000 Rupees.
Recommendations
Fig :- State-wise Home Loan penetration in India. Source - RBI, IDBI Capital Research.
It is evident that many Indian states are yet to ride the wave of urbanization.
On review of the graph above, the earlier Data point, that the major part
of the population lies under the Semi-Urban and Rural area category, is
further validated.
It is known that there is a saturation of Housing Finance Companies (HFCs)
in the Urban areas. It was noted in our Data Analysis that the client base
of the bank was the highest for the Semi-Urban area, and it was the
Semi-Urban applicants themselves who had the highest eligibility status.
These insights, when integrated together, converge into a recommendation.
The Bank has a good client base in the Semi-Urban area. The researcher
recommends that the Bank become a niche Bank with a focus on the Semi-Urban
and Rural categories. Repco Home Finance Ltd., one of the giants in the HFC
space, has seen success by following the segmentation strategy. It was
noted that Repco focused on a niche audience, which was self-employed
individuals.
As housing finance gained momentum, it primarily centred around the
salaried customer, while the potentially creditworthy but difficult to
assess self-employed class remained out of the ambit of lenders.
Repco understood this and made a quick and aggressive move towards this
segment. This success story establishes one truth, which is the success of
serving a niche audience in the Home Loan market.
Based on the Data Analysis, it is observed that the Bank has a good
audience in the Semi-Urban location. It is recommended that the bank
deploy its resources to forge strategies towards development in the
Semi-Urban and the Rural markets.
Based on this finding the researcher would advise the bank to make a tie-up
with realtors and construction companies which focus on properties in the
Semi-Urban and the Rural areas.
It was seen that most applicants were looking for homes in the affordable
range of 0-40,00,000 Rupees. Making alliances with the companies and
realtors who cater to such a range would be a win-win scenario for both
parties.
People looking for Homes prefer properties which are backed by banks for 2
reasons. The 1st is that it adds an extra layer of security to the property.
A real estate property which is backed by a bank means that all the
documentation and the legal formalities of the construction company and
the construction site are legal and correct.
The 2nd is the provision of Home Loan availability for the property. If this
aspect is taken care of in the initial stage itself, there would be higher
sales for both the construction company and the bank, in terms of Home
Loans.
Therefore, it is advised that the bank instruct the marketing team to
collect details weekly about all the properties in the areas where it
operates. Then, based on the past records of each company, the bank can
invite the companies for a possible business partnership.
As mentioned before, for the purchase of homes there also exist a lot of
real estate agents and agencies whose sole purpose is to connect the buyers
and the sellers. The bank can approach them as well and gain further ground
level insights. The more data the bank has w.r.t. the area where it
operates, the more beneficial it will be for the bank, as it is the ground
level details that need to be taken into consideration while implementing
the marketing and sales strategies of the bank.
Conclusion
The focus areas for the business development of the Bank have been
highlighted, based on the study of Home Loan demographics using
statistical analysis. It is advised that the findings of the study be deployed by
the bank, to drive better growth possibilities.
Bibliography
1. https://www.techopedia.com for reference on Asset management.
2. https://searchhrsoftware.techtarget.com for reference on HR management
system.
3. https://searchcio.techtarget.com for reference on Open Source Portal
Management.
4. https://searcherp.techtarget.com for reference on Financial Management
System.
5. https://www.intellias.com for reference on Services in the domain of Data
Science .
6. https://blog.paessler.com for reference on SEO and Web Hosting
Services.
7. https://www.oracle.com for reference on ERP Services
8. https://www.moneycontrol.com for reference on Home Loan status in
India.
Data Analysis on Home Loan Dataset using Python

More Related Content

What's hot

CIBIL
CIBILCIBIL
Summer internship project on home loans
Summer internship project on home loansSummer internship project on home loans
Summer internship project on home loans
Somendra Singh
 
17689260 summer-project-on-sbi
17689260 summer-project-on-sbi17689260 summer-project-on-sbi
17689260 summer-project-on-sbi
subeer22
 
Indian Overseas Bank Sip Report Loans And Advances Management
Indian Overseas Bank Sip Report Loans And Advances ManagementIndian Overseas Bank Sip Report Loans And Advances Management
Indian Overseas Bank Sip Report Loans And Advances Management
ICFAI BUSINESS SCHOOL
 
Merger & Acquisition of HDFC Bank with Centurian Bank of Punjab
Merger & Acquisition of HDFC Bank with Centurian Bank of PunjabMerger & Acquisition of HDFC Bank with Centurian Bank of Punjab
Merger & Acquisition of HDFC Bank with Centurian Bank of Punjab
Rohan Solanki
 
Omo ppt
Omo pptOmo ppt
Retail banking ppt
Retail banking pptRetail banking ppt
Retail banking ppt
Amit Saini
 
Punjab national bank scheme for agriculture
Punjab national bank scheme for agriculturePunjab national bank scheme for agriculture
Punjab national bank scheme for agriculture
Priya priyadarshini
 
Retail banking
Retail bankingRetail banking
Retail banking
Dharmik
 
NPA research report
NPA research reportNPA research report
NPA research report
sunita Burman
 
A project report on SBI bank
A project report on SBI bankA project report on SBI bank
A project report on SBI bank
Bhavik Parmar
 
Project report on NPAs
Project report on NPAsProject report on NPAs
Project report on NPAs
Parneet Walia
 
ICICI Group
ICICI GroupICICI Group
ICICI Group
prmenon1
 
A Study of Agriculture Loan of Axis Bank Ltd (MBA Finance Project)
A Study of Agriculture Loan of  Axis Bank Ltd (MBA Finance Project)A Study of Agriculture Loan of  Axis Bank Ltd (MBA Finance Project)
A Study of Agriculture Loan of Axis Bank Ltd (MBA Finance Project)
Avinash Labade
 
Homeloans
HomeloansHomeloans
Homeloans
Dharmik
 
project on online banking in india
project on online banking in indiaproject on online banking in india
project on online banking in india
Koushik Halder
 
Nbfc
NbfcNbfc
Nbfc
sumit235
 
Credit risk management presentation
Credit risk management presentationCredit risk management presentation
Credit risk management presentation
harsh raj
 
Deposit scheme project report
Deposit scheme project reportDeposit scheme project report
Deposit scheme project report
surekhaparasur
 
Presentation on state bank of india
Presentation on state bank of indiaPresentation on state bank of india
Presentation on state bank of india
Shaikh Mussaddik
 

What's hot (20)

CIBIL
CIBILCIBIL
CIBIL
 
Summer internship project on home loans
Summer internship project on home loansSummer internship project on home loans
Summer internship project on home loans
 
17689260 summer-project-on-sbi
17689260 summer-project-on-sbi17689260 summer-project-on-sbi
17689260 summer-project-on-sbi
 
Indian Overseas Bank Sip Report Loans And Advances Management
Indian Overseas Bank Sip Report Loans And Advances ManagementIndian Overseas Bank Sip Report Loans And Advances Management
Indian Overseas Bank Sip Report Loans And Advances Management
 
Merger & Acquisition of HDFC Bank with Centurian Bank of Punjab
Merger & Acquisition of HDFC Bank with Centurian Bank of PunjabMerger & Acquisition of HDFC Bank with Centurian Bank of Punjab
Merger & Acquisition of HDFC Bank with Centurian Bank of Punjab
 
Omo ppt
Omo pptOmo ppt
Omo ppt
 
Retail banking ppt
Retail banking pptRetail banking ppt
Retail banking ppt
 
Punjab national bank scheme for agriculture
Punjab national bank scheme for agriculturePunjab national bank scheme for agriculture
Punjab national bank scheme for agriculture
 
Retail banking
Retail bankingRetail banking
Retail banking
 
NPA research report
NPA research reportNPA research report
NPA research report
 
A project report on SBI bank
A project report on SBI bankA project report on SBI bank
A project report on SBI bank
 
Project report on NPAs
Project report on NPAsProject report on NPAs
Project report on NPAs
 
ICICI Group
ICICI GroupICICI Group
ICICI Group
 
A Study of Agriculture Loan of Axis Bank Ltd (MBA Finance Project)
A Study of Agriculture Loan of  Axis Bank Ltd (MBA Finance Project)A Study of Agriculture Loan of  Axis Bank Ltd (MBA Finance Project)
A Study of Agriculture Loan of Axis Bank Ltd (MBA Finance Project)
 
Homeloans
HomeloansHomeloans
Homeloans
 
project on online banking in india
project on online banking in indiaproject on online banking in india
project on online banking in india
 
Nbfc
NbfcNbfc
Nbfc
 
Credit risk management presentation
Credit risk management presentationCredit risk management presentation
Credit risk management presentation
 
Deposit scheme project report
Deposit scheme project reportDeposit scheme project report
Deposit scheme project report
 
Presentation on state bank of india
Presentation on state bank of indiaPresentation on state bank of india
Presentation on state bank of india
 

Similar to Data Analysis on Home Loan Dataset using Python

COMPARISON OF HOME LOAN SCHEME OF ICICI BANK WITH 3 OTHER PRIVATE BANKS
COMPARISON OF HOME LOAN SCHEME OF ICICI BANK WITH 3 OTHER PRIVATE BANKSCOMPARISON OF HOME LOAN SCHEME OF ICICI BANK WITH 3 OTHER PRIVATE BANKS
COMPARISON OF HOME LOAN SCHEME OF ICICI BANK WITH 3 OTHER PRIVATE BANKS
Khushbu Malara
 
March 2017
March 2017March 2017
March 2017
Sarabdeep Singh
 
Home loan
Home loanHome loan
Home loan
bappy ahmed
 
157975498 project-on-home-loan
157975498 project-on-home-loan157975498 project-on-home-loan
157975498 project-on-home-loan
Gorakhanath Patil
 
A comparative study of interest rates on housing loans
A comparative study of interest rates on housing loansA comparative study of interest rates on housing loans
A comparative study of interest rates on housing loans
Projects Kart
 
Total project insurance sector
Total project  insurance sectorTotal project  insurance sector
Total project insurance sector
Joydip Roy
 
VOLATILE MONTH -OCT -22.pdf
Data Analysis on Home Loan Dataset using Python

  • 1. Home Loan Eligibility: Big Data Analysis Using Python. CHAPTER 1 - Introduction of the topic. This report proceeds step by step, starting with what the researcher means by a home loan. Home Loan: Let us look at this in a basic and simple way. Survival depends on three primary needs: food, clothing and shelter. This report is concerned with shelter. In an age when survival was tough, housing was simply called shelter; today we call that basic need a house or a home. A home loan (or house loan) is a sum of money borrowed from a bank or other financial institution with the intention of purchasing a home. Today the word home can mean a variety of things because of the options now available: it can be a plot of land,
  • 2. a villa, a flat and so on. Loans are now granted not only for purchase but also for house repairs, reconstruction, demolition and renovation of an existing home. Going a level deeper, let us look at the conditions under which this monetary transaction takes place. The lender, which can be a bank or another financial institution, provides the money under a set of mutually agreed conditions. These basic conditions typically cover the rate of interest, the duration of the loan, and an agreement that the property belongs to the lender until the borrower has repaid the full amount including interest. The interest rate on a home loan can be fixed, floating, or partly fixed and partly floating. The Government also provides certain tax benefits on home loans under Section 80EE of the Income Tax Act; however, this deduction can be claimed only by first-time home buyers. Under the Income Tax Act, 1961, borrowers can claim home loan tax benefits under several sections and considerably reduce their annual tax outflow.
  • 3. Tax savings are a critical part of a home loan for buyers, and of interest even to non-buyers, so let us look at them in a little more detail. An individual can claim tax benefits on a home loan under the following sections. Table: Tax benefits provided by the Government of India. Source: www.BajajFinserv.in. The Government of India extends these benefits as a form of relief to borrowers, with the aim of making housing affordable for all Indian citizens.
  • 4. Home Loan Tax Sections in Detail. A home loan is repaid in monthly instalments (EMIs), each of which has two primary components: the principal amount and the interest payable. The Income Tax Act allows borrowers to claim tax benefits on each component separately. 1. Section 80C: Claim a home loan tax deduction of up to Rs. 1.5 lakh from your taxable income on the principal repayment. This may include stamp duty and registration charges, but these can be claimed only once. 2. Section 24: Claim a deduction of up to Rs. 2 lakh on the interest payable. This applies only to a property whose construction is finished within 5 years; if it is not finished within this time frame, you can claim only up to Rs. 30,000. 3. Section 80EE: First-time home buyers can claim an additional Rs. 50,000 on the interest payable every financial year. The home loan amount must not be more than Rs. 35 lakh.
  • 5. The property's value must be within Rs. 50 lakh. [Source: Bajaj Finserv] Conditions that are important when taking a home loan: 1. The tax exemption applies only once construction of the property is complete, or when you purchase a ready-to-move-in house. 2. These tax benefits can be claimed every year, saving significant amounts. 3. If you sell the property within 5 years of taking possession, the benefits claimed are reversed and added to your income. 4. You may purchase the property and let it out on rent; in that case, no maximum amount applies to the home loan tax exemption you can claim. 5. If you continue to rent the house where you presently reside while availing the home loan, you can claim tax benefits against HRA as well. [Source: Bajaj Finserv] Home Loan Market in India
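The deduction limits listed under Sections 80C, 24 and 80EE above can be worked through as simple arithmetic. The following is only a minimal sketch: the function name `home_loan_deductions` and its parameters are hypothetical, the caps are the figures quoted in this report rather than values checked against the Act, and it assumes that the Section 80EE amount applies to interest left over after the Section 24 claim. It is not tax advice.

```python
# Deduction caps in rupees, as quoted in the report (attributed there to Bajaj Finserv).
SECTION_80C_CAP = 150_000            # principal repayment
SECTION_24_CAP = 200_000             # interest, construction finished within 5 years
SECTION_24_CAP_DELAYED = 30_000      # interest, construction not finished within 5 years
SECTION_80EE_CAP = 50_000            # additional interest, first-time buyers
SECTION_80EE_MAX_LOAN = 3_500_000    # Rs. 35 lakh
SECTION_80EE_MAX_PROPERTY_VALUE = 5_000_000  # Rs. 50 lakh


def home_loan_deductions(principal_paid, interest_paid, loan_amount,
                         property_value, first_time_buyer,
                         built_within_5_years=True):
    """Return the deduction under each section for one financial year (hypothetical helper)."""
    sec_80c = min(principal_paid, SECTION_80C_CAP)

    cap_24 = SECTION_24_CAP if built_within_5_years else SECTION_24_CAP_DELAYED
    sec_24 = min(interest_paid, cap_24)

    eligible_80ee = (
        first_time_buyer
        and loan_amount <= SECTION_80EE_MAX_LOAN
        and property_value <= SECTION_80EE_MAX_PROPERTY_VALUE
    )
    # Assumption: 80EE covers only interest not already claimed under Section 24.
    sec_80ee = min(interest_paid - sec_24, SECTION_80EE_CAP) if eligible_80ee else 0

    return {
        "80C": sec_80c,
        "24": sec_24,
        "80EE": sec_80ee,
        "total": sec_80c + sec_24 + sec_80ee,
    }


# Illustrative borrower: Rs. 1.8 lakh principal and Rs. 2.6 lakh interest paid in the year.
example = home_loan_deductions(180_000, 260_000, 3_000_000, 4_500_000, True)
print(example)
```

With these illustrative inputs every cap binds, so the total comes to Rs. 4 lakh: Rs. 1.5 lakh under 80C, Rs. 2 lakh under Section 24 and Rs. 50,000 under 80EE.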
  • 6. The total home loan market in India is valued at around Rs. 3 lakh crore. A figure of this size gives a sense of how large and important this financing sector is for the Indian market. In the researcher's opinion, a market valuation of this order shows the importance of the sector both to the daily life of Indian citizens and to the Indian economy as a whole. Housing is one of the basic needs of every individual, and we now have the data to back that claim. Among the Government's latest moves relating to the home loan sector, on August 23, 2019 the Finance Minister, Nirmala Sitharaman, announced a list of benefits as part of a stimulus package to be released by the R.B.I. Apart from this, the R.B.I. further reduced the repo rate this year by 35 basis points, which brings the rate at which banks borrow from the R.B.I. to 5.4%. All in all, banks are required to pass these benefits on to borrowers at the earliest. S.B.I. and H.D.F.C. were the fastest to respond and have reduced their interest rates accordingly, which is a very good sign. As a further measure to support and strengthen the sector, the Government has also proposed establishing an organization to improve credit lending infrastructure in India.
  • 7. Mortgage penetration in India as a percentage of GDP. Fig: Home loan penetration per country as a percentage of the country's GDP. With such a large opportunity, it is quite predictable that many companies would like a piece of this cake. At present the home loan market in India has more than 80 players. However, two large companies, HDFC and LIC, individually have a market share of over Rs. 1.5 lakh crore, which gives these two companies alone up to 57 per cent of the market. [Source: rating agency ICRA] The graph above also shows another interesting fact: home loans are currently availed by only a very small percentage of the population. Even at its present valuation the sector is still in its infancy, and there is a very large opportunity for companies to grow in it, since loan penetration has reached just 9% of the whole population of India.
  • 8. Market Share of Home Loan Providers in India. Fig 2: A pie chart of all the players in the Indian home loan market. Other players also hold a place in this market; SBI and ICICI Bank are among those with notably good market shares. The pie chart represents the market share held by these companies. Growth comes at a cost, and that applies to this financing sector too: it is estimated that these companies will require Rs. 9,000-16,000 crore of external capital, in other words external funding, to sustain the industry-average growth rate of 20-22 per cent in the coming years. Data Analysis: In the present day and age we have reached a remarkable point compared with our past. This is said in the
  • 9. context of the technological capability that is now available to all of humanity. Technology deserves due credit because it is the major enabler here: it is because of technology that we are able to collect data. The word data has a very wide scope and can mean many different things depending on the situation. Feedback collected from customers at a shopping mall billing counter, by restaurants, by e-commerce companies about their products, and by other service providers all differs in nature, yet all of it falls within this domain. The point of these examples is to bring out what we mean by data and the diversity involved. To put it simply, in the context of this report and of data analysis in general, data is a collection of facts, figures, details, features, information and evidence relating to a particular task or operation.
  • 10. Once data is available, the next step is to analyze it in order to extract insights. An example that most people will already have experienced makes the significance of analysis clear. When we have to travel from one place to another by an unfamiliar route, we tend to use Google Maps to guide us to the destination. On opening the app, the map generally shows at least 3 alternative routes. A lot of analysis is taking place at that very moment. Portions of a route may be marked in red: the system estimates the traffic condition from the movement of Android users present at that location, and the server analyzes this data and shows the red mark accordingly. Here, data was taken from users, analyzed, and made available to other users so that they can plan and take the best route.
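The idea of turning reported user movement into a red mark on the map can be illustrated with a small sketch. This is not how Google Maps works internally, which this report does not describe; the function `classify_segment`, the colour labels other than red, and the ratio thresholds are all assumptions chosen for illustration.

```python
from statistics import mean


def classify_segment(speeds_kmph, free_flow_kmph, slow_ratio=0.5, busy_ratio=0.8):
    """Label a road segment from the speeds reported by users currently on it.

    speeds_kmph: speeds reported on the segment (hypothetical input format).
    free_flow_kmph: typical speed on the segment without traffic.
    The two ratio thresholds are illustrative assumptions.
    """
    if not speeds_kmph:
        return "unknown"  # no users reporting, so nothing to analyze
    ratio = mean(speeds_kmph) / free_flow_kmph
    if ratio < slow_ratio:
        return "red"      # heavy traffic
    if ratio < busy_ratio:
        return "orange"   # moderate traffic
    return "green"        # free-flowing


print(classify_segment([10, 12, 8], free_flow_kmph=50))
print(classify_segment([45, 50, 48], free_flow_kmph=50))
```

The same pattern of collecting observations, summarizing them and mapping the summary to a decision underlies the other examples in this chapter.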
  • 11. In the Google Maps app this data analysis is done in real time. It is an example of data analysis being used to make transportation much easier. In much the same way, data analysis is used today in many places to make decisions that make people's lives easier and better. Usage and importance of Big Data Analysis in Business: This brings us to the core part of the report. Here the researcher sheds light on how companies use Big Data Analytics to cut down their losses, improve productivity, increase sales, offer customized deals to loyal customers, and much more. Let us look at these benefits one by one to understand them better. The researcher begins with cutting down losses, using an example business scenario. Sprinklers Pvt. Ltd. was using multiple marketing methods to promote its products. Its marketing strategy involved distribution of pamphlets, field marketing, door-to-door marketing, stand-up display counters in malls, and digital marketing.
  • 12. The management decided to scale back marketing and focus on the channels that were proving effective. To find out which channels led to more product purchases, the marketing team collected the final sales reports from each marketing channel. The review showed that digital marketing and door-to-door marketing were the top 2 channels for final sale conversions. To trim losses, the management decided to focus only on these top 2 channels and discontinue the others for a specific period of time. In this way the company met its projected targets while saving on manpower and reducing cash burn. The second case shows how Big Data Analytics is used in industry to improve overall productivity. UPS (United Parcel Service), a logistics company based in the United States with global operations, began by collecting data on the trucks it used for deliveries. It listed a set of parameters such as the routes taken by the trucks, performance, braking, weather conditions and average truck speed.
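The channel comparison in the Sprinklers Pvt. Ltd. example above amounts to ranking channels by final sale conversions and keeping the top two. A minimal sketch follows, assuming the sales reports have already been reduced to one conversion count per channel; the function name `top_channels` and all of the figures are hypothetical, since the report gives no numbers.

```python
def top_channels(conversions_by_channel, n=2):
    """Return the n channels with the most final sale conversions, best first."""
    ranked = sorted(
        conversions_by_channel.items(),
        key=lambda item: item[1],  # sort by conversion count
        reverse=True,
    )
    return [channel for channel, _ in ranked[:n]]


# Hypothetical conversion counts, for illustration only.
conversions = {
    "pamphlets": 40,
    "field marketing": 65,
    "door-to-door marketing": 120,
    "mall display counters": 55,
    "digital marketing": 180,
}

keep = top_channels(conversions)
pause = [channel for channel in conversions if channel not in keep]
print("Focus on:", keep)
print("Discontinue for now:", pause)
```

In practice a team would probably also weigh the cost of each channel, for example conversions per rupee spent, before discontinuing any of them.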
  • 13. The data was collected and, after the analysis, changes were made to the routes taken by the trucks. By implementing the changes the company was able to save 85 million miles yearly, which meant it saved 8 million gallons of fuel on the daily routes of the trucks. It was found from the analysis that by saving just 20 miles/driver, the company was able to make a saving of $30 Million. These changes led to huge savings for the company. United Parcel Service has since realized the significance of Big Data Analytics and has made an effort to optimize its aircraft deliveries as well using Big Data Analytics. Let’s now have a look at the last point we mentioned when it comes to the benefits of using Big Data Analytics, which is providing customized offers to loyal customers. Before the researcher sheds some light on a company which offers product or service customization, there is an interesting insight the researcher came across. Bain & Company, a global management consulting firm, conducted a survey of more than 1000 online shoppers and found that 25-30% of these customers were looking for customized products and services online.
  • 14. The same research also revealed that people looking for customized services were willing to spend more money and were also more engaged with the retailer. The Bain survey clearly reveals that the buying behaviour of customers is progressively changing. It is therefore important that companies integrate this new customer expectation into their business model. There is a company which has been at the forefront when it comes to the customization experience, and that is Amazon. The researcher has picked this company because of its sheer size and presence. It is immensely large and it is very probable that a lot of people have had the experience of shopping on this online platform. It can be noted that when a customer opens the Amazon website, they are shown a list of items as "recommended for you". What is happening here is that the data collected from previous search history, recent browsing patterns and the cache stored in the browser on the device is being analyzed, and customized products based on the customer’s choices are being shown to the customer.
  • 15. If the customer wants to re-purchase a certain item, they can now do so within just a few clicks. If they were looking for a certain specific item, they will instantly be able to see that item when they log in to the platform. All this and more such personalized features are offered to the customer based on the analysis of a lot of different data sets. So, by having a look at all these different scenarios, we can now begin to understand the importance of Big Data Analytics in the business environment.
  • 16. Chapter 2 : Introduction to PCS Global Fig 1 :- PCS Global Pvt. Limited, company logo. INTRODUCTION PCS Global is a Tech based company which started out from Calcutta. The headquarters of the company is located there itself. The area of operations revolves around Information Technology services like Software testing and development, Web development, App development, SEO and Web Hosting, Enterprise Resource Planning and a few more similar services. Information Technology services have in the present day and age become more and more important for all types of business. Every company has certain sectors where it specializes when it comes to the services offered. In the same context, the primary clientele base for PCS Global is Banking and Financial services, Telecommunication, Media and Entertainment, Travel and Tourism and many others.
  • 17. MAJOR OPERATIONS The goal of the company has been to increase the operational efficiency of the client, to increase their productivity, and to modernize the technology being used at the enterprise level while customizing it to their needs and requirements. To provide these I.T. services and solutions the company brings to the table offerings and capabilities ranging from Systems Integration, Infra Services, Software development and maintenance to High-end server technology. COMPANY MISSION To direct all our organizational efforts at building upon the existing organizational strengths and brand recognition to achieve enhanced levels of profitable growth in the core business and diversify into new areas that complement and supplement the core business, with the diversification aimed at achieving excellence and industry leader status in the new areas. The PCS Global people will however be encouraged to be open to unconventional ideas and services and recognize new trends at very early stages.
  • 18. COMPANY VISION PCS Global will be recognized and respected as professional, innovative, profitable information, and knowledge based IT enterprise. PCS Global embeds internet based technologies into its internal operating structures and as business solutions for customers; with customer, employee and shareholder interests at the core of its operations; demonstrating a clear concern for ethical conduct and good corporate citizenship; with the objective of growing into a regional and global player. AWARDS & RECOGNITION 1. Promoted to Pvt. Limited in 2010. 2. Received BOPT accreditation in 2017. 3. Got accreditation from HRD ministry in 2017. 4. Recognized as most effective training partner and awarded by various Colleges for their best training services. 5. Got opportunities to open innovation labs in several Government Engineering colleges with the help of I.T. ministry. ORGANIZATIONAL STRUCTURE
  • 19. PCS Global is a registered private organization under the Ministry of Corporate Affairs (MCA). MCA is a government body which supervises all the corporate affairs in India through the Companies Acts of 1956 and 2013 and other allied Acts, Bills and Rules. MCA, with the help of the Bills and Rules, also protects investors and offers many important services and rights to the stakeholders of the company. PCS Global follows the organizational structure as prescribed by the MCA for private companies. DEPARTMENTS IN THE ORGANIZATION The Departments in PCS Global are as follows :-  Software Development  Digital Marketing  Finance and Accounting  Marketing  Training  Human Resource  Operations Management
  • 20. COMPETITION & CLIENTS PCS Global is an I.T. company, and I.T. is a field in which there is tough competition due to the high number of companies which operate in the I.T. products and services space. PCS Global operates both nationally and internationally. Therefore, given below are the names of companies that give PCS Global competition nationally and then internationally. National :- 1) Geometric Software Solution 2) Sasken 3) Infotech Enterprises 4) Mastek 5) Polaris 6) Sapient 7) KPIT Cummins 8) Rolta India 9) L&T Infotech
  • 21. 10) NIIT International :- 1) Infosys 2) TCS 3) Wipro 4) HCL 5) Mphasis (HP subsidiary) 6) Oracle Financial (earlier known as iFlex, subsidiary of Oracle) 7) Financial Technologies 8) Patni Computers 9) Tech Mahindra (now owns Satyam, which used to be a Tier 1 company) 10) Mindtree PRODUCTS & SERVICES OFFERED
  • 22. PCS Global has a wide variety of Product and Service offerings for its customer base. Since the researcher is going to shed some light on management systems, let’s have a better understanding of this topic before we go ahead. Fig :- Diagram to explain how integration of I.T. system and Business Management gives us Business Management System. The info-graphic was made by the researcher for better visual representation. Since each category here has a vast set of options one can choose from, let’s look at each Product category 1st and then move on to the next one, which is the Services. So, now we have an info-graphic which tells us how these modern business management systems are made.
  • 23. Taking a look at this from a technical and accurate standpoint, any use of an information technology system for administration and management purposes would come under the domain of Information Technology as a Service (ITaaS). It helps in managing the day-to-day operations of the business. Let’s have a look at the IT services being offered by PCS Global.  Product Offering by PCS Global Fig :- The 1st four items that are offered in the Product category by PCS Global. Let’s now take a look at each and every product offering in brief. The very 1st one we will be looking at is the hospital management system (HMS). In this fast-paced world, managing the operations of a hospital can surely be a very difficult task. Even today we can see many hospitals still following the traditional route, wherein tasks like checking if the doctor is available, registration, billing, waiting in queue before the consultation, all this and more are being managed manually.
  • 24. This traditional management system has now been replaced by a hospital management system (HMS), which is a computer based system that facilitates managing all the functioning of the hospital. It comes with a lot of benefits: the customers are now easier to manage, the availability of the doctor can easily be known, registration and billing have become faster with the help of a computer etc. The 2nd product category is School Management System. Schools and Colleges today, as all of us would have experienced, have grown really big. In general we would be able to see that the number of students in any school or college on an average would be around 1000 - 2000. This is just the number of students; then comes the number of teachers, the administrative staff, the accounts section, the facilities staff and others. Managing all these verticals can become difficult. But with the help of School management software, managing these elements can be made easier. It is specifically designed such that all the operations and the administration activity of the school or the educational institution run efficiently and smoothly. The 3rd product category is Banking Management System. It is considered to be one of the most complex systems of all because of the vast variety of things that are covered under this one roof.
  • 25. The aspects covered here range from managing and protecting the customer information, to tracking the transactions that are happening every moment, recording the details of all such transactions and generating tabulated reports for recording and reference purposes. All this and much more come within the daily operation of a bank. Managing these events can be complex, therefore a Banking Management System is used to reduce the dependency on manual labour. The tasks which are automated will also be less prone to error, as they will only work as they are programmed, whereas doing the work manually carries the possibility of human error. The 4th product category is Office Management System. In simple terms, it can be defined as a computer based system which assists in office administration. Office administration is a very vast area; it covers multiple levels of administration like clerical, secretarial, senior/top management, chairman etc. Offices have different departments based on the company’s objectives. Coming to one of the next aspects of Office administration, we deal with departments and their associated functions.
  • 26. Here, with the help of ITaaS, we strive to achieve a structured method of control over the daily operations, framed around the objective of the company. Fig :- The 2nd list of items that are offered in the Product category section by PCS Global. The researcher will now look at the 2nd list of product offerings by PCS Global. The first one here is Asset Management System. When the researcher says Asset Management System, he is referring to an I.T. application which is used to record and track an asset throughout its life cycle, which is right from the purchase of the asset to its sale. In the 2nd list, the next product category is HR Management System. Every single institution, organization or company that is present today requires a Human Resource (H.R.) department. The HR department is entrusted with a wide range of responsibilities which revolve around a core objective: taking care of all types of needs of the employee, which includes emotional, professional and physical well-being.
  • 27. HR functionalities have over a period of time grown to include more aspects like induction of new entrants, grievance redressal, employee payroll management, talent acquisition and management, workforce analytics, performance management, benefits administration and many more. All these corporate HR operations are now managed with the assistance of an HR Management System. In the list, the next product category is Transport Management System (TMS). In the corporate scenario, TMS is viewed as a subset of Supply Chain Management (SCM), which in turn at times may be a subset of the company’s Enterprise Resource Planning (ERP) system. Venn-diagram representing Mgmt. Systems in a company
  • 28. Fig :- A visual representation of Transport Management System which is a subset of Supply Chain Mgmt. System, which in turn is a subset of the company’s ERP system. Here, the Other Dept. System represents all the other mgmt. systems being used by the company like Finance, Marketing etc. Source - designed by researcher. The visual representation shows that the Transport Management System (TMS) is within the Supply Chain Management (SCM) System. By this the researcher is trying to indicate the vast nature of the SCM system. When it comes to SCM, there are many verticals, for example inventory management, supply and demand forecasting, inventory maintenance, fulfillment of orders being made and supplier relations. Since in the present day and age we have given our end-users the facility to return the goods if they don’t like them, supply chain therefore also includes Returns Management.
  • 29. In the list, the next product category is Open Source Portal Applications. To understand this, let’s divide this term into two halves. The 1st one is Open Source and the 2nd one is Portal Applications. The researcher would like to explain it part-by-part. The 1st part is Open Source, which means that the source for the execution of a particular work is available or open to all people. Here, to gain a better understanding, Source means an application or a tool. Let’s take an example to understand this better. OpenShot is an open source video editing software. This application was made by the developers and then offered to the whole public for video editing for free. So, this becomes a source for editing and it is Open, meaning available for everyone to use and work with. Let’s now go to the next bit, which is Portal software. Portal software essentially means a gateway to a service which is provided via an intranet or internet facility. To understand this better, picture a space which is enclosed from all sides. We can imagine this to be a huge sphere. Now, this sphere has an entry point. We can enter from this entry point and then access the services which we need, and when the work is done we can come out of the portal.
  • 30. It is important to note that the service can only be availed within the limits of the portal. Once we come outside the portal we cannot access the service. In the corporate environment we can come across many such services. A company can have an Open Source portal for all of its company employees, which means all the employees of the company can access the portal to do the specific task. The portal can be for email, messaging, calling, Customer relationship management(CRM), work-flow maintenance and management etc. Fig :- The 3rd list of items that are offered in the Product category section by PCS Global. In the 3rd and final list, the 1st product category is Publication Management System. Publication essentially means, a business which involves distribution of content. The nature of content we are trying to publish can be of various types like advertisement, information, news, details about the sale of a product, service or even a real estate property. Apart from the
  • 31. types mentioned here, it can be used for any other purpose deemed suitable by the publisher as well. This was about the content which is to be published. Now, when we are working on any one of the types of content mentioned above, there are other functionalities that come into the picture. Let’s take one type to understand this aspect better. Let’s assume that we are publishing news content. This requires communication from multiple sources to one single point. Then the content that is being transferred has to be securely transferred, such that it does not get leaked or hacked by any other party. The next step would be secure storage of this content. Then comes the challenge of availability of this content to the various stakeholders within the organization. Now again, this sharing is preferably done over a secure internal communication tool. The next step would involve audio, video or text editing of the content. Finally, after all these layers of refinement, the content is published. To manage all these tasks a Publication Management software package is developed, which helps the organization to perform their tasks with efficiency, security and in an organized manner. In the 3rd list, the 2nd product category is Store Management System. Today, Store Management System has become a critical component of every retail
  • 32. business. It is very common to have seen this management system at work. All of us would have gone to a physical store to purchase some product, grocery, shoes, medicine, books or any other item and we would have seen that they enter all the details in the computer to generate the bill. The Store Management System that is being used makes a record of all the items that have been sold. This software can be accessed by the management to understand how many items were purchased and how many were sold, which category of item is selling more, which category of item is selling less, which have not yet sold and so on. It provides data on all these parameters to the management. After viewing this data, the management can decide for which categories they can provide more offers and discounts to improve their business. All this is made possible by using a Store Management System. The next product category is Financial Management System (FMS). Every company has to manage its financial activities, which includes keeping a record of the salary that is to be paid to the employees, paying the taxes based on the revenue, savings, contingency fund, pre-paid bills to clients, purchase of products and services, all this and more. To manage such a wide range of Financial operations, we make use of a Financial Management System which helps in
  • 33. effective utilization and management of the monetary asset that the company has at its disposal. Venn-diagram representing Mgmt. Systems in a company Fig :- A visual representation of company’s ERP System which holds all the different subsets. Here, we focus on two subsets, SCM system and Material Management both of which come under Supply Chain Mgmt. System. The Other Dept. System represents all the other mgmt. systems being used by the company. Source - designed by researcher. In the 3rd list, the final product category is Material Management System. When we take a look at this domain’s basic operation, it is seen that it takes care of the proper supply of materials so that the manufacturing of the product is taking place in an efficient manner.
  • 34. “Material management is the planning, directing, controlling and co-ordination of all those activities concerned with material and inventory requirements, from the point of their inception to their introduction into manufacturing process” - L. J. De Rose. De Rose summarized the activities that come under the domain of Material Management very well in the above words. He talks about all the activities, starting right from procurement of materials and ending with final manufacturing of the product; all activities that come within the frame of these two points are a part of material management.  Service Offering by PCS Global The researcher will now shed light on the services which are offered by PCS Global Pvt. Ltd. The first list of items that are a part of services is shown in the Figure below.
  • 35. Fig :- The 1st list of services that are offered by PCS Global. The 1st in the list of services offered by PCS Global is Software development. In the present day and age, as was discussed in detail in the previous Product section, there has been a lot of integration of software in the processes or operations of companies. Fig :- The info-graphic represents the cycle which is followed in the software development process. Source - Wikipedia. Edited - by the researcher. To increase efficiency and profit margins, companies today are making efforts to make sure that they have the most advanced software packages in use for company operations. In the Figure above, we see the step-by-step process that is followed for development. PCS Global provides this service and the speciality is that it does so by involving the client at every step of the development process.
  • 36. PCS Global, believes in delivering value and it understands that the needs of companies differ, therefore making the client a part of the complete development process is important. This helps in delivering a Final product which adds the Best and the Maximum value to the client company. The next in the list of services offered by PCS Global, is Software Testing. Here, we see a similar approach being applied to test the software. When software is being built or it has already been built the next phase in the process is the testing part. Fig :- The info-graphic represents the cycle which is followed in the Software Testing process. Source - designed by the researcher. In the Figure above we see the Software Testing Life Cycle (STLC), which is followed by PCS Global, to deliver software products which are fail-proof.
  • 37. It shows the step-by-step process used in Testing. Here, we start by understanding the objective, do the planning, start with the development, and put the developed bit of code in an environment similar to the actual environment, which comprises hardware, software and network components. After having tested the code in a simulated environment, we then move to Test Execution, where we run the code in the actual environment. Once the results are obtained, the Test Cycle comes to a closure. Data Science is a huge umbrella which includes a vast number of services like Machine Learning, Big Data analytics, Database management, Business Intelligence, Natural Language processing, Data extraction, transformation and loading, Visualization of Big Data and Predictive analytics. PCS Global provides all these services, which help companies run their businesses in a better way. Web Development is becoming more and more important with the Digitization wave, as it is important that all Businesses today have a good online presence. PCS Global offers services in this area and has a good number of experts within the company who handle all the areas of Web development, including the critical ones like HTML, PHP and Graphic Designing.
  • 38. Fig :- The 2nd list of services that are offered by PCS Global. The Figure above shows the 2nd list of services offered by PCS Global, the first of which is Application Design and Development. Here, the App Development team of PCS Global understands the business model of the client and then goes ahead with designing and building the application solution which meets the requirements of the business. The next in the list of services offered by PCS Global is SEO and Web Hosting. Here, the company provides facilities to a company, startup or even an individual who would like to have a website and increase their online presence. This includes a vast number of services like website designing and building, determining the type of hosting that would be the best fit for the client, and understanding and estimating the technical resources that would be required by the website like storage, RAM, bandwidth, data transfer rate,
  • 39. uptime of the website and more, all of which contribute towards making the best website. Enterprise Resource Planning (ERP) refers to a software solution that is used to manage all the operations that are taking place within an organization. Most often ERP packages are custom built as per the requirements of the company. The package can be used for 1 department or for more than 1 department. In case it is being used for more than 1 department, there is an option provided in ERP packages which allows all of those departments to be controlled and monitored within 1 software framework. This is a very important feature of ERP packages. Education and Corporate Training :- under this, PCS Global provides Technical Training to students and interested individuals who would like to learn and gain experience working in technical domains like Java, R, Python, SQL etc.
  • 40. Chapter 3 : Internship Methodology 1. Internship problem The Bank was facing a few challenges when it came to running its daily operations, which include the analysis of the Big Home Loan Dataset comprising the details of the Home Loan applicants. The manual analysis was time consuming, carried a probability of human error and of partiality towards specific applicants, involved a high cost spent on the employees engaged in the manual analysis of the records, and led to inconsistency in the final reports over the same Dataset or Applicant Records. As the Data size evolved over a period of time to become Big Data, it was becoming impossible to manage and meet the expected deadlines set by the Bank.
  • 41. The problem at hand is therefore to understand the impact of using Python, an object-oriented programming language, to address all these complications. 2. Significance of the research The primary aim was to understand to what degree modern technology is capable of impacting a business when it comes to running it in a better and more efficient manner. The researcher was given an opportunity by PCS Global to work on a real-time Banking Dataset. The Dataset is a list of details which was filled in with the help of applicants who were interested in taking Home loans from the bank. For security and confidentiality reasons, the name of the bank has not been disclosed. Home Loan is a very nascent sector and has a huge business potential. The researcher has shed light on this aspect and given more details related to its importance in the introduction section of the research. Now, with the help of software applications that are available today, we will understand how the operations of the bank have benefited by integrating them in its daily operations.
  • 42. For this purpose of modernization and automation of the banking process, the researcher has chosen Python, which is an object-oriented programming language. It is very versatile and easy to understand, which is the reason for its use in this task. Let’s understand how Python helped towards making the operations of the bank faster and more efficient. We have chosen to work on the 1st layer of operations when it comes to the issuing of the Home Loan. Here, the applicant is an individual who is in need of a Home Loan from the bank to purchase a home. The researcher is defining the terms so that there is complete clarity when it comes to the context and the words being used in the report. The 1st step is that the applicant provides the required details to the bank. This is carried out in a number of ways: it can be entered by the bank staff, filled in on a sheet of paper offered to the applicant etc. The main purpose here is to gather the details of the applicant, which are used to understand if the applicant is eligible for the Home Loan as per the bank’s terms and conditions. To give an idea, the applicant is asked details such as applicant income, co-applicant income, dependents of the applicant, credit history, marital status and so on. Analysis of these factors gives the bank an idea whether the loan can be issued to the applicant or not.
  • 43. Now, once again coming to the initial aspect, how can technology help us in running businesses in a better way? The researcher described that the details were collected from the applicants. The next step was analysis of the factors. Up till now, this Home Loan Data was being manually analyzed by designated banking staff. With the help of Python, all this Data can be analyzed and the results can be obtained within a few seconds. As the number of applicants was growing exponentially, the Bank was facing a really hard time managing and meeting the deadlines. With the help of Python, the entire primary applicant analysis task of the Bank was automated, and not only that, the number of applications that were coming in did not matter any more, as Python was able to quickly analyze and give results whether it was for 1000 applicants or for 100,00,000 applicants. With the help of Python, the researcher created a model which acts as a filter. We 1st take the previous records of the Bank, which comprise all the details plus one additional column, which is the Loan Status of the applicant. We use this as the Train Home Loan Dataset, meaning we run Python on this to understand which combinations were eligible for the Home Loan.
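The filtering idea described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the researcher's actual model: the records, the single `credit_history` feature, the applicant IDs and the 0.5 cut-off are all hypothetical, and a one-feature logistic regression fitted by gradient descent stands in for whatever regression the report uses. It learns an equation from past records that carry a Loan Status and then applies that equation to applicants whose status is not yet known.

```python
import math

# Hypothetical past records: (credit_history, loan_status).
# credit_history: 1 = good history on file, 0 = none; loan_status: 1 = approved.
train_records = [
    (1, 1), (1, 1), (1, 1), (1, 1), (1, 0),
    (0, 1), (0, 0), (0, 0), (0, 0),
]


def sigmoid(z):
    """Map a real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(records, learning_rate=0.5, epochs=5000):
    """Fit logit(p) = bias + weight * credit_history by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(records)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in records:
            error = sigmoid(bias + weight * x) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def approval_probability(model, credit_history):
    """Apply the learned equation to one applicant."""
    weight, bias = model
    return sigmoid(bias + weight * credit_history)


model = fit(train_records)

# New applicants whose Loan Status is not yet known (hypothetical IDs).
new_applicants = {"A-101": 1, "A-102": 0}
predicted_status = {
    applicant: "Y" if approval_probability(model, history) >= 0.5 else "N"
    for applicant, history in new_applicants.items()
}
print(predicted_status)
```

With a single yes/no feature the fitted probabilities simply approach the approval rates seen in the past records for each group, which makes the sketch easy to check by hand. A real dataset with income, dependents and other columns would need more features and proper validation.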
  • 44. After having figured out the model, which is available in the form of an equation, we run this model on the Test Home Loan Dataset, which has all the details of the applicants but without the Loan Status. Here, we create the column, but at the start, since we do not know the status, the entire column of Loan Status is empty. Once we run the model on the new Test Home Loan Dataset, we get the results in the Final Loan Status Column. Python was able to learn the way the Bank was issuing the Loans by analyzing the previous Dataset, and this operation no longer needed to be done manually. This automation provides enormous benefits to the Bank. 2. Objectives 1. To understand the various benefits gained by the integration of Python with the daily Banking operations. 2. To find the pattern of applicants applying for the Home Loan. 3. To identify and build a model using Regression Testing in Python, which allows for a move from manual analysis to automated analysis. 3. Hypotheses
  • 45. 1) The higher the applicant income, the higher the probability of that individual getting the Home Loan from the Bank. 2) Urban applicants will have the highest number of loans sanctioned, followed by Semi-Urban applicants and lastly Rural applicants. 3) Applicants who are married have a better chance of being eligible for the Home Loan. 5. Scope of Study The study was done at PCS Global Pvt. Ltd., which is a Tech company based out of Calcutta, with a branch in Bangalore. The company provides Data Science based services to various companies. In this study we have taken a Live Home Loan Dataset of a Bank. The study was done on a fixed Dataset, although we understand from the basic dynamics of the Banking process that the number of applicants keeps increasing with time. The actual Dataset is in fact variable in nature due to this aspect of banking. The other limitation is a fixed time frame. Banking operations run 24/7, 365 days a year. This means that the Data is continuously in a state of change. Applications are accepted, some are rejected, some are deemed repetition
  • 46. and are removed and similarly a lot of change happens to the Data around the clock. Therefore, for the purpose of our study we have chosen a Dataset of a fixed size and a fixed time frame. While it is possible to do the analysis on the Live Dataset, Live Data Analysis lies beyond the scope of this study and is very much a part of the daily operations of the bank. 4. Methodology As mentioned previously, PCS Global is a software solutions company, and Data Science services are one amongst the many services it offers to its clients. The researcher was a part of the Data Science team at the company which provides this facility. The Home Loan Data was provided by the Bank to the company. This Data is collected by the bank in a number of ways: the applicant is seated in a cabin, where the bank staff ask the related questions and fill in the data; a form is given to the applicant, who is requested to fill in the details and submit it at the counter; and there is a provision to apply online as well, which comprises the same questions as the physical form.
  • 47. Due to the privacy and security concerns of the bank, the questionnaire cannot be reproduced here. The Dataset, which is a refined final representation of the same, has been made available after securing all the necessary permissions from the bank authorities. The Data is therefore available to the company as Secondary Data. The Data is viewed and analyzed using Python. The Graphical Visualizations are also prepared using Python, with the help of libraries which are a part of the Python framework. The Home Loan Dataset which has been provided has 614 rows and 13 columns. All further study using this Dataset has been done using Python. 5. Sample Design The sample that has been taken for the study is a Probability Sample, meaning that every possible value of each variable has an equal opportunity to appear proportionately in the sample that the researcher is examining. One other reason for taking a Probability Sample is that one of the researcher’s objectives is the automation of the primary analysis, which helps
  • 48. in deciding the Loan Status of the applicants by analyzing the Data that is available. The efficiency and the accuracy of the generated model, which will be used in the automation process, depend a lot on the diversity of the input data that has been provided. For this reason as well, it is important that the sample we take for our study be diverse. Therefore, it is essential that we obtain the Data by Probability Sampling, which ensures to a large degree that there is diversity in the Dataset. Sampling Method :- Probability Sampling, under that Simple Random Sampling. Tools and Techniques used in the study A. Tools used in the study :- PYTHON :- It is an object oriented programming language. The researcher opted to use this platform for the Data Analysis of the Home Loan Dataset that has been made available. One reason for using Python is the vast number of libraries that have been built which can be used along with Python for Data Analysis.
  • 49. All these are also open-source resources, i.e. made available for everyone to use. These factors help in making Python a right choice for Data Analysis. JUPYTER :- As mentioned above, the researcher opted to use Python. Jupyter is a free and open-source environment for running Python. There are a number of benefits of running Python in this coding environment: the aesthetics are simple and intuitive, there are in-built libraries, narrative text and imagery are handled better, visualization is better and so on. Packages required within Python · NumPy :- It is a library in the Python programming language, which is primarily used when there is a need to work with multi-dimensional arrays and matrices. It also provides complex mathematical functions that are used during Data Analysis. · Pandas :- It is a library in the Python programming language, which is primarily used for Data manipulation and Data analysis. This package was built upon the NumPy package. Since in our study we have a Dataset which comprises Rows and Columns, this package will be very useful, as it unlocks a lot of functionality for this type of Dataset specifically.
  • 50. · Matplotlib :- It is a plotting library meant for the Python programming language. In our study we will be analyzing the Data with the help of graphs, and this package provides us with all the provisions required for plotting the Data. · Sklearn :- It is an open-source machine learning library meant for the Python programming language. This library is used as it provides many Statistical capabilities like regression, classification, and clustering, all of which we will use in our study for the purpose of Data Analysis. Data Analysis Introduction :- Here, the researcher will explain the manner in which he will be performing the Data Analysis on the Home Loan Dataset. Firstly, we will open Jupyter Notebook, which as mentioned before is an environment wherein we will be running Python. Once Python is up and ready, the researcher will import the Dataset, which is in CSV (comma separated values) format, into the Python environment. By this we will be able to access and view the Dataset.
  • 51. After importing the Dataset into the environment, the first action that is performed is viewing the Dataset. Viewing the Dataset in a tabulated way provides a basic understanding of the Data that we will be analyzing. Dataframe 1 - The first 4 columns of the Home Loan Dataset Table 4.1 Fig :- The table shows the first 5 rows and 4 columns of the Home Loan Dataset. The researcher shall now begin with the analysis of the Data. In order to have a deeper understanding of the data, Univariate analysis is undertaken.
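The import-and-view step described above can be sketched as follows. This is only a minimal illustration: the column names and the six rows are hypothetical stand-ins (the bank's file is not reproduced in this report), and the CSV text is read from memory so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bank's CSV file (names and values are assumptions).
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents
LP0001,Male,No,0
LP0002,Male,Yes,1
LP0003,Female,No,0
LP0004,Male,Yes,2
LP0005,Female,Yes,0
LP0006,Male,Yes,3+
"""

# With a real file on disk, a file path would be passed to pd.read_csv instead.
train = pd.read_csv(io.StringIO(SAMPLE_CSV))

first_rows = train.head()      # head() returns the first 5 rows by default
n_rows, n_cols = train.shape   # (number of rows, number of columns)
```

In a Jupyter Notebook, leaving `train.head()` as the last line of a cell displays the rows as a table.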
  • 52. Graph 4.1 Fig :- Gender distribution in the Dataset. As we can see, the researcher has selected the 1st column of the Dataframe, Gender, and plotted it. From the bar-graph it is very evident that the number of male applicants received was very high. A business use case of this Data arises when the bank is preparing its marketing strategy: it gives an insight into the audience the marketing team can focus on. With the help of Python, it is possible to extract the number of applicants in percentage format. The researcher understands that representing this Data in percentage form would be further helpful.
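One way to obtain such percentages in pandas is `value_counts(normalize=True)`. The sketch below uses a toy Gender column with made-up values, so the numbers it produces are not the bank's figures.

```python
import pandas as pd

# Toy Gender column (hypothetical values, not the bank's data).
gender = pd.Series(["Male", "Male", "Male", "Male", "Female"], name="Gender")

counts = gender.value_counts()                      # absolute count per category
shares = gender.value_counts(normalize=True) * 100  # the same counts as percentages
```

Calling `counts.plot.bar()` or `shares.plot.bar()` would then draw the corresponding bar graph with Matplotlib.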
  • 53. Fig :- Shows the Gender of applicants in percentage format. Here, we can see that Males are 81% and Females are 18% of the overall applicants for the bank. As mentioned, this figure would help in building products and services which are aligned with this split. For the purpose of analysis and to extract the model, the Dataset that we have received is of 2 types. The current Dataset, which is being used in the Data Analysis, is the training Dataset. The other half is the testing Dataset. The difference between these Datasets is that the 2nd Dataset, which is the test Dataset, does not have the final column, Loan Status. This column records whether a particular individual, who can be uniquely identified using the Loan Id, was eligible for the Home Loan or not. The reason it is absent is that we will be extracting the model from the training Dataset and, once the model is fitted, running it on the test Dataset. The model is a filter which has been created by analyzing the previous Data. With it, we can obtain the final Loan Status for any number of applicants by just entering their data and running the model.
  • 54. We have obtained this model by a Statistical measure called Regression Testing. Here, the researcher has given a brief idea. As we go on further, more detailed explanation will be given w.r.t all these elements. Graph 4.2 Fig :- Marital Status of the applicants
  • 55. Here, the researcher has chosen the next column to perform the Univariate analysis, and the graph above shows the results. From the graph we can infer that the number of applicants seeking a Home Loan is higher for married individuals. This is a data point which again is quite inherent when seen in the community, where it is seen that people get married and then aspire to own their own home. The Data quantifies this belief. Fig :- Marital Status in percentage format. As the distribution is clearer when the data is represented in percentage format, the researcher has again shown marital status in percentage format. It can be seen that the Married applicants are at 65% and Non-Married applicants are at 34%. The reason behind this pattern in the Dataset is understandable, i.e. more married individuals are looking towards owning a home and for that purpose they are taking a Home Loan. Graph 4.3
  • 56. Fig :- Distribution of dependents of the applicant. Here, the researcher has chosen to represent the Dependent column in the Dataset. A dependent here means a person who relies on the applicant to fulfil his or her needs. In simpler terms, it means that the applicant is a caretaker for that individual. The Bank has very stringent rules when it comes to qualifying an individual as a dependent. The Government of India has made it mandatory to examine the case of dependents with vigilance, as this is related to the Tax benefits that can be availed by the applicant.
  • 57. The following come under the umbrella of dependents: the applicant’s wife, child, parents etc. There is a set of qualifications which one has to fulfil to qualify as a dependent. The bank has verified these qualification parameters, compiled them and presented them in the Dataset. It is now understood what Dependents mean in the context of the graph and the Dataset. With this understanding, when we take a look at the graph, we can infer that most of the applicants here have zero dependents. As a business case this is a rather positive figure for the bank, as it indicates fewer liabilities or less expenditure for the applicant, which means the applicant is more likely to be capable of paying the EMIs on a regular basis, without any hassle. Graph 4.4 Fig :- Diversity of applicants, in terms of education, which is represented in percentage format.
  • 58. The researcher has shown the graph, now in percentage format, which is better at giving an insight into the distribution in the Dataset. In the graph above, the researcher has chosen to represent the Education column in the Dataset. From the graph, we can infer that 75%+ of applicants are Graduates. The percentage of Non-Graduates here is at about 20%. When the Bank is making a marketing strategy, the percentage will help the team to have a clearer picture of the scenario the bank is facing. By looking at the figures, it is evident that the bank is doing more business with Graduates. A business use case of this Data would be to understand the challenges the Bank is facing when it comes to doing business with Non-Graduates. Now, India is a country wherein the average literacy rate is at 75% (as of 2019), which leaves us with 25% who certainly come under our category of Non-Graduates. Taking an estimate of the number of people, 25% of a 136 Crore population, we get 34 Crore people who come under this bracket. Out of this 34 Crore population, there are many who are financially well-off by doing business. These people may not be graduates, but they have businesses which take care of their needs. It therefore definitely becomes a possibility that the bank can tap into this category of people. Provision should be made to understand this sector and to provide for it, which will help to improve the business of the bank and at the same time help people who are Non-Graduates own a home. Graph 4.5
  • 59. Fig :- Employment distribution, Self-Employed vs not Self-Employed applicants The researcher has chosen to represent the Self_Employed column in the Dataset. The bank has made an attempt to understand how many of the applicants have their own businesses and how many are employed. From the graph, we can infer that most applicants are employed with an organization. This can be viewed as a positive sign, as there would be a stable and consistent flow of income for the applicant, except in rare scenarios which would lead to unemployment for the applicant. Here again, as the researcher observed in the previous graph, the business engagement with Self-Employed applicants is seen to be low. At this point we will have to verify whether the policies of the bank are coming in the way of development in this category. This pattern will have to be reported to the bank, as it holds scope for modification.
  • 60. Graph 4.6 Fig :- Credit history of the applicants. Here, 1.0 indicates all dues paid and 0.0 indicates dues not paid, or not paid to the bank on time. The researcher has chosen to represent the Credit_History column in the Dataset. The bank would like to understand the previous loan behaviour of the applicants. Home Loans are high value loans, therefore before one is lent out, every precaution should be taken to understand if the applicant is eligible or not. From the graph, we can infer that most applicants have successfully repaid all their past dues. In percentage terms, 1.0, which represents applicants who have successfully paid all their dues, is at 84.219 % and 0.0 is at 15.78 %. This is a positive sign for the bank, as it means that there is a good probability that there will be few defaulters.
  • 61. It would be possible to filter out all the applicants who have a previous history of not having paid their dues or having paid them late. Extra caution would be suggested when dealing with these applicants. If it is noted that there is a consistent pattern of non-payment, it would be best not to process the applicant for the Home Loan. This would help the bank in safely utilizing its funds. Graph 4.7 Fig :- Property Distribution amongst the applicants. The researcher has chosen to represent the Property_Area column in the Dataset. This gives us an idea of the location (Rural, Semi-Urban or Urban) of the property the individual is looking to purchase. There are multiple business use cases of this data: the bank can understand in which locations it is doing more business and in which locations it is doing less. This would help it to allocate the resources of the bank accordingly to improve the business.
  • 62. Urban areas have proved to need high investments, and they also have the highest Return on Investment (RoI) when compared to the other 2 locations. So, with the help of this Data, the management can strategize and take actions to improve sales for Urban properties. From the graph we can also infer that sales in the Rural areas are the lowest when compared with the other 2 locations. It is important to understand why they are the lowest. Is it that awareness about the provisions of the bank is low amongst the rural folk? If so, then the bank should take steps to strengthen the marketing team deployed in the rural areas. Overall, this data helps in understanding the areas in which the company is doing well and the areas in which it can deploy resources to make the necessary improvements. Graph 4.8
  • 63. Fig :- Distribution of applicant income. The researcher has chosen to represent the Applicant_Income column in the Dataset. This column, as the name suggests, is a record of the monthly salary of the applicant. Here, the researcher has used a distribution plot to represent the applicant’s income distribution, and a box-plot as well. These methods were used as every individual had a unique income; to make a bar graph representing each and every income would be difficult and unfruitful. Therefore, we have used these 2 methods, the distribution plot and the box-plot.
  • 64. From Graph 4.8, we can see that there is a high peak within the income range of 0-20,000; this indicates that the number of applicants whose income is below Rs. 20,000 is high. The peak is one measure which can be extracted from the graph. The other measure is the width. It can be seen that the curve is widest at around the 0.00005 level on the y-axis. This indicates that there are a lot of applicants in the group whose monthly salary is around Rs. 5,000. Graph 4.9 Fig :- Distribution of applicant income in Box-plot method. Let’s now take a look at the next one, which is Graph 4.9. Here also the researcher has represented the applicant’s income distribution, but using the Box-plot method. This style gives us a very clear idea of the distribution of income amongst the applicants.
  • 65. Here, we can see there is a base line, which represents the start. Then there is a wide box at the 5000 mark, which represents the maximum concentration. The next, smaller line is at 10,000, after which there is a series of dots. Here, the width indicates the concentration of applicants in the graph, and the level indicates the income at which the concentration is noted. Graph 4.10 Fig :- Distribution of the term of Home Loan. The researcher has chosen to represent the Loan_Amount_Term column in the Dataset. This column represents the duration of the Home Loan taken by the applicant.
  • 66. That is, the time the applicant will take to repay the bank for the sum borrowed. In the graph the numbers indicate the total number of months. To calculate the duration in terms of years, the number can be divided by 12. So, 100 months would come to 8.3 years, 400 would come to 33.3 years and so on. From the graph we can infer that there is a peak in the 300-400 months range. In terms of years, that is a repayment duration of roughly 25-33 years. The graph also shows a short peak near the 200 months mark, which indicates that a few applicants plan to repay their loan in roughly 16-17 years. A business use case here can be to make an attractive Home Loan offering for the applicants who choose the longer repayment duration and market it to all the people. Applicants already prefer this duration, so adding more value by giving special interest rates will help boost the business. Graph 4.11 Fig :- Loan Amount distribution, represented in Box-plot format.
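The months-to-years conversion described above is a single division by 12. A small sketch follows; the function name is an assumption made for illustration.

```python
def months_to_years(months: float) -> float:
    """Convert a loan term in months to years, rounded to one decimal place."""
    return round(months / 12, 1)


# The tick values read off the Loan_Amount_Term graph in the text.
terms_in_months = [100, 200, 300, 400]
terms_in_years = {m: months_to_years(m) for m in terms_in_months}
```

Here `terms_in_years` maps 100 months to 8.3 years and 400 months to 33.3 years, matching the figures quoted in the text.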
  • 67. The researcher has chosen to represent the Loan_Amount column in the Dataset. This column represents the Loan Amount taken by the applicant. The Y-axis shows the Loan amount; here 200 represents 20 Lakhs. The scale has been chosen for easier representation on paper. From the graph we can infer that the majority of the loans are in the range of 100-200, i.e. in actual terms in the range of 10-20 Lakhs. The next such concentration is seen near the 300 mark, which represents 30 Lakhs. Then the concentration goes on decreasing as the Home Loan Amount increases. We have reasonable variation till the 50 Lakhs mark, after which there is a very sharp decrease in the number of applicants, and as we go above the 60 Lakhs mark the number becomes single digit. A business use case here can be to make policies which can help in the sale of higher value Home Loans, above the 50 Lakhs mark. On one hand it would be advisable to be cautious, but at the same time the higher the Loan amount taken, the higher will be the return the bank is able to generate.
  • 68. Graph 4.12 Fig :- The Target variable - Loan Status. Here, “Yes” indicates the applicant was eligible for the Home Loan as per the primary analysis and “No” indicates that the individual will not be further processed for the Home Loan. The researcher has chosen to represent the Loan_Status column in the Dataset. This is the most important column in the Dataset. This is the final outcome of the analysis which has been done by the bank for that particular applicant. As mentioned, we have with us 2 sets of Data, the Training set and the Test set. In the Training set, on which we have done the analysis and which has 614 rows and 13 columns, the Loan_Status has been manually determined by the banking staff.
  • 69. The researcher will be running the Regression Test on the Training set to determine the model. Once we run the model on the Test Data set, we will get the Loan Status for the Test set, which does not have the Loan Status manually determined. Since the Loan Status is the Target variable, the researcher will be presenting here all the details related to this variable. Fig :- The figure here is a snippet taken from Jupyter Notebook. Here, “train” indicates the Training Dataset. The column chosen is “Loan_Status”. The next element, “value_counts”, is a pandas method which counts how many times each distinct value occurs in the column. Here, the researcher has shared a snippet to present the counts for Loan_Status. From the figure we can see that out of the total 614 applicants, 422 are eligible as per the primary analysis and 192 applicants are not eligible.
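The `value_counts` call described above can be reproduced in a small sketch. Since the bank's file is not available here, only the target column is rebuilt from the counts reported in the text (422 eligible, 192 not eligible); the "Y"/"N" coding is assumed, and the row order is arbitrary.

```python
import pandas as pd

# Rebuild only the Loan_Status column from the counts quoted in the text.
# "Y" = eligible, "N" = not eligible (coding assumed for this sketch).
train = pd.DataFrame({"Loan_Status": ["Y"] * 422 + ["N"] * 192})

counts = train["Loan_Status"].value_counts()

# normalize=True returns proportions; multiply by 100 for percentages.
percentages = (train["Loan_Status"].value_counts(normalize=True) * 100).round(2)
```

`counts` holds the absolute numbers for each status and `percentages` the corresponding share of the 614 applicants.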
  • 70. Fig :- The data has been represented in percentage form. Here, “Yes” indicates the applicant was eligible for the Home Loan as per the primary analysis and “No” indicates that the individual will not be further processed for the Home Loan. In this figure, we see the percentage being shown again, but this time in a more basic representation. From the figure, it is seen that 68.73% of the applicants are eligible and 31.27% are not eligible as per the manual analysis done on the training Dataset. The reason for presenting more detail on the target variable Loan_Status is to get a better understanding of the most important column in the Dataset.
  • 71. Graph 4.13 Fig :- Bivariate analysis. Loan Status and the Gender of applicants. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and the Gender column. The Y-axis shows the number of applicants, in percentage format for better understanding. From the graph we can infer that, when it comes to approval rates, the numbers for male and female candidates are quite proportional to each other. Comparatively, both are at the same level.
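A bivariate table of this kind can be built with `pd.crosstab`. The sketch below is an assumption about how such percentages might be computed, not the researcher's own notebook code, and the six rows are toy values chosen to keep the arithmetic easy to follow.

```python
import pandas as pd

# Toy rows (hypothetical, not the bank's data).
train = pd.DataFrame({
    "Gender":      ["Male", "Male", "Male", "Male", "Female", "Female"],
    "Loan_Status": ["Y",    "Y",    "Y",    "N",    "Y",      "N"],
})

# Count of applicants for each (Gender, Loan_Status) pair.
counts = pd.crosstab(train["Gender"], train["Loan_Status"])

# normalize="index" divides each row by its own total, so each Gender row
# shows the share of "N" and "Y" within that group.
row_pct = pd.crosstab(train["Gender"], train["Loan_Status"], normalize="index") * 100
```

Calling `row_pct.plot.bar(stacked=True)` would draw a stacked percentage bar graph from this table. The same pattern applies to the other columns analysed against Loan Status.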
  • 72. When it is viewed in the context of the percentage of applicants who were eligible for the Home Loan, it can be seen from Graph 4.13 that both male and female applicants have performed fairly well, with close to 75% or more of the applicants in each group eligible as per the primary analysis for the Home Loan. Graph 4.14 Fig :- Bivariate analysis. Loan Status and the Marital Status of applicants. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and Married column. The Y-axis shows the number of applicants, in percentage format for better understanding. From the graph we can infer that, when it comes to eligibility status, the married applicants have performed better than their counterparts. One of the reasons for the spike can be that in the case of married applicants they have
  • 73. co-applicant income, which adds to the stability quotient of the applicant, as in the present day and age both the partners are earning members of the family. Although there is a difference, it is rather small. A business use case here can be that the bank can analyze the previous defaulter records and find out which category has defaulted most often, the married or the non-married applicants. Based on the analyzed figure, we can understand which category has more risk and, guided by the report, we can make plans to promote Home Loans to the category which has lesser risk, for better business. At the same time it is important that we don’t put all the eggs in 1 basket. Therefore, it will be advised to follow the right proportion based on the current findings. The researcher had hypothesized that married applicants have a better chance of being eligible for the Home Loan. With the help of the Data representation above it is seen that this hypothesis holds true. Graph 4.15 Fig :- Bivariate analysis. Loan Status and the number of Dependents.
  • 74. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and the number of Dependents. The Y-axis shows the number of applicants, in percentage format. From the graph we can infer that, when it comes to eligibility status for the number of dependents, there is an inconsistency. Fig :- Percentage calculation with exact number of applicants shown for 0,1,2 and 3+ dependents. For 0 dependents the non-eligibility status is at 31.04%; it increases as the number goes up by 1, and then, as the number goes up again by 1, making it 2 dependents, the non-eligibility status decreases to 32.8%. From the data we can see that the non-eligibility status percentage is the highest for 1 dependent and for the 3+ dependents. When we view the data in terms
  • 75. of numbers and not as percentage, it is very evident that the count for 0 dependents is the highest when compared to all the others. Therefore, the increase and decrease in eligibility status here can be said to be inconsistent. One more interesting aspect we can note from the figure is that the non-eligibility percentage for 3 dependents is 100%. This is possible as we have a finite sample under analysis. With a bigger and more diverse Dataset, this would not be the case. Otherwise it would mean that a person with 3 dependents would never be given a Home Loan, which is not true. What we can conclude from this analysis is that the number of dependents has a low priority when it comes to finalizing the loan eligibility status of home loan applicants. Graph 4.16 Fig :- Bivariate analysis. Loan Status and Education of applicants.
  • 76. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and Education status of the applicants. The Y-axis shows the number of applicants, in percentage format. From the graph we can infer that, when it comes to Home Loan eligibility status in the context of the Education status of the applicants, the graduates have a comparatively better eligibility percentage. Fig :- Loan Status w.r.t Education of applicants in percentage format. In the figure above, the researcher has shown the exact percentages. 70.83% is the percentage of graduates who are eligible for the Loan. The second calculation, in a similar context, is for the non-graduates. Graph 4.17
  • 77. Fig :- Bivariate analysis. Loan Status and Self-Employed applicants. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and Self-Employment status of the applicants. The Y-axis shows the number of applicants, in percentage format. When it comes to Home Loan eligibility status in the context of the Self-Employment of the applicants, it can be seen from the graph above and the snippet below that there is a negligible difference between the two. Fig :- Loan Status w.r.t Self-Employment status of applicants in percentage format.
  • 78. Although the percentages in the graph look similar, when viewed in terms of numbers there definitely is a big difference. The total number of applicants who are not self-employed is 500 and that of self-employed applicants is 82. Graph 4.18 Fig :- Bivariate analysis. Loan Status and Credit history of applicants. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and Credit History of the applicants. The Y-axis shows the number of applicants, in percentage format. Here, in the graph, “0.0” represents applicants who have either not paid their dues at all or have not paid them on time. “1.0” indicates the applicants who paid all their dues and on time.
  • 79. Fig :- Loan Status w.r.t Credit history of the applicants in percentage format. From the graph we can infer that, for the sake of precaution, the bank has sanctioned fewer Home Loans to previous defaulters, as there is a probability that they might default once again. If a loan is sanctioned to such an applicant, then all the necessary documents and formalities are to be followed. Graph 4.19 Fig :- Bivariate analysis. Loan Status and Property Area of applicants.
  • 80. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and Property Area of the applicants. The Y-axis shows the number of applicants, in percentage format. When it comes to Home Loan eligibility status in the context of the Property Area of the applicants, it can be seen from the graph above that the bank has approved the highest number of Home Loans for Semi-Urban areas. Fig :- Loan Status w.r.t Property Area of the applicants in percentage format. The percentage and the total number of applicants are both noted to be highest for Semi-Urban applicants. The other applicants are very close to each other, in terms of both percentage and number.
  • 81. Graph 4.20 Fig :- Bivariate analysis. Loan Status and income of applicants. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and income of the applicants. The Y-axis shows the number of applicants, in percentage format. The researcher has made use of ranges in this graph. Each applicant’s income is different from the others, so when we use ranges we can cover all the unique entries.
  • 82. Fig :- Snippet of Python code, which makes use of bins to make ranges for the applicants. Here, it can be seen that in the 1st line of the code we made use of numbers to define the range of each bin. Low, Average, High and Very High are all bins, and each bin represents a range. The range of the 1st bin, Low, is 0 - 2500 Rupees, the 2nd one, Average, has a range of 2500 - 4000 Rupees, and the other 2 bins are defined in the same way. Now that there is an understanding of the bins used, the next part is to understand the Loan Status in the context of the income of the applicant.
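Binning of this kind is commonly done with `pd.cut`. The sketch below is an illustration under stated assumptions rather than the code in the figure: the first bin edges (0, 2500, 4000, 6000) follow the ranges given in this report, the upper edge of 81000 for Very High is an assumed placeholder, and the incomes are toy values.

```python
import pandas as pd

# Edges 0-2500-4000-6000 come from the text; 81000 is an assumed upper limit.
bins = [0, 2500, 4000, 6000, 81000]
labels = ["Low", "Average", "High", "Very High"]

# Hypothetical monthly incomes in Rupees.
incomes = pd.Series([1800, 2500, 3200, 5000, 12000], name="ApplicantIncome")

# By default each bin includes its right edge, e.g. "Low" covers (0, 2500].
income_bin = pd.cut(incomes, bins=bins, labels=labels)
```

With the default settings an income of exactly 0 would fall outside the first bin; passing `include_lowest=True` to `pd.cut` would include it.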
  • 83. Fig :- Loan Status w.r.t income of the applicant in percentage format. Please note that the snippet shown above displays 2 important pieces of information. On the right hand side we have the number of applicants, arranged as per the range and their eligibility status. In the right hand picture, “Y” and “N” represent yes and no. On the left hand side we have worked out the percentages using the numbers shown on the right. This is done for a better understanding of the scenario. The highest number of applicants fall in the Average bin (2500-4000 Rupees). The next thing we can extract from the data is that the High bin (4000-6000 Rupees) has the highest eligibility status. The researcher had hypothesized that the higher the applicant income, the higher will be the probability of the individual getting the Home Loan. With the help of the Data representation above it is seen that this hypothesis does not hold true: the income range categorized as High is in 1st place, and the 2nd spot goes to the income range categorized as Average, at 70%.
  • 84. Graph 4.21 Fig :- Bivariate analysis. Loan Status and Loan amount taken by the applicants. In the figure above, the researcher has represented the Bivariate analysis done on the Loan Status and the Loan amount taken by the applicants. The Y-axis shows the number of applicants, in percentage format.
  • 85. The researcher has used bins here to make ranges for the Loan Amount taken from the bank. This was done because the Loan Amounts were unique for each applicant, and to capture all of them we make use of ranges. Let’s now define the bins we have used: the bins are Low, Average and High. Low represents the range 0 to 100, Average represents the range 100 to 200 and so on. For the Low bin the range is 0-100, and the multiplication factor for this range is 10,000, meaning the range actually is Rs. 0-10,00,000, i.e. 0-10 Lakhs. Similarly, for the Average bin the range is 10-20 Lakhs and so on. Having understood the bins and the associated ranges, the next part is the Bivariate analysis. Fig :- Loan Status w.r.t Loan Amount depicted in percentage format.
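The scaling and binning described above can be written out as a short sketch. The function names are assumptions made for illustration, and the treatment of values that sit exactly on a bin boundary (100 or 200) is also an assumption, since the text does not specify it.

```python
UNIT_RUPEES = 10_000   # multiplication factor stated in the text
LAKH = 100_000         # 1 Lakh = 1,00,000 Rupees


def to_lakhs(loan_amount_units: float) -> float:
    """Convert a Loan Amount value from the Dataset's scale into Lakhs."""
    return loan_amount_units * UNIT_RUPEES / LAKH


def loan_bin(loan_amount_units: float) -> str:
    """Assign the Low / Average / High bin (boundary handling is assumed)."""
    if loan_amount_units <= 100:
        return "Low"
    if loan_amount_units <= 200:
        return "Average"
    return "High"
```

For example, `to_lakhs(200)` gives 20.0, which matches the statement that 200 on the scale represents 20 Lakhs.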
  • 86. From the figure we can note that the highest number of applicants fall in the Average bin, which has a range of 10-20 Lakhs. The Average bin also happens to have the highest eligibility status. Having reviewed the data given above, it is now easier to understand what Graph 4.21 depicts. We can infer that, for Home Loan eligibility status in the context of the Loan Amount, the Average bin has the highest eligibility status, followed by the Low bin (0-10 Lakhs) and lastly the High bin (20-70 Lakhs). The researcher will now start with the regression analysis of the Home Loan Dataset. For this we will be using 2 statistical techniques, Linear Regression and Logistic Regression.
  • 87. Graph 4.22 Fig :- Scatter plot of Applicant Income vs Loan Amount. The researcher has enclosed a highly concentrated section within a red box. The box covers an applicant income of 0-20,000 Rupees on the X-axis and a Loan Amount of 0-40,00,000 Rupees on the Y-axis. It is evident that close to 85% of our applicants lie within this range. This is a very important insight that the researcher was able to extract with the help of the scatter plot. The distribution outside the highlighted area is very diverse. It can therefore be expected that, because of the huge variation in the data, it would be extremely difficult to find the best-fit line.
  • 88. Graph 4.23 Fig :- Scatter plot of Applicant Income vs Loan Amount with all possible best fit lines. Before the researcher automates the process using the Machine Learning technique of Logistic Regression, we take a selected Dataframe from our Dataset to understand why it is not possible to use Linear Regression in place of Logistic Regression. The researcher performed Linear Regression on the Dataset, and the graph shows that the variance in the Dataset is very high: there is a large number of candidate best-fit lines, all of which have the same error rating. This highlights the Non-Linear nature of our Home Loan Dataset.
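To put a number on how well a single straight line describes such scattered data, one option is to fit an ordinary least-squares line and report its R² value (the share of the variation in Loan Amount explained by the line). The sketch below is an assumption about how this could be checked; the report does not state which error measure was used, and the sample values are illustrative, not taken from the Dataset.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Illustrative (income, loan amount) pairs; hypothetical values.
income = [2500.0, 3000.0, 4000.0, 4500.0, 6000.0, 8000.0]
loan = [120.0, 60.0, 150.0, 90.0, 200.0, 110.0]

slope, intercept = fit_line(income, loan)
r2 = r_squared(income, loan, slope, intercept)
print(f"slope={slope:.4f} intercept={intercept:.2f} R^2={r2:.3f}")
```

An R² close to 0 means the line explains little of the variation, which is the situation the scatter plot suggests; an R² close to 1 would mean the points lie almost on the line.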
  • 89. Therefore, we can conclude that it is not feasible to use Linear Regression for the analysis of our Dataset. This is because most real-life scenarios are Non-Linear in nature. A solution to this challenge is to use a non-linear approach for the analysis of this Dataset. The researcher will use Logistic Regression, which works on a logarithmic (log-odds) scale, to analyze the variation. This will be used to solve our final objective, which is to determine the Loan Eligibility Status. Table 4.2 Fig :- Rows and Columns we selected to be used in Logistic Regression model building.
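As a brief illustration of why Logistic Regression suits a yes/no outcome such as Loan Eligibility: it passes a linear score through the logistic (sigmoid) function, which turns any real number into a probability between 0 and 1, and taking the logarithm of the odds recovers the linear score. The function names below are chosen for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, this equivalent form avoids overflow in exp().
    e = math.exp(z)
    return e / (1.0 + e)


def log_odds(p: float) -> float:
    """Inverse of the sigmoid: the logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# A score of 0 corresponds to a 50% probability of eligibility.
print(sigmoid(0.0))
# Larger scores push the probability towards 1, smaller ones towards 0.
print(round(sigmoid(2.0), 3), round(sigmoid(-2.0), 3))
```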
  • 90. Here, to perform the Logistic Regression, we have selected a few columns and dropped all the other columns, such as applicant income, co-applicant income, Loan Amount etc. Out of a set of 13 columns, we are using 8 columns to begin with for our model. Table 4.3 Fig :- Rows and Columns we selected to be used in Logistic Regression model building for variable “X”. Here, we have defined a new variable, “X”, and we are storing our Dataset in “X”. It can be seen that we have dropped the Loan Status column from the Dataframe we are storing in “X”. The Final Dataframe stored in “X” can be seen. Table 4.4
  • 91. Fig :- Rows and Columns selected for Y-Dataframe. Here, we have defined a new variable, “Y”, and we are storing part of our Dataset in “Y”. It can be seen that we have used only the Loan Status column from the Dataframe for storing in “Y”. The Final Dataframe stored in “Y” can be seen. Fig :- Usage of train_test_split function in the Dataset. We make use of a very important function, train_test_split. This function helps us to divide the Dataset into a training data set and a test data set. The Training Data will be used by the Machine Learning algorithm to learn from the Data. The Test Data set will later be used to check how well the model predicts the Loan Eligibility status.
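The X / Y separation and the split described above can be sketched as follows. This is a minimal sketch under stated assumptions: the column names, the already-encoded sample rows, the 30% test share and the random_state value are illustrative choices, since the report does not state them; train_test_split itself is the scikit-learn function named in the text.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative, already-encoded rows; column names and values are assumptions.
data = pd.DataFrame({
    "Married":        [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
    "Education":      [1, 1, 0, 1, 0, 1, 1, 0, 1, 1],
    "Credit_History": [1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    "Loan_Status":    [1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
})

# X holds every column except the target; Y holds only the target column.
X = data.drop(columns=["Loan_Status"])
Y = data["Loan_Status"]

# Hold back 30% of the rows for testing (assumed share, not from the report).
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

print(x_train.shape, x_test.shape)
print(y_train.shape, y_test.shape)
```

Fixing random_state makes the split repeatable, so the same rows land in the training and test sets on every run.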
  • 92. The Split also helps us to compare the algorithm-generated output with the manually derived Loan Eligibility status, and based on this we get the accuracy score of the model. Table 4.4 Fig :- Rows and Columns selected under x_train after train_test_split function. It can be seen that the Dataset has been sliced by the algorithm. The same is done for the other outputs of the train_test_split function. Table 4.5
  • 93. Fig :- Rows and Columns selected under y_train after train_test_split function. Here we see that the Dataset has been sliced by the algorithm and stored under y_train. The Data is split using the train_test_split function in order to perform model building using Logistic Regression. Fig :- We derive the final model using Logistic Regression function.
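The model-building step and its accuracy score can be sketched with scikit-learn as follows. This is an illustrative sketch, not the researcher's code: the column names, the sample rows, the 25% test share, the stratified split and the default LogisticRegression settings are all assumptions, so the accuracy printed here says nothing about the 79% reported for the real Dataset.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative, already-encoded rows; column names and values are assumptions.
data = pd.DataFrame({
    "Credit_History": [1, 1, 0, 1, 0, 1, 1, 1, 0, 1,
                       1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "Education":      [1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
                       0, 1, 1, 0, 0, 1, 1, 1, 0, 1],
    "Loan_Status":    [1, 1, 0, 1, 0, 1, 1, 1, 0, 1,
                       0, 0, 1, 1, 0, 1, 1, 1, 1, 1],
})

X = data.drop(columns=["Loan_Status"])
Y = data["Loan_Status"]

# stratify=Y keeps both classes present in the training and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0, stratify=Y
)

# Fit the model on the training rows only.
model = LogisticRegression()
model.fit(x_train, y_train)

# score() returns the fraction of test rows whose predicted status
# matches the known status.
accuracy = model.score(x_test, y_test)
predictions = model.predict(x_test)

print(f"accuracy on held-out rows: {accuracy:.2f}")
```

Because the score is computed on rows the model never saw during fitting, it is a fairer estimate of how the model would behave on new applicants than an accuracy measured on the training rows.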
  • 94. Here, at the bottom of the snippet, we can see the final model which the researcher was able to extract. With this model available, we can be presented with any number of applicants and it will be possible to get the preliminary analysis within seconds. Fig :- We derive the final model using Logistic Regression function. The score function, as mentioned, helps us to compare the algorithm-generated output with the manually derived Loan Eligibility status. It can be seen that the accuracy of this model is 79%, which is a fairly good accuracy score. The accuracy of the model can be improved with further work. Chapter 5 : Findings from the study
  • 95. Fig :- Visual representation of the 3 categorical variables. When the researcher factored in 3 variables, the researcher came across an interesting finding. The diagram is a visual representation of the 3 variables. On the right-hand side we see the 3 categorical variables. Within the Venn diagram is the highest category for each variable listed on the right. For example, for location, out of Urban, Semi-Urban and Rural, Semi-Urban had the highest percentage of applicants. When devising a strategy for the Bank, Location is an important starting factor; the next factor to be checked is the Education of the applicant, and finally the individual’s Employment Status. This analysis helped us to understand that the segment of applicants consisting of Graduate employees who are Semi-Urban residents gave
  • 96. the bank the maximum business. Segmentation is very critical, as primarily it helps the Bank understand who its customers are. The next important detail that can be derived from segmentation is the list of needs that are most important, or which add the most value, to the individuals of that segment. It is when we focus on and fulfil the needs segment-wise that it is possible to add maximum value to the customers. One more critical finding was the scatter plot of Loan Amount vs the Applicant Income. It once again helped us to segment the customers of the bank. The researcher was able to understand that there was a very high concentration of applicants whose income lay between 0-20,000 Rupees and who were aspiring to take a Home Loan in the range of 0-40,00,000 Rupees. Recommendations
  • 97. Fig :- State-wise Home Loan penetration in India. Source - RBI, IDBI Capital Research. It is evident that many Indian states are yet to ride the wave of urbanization. The graph above further validates the earlier Data point that a major part of the population falls under the Semi-Urban and Rural area categories. It is known that there is a saturation of Housing Finance Companies (HFCs) in the Urban areas. It was noted in our Data Analysis that the client base of the bank was highest in the Semi-Urban area, and that the Semi-Urban applicants themselves had the highest eligibility status. Integrated together, these insights converge into a recommendation.
  • 98. The Bank has a good client base in the Semi-Urban area. The researcher recommends that the Bank become a niche Bank with a focus on the Semi-Urban and Rural categories. Repco Home Finance Ltd., one of the giants in the HFC space, has seen success by following the segmentation strategy. It was noted that Repco focused on a niche audience, which was self-employed individuals. As housing finance gained momentum, it was primarily centred on the salaried customer, while the potentially creditworthy but difficult to assess self-employed class remained out of the ambit of lenders. Repco understood this and made a quick and aggressive move towards this segment. This success story establishes one truth, which is the success of serving a niche audience in the Home Loan market. Based on the Data Analysis, it is observed that the Bank has a good audience in the Semi-Urban location. It is recommended that the bank deploy its resources to forge strategies for development in the Semi-Urban and Rural markets. Based on this finding, the researcher would advise the bank to tie up with realtors and construction companies which focus on properties in the Semi-Urban and Rural areas.
  • 99. It was seen that most applicants were looking for homes in the affordable range of 0-40,00,000 Rupees. Making alliances with the companies and realtors who provide homes in this range would be a win-win scenario for both parties. People looking for Homes prefer properties which are backed by banks for 2 reasons. The 1st is that it adds an extra layer of security to the property. A real estate property which is backed by a bank means that all the documentation and legal formalities of the construction company and the construction site are legal and correct. The 2nd is the availability of a Home Loan for the property. If this aspect is taken care of in the initial stage itself, there would be higher sales for both the construction company and the bank, in terms of Home Loans. Therefore, it is advised that the bank instruct the marketing team to collect details weekly about all the properties in the areas where it operates, and then, based on the past records of the companies, invite them for a possible business partnership. As mentioned before, for the purchase of homes there also exist many real estate agents and agencies whose sole purpose is to connect buyers and sellers. The bank can approach them as well and gain further ground-level insights. The more data the bank has w.r.t. the area where it operates, the more beneficial it will be for the bank, as it is the ground-level details that need to be taken into
  • 100. consideration while implementing the marketing and sales strategies of the bank. Conclusion The focus areas for the business development of the Bank have been highlighted, based on the study of Home Loan demographics using statistical analysis. It is advised that the bank deploy the findings of the study to drive better growth possibilities.