Part of the CodeUP sessions Goa organized by PayU Money on the "Evolution of AI and IoT". Video of the talk at https://www.youtube.com/watch?v=MzRuE4jKK5U
2. Agenda
● About Seynse
● Lending Industry in India & Challenges
● Technology & Data Science in Lending Industry
● Evolution of data
● Data Science Algorithms in Lending Industry
● Lending use case: Can your telecom behaviour predict creditworthiness?
● Next Phase of Evolution
● Q&A
3. About Seynse
● Fintech - Lending
● Started in 2016, Part of Prototyze Ventures
● Lending as a Service (LaaS) Platform
● Partnership with Airtel
● 75+ Member team in Goa and Gurgaon
● Proposition - Credit with alternate data based credit model
○ Patent pending credit model algorithms
Strategic Partner
4. The Lending Industry in India & Challenges
Market Size: $40 Billion
Alternate Lending: $241Mn in 2018, 53% CAGR (2018-2022), $1.3 Billion by 2022
Credit to GDP Ratio: One of the lowest in the world
● Organized Lending - Banks, NBFCs
● Unorganized Lending - Moneylenders, small credit
Market dominated by banks and unorganized lenders; high interest rates for unsecured and unverified borrowers
Credit Bureau Penetration - 25%-30%
70%-75% of the population is new to credit, has no previous repayment data, or is unbanked, and has difficulty securing a loan from banks and financial institutions
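The alternate lending projection on this slide can be checked with a quick compounding calculation. The sketch below assumes the 53% CAGR is applied over the four years from 2018 to 2022; the function name is illustrative only.

```python
def project_with_cagr(start_value, cagr, years):
    """Compound start_value at an annual growth rate `cagr` for `years` years."""
    return start_value * (1 + cagr) ** years

# Figures from the slide: $241Mn in 2018, 53% CAGR, 2018-2022 (assumed 4 compounding years).
projected_2022_mn = project_with_cagr(241, 0.53, 4)
print(f"Projected 2022 market: ${projected_2022_mn / 1000:.2f} Billion")
```

Under that assumption the result is consistent with the roughly $1.3 Billion quoted for 2022.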
5. Technology & Data Science to the Rescue
Lending as a Service:
● Minimal to no paperwork
● Technology enabled
● Advanced data science/machine learning models to assess the creditworthiness of a customer
● Does not need historical financial data of consumers
● Uses a variety of data sources as surrogate to financial data
● Can be used by other banks or NBFCs to underwrite customers they cannot otherwise assess
6. Evolution of data
Old-age lending (Banks)
● Historical financial data
● Secured lending for high value loans - collateral information
● Bureau score - based on repayment pattern
New-age lending (FinTech)
● Existing data used by the bank, plus,
● Alternate data - age, address, education, lifestyle, employment, company, college
● Bank statement narration - where does the customer spend money, share of wallet analysis on spend (food, grocery, travel, entertainment, etc.)
● Utility bills - electricity bill, water bill, house tax
● Online data - social media data
● App Usage data - online savvy customer
Old-age lending data: available for 25-30% of the target market population
New-age lending data: available for 70-75% of the target market population
8. How do algorithms make sense of this data?
● Right data - value and volume
● Combination of supervised and unsupervised learning algorithms
● Validation with the market standard
● Continuous monitoring and update
9. Use Case: Can your telecom behaviour assess your creditworthiness?
India:
● 75% mobile phone penetration (85-90% by 2020), with over 1 Billion devices
● 36% of customers on smartphones
● 4G data usage at 11GB per user per month
Your telecom behaviour tells a lot about you!
11. Data
● No personal data, CDR or SMS data used
● What type of device are you using?
● How many circles do you roam in a month?
● Do you use mostly data, or mostly voice calls?
● Do you pay your bills on time?
● What mode of payment do you use for recharges or bill payments?
● How long have you been using your phone?
● … and over 100 other variables
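As a rough illustration of how answers to questions like these could become model inputs, the sketch below encodes one subscriber as a numeric feature vector. Every field name, the payment-mode categories, and the encoding are hypothetical; the slides do not describe the actual variables or their representation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TelecomProfile:
    # Hypothetical fields mirroring the questions on the slide.
    is_smartphone: bool
    roaming_circles_per_month: int
    data_share_of_usage: float    # 0.0 = voice only, 1.0 = data only
    on_time_payment_ratio: float  # share of bills paid by the due date
    payment_mode: str             # e.g. "online" or "retailer"
    tenure_months: int

# Assumed category list, used to one-hot encode the payment mode.
PAYMENT_MODES = ["online", "retailer", "other"]

def to_feature_vector(p: TelecomProfile) -> List[float]:
    """Turn a profile into a flat list of numbers a model can consume."""
    one_hot = [1.0 if p.payment_mode == m else 0.0 for m in PAYMENT_MODES]
    return [
        float(p.is_smartphone),
        float(p.roaming_circles_per_month),
        p.data_share_of_usage,
        p.on_time_payment_ratio,
        float(p.tenure_months),
    ] + one_hot

example = TelecomProfile(True, 2, 0.8, 0.95, "online", 36)
features = to_feature_vector(example)
print(features)
```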
12. Machine Learning models for building a credit decisioning engine
Business Problem
Data Availability
Ensemble of multiple models
● Logistic Regression
● Decision Trees & Random Forest
● Support Vector Machines
● XGBoost
● Deep Learning
● NLP Models
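One common way to combine several models into an ensemble is a weighted average of their predicted probabilities. The slide does not say how the models are combined, so the sketch below is only an assumption; the three stand-in "models" are hypothetical functions, not trained models.

```python
def ensemble_probability(models, features, weights=None):
    """Weighted average of the default probabilities returned by each model."""
    weights = weights or [1.0] * len(models)
    total = sum(weights)
    return sum(w * m(features) for m, w in zip(models, weights)) / total

# Hypothetical stand-ins for trained models: each maps features to a probability.
def logistic_stub(features):
    return 0.2

def forest_stub(features):
    return 0.4

def boosting_stub(features):
    return 0.6

models = [logistic_stub, forest_stub, boosting_stub]
equal_weighted = ensemble_probability(models, features=[0.9, 2.0])
boost_heavy = ensemble_probability(models, features=[0.9, 2.0], weights=[1, 1, 2])
print(equal_weighted, boost_heavy)
```

Giving a model a larger weight pulls the combined score toward that model's prediction.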
13. Model flowchart
[Flowchart: Telecom Data (multiple sources) → Single view (User Profile), combined with Alternate Data → Train Data / Test Data → Models → Scoring → Validation: Actual vs Predicted]
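The flow on this slide (split into train and test data, fit a model, score, then compare actual against predicted) can be sketched end to end on synthetic data. The data, the single "payment behaviour" feature, and the bucket-rate model are all assumptions chosen to keep the example small; they are a toy stand-in, not the models from the talk.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_bucket_model(train_rows):
    """Learn the observed default rate for each feature bucket."""
    totals, defaults = {}, {}
    for bucket, defaulted in train_rows:
        totals[bucket] = totals.get(bucket, 0) + 1
        defaults[bucket] = defaults.get(bucket, 0) + defaulted
    return {b: defaults[b] / totals[b] for b in totals}

def score(model, bucket, fallback=0.5):
    """Predicted probability of default for a bucket (fallback if unseen)."""
    return model.get(bucket, fallback)

def confusion_counts(model, test_rows, threshold=0.5):
    """Validation step: count actual vs predicted outcomes on the test data."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for bucket, actual in test_rows:
        predicted = 1 if score(model, bucket) >= threshold else 0
        key = ("t" if predicted == actual else "f") + ("p" if predicted else "n")
        counts[key] += 1
    return counts

# Synthetic data: late payers default more often (made-up rates).
rng = random.Random(0)
rows = []
for _ in range(1000):
    bucket = rng.choice(["pays_on_time", "pays_late"])
    p_default = 0.1 if bucket == "pays_on_time" else 0.7
    rows.append((bucket, 1 if rng.random() < p_default else 0))

train, test = train_test_split(rows)
model = fit_bucket_model(train)
counts = confusion_counts(model, test)
accuracy = (counts["tp"] + counts["tn"]) / len(test)
print(counts, f"accuracy={accuracy:.2f}")
```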
14. Classification Models - Logistic Regression, Decision Trees
Logistic Regression
● Sigmoid Activation - Classification into 0/1 class with the probability of success
● Prone to overfitting - Ridge or Lasso regularization used
● Provides probability of success with coefficients for each feature
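The points above can be sketched in a few lines: a sigmoid turns a weighted sum of features into a probability, and a ridge (L2) penalty shrinks the coefficients to limit overfitting. The toy data, feature meanings, and hyperparameters below are assumptions for illustration, and plain gradient descent is used only to keep the example self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_ridge(X, y, l2=0.1, lr=0.1, epochs=500):
    """Batch gradient descent on log-loss plus an L2 (ridge) penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The ridge term l2 * w shrinks coefficients toward zero; the bias is not penalised.
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability of the positive class (here: loan repaid)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: [on-time payment ratio, scaled roaming], label 1 = repaid (made up).
X = [[0.9, 0.2], [0.8, 0.1], [0.95, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic_ridge(X, y)
p_good = predict_proba(w, b, [0.9, 0.2])
p_bad = predict_proba(w, b, [0.2, 0.2])
print("coefficients:", w, "bias:", b)
print("p(repaid):", p_good, p_bad)
```

The sign and size of each coefficient indicate how that feature moves the predicted probability, which is what makes the model easy to explain.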
Decision Trees
● Based on Information Gain/Entropy or Gini impurity to split the data into subsequent classes at every node
● Prone to overfitting unless the tree is pruned
● Provides decision rules for classification - a requirement by regulators
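The split criteria named above can be computed directly. This is a minimal sketch of entropy, the Gini criterion (usually called Gini impurity), and information gain for one candidate split; the label lists are made-up examples.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 - sum of squared class shares; 0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits; 0 means the node is pure."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = [1, 1, 1, 0, 0, 0]  # 1 = repaid, 0 = defaulted (made-up labels)
perfect_gain = information_gain(parent, [1, 1, 1], [0, 0, 0])
poor_gain = information_gain(parent, [1, 1, 0], [1, 0, 0])
print(gini_impurity(parent), entropy(parent), perfect_gain, poor_gain)
```

A tree-building algorithm evaluates candidate splits at each node and keeps the one with the highest gain (or lowest impurity).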
16. Deep Learning Models - Neural Network, LSTM
Black box model, not interpretable, though highly accurate if provided the right amount of data
● Multiple classes in terms of multiple loan products or output classes
● Softmax activation for deciding among more than 2 classes
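For reference, softmax converts a network's raw output scores into probabilities that sum to 1 across classes. The sketch below is a minimal version; the three loan product names and the scores are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output classes and raw scores from a network's final layer.
products = ["personal_loan", "device_financing", "no_offer"]
probs = softmax([2.0, 1.0, 0.1])
best = products[probs.index(max(probs))]
print(dict(zip(products, probs)), best)
```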
17. Outcomes
● Assess the risk of customers based on telecom data and calculate the probability of default for over 5 Million subscribers (1st Phase)
● Score and preapprove the creditworthiness of over 50-70 Million customers via the telecom model with varying credit score (2nd Phase)
● Most banks have 5-10 million preapproved customers
● Telecom model is performing better than the Bureau based models (for non-telco customers) by a factor of 5
● Eliminates the need for customers to have previous history with lenders
18. Data Science in Lending: Next phase of evolution
Telco data based credit models
Social Media & Online Data based credit models
Utility bills
Financial Data
Other Alternate data
[Holy Grail] Single Unified Model: One model to score them all!
Further Applications
3 Billion unbanked & underserved people across the world
$380 Billion Credit Gap
Social Media Data - FriendlyScore
Telecom Data - Tala, Trusting Social, Seynse
Utility & SMS parser - ZipLoans, LendingKart
Social Media too
https://www.livemint.com/Consumer/zxupEDYD560LJrnoRxcn4L/Mobile-phone-penetration-in-India-set-to-rise-to-8590-by-2.html?utm_source=scroll&utm_medium=referral&utm_campaign=scroll
https://economictimes.indiatimes.com/tech/internet/average-mobile-data-usage-at-11gb-a-month-nokia/articleshow/63032695.cms