Presentation - "Dealing with uncertainty in fintech using AI" by Evgeny Savin.
Evgeny has been a Senior Data Scientist at smava for the last 3.5 years. His work focuses on the design and implementation of the ML engine that helps customers find the best loans from smava's partner banks. Previously, Evgeny worked as a data scientist at HERE and PayPal.
4. A comparison platform for personal loans
1. 1,000-50,000 euros, expanding into larger loans (e.g. mortgages).
2. Financing for purchasing a car, repairing your home, travelling etc.
3. Provided by one of the partner banks
4. Needs to be paid back
○ Customer’s creditworthiness
○ Fundamentally different from a typical ‘buy and forget’ comparison platform
5. Business Model
smava enables borrowers to get the best deal on consumer loans by pooling offers from a few dozen banks on a single website
6. Customer journey
1. www.smava.de: the customer fills in their details online
○ An average customer answers about 50 questions, which takes ~15 mins
2. smava sends API requests to the partner banks and gets responses within ~2 mins
○ The banks are requested as soon as the registration route is finished, and their offers are then displayed
3. The customer provides proof of income and other details, ~7 days
○ Usually scans of income slips, bank statements, etc.
4. The customer gets the money into their bank account or gets a rejection.
10. Problem statement
● Banks have varying requirements and preferences, reflected in their risk management policies
○ Minimise defaults, maximise ROI
● Banks incur costs when assessing leads from smava
● The customer needs to qualify for the loan
○ How do we efficiently match a customer with a bank?
● The customer only cares about the cost of the loan and the ease of getting it
11. Why machine learning?
● Uncertainty - what is the probability that customer x will get accepted by bank y?
● Ignore uncertainty
○ request all banks for each individual customer
○ Banks don’t like waste (money spent on leads that did not convert)
● Address uncertainty with heuristics
○ High income customers - Bank A, low income customers - Bank B
○ Not efficient, needs continuous updating and manual tweaking
○ Could potentially be more harmful to the business than ignoring uncertainty
● Address uncertainty with Machine Learning
○ Use the full set of data features
■ To learn and predict bank answers
○ Retrain models frequently
○ Use predictions for efficient matching (how?!)
● Blend of heuristics and Machine Learning
○ Heuristics were there first and are believed to be useful
○ Some very simple rules can be useful
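The blend of heuristics and Machine Learning listed above could be sketched roughly as follows. This is a minimal illustration only, not smava's method: the income rule, the bank names, the threshold and the `toy_model` stand-in for a trained model are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical customer record; the field name is an assumption.
Customer = Dict[str, float]
Rule = Callable[[Customer], bool]


def rule_min_income(customer: Customer) -> bool:
    """A very simple hard rule: veto customers below an assumed income floor."""
    return customer["monthly_income"] >= 1000.0


def select_banks(
    customer: Customer,
    rules: Dict[str, List[Rule]],
    predict_acceptance: Callable[[Customer, str], float],
    threshold: float = 0.2,
) -> List[str]:
    """Return banks that pass every hard rule and whose predicted
    acceptance probability is at least `threshold`."""
    selected = []
    for bank, bank_rules in rules.items():
        if not all(rule(customer) for rule in bank_rules):
            continue  # heuristic veto: this bank is not even scored
        if predict_acceptance(customer, bank) >= threshold:
            selected.append(bank)
    return selected


def toy_model(customer: Customer, bank: str) -> float:
    """Stand-in for a trained model: fixed, made-up probabilities per bank."""
    return {"Bank_A": 0.65, "Bank_B": 0.10, "Bank_C": 0.40}[bank]


rules = {"Bank_A": [rule_min_income], "Bank_B": [], "Bank_C": [rule_min_income]}
high_income = {"monthly_income": 3200.0}
low_income = {"monthly_income": 800.0}

print(select_banks(high_income, rules, toy_model))  # ['Bank_A', 'Bank_C']
print(select_banks(low_income, rules, toy_model))   # []
```

Keeping the rules as a veto in front of the model means a simple, trusted rule can never be overridden by a prediction, while the model handles everything the rules do not cover.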
12. smava marketplace
● Efficient matching
○ Customers mostly care about the interest rate
○ Banks care about the number of requests they receive versus the number of payouts they make. Banks do not like waste
○ smava’s mission is to help the customer find the best deal on the marketplace
■ smava is concerned with both the interest rate of the offer AND the realisability of the offer (measured by the probability of getting the offer)
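One way to weigh interest rate against realisability is sketched below. The talk does not disclose how smava combines the two, so the `Offer` fields, the `min_p` cut-off and the `max_requests` cap are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Offer:
    bank: str
    expected_rate: float  # assumed expected annual interest rate, in percent
    p_accept: float       # predicted probability that the customer gets the offer


def rank_offers(offers: List[Offer], min_p: float = 0.3, max_requests: int = 2) -> List[Offer]:
    """Drop offers that are unlikely to be realised, then prefer the
    cheapest ones; ties on rate are broken by higher probability."""
    realisable = [o for o in offers if o.p_accept >= min_p]
    realisable.sort(key=lambda o: (o.expected_rate, -o.p_accept))
    return realisable[:max_requests]


offers = [
    Offer("Bank_1", 3.9, 0.05),  # cheapest, but very unlikely to pay out
    Offer("Bank_2", 4.5, 0.60),
    Offer("Bank_3", 5.2, 0.85),
    Offer("Bank_4", 6.0, 0.90),
]

for offer in rank_offers(offers):
    print(offer.bank, offer.expected_rate, offer.p_accept)
```

With these made-up numbers the cheapest offer is excluded because it is unlikely to be realised, and capping the number of requests limits the waste that banks object to.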
14. Smava engineering infrastructure
Diagram: customer data flows from data collection, processing and saving (smava website, databases etc.) to the Machine Learning microservice (predictions, portfolio creation, execution schema). The microservice passes a request execution schema to the Bank API connectivity layer, which exchanges offer requests and responses with Bank_1, Bank_2, ..., Bank_N. Time axis t beneath the stages: ~15 mins, ~1 sec, ~60 sec.
15. Design principles
● Close collaboration with engineering
○ Impact on all major KPIs
● Extensive tests
○ Unit tests
○ Statistical tests
● Automatic retraining - weekly/daily
● Live monitoring
○ input data
○ output data
16. Automatic retraining
● Get data from all sources
● Clean data and prepare it for modelling
● Train models
● Run tests
● Compute predictive accuracy on unseen data
● Return a retraining report
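The retraining steps above can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the synthetic data, the single `income` feature, the threshold "model" and the accuracy floor stand in for the real data sources, models and tests, which the slides do not describe.

```python
import random
from typing import Dict, List

Row = Dict[str, float]


def get_data(n: int = 400, seed: int = 0) -> List[Row]:
    """Stand-in for 'get data from all sources': synthetic rows plus one dirty row."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.uniform(500, 5000)
        accepted = 1.0 if income + rng.gauss(0, 500) > 2000 else 0.0
        rows.append({"income": income, "accepted": accepted})
    rows.append({"income": float("nan"), "accepted": 1.0})
    return rows


def clean(rows: List[Row]) -> List[Row]:
    """Drop rows with a missing income (NaN is the only value not equal to itself)."""
    return [r for r in rows if r["income"] == r["income"]]


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Share of rows where 'income > threshold' agrees with the observed outcome."""
    hits = sum((r["income"] > threshold) == (r["accepted"] == 1.0) for r in rows)
    return hits / len(rows)


def train(rows: List[Row]) -> float:
    """Toy 'model': the income threshold with the best training accuracy."""
    return max((r["income"] for r in rows), key=lambda t: accuracy(t, rows))


def retrain(min_accuracy: float = 0.7) -> Dict[str, object]:
    rows = clean(get_data())
    split = int(0.8 * len(rows))
    train_rows, unseen_rows = rows[:split], rows[split:]  # unseen rows are never trained on
    model = train(train_rows)
    holdout_accuracy = accuracy(model, unseen_rows)
    # 'Run tests': a sanity check on the model plus a statistical floor.
    tests_passed = 500 <= model <= 5000 and holdout_accuracy >= min_accuracy
    return {
        "n_train": len(train_rows),
        "n_unseen": len(unseen_rows),
        "threshold": model,
        "holdout_accuracy": holdout_accuracy,
        "tests_passed": tests_passed,
    }


report = retrain()
print(report)
```

Returning a plain report dictionary makes it straightforward to gate deployment on `tests_passed` and to archive one report per weekly or daily run.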
17. Monitoring
● Real-time monitoring of model predictive performance - are we predicting well?
● Anomaly detection - has anything in the outside world changed?
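Both monitoring questions can be illustrated with two small metrics. The choice of the Brier score for predictive performance and of a z-score on a feature mean for anomaly detection is an assumption for this sketch; the talk does not say which metrics smava uses, and the numbers are made up.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; always predicting 0.5 scores 0.25."""
    return mean((p - y) ** 2 for p, y in zip(probs, outcomes))


def mean_shift_z(reference: Sequence[float], live: Sequence[float]) -> float:
    """z-score of the live mean against the reference data. A large absolute
    value suggests the incoming data has drifted."""
    standard_error = stdev(reference) / math.sqrt(len(live))
    return (mean(live) - mean(reference)) / standard_error


# Are we predicting well? Compare recent predictions with observed bank answers.
recent_probs = [0.9, 0.2, 0.7, 0.1]
recent_outcomes = [1, 0, 1, 0]
print(brier_score(recent_probs, recent_outcomes))

# Has anything in the outside world changed? Compare live inputs with training data.
training_income = [2000, 2100, 1900, 2050, 1950, 2000, 2100, 1900]
live_income = [3000, 3100, 2900, 3050]
z = mean_shift_z(training_income, live_income)
print("input drift alert" if abs(z) > 3 else "inputs look stable")
```

The same drift check can be applied to the model's output probabilities, which covers both the input data and output data monitoring mentioned under the design principles.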
18. Tech stack
● Python for data pipelines
● R for modelling and production code
● Shiny for monitoring
● C for some internal packages
19. Predicting and using the predictions
● XGBoost for predicting probabilities
● Using probabilities to guide decision making
● Intellectual property
21. Model deployment
● AWS SageMaker: a platform for deploying ML models in the cloud
○ Competitors: Azure AI, Google Cloud AI
● Fully managed
○ Autoscaling, health checks, load balancing
○ Allows changing the type and number of instances with no downtime
● Can deploy a custom Docker image
● Next iteration: AWS Lambda
22. Model deployment
● Dockerized REST API (using Nginx, Gunicorn and Flask)
● SageMaker (SM) expects the server to respond to /invocations and /ping on port 8080
○ /invocations: accepts POST requests and returns a prediction
○ /ping: accepts GET requests and returns 200 (for health checks)
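A minimal Flask application exposing the two routes might look like the sketch below. The `predict` placeholder, the JSON payload shape and the `probability` response field are assumptions; the slides only state which routes and port the server must answer on.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features: dict) -> float:
    """Placeholder for the real model call; returns a fixed, made-up probability."""
    return 0.5


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: an empty 200 response signals that the container is up.
    return "", 200


@app.route("/invocations", methods=["POST"])
def invocations():
    # Parse the body as JSON regardless of the Content-Type header.
    payload = request.get_json(force=True, silent=True)
    if payload is None:
        return jsonify(error="expected a JSON body"), 400
    return jsonify(probability=predict(payload))


# Exercise both routes in-process, without starting a server.
with app.test_client() as client:
    print(client.get("/ping").status_code)
    print(client.post("/invocations", json={"monthly_income": 3000}).get_json())
```

In the setup described on the slide, Gunicorn would serve `app` behind Nginx inside the Docker image, listening on port 8080, rather than Flask's built-in development server.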
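24. Blue-Green deployment
● Create a new endpoint and put it in shadow mode for a day
● Live monitoring of the shadow endpoint
Diagram: the input is sent to both the Prod and the Shadow endpoints; each produces an output, with S3 and Further Processing as the destinations.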
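The shadow idea can be sketched in a few lines: every input is scored by both models, only the production output is returned, and both outputs are recorded for later comparison. The in-memory `records` list stands in for durable storage such as S3, and the two model functions are made-up placeholders; none of this is taken from smava's implementation.

```python
import json
from typing import Callable, Dict, List, Optional

Features = Dict[str, float]
Predictor = Callable[[Features], float]


def with_shadow(prod: Predictor, shadow: Predictor, sink: List[str]) -> Predictor:
    """Wrap `prod` so that `shadow` sees the same inputs without ever
    influencing the value returned to the caller."""

    def handle(features: Features) -> float:
        prod_out = prod(features)
        shadow_out: Optional[float] = None
        error: Optional[str] = None
        try:
            shadow_out = shadow(features)
        except Exception as exc:  # a failing shadow must not affect production
            error = repr(exc)
        sink.append(json.dumps(
            {"input": features, "prod": prod_out, "shadow": shadow_out, "error": error}
        ))
        return prod_out

    return handle


def current_model(features: Features) -> float:
    return 0.40  # made-up production prediction


def candidate_model(features: Features) -> float:
    return 0.45  # made-up prediction from the newly trained model


records: List[str] = []
endpoint = with_shadow(current_model, candidate_model, records)
result = endpoint({"monthly_income": 3000.0})
print(result)
print(records[0])
```

Comparing the logged `prod` and `shadow` values over the shadow day gives a direct view of how the new endpoint would have behaved before it takes any live traffic.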
25. Application Engine microservice
● The DS team fully owns the Application Engine microservice
○ Complete ownership of infrastructure
○ DS outputs are provided via a REST API
27. Summary
● The ML component is the very core of smava's production system
● Opportunities to significantly impact the business in real time
○ Big potential for improvement, but also for causing damage
● Working hand in hand with engineering, product owners and wider stakeholders in the company
○ Advocacy
○ Explanations
○ Conflicts of interest
28. Thank you for your attention
Evgeny Savin
Senior Data Scientist
evgeny.savin@smava.de