1. Digital Origin introduction
2. A recurrent problem when moving to production
3. H2O
4. Digital Origin pipeline
5. Sometimes it is harder than usual to automate: Rejection Inference
Digital Origin – Introduction
Digital Origin is a leading Spanish fintech company focused on technology-enabled consumer finance.
Founded in 2011. €15 million Series A round in 2015. 80 employees with offices in Barcelona and Madrid.
Uniquely positioned to address the mainstream consumer finance market with a wide portfolio of instant, real-time products and a fully online process.
Over €150 million lent to date.
¡QuéBueno! was released in 2011: consumer finance microlending.
Paga+Tarde was released in 2015: consumer finance for eCommerce and in-store purchases.
Figure: Credit Risk Engine (CRE) overview. Recoverable labels:
• Areas: Fraud, Risk, Business, Monitoring
• Concerns: massive fraud, identity fraud, not willing to pay, default risk, product and UX, AR vs DR tradeoff, evaluation, control & alerts, marketing
• Inputs and models: credit cards / returnings, QB, device fingerprinting, user request, graph relationships model, DNI images models, geo fraud model, basket model, behavioural model, identity fraud model, risk model, uplift models, CLTV models
• Operations: design & models params, configuration & parameter tuning, reporting, alerts
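The "AR vs DR tradeoff" in the figure can be illustrated with a small sketch. The slide does not expand the acronyms, so it is assumed here that AR is the acceptance rate and DR is the default rate among accepted applications. It is also assumed that the risk model outputs a score where higher means riskier. The scores and outcomes are made-up toy data, not Digital Origin figures.

```python
from typing import List, Tuple


def ar_dr_curve(scores: List[float], defaulted: List[bool],
                cutoffs: List[float]) -> List[Tuple[float, float, float]]:
    """For each cutoff, accept applicants whose risk score is below it and
    return (cutoff, acceptance_rate, default_rate_among_accepted)."""
    if len(scores) != len(defaulted):
        raise ValueError("scores and defaulted must have the same length")
    n = len(scores)
    curve = []
    for cutoff in cutoffs:
        # Outcomes of the applicants that this cutoff would accept.
        accepted = [d for s, d in zip(scores, defaulted) if s < cutoff]
        ar = len(accepted) / n
        # Default rate is undefined when nobody is accepted; report 0.0 then.
        dr = sum(accepted) / len(accepted) if accepted else 0.0
        curve.append((cutoff, ar, dr))
    return curve


# Hypothetical toy data: risk scores in [0, 1] and observed defaults.
scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.70, 0.85, 0.90, 0.95]
defaulted = [False, False, False, True, False, False, True, True, False, True]

for cutoff, ar, dr in ar_dr_curve(scores, defaulted, [0.25, 0.5, 0.75, 1.0]):
    print(f"cutoff={cutoff:.2f}  AR={ar:.0%}  DR={dr:.1%}")
```

In this toy data, raising the cutoff accepts more applicants but also raises the default rate among them. Choosing the operating point on that curve is the business decision the figure refers to.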
A recurrent problem when moving to production
Requirements differ between the development/design phase and production.
R&D Environment (Data Scientist profile)
• Interactive mode
• Friendly for discovery
• Fast development language
• Easy to save state and continue later on
• Access to mathematical libraries
• …
Production Environment (Engineer profile)
• Scalable architecture
• Error handling
• High availability
• Load balancing
• Reliable and stable
• …
Using a different language in each environment means doing the work twice or more.
Solution A: Python is well suited to both needs.
Solution B: an API approach to gain some flexibility.
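Below is a minimal sketch of Solution B. It assumes the model is wrapped behind an HTTP endpoint that accepts and returns JSON, so production code depends only on the API contract and not on the language the model was written in. Only the Python standard library is used. The `/score` path, the feature names and the logistic-regression weights are hypothetical placeholders, not Digital Origin's actual service.

```python
import json
import math
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model: logistic regression with made-up weights and feature
# names. In practice this is whatever model the data scientists trained.
WEIGHTS = {"age": -0.03, "amount": 0.002, "previous_loans": -0.4}
BIAS = -1.0


def score(features: dict) -> float:
    """Return a risk score in (0, 1); missing features default to 0."""
    z = BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


class ScoringHandler(BaseHTTPRequestHandler):
    """Exposes score() as POST /score with a JSON object of features."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown endpoint")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"score": score(features)}).encode()
        except (ValueError, TypeError, AttributeError, OverflowError) as exc:
            self.send_error(400, f"bad request: {exc}")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def start_server(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ScoringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_score(port: int, features: dict) -> float:
    """Client side: how a production service would call the model."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/score",
        data=json.dumps(features).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["score"]


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    print(post_score(port, {"age": 30, "amount": 500, "previous_loans": 2}))
    server.shutdown()
    server.server_close()
```

The JSON payload is the only contract between the two profiles. The model behind `/score` can be retrained or rewritten without touching the calling code, as long as the request and response shapes stay the same.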
H2O - Architectures
• Open source API for Machine Learning
• Massively Scalable Big Data Analysis
• Easy-to-use WebUI (Jupyter – Python notebook)
• Familiar Interfaces: R, Python, Scala, Java, API, …
• Real-time Data Scoring
• Rapidly deploy models to production via Plain Old Java Objects (POJO) or Model Objects, Optimized (MOJO)
• Algorithms
  • GLM
  • Random Forest
  • GBM
  • “Deep Learning”
  • Deep Water: TensorFlow, MXNet, Caffe, … (not yet)
  • …
https://www.h2o.ai/h2o/
Development vs. Production
Figure: in development, on the Hadoop ecosystem (Node 1 … Node N): Extract → Transform → Train models → Export POJO. In production: Transform → Scoring.
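The development/production split above has two halves. Development runs Extract, Transform, Train models and Export POJO; production runs Transform and Scoring. That split can be sketched without H2O.

In this stdlib-only sketch, a JSON file stands in for the exported POJO. A single `transform` function is shared by both sides, so features are computed identically in training and in scoring. The feature names, the toy training data and the plain gradient-descent logistic regression are all hypothetical, not the deck's actual models.

```python
import json
import math
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical toy training data: loan amount, number of previous loans and
# whether the loan defaulted (1) or not (0).
ROWS = [
    {"amount": 100, "previous_loans": 3},
    {"amount": 200, "previous_loans": 2},
    {"amount": 300, "previous_loans": 4},
    {"amount": 1500, "previous_loans": 0},
    {"amount": 2000, "previous_loans": 1},
    {"amount": 2500, "previous_loans": 0},
]
LABELS = [0, 0, 0, 1, 1, 1]


def transform(raw: Dict[str, float]) -> List[float]:
    """Shared feature transform: intercept, log amount, previous loans.
    Used by both training and scoring so the features cannot drift apart."""
    return [1.0, math.log1p(raw["amount"]), float(raw["previous_loans"])]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# ---- Development: Transform -> Train -> Export ----------------------------
def train(rows, labels, lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Logistic regression fitted with plain batch gradient descent."""
    X = [transform(r) for r in rows]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def export_model(weights: List[float], path: Path) -> None:
    """Stand-in for 'Export POJO': persist everything scoring needs."""
    path.write_text(json.dumps({"weights": weights}))


# ---- Production: Load -> Transform -> Score -------------------------------
def load_model(path: Path) -> List[float]:
    return json.loads(path.read_text())["weights"]


def score(weights: List[float], raw: Dict[str, float]) -> float:
    return sigmoid(sum(w * x for w, x in zip(weights, transform(raw))))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "model.json"
        export_model(train(ROWS, LABELS), artifact)  # development side
        weights = load_model(artifact)               # production side
        print(score(weights, {"amount": 2200, "previous_loans": 0}))
        print(score(weights, {"amount": 150, "previous_loans": 3}))
```

With H2O, the exported artifact would be the generated POJO or MOJO rather than a JSON file. The point illustrated is that production reuses the training-time transform instead of reimplementing it.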
Digital Origin – Current Pipeline
Figure: Digital Origin current pipeline. Recoverable labels:
• Production: Production Databases, Credit Risk Engine (CRE) with Front End and Back End
• Data Analytics activity: Replica Databases, Analytics and Reporting databases, query templates ({{mustache}}), Tools, Corporate Libraries, batch and streaming processing
• Data Science: daily activity and recurrent processes, Reporting, Alerts System, services to other departments, CRE development (new models), CRE parameter tuning (new configuration)
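The pipeline mentions query templates rendered with {{mustache}} and run against the replica databases. Below is a minimal sketch of that idea, assuming only simple `{{name}}` placeholders. It is a tiny stdlib stand-in, not a full Mustache implementation (no sections, partials or escaping rules). It uses an in-memory SQLite database in place of the real replicas.

Placeholders are restricted to plain SQL identifiers such as table names. Values like dates are passed as bound parameters, so templating does not open the door to SQL injection. The report, table and column names are hypothetical.

```python
import re
import sqlite3
from typing import Dict, List, Tuple

# {{ name }} placeholders only; Mustache sections and partials are not handled.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render(template: str, context: Dict[str, str]) -> str:
    """Fill {{name}} placeholders; values must be plain SQL identifiers."""
    def substitute(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"missing template variable: {name}")
        value = context[name]
        if not IDENTIFIER.fullmatch(value):
            raise ValueError(f"{name!r} must be a plain SQL identifier, got {value!r}")
        return value
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical recurrent report: loans per status since a given date.
# The table name is templated; the date is a bound parameter (?).
REPORT_TEMPLATE = """
SELECT status, COUNT(*) AS n
FROM {{ loans_table }}
WHERE created_at >= ?
GROUP BY status
ORDER BY status
"""


def run_report(conn: sqlite3.Connection, table: str, since: str) -> List[Tuple[str, int]]:
    sql = render(REPORT_TEMPLATE, {"loans_table": table})
    return conn.execute(sql, (since,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for a replica database, with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans_es (id INTEGER, status TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO loans_es VALUES (?, ?, ?)",
        [(1, "paid", "2017-02-01"), (2, "late", "2017-03-10"),
         (3, "paid", "2016-12-20"), (4, "paid", "2017-05-05")],
    )
    return conn


if __name__ == "__main__":
    print(run_report(demo_connection(), "loans_es", "2017-01-01"))
```

Keeping the query text in templates lets the same recurrent report run against different replicas or tables. Restricting what a placeholder may expand to keeps those templates safe to share through corporate libraries.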