Pipelines for model deployment
2017-04-25
1. Digital Origin introduction
2. A recurrent problem moving to production
3. H2O
4. Digital Origin pipeline
5. Sometimes it is harder than usual to automate: Rejection Inference
Digital Origin – Introduction
Digital Origin is a leading Spanish fintech company focused on technology-enabled consumer finance.
Founded in 2011. €15 million A-round in 2015. 80 employees with offices in Barcelona and Madrid.
Uniquely positioned to address the mainstream consumer finance market with a wide portfolio of instant, real-time products and a fully online process.
Over €150 million lent to date.
¡QuéBueno! was released in 2011: consumer finance microlending.
Paga+Tarde was released in 2015: consumer finance for eCommerce and in-store purchases.
[Figure: Credit Risk Engine (CRE) overview, organised into four areas: Fraud, Risk, Business and Monitoring. Elements shown: massive fraud, identity fraud, not willing to pay, default risk, product and UX, AR vs DR tradeoff, evaluation, control & alerts, marketing, credit cards / returnings; a user request, QB and device fingerprinting; graph relationships model, DNI images models, geo fraud model, basket model, behavioural model, identity fraud model, risk model, uplift models and CLTV models; design & models params, configuration & parameter tuning, reporting and alerts.]
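The figure lists an "AR vs DR tradeoff" next to configuration and parameter tuning. Assuming AR and DR stand for acceptance rate and default rate (the slide does not expand them), the tradeoff can be sketched as a sweep over a risk-score cutoff on historical applications with known outcomes: a looser cutoff raises the acceptance rate and, typically, the default rate. Outcomes are only observed for applicants who were accepted in the past, which is the gap rejection inference tries to fill. The function names, the higher-is-riskier score convention and the toy data are hypothetical.

```python
from typing import List, Tuple


def ar_dr(scores: List[float], defaulted: List[bool], cutoff: float) -> Tuple[float, float]:
    """Acceptance rate and default rate when accepting every applicant whose
    risk score is below `cutoff` (assumption: higher score = riskier)."""
    if not scores or len(scores) != len(defaulted):
        raise ValueError("scores and defaulted must be non-empty and equally long")
    accepted = [d for s, d in zip(scores, defaulted) if s < cutoff]
    acceptance_rate = len(accepted) / len(scores)
    # Default rate is measured only among the accepted applicants.
    default_rate = sum(accepted) / len(accepted) if accepted else 0.0
    return acceptance_rate, default_rate


def tradeoff_curve(scores: List[float], defaulted: List[bool],
                   cutoffs: List[float]) -> List[Tuple[float, float, float]]:
    """(cutoff, AR, DR) for each candidate cutoff."""
    return [(c, *ar_dr(scores, defaulted, c)) for c in cutoffs]


if __name__ == "__main__":
    # Hypothetical toy data: risk scores and observed defaults.
    scores = [0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 0.95]
    defaulted = [False, False, False, True, False, True, True, True]
    for cutoff, ar, dr in tradeoff_curve(scores, defaulted, [0.3, 0.6, 1.0]):
        print(f"cutoff={cutoff:.1f}  AR={ar:.2f}  DR={dr:.2f}")
```

Choosing the operating point on this curve is a business decision; the code only makes the two rates visible side by side.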
A recurrent problem moving to production
R&D environment (data scientist profile):
• Interactive mode
• Friendly for discovery
• Fast development language
• Easy to save a state and continue later on
• Access to mathematical libraries
• …
Production environment (engineer profile):
• Scalable architecture
• Error handling
• High availability
• Load balancing
• Reliable and stable
• …
The requirements differ between the development/design phase and production.
Using a different language in each environment implies twice the work or more.
Solution A: Python is well suited to both needs.
Solution B: an API approach to get some flexibility.
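One way to read the API approach: the model runs behind an HTTP/JSON endpoint, so the production side only needs to speak HTTP, whatever language the model was built in. Below is a minimal sketch using only the Python standard library; the `/score` path, the JSON fields and the hard-coded logistic model are hypothetical, chosen for illustration.

```python
import json
import math
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical model: a logistic regression with made-up coefficients.
WEIGHTS = {"income": -0.00005, "num_loans": 0.4}
BIAS = -1.0


def score(features: dict) -> float:
    """Probability of default for one applicant."""
    z = BIAS + sum(w * float(features[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown endpoint")
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"probability": score(features)}).encode()
        except (ValueError, KeyError, TypeError) as exc:
            # Bad JSON, a missing feature or a non-numeric value: reject the request.
            self.send_error(400, "invalid request", str(exc))
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ScoringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client in any language then sends `{"income": 20000, "num_loans": 2}` to `/score` and reads back `{"probability": ...}`. Production concerns from the list above (load balancing, high availability) would sit in front of such a service rather than inside it.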
H2O - Architectures
• Open-source API for machine learning
• Massively scalable big data analysis
• Easy-to-use web UI (H2O Flow, a notebook-style interface similar to a Jupyter/Python notebook)
• Familiar interfaces: R, Python, Scala, Java, REST API, …
• Real-time data scoring
• Rapid deployment of models to production via POJOs (Plain Old Java Objects) or MOJOs (Model ObJects, Optimized)
• Algorithms:
  • GLM
  • Random Forest
  • GBM
  • Deep Learning (multi-layer feedforward neural networks)
  • Deep Water: TensorFlow, MXNet, Caffe, … (not yet)
  • …
https://www.h2o.ai/h2o/
H2O - Architectures
[Figure: H2O deployment architectures: local, cluster, cluster + HDFS, and cluster + Spark (Node 1 … Node N).]
H2O - Performance
Reproducible benchmark: https://github.com/szilard/benchm-ml
[Figure: benchmark results for GLM, RF, GBM (setup A) and GBM (setup B).]
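Digital Origin pipeline – Development and production
[Figure: In development, a cluster (Node 1 … Node N) in the Hadoop ecosystem extracts and transforms data, trains models and exports them as POJOs. In production, incoming data is transformed and scored with the exported model.]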
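The core idea of the figure is that training happens in development and only a self-contained artifact (a POJO in Digital Origin's case) crosses into production, where the data is transformed again and scored. The sketch below mimics that flow in plain Python: one `transform` function shared by both sides, a tiny logistic regression trained by batch gradient descent, and a JSON file standing in for the exported POJO. All names, features and the JSON format are hypothetical; this is not H2O's export mechanism.

```python
import json
import math
from typing import Dict, List

FEATURES = ["income_10k", "num_loans"]  # hypothetical feature names


def transform(raw: Dict[str, float]) -> List[float]:
    """Shared feature engineering: identical in training and in scoring."""
    return [raw["income"] / 10000.0, float(raw["num_loans"])]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(weights: List[float], bias: float, x: List[float]) -> float:
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train(rows: List[Dict[str, float]], labels: List[int],
          lr: float = 0.1, epochs: int = 2000) -> Dict:
    """Development side: fit a logistic regression by batch gradient descent."""
    X = [transform(r) for r in rows]
    n = len(X)
    w = [0.0] * len(FEATURES)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(X, labels):
            err = sigmoid(linear(w, b, x)) - y  # d(log loss)/dz
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return {"features": FEATURES, "weights": w, "bias": b}


def export_model(model: Dict, path: str) -> None:
    """Write a standalone artifact (a stand-in for the exported POJO)."""
    with open(path, "w") as f:
        json.dump(model, f)


def load_model(path: str) -> Dict:
    with open(path) as f:
        return json.load(f)


def score(model: Dict, raw: Dict[str, float]) -> float:
    """Production side: same transform, then the exported parameters."""
    return sigmoid(linear(model["weights"], model["bias"], transform(raw)))
```

The shared `transform` is the piece most often re-implemented by hand on the production side; keeping a single definition avoids the two environments silently computing different features.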
Digital Origin – Current pipeline
[Figure: the current pipeline, split into Production and Data Analytics activity. Components shown: Production Databases; Replica Databases fed by streaming and batch; Analytics and Reporting databases; Tools and Corporate Libraries, including {{mustache}} query templates; Data Science (daily activity and recurrent processes); Reporting; Alerts System; services to other departments; CRE development and CRE parameter tuning; and the Credit Risk Engine (CRE) Front End and Back End, with a new model and a new config.]
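The figure mentions {{mustache}} query templates among the corporate tools: SQL is kept as a template and the parameters are filled in per report or recurrent process. As a rough stand-in for a real Mustache library, the sketch below only handles simple `{{name}}` variable tags (no sections, partials or HTML escaping from the Mustache spec); the template, schema and column names are hypothetical. Plain text substitution does not protect against SQL injection, so values should be trusted literals or, better, bound as query parameters by the database driver.

```python
import re
from typing import Mapping

TAG = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: Mapping[str, object]) -> str:
    """Replace every {{name}} tag with str(context[name]).
    Unlike full Mustache, a missing key raises KeyError instead of rendering ''."""
    def substitute(match):
        return str(context[match.group(1)])
    return TAG.sub(substitute, template)


# Hypothetical daily-report query template.
DAILY_LOANS = """
SELECT product, COUNT(*) AS loans, SUM(amount) AS volume
FROM {{schema}}.loans
WHERE created_at >= '{{start_date}}' AND created_at < '{{end_date}}'
GROUP BY product
"""
```

Raising on a missing key is a deliberate departure from Mustache's default: in a recurrent batch job, an unrendered parameter is better surfaced as an error than as an empty string in the SQL.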
THANKS!
Questions?
ralabern@digitalorigin.com
markus@digitalorigin.com
