Model Factory @ING Bank
Presentation to DataWorks Summit - 2019
2019-03-20, Dor Kedem
• Extensive software development career since 2002.
• Working on machine learning research & data science applications
since 2010.
• Today, a lead data scientist and product owner @ ING Bank in
Amsterdam.
Grab me (or reach me via LinkedIn) during these couple of days to talk about:
• CI/CD solutions for a data science project lifecycle.
• Impact-driven data science (from POCs to MVPs).
• Modelling techniques and machine learning applications.
• Transitioning from software development or IT roles to data science.
• Boardgames and 3D puzzles.
A bit about me
Linkedin.com/in/KedemDor
Dor.Kedem (at) ing.com
Image credit: my wife, adorageek.com
HAPPY BIRTHDAY!
ING Bank at a glance
• Active in more than 40 countries
• 54,000+ employees in ING Group
• 38M retail customers and 12.5M primary customers in 4Q18
• Net Promoter Scores: #1 in 6 out of 13 retail countries
Source: https://www.ing.com/About-us/Profile/Key-figures.htm
Challenges in European Banking Scene
• Historically low interest rates = less revenue from lending (chart: historical LIBOR rates, grey = recession; source: macrotrends.net)
• Regulations lead to a more transparent and open banking environment
• The fintech industry is looking into innovative ways to fill traditional banking roles (source: https://hollandfintech.com/)
How does a bank differentiate itself from the rest?
Sources: https://www.forbes.com/sites/kurtbadenhausen/2019/03/04/the-worlds-best-banks-ing-and-citibank-lead-the-way/
https://www.ing.com/About-us
Our purpose: Empowering people to stay a step ahead in life and in business
Our strategic priorities
Analytics Efforts in ING
"Data is the language of the future. If you don’t speak it yet, we’ll help you master it," says Görkem Köseoğlu, ING’s chief analytics officer.
• Artificial intelligence: currently, ING employs around 80 data scientists, working on various AI projects.
• Analytics skillset: thousands of employees engage with analytical tools and insights.
Our ambition: all customer interactions driven by analytics
• One-to-One Analytics: maximising the number of analytics-driven service and sales interactions.
• Data > insight > action is in ING’s DNA.
• Democratize big data usage across ING.
• Users of our services are extremely happy.
Data Analytics for customer interactions (NL+BE)
Customer Journey Experts (CJE - Christina)
• How many? Over 300 (outside 1:1)
• What do we know? Banking, marketing theory, customer engagement, message framing
• What do we create? Product specification, online & offline content, customer engagement
Data Analysts (DA - Arjen)
• How many? Over 100
• What do we know? BI tools (SAS, IBM Cognos), data privacy, SQL
• What do we create? Reports, dashboards, A/B testing
Data Scientists (DS - Samir)
• How many? Roughly 20
• What do we know? Statistics & ML, data privacy, programming (e.g. Python, R, Scala)
• What do we create? Statistical models, data products
Data Engineers (DE - Eleanor)
• How many? Roughly 15
• What do we know? Big data technologies, CI/CD solutions, security & compliance
• What do we create? ETL systems, data lake, model hosting
The need for a model factory
Example case: Credit Card Acquisition
For Black Friday (Nov 23rd), a customer engagement business unit wants to engage eligible customers (via NBA + emails) with the option to acquire a new credit card. We have two types of offers: regular credit cards and platinum credit cards.
How do we find whom to contact with these offers? Two approaches:
• DS - Samir: Build a likelihood model based on past behavior and engagements, then rank customers according to this model.
• DA - Arjen: Plot customer engagements across different demographics, then come up with business rules based on personal understanding. This approach is used in the vast majority of cases.
CJE - Christina
Problem with manual selection of the customers
Enticements are being offered to the wrong customers:
• Customer disengagement (unsubscribes, ad blindness).
• Wasted work by CJEs and DAs.
• Loss of potential revenue.
It takes a lot of time to make customer selections:
• No structured way to figure out the target population.
• No structured learning from our past campaigns.
• Only a binary selection (to send / not to send), no ranking.
Not leveraging the full potential of our data:
• Not taking into account a large set of features.
• Not taking into account engagement with past offerings.
• Not taking into account engagement with other products.
• You only target what you can code.
One of the added values of models: ranking customers
(Figure: all clients unordered versus ranked by relevance, with purchases and non-purchases marked; a selection threshold picks the top 10%.)
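Ranking and threshold selection can be sketched in a few lines of Python. This is an illustration under assumptions: the scores stand in for any model's predicted likelihoods, and the customer IDs and threshold are hypothetical.

```python
from typing import Dict, List


def rank_customers(scores: Dict[str, float]) -> List[str]:
    """Return customer IDs ordered from most to least likely to convert."""
    # Sort by score descending; ties are broken by customer ID for determinism.
    return [cid for cid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def select_top(scores: Dict[str, float], top_fraction: float) -> List[str]:
    """Select the top `top_fraction` of ranked customers (e.g. 0.10 for the top 10%)."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = rank_customers(scores)
    # Always select at least one customer, even for tiny populations.
    cutoff = max(1, round(len(ranked) * top_fraction))
    return ranked[:cutoff]
```

Unlike a binary send / don't-send rule, the ranking lets the business choose the threshold afterwards, for example based on campaign capacity.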
Learning from our past & present…
Here are some responses we gathered (from CJEs such as Christina and DAs such as Arjen, on both organizational and personal levels) when asking about the current way of working and the gaps relative to best practices:
“I don’t have time for experiments & evaluation. We have a schedule and need to create the next campaigns”
“I get personal satisfaction from weighing in my opinion”
“I don’t see how testing everything to death leads to better results”
“Management is not critical enough about measuring our performance”
“I can’t get clear insights from my DA / CJE colleague”
“I know my customers”
“There’s a lack of clear guidelines and standards across ING tribes”
“I need to be able to contact more customers, even if models disagree”
How should we interpret the interviews?
Fears need to be addressed early on (e.g. fear of measurement, of loss of control, of automation).
Focusing on empowerment before the revolution:
• Helping DAs and CJEs make better decisions, not making all decisions for them.
• Creating a direct link: standardization → gain in personal efficiency.
• Incorporating customers in our development squad.
Resource: Check out ING PACE: Evidence-based design-driven lean approach.
(Figure: PACE phases and experiment loop)
Our approach – Model Factory
Democratizing model building: enabling DAs to create models for gaining customer insights.
Accelerating best practices: making it easy & fast to be effective in customer selections.
• Model building process “built-in”: tell us “what” you want, we take care of the “how”.
• Evaluation “built-in”: build a model → get a free model & campaign evaluation!
• Compliance “built-in”: GDPR, archiving, legal, commercial pressure, risk; we’ve got you covered.
Our Objective
Stakeholders: CJE - Christina, DA - Arjen, DS - Samir, DE - Eleanor, Customer - Claire, and ING Bank.
Benefits across these stakeholders: saves time, better engagement, making large-scale impact, understanding the customer better, growing in skills, meeting objectives, and more relevant offerings.
Model Factory
Building customer models without reinventing the model building process
Building Blocks
Model Recipe → Model Building Process (building blocks) → Scoring Model, f( ) = y
The scoring model is then used for:
• Scoring eligible customers
• Feeding scores to ING processes
• Creating reports in BI tools for ING business units
Somewhat similar approaches: Uber’s Ludwig (open source), Airbnb’s Bighead & KPN’s model factory
Model Recipe
Mandatory ingredients:
• Business objective: selection from acquisition, deepsell, retention, customer journey.
• Business objective specification: based on the objective, for example, which product to acquire?
• Features to include / exclude: selection from a list, based on domain expertise.
• Customers to include / exclude: an SQL “where clause”, based on domain expertise.
Optional ingredients (with defaults):
• Time specification: how long it takes to acquire, and how long before the customer makes a decision.
• Modelling techniques: for advanced users / data scientists.
The model specification is translated into a 10-15 line JSON file, filled in by a DA.
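A recipe loader might look roughly like the sketch below: parse the JSON, reject recipes missing a mandatory ingredient, and fill optional ingredients with defaults. The talk does not publish the recipe schema, so every field name, objective key and default value here is hypothetical.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical schema: the real recipe field names are not given in the talk.
MANDATORY = ("business_objective", "objective_spec", "features", "customers_where")
OBJECTIVES = {"acquisition", "deepsell", "retention", "customer_journey"}
# Hypothetical defaults for the optional ingredients.
DEFAULTS: Dict[str, Any] = {
    "time_spec": {"acquisition_window_months": 3, "decision_lag_months": 1},
    "modelling": {"classifiers": ["logistic_regression", "random_forest"]},
}


def load_recipe(text: str) -> Dict[str, Any]:
    """Parse a JSON recipe, check mandatory ingredients and fill optional defaults."""
    recipe = json.loads(text)
    missing = [key for key in MANDATORY if key not in recipe]
    if missing:
        raise ValueError(f"missing mandatory ingredients: {missing}")
    if recipe["business_objective"] not in OBJECTIVES:
        raise ValueError(f"unknown business objective: {recipe['business_objective']}")
    for key, value in DEFAULTS.items():
        # Deep-copy so callers cannot accidentally mutate the shared defaults.
        recipe.setdefault(key, copy.deepcopy(value))
    return recipe
```

A DA-filled recipe could then be as short as `{"business_objective": "acquisition", "objective_spec": {"product": "platinum_credit_card"}, "features": {"exclude": ["f_12"]}, "customers_where": "age >= 18"}`, with everything else coming from defaults.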
Building Blocks
Available to all models built with a recipe specification:
• Analytics features extraction
• Machine learning monitoring processes
• Target templates (e.g. acquisition, deepsell)
• Classifiers
• Evaluators
• Hyperparameter / model selection (AutoML)
• Fairness & bias reduction
• Data-set creators
• Uplift measurement
• Storage management
• Scheduling
• Hosting
• GDPR applications
• Interaction with ING services
Building blocks: concrete examples
Building Blocks Example (1): Data Sources
Creating the model feature sources
Flow: data dumps & streams from ING sources → Data Lake → Structured Data Sources (Clients ~80, Products ~600, Engagements ~300) → Analytics Features Table(s)
• Data engineers (DE - Eleanor) build the ETL processes to create the data sources.
• Data scientists & analysts (DS - Samir, DA - Arjen) build an analytics repository from the data sources.
• Features in the table are GDPR validated.
Built on top of:
• IBM PureData for Analytics (PDA)
• SAS Enterprise Global
Building Blocks Example (1): Data Sources (continued)
Transferring the data to our model building environment, from IBM PureData for Analytics to the Hortonworks Data Platform:
1. Files are extracted from PDA’s tables to HDFS via Apache Sqoop (diffs / full), deployed with Ansible and scheduled by IBM Workload Scheduler.
2. Hive tables’ metadata is created / updated.
Managing access via Apache Ranger on both levels:
• HDFS policies: hdfs://ingestion/raw
• Hive policies on specific tables and model results, e.g.:
  • hdfs://access/BEL/models
  • hdfs://access/NED/models
Users & permissions are synchronized from LDAP (ldap_user_sync, running in a container).
Building Blocks Example (2): Data Sets Creators
Some tips for building datasets:
• Selecting different customers at each timestamp → generalizing to new customers.
• Arranging the data set in time-series order → generalizing better for forecasting.
Time series cross validator, used for picking the best hyper-parameters:
(Figure: expanding training windows that all start in Jan ‘17, each followed by a later validation month (Mar ‘18, May ‘18); the final model is trained on Jan ‘17 – Jul ‘18 and tested on Dec ‘18.)
Legend:
• Train: used to fit the model.
• Valid: used to learn hyper-parameters.
• Test: used to evaluate and to pick the best model.
Useful resource - Timothy Lin’s Creating a Custom Cross-Validation Function in PySpark
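Both dataset tips can be sketched in plain Python. The fold generator produces expanding training windows, each validated on a later month. The helper assigns every customer to exactly one snapshot via a stable hash. The step and gap sizes, and the use of SHA-256 hashing for customer assignment, are illustrative assumptions, not ING's settings or the PySpark cross-validator API.

```python
import hashlib
from datetime import date
from typing import Iterator, Tuple


def add_months(d: date, months: int) -> date:
    """First day of the month that lies `months` months after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def expanding_window_folds(start: date, first_train_end: date, n_folds: int,
                           step_months: int = 2, gap_months: int = 2
                           ) -> Iterator[Tuple[date, date, date]]:
    """Yield (train_start, train_end, valid_month) for an expanding-window split.

    Every fold trains from `start`; each next fold extends training by
    `step_months`, and validation is always `gap_months` after training ends,
    so the model is never validated on a period before its training data.
    """
    for k in range(n_folds):
        train_end = add_months(first_train_end, k * step_months)
        yield start, train_end, add_months(train_end, gap_months)


def snapshot_for_customer(customer_id: str, n_snapshots: int) -> int:
    """Stable hash-based assignment, so each customer lands in exactly one snapshot."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_snapshots
```

A hash is used instead of random sampling so that the assignment stays identical across reruns of the dataset creator.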
Building Blocks Example (3): Preparing data for training
Pipeline approach (pyspark.ml.pipeline): an elegant way to manage the workflow of your data processing. Each stage simply appends new transformers to the pipeline model’s stages.
Transformers: the vast majority of the transformers we use can be found in pyspark.ml.feature. We have also defined some custom transformers ourselves, to make sure all preprocessing is managed in the pipeline object.
(Code snippet: each fit function adds stages to the pipeline & transforms the data for the next stage.)
(Code snippet: example of a basic custom transformer.)
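The original code snippets are not recoverable from the slides. The plain-Python sketch below mirrors the Estimator/Transformer contract that pyspark.ml uses: `fit` returns a fitted model, `transform` returns new data, and each fitted stage feeds the next one. All class names here are hypothetical stand-ins, not the Spark API. In Spark itself you would subclass `pyspark.ml.Transformer`, implement `_transform`, and chain stages with `pyspark.ml.Pipeline`.

```python
import math
from typing import Dict, List

Row = Dict[str, float]


class Log1pTransformer:
    """Stateless custom transformer: adds log(1 + x) of one column."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col, self.output_col = input_col, output_col

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new dicts so the input rows are left untouched.
        return [{**row, self.output_col: math.log1p(row[self.input_col])} for row in rows]


class MinMaxScaler:
    """Estimator: learns min/max on fit() and returns a fitted transformer."""

    def __init__(self, input_col: str, output_col: str) -> None:
        self.input_col, self.output_col = input_col, output_col

    def fit(self, rows: List[Row]) -> "MinMaxScalerModel":
        values = [row[self.input_col] for row in rows]
        return MinMaxScalerModel(self.input_col, self.output_col, min(values), max(values))


class MinMaxScalerModel:
    def __init__(self, input_col: str, output_col: str, lo: float, hi: float) -> None:
        self.input_col, self.output_col, self.lo, self.hi = input_col, output_col, lo, hi

    def transform(self, rows: List[Row]) -> List[Row]:
        span = self.hi - self.lo
        return [{**row, self.output_col: (row[self.input_col] - self.lo) / span if span else 0.0}
                for row in rows]


class Pipeline:
    """Fits stages in order; each fitted stage transforms the data for the next one."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> "PipelineModel":
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


class PipelineModel:
    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows
```

Because every step, including the custom one, lives inside the fitted pipeline, scoring new customers replays exactly the preprocessing used in training.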
Building Blocks Example (3 - Bonus): Filling missing values
Apart from the pyspark.ml.feature Imputer class, we also experiment with using autoencoders for filling missing values.
• Encoder: a dimensionality reduction model, from the original features to a lower dimensional code.
• Decoder: the reverse action; it tries to recreate the original input from the code (not a perfect match).
A very good resource for distributed deep learning on top of Spark: dist-keras (by the CERN Database group).
To learn more about autoencoders, check out Irhum Shafkat’s Intuitively Understanding Variational Autoencoders.
(Figure: autoencoder design and application. Data with a missing feature goes in, and a filled value comes out: X = 5 10 23 <?> 0 → X’ = 5 10 22 7 0.)
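One common way to use a trained reconstruction model for imputation is sketched below: start from a simple fill, pass the vector through the model, copy the reconstructed values into the missing slots only, and repeat until they stabilise. Here `reconstruct` is a hypothetical stand-in for a trained autoencoder's forward pass. Keeping observed values (rather than taking the full reconstruction, where 23 became 22 in the example above) is a design choice of this sketch, not necessarily ING's.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def fill_missing(x: Sequence[Optional[float]],
                 reconstruct: Callable[[Vector], Vector],
                 initial_fill: Sequence[float],
                 n_iter: int = 50, tol: float = 1e-9) -> Vector:
    """Fill None entries of x by repeatedly passing it through a reconstruction model.

    Observed entries are kept as-is; only the missing slots take the model's output.
    """
    missing = [i for i, v in enumerate(x) if v is None]
    # Start from a simple guess, e.g. column means from the training data.
    current = [initial_fill[i] if v is None else float(v) for i, v in enumerate(x)]
    for _ in range(n_iter):
        recon = reconstruct(current)
        delta = max((abs(recon[i] - current[i]) for i in missing), default=0.0)
        for i in missing:
            current[i] = recon[i]
        if delta < tol:  # missing slots have stopped changing
            break
    return current
```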
Building Blocks Example (4): Model Building
Relying on open-source big data technologies as building blocks.
Classifiers (the model types): mainly based on the Spark machine learning framework, including:
• Linear / Logistic regression
• Naïve Bayes
• Decision Trees
• Ensemble methods (Random Forest, GBRT)
• Neural Networks (MLP)
Evaluators (model performance validation):
• All of the Spark MLlib evaluation metrics.
AutoML (finding the best model):
• Currently experimenting with auto-sklearn & H2O for faster hyper-parameter tuning. See Georgian Partners’ comparison.
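Spark's `BinaryClassificationEvaluator` reports area under the ROC curve by default. For intuition, the same quantity can be computed in plain Python with the rank-sum (Mann-Whitney) formulation. This is a teaching sketch, not how Spark computes it internally.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation.

    Equals the probability that a random positive is scored above a random
    negative, counting ties as one half.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied scores share their average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` gives 0.75: three of the four positive/negative pairs are ordered correctly.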
Building Blocks Example (5): Fairness
For an easy explanation: Attacking discrimination with smarter machine learning
Resource: https://research.google.com/bigpicture/attacking-discrimination-in-ml/
For approaches to reducing bias: IBM AI Fairness 360
Resource: http://aif360.mybluemix.net/
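Two simple group-fairness checks on a customer selection are statistical parity difference and disparate impact. Both compare selection rates between an unprivileged and a privileged group, and AI Fairness 360 reports both. The plain-Python sketch below computes them directly. The group labels are hypothetical, and this is not the AIF360 API.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def selection_rates(selected: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of customers selected (1) within each group."""
    totals: Dict[Hashable, int] = defaultdict(int)
    chosen: Dict[Hashable, int] = defaultdict(int)
    for sel, grp in zip(selected, groups):
        totals[grp] += 1
        chosen[grp] += sel
    return {g: chosen[g] / totals[g] for g in totals}


def group_fairness(selected: Sequence[int], groups: Sequence[Hashable],
                   privileged: Hashable, unprivileged: Hashable) -> Dict[str, float]:
    rates = selection_rates(selected, groups)
    p, u = rates[privileged], rates[unprivileged]
    return {
        # 0 means equal selection rates; negative means the unprivileged group is selected less.
        "statistical_parity_difference": u - p,
        # 1 means parity; a common rule of thumb flags values below 0.8.
        "disparate_impact": u / p if p else float("nan"),
    }
```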
Model Factory Products
Engaging with the model factory process & results:
• Roles: Customer Journey Expert, Data Analyst, Data Scientist, Data Engineer.
• Activities: validating building blocks, validating model execution, validating model quality, understanding the customer better, selecting customers for a campaign, post-hoc campaign evaluation, and getting the big picture of model usage.
• Tools: Monitoring Tool (developed in-house) and BI tools (IBM Cognos Analytics).
ML Monitoring Dashboard
Designated system for monitoring production ML models (backend + monitoring dashboard)
This architecture enables access to metrics data from production models:
• When a user creates a model, we create a new model monitoring project.
• JSON files with project details, metrics, models, etc. are produced on the development / production cluster, and the project JSON files are pushed through Logstash to the backend (Elasticsearch, test / production, on ING servers).
• The frontend loads the project, and the user views the project metrics in the monitoring dashboard (test / production).
Useful resource: Google AI’s What’s your ML test score? A rubric for ML production systems (Breck et al., 2016)
Open source alternative: mlflow.org (platform for the machine learning lifecycle)
Reporting on the model built
Report panels:
A. Technical quality metrics (recall, precision, AUCs)
B. Lift curve
C. Cumulative gains
D. Overlap with manual selection
E. Feature importance
F. Customer segmentation
G. Model comparison heat map
H. Compare feature distributions
I. Score distribution
J. Conversion for feature values
Credit Card Acquisition – Model Performance (B/C)
A couple of examples to explain visually how good the model is:
• Cumulative gains: how many of our conversions did the model catch?
• Lift curve: how much better is our selection than random selection?
(X-axis in both charts: customer percentiles, 1% = 54k customers.)
DA - Arjen: “Ok… It’s better than random… But how does it compare to my previous selection?”
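Both curves can be computed from conversion flags sorted by model score. The plain-Python sketch below returns, for each bin, the selected fraction, the cumulative gain (share of all conversions caught) and the lift (gain divided by what random selection of the same size would catch). The bin count and input format are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def gains_and_lift(converted_ranked: Sequence[int], n_bins: int = 10
                   ) -> List[Tuple[float, float, float]]:
    """converted_ranked: 1/0 conversion flags ordered by model score, best first.

    Returns (fraction of customers selected, cumulative gain, lift) per bin.
    """
    n, total = len(converted_ranked), sum(converted_ranked)
    if total == 0:
        raise ValueError("no conversions to measure against")
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, round(n * b / n_bins))
        selected_fraction = cutoff / n
        gain = sum(converted_ranked[:cutoff]) / total  # share of all conversions caught
        rows.append((selected_fraction, gain, gain / selected_fraction))  # lift vs random
    return rows
```

A random ranking gives a gain roughly equal to the selected fraction and a lift near 1. At 100% selection, both curves end at gain 1 and lift 1 by construction.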
What’s the difference between my old selection and the model’s? (D)
(Figure: number of customers selected in each percentile, split into customers in the old selection and customers not in the old selection; x-axis: ranked customer percentile (left = most relevant); a 20% threshold is marked.)
DA - Arjen:
“I feel more confident the model makes meaningful selections”
“I see that the model found top customers that I haven’t contacted yet”
“I still don’t understand who did I miss…”
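The overlap view can be sketched as follows. Split the model's ranking into equal bins, count how many customers in each bin were already in the old manual selection, and list the top-ranked customers the old selection never contacted. The inputs and the threshold are hypothetical.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def overlap_by_bin(ranked_ids: Sequence[Hashable], old_selection: Set[Hashable],
                   n_bins: int = 100) -> List[Tuple[int, int]]:
    """Split the model ranking into n_bins equal bins (best first) and count, per bin,
    customers (inside, outside) the old manual selection."""
    n = len(ranked_ids)
    counts = []
    for b in range(n_bins):
        chunk = ranked_ids[n * b // n_bins: n * (b + 1) // n_bins]
        in_old = sum(1 for cid in chunk if cid in old_selection)
        counts.append((in_old, len(chunk) - in_old))
    return counts


def new_in_top(ranked_ids: Sequence[Hashable], old_selection: Set[Hashable],
               top_fraction: float) -> List[Hashable]:
    """Customers in the model's top fraction that the old selection did not contact."""
    cutoff = round(len(ranked_ids) * top_fraction)
    return [cid for cid in ranked_ids[:cutoff] if cid not in old_selection]
```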
Who are the likely customers? (In this example: by age) (H)
How does [age] distribute over the most likely customers? Age distribution in each customer group (y-axis: portion of entire available population):
Age group | Top 10% | Bottom 90%
18-24 | 6.24% | 1.94%
25-29 | 10.82% | 7.05%
30-34 | 10.55% | 8.38%
35-39 | 10.32% | 8.00%
40-44 | 10.13% | 8.35%
45-49 | 12.06% | 11.59%
50-54 | 12.72% | 15.05%
55-59 | 12.82% | 18.01%
60+ | 14.34% | 21.64%
Who did we miss in the top 20%?
Age group | Selected | Not Selected
18-24 | 5.82% | 1.63%
25-29 | 13.77% | 6.81%
30-34 | 12.34% | 8.27%
35-39 | 9.61% | 8.38%
40-44 | 8.68% | 8.72%
45-49 | 10.63% | 12.10%
50-54 | 12.12% | 15.29%
55-59 | 12.35% | 17.99%
60+ | 14.68% | 20.81%
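Comparing a feature's distribution between the top-ranked customers and everyone else takes only a few lines. The sketch below assumes the feature values (e.g. age groups) are already ordered by model rank, best first. The top fraction is a parameter, not a value taken from the talk.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def distribution(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of each category within a group (shares sum to 1)."""
    counts = Counter(values)
    return {k: c / len(values) for k, c in counts.items()}


def top_vs_rest(ranked_values: Sequence[Hashable], top_fraction: float = 0.1
                ) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """Category shares among the top-ranked customers versus all remaining customers."""
    cutoff = max(1, round(len(ranked_values) * top_fraction))
    return distribution(ranked_values[:cutoff]), distribution(ranked_values[cutoff:])
```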
Credit Card Acquisition – Important Features (E)
Strategy for features: iterative refinement of the filtered features.
(Word clouds: larger = more important; blue = significant to only one model.)
• Regular credit card acquisitions: f_53, f_348, f_127, f_218, f_31, f_38, f_43, f_8, f_857, f_842
• Platinum credit card acquisitions: f_842, f_12, f_15, f_38, f_31, f_457, f_458, f_857, f_218, f_127
DA - Arjen:
“I gain confidence that the models generate meaningful results.”
“I can easily troubleshoot issues with model recipe.”
“I learn more about our customers.”
Customer Segmentation (F)
Grouping customers together based on the model’s important features.
• Segment size: indication of the number of customers.
• Segment color: average conversion (more yellow = higher conversion).
• X/Y axes: they don’t mean much on their own, but the distance between segments shows how different their customers are on the important features (closer segments = more similar customers).
Allows for further analysis of customer segments.
CJE - Christina: “This helps me understand who are my customers and to tailor a message for each type of customers.”
Credit Card Acquisition – Which proposal to whom? (G)
Combined ranking for both credit card acquisition models:
• X-axis: customers ranked by interest in regular credit cards (left = most interested).
• Y-axis: customers ranked by interest in platinum credit cards (down = most interested).
• Rectangles: the top 10% of customers in each group.
• Color: number of customers in the shared percentile (log scale; brighter = more customers).
Customers per quadrant | Top 10% Regular | Bottom 90% Regular
Top 10% Platinum | 232k | 412k
Bottom 90% Platinum | 347k | 4.9M
DA - Arjen: “I can now send the relevant offer to the relevant customers and avoid spamming.”
What to expect from the top-scored customers next month?
(Figure: expected next-month conversion rate for each percentile; x-axis: customer percentile as ranked by the model (lower = more relevant); y-axis: conversion rate next month, 0–3.5%.)
Percentile | Expected conversion within the percentile | Expected total conversion rate (top N%) | Expected total conversions (top N%)
5 | 1.81% | 2.25% | 1404
10 | 1.37% | 1.88% | 2351
20 | 0.95% | 1.50% | 3737
50 | 0.48% | 1.00% | 6268
100 | 0.09% | 0.65% | 8103
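The table distinguishes a single percentile's conversion rate from the cumulative rate over the whole top N%. The sketch below derives the cumulative columns from per-percentile rates. It assumes equal-sized percentiles; the rates and population size are hypothetical inputs, not ING's figures.

```python
from typing import List, Sequence, Tuple


def cumulative_expectations(rates: Sequence[float], customers_per_percentile: int
                            ) -> List[Tuple[int, float, float]]:
    """For each top-N percentile, return (N, expected total conversion rate, expected conversions).

    rates[i] is the expected conversion rate of percentile i + 1, best-ranked first.
    """
    rows, expected = [], 0.0
    for n, rate in enumerate(rates, start=1):
        expected += rate * customers_per_percentile  # conversions within this percentile
        rows.append((n, expected / (n * customers_per_percentile), expected))
    return rows
```

Because the best percentiles come first, the cumulative rate always sits above the rate of the last percentile added, as in the table (e.g. 1.88% total versus 1.37% at the 10th percentile).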
To wrap up
Summary:
• Enabling model creation, using data scientists’ best practices and cumulative efforts.
• Simple specification, modular design.
• Accelerates DAs, empowers CJEs, and makes all of us more relevant to our customers.
Model Factory @ING Bank
Selected Resources:
Driving innovation:
• ING PACE: Evidence-based design-driven lean approach
Model building:
• Uber’s Ludwig – Building models without coding
• Airbnb’s Bighead
• Georgian Partners’ AutoML comparison
• Creating a Custom Cross-Validation Function in PySpark
• Distributed deep learning on Spark: dist-keras
Machine learning in production:
• What’s your ML test score? A rubric for ML production systems
• MLflow: machine learning lifecycle
Fairness & bias removal:
• Google’s “Attacking Discrimination in ML”
• IBM’s AI Fairness 360
Linkedin.com/in/KedemDor
Dor.Kedem (at) ing.com

 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Model Factory at ING Bank

  • 1. Model Factory @ING Bank. Presentation to DataWorks Summit 2019, 2019-03-20. Dor Kedem
  • 2. A bit about me
    • Extensive software development career since 2002.
    • Working on machine learning research & data science applications since 2010.
    • Today, a lead data scientist and product owner @ ING Bank in Amsterdam.
    Grab me during these couple of days (or reach me via LinkedIn) to talk about:
    • CI/CD solutions for a data science project lifecycle.
    • Impact-driven data science (from POCs to MVPs).
    • Modelling techniques and machine learning applications.
    • Transitioning from software development or IT roles to data science.
    • Board games and 3D puzzles.
    Linkedin.com/in/KedemDor
    Dor.Kedem (at) ing.com
    Image credit: my wife, adorageek.com. HAPPY BIRTHDAY!
  • 3. ING Bank at a glance
    • Active in more than 40 countries.
    • 54,000+ employees in ING Group.
    • 38M retail customers and 12.5M primary customers in 4Q18.
    • Net Promoter Score: #1 in 6 out of 13 retail countries.
    Source: https://www.ing.com/About-us/Profile/Key-figures.htm
  • 4. Challenges in the European banking scene
    • Historically low interest rates mean less revenue from lending. [Chart: historical LIBOR rates (grey: recession). Source: macrotrends.net]
    • Regulations lead to a more transparent and open banking environment.
    • Fintech companies are looking for innovative ways to fill traditional banking roles. (Source: https://hollandfintech.com/)
  • 5. How does a bank differentiate itself from the rest?
    Our purpose: Empowering people to stay a step ahead in life and in business.
    [Figure: our strategic priorities]
    Sources: https://www.forbes.com/sites/kurtbadenhausen/2019/03/04/the-worlds-best-banks-ing-and-citibank-lead-the-way/ and https://www.ing.com/About-us
  • 6. Analytics efforts in ING
    "Data is the language of the future. If you don’t speak it yet, we’ll help you master it," says Görkem Köseoğlu, ING’s chief analytics officer.
    • Artificial intelligence: ING currently employs around 80 data scientists, who work on various AI projects.
    • Analytics skillset: thousands of employees are to engage with analytical tools and insights.
  • 7. Our ambition: all customer interactions driven by analytics
    • One-to-One Analytics: maximising the number of analytics-driven service and sales interactions.
    • Data > insight > action is in ING’s DNA.
    • Democratize big data usage across ING.
    • Users of our services are extremely happy.
  • 8. Data analytics for customer interactions (NL+BE): who we are, what we know, what we create
    • Customer Journey Experts (CJE, e.g. Christina): over 300 (outside 1:1). Know: banking, marketing theory, customer engagement, message framing. Create: product specifications, online & offline content, customer engagement.
    • Data Analysts (DA, e.g. Arjen): over 100. Know: BI tools (SAS, IBM Cognos), data privacy, SQL. Create: reports, dashboards, A/B tests.
    • Data Scientists (DS, e.g. Samir): roughly 20. Know: statistics & ML, data privacy, programming (e.g. Python, R, Scala). Create: statistical models, data products.
    • Data Engineers (DE, e.g. Eleanor): roughly 15. Know: big data technologies, CI/CD solutions, security & compliance. Create: ETL systems, data lake, model hosting.
  • 9. The need for a model factory
  • 10. Example case: credit card acquisition
    For Black Friday (Nov 23rd), a customer engagement business unit wants to engage eligible customers (via next-best-action (NBA) messages and emails) with the option to acquire a new credit card. There are two types of offers: a regular credit card and a platinum credit card. How do we find whom to contact with these offers? There are two approaches:
    • Model-based (DS Samir): build a likelihood model based on past behavior and engagements, then rank customers according to this model.
    • Manual (DA Arjen, CJE Christina): plot customer engagements across different demographics, then come up with business rules based on personal understanding.
  • 11. Example case: credit card acquisition (continued). This slide has the same setup and the same two approaches as slide 10. It is annotated to show that the vast majority of selections are made with the manual, business-rules approach (DA Arjen, CJE Christina).
  • 12. Problems with the manual selection of customers
    Enticements are being offered to the wrong customers:
      • Customer disengagement (unsubscribes, ad blindness).
      • Wasted work by CJEs and DAs.
      • Loss of potential revenue.
    It takes a lot of time to make customer selections:
      • No structured way to figure out the target population.
      • No structured learning from our past campaigns.
      • Only a binary selection (send / don't send), with no ranking.
    We are not leveraging the full potential of our data:
      • Not taking into account a large set of features.
      • Not taking into account engagement with past offerings.
      • Not taking into account engagement with other products.
      • You only target what you can code.
    [Figure: one of the added values of models is ranking customers. All clients (purchase / no purchase) are shown unordered and then ranked by relevance. The selection is based on a threshold, e.g. the top 10%.]
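The ranking idea in the figure can be made concrete with a small sketch. It sorts customers by a model score and keeps the top fraction (e.g. the top 10%) as the selection. It then compares the purchase rate inside that group with the overall rate, a ratio commonly called lift. The scores and outcomes are hypothetical toy data. `select_top_fraction` and `lift` are illustrative helpers, not ING code.

```python
def select_top_fraction(scores, fraction):
    """Rank customer ids by model score (highest first) and keep the top `fraction`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(scores, key=scores.get, reverse=True)
    # round() absorbs float artefacts such as 10 * 0.3 == 3.0000000000000004
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]


def lift(scores, purchased, fraction):
    """Purchase rate inside the selected group divided by the overall purchase rate."""
    selected = select_top_fraction(scores, fraction)
    top_rate = sum(purchased[c] for c in selected) / len(selected)
    base_rate = sum(purchased.values()) / len(purchased)
    return top_rate / base_rate


# Hypothetical toy data: ten customers with model scores and past purchases.
scores = {f"c{i}": i / 10 for i in range(10)}
purchased = {c: 0 for c in scores}
purchased["c9"] = purchased["c3"] = 1

print(select_top_fraction(scores, 0.1))  # ['c9']
print(lift(scores, purchased, 0.1))      # 5.0
```

Unlike a send / don't-send rule, the ranking lets the business unit move the threshold to match campaign capacity without rebuilding the selection.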
  • 13. Learning from our past & present…
    These are some responses we gathered when asking CJEs and DAs (e.g. Christina, Arjen) about the current way of working and the gaps from best practices. They are grouped into organizational and personal themes:
    • “I don’t have time for experiments & evaluation. We have a schedule and need to create the next campaigns”
    • “I get personal satisfaction from weighing in my opinion”
    • “I don’t see how testing everything to death leads to better results”
    • “Management is not critical enough about measuring our performance”
    • “I can’t get clear insights from my DA / CJE colleague”
    • “I know my customers”
    • “There’s a lack of clear guidelines and standards across ING tribes”
    • “I need to be able to contact more customers, even if models disagree”
  • 14. How should we interpret the interviews?
    • Fears need to be addressed early on (e.g. fear of measurement, of loss of control, of automation).
    • Focus on empowerment before the revolution:
      • Helping DAs and CJEs make better decisions, not making all decisions for them.
      • Creating a direct link: standardization → gain in personal efficiency.
      • Incorporating customers in our development squad.
    Resource: check out ING PACE, an evidence-based, design-driven lean approach. [Figure: PACE phases and experiment loop]
  • 15. Our approach: Model Factory
  • 16. Our objective
    • Democratizing model building: enabling DAs to create models for gaining customer insights.
    • Accelerating best practices: making it easy & fast to be effective in customer selections.
      • Model building process "built-in": tell us "what" you want, and we take care of the "how".
      • Evaluation "built-in": build a model → get a free model & campaign evaluation!
      • Compliance "built-in": GDPR, archiving, legal, commercial pressure, risk. We've got you covered.
    Benefits for CJEs, DAs, DSs and DEs (Christina, Arjen, Samir, Eleanor): saves time, better engagement, making large-scale impact, understanding the customer better, growing in skills, meeting objectives. For the customer (Claire): more relevant offerings.
  • 17. Model Factory: building customer models without reinventing the model building process
    Model recipe + building blocks → model building process → scoring model f(·) = y, which is used for:
    • Scoring eligible customers.
    • Feeding scores to ING processes.
    • Creating reports in BI tools for ING business units.
    Somewhat similar approaches: Uber’s Ludwig (open source), Airbnb’s Bighead & KPN’s model factory.
  • 18. Model recipe
    The model specification is translated into a 10-15 line JSON file, which is filled in by a DA.
    Mandatory ingredients:
    • Business objective: selected from acquisition, deepsell, retention, customer journey.
    • Business objective specification: depends on the objective. For example, which product to acquire?
    • Features to include / exclude: selected from a list, based on domain expertise.
    • Customers to include / exclude: a SQL "WHERE" clause, based on domain expertise.
    Optional ingredients (with defaults):
    • Time specification: how long it takes to acquire, and how long before the customer makes a decision.
    • Modelling techniques: for advanced users / data scientists.
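The slide does not show the actual recipe schema. The sketch below assumes hypothetical key names (`objective`, `objective_spec`, `features`, `customer_filter`, `time_spec`, `modelling`) and hypothetical default values. It shows how a recipe could be parsed, checked for its mandatory ingredients, and completed with defaults for the optional ones.

```python
import copy
import json

# Hypothetical schema: key names and allowed values are assumptions, not ING's format.
OBJECTIVES = {"acquisition", "deepsell", "retention", "customer_journey"}
MANDATORY = ("objective", "objective_spec", "features", "customer_filter")
DEFAULTS = {
    # Hypothetical default values, for illustration only.
    "time_spec": {"acquisition_window_days": 30, "decision_window_days": 14},
    "modelling": {"technique": "auto"},
}


def load_recipe(text):
    """Parse a recipe JSON document, check mandatory ingredients, fill in defaults."""
    recipe = json.loads(text)
    missing = [key for key in MANDATORY if key not in recipe]
    if missing:
        raise ValueError(f"missing mandatory ingredients: {missing}")
    if recipe["objective"] not in OBJECTIVES:
        raise ValueError(f"unknown objective: {recipe['objective']!r}")
    # Optional ingredients fall back to deep-copied defaults (DEFAULTS is never mutated).
    merged = copy.deepcopy(DEFAULTS)
    merged.update(recipe)
    return merged


EXAMPLE = """
{
  "objective": "acquisition",
  "objective_spec": {"product": "platinum_credit_card"},
  "features": {"include": ["age", "income_segment"], "exclude": ["postcode"]},
  "customer_filter": "age >= 18 AND has_credit_card = 0"
}
"""

recipe = load_recipe(EXAMPLE)
print(recipe["modelling"])  # {'technique': 'auto'}
```

Keeping the recipe declarative means a DA only states the "what". The factory can validate it up front, before any expensive model building starts.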
  • 19. Building blocks, available to all models built with a recipe specification:
    • Analytics feature extraction
    • Machine learning monitoring processes
    • Target templates (e.g. acquisition, deepsell)
    • Classifiers
    • Evaluators
    • Hyperparameter / model selection (AutoML)
    • Fairness & bias reduction
    • Data set creators
    • Uplift measurement
    • Storage management
    • Scheduling
    • Hosting
    • GDPR applications
    • Interaction with ING services
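The slide names uplift measurement as a building block without defining it. The sketch assumes one common definition: compare the conversion rate of contacted customers with that of a randomly held-out control group. A pooled two-proportion z-statistic then gives a rough sense of whether the difference could be noise. The `Group` type and the campaign numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Group:
    contacted: int   # customers in the group
    converted: int   # customers who acquired the product

    @property
    def rate(self):
        return self.converted / self.contacted


def uplift(treatment, control):
    """Absolute uplift: conversion rate of the contacted group minus the hold-out group."""
    return treatment.rate - control.rate


def two_proportion_z(treatment, control):
    """z-statistic of the rate difference, using the pooled conversion proportion."""
    pooled = (treatment.converted + control.converted) / (
        treatment.contacted + control.contacted)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / treatment.contacted + 1 / control.contacted))
    return uplift(treatment, control) / se


treated = Group(contacted=1000, converted=30)  # hypothetical campaign numbers
holdout = Group(contacted=1000, converted=20)
print(round(uplift(treated, holdout), 4), round(two_proportion_z(treated, holdout), 2))  # 0.01 1.43
```

Here a one-point uplift on 1,000 customers per group gives z ≈ 1.43. That is below the usual 1.96 threshold, which shows why hold-out groups need to be sized deliberately.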
  • 21. Building blocks example (1): data sources. Creating the model feature sources
    [Figure: data dumps & streams from ING sources → data lake → structured data sources (Clients ~80, Products ~600, Engagements ~300) → analytics features table(s)]
    • Data engineers (DE Eleanor) build the ETL processes that create the data sources.
    • Data scientists & analysts (DS Samir, DA Arjen) build an analytics repository from the data sources.
    • Features in the table are GDPR validated.
    Built on top of:
    • IBM PureData for Analytics (PDA)
    • SAS Enterprise Global
  • 22. Building blocks example (1): data sources. Transferring the data to our model building environment
    Data moves from IBM PureData for Analytics to the Hortonworks Data Platform via Apache Sqoop (diffs / full). The jobs are deployed with Ansible and scheduled by IBM Workload Scheduler:
    1. Files are extracted from PDA’s tables to HDFS.
    2. Hive tables’ metadata is created / updated.
    Access is managed via Apache Ranger on both levels:
    • HDFS policies: hdfs://ingestion/raw
    • Hive policies on specific tables and model results, e.g.:
      • hdfs://access/BEL/models
      • hdfs://access/NED/models
    Users & permissions are synchronized from LDAP by ldap_user_sync, running in a container.
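The "diffs / full" choice can be sketched as a small command builder. It assumes Sqoop 1 command-line flags: `--incremental append`, `--check-column` and `--last-value` for diffs, and `--delete-target-dir` for full reloads. The JDBC URL, host, table, column and directory names are hypothetical. Credentials (e.g. `--username` / `--password-file`) and the Hive metadata step are omitted. In the setup above, such a command would run as an Ansible-deployed job under IBM Workload Scheduler rather than by hand.

```python
import shlex


def sqoop_import_args(table, target_dir, mode="full", check_column=None,
                      last_value=None,
                      jdbc_url="jdbc:netezza://pda-host:5480/analytics",  # hypothetical
                      mappers=4):
    """Build a Sqoop 1 import command for a full or a diff (incremental) load."""
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--target-dir", target_dir,
            "--num-mappers", str(mappers)]
    if mode == "full":
        # Full reload: remove the previous extract before importing again.
        args.append("--delete-target-dir")
    elif mode == "diff":
        if check_column is None or last_value is None:
            raise ValueError("diff mode needs check_column and last_value")
        # Incremental append: only rows with check_column > last_value are imported.
        args += ["--incremental", "append",
                 "--check-column", check_column,
                 "--last-value", str(last_value)]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return args


print(shlex.join(sqoop_import_args("CLIENTS", "/ingestion/raw/clients",
                                   mode="diff", check_column="CLIENT_ID",
                                   last_value=1000)))
```

Append mode fits tables with a monotonically increasing key. Tables whose rows are updated in place would instead need Sqoop's `lastmodified` mode or a periodic full reload.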
  • 23. Building Blocks Example (2): Data Sets Creators
    Some tips for building datasets:
    • Select different customers at each timestamp → better generalization to new customers.
    • Arrange the data set in time order → better generalization for forecasting.
    Figure (time series cross validator, picking the best hyper-parameters): all training windows start in Jan ‘17 and end at later points (Jan ‘18, May ‘18, Jul ‘18); the validation periods are labelled Mar ‘18 and May ‘18, and a final training window (Jan ‘17 to Jul ‘18) is tested on Dec ‘18.
    Legend: Train: used to fit the model. Valid: used to learn hyper-parameters. Test: used to evaluate and to pick the best model.
    Useful resource: Timothy Lin’s Creating a Custom Cross-Validation Function in PySpark.
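Both tips can be combined in one splitter. The sketch below is plain Python rather than the PySpark cross-validator the slide refers to: it builds expanding training windows and, optionally, drops validation customers already seen in training. The `(customer_id, month_index)` record layout is an assumption for illustration.

```python
from typing import List, Tuple

# A record is (customer_id, month_index); month_index 0 could stand for Jan '17.
Record = Tuple[str, int]


def expanding_window_folds(
    records: List[Record],
    first_valid: int,
    last_valid: int,
    step: int = 1,
    disjoint_customers: bool = True,
) -> List[Tuple[List[Record], List[Record]]]:
    """Return (train, valid) pairs for an expanding-window time-series CV.

    For each validation month, train on every record strictly before it.
    If disjoint_customers is True, validation keeps only customers that never
    appear in that fold's training set, to measure generalization to new customers.
    """
    folds = []
    for valid_month in range(first_valid, last_valid + 1, step):
        train = [r for r in records if r[1] < valid_month]
        seen = {customer for customer, _ in train}
        valid = [
            r for r in records
            if r[1] == valid_month and not (disjoint_customers and r[0] in seen)
        ]
        folds.append((train, valid))
    return folds


records = [("a", 0), ("b", 1), ("c", 2), ("a", 2), ("d", 3), ("e", 4), ("b", 4)]
for train, valid in expanding_window_folds(records, first_valid=2, last_valid=4, step=2):
    print(len(train), valid)
# 2 [('c', 2)]
# 5 [('e', 4)]
```

In a full cross-validator, each candidate hyper-parameter setting would be scored on every fold and the best average kept, before a final fit is evaluated on a held-out test period.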
  • 24. Building Blocks Example (3): Preparing data for training
    Pipeline approach (pyspark.ml.pipeline): an elegant way to manage the workflow of your data processing. Each stage simply appends new transformers to the pipeline model’s stages.
    Transformers: the vast majority of the transformers we use can be found in pyspark.ml.feature, plus some custom transformers we’ve defined ourselves to make sure all preprocessing is managed in the pipeline object.
    Code snippet: each fit function adds stages to the pipeline and transforms the data for the next stage.
    Code snippet: example of a basic custom transformer.
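The slide's code snippets are not in the transcript. As a rough stand-in, here is a plain-Python sketch (no Spark) of the pattern it describes: every fit step appends a fitted stage and hands transformed data to the next step, so all preprocessing ends up in one pipeline object. In PySpark the same roles are played by the Estimator / Transformer stages of pyspark.ml; the class and function names below are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Protocol, Tuple

Row = Dict[str, Optional[float]]


class Transformer(Protocol):
    def transform(self, rows: List[Row]) -> List[Row]: ...


class MeanImputerModel:
    """Fitted stage: replaces None in `col` with the mean learned during fit."""

    def __init__(self, col: str, mean: float):
        self.col, self.mean = col, mean

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.col: self.mean if r[self.col] is None else r[self.col]} for r in rows]


def fit_mean_imputer(col: str) -> Callable[[List[Row]], Transformer]:
    def fit(rows: List[Row]) -> Transformer:
        observed = [r[col] for r in rows if r[col] is not None]
        return MeanImputerModel(col, sum(observed) / len(observed))
    return fit


class Log1p:
    """Stateless custom transformer: log(1 + x) on one column (hypothetical example)."""

    def __init__(self, col: str):
        self.col = col

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.col: math.log1p(r[self.col])} for r in rows]


class PipelineModel:
    def __init__(self) -> None:
        self.stages: List[Transformer] = []

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


def build_pipeline(
    rows: List[Row], fits: List[Callable[[List[Row]], Transformer]]
) -> Tuple[PipelineModel, List[Row]]:
    """Fit each step on the current data, append the fitted stage, then transform
    the data so the next step is fitted on already-preprocessed rows."""
    model = PipelineModel()
    for fit in fits:
        stage = fit(rows)
        model.stages.append(stage)
        rows = stage.transform(rows)
    return model, rows


model, prepared = build_pipeline(
    [{"x": 1.0}, {"x": None}, {"x": 3.0}],
    [fit_mean_imputer("x"), lambda rows: Log1p("x")],
)
print([round(r["x"], 3) for r in prepared])  # [0.693, 1.099, 1.386]
```

Because the imputer's learned mean lives inside the stage, the same `model.transform` can later be applied to scoring data without refitting.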
  • 25. Building Blocks Example (3 - Bonus): Filling missing values
    Apart from the pyspark.ml.feature Imputer class, we also experiment with using autoencoders to fill missing values.
    • Encoder: a dimensionality reduction model, from the original features to a lower-dimensional code.
    • Decoder: the reverse action; it tries to recreate the original input from the code (not a perfect match).
    Figure: autoencoder design. Autoencoder application: data with a missing feature goes in, a filled value comes out, e.g. X = (5, 10, 23, ?, 0) → X’ = (5, 10, 22, 7, 0).
    A very good resource for distributed deep learning on top of Spark: dist-keras (by the CERN Database group). To learn more about autoencoders, check out Irhum Shafkat’s Intuitively Understanding Variational Autoencoders.
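The imputation step itself can be sketched independently of how the autoencoder is trained. Below, `reconstruct` is a hypothetical stand-in for a trained encoder + decoder; missing entries start from an initial guess (for example column means, an assumption) and are repeatedly replaced by the reconstruction.

```python
from typing import Callable, List, Optional, Sequence


def autoencoder_impute(
    x: Sequence[Optional[float]],
    reconstruct: Callable[[List[float]], List[float]],
    initial_fill: Sequence[float],
    n_iter: int = 5,
) -> List[float]:
    """Fill the None entries of x using an autoencoder's reconstruction.

    reconstruct stands in for a trained encoder + decoder. Missing entries start
    at initial_fill (e.g. column means), then are repeatedly replaced by the
    reconstructed values; observed entries are never overwritten.
    """
    missing = [i for i, v in enumerate(x) if v is None]
    current = [initial_fill[i] if v is None else v for i, v in enumerate(x)]
    for _ in range(n_iter):
        reconstructed = reconstruct(current)
        for i in missing:
            current[i] = reconstructed[i]
    return current


# Toy stand-in for a trained autoencoder (hypothetical): it has "learned" that
# feature 3 equals feature 0 + feature 1, and passes the other features through.
def toy_reconstruct(v: List[float]) -> List[float]:
    return [v[0], v[1], v[2], v[0] + v[1], v[4]]


print(autoencoder_impute([5, 10, 23, None, 0], toy_reconstruct, initial_fill=[0] * 5))
# [5, 10, 23, 15, 0]
```

Unlike the slide's X’, where the observed 23 comes back as 22, this sketch keeps observed values and only takes the decoder's output at the missing positions; iterating lets the filled values feed back into the encoder.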
  • 26. Building Blocks Example (4): Model Building
    Relying on open-source Big Data technologies as building blocks.
    Classifiers (the model types): mainly based on the Spark Machine Learning framework, including:
    • Linear / Logistic regression
    • Naïve Bayes
    • Decision Trees
    • Ensemble methods (Random Forest, GBRT)
    • Neural Networks (MLP)
    Evaluators (validating model performance): everything available under the Spark MLlib evaluation metrics.
    AutoML (finding the best model): currently experimenting with auto-sklearn & H2O for faster hyper-parameter tuning. See Georgian Partners’ comparison.
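Spark's BinaryClassificationEvaluator reports areaUnderROC by default. To show what that number means, here is a stdlib sketch that computes ROC AUC from its ranking definition: the probability that a random converter is scored above a random non-converter. It is O(positives × negatives), fine for illustration; Spark derives the value from the ROC curve instead.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```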
  • 27. Building Blocks Example (5): Fairness
    For an accessible explanation: Attacking discrimination with smarter machine learning (https://research.google.com/bigpicture/attacking-discrimination-in-ml/).
    For approaches to reducing bias: IBM AI Fairness 360 (http://aif360.mybluemix.net/).
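Google's explainer centres on equal opportunity: among customers who truly qualify, the model should select each group at a similar rate, i.e. true positive rates should match across groups. A minimal stdlib sketch of that check follows; the group labels and data are hypothetical, and AI Fairness 360 provides its own, more complete set of metrics.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def true_positive_rates(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """True positive rate (recall) per group: share of actual positives predicted 1."""
    positives: Dict[Hashable, int] = defaultdict(int)
    hits: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_gap(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Largest difference in true positive rate between any two groups (0 = equal)."""
    rates = true_positive_rates(y_true, y_pred, groups)
    return max(rates.values()) - min(rates.values())


y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B"]
print(true_positive_rates(y_true, y_pred, groups))  # {'A': 0.5, 'B': 1.0}
print(equal_opportunity_gap(y_true, y_pred, groups))  # 0.5
```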
  • 29. Engaging with the model factory process & results
    Roles: Customer Journey Expert, Data Analyst, Data Scientist, Data Engineer.
    Activities: validating building blocks, validating model execution, validating model quality, understanding the customer better, selecting customers for a campaign, post-hoc campaign evaluation, getting the big picture of model usage.
    Tools: Monitoring Tool (developed in-house) and BI tools (IBM Cognos Analytics).
  • 31. Designated system for monitoring production ML models
    This architecture gives access to metrics data from production models:
    • When a user creates a model, we create a new model monitoring project.
    • The development / production cluster produces JSON files with project details, metrics, models, etc., which are pushed through Logstash into Elasticsearch.
    • The backend serves the Monitoring Dashboard: the frontend loads the project and the user views its metrics.
    • Elasticsearch and the dashboard each run in Test / Prd environments on ING servers.
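The slide only says that JSON files with project details, metrics and models are pushed via Logstash. As a sketch under that assumption, one model run could be serialized as a single-line JSON document; every field name below is hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def monitoring_record(
    project: str,
    model_name: str,
    model_version: int,
    metrics: Dict[str, float],
    run_time: Optional[datetime] = None,
) -> str:
    """Serialize one model run as a single-line JSON document.

    All field names are hypothetical. One document per line keeps the file easy
    to tail with Logstash (e.g. a file input with a json codec) before indexing.
    """
    record = {
        "project": project,
        "model": {"name": model_name, "version": model_version},
        "metrics": metrics,
        "timestamp": (run_time or datetime.now(timezone.utc)).isoformat(),
    }
    return json.dumps(record, sort_keys=True)


line = monitoring_record(
    "cc_acquisition", "gbt_classifier", 3, {"areaUnderROC": 0.81},
    run_time=datetime(2019, 3, 1, tzinfo=timezone.utc),
)
print(line)
```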
  • 32. Designated system for monitoring production ML models
    Useful resource: Google AI’s What’s your ML test score? A rubric for ML production systems (Breck et al., 2016).
    Open-source alternative: mlflow.org (a platform for the machine learning lifecycle).
  • 34. Reporting on the model built
    Dashboard panels:
    A. Technical quality metrics (recall, precision, AUCs)
    B. Lift curve
    C. Cumulative gains
    D. Overlap with manual selection
    E. Feature importance
    F. Customer segmentation
    G. Model comparison heat map
    H. Compare feature distributions
    I. Score distribution
    J. Conversion for feature values
  • 35. A couple of examples to explain visually how good the model is
    Credit Card Acquisition: Model Performance (panels B/C; x-axis: customer percentiles, 1% = 54k customers)
    • Cumulative Gains: how many of our conversions did the model catch?
    • Lift Curve: how much better is our selection than random selection?
    DA - Arjen: “Ok… It’s better than random… But how does it compare to my previous selection?”
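Both curves come from the same ranking. The sketch below shows the standard definitions (cumulative gain = share of all conversions among the top-ranked customers; lift = gain divided by the share of customers selected). The scores and outcomes are made up for illustration.

```python
from typing import List, Sequence, Tuple


def gains_and_lift(
    scores: Sequence[float], converted: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, float]]:
    """For each top-k cut-off, return (share of customers, cumulative gain, lift).

    Cumulative gain = share of all conversions found among the top-ranked customers.
    Lift = cumulative gain / share of customers selected (1.0 = random selection).
    """
    ranked = [c for _, c in sorted(zip(scores, converted), key=lambda t: t[0], reverse=True)]
    total = sum(ranked)
    if total == 0:
        raise ValueError("no conversions to measure")
    n = len(ranked)
    curve = []
    for b in range(1, n_bins + 1):
        k = max(1, round(n * b / n_bins))
        share = k / n
        gain = sum(ranked[:k]) / total
        curve.append((share, gain, gain / share))
    return curve


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
converted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for share, gain, lift in gains_and_lift(scores, converted, n_bins=5):
    print(f"top {share:.0%}: gain {gain:.2f}, lift {lift:.2f}")
```

At 100% of customers the gain is always 1.0 and the lift 1.0, which is the random-selection baseline the slide compares against.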
  • 36. What’s the difference between my old selection and the model’s? (panel D)
    Chart: number of customers selected in each percentile (0 to 12,000), split into customers in the old selection and customers not in the old selection; x-axis: ranked customer percentile (left = most relevant), with a 20% threshold marked.
    DA - Arjen:
    • “I feel more confident the model makes meaningful selections.”
    • “I see that the model found top customers that I haven’t contacted yet.”
    • “I still don’t understand who I missed…”
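The counts behind such a chart can be sketched by slicing the model's ranking into equal percentiles and checking each customer against the old manual selection. The customer ids below are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def overlap_by_percentile(
    ranked_ids: Sequence[str], old_selection: Set[str], n_bins: int = 100
) -> List[Tuple[int, int, int]]:
    """Split the model's ranking into n_bins equal slices (most relevant first) and
    return (slice number, customers in old selection, customers not in it)."""
    n = len(ranked_ids)
    rows = []
    for b in range(n_bins):
        chunk = ranked_ids[n * b // n_bins : n * (b + 1) // n_bins]
        in_old = sum(1 for c in chunk if c in old_selection)
        rows.append((b + 1, in_old, len(chunk) - in_old))
    return rows


for row in overlap_by_percentile(list("abcdefghij"), {"b", "c", "h"}, n_bins=5):
    print(row)
```

Customers in the top slices that are "not in the old selection" are the ones the model found but the manual selection did not contact.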
  • 37. Who are the likely customers? (In this example: by age; panel H)
    How does [age] distribute over the most likely customers? Age distribution in each customer group (y-axis: portion of entire available population).
    Top 10% versus Bottom 90%:
    • 18-24: 6.24% / 1.94%
    • 25-29: 10.82% / 7.05%
    • 30-34: 10.55% / 8.38%
    • 35-39: 10.32% / 8.00%
    • 40-44: 10.13% / 8.35%
    • 45-49: 12.06% / 11.59%
    • 50-54: 12.72% / 15.05%
    • 55-59: 12.82% / 18.01%
    • 60+: 14.34% / 21.64%
    Who did we miss in the top 20%? Selected versus Not Selected:
    • 18-24: 5.82% / 1.63%
    • 25-29: 13.77% / 6.81%
    • 30-34: 12.34% / 8.27%
    • 35-39: 9.61% / 8.38%
    • 40-44: 8.68% / 8.72%
    • 45-49: 10.63% / 12.10%
    • 50-54: 12.12% / 15.29%
    • 55-59: 12.35% / 17.99%
    • 60+: 14.68% / 20.81%
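A comparison like this can be sketched by taking one feature (here an age bucket) per customer, ordered by model score, and computing each bucket's share within the top group and within the rest. The data below are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def shares(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of each category within one customer group (shares sum to 1)."""
    counts = Counter(values)
    return {category: count / len(values) for category, count in counts.items()}


def top_vs_rest(
    ranked_values: Sequence[Hashable], top_fraction: float = 0.1
) -> Tuple[Dict[Hashable, float], Dict[Hashable, float]]:
    """ranked_values holds one feature value (e.g. age bucket) per customer,
    ordered by model score with the most likely customer first."""
    k = max(1, round(len(ranked_values) * top_fraction))
    return shares(ranked_values[:k]), shares(ranked_values[k:])


ages = ["18-24", "25-29"] + ["60+"] * 8
top, rest = top_vs_rest(ages, top_fraction=0.2)
print(top, rest)  # {'18-24': 0.5, '25-29': 0.5} {'60+': 1.0}
```

Each group's shares sum to 100%, which matches the slide's series: each set of nine age percentages adds up to roughly 100%.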
  • 38. Credit Card Acquisition: Important Features (panel E)
    Strategy for features: iterative refinement of the filtered features.
    Word clouds for regular and platinum credit card acquisitions (larger = more important; blue = significant to only one model). Anonymized features shown: f_53, f_348, f_127, f_218, f_31, f_38, f_43, f_8, f_857, f_842, f_12, f_15, f_457, f_458.
    DA - Arjen:
    • “I gain confidence that the models generate meaningful results.”
    • “I can easily troubleshoot issues with the model recipe.”
    • “I learn more about our customers.”
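The slide marks features significant to only one model in blue. Assuming each model exposes an importance score per feature (as Spark's tree ensembles do through `featureImportances`), the shared versus model-specific split can be sketched as a set comparison of the top-k features. The feature names and scores below are hypothetical.

```python
from typing import Dict, Set


def top_k(importances: Dict[str, float], k: int) -> Set[str]:
    """Names of the k most important features (ties broken by name)."""
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def compare_important_features(
    imp_a: Dict[str, float], imp_b: Dict[str, float], k: int = 10
) -> Dict[str, Set[str]]:
    """Split the two models' top-k features into shared and model-specific sets."""
    a, b = top_k(imp_a, k), top_k(imp_b, k)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


regular = {"f_1": 0.5, "f_2": 0.3, "f_3": 0.1, "f_4": 0.05}
platinum = {"f_1": 0.4, "f_3": 0.35, "f_5": 0.2, "f_2": 0.01}
print(compare_important_features(regular, platinum, k=2))
# {'shared': {'f_1'}, 'only_a': {'f_2'}, 'only_b': {'f_3'}}
```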
  • 39. Customer Segmentation (panel F)
    Grouping customers together based on the model’s important features.
    • Segment size: indicates the number of customers.
    • Segment color: average conversion (more yellow = higher conversion).
    • X / Y axes: have no direct meaning, but the distance between segments reflects how different their customers are on the important features (closer segments = more similar customers).
    Allows for further analysis of customer segments.
    CJE - Christina: “This helps me understand who my customers are and tailor a message for each type of customer.”
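The slides do not say which clustering method is used. As an assumption, here is a plain k-means sketch over the important features, followed by the two quantities the chart encodes: segment size and average conversion. Initialising centroids from the first k customers is a simplification to keep the example deterministic; in practice a k-means++ style initialisation would be preferred (Spark's KMeans defaults to its parallel variant, k-means||).

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, n_iter: int = 20) -> List[int]:
    """Assign each customer (a point of important-feature values) to one of k segments."""
    centroids = [list(p) for p in points[:k]]  # simplified deterministic initialisation
    assign = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def segment_summary(
    assign: List[int], converted: List[int], k: int
) -> List[Tuple[int, int, float]]:
    """(segment, size, average conversion) per segment: the chart's size and colour."""
    summary = []
    for j in range(k):
        conv = [c for a, c in zip(assign, converted) if a == j]
        summary.append((j, len(conv), sum(conv) / len(conv) if conv else 0.0))
    return summary


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
converted = [0, 1, 0, 1, 1, 1]
assign = kmeans(points, k=2)
print(assign)  # [0, 1, 0, 0, 1, 1]
print(segment_summary(assign, converted, k=2))
```

Placing segments on the X / Y plane would need an extra 2-D embedding step, which this sketch leaves out.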
  • 40. Credit Card Acquisition: Which proposal to whom? (panel G)
    Combined ranking for both credit card acquisition models.
    X-axis: customers ranked by interest in a regular credit card (left = most interested). Y-axis: customers ranked by interest in a platinum credit card (down = most interested). Color: number of customers in each shared percentile (log scale; brighter = more customers). Rectangles mark the top 10% of customers in each group.
    Customers per quadrant:
    • Top 10% Regular & Top 10% Platinum: 232k
    • Top 10% Regular & Bottom 90% Platinum: 347k
    • Bottom 90% Regular & Top 10% Platinum: 412k
    • Bottom 90% Regular & Bottom 90% Platinum: 4.9M
    DA - Arjen: “I can now send the relevant offer to the relevant customers and avoid spamming.”
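The quadrant counts can be sketched from the two models' rankings alone. The slide's heat map works at the finer level of shared percentiles; this sketch reduces it to the four top-10% / bottom-90% quadrants, and the customer ids are hypothetical.

```python
from collections import Counter
from typing import Sequence, Set


def quadrant_counts(
    ranked_regular: Sequence[str], ranked_platinum: Sequence[str], top_fraction: float = 0.1
) -> Counter:
    """Count customers by (in top regular?, in top platinum?).

    Each input lists customer ids ordered by that model's score, most interested first.
    """
    def top(ranked: Sequence[str]) -> Set[str]:
        return set(ranked[: max(1, round(len(ranked) * top_fraction))])

    top_regular, top_platinum = top(ranked_regular), top(ranked_platinum)
    counts: Counter = Counter()
    for customer in set(ranked_regular) & set(ranked_platinum):
        counts[(customer in top_regular, customer in top_platinum)] += 1
    return counts


regular = list("abcdefghij")
platinum = ["a", "c", "b", "d", "e", "f", "g", "h", "i", "j"]
counts = quadrant_counts(regular, platinum, top_fraction=0.2)
print(counts[(True, True)], counts[(True, False)], counts[(False, True)], counts[(False, False)])
# 1 1 1 7
```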
  • 41. What to expect from the top-scored customers next month?
    Chart: expected next-month conversion rate for each percentile (y-axis: 0% to 3.5%); x-axis: customer percentile as ranked by the model (lower = more relevant).
    Expected conversion by percentile cut-off (percentile’s conversion rate / total conversion rate up to that percentile / expected total conversions):
    • 5: 1.81% / 2.25% / 1,404
    • 10: 1.37% / 1.88% / 2,351
    • 20: 0.95% / 1.50% / 3,737
    • 50: 0.48% / 1.00% / 6,268
    • 100: 0.09% / 0.65% / 8,103
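Assuming equal-sized percentiles, the "total" columns follow from the per-percentile rates by accumulation. This is consistent with the table: dividing expected conversions by the total rate gives a population that doubles with the cut-off (1,404 / 2.25% ≈ 62k for the top 5%, 2,351 / 1.88% ≈ 125k for the top 10%). The sketch below uses made-up rates, not the slide's data.

```python
from typing import List, Sequence, Tuple


def cumulative_expectation(
    rates: Sequence[float], customers_per_percentile: int
) -> List[Tuple[int, float, float]]:
    """rates[i] is the expected conversion rate of percentile i + 1 (most relevant first).

    Returns (percentile, expected conversion rate over the top percentiles,
    expected number of conversions over the top percentiles).
    """
    rows = []
    expected = 0.0
    for p, rate in enumerate(rates, start=1):
        expected += rate * customers_per_percentile
        rows.append((p, expected / (p * customers_per_percentile), expected))
    return rows


for p, rate, count in cumulative_expectation([0.03, 0.02, 0.01], 1000):
    print(p, f"{rate:.2%}", round(count))
```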
  • 43. Summary
    • Enabling model creation using data scientists’ best practices and cumulative efforts.
    • Simple specification, modular design.
    • Accelerates DAs, empowers CJEs, and makes all of us more relevant to our customers.
    Selected resources:
    • Driving innovation: ING PACE (evidence-based, design-driven lean approach).
    • Model building: Uber’s Ludwig (building models without coding); Airbnb’s Bighead; Georgian Partners’ AutoML comparison; Creating a Custom Cross-Validation Function in PySpark; distributed deep learning on Spark: dist-keras.
    • Machine learning in production: What’s your ML test score? A rubric for ML production systems; MLflow: machine learning lifecycle.
    • Fairness & bias removal: Google’s “Attacking Discrimination in ML”; IBM’s AI Fairness 360.
    Model Factory @ING Bank · Linkedin.com/in/KedemDor · Dor.Kedem (at) ing.com