Smart data for a predictive bank
Alex Buijsman, Lead Architect Data Lake
Bart Buter, Head of Data Engineering – International Advanced Analytics
Hadoop Summit Europe 2016
Dublin • April 14th 2016
How to eat an elephant?
European countries with ING presence
Countries with IAA projects
Who are we?
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
The world of ING – Data Driven Since 1881
Customers: 34 million private, corporate and institutional customers
Countries: more than 40, in Europe, Asia, Australia, North and South America
Employees: 52,000
In an environment that is accelerating
71% would rather go to the dentist than listen to what banks are saying.
IT today is at the forefront of putting new technology to work for customers
Role of IT in the past
Assisted: customers interacted with branch employees to use their accounts and products
Overnight batch processing: the majority of banking transactions were executed overnight
Local branch open from 09:00 to 17:00: IT supported the local branches and call centres from 09:00 to 17:00
Bi-weekly reports on transactions: the majority of insight into the financial situation was provided by (bi-)weekly backward-looking statements on the accounts
Big programs: IT was delivered in big programs for which the objectives were specified at the start
Role of IT today
Self-service: IT enables customers to interact with their accounts and products themselves
Real-time: customers have a real-time experience in transactions and solution selling
Mobile and internet: the majority of our customers can bank through the internet and mobile, anywhere and anytime
Financial fit: IT enables advanced predictive analytics for forward-looking scenario analysis on wealth management, including investments, pension, savings, cash management, etc.
Monthly releases based on customer feedback: backlog management is incremental and increasingly based on the voice of the customer
Therefore a next-generation Digital Bank must have Smart Data
Data becomes the key
• Data is no longer something that is locked in operational systems, but must be governed across all systems
• Data is the basis for creating an omnichannel experience for the customer, turning the Bank into a real digital bank the customer can do business with 24/7 and via any channel
• Data is the key to proving that the Bank knows its customers and to offering relevant products and services. Therefore, analytics needs to transform from a prescriptive use of data into a predictive use of data
• Websites need to become personalized and fully integrated with the digital channels on mobile devices
• Being in control of our data is key to maintaining our customers' trust, and regulators demand proof that the Bank is in control of its data
=> To be able to realise all of this, all data needs to be centrally governed
ING’s response – The Data Lake Architecture
The Data Lake is the "memory" of the Bank, holding all data relevant for reporting, advanced analytics and data exchange
To find out we started a community
• We needed to learn
• Experiment together
• Find the correct answer
• And position Hadoop
[Figure: ING Countries and Community Members at 2015Q1, 2015Q2 & Q3, and 2016Q1]
Collaboration taught us a lot
• Which gave us a common ground
• So we needed to compare
• So guidance is important
• So we had to harmonize to understand it
• But not good enough
• So we had to find common patterns
• Standardizing building blocks is needed
• Which is hard work and requires traveling
• We were doing similar things
• But not in a similar way
• And not all for the same reasons
• Language is key
• Knowledge sharing is important
• Value drives collaboration
• The devil is in the details
• So detailed collaboration is required
We found the following patterns
File Storage: using Hadoop to store files
Deep Data: Hadoop as data preparation environment
Analytic Hadoop: Hadoop as exploration environment
Real Time: Hadoop as part of real-time response systems (still under discussion)
That fit very well
File Storage
User Access: Non-personal & admin
Resources: Cold data, high volume
Integrity & Availability: High
Change: Slow
Usage: Predictable, batch
Responsiveness: Predictable
Deep Data
User Access: Non-personal & admin
Resources: Hot data, medium volume
Integrity & Availability: High
Change: Slow
Usage: Predictable, batch
Responsiveness: Predictable
Analytics
User Access: Personal
Resources: Hot data, volume based on use case
Integrity & Availability: Low
Change: High
Usage: Ad hoc
Responsiveness: Faster is better
Real Time
User Access: Non-personal & admin
Resources: Hot data, application specific
Integrity & Availability: High
Change: Application specific
Usage: Application specific
Responsiveness: High
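The four pattern profiles above can be put side by side in a small lookup structure. This is only an illustrative sketch, not something shown in the talk: the names HadoopPattern, PATTERNS and patterns_with are hypothetical, and the attribute values are copied from the slides as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HadoopPattern:
    """One usage pattern, described by the six characteristics on the slides."""
    name: str
    user_access: str
    resources: str
    integrity_availability: str
    change: str
    usage: str
    responsiveness: str


# Values transcribed from the pattern slides (hypothetical structure).
# The Deep Data slide literally reads "Predictability" for responsiveness.
PATTERNS = [
    HadoopPattern("File Storage", "Non personal & admin", "Cold data, high volume",
                  "High", "Slow", "Predictable, batch", "Predictable"),
    HadoopPattern("Deep Data", "Non personal & admin", "Hot data, medium volume",
                  "High", "Slow", "Predictable, batch", "Predictable"),
    HadoopPattern("Analytics", "Personal", "Hot data, volume based on use case",
                  "Low", "High", "Ad hoc", "Faster is better"),
    HadoopPattern("Real Time", "Non personal & admin", "Hot data, application specific",
                  "High", "Application specific", "Application specific", "High"),
]


def patterns_with(**criteria):
    """Return the names of patterns whose attributes equal all given criteria."""
    return [
        p.name
        for p in PATTERNS
        if all(getattr(p, key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # Which patterns allow personal user access?
    print(patterns_with(user_access="Personal"))
    # Which patterns need high integrity and availability?
    print(patterns_with(integrity_availability="High"))
```

Laid out this way, the Analytics pattern stands out as the only one with personal access, low integrity and availability requirements, and a high rate of change.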
Synergy
The Analytic pattern is especially difficult
The data needs additional protection because of its sensitivity
Challenges
• Your bug will be fixed in the next version, and your next version will contain new features with new bugs
• Do we invest in a workaround or will we wait?
• Keeping employees motivated to work with new technology and its related problems
• Finding qualified personnel
• Train and collaborate internally
• Hadoop is a complex environment
• Many things can and will go wrong
• Building "one" cluster for all your needs will lead to an overly complex system
• Hadoop impacts many IT layers in the organization
  • Data center, virtualization & OS
  • Authentication and authorization
  • Application management
Organizational learnings
• Personal commitment is essential, but difficult in a decentralised organisation
• Collaboration requires frequent travel, and therefore management commitment
• Hadoop and the surrounding tools are not yet enterprise-ready in all aspects. Setting up and maintaining environments costs a lot of time and effort
• Finding patterns in big data sets requires a paradigm shift, which takes time
• Hortonworks as a partner is also still learning
Future steps
• Provision Hadoop patterns as a service in the ING private cloud
• The community agreed to deliver the standard building blocks for security, setup, LDAP integration and scheduling
• Define how to put analytical models and scripts into production
• Embed the analytical pattern in clear governance to guarantee compliance with the very strict EU privacy legislation
• The Apache Atlas project is key, and ING expects all vendors to make it successful
Follow us to stay a step ahead
ING.com
YouTube.com/ING
SlideShare.net/ING
@ING_News
LinkedIn.com/company/ING
Flickr.com/INGGroup
Facebook.com/ING
Image attributions
How to eat an Elephant: Illustration by Sean Gallo www.seangallo.com - used with permission.
Images of presenters by the presenters – used with permission.
Darwin's finches or Galapagos finches: Darwin, 1845. Journal of researches into the natural history and geology of the countries visited during the voyage of H.M.S. Beagle round the world, under the Command of Capt. Fitz Roy, R.N. 2d edition. – public domain
Fintechs: Venture Scanner insights.venturescanner.com
Tablet and smartphone: ING OIB Image Bank
Data Lake: ING
Big Data: ING OIB Image Bank
Running man: S.R.Kooistra – used with permission
Collaboration: ING OIB Image Bank
Stopwatch: by Julian Lim - licensed under CC BY 2.0
