Fast Data at ING – the why, what and how of the streaming analytics platform in a financial enterprise

Presented at the Spark/Flink meetup during the Dutch Data Science Week 2017

Published in: Data & Analytics

  1. Streaming Analytics at ING
     Bas Geerdink, 8 June 2017
  2. Bas Geerdink
     • IT Manager at ING Analytics
     • Academic background: A.I. & Informatics
     • Working in IT since 2004
     • At ING since 2013
     • @bgeerdink, https://nl.linkedin.com/in/geerdink
  3. Strategy
  4. ING is a top financial enterprise, operating since 1881
     • Customers: 33 million private, corporate and institutional customers
     • Countries: 41, in Europe, Asia, Australia, North and South America
     • Employees: 52,000
     • Segments: Market Leaders (Benelux), Growth Markets, Commercial Banking, Challengers
  5. ING’s Think Forward strategy requires a data-driven approach
  6. We’re making a shift from batch to real-time
     • Secure and reliable
     • Relevant
     • Personal
     • Omnichannel
     • Predictive
     • Actionable
  7. We’re building a Streaming Analytics platform with high-end capabilities
     • High performance and low latency to manage and process users’ events
     • Scalability and availability to build new features, scenarios and integrations with external systems
     • Fault-tolerance mechanisms to ensure high-quality output with strong information consistency
     • Online machine learning and business rules management capabilities
  8. Main use cases for a Streaming Analytics Platform
     • Fraud detection:
       • Detect possibly fraudulent transactions
       • Identify possible money-laundering accounts
     • Actionable insights:
       • Small financial insights and robotized advice
       • Customer-defined alerts and push notifications
       • Offer chat/call when customers don’t finish their customer journey
       • Marketing and commercial offerings (‘Next Best Action’)
  9. Future scenarios
     • Wisdom of the crowd: making use of similar customers to make better predictions
     • Online learning: making use of customer feedback in real time to train models
     • Beyond banking: enabling IoT
  10. Streaming Data Platform
  11. The Streaming Data Platform has to be probabilistic in nature
      • Deterministic:
        • Business rules / decision trees
        • Fixed logic (if-then-else)
        • Rigid
      • Probabilistic:
        • Machine learning
        • Fuzzy logic
        • Flexible
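The contrast on this slide can be illustrated with a minimal sketch. The function names, the features (`balance`, `days_until_salary`), the threshold and the weights below are all hypothetical and chosen only for illustration; they are not taken from the ING platform.

```python
import math


def deterministic_rule(balance: float, threshold: float = 100.0) -> bool:
    """Fixed if-then-else logic: fires exactly when the balance is below the threshold."""
    return balance < threshold


def probabilistic_score(balance: float, days_until_salary: float,
                        weights=(-0.01, 0.2), bias: float = 0.0) -> float:
    """Logistic score between 0 and 1. The weights are made up, not a trained model."""
    z = bias + weights[0] * balance + weights[1] * days_until_salary
    # Numerically stable sigmoid: avoid exponentiating a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The rule gives a hard yes/no at one cut-off, whereas the score changes gradually with the inputs and can be re-weighted by retraining instead of by rewriting logic.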
  12. Streaming Data Platform Architecture
      • Business flow: “detect pattern”, “determine relevancy”, “produce notification”
      • CEP Engine (“data gatherer”): loads the stream of raw events and combines/aggregates them into a meaningful business event, e.g. multiple transactions within one hour lead to detection of an ‘out of money’ pattern
      • Machine Learning Engine (“event scorer”): scores PMML models to determine the personal and relevant follow-up actions, e.g. given that a customer is running out of money, suggest transferring money from the savings account to the checking account
      • Post-Processor (“data generator”): formats outgoing messages, e.g. creates a push message in HTML format
      • Application flow steps: Get Raw Events, Create Intermediate Event, Get Intermediate Event, Detect Pattern, Create Business Event, Get Business Event, Get Customer Profile, Apply Selection Criteria, Score Notifications, Format Event, Send Output
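The “data gatherer” example (multiple transactions within one hour leading to an ‘out of money’ pattern) could look roughly like the sketch below. It is plain Python rather than a CEP engine, and the event classes, the `min_count` of three debits and the assumption that events arrive in timestamp order per customer are all illustrative choices, not details from the slides.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator


@dataclass(frozen=True)
class RawEvent:
    customer_id: str
    timestamp: float  # event time in seconds
    amount: float     # negative values are debits


@dataclass(frozen=True)
class BusinessEvent:
    customer_id: str
    pattern: str
    timestamp: float
    transaction_count: int


def detect_out_of_money(events: Iterable[RawEvent],
                        window_seconds: float = 3600.0,
                        min_count: int = 3) -> Iterator[BusinessEvent]:
    """Emit a business event when a customer has min_count debits within the window."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    for event in events:
        if event.amount >= 0:
            continue  # only debits count towards the pattern
        window = recent[event.customer_id]
        window.append(event.timestamp)
        # Drop debits that are one window length old or older.
        while window and window[0] <= event.timestamp - window_seconds:
            window.popleft()
        if len(window) >= min_count:
            yield BusinessEvent(event.customer_id, "out_of_money",
                                event.timestamp, len(window))
```

Note that this sketch fires again on every further debit while the window stays full; a real deployment would probably suppress repeats.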
  13. Streaming Data Platform Architecture
      • Business flow: “detect pattern” (CEP Engine), “determine relevancy” (Machine Learning Engine), “produce notification” (Post-Processor)
      • Application flow steps: Get Raw Events, Create Intermediate Event, Get Intermediate Event, Detect Pattern, Create Business Event, Get Business Event, Get Customer Profile, Apply Selection Criteria, Score Notifications, Format Event, Send Output
      • Kafka events: Raw Event, Intermediate Event, Business Event, Notification Event
      • State: Customer Features, Notification Definitions, Models, CEP Functions
      • System configuration: Business Users (Configuration GUI, future scope), Data Scientists (Machine Learning Environment)
      • Other diagram labels: Data Lake, feedback loop, heap, openscoring.io
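The middle and last stages of this diagram (Get Customer Profile, Apply Selection Criteria, Score Notifications, Format Event) might be sketched as follows. Everything here is an assumption made for illustration: the dictionary layouts, the `requires` and `model` callables standing in for selection criteria and PMML models, and the `min_score` cut-off. Kafka and the real state stores are replaced by in-memory values.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-ins for the "Customer Features" and
# "Notification Definitions" state shown in the diagram.
customer_features: Dict[str, dict] = {
    "c1": {"savings_balance": 2500.0, "push_enabled": True},
    "c2": {"savings_balance": 0.0, "push_enabled": True},
}

notification_definitions: List[dict] = [
    {
        "id": "transfer_from_savings",
        "pattern": "out_of_money",
        # Selection criterion: only customers who can be helped this way.
        "requires": lambda f: f["push_enabled"] and f["savings_balance"] > 0,
        # Placeholder for a scored model; returns a relevancy between 0 and 1.
        "model": lambda f: min(1.0, f["savings_balance"] / 5000.0),
    },
]


def determine_relevancy(business_event: dict, features_store: Dict[str, dict],
                        definitions: List[dict],
                        min_score: float = 0.3) -> Optional[dict]:
    """Pick the best-scoring notification for a business event, if any qualifies."""
    features = features_store.get(business_event["customer_id"])
    if features is None:
        return None
    scored = [
        (d["model"](features), d["id"])
        for d in definitions
        if d["pattern"] == business_event["pattern"] and d["requires"](features)
    ]
    scored = [s for s in scored if s[0] >= min_score]
    if not scored:
        return None
    score, notification_id = max(scored)
    return {"customer_id": business_event["customer_id"],
            "notification_id": notification_id, "score": score}


def produce_notification(notification_event: dict) -> str:
    """Format the outgoing message as a small HTML fragment."""
    return (f"<p>Suggestion {notification_event['notification_id']} "
            f"for customer {notification_event['customer_id']}</p>")
```

Keeping definitions and models as data, separate from the flow, is what would let business users and data scientists change behaviour without redeploying the application.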
  14. Technology
  15. What is Flink, and why use it?
      • Flink:
        • originated as the research project “Stratosphere” at Technische Universität Berlin
        • is an open source framework for distributed, in-memory (big) data analytics
        • contains Java, Scala, and Python APIs for streams (DataStream), batch (DataSet), and relational data (Table API)
      • Benefits:
        • true streaming: per-event processing, no micro-batching
        • high volumes, low latency
      • Features:
        • state management and fault tolerance: savepointing, checkpointing → exactly-once semantics
        • time windowing: event time, flexible (e.g. sum all transactions in the past minute)
        • Complex Event Processing, Machine Learning, SQL (at various levels of maturity)
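To make the event-time windowing idea concrete, here is a small plain-Python sketch. It does not use the Flink API; it reads “sum all transactions in the past minute” as a one-minute tumbling window per customer, which is only one possible interpretation, and it ignores watermarks and late-data handling. The tuple layout of the events is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_sums(events: Iterable[Tuple[str, float, float]],
                         window_seconds: int = 60) -> Dict[Tuple[str, int], float]:
    """Sum amounts per (customer, window start), where each event is
    (customer_id, event_time_seconds, amount).

    The window is chosen from the event's own timestamp, so an event that
    arrives out of order still lands in the window it belongs to.
    """
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    for customer_id, event_time, amount in events:
        window_start = int(event_time // window_seconds) * window_seconds
        sums[(customer_id, window_start)] += amount
    return dict(sums)
```

For example, `tumbling_window_sums([("c1", 5.0, 10.0), ("c1", 61.0, 2.0), ("c1", 30.0, 1.0)])` assigns the late third event to the first window because assignment depends on event time, not arrival order.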
  16. PMML (Predictive Model Markup Language) bridges the gap between data scientists and data engineers
      • PMML is the de facto standard for model representation
      • A PMML file contains all pre- and post-processing as well as one or more machine learning models
      • An algorithm (configuration) can be hot-replaced at runtime by providing an updated PMML model and interpreting (scoring) it with openscoring.io
      • PMML models can be created by a large number of tools, including Spark, SAS, R, KNIME, Python and H2O.ai
      • A large number of machine learning models is supported:
        • Simple business rules: rulesets, score cards
        • Complex business rules: sequences, association rules
        • Predictive models: Naive Bayes; k-Nearest Neighbors; regression (linear, logistic, regularized); trees (decision trees, random forest); Support Vector Machines
        • Advanced models: Bayesian Network, Neural Network (Deep Learning), Gaussian Process
        • Other: Time Series, Cluster Models
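The hot-replacement idea can be sketched with a toy reader for a single PMML element type. This is not openscoring.io and not a compliant PMML evaluator: it only reads the intercept and numeric coefficients of one `RegressionTable` and ignores everything else the standard allows. The model contents, field names and the `HotSwappableModel` class are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

PMML_NS = {"pmml": "http://www.dmg.org/PMML-4_3"}

# A hand-written, minimal PMML document used only for this illustration.
PMML_V1 = """
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary>
    <DataField name="balance" optype="continuous" dataType="double"/>
    <DataField name="days_until_salary" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="balance"/>
      <MiningField name="days_until_salary"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="balance" coefficient="-0.001"/>
      <NumericPredictor name="days_until_salary" coefficient="0.02"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
""".strip()


def load_linear_model(pmml_text: str) -> Callable[[Dict[str, float]], float]:
    """Build a scoring function from the first RegressionTable in the document."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", PMML_NS)
    if table is None:
        raise ValueError("no RegressionTable found")
    intercept = float(table.get("intercept"))
    coefficients = {p.get("name"): float(p.get("coefficient"))
                    for p in table.findall("pmml:NumericPredictor", PMML_NS)}

    def score(features: Dict[str, float]) -> float:
        return intercept + sum(c * features[name] for name, c in coefficients.items())

    return score


class HotSwappableModel:
    """Holds the current scorer and lets it be replaced while the job keeps running."""

    def __init__(self, pmml_text: str) -> None:
        self._score = load_linear_model(pmml_text)

    def replace(self, pmml_text: str) -> None:
        # Parse first, so a broken file leaves the old model in place.
        new_score = load_linear_model(pmml_text)
        self._score = new_score

    def score(self, features: Dict[str, float]) -> float:
        return self._score(features)
```

Because the model is data rather than code, a data scientist can hand over a new file and the engineering side only has to call `replace`.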
  17. Development & operations according to the ING way of working
      • Cross-functional squads: business, dev, ops, data scientist
      • Reusing ING standard building blocks and processes
      • Continuous integration & delivery in a DTAP environment
        • Develop: GitLab → Jenkins → Artifactory
      • Using Flink’s savepointing mechanism
        • No data loss, guaranteed
        • Limited downtime
      • Scalability, resilience and robustness
        • Flink gives us automatic restart on failure, using its checkpointing mechanism
        • Rolling updates of infrastructure and middleware lead to no downtime
      • Lifecycle: Design → Develop → Test → Deploy → Monitor
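The restart-without-data-loss idea behind checkpointing can be shown with a toy, single-process sketch. It is not Flink’s mechanism or API: the replayable list stands in for a log such as a Kafka topic, and the class name, the `fail_at` crash switch and the checkpoint interval are invented for the illustration.

```python
import copy
from typing import Dict, List, Optional, Tuple


class CheckpointedSum:
    """Keeps a running total per key and snapshots (offset, state) periodically."""

    def __init__(self, checkpoint_every: int = 2) -> None:
        self.state: Dict[str, float] = {}
        self.offset = 0
        self.checkpoint_every = checkpoint_every
        self._checkpoint: Tuple[int, Dict[str, float]] = (0, {})

    def process(self, log: List[Tuple[str, float]],
                fail_at: Optional[int] = None) -> None:
        """Consume the replayable log from the current offset onwards."""
        while self.offset < len(log):
            if fail_at is not None and self.offset == fail_at:
                raise RuntimeError("simulated crash")
            key, amount = log[self.offset]
            self.state[key] = self.state.get(key, 0.0) + amount
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                # Snapshot the offset and the state together.
                self._checkpoint = (self.offset, copy.deepcopy(self.state))

    def restore(self) -> None:
        """Roll back to the last checkpoint; later events are replayed from the log."""
        self.offset, state = self._checkpoint
        self.state = copy.deepcopy(state)
```

Because state and input position are rolled back together, events processed after the last checkpoint are replayed exactly once into the restored state, so the final totals match an uninterrupted run.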
  18. The best of both worlds – Functional and Modular Flexibility
      • The application, in binary form:
        • provides a framework for its business users
        • gives just enough configurability to be effective but not overcomplex
        • gives flexibility in the form of a DSL and GUI (future scope)
      • The application’s internal components (the code, the artifacts):
        • provide a toolbox for developers
        • contain clearly defined building blocks, for maximum reusability
        • are open source within ING
  19. In ING’s one way of working, ‘business’ and ‘IT’ go hand-in-hand
      • Organisational units in the diagram: Tribe, Squad, Chapter, Guild
      • Roles in the diagram: DS, DA, Ops, CJE, Dev
  20. Conclusions
  21. Why is Streaming Analytics with Flink the right choice for ING, moving forward?
      • Bigger streaming data volumes to handle, with higher performance demands
      • Enterprise-readiness and security of Flink are continuously improving
      • With PMML we have a standard format that is very well suited for multidisciplinary teams, where data scientists work on the same user stories as engineers
      • We set up a streaming analytics community within ING, and work together with industry leaders (Lightbend, DataArtisans)
      • ING is building its streaming analytics platform as a generic, multi-purpose solution, with fraud detection and actionable insights as its first use cases
  22. Questions
  23. Follow us to stay a step ahead. We’re hiring!
      • ING.com
      • YouTube.com/ING
      • SlideShare.net/ING
      • @ING_News
      • LinkedIn.com/company/ING
      • Flickr.com/INGGroup
      • Facebook.com/ING
