Pipelining the Heroes
with Kafka and Graph
Will Bleker
Gary Stewart
Amsterdam, 12 November 2018
2
Graph - Nodes and Relationships
3
Graph - Add properties
4
About us
5
EMPLOYED_BY
EMPLOYED_BY
Role: Chapter Lead
Role: Platform Architect
Gary Stewart
Will Bleker
Market leaders Benelux
Growth markets
Commercial Banking
Challengers
6
About ING
Over 40 countries
52,000+ employees
Concepts
7
• Concepts
• Pipeline
• Use case
• Learnings from NoSQL thinking
A little history …
8
Pets
• Unique and indispensable
• Often long lived
• Scale-up
• Active-passive architecture, pairs of servers, etc
Cattle
• Disposable, one of a herd
• Often short lived
• Scale-out
• Active-active architecture
• ‘Cattle’ datastores include
• Apache Cassandra
• Apache Kafka
Databases are much like the transport industry
• Undergone massive transformations
• Various types suitable for different needs
• Trains
• Relatively expensive
• Built on-demand
• Longer lifespan
• Cars
• Manual to mass production
• Almost consumable
• Often cheaper to rebuild, recycle than repair
Our twist – trains versus cars
9
Our journey to NoSQL
10
A few years back we started with Cassandra
• Exotic key/value store (column family)
• Distributed active-active architecture
• Tuneable consistency
• Paradigm shift
• Nights and weekends are now free (including LCM)
• However now a liquid expectation!
RDBMS NoSQL
Adding graph to the landscape
11
Gap in our DB landscape - connected data
• Another NoSQL database!
Interesting observation
• Key/Value: easy to mess up a partition
• RDBMS: easy to mess up a table
• Graph: easy to mess up the database!
?
RDBMS NoSQL Graph
Typical architecture for caching data
12
SOR
Challenges include
• Initial load is usually once-off or manual
• Lifecycle management (LCM) is hard and per component
• Resilience patterns need to be applied everywhere
• Cross DC resilience pattern also required
Real-Time
Sync
Batch
Loader
Cache
Cache
Cache ‘Cattle’ Paradigm
13
SOR
Real-Time
Sync
Single
Materialized View
Commit Log
Challenges, revisited
• Initial load: integrated by design
• Lifecycle management (LCM): simplified
• Resilience: simplified
• Cross DC: simplified
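Why initial load becomes "integrated by design" can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline from the talk: the record shape (a key with a value, where None marks a deletion) and the node names are assumptions made for the example. The point is that a brand-new view and a running view use the same function; the only difference is where in the log they start reading.

```python
from typing import Dict, Iterable, Optional, Tuple

# A log record is (key, value); a value of None marks a deletion.
# This record shape is an assumption of the sketch, not taken from the slides.
Record = Tuple[str, Optional[str]]


def apply_record(view: Dict[str, str], record: Record) -> None:
    """Apply one commit-log record to the materialized view."""
    key, value = record
    if value is None:
        view.pop(key, None)  # deleting a missing key is harmless
    else:
        view[key] = value


def build_view(log: Iterable[Record]) -> Dict[str, str]:
    """Initial load: replay the log from its first record into an empty view."""
    view: Dict[str, str] = {}
    for record in log:
        apply_record(view, record)
    return view


commit_log = [
    ("node-1", "up"),
    ("node-2", "up"),
    ("node-1", "down"),
    ("node-2", None),
]

view = build_view(commit_log)         # a new view replays everything
apply_record(view, ("node-3", "up"))  # real-time sync reuses the same function
print(view)
```

Because there is no separate batch loader, there is also no second code path to keep in step with the real-time one.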
Batch
Loader
Cache ‘Cattle’ Paradigm – Taking it further
14
SOR
Real-Time
Sync
Single
Materialized View
Commit Log
E.g. Apache Kafka supports compacted topics (log compaction)
• Ideal for use cases that rebuild a cache
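The effect of compaction on a rebuild can be simulated without a broker. The sketch below is a simplified model in plain Python, with made-up keys and values: it keeps only the latest record per key and shows that replaying the compacted log gives the same view as replaying the full one. In Kafka itself compaction is enabled per topic with `cleanup.policy=compact`, runs in the background, and eventually removes tombstones as well, none of which is modelled here.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); None stands in for a tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep only the most recent record per key, preserving log order."""
    last_index: Dict[str, int] = {}
    for index, (key, _value) in enumerate(log):
        last_index[key] = index
    return [record for index, record in enumerate(log)
            if last_index[record[0]] == index]


def replay(log: List[Record]) -> Dict[str, str]:
    """Rebuild the cache by applying every record in order."""
    view: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            view.pop(key, None)
        else:
            view[key] = value
    return view


full_log = [("a", "1"), ("b", "1"), ("a", "2"), ("b", None), ("c", "1"), ("a", "3")]
compacted = compact(full_log)

print(len(full_log), "records before,", len(compacted), "after")
print(replay(compacted) == replay(full_log))
```

The rebuild reads fewer records but ends in the same state, which is what makes a compacted topic a practical source for a disposable cache.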
Batch
Loader
Cache ‘Cattle’ Paradigm – Going all the way
15
Real-Time
Sync
Single
Materialized View
Commit Log
Batch
Loader
SOR
Pipeline
16
• Concepts
• Pipeline
• Use case
• Learnings from NoSQL thinking
Pipelining the ‘Materialized View’
17
Real-Time
Sync
Batch
Loader
Materialized View
SOR
Block ports
Get stuff
• Artifacts
Pre Processing
• Convert data
• neo4j-admin import
Start Neo4j
Post processing
Create Indexes
Complex calculations
Start applications in desired order
Unblock ports
Switch - Opening for client requests
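The ordering of these steps matters more than their content, so it can be sketched as a small Python runner. The step names follow the slide; the step bodies are hypothetical placeholders that only record their name, and nothing here calls Neo4j. The one design choice shown is that a failure stops the run, so "unblock ports" is never reached and clients never see a half-built view.

```python
from typing import Callable, List, Tuple


def make_step(name: str, journal: List[str]) -> Callable[[], None]:
    """Return a placeholder step; a real one would do the work here."""
    def step() -> None:
        journal.append(name)
    return step


def run_rebuild(steps: List[Tuple[str, Callable[[], None]]]) -> bool:
    """Run the steps in order and stop at the first failure."""
    for name, step in steps:
        try:
            step()
        except Exception as error:
            print(f"step '{name}' failed: {error}")
            return False  # ports stay blocked
    return True


journal: List[str] = []
step_names = [
    "block ports", "get artifacts", "convert data", "neo4j-admin import",
    "start Neo4j", "create indexes", "complex calculations",
    "start applications", "unblock ports",
]
steps = [(name, make_step(name, journal)) for name in step_names]

ok = run_rebuild(steps)
print(ok, journal)
```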
Pipelining the ‘Materialized View’
18
Real-Time
Sync
Batch
Loader
Materialized View
SOR
Neo4j-admin import
19
Neo4j-admin import
20
IMPORT DONE in 4m 17s 923ms. Imported:
11 759 853 nodes
11 774 714 relationships
35 332 411 properties
IMPORT DONE in 1m 41s 941ms. Imported:
26 444 678 nodes
26 444 020 relationships
66 129 151 properties
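The two log excerpts are easier to compare as rates. The Python below only does arithmetic on the figures printed above; the labels "first log" and "second log" are ours, and the slides do not say what differed between the two runs, so no cause is implied.

```python
def to_seconds(minutes: int, seconds: int, millis: int) -> float:
    """Convert the 'Xm Ys Zms' duration from the import log to seconds."""
    return minutes * 60 + seconds + millis / 1000


# Figures copied from the two import logs above.
runs = {
    "first log": {"duration_s": to_seconds(4, 17, 923), "nodes": 11_759_853,
                  "relationships": 11_774_714, "properties": 35_332_411},
    "second log": {"duration_s": to_seconds(1, 41, 941), "nodes": 26_444_678,
                   "relationships": 26_444_020, "properties": 66_129_151},
}

rates = {}
for label, run in runs.items():
    rates[label] = run["nodes"] / run["duration_s"]
    print(f"{label}: {run['duration_s']:.3f} s, {rates[label]:,.0f} nodes/s")
```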
Use case
21
• Concepts
• Pipeline
• Use case
• Learnings from NoSQL thinking
• Customer Teams
• Platform Teams
• Components
Challenges
• Rapid growth
• Too many dependencies
• Too many changes
• Often converging
• Too much data (metrics)
• Heroes
• …
Change is needed!
Managing our Cassandra Platform
22
Observation
• Ops person ‘pulls’ information when needed (requires logging on)
• Full shifting left is particularly challenging for multi-tenant platforms
Pipeline everything
• We don’t value initial bursts of energy anymore
• Pipeline the heroes :-)
• Design to rebuild
‘Personal’ goal
• Learn Graph technology
Understanding the challenges ahead
23
Architecture
24
• Simple and effective
• Executes ‘Ops’ commands:
• ps -ef
• netstat
• cat config.json
• Raw output sent as message
• All parsing rules for raw messages
• Store in databases
• Complex rules and analysis
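The split of responsibilities above, a producer that knows nothing about parsing, can be sketched as follows. This is an illustration under assumptions: the envelope field names (`host`, `command`, `collected_at`, `raw`) and the sample output are invented, and publishing to Kafka is left out so that no client library has to be assumed.

```python
import json
import shlex
import subprocess
import time


def run_command(command: str) -> str:
    """Run an 'Ops' command and return its raw standard output, unparsed."""
    result = subprocess.run(shlex.split(command), capture_output=True,
                            text=True, check=False)
    return result.stdout


def build_message(host: str, command: str, raw_output: str) -> str:
    """Wrap raw output in a JSON envelope; the field names are hypothetical."""
    return json.dumps({
        "host": host,
        "command": command,
        "collected_at": int(time.time()),
        "raw": raw_output,
    })


# On a real host the raw text would come from run_command("ps -ef");
# a fixed sample keeps the example reproducible.
sample_output = "UID PID CMD\nroot 1 /sbin/init\n"
message = build_message("node-1", "ps -ef", sample_output)
print(message)
```

Keeping the producer this thin means a change in parsing rules never requires redeploying anything on the monitored hosts; only the consumer changes, and the stored raw messages can be replayed.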
Architecture
Commit Log
Producer
Materialized view
Consumer
App
25
Consuming message (data pipeline)
• Parse
• Store as Key/Value
• Calculate
• Observe
• Recommend
Principles
• Compare to inventory not ‘self discovery’
• Define agreements in code
• Design to be as idempotent as possible
• No data migrations allowed (simply rebuild!)
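"Compare to inventory, not self discovery" can be made concrete with a small Python sketch. The inventory contents, property names and wording of the findings are all hypothetical. The idea shown is that the expected state is defined in code, so a node that has stopped reporting is a finding in its own right, which pure self discovery would miss.

```python
from typing import Dict, List

# Hypothetical inventory: the agreed state per node, defined in code.
INVENTORY: Dict[str, Dict[str, str]] = {
    "node-1": {"cluster": "alpha", "version": "2.0"},
    "node-2": {"cluster": "alpha", "version": "2.0"},
}


def recommend(observed: Dict[str, Dict[str, str]]) -> List[str]:
    """Compare observed state with the inventory and list the differences."""
    findings: List[str] = []
    for node, expected in sorted(INVENTORY.items()):
        actual = observed.get(node)
        if actual is None:
            findings.append(f"{node}: in inventory but not reporting")
            continue
        for key, want in sorted(expected.items()):
            got = actual.get(key)
            if got != want:
                findings.append(f"{node}: {key} is {got!r}, expected {want!r}")
    for node in sorted(set(observed) - set(INVENTORY)):
        findings.append(f"{node}: reporting but not in inventory")
    return findings


observed = {
    "node-1": {"cluster": "alpha", "version": "1.0"},
    "node-3": {"cluster": "alpha", "version": "2.0"},
}
findings = recommend(observed)
for line in findings:
    print(line)
```

The function has no side effects and sorts its inputs, so running it twice on the same data gives the same list, in line with the idempotency principle.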
Consumer - Graph Model
26
Graph has proven to be a VERY good fit for this use case!
• Already producing 50+ commands
• Heroes have a place to ‘query’
• Various unexpected findings
• Misconfigured tenants
• 100+ recommendations made
The architecture now helps us to
• Be ‘in’ control without ‘having’ control
• Remain agile
• Shift left our operational analysis
300+ cars were destroyed in our 8-month journey to learn graph!
So what happened?
27
Learnings from
NoSQL thinking
28
• Concepts
• Pipeline
• Use case
• Learnings from NoSQL thinking
Parse each message into key/value
WITH {a: "val 1", b: "val 2", c: "val 3"} AS data
CREATE (n:MyNode)
SET n += data;
Added 1 label, created 1 node, set 3 properties …
Benefits
• Easier to build insights with existing data
• ‘Raw’ data remains unchanged
• However, if parsing rules change:
Modify code
→ Simply rebuild and …
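For structured output such as `cat config.json`, "parse into key/value" mostly means flattening. The Python sketch below is one way to do that; the dotted key naming and the sample JSON are assumptions. Flattening matters because Neo4j property values are primitives or arrays of primitives, not nested maps, so a nested document cannot be handed to `SET n += data` unchanged.

```python
import json
from typing import Any, Dict


def flatten(data: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into dotted keys; lists are left untouched."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical raw message body, as a producer might have captured it.
raw = '{"cluster": "alpha", "listen": {"host": "10.0.0.1", "port": 9042}}'
properties = flatten(json.loads(raw))
print(properties)
```

The raw message itself is not modified, so a different flattening rule can be applied later simply by replaying.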
Breaking it down to Key/Value
29
30
Simple model
• (:Source)-[:HAS_MSG]->(:Message)
Lots of nodes
31
Consider traversing to reduce density
• (:Source)-[:LAST]->(:Message)-[:PREV]->(:Message)
Modify code
→ Simply rebuild and :-)
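The LAST/PREV pattern is a linked list per source. A plain-Python model, with class and field names chosen for this example, shows why it reduces density: the source holds one outgoing reference regardless of how many messages exist, and history is reached by traversal.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Message:
    body: str
    prev: Optional["Message"] = None  # stands in for the :PREV relationship


@dataclass
class Source:
    name: str
    last: Optional[Message] = None    # stands in for the :LAST relationship

    def append(self, body: str) -> Message:
        """Link the new message to the previous head, then move :LAST."""
        message = Message(body=body, prev=self.last)
        self.last = message
        return message

    def history(self) -> Iterator[Message]:
        """Walk from the newest message back to the oldest."""
        current = self.last
        while current is not None:
            yield current
            current = current.prev


source = Source("node-1")
for body in ["first", "second", "third"]:
    source.append(body)

print([message.body for message in source.history()])
```

The trade-off is that reaching an old message costs one hop per newer message, which suits a use case that mostly asks for the latest state.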
Partitioning in graph
32
33
MATCH …
OPTIONAL MATCH
WHERE …
MERGE
MATCH …
WHERE …
OPTIONAL MATCH
MERGE
Modify code
→ Simply rebuild and :-)
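The two clause orders above differ only in where WHERE sits, and in Cypher a WHERE belongs to the MATCH or OPTIONAL MATCH it directly follows. The slide elides the actual patterns, so the Python below does not reconstruct the queries; it models the general rule with invented data. A predicate attached to OPTIONAL MATCH keeps every incoming row and fills in null where nothing matches, while a predicate attached to MATCH removes rows before the optional part runs.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, Optional[str]]  # (source, optionally matched message)

# Hypothetical data: two sources and their messages.
sources = ["s1", "s2"]
messages = {"s1": ["old", "new"], "s2": ["old"]}


def where_on_optional_match(keep: Callable[[str], bool]) -> List[Row]:
    """MATCH, OPTIONAL MATCH, WHERE: the predicate limits only the optional part."""
    rows: List[Row] = []
    for source in sources:
        matched = [m for m in messages[source] if keep(m)]
        if matched:
            rows.extend((source, m) for m in matched)
        else:
            rows.append((source, None))  # the source row survives with a null
    return rows


def where_on_match(keep_source: Callable[[str], bool]) -> List[Row]:
    """MATCH, WHERE, OPTIONAL MATCH: the predicate removes whole source rows."""
    rows: List[Row] = []
    for source in sources:
        if not keep_source(source):
            continue  # filtered out before the optional part runs
        matched = messages[source]
        if matched:
            rows.extend((source, m) for m in matched)
        else:
            rows.append((source, None))
    return rows


first = where_on_optional_match(lambda m: m == "new")
second = where_on_match(lambda s: "new" in messages[s])
print(first)
print(second)
```

A MERGE placed after either form runs once per row, so the two orderings can write quite different data, one reason a rebuild from raw messages is the safer route than an in-place migration.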
Attempted to migrate data anyway
34
35
Kafka is not just a ‘messaging’ product
Actually it is a distributed commit log!
Like Graph, we had to learn Kafka too!!
What about Kafka?
36
Conclusions
37
Summary
38
Shifting left
• Requires initial extra effort
• Design for rebuild instead of migrations
• Creates a learning environment for new technology
Ensures agility
• Adding new features
• Implement learnings quickly
• Easier to experiment
Ensures control
• Easier to secure
• Data quality actually improves due to frequent rebuilds
• Safer and cheaper to build systems to not last long
Pipeline the heroes
• Create a data pipeline
• Automate data collection
• Automate analysis
Shift left your operational analysis!
In other words …
39
Thank you
Wilhelmus.Bleker@ing.com - @WBleker
Gary.Stewart@ing.com - @Gaz_GandA
We are hiring!
40
Follow us to stay a step ahead
ING.com
YouTube.com/ING
SlideShare.net/ING
@ING_News
LinkedIn.com/company/ING
Flickr.com/INGGroup
Facebook.com/ING
ING Group’s Annual Accounts are prepared in accordance with
International Financial Reporting Standards as adopted by the
European Union (‘IFRS-EU’).
In preparing the financial information in this document, the same
accounting principles are applied as in the 2014 ING Group Annual
Accounts. All figures in this document are unaudited. Small
differences are possible in the tables due to rounding.
Certain of the statements contained herein are not historical facts,
including, without limitation, certain statements made of future
expectations and other forward-looking statements that are based
on management’s current views and assumptions and involve
known and unknown risks and uncertainties that could cause actual
results, performance or events to differ materially from those
expressed or implied in such statements. Actual results,
performance or events may differ materially from those in such
statements due to, without limitation: (1) changes in general
economic conditions, in particular economic conditions in ING’s core
markets, (2) changes in performance of financial markets, including
developing markets, (3) consequences of a potential (partial) break-
up of the euro, (4) the implementation of ING’s restructuring plan to
separate banking and insurance operations, (5) changes in the
availability of, and costs associated with, sources of liquidity such as
interbank funding, as well as conditions in the credit markets
generally, including changes in borrower and counterparty
creditworthiness, (6) the frequency and severity of insured loss
events, (7) changes affecting mortality and
morbidity levels and trends, (8) changes affecting persistency levels,
(9) changes affecting interest rate levels, (10) changes affecting
currency exchange rates, (11) changes in investor, customer and
policyholder behaviour, (12) changes in general competitive factors,
(13) changes in laws and regulations, (14) changes in the policies of
governments and/or regulatory authorities, (15) conclusions with
regard to purchase accounting assumptions and methodologies,
(16) changes in ownership that could affect the future availability to
us of net operating loss, net capital and built-in loss carry forwards,
(17) changes in credit ratings, (18) ING’s ability to achieve projected
operational synergies and (19) the other risks and uncertainties
detailed in the Risk Factors section contained in the most recent
annual report of ING Groep N.V. Any forward-looking statements
made by or on behalf of ING speak only as of the date they are
made, and, ING assumes no obligation to publicly update or revise
any forward-looking statements, whether as a result of new
information or for any other reason.
This document does not constitute an offer to sell, or a solicitation
of an offer to purchase, any securities in the United States or any
other jurisdiction. The securities of NN Group have not been and will
not be registered under the U.S. Securities Act of 1933, as amended
(the “Securities Act”), and may not be offered or sold within the
United States absent registration or an applicable exemption from
the registration requirements of the Securities Act.
www.ing.com
Disclaimer
42

Pipelining the Heroes with Kafka and Graph

  • 1. Pipelining the Heroes with Kafka and Graph Will Bleker Gary Stewart Amsterdam, 12 November 2018
  • 2. 2
  • 3. Graph - Nodes and Relationships 3
  • 4. Graph - Add properties 4
  • 5. About us 5 EMPLOYED_BY EMPLOYED_BY Role: Chapter Lead Role: Platform Architect Gary Stewart Will Bleker
  • 6. Market leaders Benelux Growth markets Commercial Banking Challengers 6 About ING Over 40 countries 52,000+ employees
  • 7. Concepts 7 • Concepts • Pipeline • Use case • Learnings from NoSQL thinking
  • 8. A little history … 8 Pets • Unique and indispensable • Often long lived • Scale-up • Active-passive architecture, pairs of servers, etc Cattle • Disposable, one of a herd • Often short lived • Scale-out • Active-active architecture • ‘Cattle’ datastores include • Apache Cassandra • Apache Kafka
  • 9. Databases are much like the transport industry • Undergone massive transformations • Various types suitable for different needs • Trains • Relatively expensive • Built on-demand • Longer lifespan • Cars • Manual to mass production • Almost consumable • Often cheaper to rebuild, recycle than repair Our twist – trains versus cars 9
  • 10. Our journey to NoSQL 10 A few years back we started with Cassandra • Exotic key/value store (column family) • Distributed active-active architecture • Tuneable consistency • Paradigm shift • Nights and weekends are now free (including LCM) • However now a liquid expectation! RDBMSNoSQL
  • 11. Adding graph to the landscape 11 Gap in our DB landscape - connected data • Another NoSQL database! Interesting observation • Key/Value easy to mess up a partition • RDBMS easy to mess up a table • Graph easy to mess up the database! ? RDBMSNoSQL Graph
  • 12. Typical architecture for caching data 12 Challenges include • Initial load is usually once-off or manual • Lifecycle management (LCM) is hard and per component • Resilience patterns need to be applied everywhere • Cross DC resilience pattern also required Diagram: SOR, Real-Time Sync, Batch Loader, Cache, Cache
  • 13. Cache ‘Cattle’ Paradigm 13 Challenges include • Initial load: integrated by design • Lifecycle management (LCM): simplified • Resilience: simplified • Cross DC: simplified Diagram: SOR, Real-Time Sync, Batch Loader, Commit Log, Single Materialized View
  • 14. Cache ‘Cattle’ Paradigm – Taking it further 14 E.g. Apache Kafka supports compacted topics • Ideal for use cases that rebuild a cache Diagram: SOR, Real-Time Sync, Batch Loader, Commit Log, Single Materialized View
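
Why a compacted topic suits cache rebuilds can be sketched without a broker. The snippet below is a minimal in-memory model, not Kafka client code: the `compact` and `rebuild_view` names are hypothetical, a record is a `(key, value)` tuple, and a value of `None` stands in for a tombstone. It assumes the simplified rule that compaction keeps only the latest record per key.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: (key, value); a value of None models a tombstone.
Record = Tuple[str, Optional[str]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record for each key, preserving log order."""
    latest_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        latest_offset[key] = offset
    return [rec for offset, rec in enumerate(log) if latest_offset[rec[0]] == offset]


def rebuild_view(log: List[Record]) -> Dict[str, str]:
    """Replay a log from the beginning into a fresh key/value view (the 'cache')."""
    view: Dict[str, str] = {}
    for key, value in log:
        if value is None:
            view.pop(key, None)  # tombstone: the key no longer exists
        else:
            view[key] = value
    return view


log = [("a", "1"), ("b", "2"), ("a", "3"), ("b", None)]
compacted = compact(log)
```

Replaying the compacted log yields the same view as replaying the full log, so a new cache instance can be filled from the topic alone while reading fewer records.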
  • 15. Cache ‘Cattle’ Paradigm – Going all the way 15 Diagram: Real-Time Sync, Batch Loader, Commit Log, Single Materialized View, SOR
  • 16. Pipeline 16 • Concepts • Pipeline • Use case • Learnings from NoSQL thinking
  • 17. Pipelining the ‘Materialized View’ 17 Real-Time Sync Batch Loader Materialized View SOR
  • 18. Pipelining the ‘Materialized View’ 18 • Block ports • Get stuff • Artifacts • Pre-processing • Convert data • neo4j-admin import • Start Neo4j • Post-processing • Create indexes • Complex calculations • Start applications in desired order • Unblock ports • Switch: opening for client requests Diagram: SOR, Real-Time Sync, Batch Loader, Materialized View
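
The "Convert data" step can be illustrated as follows. This is a sketch under assumptions, not the pipeline from the talk: the `to_import_csv` helper, the `Host`/`Process` labels and the `RUNS` relationship type are all hypothetical. What it does rely on is the CSV header convention that the Neo4j import tool reads (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`).

```python
import csv
import io
from typing import Dict, List, Tuple


def to_import_csv(hosts: Dict[str, List[str]]) -> Tuple[str, str]:
    """Convert a hypothetical {host: [process, ...]} inventory into CSV text.

    Returns (nodes_csv, relationships_csv) using the header convention of the
    Neo4j import tool. Labels and relationship type are illustrative only.
    """
    nodes = io.StringIO()
    rels = io.StringIO()
    node_writer = csv.writer(nodes, lineterminator="\n")
    rel_writer = csv.writer(rels, lineterminator="\n")

    node_writer.writerow(["id:ID", "name", ":LABEL"])
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE"])

    for host, processes in sorted(hosts.items()):
        node_writer.writerow([host, host, "Host"])
        for proc in processes:
            proc_id = f"{host}/{proc}"  # unique ID per process on a host
            node_writer.writerow([proc_id, proc, "Process"])
            rel_writer.writerow([host, proc_id, "RUNS"])

    return nodes.getvalue(), rels.getvalue()


nodes_csv, rels_csv = to_import_csv({"h1": ["cassandra"]})
```

Because the files are regenerated from source data on every run, the import always starts from an empty database, which fits the rebuild-rather-than-migrate approach.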
  • 20. Neo4j-admin import 20 • IMPORT DONE in 4m 17s 923ms. Imported: 11 759 853 nodes, 11 774 714 relationships, 35 332 411 properties • IMPORT DONE in 1m 41s 941ms. Imported: 26 444 678 nodes, 26 444 020 relationships, 66 129 151 properties
  • 21. Use case 21 • Concepts • Pipeline • Use case • Learnings from NoSQL thinking
  • 22. Managing our Cassandra Platform 22 • Customer Teams • Platform Teams • Components Challenges • Rapid growth • Too many dependencies • Too many changes • Often converging • Too much data (metrics) • Heroes • … Change is needed!
  • 23. Understanding the challenges ahead 23 Observation • Ops person ‘pulls’ information when needed (requires logging on) • Full shifting left is particularly challenging for multi-tenant platforms Pipeline everything • We don’t value initial bursts of energy anymore • Pipeline the heroes :) • Design to rebuild ‘Personal’ goal • Learn Graph technology
  • 24. Architecture 24 • Simple and effective • Executes ‘Ops’ commands: • ps -ef • netstat • cat config.json • Raw output sent as message • All parsing rules for raw messages • Store in databases • Complex rules and analysis
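
The producing side of this architecture, a simple agent that runs a command and ships the raw output, might look like the sketch below. The `run_command` function and the message field names are assumptions made for illustration; publishing the message to Kafka is left out, and a portable Python one-liner stands in for `ps -ef`.

```python
import socket
import subprocess
import sys
import time
from typing import Dict, List


def run_command(argv: List[str]) -> Dict[str, object]:
    """Run one 'Ops' command and wrap its raw, unparsed output in a message.

    The field names are hypothetical. No parsing happens here on purpose:
    all parsing rules live on the consuming side.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return {
        "host": socket.gethostname(),
        "command": " ".join(argv),
        "timestamp": time.time(),
        "exit_code": result.returncode,
        "raw_output": result.stdout,
    }


# Portable stand-in for a command such as `ps -ef`.
demo_argv = [sys.executable, "-c", "print('raw output')"]
message = run_command(demo_argv)
```

Keeping the agent this small means parsing rules can change on the consumer without redeploying anything to the hosts.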
  • 26. Consumer - Graph Model 26 Consuming message (data pipeline) • Parse • Store as Key/Value • Calculate • Observe • Recommend Principles • Compare to inventory not ‘self discovery’ • Define agreements in code • Design to be as idempotent as possible • No data migrations allowed (simply rebuild!)
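
A rough sketch of the parse, observe and recommend steps, comparing against an inventory rather than trusting discovery. Everything here is an assumption for illustration: the `parse_ps` and `recommend` names, the Linux-style `ps -ef` column layout, and the idea that the inventory is a list of expected executables.

```python
from typing import Dict, List


def parse_ps(raw_output: str) -> List[Dict[str, str]]:
    """Parse `ps -ef`-style output into one key/value dict per process."""
    lines = raw_output.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # The last column (CMD) may contain spaces, so cap the number of splits.
        fields = line.strip().split(None, len(header) - 1)
        rows.append(dict(zip(header, fields)))
    return rows


def recommend(inventory: List[str], rows: List[Dict[str, str]]) -> List[str]:
    """Compare observed processes to the inventory and list the differences."""
    running = {row["CMD"].split()[0] for row in rows}
    expected = set(inventory)
    missing = sorted(expected - running)
    unexpected = sorted(running - expected)
    return (
        [f"expected process not running: {name}" for name in missing]
        + [f"process not in inventory: {name}" for name in unexpected]
    )


raw = (
    "UID PID PPID C STIME TTY TIME CMD\n"
    "root 1 0 0 Nov12 ? 00:00:01 /sbin/init\n"
    "cass 42 1 3 Nov12 ? 01:02:03 java -Xmx8G cassandra\n"
)
rows = parse_ps(raw)
findings = recommend(["java", "agent"], rows)
```

Both functions are pure, so feeding the same message through twice gives the same result, which is one way to keep the consumer idempotent.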
  • 27. So what happened? 27 Graph proved to be a VERY good fit for this use case! • Already producing 50+ commands • Heroes have a place to ‘query’ • Various unexpected findings • Misconfigured tenants • 100+ recommendations made Architecture now helps with • Being ‘in’ control without ‘having’ control • Remaining agile • Shifting left our operational analysis 300+ cars were destroyed in our 8-month journey to learn graph!
  • 28. Learnings from NoSQL thinking 28 • Concepts • Pipeline • Use case • Learnings from NoSQL thinking
  • 29. Breaking it down to Key/Value 29 Parse each message into key/value WITH {a: "val 1", b: "val 2", c: "val 3"} AS data CREATE (n:MyNode) SET n += data; Added 1 label, created 1 node, set 3 properties … Benefits • Easier to build insights with existing data • ‘Raw’ data remains unchanged • However if parsing rules change Modify code → Simply rebuild and …
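
Since Neo4j property values cannot be nested maps, a parsed message usually has to be flattened before it can be applied with `SET n += $data`. The sketch below shows one way to do that; the `flatten` helper, the `_` separator and the sample config are assumptions, and running the query through a driver is left out.

```python
from typing import Any, Dict

# Parameterised form of the slide's query; `$data` is supplied by the caller.
CREATE_NODE = "CREATE (n:MyNode) SET n += $data"


def flatten(data: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into a single-level key/value map.

    Nested keys are joined with '_' (an arbitrary choice for this sketch).
    Non-dict values are passed through unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


config = {"cluster": {"name": "c1", "port": 9042}, "dc": "ams"}
params = {"data": flatten(config)}
```

If the flattening rule changes later, nothing needs migrating: the raw message is still available, so the graph can simply be rebuilt with the new rule.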
  • 32. Partitioning in graph 32 Consider traversing to reduce density • (:Source)-[:LAST]->(:Message)-[:PREV]->(:Message) Modify code → Simply rebuild and :)
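
The linked-list pattern on this slide can be modelled in memory to show why it reduces density: the source holds a single `LAST` pointer instead of one relationship per message. The `Source` and `Message` classes below are a hypothetical stand-in for the graph, not Neo4j code.

```python
from typing import List, Optional


class Message:
    def __init__(self, payload: str, prev: "Optional[Message]" = None) -> None:
        self.payload = payload
        self.prev = prev  # models (:Message)-[:PREV]->(:Message)


class Source:
    def __init__(self, name: str) -> None:
        self.name = name
        self.last: Optional[Message] = None  # models (:Source)-[:LAST]->(:Message)

    def append(self, payload: str) -> None:
        """Add a message and move the single LAST pointer to it."""
        self.last = Message(payload, prev=self.last)

    def latest(self, n: int) -> List[str]:
        """Walk LAST, then PREV, to read the n most recent payloads."""
        out: List[str] = []
        node = self.last
        while node is not None and len(out) < n:
            out.append(node.payload)
            node = node.prev
        return out


source = Source("host-1")
for payload in ["m1", "m2", "m3"]:
    source.append(payload)
```

Reading the newest messages costs as many hops as messages requested, however long the history behind them has become.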
  • 34. Attempted to migrate data anyway 34 MATCH … OPTIONAL MATCH WHERE … MERGE MATCH … WHERE … OPTIONAL MATCH MERGE Modify code → Simply rebuild and :)
  • 36. What about Kafka? 36 • Kafka is not just a ‘messaging’ product • Actually it is a distributed commit log! • Like Graph, we learnt Kafka too!!
  • 38. Summary 38 Shifting left • Requires initial extra effort • Design for rebuild instead of migrations • Creates a learning environment for new technology Ensures agility • Adding new features • Implement learnings quickly • Easier to experiment Ensures control • Easier to secure • Data quality actually improves due to frequent rebuilds • Safer and cheaper to build systems that are not meant to last long
  • 39. In other words … 39 Pipeline the heroes • Create a data pipeline • Automate data collection • Automate analysis Shift left your operational analysis!
  • 40. Thank you Wilhelmus.Bleker@ing.com - @WBleker Gary.Stewart@ing.com - @Gaz_GandA We are hiring! 40
  • 41. Follow us to stay a step ahead: ING.com, YouTube.com/ING, SlideShare.net/ING, @ING_News, LinkedIn.com/company/ING, Flickr.com/INGGroup, Facebook.com/ING
  • 42. ING Group’s Annual Accounts are prepared in accordance with International Financial Reporting Standards as adopted by the European Union (‘IFRS-EU’). In preparing the financial information in this document, the same accounting principles are applied as in the 2014 ING Group Annual Accounts. All figures in this document are unaudited. Small differences are possible in the tables due to rounding. Certain of the statements contained herein are not historical facts, including, without limitation, certain statements made of future expectations and other forward-looking statements that are based on management’s current views and assumptions and involve known and unknown risks and uncertainties that could cause actual results, performance or events to differ materially from those expressed or implied in such statements. Actual results, performance or events may differ materially from those in such statements due to, without limitation: (1) changes in general economic conditions, in particular economic conditions in ING’s core markets, (2) changes in performance of financial markets, including developing markets, (3) consequences of a potential (partial) break- up of the euro, (4) the implementation of ING’s restructuring plan to separate banking and insurance operations, (5) changes in the availability of, and costs associated with, sources of liquidity such as interbank funding, as well as conditions in the credit markets generally, including changes in borrower and counterparty creditworthiness, (6) the frequency and severity of insured loss events, (7) changes affecting mortality and morbidity levels and trends,(8) changes affecting persistency levels, (9) changes affecting interest rate levels, (10) changes affecting currency exchange rates, (11) changes in investor, customer and policyholder behaviour, (12) changes in general competitive factors, (13) changes in laws and regulations, (14) changes in the policies of governments and/or regulatory authorities, 
(15) conclusions with regard to purchase accounting assumptions and methodologies, (16) changes in ownership that could affect the future availability to us of net operating loss, net capital and built-in loss carry forwards, (17) changes in credit ratings, (18) ING’s ability to achieve projected operational synergies and (19) the other risks and uncertainties detailed in the Risk Factors section contained in the most recent annual report of ING Groep N.V. Any forward-looking statements made by or on behalf of ING speak only as of the date they are made, and, ING assumes no obligation to publicly update or revise any forward-looking statements, whether as a result of new information or for any other reason. This document does not constitute an offer to sell, or a solicitation of an offer to purchase, any securities in the United States or any other jurisdiction. The securities of NN Group have not been and will not be registered under the U.S. Securities Act of 1933, as amended (the “Securities Act”), and may not be offered or sold within the United States absent registration or an applicable exemption from the registration requirements of the Securities Act. www.ing.com Disclaimer 42