SlideShare a Scribd company logo
1 of 29
Download to read offline
Growing a better world together
Rabobank Group
DataWorks Summit, Berlin, 2018
Introduction
1996
2007
JeroenWolffensperger
Solution Architect Data
jeroen.wolffensperger@rabobank.nl
Introduction
martijn.groen@rabobank.nl
1995
1996
2007
2010
2015
Martijn Groen Msc. PMP
Rabobank Netherlands HQ
Delivery manager Data Lake & Delivery
Distribution (Client Data)
5Source: https://www.iea.org/newsroom/
6
7
8
9
Rabobank
11
What is needed to create all these new business
models?
Lot’s of
Data
Formats
Speed
Data Architecture
Data product types & Development styles
Raw & Defined data Information Product
Ad-Hoc Data R&D / Analytics
Source: Damhof Quadrant
OperationalizeOperationalize
Process
Dataflow
Data scientist,
Analysts
End-uses
Systematic
Opportunistic
Push/Supply/Source driven Pull/Demand/Product driven
Develop-
ment
Style
Push-Pull
point
Data Architecture
Data product types & Development styles
Raw & Defined data Information Product
Ad-Hoc Data R&D / Analytics
Systematic
Opportunistic
Push/Supply/Source driven Pull/Demand/Product driven
Develop-
ment
Style
Data Lab
Data Factory
Data Lake
Business-value
Provisioning
Data Lab
Data Architecture
Business
Intelligence
Analytics
Marketing
On-line Services
Real-time relevance
Data Lake
Sources
Data
Domains
External
Data
Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
Batch services
Real-time
services
services
Data Factory
Definition Factory
Information Factory
Monitoring (Infrastructure, Data usage)
Data Architecture building blocks
• Based on manufacturing
production process
• Each building block is
replaceable.
Data Logistics
Data Storage
Meta Data Storage
Data Refinery
Data
Provisioning
Transport
Compute
Catalog
Provide
Secure
Store
Resource management Monitor
import export
Data Architecture: technology
Business-value
Provisioning
Data Lab
Business
Intelligence
Analytics
Marketing
On-line Services
Real-time relevance
Data Lake
Sources
Data
Domains
External
Data
Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
Batch services
Real-time
services
services
Data Factory
Definition Factory
Information Factory
Monitoring (Infrastructure, Data usage)
Kafka: https://www.datanami.com/2017/08/15/kafka-helped-rabobank-modernize-alerting-system/
Data Logistics
Big Data Management
Why did we choose for: HDF – NiFi?
January 2017
• We compared: NiFi, Informatica Intelligent Streaming, Streamsets Data Collector
• NiFi has an open architecture, making it easy to create your own connectors.
• NiFi has the most functionality and is easy to use
• NiFi has the biggest user base and a very active community.
• Works well in combination with Cloudera.
• No data lineage and support for template deployment yet, but are on the roadmap
(release 3.2).
• Informatica’s first release of Intelligent Streaming* was December 2016. Product was
not yet mature enough.
• Streamsets is 100% in memory, where NiFi writes to disk. In our opinion less mature
than NiFi.
* Renamed to Big Data Streaming since January 2018
Data Architecture: technology
Business-value
Provisioning
Data Lab
Business
Intelligence
Analytics
Marketing
On-line Services
Real-time relevance
Data Lake
Sources
Data
Domains
External
Data
Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
Batch services
Real-time
services
services
Data Factory
Definition Factory
Information Factory
Monitoring (Infrastructure, Data usage)
HDFS
Data Storage
Data Architecture: technology
Business-value
Provisioning
Data Lab
Business
Intelligence
Analytics
Marketing
On-line Services
Real-time relevance
Data Lake
Sources
Data
Domains
External
Data
Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
Batch services
Real-time
services
services
Data Factory
Definition Factory
Information Factory
Monitoring (Infrastructure, Data usage)
Data Refinery
Big Data Management
Data Architecture: technology
Business-value
Provisioning
Data Lab
Business
Intelligence
Analytics
Marketing
On-line Services
Real-time relevance
Data Lake
Sources
Data
Domains
External
Data
Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
Batch services
Real-time
services
services
Data Factory
Definition Factory
Information Factory
Monitoring (Infrastructure, Data usage)
Data Provisioning
Data Architecture: technology
Business-value
Provisioning
Data Lab
Business
Intelligence
Analytics
Marketing
On-line Services
Real-time relevance
Data Lake
Sources
Data
Domains
External
Data
Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security)
Batch services
Real-time
services
services
Data Factory
Definition Factory
Information Factory
Monitoring (Infrastructure, Data usage)
Data Governance
Enterprise Data Catalog
Navigator
Big Data Management
Business case: Bedrijfskompas
(Company Compass)
• We deliver insight in your financial position and a benchmark about the
performance of other companies within your own branch or sector.
• We will do this via:
• An online dashboard with a graphical presentation of your liquidity.
• Displaying the performance of your company compared to aggregated
benchmark data of peers from the sector.
• We first implemented the liquidity dashboard and is currently made
accessible as stand alone visual via our internet banking environment.
Liquidity dashboard
Growth Hack Prototype Final F&F Release
Concept Growth Hack
Prototype
Initial F&F
Release
Final F&F
Release
Pilot Bank
Release
Full scale
release
Data Lab
Start Data
Lake
Connection
real-time
transactie data
Security
First API
endpoint live
Full API live
Performance
tuning
First API
specification
OpenshiftBig Data
Cluster
Start Front-End
team
26
Some figures
• Business case implemented in 8 months including initial set-up
infrastructure and security.
• HortonWorks Data Flow (3.0.2):
• Able to process 100.000 events per sec.; 0,6 GB per sec.
• Initial load: 25 billion payment transactions; 7 years of history loaded in 7
hours.
• NRT load: average of 15 million transactions per day
• Current average response time API call: < 100 ms
• Initial set-up costs are earned back via other business cases making use
of the infrastructure.
Key takeaways
• Fail fast: Experimental approach gives quick insights of possible fit within the
overall data architecture.
• Every technology component must be replaceable when choices made earlier
are proved to be not as good as expected and Hadoop technologies change fast.
• Hire (professional services) expertise for securing your cluster. Kerberos is a
headache but necessary.We thought we secured everything: NOT.
• Stay in control of the data provisioned via API’s.
• Data Governance is key to keep an overview of your Data Lake and also to
comply with all regulations like GDPR and BCBS239. A good Data Catalog is a
must.
2929
Thank you for your attention!

More Related Content

What's hot

Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...DataWorks Summit
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudDataWorks Summit
 
Real Time Streaming Architecture at Ford
Real Time Streaming Architecture at FordReal Time Streaming Architecture at Ford
Real Time Streaming Architecture at FordDataWorks Summit
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNDataWorks Summit
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United AirlinesDataWorks Summit
 
Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...
Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...
Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...DataWorks Summit
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...DataWorks Summit
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...DataWorks Summit
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataMats Johansson
 
Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveDataWorks Summit
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Lessons Learned Migrating from IBM BigInsights to Hortonworks Data Platform
Lessons Learned Migrating from IBM BigInsights to Hortonworks Data PlatformLessons Learned Migrating from IBM BigInsights to Hortonworks Data Platform
Lessons Learned Migrating from IBM BigInsights to Hortonworks Data PlatformDataWorks Summit
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesDataWorks Summit
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...DataWorks Summit
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...DataWorks Summit/Hadoop Summit
 

What's hot (20)

Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
Running Enterprise Workloads with an open source Hybrid Cloud Data Architectu...
 
Tools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloudTools and approaches for migrating big datasets to the cloud
Tools and approaches for migrating big datasets to the cloud
 
Real Time Streaming Architecture at Ford
Real Time Streaming Architecture at FordReal Time Streaming Architecture at Ford
Real Time Streaming Architecture at Ford
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
 
Big data at United Airlines
Big data at United AirlinesBig data at United Airlines
Big data at United Airlines
 
Rebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for ScaleRebuilding Web Tracking Infrastructure for Scale
Rebuilding Web Tracking Infrastructure for Scale
 
Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...
Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...
Forget Duplicating Local Changes: Apache NiFi and the Flow Development Lifecy...
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
 
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
Bridging the gap: achieving fast data synchronization from SAP HANA by levera...
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Accelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache HiveAccelerating query processing with materialized views in Apache Hive
Accelerating query processing with materialized views in Apache Hive
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Lessons Learned Migrating from IBM BigInsights to Hortonworks Data Platform
Lessons Learned Migrating from IBM BigInsights to Hortonworks Data PlatformLessons Learned Migrating from IBM BigInsights to Hortonworks Data Platform
Lessons Learned Migrating from IBM BigInsights to Hortonworks Data Platform
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
Log I am your father
Log I am your fatherLog I am your father
Log I am your father
 
Kudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit ProcessesKudu as Storage Layer to Digitize Credit Processes
Kudu as Storage Layer to Digitize Credit Processes
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
 

Similar to Growing together for a better world with Rabobank's DataWorks Summit

A Winning Strategy for the Digital Economy
A Winning Strategy for the Digital EconomyA Winning Strategy for the Digital Economy
A Winning Strategy for the Digital EconomyEric Kavanagh
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2Joe_F
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Impetus Technologies
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017SingleStore
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointconfluent
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessInside Analysis
 
Leverage Machine Data
Leverage Machine DataLeverage Machine Data
Leverage Machine DataSplunk
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraAttunity
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationDenodo
 
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...DATAVERSITY
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseHortonworks
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesAmazon Web Services
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionMongoDB
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data PlatformVikas Manoria
 

Similar to Growing together for a better world with Rabobank's DataWorks Summit (20)

A Winning Strategy for the Digital Economy
A Winning Strategy for the Digital EconomyA Winning Strategy for the Digital Economy
A Winning Strategy for the Digital Economy
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
 
Take Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven BusinessTake Action: The New Reality of Data-Driven Business
Take Action: The New Reality of Data-Driven Business
 
Leverage Machine Data
Leverage Machine DataLeverage Machine Data
Leverage Machine Data
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming Era
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
The Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reductionThe Double win business transformation and in-year ROI and TCO reduction
The Double win business transformation and in-year ROI and TCO reduction
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 

Recently uploaded (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 

Growing together for a better world with Rabobank's DataWorks Summit

  • 1. Growing a better world together Rabobank Group DataWorks Summit, Berlin, 2018
  • 3. Introduction martijn.groen@rabobank.nl 1995 1996 2007 2010 2015 Martijn Groen Msc. PMP Rabobank Netherlands HQ Delivery manager Data Lake & Delivery Distribution (Client Data)
  • 4.
  • 6. 6
  • 7. 7
  • 8. 8
  • 9. 9
  • 11. 11
  • 12.
  • 13. What is needed to create all these new business models? Lot’s of Data Formats Speed
  • 14. Data Architecture Data product types & Development styles Raw & Defined data Information Product Ad-Hoc Data R&D / Analytics Source: Damhof Quadrant OperationalizeOperationalize Process Dataflow Data scientist, Analysts End-uses Systematic Opportunistic Push/Supply/Source driven Pull/Demand/Product driven Develop- ment Style Push-Pull point
  • 15. Data Architecture Data product types & Development styles Raw & Defined data Information Product Ad-Hoc Data R&D / Analytics Systematic Opportunistic Push/Supply/Source driven Pull/Demand/Product driven Develop- ment Style Data Lab Data Factory Data Lake
  • 16. Business-value Provisioning Data Lab Data Architecture Business Intelligence Analytics Marketing On-line Services Real-time relevance Data Lake Sources Data Domains External Data Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security) Batch services Real-time services services Data Factory Definition Factory Information Factory Monitoring (Infrastructure, Data usage)
  • 17. Data Architecture building blocks • Based on manufacturing production process • Each building block is replaceable. Data Logistics Data Storage Meta Data Storage Data Refinery Data Provisioning Transport Compute Catalog Provide Secure Store Resource management Monitor import export
  • 18. Data Architecture: technology Business-value Provisioning Data Lab Business Intelligence Analytics Marketing On-line Services Real-time relevance Data Lake Sources Data Domains External Data Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security) Batch services Real-time services services Data Factory Definition Factory Information Factory Monitoring (Infrastructure, Data usage) Kafka: https://www.datanami.com/2017/08/15/kafka-helped-rabobank-modernize-alerting-system/ Data Logistics Big Data Management
  • 19. Why did we choose for: HDF – NiFi? January 2017 • We compared: NiFi, Informatica Intelligent Streaming, Streamsets Data Collector • NiFi has an open architecture, making it easy to create your own connectors. • NiFi has the most functionality and is easy to use • NiFi has the biggest user base and a very active community. • Works well in combination with Cloudera. • No data lineage and support for template deployment yet, but are on the roadmap (release 3.2). • Informatica’s first release of Intelligent Streaming* was December 2016. Product was not yet mature enough. • Streamsets is 100% in memory, where NiFi writes to disk. In our opinion less mature than NiFi. * Renamed to Big Data Streaming since January 2018
  • 20. Data Architecture: technology Business-value Provisioning Data Lab Business Intelligence Analytics Marketing On-line Services Real-time relevance Data Lake Sources Data Domains External Data Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security) Batch services Real-time services services Data Factory Definition Factory Information Factory Monitoring (Infrastructure, Data usage) HDFS Data Storage
  • 21. Data Architecture: technology Business-value Provisioning Data Lab Business Intelligence Analytics Marketing On-line Services Real-time relevance Data Lake Sources Data Domains External Data Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security) Batch services Real-time services services Data Factory Definition Factory Information Factory Monitoring (Infrastructure, Data usage) Data Refinery Big Data Management
  • 22. Data Architecture: technology Business-value Provisioning Data Lab Business Intelligence Analytics Marketing On-line Services Real-time relevance Data Lake Sources Data Domains External Data Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security) Batch services Real-time services services Data Factory Definition Factory Information Factory Monitoring (Infrastructure, Data usage) Data Provisioning
  • 23. Data Architecture: technology Business-value Provisioning Data Lab Business Intelligence Analytics Marketing On-line Services Real-time relevance Data Lake Sources Data Domains External Data Data Management (Data Governance, Data Lineage, Data Quality, Metadata Management, Data Catalog, Data Security) Batch services Real-time services services Data Factory Definition Factory Information Factory Monitoring (Infrastructure, Data usage) Data Governance Enterprise Data Catalog Navigator Big Data Management
  • 24. Business case: Bedrijfskompas (Company Compass) • We deliver insight in your financial position and a benchmark about the performance of other companies within your own branch or sector. • We will do this via: • An online dashboard with a graphical presentation of your liquidity. • Displaying the performance of your company compared to aggregated benchmark data of peers from the sector. • We first implemented the liquidity dashboard and is currently made accessible as stand alone visual via our internet banking environment.
  • 25. Liquidity dashboard Growth Hack Prototype Final F&F Release Concept Growth Hack Prototype Initial F&F Release Final F&F Release Pilot Bank Release Full scale release Data Lab Start Data Lake Connection real-time transactie data Security First API endpoint live Full API live Performance tuning First API specification OpenshiftBig Data Cluster Start Front-End team
  • 26. 26
  • 27. Some figures • Business case implemented in 8 months including initial set-up infrastructure and security. • HortonWorks Data Flow (3.0.2): • Able to process 100.000 events per sec.; 0,6 GB per sec. • Initial load: 25 billion payment transactions; 7 years of history loaded in 7 hours. • NRT load: average of 15 million transactions per day • Current average response time API call: < 100 ms • Initial set-up costs are earned back via other business cases making use of the infrastructure.
  • 28. Key takeaways • Fail fast: Experimental approach gives quick insights of possible fit within the overall data architecture. • Every technology component must be replaceable when choices made earlier are proved to be not as good as expected and Hadoop technologies change fast. • Hire (professional services) expertise for securing your cluster. Kerberos is a headache but necessary.We thought we secured everything: NOT. • Stay in control of the data provisioned via API’s. • Data Governance is key to keep an overview of your Data Lake and also to comply with all regulations like GDPR and BCBS239. A good Data Catalog is a must.
  • 29. 2929 Thank you for your attention!