1© Cloudera, Inc. All rights reserved.
How Apache Spark and Apache
Hadoop are helping to keep the banking
regulators happy
2© Cloudera, Inc. All rights reserved.
Agenda
• Existing Architecture for Analytics & Risk
• Ever-changing Regulatory Landscape
• Challenges with existing architectures
• Modern architecture for Financial Risk
• Demo of key capabilities
3© Cloudera, Inc. All rights reserved.
Typical Existing Analytical Architecture
Data Sources
ETL/Staging
EDW
Archive
Data
Marts
Canned
Reports
Dashboards/Analytic
Applications
Non-SQL
Workloads
Self-Service
BI/Ad Hoc
4© Cloudera, Inc. All rights reserved.
Regulatory Landscape
2012 2013 2014 2015 2016 2017 2018 2019
ICB Ring-fencing
ICB Loss
Absorbency
Leverage
Ratio -
Basel III
NSFR – Basel
III
MiFID II
T2S
LCR -
Basel III
ICB / Competition
Audit Policy
Cross Border
Debt Recovery
Financial
Transaction Tax
Market Abuse
Directive (MAD
II)
PRIP
Accounting
Directive
Review
AIFM Directive
EU Transparency
Directive
EU Reg on
Credit Rating
Agencies
CRDV
Internal
Governance
Guidelines
FATCA
PD
EMIR
SWAPS Push Out
– Dodd Frank
Securities Law
Directive (SLD)
Volcker Rule –
Dodd Frank
Short Selling
Close Out
Netting
Crisis
Management
Recovery &
Resolution
Effective dates yet to be confirmed
BCBS 239 FRTB
5© Cloudera, Inc. All rights reserved.
Existing Architectures under pressure
Limited Data – Incorporating new risk factors
Data Sources
ETL/Staging
EDW
Archive
Data
Marts
Canned
Reports
Dashboards/Analytic
Applications
Non-SQL
Workloads
Self-Service
BI/Ad Hoc
!
Limited Data & Insight
• Adding new data source
• Risk Factors
!
Latent Value
• How long to get new
reports with new risk factors
6© Cloudera, Inc. All rights reserved.
Existing Architectures under pressure
Missed SLAs for VaR, ES & Stress scenarios
Data Sources
ETL/Staging
EDW
Archive
Data
Marts
Canned
Reports
Dashboards/Analytic
Applications
Non-SQL
Workloads
Self-Service
BI/Ad Hoc
!
Overloaded Bottlenecks
• Ever-increasing ETL
windows
!
Overloaded Bottlenecks
• Ever-increasing batch
windows to extract data
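A minimal sketch of the two measures named on this slide, Value at Risk (VaR) and Expected Shortfall (ES), computed by historical simulation over a vector of scenario P&Ls. The function name, the sample figures and the tail-counting convention are illustrative assumptions, not taken from the deck; production engines use their own quantile conventions.

```python
import math


def historical_var_es(pnl, confidence=0.99):
    """Return (VaR, ES) as positive loss figures from a list of scenario P&Ls.

    Assumption: the tail holds floor(n * (1 - confidence)) scenarios, with a
    minimum of one. Other quantile conventions exist and give other numbers.
    """
    if not pnl:
        raise ValueError("pnl must contain at least one scenario")
    # Convert P&L to losses and sort with the largest loss first.
    losses = sorted((-x for x in pnl), reverse=True)
    # Small epsilon guards against floating-point error in n * (1 - confidence).
    tail = max(1, math.floor(len(losses) * (1 - confidence) + 1e-9))
    var = losses[tail - 1]           # smallest loss that is still inside the tail
    es = sum(losses[:tail]) / tail   # average of all losses inside the tail
    return var, es


# Hypothetical ten-scenario P&L vector, for illustration only.
pnl_vector = [-120.0, 35.0, -15.0, 60.0, -300.0, 10.0, -45.0, 80.0, 5.0, -90.0]
var_80, es_80 = historical_var_es(pnl_vector, confidence=0.80)
print(var_80, es_80)
```

Every added risk factor or scenario lengthens the P&L vector, which is one reason batch windows built around a single warehouse tend to grow.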
7© Cloudera, Inc. All rights reserved.
Existing Architectures under pressure
Frustrated Quants on the “edge” nodes (not-only-SQL)
Data Sources
ETL/Staging
EDW
Archive
Data
Marts
Canned
Reports
Dashboards/Analytic
Applications
Non-SQL
Workloads
Self-Service
BI/Ad Hoc
!
Lack of Tooling
• Ad-hoc, on-demand
complex risk modeling
requirements
8© Cloudera, Inc. All rights reserved.
http://www.bis.org/publ/bcbs239.pdf
9© Cloudera, Inc. All rights reserved.
III - Accuracy &
Integrity
Strive for a single
authoritative source for
risk data. Aggregate on
an automated basis.
IV - Completeness
Capture and aggregate
all material risk data.
Data available by
business line, legal entity,
asset type, industry,
region, …
V - Timeliness
Generate aggregate
and up-to-date risk
data in a timely
manner.
VI - Adaptability
Meet a broad range of
on-demand, ad-hoc
risk management
reporting requests.
BCBS-239: Principles for Risk Data Aggregation
• Data, models and
processes live in silos
• Hard to get an enterprise-
wide view of risk
• Difficult to aggregate
• Lack of enterprise data
taxonomy
• Failed audits
• Aggregate / reported
risk data is infrequent
and stale
• Unable to handle
crisis situations
• Complex risk
modeling process
• Unable to handle
crisis situations
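A minimal sketch of what Completeness and Adaptability imply in practice: the same position-level data is rolled up on demand along whichever dimensions a request names. The record layout, field names and figures are hypothetical, chosen only to mirror the dimensions listed above (business line, legal entity, region).

```python
from collections import defaultdict


def aggregate_exposure(records, dimensions):
    """Sum the 'exposure' field of each record, grouped by the given dimensions."""
    totals = defaultdict(float)
    for rec in records:
        # The grouping key is the tuple of requested dimension values.
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec["exposure"]
    return dict(totals)


# Hypothetical position-level risk data held in one shared store.
positions = [
    {"business_line": "Rates", "legal_entity": "UK Ltd", "region": "EMEA", "exposure": 120.0},
    {"business_line": "Rates", "legal_entity": "US Inc", "region": "AMER", "exposure": 80.0},
    {"business_line": "FX", "legal_entity": "UK Ltd", "region": "EMEA", "exposure": 50.0},
    {"business_line": "FX", "legal_entity": "UK Ltd", "region": "APAC", "exposure": 30.0},
]

# Two ad-hoc views over the same data, with no new extract or data mart.
by_entity = aggregate_exposure(positions, ["legal_entity"])
by_line_region = aggregate_exposure(positions, ["business_line", "region"])
print(by_entity)
print(by_line_region)
```

Keeping the data at position level in a single source is what makes a new cut a query rather than a new ETL job.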
10© Cloudera, Inc. All rights reserved.
A modern risk platform calls for…
Scalability
More risk measures, more
scenarios. Fine-grained risk
data result in an order of
magnitude increase in
volume.
Speed
More frequent stress testing
and regulatory reporting.
High velocity scenario
development and
deployment.
Agility
Support for a variety of
languages. Pre-trade
decisions. “What-if”
scenarios.
Transparency
Verifiable data. Timely
response to audits. Data
quality and lineage. Data and
model governance.
11© Cloudera, Inc. All rights reserved.
Storage
• Archival
• Traceability
Batch
• ETL
• Data Validation
• Reg Reporting
Interactive
• Risk Aggregation
• Stress Testing
HPC
• Risk Modeling
• Backtesting
• Simulation
Streaming &
Real Time
• Mkt Surveillance
• Best Execution
Evolution towards a modern risk platform
Risk & Regulatory Compliance Use Cases on Hadoop
HDFS
High-throughput, scalable,
fault-tolerant, distributed
file system.
MapReduce
Distributed parallel
processing
framework.
12© Cloudera, Inc. All rights reserved.
Storage
• Archival
• Traceability
Batch
• ETL
• Data Validation
• Reg Reporting
Interactive
• Risk Aggregation
• Stress Testing
HPC
• Risk Modeling
• Backtesting
• Simulation
Streaming &
Real Time
• Mkt Surveillance
• Best Execution
Apache Impala
Massively Parallel
Processing (MPP) SQL
engine.
Apache Spark
In-memory distributed
processing framework.
Evolution towards a modern risk platform
Risk & Regulatory Compliance Use Cases on Hadoop
13© Cloudera, Inc. All rights reserved.
Storage
• Archival
• Traceability
Batch
• ETL
• Data Validation
• Reg Reporting
Interactive
• Risk Aggregation
• Stress Testing
HPC
• Risk Modeling
• Backtesting
• Simulation
Streaming &
Real Time
• Mkt Surveillance
• Best Execution
Apache Spark
Distributed compute
framework. Can support
Python / C++, as well as
Java and Scala.
Data Science Workbench
Fully integrated data science
notebook application.
Cloudera Data
Science Workbench
Evolution towards a modern risk platform
Risk & Regulatory Compliance Use Cases on Hadoop
14© Cloudera, Inc. All rights reserved.
Storage
• Archival
• Traceability
Batch
• ETL
• Data Validation
• Reg Reporting
Interactive
• Risk Aggregation
• Stress Testing
HPC
• Risk Modeling
• Backtesting
• Simulation
Streaming &
Real Time
• Mkt Surveillance
• Best Execution
Cloudera Data
Science Workbench
Apache Kudu
Real-time streaming
architectures for true
Aggregated Risk on
Demand
Evolution towards a modern risk platform
Risk & Regulatory Compliance Use Cases on Hadoop
15© Cloudera, Inc. All rights reserved.
Modern Platform for Analytics and Machine Learning
Data
Sources
EDW
Analytic
Database
Operational
Database
Data Science
& Engineering
Shared Data
Layer
Modern Data Platform
Fixed
Reports
Dashboards/
Analytic
Applications
Non-SQL
Workloads
Self-
Service
BI/Ad Hoc
Flexible
Reporting
MiFID II, FRTB, IFRS-9, BCBS-239, MAD/MAR, GDPR, …
16© Cloudera, Inc. All rights reserved.
BCBS 239 / FRTB “Illustrative” Architecture
Market Data Revaluation Calculation & Aggregation Reporting
Market Data Feeds
IPV
Independent Price
Valuation Function
MRF / NMRF
Modelable & Non-
Modelable Risk Factors
Calibration
Fixed Income
Front Office
Pricing Engines
Equity Mkts
Front Office
Pricing Engines
FX
Front Office
Pricing Engines
… Other Mkts
Front Office
Pricing Engines
Enterprise Data Hub
Static Data Market Data Configuration
P&L Vectors Sensitivities Events
Positions & Transaction Data
Scenarios
- Current
- Historic
- Stressed
- Projected
Risk
Metrics SA-related Risk
Components
Counter-Party
Credit Risk XVA
ES & Stressed ES P&L Attribution VaR
Regulatory
Applications
MiFID II Stress Testing GDPR
FRTB SA FRTB IMA EMIR
Regulatory
Reporting
Management
Reporting
Scenarios
Risk Sensitivities
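A minimal sketch of the aggregation step in this architecture, assuming each pricing engine delivers a P&L vector over the same ordered set of scenarios. Vectors are added scenario by scenario before any risk metric is taken, so offsetting positions across desks are netted. Desk names, figures and the fixed tail size are hypothetical.

```python
def aggregate_pnl_vectors(vectors):
    """Add desk-level P&L vectors scenario by scenario into one firm-level vector."""
    lengths = {len(v) for v in vectors.values()}
    if len(lengths) != 1:
        raise ValueError("all P&L vectors must cover the same scenarios")
    return [sum(column) for column in zip(*vectors.values())]


def expected_shortfall(pnl, tail_size):
    """Average loss over the tail_size worst scenarios, as a positive number."""
    worst = sorted(pnl)[:tail_size]
    return -sum(worst) / tail_size


# Hypothetical P&L vectors: one entry per scenario, same scenario order per desk.
desk_pnl = {
    "fixed_income": [-10.0, 4.0, -6.0, 8.0],
    "equities": [5.0, -7.0, -3.0, 2.0],
    "fx": [1.0, 1.0, -4.0, -2.0],
}

firm_pnl = aggregate_pnl_vectors(desk_pnl)
firm_es = expected_shortfall(firm_pnl, tail_size=2)
sum_of_desk_es = sum(expected_shortfall(v, tail_size=2) for v in desk_pnl.values())
print(firm_pnl, firm_es, sum_of_desk_es)
```

In this example the firm-level figure is lower than the sum of the desk-level figures because the desks' worst scenarios do not coincide, which is why the vectors, not just the desk metrics, need to land in the shared hub.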
17© Cloudera, Inc. All rights reserved.
BCBS 239 – Timeliness (Real-time risk)
Simplifying Lambda architectures with Apache Kudu
Kafka
Spark Streaming
Kudu
Spark MLlib
Application
Data
Sources
Individual Session
Full Model/Learning
Genesis
Real-time
Risk with
Greeks
1 Event Occurs
2 Market Data
3 Stream Processing
4 Land in RDBMS
5 Batch Valuation
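A minimal sketch of the idea behind "real-time risk with Greeks", in plain Python rather than Kafka, Spark Streaming or Kudu. It assumes the batch valuation has left a delta and gamma per position together with the base price they were computed at, and that the streaming step then estimates P&L for each market-data tick with a second-order Taylor approximation instead of a full revaluation. All names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Position:
    underlying: str
    delta: float  # P&L change per unit move in the underlying price
    gamma: float  # change in delta per unit move in the underlying price


def pnl_estimate(position, price_move):
    """Second-order (delta-gamma) approximation of P&L for a given price move."""
    return position.delta * price_move + 0.5 * position.gamma * price_move ** 2


def stream_risk(positions, base_prices, ticks):
    """Yield (underlying, price, estimated portfolio P&L) after each tick.

    Moves are measured from the base prices of the last batch valuation,
    so every estimate is relative to that valuation.
    """
    latest = dict(base_prices)
    for underlying, price in ticks:
        latest[underlying] = price
        total = sum(
            pnl_estimate(p, latest[p.underlying] - base_prices[p.underlying])
            for p in positions
        )
        yield underlying, price, total


# Hypothetical output of the batch valuation step.
positions = [Position("ABC", delta=100.0, gamma=2.0), Position("XYZ", delta=-50.0, gamma=0.0)]
base_prices = {"ABC": 10.0, "XYZ": 20.0}

# Hypothetical market-data ticks arriving on the stream.
ticks = [("ABC", 11.0), ("XYZ", 22.0), ("ABC", 9.0)]
results = list(stream_risk(positions, base_prices, ticks))
for row in results:
    print(row)
```

The approximation degrades as prices drift away from the base, so the Greeks would need refreshing by the periodic batch valuation.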
18© Cloudera, Inc. All rights reserved.
Metadata
Management
Ingest
Validation
Profiling
Developer Tools: IDEs, Notebooks, SCM Operations Tools: Scheduling, Workflow, Publishing
Data Management Exploration / Model Development Production / Model Deployment
Feature
Engineering
Model Training
& Testing
Visualization
Production
Feature
Generation
Production
Model Port
Production
Testing
Result
Validation
Serving
User: Data Engineer User: Quant Analyst Users: Data / Dev / Ops Engineer
Modern Platform for Analytics and Machine Learning
Supporting complete development lifecycle for risk
19© Cloudera, Inc. All rights reserved.
Risk Footprint with
Apache Spark and Hadoop
o 19 GSIB customers
o 9 banks with risk use
cases in production
o 6000+ nodes deployed
o >5 years in production
20© Cloudera, Inc. All rights reserved.
Market Risk
aggregation platform
for a Global
Systemically
Important Bank
55x faster processing, 8x more data
capacity
300+ daily interactive users analyzing
current and historical data
21© Cloudera, Inc. All rights reserved.
Global Systemically
Important Bank
On-premise and cloud-
based Hadoop clusters
according to workload.
Tested on AWS to 40,000
cores. Demonstrated
linear scaling of simulation
workloads.
22© Cloudera, Inc. All rights reserved.
Demo
23© Cloudera, Inc. All rights reserved.
Q&A

More Related Content

What's hot

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Apache Kafka in the Insurance Industry
Apache Kafka in the Insurance IndustryApache Kafka in the Insurance Industry
Apache Kafka in the Insurance IndustryKai Wähner
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motionconfluent
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Kai Wähner
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaMainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaKai Wähner
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail IndustryKai Wähner
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data LakeMetroStar
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Itai Yaffe
 
Ist Daten-Liberalismus der richtige Weg?
Ist Daten-Liberalismus der richtige Weg?Ist Daten-Liberalismus der richtige Weg?
Ist Daten-Liberalismus der richtige Weg?confluent
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceDATAVERSITY
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleDatabricks
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Data in Motion bei LKW WALTER
Data in Motion bei LKW WALTERData in Motion bei LKW WALTER
Data in Motion bei LKW WALTERconfluent
 

What's hot (20)

Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Apache Kafka in the Insurance Industry
Apache Kafka in the Insurance IndustryApache Kafka in the Insurance Industry
Apache Kafka in the Insurance Industry
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4....
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
Mainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache KafkaMainframe Integration, Offloading and Replacement with Apache Kafka
Mainframe Integration, Offloading and Replacement with Apache Kafka
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake5 Steps for Architecting a Data Lake
5 Steps for Architecting a Data Lake
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
Ist Daten-Liberalismus der richtige Weg?
Ist Daten-Liberalismus der richtige Weg?Ist Daten-Liberalismus der richtige Weg?
Ist Daten-Liberalismus der richtige Weg?
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and Governance
 
Learn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML LifecycleLearn to Use Databricks for the Full ML Lifecycle
Learn to Use Databricks for the Full ML Lifecycle
 
Big Data Hadoop Customer 360 Degree View
Big Data Hadoop Customer 360 Degree ViewBig Data Hadoop Customer 360 Degree View
Big Data Hadoop Customer 360 Degree View
 
Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)Unified Big Data Processing with Apache Spark (QCON 2014)
Unified Big Data Processing with Apache Spark (QCON 2014)
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Data in Motion bei LKW WALTER
Data in Motion bei LKW WALTERData in Motion bei LKW WALTER
Data in Motion bei LKW WALTER
 

Similar to How Apache Spark and Apache Hadoop are being used to keep banking regulators happy

Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...Precisely
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Matt Stubbs
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...Deepak Chandramouli
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...confluent
 
Finance Trading in The Cloud - AWS Michigan Meetup
Finance Trading in The Cloud - AWS Michigan MeetupFinance Trading in The Cloud - AWS Michigan Meetup
Finance Trading in The Cloud - AWS Michigan MeetupEric Detterman
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesTimothy Spann
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicAmazon Web Services
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Kai Wähner
 
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertconfluent
 
Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2Niel Dunnage
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bankChungsik Yun
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Addressing Challenges with IoT Edge Management
Addressing Challenges with IoT Edge ManagementAddressing Challenges with IoT Edge Management
Addressing Challenges with IoT Edge ManagementDataWorks Summit
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Cloudera, Inc.
 
Kafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance IndustryKafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance IndustryKai Wähner
 
Real-Time Analytics for Industries
Real-Time Analytics for IndustriesReal-Time Analytics for Industries
Real-Time Analytics for IndustriesAvadhoot Patwardhan
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 

Similar to How Apache Spark and Apache Hadoop are being used to keep banking regulators happy (20)

Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
Keine Angst vorm Dinosaurier: Mainframe-Integration und -Offloading mit Confl...
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
 
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ...
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...Processing Real-Time Data at Scale: A streaming platform as a central nervous...
Processing Real-Time Data at Scale: A streaming platform as a central nervous...
 
APM
APMAPM
APM
 
Finance Trading in The Cloud - AWS Michigan Meetup
Finance Trading in The Cloud - AWS Michigan MeetupFinance Trading in The Cloud - AWS Michigan Meetup
Finance Trading in The Cloud - AWS Michigan Meetup
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
Real-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo LogicReal-time Visibility at Scale with Sumo Logic
Real-time Visibility at Scale with Sumo Logic
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
 
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniertFast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
Fast Data – Fast Cars: Wie Apache Kafka die Datenwelt revolutioniert
 
Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2Fighting cyber fraud with hadoop v2
Fighting cyber fraud with hadoop v2
 
Big Data Case study - caixa bank
Big Data Case study - caixa bankBig Data Case study - caixa bank
Big Data Case study - caixa bank
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Addressing Challenges with IoT Edge Management
Addressing Challenges with IoT Edge ManagementAddressing Challenges with IoT Edge Management
Addressing Challenges with IoT Edge Management
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 
Kafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance IndustryKafka and Machine Learning in Banking and Insurance Industry
Kafka and Machine Learning in Banking and Insurance Industry
 
Real-Time Analytics for Industries
Real-Time Analytics for IndustriesReal-Time Analytics for Industries
Real-Time Analytics for Industries
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
How Apache Spark and Apache Hadoop are being used to keep banking regulators happy

  • 1. How Apache Spark and Apache Hadoop are helping to keep banking regulators happy. © Cloudera, Inc. All rights reserved.
  • 2. Agenda • Existing Architecture for Analytics & Risk • Ever-changing Regulatory Landscape • Challenges with existing architectures • Modern architecture for Financial Risk • Demo of key capabilities
  • 3. Typical Existing Analytical Architecture. Diagram labels: Data Sources, ETL/Staging, EDW, Archive, Data Marts, Canned Reports, Dashboards/Analytic Applications, Non-SQL Workloads, Self-Service BI/Ad Hoc.
  • 4. Regulatory Landscape, 2012 to 2019. Timeline labels: ICB Ring-fencing; ICB Loss Absorbency; Leverage Ratio (Basel III); NSFR (Basel III); MiFID II; T2S; LCR (Basel III); ICB / Competition; Audit Policy; Cross Border Debt Recovery; Financial Transaction Tax; Market Abuse Directive (MAD II); PRIP; Accounting Directive Review; AIFM Directive; EU Transparency Directive; EU Reg on Credit Rating Agencies; CRDV; Internal Governance Guidelines; FATCA; PD; EMIR; SWAPS Push Out (Dodd-Frank); Securities Law Directive (SLD); Volcker Rule (Dodd-Frank); Short Selling; Close Out Netting; Crisis Management; Recovery & Resolution. Effective dates yet to be confirmed. BCBS 239; FRTB.
  • 5. Existing Architectures under pressure. Limited Data: incorporating new risk factors. (Architecture diagram as on slide 3.) Limited Data & Insight: adding new data sources and risk factors. Latent Value: how long does it take to get new reports with new risk factors?
  • 6. Existing Architectures under pressure. Missed SLAs for VaR, ES & Stress scenarios. (Architecture diagram as on slide 3.) Overloaded Bottlenecks: ever-increasing ETL windows. Overloaded Bottlenecks: ever-increasing batch windows to extract data.
  • 7. Existing Architectures under pressure. Frustrated Quants on the “edge” nodes (not-only-SQL). (Architecture diagram as on slide 3.) Lack of Tooling: ad-hoc, on-demand complex risk modeling requirements.
  • 8. http://www.bis.org/publ/bcbs239.pdf
  • 9. BCBS-239: Principles for Risk Data Aggregation. III - Accuracy & Integrity: Strive for a single authoritative source for risk data. Aggregate on an automated basis. IV - Completeness: Capture and aggregate all material risk data. Data available by business line, legal entity, asset type, industry, region, … V - Timeliness: Generate aggregate and up-to-date risk data in a timely manner. VI - Adaptability: Meet a broad range of on-demand, ad-hoc risk management reporting requests. Issues listed against these principles: • Data, models and processes live in silos • Hard to get enterprise-wide view of risk • Difficult to aggregate • Lack of enterprise data taxonomy • Failed audits • Aggregate / reported risk data is infrequent and stale • Unable to handle crisis situations • Complex risk modeling process
  • 10. A modern risk platform calls for… Scalability: More risk measures, more scenarios. Fine-grained risk data results in an order-of-magnitude increase in volume. Speed: More frequent stress testing and regulatory reporting. High-velocity scenario development and deployment. Agility: More frequent stress testing. Support for a variety of languages. Pre-trade decisions. “What-if” scenarios. Transparency: Verifiable data. Timely response to audits. Data quality and lineage. Data and model governance.
  • 11. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. Storage • Archival • Traceability. Batch • ETL • Data Validation • Reg Reporting. Interactive • Risk Aggregation • Stress Testing. HPC • Risk Modeling • Backtesting • Simulation. Streaming & Real Time • Mkt Surveillance • Best Execution. HDFS: High-throughput, scalable, fault-tolerant, distributed file system. MapReduce: Distributed parallel processing framework.
  • 12. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. (Use-case categories as on slide 11.) Apache Impala: Massively Parallel Processing (MPP) SQL engine. Apache Spark: In-memory distributed processing framework.
  • 13. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. (Use-case categories as on slide 11.) Apache Spark: Distributed compute framework. Can support Python / C++, as well as Java and Scala. Cloudera Data Science Workbench: Fully integrated data science notebook application.
  • 14. Evolution towards a modern risk platform: Risk & Regulatory Compliance Use Cases on Hadoop. (Use-case categories as on slide 11.) Cloudera Data Science Workbench. Apache Kudu: Real-time streaming architectures for true aggregated risk on demand.
  • 15. Modern Platform for Analytics and Machine Learning. Diagram labels: Data Sources, EDW, Analytic Database, Operational Database, Data Science & Engineering, Shared Data Layer, Modern Data Platform, Fixed Reports, Dashboards/Analytic Applications, Non-SQL Workloads, Self-Service BI/Ad Hoc, Flexible Reporting. Regulations listed: MiFID II, FRTB, IFRS-9, BCBS-239, MAD/MAR, GDPR, …
  • 16. BCBS 239 / FRTB “Illustrative” Architecture. Column headings: Market Data, Revaluation, Calculation & Aggregation, Reporting. Market Data Feeds. IPV: Independent Price Valuation Function. MRF / NMRF: Modelable & Non-Modelable Risk Factors. Calibration. Front Office Pricing Engines: Fixed Income, Equity Mkts, FX, … Other Mkts. Enterprise Data Hub: Static Data, Market Data, Configuration, P&L Vectors, Sensitivities, Events, Positions & Transaction Data. Scenarios: Current, Historic, Stressed, Projected. Risk Metrics: SA-related Risk Components, Counter-Party Credit Risk, XVA, ES & Stressed ES, P&L Attribution, VaR. Regulatory Applications: MiFID 2, Stress Testing, GDPR, FRTB SA, FRTB IMA, EMIR. Regulatory Reporting. Management Reporting. Flow labels: Scenarios, Risk, Sensitivities.
  • 17. BCBS 239 – Timeliness (Real-time risk). Simplifying Lambda architectures with Apache Kudu. Diagram labels: Kafka, Spark Streaming, Kudu, Spark MLlib, Application, Data Sources, Individual Session, Full Model/Learning, Genesis. Real-time Risk with Greeks: 1 Event Occurs, 2 Market Data, 3 Stream Processing, 4 Land in RDBMS, 5 Batch Valuation.
  • 18. Modern Platform for Analytics and Machine Learning: Supporting complete development lifecycle for risk. Stages and users: Data Management (User: Data Engineer); Exploration / Model Development (User: Quant Analyst); Production / Model Deployment (Users: Data / Dev / Ops Engineer). Activities: Metadata Management, Ingest, Validation, Profiling, Feature Engineering, Model Training & Testing, Visualization, Production Feature Generation, Production Model Port, Production Testing, Result Validation, Serving. Developer Tools: IDEs, Notebooks, SCM. Operations Tools: Scheduling, Workflow, Publishing.
  • 19. Risk Footprint with Apache Spark and Hadoop • 19 GSIB customers • 9 banks with risk use cases in production • 6000+ nodes deployed • >5 years in production
  • 20. Market Risk aggregation platform for a Global Systemically Important Bank. 55x faster processing, 8x more data capacity. 300+ daily interactive users analyzing current and historical data.
  • 21. Global Systemically Important Bank. On-premise and cloud-based Hadoop clusters according to workload. Tested on AWS to 40,000 cores. Demonstrated linear scaling of simulation workloads.
  • 22. Demo
  • 23. Q&A
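
The slides name P&L vectors, aggregation, VaR and ES (slides 6 and 16) without showing how they relate. The following is a minimal sketch, assuming a historical-simulation approach: per-desk P&L vectors are summed scenario by scenario, and VaR and ES are then read off the tail of the aggregated losses. The function names (`aggregate_pnl`, `var_and_es`), the desk names, and the sample numbers are hypothetical and do not come from the deck; the 97.5% default is the confidence level FRTB uses for ES.

```python
from math import ceil


def aggregate_pnl(vectors):
    """Sum P&L vectors scenario by scenario; all vectors must share the same scenarios."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("P&L vectors must have the same number of scenarios")
    return [sum(values) for values in zip(*vectors)]


def var_and_es(pnl, confidence=0.975):
    """Return (VaR, ES) as positive loss amounts using historical simulation."""
    if not pnl:
        raise ValueError("P&L vector is empty")
    losses = sorted((-x for x in pnl), reverse=True)  # largest loss first
    # Number of scenarios in the tail; rounding guards against float noise.
    tail_size = max(1, ceil(round(len(losses) * (1 - confidence), 9)))
    tail = losses[:tail_size]
    var = tail[-1]              # smallest loss that is still inside the tail
    es = sum(tail) / len(tail)  # average loss across the tail
    return var, es


# Hypothetical P&L vectors: one value per scenario, per desk.
fixed_income = [1.0, -2.0, 0.5, -4.0, 3.0, -1.0, 2.0, -3.0, 0.0, 1.5]
equities = [-1.0, -1.0, 2.0, -2.0, 1.0, 0.5, -3.0, -1.0, 1.0, 0.5]

total = aggregate_pnl([fixed_income, equities])
var, es = var_and_es(total, confidence=0.8)
print(f"VaR: {var:.2f}  ES: {es:.2f}")
```

Because the aggregation is a plain element-wise sum, the same vectors can be rolled up by business line, legal entity, or region before the tail measures are taken. On a cluster the sum would be expressed as a distributed group-by rather than a Python loop.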