An Architecture for Trade Capture
and Regulatory Reporting
Osemeke Isibor
Solutions Architect, AWS.
iosemeke@amazon.com
What to Expect From This Session
• Identifying the challenges in architecting a data lake that meets the unique
requirements for regulatory reporting
• A pattern for ingestion, processing, and transformation of semi-structured
data in a secure and auditable data repository that can be used for a variety
of reporting and analytics applications
• An implementation of consolidated audit trail (CAT) reporting using AWS
services integrated with herd, an open-source unified data catalog
framework
Agenda
• Review of Regulatory Reporting Challenges
• Consolidated Audit Trail
• Architecture for Consolidated Audit Trail Reporting
• Security and Lineage Framework
• FIX Message Ingestion
• Message Transformation and Optimization
• Reporting and Analytics Tools
• Recap
Today’s Regulatory Reporting Landscape
• Financial institutions face challenges capturing, cleaning, organizing, and reporting for an array of
regulators and regulatory frameworks along with new expectations of fine-grained, n-dimensional
reporting with data lineage and governance controls.
Regulators and frameworks: EMA, PRA, Treasury, FDIC, FFIEC, Basel, Dodd-Frank, NMS, MiFID II, BCBS 239, CCAR, ESMA, RDA, FR Y-9C
Current Architecture Challenges
• Legacy System Fragmentation
  ◦ Data stored in multiple disconnected data silos
  ◦ Silos don’t provide lineage back to source data
  ◦ Distributed ETL processes at multiple levels, with inconsistent transformations between silos
• Static Infrastructure vs. Dynamic Data
  ◦ Slow to onboard new data sources
  ◦ Slow to adapt to data format changes
  ◦ Slow to build new types of reports
  ◦ Slow to share data across teams and with regulators
Regulatory Reporting Challenges
• Diversity of sources and formats
• Massive data volumes
• Stringent SLAs (and fines)
• Security
• Single record of truth with lineage and recreatability
A More Strategic Approach to Reporting
Financial institutions are viewing their reporting obligations as a catalyst to
pursue broader data management objectives that can help unlock the value of
their data.
• Business benefits
• Enhanced data governance
• Improved efficiency
Consolidated Audit Trail
Real-Life Example: SEC Rule 613
• “… plan to create, implement, and maintain a consolidated order tracking system, or consolidated audit trail, with respect to the trading of reportable securities … ”
Trading to CAT
Diagram components: Client/Firm, Broker Dealer, Exchanges, FIX Protocol, CAT Consolidator, Regulators
FIX 4.0 execution report (35=8), with SOH field delimiters shown as "|":
8=FIX.4.0|9=0298|35=8|49=AWSHUB|56=BRAES|50=OR68|57=PFDR|34=427628|52=20141011-15:22:35|6=38.15500|11=B17238605x1c75s1|14=100|17=32480427|30=13|31=38.15500|32=100|37=5000914|38=600|39=1|40=P|44=0.00000|54=1|55=GYMB|59=0|60=20141011-15:22:35|63=0|75=20141011|76=AWS|20=0|7100=C,M|7101=M|7107=38.20,A,L|7108=100
{
  "type": "MEOT",
  "reporter": "AWSHUB",
  "eventTimestamp": "20141011T152235.023471",
  "sequenceNumber": 1199,
  "symbol": "GYMB",
  "tradeID": "32480427",
  "quantity": 100,
  "price": 38.155,
  "buyDetails": {
    "side": "Buy",
    "leavesQty": 0,
    "orderID": "B17238605x1c75s1",
    "capacity": "Agency",
    "clearingNumber": "0002",
    "liquidityCode": "A"
  },
  "nbbPrice": 38.16,
  "nbbQty": 200,
  "nboPrice": 38.15,
  "nboQty": 500,
  "nbboSource": "SIP",
  "nbboTimestamp": "20141011T152235.023317"
}
FIX – CAT/JSON
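The slide pairs a raw FIX execution report with its CAT JSON rendering. The Python sketch below illustrates the first step of that conversion under stated assumptions: it splits a tag=value message on the SOH delimiter and maps a handful of standard FIX 4.0 tags (49 SenderCompID, 11 ClOrdID, 17 ExecID, 31 LastPx, 32 LastShares, 54 Side, 55 Symbol) onto a flat record whose field names are borrowed from the JSON example. The tag-to-field mapping is inferred from the sample values, not taken from the CAT specification, and the parser ignores repeating groups and checksum validation.

```python
"""Sketch: parse a FIX tag=value message and map a few tags to CAT-style fields."""

SOH = "\x01"  # FIX field delimiter (ASCII Start of Header)

# Standard FIX tag numbers; target names are borrowed from the CAT JSON
# example and are an assumption, not the CAT specification.
FIELD_MAP = {
    "49": "reporter",  # SenderCompID
    "55": "symbol",    # Symbol
    "17": "tradeID",   # ExecID
    "11": "orderID",   # ClOrdID
}
SIDES = {"1": "Buy", "2": "Sell"}  # tag 54 (Side) values


def parse_fix(raw: str, delimiter: str = SOH) -> dict[str, str]:
    """Split a FIX message into {tag: value}; repeating groups are not handled."""
    fields: dict[str, str] = {}
    for pair in raw.split(delimiter):
        if not pair:
            continue  # tolerate a trailing delimiter
        tag, sep, value = pair.partition("=")
        if not sep or not tag.isdigit():
            raise ValueError(f"malformed FIX field: {pair!r}")
        fields[tag] = value
    return fields


def to_cat_fill(fields: dict[str, str]) -> dict[str, object]:
    """Map an execution report (35=8) onto a flat, CAT-like record."""
    if fields.get("35") != "8":
        raise ValueError("expected an execution report (35=8)")
    record: dict[str, object] = {name: fields[tag] for tag, name in FIELD_MAP.items()}
    record["quantity"] = int(fields["32"])  # LastShares
    record["price"] = float(fields["31"])   # LastPx
    record["side"] = SIDES.get(fields["54"], "Other")
    return record


# Abbreviated from the sample message on the slide; "|" stands in for SOH.
SAMPLE = "8=FIX.4.0|35=8|49=AWSHUB|11=B17238605x1c75s1|17=32480427|31=38.15500|32=100|54=1|55=GYMB"

if __name__ == "__main__":
    print(to_cat_fill(parse_fix(SAMPLE.replace("|", SOH))))
```

A dictionary keyed by tag is enough for flat messages like this one; messages with repeating groups would need an ordered list of (tag, value) pairs instead.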
Consolidated Audit Trail Reporting Architecture
CAT Reporting Pipeline on AWS
Diagram stages:
• FIX Ingestion: FIX messages into a single source of truth
• Transform FIX to Parquet: transform and optimize into an optimized data repository
• CAT Reporting: transaction linking and transformation to produce the regulatory report
• Trade Analytics: ad-hoc data analysis and business intelligence
CAT Reporting Architecture on AWS
Diagram components. On premises: internal app; on-premises HSM (optional). AWS Region: multipart upload of encrypted data into the Amazon S3 data lake; transient Amazon EMR clusters for ETL producing cleansed, formatted, split, and compressed output; Amazon S3 data warehouse; transient Amazon EMR clusters for event sequencing producing the CAT output; herd metadata store; Amazon Glacier (WORM storage); AWS KMS with BYO key; CloudWatch alarm; AWS CloudTrail.
Core Services Being Used
• Amazon S3
• Amazon Glacier
• Amazon CloudWatch
• AWS CloudTrail
• AWS KMS
• Amazon EMR
• Amazon Athena
• Amazon QuickSight
• AWS Direct Connect
Security
Security Framework
• Network isolation (AWS Direct Connect, VPC, VPN)
• Encryption (data in transit and data at rest)
• Auditing (CloudTrail, CloudWatch)
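The encryption controls can be expressed as concrete S3 settings. The Python sketch below builds two documents: a bucket policy that denies any request not made over TLS (using the `aws:SecureTransport` condition key) and a default SSE-KMS encryption rule for data at rest. The bucket name and key ARN are hypothetical placeholders, and a production policy would need review against your own principals and key setup.

```python
"""Sketch: S3 settings for in-transit and at-rest encryption (placeholders are hypothetical)."""
import json

BUCKET = "example-cat-data-lake"                                # hypothetical bucket name
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"  # hypothetical key ARN


def tls_only_bucket_policy(bucket: str) -> dict:
    """Deny every S3 action on the bucket and its objects unless the request uses TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def default_kms_encryption(kms_key_arn: str) -> dict:
    """Default SSE-KMS rule in the shape of a ServerSideEncryptionConfiguration."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(tls_only_bucket_policy(BUCKET), indent=2))
    print(json.dumps(default_kms_encryption(KMS_KEY_ARN), indent=2))
```

With boto3, these would be applied through the S3 client's `put_bucket_policy(Bucket=..., Policy=json.dumps(...))` and `put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=...)`; CloudTrail then records those configuration changes for auditing.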
Lineage
Lineage Framework – herd
• Unified data catalog: a centralized, auditable catalog for operational usage and data governance
• Track lineage: capture data ancestry for regulatory, forensic, and analytical purposes
herd is a FINRA-built, open-source framework that tracks and catalogs data in a unified data repository to capture audit and data lineage information.
Integrating herd
Diagram: ETL import/export and ETL transformation jobs connected to the herd metadata store
• All ETL applications update
the herd store with input,
output, and ETL application
version
• herd usage validated by
CloudTrail logs
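Recording each job's inputs, outputs, and application version is what makes ancestry questions answerable ("which jobs and versions produced this report?"). The Python sketch below models that idea with a small in-memory registry. The class and method names are hypothetical and do not reflect herd's actual REST API or data model; the dataset URIs in the example are placeholders.

```python
"""Sketch: lineage records of the kind ETL jobs register with a metadata store.

Hypothetical in-memory model for illustration; this is not herd's API.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    """One ETL run: application, version, and the datasets it read and wrote."""
    job: str
    version: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


class LineageRegistry:
    """In-memory stand-in for a metadata store that tracks dataset producers."""

    def __init__(self) -> None:
        self._producer: dict[str, LineageRecord] = {}  # output URI -> producing run

    def register(self, record: LineageRecord) -> None:
        # Validate all outputs first so a rejected record leaves no partial state.
        for uri in record.outputs:
            if uri in self._producer:
                raise ValueError(f"{uri} already has a producer; outputs are write-once")
        for uri in record.outputs:
            self._producer[uri] = record

    def ancestry(self, dataset: str) -> list[LineageRecord]:
        """Return the runs upstream of `dataset`, nearest first (breadth-first)."""
        result: list[LineageRecord] = []
        seen_uris = {dataset}
        frontier = [dataset]
        while frontier:
            next_frontier: list[str] = []
            for uri in frontier:
                record = self._producer.get(uri)
                if record is None or record in result:
                    continue  # raw source data, or a run already listed
                result.append(record)
                for parent in record.inputs:
                    if parent not in seen_uris:
                        seen_uris.add(parent)
                        next_frontier.append(parent)
            frontier = next_frontier
        return result


if __name__ == "__main__":
    reg = LineageRegistry()
    reg.register(LineageRecord("fix-to-parquet", "1.4.2",
                               ("s3://lake/fix/20141011",), ("s3://repo/parquet/20141011",)))
    reg.register(LineageRecord("cat-sequencer", "0.9.0",
                               ("s3://repo/parquet/20141011",), ("s3://warehouse/cat/20141011.json",)))
    for rec in reg.ancestry("s3://warehouse/cat/20141011.json"):
        print(rec.job, rec.version)
```

Treating outputs as write-once keeps each dataset's producer unambiguous, which is the property an auditor relies on when recreating a report.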
FIX Ingestion
FIX Ingestion – End of Day
Diagram: internal app (on premises) → AWS Direct Connect → multipart upload of encrypted data → S3 data lake; Amazon Glacier (WORM storage)
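End-of-day files sent by multipart upload must respect Amazon S3's part limits: every part except the last must be at least 5 MiB, no part may exceed 5 GiB, and one upload may contain at most 10,000 parts. The Python sketch below picks a part size for a given file size under those limits. The 64 MiB preferred size is an arbitrary assumption, and in practice the AWS SDK transfer managers perform this sizing for you.

```python
"""Sketch: choose a multipart part size that satisfies Amazon S3's part limits."""

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB   # minimum size of every part except the last
MAX_PART = 5 * GIB   # maximum size of any part
MAX_PARTS = 10_000   # maximum number of parts in one upload


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on large byte counts)."""
    return -(-a // b)


def plan_parts(file_size: int, preferred_part: int = 64 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) for uploading `file_size` bytes.

    Uses `preferred_part` (an arbitrary default) unless the file is so large
    that a bigger part is needed to stay within MAX_PARTS.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    part = max(preferred_part, MIN_PART, _ceil_div(file_size, MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("file too large for one multipart upload under these limits")
    return part, _ceil_div(file_size, part)


if __name__ == "__main__":
    size, count = plan_parts(1000 * GIB)
    print(f"part size {size / MIB:.1f} MiB, {count} parts")
```

For example, a 1,000 GiB end-of-day file needs parts of roughly 102.4 MiB to stay within 10,000 parts, so the preferred 64 MiB is raised automatically.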
Message Transformation and Optimization
Message Optimization
Diagram: FIX messages in the S3 data lake → transient EMR clusters for ETL (core nodes and task nodes, reading and writing through EMRFS) → Parquet in the S3 data repository
Chart: average scan rate (kHz) by storage format (Kudu, Parquet, HBase, Avro, MapFile) and compression (none, Snappy, Gzip/BZip2)
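The "split" part of the ETL output can be illustrated by grouping cleansed records under Hive-style `key=value` prefixes for year, month, day, and clearing number before they are written as Parquet. This Python sketch only computes the grouping. It assumes each cleansed record carries a top-level `clearingNumber` and an `eventTimestamp` in the format used in the JSON example; the base path is hypothetical, and writing the Parquet files themselves (for example with Spark on EMR) is out of scope.

```python
"""Sketch: group records into Hive-style partition prefixes before writing Parquet."""
from collections import defaultdict
from datetime import datetime

BASE = "s3://example-bucket/catarch/parquet"  # hypothetical base location
TS_FORMAT = "%Y%m%dT%H%M%S.%f"                # matches "20141011T152235.023471"


def partition_prefix(record: dict, base: str = BASE) -> str:
    """Build the year=/month=/day=/clearingnumber= prefix for one record."""
    ts = datetime.strptime(record["eventTimestamp"], TS_FORMAT)
    return (f"{base}/year={ts.year}/month={ts.month}/day={ts.day}"
            f"/clearingnumber={record['clearingNumber']}/")


def split_by_partition(records: list[dict], base: str = BASE) -> dict[str, list[dict]]:
    """Return {partition prefix: records that belong under it}."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[partition_prefix(record, base)].append(record)
    return dict(groups)


if __name__ == "__main__":
    sample = [{"eventTimestamp": "20141011T152235.023471", "clearingNumber": "0002"}]
    for prefix, rows in split_by_partition(sample).items():
        print(prefix, len(rows))
```

Keeping partition keys lowercase and in `key=value` form lets query engines prune whole prefixes when a query filters on date or clearing number.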
Reporting and Analytics
Reporting
Diagram: S3 data lake (Parquet) → transient EMR clusters (via EMRFS) → S3 data warehouse (via EMRFS) → JSON output to the CAT Consolidator; Amazon Athena for queries
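The event-sequencing step has to put each order's lifecycle events in order before the CAT output is generated. The single-process Python sketch below shows the idea: it groups events by a linkage key (a top-level `orderID` here, which is an assumption; the JSON example nests it under `buyDetails`), sorts each group by `eventTimestamp` with `sequenceNumber` as a tie-breaker, and emits one JSON object per line. At CAT volumes this logic would run as a distributed job on the transient EMR clusters rather than in one process.

```python
"""Sketch: sequence events per order and emit JSON Lines (single-process illustration)."""
import json
from collections import defaultdict
from datetime import datetime

TS_FORMAT = "%Y%m%dT%H%M%S.%f"  # e.g. 20141011T152235.023471


def sequence_events(events: list[dict]) -> dict[str, list[dict]]:
    """Group events by orderID and sort each group by (eventTimestamp, sequenceNumber)."""
    by_order: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        by_order[event["orderID"]].append(event)
    return {
        order_id: sorted(
            group,
            key=lambda e: (datetime.strptime(e["eventTimestamp"], TS_FORMAT), e["sequenceNumber"]),
        )
        for order_id, group in by_order.items()
    }


def to_json_lines(sequenced: dict[str, list[dict]]) -> str:
    """Serialize orders in sorted key order, one JSON object per event per line."""
    lines = []
    for order_id in sorted(sequenced):
        for event in sequenced[order_id]:
            lines.append(json.dumps(event, sort_keys=True))
    return "\n".join(lines)
```

Sorting on a parsed timestamp rather than the raw string guards against timestamp formats that do not sort lexically, and the sequence number keeps the order deterministic when two events share a timestamp.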
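Athena: Creating Tables (Parquet)
-- clearingNumber is declared only as a partition column: Hive-style DDL rejects
-- a column that appears in both the column list and PARTITIONED BY.
CREATE EXTERNAL TABLE db_name.transactions (
  reporter STRING,
  event_timestamp TIMESTAMP,
  symbol STRING,
  tradeID STRING,
  quantity INT,
  price DOUBLE,
  side INT,
  liquidity INT
)
PARTITIONED BY (YEAR INT, MONTH INT, DAY INT, CLEARINGNUMBER STRING)
STORED AS PARQUET
LOCATION 's3://fsi-sandbox/catarch/parquet/'
TBLPROPERTIES ('has_encrypted_data'='true');

-- Register partitions stored under Hive-style key=value prefixes.
MSCK REPAIR TABLE db_name.transactions;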
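Because the table is partitioned, Athena only reads data for partitions registered in the catalog. Besides `MSCK REPAIR TABLE`, which relies on Hive-style `key=value` paths, partitions can be registered explicitly with `ALTER TABLE ... ADD IF NOT EXISTS PARTITION (...) LOCATION '...'`. The Python sketch below renders that statement for one trading day and clearing number. The database, table, and bucket names reuse those in the DDL example, and the Hive-style folder layout under that location is an assumption.

```python
"""Sketch: render an Athena ALTER TABLE ... ADD PARTITION statement for one day."""
from datetime import date

TABLE = "db_name.transactions"              # names reused from the DDL example
BASE = "s3://fsi-sandbox/catarch/parquet"   # assumed Hive-style layout below this path


def add_partition_sql(day: date, clearing_number: str,
                      table: str = TABLE, base: str = BASE) -> str:
    """Build the statement that registers one (day, clearing number) partition."""
    if not clearing_number.isalnum():
        # Rejecting quotes and other punctuation keeps the string literal safe.
        raise ValueError("clearing number must be alphanumeric")
    location = (f"{base}/year={day.year}/month={day.month}/day={day.day}"
                f"/clearingnumber={clearing_number}/")
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
            f"(year={day.year}, month={day.month}, day={day.day}, "
            f"clearingnumber='{clearing_number}') LOCATION '{location}'")


if __name__ == "__main__":
    print(add_partition_sql(date(2014, 10, 11), "0002"))
```

Adding partitions explicitly after each daily load avoids a full `MSCK REPAIR TABLE` scan of the bucket, which grows slower as the number of partitions increases.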
Analytics
Diagram: analytics users query Parquet data in the S3 data lake (via EMRFS) with Amazon Athena and visualize results in Amazon QuickSight
Ad-hoc Data Analysis: A Typical Situation
Request: “Provide all the trades in ABC Corp in the last five years.”
Data: 9 TB, spanning 2012 to 2016
Options?
What Are Your Options?
• Option 1: Keep it online all the time
• Option 2: Archive the data, and upon request, stand up the database server, restore the data, and then query the data
• Option 3: Query data at rest in the Amazon S3 data lake using Amazon Athena or Amazon Redshift Spectrum for ad-hoc queries ($45 for 9 TB scanned)
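The $45 figure for 9 TB scanned corresponds to $5 per TB scanned. Because this option bills by data scanned, columnar Parquet (reading only the columns a query needs) and partition pruning (reading only the dates it needs) reduce the cost proportionally. The Python sketch below makes that arithmetic explicit. The rate is inferred from the slide and may not match current pricing, and the column and partition fractions are illustrative inputs, not measurements.

```python
"""Sketch: estimate pay-per-scan query cost (rate inferred from the slide: $45 / 9 TB)."""

PRICE_PER_TB = 45.0 / 9.0  # $5 per TB scanned, as implied by the slide


def scan_cost(dataset_tb: float, column_fraction: float = 1.0,
              partition_fraction: float = 1.0, price_per_tb: float = PRICE_PER_TB) -> float:
    """Cost of one query that reads only some columns and some partitions of the data."""
    for name, frac in (("column_fraction", column_fraction),
                       ("partition_fraction", partition_fraction)):
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    scanned_tb = dataset_tb * column_fraction * partition_fraction
    return scanned_tb * price_per_tb


if __name__ == "__main__":
    print(f"full scan: ${scan_cost(9.0):.2f}")
    print(f"pruned scan: ${scan_cost(9.0, column_fraction=0.25, partition_fraction=0.2):.2f}")
```

Under these assumptions, a query that touches a quarter of the columns and one of the five yearly partitions scans about 0.45 TB, roughly $2.25 instead of $45.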
Amazon QuickSight: Import Dataset
Amazon QuickSight: One-click Visualization
Analytics
Diagram: formatted Parquet data in the S3 data warehouse
Recap
• Identified the challenges in architecting a data lake that meets the unique
requirements for regulatory reporting: security, lineage, scale, and elasticity
• Reviewed an architecture for ingestion, processing, and transformation of
FIX data into a data repository that can be used for a variety of reporting
and analytics applications
• Demonstrated a reference implementation of CAT reporting using AWS
services integrated with herd
Thank You
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

An Architecture for Trade Capture and Regulatory Reporting

  • 1. An Architecture for Trade Capture and Regulatory Reporting. Osemeke Isibor, Solutions Architect, AWS. iosemeke@amazon.com
  • 2. What to Expect From This Session • Identifying the challenges in architecting a data lake that meets the unique requirements for regulatory reporting • A pattern for ingestion, processing, and transformation of semi-structured data in a secure and auditable data repository that can be used for a variety of reporting and analytics applications • An implementation of consolidated audit trail (CAT) reporting using AWS services integrated with herd, an open-source unified data catalog framework
  • 3. Agenda • Review of Regulatory Reporting Challenges • Consolidated Audit Trail • Architecture for Consolidated Audit Trail Reporting • Security and Lineage Framework • FIX Message Ingestion • Message Transformation and Optimization • Reporting and Analytics Tools • Recap
  • 4. Today’s Regulatory Reporting Landscape • Financial institutions face challenges capturing, cleaning, organizing, and reporting for an array of regulators and regulatory frameworks, along with new expectations of fine-grained, n-dimensional reporting with data lineage and governance controls. Regulators and frameworks: EMA, PRA, Treasury, FDIC, FFIEC, Basel, Dodd-Frank, NMS, MiFID II, BCBS 239, CCAR, ESMA, RDA, FR Y-9C
  • 5. Current Architecture Challenges • Legacy System Fragmentation • Data stored in multiple disconnected data silos • Silos don’t provide lineage back to source data • Distributed ETL processes at multiple levels, inconsistent transformation between silos • Static Infrastructure vs. Dynamic Data • Slow to onboard new data sources • Slow to adapt to data format changes • Slow to build new types of reports • Slow to share data across teams and with regulators
  • 6. Regulatory Reporting Challenges • Diversity of sources and formats • Massive data volumes • Stringent SLAs (and fines) • Security • Single record of truth with lineage and recreatability
  • 7. A More Strategic Approach to Reporting • Financial institutions are viewing their reporting obligations as a catalyst to pursue broader data management objectives that can help unlock the value of their data. • Business benefits • Enhanced data governance • Improved efficiency
  • 9. Real-Life Example: SEC Rule 613 • “… plan to create, implement, and maintain a consolidated order tracking system, or consolidated audit trail, with respect to the trading of reportable securities … ”
  • 10. Trading to CAT (diagram: Client/Firm, Broker Dealer, Exchanges, FIX Protocol, CAT Consolidator, Regulators)
  • 11. FIX to CAT/JSON. FIX execution report (fields separated by | in place of the SOH delimiter): 8=FIX.4.0|9=0298|35=8|49=AWSHUB|56=BRAES|50=OR68|57=PFDR|34=427628|52=20141011-15:22:35|6=38.15500|11=B17238605x1c75s1|14=100|17=32480427|30=13|31=38.15500|32=100|37=5000914|38=600|39=1|40=P|44=0.00000|54=1|55=GYMB|59=0|60=20141011-15:22:35|63=0|75=20141011|76=AWS|20=0|7100=C,M|7101=M|7107=38.20,A,L|7108=100 Equivalent CAT JSON record: { "type": "MEOT", "reporter": "AWSHUB", "eventTimestamp": "20141011T152235.023471", "sequenceNumber": 1199, "symbol": "GYMB", "tradeID": "32480427", "quantity": 100, "price": 38.155, "buyDetails": { "side": "Buy", "leavesQty": 0, "orderID": "B17238605x1c75s1", "capacity": "Agency", "clearingNumber": "0002", "liquidityCode": "A" }, "nbbPrice": 38.16, "nbbQty": 200, "nboPrice": 38.15, "nboQty": 500, "nbboSource": "SIP", "nbboTimestamp": "20141011T152235.023317" }
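The FIX-to-JSON mapping on slide 11 can be sketched as a tag=value parser followed by a field mapping. This is a minimal sketch, not the presenter's implementation. It assumes a single flat message with no repeating groups. The tag choices (49 SenderCompID, 55 Symbol, 17 ExecID, 32 LastShares, 31 LastPx, 54 Side, 11 ClOrdID, 60 TransactTime) were inferred by matching the slide's FIX values to its JSON values. The function names parse_fix and to_cat_record, and the side-keyed "sellDetails" variant, are hypothetical.

```python
from __future__ import annotations

from datetime import datetime

SOH = "\x01"  # standard FIX field delimiter
FIX_SIDE = {"1": "Buy", "2": "Sell"}  # subset of FIX tag 54 (Side) values


def parse_fix(raw: str, delimiter: str = SOH) -> dict[str, str]:
    """Split a FIX message into a {tag: value} dict.

    Assumes no repeating groups: a repeated tag overwrites the earlier value.
    """
    fields: dict[str, str] = {}
    for pair in raw.strip(delimiter).split(delimiter):
        tag, sep, value = pair.partition("=")
        if not sep or not tag.isdigit():
            raise ValueError(f"malformed FIX field: {pair!r}")
        fields[tag] = value
    return fields


def to_cat_record(fields: dict[str, str]) -> dict:
    """Map selected execution-report tags to a hypothetical CAT-style record."""
    transact_time = datetime.strptime(fields["60"], "%Y%m%d-%H:%M:%S")
    side = FIX_SIDE[fields["54"]]
    return {
        "reporter": fields["49"],                                    # SenderCompID
        "eventTimestamp": transact_time.strftime("%Y%m%dT%H%M%S"),  # TransactTime
        "symbol": fields["55"],                                      # Symbol
        "tradeID": fields["17"],                                     # ExecID
        "quantity": int(fields["32"]),                               # LastShares
        "price": float(fields["31"]),                                # LastPx
        # Assumption: details are keyed by side ("buyDetails" / "sellDetails").
        f"{side.lower()}Details": {"side": side, "orderID": fields["11"]},  # ClOrdID
    }


if __name__ == "__main__":
    sample = ("8=FIX.4.0|9=0298|35=8|49=AWSHUB|11=B17238605x1c75s1|17=32480427|"
              "31=38.15500|32=100|54=1|55=GYMB|60=20141011-15:22:35")
    print(to_cat_record(parse_fix(sample, delimiter="|")))
```

FIX 4.0 TransactTime carries only whole seconds. The microsecond timestamp in the slide's JSON therefore has to come from another source, such as the capturing system's clock.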
  • 12. Consolidated Audit Trail Reporting Architecture
  • 13. CAT Reporting Pipeline on AWS: FIX Ingestion (FIX messages into a single source of truth) → Transform FIX to Parquet (transform and optimize into an optimized data repository) → CAT Reporting (transaction linking and transformation to produce the regulatory report) and Trade Analytics (business intelligence and ad-hoc data analysis)
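At its simplest, the transaction-linking step on slide 13 groups events by order and puts each group in time order. The sketch below makes three assumptions about each event dict: it has a flat orderID, an eventTimestamp in the slide's YYYYMMDDTHHMMSS.ffffff format, and a sequenceNumber used as a tiebreaker. The helper name link_order_events is hypothetical. A production job, such as one on the transient EMR clusters for event sequencing, would do the same grouping at scale.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime


def _event_key(event: dict) -> tuple:
    # Parse the timestamp rather than trusting string order, then break ties
    # between events with the same timestamp by the reporter's sequence number.
    ts = datetime.strptime(event["eventTimestamp"], "%Y%m%dT%H%M%S.%f")
    return (ts, event["sequenceNumber"])


def link_order_events(events: list[dict]) -> dict[str, list[dict]]:
    """Group events by orderID and sort each order's events chronologically."""
    by_order: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        by_order[event["orderID"]].append(event)
    return {order_id: sorted(evs, key=_event_key) for order_id, evs in by_order.items()}
```

Sorting on the parsed timestamp and then the sequence number gives a deterministic order even when two events share a timestamp. A repeatable result matters when a report must be regenerated.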
  • 14. CAT Reporting Architecture on AWS (diagram). On premises: an internal app and an optional on-premises HSM supplying a BYO key. In the Region: multipart upload of encrypted data into the Amazon S3 data lake; transient Amazon EMR clusters for ETL producing cleansed, formatted, split, compressed output in the Amazon S3 data warehouse; transient Amazon EMR clusters for event sequencing producing the CAT output; AWS KMS; Amazon Glacier (WORM storage); AWS CloudTrail; a CloudWatch alarm; and the herd metadata store.
  • 15. Core Services Being Used: Amazon S3, Amazon Glacier, Amazon CloudWatch, AWS CloudTrail, AWS KMS, Amazon EMR, Amazon Athena, Amazon QuickSight, AWS Direct Connect
  • 16. Security (section divider showing the CAT reporting architecture diagram from slide 14)
  • 17. Security Framework • Network isolation (AWS Direct Connect, VPC, VPN) • Encryption (data in transit and data at rest) • Auditing (CloudTrail, CloudWatch)
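One way to enforce the encryption-in-transit item on slide 17 on the S3 side is a bucket policy that denies any request not sent over TLS, using the aws:SecureTransport global condition key. Below is a minimal sketch that builds such a policy document. The bucket name is hypothetical. The policy covers only the in-transit half of the framework: encryption at rest with AWS KMS is configured separately.

```python
import json


def tls_only_bucket_policy(bucket: str) -> str:
    """Return an S3 bucket policy (JSON) that denies all non-TLS requests."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level operations.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SecureTransport is "false" when the request did not use TLS.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(tls_only_bucket_policy("example-cat-data-lake"))  # hypothetical bucket name
```

To apply the policy, save the output to a file and run aws s3api put-bucket-policy --bucket <name> --policy file://policy.json.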
  • 18. Lineage (section divider showing the CAT reporting architecture diagram from slide 14)
  • 19. Lineage Framework: herd • Unified data catalog: a centralized, auditable catalog for operational usage and data governance • Track lineage: capture data ancestry for regulatory, forensic, and analytical purposes • herd is an open-source framework built by FINRA that tracks and catalogs data in a unified data repository in order to capture audit and data lineage information
  • 20. Integrating herd (diagram: ETL Import/Export and ETL Transformation applications connected to the herd Metadata Store) • All ETL applications update the herd store with their input, output, and ETL application version • herd usage is validated by CloudTrail logs
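Slide 20's integration rule says every ETL application records its input, output, and application version. That rule is what makes ancestry queries possible. The sketch below is a hypothetical in-memory stand-in for a lineage store, not herd's actual API. LineageRecord, LineageStore, and the example URIs are illustrative names. The sketch also assumes each output dataset is written by exactly one job. The CloudTrail cross-check from the slide is not modeled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageRecord:
    app_name: str
    app_version: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


class LineageStore:
    """Hypothetical in-memory lineage catalog (not the herd API)."""

    def __init__(self) -> None:
        self.records: list[LineageRecord] = []
        self._producer: dict[str, LineageRecord] = {}  # output URI -> producing job

    def register(self, record: LineageRecord) -> None:
        # Reject re-registration so every dataset keeps a single, auditable origin.
        for uri in record.outputs:
            if uri in self._producer:
                raise ValueError(f"{uri} already has a registered producer")
        for uri in record.outputs:
            self._producer[uri] = record
        self.records.append(record)

    def producer(self, uri: str) -> LineageRecord | None:
        return self._producer.get(uri)

    def ancestors(self, uri: str) -> set[str]:
        """Breadth-first walk from a dataset back to all of its upstream sources."""
        seen: set[str] = set()
        queue = deque([uri])
        while queue:
            record = self._producer.get(queue.popleft())
            if record is None:
                continue  # a source dataset with no recorded producer
            for src in record.inputs:
                if src not in seen:
                    seen.add(src)
                    queue.append(src)
        return seen
```

Because each record stores the application version, an auditor can answer both "which data fed this report?" and "which build of which job produced each step?"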
  • 21. FIX Ingestion (section divider showing the CAT reporting architecture diagram from slide 14)
  • 22. FIX Ingestion: End of Day (diagram: the internal app sends a multipart upload of encrypted data over AWS Direct Connect to the S3 data lake; Amazon Glacier provides WORM storage)
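The end-of-day transfer on slide 22 relies on S3 multipart upload, which splits a large file into numbered parts. Below is a minimal planning sketch. It assumes the S3 limits documented at the time of writing: parts of 5 MiB to 5 GiB with no minimum for the last part, at most 10,000 parts, and objects up to 5 TiB. The plan_parts name and the 64 MiB default part size are hypothetical choices. Each tuple would drive one UploadPart call, after which CompleteMultipartUpload assembles the object.

```python
from __future__ import annotations

import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB          # every part except the last must be >= 5 MiB
MAX_PART_SIZE = 5 * 1024 ** 3    # 5 GiB
MAX_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * 1024 ** 4  # 5 TiB


def plan_parts(object_size: int, preferred_part_size: int = 64 * MIB) -> list[tuple[int, int, int]]:
    """Return (part_number, byte_offset, length) for each part of an upload."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Start from the preferred size, raise it to the minimum if needed, and grow it
    # further if the file would otherwise need more than 10,000 parts.
    part_size = max(preferred_part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    part_size = min(part_size, MAX_PART_SIZE)
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts
```

Fixed-size parts let a failed part be retried on its own without resending the whole end-of-day file.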
  • 23. Message Transformation and Optimization (section divider showing the CAT reporting architecture diagram from slide 14)
  • 24. Message Optimization (diagram: FIX data in the S3 data lake is read through EMRFS by transient EMR clusters for ETL, running core nodes and task nodes, and written as Parquet through EMRFS to the S3 data repository)
  • 25. Message Optimization (chart: average scan rate in kHz for the kudu, parquet, hbase, avro, and mapfile storage formats, each with no compression, Snappy, and Gzip/BZip2)
  • 26. Reporting and Analytics (section divider showing the CAT reporting architecture diagram from slide 14)
  • 28. Athena Creating Tables – Parquet: CREATE EXTERNAL TABLE db_name.transactions ( reporter STRING, event_timestamp TIMESTAMP, symbol STRING, tradeID STRING, quantity INT, price DOUBLE, side INT, liquidity INT ) PARTITIONED BY (YEAR INT, MONTH INT, DAY INT, CLEARINGNUMBER STRING) STORED AS PARQUET LOCATION 's3://fsi-sandbox/catarch/parquet/' TBLPROPERTIES ('has_encrypted_data'='true');
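The table on slide 28 is partitioned by year, month, day, and clearing number. For Athena to prune partitions, the Parquet files must sit under matching Hive-style key=value prefixes. Below is a hedged sketch that derives an object prefix from a record. It assumes lowercase partition keys in the path, unpadded integer month and day values, and the slide's eventTimestamp format. The helper name partition_prefix is hypothetical. With this layout, MSCK REPAIR TABLE can discover new partitions, or they can be registered explicitly with ALTER TABLE ADD PARTITION.

```python
from datetime import datetime


def partition_prefix(table_location: str, event_timestamp: str, clearing_number: str) -> str:
    """Build the Hive-style S3 prefix for one (year, month, day, clearingNumber) partition."""
    ts = datetime.strptime(event_timestamp, "%Y%m%dT%H%M%S.%f")
    return (
        f"{table_location.rstrip('/')}/"
        f"year={ts.year}/month={ts.month}/day={ts.day}/"
        f"clearingnumber={clearing_number}/"
    )


if __name__ == "__main__":
    print(partition_prefix("s3://fsi-sandbox/catarch/parquet", "20141011T152235.023471", "0002"))
```

A query that filters on year, month, day, or clearing number then reads only the matching prefixes. This keeps per-query scan volume, and therefore Athena cost, proportional to the data actually requested.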
  • 30. Ad-hoc Data Analysis: A Typical Situation • Request: “Provide all the trades in ABC Corp in the last five years” • 9 TB of data across 2012, 2013, 2014, 2015, and 2016 • Options?
  • 31. What Are Your Options? • Option 1: Keep it online all the time • Option 2: Archive the data, and upon request, stand up the database server, restore the data, and then query the data • Option 3: Query data at rest using Amazon Athena or Amazon Redshift Spectrum (diagram: ad-hoc queries with Amazon Athena against the Amazon S3 data lake, $45 for 9 TB scanned)
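The $45 figure on slide 31 follows from Athena's pricing by data scanned: 9 TB × $5 per TB. Below is a small sketch of that arithmetic under four assumptions:

- a $5/TB list price;
- billing rounded up to the nearest megabyte;
- a 10 MB minimum per query;
- binary units (1 TB = 1024^4 bytes).

Prices vary by Region and over time, so price_per_tb is a parameter. The helper name athena_query_cost is hypothetical.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4


def athena_query_cost(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Estimate the scan charge for one Athena query, in dollars."""
    # Billing is per megabyte scanned, rounded up, with a 10 MB minimum per query.
    billed_mb = max(10, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * price_per_tb


if __name__ == "__main__":
    print(athena_query_cost(9 * TB))   # the slide's 9 TB full scan: 45.0
    print(athena_query_cost(TB // 2))  # a hypothetical 0.5 TB pruned scan: 2.5
```

Because cost scales directly with bytes scanned, columnar Parquet files and partition pruning lower the bill directly. Under the same assumptions, a query that read only a hypothetical 0.5 TB of the 9 TB would cost $2.50.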
  • 35. Recap • Identified the challenges in architecting a data lake that meets the unique requirements for regulatory reporting: security, lineage, scale, and elasticity • Reviewed an architecture for ingestion, processing, and transformation of FIX datasets into a data repository that can be used for a variety of reporting and analytics applications • Demonstrated a reference implementation of CAT reporting using AWS services integrated with herd