An Architecture for Trade Capture and Regulatory Reporting

Osemeke Isibor
Solutions Architect, AWS.
iosemeke@amazon.com

What to Expect From This Session
• Identifying the challenges in architecting a data lake that meets the unique
requirements for regulatory reporting
• A pattern for ingestion, processing, and transformation of semi-structured
data in a secure and auditable data repository that can be used for a variety
of reporting and analytics applications
• An implementation of consolidated audit trail (CAT) reporting using AWS
services integrated with herd, an open-source unified data catalog
framework

Agenda
• Review of Regulatory Reporting Challenges
• Consolidated Audit Trail
• Architecture for Consolidated Audit Trail Reporting
• Security and Lineage Framework
• FIX Message Ingestion
• Message Transformation and Optimization
• Reporting and Analytics Tools
• Recap

Today’s Regulatory Reporting Landscape
• Financial institutions face challenges capturing, cleaning, organizing, and reporting for an array of
regulators and regulatory frameworks along with new expectations of fine-grained, n-dimensional
reporting with data lineage and governance controls.
Regulators and frameworks shown: EMA, PRA, Treasury, FDIC, FFIEC, Basel, Dodd-Frank, NMS, MiFID II, BCBS 239, CCAR, ESMA, RDA, FR Y-9C

Current Architecture Challenges
• Legacy System Fragmentation
• Data stored in multiple disconnected data silos
• Silos don’t provide lineage back to source data
• Distributed ETL processes at multiple levels, inconsistent
transformation between silos
• Static Infrastructure vs. Dynamic Data
• Slow to onboard new data sources
• Slow to adapt to data format changes
• Slow to build new types of reports
• Slow to share data across teams and with regulators

Regulatory Reporting Challenges
• Diversity of sources and formats
• Massive data volumes
• Stringent SLAs (and fines)
• Security
• Single record of truth with lineage and recreatability

A More Strategic Approach to Reporting
Financial institutions are viewing their reporting obligations as a catalyst to
pursue broader data management objectives that can help unlock the value of
their data.
• Business benefits
• Enhanced data governance
• Improved efficiency

Real-Life Example: SEC Rule 613
• “… plan to create, implement, and maintain a consolidated order tracking system, or consolidated audit trail, with respect to the trading of reportable securities …”

Trading to CAT
(Diagram: Client/Firm, Broker Dealer, and Exchanges communicating over the FIX Protocol; CAT Consolidator; Regulators)

FIX – CAT/JSON
FIX message (the non-printing SOH field delimiter is shown as |):
8=FIX.4.0|9=0298|35=8|49=AWSHUB|56=BRAES|50=OR68|57=PFDR|34=427628|52=20141011-15:22:35|6=38.15500|11=B17238605x1c75s1|14=100|17=32480427|30=13|31=38.15500|32=100|37=5000914|38=600|39=1|40=P|44=0.00000|54=1|55=GYMB|59=0|60=20141011-15:22:35|63=0|75=20141011|76=AWS|20=0|7100=C,M|7101=M|7107=38.20,A,L|7108=100|
CAT/JSON record:
{
  "type": "MEOT",
  "reporter": "AWSHUB",
  "eventTimestamp": "20141011T152235.023471",
  "sequenceNumber": 1199,
  "symbol": "GYMB",
  "tradeID": "32480427",
  "quantity": 100,
  "price": 38.155,
  "buyDetails": {
    "side": "Buy",
    "leavesQty": 0,
    "orderID": "B17238605x1c75s1",
    "capacity": "Agency",
    "clearingNumber": "0002",
    "liquidityCode": "A"
  },
  "nbbPrice": 38.16,
  "nbbQty": 200,
  "nboPrice": 38.15,
  "nboQty": 500,
  "nbboSource": "SIP",
  "nbboTimestamp": "20141011T152235.023317"
}
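The FIX-to-JSON mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual transformer. It assumes a `|` stand-in for the SOH delimiter, uses a shortened sample built from a few of the slide's fields, and the function names `parse_fix` and `to_cat_event` are hypothetical. Tag meanings follow the public FIX dictionary (49 SenderCompID, 55 Symbol, 17 ExecID, 32 LastShares, 31 LastPx, 54 Side, 11 ClOrdID).

```python
import json

SOH = "\x01"  # standard FIX field delimiter (non-printing)

# FIX Side (tag 54) codes used in this sketch.
SIDE = {"1": "Buy", "2": "Sell"}


def parse_fix(raw: str, delimiter: str = SOH) -> dict:
    """Split a tag=value FIX message into a {tag_number: value} dict."""
    fields = {}
    for pair in raw.strip(delimiter).split(delimiter):
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields


def to_cat_event(fields: dict) -> dict:
    """Map a parsed execution report onto a CAT-style record.

    The output layout mirrors a subset of the slide's JSON; it is an
    assumption for illustration, not the CAT specification.
    """
    return {
        "reporter": fields[49],        # SenderCompID
        "symbol": fields[55],          # Symbol
        "tradeID": fields[17],         # ExecID
        "quantity": int(fields[32]),   # LastShares
        "price": float(fields[31]),    # LastPx
        "side": SIDE[fields[54]],      # Side
        "orderID": fields[11],         # ClOrdID
    }


# Shortened sample using values from the slide, with "|" in place of SOH.
SAMPLE = ("8=FIX.4.0|35=8|49=AWSHUB|11=B17238605x1c75s1|17=32480427|"
          "31=38.15500|32=100|54=1|55=GYMB|")

event = to_cat_event(parse_fix(SAMPLE, delimiter="|"))
print(json.dumps(event, indent=2))
```

Keeping the parser separate from the mapper means the same `parse_fix` output could feed other report formats without re-reading the raw message.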

Consolidated Audit Trail Reporting Architecture

CAT Reporting Pipeline on AWS
• FIX Ingestion: FIX messages land in a single source of truth
• Transform FIX to Parquet: transform and optimize into an optimized data repository
• CAT Reporting: transaction linking and transformation produce the regulatory report
• Trade Analytics: business intelligence and ad-hoc data analysis
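The transaction-linking stage can be pictured as grouping events that belong to the same order and putting each group in time order. The sketch below is only an illustration of that idea on in-memory records; the flat event layout and the function name `sequence_events` are assumptions, and a real job would run this logic at scale on the EMR clusters described in the architecture.

```python
from collections import defaultdict


def sequence_events(events):
    """Group events by order ID, then order each group.

    Events are sorted by timestamp first and by sequence number second.
    Timestamps in the slide's fixed-width format (YYYYMMDDTHHMMSS.ffffff)
    sort correctly as plain strings.
    """
    by_order = defaultdict(list)
    for event in events:
        by_order[event["orderID"]].append(event)
    for linked in by_order.values():
        linked.sort(key=lambda e: (e["eventTimestamp"], e["sequenceNumber"]))
    return dict(by_order)


# Hypothetical events; only the first order ID comes from the slides.
events = [
    {"orderID": "B17238605x1c75s1", "eventTimestamp": "20141011T152235.023471", "sequenceNumber": 1199},
    {"orderID": "X1", "eventTimestamp": "20141011T152230.000000", "sequenceNumber": 7},
    {"orderID": "B17238605x1c75s1", "eventTimestamp": "20141011T152231.500000", "sequenceNumber": 1101},
]

linked = sequence_events(events)
print([e["sequenceNumber"] for e in linked["B17238605x1c75s1"]])
```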

CAT Reporting Architecture on AWS
(Architecture diagram; components shown within an AWS Region:)
• Internal App (on premises), with an optional on-premises HSM and BYO key
• AWS KMS
• Multipart upload of encrypted data
• Amazon S3 data lake
• Amazon Glacier (WORM storage)
• Transient Amazon EMR clusters for ETL, producing cleansed, formatted, split, compressed output
• Amazon S3 data warehouse
• Transient Amazon EMR clusters for event sequencing, producing CAT output
• herd metadata store
• CloudWatch alarm and AWS CloudTrail

Core Services Being Used
• Amazon S3
• Amazon Glacier
• Amazon CloudWatch
• AWS CloudTrail
• AWS KMS
• Amazon EMR
• Amazon Athena
• Amazon QuickSight
• AWS Direct Connect

Security

Security Framework
• Network isolation (AWS Direct Connect, VPC, VPN)
• Encryption (data in transit, data at rest)
• Auditing (CloudTrail, CloudWatch)

Lineage

Lineage Framework – herd
herd is a FINRA-built, open-source framework that tracks and catalogs data in a unified data repository in order to capture audit and data lineage information.
• Unified data catalog: a centralized, auditable catalog for operational usage and data governance
• Track lineage: capture data ancestry for regulatory, forensic, and analytical purposes

Integrating herd
(Diagram: ETL Import/Export and ETL Transformation connected to the herd Metadata Store)
• All ETL applications update the herd store with input, output, and ETL application version
• herd usage validated by CloudTrail logs
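The three items each ETL application registers (input, output, and application version) can be pictured as one small record per job run. The sketch below is a hypothetical shape for such a record, written only to make the idea concrete; it does not use or reproduce herd's actual API or schema, and the class name, field names, and S3 paths are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical lineage entry an ETL job would register after a run."""
    job_name: str
    job_version: str      # ETL application version
    inputs: List[str]     # objects the job read
    outputs: List[str]    # objects the job wrote


def to_payload(record: LineageRecord) -> str:
    """Serialize the record deterministically so it can be logged or posted."""
    return json.dumps(asdict(record), sort_keys=True)


# Example values are illustrative only.
record = LineageRecord(
    job_name="fix-to-parquet",
    job_version="1.4.2",
    inputs=["s3://example-data-lake/fix/2014/10/11/eod.fix"],
    outputs=["s3://example-warehouse/parquet/year=2014/month=10/day=11/"],
)
print(to_payload(record))
```

Because each output names the inputs and the code version that produced it, walking these records backward from a report recovers its ancestry, which is the property the lineage slides call for.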

FIX Ingestion

FIX Ingestion – End of Day
(Diagram: Internal App, AWS Direct Connect, multipart upload of encrypted data, S3 data lake, Amazon Glacier (WORM storage))
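For an end-of-day file, the multipart upload has to respect Amazon S3's limits: each part except the last must be at least 5 MiB, and an upload can have at most 10,000 parts. The sketch below plans a part size within those limits and builds the extra arguments for a KMS-encrypted upload. It assumes server-side encryption with AWS KMS; the slides also show an on-premises HSM with a BYO key, so the real pipeline may encrypt differently. The function names and the 64 MiB default are hypothetical.

```python
import math

MIN_PART_SIZE = 5 * 1024 ** 2   # S3 minimum part size (5 MiB), except the last part
MAX_PARTS = 10_000              # S3 maximum number of parts per multipart upload


def plan_parts(file_size: int, preferred_part_size: int = 64 * 1024 ** 2):
    """Return (part_size, part_count) that fits within S3 multipart limits."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    part_size = max(preferred_part_size, MIN_PART_SIZE)
    # Grow the part size if the file would otherwise need more than 10,000 parts.
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    return part_size, math.ceil(file_size / part_size)


def encryption_args(kms_key_id: str) -> dict:
    """Extra arguments requesting SSE-KMS (assumed encryption mode)."""
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}


part_size, part_count = plan_parts(1024 ** 3)  # a 1 GiB end-of-day file
print(part_size, part_count)
```

With boto3, values like these would typically be passed to `upload_file` through its `ExtraArgs` parameter and a `TransferConfig(multipart_chunksize=part_size)`.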

Message Transformation and Optimization

Message Optimization
(Diagram: FIX data in the S3 data lake, read through EMRFS by transient EMR clusters for ETL (core nodes and task nodes), written through EMRFS as Parquet to the S3 data repository)

Message Optimization
(Chart: average scan rate in kHz by storage format (kudu, parquet, hbase, avro, mapfile) and compression setting (no compression, Snappy, Gzip/BZip2))

Reporting and Analytics

Reporting
(Diagram: Parquet data in the S3 data lake, EMRFS, transient EMR clusters, S3 data warehouse, Athena, JSON output to the CAT Consolidator)

Athena Creating Tables – Parquet
-- clearingNumber is declared only as a partition column;
-- a column cannot appear in both the column list and PARTITIONED BY.
CREATE EXTERNAL TABLE db_name.transactions (
  reporter STRING,
  event_timestamp TIMESTAMP,
  symbol STRING,
  tradeID STRING,
  quantity INT,
  price DOUBLE,
  side INT,
  liquidity INT
)
PARTITIONED BY (year INT, month INT, day INT, clearingNumber STRING)
STORED AS PARQUET
LOCATION 's3://fsi-sandbox/catarch/parquet'
TBLPROPERTIES ('has_encrypted_data'='true');
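A partitioned table only returns rows for partitions that have been registered, so each day's output needs a matching partition. The sketch below derives the partition values from a timestamp in the format used earlier in the deck, builds a Hive-style key=value prefix under the table location, and composes the corresponding `ALTER TABLE ... ADD PARTITION` statement. The Hive-style directory layout and the helper names are assumptions; the slides do not show how the Parquet output is actually laid out.

```python
from datetime import datetime


def partition_values(event_timestamp: str, clearing_number: str) -> dict:
    """Derive partition values from a timestamp such as 20141011T152235.023471."""
    ts = datetime.strptime(event_timestamp, "%Y%m%dT%H%M%S.%f")
    return {"year": ts.year, "month": ts.month, "day": ts.day,
            "clearingnumber": clearing_number}


def partition_prefix(base: str, parts: dict) -> str:
    """Build a Hive-style prefix, e.g. .../year=2014/month=10/day=11/..."""
    return base.rstrip("/") + "/" + "/".join(f"{k}={v}" for k, v in parts.items()) + "/"


def add_partition_sql(table: str, base: str, parts: dict) -> str:
    """Compose the DDL that registers one partition (strings are quoted)."""
    spec = ", ".join(
        f"{k}='{v}'" if isinstance(v, str) else f"{k}={v}"
        for k, v in parts.items()
    )
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({spec}) "
            f"LOCATION '{partition_prefix(base, parts)}'")


parts = partition_values("20141011T152235.023471", "0002")
print(add_partition_sql("db_name.transactions",
                        "s3://fsi-sandbox/catarch/parquet", parts))
```

Partitioning by date and clearing number lets a query that filters on those columns skip unrelated prefixes, which reduces the amount of data scanned.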

Analytics
(Diagram: Parquet data in the S3 data lake, EMRFS, Athena, Amazon QuickSight, analytics users)

Ad-hoc Data Analysis: A Typical Situation
• Request: provide all the trades in ABC Corp in the last five years
• Data: 9 TB covering 2012 through 2016
• Options?

What Are Your Options?
• Option 1: Keep it online all the time
• Option 2: Archive the data, and upon request, stand up the database server, restore the data, and then query the data
• Option 3: Query data at rest in the Amazon S3 data lake with ad-hoc queries using Amazon Athena or Amazon Redshift Spectrum ($45 for 9 TB scanned)
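The $45 figure is consistent with a scan price of $5 per terabyte (45 / 9 = 5). The arithmetic can be written out as below; the rate is inferred from the slide's own numbers rather than from a current price list, and the `fraction_read` parameter is a hypothetical knob showing how reading only some columns or partitions would lower the bill.

```python
import math


def athena_scan_cost(tb_stored: float, usd_per_tb: float = 5.0,
                     fraction_read: float = 1.0) -> float:
    """Estimate query cost when billing is based on data scanned.

    fraction_read models pruning: 1.0 means the whole dataset is scanned.
    """
    if not 0.0 <= fraction_read <= 1.0:
        raise ValueError("fraction_read must be between 0 and 1")
    return tb_stored * fraction_read * usd_per_tb


full_scan = athena_scan_cost(9)                     # all 9 TB scanned
pruned = athena_scan_cost(9, fraction_read=0.1)     # hypothetical 10% scan
print(full_scan, pruned)
```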

Amazon QuickSight: Import Dataset

Amazon QuickSight: One-click Visualization

Analytics
(Diagram: Parquet-formatted S3 data warehouse)

Recap
• Identified the challenges in architecting a data lake that meets the unique
requirements for regulatory reporting: security, lineage, scale, and elasticity
• Reviewed an architecture for ingestion, processing, and transformation of a
FIX dataset into a data repository that can be used for a variety of reporting
and analytics applications
• Demonstrated a reference implementation of CAT reporting using AWS
services integrated with herd
