For many securities organizations, post-trade processing is expensive, cumbersome, and time-consuming. This is partly due to the massive volumes of data required to process a trade and the limited agility of the technology many organizations rely on today. To create efficiencies and move faster, many financial services organizations are working with AWS to implement post-trade solutions built with AWS storage services (Amazon S3 and Amazon Glacier) and big data capabilities (Amazon Athena, Amazon EMR, Amazon Redshift, and Amazon QuickSight). In this session, AWS will walk through a trade capture and regulatory reporting solution that uses these services.
Speaker: Osemeke Isibor, Solutions Architect, AWS
An Architecture for Trade Capture and Regulatory Reporting
1. An Architecture for Trade Capture and Regulatory Reporting
Osemeke Isibor
Solutions Architect, AWS
iosemeke@amazon.com
2. What to Expect From This Session
• Identifying the challenges in architecting a data lake that meets the unique requirements for regulatory reporting
• A pattern for ingestion, processing, and transformation of semi-structured data in a secure and auditable data repository that can be used for a variety of reporting and analytics applications
• An implementation of consolidated audit trail (CAT) reporting using AWS services integrated with herd, an open-source unified data catalog framework
3. Agenda
• Review of Regulatory Reporting Challenges
• Consolidated Audit Trail
• Architecture for Consolidated Audit Trail Reporting
• Security and Lineage Framework
• FIX Message Ingestion
• Message Transformation and Optimization
• Reporting and Analytics Tools
• Recap
4. Today’s Regulatory Reporting Landscape
• Financial institutions face challenges capturing, cleaning, organizing, and reporting data for an array of regulators and regulatory frameworks, along with new expectations of fine-grained, n-dimensional reporting with data lineage and governance controls.
• Regulators and frameworks: EMA, PRA, Treasury, FDIC, FFIEC, Basel, Dodd-Frank, NMS, MiFID II, BCBS 239, CCAR, ESMA, RDA, FR Y-9C
5. Current Architecture Challenges
• Legacy System Fragmentation
  • Data stored in multiple disconnected data silos
  • Silos don’t provide lineage back to source data
  • Distributed ETL processes at multiple levels; inconsistent transformation between silos
• Static Infrastructure vs. Dynamic Data
  • Slow to onboard new data sources
  • Slow to adapt to data format changes
  • Slow to build new types of reports
  • Slow to share data across teams and with regulators
6. Regulatory Reporting Challenges
• Diversity of sources and formats
• Massive data volumes
• Stringent SLAs (and fines)
• Security
• Single record of truth with lineage and recreatability
7. A More Strategic Approach to Reporting
Financial institutions are viewing their reporting obligations as a catalyst to pursue broader data management objectives that can help unlock the value of their data.
• Business benefits
• Enhanced data governance
• Improved efficiency
9. Real-Life Example: SEC Rule 613
• “… plan to create, implement, and maintain a consolidated order tracking system, or consolidated audit trail, with respect to the trading of reportable securities … ”
13. CAT Reporting Pipeline on AWS
• FIX Ingestion: FIX messages → single source of truth
• Transform FIX to Parquet: transform and optimize → optimized data repository
• CAT Reporting: transaction linking and transformation → regulatory report
• Trade Analytics: business intelligence and ad-hoc data analysis
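The transaction linking stage has to stitch individual FIX messages back into order lifecycles before events can be sequenced for CAT. A minimal sketch of that idea follows, assuming lifecycle events are linked through the standard FIX ClOrdID (tag 11) and OrigClOrdID (tag 41) fields. `OrderEvent` and `link_order_lifecycles` are hypothetical names, and real CAT linkage rules are considerably richer than this.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class OrderEvent:
    """One order lifecycle event taken from a FIX message (hypothetical shape)."""
    cl_ord_id: str                 # FIX tag 11, ClOrdID
    orig_cl_ord_id: Optional[str]  # FIX tag 41, OrigClOrdID (set on cancel/replace)
    event_type: str                # e.g. "NEW", "REPLACE", "CANCEL"
    transact_time: str             # FIX tag 60, UTCTimestamp such as "20160104-14:30:00.000"


def link_order_lifecycles(events: List[OrderEvent]) -> Dict[str, List[OrderEvent]]:
    """Group events by the ClOrdID of the order that started each lifecycle.

    Following OrigClOrdID pointers back until an event has none (or points at
    an ID not in this batch) yields the root order. Each chain is then sorted
    by TransactTime, which sorts lexicographically when all timestamps share
    the same precision.
    """
    parent = {e.cl_ord_id: e.orig_cl_ord_id for e in events}

    def root(cl_ord_id: str) -> str:
        seen = set()  # guards against malformed cyclic references
        while parent.get(cl_ord_id) and cl_ord_id not in seen:
            seen.add(cl_ord_id)
            cl_ord_id = parent[cl_ord_id]
        return cl_ord_id

    chains: Dict[str, List[OrderEvent]] = defaultdict(list)
    for event in events:
        chains[root(event.cl_ord_id)].append(event)
    return {key: sorted(chain, key=lambda e: e.transact_time) for key, chain in chains.items()}
```

Keying each chain by its root ClOrdID keeps an order's full history together, which is the shape an event-sequencing step needs; at CAT volumes the same grouping would run as a distributed job rather than in a single in-memory dictionary.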
14. CAT Reporting Architecture on AWS
[Architecture diagram. On premises: internal app and an optional on-premises HSM. In the AWS Region: multipart upload of encrypted data to the Amazon S3 data lake; Amazon Glacier (WORM storage); transient Amazon EMR clusters for ETL, writing cleansed, formatted, split, and compressed output to the Amazon S3 data warehouse; transient Amazon EMR clusters for event sequencing, producing the CAT output; herd metadata store; AWS KMS with a BYO key; CloudWatch alarm; AWS CloudTrail.]
15. Core Services Being Used
• Amazon S3
• Amazon Glacier
• Amazon CloudWatch
• AWS CloudTrail
• AWS KMS
• Amazon EMR
• Amazon Athena
• Amazon QuickSight
• AWS Direct Connect
16. Security
[Architecture diagram from slide 14]
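The security design centers on encrypted uploads and keys held in AWS KMS. One common way to enforce that at the storage layer is an S3 bucket policy that denies non-TLS requests and any upload that does not request SSE-KMS. The sketch below builds such a policy as a Python dict; the bucket name is a placeholder, and the session does not state that this exact policy was used.

```python
import json


def encrypted_upload_policy(bucket: str) -> dict:
    """S3 bucket policy denying non-TLS access and uploads without SSE-KMS.

    `bucket` is a placeholder name. The two encryption statements follow the
    widely used deny-unencrypted-uploads pattern: one rejects a wrong
    encryption header, the other rejects a missing header.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject any request that does not use TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Reject uploads that request an encryption type other than SSE-KMS.
                "Sid": "DenyIncorrectEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
            },
            {
                # Reject uploads that omit the encryption header entirely.
                "Sid": "DenyMissingEncryptionHeader",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(encrypted_upload_policy("example-cat-data-lake"), indent=2))
```

Multipart uploads fall under the same `s3:PutObject` permission when the upload is initiated, which is where the encryption header is supplied, so the policy also applies to the multipart path on the architecture diagram.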
18. Lineage
[Architecture diagram from slide 14]
19. Lineage Framework – herd
• Unified data catalog: a centralized, auditable catalog for operational usage and data governance
• Track lineage: capture data ancestry for regulatory, forensic, and analytical purposes
herd is a FINRA-built, open-source framework that tracks and catalogs data in a unified data repository in order to capture audit and data lineage information.
20. Integrating herd
[Diagram: ETL Import/Export and ETL Transformation, connected to the herd Metadata Store]
• All ETL applications update the herd store with input, output, and ETL application version
• herd usage validated by CloudTrail logs
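herd's role here is to record, for every ETL run, which datasets went in, which came out, and which application version did the work, so that any report can be traced back to its sources. The following is a minimal in-memory sketch of that record-and-trace idea. `LineageRecord` and `LineageCatalog` are hypothetical; they are not herd's actual API, which exposes its own REST interface and data model.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LineageRecord:
    """What one ETL run reports to the catalog (hypothetical schema, not herd's API)."""
    output: str               # dataset produced, e.g. an S3 prefix
    inputs: Tuple[str, ...]   # datasets consumed
    app: str                  # ETL application name
    app_version: str          # ETL application version


class LineageCatalog:
    """In-memory stand-in for a metadata store such as herd (illustration only)."""

    def __init__(self) -> None:
        self._by_output: Dict[str, LineageRecord] = {}

    def register(self, record: LineageRecord) -> None:
        # Each dataset is written once; a second producer would make lineage ambiguous.
        if record.output in self._by_output:
            raise ValueError(f"{record.output} is already registered")
        self._by_output[record.output] = record

    def ancestry(self, dataset: str) -> List[LineageRecord]:
        """All upstream ETL runs for `dataset`, nearest first (breadth-first search)."""
        result: List[LineageRecord] = []
        queue, seen = deque([dataset]), {dataset}
        while queue:
            record = self._by_output.get(queue.popleft())
            if record is None:
                continue  # raw source data: no registered run produced it
            result.append(record)
            for parent in record.inputs:
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return result
```

Because every run records its application version alongside inputs and outputs, a regulator query about a CAT submission can be answered by walking `ancestry` back to the raw FIX files and naming the exact code versions that touched them.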
21. FIX Ingestion
[Architecture diagram from slide 14]
22. FIX Ingestion – End of Day
[Diagram: internal app → AWS Direct Connect → multipart upload of encrypted data → S3 data lake → Amazon Glacier (WORM storage)]
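End-of-day FIX files are pushed as multipart uploads, which lets a large file be sent as parallel, independently retryable parts. The planning arithmetic can be sketched as follows, using S3's documented multipart limits (5 MiB minimum part size except for the last part, 5 GiB maximum part size, 10,000 parts, 5 TiB maximum object size). The 64 MiB default part size and the function names are assumptions, and the actual upload calls (for example boto3's `create_multipart_upload`, `upload_part`, and `complete_multipart_upload`) are omitted.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4
MIN_PART_SIZE = 5 * MIB      # every part except the last must be at least this large
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000
MAX_OBJECT_SIZE = 5 * TIB


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on very large sizes."""
    return -(-a // b)


def plan_parts(object_size: int, preferred_part_size: int = 64 * MIB):
    """Return (part_size, part_count) for a multipart upload of object_size bytes.

    Uses preferred_part_size unless the object would then need more than
    MAX_PARTS parts, in which case the part size grows just enough to fit.
    """
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size outside S3 multipart limits")
    part_size = max(preferred_part_size, MIN_PART_SIZE, _ceil_div(object_size, MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("part size exceeds the S3 maximum")
    return part_size, _ceil_div(object_size, part_size)


def byte_ranges(object_size: int, part_size: int):
    """Yield (part_number, start, end_exclusive); S3 part numbers start at 1."""
    for number, start in enumerate(range(0, object_size, part_size), start=1):
        yield number, start, min(start + part_size, object_size)
```

Each yielded range maps to one part upload, so a failed part can be retried on its own without resending the whole end-of-day file.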
23. Message Transformation and Optimization
[Architecture diagram from slide 14]
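Before FIX messages are converted to Parquet, each one has to be parsed and checked. The sketch below parses a SOH-delimited FIX message, validates the CheckSum field (tag 10: the byte sum modulo 256, rendered as three digits), and derives a Hive-style partition prefix from TransactTime (tag 60) so that date filters can skip irrelevant data. The `trade_date` partition name is an assumption, BodyLength (tag 9) and repeating groups are not handled, and writing the Parquet output itself is omitted.

```python
SOH = "\x01"  # FIX field delimiter


def fix_checksum(data: bytes) -> str:
    """FIX CheckSum (tag 10): sum of bytes modulo 256, zero-padded to three digits."""
    return f"{sum(data) % 256:03d}"


def parse_fix(raw: str) -> dict:
    """Parse a SOH-delimited FIX message into {tag: value} after validating tag 10.

    Repeating groups are not handled: if a tag repeats, the last value wins.
    """
    body, sep, trailer = raw.rpartition(SOH + "10=")
    if not sep:
        raise ValueError("missing CheckSum (tag 10)")
    expected = trailer.rstrip(SOH)
    # The checksum covers every byte up to and including the SOH before "10=".
    actual = fix_checksum((body + SOH).encode("ascii"))
    if actual != expected:
        raise ValueError(f"checksum mismatch: computed {actual}, message says {expected}")
    fields = {}
    for pair in raw.rstrip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[int(tag)] = value
    return fields


def partition_prefix(fields: dict) -> str:
    """Hive-style S3 prefix from TransactTime (tag 60), e.g. 'trade_date=2016-01-04/'."""
    ts = fields[60]  # UTCTimestamp, e.g. "20160104-14:30:00.000"
    return f"trade_date={ts[0:4]}-{ts[4:6]}-{ts[6:8]}/"
```

Rejecting corrupted messages at this step keeps them out of the optimized repository, and date-based prefixes are what later let a query over a few years read only those partitions.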
26. Reporting and Analytics
[Architecture diagram from slide 14]
30. Ad-hoc Data Analysis: A Typical Situation
• Request: “Provide all the trades in ABC Corp in the last five years”
• Data involved: 9 TB across 2012–2016
• Options?
31. What Are Your Options?
• Option 1: Keep it online all the time
• Option 2: Archive the data, and upon request, stand up the database server, restore the data, and then query the data
• Option 3: Query data at rest in the Amazon S3 data lake using Amazon Athena or Amazon Redshift Spectrum for ad-hoc queries ($45 for 9 TB scanned with Athena)
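Athena bills by the amount of data scanned, so the slide's figure works out to $5 per TB (9 TB × $5 = $45). The sketch below makes that arithmetic explicit and shows why the Parquet conversion matters: reading only the needed columns of compressed data shrinks the bytes scanned. The column fraction and compression ratio in the example are illustrative assumptions, not measurements from the session.

```python
USD_PER_TB_SCANNED = 5.0  # rate implied by the slide: $45 for 9 TB scanned


def athena_cost(tb_scanned: float, usd_per_tb: float = USD_PER_TB_SCANNED) -> float:
    """Estimated query cost for a scan-priced engine such as Amazon Athena."""
    if tb_scanned < 0:
        raise ValueError("tb_scanned must be non-negative")
    return tb_scanned * usd_per_tb


def tb_scanned_columnar(raw_tb: float, column_fraction: float, compression_ratio: float) -> float:
    """Approximate TB scanned after converting to a compressed columnar format.

    column_fraction: share of the raw bytes belonging to the columns the query reads.
    compression_ratio: raw size divided by compressed size (assumed, data-dependent).
    """
    return raw_tb * column_fraction / compression_ratio


if __name__ == "__main__":
    print(f"Raw scan of 9 TB: ${athena_cost(9):.2f}")
    optimized = tb_scanned_columnar(9, column_fraction=0.25, compression_ratio=3)
    print(f"Columnar, compressed scan of {optimized:.2f} TB: ${athena_cost(optimized):.2f}")
```

With those assumed ratios the same request would scan 0.75 TB instead of 9 TB; partition pruning on trade date can reduce it further when a query covers only part of the retained history.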
35. Recap
• Identified the challenges in architecting a data lake that meets the unique requirements for regulatory reporting: security, lineage, scale, and elasticity
• Reviewed an architecture for ingestion, processing, and transformation of FIX data into a data repository that can be used for a variety of reporting and analytics applications
• Demonstrated a reference implementation of CAT reporting using AWS services integrated with herd