© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:Invent
Capital Markets Discovery: How FINRA Runs
Trade Analytics and Surveillance on AWS
Robert Kissell
Sr. Solutions Architect
WWPS Federal Financials
AWS
John Hitchingham
Sr. Director Engineering
FINRA
November 27, 2017
FSV307
Four pillars of the data lake
Scale
• Store and analyze all
data centrally
• Ingest data quickly
without predefined
schemas
• Separate storage and
compute, scaling each
component as needed
Cost
• Pay only for what you
need
• Use only the services you
need
• Utilize diverse services/
features to optimize cost
Security
• Encryption at each step
• Explicit control of egress
and ingress points
• Compliance and
Governance of Data
access using AWS native
services/features
Agility
• Big data does not mean
just batch processing
• Mix and match on-
premises and cloud
• Custom development and
managed services
Data lake
Central Storage
Secure, cost-effective
storage in
Amazon S3
Data Ingestion
Get your data into S3 quickly and securely
Kinesis Firehose, Direct Connect,
AWS Snowball, Database Migration Service
Processing & Analytics
Use of predictive and prescriptive analytics
to gain better understanding
DynamoDB
Elasticsearch Service
Athena, Amazon QuickSight, Amazon EMR,
Amazon Redshift
Protect & Secure
Use entitlements to ensure data is
secure and users’ identities are
verified
Catalog & Search
Access and search metadata
Access & User Interface
Give your users easy and secure access
FINRA’s Data Lake
Surveilling markets with FINRA’s multi-petabyte enterprise-grade data
lake
Market regulation—analytics pipeline
Validation
Prepare for
Analytics
(ETL)
Run Automated
Detection
Models
Interactive
Analytics
Regulatory
Analyst
Explore
Investigate
Regulatory
Follow up
BDs Exchanges Reference
Data Providers
Trade execution records
Market reference data
Data
Scientist
Develop
Models
75B+ events
20+ PB of Data
3 Yrs Prod on Cloud
Major Exchange Clients
Cloud journey—data puddles to data lake
Database1
Storage
Query/Compute
Catalog
Database2
Storage
Query/Compute
Catalog
Databasen
Storage
Query/Compute
Catalog
Storage
Query/
Compute
Catalog
EMR
Lambda
EMR Presto
EMR HBase
FINRA
herd
Hive
metastore
Silo
Amazon
S3
Scales
http://finraos.github.io/herd
Unified catalog
• Schemas
• Versions
• Encryption type
• Storage policies
Lineage and Usage
• Track publishers and consumers
• Easily identify jobs and derived data sets
Shared Metastore
• Common definition of tables and partitions
• Use with Spark, Presto, Hive, etc.
• Faster instantiation of clusters
Herd catalog—for centralized data management
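The catalog attributes listed above (schemas, versions, encryption type, storage policies) can be pictured as one record per registered data version. The following is a minimal sketch under stated assumptions: `CatalogEntry`, `Catalog`, the field names, and the `example-bucket` location are hypothetical illustrations, not the herd data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One registered version of a data set partition (hypothetical model)."""
    dataset: str           # e.g. "Trades"
    partition: str         # e.g. "2017-03-01"
    version: int           # data version, starting at 1
    schema: tuple          # ordered (column, type) pairs
    location: str          # storage location; the bucket name is made up
    encryption_type: str   # e.g. "SSE-KMS"
    storage_policy: str    # e.g. "standard" or "archive"


class Catalog:
    """In-memory stand-in for a unified catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        key = (entry.dataset, entry.partition, entry.version)
        if key in self._entries:
            raise ValueError(f"already registered: {key}")
        self._entries[key] = entry

    def latest(self, dataset, partition):
        """Return the highest registered version of a partition."""
        versions = [e for (d, p, _), e in self._entries.items()
                    if d == dataset and p == partition]
        if not versions:
            raise KeyError((dataset, partition))
        return max(versions, key=lambda e: e.version)


catalog = Catalog()
schema = (("symbol", "string"), ("quantity", "bigint"))
for v in (1, 2):
    catalog.register(CatalogEntry(
        "Trades", "2017-03-01", v, schema,
        f"s3://example-bucket/trades/2017-03-01/v{v}/", "SSE-KMS", "standard"))

print(catalog.latest("Trades", "2017-03-01").location)
```

Keeping schema and location in the catalog rather than in each job is what lets any engine resolve the same table definition without its own copy of the metadata.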
Trades Surveillance
2017-03-01 v1
2017-03-02 v1
2017-03-01 v1
2017-03-02 v1
Regulatory
conclusion
Lineage
1
Trades Surveillance
2017-03-01 v1
2017-03-02 v1
2017-03-01 v1
2017-03-02 v1
Regulatory
conclusion
2 2017-03-01 v2
v2 Data Version
?
?
Example—lineage and data versioning
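The question marks in this example ask which derived data is affected when a new version of Trades 2017-03-01 arrives. A rough sketch of how recorded lineage could answer that is shown here. The graph contents are assumptions for illustration: in particular, which surveillance output feeds the regulatory conclusion is not stated on the slide, and `downstream_of` is a hypothetical helper.

```python
from collections import deque

# child -> parents; each node is (data set, partition, version).
# The parent of the regulatory conclusion is an assumption for this sketch.
LINEAGE = {
    ("Surveillance", "2017-03-01", 1): [("Trades", "2017-03-01", 1)],
    ("Surveillance", "2017-03-02", 1): [("Trades", "2017-03-02", 1)],
    ("Regulatory conclusion", "case-1", 1): [("Surveillance", "2017-03-01", 1)],
}


def downstream_of(lineage, superseded):
    """Return every node derived, directly or transitively, from `superseded`."""
    children = {}
    for child, parents in lineage.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    affected, queue = set(), deque([superseded])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


# Trades 2017-03-01 v2 supersedes v1: what was built on v1?
affected = downstream_of(LINEAGE, ("Trades", "2017-03-01", 1))
for node in sorted(affected):
    print("needs review:", node)
```

Because each node carries its data version, outputs built on Trades 2017-03-02 v1 are left alone while everything that consumed 2017-03-01 v1 is flagged.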
Files
Ingest
Define
Record
Legal Hold?
No IAM
role with
delete on bucket
Review/Approve
Process
Tag files
For delete
DM Managed
Amazon
S3 Bucket
Trade Reports
OATS Orders
Model Outputs
Delete
Delete files call
Herd—foundation for records management
Files
Herd
DM
Metadata
All deletes
via policy
based on tags
Register
Object
Store
file(s)
Set Record Flag
Set Record Period
Set Record Owner
Set / Clear
Legal Hold
Gen list of
Records eligible
for deletion
File life on Amazon S3
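The record flag, record period, and legal hold settings above suggest a simple eligibility rule for the "records eligible for deletion" list. The sketch below is one possible reading, not FINRA's implementation: `StoredFile`, its fields, and the rule that only flagged records with an expired period and no legal hold are listed are all assumptions. Review, approval, and tagging for delete would happen after this step.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class StoredFile:
    """Metadata kept for one registered file (hypothetical fields)."""
    key: str
    registered_on: date
    is_record: bool = False
    record_period_days: Optional[int] = None
    record_owner: Optional[str] = None
    legal_hold: bool = False


def eligible_for_deletion(files, today):
    """List keys of records whose period has expired and that are not on hold."""
    eligible = []
    for f in files:
        if not f.is_record or f.record_period_days is None:
            continue  # not a managed record, or no retention period defined yet
        if f.legal_hold:
            continue  # a legal hold always blocks deletion
        if today >= f.registered_on + timedelta(days=f.record_period_days):
            eligible.append(f.key)
    return eligible


files = [
    StoredFile("trade-reports/a", date(2010, 1, 1), True, 2190, "market-reg"),
    StoredFile("trade-reports/b", date(2010, 1, 1), True, 2190, "market-reg", legal_hold=True),
    StoredFile("oats-orders/c", date(2017, 1, 1), True, 2190, "market-reg"),
]
candidates = eligible_for_deletion(files, today=date(2017, 11, 27))
print(candidates)
```

The function only produces a candidate list; it never deletes anything, which matches the slide's point that deletes go through a tag-based policy rather than through a role with delete rights on the bucket.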
Universal data catalog—explore data
Analysts Data Scientists Developers
Built on
Catalog &
Storage
ETL
Normalize, Enrich, Reformat
Human
Analytics
Validation
Ingest
Broker Dealers
Exchanges
Third-Party
Providers
Data
Files
Analyst
Data Scientist
Regulatory User
Detection models (Patterns)
Automated Surveillance
P
P
P
A
A
P Processing Pipeline
A Analytics
Analytic data processing pipeline
on the data lake
ETL framework
ETL execution
Input Data Input Data Input Data Input Data Input Data
Job1 Job2 Job3
Job4 Job5 Job6 JobN
…
Output Data Output Data Output Data Output Data Output Data
Amazon
S3
Amazon
S3
Amazon
EMR
Orchestration
Data Location
Registration
Per Second Billing
Spot
Hive (Deprecated)
Spark
Dynamic processing
[Chart: Daily Order Volume (Billions), 11/1 through 11/29; y-axis 0.0 to 5.0]
[Chart: Compute Nodes by Hour of Day, 2016-10-17 through 2016-10-24; y-axis 0 to 12,000]
Amazon EMR compute on Amazon EC2
EMR
20k – 25k EC2 nodes per day
93% of EC2 is on EMR
Avg EC2 node: 3 cores
Avg EC2 uptime: 3 hours
96% of EC2 nodes live < 24 hrs
Over 50k nodes on peak day
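As a rough worked example of what these figures imply, the daily node count and the average uptime can be multiplied to estimate node-hours per day. This assumes the 3-hour average applies to the same 20k to 25k nodes counted per day; the comparison against nodes left running for 24 hours is purely illustrative and is not a figure from the talk.

```python
# Figures quoted on the slide.
nodes_per_day = (20_000, 25_000)   # low and high end of the daily range
avg_uptime_hours = 3

# Estimated node-hours actually consumed per day.
node_hours = tuple(n * avg_uptime_hours for n in nodes_per_day)

# Hypothetical comparison: the same number of nodes kept up for a full day.
always_on_hours = tuple(n * 24 for n in nodes_per_day)
fraction_used = avg_uptime_hours / 24

print(node_hours, always_on_hours, fraction_used)
```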
Interactive analytics—fundamentals
Data
Analyst
Data
Scientist
JDBC/ODBC
Client
JDBC/ODBC
Client
Table 1
Table 2
AuthN
AuthZ
Metastore
Table N
Logical “Database” = 4+ PB
Amazon EMR
Achieving interactive query
Q1: select count(*) from TABLE_1 where trade_date = cast('2016-08-09' as date)
Q2: select col1, count(*) from TABLE_1 where col2 = cast('2016-08-09' as date) group by col1 order by col1
Q3: select col1, count(*) from TABLE_1 where col2 = cast('2016-08-09' as date) group by col1 order by col1
Q4: select * from TABLE_1 where col2 = cast('2016-08-10' as date) and col3='I' and col4='CR' and col5 between 100000.0 and 103000.0

Query | Table size (rows) | Output size (rows) | ORC | TXT/BZ2
Q1 | 2469171608 | 1 | 4s | 1m56s
Q2 | 2469171608 | 12 | 3s | 1m51s
Q3 | 2469171608 | 8364 | 5s | 2m5s
Q4 | 2469171608 | 760 | 10s | 2m3s
Test Config:
Presto 0.167.0.6t (Teradata) On EMR
Data on S3 (external tables)
Cluster size: 60 worker node x r4.4xlarge
Key points:
Use ORC (or Parquet) for performant queries
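To put the ORC versus TXT/BZ2 timings in proportion, the short sketch below converts the durations quoted for the four test queries into seconds and computes the ratio for each. The timings come from the slide; the `to_seconds` helper and variable names are illustrative.

```python
import re

# (ORC, TXT/BZ2) wall-clock times for the four queries, as quoted on the slide.
TIMINGS = [("4s", "1m56s"), ("3s", "1m51s"), ("5s", "2m5s"), ("10s", "2m3s")]


def to_seconds(text):
    """Convert a duration such as '1m56s' or '4s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 60 * minutes + seconds


speedups = [to_seconds(txt) / to_seconds(orc) for orc, txt in TIMINGS]
for number, ratio in enumerate(speedups, start=1):
    print(f"query {number}: ORC finished about {ratio:.1f}x faster")
```

On this test configuration the columnar format is roughly 12x to 37x faster, with the smallest gain on the `select *` query that has to read every column.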
User A JDBC/ODBC
Client Table 1
Table 2
Metastore
Table N
Logical “Database”
JDBC/ODBC
Client
User B
JDBC App
Cluster A
Cluster B
Cluster N
Still One Copy
Of Data
Scaling out interactive query
FINRA’s interactive Big Data portfolio
Data Lake
Diver MIRS DOMT User-Directed FOLA Marketspace
Crosstab UI
Personal marts -
billions of rows
Domain-specific
interactive reports
and visualizations
Visualize
depth of market
Investigation
and data profiling
via SQL
Retrieve market
events to render
order lifecycle
Exception and
alert viewer
Data science ecosystem on data lake
Data
Scientist
JDBC/ODBC
Client
Logical ‘Database’
EMR Cluster Source
Data
Spark Cluster
DS-in-a-box
Data
Scientist
Notebook
Interface
Data
Scientist
Catalog
Notebook or Shell
Personal
Data Marts
Explore
Example—cross-market surveillance
NASDAQ
PSX
NYSE
AMEX
ARCA
OATS
TRF
ISG Audit
Trail
Cross-market Data Model
Unifies market
data into five
major events:
orders,
reports,
cancels,
trades, and
quotes.
Captures
events and
attributes
required for
patterns.
Provides
consistent
cross market
participant
definition.
Propagates
participant
information as
an order is
routed from
Firm to
Exchange and
from
Exchange to
Exchange
Calculates
open interest
for all orders
at any given
time during
the day
ETL
Data
Cross Market
Surveillance Models
(automated)
Depth of Market Tool
& Diver
(interactive)
Use Use
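The open-interest calculation mentioned above can be illustrated with a small replay over unified events. This is a simplified sketch under assumptions: the slide does not define the formula, so here an order adds its quantity, a cancel or trade removes quantity, and reports and quotes are ignored. The `Event` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: int       # seconds since midnight
    order_id: str
    kind: str       # "order", "cancel", "trade", "report", or "quote"
    quantity: int


# Effect of each event kind on open quantity (assumed, simplified rule).
SIGN = {"order": +1, "cancel": -1, "trade": -1}


def open_interest(events, at_time):
    """Return open quantity per order as of `at_time`, omitting closed orders."""
    open_qty = {}
    for e in sorted(events, key=lambda e: e.time):
        if e.time > at_time:
            break
        delta = SIGN.get(e.kind, 0) * e.quantity
        open_qty[e.order_id] = open_qty.get(e.order_id, 0) + delta
    return {oid: qty for oid, qty in open_qty.items() if qty > 0}


events = [
    Event(34200, "A", "order", 500),    # 09:30:00 new order
    Event(34260, "A", "trade", 200),    # partial execution
    Event(34300, "B", "order", 300),
    Event(34500, "A", "cancel", 300),   # remainder cancelled
]

print(open_interest(events, at_time=34400))
print(open_interest(events, at_time=34600))
```

Replaying events up to a chosen time is what makes the result available "at any given time during the day" rather than only at the close.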
Surveillance execution (like ETL)
Input Data Input Data Input Data Input Data Input Data
Pattern1 Pattern2 Pattern3 Pattern4 Pattern5 Pattern6 PatternN…
Output Data Output Data Output Data Output Data Output Data
Amazon
S3
Amazon
EMR
Orchestration
Data Location
Registration
Fwk
Mgr
Dev Ops
Per Second Billing
Spot
Hive (Deprecated)
Spark
Amazon
S3
Surveillance evolution
 | Before Cloud | Cloud v1 | Cloud v2
Execution Engine | Relational DB | Hive, Spark | Spark
Language | SQL | SQL (HiveQL, Spark SQL) | Scala, Python, R, SQL, Java
Production Logic | SQL w/ some scripting | SQL w/ some scripting | ML model (H2O, MLlib)
Data Catalog | N/A | Catalog provides schema/location | Create dataframes; catalog provides schema/location
Data Framework | N/A | N/A | Data manipulated as dataframe; API for common manipulations
today
FINRA’s dynamic surveillance platform
Data Engineering Model Selection
ML Framework
Data Framework
Trained
Model
Scoring
Algorithms
EGR
Python, R,
Scala, SQL
Scala
Python
Scala, Python, R
Test
Chosen
Model
Data
Observation-1
Observation-2
Observation-n
…
Notebook
Promotion
Data Lake
Amazon
EC2
Amazon
EC2
Amazon
S3
Model Development Prod
FINRA
herd
Python, R,
Scala
Data Framework
Scala
Python
Iterative
Isolation
• VPC isolation
• Security Groups
• VPC Endpoints
• SDLC Isolation (Accts)
Encryption
• AWS KMS
• EMR Security Configs
• S3 SSE
• S3 KMS
• EBS KMS
Monitoring
• AWS CloudTrail
• Splunk
• Nagios
AuthN/AuthZ
• Role-based access
• IAM ADFS Federation
• Temporary token access
• AD LDAP Integration (Apps)
Security
Compliance—consistency, transparency
Compliance
Reports
FINRA Provision Tool
Compliant Stack Configs
FINRA Portus Tool
Approved Security Groups
Dev Account
QC Account
Prod Account
Security EA
FINRA IAMUS Tool
IAM Role Templates
Development Tools
Dev
Teams
Automated
Deploy
Automated
Deploy
Configs / Chg Events
Configs / Chg Events
Configs / Chg Events
CloudTrail
Policies
Reg SCI SoX SOC2 SEC Audits
Reporting/
Investigation
Data Science
Machine Learning
Data Management Data Processing Pipeline
Improved
Cost Reduction
Security
Regulatory Compliance



Achieved
Simplified
Benefits of a data lake implementation
FINRA Presentations re:Invent 2017
FSV307 – Capital Markets Discovery: How FINRA Runs Trade Analytics and Surveillance on AWS
The FINRA analytics platform unlocks the value in capital markets data by accelerating trade analytics and providing a foundation for machine
learning at scale. Monday, Nov 27, 10:45 a.m. – 11:45 a.m. Venetian, Level 5, Palazzo P
SID326 – AWS Security State of the Union
Steve Schmidt, chief information security officer of AWS, addresses the current state of security in the cloud. As part of this presentation,
John Brady (CISO of FINRA) shares the FINRA journey to the cloud. Wednesday, Nov 29, 12:15 p.m. – 1:15 p.m. MGM, Level 3, Premier
Ballroom 316
ABD310 – How FINRA Secures Its Big Data and Data Science Platform on AWS
Learn how FINRA secures its Amazon S3 Data Lake and its data science platform on Amazon EMR and Amazon Redshift, while empowering
data scientists with tools they need to be effective. Wednesday, Nov 29, 11:30 a.m. – 12:30 p.m. Aria, Level 3, Juniper 3
ENT328 – FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud
The Financial Industry Regulatory Authority (FINRA) Technology Group has changed its customers' relationships with data by creating a
managed data lake. Thursday, Nov 30, 1 p.m. – 2 p.m. MGM, Level 3, Premier Ballroom 319
DEV335 – Manage Infrastructure Securely at Scale and Eliminate Operational Risks
Managing AWS and hybrid environments securely and safely while having actionable insights is an operational priority and business driver for
all customers. Thursday, Nov 30, 4 p.m. – 5 p.m. Venetian, Level 2, Venetian E
Thank you!

More Related Content

What's hot

AMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdf
AMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdfAMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdf
AMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdf
Amazon Web Services
 
STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...
STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...
STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...
Amazon Web Services
 
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Amazon Web Services
 
BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...
BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...
BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...
Amazon Web Services
 
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
Amazon Web Services
 
GPSTEC310_IAM Best Practices and Becoming an IAM Ninja
GPSTEC310_IAM Best Practices and Becoming an IAM NinjaGPSTEC310_IAM Best Practices and Becoming an IAM Ninja
GPSTEC310_IAM Best Practices and Becoming an IAM Ninja
Amazon Web Services
 
GPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data AnalyticsGPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data Analytics
Amazon Web Services
 
Build a Website & Mobile App for your first 10 million users
Build a Website & Mobile App for your first 10 million usersBuild a Website & Mobile App for your first 10 million users
Build a Website & Mobile App for your first 10 million users
Amazon Web Services
 
ABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data Applications
Amazon Web Services
 
GPSTEC307_Too Many Tools
GPSTEC307_Too Many ToolsGPSTEC307_Too Many Tools
GPSTEC307_Too Many Tools
Amazon Web Services
 
GPSTEC201_Building an Artificial Intelligence Practice for Consulting Partners
GPSTEC201_Building an Artificial Intelligence Practice for Consulting PartnersGPSTEC201_Building an Artificial Intelligence Practice for Consulting Partners
GPSTEC201_Building an Artificial Intelligence Practice for Consulting Partners
Amazon Web Services
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
Amazon Web Services
 
Analyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon KinesisAnalyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon Kinesis
Amazon Web Services
 
How to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsHow to Build Scalable Serverless Applications
How to Build Scalable Serverless Applications
Amazon Web Services
 
SID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamSID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security Team
Amazon Web Services
 
GPSBUS205_Power to the People- Amazon Connect
GPSBUS205_Power to the People- Amazon ConnectGPSBUS205_Power to the People- Amazon Connect
GPSBUS205_Power to the People- Amazon Connect
Amazon Web Services
 
GPSTEC302_Anti-Patterns- Learning through Failure
GPSTEC302_Anti-Patterns- Learning through FailureGPSTEC302_Anti-Patterns- Learning through Failure
GPSTEC302_Anti-Patterns- Learning through Failure
Amazon Web Services
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
Amazon Web Services
 
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
Amazon Web Services
 
GPSBUS204_Building a Profitable Next Generation AWS MSP Practice
GPSBUS204_Building a Profitable Next Generation AWS MSP PracticeGPSBUS204_Building a Profitable Next Generation AWS MSP Practice
GPSBUS204_Building a Profitable Next Generation AWS MSP Practice
Amazon Web Services
 

What's hot (20)

AMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdf
AMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdfAMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdf
AMF302-Alexa Wheres My Car A Test Drive of the AWS Connected Car Reference.pdf
 
STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...
STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...
STG314-Case Study Learn How HERE Uses JFrog Artifactory w Amazon EFS Support ...
 
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
Best Practices for Distributed Machine Learning and Predictive Analytics Usin...
 
BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...
BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...
BAP202_Amazon Connect Delivers Personalized Customer Experiences for Your Clo...
 
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
LFS301-SAGE Bionetworks, Digital Mammography DREAM Challenge and How AWS Enab...
 
GPSTEC310_IAM Best Practices and Becoming an IAM Ninja
GPSTEC310_IAM Best Practices and Becoming an IAM NinjaGPSTEC310_IAM Best Practices and Becoming an IAM Ninja
GPSTEC310_IAM Best Practices and Becoming an IAM Ninja
 
GPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data AnalyticsGPSBUS202_Driving Customer Value with Big Data Analytics
GPSBUS202_Driving Customer Value with Big Data Analytics
 
Build a Website & Mobile App for your first 10 million users
Build a Website & Mobile App for your first 10 million usersBuild a Website & Mobile App for your first 10 million users
Build a Website & Mobile App for your first 10 million users
 
ABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data ApplicationsABD202_Best Practices for Building Serverless Big Data Applications
ABD202_Best Practices for Building Serverless Big Data Applications
 
GPSTEC307_Too Many Tools
GPSTEC307_Too Many ToolsGPSTEC307_Too Many Tools
GPSTEC307_Too Many Tools
 
GPSTEC201_Building an Artificial Intelligence Practice for Consulting Partners
GPSTEC201_Building an Artificial Intelligence Practice for Consulting PartnersGPSTEC201_Building an Artificial Intelligence Practice for Consulting Partners
GPSTEC201_Building an Artificial Intelligence Practice for Consulting Partners
 
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
EUT303_Modernizing the Energy and Utilities Industry with IoT Moving SCADA to...
 
Analyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon KinesisAnalyzing Streaming Data in Real-time with Amazon Kinesis
Analyzing Streaming Data in Real-time with Amazon Kinesis
 
How to Build Scalable Serverless Applications
How to Build Scalable Serverless ApplicationsHow to Build Scalable Serverless Applications
How to Build Scalable Serverless Applications
 
SID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security TeamSID301_Using AWS Lambda as a Security Team
SID301_Using AWS Lambda as a Security Team
 
GPSBUS205_Power to the People- Amazon Connect
GPSBUS205_Power to the People- Amazon ConnectGPSBUS205_Power to the People- Amazon Connect
GPSBUS205_Power to the People- Amazon Connect
 
GPSTEC302_Anti-Patterns- Learning through Failure
GPSTEC302_Anti-Patterns- Learning through FailureGPSTEC302_Anti-Patterns- Learning through Failure
GPSTEC302_Anti-Patterns- Learning through Failure
 
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
GPSBUS221_Breaking Barriers Move Enterprise SAP Customers to SAP HANA on AWS ...
 
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
CMP216_Use Amazon EC2 Spot Instances to Deploy a Deep Learning Framework on A...
 
GPSBUS204_Building a Profitable Next Generation AWS MSP Practice
GPSBUS204_Building a Profitable Next Generation AWS MSP PracticeGPSBUS204_Building a Profitable Next Generation AWS MSP Practice
GPSBUS204_Building a Profitable Next Generation AWS MSP Practice
 

Similar to FSV307-Capital Markets Discovery How FINRA Runs Trade Analytics and Surveillance on AWS.pdf

AWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWSAWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWS
Amazon Web Services LATAM
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
Amazon Web Services
 
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
Amazon Web Services
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Amazon Web Services
 
BDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practicesBDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practices
Amazon Web Services
 
Modern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & EngagementModern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & Engagement
Amazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
Amazon Web Services
 
Modern data architectures for real time analytics and engagement
Modern data architectures for real time analytics and engagementModern data architectures for real time analytics and engagement
Modern data architectures for real time analytics and engagement
Amazon Web Services
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018
Amazon Web Services
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Amazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Amazon Web Services
 
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWSAWS APAC Webinar Week - 2015 An Amazing Year in AWS
AWS APAC Webinar Week - 2015 An Amazing Year in AWS
Amazon Web Services
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
Amazon Web Services
 
Big Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_SingaporeBig Data@Scale_AWSPSSummit_Singapore
Big Data@Scale_AWSPSSummit_Singapore
Amazon Web Services
 
Builders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LCBuilders' Day - Building Data Lakes for Analytics On AWS LC
Builders' Day - Building Data Lakes for Analytics On AWS LC
Amazon Web Services LATAM
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Amazon Web Services
 
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Understanding AWS Managed Database and Analytics Services | AWS Public Sector...
Amazon Web Services
 
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
AWS Analytics Immersion Day - Build BI System from Scratch (Day1, Day2 Full V...
Sungmin Kim
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Amazon Web Services
 

Similar to FSV307-Capital Markets Discovery How FINRA Runs Trade Analytics and Surveillance on AWS.pdf (20)

AWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWSAWS Cloud Experience CA: Data Lakes & Analytics en AWS
AWS Cloud Experience CA: Data Lakes & Analytics en AWS
 
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
AWS re:Invent 2016: Big Data Mini Con State of the Union (BDM205)
 
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
From Data Collection to Actionable Insights in 60 Seconds: AWS Developer Work...
 
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
Semplificare l'analisi dei dati con architetture "Serverless": architetture e...
 
BDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practicesBDA303 Serverless big data architectures: Design patterns and best practices
BDA303 Serverless big data architectures: Design patterns and best practices
 
Modern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & EngagementModern Data Architectures for Real Time Analytics & Engagement
Modern Data Architectures for Real Time Analytics & Engagement
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
FSV307-Capital Markets Discovery How FINRA Runs Trade Analytics and Surveillance on AWS.pdf

  • 1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS re:Invent. Capital Markets Discovery: How FINRA Runs Trade Analytics and Surveillance on AWS. Robert Kissell, Sr. Solutions Architect, WWPS Federal Financials, AWS. John Hitchingham, Sr. Director Engineering, FINRA. November 27, 2017. FSV307
  • 2. Four pillars of the data lake
      Scale: • Store and analyze all data centrally • Ingest data quickly without predefined schemas • Separate storage and compute, scaling each component as needed
      Cost: • Pay only for what you need • Use only the services you need • Utilize diverse services/features to optimize cost
      Security: • Encryption at each step • Explicit control of egress and ingress points • Compliance and governance of data access using AWS native services/features
      Agility: • Big data does not mean just batch processing • Mix and match on-premises and cloud • Custom development and managed services
  • 3. Data lake
      Central Storage: Secure, cost-effective storage in Amazon S3
      Data Ingestion: Get your data into S3 quickly and securely (Kinesis Firehose, Direct Connect, AWS Snowball, Database Migration Service)
      Processing & Analytics: Use of predictive and prescriptive analytics to gain better understanding (DynamoDB, Elasticsearch Service, Athena, Amazon QuickSight, Amazon EMR, Amazon Redshift)
      Protect & Secure: Use entitlements to ensure data is secure and users’ identities are verified
      Catalog & Search: Access and search metadata
      Access & User Interface: Give your users easy and secure access
  • 4. FINRA’s Data Lake: Surveilling markets with FINRA’s multi-petabyte enterprise-grade data lake
  • 5.
  • 6. Market regulation: analytics pipeline. Validation, Prepare for Analytics (ETL), Run Automated Detection Models, Interactive Analytics. Regulatory Analyst: Explore, Investigate, Regulatory Follow up. Sources: BDs, Exchanges, Reference Data Providers (trade execution records, market reference data). Data Scientist: Develop Models. 75B+ events. 20+ PB of Data. 3 Yrs Prod on Cloud. Major Exchange Clients
  • 7. Cloud journey: data puddles to data lake. Database 1 (Storage, Query/Compute, Catalog), Database 2 (Storage, Query/Compute, Catalog), Database n (Storage, Query/Compute, Catalog). Storage, Query/Compute, Catalog. EMR, Lambda, EMR Presto, EMR HBase. FINRA herd, Hive metastore. Silo. Amazon S3. Scales
  • 8. http://finraos.github.io/herd Unified catalog • Schemas • Versions • Encryption type • Storage policies Lineage and Usage • Track publishers and consumers • Easily identify jobs and derived data sets Shared Metastore • Common definition of tables and partitions • Use with Spark, Presto, Hive, etc. • Faster instantiation of clusters Herd catalog—for centralized data management
  • 9. Trades Surveillance 2017-03-01 v1 2017-03-02 v1 2017-03-01 v1 2017-03-02 v1 Regulatory conclusion Lineage 1 Trades Surveillance 2017-03-01 v1 2017-03-02 v1 2017-03-01 v1 2017-03-02 v1 Regulatory conclusion 2 2017-03-01 v2 v2 Data Version ? ? Example—lineage and data versioning
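The lineage example on slide 9 can be sketched as a small model: each registered data set version records the upstream versions it was derived from, so when a corrected Trades partition arrives as v2, the Surveillance output built on v1 can be flagged for review. This is a hypothetical illustration only, not herd's API; `DataVersion`, `Catalog`, `register`, and `is_stale` are names assumed for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataVersion:
    """One version of one partition of a data set (hypothetical model)."""
    dataset: str     # e.g. "Trades"
    partition: str   # e.g. "2017-03-01"
    version: int     # e.g. 1 for v1


@dataclass
class Catalog:
    # Each registered version maps to the upstream versions it was derived from.
    parents: dict = field(default_factory=dict)
    # Highest version number seen for each (dataset, partition).
    latest: dict = field(default_factory=dict)

    def register(self, dv, derived_from=()):
        self.parents[dv] = tuple(derived_from)
        key = (dv.dataset, dv.partition)
        self.latest[key] = max(self.latest.get(key, 0), dv.version)

    def is_stale(self, dv):
        """True if dv, or anything upstream of it, has been superseded."""
        if self.latest[(dv.dataset, dv.partition)] > dv.version:
            return True
        return any(self.is_stale(parent) for parent in self.parents[dv])


catalog = Catalog()
trades_v1 = DataVersion("Trades", "2017-03-01", 1)
surveillance_v1 = DataVersion("Surveillance", "2017-03-01", 1)
catalog.register(trades_v1)
catalog.register(surveillance_v1, derived_from=[trades_v1])

# A corrected Trades file is published for the same date as v2.
trades_v2 = DataVersion("Trades", "2017-03-01", 2)
catalog.register(trades_v2)

needs_review = catalog.is_stale(surveillance_v1)
```

Walking the lineage upward keeps the check cheap for a shallow pipeline; a deeper graph would likely want memoization, which is omitted here for brevity.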
  • 10. Files Ingest Define Record Legal Hold? No IAM role with delete on bucket Review/Approve Process Tag files For delete DM Managed Amazon S3 Bucket Trade Reports OATS Orders Model Outputs Delete Delete files call Herd—foundation for records management Files Herd DM Metadata All deletes via policy based on tags Register Object Store file(s) Set Record Flag Set Record Period Set Record Owner Set / Clear Legal Hold Gen list of Records eligible for deletion File life on Amazon S3
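The records-management flow on slide 10 (set record flag, record period, and owner; set or clear legal hold; generate the list of records eligible for deletion) can be sketched as a simple eligibility check. This is a minimal sketch under stated assumptions: the field names, the retention lengths, and the rule that a record becomes eligible once its period has elapsed with no legal hold are all illustrative, and the review/approve and tag-based delete steps are left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RecordMetadata:
    """Catalog metadata for one registered object (hypothetical fields)."""
    key: str                 # object key in the managed bucket
    registered_on: date      # when the file was registered
    is_record: bool          # "Set Record Flag"
    record_period_days: int  # "Set Record Period" (assumed to be in days)
    owner: str               # "Set Record Owner"
    legal_hold: bool = False # "Set / Clear Legal Hold"


def eligible_for_deletion(records, today):
    """Return keys whose record period has elapsed and that are not on legal hold."""
    eligible = []
    for record in records:
        expires_on = record.registered_on + timedelta(days=record.record_period_days)
        if record.is_record and not record.legal_hold and today >= expires_on:
            eligible.append(record.key)
    return eligible


# Illustrative data only; the retention period is made up for the sketch.
records = [
    RecordMetadata("trade-reports/2010-01-04.orc", date(2010, 1, 4), True, 2190, "market-reg"),
    RecordMetadata("oats-orders/2010-01-04.orc", date(2010, 1, 4), True, 2190, "market-reg", legal_hold=True),
    RecordMetadata("model-outputs/2017-01-03.orc", date(2017, 1, 3), True, 2190, "surveillance"),
]

to_review = eligible_for_deletion(records, today=date(2017, 12, 1))
```

In the flow the slide describes, a list like `to_review` would feed the review/approve step, after which files are tagged and removed by a tag-based policy rather than deleted directly.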
  • 11. Universal data catalog—explore data Analysts Data Scientists Developers Built on
  • 12. Catalog & Storage ETL Normalize, Enrich, Reformat Human Analytics Validation Ingest Broker Dealers Exchanges Third-Party Providers Data Files Analyst Data Scientist Regulatory User Detection models (Patterns) Automated Surveillance P P P A A P Processing Pipeline A Analytics Analytic data processing pipeline on the data lake
  • 14. ETL execution. Input Data (x5) on Amazon S3, Job1 Job2 Job3 Job4 Job5 Job6 … JobN on Amazon EMR, Output Data (x5) on Amazon S3. Orchestration. Data Location Registration. Per Second Billing. Spot. Hive (Deprecated). Spark
  • 15. Dynamic processing
      [Chart: Daily Order Volume (Billions), 11/1 through 11/29, y-axis 0.0 to 5.0]
      [Chart: Amazon EMR compute on Amazon EC2, Compute Nodes by Hour of Day, 2016-10-17 through 2016-10-24, y-axis 0 to 12000]
      EMR: 20k – 25k EC2 nodes per day; 93% of EC2 is on EMR; Avg EC2 node: 3 cores; Avg EC2 uptime: 3 hours; 96% of EC2 nodes live < 24 hrs; Over 50k nodes on peak day
  • 16. Interactive analytics—fundamentals Data Analyst Data Scientist JDBC/ODBC Client JDBC/ODBC Client Table 1 Table 2 AuthN AuthZ Metastore Table N Logical “Database” = 4+ PB Amazon EMR
  • 17. Achieving interactive query
      Query | Table size (rows) | Output size (rows) | ORC | TXT/BZ2
      select count(*) from TABLE_1 where trade_date = cast('2016-08-09' as date) | 2469171608 | 1 | 4s | 1m56s
      select col1, count(*) from TABLE_1 where col2 = cast('2016-08-09' as date) group by col1 order by col1 | 2469171608 | 12 | 3s | 1m51s
      select col1, count(*) from TABLE_1 where col2 = cast('2016-08-09' as date) group by col1 order by col1 | 2469171608 | 8364 | 5s | 2m5s
      select * from TABLE_1 where col2 = cast('2016-08-10' as date) and col3='I' and col4='CR' and col5 between 100000.0 and 103000.0 | 2469171608 | 760 | 10s | 2m3s
      Test config: Presto 0.167.0.6t (Teradata) on EMR; data on S3 (external tables); cluster size: 60 worker nodes x r4.4xlarge
      Key points: Use ORC (or Parquet) for performant query
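To put the slide 17 timings in perspective, the short sketch below converts the reported ORC and TXT/BZ2 durations to seconds and computes the ratio for each of the four queries. The durations are taken from the slide; the helper name `to_seconds` and the short query labels are assumptions made for the example.

```python
import re


def to_seconds(text):
    """Parse durations written like '4s' or '1m56s' into whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds or 0)


# (label, ORC time, TXT/BZ2 time) as reported on the slide.
timings = [
    ("count(*) for one trade date", "4s", "1m56s"),
    ("group by col1, 12 output rows", "3s", "1m51s"),
    ("group by col1, 8364 output rows", "5s", "2m5s"),
    ("filtered select *, 760 output rows", "10s", "2m3s"),
]

# Ratio of TXT/BZ2 time to ORC time for each query.
speedups = [to_seconds(txt) / to_seconds(orc) for _, orc, txt in timings]

for (label, orc, txt), ratio in zip(timings, speedups):
    print(f"{label}: ORC {orc} vs TXT/BZ2 {txt} -> {ratio:.1f}x faster")
```

On these four queries the columnar format comes out roughly 12x to 37x faster, which matches the slide's advice to use ORC or Parquet for interactive work.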
  • 18. User A JDBC/ODBC Client Table 1 Table 2 Metastore Table N Logical “Database” JDBC/ODBC Client User B JDBC App Cluster A Cluster B Cluster N Still One Copy Of Data Scaling out interactive query
  • 19. FINRA’s interactive Big Data portfolio. Data Lake: Diver, MIRS, DOMT, User-Directed, FOLA, Marketspace, Crosstab UI. Personal marts with billions of rows. Domain-specific interactive reports and visualizations. Visualize depth of market. Investigation and data profiling via SQL. Retrieve market events to render order lifecycle. Exception and alert viewer
  • 20. Data science ecosystem on data lake Data Scientist JDBC/ODBC Client Logical ‘Database’ EMR Cluster Source Data Spark Cluster DS-in-a-box Data Scientist Notebook Interface Data Scientist Catalog Notebook or Shell Personal Data Marts Explore
  • 21. Example—cross-market surveillance NASDAQ PSX NYSE AMEX ARCA OATS TRF ISG Audit Trail Cross-market Data Model Unifies market data into five major events: orders, reports, cancels, trades, and quotes. Captures events and attributes required for patterns. Provides consistent cross market participant definition. Propagates participant information as an order is routed from Firm to Exchange and from Exchange to Exchange Calculates open interest for all orders at any given time during the day ETL Data Cross Market Surveillance Models (automated) Depth of Market Tool & Diver (interactive) Use Use
  • 22. Surveillance execution (like ETL). Input Data (x5) on Amazon S3, Pattern1 Pattern2 Pattern3 Pattern4 Pattern5 Pattern6 … PatternN on Amazon EMR, Output Data (x5) on Amazon S3. Orchestration. Data Location Registration. Fwk Mgr. Dev Ops. Per Second Billing. Spot. Hive (Deprecated). Spark
  • 23. Surveillance evolution Execution Engine Relational DB Hive, Spark Spark Language SQL SQL (HiveQL, Spark SQL) Scala, Python, R, SQL, Java Production Logic SQL w/ some scripting SQL w/ some scripting ML model (H2O, MLlib) Data Catalog N/A Catalog provides schema/ location Create dataframes Catalog provides schema/ location Data Framework N/A N/A Data manipulated as dataframe API for common manipulations today Before Cloud Cloud v1 Cloud v2
  • 24. FINRA’s dynamic surveillance platform Data Engineering Model Selection ML Framework Data Framework Trained Model Scoring Algorithms EGR Python, R, Scala, SQL Scala Python Scala, Python, R Test Chosen Model Data Observation-1 Observation-2 … Observation-n Notebook Promotion Data Lake Amazon EC2 Amazon EC2 Amazon S3 Model Development Prod FINRA herd Python, R, Scala Data Framework Scala Python Iterative
  • 25. Security
      Isolation: VPC isolation, Security Groups, VPC Endpoints, SDLC Isolation (Accts)
      Encryption: AWS KMS, EMR Security Configs, S3 SSE, S3 KMS, EBS KMS
      Monitoring: AWS CloudTrail, Splunk, Nagios
      AuthN/AuthZ: Role-based access, IAM, ADFS Federation, Temporary token access, AD LDAP Integration (Apps)
  • 26. Compliance: consistency, transparency. Compliance Reports. FINRA Provision Tool: Compliant Stack Configs. FINRA Portus Tool: Approved Security Groups. Dev Account, QC Account, Prod Account. Security, EA. FINRA IAMUS Tool: IAM Role Templates. Development Tools. Dev Teams. Automated Deploy. Configs / Chg Events. CloudTrail. Policies. Reg SCI, SoX, SOC2, SEC Audits
  • 27. Benefits of a data lake implementation. Reporting/Investigation, Data Science, Machine Learning, Data Management, Data Processing Pipeline: Improved, Simplified. Cost Reduction ✓ Security ✓ Regulatory Compliance ✓: Achieved
  • 28. FINRA Presentations re:Invent 2017
      FSV307 – Capital Markets Discovery: How FINRA Runs Trade Analytics and Surveillance on AWS. The FINRA analytics platform unlocks the value in capital markets data by accelerating trade analytics and providing a foundation for machine learning at scale. Monday, Nov 27, 10:45 a.m. – 11:45 a.m. Venetian, Level 5, Palazzo P
      SID326 – AWS Security State of the Union. Steve Schmidt, chief information security officer of AWS, addresses the current state of security in the cloud. As part of this presentation, John Brady (CISO of FINRA) shares the FINRA journey to the cloud. Wednesday, Nov 29, 12:15 p.m. – 1:15 p.m. MGM, Level 3, Premier Ballroom 316
      ABD310 – How FINRA Secures Its Big Data and Data Science Platform on AWS. Learn how FINRA secures its Amazon S3 Data Lake and its data science platform on Amazon EMR and Amazon Redshift, while empowering data scientists with tools they need to be effective. Wednesday, Nov 29, 11:30 a.m. – 12:30 p.m. Aria, Level 3, Juniper 3
      ENT328 – FINRA's Managed Data Lake: Next-Gen Analytics in the Cloud. The Financial Industry Regulatory Authority (FINRA) Technology Group has changed its customers' relationships with data by creating a managed data lake. Thursday, Nov 30, 1 p.m. – 2 p.m. MGM, Level 3, Premier Ballroom 319
      DEV335 – Manage Infrastructure Securely at Scale and Eliminate Operational Risks. Managing AWS and hybrid environments securely and safely while having actionable insights is an operational priority and business driver for all customers. Thursday, Nov 30, 4 p.m. – 5 p.m. Venetian, Level 2, Venetian E
  • 29. Thank you!