Flink Forward Berlin 2018: Raj Subramani - "A streaming Quantitative Analytics engine"

The application of Quantitative Analytics to trades for the generation of Risk and P&L metrics has traditionally followed a batch-based approach. Regulatory changes impose increasing compute demands on financial institutions, alongside a growing demand for real-time analytics driven by increased eTrading volumes across all asset classes.

The talk is based on a use case for pricing interest rate swaps using Apache Beam, with a call to an external C++ analytics process. It describes the performance characteristics when operating in a non-cloud environment using Apache Flink, as opposed to Google Cloud Dataflow.
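
The call to an external analytics process can be sketched in a few lines. The example below is only an illustration in plain Python, not the talk's Java and C++ code: the child process is a small Python script standing in for the C++ binary, the payload is raw bytes standing in for a serialized Protobuf message, and the names `FAKE_PRICER` and `price_out_of_process` are hypothetical. It follows the pattern shown later in the slides: input and output files on disk, with the exit code acting as the process signal.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the external analytics binary: it reads the input
# file, writes a result file, and reports failure through a non-zero exit code.
FAKE_PRICER = (
    "import sys, pathlib\n"
    "src, dst = map(pathlib.Path, sys.argv[1:3])\n"
    "data = src.read_bytes()\n"
    "if not data:\n"
    "    sys.exit(1)\n"
    "dst.write_bytes(b'priced:' + data)\n"
)


def price_out_of_process(payload: bytes, timeout_s: float = 30.0) -> bytes:
    """Hand one serialized request to a child process via files on disk."""
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "request.bin"
        dst = Path(workdir) / "response.bin"
        src.write_bytes(payload)
        # Assumption: a real deployment would launch the C++ executable here.
        done = subprocess.run(
            [sys.executable, "-c", FAKE_PRICER, str(src), str(dst)],
            capture_output=True,
            timeout=timeout_s,
        )
        if done.returncode != 0:
            raise RuntimeError(f"pricer exited with code {done.returncode}")
        return dst.read_bytes()
```

Because the caller only exchanges bytes and an exit code with the child process, the same function can be exercised in a unit test without any pipeline runner.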

The talk touches on the subtle differences that arise when operating across multiple runners, and suggests approaches to portability when architecting for a multi-runner operational environment.
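
One way to read the portability advice is to keep sources and sinks behind small interfaces so that the pipeline logic never depends on a specific storage service. The sketch below is an assumption-laden illustration in plain Python; `Source`, `Sink`, `InMemorySource`, `InMemorySink`, `run_pipeline` and `add_notional_in_millions` are hypothetical names and none of them come from the talk or from Apache Beam.

```python
from typing import Callable, Iterable, List, Protocol


class Source(Protocol):
    def read(self) -> Iterable[dict]: ...


class Sink(Protocol):
    def write(self, record: dict) -> None: ...


class InMemorySource:
    """Development stand-in for a cloud or on-premises source."""

    def __init__(self, records: List[dict]) -> None:
        self._records = records

    def read(self) -> Iterable[dict]:
        return iter(self._records)


class InMemorySink:
    """Development stand-in for a cloud or on-premises sink."""

    def __init__(self) -> None:
        self.records: List[dict] = []

    def write(self, record: dict) -> None:
        self.records.append(record)


def run_pipeline(source: Source, step: Callable[[dict], dict], sink: Sink) -> None:
    """Core logic sees only the Source and Sink interfaces."""
    for record in source.read():
        sink.write(step(record))


def add_notional_in_millions(trade: dict) -> dict:
    # Illustrative transformation only; not a pricing calculation.
    return {**trade, "notional_mm": trade["notional"] / 1_000_000}
```

Swapping the in-memory classes for implementations backed by a real service would then leave `run_pipeline` and the step functions untouched.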

Published in: Technology

  1. A Streaming Quantitative Analytics Engine. Raj Subramani <raj@subramani.com>
  2. The Challenge: Move to Centralized Clearing; Lower margin; Higher Volumes; Continued Regulatory Expansion; Principle based compliance; Non-financial Risks; Contagion risk; Model risk; Technology and Analytics as Risk Muscle; Big data; Machine learning; Improved Decision Making; Bias recognition; Bias elimination; Cost Challenges; Capex; Opex
  3. The Anatomy of a Risk Engine: Source, Workflow, Distribution, Results; Trades; Market Data; Results; Cache; Action A; Action B; Action C
  4. Cloud Dataflow - Unified Batch and Stream: Any combination of primitive & custom transformations; Filtered; Filtered & grouped; Filtered, grouped & windowed; Batch; Stream
  5. Apache Beam
  6. Apache Beam: Language A SDK; Language B SDK; Language C SDK; Runner 1; Runner 2; Runner 3; The Beam Model; Language A; Language B; Language C; The Beam Model
  7. The Beam Model & Cloud Dataflow: a unified model for batch and stream processing, supporting multiple runtimes; a great place to run Beam; Apache Beam; Google Cloud Dataflow
  8. DoFn – Functional Programming Style: Input (PCollection); Function; Output (PCollection); Output (PCollection)
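
The slide above shows a function that takes elements from one input collection and feeds two output collections. A minimal sketch of that idea in plain Python follows; it is not the Apache Beam API, and `split_by_currency` and `apply_per_element` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Tagged = Tuple[str, dict]


def split_by_currency(trade: dict) -> Iterator[Tagged]:
    """DoFn-like function: one element in, zero or more tagged elements out."""
    tag = "usd" if trade["ccy"] == "USD" else "other"
    yield tag, trade


def apply_per_element(
    fn: Callable[[dict], Iterable[Tagged]], elements: Iterable[dict]
) -> Dict[str, List[dict]]:
    """Apply fn to every element and collect each tagged output separately."""
    outputs: Dict[str, List[dict]] = {}
    for element in elements:
        for tag, value in fn(element):
            outputs.setdefault(tag, []).append(value)
    return outputs
```

The function holds no state between elements, which is what allows a runner to apply it to different elements on different workers.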
  9. Workflow Pipeline: DoFn A; DoFn B; DoFn C; DoFn D
  10. ParDo – Parallel Processing of DoFns (Known as a PTransform): ParDo (DoFn A); ParDo (DoFn B); ParDo (DoFn C); ParDo (DoFn D)
  11. Distribution to workers: ParDo (DoFn A); Worker: DoFn A; Worker: DoFn A; Worker: DoFn A
  12. Dataflow Fusion: ParDo (DoFn A); Worker: DoFn A, DoFn B; Worker: DoFn A, DoFn B; Worker: DoFn A, DoFn B
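
Slides 11 and 12 can be illustrated with a small sketch: elements are split into bundles, each worker processes one bundle, and fusion means a worker runs DoFn A and DoFn B back to back on each element instead of handing an intermediate collection to another stage. This is plain Python with a thread pool standing in for workers; `fuse` and `run_on_workers` are hypothetical helpers and do not reflect how Dataflow implements fusion internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def fuse(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose element-wise steps so they run as a single fused step."""

    def fused(element: Any) -> Any:
        for step in steps:
            element = step(element)
        return element

    return fused


def run_on_workers(
    fn: Callable[[Any], Any], elements: Sequence[Any], workers: int = 3
) -> List[Any]:
    """Split elements into one bundle per worker and process bundles in parallel."""
    bundles = [list(elements[i::workers]) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda bundle: [fn(e) for e in bundle], bundles)
        return [value for bundle in results for value in bundle]
```

Output order follows the bundles rather than the input, so callers should not rely on ordering.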
  13. Native Code Execution: Worker; Container; Process (Java); JNI; C++
  14. Out of Process Call: Worker; Container; Process (Java); JNI; C++; Standard In; Process Signal
  15. Out of Process Call: Process Signal; Worker; Container; Process (Java); JNI; C++; Interface
  16. Out of Process Call: Worker; Container; Process (Java); JNI; C++; Disk; Process Signal; Input; Input; Output; Output
  17. Out of Process Call: Protobuf; Java; C++; C#; Python
  18. Out of Process Call: Worker; Container; Process (Java); C++; Disk; Process Signal; Protobuf Input; Protobuf Output
  19. Out of Process Call: Testing! C++; Protobuf; Unit Test (C++); Protobuf; Unit Test (Java); DoFn
  20. Out of Process Call: Worker; Container; Process (Java); Go; Protobuf; C++; Python
  21. Out of Process Call: C++; Input (Protobuf); Risk Output (Protobuf); Error Output (Protobuf); Worker; Container; Process (Java)
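
Slide 21 shows the analytics call producing a risk output and a separate error output. A rough sketch of that routing in plain Python is given below; `price_with_error_output` and `toy_pricer` are hypothetical, the dictionaries stand in for Protobuf messages, and the "pricing" is a placeholder calculation rather than a real swap valuation.

```python
from typing import Callable, Iterable, List, Tuple


def toy_pricer(trade: dict) -> float:
    """Placeholder for the external analytics call (not a real valuation)."""
    if trade["notional"] <= 0:
        raise ValueError("notional must be positive")
    return trade["notional"] * 0.01


def price_with_error_output(
    trades: Iterable[dict], pricer: Callable[[dict], float]
) -> Tuple[List[dict], List[dict]]:
    """Route successful results to a risk output and failures to an error output."""
    risk: List[dict] = []
    errors: List[dict] = []
    for trade in trades:
        try:
            risk.append({"trade_id": trade["id"], "pv": pricer(trade)})
        except Exception as exc:  # keep the batch running; record the failure
            errors.append({"trade_id": trade["id"], "error": str(exc)})
    return risk, errors
```

Keeping failures as data means one bad trade does not abort the rest of the batch, and the error output can be inspected or replayed later.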
  22. Module Separation: SOURCE; SINK; Cloud Pub/Sub; Cloud SQL; Cloud Spanner; Cloud Bigtable; Cloud Storage; Pipeline code and functions; I/O; I/O
  23. Module Separation: SOURCE; SINK; Cloud SQL; Pipeline code and functions; I/O; I/O; Cloud Storage
  24. Apache Flink
  25. What is Apache Flink? Apache Flink is an open source stream processing framework. Code written for Apache Beam can run on Apache Flink.
  26. Building a Risk Engine: Source, Workflow, Distribution, Results; Trades; Market Data; Results; Cache; Action A; Action B; Action C
  27. Building Blocks: Flink. Source, Workflow, Distribution, Results; Results; Cache; Action A; Action B; Action C
  28. Building Blocks: Flink Running Beam. Source, Workflow, Distribution, Results; Results; Cache; Action A; Action B; Action C
  29. Building Blocks: Disk. Source, Workflow, Distribution, Results; Results; Cache; Action A; Action B; Action C
  30. What do You Need to do? Results; Action A; Action B; Action C; Configure/Scale
  31. I Really Need to Run on Premises: https://data-artisans.com/da-platform-2
  32. During Development: Source, Workflow, Distribution, Results; Results; Cache; Action A; Action B; Action C
  33. During Development: Local Disks. Source, Workflow, Distribution, Results; Results; Cache; Action A; Action B; Action C
  34. During Testing: Use the Cloud
  35. Watch Out for Runner Differences: "I need an Uber Jar"; "I need a folder of Jars in Google Cloud Storage"
  36. Watch Out for Runner Differences: "What level of parallelism do you need?"; "I'll decide parallelism for you"
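
One way to contain the runner differences on slides 35 and 36 is to build the runner-specific launch arguments in a single place. The sketch below is an assumption, not the talk's code: `runner_args` is a hypothetical helper, the mapping of the slide's quotes to Flink (explicit parallelism) and Dataflow (service-managed scaling) is my reading of the slide, and the flag names `--runner`, `--parallelism` and `--maxNumWorkers` should be checked against the Beam documentation for the SDK version in use.

```python
from typing import List, Optional


def runner_args(
    runner: str,
    parallelism: Optional[int] = None,
    max_workers: Optional[int] = None,
) -> List[str]:
    """Return pipeline launch arguments for the chosen runner (illustrative)."""
    if runner == "flink":
        # Design choice of this sketch: insist on an explicit parallelism value.
        if parallelism is None:
            raise ValueError("set parallelism explicitly for the Flink runner")
        return ["--runner=FlinkRunner", f"--parallelism={parallelism}"]
    if runner == "dataflow":
        # The service chooses the worker count; an upper bound is optional.
        args = ["--runner=DataflowRunner"]
        if max_workers is not None:
            args.append(f"--maxNumWorkers={max_workers}")
        return args
    raise ValueError(f"unknown runner: {runner}")
```

Keeping this logic out of the pipeline code means the transforms themselves stay identical across runners.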
  37. Test Setup: Results; Linux Blades; Results; Cloud Bigtable. Analytics: open source QuantLib v1.9.2 (for XML over JNI); open source QuantLib v1.10.0 (for Protobuf/Direct calls). Trade data: 2,000,000 plain vanilla mono currency interest rate swaps; 100,000 Bermudan Swaptions. Market data: interest rate curves built using FRA, Futures and Swaps in 12 currencies
  38. Batch Results: 2,000,000 plain vanilla interest rate swaps. Interest rate curves from FRA, Futures & Swaps, OIS & Libor in 12 currencies. Open source QuantLib v1.10.1. [Charts: wall clock time (sec) against batch size (200,000 to 2,000,000); series: Trades, Market Data, Analytics. Panels: Flink: JNI/XML vs Protobuf; Dataflow: JNI/XML vs Protobuf; XML/JNI: Flink vs Dataflow; Protobuf/Direct C++: Flink vs Dataflow]
  39. Scaling Out: 2,000,000 plain vanilla interest rate swaps. Interest rate curves from FRA, Futures & Swaps, OIS & Libor in 12 currencies. Open source QuantLib v1.10.1. [Chart: wall clock time (sec) against number of vCPUs deployed; series: Trades, Market Data, Analytics] Scale out will depend on data structure and workflow logic. The more the workflow is controlled by Beam, the better the opportunity for dynamic rebalancing.
  40. Thank You
