Complex Event Processing within a Managed Accounts Platform
M. Dieckmann
Stocksinside Ltd, London, UK
This document describes a pattern for processing real-time market events against a large number of trading strategies
within the context of complex event processing and separately managed accounts. The main challenge is to define an
efficient parallel processing architecture for market data analysis that can handle millions of rule checks per
second, triggered by a broad stream of market events and shaped by a large number of different trading strategies.
Another problem in this area is the efficient reduction of the overall number of rule checks without violating the
completeness of market data analysis: filtering the input events can significantly reduce the number of processed
rule checks, but always at the risk of missing trading opportunities.
I. INTRODUCTION
A key advantage of separately managed accounts (SMA)
as a financial product is the execution of individual trading
strategies according to the investment preferences of the
account owner. Within this context, a trading strategy
defines the “what” and the “when” to buy or to sell. The
selection of the right financial instruments at the right point
in time is a key element of any investment strategy.
But even the research for a single SMA with a single
trading strategy is hard work, and if not automated, the
operation of many SMAs, each with many different trading
strategies, becomes almost impossible.
Managed Account Platforms (MAP) are software
applications that automate the account management process
in terms of asset allocation, risk management and reporting.
Each SMA can have multiple trading strategies defined,
whose execution of trade actions has to be controlled by the
MAP. The system has to detect whether a security or other
financial instrument matches a strategy and has to be bought
or sold according to that strategy.
This requires uninterrupted, focused supervision of the
stock market and, finally, the generation of trade signals to
take appropriate action. One can roughly estimate the
necessary processing power by considering on the order of
100 conditions per trading strategy, ten strategies per SMA,
and thousands of SMAs (10³), checked against a million
market data events per second (10⁶). This example results in
10¹² condition checks per second. A very effective reduction
of calculations has to be applied in order not to overwhelm
the MAP. This area of concern – widely known as Complex
Event Processing – is the subject of the pattern described
within this paper.
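To make the arithmetic behind this estimate explicit, the following short Python sketch reproduces it; all figures are the illustrative assumptions from the text, not measurements.

```python
# Back-of-envelope estimate of the unreduced rule-check load.
conditions_per_strategy = 100        # on the order of 100 conditions
strategies_per_sma = 10
number_of_smas = 1_000               # 10^3
events_per_second = 1_000_000        # 10^6

checks_per_second = (conditions_per_strategy * strategies_per_sma
                     * number_of_smas * events_per_second)
print(f"{checks_per_second:.0e} condition checks per second")   # 1e+12
```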
II. COMPLEX EVENT PROCESSING
The term Complex Event Processing (CEP) usually
denotes the analysis of real-time streams of data to detect
meaningful events or patterns and take appropriate action
[1]. The term “complex” is used because data from multiple
sources and of multiple types has to be analysed by large
rule sets.
An event can be defined as a pattern match, where the
pattern consists of multiple rules with conditions and
related thresholds. Typically, the data of the real-time stream
is compared with historical data within the rule conditions.
In the financial services industry, the real-time data is mostly
a market data stream, whereas the historical data comes from
a time series database.
Figure 1 : High-level diagram of CEP data flows within the
SMA domain. The market data stream is analysed with the help of
the defined trading strategies and their rules, as well as historical
data, to create trading signals.
III. PROBLEM AND OBJECTIVES
The major problem in designing a CEP system within the
context of separately managed accounts is processing a large
set of trading strategies, each consisting of many rules
whose conditions can contain values calculated on the fly, at
the rate of the incoming stream.
As already mentioned in the introduction, processing the
unfiltered set of rules doesn’t work, due to the enormous
number of rule check operations in a typical scenario. The
number of rule checks therefore has to be massively reduced,
and the overall process has to be parallelized as far as
possible.
The objectives addressed within this paper are:
• Design a CEP approach that runs a huge number of
trading strategies within the context of a MAP.
• Utilize a parallel processing architecture pattern
to efficiently distribute the calculation load.
• Significantly reduce the number of calculation steps
while processing the stream of market data events.
IV. TRADING STRATEGIES
Before we go into the details of system design, the theory
and definition of trading strategies should be explained in
this chapter.
The term “trading strategy” can be defined as follows: a
set of objective rules designating the conditions that must be
met for trade entries and exits to occur [2].
Since a trading strategy is defined as a set of rules, a closer
look at these elements of a strategy is in order.
A single rule, used as a building block of a strategy, can be
defined as a Boolean-valued function that maps a vector
x = (x₁, …, xₙ) of the input domain X to the Boolean set
B = {0, 1}:
f : X → B
A rule can also be described as a condition in Boolean
algebra, with one value compared to another by means of a
relational operator. These values are usually the result of
a function (e.g. a moving average of prices or a
price/earnings ratio). The Boolean function f(X) equals 1 if
the condition is true.
Example: last price > SMA (10d) * 1.10
(SMA = simple moving average)
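As an illustration only, such a rule can be written as a Boolean-valued predicate over the current price and a window of historical prices; the helper names below are hypothetical, not part of any specific platform.

```python
def sma(closing_prices, days):
    """Simple moving average over the last `days` closing prices."""
    window = closing_prices[-days:]
    return sum(window) / len(window)

def rule_price_above_sma(last_price, closing_prices, days=10, factor=1.10):
    """Boolean-valued rule f(X): last price > SMA(10d) * 1.10."""
    return last_price > sma(closing_prices, days) * factor

# Example: ten historical closes averaging 100, last price 115 -> rule matches.
print(rule_price_above_sma(115.0, [99.0, 101.0] * 5))   # True
```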
A trading strategy usually consists of many conditions
that all have to be fulfilled in order to constitute a strategy
match. This can be expressed by the following Boolean-
valued function, which maps a k-tuple of Boolean condition
results to the result set B:
f : Bᵏ → B
Every k-ary Boolean function can be expressed as a
propositional formula with k propositions (conditions)
connected via Boolean operators (e.g. conjunction).
Strategy: c₁ ∧ c₂ ∧ … ∧ cₖ = 1 (if c₁ to cₖ all equal 1)
An example from the market data analysis area could be:
P/E < 10 AND Last-Price < SMA (10) * (1 – 0.1)
This means: if the price/earnings ratio of a company is
smaller than 10 and the last price is smaller than 90% of the
simple moving average of the last 10 days, then the Boolean
expression is true (1) and the strategy is considered
matched.
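A strategy match is then just the conjunction of all its rule results. A minimal sketch, assuming each condition is supplied as a zero-argument callable:

```python
def strategy_matches(conditions):
    """A strategy matches iff all conditions c1..ck evaluate to True.
    all() short-circuits, mirroring the abort on the first failed check."""
    return all(condition() for condition in conditions)

# Example: P/E < 10 AND last price < SMA(10) * (1 - 0.1)
pe_ratio, last_price, sma_10d = 8.5, 42.0, 50.0
conditions = [lambda: pe_ratio < 10,
              lambda: last_price < sma_10d * (1 - 0.1)]
print(strategy_matches(conditions))   # True
```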
Trading strategies in real world scenarios usually have 10
to 20 conditions from various domains and incorporate a
similar set of complex functions (e.g. SMA) within the
conditions.
V. ARCHITECTURE PATTERNS
An architectural pattern is a concept that solves and
delineates some essential cohesive elements of software
architecture [3].
In the area of parallel processing, the “pipeline” pattern is
a well-known architecture pattern to parallelize calculations.
It’s based on the assumption that complex data processing
can be separated into stages, which are executed in parallel.
Figure 2 : Data chunks (D1-3) arriving at the left side of the
pipeline are processed by several stages (S1-3). When S1 has
finished processing D1, this data chunk is forwarded to S2 and S1
starts processing D2 and so on. In a continuous stream of data
chunks Dx, each stage Sn is always processing a certain data chunk.
This pattern works well if the various stages require
similar time to process each data chunk. If one stage needs
significantly more time to complete, data chunks queue up
before that stage and the overall throughput is limited by the
slowest stage.
To mitigate this effect, the stages themselves have to
provide parallel processing capabilities. This can be done by
assigning multiple threads (a thread pool) to each stage. As
in the pipe-and-filter pattern, the edges between the stages
are realized by message queues, which leads to the following
processing pattern.
Figure 3 : Stages Sx are separated by input queues Qx and
contain multiple threads (m, n, k), which read messages from the
queues, whenever they are idle.
The architecture pattern illustrated in figure 3 is also
called staged event-driven architecture (SEDA) and refers to
an approach that decomposes a complex, event-driven
application into a set of stages connected by queues [4].
This pattern ensures that the processing of multiple
data chunks is optimized in such a way that other stages do
not wait while a data chunk Di is processed by a certain
stage. Additional parallelization can be achieved by using
multiple pipelines and input stream partitioning.
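A minimal sketch of such a staged pipeline in Python, assuming blocking queues between stages; a production SEDA implementation would add batching, backpressure and dynamic thread-pool sizing:

```python
import queue
import threading

def run_stage(in_queue, out_queue, process, num_threads):
    """Start one SEDA stage: `num_threads` worker threads read from
    `in_queue`, apply `process`, and forward results to `out_queue`."""
    def worker():
        while True:
            item = in_queue.get()          # blocks while the thread is idle
            result = process(item)
            if out_queue is not None:
                out_queue.put(result)
            in_queue.task_done()
    for _ in range(num_threads):
        threading.Thread(target=worker, daemon=True).start()

# Wire up a two-stage pipeline: Q1 -> S1 -> Q2 -> S2.
q1, q2 = queue.Queue(), queue.Queue()
run_stage(q1, q2, process=lambda e: ("checked", e), num_threads=4)
run_stage(q2, None, process=print, num_threads=2)
q1.put({"symbol": "IBM", "price": 101.5})
q1.join()                                  # wait until stage 1 drains Q1
q2.join()                                  # wait until stage 2 drains Q2
```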
VI. LOGICAL DATA MODEL
Within the context of a managed accounts platform, the
following logical data model is the foundation of the
described processing of market data events.
Figure 4 : Logical Data Model of CEP/MAP approach. Each
trading strategy consists of multiple business rules and related
configuration objects that contain the rule parameters (e.g.
thresholds and function parameters). Each pattern match refers to a
single financial instrument.
In stream processing systems the entire computation starts
with the reception of input data objects, which is in this case
any market data event that is part of the inbound stream.
Market data events belong to a financial instrument, and in
the model illustrated above, they also belong to business rule
configurations (BRC), which are part of trading strategies.
When a market data event arrives, the computational
navigation through the logical data model starts with the
evaluation of related business rule configurations and the
concrete business rule conditions to be checked. If a rule
condition matches, the remaining business rules of the
related trading strategy definition are checked as well.
If all business rules of a trading strategy match, a
pattern match is created. This object already represents a
trading signal to act upon. Since the original market data
event is related to a financial instrument, the pattern match
is related to the same instrument. If the user decides to take
action, the financial instrument becomes part of a portfolio,
which in turn is tied to a managed account on behalf of the
user.
Trading strategies are usually not operated without
reference to a certain market, which is why they are assigned
to a market segment, which in turn contains any number of
financial instruments. This means that trading strategies
within our model are related to the instruments of a market
segment, and that different configurations of a strategy can
be executed within different market segments. The reason is
that the same strategy, from a structural point of view, may
require slightly different parameters when applied in
different market segments.
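The core entities of this model can be sketched as plain data classes; the entity names follow figure 4, while the concrete attributes are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class BusinessRuleConfiguration:
    rule_type: str                    # e.g. "PRICE_BELOW_SMA"
    parameters: Dict[str, float]      # thresholds and function parameters

@dataclass
class TradingStrategy:
    name: str
    market_segment: str               # strategies are bound to a market segment
    rule_configs: List[BusinessRuleConfiguration] = field(default_factory=list)

@dataclass
class PatternMatch:
    strategy_name: str
    instrument: str                   # each match refers to a single instrument
    triggering_event: dict
```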
VII. INPUT STREAM PARTITIONING
Within the context of complex event processing, a market
data stream consists of an unlimited number of continuous
market data events that are related to a limited number of
financial instruments. If the individual financial instrument
is taken as the sorting criterion, the overall stream can be
seen as a limited number of logical streams, each comprising
all kinds of events related to a single financial instrument.
Input stream partitioning in this context means grouping
the logical streams into sets of financial instruments to be
processed by dedicated pipelines.
Figure 5 : Market events related to a financial instrument FIk,
k = {1..M}, which represent a single logical stream, can be further
divided into logical sub-streams, containing events of certain event
types ETr, r = {1..N}.
To retain the logical order of the events within a logical
stream and to achieve the highest processing throughput for
that stream, the stream must be processed by a dedicated
stream processing pipeline. If the events of different sub-
streams are temporally uncorrelated and certain event
types could cause significant delay, the processing of these
sub-streams can be distributed across several specialized
processing pipelines.
In our case all sub-streams (event types) of a logical
stream are processed by the same pipeline instance.
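Assigning each logical stream to a pipeline can be as simple as hashing the instrument identifier onto a fixed set of pipelines; a sketch under that assumption:

```python
import zlib

def pipeline_for(instrument_id: str, num_pipelines: int) -> int:
    """Deterministically map an instrument's logical stream to one pipeline,
    so all events of that instrument stay ordered within a single pipeline."""
    return zlib.crc32(instrument_id.encode("utf-8")) % num_pipelines

# All IBM events (regardless of event type) land in the same pipeline.
print(pipeline_for("IBM", num_pipelines=4))
```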
VIII. SYSTEM ARCHITECTURE
This chapter describes different aspects of the system
architecture.
A. Overview
The main objective of any CEP platform is to determine
pattern matches out of a stream of data. The following figure
shows the main building blocks of the system described
within this paper.
Figure 6 : The figure shows a simplified view of the system
architecture of the CEP platform. Beginning at the left side, several
market data sources are normalized and provide a stream of market
events to the CEP component, which determines pattern matches.
These are then delivered to the particular users.
Typical for CEP applications is a normalized stream of
data created from different data sources. A normalization
component is responsible for creating this input stream,
which is forwarded to the CEP engine and the historical data
store. As already mentioned in the previous chapter, this
overall stream can be seen as a set of many logical streams,
each related to a single financial instrument.
Historical data is often needed to calculate complex
parameters that are used within rule condition checks and
compared with current data extracted from the stream. For
example, the 50-day moving average of prices is a function
that requires historical price data (e.g. closing prices).
The CEP engine processes the input data stream by
continuously checking all rule conditions of all strategies. If
all rule conditions of a strategy match, a strategy pattern
match has occurred; it is then stored within a database for
later retrieval and forwarded to a delivery component for
direct notification of the related users. The delivery
component normally has to use historical data to visualize
pattern matches.
The major part of processing and complexity lies within
the CEP component, which is described in more detail
within the next sections.
B. Event Processing Pipeline
The determination of trading signals from market events is
a multi-stage process that can be seen as a processing
pipeline. The individual processing steps are illustrated
within the next figure.
Figure 7 : The figure shows the activities and result objects
within the market event processing pipeline.
When a market event arrives at the beginning of the
pipeline, the following steps are executed (a condensed code
sketch follows the list):
1. The market event is categorized by type and the
related business rule types are determined. In addition,
all related business rule configurations are selected.
These are concrete rule specifications that are part of
real strategy definitions and contain the rule condition
parameters to be used within rule checking. (Only
those rule configurations are selected that are part of a
strategy applicable to this market event from an
instrument type perspective, e.g. option price ≈ option
strategy.)
2. All business rule configurations are evaluated during
the rule checking activity. This means that the
parameters of a single business rule configuration,
together with the current market event, are loaded into
the check function of the business rule and evaluated.
If the condition of the rule check is fulfilled, the
business rule configuration is considered a matching
rule configuration.
3. Based on the matching business rule configurations,
the related strategy definitions are determined.
4. For each strategy the remaining business rule
configurations are determined. Usually, a trading
strategy consists of many business rule configurations
that all have to be checked in order to detect a
strategy match, which is equivalent to a trading signal.
5. For each strategy the as yet unchecked business rule
configurations are checked. If any of the remaining
rule checks fails, checking is aborted for that strategy.
6. For all trading strategies whose business rule
configurations have been completely checked, a
trading pattern match, also known as a trade signal, is
created and persisted.
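The six steps can be condensed into the following sketch; the index structures (`brc_index`, `strategy_index`) and the `check` callback are assumptions about how the lookups might be organized, not a prescribed implementation.

```python
def brc_key(brc):
    """Composite key of a rule configuration: rule type plus parameter values."""
    return (brc.rule_type, tuple(sorted(brc.parameters.items())))

def process_market_event(event, brc_index, strategy_index, check):
    """Steps 1-6 in miniature. `brc_index` maps an event type to its unique
    rule configurations; `strategy_index` maps a rule-configuration key to the
    strategies referencing it; `check(brc, event)` evaluates one condition."""
    # Step 1: select the unique rule configurations related to this event type.
    candidates = brc_index.get(event["type"], [])
    # Step 2: evaluate each candidate rule condition against the event.
    matching = [brc for brc in candidates if check(brc, event)]
    matched_keys = {brc_key(brc) for brc in matching}
    signals, seen = [], set()
    for brc in matching:
        # Step 3: expand each matching configuration to its strategies.
        for strategy in strategy_index[brc_key(brc)]:
            if strategy.name in seen:
                continue
            seen.add(strategy.name)
            # Step 4: determine the strategy's remaining, unchecked rules.
            remaining = [b for b in strategy.rule_configs
                         if brc_key(b) not in matched_keys]
            # Step 5: check them; all() aborts on the first failure.
            if all(check(b, event) for b in remaining):
                # Step 6: every rule matched -> create a trading signal.
                signals.append({"strategy": strategy.name, "event": event})
    return signals
```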
C. Reduce Processing Complexity
As already mentioned, the reduction of processing
complexity is crucial for a proper system design that can
achieve the objectives. The following multi-level approach
to reduce the overall amount of work has been selected.
1) The first strategy to reduce the number of condition
checks is the filtering of similar input events. If, for instance,
two quotes are very close together, one might consider
checking only the first one and dropping the second, since
both lead to similar pattern matching expectations.
Certainly, there is the possibility that particular pattern
matches are not detected, since the first event doesn’t trigger
the match and the second event is not processed due to
filtering. The rate of losses depends on the difference
threshold chosen for dropping subsequent events. This
approach is therefore a compromise between reduction of
processing and accuracy of event detection.
If applied, this type of filtering is done at the beginning of
the CEP pipeline (figure 8).
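A sketch of such an input filter, assuming a relative price-difference threshold per instrument; the threshold value is the tuning knob that trades reduction against accuracy:

```python
class TickFilter:
    """Drop a tick whose price lies within `threshold` (relative) of the
    last forwarded price of the same instrument."""
    def __init__(self, threshold=0.001):
        self.threshold = threshold
        self.last_forwarded = {}        # instrument -> last forwarded price

    def accept(self, instrument, price):
        last = self.last_forwarded.get(instrument)
        if last is not None and abs(price - last) / last < self.threshold:
            return False                # too similar: drop (possible missed match)
        self.last_forwarded[instrument] = price
        return True

f = TickFilter(threshold=0.001)
print(f.accept("IBM", 100.00), f.accept("IBM", 100.01), f.accept("IBM", 101.00))
# True False True
```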
2) Rule condition checks of the same rule with the same
condition threshold values are performed only once, even if
the rules belong to different strategies or SMAs. This is
because within a huge set of SMAs, repeated usage of the
same strategies is very likely, especially if a strategy has
proven superior.
This reduction is done in the first step, “determine related
unique business rule configurations”, where BRCs with
equal parameters are simply discarded. At a later stage of
processing, rule matches are reassigned to those strategies
from which they were removed at this stage.
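A minimal sketch of this deduplication, keying each configuration by rule type plus parameter values (the same composite key as in the pipeline sketch above) and remembering the owning strategies for later reassignment:

```python
def dedupe_rule_configs(strategies):
    """Collapse equal (rule type, parameters) configurations across strategies.
    Returns the unique configurations and a map for reassigning matches later."""
    unique = {}     # key -> representative rule configuration
    owners = {}     # key -> strategies a match must be reassigned to
    for strategy in strategies:
        for brc in strategy.rule_configs:
            key = (brc.rule_type, tuple(sorted(brc.parameters.items())))
            unique.setdefault(key, brc)
            owners.setdefault(key, []).append(strategy)
    return list(unique.values()), owners
```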
3) The last concept to further reduce rule checking is the
filtering of strategies that are not assigned to a certain type
of market event. For example, if a strategy is marked as
designed for options, a stock price message won’t trigger the
execution of the rule checking mechanism: all business rule
configurations that are part of option strategies will be
discarded before rule checking.
Figure 8 : This figure shows the three stages of reducing rule
checks within the market event processing pipeline.
IX. VOLUME ESTIMATIONS
This chapter provides a rough estimation of real-world
figures for a scenario of stock-based trading strategies
applied in the US stock market.
The following table shows initial figures and some derived
estimations:
Initial Figures (Assumptions)
Number of stocks (major US markets): 10'000
Number of stocks with 100 trade ticks per sec: 700
Number of stocks with 10 trade ticks per sec: 2'000
Number of stocks with 1 trade tick per sec: 7'000
Total number of trades per second: 97'000
Number of SMAs (accounts): 1'000
Number of strategies per SMA: 3
Number of rules per strategy: 10

Derived Figures
Primary rule checks per second without any reduction (= number of trade ticks): 97'000
Distinct trade ticks after filtering (concept 1, 1-out-of-10): 16'000
Initial rule checks after deduplication (concept 2): 16'000
Assumed rule match ratio: 5.0%
Event-triggered rule matches per second: 800
Related strategies after 3× expansion: 2'400
Strategies related to the market segment (concept 3, 20%): 480
Remaining rule checks to evaluate these strategies completely: 4'320
Typical number of strategy matches (5%): 24
A brief explanation of the derived figures is given here.
The reduction of trade ticks from 97'000 to 16'000 is
calculated as follows: of the 700 stocks with 100 ticks per
second, after filtering (1 out of 10) only 7'000 ticks remain.
Of the 2'000 stocks with 10 ticks per second, only 2'000
ticks remain. And for the 7'000 stocks with 1 tick per
second, all 7'000 ticks are used. This results in 16'000 ticks
per second at the input of the rule processing chain.
Independent of the number of equal rule configurations,
all 16'000 ticks per second are subject to initial rule
checking. The reduction of equal rule configurations in
different strategies (concept 2) ensures that the ticks are
analysed only once, which leads to 16'000 initial rule check
operations.
A typical matching ratio for these rule checks is 5%,
which leads to 800 initial rule matches per second.
The next step is to determine all trading strategies to
which the matching rules apply. Due to the reduction of
equal rule configurations, this number is typically a multiple
of the number of initial matching rules. In this scenario we
have used a factor of 3, which leads to 2'400 related
strategies.
Now we take out all strategies that are not assigned to
the market segment or stock exchange of the triggering price
event. We have chosen a 20% ratio between stock event and
strategy assignment. This means that if the event is an IBM
price tick originating from the NYSE, but the related
strategy is assigned to NASDAQ stocks only, then the
strategy is not investigated further for this event. This
reduces the number of strategies subject to further
investigation to 480.
If the average trading strategy consists of 10 rule
conditions, then 9 remaining rule checks have to be
performed for all the 480 strategies. This leads to 4’320 rule
checks.
If we assume a 5% likelihood that all remaining rule
conditions are matching, then 24 strategy matches are
determined per second.
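The whole chain of reductions can be verified with a few lines of arithmetic; every ratio below is an assumption stated in the table, not a measured value.

```python
ticks_raw = 700 * 100 + 2_000 * 10 + 7_000 * 1     # 97,000 trades/sec
ticks_filtered = 700 * 10 + 2_000 * 1 + 7_000 * 1  # 1-out-of-10 on busier stocks
rule_matches = int(ticks_filtered * 0.05)          # 5% match ratio
related_strategies = rule_matches * 3              # expansion factor 3
in_market = int(related_strategies * 0.20)         # 20% market-segment match
remaining_checks = in_market * (10 - 1)            # 9 remaining rules per strategy
strategy_matches = int(in_market * 0.05)           # 5% fully matching

print(ticks_raw, ticks_filtered, rule_matches, related_strategies,
      in_market, remaining_checks, strategy_matches)
# 97000 16000 800 2400 480 4320 24
```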
Within the next and all following seconds a similar scenario
is processed by the system and leads to similar figures, but
duplicate strategy match detection is in place, which
prevents the same strategy matches from being presented to
the user again and again. Nevertheless, the load from
processing the continuous stream of market data has to be
handled before the duplicate detection takes place.
X. CONCLUSIONS
The architecture pattern presented within this article is the
result of our research to address the challenges described in
the previous chapters. The usage of the SEDA pattern is an
approach to implementing an SMA infrastructure that can be
scaled in multiple dimensions: firstly regarding the ever-
increasing volume of market data events (increased traffic
from stock exchanges and other data providers), and
secondly the growing number of accounts, strategies and
rule conditions an SMA business must be able to manage.
In addition to using SEDA to implement the CEP for a
single processing node, the proper partitioning of the overall
market data stream is a key success factor in managing
future market data volumes. The entire market data stream
has to be distributed across a set of processing nodes using a
shared-nothing architecture in terms of event processing.
This means that additional markets or market segments must
be handled by additional processing nodes, each using the
same software infrastructure.
XI. AREAS OF FURTHER RESEARCH
Further research has to be done on how the system
behaves in real-world market situations with different
configurations. The statistical analysis of different market
event types and the related reduction of rule checking
operations is the main focus.
Another important area is the adaptation of the reduction
strategies (see VIII.C) with regard to the accuracy of pattern
matching results. Some strategies, like the filtering of input
events, lead to a certain level of loss with regard to matching
events.
The filtering of subsequent events of a type that has
already created a pattern match also has to be analysed, to
avoid missing matches with other strategies in which these
events participate within another context.
REFERENCES
[1] “Complex Event Processing”, Wikipedia, http://en.wikipedia.org/wiki/Complex_event_processing
[2] “Trading Strategy”, Investopedia, http://www.investopedia.com/terms/t/trading-strategy.asp
[3] “Architectural Pattern”, Wikipedia, http://en.wikipedia.org/wiki/Architectural_pattern
[4] “Staged Event-Driven Architecture”, Matt Welsh, Harvard University, http://www.eecs.harvard.edu/~mdw/proj/seda/ , http://www.genmaint.com/what-is-seda-staged-event-driven-architecture.html