Complex Event Processing within a Managed Accounts Platform
M. Dieckmann
Stocksinside Ltd, London, UK
This document describes a pattern for processing real-time market events against a large number of trading strategies
within the context of complex event processing and separately managed accounts. The main challenge is to define an
efficient parallel processing architecture for market data analysis that can deal with millions of rule checks per
second, triggered by a broad stream of market events and influenced by a large number of different trading strategies.
Another problem within this area is the efficient reduction of the overall number of rule checks without violating the
completeness of market data analysis: filtering the input events can significantly reduce the number of processed
rule checks, but always at the risk of missing trading opportunities.
I. INTRODUCTION
A key advantage of separately managed accounts (SMA)
as a financial product is the execution of individual trading
strategies according to the investment preferences of the
account owner. Within this context, a trading strategy
defines the “what” and the “when” to buy or to sell. The
selection of the right financial instruments, at the right point
in time is a key element of any investment strategy.
But doing the research for even a single SMA with a single trading strategy can be hard work; if not automated, the
operation of many SMAs, each with many different trading strategies, becomes almost impossible.
Managed Account Platforms (MAP) are software applications that support the automation of the account
management process in terms of asset allocation, risk management and reporting. Each SMA can have multiple
trading strategies defined, whose execution of trade actions has to be controlled by the MAP. The system
has to detect whether a security or other financial instrument matches a strategy and has to be bought or sold
according to that strategy.
This requires the uninterrupted and focused supervision of the stock market and, finally, the generation of trade
signals to take appropriate action. One can roughly estimate the necessary processing power by considering dozens
of conditions per trading strategy (e.g. 100), ten or more strategies per SMA (10), multiplied by thousands of SMAs
(10³) and checked against a million market data events per second (10⁶). This example results in 10¹²
condition checks per second. A very effective reduction of calculations has to be applied in order not to overwhelm
the MAP. This area of concern – widely known as Complex Event Processing – is the subject of the pattern described
within this paper.
II. COMPLEX EVENT PROCESSING
The term Complex Event Processing (CEP) is usually defined as analysing real-time streams of data to detect
meaningful events or patterns and take appropriate action [1]. The term “Complex” within CEP is used because
data from multiple sources and of multiple types has to be analysed against large rule sets.
An event can be defined as a pattern match, where the pattern consists of multiple rules with conditions and
related thresholds. Typically, the data of the real-time stream is compared with historical data within the rule
conditions. In the financial services industry, the real-time data is mostly a market data stream, whereas the
historical data comes from a time series database.
Figure 1 : High-level diagram of CEP data flows within the
SMA domain. The market data stream is analysed by the help of
defined trading strategies and their rules as well as historical data to
create trading signals.
III. PROBLEM AND OBJECTIVES
The major problem in designing a CEP system within the context of separately managed accounts is processing a large
set of trading strategies, each consisting of many rules whose conditions can contain values calculated on the fly,
at the rate of the incoming stream.
As already mentioned in the introduction, processing the unfiltered set of rules does not work well, due to the
enormous number of rule check operations in a typical scenario. That is why the number of rule checks has to be
massively reduced and the overall process has to be parallelized as much as possible.
The objectives addressed within this paper are:
• Design a CEP approach to run a huge number of
trading strategies within the context of a MAP.
• Utilize a parallel processing architecture pattern
to efficiently distribute the calculation load.
• Significantly reduce the number of calculation steps
while processing the stream of market data events.
2. Nov. 2008 Complex Event Processing within a Managed Account Platform pg. 2
IV. TRADING STRATEGIES
Before we go into the details of system design, the theory
and definition of trading strategies should be explained
within this chapter.
The term “Trading Strategy” can be defined as follows: A
set of objective rules designating the conditions that must be
met for trade entries and exits to occur [2].
Since a trading strategy is defined as a set of rules, a closer look at these elements of a strategy should be
provided.
A single rule, used as a building block of a strategy, can be defined as a Boolean-valued function that maps a
vector of the input domain X (x1 to xn) to the Boolean set B = {0, 1}.
f: X → B
A rule can also be described as a condition in Boolean algebra, with one value compared to another value by means
of a relational operator. These values are usually the result of a function (e.g. moving average of prices or
price/earnings ratio). The Boolean function f(X) equals 1 if the condition is true.
Example: last price > SMA (10d) * 1.10
(SMA = simple moving average)
A trading strategy usually consists of many conditions that all have to be fulfilled in order to represent a
strategy match. This can be expressed by the following Boolean-valued function, where a k-ary vector of Boolean
conditions is mapped to the result set B.
f: Bᵏ → B
Every k-ary Boolean function can be expressed as a propositional formula with k propositions (conditions)
connected via Boolean operators (e.g. conjunction, etc.).
Strategy: c1 ∧ c2 ∧ … ∧ ck = 1 (if c1 to ck = 1)
An example from the market data analysis area could be:
P/E < 10 AND Last-Price < SMA (10) * (1 – 0.1)
This means: if the price/earnings ratio of a company is smaller than 10 and the last price is smaller than 90% of
the simple moving average of the last 10 days, then the Boolean algebra equation is true (1) and the strategy is
considered matched.
Trading strategies in real world scenarios usually have 10
to 20 conditions from various domains and incorporate a
similar set of complex functions (e.g. SMA) within the
conditions.
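A minimal sketch of these definitions in Python, using the P/E and SMA example above (the names Rule, Strategy and sma are illustrative, not part of the paper):

```python
def sma(prices, n):
    """Simple moving average of the last n prices."""
    window = prices[-n:]
    return sum(window) / len(window)

class Rule:
    """A rule is a Boolean-valued function f: X -> {0, 1} over market data."""
    def __init__(self, condition):
        self.condition = condition
    def check(self, data):
        return bool(self.condition(data))

class Strategy:
    """A strategy matches only if ALL of its rules match (c1 AND ... AND ck)."""
    def __init__(self, rules):
        self.rules = rules
    def matches(self, data):
        return all(rule.check(data) for rule in self.rules)

# Example from the text: P/E < 10 AND Last-Price < SMA(10) * (1 - 0.1)
strategy = Strategy([
    Rule(lambda d: d["pe_ratio"] < 10),
    Rule(lambda d: d["last_price"] < sma(d["prices"], 10) * 0.9),
])

data = {"pe_ratio": 8.0, "last_price": 80.0, "prices": [100.0] * 10}
print(strategy.matches(data))  # both conditions hold -> True
```

A real strategy would carry 10 to 20 such rules, but the conjunction logic stays the same.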
V. ARCHITECTURE PATTERNS
An architectural pattern is a concept that solves and
delineates some essential cohesive elements of software
architecture [3].
In the area of parallel processing, the “pipeline” pattern is
a well-known architecture pattern to parallelize calculations.
It’s based on the assumption that complex data processing
can be separated into stages, which are executed in parallel.
Figure 2 : Data chunks (D1-3) arriving at the left side of the
pipeline are processed by several stages (S1-3). When S1 has
finished processing D1, this data chunk is forwarded to S2 and S1
starts processing D2 and so on. In a continuous stream of data
chunks Dx, each stage Sn is always processing a certain data chunk.
This pattern works well if the various stages require similar time to process each data chunk. If one stage needs
significantly more time to complete, the data chunks queue up before that stage and the overall throughput is
limited by the slowest stage.
To mitigate this effect, the stages themselves have to provide parallel processing capabilities. This can be done
by assigning multiple threads (a thread pool) to each stage. As in the pipe-and-filter pattern, the edges between
the stages are realized by message queues, which leads to the following processing pattern.
Figure 3 : Stages Sx are separated by input queues Qx and
contain multiple threads (m, n, k), which read messages from the
queues, whenever they are idle.
The architecture pattern illustrated in figure 3 is also called staged event-driven architecture (SEDA) and refers
to an approach that decomposes a complex, event-driven application into a set of stages connected by queues [4].
This pattern ensures that the processing of multiple data chunks is optimized in such a way that other stages do
not wait while a data chunk Di is processed by a certain stage. Additional parallelization can be achieved by using
multiple pipelines and input stream partitioning.
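The SEDA structure of figure 3 can be sketched in a few lines of Python. This is an illustrative toy (the stage functions and thread counts are placeholders, not from the paper), not the production design:

```python
# SEDA sketch: stages connected by queues, each stage served by a thread pool.
import queue
import threading

def run_stage(in_q, out_q, work, n_threads):
    """Start n_threads workers that read from in_q, apply work, write to out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is None:            # poison pill: shut this worker down
                in_q.task_done()
                break
            result = work(item)
            if out_q is not None:
                out_q.put(result)
            in_q.task_done()
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    return threads

q1, q2, results = queue.Queue(), queue.Queue(), queue.Queue()
t1 = run_stage(q1, q2, lambda x: x * 2, n_threads=2)       # stage S1
t2 = run_stage(q2, results, lambda x: x + 1, n_threads=3)  # stage S2

for chunk in range(5):                  # data chunks D1..D5 enter the pipeline
    q1.put(chunk)
q1.join()                               # wait until S1 has drained its queue
q2.join()                               # wait until S2 has drained its queue
for t in t1:
    q1.put(None)
for t in t2:
    q2.put(None)
for t in t1 + t2:
    t.join()

out = sorted(results.get() for _ in range(5))
print(out)  # [1, 3, 5, 7, 9]
```

Because each stage pulls from its own queue whenever a thread is idle, a slow chunk in one stage does not block the other stages.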
VI. LOGICAL DATA MODEL
Within the context of a managed accounts platform, the
following logical data model is the foundation of the
described processing of market data events.
Figure 4 : Logical Data Model of CEP/MAP approach. Each
trading strategy consists of multiple business rules and related
configuration objects that contain the rule parameters (e.g.
thresholds and function parameters). Each pattern match refers to a
single financial instrument.
In stream processing systems the entire computation starts with the reception of input data objects, which in this
case are the market data events of the inbound stream. Market data events belong to a financial instrument and, in
the model illustrated above, they also belong to business rule configurations (BRC), which are part of trading
strategies.
When a market data event arrives, the computational
navigation through the logical data model starts with the
evaluation of related business rule configurations and the
concrete business rule conditions to be checked. If a rule
condition matches, the remaining business rules of the
related trading strategy definition are checked as well.
If all business rules of a trading strategy match, a pattern match is created. This object already represents a
trading signal to act upon. Since the original market data event is related to a financial instrument, the pattern
match is related to the same instrument. If the user decides to take action, the financial instrument becomes part
of a portfolio, which in turn is tied to a managed account on behalf of the user.
Trading strategies are usually not operated without reference to a certain market, which is why they are assigned
to a market segment, which in turn contains any number of financial instruments. This means that trading strategies
within our model are related to the instruments of a market segment and that different strategy configurations can
be executed within different market segments. The reason is that the same strategy, from a structural point of
view, may require slightly different parameters when applied in different market segments.
VII. INPUT STREAM PARTITIONING
Within the context of complex event processing, a market data stream consists of an unlimited number of continuous
market data events related to a limited number of financial instruments. If the single financial instrument is
taken as the sorting criterion, the overall stream can be seen as a limited number of logical streams, each
comprising all kinds of events related to a single financial instrument.
Input stream partitioning in this context means grouping the logical streams into sets of financial instruments to
be processed by dedicated pipelines.
Figure 5 : Market events related to a financial instrument FIk,
k = {1..M}, which represent a single logical stream, can be further
divided into logical sub-streams, containing events of certain event
types ETr, r = {1..N}.
To retain the logical order of the events within a logical stream and to achieve the highest processing throughput
for that stream, each logical stream must be processed by a dedicated stream processing pipeline. If the events of
different sub-streams are uncorrelated in time and certain event types could cause significant delay, the
processing of these sub-streams can be distributed across several specialized processing pipelines.
In our case all sub-streams (event types) of a logical
stream are processed by the same pipeline instance.
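A hypothetical sketch of this partitioning scheme: events are routed by a stable hash of the instrument identifier, so each logical stream stays on one pipeline and keeps its arrival order (the pipeline count and event tuples are illustrative):

```python
# Input stream partitioning by instrument identifier.
import hashlib

N_PIPELINES = 4

def pipeline_for(instrument_id, n_pipelines=N_PIPELINES):
    """Deterministically map an instrument to a pipeline index."""
    # Built-in hash() is randomized per run for strings; use a stable digest.
    digest = hashlib.md5(instrument_id.encode()).hexdigest()
    return int(digest, 16) % n_pipelines

events = [("IBM", 101.5), ("AAPL", 12.3), ("IBM", 101.6), ("MSFT", 27.1)]
partitions = {}
for instrument, price in events:
    partitions.setdefault(pipeline_for(instrument), []).append((instrument, price))

# All IBM events end up in the same partition, in arrival order.
ibm_events = [e for e in partitions[pipeline_for("IBM")] if e[0] == "IBM"]
print(ibm_events)  # [('IBM', 101.5), ('IBM', 101.6)]
```

Routing by instrument is what makes a shared-nothing deployment possible: each pipeline (or node) only ever sees its own instruments.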
VIII. SYSTEM ARCHITECTURE
This chapter describes different aspects of the system
architecture.
A. Overview
The main objective of any CEP platform is to determine
pattern matches out of a stream of data. The following figure
shows the main building blocks of the system described
within this paper.
Figure 6 : The figure shows a simplified view of the system
architecture of the CEP platform. Beginning at the left side, several
market data sources are normalized and provide a stream of market
events to the CEP component, which determines pattern matches.
These are then delivered to the particular users.
Typical for CEP applications is a normalized stream of data created from different data sources. A normalization
component is responsible for creating this input stream, which is forwarded to the CEP engine and the historical
data store. As already mentioned in the previous chapter, this overall stream can be seen as a set of many logical
streams, each related to a single financial instrument.
Historical data is often needed to calculate complex parameters that are used within rule condition checks and
compared with current data extracted from the stream. For example, the 50-day moving average of prices is a
function that requires historical price data (e.g. closing prices).
The CEP engine processes the input data stream by continuously checking all rule conditions of all strategies. If
all rule conditions of a strategy match, a strategy pattern match has occurred; it is then stored within a database
for later retrieval and forwarded to a delivery component for direct notification of the related users. The
delivery component normally has to use historical data to visualize pattern matches.
The major part of processing and complexity lies within
the CEP component, which is described in more detail
within the next sections.
B. Event Processing Pipeline
The determination of trading signals from market events is
a multi-stage process that can be seen as a processing
pipeline. The individual processing steps are illustrated
within the next figure.
Figure 7 : The figure shows the activities and result objects
within the market event processing pipeline.
When a market event arrives at the beginning of the
pipeline the following steps are executed:
1. The market event is categorized by type and the
related business rule types are determined. In addition,
all related business rule configurations are selected.
These are concrete rule specifications that are part of
real strategy definitions and contain the rule condition
parameters to be used within rule checking. (Only those
rule configurations are selected that are part of a
strategy which should be applied to this market event
from an instrument type perspective, e.g. option price ≈
option strategy.)
2. All business rule configurations are evaluated during
the rule checking activity. This means that the
parameters of a single business rule configuration,
together with the current market event, are loaded into
the check function of the business rule and evaluated.
If the condition of the rule check is fulfilled, the
business rule configuration is considered a matching
rule configuration.
3. Based on the matching business rule configurations,
the related strategy definitions are determined.
4. For each strategy the remaining business rule
configurations are determined. Usually, a trading
strategy consists of many business rule configurations
that all have to be checked in order to detect a
strategy match that is equal to a trading signal.
5. For each strategy the unchecked business rule
configurations are checked. If any of the remaining
rule checks fails, the checking is aborted for the
particular strategy.
6. For all trading strategies with completely checked
business rule configurations, a trading pattern match,
also known as a trade signal, is created and persisted.
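The six steps above can be condensed into a small, hypothetical Python sketch; the data structures (BRCS, STRATEGIES) and the trigger mechanism are simplified stand-ins for the real platform objects:

```python
# A business rule configuration: a check function plus its parameters.
BRCS = {
    "brc1": {"rule": lambda ev, p: ev["price"] < p["limit"], "params": {"limit": 50}},
    "brc2": {"rule": lambda ev, p: ev["volume"] > p["min_vol"], "params": {"min_vol": 100}},
}
# Strategies reference the BRCs whose conditions must ALL hold.
STRATEGIES = {"s1": ["brc1", "brc2"], "s2": ["brc1"]}

def process_event(event, trigger_brcs=("brc1",)):
    """Steps 1-6: check trigger rules, expand to strategies, check the rest."""
    # Steps 1-2: check the rule configurations triggered by this event type.
    matching = {b for b in trigger_brcs
                if BRCS[b]["rule"](event, BRCS[b]["params"])}
    # Step 3: determine the strategies that reference a matching rule.
    candidates = [s for s, brcs in STRATEGIES.items()
                  if matching.intersection(brcs)]
    # Steps 4-5: check the remaining rules of each candidate strategy.
    signals = []
    for s in candidates:
        remaining = [b for b in STRATEGIES[s] if b not in matching]
        if all(BRCS[b]["rule"](event, BRCS[b]["params"]) for b in remaining):
            signals.append(s)          # step 6: a strategy match = trade signal
    return signals

event = {"price": 42, "volume": 150}
print(process_event(event))  # ['s1', 's2']
```

Note how checking stops early for a strategy as soon as one remaining rule fails, which is exactly the abort behaviour of step 5.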
C. Reduce Processing Complexity
As already mentioned, the reduction of processing
complexity is crucial for a proper system design that can
achieve the objectives. The following multi-level approach
to reduce the overall amount of work has been selected.
1) The first strategy to reduce the number of condition checks is the filtering of similar input events. If, for
instance, two quotes are very close together, one might consider checking only the first one and dropping the
second one, since both lead to similar pattern matching expectations.
Certainly, there is the possibility that particular pattern matches are not detected, since the first event does
not trigger the match and the second event is not processed due to filtering. The rate of losses depends on the
difference threshold chosen for dropping subsequent events. This is the reason why this approach is a compromise
between reduction of processing and accuracy of event detection.
If applied, this type of filtering is done at the beginning of
the CEP pipeline (figure 8).
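A minimal sketch of such an input filter, assuming a simple relative-difference threshold per instrument (the class name TickFilter and the 0.1% threshold are illustrative, not from the paper):

```python
# Drop a quote when it is within a relative threshold of the last quote
# that was forwarded for the same instrument.
class TickFilter:
    def __init__(self, threshold=0.001):   # 0.1% relative difference
        self.threshold = threshold
        self.last_forwarded = {}            # instrument -> last price passed on

    def accept(self, instrument, price):
        last = self.last_forwarded.get(instrument)
        if last is not None and abs(price - last) / last < self.threshold:
            return False                    # too close to the last tick: drop
        self.last_forwarded[instrument] = price
        return True

f = TickFilter(threshold=0.001)
ticks = [100.00, 100.01, 100.05, 100.30, 100.31]
passed = [p for p in ticks if f.accept("IBM", p)]
print(passed)  # [100.0, 100.3]
```

A larger threshold drops more ticks (less processing) but raises the chance that a rule threshold sitting between two forwarded prices is crossed unnoticed.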
2) Rule condition checks of the same rule with the same condition threshold values are only done once, even if the
rules belong to different strategies or SMAs. This is because, within a huge set of SMAs, the repeated usage of the
same strategies is very likely, especially if a strategy has proven superior.
This reduction is done in the first step, “Determine related unique business rule configurations”, where BRCs with
equal parameters are simply discarded. At a later stage of processing, rule matches are reassigned to those
strategies from which they were removed at this stage.
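Reduction concept 2 can be illustrated with a short sketch that groups business rule configurations by (rule type, parameters), checks one representative per group, and fans a match back out to all strategies in the group (all names are illustrative):

```python
# Deduplicate business rule configurations with equal rule type and parameters.
brcs = [
    {"id": "a", "strategy": "s1", "rule": "pe_below", "params": (10,)},
    {"id": "b", "strategy": "s2", "rule": "pe_below", "params": (10,)},  # duplicate
    {"id": "c", "strategy": "s3", "rule": "pe_below", "params": (15,)},
]

# Group by (rule, params); only one representative per group is checked.
groups = {}
for brc in brcs:
    groups.setdefault((brc["rule"], brc["params"]), []).append(brc)

unique_checks = len(groups)           # 2 checks instead of 3
print(unique_checks)

# After checking, fan a match out to all strategies in the group.
match_key = ("pe_below", (10,))       # suppose this representative matched
matched_strategies = [b["strategy"] for b in groups[match_key]]
print(matched_strategies)  # ['s1', 's2']
```

The fan-out step is what produces the "expansion factor" used later in the volume estimation.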
3) The last concept to further reduce rule checking is the filtering of strategies that are not assigned to a
certain type of market event. For example, if a strategy is marked as designed for options, a stock price message
will not trigger the execution of the rule checking mechanism. All business rule configurations that are part of
option strategies will be discarded before rule checking.
Figure 8 : This figure shows the three stages of reducing rule
checks within the market event processing pipeline.
IX. VOLUME ESTIMATIONS
This chapter provides a rough estimation of real-world figures for a scenario of stock-based trading strategies
applied in the US stock market.
The following table shows initial figures and some derived
estimations:
Initial Figures (Assumptions)
Number of stocks (major US markets): 10'000
Number of stocks with 100 trade ticks per sec: 700
Number of stocks with 10 trade ticks per sec: 2'000
Number of stocks with 1 trade tick per sec: 7'000
Total number of trades per second: 97'000
Number of SMA (accounts): 1'000
Number of strategies per SMA: 3
Number of rules per strategy: 10
Derived Figures
Number of primary rule checks per second without any reduction (= number of trade ticks): 97'000
Number of different trade ticks after filtering (concept 1, 1-out-of-10): 16'000
Number of rule checks after applying concept 2: 16'000
Rule match ratio: 5.0%
Number of event-triggered rule matches per second: 800
Number of related strategies after expansion (factor 3): 2'400
Number of strategies related to the market segment (20%, concept 3): 480
Number of rule checks to check these strategies completely: 4'320
Typical number of strategy matches per second (5%): 24
A brief explanation of the derived figures is given here.
The reduction of trade ticks from 97'000 to 16'000 is calculated as follows: from the 700 stocks with 100 ticks per
second, after filtering (1 out of 10) only 7'000 ticks remain. From the 2'000 stocks with 10 ticks per second, only
2'000 ticks remain. And from the 7'000 stocks with 1 tick per second, all 7'000 ticks are used. This results in
16'000 ticks per second at the input of the rule processing chain.
Independent of the number of equal rule configurations, all 16'000 ticks per second are subject to initial rule
checking. The reduction of equal rule configurations in different strategies (concept 2) ensures that each tick is
analysed only once, which leads to 16'000 initial rule check operations.
A typical matching ratio for these rule checks is 5%, which leads to 800 initial rule matches per second.
The next step is to determine all trading strategies where
the matching rules apply. Due to the reduction of equal rule
configurations, this number is typically a multiple of the
initial matching rules. In this scenario we have used a factor
of 3, which leads us to 2’400 related strategies.
Now we remove all strategies that are not assigned to the market segment or stock exchange of the triggering price
event. We have chosen a 20% ratio between stock event and strategy assignment. This means that if the event is an
IBM price tick originating from the NYSE, but the related strategy is assigned to NASDAQ stocks only, then the
strategy is not investigated further for this event. This reduces the number of strategies subject to further
investigation to 480.
If the average trading strategy consists of 10 rule
conditions, then 9 remaining rule checks have to be
performed for all the 480 strategies. This leads to 4’320 rule
checks.
If we assume a 5% likelihood that all remaining rule
conditions are matching, then 24 strategy matches are
determined per second.
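The derived figures can be recomputed step by step; the following sketch simply reproduces the arithmetic of the table under the stated assumptions:

```python
# Volume estimation for the US stock market scenario, step by step.
total_ticks = 700 * 100 + 2_000 * 10 + 7_000 * 1   # all trades per second
filtered_ticks = 700 * 100 // 10 + 2_000 * 10 // 10 + 7_000  # 1-out-of-10 filter
initial_matches = int(filtered_ticks * 0.05)       # 5% rule match ratio
related_strategies = initial_matches * 3           # fan-out factor 3 (concept 2)
in_market = int(related_strategies * 0.20)         # 20% market-segment ratio
remaining_checks = in_market * 9                   # 9 remaining rules each
strategy_matches = int(in_market * 0.05)           # 5% full-match ratio

print(total_ticks, filtered_ticks, initial_matches, related_strategies,
      in_market, remaining_checks, strategy_matches)
# 97000 16000 800 2400 480 4320 24
```

The 1-out-of-10 filter is applied only to the 100-tick and 10-tick stocks; the 7'000 slow stocks pass through unfiltered.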
Within the next and all following seconds, a similar scenario is processed by the system and leads to similar
figures, but duplicate strategy match detection is in place to avoid presenting the same strategy matches to the
user again and again. Nevertheless, the load from processing the continuous stream of market data has to be handled
before the duplicate detection takes place.
X. CONCLUSIONS
The architecture pattern presented in this article is the result of our research into addressing the challenges
described above. The use of the SEDA pattern is an approach to implementing an SMA infrastructure that can scale in
multiple dimensions: firstly regarding the ever-increasing volume of market data events (increased traffic from
stock exchanges and other data providers), and secondly the growing number of accounts, strategies and rule
conditions an SMA business must be able to manage.
In addition to using SEDA to implement CEP on a single processing node, the proper partitioning of the overall
market data stream is a key success factor in managing future market data volumes. The entire market data stream
has to be distributed across a set of processing nodes using a shared-nothing architecture in terms of event
processing. This means that additional markets or market segments must be handled by additional processing nodes,
each using the same software infrastructure.
XI. AREAS OF FURTHER RESEARCH
Further research has to be done on how the system behaves in real-world market situations with different
configurations. The statistical analysis of different market event types and the related reduction of rule checking
operations is the main focus.
Another important area might be the adaptation of the reduction strategies (see section VIII.C) with regard to the
accuracy of pattern matching results. Some strategies, like the filtering of input events, lead to a certain level
of loss with regard to matching events.
Also, the filtering of subsequent events of the same type that have already created a pattern match has to be
analysed with regard to not missing matches with other strategies in which these events participate within another
context.
REFERENCES
[1] “Complex Event Processing”, Wikipedia, http://en.wikipedia.org/wiki/Complex_event_processing
[2] “Trading Strategy”, Investopedia, http://www.investopedia.com/terms/t/trading-strategy.asp
[3] “Architectural Pattern”, Wikipedia, http://en.wikipedia.org/wiki/Architectural_pattern
[4] “Staged Event-Driven Architecture”, Matt Welsh, Harvard University, http://www.eecs.harvard.edu/~mdw/proj/seda/, http://www.genmaint.com/what-is-seda-staged-event-driven-architecture.html