Entity Profiling and Collusion Detection

1 ENGINEER
A Novel Entity Profiling and Collusion Detection
Algorithm
Abstract: Ensuring an efficient market and a level playing field is the province of Market
Surveillance, within which detecting and deterring collusive behaviour is a priority among regulators.
Market participants no longer attempt to manipulate the market single-handedly; indeed, it can
be argued that this is no longer possible. In this paper we present a novel trader profiling and collusion detection
algorithm that models trading characteristics and detects collusive trading behaviour. Traders place their
orders in response to market conditions and the demand and supply for the security as observed in the
order book. In the absence of information asymmetry, we would expect to see groups of traders following
similar trading strategies in search of profit, or fulfilling other roles such as the provision of
liquidity.
The study of such groups of traders and their inter-relationships provides insights into those groups
that are distinctly different from the rest of the field. These outliers, when profiled by a set of features of
their trading behaviour, provide indications of their motivations in the market.
We employ two novel approaches to detecting potential collusive behaviour. In the first, the cumulative
effect of trading between each pair of traders and their overall standing in the market in terms of the
total number of trades and the total volume traded is observed. In the second, we create overlapping
groups of traders by “fuzzy clustering” a set of features that characterize their trading behaviour and
identify collusive behaviour through a process of cluster profiling and outlier detection.
Keywords: collusion detection, graph mining, machine learning, market manipulation, market
surveillance, outlier detection
1. Introduction
In this paper we present a novel algorithm that profiles entities according to their trading behaviour and characteristics. Profiles can be derived at the client, trader, broker, and security level. The attributes on which the entities are grouped are user definable or can be selected from a predefined set designed to distinguish behavioural properties between the entities.
A fundamental result of the profiling is the determination of groups of entities that behave similarly (in relation to a set of behavioural features) and display similar characteristics. Entities that differ markedly from the rest are classed as outliers in terms of their behaviour, which provides a means of tracking those that require attention.
The relationships between entities are estimated on different criteria depending on the type of interaction. In the principal approach, a fuzzy clustering algorithm is applied to a feature set that characterizes the trading behaviour of the entities to determine a set of overlapping fuzzy clusters.
The degree of membership in a particular cluster is a measure of the likelihood of an entity belonging to that cluster and serves as a means to determine the degree of correlation between the group behaviours of the entities. This vector of probabilities associated with each entity can be used as the basis of comparison to determine those entities that behave very differently from the rest.
The algorithm is able to match behavioural patterns based on a variety of behavioural features in order to detect outliers. In another configuration of the algorithm, the amount of trading between each pair of entities is taken as the measure of the strength of their relationship. Outlier detection performed on the collection of pairs, with a focus on detecting those that are highly related, results in a network of entities bound by their direct trading relationships.
Asoka Korale, Millennium IT
Fuard Ahamed, Millennium IT
Kaushalya Kularatnam, London Stock Exchange
Liam Smith, London Stock Exchange

The algorithm is thus robust with respect to the type of entity and the kind of approach used to profile it, providing a host of insights that are only possible with such a versatile hybrid technique.
2. Current State of the Art in Entity
behaviour Profiling and Collusion
Detection
The current state of the art with respect to trader
profiling primarily relies on classifying the
traders according to their trading behaviour, the
characteristics of the traders or their association
with one another.
The traders can be classified according to their
trading behaviour into the two broad categories
of algorithmic traders and human traders.
Algorithmic traders are those who rely almost
exclusively on algorithms to trade and base
their trading strategies on automated systems,
whereas human traders make the trading
decisions themselves based on observed market
conditions and other information that is material
to the market. Algorithms, which are automated
systems or computer programs, operate by
considering market conditions as evidenced by
the state of the order book, news and a host of
other structured and unstructured data, and are
able to execute trades at very low latencies. As a
result, algorithmic trading may have an
unexpected and catastrophic impact on the
market when many algorithms trade in quick
succession and in response to the decisions
taken by other algorithms, resulting in a cascade
of actions. This activity can lead to rapid rises or
declines in many stocks simultaneously across
the market over very short time intervals, leading
to high volatility and even market crashes. The
‘Flash Crash of 2015’ is a good example of what
could occur when algorithms go into panic
mode.
Human trading, on the other hand, relies on
experience and a holistic understanding of
market conditions and the general economic and
political environment and can outperform
algorithms particularly in the absence of
volatility.
Another means of classification of the trader is
into the categories of day trader and market
maker. A day trader takes advantage of
temporary inefficiencies in the market, as
evidenced by the imbalances in the order book
resulting from fluctuations in the supply and
demand for the security at a given point in time,
to profit from the volatility in price [3]. Typically
such traders close out their positions at the end
of the day. Market makers, on the other hand,
deal in securities or other assets and undertake
to buy or sell at specified prices at all times [4].
The traders can also be profiled with respect to
their relationships with other traders, which
provides a third means of classification. The
relationship between traders is reflected in the
way they trade in the market. If there are
similarities in the trading behaviour over time or
in the strategy of trading employed by the
traders then a relationship can be said to exist
between such groups. An unusual amount of
trading (buying from and selling to) between a
group of traders is also evidence of a strong
relationship between the members of this group.
The relationship between traders can then be
expressed as a network of interactions where
each pair of related traders is linked in a network
diagram. The strength or weight of each link is
representative of the strength of the relationship
or the strength of the interaction. Such networks
when mined using graph theoretic measures
provide clues to the key actors, communities and
other useful characteristics.
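
As a minimal sketch of one such graph theoretic measure, the weighted degree (node strength) of each trader can be computed directly from the link weights. The trader identifiers and weights below are hypothetical, and the choice of weighted degree as the measure is an assumption for illustration rather than something prescribed in this paper.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum the weights of all links touching each trader (node strength)."""
    strength = defaultdict(float)
    for (a, b), weight in edges.items():
        strength[a] += weight
        strength[b] += weight
    return dict(strength)


# Hypothetical link weights: number of trades between each pair of traders.
edges = {("T1", "T2"): 40, ("T1", "T3"): 2, ("T2", "T3"): 3, ("T3", "T4"): 1}

strength = weighted_degree(edges)
# Traders ordered from the most to the least strongly connected.
ranked = sorted(strength, key=strength.get, reverse=True)
```

Traders at the top of such a ranking are candidates for the key actors of the network; community detection would require a fuller graph library and is not shown here.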
3. Behaviour Profiling and
Collusion Detection Algorithm
The algorithm employs two principal approaches to collusion detection. In the first, the similarity of trading behaviour is modelled in an attempt to detect small groups of similarly behaving entities that are very different from the rest of the entities. In other words, it attempts to detect those outlier entities that share similar behavioural characteristics. The argument is that most entities should fall into large groups that exhibit similar behaviours, while small groups of similarly behaving entities that are at the same time very different from the rest of the field (those that exhibit anomalous behaviours) are cause for concern or worthy of further investigation. When these outlier groups exhibit behaviours consistent with collusion, it is also quite likely that those entities exhibit strong relationships with each other, forming a collusive clique.
In the second approach we model the direct trading relationship between each pair of entities and detect those entities that exhibit a strength of relationship well above the norm. Entities so strongly related provide sufficient evidence to raise suspicion of collusive behaviour. This is because it is very difficult to prearrange the parties to a trade, or to determine which party will buy and which will sell in a particular transaction, especially in the case of heavily traded securities, so a pair that trades with each other unusually often is unlikely to have done so by chance.
In both approaches it is key that appropriate profiling features are selected to help detect the desired suspicious behaviours and identify those entities that are responsible.
3.1 Behaviour Profiling Approach
The essence of the algorithm lies in its ability to group entities according to a user defined profile of their trading behaviour, trader characteristics and their relationships to one another. Thus the algorithm is flexible and robust with respect to all of the desired profiling criteria that current methods provide, combining them in a single hybrid technique.
We employ the well-known Fuzzy C-Means clustering to find “fuzzy” groups, that is, groups of entities with overlapping characteristics or fuzzy memberships.
The membership function of each entity provides a measure of the degree to which that entity belongs to each cluster. This fuzzy membership allows us to correlate the group membership behaviour of entities with each other.
The algorithm employs the novel idea that the group membership function of each entity can be employed as a probability density function (strictly, a discrete probability distribution over the clusters, since the memberships are non-negative and sum to one). This vector of probabilities is then used to establish “correlations” between the clustered entities and gauge the strength of the relationship between them. In this regard the strength of the relationship is another measure of, or proxy for, the degree of similarity between entities.
The correlation coefficient measures the degree of dependence between two variables [1]. It can be thought of as expressing how closely, and in which direction, the variables move together.
$$\rho_{XY} = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} \qquad (1)$$
$$\mathrm{COV}(X, Y) = E\{(X - E(X))(Y - E(Y))\} \qquad (2)$$
$$-1 \le \rho_{XY} \le 1 \qquad (3)$$
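
The correlation coefficient can be applied to a pair of membership vectors as in the minimal sketch below. It assumes that the memberships of two entities across the clusters are treated as paired observations, which is one reading of the text rather than a confirmed detail of the authors' implementation; the function name and the example vectors are hypothetical.

```python
import numpy as np


def membership_correlation(u_a, u_b):
    """Correlation coefficient of two fuzzy membership vectors, per equations (1)-(2)."""
    u_a = np.asarray(u_a, dtype=float)
    u_b = np.asarray(u_b, dtype=float)
    # Equation (2): covariance as the mean product of deviations from the mean.
    cov = np.mean((u_a - u_a.mean()) * (u_b - u_b.mean()))
    # Equation (1): normalise by the two (population) standard deviations.
    return cov / (u_a.std() * u_b.std())


# Hypothetical membership vectors over three clusters (each sums to one).
u_a = [0.7, 0.2, 0.1]
u_b = [0.6, 0.3, 0.1]   # similar group behaviour to u_a
u_c = [0.1, 0.2, 0.7]   # dominated by a different cluster

r_ab = membership_correlation(u_a, u_b)
r_ac = membership_correlation(u_a, u_c)
```

Here `r_ab` is close to +1 and `r_ac` is negative, reflecting that the first two entities share their dominant cluster while the third does not. A vector with equal membership in every cluster has zero variance, so the coefficient is undefined for it and would need separate handling.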
The group membership function, which takes the form of a probability density function, allows us to employ techniques for the comparison of probability densities to determine those entities with group membership behaviours most divergent from the rest of the entities.
This process can be considered a form of outlier detection, where we detect those entities that exhibit group membership behaviours most different from the rest by comparing the group membership function of each entity with those of the rest of the entities. This process also enables us to determine those entities with group behaviour that is most like that of the rest of the entities.
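
The comparison of membership functions can be sketched as follows. The text does not name the density comparison it uses, so the Hellinger distance here is an assumption chosen for illustration, as are the function names and the example membership matrix.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def divergence_scores(U):
    """Average distance from each entity's membership vector to all the others."""
    U = np.asarray(U, dtype=float)
    n = U.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        scores[i] = np.mean([hellinger(U[i], U[k]) for k in range(n) if k != i])
    return scores


# Hypothetical memberships of four entities in three clusters.
U = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.15, 0.10],
    [0.05, 0.05, 0.90],   # behaves unlike the other three
])

scores = divergence_scores(U)
most_divergent = int(np.argmax(scores))   # index of the strongest outlier candidate
most_typical = int(np.argmin(scores))     # index of the entity most like the rest
```

The same scores serve both purposes mentioned above: the largest values point to outlier candidates and the smallest to entities whose group behaviour is most like the rest.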
3.2 Direct Trading Relationship Approach
In the second approach we examine the direct trading relationship between every pair of entities. The total number of trades and the total volume of trades between every pair of entities are used to identify outliers, that is, those pairs of entities that have traded a number of times and a volume of shares far in excess of the rest of the field.
The data can be represented in a two dimensional histogram which captures the variation in the number of trades and the total volume traded between every pair of entities at once. The two dimensional histogram is used to detect outliers by capturing those pairs of entities that exhibit extreme values.
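
A minimal sketch of this idea using NumPy's `histogram2d` is given below. The rule of flagging pairs that fall in sparsely populated cells, the bin count and the `max_count` cut-off are assumptions made for illustration; the data are synthetic.

```python
import numpy as np


def sparse_bin_outliers(n_trades, volume, bins=10, max_count=1):
    """Flag pairs that fall in sparsely populated cells of a 2-D histogram."""
    n_trades = np.asarray(n_trades, dtype=float)
    volume = np.asarray(volume, dtype=float)
    counts, t_edges, v_edges = np.histogram2d(n_trades, volume, bins=bins)
    # Locate the cell of every pair; clip so the maximum lands in the last bin.
    ti = np.clip(np.digitize(n_trades, t_edges) - 1, 0, bins - 1)
    vi = np.clip(np.digitize(volume, v_edges) - 1, 0, bins - 1)
    return counts[ti, vi] <= max_count


# Synthetic data: 50 ordinary pairs plus two extreme ones.
n_trades = [1, 2, 3, 4, 5] * 10 + [4, 125]
volume = [1000 + 100 * (i % 7) for i in range(50)] + [18_000_000, 2_000]

flags = sparse_bin_outliers(n_trades, volume)
outlier_pairs = np.flatnonzero(flags)   # positions of the flagged pairs
```

In this example the pair with a very large volume and the pair with a very large trade count each sit alone in a cell far from the main mass, so only those two are flagged.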
Other outlier detection mechanisms such as K-Means clustering, Mahalanobis distance and Principal Component Analysis, which are multivariate methods, may also be considered but may not add much in the way of new insights in this case, as we are dealing with only two variables and aim to detect extreme values in the two dimensional data.
4. Fuzzy C-Means Clustering
Algorithm [2]
The Fuzzy C-Means clustering algorithm creates clusters with fuzzy boundaries. Unlike the K-Means or Hierarchical Clustering algorithms, where the boundaries between clusters are hard, this algorithm generates a set of clusters to which every object belongs to a certain degree.
This degree of cluster membership is in effect a measure of the proximity of an entity to each cluster as a proportion of its distance to all of the clusters. Thus the degree of membership of a particular object to a particular cluster is an inverse function of the distance of that object to the cluster in question as a proportion of the distance of that object to all of the other clusters. The distance to a cluster is typically the distance of the object from the centroid of the cluster. Typically an object is assigned to the cluster to which it shows the highest membership.

Let $x_i$ be a data vector (corresponding to a row vector in matrix $X$). Let $c_j$ be the center of the $j$th fuzzy cluster, where $j = 1, \dots, C$. Let $u_{ij}$ represent the degree of membership of $x_i$ in cluster $j$, where
$$\sum_{j=1}^{C} u_{ij} = 1$$
The objective function that will be minimized in order to achieve the clustering of data around the centroids $c_j$ is
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}, \quad 1 \le m < \infty \qquad (6)$$
The value “m” influences the fuzziness of the
clusters: the larger the value of m, the fuzzier
the boundaries. Commonly a value of m = 2 is used,
as there is no theoretically optimal value for this
parameter. As m approaches 1, the fuzzy algorithm
becomes hard.
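The steps of the algorithm can be summarized as:
1. Initialize the membership matrix $U = [u_{ij}]$ and the centroids
2. Update the membership matrix via
$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$
3. Determine the centroids
$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$
4. Check the convergence criterion on $U^{(k)}$ at the $k$th iteration, where $\varepsilon$ is a small termination threshold
$$\lVert U^{(k+1)} - U^{(k)} \rVert < \varepsilon$$
5. Stop if the convergence criterion is met, otherwise go back to step 2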
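
The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: it initialises only the membership matrix at random and derives the centroids from it, uses the largest absolute change in any membership as the convergence norm, and the function name, default tolerance and toy data are all assumptions.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Return (U, centers): fuzzy memberships (N x C) and centroids (C x d)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: random membership matrix whose rows sum to one.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Step 3 formula: centroids as membership-weighted means of the data.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 2 formula: memberships from the relative distances to all centroids.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)           # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Step 4: convergence check on the change in memberships.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


# Toy data: two well separated groups of three points each.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
U, centers = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)   # hard assignment to the cluster of highest membership
```

Each row of `U` is the membership vector of one entity, which is the quantity used for the correlation and outlier comparisons in section 3.1.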
5. Main Contributions of the
Algorithm
The algorithm has the advantage over legacy
systems in its flexibility, configurability and
future proofing and is novel in several respects.
It is designed to be flexible in that it employs no
thresholds, counts or limits and so is not rigid.
Systems that employ thresholds are faced with
the challenge of optimizing the thresholds and
are also faced with the inability to handle the
variety of scenarios encountered in live trading
leading to a high error rate or a high level of false
positives when the thresholds are set too
conservatively. It requires no training and is
unsupervised in its learning of the underlying
behaviours and patterns.
The algorithm includes the following unique
capabilities and features:
Fuzzy Segmentation:
Create clusters of entities with fuzzy
memberships (overlapping groups) such that
their individual behaviours are described by a
collection of user definable features and by the
degree of their membership to each cluster. The
fuzzy membership is akin to a probability
density function which allows a host of
similarity comparisons to be made.
Entity Profiling:
Each entity is profiled in two parts. In the first,
each entity is described via a meaningful set of
attributes that capture its trading behaviour. In
the second, the entity is described by its group
behaviour or fuzzy cluster membership
function. In this approach the entity may be
assigned to the cluster to which it displays the
highest degree of membership.
Correlation and Similar Entities:
Determine the degree of similarity between
pairs of entities using the vector of fuzzy
memberships to the clusters to estimate their
degree of match.
Outlier / Anomaly Detection:
The degree of membership in the clusters is
employed to compare between the behaviours of
the entities as characterized by their profiling
features. The technique detects those entities that
exhibit behaviours most dissimilar from the rest
as well as those that are broadly similar.
Collusive Groups:
Estimate groups of entities that exhibit a very high strength of relationship as measured by the degree of similarity between the pairs of entities.
6. Feature Engineering for Trader
Profiling and Collusion Detection
The two approaches use two separate sets of features, the first selected for its ability to detect aggressive behaviour, which is characteristic of price manipulation, and the second for its ability to characterize the total activity between two entities, which is a measure of the degree to which the entities are directly related.
6.1 Behaviour Profiling Approach
Features that characterize aggressive trader behaviour were selected to demonstrate the algorithm and to generate results consistent with detecting collusive behaviour. Aggressive behaviour on the part of a collusive group of traders can be employed to manipulate the market by manipulating prices through ramping, wash trades and layering.
The following features measure how soon an order is executed or cancelled, the proportion of new orders that are executed, the proportion of (new) orders that are aggressors and the proportion of orders that are cancelled. The first three features capture the tendency of an order placed by a trader to lie at or near the best bid or offer. The fourth measures the intent or the sincerity with which an order is placed.
• Average order resting duration (i.e. the average time between a new order and a cancel or fill)
• Order to fill ratio
• Aggressiveness (ratio of aggressive fills to new orders)
• Order to cancel ratio
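
The four features listed above can be computed per trader as in the sketch below. The `Order` record, its field names and the rule that each order ends in exactly one outcome (no partial fills) are simplifying assumptions for illustration, and all ratios are taken relative to the number of new orders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    trader: str
    entered: float             # entry time in seconds
    ended: Optional[float]     # time of fill or cancel, None while still open
    outcome: str               # "fill", "cancel" or "open"
    aggressive: bool = False   # True if the order executed on entry


def profile_traders(orders):
    """Return the four profiling features for each trader."""
    by_trader = defaultdict(list)
    for order in orders:
        by_trader[order.trader].append(order)

    profiles = {}
    for trader, items in by_trader.items():
        new_orders = len(items)
        closed = [o for o in items if o.outcome in ("fill", "cancel")]
        fills = sum(o.outcome == "fill" for o in items)
        aggressive_fills = sum(o.outcome == "fill" and o.aggressive for o in items)
        cancels = sum(o.outcome == "cancel" for o in items)
        # Average time between order entry and its fill or cancel.
        resting = sum(o.ended - o.entered for o in closed) / len(closed) if closed else 0.0
        profiles[trader] = {
            "avg_resting_s": resting,
            "fill_ratio": fills / new_orders,
            "aggressive_fill_ratio": aggressive_fills / new_orders,
            "cancel_ratio": cancels / new_orders,
        }
    return profiles


# Hypothetical order log: trader A only takes liquidity, trader B mostly rests and cancels.
orders = [
    Order("A", 0.0, 0.0, "fill", True),
    Order("A", 10.0, 10.0, "fill", True),
    Order("B", 0.0, 100.0, "cancel"),
    Order("B", 0.0, 300.0, "fill"),
    Order("B", 5.0, None, "open"),
    Order("B", 20.0, 40.0, "cancel"),
]
profiles = profile_traders(orders)
```

The resulting feature vectors, one per trader, form the rows of the data matrix passed to the clustering step.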
6.2 Direct Trade Relationship Profiling Approach
The following features measure the trading activity between a pair of entities.
For each pair of entities (traders):
• Total number of trades between the entities
• Total volume of trades (buy and sell) between the entities
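
A minimal sketch of how these two pair features might be aggregated from a trade log follows. The `(buyer, seller, quantity)` tuple layout is a hypothetical input format; a pair is treated as unordered so that buys and sells in both directions are counted together.

```python
from collections import defaultdict


def pair_activity(trades):
    """Aggregate trade count and total volume for each unordered pair of traders."""
    stats = defaultdict(lambda: [0, 0])
    for buyer, seller, quantity in trades:
        pair = tuple(sorted((buyer, seller)))   # (A, B) and (B, A) are the same pair
        stats[pair][0] += 1
        stats[pair][1] += quantity
    return {pair: (count, volume) for pair, (count, volume) in stats.items()}


# Hypothetical trades as (buyer, seller, quantity).
trades = [("A", "B", 100), ("B", "A", 50), ("A", "C", 10)]
activity = pair_activity(trades)
```

Each value is a `(number of trades, total volume)` point, which is the two dimensional data examined for extreme values in the direct trading relationship approach.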
7. Results
The results are presented on a highly liquid
stock for a single trading day. There were 246
active traders or participants on the day.
Figures 1 through 4 depict the four
features discussed in section 6.1 that are used to
profile the trading behaviour. As can be
observed there are groups of traders that
demonstrate aggressive behaviour as well as
those that depict passive behaviour. Aggressive
traders are liquidity takers and passive traders
are those that provide liquidity. The two groups
perform a complementary function in the
market.
Liquid markets characterized by heavy trading
are difficult to manipulate by colluding with
other participants as there is no guarantee of
predetermining the two parties to a trade.
Figure 1 – Order resting time
Figure 1 depicts the order resting time. It is a
measure that can be used to gauge the
aggressiveness of the order in the case of a trade
or the sincerity by which the order was placed in
the case that it was cancelled.
If the order is cancelled relatively soon it may
indicate an attempt to deceive on the part of the
trader. If on the other hand the order is filled
relatively quickly it could have lain near the top
of the book providing an insight as to the intent
of the trader with regard to the level of
aggressiveness in the trading approach. Orders
with long resting time are indicative of passive
trading.
Figure 2 – Average aggressiveness
Figure 2 depicts the order to fill ratio, which in
this instance captures both the aggressive orders
that do not rest in the book and those orders
that are filled after resting on the book. It is an
average measure of the degree of aggressiveness
in the trading behaviour. If a large proportion of
orders are not filled, it would indicate that they
had been placed lower in the book and that the
trading behaviour is passive and not that of an
aggressor.
Figure 3 – Aggressiveness
Figure 3 depicts the aggressive fill to new order
ratio which is a measure of the aggressive orders
placed by a trader, and is a direct measure of the
level of aggressiveness in the trading behaviour.
Illiquid markets, on the other hand, where the
trading activity is relatively infrequent, may
make it possible to prearrange trades with other
participants in collusion. In such illiquid markets
aggressive behaviour can be taken to be
indicative of potential collusive behaviour
where the participants to a trade have been
predetermined and the trading behaviour
prearranged.
Figure 4 – Overall Aggressiveness
Figure 4 depicts the proportion of new orders
that have been cancelled. A high proportion of
cancelled orders may be indicative of a policy to
deceive or it could also be indicative of a scheme
to provide liquidity to the market depending on
how long the orders were on the book.
Figure 5 depicts the fuzzy cluster membership
function which indicates that there are several
large groups of traders that exhibit similar group
membership behaviour with respect to the ten
clusters. These large groups, in other words,
behave similarly to each other with respect to the
profiling attributes. The figure also indicates
that there are a few smaller groups of traders
(outlier groups) that behave similarly to one
another.
Figure 5 – Cluster membership function
Figures 6 – 9 present a series of box plots
depicting the variation in each of the profiling
attributes across the clusters. We observe in
particular cluster number 6, which shows
characteristics consistent with aggressive
behaviour with low order resting time, high
aggressiveness and high order execution rates.
Cluster 6 has five members, corresponding to
traders with identification numbers 37, 59, 101,
136 and 137. As a group they made only aggressive
trades during this trading period and therefore
have zero average order resting times. A large
proportion of all their orders were also aggressors.
Cluster 10, on the other hand, exhibits behaviour
consistent with passive trading with high order
resting times, low aggressiveness, low execution
rates, and a relatively high order cancellation
rate. The other clusters contain traders exhibiting
a gradation in the degree of aggressive and
passive behaviour.

Figure 6 – Variation in order resting time
Figure 7 – Variation in execution rate
Figure 8 – Variation in aggressiveness
Tables 1 and 2 provide summary statistics on
each of the attributes across the clusters. The
relative size of a cluster is usually a good
indicator of its candidature as an outlier cluster.
Each entity is assigned to the cluster to which it
shows the highest degree of membership.
The mean and standard deviation of the
attributes of the entities assigned to each cluster
are a means by which a cluster can be profiled and
groups of entities with desired characteristics
found.
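
The profiling step described here can be sketched as below: each entity is assigned to its cluster of highest membership, and the size, mean and standard deviation of the attributes are reported per cluster. The function name and example values are hypothetical, and the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import numpy as np


def profile_clusters(U, features):
    """Hard-assign entities by highest membership and summarise each cluster."""
    U = np.asarray(U, dtype=float)
    features = np.asarray(features, dtype=float)
    labels = U.argmax(axis=1)
    profile = {}
    for j in range(U.shape[1]):
        members = features[labels == j]
        if len(members) == 0:
            continue   # no entity has its highest membership in this cluster
        profile[j] = {
            "size": len(members),
            "mean": members.mean(axis=0),
            "std": members.std(axis=0),   # population standard deviation (assumed)
        }
    return labels, profile


# Hypothetical memberships (3 entities, 2 clusters) and two attributes per entity:
# average resting time in seconds and fill to new order ratio.
U = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
features = [[10.0, 0.5], [20.0, 0.7], [300.0, 0.1]]

labels, profile = profile_clusters(U, features)
```

Small clusters whose means lie far from those of the larger clusters are the natural candidates for closer inspection.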
Figure 9 – Variation in degree of passiveness
Table 1 – Cluster Profile Statistics I
| Cluster ID | Entities in Cluster | AVG. Time (s) | STD. Time (s) | AVG. Fills to New Orders | STD. Fills to New Orders |
|---|---|---|---|---|---|
| 1 | 14 | 192 | 465 | 0.518 | 0.097 |
| 2 | 63 | 214 | 660 | 0.028 | 0.025 |
| 3 | 25 | 131 | 244 | 0.927 | 0.106 |
| 4 | 22 | 329 | 1308 | 0.719 | 0.167 |
| 5 | 27 | 366 | 754 | 0.321 | 0.074 |
| 6 | 5 | 0 | 0 | 0.823 | 0.102 |
| 7 | 26 | 456 | 721 | 0.533 | 0.084 |
| 8 | 34 | 233 | 933 | 0.119 | 0.035 |
| 9 | 19 | 198 | 419 | 0.067 | 0.067 |
| 10 | 11 | 20100 | 15660 | 0.317 | 0.312 |
Table 1 presents the mean and standard deviation of the average order resting times and average fill to new order ratio of the entities in each cluster.
Table 2 – Cluster Profile Statistics II
| Cluster ID | Entities in Cluster | AVG. Aggressive Fill to New Orders | STD. Aggressive Fill to New Orders | AVG. Cancel to New Orders | STD. Cancel to New Orders |
|---|---|---|---|---|---|
| 1 | 14 | 0.164 | 0.054 | 0.154 | 0.204 |
| 2 | 63 | 0.002 | 0.004 | 0.969 | 0.028 |
| 3 | 25 | 0.004 | 0.014 | 0.043 | 0.068 |
| 4 | 22 | 0.338 | 0.082 | 0.038 | 0.115 |
| 5 | 27 | 0.031 | 0.033 | 0.602 | 0.102 |
| 6 | 5 | 0.685 | 0.054 | 0 | 0 |
| 7 | 26 | 0.041 | 0.032 | 0.374 | 0.096 |
| 8 | 34 | 0.027 | 0.029 | 0.858 | 0.041 |
| 9 | 19 | 0.004 | 0.011 | 0.088 | 0.1 |
| 10 | 11 | 0.101 | 0.2 | 0.377 | 0.369 |
Table 2 presents the mean and standard deviation of the aggressive fill to new order ratio and cancel to new order ratio.
7.2 Direct Trader Relationship Approach
Figure 10 depicts the direct trading relationship
between each pair of entities (traders). There are
two pairs of traders that trade volumes on the
order of 18 million shares, far more than the rest.
These pairs seem to have exchanged large parcels,
as the total number of interactions is relatively
small. There is also a single pair that trades a
small total volume but interacts
over 120 times.
Both scenarios illustrate outliers in the number
of shares traded and the number of times traded.
Both quantities are indicative of an unusually
strong relationship between each pair of traders
and would be cause for further investigation
especially in the case of a lightly traded security.
Figure 10 – Strength of trading relationship
8. Conclusion and Future Work
We establish through our results that the
proposed algorithm can successfully profile
trading behaviour according to a set of selected
criteria to detect groups of traders with unusual
characteristics and behaviours.
In particular through this modelling we detect
entities displaying aggressive behaviour which
is a strategy often employed by those colluding
to manipulate the market by manipulating prices
through ramping, wash trades and layering.
A process of outlier detection can also identify
those entities exhibiting strong relationships
with each other, providing further insights into
collusive behaviour.
9. References
1. Oxford Dictionary of Statistical Terms, Oxford University Press, 2008.
2. https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
3. https://www.investopedia.com/terms/d/daytrader.asp, extracted on 24 April 2018.
4. Hastie, T., Tibshirani, R., “The Elements of Statistical Learning”, 2nd ed., Springer, USA, 2009, pp. 488-499.
