MACHINE LEARNING PROJECT REPORT, JUNE 2014 1
SOM for Temporal Clustering Experiment
Henrik Grandin Aditya Hendra
Abstract—This paper is a report for the machine learning project course. In this project we apply unsupervised learning to historical S&P 500 index data to find unusual trading patterns. Such patterns could serve as an indicator that, during a certain period, the index was in a very unusual condition, for example during the 2008 financial crisis.
This paper discusses the unsuccessful results of our preliminary experiments using the SOM [1] as the unsupervised learning method, what we believe caused them, and our suggestions for future studies.
Keywords—Machine Learning, SOM, Unsupervised Learning,
Time Series, S&P500 index, STS Clustering
I. INTRODUCTION
Time series data is everywhere, especially in the data mining field, where we want to extract valuable and interpretable information from huge raw data sets. This is especially true for financial time series, since predicting financial trends from historical data has been one of the most prevalent ways to make a profit.
To learn more about financial time series data, we propose an experiment that applies a machine learning algorithm to the S&P 500 data set, to see whether such a time series contains unique data or patterns that can be clustered, e.g. a cluster of highly fluctuating index values during a financial crisis.
The goal of clustering is to identify structure in an unlabeled data set by objectively organizing the data into homogeneous groups, in which the within-group-object similarity is maximized and the between-group-object similarity is minimized. In this sense, clustering is sometimes called automatic classification [3].
Since it uses no class labels, clustering is also known as unsupervised learning, or learning by observation instead of by examples. Han and Kamber [3] classify clustering methods into five major categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
One model-based approach is the neural network [7], which includes competitive learning methods such as ART and self-organizing feature maps (SOM).
We chose the SOM because it is one of the most frequently used methods for clustering temporal sequences [9].
This report is organized as follows. Section 2 gives a brief introduction to SOMs. Section 3 explains the preprocessing approach we use. Section 4 describes the experiment design. Section 5 examines the results and their consequences. Section 6 suggests possible improvements, and Section 7 concludes the report.
Henrik Grandin is a student at Uppsala University.
Aditya Hendra is a student at Uppsala University.
II. SELF ORGANIZING MAP
B. Hammer, A. Micheli, A. Sperduti, and M. Strickert [6] note that the Self-Organizing Map (SOM) is the most frequently used method when dealing with clustering of temporal sequences, and it has also been used for time series clustering and prediction together with recurrent neural networks [9].
The SOM is an unsupervised learning method: it works without labels or examples to achieve automatic classification, data segmentation or vector quantization [4]. The SOM adopts a modified form of competitive learning, in which the node most similar to the input wins the competition and has its weights updated. The difference from normal competitive learning is that the SOM also updates the weights of the winner's neighbours, but by a smaller amount than the winner's.
C. Y. Tsao and S.-H. Chen [4] also mention, based on their literature review, that SOMs have proven to be an effective methodology for analyzing problems in finance, economics and marketing.
We therefore consider the SOM an appropriate tool for finding clusters that contain the so-called unusual trading data that typically occurs during a financial crisis.
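The competitive update described above can be sketched in a few lines. This is our own minimal illustration on a 1-D map (the report uses Orange's 2-D SOM); the function name and the `lr`/`sigma` parameters are assumptions, not part of any cited implementation.

```python
import numpy as np

# Minimal sketch of one SOM training step on a 1-D map of nodes.
def som_step(weights, x, lr=0.5, sigma=1.0):
    """Move map `weights` (m x d) toward input vector `x` (d,)."""
    # Competition: the node most similar to the input wins.
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Cooperation: neighbours are also updated, by less than the winner,
    # using a Gaussian falloff over grid distance to the winner.
    grid = np.arange(len(weights))
    h = np.exp(-((grid - winner) ** 2) / (2 * sigma ** 2))
    return weights + lr * h[:, None] * (x - weights), winner

rng = np.random.default_rng(0)
weights = rng.random((5, 3))            # 5 nodes, 3-dimensional inputs
x = np.array([1.0, 1.0, 1.0])
new_weights, winner = som_step(weights, x)
```

A full SOM training run repeats this step over the data while shrinking `lr` and `sigma`, which is what Orange's initial/final neighborhood settings control.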
III. DATA PREPROCESSING
The data we worked with consists, for each day, of the open, high, low and close values and the volume, downloaded as a CSV file from the Yahoo Finance web page [11].
Since a stock index is a non-linear time series (affected, for example, by inflation), working with the raw data would not give any useful results; we need to preprocess it.
It is important that no single attribute dwarfs the others, but at the same time we do not want to downplay a big change in a single attribute. The solution we ended up using is to work with each attribute's relative change from the previous day:

P_n = A_n / A_{n-1}    (1)

where P is the processed attribute and A is the raw data.
To study the stock over time, we need to group the days into sets. It is not obvious which set size would be optimal, so we have to try a number of different sizes. To ensure that we do not miss interesting patterns that run from the end of one set into the beginning of the next, neighbouring sets need to overlap.
All of this is handled by a Python script that reads a CSV file with the raw data. The script lets us choose the size of each set, the number of days two neighbouring sets have in common, and which of the attributes to use (open, high, low, close and volume).
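The two core operations of that script, Eq. (1) and the overlapping grouping, can be sketched as follows (our reconstruction; the function names and the toy closing prices are ours, not the script's):

```python
# Sketch of the preprocessing described above, on a toy closing-price series.
def relative_change(values):
    """Eq. (1): P_n = A_n / A_(n-1) for one attribute."""
    return [values[i] / values[i - 1] for i in range(1, len(values))]

def day_sets(series, size, overlap):
    """Group days into sets of `size` days; neighbours share `overlap` days."""
    step = size - overlap
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

closes = [100.0, 102.0, 101.0, 103.0, 103.0, 104.0]   # six toy closing values
p = relative_change(closes)              # five daily ratios
sets = day_sets(p, size=3, overlap=2)    # slide the window by one day
```

With `overlap = size - 1` this produces exactly the one-day sliding described in Section IV.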
IV. EXPERIMENT DESIGN
We decided to use the software Orange [12], a Python application with a nice, easy-to-use graphical interface. It offers many classification methods, but we are only interested in its SOM implementation. We started experimenting in Matlab, but the SOM implementation in Matlab does not let us see exactly which input sets become associated with which nodes in the output map. This functionality is vital for our experiment, and Orange does provide it.
The SOM application in Orange can be customized to some extent. We can change the size and topology of the output map; the initial weights of the map can be either random or evenly distributed; and we can set the initial and final size of the neighborhood, as well as choose whether the neighborhood update function is a Gaussian or a "top hat" function. One setting that would have been helpful, but is not included in the Orange application, is the number of iterations for the tuning phase; only the total number of iterations can be set.
In our first experiment we used sets of 20 trading days (one month) with an overlap of 10 days between sets. The result was basically a random spread over the 2D map: no distances between the nodes emerged, and the number of sets in each node was roughly the same all over the map. This remained the case whatever settings we used for the SOM. After this experiment we concluded that we had to account for a particular problem with time series in a SOM.
If two sets are essentially identical except that one is delayed by one day, the SOM will not be able to recognize the similarity, since it only compares day one with day one, day two with day two, and so forth. The solution is to ensure that each day is used in every position of a set. With a set size of ten days, every day then appears in ten different sets, and two neighboring sets have nine days in common. In effect, we slide the data sets forward by one day.
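A toy example (ours, not from the experiments) makes the alignment problem concrete: under position-wise Euclidean comparison, the same spike pattern delayed by one day is rated *less* similar than a completely flat series.

```python
import math

# Position-wise Euclidean distance, as a SOM effectively uses on its inputs.
def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

spike   = [1.0, 1.0, 3.0, 1.0, 1.0]
delayed = [1.0, 1.0, 1.0, 3.0, 1.0]   # identical pattern, one day later
flat    = [1.0, 1.0, 1.0, 1.0, 1.0]

d_delayed = euclid(spike, delayed)    # the two spikes misalign
d_flat    = euclid(spike, flat)       # smaller, despite no spike at all
```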
With the new settings, 20 days with a 19-day overlap, the resulting 2D map remained the same. Using 20 days with five attributes per day gives 100 input dimensions for our SOM. We tried to reduce the dimensionality either by reducing the number of days in each set or by reducing the number of attributes tracked per day, trying a number of combinations between 5-20 days and 1-5 attributes, still sliding the data sets by one day.
Nevertheless, the results were for the most part as unimpressive as before; only when we used a single attribute did some resemblance of clusters start to appear. The problem is that the clusters are not really separated once you look at the days contained in each cluster. Since our sets overlap, each day is contained in multiple sets, and for the most part these sets do not end up in the same cluster. Most days are therefore present in most clusters, outliers as well as the main clusters.
Fig. 1: 20 days set, 10 days overlap, tracking open, high, low,
close and volume
Fig. 2: 5 days set, 4 days overlap, tracking open, high, low,
close and volume
Fig. 3: 20 days set, 19 days overlap, tracking closing value
Fig. 4: 5 days set, 4 days overlap, tracking amount of stocks
traded
These are samples of the resulting 2D maps from our SOM. The size of a circle represents how many sets are associated with that node. The colour of each node represents the distance to neighbouring nodes: light colours represent short distances, dark colours larger ones. Nodes without circles are there to indicate the distances between the nodes with circles.
Figures 1 and 2 were run while tracking all five attributes, and as a result we did not manage to find any clusters. A few of the corners have created distance from the rest of the map, but each of those nodes contains only a single set. These sets do not include any dates on the list of days with large changes in the S&P 500 index [13], and when looking at the values of each set, there was no apparent reason why they were outliers.
As figures 3 and 4 show, we do manage to get clusters when tracking only one attribute in the time series (the closing value in general created more distinct clusters than the volume).
V. RESULTS AND FAILURES
Overall, the SOM does not produce a deterministic cluster pattern that could indicate whether a cluster contains a group of unusual data, such as the index during the 2008 financial crisis. The resulting clusters contain data that looks more like ordinary random data, each cluster holding data from various times, as seen in the previous figures.
For preprocessing, we extract data sets from a single time series using the sliding-window method, creating many data sets with overlapping time. This method is also called sub-sequence clustering or STS (Subsequence Time Series) clustering. J. Lin, E. Keogh, and W. Truppel [5] claim that clustering of streaming time series is meaningless precisely because of the use of data sets extracted with the sliding-window method.
Shocking as it is, the claim comes with evidence that clustering sliding-window time series is essentially no different from clustering random-walk data. The paper argues that for any time series data set T, if T is clustered using sliding windows and the window length is very small compared to the length of the overall time series, then the mean of all subsequences will be an approximately constant vector. Although we have not formally tested this theorem, visual inspection of the SOM clusters indicates that each cluster contains generic data consisting of random samples from the data sets.
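The constant-mean effect is easy to check numerically. This is our own quick illustration on a synthetic random walk (not the paper's experiment): the element-wise mean of all width-20 sliding windows barely varies compared to the range of the data itself.

```python
import random

# Build a long synthetic random walk.
random.seed(1)
walk = [0.0]
for _ in range(2000):
    walk.append(walk[-1] + random.gauss(0, 1))

w = 20                                           # sliding-window width
windows = [walk[i:i + w] for i in range(len(walk) - w + 1)]

# Element-wise mean over all windows: one value per window position.
mean_vec = [sum(col) / len(windows) for col in zip(*windows)]

spread = max(mean_vec) - min(mean_vec)           # variation of the mean vector
data_range = max(walk) - min(walk)               # variation of the raw data
```

Because neighbouring window positions average almost the same set of points, `spread` is tiny relative to `data_range`, i.e. the mean vector is approximately constant, as the theorem predicts for short windows.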
Why this happens can be explained by introducing cluster_distance(A, B) and clustering_meaningfulness(X, Y). We use the following definitions:
• Let A = (ā_1, ā_2, ..., ā_k) be the cluster centers derived from one run of STS k-means.
• Let B = (b̄_1, b̄_2, ..., b̄_k) be the cluster centers derived from another, different run of STS k-means.
• Let dist(ā_i, b̄_j) be the distance between two cluster centers, measured with the Euclidean distance.
We can then define the distance between two sets of clusters as:

cluster_distance(A, B) ≡ Σ_{i=1}^{k} min[dist(ā_i, b̄_j)], 1 ≤ j ≤ k    (2)
We can use this distance to measure the similarity between two cluster sets. The experiment described in the paper uses k-means as the main clustering algorithm: three random restarts of k-means on a stock market data set were created and saved as set X, and another three random restarts on a random-walk data set were created and saved as set Y. Both sets are then processed as follows:
• within_set_X_distance is the average cluster_distance between one run in X and another run in X.
• between_set_X_and_Y_distance is the average cluster_distance between a run in X and a run in Y.
The relationship between these two quantities is expressed as:

clustering_meaningfulness(X, Y) ≡ within_set_X_distance / between_set_X_and_Y_distance    (3)
Since the numerator measures the distance between similar clusterings, its value should be very small, close to zero. The denominator, on the other hand, measures the distance between two different clusterings, so its value should be large; overall, clustering_meaningfulness(X, Y) should therefore be very close to zero.
The result reported in the paper is actually very different: the between_set_X_and_Y_distance values suggest that the X and Y sets are very similar. The paper also states that the experiments were repeated with many other clustering algorithms, including the SOM.
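Equations (2) and (3) can be sketched directly in code. This is our own implementation on made-up toy cluster centers, only to show the intended behaviour (repeatable runs give a ratio near zero); it is not the paper's data.

```python
import math

def dist(a, b):
    """Euclidean distance between two cluster centers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(A, B):
    """Eq. (2): for each center in A, the distance to its nearest center in B."""
    return sum(min(dist(a, b) for b in B) for a in A)

def meaningfulness(X, Y):
    """Eq. (3): average within-X distance over average X-to-Y distance."""
    within = [cluster_distance(X[i], X[j])
              for i in range(len(X)) for j in range(len(X)) if i != j]
    between = [cluster_distance(A, B) for A in X for B in Y]
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Three runs with nearly identical centers (repeatable clustering) vs.
# three runs whose centers lie far away:
X = [[(0.0, 0.0), (10.0, 10.0)],
     [(0.1, 0.0), (10.0, 10.1)],
     [(0.0, 0.1), (9.9, 10.0)]]
Y = [[(50.0, 50.0), (60.0, 60.0)]] * 3
m = meaningfulness(X, Y)              # close to zero, as expected here
```

The paper's surprising finding is that on sliding-window data this ratio does *not* stay near zero: X and Y come out nearly indistinguishable.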
One suggested root cause is that "STS clustering algorithms are simply returning a set of basis functions that can be added together in a weighted combination to approximate the original data."
VI. POSSIBLE IMPROVEMENTS
One problem with the regular SOM is that it has no sense of time. When the weights of a node are updated, the update is based on the node's current position, which is basically just the sum of all its previous movements and its starting position. When working with time series, it makes sense to change this: patterns from the last month should have more impact than patterns from ten years ago. The recurrent self-organizing map (RSOM) is an alteration of the regular SOM that aims to fix this. The RSOM update rule looks like [10]:
y_i(t) = a Σ_{k=0}^{n-1} (1−a)^k (x(t−k) − w_i(t−k)) + (1−a)^n y_i(t−n)    (4)
where x(t) is the input pattern at iteration t, w_i(t) is the weight vector of node i at iteration t, x(t) − w_i(t) is the movement needed to move node i to x(t), and a, 0 < a ≤ 1, determines the impact of older movements. When a approaches 1 we discard old movements and the system is a short-term memory; when a approaches 0 the system is a long-term memory. We did not manage to find any application with an RSOM implementation, and we did not have enough time to implement it ourselves, but it seems reasonable that it would at least improve our clusters.
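The core of Eq. (4) is easier to see in its equivalent recursive form, y_i(t) = (1 − a)·y_i(t−1) + a·(x(t) − w_i(t)); unrolling that recursion over n steps reproduces the sum above. The sketch below is our own (node weights are held fixed purely for clarity; a real RSOM would also pick winners and update w):

```python
import numpy as np

# Leaky difference vectors y_i(t) for every node i, per Eq. (4)'s recursion.
def rsom_responses(inputs, weights, a=0.5):
    """Feed `inputs` in order; older movements decay by a factor (1 - a)."""
    y = np.zeros_like(weights)
    for x in inputs:
        y = (1 - a) * y + a * (x - weights)
    return y

w = np.zeros((2, 2))                     # 2 nodes, 2-dimensional inputs
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
y_short = rsom_responses(xs, w, a=1.0)   # a -> 1: only the newest input counts
y_mixed = rsom_responses(xs, w, a=0.5)   # older inputs decay geometrically
```

In an RSOM, the winning node is the one minimizing the norm of y_i, so recent inputs dominate the competition exactly as the text describes.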
When we used multiple attributes in our time series (open, high, low, close and volume), our SOM did not manage to create any clusters, and we have not found any paper that used more than one attribute in its time series.
One problem with multiple attributes is of course that the input dimension of the SOM greatly increases. However, a 25-day time series with one attribute created better clusters than five attributes in a five-day time series; if the problem were only dimensionality, the two should be comparable. We therefore propose an experiment with a different approach to multiple-attribute time series.
The first step is to run the SOM on each attribute individually and define clusters in the output map for each attribute. The next step is to combine clusters from different attributes: if two clusters from two different attributes have 80% of their members in common, create a new cluster from that 80%, and put the remaining 20% from each of the two original clusters into two smaller clusters. We imagine there might be quite a lot of clusters, so it might be necessary to define a distance measure between clusters that makes it possible to merge clusters that get too close to each other.
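The proposed merge step can be sketched as follows. This is our interpretation only: clusters are modelled as sets of day indices, and "80% in common" is read as the intersection covering at least 80% of the smaller cluster, which is a modelling assumption on our part.

```python
# Merge two clusters (sets of day indices) from two different attributes.
def merge_pair(c1, c2, threshold=0.8):
    common = c1 & c2
    if len(common) / min(len(c1), len(c2)) >= threshold:
        # Shared days form a new cluster; leftovers become smaller clusters.
        return [common, c1 - common, c2 - common]
    return [c1, c2]   # not enough overlap: keep both clusters unchanged

attr1_cluster = set(range(0, 10))    # days 0-9, clustered on one attribute
attr2_cluster = set(range(2, 12))    # days 2-11, clustered on another
merged = merge_pair(attr1_cluster, attr2_cluster)
```

Applying this pairwise across all attribute maps would produce the proliferation of small clusters mentioned above, hence the need for a follow-up distance-based merge.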
Another simple modification we can suggest is not to use the sliding-window approach at all, but instead to use subsequences that are randomly extracted [5].
A quite different approach is to combine clustering (SOM) with a recurrent neural network (RNN) algorithm [9]. In that work, the SOM is used for temporal sequence processing and classification, and a recurrent neural network is associated with each created cluster as a predictor. The authors also state that the RNN uses an "internal feedback mechanism that creates an implicit memory that contributes to the prediction", which means that using the data in its sequentially correct order is obligatory for correct predictions. Although this approach also uses a sliding temporal window, we do not know how that choice affects the overall approach, or whether a non-sliding window should be used.
VII. CONCLUSION
For this project, we tried to find unique clusters in the S&P 500 time series using a Self-Organizing Map. We used the daily opening, closing, highest and lowest index values, together with the daily trading volume, but the resulting clusters did not have any distinguishable value. One of the main causes of the problem is probably the use of the sliding-window method.
One reason we think the approach of A. Cherif, H. Cardot, and R. Bone [9] could work is that their experiments use a well-known time series, the Mackey-Glass series, which visually looks very similar to the S&P 500 index chart. The usefulness of their approach on a time series so similar to the S&P 500 makes it worth mentioning.
ACKNOWLEDGMENT
The authors would like to thank Joseph Scott for supervising
our project and our lecturer Olle Gallmo for his teaching during
the course.
REFERENCES
[1] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sep. 1990.
[2] T. Koskela, M. Varsta, J. Heikkonen, and K. Kaski, "Recurrent SOM with Local Linear Models in Time Series Prediction," in 6th European Symposium on Artificial Neural Networks, 1998, pp. 167-172.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001, pp. 346-389.
[4] C. Y. Tsao and S.-H. Chen, "Self-organizing maps as a foundation for charting or geometric pattern recognition in financial time series," in 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. Proceedings, 2003, pp. 387-394.
[5] J. Lin, E. Keogh, and W. Truppel, "Clustering of streaming time series is meaningless," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003, pp. 56-65.
[6] B. Hammer, A. Micheli, A. Sperduti, and M. Strickert, "Recursive self-organizing network models," Neural Networks, vol. 17, no. 8-9, pp. 1061-1085, Oct. 2004.
[7] T. Warren Liao, "Clustering of time series data - a survey," Pattern Recognition, vol. 38, no. 11, pp. 1857-1874, Nov. 2005.
[8] A. Fonseka, D. Alahakoon, and S. Bedingfield, "GSOM sequence: An unsupervised dynamic approach for knowledge discovery in temporal data," in 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011, pp. 232-238.
[9] A. Cherif, H. Cardot, and R. Bone, "SOM time series clustering and prediction with recurrent neural networks," Neurocomputing, vol. 74, no. 11, pp. 1936-1944, May 2011.
[10] M. Varsta, J. R. Millán, and J. Heikkonen, "A Recurrent Self-Organizing Map for Temporal Sequence Processing," Lecture Notes in Computer Science, vol. 1327, 1997, pp. 421-426.
[11] Website: Yahoo Finance, S&P 500 stock data. Accessed: 28/5 2014. URL: http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices
[12] Website: SOM application, Orange. Accessed: 28/5 2015. URL: http://orange.biolab.si/
[13] Website: List of largest changes in the S&P 500. Accessed: 29/5 2015. URL: http://en.wikipedia.org/wiki/List_of_largest_daily_changes_in_the_S%26P_500

More Related Content

Similar to Machine_Learning_Project_Report

Seminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdfSeminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdfIkedaYuki
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstracttsysglobalsolutions
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdfLeonardo Auslender
 
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...Dr. Amarjeet Singh
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1victor tang
 
operation research notes
operation research notesoperation research notes
operation research notesRenu Thakur
 
Clustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataClustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataNexgen Technology
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basicsNeeleEilers
 
Hyatt Hotel Group Project
Hyatt Hotel Group ProjectHyatt Hotel Group Project
Hyatt Hotel Group ProjectErik Bebernes
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsCSCJournals
 
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...CSCJournals
 
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...ijtsrd
 
Module 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rModule 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rIlonaThornburg83
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Jisu Han
 

Similar to Machine_Learning_Project_Report (20)

Seminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdfSeminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdf
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
 
A Tour through the Data Vizualization Zoo - Communications of the ACM
A Tour through the Data Vizualization Zoo - Communications of the ACMA Tour through the Data Vizualization Zoo - Communications of the ACM
A Tour through the Data Vizualization Zoo - Communications of the ACM
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf
 
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1
 
operation research notes
operation research notesoperation research notes
operation research notes
 
Clustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataClustering big spatiotemporal interval data
Clustering big spatiotemporal interval data
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
 
Hyatt Hotel Group Project
Hyatt Hotel Group ProjectHyatt Hotel Group Project
Hyatt Hotel Group Project
 
Data Structures for Robotic Learning
Data Structures for Robotic LearningData Structures for Robotic Learning
Data Structures for Robotic Learning
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
 
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
 
Poster
PosterPoster
Poster
 
Module 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rModule 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, r
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
 

Machine_Learning_Project_Report

  • 1. MACHINE LEARNING PROJECT REPORT, JUNE 2014 1 SOM for Temporal Clustering Experiment Henrik Grandin Aditya Hendra Abstract—This paper is a report for the machine learning project course. In this project we are going to use unsupervised learning on the S&P 500 historical index data to find unusual trading pattern. This could be used as an indicator, that during a certain period, the index was in a very unusual condition, for example during financial crisis 2008. This paper will discuss the unsuccessful results of our prelim- inary experiments using SOM[1] as the unsupervised learning method, what we think as the causes for it and our suggestion for future studies. Keywords—Machine Learning, SOM, Unsupervised Learning, Time Series, S&P500 index, STS Clustering I. INTRODUCTION Time series data is everywhere, especially in the data mining field where we want to get valuable and interpretable information from huge raw data sets. It is especially true for financial time series, for predicting financial trend based on past historical data has been one of the most prevalent ways to make profit. In order to learn more about this financial time series data, we propose an experiment using machine learning algorithm on S&P500 data set to see whether such time series data has a unique data or pattern that could be clustered, i.e: a cluster of very fluctuate index during financial crisis. The goal of clustering is to identify structure in an unlabeled data set by objectively organizing data into homogeneous groups where the within-group-object similarity is minimized and the between-group-object dissimilarity is maximized. In this sense, clustering is sometimes called automatic classifica- tion [3]. Without using any class label, clustering is also known as unsupervised learning or learning by observations instead of by examples. 
Han and Kamber [3] classified clustering methods into five major categories: partitioning methods, hierarchical methods, density based methods, grid-based methods, and model-based methods. One of model-based methods approach is neural network[7], which consists of competitive learning, including ART and self-organizing feature maps (SOM). We choose SOM because it is one of the most frequently used method for clustering temporal sequences [9]. This report is organized as follows. Section 2 gives a brief introduction to SOMs. Section 3 explain the preprocessing approach we use. Section 4 describes the experiment design. Section 5 examines the result and consequences from it. Section 6 gives possible improvement suggestion and section 7 concludes the report. Henrik. Grandin is a student at Uppsala University Aditya. Hendra is a student at Uppsala University II. SELF ORGANIZING MAP B. Hammer, A. Micheli, A. Sperduti, and M. Strickert [6] used Self Organizing Map (SOM) for time series clustering prediction with recurrent neural networks, it is said that SOM is the most frequently used method when dealing with clustering of temporal sequences. The SOM is part of unsupervised learning which works without using labels or examples to achieve auto classification, data segmentation or vector quantification [4]. SOM adopts a modified competitive learning, where nodes which have the most similarity with inputs win the competition and its weight get updated. A difference with normal competitive learning is that, SOM also updates its neighbourhoods’ weight with value less than the winner. It is also mentioned by C. Y. Tsao and S. H. Chen [4], from their literature research, that SOMs have been proven to be an effective methodology to analyze problems in finance, economics and marketing. Therefore, we think it is appropriate to use SOM to find clusters that contains so-called unusual trading data which usually happens during financial crisis. III. 
DATA PREPROCESSING The data that we have worked on for each day is open, high, low, close, and volume. It was downloaded to a csv file from the Yahoo Finance web page [11]. Since a stock exchange is a non linear time series (inflation), working with the raw data would not give any results. We need to process it in some sense. It is important that one attribute does not dwarf the other attributes. At the same time, we do not want to make light when a big change is happening in a single attribute. The solution we ended up using was to work with the the percent of change of an attribute from the previous day. Pn = An An−1 (1) Where P is the processed attribute and A is the raw data. To be able to study the stock over time we need to group the days into sets of days. It is not obvious what size of these sets would be optimal so we will have to try a number of different sizes. To ensure that we do not miss interesting patterns because they are happening from the end of one set into the beginning of another, we need to have overlapping between the sets. This is all solved with a python script that reads a csv file with the raw data. It gives us the possibility to decide the size of each set, the number of days two neighboring sets should have in common and which of the attributes we wish to use (open, high, low, close and volume).
IV. EXPERIMENT DESIGN

We decided to use the software Orange [12]. It is written in python and has a nice, easy-to-use graphical interface. It offers many classification methods, but we are only interested in its SOM implementation. We started experimenting in Matlab, but the SOM implementation in Matlab does not let us see exactly which input sets get associated with which nodes in the output map. This functionality is vital for our experiment, and Orange does provide it. The SOM application in Orange can be customized to some extent. We can change the size and topology of the output map. The initial weights of the map can be either random or evenly distributed. We can set the initial and final size of the neighborhood, and choose whether the neighborhood function is a Gaussian or a "top hat" function. One setting that would have been helpful, but unfortunately is not included in the Orange application, is the number of iterations for the tuning phase; only the total number of iterations can be set.

In our first experiment we used sets of 20 days of trading (one month) with an overlap of 10 days between consecutive sets. The result was essentially a random spread over the 2D map. No distance between the nodes had been created, and the number of sets in each node was roughly the same all over the map. This remained the case with whatever settings we used on the SOM.

After this experiment we concluded that we have to account for a known problem with time series in a SOM: if two sets are essentially identical, except that one set is delayed by one day, the SOM will not recognize the similarity, since it only compares day one with day one, day two with day two, and so forth. The solution is to ensure that each day appears in each position of a set.
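The alignment problem is easy to demonstrate: under the element-wise comparison a SOM performs, a pattern and the same pattern delayed by one day are far apart (illustrative data):

```python
import numpy as np

pattern = np.sin(np.linspace(0, 4 * np.pi, 20))  # some 20-day pattern
delayed = np.roll(pattern, 1)                    # identical shape, delayed one day

# The SOM compares day 1 with day 1, day 2 with day 2, and so on,
# so it sees a large Euclidean distance despite the identical shape.
distance = np.linalg.norm(pattern - delayed)
print(distance)
```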
If we have a set size of ten days, every day will appear in ten different sets, and two neighboring sets will have nine days in common; in effect we slide the window forward by one day. With the new settings, 20 days with 19 days of overlap, the resulting 2D map remained the same. With 20 days and five attributes per day, the SOM input has 100 dimensions. We tried to reduce the dimensionality by either reducing the number of days in each set or reducing the number of attributes tracked per day. We tried a number of combinations between 5-20 days and 1-5 attributes, still sliding the sets by one day.

Nevertheless, the result was for the most part as unimpressive as before; only when we used a single attribute did some resemblance of clusters start appearing. The problem is that the clusters are not really separated once you look at the days contained in each cluster. Since our sets overlap, each day is contained in multiple sets, and for the most part these sets do not end up in the same cluster. Most days are therefore present in most clusters, outliers as well as the main clusters.

Fig. 1: 20 days set, 10 days overlap, tracking open, high, low, close and volume
Fig. 2: 5 days set, 4 days overlap, tracking open, high, low, close and volume
Fig. 3: 20 days set, 19 days overlap, tracking closing value
Fig. 4: 5 days set, 4 days overlap, tracking amount of stocks traded

These are samples of the resulting 2D maps from our SOM. The size of a circle represents how many sets are associated with that node. The colour of each node represents the distance between nodes: light colours indicate short distances, dark colours larger ones. Nodes without circles are there to indicate distances between the nodes with circles. Figures 1 and 2 were produced while tracking all five attributes, and as a result no clusters were found. A few of the corners have created distance from the rest of the map, but each such node holds only one set. These sets do not include any dates on the list of days with large changes in the S&P 500 index [13], and when inspecting the values of each set there was no apparent reason why they were outliers. As figures 3 and 4 show, we do manage to get clusters when tracking only one attribute in the time series (the closing value in general created more distinct clusters than the volume).

V. RESULTS AND FAILURES

Overall, the SOM does not produce a deterministic cluster pattern that could indicate whether a cluster contains a group of unusual data, such as the index during the 2008 financial crisis. The resulting clusters contain data that looks more like normal random data, each holding data from various times, as seen in the previous figures. In our preprocessing we extract data sets from a single time series using the sliding-window method to create more, overlapping data sets. This method is also called sub-sequence clustering or STS (Subsequence Time Series) clustering. J. Lin, E. Keogh, and W. Truppel [5] claim that clustering of streaming time series is meaningless precisely because of the use of data sets extracted with a sliding window.
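A quick numerical illustration of the claim (with a random stand-in series, since the effect is independent of the data): the element-wise mean of all windows extracted by sliding is nearly a constant vector, i.e. it carries no shape information for a clustering algorithm to exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=5000)   # stand-in for a long preprocessed series
w = 20                           # window length, small relative to the series

# all length-w subsequences, extracted by sliding the window one step at a time
windows = np.lib.stride_tricks.sliding_window_view(series, w)

mean_window = windows.mean(axis=0)
print(mean_window.std())         # close to 0: the mean window is almost flat
```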
Shocking as it is, the claim is backed by proof that clustering sliding-window time series is essentially no different from clustering random-walk data. The paper shows that for any time series data set T, if T is clustered using sliding windows and the window length is very small compared to the length of the overall series, then the mean of all extracted subsequences is an approximately constant vector. Although we have not tested this theorem ourselves, visual inspection of the SOM clusters indicates that each cluster contains generic data sampled from all over the series. Why this happens can be explained by introducing cluster_distance(A, B) and cluster_meaningfulness(X, Y). We will use the following definitions:

• Let A = (ā1, ā2, ..., āk) be the cluster centers derived from one run of STS k-means.
• Let B = (b̄1, b̄2, ..., b̄k) be the cluster centers derived from a different run of STS k-means.
• Let dist(āi, b̄j) be the distance between two cluster centers, measured with the Euclidean distance.

Then the distance between two sets of clusters is defined as:

cluster_distance(A, B) ≡ Σ_{i=1}^{k} min[ dist(āi, b̄j) ], 1 ≤ j ≤ k    (2)

We can use this distance to tell us the similarity between two sets of clusters. The experiment described in the literature
uses k-means as the clustering algorithm: three random restarts of k-means on a stock market data set were saved as set X, and another three random restarts on a random-walk data set were saved as set Y. Both sets are then processed as follows:

• within set X distance is the average cluster_distance between one clustering in X and the other clusterings in X.
• between set X and Y distance is the average cluster_distance between a clustering in X and a clustering in Y.

The relationship between these two quantities is:

cluster_meaningfulness(X, Y) ≡ within set X distance / between set X and Y distance    (3)

Since the numerator measures the distance between similar clusterings, its value should be very small, close to zero. The denominator, on the other hand, measures the distance between two different kinds of clusterings, so its value should be large; overall, cluster_meaningfulness(X, Y) should therefore be very close to zero. The result reported in the literature is very different: the between set X and Y distance suggests that the sets X and Y are very similar. The paper adds that the experiments were repeated with many other clustering algorithms, including SOM. One suggested root cause is that "STS clustering algorithms are simply returning a set of basis functions that can be added together in a weighted combination to approximate the original data."

VI. POSSIBLE IMPROVEMENTS

One problem with the regular SOM is that it has no sense of time. When the weights of a node are updated, the update is based on the node's current position, which is just the sum of all previous movements and the starting position. When working with time series it makes sense to change this: patterns from the last month should have more impact than patterns from 10 years ago.
The recurrent self-organizing map (RSOM) is an alteration of the regular SOM that aims to fix this. The RSOM update rule is [10]:

y_i(t) = a Σ_{k=0}^{n-1} (1-a)^k (x(t-k) - w_i(t-k)) + (1-a)^n y_i(t-n)    (4)

where x(t) is the input pattern at iteration t, w_i(t) is the weight vector of node i at iteration t, x(t) - w_i(t) is the movement needed to move node i to x(t), and a, 0 < a ≤ 1, determines the impact of older movements; equation 4 is simply the unrolled form of the one-step recursion y_i(t) = a (x(t) - w_i(t)) + (1-a) y_i(t-1). When a approaches 1, old movements are discarded and the system acts as a short-term memory; when a approaches 0, the system acts as a long-term memory. We did not find any application with RSOM implemented, and we did not have enough time to implement it ourselves, but it seems reasonable that it would at least improve our clusters.

When we used multiple attributes in our time series (open, high, low, close and volume), our SOM did not manage to create any clusters, and we have not found any paper that used more than one attribute in its time series. One problem with multiple attributes is of course that the input dimension of the SOM grows quickly. But a 25-day time series with one attribute created better clusters than a five-day time series with five attributes; if the problem were only dimensionality, the two should be comparable. We therefore propose an experiment with a different approach to multiple-attribute time series.

The first step is to run the SOM on each attribute individually and define clusters on the output map for each attribute. The next step is to combine clusters from different attributes: if two clusters from two different attributes have 80% of their days in common, create a new cluster from that 80%, and put the remaining 20% from the two original clusters into two smaller clusters. We imagine that this may produce quite a lot of clusters, so it might be necessary to define a distance measure between clusters that makes it possible to merge clusters that get too close to each other.
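The combination step we propose might be sketched as follows (a hypothetical helper, not something we implemented; clusters are represented as sets of days, and the 80% threshold is the one suggested above):

```python
def combine(c1, c2, threshold=0.8):
    """If two clusters from different attributes share at least `threshold`
    of their days, split them into the common part plus two smaller
    leftover clusters; otherwise leave them alone (return None)."""
    common = c1 & c2
    if len(common) / max(len(c1), len(c2)) >= threshold:
        return common, c1 - common, c2 - common
    return None

# usage: two 10-day clusters (days numbered for illustration) sharing 9 days
a, b = set(range(10)), set(range(1, 11))
merged = combine(a, b)   # -> ({1, ..., 9}, {0}, {10})
```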
Another simple modification we could suggest is to not use the sliding-window approach at all, and instead use subsequences extracted at random positions [5].

A quite different approach is to combine clustering (SOM) with a recurrent neural network (RNN) algorithm [9]. In that work the SOM is used for temporal sequence processing and classification, and a recurrent neural network is associated with each created cluster as a predictor. The literature also states that the RNN uses an "internal feedback mechanism that creates an implicit memory that contributes to the prediction," which means that feeding the data in the sequentially correct order is obligatory for correct predictions. Although this approach also uses a sliding temporal window, we do not know how that choice affects the overall approach, or whether a non-sliding window should be used instead.

VII. CONCLUSION

In this project we tried to find unique clusters in the S&P 500 time series using the Self-Organizing Map. We used the daily opening, closing, highest and lowest index values as well as the daily trading volume, but the resulting clusters did not have any distinguishable character. The main cause of the problem is probably the use of the sliding-window method. One reason we think the approach of A. Cherif, H. Cardot, and R. Boné [9] could work is that their experiments use a well-known time series, the Mackey-Glass series, which visually looks very similar to the S&P 500 index chart. The usefulness of their approach on a time series similar to the S&P 500 makes it worth mentioning.

ACKNOWLEDGMENT

The authors would like to thank Joseph Scott for supervising our project and our lecturer Olle Gällmo for his teaching during the course.
REFERENCES

[1] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sep. 1990.
[2] T. Koskela, M. Varsta, J. Heikkonen, and K. Kaski, "Recurrent SOM with local linear models in time series prediction," in 6th European Symposium on Artificial Neural Networks, 1998, pp. 167-172.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001, pp. 346-389.
[4] C. Y. Tsao and S.-H. Chen, "Self-organizing maps as a foundation for charting or geometric pattern recognition in financial time series," in 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. Proceedings, 2003, pp. 387-394.
[5] J. Lin, E. Keogh, and W. Truppel, "Clustering of streaming time series is meaningless," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003, pp. 56-65.
[6] B. Hammer, A. Micheli, A. Sperduti, and M. Strickert, "Recursive self-organizing network models," Neural Networks, vol. 17, no. 8-9, pp. 1061-1085, Oct. 2004.
[7] T. Warren Liao, "Clustering of time series data - a survey," Pattern Recognition, vol. 38, no. 11, pp. 1857-1874, Nov. 2005.
[8] A. Fonseka, D. Alahakoon, and S. Bedingfield, "GSOM sequence: An unsupervised dynamic approach for knowledge discovery in temporal data," in 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011, pp. 232-238.
[9] A. Cherif, H. Cardot, and R. Boné, "SOM time series clustering and prediction with recurrent neural networks," Neurocomputing, vol. 74, no. 11, pp. 1936-1944, May 2011.
[10] M. Varsta, J. R. Millán, and J. Heikkonen, "A recurrent self-organizing map for temporal sequence processing," Lecture Notes in Computer Science, vol. 1327, 1997, pp. 421-426.
[11] Website: Yahoo Finance, S&P 500 stock data. Accessed: 28/5 2014. URL: http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices
[12] Website: SOM application, Orange. Accessed: 28/5 2015. URL: http://orange.biolab.si/
[13] Website: List of largest changes in the S&P 500. Accessed: 29/5 2015. URL: http://en.wikipedia.org/wiki/List_of_largest_daily_changes_in_the_S%26P_500