MACHINE LEARNING PROJECT REPORT, JUNE 2014 1
SOM for Temporal Clustering Experiment
Henrik Grandin Aditya Hendra
Abstract—This paper is a report for the machine learning project course. In this project we apply unsupervised learning to historical S&P 500 index data to find unusual trading patterns. Such patterns could serve as an indicator that, during a certain period, the index was in a very unusual condition, for example during the 2008 financial crisis.
This paper discusses the unsuccessful results of our preliminary experiments using the SOM [1] as the unsupervised learning method, what we believe caused them, and our suggestions for future studies.
Keywords—Machine Learning, SOM, Unsupervised Learning,
Time Series, S&P500 index, STS Clustering
I. INTRODUCTION
Time series data is everywhere, especially in the data mining field, where we want to extract valuable and interpretable information from huge raw data sets. This is especially true for financial time series, since predicting financial trends from historical data has been one of the most prevalent ways to make a profit.
To learn more about financial time series data, we propose an experiment that applies a machine learning algorithm to the S&P 500 data set, to see whether such a time series contains unique data or patterns that can be clustered, e.g. a cluster of highly fluctuating index values during a financial crisis.
The goal of clustering is to identify structure in an unlabeled data set by objectively organizing the data into homogeneous groups, in which the within-group-object similarity is maximized and the between-group-object similarity is minimized. In this sense, clustering is sometimes called automatic classification [3].
Since it uses no class labels, clustering is also known as unsupervised learning, or learning by observation instead of by examples. Han and Kamber [3] classify clustering methods into five major categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
One model-based approach is the neural network [7], which includes competitive learning methods such as ART and self-organizing feature maps (SOM).
We chose the SOM because it is one of the most frequently used methods for clustering temporal sequences [9].
This report is organized as follows. Section 2 gives a brief introduction to SOMs. Section 3 explains the preprocessing approach we use. Section 4 describes the experiment design. Section 5 examines the results and their consequences. Section 6 suggests possible improvements, and Section 7 concludes the report.
Henrik Grandin is a student at Uppsala University.
Aditya Hendra is a student at Uppsala University.
II. SELF ORGANIZING MAP
B. Hammer, A. Micheli, A. Sperduti, and M. Strickert [6] note that the Self-Organizing Map (SOM) is the most frequently used method when dealing with clustering of temporal sequences, and it has also been used for time series clustering and prediction together with recurrent neural networks [9].
The SOM is an unsupervised learning method: it works without labels or examples to achieve automatic classification, data segmentation or vector quantization [4]. The SOM adopts a modified form of competitive learning, in which the node most similar to the input wins the competition and has its weights updated. The difference from normal competitive learning is that the SOM also updates the weights of the winner's neighbours, but by a smaller amount than the winner's.
C. Y. Tsao and S.-H. Chen [4] also mention, based on their literature review, that SOMs have proven to be an effective methodology for analyzing problems in finance, economics and marketing.
We therefore consider the SOM an appropriate tool for finding clusters that contain the so-called unusual trading data that typically occurs during a financial crisis.
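The competitive update described above can be sketched in a few lines. This is our own minimal illustration on a 1-D map (the report uses Orange's 2-D SOM); the function name and the `lr`/`sigma` parameters are assumptions, not part of any cited implementation.

```python
import numpy as np

# Minimal sketch of one SOM training step on a 1-D map of nodes.
def som_step(weights, x, lr=0.5, sigma=1.0):
    """Move map `weights` (m x d) toward input vector `x` (d,)."""
    # Competition: the node most similar to the input wins.
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Cooperation: neighbours are also updated, by less than the winner,
    # using a Gaussian falloff over grid distance to the winner.
    grid = np.arange(len(weights))
    h = np.exp(-((grid - winner) ** 2) / (2 * sigma ** 2))
    return weights + lr * h[:, None] * (x - weights), winner

rng = np.random.default_rng(0)
weights = rng.random((5, 3))            # 5 nodes, 3-dimensional inputs
x = np.array([1.0, 1.0, 1.0])
new_weights, winner = som_step(weights, x)
```

A full SOM training run repeats this step over the data while shrinking `lr` and `sigma`, which is what Orange's initial/final neighborhood settings control.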
III. DATA PREPROCESSING
The data we worked with consists, for each day, of the open, high, low and close values and the volume, downloaded as a CSV file from the Yahoo Finance web page [11].
Since a stock index is a non-linear time series (affected, for example, by inflation), working with the raw data would not give any useful results; we need to preprocess it.
It is important that no single attribute dwarfs the others, but at the same time we do not want to downplay a big change in a single attribute. The solution we ended up using is to work with each attribute's relative change from the previous day:

P_n = A_n / A_{n-1}    (1)

where P is the processed attribute and A is the raw data.
To study the stock over time, we need to group the days into sets. It is not obvious which set size would be optimal, so we have to try a number of different sizes. To ensure that we do not miss interesting patterns that run from the end of one set into the beginning of the next, neighbouring sets need to overlap.
All of this is handled by a Python script that reads a CSV file with the raw data. The script lets us choose the size of each set, the number of days two neighbouring sets have in common, and which of the attributes to use (open, high, low, close and volume).
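The two core operations of that script, Eq. (1) and the overlapping grouping, can be sketched as follows (our reconstruction; the function names and the toy closing prices are ours, not the script's):

```python
# Sketch of the preprocessing described above, on a toy closing-price series.
def relative_change(values):
    """Eq. (1): P_n = A_n / A_(n-1) for one attribute."""
    return [values[i] / values[i - 1] for i in range(1, len(values))]

def day_sets(series, size, overlap):
    """Group days into sets of `size` days; neighbours share `overlap` days."""
    step = size - overlap
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]

closes = [100.0, 102.0, 101.0, 103.0, 103.0, 104.0]   # six toy closing values
p = relative_change(closes)              # five daily ratios
sets = day_sets(p, size=3, overlap=2)    # slide the window by one day
```

With `overlap = size - 1` this produces exactly the one-day sliding described in Section IV.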
IV. EXPERIMENT DESIGN
We decided to use the software Orange [12], a Python application with a nice, easy-to-use graphical interface. It offers many classification methods, but we are only interested in its SOM implementation. We started experimenting in Matlab, but the SOM implementation in Matlab does not let us see exactly which input sets become associated with which nodes in the output map. This functionality is vital for our experiment, and Orange does provide it.
The SOM application in Orange can be customized to some extent. We can change the size and topology of the output map; the initial weights of the map can be either random or evenly distributed; and we can set the initial and final size of the neighborhood, as well as choose whether the neighborhood update function is a Gaussian or a "top hat" function. One setting that would have been helpful, but is not included in the Orange application, is the number of iterations for the tuning phase; only the total number of iterations can be set.
In our first experiment we used sets of 20 trading days (one month) with an overlap of 10 days between sets. The result was basically a random spread over the 2D map: no distances between the nodes emerged, and the number of sets in each node was roughly the same all over the map. This remained the case whatever settings we used for the SOM. After this experiment we concluded that we had to account for a particular problem with time series in a SOM.
If two sets are essentially identical except that one is delayed by one day, the SOM will not be able to recognize the similarity, since it only compares day one with day one, day two with day two, and so forth. The solution is to ensure that each day is used in every position of a set. With a set size of ten days, every day then appears in ten different sets, and two neighboring sets have nine days in common. In effect, we slide the data sets forward by one day.
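A toy example (ours, not from the experiments) makes the alignment problem concrete: under position-wise Euclidean comparison, the same spike pattern delayed by one day is rated *less* similar than a completely flat series.

```python
import math

# Position-wise Euclidean distance, as a SOM effectively uses on its inputs.
def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

spike   = [1.0, 1.0, 3.0, 1.0, 1.0]
delayed = [1.0, 1.0, 1.0, 3.0, 1.0]   # identical pattern, one day later
flat    = [1.0, 1.0, 1.0, 1.0, 1.0]

d_delayed = euclid(spike, delayed)    # the two spikes misalign
d_flat    = euclid(spike, flat)       # smaller, despite no spike at all
```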
With the new settings, 20 days with a 19-day overlap, the resulting 2D map remained the same. Using 20 days with five attributes per day gives 100 input dimensions for our SOM. We tried to reduce the dimensionality either by reducing the number of days in each set or by reducing the number of attributes tracked per day, trying a number of combinations between 5-20 days and 1-5 attributes, still sliding the data sets by one day.
Nevertheless, the results were for the most part as unimpressive as before; only when we used a single attribute did some resemblance of clusters start to appear. The problem is that the clusters are not really separated once you look at the days contained in each cluster. Since our sets overlap, each day is contained in multiple sets, and for the most part these sets do not end up in the same cluster. Most days are therefore present in most clusters, outliers as well as the main clusters.
Fig. 1: 20 days set, 10 days overlap, tracking open, high, low,
close and volume
Fig. 2: 5 days set, 4 days overlap, tracking open, high, low,
close and volume
Fig. 3: 20 days set, 19 days overlap, tracking closing value
Fig. 4: 5 days set, 4 days overlap, tracking amount of stocks
traded
These are samples of the resulting 2D maps from our SOM. The size of a circle represents how many sets are associated with that node. The colour of each node represents the distance to neighbouring nodes: light colours represent short distances, dark colours larger ones. Nodes without circles are there to indicate the distances between the nodes with circles.
Figures 1 and 2 were run while tracking all five attributes, and as a result we did not manage to find any clusters. A few of the corners have created distance from the rest of the map, but each of those nodes contains only a single set. These sets do not include any dates on the list of days with large changes in the S&P 500 index [13], and when looking at the values of each set, there was no apparent reason why they were outliers.
As figures 3 and 4 show, we do manage to get clusters when tracking only one attribute in the time series (the closing value in general created more distinct clusters than the volume).
V. RESULTS AND FAILURES
Overall, the SOM does not produce a deterministic cluster pattern that could indicate whether a cluster contains a group of unusual data, such as the index during the 2008 financial crisis. The resulting clusters contain data that looks more like ordinary random data, each cluster holding data from various times, as seen in the previous figures.
For preprocessing, we extract data sets from a single time series using the sliding-window method, creating many data sets with overlapping time. This method is also called sub-sequence clustering or STS (Subsequence Time Series) clustering. J. Lin, E. Keogh, and W. Truppel [5] claim that clustering of streaming time series is meaningless precisely because of the use of data sets extracted with the sliding-window method.
Shocking as it is, the claim comes with evidence that clustering sliding-window time series is essentially no different from clustering random-walk data. The paper argues that for any time series data set T, if T is clustered using sliding windows and the window length is very small compared to the length of the overall time series, then the mean of all subsequences will be an approximately constant vector. Although we have not formally tested this theorem, visual inspection of the SOM clusters indicates that each cluster contains generic data consisting of random samples from the data sets.
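The constant-mean effect is easy to check numerically. This is our own quick illustration on a synthetic random walk (not the paper's experiment): the element-wise mean of all width-20 sliding windows barely varies compared to the range of the data itself.

```python
import random

# Build a long synthetic random walk.
random.seed(1)
walk = [0.0]
for _ in range(2000):
    walk.append(walk[-1] + random.gauss(0, 1))

w = 20                                           # sliding-window width
windows = [walk[i:i + w] for i in range(len(walk) - w + 1)]

# Element-wise mean over all windows: one value per window position.
mean_vec = [sum(col) / len(windows) for col in zip(*windows)]

spread = max(mean_vec) - min(mean_vec)           # variation of the mean vector
data_range = max(walk) - min(walk)               # variation of the raw data
```

Because neighbouring window positions average almost the same set of points, `spread` is tiny relative to `data_range`, i.e. the mean vector is approximately constant, as the theorem predicts for short windows.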
Why this happens can be explained by introducing cluster_distance(A, B) and clustering_meaningfulness(X, Y). We use the following definitions:
• Let A = (ā_1, ā_2, ..., ā_k) be the cluster centers derived from one run of STS k-means.
• Let B = (b̄_1, b̄_2, ..., b̄_k) be the cluster centers derived from another, different run of STS k-means.
• Let dist(ā_i, b̄_j) be the distance between two cluster centers, measured with the Euclidean distance.
We can then define the distance between two sets of clusters as:

cluster_distance(A, B) ≡ Σ_{i=1}^{k} min[dist(ā_i, b̄_j)], 1 ≤ j ≤ k    (2)
We can use this distance to measure the similarity between two cluster sets. The experiment described in the paper uses k-means as the main clustering algorithm: three random restarts of k-means on a stock market data set were created and saved as set X, and another three random restarts on a random-walk data set were created and saved as set Y. Both sets are then processed as follows:
• within_set_X_distance is the average cluster_distance between one run in X and another run in X.
• between_set_X_and_Y_distance is the average cluster_distance between a run in X and a run in Y.
The relationship between these two quantities is expressed as:

clustering_meaningfulness(X, Y) ≡ within_set_X_distance / between_set_X_and_Y_distance    (3)
Since the numerator measures the distance between similar clusterings, its value should be very small, close to zero. The denominator, on the other hand, measures the distance between two different clusterings, so its value should be large; overall, clustering_meaningfulness(X, Y) should therefore be very close to zero.
The result reported in the paper is actually very different: the between_set_X_and_Y_distance values suggest that the X and Y sets are very similar. The paper also states that the experiments were repeated with many other clustering algorithms, including the SOM.
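Equations (2) and (3) can be sketched directly in code. This is our own implementation on made-up toy cluster centers, only to show the intended behaviour (repeatable runs give a ratio near zero); it is not the paper's data.

```python
import math

def dist(a, b):
    """Euclidean distance between two cluster centers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(A, B):
    """Eq. (2): for each center in A, the distance to its nearest center in B."""
    return sum(min(dist(a, b) for b in B) for a in A)

def meaningfulness(X, Y):
    """Eq. (3): average within-X distance over average X-to-Y distance."""
    within = [cluster_distance(X[i], X[j])
              for i in range(len(X)) for j in range(len(X)) if i != j]
    between = [cluster_distance(A, B) for A in X for B in Y]
    return (sum(within) / len(within)) / (sum(between) / len(between))

# Three runs with nearly identical centers (repeatable clustering) vs.
# three runs whose centers lie far away:
X = [[(0.0, 0.0), (10.0, 10.0)],
     [(0.1, 0.0), (10.0, 10.1)],
     [(0.0, 0.1), (9.9, 10.0)]]
Y = [[(50.0, 50.0), (60.0, 60.0)]] * 3
m = meaningfulness(X, Y)              # close to zero, as expected here
```

The paper's surprising finding is that on sliding-window data this ratio does *not* stay near zero: X and Y come out nearly indistinguishable.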
One suggested root cause is that "STS clustering algorithms are simply returning a set of basis functions that can be added together in a weighted combination to approximate the original data."
VI. POSSIBLE IMPROVEMENTS
One problem with the regular SOM is that it has no sense of time. When the weights of a node are updated, the update is based on the node's current position, which is basically just the sum of all its previous movements and its starting position. When working with time series, it makes sense to change this: patterns from the last month should have more impact than patterns from ten years ago. The recurrent self-organizing map (RSOM) is an alteration of the regular SOM that aims to fix this. The RSOM update rule looks like [10]:
y_i(t) = a Σ_{k=0}^{n-1} (1−a)^k (x(t−k) − w_i(t−k)) + (1−a)^n y_i(t−n)    (4)
where x(t) is the input pattern at iteration t, w_i(t) is the weight vector of node i at iteration t, x(t) − w_i(t) is the movement needed to move node i to x(t), and a, 0 < a ≤ 1, determines the impact of older movements. When a approaches 1 we discard old movements and the system is a short-term memory; when a approaches 0 the system is a long-term memory. We did not manage to find any application with an RSOM implementation, and we did not have enough time to implement it ourselves, but it seems reasonable that it would at least improve our clusters.
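The core of Eq. (4) is easier to see in its equivalent recursive form, y_i(t) = (1 − a)·y_i(t−1) + a·(x(t) − w_i(t)); unrolling that recursion over n steps reproduces the sum above. The sketch below is our own (node weights are held fixed purely for clarity; a real RSOM would also pick winners and update w):

```python
import numpy as np

# Leaky difference vectors y_i(t) for every node i, per Eq. (4)'s recursion.
def rsom_responses(inputs, weights, a=0.5):
    """Feed `inputs` in order; older movements decay by a factor (1 - a)."""
    y = np.zeros_like(weights)
    for x in inputs:
        y = (1 - a) * y + a * (x - weights)
    return y

w = np.zeros((2, 2))                     # 2 nodes, 2-dimensional inputs
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
y_short = rsom_responses(xs, w, a=1.0)   # a -> 1: only the newest input counts
y_mixed = rsom_responses(xs, w, a=0.5)   # older inputs decay geometrically
```

In an RSOM, the winning node is the one minimizing the norm of y_i, so recent inputs dominate the competition exactly as the text describes.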
When we used multiple attributes in our time series (open, high, low, close and volume), our SOM did not manage to create any clusters, and we have not found any paper that used more than one attribute in its time series.
One problem with multiple attributes is of course that the input dimension of the SOM greatly increases. However, a 25-day time series with one attribute created better clusters than five attributes in a five-day time series; if the problem were only dimensionality, the two should be comparable. We therefore propose an experiment with a different approach to multiple-attribute time series.
The first step is to run the SOM on each attribute individually and define clusters in the output map for each attribute. The next step is to combine clusters from different attributes: if two clusters from two different attributes have 80% of their members in common, create a new cluster from that 80%, and put the remaining 20% from each of the two original clusters into two smaller clusters. We imagine there might be quite a lot of clusters, so it might be necessary to define a distance measure between clusters that makes it possible to merge clusters that get too close to each other.
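The proposed merge step can be sketched as follows. This is our interpretation only: clusters are modelled as sets of day indices, and "80% in common" is read as the intersection covering at least 80% of the smaller cluster, which is a modelling assumption on our part.

```python
# Merge two clusters (sets of day indices) from two different attributes.
def merge_pair(c1, c2, threshold=0.8):
    common = c1 & c2
    if len(common) / min(len(c1), len(c2)) >= threshold:
        # Shared days form a new cluster; leftovers become smaller clusters.
        return [common, c1 - common, c2 - common]
    return [c1, c2]   # not enough overlap: keep both clusters unchanged

attr1_cluster = set(range(0, 10))    # days 0-9, clustered on one attribute
attr2_cluster = set(range(2, 12))    # days 2-11, clustered on another
merged = merge_pair(attr1_cluster, attr2_cluster)
```

Applying this pairwise across all attribute maps would produce the proliferation of small clusters mentioned above, hence the need for a follow-up distance-based merge.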
Another simple modification we can suggest is not to use the sliding-window approach at all, but instead to use subsequences that are randomly extracted [5].
A quite different approach is to combine clustering (SOM) with a recurrent neural network (RNN) algorithm [9]. In that work, the SOM is used for temporal sequence processing and classification, and a recurrent neural network is associated with each created cluster as a predictor. The authors also state that the RNN uses an "internal feedback mechanism that creates an implicit memory that contributes to the prediction", which means that using the data in its sequentially correct order is obligatory for correct predictions. Although this approach also uses a sliding temporal window, we do not know how that choice affects the overall approach, or whether a non-sliding window should be used.
VII. CONCLUSION
For this project, we tried to find unique clusters in the S&P 500 time series using a Self-Organizing Map. We used the daily opening, closing, highest and lowest index values, together with the daily trading volume, but the resulting clusters did not have any distinguishable value. One of the main causes of the problem is probably the use of the sliding-window method.
One reason we think the approach of A. Cherif, H. Cardot, and R. Bone [9] could work is that their experiments use a well-known time series, the Mackey-Glass series, which visually looks very similar to the S&P 500 index chart. The usefulness of their approach on a time series so similar to the S&P 500 makes it worth mentioning.
ACKNOWLEDGMENT
The authors would like to thank Joseph Scott for supervising
our project and our lecturer Olle Gallmo for his teaching during
the course.
REFERENCES
[1] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sep. 1990.
[2] T. Koskela, M. Varsta, J. Heikkonen, and K. Kaski, "Recurrent SOM with Local Linear Models in Time Series Prediction," in 6th European Symposium on Artificial Neural Networks, 1998, pp. 167-172.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001, pp. 346-389.
[4] C. Y. Tsao and S.-H. Chen, "Self-organizing maps as a foundation for charting or geometric pattern recognition in financial time series," in 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. Proceedings, 2003, pp. 387-394.
[5] J. Lin, E. Keogh, and W. Truppel, "Clustering of streaming time series is meaningless," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003, pp. 56-65.
[6] B. Hammer, A. Micheli, A. Sperduti, and M. Strickert, "Recursive self-organizing network models," Neural Networks, vol. 17, no. 8-9, pp. 1061-1085, Oct. 2004.
[7] T. Warren Liao, "Clustering of time series data - a survey," Pattern Recognition, vol. 38, no. 11, pp. 1857-1874, Nov. 2005.
[8] A. Fonseka, D. Alahakoon, and S. Bedingfield, "GSOM sequence: An unsupervised dynamic approach for knowledge discovery in temporal data," in 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011, pp. 232-238.
[9] A. Cherif, H. Cardot, and R. Bone, "SOM time series clustering and prediction with recurrent neural networks," Neurocomputing, vol. 74, no. 11, pp. 1936-1944, May 2011.
[10] M. Varsta, J. R. Millán, and J. Heikkonen, "A Recurrent Self-Organizing Map for Temporal Sequence Processing," Lecture Notes in Computer Science, vol. 1327, 1997, pp. 421-426.
[11] Website: Yahoo Finance, S&P 500 stock data. Accessed: 28/5 2014. URL: http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices
[12] Website: SOM application, Orange. Accessed: 28/5 2015. URL: http://orange.biolab.si/
[13] Website: List of largest changes in the S&P 500. Accessed: 29/5 2015. URL: http://en.wikipedia.org/wiki/List_of_largest_daily_changes_in_the_S%26P_500

More Related Content

Similar to Machine_Learning_Project_Report

Seminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdfSeminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdfIkedaYuki
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstracttsysglobalsolutions
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdfLeonardo Auslender
 
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...Dr. Amarjeet Singh
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfDatacademy.ai
 
Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1victor tang
 
operation research notes
operation research notesoperation research notes
operation research notesRenu Thakur
 
Clustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataClustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataNexgen Technology
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning ProjectEng Teong Cheah
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basicsNeeleEilers
 
Hyatt Hotel Group Project
Hyatt Hotel Group ProjectHyatt Hotel Group Project
Hyatt Hotel Group ProjectErik Bebernes
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsCSCJournals
 
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...CSCJournals
 
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...ijtsrd
 
Module 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rModule 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rIlonaThornburg83
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Jisu Han
 

Similar to Machine_Learning_Project_Report (20)

Seminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdfSeminar_Koga_Yuki_v2.pdf
Seminar_Koga_Yuki_v2.pdf
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
 
A Tour through the Data Vizualization Zoo - Communications of the ACM
A Tour through the Data Vizualization Zoo - Communications of the ACMA Tour through the Data Vizualization Zoo - Communications of the ACM
A Tour through the Data Vizualization Zoo - Communications of the ACM
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf
 
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
Exploring the Impact of Magnitude- and Direction-based Loss Function on the P...
 
Top 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdfTop 100+ Google Data Science Interview Questions.pdf
Top 100+ Google Data Science Interview Questions.pdf
 
Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1
 
operation research notes
operation research notesoperation research notes
operation research notes
 
Clustering big spatiotemporal interval data
Clustering big spatiotemporal interval dataClustering big spatiotemporal interval data
Clustering big spatiotemporal interval data
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 
Machine Learning basics
Machine Learning basicsMachine Learning basics
Machine Learning basics
 
Hyatt Hotel Group Project
Hyatt Hotel Group ProjectHyatt Hotel Group Project
Hyatt Hotel Group Project
 
Data Structures for Robotic Learning
Data Structures for Robotic LearningData Structures for Robotic Learning
Data Structures for Robotic Learning
 
Finding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster ResultsFinding Relationships between the Our-NIR Cluster Results
Finding Relationships between the Our-NIR Cluster Results
 
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
Software Defect Trend Forecasting In Open Source Projects using A Univariate ...
 
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...
 
Poster
PosterPoster
Poster
 
Module 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, rModule 04 Content· As a continuation to examining your policies, r
Module 04 Content· As a continuation to examining your policies, r
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
Preference learning for guiding the tree searches in continuous POMDPs (CoRL ...
 

Machine_Learning_Project_Report

  • 1. MACHINE LEARNING PROJECT REPORT, JUNE 2014 1 SOM for Temporal Clustering Experiment Henrik Grandin Aditya Hendra Abstract—This paper is a report for the machine learning project course. In this project we are going to use unsupervised learning on the S&P 500 historical index data to find unusual trading pattern. This could be used as an indicator, that during a certain period, the index was in a very unusual condition, for example during financial crisis 2008. This paper will discuss the unsuccessful results of our prelim- inary experiments using SOM[1] as the unsupervised learning method, what we think as the causes for it and our suggestion for future studies. Keywords—Machine Learning, SOM, Unsupervised Learning, Time Series, S&P500 index, STS Clustering I. INTRODUCTION Time series data is everywhere, especially in the data mining field where we want to get valuable and interpretable information from huge raw data sets. It is especially true for financial time series, for predicting financial trend based on past historical data has been one of the most prevalent ways to make profit. In order to learn more about this financial time series data, we propose an experiment using machine learning algorithm on S&P500 data set to see whether such time series data has a unique data or pattern that could be clustered, i.e: a cluster of very fluctuate index during financial crisis. The goal of clustering is to identify structure in an unlabeled data set by objectively organizing data into homogeneous groups where the within-group-object similarity is minimized and the between-group-object dissimilarity is maximized. In this sense, clustering is sometimes called automatic classifica- tion [3]. Without using any class label, clustering is also known as unsupervised learning or learning by observations instead of by examples. 
Han and Kamber [3] classified clustering methods into five major categories: partitioning methods, hierarchical methods, density based methods, grid-based methods, and model-based methods. One of model-based methods approach is neural network[7], which consists of competitive learning, including ART and self-organizing feature maps (SOM). We choose SOM because it is one of the most frequently used method for clustering temporal sequences [9]. This report is organized as follows. Section 2 gives a brief introduction to SOMs. Section 3 explain the preprocessing approach we use. Section 4 describes the experiment design. Section 5 examines the result and consequences from it. Section 6 gives possible improvement suggestion and section 7 concludes the report. Henrik. Grandin is a student at Uppsala University Aditya. Hendra is a student at Uppsala University II. SELF ORGANIZING MAP B. Hammer, A. Micheli, A. Sperduti, and M. Strickert [6] used Self Organizing Map (SOM) for time series clustering prediction with recurrent neural networks, it is said that SOM is the most frequently used method when dealing with clustering of temporal sequences. The SOM is part of unsupervised learning which works without using labels or examples to achieve auto classification, data segmentation or vector quantification [4]. SOM adopts a modified competitive learning, where nodes which have the most similarity with inputs win the competition and its weight get updated. A difference with normal competitive learning is that, SOM also updates its neighbourhoods’ weight with value less than the winner. It is also mentioned by C. Y. Tsao and S. H. Chen [4], from their literature research, that SOMs have been proven to be an effective methodology to analyze problems in finance, economics and marketing. Therefore, we think it is appropriate to use SOM to find clusters that contains so-called unusual trading data which usually happens during financial crisis. III. 
DATA PREPROCESSING The data that we have worked on for each day is open, high, low, close, and volume. It was downloaded to a csv file from the Yahoo Finance web page [11]. Since a stock exchange is a non linear time series (inflation), working with the raw data would not give any results. We need to process it in some sense. It is important that one attribute does not dwarf the other attributes. At the same time, we do not want to make light when a big change is happening in a single attribute. The solution we ended up using was to work with the the percent of change of an attribute from the previous day. Pn = An An−1 (1) Where P is the processed attribute and A is the raw data. To be able to study the stock over time we need to group the days into sets of days. It is not obvious what size of these sets would be optimal so we will have to try a number of different sizes. To ensure that we do not miss interesting patterns because they are happening from the end of one set into the beginning of another, we need to have overlapping between the sets. This is all solved with a python script that reads a csv file with the raw data. It gives us the possibility to decide the size of each set, the number of days two neighboring sets should have in common and which of the attributes we wish to use (open, high, low, close and volume).
IV. EXPERIMENT DESIGN

We decided to use the software Orange [12]. It is written in python and has a nice, easy-to-use graphical interface. It offers many classification methods, but we are only interested in its SOM implementation. We started experimenting in Matlab, but the SOM implementation in Matlab does not let us see exactly which input sets get associated with which nodes in the output map. This functionality is vital for our experiment, and Orange does provide it. The SOM application in Orange can be customized to some extent. We can change the size and topology of the output map. The initial weights of the map can be either random or evenly distributed. We can set the initial and final size of the neighborhood, and choose whether the neighborhood function is a Gaussian or a "top hat" function. One setting that would have been helpful, but unfortunately is not included in the Orange application, is the number of iterations for the tuning phase; only the total number of iterations can be set.

In our first experiment we used sets of 20 days of trading (one month) with an overlap of 10 days between consecutive sets. The result was essentially a random spread over the 2D map. No distance between the nodes had been created, and the number of sets in each node was roughly the same all over the map. This remained the case with whatever settings we used on the SOM.

After this experiment we concluded that we have to account for a known problem with time series in a SOM: if two sets are essentially identical, except that one set is delayed by one day, the SOM will not recognize the similarity, since it only compares day one with day one, day two with day two, and so forth. The solution is to ensure that each day appears in each position of a set.
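The alignment problem is easy to demonstrate: under the element-wise comparison a SOM performs, a pattern and the same pattern delayed by one day are far apart (illustrative data):

```python
import numpy as np

pattern = np.sin(np.linspace(0, 4 * np.pi, 20))  # some 20-day pattern
delayed = np.roll(pattern, 1)                    # identical shape, delayed one day

# The SOM compares day 1 with day 1, day 2 with day 2, and so on,
# so it sees a large Euclidean distance despite the identical shape.
distance = np.linalg.norm(pattern - delayed)
print(distance)
```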
If we have a set size of ten days, every day will appear in ten different sets, and two neighboring sets will have nine days in common; in effect we slide the window forward by one day. With the new settings, 20 days with 19 days of overlap, the resulting 2D map remained the same. With 20 days and five attributes per day, the SOM input has 100 dimensions. We tried to reduce the dimensionality by either reducing the number of days in each set or reducing the number of attributes tracked per day. We tried a number of combinations between 5-20 days and 1-5 attributes, still sliding the sets by one day.

Nevertheless, the result was for the most part as unimpressive as before; only when we used a single attribute did some resemblance of clusters start appearing. The problem is that the clusters are not really separated once you look at the days contained in each cluster. Since our sets overlap, each day is contained in multiple sets, and for the most part these sets do not end up in the same cluster. Most days are therefore present in most clusters, outliers as well as the main clusters.

Fig. 1: 20 days set, 10 days overlap, tracking open, high, low, close and volume
Fig. 2: 5 days set, 4 days overlap, tracking open, high, low, close and volume
Fig. 3: 20 days set, 19 days overlap, tracking closing value
Fig. 4: 5 days set, 4 days overlap, tracking amount of stocks traded

These are samples of the resulting 2D maps from our SOM. The size of a circle represents how many sets are associated with that node. The colour of each node represents the distance between nodes: light colours indicate short distances, dark colours larger ones. Nodes without circles are there to indicate distances between the nodes with circles. Figures 1 and 2 were produced while tracking all five attributes, and as a result no clusters were found. A few of the corners have created distance from the rest of the map, but each such node holds only one set. These sets do not include any dates on the list of days with large changes in the S&P 500 index [13], and when inspecting the values of each set there was no apparent reason why they were outliers. As figures 3 and 4 show, we do manage to get clusters when tracking only one attribute in the time series (the closing value in general created more distinct clusters than the volume).

V. RESULTS AND FAILURES

Overall, the SOM does not produce a deterministic cluster pattern that could indicate whether a cluster contains a group of unusual data, such as the index during the 2008 financial crisis. The resulting clusters contain data that looks more like normal random data, each holding data from various times, as seen in the previous figures. In our preprocessing we extract data sets from a single time series using the sliding-window method to create more, overlapping data sets. This method is also called sub-sequence clustering or STS (Subsequence Time Series) clustering. J. Lin, E. Keogh, and W. Truppel [5] claim that clustering of streaming time series is meaningless precisely because of the use of data sets extracted with a sliding window.
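A quick numerical illustration of the claim (with a random stand-in series, since the effect is independent of the data): the element-wise mean of all windows extracted by sliding is nearly a constant vector, i.e. it carries no shape information for a clustering algorithm to exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=5000)   # stand-in for a long preprocessed series
w = 20                           # window length, small relative to the series

# all length-w subsequences, extracted by sliding the window one step at a time
windows = np.lib.stride_tricks.sliding_window_view(series, w)

mean_window = windows.mean(axis=0)
print(mean_window.std())         # close to 0: the mean window is almost flat
```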
Shocking as it is, the claim is backed by proof that clustering sliding-window time series is essentially no different from clustering random-walk data. The paper shows that for any time series data set T, if T is clustered using sliding windows and the window length is very small compared to the length of the overall series, then the mean of all extracted subsequences is an approximately constant vector. Although we have not tested this theorem ourselves, visual inspection of the SOM clusters indicates that each cluster contains generic data sampled from all over the series. Why this happens can be explained by introducing cluster_distance(A, B) and cluster_meaningfulness(X, Y). We will use the following definitions:

• Let A = (ā1, ā2, ..., āk) be the cluster centers derived from one run of STS k-means.
• Let B = (b̄1, b̄2, ..., b̄k) be the cluster centers derived from a different run of STS k-means.
• Let dist(āi, b̄j) be the distance between two cluster centers, measured with the Euclidean distance.

Then the distance between two sets of clusters is defined as:

cluster_distance(A, B) ≡ Σ_{i=1}^{k} min[ dist(āi, b̄j) ], 1 ≤ j ≤ k    (2)

We can use this distance to tell us the similarity between two sets of clusters. The experiment described in the literature
uses k-means as the clustering algorithm: three random restarts of k-means on a stock market data set were saved as set X, and another three random restarts on a random-walk data set were saved as set Y. Both sets are then processed as follows:

• within set X distance is the average cluster_distance between one clustering in X and the other clusterings in X.
• between set X and Y distance is the average cluster_distance between a clustering in X and a clustering in Y.

The relationship between these two quantities is:

cluster_meaningfulness(X, Y) ≡ within set X distance / between set X and Y distance    (3)

Since the numerator measures the distance between similar clusterings, its value should be very small, close to zero. The denominator, on the other hand, measures the distance between two different kinds of clusterings, so its value should be large; overall, cluster_meaningfulness(X, Y) should therefore be very close to zero. The result reported in the literature is very different: the between set X and Y distance suggests that the sets X and Y are very similar. The paper adds that the experiments were repeated with many other clustering algorithms, including SOM. One suggested root cause is that "STS clustering algorithms are simply returning a set of basis functions that can be added together in a weighted combination to approximate the original data."

VI. POSSIBLE IMPROVEMENTS

One problem with the regular SOM is that it has no sense of time. When the weights of a node are updated, the update is based on the node's current position, which is just the sum of all previous movements and the starting position. When working with time series it makes sense to change this: patterns from the last month should have more impact than patterns from 10 years ago.
The recurrent self-organizing map (RSOM) is an alteration of the regular SOM that aims to fix this. The RSOM update rule is [10]:

y_i(t) = a Σ_{k=0}^{n-1} (1-a)^k (x(t-k) - w_i(t-k)) + (1-a)^n y_i(t-n)    (4)

where x(t) is the input pattern at iteration t, w_i(t) is the weight vector of node i at iteration t, x(t) - w_i(t) is the movement needed to move node i to x(t), and a, 0 < a ≤ 1, determines the impact of older movements; equation 4 is simply the unrolled form of the one-step recursion y_i(t) = a (x(t) - w_i(t)) + (1-a) y_i(t-1). When a approaches 1, old movements are discarded and the system acts as a short-term memory; when a approaches 0, the system acts as a long-term memory. We did not find any application with RSOM implemented, and we did not have enough time to implement it ourselves, but it seems reasonable that it would at least improve our clusters.

When we used multiple attributes in our time series (open, high, low, close and volume), our SOM did not manage to create any clusters, and we have not found any paper that used more than one attribute in its time series. One problem with multiple attributes is of course that the input dimension of the SOM grows quickly. But a 25-day time series with one attribute created better clusters than a five-day time series with five attributes; if the problem were only dimensionality, the two should be comparable. We therefore propose an experiment with a different approach to multiple-attribute time series.

The first step is to run the SOM on each attribute individually and define clusters on the output map for each attribute. The next step is to combine clusters from different attributes: if two clusters from two different attributes have 80% of their days in common, create a new cluster from that 80%, and put the remaining 20% from the two original clusters into two smaller clusters. We imagine that this may produce quite a lot of clusters, so it might be necessary to define a distance measure between clusters that makes it possible to merge clusters that get too close to each other.
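The combination step we propose might be sketched as follows (a hypothetical helper, not something we implemented; clusters are represented as sets of days, and the 80% threshold is the one suggested above):

```python
def combine(c1, c2, threshold=0.8):
    """If two clusters from different attributes share at least `threshold`
    of their days, split them into the common part plus two smaller
    leftover clusters; otherwise leave them alone (return None)."""
    common = c1 & c2
    if len(common) / max(len(c1), len(c2)) >= threshold:
        return common, c1 - common, c2 - common
    return None

# usage: two 10-day clusters (days numbered for illustration) sharing 9 days
a, b = set(range(10)), set(range(1, 11))
merged = combine(a, b)   # -> ({1, ..., 9}, {0}, {10})
```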
Another simple modification we could suggest is to not use the sliding-window approach at all, and instead use subsequences extracted at random positions [5].

A quite different approach is to combine clustering (SOM) with a recurrent neural network (RNN) algorithm [9]. In that work the SOM is used for temporal sequence processing and classification, and a recurrent neural network is associated with each created cluster as a predictor. The literature also states that the RNN uses an "internal feedback mechanism that creates an implicit memory that contributes to the prediction," which means that feeding the data in the sequentially correct order is obligatory for correct predictions. Although this approach also uses a sliding temporal window, we do not know how that choice affects the overall approach, or whether a non-sliding window should be used instead.

VII. CONCLUSION

In this project we tried to find unique clusters in the S&P 500 time series using the Self-Organizing Map. We used the daily opening, closing, highest and lowest index values as well as the daily trading volume, but the resulting clusters did not have any distinguishable character. The main cause of the problem is probably the use of the sliding-window method. One reason we think the approach of A. Cherif, H. Cardot, and R. Boné [9] could work is that their experiments use a well-known time series, the Mackey-Glass series, which visually looks very similar to the S&P 500 index chart. The usefulness of their approach on a time series similar to the S&P 500 makes it worth mentioning.

ACKNOWLEDGMENT

The authors would like to thank Joseph Scott for supervising our project and our lecturer Olle Gällmo for his teaching during the course.
REFERENCES

[1] T. Kohonen, "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sep. 1990.
[2] T. Koskela, M. Varsta, J. Heikkonen, and K. Kaski, "Recurrent SOM with local linear models in time series prediction," in 6th European Symposium on Artificial Neural Networks, 1998, pp. 167-172.
[3] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001, pp. 346-389.
[4] C. Y. Tsao and S.-H. Chen, "Self-organizing maps as a foundation for charting or geometric pattern recognition in financial time series," in 2003 IEEE International Conference on Computational Intelligence for Financial Engineering. Proceedings, 2003, pp. 387-394.
[5] J. Lin, E. Keogh, and W. Truppel, "Clustering of streaming time series is meaningless," in Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003, pp. 56-65.
[6] B. Hammer, A. Micheli, A. Sperduti, and M. Strickert, "Recursive self-organizing network models," Neural Networks, vol. 17, no. 8-9, pp. 1061-1085, Oct. 2004.
[7] T. Warren Liao, "Clustering of time series data - a survey," Pattern Recognition, vol. 38, no. 11, pp. 1857-1874, Nov. 2005.
[8] A. Fonseka, D. Alahakoon, and S. Bedingfield, "GSOM sequence: An unsupervised dynamic approach for knowledge discovery in temporal data," in 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), 2011, pp. 232-238.
[9] A. Cherif, H. Cardot, and R. Boné, "SOM time series clustering and prediction with recurrent neural networks," Neurocomputing, vol. 74, no. 11, pp. 1936-1944, May 2011.
[10] M. Varsta, J. R. Millán, and J. Heikkonen, "A recurrent self-organizing map for temporal sequence processing," Lecture Notes in Computer Science, vol. 1327, 1997, pp. 421-426.
[11] Website: Yahoo Finance, S&P 500 stock data. Accessed: 28/5 2014. URL: http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices
[12] Website: SOM application, Orange. Accessed: 28/5 2015. URL: http://orange.biolab.si/
[13] Website: List of largest changes in the S&P 500. Accessed: 29/5 2015. URL: http://en.wikipedia.org/wiki/List_of_largest_daily_changes_in_the_S%26P_500