Dissertation summary                                                                Francesco Azzena

Summary Of Thesis



  1. Neural Networks for Predicting Financial Series. A case of study: S&P Mib Index. The subject of this thesis is the study of neural networks for forecasting purposes. These networks are a system of mathematical-statistical modelling capable of describing the relationship between one or more output variables and a set of inputs. This type of modelling is distinctive in that no direct relationship is established between inputs and outputs: one or more hidden layers are placed between them, containing calculating units called neurons, which create fictitious variables. These models have experienced fluctuating fortunes over the years. The first attempts appeared in the 1940s as simple linear combinations of inputs. Gradually, neural networks were refined, from the famous Perceptron models to multi-layer models with nonlinear activation functions. These models were built through many phases of training, whose purpose was to make the network learn the relationship between the variables under consideration. At a time when proper computational tools had not yet been conceived, neural networks slowly lost the attention of statisticians, above all because of their great complexity. A second breakthrough came at the end of the 1980s, almost unexpectedly: George Cybenko, an American professor of mathematics, proved through the study of the properties of the sigmoid function that, if well constructed, neural networks with such activation functions and one hidden layer could approximate any nonlinear data-generating process with an arbitrarily small margin of error. This conclusion received great prominence: working with nonlinear data-generating processes is always problematic precisely because of the difficulty of establishing a good model to use.
Neural networks can approximate any type of nonlinearity in the series, in theory at least, through the learning system and the nonlinearity of the activation functions between the layers. For this reason, neural networks have made a comeback in many scientific fields, from economics to medicine, biology, meteorology and many others. We therefore thought it would be interesting to investigate in our work the theoretical basis that underpins this tool, so much in vogue nowadays.
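The approximation result described above can be written compactly. The following is a sketch of the usual statement of Cybenko's theorem, not a quotation from the thesis: the symbols $G$, $f$, $N$, $\alpha_j$, $w_j$, $\theta_j$ and $\varepsilon$ are notation introduced here, and the result is stated for continuous target functions $f$ on the unit hypercube $[0,1]^n$, a qualifier the summary leaves implicit. For every such $f$ and every $\varepsilon > 0$ there exist a number of hidden neurons $N$ and parameters such that:

```latex
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left(w_j^{\top} x + \theta_j\right),
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}},
\qquad
\sup_{x \in [0,1]^n} \left| G(x) - f(x) \right| < \varepsilon .
```

The theorem guarantees that suitable weights exist; it does not say how many neurons are needed or how to find the weights by training.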
  2. With this objective in mind, we decided to produce a case study that would improve our knowledge of the issues related to the construction of a neural network. This required the use of particular computer tools. Notably, we chose to write a program in the Matlab language instead of using dedicated statistical packages. As this thesis was written within the framework of a degree profile in "Banking and Finance", we decided to focus on the most important index of the Milan stock exchange, the Standard & Poor's Mib (S&P Mib), and to attempt to make good predictions. The choice may seem bold: according to the theory of the weak efficiency of the markets, we should not be able to provide a proper model for a variable of this kind, since it should follow a random walk and therefore have completely random variations, just like white noise. We nonetheless decided to make this attempt, hoping not only to learn about neural networks, but also to make interesting predictions or, in the worst-case scenario, to demonstrate the correctness of the theory under consideration. More specifically, the problem we tried to solve was how to predict the variation at the close of the Italian stock exchange by using the information available before the opening time. The variables taken into account were the latest available closing values of the Tokyo and New York stock exchanges, together with the exchange rates of the Euro against the Yen and the Dollar published by the European Central Bank. If that modelling worked, a hypothetical operator in the Milan stock exchange would be able to anticipate the market successfully, speculating by taking "short" or "long" positions on securities highly correlated with the market.
The structure of the thesis is rather simple: after a brief introduction in the first chapter, we describe, in the following chapter, the history of the ideas related to neural networks, with a short digression on the biological phenomenon of the same name. We then analyze the historical process by which, from the first simple neural networks, more and more complex networks came to be used. The structure of neural networks has, indeed, evolved over time. Initially, there was the model of the physiologists McCulloch and Pitts (MCP), which was simply a weighted sum compared with a threshold value; this allowed for a binary response comparable to the activity of the brain, which works through electrical impulses. As this methodology is suitable only for linearly separable problems, Rosenblatt proposed in 1958 the Perceptron model: for the first time, the idea of a hidden level between inputs and outputs appeared. The composition of the inputs remained a simple weighted sum, but by increasing the number of neurons, and thus of summations, it became possible to delimit a precise area in the Cartesian plane and so solve problems that are not linearly separable.
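The threshold unit described above, and the effect of adding a hidden level, can be illustrated with a minimal Python sketch. It is not code from the thesis (which used Matlab): the function name `mcp_neuron` and the weights and thresholds are hand-picked assumptions for illustration. A single unit represents the linearly separable AND function, while XOR, which is not linearly separable, needs two hidden units feeding a third.

```python
def mcp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


def xor_network(a, b):
    """Two hidden threshold units (OR and AND) combined by an output unit."""
    h_or = mcp_neuron((a, b), (1, 1), 1)    # fires if at least one input is 1
    h_and = mcp_neuron((a, b), (1, 1), 2)   # fires only if both inputs are 1
    # Output fires when OR is on and AND is off: 1*h_or - 1*h_and >= 1.
    return mcp_neuron((h_or, h_and), (1, -1), 1)


pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [mcp_neuron(p, (1, 1), 2) for p in pairs]
xor_outputs = [xor_network(a, b) for a, b in pairs]
print(and_outputs)  # [0, 0, 0, 1]
print(xor_outputs)  # [0, 1, 1, 0]
```

Each hidden unit draws one straight line in the plane; the output unit keeps only the region between the two lines, which is what a single weighted sum cannot do.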
  3. The Perceptron idea opened the way for increasingly complex solutions. Today, the number of hidden layers and the number of neurons in each of them depend only on the structure chosen by the user. In addition, in every neuron we can now apply a freely chosen function to the weighted sum of the outputs of the previous level. This high degree of customization has allowed neural networks to adapt easily to many types of studies, thus finding applications in many branches of science. In the next part of the thesis, we explain the characteristics that distinguish the main types of neural networks: the presence or absence of a teacher (which allows the model to learn) distinguishes supervised networks (the most common type, and the one on which we focussed our attention) from unsupervised ones, which are less efficient but have the advantage of operating in real time without any human intervention. After an overview of these models, we analyze in detail the various components of a common supervised neural network, with particular attention to the activation functions and the weights. When we create a model of this type, the first decisions we have to take concern the structure: the high level of customization allows us to create the network we believe is best suited to the series under study. Once the number of layers and the number of hidden neurons in each of them have been determined, another important choice concerns the activation functions. As already mentioned, these are the functions that we apply to the weighted sum of inputs in each neuron of the network. There exist various types of such functions, of which we give a general picture. The most important to date is the sigmoid, which is the subject of a fundamental theoretical result.
Neural networks had almost fallen into disuse when the mathematician George Cybenko demonstrated, in an article focussing on the properties of the sigmoid, that a neural network with this type of activation function, if well structured and with the right choice of variables, is a universal approximator. Such networks are therefore able to approximate a nonlinear data-generating function with a margin of error smaller than an arbitrarily chosen small epsilon. At the end of the second chapter of the thesis, we explain the methods of training the weights in the network: when we create the network, the weights of the sum in each neuron are chosen randomly. Subsequently, by comparing the estimated values with the values supplied by the teacher, the weights are adjusted until they reach their best estimate.
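The structure described above (a weighted sum in each neuron, a sigmoid activation, and randomly chosen starting weights) can be sketched in a few lines of Python. This is an illustration only, not the thesis's Matlab program: the function names, the initialisation range of [-0.5, 0.5], the linear output neuron and the three example inputs are all assumptions made here.

```python
import math
import random


def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, rng):
    """Draw every weight and bias at random (assumed range: [-0.5, 0.5])."""
    def draw():
        return rng.uniform(-0.5, 0.5)
    hidden = [{"w": [draw() for _ in range(n_inputs)], "b": draw()}
              for _ in range(n_hidden)]
    output = {"w": [draw() for _ in range(n_hidden)], "b": draw()}
    return {"hidden": hidden, "output": output}


def forward(network, inputs):
    """One hidden layer of sigmoid neurons followed by a linear output neuron."""
    hidden_values = [
        sigmoid(sum(w * x for w, x in zip(neuron["w"], inputs)) + neuron["b"])
        for neuron in network["hidden"]
    ]
    out = network["output"]
    return sum(w * h for w, h in zip(out["w"], hidden_values)) + out["b"]


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
net = init_network(n_inputs=3, n_hidden=4, rng=rng)
prediction = forward(net, [0.4, -0.2, 0.1])  # hypothetical input values
print(prediction)
```

Before training, the prediction is meaningless because the weights are random; training consists of adjusting them so that predictions move towards the teacher's values.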
  4. Starting from the definition of error, we then examine how its gradient can be used to move towards the minimum of the error curve by changing the weights of the model. Subsequently, we discuss the backpropagation algorithm, a method that transmits the error to the hidden layers of the model, although the error is calculated solely on the final outputs, the only observable ones. We also consider the most commonly used methods to reduce the time needed to calculate the weights and to improve performance. In particular, we focus on methods based on the learning rate, both constant and variable. The latter, as proposed by Silva and Almeida, was subsequently used in the practical part. At the end of this chapter, we also discuss the most popular way to avoid overfitting, i.e. the model learning the sample so closely that it becomes almost useless out of sample. This is one of the greatest risks in using a model characterized by the ability to learn. The method we propose and use is the division of the sample into three groups: the training set, used to train the model; the cross-validation set, used to avoid overfitting by comparing the error curves on the training set and on the cross-validation set; and the test set, used to test forecasting ability once training has ended and the weights are fixed. In the third chapter of the thesis, we deal with practice. First, we analyze the data chosen to test neural networks on financial series for forecasting: our choice fell on the daily percentage changes of the S&P Mib.
We tried to explain its trend by using the data available before the start of trading: in particular, we used the closing value of the Nikkei 225 index of the Tokyo stock exchange from the same morning (available thanks to the time difference) and from the previous day, the Dow Jones Industrial Average of New York for the two previous days, and the exchange rates of the Euro against the Yen and the Dollar for the two previous days. The thesis also includes a section devoted to the theory of the weak efficiency of the markets, which would seem to imply that the series under consideration is unpredictable, since it would behave as a random variable. We then move on to the implementation of our ideas in the Matlab language. We tried to create a small guide for the implementation of a neural network with this program, focusing on the commands that implement the options described in the theoretical part.
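The training procedure summarised in this part (gradient descent on the error, backpropagation to the hidden layer, and a three-way split of the sample to limit overfitting) can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the thesis's Matlab code: the data are synthetic, the learning rate is constant (the variable-rate scheme of Silva and Almeida is not reproduced), the split sizes and the patience of 20 epochs are arbitrary, and stopping is based on the cross-validation error alone.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w_hid, w_out, x):
    """Sigmoid hidden layer, linear output; the last weight of each row is a bias."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    return sum(w * v for w, v in zip(w_out, h + [1.0])), h


def mse(w_hid, w_out, data):
    return sum((predict(w_hid, w_out, x)[0] - y) ** 2 for x, y in data) / len(data)


def train_epoch(w_hid, w_out, data, rate):
    """One pass of online gradient descent with backpropagation."""
    for x, y in data:
        out, h = predict(w_hid, w_out, x)
        delta_out = out - y  # gradient of 0.5 * (out - y)^2 with respect to out
        # Error sent back to each hidden neuron through its output weight.
        delta_hid = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
        for j, hj in enumerate(h + [1.0]):
            w_out[j] -= rate * delta_out * hj
        for j, row in enumerate(w_hid):
            for i, xi in enumerate(x + [1.0]):
                row[i] -= rate * delta_hid[j] * xi


rng = random.Random(0)
samples = []
for _ in range(300):  # synthetic data, not the thesis's series
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    samples.append((x, math.sin(2.0 * x[0]) * x[1] + rng.gauss(0, 0.05)))
# Ordered split: training set, cross-validation set, test set.
train, valid, test = samples[:180], samples[180:240], samples[240:]

n_hidden = 4
w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

initial_valid_error = mse(w_hid, w_out, valid)
best_error = initial_valid_error
best_weights = ([row[:] for row in w_hid], w_out[:])
patience = 0
for epoch in range(200):
    train_epoch(w_hid, w_out, train, rate=0.05)
    valid_error = mse(w_hid, w_out, valid)
    if valid_error < best_error:      # keep the weights that generalise best so far
        best_error, patience = valid_error, 0
        best_weights = ([row[:] for row in w_hid], w_out[:])
    else:
        patience += 1
        if patience >= 20:            # validation error stopped improving: stop
            break

test_error = mse(best_weights[0], best_weights[1], test)
print(best_error, test_error)
```

The test set plays no part in choosing the weights, so `test_error` is an estimate of out-of-sample performance with the weights fixed.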
  5. We created fifty models for each type of neural network with one layer, the types being distinguished by the number of neurons (from one to fifteen, for a total of 750 estimated models). We then selected the best model of each type and compared their predictions on the test set: the objective was to find a network that performed better than white noise, the benchmark consistent with the theory of the weak efficiency of the markets. In the last chapter of the thesis, we review the work done, drawing conclusions about the practical work in light of the theory discussed previously. When we chose the S&P Mib for the practical application, we expected to meet many difficulties, and so it happened: the estimated models unfortunately did not give satisfactory results. Using the mean square error (MSE) to assess predictive performance, the best result was obtained by treating the series as white noise, that is, by setting the predicted values constantly equal to zero: the theory of the weak efficiency of the financial markets therefore seems fully confirmed. However, in some cases the results were interesting, especially concerning the ability to approximate well the combination of sign and magnitude of the daily change in the stock market index. Therefore, although such models cannot be considered a tool that makes certain predictions, they could be useful in the hands of an experienced operator: combined with other information circulating in the markets and with the awareness and intuition that come from experience, they could find a profitable use. To work on an aggregate as wide and varied as a stock exchange index, it is first necessary to have a proper grasp of the economic theories on which it is based.
We tried to identify the most appropriate inputs, but a financial analyst with a greater knowledge of the mechanisms of the market could obviously make a better choice: the Cybenko theorem could work in practice given a good choice not only of structure, but also of inputs to the network.
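The evaluation described above, in which a model's MSE is compared with that of a constant zero forecast, can be sketched in Python as follows. The numbers are invented for illustration and are not results from the thesis; the function names and the sign-agreement measure are likewise assumptions made here to reflect the remark about the sign of the daily change.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared forecast errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def sign_hit_rate(actual, predicted):
    """Share of days on which forecast and realised change have the same sign."""
    hits = sum(1 for a, p in zip(actual, predicted) if a * p > 0)
    return hits / len(actual)


# Hypothetical daily percentage changes and model forecasts (not thesis data).
actual = [0.5, -1.0, 0.25, -0.5, 1.0]
forecast = [1.5, -2.0, -0.75, -1.5, 2.0]

model_mse = mean_squared_error(actual, forecast)
# White-noise benchmark: always predict a change of zero.
benchmark_mse = mean_squared_error(actual, [0.0] * len(actual))
hit_rate = sign_hit_rate(actual, forecast)

print(model_mse, benchmark_mse, hit_rate)
```

In this made-up example the forecasts get the direction right on four days out of five yet still have a higher MSE than the zero forecast, because they overstate the size of the moves. It shows how a model can lose to the white-noise benchmark on MSE while still carrying some information about sign.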