MECN4006 - Research Project
Correlation of Factors Influencing a Share Price
Name: Ashail Maharaj
Student Number: 536684
Supervisor: Dr Ian Campbell
25 August 2016
A project report submitted to the Faculty of Engineering and the Built Environment, University
of the Witwatersrand, Johannesburg, in partial fulfilment of the requirements for the degree of
Bachelor of Science in Engineering.
Johannesburg, August 2016
DECLARATION
UNIVERSITY OF THE WITWATERSRAND, JOHANNESBURG
SCHOOL OF MECHANICAL, INDUSTRIAL AND
AERONAUTICAL ENGINEERING
I, Ashail Maharaj (Student Number: 536684), am registered for course No. MECN 4006 in the
year 2016.
I herewith submit the following task, “Research Project: Correlation of Factors Influencing a Share
Price”, in partial fulfilment of the requirements of the above course.
I hereby declare the following:
• I am aware that plagiarism (the use of someone else's work without their permission and/or
without acknowledging the original source) is wrong;
• I confirm that the work submitted herewith for assessment in the above course is my own
unaided work except where I have explicitly indicated otherwise;
• This task has not been submitted before, either individually or jointly, for any course
requirement, examination or degree at this or any other tertiary educational institution;
• I have followed the required conventions in referencing the thoughts and ideas of others;
• I understand that the University of the Witwatersrand may take disciplinary action against me
if it can be shown that this task is not my own unaided work or that I have failed to acknowledge
the sources of the ideas or words in my writing in this task.
Signature: ________________________________ Date: 25/08/2016
ABSTRACT
When investors predict share prices, it is important to identify the factors that influence how they
value the share. Investors generally value a share using the factors they regard as strong indicators
of the company's worth, and their choice of factors is shaped by the knowledge they have gained.
A company's financial performance depends on a vast number of factors, and where these factors are
related to each other it would be redundant to include all of them. The scope of this project was the
correlation of factors with each other and with the share price, with the aim of finding the factors that
are most relevant to the share price. The objectives of this research were to determine which factors
influence the price of the Richemont share, what the relationships between the relevant factors and the
Richemont share price are, and which factors can be used to predict the share price. A group of eight
factors was chosen: dividend yield, volume traded, rand-euro exchange rate, rand-dollar exchange rate,
inflation (CPI), the prime lending interest rate, the gold price and the platinum price. Each factor was
recorded at daily, weekly, monthly, quarterly, half-yearly and yearly intervals.
During this study a number of steps were carried out to ensure that the data used were valid for the
analysis. The correlation and cross correlation of each factor with the share price were determined. The
underlying assumptions for correlation were then tested. A linear regression of the share price on each
factor was performed and a residual analysis was done. Each series was then transformed to obtain a
better linear fit between the factors and the share price. The transformed factors and share price were
linearly regressed again and the residual analysis was repeated. Correlation and cross correlation were
evaluated between all factors to find redundant factors, which were then eliminated. The volume traded
was the only relevant factor remaining after this test. The relationship found in the regression was then
tested against each data set's volume traded series to see whether the regression model still held.
The study concluded that the only relevant factor of the initial eight was the volume traded, that the
relationship between the Richemont share price and the volume traded is shown by the equation
$\sqrt[3]{y} = (-2 \times 10^{-27})\,x^{3} + 21.9$, and that the volume traded is correlated with the
share price with a correlation coefficient of -0.894.
TABLE OF CONTENTS
DECLARATION
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
1.1 Background
1.2 Motivation
2 LITERATURE SURVEY
2.1 Fundamental Analysis
2.2 Technical Analysis
2.3 Time Series Analysis
    Seasonality
    Trends
    Correlation
    Pearson's Product Moment Correlation Coefficient
    Cross Correlation and Auto Correlation
2.4 Feature Selection
2.5 Exploratory Data Analysis (EDA)
2.6 Confirmatory Data Analysis (CDA)
2.7 Dividend Yield
2.8 Interest Rate
2.9 Principal Component Analysis (PCA)
2.10 Causality
2.11 Linear Regression
2.12 Residual Analysis
2.13 Related Studies
3 OBJECTIVES
4 APPARATUS
5 METHODOLOGY
5.1 Collection of Data
5.2 Processing of Data
5.3 Precautions
6 OBSERVATIONS
7 DATA PROCESSING
7.1 Correlation Values
7.2 Cross Correlation and Auto Correlations
7.3 Assumption Tests
7.4 Coefficient of Determination
7.5 Confidence Intervals of Regression Plots
7.6 Residual Analysis for Transformed Data
7.7 Correlation of Factors
7.8 Cross Correlation of All Factors
7.9 Residuals for Fitting Regression Model to Each Data Set's Volume Traded Data
8 RESULTS
8.1 Correlation Values
8.2 Cross Correlations and Auto Correlations
8.3 Assumption Tests
8.4 Coefficient of Determination
8.5 Regression of Price Vs Factors
8.6 Residual Vs Actual
8.7 Regression of Price Vs Factors After Transformations
8.8 Confidence Intervals on Regression Plots
8.9 Residual Analysis for Transformed Data
8.10 Correlation of Factors
8.11 Cross Correlation of All Factors
8.12 Residuals for Fitting Regression Model to Each Period's Volume Traded Data
9 DISCUSSION
9.1 Correlation Coefficients
    Average correlation for each factor
    Absolute average correlation for each data capture frequency
9.2 Cross Correlation
9.3 Assumption Tests
    Daily
    Weekly
    Monthly
    Quarterly
    Half-Yearly
    Yearly
9.4 Linear Regression
    Dividends Yield Vs Share Price
    Volume Traded Vs Share Price
    Rand-Euro Exchange Rate Vs Share Price
    Rand-Dollar Exchange Rate Vs Share Price
    Inflation Vs Share Price
    Interest Rate Vs Share Price
    Gold Price Vs Share Price
    Platinum Price Vs Share Price
9.5 Residuals Vs Actual
    Share Price
    Dividend Yield
    Volume
    Rand-Euro
    Rand-Dollar
    Inflation
    Interest
    Gold
    Platinum
9.6 Transformation and Linear Regression
    Dividends Yield
    Volume
    Rand-Euro
    Rand-Dollar
    Inflation
    Interest Rate and Gold
    Platinum
9.7 Confidence Intervals on Regression Plots
9.8 Residual Analysis on Transformed Data
    Dividend Yield
    Inflation
    Platinum Price
    Rand-Dollar Exchange Rate
    Rand-Euro Exchange Rate
9.9 Correlation of Factors
    Dividends Yield
    Volume Traded
    Rand-Euro Exchange Rate
    Rand-Dollar Exchange Rate
    Inflation
    Interest Rate
    Gold Price
    Errors from Fitting the Linear Model to Each Volume Traded and Share Price Data Set
10 CONCLUSIONS
11 RECOMMENDATIONS
12 REFERENCES
APPENDIX A: Matlab Code for Cross Correlation and Auto Correlation
APPENDIX B: Matlab Code for Testing All Assumptions
APPENDIX C: Matlab Code for a Regression Plot with Confidence Limits
APPENDIX D: Code for Changing the Data that is Plotted
APPENDIX E: Matlab Code for the Cross Correlation with Each Data Set
APPENDIX F: Matlab Code for Extracting Error Graphs
APPENDIX G: Residual Plots
LIST OF FIGURES
Figure 1: Moods of the masses according to technical analysis [8]
Figure 2: Shifting of Time Series
Figure 3: Platinum Price Over Time
Figure 4: Share Price Over Time
Figure 5: Gold Price Over Time
Figure 6: Volume Traded Over Time
Figure 7: Rand-Euro Over Time
Figure 8: Dividend Yield Over Time
Figure 9: Rand-Dollar Over Time
Figure 10: Interest Rate Over Time
Figure 11: Inflation Over Time
Figure 12: Cross Correlation of Factors Within Daily Data Set
Figure 13: Cross Correlation of Factors Within Weekly Data Set
Figure 14: Cross Correlation of Factors Within Monthly Data Set
Figure 15: Cross Correlation of Factors Within Quarterly Data Set
Figure 16: Cross Correlation of Factors Within Half-Yearly Data Set
Figure 17: Cross Correlation of Factors Within Yearly Data Set
Figure 27: Linear Regression of DY Vs Share Price
Figure 28: Linear Regression of Volume Traded Vs Share Price
Figure 29: Linear Regression of Rand-Euro Exchange Rate Vs Share Price
Figure 30: Linear Regression of Rand-Dollar Exchange Rate Vs Share Price
Figure 31: Linear Regression of Inflation Vs Share Price
Figure 32: Linear Regression of Interest Rate Vs Share Price
Figure 33: Linear Regression of Gold Price Vs Share Price
Figure 34: Linear Regression of Platinum Price Vs Share Price
Figure 18: Residual Plot for Share Price
Figure 19: Residual Plot for Dividend Yield
Figure 20: Residual Plot for Volume Traded
Figure 21: Residual Plot for Rand-Euro Exchange Rate
Figure 22: Residual Plot for Rand-Dollar Exchange Rate
Figure 23: Residual Plot for Inflation
Figure 24: Residual Plot for Interest Rate
Figure 25: Residual Plot for Gold Price
Figure 26: Residual Plot for Platinum Price
Figure 35: Linear Regression of Dividend Yield Vs Share Price After Transformation
Figure 36: Linear Regression of Volume Traded Vs Share Price After Transformation
Figure 37: Linear Regression of Rand-Euro Exchange Rate Vs Share Price After Transformation
Figure 38: Linear Regression of Rand-Dollar Exchange Rate Vs Share Price After Transformation
Figure 39: Linear Regression of Inflation Vs Share Price After Transformation
Figure 40: Linear Regression of Platinum Price Vs Share Price After Transformation
Figure 41: Regression Plot of transformed Dividend Yield vs transformed Share Price with Confidence Intervals
Figure 42: Regression Plot of transformed Rand-Euro vs transformed Share Price with Confidence Intervals
Figure 43: Regression Plot of transformed Rand-Dollar vs transformed Share Price with Confidence Intervals
Figure 44: Regression Plot of transformed Inflation vs transformed Share Price with Confidence Intervals
Figure 45: Regression Plot of transformed Platinum Price vs transformed Share Price with Confidence Intervals
Figure 46: Residual Plot of transformed regression between Share Price and DY
Figure 47: Residual Plot of transformed regression between Share Price and Inflation
Figure 48: Residual Plot of transformed regression between Share Price and Platinum Price
Figure 49: Residual Plot of transformed regression between Share Price and Rand-Dollar
Figure 50: Residual Plot of transformed regression between Share Price and Rand-Euro
Figure 51: Error for Fitting Equation to Daily Data
Figure 52: Error for Fitting Equation to Weekly Data
Figure 53: Error for Fitting Equation to Monthly Data
Figure 54: Error for Fitting Equation to Quarterly Data
Figure 56: Error for Fitting Equation to Yearly Data
Figure 57: Error for Fitting Equation to Half-yearly Data
LIST OF TABLES
Table 1: Correlation Values of Each Factor with Share Price
Table 2: Assumption Test Results for Each Series in The Daily Data Set
Table 3: Assumption Test Results for Each Series in The Weekly Data Set
Table 4: Assumption Test Results for Each Series in The Monthly Data Set
Table 5: Assumption Test Results for Each Series in The Quarterly Data Set
Table 6: Assumption Test Results for Each Series in The Half-Yearly Data Set
Table 7: Coefficient of Determination for Each Series
Table 8: Correlation matrix of Untransformed Factors
Table 9: Cross Correlation of the Platinum Price with Each Factor
Table 10: Cross Correlation of the Gold Price with Each Factor
Table 11: Cross Correlation of the Interest Rate with Each Factor
Table 12: Cross Correlation of Inflation with Each Factor
Table 13: Cross Correlation of the Rand-Dollar Exchange Rate with Each Factor
Table 14: Cross Correlation of the Rand-Euro Exchange Rate with Each Factor
Table 15: Cross Correlation of the Volume Traded with Each Factor
Table 16: Cross Correlation of the Dividend Yield with Each Factor
Table 17: Correlation of predicted and actual share price
Table 18: Transformations used on each dataset
Table 19: R² values
1 INTRODUCTION
1.1 Background
There are many factors that could influence a share price. These factors may be economic or behavioural.
Economic factors are those that influence the spending of individuals, such as the value of the currency,
the GDP (Gross Domestic Product) and commodity prices. Behavioural factors are those that influence
the decision making of individuals, such as what an individual was taught about trading (knowledge),
the risk that the individual is willing to take and the driving factors for the investment (which could be
seen as emotions). There are many factors to be considered for any single share's price. The factors are
also related to one another, which leads to many different combinations of factors that can be used to
analyse the value of a share.
The price of a share is determined by the demand for the share and by the value that each individual
buying or selling the share perceives it to have. The problem of predicting the share price therefore
reduces to identifying which factors are most relevant to the market's perception of the value of the
share. There are many factors to use but not all will be relevant. Some factors may even be redundant,
as multiple factors may have causal relationships or be correlated with each other.
Finding the few factors that are most relevant to the share price can improve the prediction accuracy
and simplify the prediction process by reducing the computational complexity and the time taken to
complete the model. This project aims to find the factors which are most relevant to predicting the
Richemont share price. These factors could later be used in a prediction model. Several methods are
currently available for this purpose. The methods used in the literature are factor analysis, Principal
Component Analysis, causality and correlations between factors.
1.2 Motivation
The purpose of this study is to investigate which factors are most relevant to the Richemont share price. The study forms a preliminary investigation for the design of a simplified model of the JSE. In creating this model, a number of factors as well as one specific share were chosen. The model will use agent-based simulation to simulate the decisions of entities within the market according to factors such as:
• Fundamental pricing information, including dividends.
• Distribution of knowledge.
• Economic conditions such as the exchange rate, overseas market closing levels, inflation and interest rates, as well as commodity prices that may be correlated with the share.
• Market perceptions of the company, including company ethics.
• Agents will be rational and will have a risk profile and time view (e.g. buy and hold, day trading, etc.).
• Competition and alternatives in the market.
• Agents will be segmented by financial level (and size of trades).
The literature showed that, for many models, selecting the correct factors led to higher accuracy than using as many factors as possible [1]–[3]. An investigation of prior models found that selecting factors on quality was a more effective criterion than the quantity of factors selected. Reducing the number of factors also made the models less computationally complex and reduced the time taken for the simulations to complete. From this, the question "What factors are most influential to the Richemont share price?" was formulated, and it leads into this research.
2 LITERATURE SURVEY
2.1 Fundamental Analysis
Fundamental analysis is a method of evaluating the value of a share by looking at the company's performance and its reaction to economic conditions [4]. Fundamental analysis seeks to understand whether the company is growing, whether it is profitable, whether it will continue to improve or become the best in its market segment, whether it is able to pay its debts and whether it is in good ethical standing. These questions, and many more, are answered in order to answer the question "Does the company make a good investment?". Fundamental analysis is usually used for the evaluation of stocks but can also be applied to securities, countries, markets and market segments [5].
Many factors can be analysed when performing fundamental analysis on a share; any factor that could influence the company's performance can be included. Fundamental analysis is not limited to purely qualitative or purely quantitative data. Any news about a company can usually help in measuring the value, or the potential value, of the company. This is where the methodology of event studies becomes useful. The quantitative analysis usually involves examining the company's finances and books in order to arrive at a perceived value of the share. This perceived value is called the intrinsic value; factors such as revenue, debt, dividends and company performance ratios are used as measures of it [6].
The objectives of fundamental analysis are [4]:
• To predict the direction of the economies that impact a company. This is done because the financial performance of the company depends on the economy it resides within.
• To estimate the intrinsic value of the stock and to predict when changes in this value will occur.
• To select the right time to buy and sell stocks to maximise investment returns.
2.2 Technical Analysis
Technical analysis is the evaluation of securities by analysing statistics generated by market activity. These statistics are generally past prices or volumes traded [7]. The aim of technical analysis is to identify patterns that can suggest future activity, rather than to forecast intrinsic value. Technical analysis typically depends on chart patterns, technical indicators, oscillators or some combination of these [8]. Technical analysts believe that the charts show the moods of the crowds, and thus they focus on the analysis of mass human psychology. Emotional risk is inversely correlated to financial risk; Figure 1 below displays the moods associated with the different price trends.
Figure 1: Moods of the masses according to technical analysis [8]
People are generally motivated by greed and optimism when buying and are driven by fear or pessimism when selling. It is believed that people formulate scenarios based on their emotional state in order to rationalise their emotions. Using this rationale, investors try to sell at or as close as possible to the top, and to buy at or as close as possible to the bottom. Investors use this to help find turning points which they cannot otherwise see [8].
In addition to the methods mentioned above, technical analysis uses trend analysis, support and resistance analysis, and volume analysis.
• Trend Analysis
Trend analysis is one of the most important and most used techniques in technical analysis. A trend is the general direction in which the price is heading. Trends are not always easy to spot, as there are many fluctuations in the price over time. In trend analysis, trends are classified according to their direction into three sets: uptrends, horizontal trends and downtrends. An uptrend is characterised by a series of higher highs and higher lows, whereas a downtrend is characterised by lower lows and lower highs [9][8].
Trends are then further classified into another set of three: long-term, intermediate and short-term trends. A long-term trend is one that lasts longer than a year, an intermediate trend lasts between one and three months, and a short-term trend lasts up to a month. Channel lines are the addition of two parallel trend lines which act as areas of support and resistance [8].
• Support and Resistance Analysis
Support is the price level below which the stock seldom falls, and resistance is the price level above which the stock seldom rises. Support and resistance are governed by the psychology behind supply and demand: support is the price level at which the market is willing to buy, and resistance is the price level at which the market is willing to sell. When the price breaches the support or the resistance level, there has been a shift in the supply or demand curve for the share. Once a resistance or support level is breached, its role is reversed, i.e. a broken resistance level becomes the support level, and vice versa for a broken support level [9][10].
• Volume Analysis
Volume is the number of shares traded over a given period of time; greater volume indicates a more active security. Volume charts also have trends, which can show an increase or decrease in the demand for or supply of the share. Volume analysis is important to technical analysis because it is used to confirm chart trends and patterns. In most scenarios, changes in volume precede changes in price, except when divergence occurs. Divergence is when the relationship between volume and price starts to deteriorate [10][11].
2.3 Time Series Analysis
A time series is a set of observations, each of which is recorded at a different point in time, denoted by $t$. When $t$ is incremental and data is recorded at each increment the series is discrete, whereas if $t$ is continuous the series is a continuous time series [12]. Time series analysis can be broken down into the following objectives [13]:
• Description
Description comprises plotting the data, looking for trends, seasonality, outliers, normality and stationarity, and using further tools to describe the data set better.
• Explanation
Explanation focuses on correlations and relationships between different time series or within a single time series.
• Prediction
Prediction focuses on estimating future values of the data; this is also known as forecasting. It includes fitting models to the data to improve forecasts.
• Control
This objective is usually used in quality control, where it ensures that process outputs are within specification.
The classical decomposition model is used to describe a time series in order to better forecast its future values. The model defines a time series in terms of components such as the trend, seasonality and noise [14]. There are two classical decomposition methods, the additive and the multiplicative, described by Equations 1 and 2 respectively.
\[ Y = T + C + S + \varepsilon \tag{1} \]
\[ Y = T \times C \times S \times \varepsilon \tag{2} \]
Where
$Y$ is the value of the series at a specified point
$T$ is the linear trend
$C$ is the cycle
$S$ is the seasonality
$\varepsilon$ is the random error
Seasonality
Seasonality is the predictable change that the data in a time series experiences and that recurs over a one-year period [15]. It can be calculated using Equation 3 below.
\[ K_t = \frac{Y_t}{M_t} \tag{3} \]
Where
$K_t$ is a series of seasonality and randomness
$M_t$ is the moving average of the time series
$Y_t$ is the value of the series at time $t$
To obtain the seasonality series, Equation 4 is used. The subscript $g$ is the number of increments in a season, and the sum is taken over the time that each season lasts. This is done to average out the randomness that occurs within each season [16].
\[ S_g = \sum K_t \tag{4} \]
Where
$S_g$ is the seasonality of the series
Trends
The trend of a time series is found by a least squares fit of the model in Equation 5 below [16].
\[ M_t = a + bt + e_t \tag{5} \]
Where
$M_t$ is the moving average value at time $t$
$a$ is the intercept
$b$ is the slope
$e_t$ is the residual
For the trend, only the linear part of Equation 5 is used and the residual term is discarded.
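The seasonality and trend calculations in Equations 3 to 5 can be sketched in a few lines of code. The Python sketch below is an illustration only and is not the MATLAB code used in this project; the function names, the synthetic series and the choice of a centred moving average with an odd window are all assumptions made for the example. It averages $K_t$ over each position in the season, which is one possible reading of Equation 4.

```python
def moving_average(y, window):
    """Centred moving average M_t (window assumed odd); None where it does not fit."""
    half = window // 2
    m = [None] * len(y)
    for t in range(half, len(y) - half):
        m[t] = sum(y[t - half:t + half + 1]) / window
    return m


def seasonal_indices(y, m, g):
    """Average K_t = Y_t / M_t (Equation 3) over each of the g positions in a season."""
    buckets = [[] for _ in range(g)]
    for t, (y_t, m_t) in enumerate(zip(y, m)):
        if m_t is not None:
            buckets[t % g].append(y_t / m_t)
    return [sum(b) / len(b) for b in buckets]


def linear_trend(m):
    """Least squares fit of M_t = a + b*t (Equation 5); returns (a, b)."""
    points = [(t, m_t) for t, m_t in enumerate(m) if m_t is not None]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_m = sum(v for _, v in points) / n
    s_tm = sum((t - mean_t) * (v - mean_m) for t, v in points)
    s_tt = sum((t - mean_t) ** 2 for t, _ in points)
    b = s_tm / s_tt
    a = mean_m - b * mean_t
    return a, b


# Hypothetical series: linear trend (10 + 0.5 t) times a 3-period seasonal pattern.
pattern = [0.8, 1.0, 1.2]
series = [(10 + 0.5 * t) * pattern[t % 3] for t in range(24)]

ma = moving_average(series, window=3)
indices = seasonal_indices(series, ma, g=3)
intercept, slope = linear_trend(ma)
```

For this synthetic series the fitted slope should lie close to the 0.5 used to build it, and the three seasonal indices should lie close to the pattern values, since the moving average over one full season removes most of the seasonal effect.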
Correlation
In time series analysis, correlation is a measure of how closely two time series fluctuate together, i.e. a measure of the linear relationship between the two. It indicates how well one time series is able to predict fluctuations in another [17], [18]. Correlation does not imply a causal relationship, merely that a relationship exists between the variables that can be exploited to forecast one from the other [19]. Correlated variables could share a common variable that causes the fluctuations in both, and this may be what creates the relationship. A time series may also be correlated with a lagged version of itself, which is called serial correlation or autocorrelation [20]. Correlation also allows analysis of which time series leads which, by offsetting the data and comparing the two; this is called cross-correlation [21], [22]. When these lags are used, a model is generally fitted first and an information criterion such as the Akaike Information Criterion (AIC) is then used to find the best lag order [23], [24]. Alternatively, the maximum lag can be used.
Pearsonβs Product Moment Correlation Coefficient
Pearson's Product Moment Correlation (PPMC) coefficient, also referred to as Pearson's correlation coefficient, is a measure of how well two variables are related linearly. It indicates whether fitting a straight line to the data accurately represents the relationship between the variables in question. Equation 6 below is used to calculate the coefficient of correlation. A strong correlation is represented by an $r$ value within the intervals [0.7; 1] or [-1; -0.7], where an absolute value of 1 represents a perfect linear relationship. A moderate correlation falls within the intervals [0.3; 0.7] or [-0.7; -0.3]. A low correlation falls within the intervals [0.1; 0.3] or [-0.3; -0.1]. There is no correlation when the $r$ value is 0 [25][17].
\[ r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}} \tag{6} \]
\[ r = \frac{\operatorname{cov}(x, y)}{s_x s_y} \tag{7} \]
Where
$n$ is the number of observations in the sample
$x$ is the independent variable
$y$ is the dependent variable
$\operatorname{cov}(x, y)$ is the covariance between the two variables
$s_x$ is the sample standard deviation of the independent variable
$s_y$ is the sample standard deviation of the dependent variable
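As an illustration of Equation 6, the following Python sketch computes the coefficient directly from the sums in the formula and labels its strength using the intervals quoted above. It is a minimal sketch and not the code used in this project; the function names and the sample data are hypothetical, and because the intervals in the text share their end points, the boundaries are assigned to the stronger class here as an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed with the sums in Equation 6."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def strength(r):
    """Label |r| with the intervals from the text (shared end points go to the stronger class)."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "low"
    # The text only defines r = 0 as no correlation; treating everything
    # below 0.1 as "none" is an assumption of this sketch.
    return "none"


# Hypothetical data: y rises with x, but not perfectly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
label = strength(r)
```

For these five points the sums give $r = 30/\sqrt{1500} \approx 0.77$, which falls in the strong interval.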
Equation 7 is obtained from Equation 6 by substituting the covariance for the numerator and the variances of the two variables for the bracketed terms in the denominator [26]. PPMC rests on a number of assumptions; if these assumptions are not met, the coefficient may not mean what it is thought to mean, or the results may not be valid [27][28]. These assumptions are:
• Normality
Normality describes how the data is distributed and whether the normal distribution fits the data to a significant degree. Testing for normality can be done graphically or numerically. The numerical methods that can be used are the Kolmogorov-Smirnov test [29] (see Equation 8 for the test statistic) and the Shapiro-Wilk test (see Equations 9 and 10 for the test statistic) [30]. The graphical methods that can be used are Q-Q plots, histograms and box-and-whisker diagrams [30]. The numerical tests are performed at a specified significance level to determine whether the data is normally distributed.
\[ T = \max\left|F^{*}(x) - S(x)\right| \tag{8} \]
Where
$T$ is the test statistic used for the Kolmogorov-Smirnov test
$F^{*}(x)$ is the cumulative distribution of the data being tested
$S(x)$ is the reference cumulative distribution (the normal distribution for normality tests)
\[ W = \left(\frac{b}{s\sqrt{n - 1}}\right)^{2} \tag{9} \]
\[ b = \sum b_i = \sum a_{(n-i+1)}\left(x_{n-i+1} - x_i\right) \tag{10} \]
Where
$W$ is the test statistic for the Shapiro-Wilk test
$b$ is defined by Equation 10
$a$ is a Shapiro-Wilk coefficient
$(n - i + 1)$ is the index of the Shapiro-Wilk coefficient
$x$ is the data from the series being tested
The Kolmogorov-Smirnov test uses a p-value to accept or reject the null hypothesis that the data is normal. The Shapiro-Wilk test compares the test statistic with critical values from a Shapiro-Wilk table of values to conclude whether the null hypothesis holds.
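The Kolmogorov-Smirnov statistic of Equation 8 can be illustrated with a short Python sketch. This is not the routine used in this project; the function names and the three-point sample are hypothetical, and the mean and standard deviation of the reference normal distribution are assumed to be supplied by the user. Only the statistic is computed; the p-value or critical value needed to complete the test is not shown.

```python
import math


def normal_cdf(x, mean, std):
    """Cumulative distribution function of the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(data, mean, std):
    """Largest gap between the empirical CDF of the data and the normal CDF."""
    ordered = sorted(data)
    n = len(ordered)
    largest = 0.0
    for i, value in enumerate(ordered, start=1):
        reference = normal_cdf(value, mean, std)
        # The empirical CDF jumps from (i - 1)/n to i/n at each observation,
        # so both sides of the step are checked.
        largest = max(largest, i / n - reference, reference - (i - 1) / n)
    return largest


# Hypothetical sample compared with a standard normal distribution.
sample = [-1.0, 0.0, 1.0]
statistic = ks_statistic(sample, mean=0.0, std=1.0)
```

A small statistic indicates that the empirical distribution of the sample stays close to the normal distribution over the whole range of the data.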
• Linearity
Linearity is the ability of a straight line to describe the relationship between the dependent variable and the independent variable. This is usually determined through linear regression, in which the aim is to fit a line through the data while minimising the error. The goodness of fit can be determined by finding the coefficient of determination $R^2$ of the line, using Equation 11 below [31].
\[ R^2 = \left(\frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}}\right)^{2} \tag{11} \]
Alternatively, it can be found by analysis of the residuals using Equation 12 below.
\[ R^2 = 1 - \frac{\sum e_i^2}{n\sigma^2} \tag{12} \]
Where
$R^2$ is the coefficient of determination
$e_i$ is the error at each point $i$
$\sigma$ is the standard deviation of the data set being analysed
• Stationarity
Stationarity means that the statistical properties of a series, such as its mean and variance, do not change over time. There are two types of stationarity: difference stationarity and trend stationarity [32][33]. Before defining these, it is important to first define the following processes.
Pure Random Walk
A pure random walk is defined by Equation 13.
\[ Y_t = Y_{t-1} + \varepsilon_t \tag{13} \]
Where
$\varepsilon_t$ is white noise
$Y_t$ is the series value at time $t$
$Y_{t-1}$ is the series value at time $t-1$
White noise is stochastic, so this series is not mean reverting, because its variance evolves over time. The variance of the series tends to infinity as time tends to infinity. This is a difference stationary process [34].
Random walk with drift
This series is defined by Equation 14 below.
\[ Y_t = \alpha + Y_{t-1} + \varepsilon_t \tag{14} \]
Where
$\alpha$ is the drift term in the series
This series also has a variance that depends on time and hence is not mean reverting. This is a difference stationary process [34].
Deterministic trend
This series is defined by Equation 15 below.
\[ Y_t = \alpha + \beta t + \varepsilon_t \tag{15} \]
Where
$\beta t$ is the deterministic trend
Although this series looks similar to a random walk with drift, it is different because the series is regressed on the time trend $\beta t$ rather than on its own previous value. A nonstationary process with a deterministic trend has a mean that grows around a fixed trend which is constant and independent of time. This is a trend stationary process [34].
Random walk with drift and deterministic trend
This series is described by Equation 16 below.
\[ Y_t = \alpha + Y_{t-1} + \beta t + \varepsilon_t \tag{16} \]
This series has both a drift component and a deterministic trend. It is both difference and trend stationary [34].
Difference stationarity
A series with a random walk can be transformed into a stationary process using differencing, irrespective of whether it has drift or not [34].
Trend Stationarity
A nonstationary process with a deterministic trend can be transformed into a stationary process by detrending [34].
Difference and Trend Stationary
For a random walk with drift and a deterministic trend, stationarity can be achieved through detrending, but differencing must also be applied to ensure that the variance does not grow to infinity over time [34].
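The two transformations described above, differencing and detrending, can be sketched as follows. The Python code is illustrative only and is not taken from this project; the function names, the Gaussian noise, the seed and the simulated series are assumptions made for the example.

```python
import random


def random_walk(n, drift=0.0, seed=0):
    """Simulate Y_t = drift + Y_{t-1} + e_t (Equations 13 and 14) with Gaussian noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(drift + y[-1] + rng.gauss(0.0, 1.0))
    return y


def difference(y):
    """First difference Y_t - Y_{t-1}; removes a random walk component."""
    return [current - previous for previous, current in zip(y, y[1:])]


def detrend(y):
    """Subtract a least squares line a + b*t; removes a deterministic trend."""
    n = len(y)
    mean_t = (n - 1) / 2
    mean_y = sum(y) / n
    s_ty = sum((t - mean_t) * (v - mean_y) for t, v in enumerate(y))
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    b = s_ty / s_tt
    a = mean_y - b * mean_t
    return [v - (a + b * t) for t, v in enumerate(y)]


# Hypothetical examples of the two transformations.
walk = random_walk(2000, drift=0.5, seed=1)
steps = difference(walk)                      # behaves like drift plus white noise
line = [3.0 + 2.0 * t for t in range(50)]
flat = detrend(line)                          # residuals of an exact line are ~0
```

Differencing the simulated random walk with drift leaves a series that fluctuates around the drift term, while detrending an exact straight line leaves residuals that are zero up to rounding error.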
Testing for Trend Stationarity and Difference Stationarity
There are two preferred methods for testing for stationarity: the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test [34]–[39].
Augmented Dickey-Fuller Test (ADF)
In the ADF test, Equation 17 below is used to represent an autoregressive (AR) process [35], [40], [41].
\[ \Delta Y_t = \alpha + \delta Y_{t-1} + \varepsilon_t \tag{17} \]
When $\delta = 0$ the process has a unit root and is therefore nonstationary (difference stationary); when $\delta \neq 0$ it does not have a unit root. The null hypothesis of the test is that the process has a unit root.
\[ H_0: \delta = 0 \]
\[ H_1: \delta \neq 0 \]
A t-statistic is calculated for $\hat{\delta}$, the estimated value of $\delta$, using Equation 19. This test statistic is then compared with the critical value $DF_{Critical}$, which is taken from the Dickey-Fuller distribution. The null hypothesis is rejected when
\[ t < DF_{Critical} \tag{18} \]
where
\[ t = \frac{\hat{\delta}}{SE(\hat{\delta})} \tag{19} \]
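The regression behind Equations 17 to 19 can be illustrated with the Python sketch below. It is a simplified sketch, not the routine used in this project: it omits the lagged difference terms that make the test "augmented", the function name and the simulated series are hypothetical, and the critical value of -2.86 is an approximate large-sample 5% value for the constant-only case that should be checked against published Dickey-Fuller tables.

```python
import math
import random


def dickey_fuller_t(y):
    """t-statistic of delta in dY_t = alpha + delta * Y_{t-1} + e_t (no lagged differences)."""
    lagged = y[:-1]
    diffs = [current - previous for previous, current in zip(y, y[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_d = sum(diffs) / n
    s_xx = sum((x - mean_x) ** 2 for x in lagged)
    s_xd = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, diffs))
    delta = s_xd / s_xx                       # estimate of delta
    alpha = mean_d - delta * mean_x           # estimate of alpha
    sse = sum((d - alpha - delta * x) ** 2 for x, d in zip(lagged, diffs))
    std_error = math.sqrt(sse / (n - 2) / s_xx)
    return delta / std_error                  # Equation 19


# Hypothetical stationary series: AR(1) with coefficient 0.2.
rng = random.Random(42)
series = [0.0]
for _ in range(499):
    series.append(0.2 * series[-1] + rng.gauss(0.0, 1.0))

t_stat = dickey_fuller_t(series)
# Assumed approximate 5% critical value (constant-only case, large sample).
DF_CRITICAL_5PCT = -2.86
reject_unit_root = t_stat < DF_CRITICAL_5PCT  # Equation 18
```

For a clearly stationary series such as this one, the statistic is strongly negative and the unit root null hypothesis is rejected.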
Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test
This test evaluates the null hypothesis that a univariate series is trend stationary against the alternative that it is a nonstationary unit root process. It does this by first defining the series with Equations 20 and 21 below [42][43].
\[ Y_t = c_t + \beta t + u_{1t} \tag{20} \]
\[ c_t = c_{t-1} + u'_{2t} \tag{21} \]
Where
$c_t$ is a random walk process
$u_{1t}$ is a stationary process
$u'_{2t}$ is an independent and identically distributed (iid) process with a mean of 0 and variance $\sigma^2$
\[ H_0: \sigma^2 = 0 \]
\[ H_1: \sigma^2 > 0 \]
With the test statistic
\[ p_{test} = \frac{\sum_{t=1}^{T} S_t^2}{T^2 s^2} \tag{22} \]
Where
$p_{test}$ is the test statistic
$T$ is the sample size
$s^2$ is the Newey-West estimate of the long-run variance
And
\[ S_t = e_1 + e_2 + e_3 + \dots + e_t \tag{23} \]
where $e_t$ are the residuals from regressing the series on an intercept and the time trend.
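Equations 22 and 23 can be sketched in Python as shown below. This is an illustrative sketch rather than the routine used in this project: the function name, the simulated series and the choice of four lags are assumptions, and the long-run variance is estimated with Bartlett weights, which is the usual form of the Newey-West estimator. The resulting statistic would still need to be compared with tabulated KPSS critical values.

```python
import random


def kpss_statistic(y, lags=0):
    """KPSS statistic for trend stationarity (Equations 22 and 23)."""
    n = len(y)
    # Residuals e_t from a least squares fit of y on an intercept and time trend.
    mean_t = (n - 1) / 2
    mean_y = sum(y) / n
    s_ty = sum((t - mean_t) * (v - mean_y) for t, v in enumerate(y))
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    e = [v - (intercept + slope * t) for t, v in enumerate(y)]

    # Partial sums S_t = e_1 + ... + e_t (Equation 23).
    partial, running = [], 0.0
    for value in e:
        running += value
        partial.append(running)

    # Newey-West long-run variance with Bartlett weights.
    long_run = sum(v * v for v in e) / n
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1)
        cov = sum(e[t] * e[t - j] for t in range(j, n)) / n
        long_run += 2.0 * weight * cov

    return sum(s * s for s in partial) / (n ** 2 * long_run)   # Equation 22


rng = random.Random(7)
# Hypothetical trend stationary series: straight line plus white noise.
trend_series = [1.0 + 0.05 * t + rng.gauss(0.0, 1.0) for t in range(300)]
# Hypothetical unit root series: pure random walk.
walk = [0.0]
for _ in range(299):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

stat_trend = kpss_statistic(trend_series, lags=4)
stat_walk = kpss_statistic(walk, lags=4)
```

The statistic for the random walk is expected to be much larger than the statistic for the trend stationary series, which is the behaviour the test relies on to reject the null hypothesis.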
• Homoscedasticity
Homoscedasticity describes the variance of a series; it means that the variance remains constant over time [44]. It can be tested graphically by plotting the residuals against the actual values, or numerically using the Engle test for residual heteroscedasticity.
Engle test for residual heteroscedasticity
The residuals of the series are defined as in Equation 24.
\[ e_t = y_t - \hat{u}_t \tag{24} \]
Where
$\hat{u}_t$ is the conditional mean of the process
$e_t$ is the residual, which is identically distributed with a mean of 0 and variance of 1
\[ H_0: \alpha_0 = \alpha_1 = \alpha_2 = \dots = \alpha_m = 0 \]
\[ H_1: e_t^2 = \alpha_0 + \alpha_1 e_{t-1}^2 + \alpha_2 e_{t-2}^2 + \dots + \alpha_m e_{t-m}^2 + u_t \]
Where
$u_t$ is a white noise error process
Under the null hypothesis the error at time $t$ does not depend on the errors from previous lags, which means that the series is not heteroscedastic. The test statistic is found using the F statistic for the regression on the squared residuals, and the critical value is found from the $\chi^2$ distribution using $m$ degrees of freedom and the required significance level [45][46].
Cross correlation and Auto Correlation
Cross correlation is the correlation of one time series with another at different lags. It is obtained by shifting one of the series forward or backward by a number of lags, which allows a lead-lag relationship between two variables to be observed. For example, if the cross correlation between two series is highest at an offset of -3, the two series are most strongly related when one is shifted by three periods relative to the other. This could mean that one variable signals when the other is going to be affected. Autocorrelation is the same as cross correlation except that, instead of using two different series, it uses one series and measures whether the series is correlated with itself. This too could be exploited, as it could signal that a change is occurring. Cross correlation and autocorrelation are both calculated using Pearson's correlation coefficient, with one adjustment for the shifting of the series, as seen in Figure 2 below.
Figure 2: Shifting of Time Series
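The shifting described above can be illustrated with a short Python sketch. It is not the code used in this project; the function names and the simulated series are hypothetical, and the sign convention (a positive lag means the first series leads the second) is an assumption of the example. Both series are assumed to have the same length.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def cross_correlation(x, y, lag):
    """Correlation between x_t and y_{t+lag}; a positive lag means x leads y."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    return pearson_r(a, b)


# Hypothetical data: y repeats x three periods later, so x leads y by 3.
rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0, 0.0, 0.0] + x[:-3]

correlations = {lag: cross_correlation(x, y, lag) for lag in range(-5, 6)}
best_lag = max(correlations, key=correlations.get)

# Autocorrelation is the same calculation with a single series.
autocorrelation_lag1 = cross_correlation(x, x, 1)
```

The lag with the highest correlation recovers the three-period shift built into the data, which is the lead-lag relationship that cross correlation is used to detect.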
2.4 Feature Selection
When creating a model that approximates functional relationships between inputs and outputs, typically with machine learning and artificial intelligence systems, a problem arises when there are too many inputs, some of which may be irrelevant; these lead to overfitting of the model to the output data. Feature selection was created to deal with this. Feature selection is a methodology that eliminates irrelevant or redundant inputs [1],[47]. Many different methods can be applied. Principal component analysis, for example, reduces the number of variables by transforming data from a higher to a lower dimension while minimising the information lost [48], [49].
2.5 Exploratory Data Analysis (EDA)
Exploratory data analysis is usually the first step in any data analysis. It is an approach that uses different techniques to increase insight into the data, find which variables are important, detect outliers and anomalies, uncover underlying structure and test the underlying assumptions before further analysis is performed. It does this by looking at variable distributions, scatterplots, correlation analysis and other multivariate approaches. EDA aims to be a more visual analysis of variables [50]. This is done through the following steps:
• Initial extraction
• Determine the number of factors to retain
• Rotation (a transformation)
• Interpret the solution
• Calculate factor scores
• Present the results in a table
• Prepare results
2.6 Confirmatory Data Analysis (CDA)
Confirmatory data analysis uses statistical techniques to verify that there is indeed a factor structure between a set of observed variables [50]. The method allows testing of the hypothesis that a relationship does indeed exist. This is done through the following methodology:
• Review the relevant theory and research literature to support model specification
• Specify a model
• Determine model identification
• Collect data
• Conduct preliminary descriptive statistical analysis such as scaling, handling missing data, collinearity measures and finding outliers
• Estimate parameters in the model
• Present and interpret results
2.7 Dividend Yield
Dividend yield is the ratio of the dividends paid to the price of the share. This ratio is seen as the return on investment, and it is a measure of the cash flow that results from the purchase [51]. However, the dividend irrelevance theory states that dividends are not relevant to shareholders, as a shareholder can sell shares to obtain an income [52].
2.8 Interest Rate
An interest rate is seen as the cost of borrowing money. When investing money, one would use the interest that the money could otherwise earn as a benchmark and compare it with the return the investment could yield [53]. When investing, the interest rate could be used as an indicator of economic conditions due to its strong relationships with the value of a currency and the inflation rate. In macroeconomics, the interest rate is used to balance the demand for money in a country. When inflation increases, the demand for money begins to grow. This is when the central bank increases the interest rate. The increase in the interest rate increases the cost of borrowing money, and because of this increase in cost the demand for money decreases. The interest rate could be seen as a relevant factor in determining a share's value, as the share price is essentially determined by the demand for and supply of the share [54].
2.9 Principal Component Analysis (PCA)
The main objective of PCA is to reduce the number of variables when working with a large number of variables, where a graphical display would not help the analysis because of the number of variables being studied. The procedure for the first principal component starts with defining a matrix X containing all the variables. The matrix is then used to find a linear combination of the variables, in the form of a multifactor linear regression model without an intercept. The linear combination has a vector of coefficients, which are chosen to maximise the variance of the linear combination. The sum of the squared coefficients is constrained to be less than or equal to 1. The process is then repeated for the second principal component, except that the variance maximised is only the variance remaining after the first principal component [55].
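The search for the first principal component can be illustrated with the Python sketch below, which finds the coefficient vector by power iteration on the sample covariance matrix. Power iteration is one of several ways to obtain the component and is used here only because it is short; it is not claimed to be the method used in this project. The function name and the two-variable data set are hypothetical, and the coefficients are normalised so that their squares sum to exactly 1.

```python
import math


def first_principal_component(rows, iterations=200):
    """Unit-length coefficient vector that maximises the variance of the combination."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix of the centred variables.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    w = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Variance of the linear combination defined by w.
    variance = sum(w[i] * cov[i][j] * w[j] for i in range(p) for j in range(p))
    return w, variance


# Hypothetical data: the second variable is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
weights, explained = first_principal_component(data)
```

Because the second variable is about twice the first, the second coefficient comes out at roughly twice the first, and the variance of the combination is close to the total variance of the two variables.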
2.10 Causality
Causality describes a relationship between two variables in which changes in one variable cause fluctuations in the other. This is unlike correlation, which only describes the strength of the relationship between the variables [56]. Causality is usually tested using the Granger-Causality test.
2.11 Linear regression
A simple regression model is used to create a linear relationship between two variables. The relationship is of the form $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. The $\beta$ values are the regression coefficients. The error that results from fitting the model to the data is represented by $\varepsilon_i$. The least squares method is used to find the $\beta$ values that minimise the sum of the squared residuals. The formulas below are used to find the coefficients.
\[ \beta_1 = \frac{SS_{xy}}{SS_x} \tag{25} \]
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \tag{26} \]
Where
\[ SS_{xy} = \sum_{i=1}^{n} (x_i y_i) - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \tag{27} \]
\[ SS_x = \sum_{i=1}^{n} (x_i^2) - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \tag{28} \]
\[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \tag{29} \]
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{30} \]
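Equations 25 to 30 translate directly into code. The Python sketch below is an illustration only, with hypothetical function names and data; it also computes the coefficient of determination as one minus the ratio of the residual sum of squares to the total sum of squares, which is equivalent to Equation 12 when $\sigma$ is taken as the population standard deviation of $y$.

```python
def fit_line(x, y):
    """Least squares coefficients (beta0, beta1) using Equations 25 to 30."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n   # Equation 27
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n                    # Equation 28
    beta1 = ss_xy / ss_x                                              # Equation 25
    beta0 = sum(y) / n - beta1 * sum(x) / n                           # Equation 26
    return beta0, beta1


def r_squared(x, y, beta0, beta1):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    sse = sum((b - (beta0 + beta1 * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - sse / sst


# Hypothetical data lying close to the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [2.9, 5.1, 7.0, 9.2, 10.8]
beta0, beta1 = fit_line(x, y)
fit_quality = r_squared(x, y, beta0, beta1)
residuals = [b - (beta0 + beta1 * a) for a, b in zip(x, y)]
```

For these points the sums give $SS_{xy} = 19.9$ and $SS_x = 10$, so $\beta_1 = 1.99$ and $\beta_0 = 1.03$. The residuals list is what a residual plot would display.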
2.12 Residual Analysis
Residual analysis is done to see how well a model fits the data, and can be seen as validation of the model. The plot of residuals against actual values is rich in information. When this plot reveals a pattern, the data is not fully described by the model. Ideally, after fitting the model to the data, the error should be randomly distributed about the zero-residual line [57]. If the residual grows as the actual values increase, the data is heteroscedastic or needs to be transformed using the log transform. Heteroscedastic data can also be transformed using the Box-Cox transformations [58].
2.13 Related Studies
Justin Colyn [23] focused his research on determining whether the Price-Earnings (P/E) ratio and the Dividend Yield (D/Y) influence future prices for a select few value-weighted equity capital market indices. His methodology tested for stationarity using the Augmented Dickey-Fuller (ADF) method and then for co-integration using the methodology of Johansen [59]. He then corrected the non-stationary series using the Vector Error Correction Model (VECM) for co-integrated variables and the Vector Auto-Regression (VAR) model, at the lag order selected by information criteria such as the Akaike Information Criterion (AIC), Schwarz Information Criterion (SC) and Hannan-Quinn Information Criterion (HQ). After these tests had been done and the corrections made, Granger-Causality tests were performed between the series with the hypothesised relationships. His study concluded that for most indices there is very little evidence of Granger-Causality in either direction between price and P/E ratio, or between price and D/Y, but that there appeared to be Granger-Causality between price and P/E ratio for the Financial Times Stock Exchange top 100 (FTSE 100) [23].
Enos Lentsoane [60] researched the stock price reaction to dividend changes. This study differs from the one carried out by Justin Colyn in that it uses event study methodology. Dividend changes, when announced, are treated as events; this methodology is usually used for qualitative information such as corporate events. The study used the event study methodology of Kothari & Warner [61], which has nine steps:
1. Define the event to be tested
2. Define the period to be studied in terms of estimation window, event window and event date
3. Define what is meant by abnormal performance
4. Collect event data which meets the data selection criteria as defined in step 2
5. Calculate pre-event abnormal returns
6. Calculate abnormal returns over the event window
7. Calculate the Average Abnormal Return (AAR) and Cumulative Abnormal Return (CAR) for the test statistic
8. Determine the critical values (statistical significance) of the AAR and CAR
9. Analyse and interpret the results
Three measures were used to calculate the abnormal return: Market Adjusted Abnormal Return (MAAR), Market Model Abnormal Return (MMAR) and Buy-and-Hold Abnormal Return (BHAR). The event study concluded that the market reaction is not statistically significant on the announcement day and that more negative returns occur during the pre-crisis period. He also concluded that the research does not support the irrelevance theory but seems to support the signalling hypothesis [60].
Nondumiso Ngidi [62] researched the effect of strikes in South Africa on the share prices of 49 companies listed on the Johannesburg Stock Exchange (JSE), also using the event study method. He concluded that stock prices react negatively to the news of strike action and continue to follow a downward trend for approximately 5 days after the strike action has concluded. His study also finds that the JSE is not an efficient market, as it takes days for the market to return to equilibrium after these announcements [62].
Yusuf Varli et al. [24] studied a new correlation coefficient for analysing bivariate time series data. The new coefficient was tested through simulations and compared with other correlation coefficients, mainly Pearson's correlation coefficient. The conclusions drawn were that the new coefficient:
• Takes lag-difference into account
• Performs better in capturing the cross-independence of two variables over time
• Is more normal than Pearson's coefficient
• Performs better than the Detrended Cross-Correlation Analysis (DCCA) coefficient in capturing the independence and co-integration in non-stationary series [24]
Bwo-Nung Huang et al. [63] used unit root and co-integration models to determine the appropriate Granger-causal relations between stock prices and exchange rates using the Asian Flu data. The tests included in the methodology are as follows:
• Augmented Dickey-Fuller (ADF)
• Phillips-Perron technique
• Bivariate Vector Auto-Regression (VAR) model
• Granger Causality test
• Co-integration test
From this research it was found that the data from South Korea agree that exchange rates lead stock prices, the data from the Philippines suggest that stock prices lead exchange rates with negative correlation, and the data from Hong Kong, Malaysia, Singapore, Thailand and Taiwan indicate strong feedback relations, whereas the data from Indonesia and Japan fail to reveal any recognisable pattern [63].
3 OBJECTIVES
To determine:
• Which factors influence the price of the Richemont share.
• What the relationship is between the more influential factors and the Richemont share price.
• Which factors are more strongly correlated with the share price.
4 APPARATUS
1. Personal computer running Windows 10 with an i7 processor
2. Matlab R2010a
3. Excel 2016
5 METHODOLOGY
From the factors listed in the motivation, eight factors were chosen:
1. Dividend yield
2. Volume traded
3. Rand-Euro exchange rate
4. Rand-Dollar exchange rate
5. Inflation (the average national South African CPI was used)
6. Interest rate (the prime lending rate was used)
7. Gold price (Rand/ounce)
8. Platinum price (Rand/ounce)
These factors were chosen because of the nature of the company whose share was selected. Richemont is a luxury goods company based in Switzerland. Richemont's main focus is jewellery, luxury watches and writing instruments [64]. Given this focus, it was hypothesised that the gold and platinum prices would have some relationship with the share price. The remaining factors to be evaluated were exchange rates and economic factors, hence the Rand-Euro exchange rate, the Rand-Dollar exchange rate, inflation and the interest rate were used in this study. Dividend yield and volume traded are two of the more commonly used indicators when evaluating a share price, although, contrary to this, there is a theory of the irrelevance of dividends [52].
5.1 Collection of Data
1. Each factor's data was split into six data collection frequencies: yearly, half-yearly, quarterly, monthly, weekly and daily.
2. A table with the name of each data set was created and ticked as each data set was collected. This ensured that there were no duplications of the data.
3. The data was collected from the Inet BFA database, using the student portal to access the data.
4. The data that could not be found on the Inet BFA database was found on Stats SA or Quantec EasyData.
5. All the exchange rates, gold prices, platinum prices, share prices, volumes and dividend yields were found through Inet BFA.
6. A 5-year period was chosen for every frequency, and all data was output to Excel files. The inflation rate and the interest rate were taken from Stats SA and Quantec respectively.
7. The data had to be reordered so that it could be used easily in one workbook per period. In some cases, data needed to be extracted from one period's data into another period's data; for example, the CPI was found per year at a monthly frequency.
5.2 Processing of Data
1. Plot each time series.
2. Find the correlation between each factor and the share price within each data set. This can be done using the built-in correlation function in Excel or Matlab.
3. Load the workbooks containing the data for each frequency into Matlab and name them.
4. Find the cross correlations and the auto correlation using Matlab and the cc function attached in Appendix A. This is done to find the lead and lag relationships between the share price and the factors.
5. When using the code in Appendix A, run it for each data collection frequency: daily, weekly, monthly, quarterly, half-yearly and yearly.
6. After running the code in Appendix A, save each output variable to Excel and display it in graphs.
7. Use the code in Appendix B to test the assumptions listed below at a 5% significance level. The assumptions are:
• Linearity: tested by fitting a linear regression model to the data, obtaining the residuals and then calculating the coefficient of determination. The Matlab function detrend() was used to obtain the residuals, and the rest was processed in Excel using Equation 12.
• Randomness: tested using the Matlab function runstest().
• Stationarity: tested using the ADF method for unit-root stationarity and the KPSS method for trend stationarity, using adftest() and kpsstest().
• Homoscedasticity: tested using the Engle test for residual heteroscedasticity, using archtest().
• Normality: tested using a one-sample Kolmogorov-Smirnov test, using the Matlab function kstest().
8. Save the results of the assumption tests to Excel and tabulate them.
9. Each series is detrended when the code for the assumption tests is run.
10. The detrended data can be treated as residuals. Square and sum the residuals, divide by the number of observations multiplied by the variance (the standard deviation squared), and subtract the result from 1 to find the coefficient of determination.
11. In all the tests, except the test for linearity, a result of zero means that the test has been passed. Analyse the tables to see which series in the data sets have met all the assumptions.
12. If no series meets all the assumptions, look for the variables that have passed the stationarity and homoscedasticity tests.
13. Plot the data and fit a linear line of best fit to the data in Excel.
14. Find the residuals in Matlab using the detrend function, which removes the line of best fit from the data.
15. Plot the residuals against the actual share price. This can also be done in Matlab using the plot function and then editing the graphs as required.
16. Transform the series that meet the homoscedasticity and stationarity tests.
17. Start the transformations with each factor and iterate between different transformations to improve the coefficient of determination. Residual plots that show an increasing variance usually call for a log transformation.
18. After transforming all the data, plot the regression line through the data and calculate the coefficient of determination for this fit. This can easily be done in Excel.
19. Check whether all the data fall within the 95% confidence intervals by plotting the upper and lower confidence lines together with the data and the regression line. This can be achieved using the Matlab code in Appendix C and D.
20. Evaluate the fit of the regression model for the transformed data by plotting the detrended transformed data against the actual share price data.
21. Find the correlation matrix using the built-in Matlab correlation function.
22. Find the cross correlations using the cc function in Appendix A.
23. Study the correlation matrix and the cross correlations between factors to determine which factors are not necessary.
24. From the factors found to be necessary, determine which have very poor correlations with the share price. These factors can be eliminated too.
25. For the remaining factors, investigate how well the linear regression line fits the data from all the data sets that were eliminated for not meeting the homoscedasticity and stationarity assumptions. This can be done using the Matlab code in Appendix F to produce residual vs actual plots for each data set.
26. Determine the correlation between the predicted and the actual share price values. This can be done using the correlation function in Matlab.
33. 22
5.3 Precautions
1. The data was collected one factor at a time to ensure that no data was left out and that the collection took as little time as possible.
2. Some data sets could not be found because the values are not published at that particular interval; for example, interest rates and the CPI do not change daily or weekly.
3. The data needed to be checked to ensure that the correct data was in the right Excel workbooks. If it was not, or if a file was corrupted, the data had to be downloaded again.
4. Ensure that the function is saved in the working directory that is being used.
5. Ensure that when running the Matlab code the variables have been updated and that the right variables have been used.
6. Ensure that the "lengthData" variable in the Matlab code is changed whenever the data collection frequency is changed, because the data has a different length each time.
6 OBSERVATIONS
All the data that was gathered was plotted in one data set for each factor. This was done using the lowest resolution of the data available, which was possible because the lowest resolution contains all the factors' data for each data set. The factors are plotted in Figure 3 and Figure 5 to Figure 11, and the share price in Figure 4.
Figure 3: Platinum Price Over Time
Figure 4: Share Price Over Time
Figure 5: Gold Price Over Time
Figure 6: Volume Traded Over Time
Figure 7: Rand-Euro Over Time
Figure 8: Dividend Yield Over Time
Figure 9: Rand-Dollar Over Time
Figure 10: Interest Rate Over Time
Figure 11: Inflation Over Time
7 DATA PROCESSING
The data was processed in Matlab and Excel, making extensive use of built-in functions. There were 12 ways in which the data was processed for interpretation. In some cases the outcome of one processing step determined how the data would be processed next. Each step is discussed below, and the Matlab code and functions that were used are explained.
7.1 Correlation Values
To produce the correlation values in Table 1, the correlation function in Excel was used. This computes the Pearson product-moment correlation discussed in Section 2.3.4.
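As an illustration of the calculation behind these values, the following is a minimal Matlab sketch of the Pearson product-moment correlation. The two short series are hypothetical values chosen for illustration and are not the project data; the sketch only shows that the definition and the built-in corrcoef function agree.

```matlab
% Hypothetical illustrative data (not the project data):
% x is a factor series and y is a share price series of the same length.
x = [1.2; 1.9; 3.1; 3.8; 5.2; 6.1];
y = [10.5; 11.0; 13.2; 13.9; 16.8; 17.1];

% Pearson product-moment correlation from its definition:
% the sum of the products of the deviations, divided by the
% square root of the product of the sums of squared deviations.
dx = x - mean(x);
dy = y - mean(y);
rManual = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

% Built-in equivalent: corrcoef returns a matrix of correlations
% between the columns; entry (1,2) is the correlation of x with y.
R = corrcoef([x, y]);
rBuiltin = R(1, 2);

fprintf('manual r = %.4f, corrcoef r = %.4f\n', rManual, rBuiltin);
```

Both values should agree to within rounding error, which is a quick way to confirm that a spreadsheet correlation and a Matlab correlation are measuring the same quantity.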
7.2 Cross Correlations and Auto Correlations
The data displayed in Figure 12 to Figure 17 was processed using the cc function in Appendix A. The function calculates the cross correlation using Equation 7. This is implemented in lines 12 and 19 of the function, with the code
n=(cov(x(k:length(x)),y(1:(length(x)-k+1))))/(sqrt(var(x(k:length(x)))*var(y(1:(length(x)-k+1)))));
and
n=cov(x(1:length(y)-l+1),y(l:length(y)))/(sqrt(var(x(1:length(y)-l+1))*var(y(l:length(y)))));
respectively. The covariance matrix has the variance of each series on its diagonal and is symmetric about the diagonal. For example, to find the covariance between variables 1 and 2 in a 2x2 covariance matrix named A, one would read either A(1,2) or A(2,1), as both hold the covariance between the two variables. This is why the code reads the element n(1,2) after the division.
The shifting of the data was achieved by introducing the variables k and l. The variable k was used to insert values above the halfway point of the cross correlation vector, using the code "row((length(x)+k-1))=n(1,2);". The variable l was used to insert values before the halfway point of the cross correlation vector, using the code "row((length(x)-l))= n(1,2);". Note that the function had to be called for each factor in each data set against the share price. This can be seen in the following lines of the Matlab code in Appendix B:
AutoCorrel=cc(data(1:lengthData,2),data(1:lengthData,2));
DYCorrel=cc(data(1:lengthData,2),data(1:lengthData,5));
VolumeCorrel=cc(data(1:lengthData,2),data(1:lengthData,7));
Rand_Euro_Correl=cc(data(1:lengthData,2),data(1:lengthData,9));
Rand_Dollar_Correl=cc(data(1:lengthData,2),data(1:lengthData,11));
inflationCorrel=cc(data(1:lengthData,2),data(1:lengthData,13));
interestCorrel=cc(data(1:lengthData,2),data(1:lengthData,15));
GoldCorrel=cc(data(1:lengthData,2),data(1:lengthData,17));
PlatinumCorrel=cc(data(1:lengthData,2),data(1:lengthData,19));
alldataCC=[AutoCorrel;DYCorrel;VolumeCorrel;Rand_Euro_Correl;Rand_Dollar_Correl;inflationCorrel;interestCorrel;GoldCorrel;PlatinumCorrel];
The variable "lengthData" was changed manually for each data set, and the matrix data also had to be replaced for each data set.
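The idea of correlating the overlapping parts of two shifted series can be sketched as follows. This is not the cc function from Appendix A: the series is hypothetical, the sign convention for the lag is an assumption, and corrcoef is used in place of the cov and var calls quoted above. Correlating a series with itself gives the auto correlation, which should equal 1 at lag 0 and be symmetric about it.

```matlab
% Hypothetical series (not the project data).
x = [2; 5; 3; 8; 6; 9; 4; 7; 1; 6; 3; 8];
y = x;                  % same series, so the result is the auto correlation
maxLag = 3;             % assumed maximum shift, in samples
n = numel(x);
lags = -maxLag:maxLag;
r = zeros(size(lags));

for idx = 1:numel(lags)
    k = lags(idx);
    if k >= 0
        % Assumed convention: positive lag pairs x(t) with y(t+k).
        a = x(1:n-k);
        b = y(1+k:n);
    else
        % Negative lag pairs x(t) with y(t+k), i.e. an earlier y value.
        a = x(1-k:n);
        b = y(1:n+k);
    end
    % Pearson correlation of the overlapping segments.
    C = corrcoef([a, b]);
    r(idx) = C(1, 2);
end

disp([lags; r]);        % first row: lag, second row: correlation
```

Replacing y with a different series of the same length gives the cross correlation of that series with x under the same lag convention.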
7.3 Assumption tests
The assumptions were tested using the following built-in Matlab functions:
detrended(1:lengthData,i)=detrend(data(1:lengthData,i),1); % residuals of a linear fit
arch(i)=archtest(detrended(:,i)); % Engle test on the residuals
[h(i),p(i),k(i),c(i)]=kstest(data(1:lengthData,i)); % Kolmogorov-Smirnov test
r(i)=runstest(data(1:lengthData,i)); % runs test for randomness
adf(i)=adftest(data(1:lengthData,i)); % ADF unit-root test
kpss(i)=kpsstest(data(1:lengthData,i)); % KPSS trend-stationarity test
The detrend function, as used in the code above, returns the residuals of a linear model fitted to the data. The archtest function tested the residuals for heteroscedasticity using the Engle test for residual heteroscedasticity, as described in the homoscedasticity part of Section 2.3.3, at a 95% confidence level. The kstest function used the Kolmogorov-Smirnov test at a 95% confidence level. The runstest function tested each series for randomness. The adftest function used the Augmented Dickey-Fuller method to test whether the series is unit-root stationary at a 95% confidence level. The kpsstest function used the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test to test the hypothesis that the series is trend stationary, also at a 95% confidence level. For a series to meet all the assumptions being tested, the outputs had to be a row of zeros. This code had to be run for every data set.
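The claim that detrending with a first-order fit yields the residuals of a linear model can be checked with a short sketch. The series below is hypothetical, and the comparison assumes equally spaced observations, so that the sample index can stand in for time.

```matlab
% Hypothetical, equally spaced series (not the project data).
y = [3.1; 4.0; 4.8; 6.3; 6.9; 8.2; 8.8; 10.1];
t = (1:numel(y))';          % sample index used as the time axis

% Residuals via detrend: removes the least-squares straight line.
resDetrend = detrend(y, 1);

% Residuals via an explicit linear fit, for comparison.
p = polyfit(t, y, 1);       % p(1) is the slope, p(2) the intercept
resFit = y - polyval(p, t);

% The two residual vectors should match to within rounding error.
maxDiff = max(abs(resDetrend - resFit));
fprintf('largest difference between the two residual vectors: %g\n', maxDiff);
```

Because the straight line is fitted by least squares with an intercept, the residuals also sum to approximately zero.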
7.4 Coefficient of determination
This was done manually using the detrended data and the original data for each series. The sum of the squared residuals was calculated, followed by the variance (the standard deviation squared) of the original data. These were then substituted into Equation 12. A sample calculation for the yearly share price is shown below.
\[ R^2 = 1 - \frac{\sum e_i^2}{n\sigma^2} \]
\[ e_i^2 = \begin{bmatrix} 7304921.91 \\ 1322521.9 \\ 2192712.23 \\ 902724.77 \\ 150292.829 \\ 3880524.77 \end{bmatrix}, \quad \sigma^2 = 7586437.77, \quad n = 6 \]
\[ R^2 = 1 - \frac{20753698.42}{6 \times 7586437.77} \]
\[ R^2 = 0.544 \]
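The arithmetic of the sample calculation, and the same calculation applied to a full series, can be sketched in Matlab as follows. The first part uses only the totals quoted above. The second part uses a hypothetical series and assumes that the variance is the population variance (normalised by n), in which case the result equals the squared correlation between the series and its sample index.

```matlab
% Part 1: the sample calculation, using the totals quoted in the text.
sumSqRes = 20753698.42;     % sum of the squared residuals
sigma2   = 7586437.77;      % variance of the original series
n        = 6;               % number of observations
R2sample = 1 - sumSqRes / (n * sigma2);
fprintf('R^2 for the sample calculation: %.3f\n', R2sample);

% Part 2: the same formula on a hypothetical series (not the project data).
y = [12.0; 14.5; 13.8; 17.2; 18.9; 21.4];
t = (1:numel(y))';
res = detrend(y, 1);                        % residuals of the linear fit
% Assumption: var(y, 1) normalises by n, so numel(y) * var(y, 1)
% is the total sum of squares about the mean.
R2series = 1 - sum(res.^2) / (numel(y) * var(y, 1));

% Cross-check against the squared Pearson correlation of y with t.
C = corrcoef([t, y]);
fprintf('R^2 = %.4f, r^2 = %.4f\n', R2series, C(1, 2)^2);
```

If the sample variance (normalised by n - 1) were used instead, the result would differ slightly from the squared correlation, so it is worth recording which definition was applied.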
7.5 Confidence intervals of regression plots
The confidence intervals of the regression plots were produced in Matlab using the code attached in Appendix C and D. The regress function in Matlab was used to obtain the values for the equations of three lines: the linear regression line, the upper line at a 95% confidence level and the lower line at a 95% confidence level.
7.6 Residual analysis for transformed data
The residual plot was done by using the detrend function in Matlab and the scatter function to plot the
data against the actual values.
7.7 Correlation of factors
The correlation of factors was calculated in Matlab using the correlation function, which also uses the Pearson product-moment correlation coefficient calculation shown in Section 2.3.4.
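A minimal sketch of building such a correlation matrix is given below. The data matrix is hypothetical (one column per series), and the use of triu to keep only the upper half is an illustrative choice that relies on the matrix being symmetric.

```matlab
% Hypothetical data matrix (not the project data): one row per
% observation and one column per series.
data = [10.2  3.1  140;
        11.0  2.9  133;
        12.4  3.4  121;
        12.9  3.0  118;
        14.1  3.6  104;
        15.3  3.3   98];

% Matrix of Pearson correlations between every pair of columns.
R = corrcoef(data);

% The matrix is symmetric with ones on the diagonal, so the upper
% triangle carries all of the information.
upperR = triu(R);

disp(R);
disp(upperR);
```

Reading the matrix row by row then gives the correlation of each series with every series listed after it.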
7.8 Cross Correlation of all factors
The cross correlation was calculated using the Matlab code attached in Appendix E, which uses the function "cc" attached in Appendix A.
7.9 Residuals for fitting regression model to each data set's volume traded data
The equation of the regression model was obtained from the graphs created in Excel. This equation was input into Matlab and tested to see whether it holds for each data set. The code in Appendix F was used to plot the graphs required, and the correlation coefficients were determined from it. The code is explained below.
1. cas=(max(Dailydata(1:1250,13).^(3))-min(Dailydata(1:1250,13).^(3)))/1249;
2. xvl=min(Dailydata(1:1250,13).^(3)):cas:max(Dailydata(1:1250,13).^(3));
3. SPdy=transpose((-2E-25*(xvl))+21.94);
4. dyres=(Dailydata(1:1250,2))-SPdy.^3;
5. scatter((Dailydata(1:25:1250,2)),dyres(1:25:1250,1),'DisplayName','dyres(1:1250,1)','YDataSource','dyres(1:1250,1)');figure(gcf)
In line 1 the variable "cas" determines the step of the x values in the graphs. Line 2 creates a vector of x values. The variable "SPdy" is the predicted share price. The variable "dyres" is the residual between the actual and the predicted values. Line 5 creates a scatter plot of the residuals against the actual values. In the Matlab graphical user interface (GUI), the chart is edited according to what is required. The line of best fit can be added and the equation of the line can be displayed. Residuals can also be displayed using the Matlab GUI.
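The general procedure of predicting the share price from a fitted line, taking residuals and correlating the predicted with the actual values can be sketched as below. This is not the Appendix F code: the data and fitted coefficients are hypothetical, the cube and cube-root transformations are assumed only to mirror the powers that appear in the excerpt above, and the prediction is evaluated at the observed factor values rather than on an evenly spaced grid.

```matlab
% Hypothetical observations (not the project data).
vol   = [1; 2; 3; 4; 5];            % factor values
price = [123; 114; 88; 52; 16];     % actual share price

% Assumed transformations: cube of the factor, cube root of the price.
xT = vol.^3;
yT = price.^(1/3);

% Straight line through the transformed data.
p = polyfit(xT, yT, 1);

% Predict at the observed factor values, then undo the cube root
% so that the prediction is in share price units.
pricePred = polyval(p, xT).^3;

% Residual between the actual and the predicted share price.
res = price - pricePred;

% Correlation between the predicted and the actual share price.
C = corrcoef([price, pricePred]);
rPred = C(1, 2);

% Residual vs actual plot (left commented out so that the sketch
% also runs without a display):
% scatter(price, res);
fprintf('correlation of predicted and actual: %.4f\n', rPred);
```

A clear trend in the residual vs actual plot would suggest that the fitted equation does not transfer well to the data set it is being tested on.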
8 RESULTS
In this section, only the relevant results are presented; all other results are attached in Appendix G. This section comprises 12 subsections: Correlation Values, Cross Correlations and Auto Correlations, Assumption Tests, Coefficient of Determination, Regression of Price vs Factors, Residual vs Actual plots of the data before transformation, Regression of Price vs Factors after transformation, Confidence Intervals on Regression Plots, Residual Analysis for transformed data, Correlation of Factors, Cross Correlation of all Factors, and Residuals for fitting the regression model to each period's volume traded data. Each subsection is briefly introduced below.
8.1 Correlation Values
The correlation values in Table 1 are between each factor and the share price. This was done to find which factors impact the share price most within each data set. Six data sets were used: Daily, Weekly, Monthly, Quarterly, Half-Yearly and Yearly. The blank spaces in the table are due to not having the data required to perform the calculation for that data set. The absolute average was calculated to ensure that, when looking at the average correlation for each data set or factor, the average is not reduced by a few inverse relationships.
Table 1: Correlation Values of Each Factor with Share Price
                  Yearly  Weekly  Monthly  Quarterly  Half-Yearly  Daily   Absolute Average
DY                0.687   0.451   0.471    0.460      0.360        0.458   0.481
Volume            -0.987  -0.667  -0.823   -0.907     -0.731       -0.489  0.767
Rand-Euro         0.903   0.825   0.844    0.843      0.763        0.541   0.787
Rand-Dollar       0.783   0.740   0.752    0.732      0.591        0.699   0.716
Inflation         0.913           0.827    0.807      0.603                0.788
Interest          0.374           0.490    0.391      0.188        0.565   0.401
Gold              0.294   0.294   0.298    0.251      0.127        0.450   0.286
Platinum          0.958   0.594   0.612    0.638      0.667        -0.178  0.609
Absolute Average  0.737   0.595   0.640    0.627      0.504        0.483
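The effect of averaging absolute values rather than signed values can be illustrated with two rows of Table 1. The sketch below is only an illustration: it assumes that blank cells are skipped when averaging and represents them as NaN.

```matlab
% Volume row of Table 1 (Yearly, Weekly, Monthly, Quarterly,
% Half-Yearly, Daily). All correlations are negative.
volumeRow = [-0.987, -0.667, -0.823, -0.907, -0.731, -0.489];

signedAvg = mean(volumeRow);        % keeps the sign, about -0.767
absAvg    = mean(abs(volumeRow));   % strength only, about 0.767

% Inflation row of Table 1: the Weekly and Daily cells are blank.
% Assumption: blanks are represented as NaN and left out of the average.
inflationRow = [0.913, NaN, 0.827, 0.807, 0.603, NaN];
valid = ~isnan(inflationRow);
inflationAbsAvg = mean(abs(inflationRow(valid)));

fprintf('Volume: signed %.3f, absolute %.3f\n', signedAvg, absAvg);
fprintf('Inflation absolute average over %d values: %.4f\n', ...
        sum(valid), inflationAbsAvg);
```

A row with mixed signs would give a signed average close to zero even when every individual correlation is strong, which is the case the absolute average guards against.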
8.2 Cross Correlations and Auto Correlations
Figures 12 to 17 show the cross correlations and auto correlations for each factor within each data set. Note that a lag represents a different time step in each data set's graph because of the resolution of the data. The number of data points in each graph also decreases as the data set moves from daily to yearly. The daily data set has 1250 points and no line is fitted through it, as it is easy to see the movement of the correlation with each lag. The figures with fewer data points have a line passing through the points, because it is otherwise difficult to follow the points and see any patterns that may exist. Note that the autocorrelation in each figure is symmetrical; this is used as validation that the "cc" function in Appendix A works correctly. All the factors' data are plotted in each figure to decrease the number of figures that need to be analysed. This also makes the data easier to read when analysing.
Figure 12: Cross Correlation of Factors Within Daily Data Set
Figure 13: Cross Correlation of Factors Within Weekly Data Set
Figure 14: Cross Correlation of Factors Within Monthly Data Set
Figure 15: Cross Correlation of Factors Within Quarterly Data Set
Figure 16: Cross Correlation of Factors Within Half-Yearly Data Set
Figure 17: Cross Correlation of Factors Within Yearly Data Set
8.3 Assumption Tests
Table 2 to Table 6 each summarise the results of the tests done for each factor within one data set. An assumption is met when there is a 0 in the block; a 1 represents a failure to meet the assumption.
Table 2: Assumption Test Results for Each Series in The Daily Data Set
                Homoscedasticity  Normal Test  Random Test  Unit-Root Stationary  Trend Stationary
Close           1                 1            1            0                     0
Volume          1                 1            1            0                     0
Rand-Euro       1                 1            1            1                     1
Rand-Dollar     1                 1            1            0                     0
Inflation Rate  1                 1            1            0                     0
Gold Price      1                 1            1            0                     0
Platinum Price  1                 1            1            0                     0
Table 3: Assumption Test Results for Each Series in The Weekly Data Set
                Homoscedasticity  Normal Test  Random Test  Unit-Root Stationary  Trend Stationary
Close           1                 1            1            0                     0
Volume          1                 1            1            1                     1
Rand-Euro       1                 1            1            0                     0
Rand-Dollar     1                 1            1            1                     1
Gold Price      1                 1            1            0                     0
Platinum Price  1                 1            1            0                     0
Table 4: Assumption Test Results for Each Series in The Monthly Data Set
                Homoscedasticity  Normal Test  Random Test  Unit-Root Stationary  Trend Stationary
Close           1                 1            1            0                     0
Volume          1                 1            1            0                     0
Rand-Euro       1                 1            1            0                     0
Rand-Dollar     1                 1            1            1                     1
Inflation Rate  1                 1            1            1                     1
Interest Rate   1                 1            1            0                     0
Gold Price      1                 1            1            0                     0
Platinum Price  0                 1            1            0                     0
Table 5: Assumption Test Results for Each Series in The Quarterly Data Set
                Homoscedasticity  Normal Test  Random Test  Unit-Root Stationary  Trend Stationary
Close           1                 1            1            0                     0
Volume          1                 1            1            0                     0
Rand-Euro       0                 1            1            0                     0
Rand-Dollar     0                 1            1            1                     1
Inflation Rate  0                 1            1            1                     1
Interest Rate   0                 1            1            0                     0
Gold Price      0                 1            0            0                     0
Platinum Price  0                 1            0            0                     0
Table 6: Assumption Test Results for Each Series in The Half-Yearly Data Set
                Homoscedasticity  Normal Test  Random Test  Unit-Root Stationary  Trend Stationary
Close           0                 1            1            0                     0
Volume          0                 1            1            0                     0
Rand-Euro       0                 1            0            0                     0
Rand-Dollar     0                 1            1            0                     0
Inflation Rate  0                 1            1            1                     1
Interest Rate   0                 1            1            0                     0
Gold Price      0                 1            1            0                     0
Platinum Price  0                 1            0            0                     0
8.4 Coefficient of Determination
Table 7 is a summary of the coefficients of determination for each factor within each data set. This
coefficient of determination shows the ability to fit a straight line through the data for each factor within
each data set. The share price was also tested for linearity in this way.
Table 7: Coefficient of Determination for Each Series
             Yearly  Weekly   Monthly  Quarterly  Half-Yearly  Daily    Average
Share Price  0.544   0.701    0.715    0.615      0.455        0.706    0.606
DY           0.168   0.330    0.357    -0.015     0.095        0.324    0.187
Volume       0.933   0.286    0.451    0.565      0.754        0.152    0.598
Rand-Euro    0.835   0.823    0.831    0.838      0.827        0.573    0.831
Rand-Dollar  0.888   0.917    0.923    0.929      0.906        0.877    0.913
Inflation    0.997   No Data  0.993    0.993      0.993        No Data  0.994
Interest     0.738   No Data  0.643    0.556      0.702        0.734    0.660
Gold         0.695   0.456    0.447    0.472      0.671        0.609    0.548
Platinum     0.817   0.205    0.222    0.318      0.607        0.033    0.434
Average      0.735   0.531    0.620    0.586      0.668        0.501
8.5 Regression of Price Vs Factors
Figure 18 to Figure 25 show regression plots of the share price vs each of the factors for the half-yearly data set.
Figure 18: Linear Regression of DY Vs Share price
Figure 19: Linear Regression of Volume Traded Vs Share price
Figure 20: Linear Regression of Rand-Euro Exchange Rate Vs Share price
Figure 21: Linear Regression of Rand-Dollar Exchange Rate Vs Share price
Figure 22: Linear Regression of Inflation Vs Share price
Figure 23: Linear Regression of Interest Rate Vs Share price
Figure 24: Linear Regression of Gold Price Vs Share price
Figure 25: Linear Regression of Platinum Price Vs Share price
8.6 Residual vs Actual
Following the previous figures, which showed the regression plot of the factors vs the share price, this
section plots the residuals of the linear fits in the figures found in Section 8.5. This was done to look
for patterns in the residuals, an increasing variance in the residuals and the randomness of the
distribution of the residuals. The residuals are plotted in Figure 26 to Figure 34.
Figure 26: Residual Plot for Share Price
Figure 31: Residual Plot for Inflation
Figure 32: Residual Plot for Interest Rate
Figure 33: Residual Plot for Gold Price
Figure 34: Residual Plot for Platinum Price
8.7 Regression of Price Vs Factors After Transformations
Following the transformation of the factors and the share price, each transformed factor has been plotted against the share price below. Note that the gold price and the interest rate are not included in the figures below, because transforming these series did not improve their linearity.
Figure 35: Linear Regression of Dividend Yield Vs Share Price After Transformation
Figure 36: Linear Regression of Volume Traded Vs Share Price After Transformation
Figure 37: Linear Regression of Rand-Euro Exchange Rate Vs Share Price After Transformation
Figure 38: Linear Regression of Rand-Dollar Exchange Rate Vs Share Price After Transformation
Figure 39: Linear Regression of Inflation Vs Share Price After Transformation
Figure 40: Linear Regression of Platinum Price Vs Share Price After Transformation
8.8 Confidence Intervals on regression plots
Below are the figures of each regression plot after transformation. Each plot has an upper and a lower confidence line, which indicate whether there is a significant outlier in the data. The confidence level is 95%.
Figure 41: Regression Plot of transformed Dividend Yield vs transformed Share Price with Confidence Intervals
Figure 42: Regression Plot of transformed Rand-Euro vs transformed Share Price with Confidence Intervals
Figure 43: Regression Plot of transformed Rand-Dollar vs transformed Share Price with Confidence Intervals
Figure 44: Regression Plot of transformed Inflation vs transformed Share Price with Confidence Intervals
Figure 45: Regression Plot of transformed Platinum Price vs transformed Share Price with Confidence Intervals
8.9 Residual Analysis for transformed data
Here again the linear fit for each series is evaluated by observing the residuals. The residuals are
analysed for error growth with an increase of the factor as well as patterns within the plot.
Figure 46: Residual Plot of transformed regression between Share Price and DY
Figure 47: Residual Plot of transformed regression between Share Price and Inflation
Figure 48: Residual Plot of transformed regression between Share Price and Platinum Price
Figure 49: Residual Plot of transformed regression between Share Price and Rand-Dollar
Figure 50: Residual Plot of transformed regression between Share Price and Rand-Euro
8.10 Correlation of Factors
Below is the table of correlations between each pair of series within the half-yearly data set. Only half the matrix is shown, as it is symmetrical about the diagonal. Colour has been added to the table to highlight strong negative correlations in dark red and strong positive correlations in dark green. The lower the correlation, the lighter the colour of the block.
Table 8: Correlation matrix of Untransformed Factors
                Close   DY      Volume  Rand-Euro  Rand-Dollar  Inflation  Interest Rate  Gold Price  Platinum Price
Share Price     1.000   0.360   -0.731  0.763      0.591        0.603      0.188          0.143       0.667
DY                      1.000   -0.008  -0.065     -0.042       -0.098     -0.279         0.001       -0.245
Volume                          1.000   -0.761     -0.704       -0.859     -0.642         0.178       -0.809
Rand-Euro                               1.000      0.934        0.886      0.635          0.370       0.804
Rand-Dollar                                        1.000        0.942      0.760          0.409       0.639
Inflation                                                       1.000      0.850          0.214       0.747
Interest Rate                                                              1.000          0.066       0.603
Gold Price                                                                                1.000       0.110
Platinum Price                                                                                        1.000
8.11 Cross Correlation of all Factors
The tables below are the cross correlations between each factor and all the observed data within the
half-yearly dataset. Here too, the tables are colour coded with a colour gradient to show the strength of
the correlation. Green means positive correlation and red means negative correlation.
Table 15: Cross Correlation of the Volume Traded with Each Factor
Lag  Share Price  DY     Volume  Rand-Euro  Rand-Dollar  Inflation  Interest Rate  Gold Price  Platinum Price
-3   -0.35        -0.65  0.54    -0.25      -0.53        -0.54      -0.17          -0.36       -0.06
-2   -0.72        -0.25  0.62    -0.82      -0.79        -0.72      -0.23          -0.69       -0.56
-1   -0.62        0.28   0.69    -0.76      -0.67        -0.79      -0.56          -0.64       -0.77
0    -0.73        -0.01  1.00    -0.76      -0.70        -0.86      -0.64          0.18        -0.81
1    -0.87        -0.25  0.69    -0.85      -0.77        -0.78      -0.58          -0.32       -0.67
2    -0.47        -0.24  0.62    -0.75      -0.82        -0.86      -0.75          -0.31       -0.40
3    0.08         -0.11  0.54    -0.66      -0.87        -0.92      -0.80          -0.28       -0.21
Table 16: Cross Correlation of the Dividend Yield with Each Factor
Lag  Share Price  DY     Volume  Rand-Euro  Rand-Dollar  Inflation  Interest Rate  Gold Price  Platinum Price
-3   0.17         -0.49  -0.11   0.18       -0.29        -0.27      -0.73          -0.32       0.45
-2   0.32         -0.36  -0.24   0.10       -0.26        -0.09      -0.21          -0.48       0.63
-1   0.25         0.41   -0.25   -0.18      -0.30        -0.11      -0.10          -0.61       0.01
0    0.36         1.00   -0.01   -0.06      -0.04        -0.10      -0.28          0.00        -0.24
1    0.28         0.41   0.28    0.28       0.39         0.17       -0.02          0.76        -0.18
2    -0.34        -0.36  -0.25   0.48       0.71         0.70       0.68           0.44        0.11
3    -0.45        -0.49  -0.65   0.68       0.70         0.72       0.67           -0.12       0.39
8.12 Residuals for fitting regression model to each period's volume traded data
Figure 51 to Figure 56 analyse the fit of the linear model that was found for the volume traded against the share price. The model was evaluated to determine how well it can explain the relationship between the volume traded and the share price within the other data sets. A linear trend can be seen between the residual and the actual share price in most of the data sets. Table 17 shows the correlation between the predicted and the actual observations.
Figure 51: Error for Fitting Equation to Daily Data
Figure 52: Error for Fitting Equation to Weekly Data
Figure 53: Error for Fitting Equation to Monthly Data
Figure 54: Error for Fitting Equation to Quarterly Data
Figure 55: Error for Fitting Equation to Yearly Data
Figure 56: Error for Fitting Equation to Half-yearly Data
Table 17: Correlation of predicted and actual share price
             Correlation Coefficient
Daily        0.207
Weekly       0.561
Monthly      0.720
Quarterly    0.722
Half-yearly  0.865
Yearly       0.904