Econometrics: Advance



Published in: Economy & Finance

Giulio Laudani #18 Cod. 20192
Econometrics II

FAVERO PARTS:
  • A model to describe the return
  • ARMA model
  • Volatility model
      • Classic computation
      • ARCH and G-ARCH estimators
      • How to improve estimates
  • Modeling long-run relationships in finance
      • Differencing procedure
      • Cointegration

GUIDOLIN PARTS:
  • Econometric analysis
      • Types of data
      • Steps involved in formulating an econometric model
      • OLS model
  • Multivariate model
      • Simultaneous Equation model
      • VAR model
  • Studying the non-normality of distribution
      • Method to detect non-normality
      • How to fix non-normality issues
  • Multivariate models
      • Exposure Mapping models
      • Conditional Covariance model
      • DCC model
      • VEC and BEKK model
      • Principal component approach
  • Switching/Regime models
      • Threshold model
      • Smooth transition
      • Markov switching model
  • Simulation methods
      • Historical simulation
      • Monte Carlo Simulation
      • Filtered Simulation (Bootstrapping)
Favero Parts:

A model to describe the return:

As an explanatory model of the evolution of returns we use one that assumes two components: permanent information and temporary noise. The noise component prevails in short-term, high-frequency observations, while the permanent component emerges over longer horizons. This property implies:

  • In general, the predictability of returns increases with the horizon. The best estimate of the mean of returns at high frequency is zero, but a slowly evolving, time-varying mean of returns at long horizons could and should be modeled. [There is strong evidence of correlation between the stock markets of industrialized countries.]
  • If we run a regression, we expect high statistical significance of the parameters at long horizons.
  • The presence of the noise component in returns causes volatility to be time-varying and persistent, and the annualized volatility of returns decreases with the horizon.
  • The data show non-normal behavior: the unconditional distribution has fatter tails than the normal.
  • Non-linearity is also a feature of returns at high frequency. A natural approach to capturing it is to distinguish alternative regimes of the world that govern alternative descriptions of the dynamics (such as the level of volatility in the market), for example with a Markov chain.

The model is described by the following formula, obtained starting from: [...]

The consequences of this model are:

  • The model implies the possibility that long-run returns are predictable, so forecasting models for stock market returns should perform better the longer the forecasting horizon. Moreover, the importance of noise in determining returns disappears with the horizon.
  • Returns in the short term are not predictable, while volatility is predictable.
  • The forecasting performance for stock market returns depends crucially on the forecasting performance for dividend growth.
    Note that if the dividend yield predicted expected dividend growth perfectly, the proposition that returns are not predictable would hold in the data [1].
  • Since the model relies on a linearization, we need to test that the dividend yield fluctuates around a constant mean, the point around which the model is linearized.
  • We can use this model to compute the Sharpe ratio.
  • Given two equations, where the first models the independent variable used in the second, we need the first to compute the unconditional mean of the dependent variable of the second.

ARMA model:

Time series modeling predicts a financial variable using its own past values and the current and past values of an error term. It is therefore an a-theoretical approach: we do not try to understand why something happens, we simply attempt to describe the behavior of the variable empirically. The same goal can instead be pursued with a structural model, which relates the dependent variable to explanatory variables and so aims to understand why and how something happens.

[1] However, the available empirical evidence tells us that the dividend yield does not predict dividend growth. If variables other than the dividend yield are predictors of dividend growth, then the combination of these variables with the dividend yield delivers the best predictive model for the stock market.
The most general specification is the ARIMA, which is built from linear combinations of errors. It is used to produce one-step or multi-step forecasts with either a recursive or a rolling window procedure. The recursive procedure adds each new observation to the whole sample, while the rolling procedure keeps the sample length fixed and moves the window forward with each new observation.

There are several measures to check whether a forecast is accurate. The two simplest are the mean squared error (MSE) and the mean absolute error (MAE): the first penalizes large deviations more than small ones, while the second penalizes large and small errors proportionally. A more sophisticated measure is the mean absolute percentage error (MAPE), which expresses the error as a percentage and is bounded to be positive, but cannot be used for variables that change sign. The last one is Theil's U-statistic, which compares the model in use against a benchmark model.

Economic loss functions belong to the same family of error measures but have a different scope: they focus not on the point estimate, but on the ability of the model to predict the sign and the turning points. This kind of information is very useful and in the end often leads to more profitable strategies.

Within the ARMA class, the simplest model is the moving average MA(k), which describes a dependent variable as a combination of current and lagged white noise terms. Its features are: beyond the k lags considered, the model collapses to the intercept, which is also the mean of the dependent variable; the variance is constant; and the autocorrelations are different from zero only up to lag k.

The autoregressive model AR(q) describes the dependent variable using only the lagged values of the variable itself plus an error term.
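The forecast accuracy measures described above can be sketched in a few lines. This is a minimal illustration, not the notes' own implementation: the function names and the sample numbers are assumptions, and the sign hit rate is just one simple example of an "economic" loss measure.

```python
def mse(actual, forecast):
    """Mean squared error: penalizes large deviations more than small ones."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error: penalizes errors in proportion to their size."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, and hard to interpret for
    variables that change sign.
    """
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def sign_hit_rate(actual, forecast):
    """Share of periods in which the forecast has the same sign as the outcome."""
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return hits / len(actual)


# Hypothetical realized values and forecasts, for illustration only.
actual = [2.0, 4.0, -1.0, 5.0]
forecast = [1.0, 5.0, 1.0, 5.0]

print(mse(actual, forecast), mae(actual, forecast))
print(mape(actual, forecast), sign_hit_rate(actual, forecast))
```

Note how the third observation, where the forecast misses the sign, dominates both the MSE and the MAPE.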
In this model the stationarity condition requires that all roots of the characteristic equation lie outside the unit circle (for an AR(1), that the coefficient on the lagged term is smaller than one in absolute value). In this case the mean of the process exists and is given by $E(y_t) = \mu / (1 - \phi_1 - \dots - \phi_q)$, where $\mu$ is the intercept and $\phi_1, \dots, \phi_q$ are the autoregressive coefficients.

Wold's theorem says that any stationary process can be decomposed into a deterministic part and a purely stochastic part, which is an MA of infinite order. The autocorrelation function can be obtained by solving a set of simultaneous equations, and it decays geometrically to zero.

If the residuals show MA dynamics (autocorrelation), the R2 will be low and the model should be corrected by introducing AR dynamics.

The ARMA process is the combination of the MA and AR models. This class is distinguished from the other two by using the pacf [2]: its acf behaves like that of an AR process, but its pacf also decays geometrically. A special class is the ARIMA, where the process is not stationary.

To choose the appropriate number of lags (and so obtain a parsimonious model) we cannot simply use the plots of the acf and pacf, since with a mixture of AR and MA terms it is hard to understand what is going on. That is why academics have developed several information criteria. They share the general idea of using the variance of the error term corrected by a penalty for adding new parameters and the resulting loss of degrees of freedom. Our goal is to minimize the criterion. No criterion is superior to the others. [page 233]

[2] This function is defined as the measure of correlation between the current and the lagged value, after controlling for any intermediate lags. It always has the same value as the acf at lag one, since there are no intermediate lags to correct for.

Volatility model:
The estimate of the variance is central in finance. Several kinds of models can be applied: linear ones, like the ARMA family, or non-linear ones. The latter are useful to properly address some common features such as leptokurtosis, volatility clustering/pooling [3] and the leverage effect.

To test for the presence of non-linear relations we can still use the usual t or F distributions, but they are not flexible. In the maximum likelihood world there are three possible tests: Wald, Likelihood Ratio (LR) and Lagrange Multiplier (LM). Basically all of them work by comparing the unrestricted ML estimates with restricted ML estimates [a sort of F-test]; more specifically, we measure the distance between the maxima of the two functions.

The Wald test is based on the horizontal distance, the LM test compares the slopes of the two functions, and the LR test is simply the previous formula written with the likelihood function: $LR = -2(L_r - L_u)$, where $L_u$ and $L_r$ are the maximized unrestricted and restricted log-likelihoods. The LR statistic follows a chi-square distribution with "m" degrees of freedom (the number of restrictions).

A totally different class of variance models assumes a random variance (note that the G-ARCH class assumes deterministic behavior given all the information available at the time; basically the error term is only in the mean). Those models are called stochastic volatility models. They add an error term to the variance, which is therefore no longer observed but latent (modeled indirectly). Unfortunately this class of models is hard to implement and to estimate properly.

Classic computation:

The RiskMetrics model, implied volatility and historical volatility belong to this class [page 384]. They will capture heavy tails: [...]

ARCH and G-ARCH estimators:

These are the best-known non-linear financial models. The non-linearity lies in the volatility estimation, while the mean is still assumed to be constant.
The use of this model can be suggested either by theory or by tests [4], which are classified as general or specific. This class of estimators is superior to the previous one because it allows for mean reversion toward a long-run value (usually given by the unconditional variance): the unconditional variance is constant, while the conditional variance changes over time.

The ARCH model is an autoregressive model whose definition is based on the concepts of conditional and unconditional expectation. This class of models aims to describe the conditional variance of the random time series, while it says nothing about the conditional mean, which can take any form [5]: [...]

The condition for this class of models is that all the parameters must be positive [6]. To test the significance of the model we can test the null hypothesis of joint insignificance of its parameters, with a statistic that follows a chi-square distribution. This class has several drawbacks: we do not know how many lags we need; the model can easily become over-parameterized and hence not parsimonious; and the non-negativity constraints can easily be breached when many parameters are employed.

The G-ARCH solves most of these limits. It allows an infinite number of past squared errors to influence the current conditional variance while requiring only a few parameters to capture the whole lagged relationship, so it is less likely to breach the non-negativity constraints.

Some classes of G-ARCH models are designed specifically to capture asymmetric behavior in the data. One is the GJR-GARCH, which adds a Boolean variable [I = 0, 1] that takes the value 1 when the return is negative and 0 otherwise.

[3] The level of volatility shows correlation in size in the short term.
[4] The most famous one is the BDS test, which is able to detect several possible departures from linearity. The specific tests are good at detecting one class of possible non-linear relations, but they are not able to detect the other forms.
[5] The conditional variance may also enter the modeling of the mean (GARCH-M).
[6] This requirement is stronger than the minimum needed, but it is the only one that is feasible to apply.
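The GJR-GARCH variance recursion described above can be sketched as follows. This is a hedged illustration only: it assumes the common GJR-GARCH(1,1) form, the parameter names (omega, alpha, gamma, beta) and their values are hypothetical, and the parameters are taken as given rather than estimated by maximum likelihood.

```python
def gjr_garch_variance(returns, omega, alpha, gamma, beta):
    """Conditional variances from an assumed GJR-GARCH(1,1) recursion.

    sigma2[t+1] = omega + (alpha + gamma * I[t]) * r[t]**2 + beta * sigma2[t]
    where I[t] = 1 if r[t] < 0 and 0 otherwise.
    Setting gamma = 0 gives a plain GARCH(1,1).
    Returns len(returns) + 1 values; the last one is the one-step-ahead forecast.
    """
    # Start from the unconditional variance, assuming symmetric shocks so
    # that the indicator equals 1 half of the time on average.
    persistence = alpha + 0.5 * gamma + beta
    if persistence >= 1:
        raise ValueError("alpha + gamma / 2 + beta must be below one")
    sigma2 = [omega / (1.0 - persistence)]
    for r in returns:
        indicator = 1.0 if r < 0 else 0.0
        sigma2.append(omega + (alpha + gamma * indicator) * r ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical parameter values: same-sized shocks of opposite sign.
up = gjr_garch_variance([1.0], omega=0.1, alpha=0.1, gamma=0.1, beta=0.7)
down = gjr_garch_variance([-1.0], omega=0.1, alpha=0.1, gamma=0.1, beta=0.7)
print(up[-1], down[-1])  # the negative shock produces the higher variance forecast
```

The only difference between the two forecasts is the term gamma times the squared return, which is switched on by the indicator for negative returns.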
The GJR specification is designed to capture leverage: after a negative return the equity of the company is reduced, so leverage and risk increase since debt does not change.

The NG-ARCH model is defined as [...], where the unconditional variance is [...].

Another leverage model is the EG-ARCH model, where the variance function is defined through the exponential and is therefore always positive. However, the expected variance beyond one period cannot be calculated analytically.

To estimate the parameters of both ARCH and G-ARCH models we cannot use OLS [7]; we should rather employ maximum likelihood [8], more specifically the log-likelihood function, which has the additive property. For the model to be stationary, the estimated parameters need to sum to less than one; otherwise we have an integrated G-ARCH.

How to improve estimates:

There are two methods to improve the quality of our estimates by using intraday observations: the range and the realized variance.

The first uses the range parameter [...], where the expected value of the squared range is [...]; we can also use as a proxy the G-ARCH volatility forecast itself. This measure has been shown to be more persistent than the usual one, and its autocorrelations are positive and significant from lag 1 to 28 (on average). The range variable D can be used in a G-ARCH equation as a new variable to improve the quality of the estimates. Note that even if the variable D and past volatility are highly correlated we do not care, since we are not interested in the single parameters but in the final output. This model is preferred for illiquid assets; in general it performs worse than a good realized measure (4 hourly).

The second approach is basically the use of intraday observations.
This method increases the quality of the estimate, since the precision of the volatility estimate improves with the sampling frequency. Furthermore, it has been shown that at high frequency and for highly liquid assets the log of the estimate is close to normally distributed. In this case it is possible to model the logarithm of volatility with an LRW model or an ARMA model, but since we are interested in the level we must be aware of the transformation (it is not linear). This class of estimators performs quite well in the short term; for long horizons we need an integrated ARIMA model to ensure mean reversion/de-trending. The drawback of this approach is that sometimes the quality of the data is poor, so the estimates can be noisy because of market microstructure effects.

These methodologies can be applied to improve covariance estimates as well. The realized approach can be applied directly, with the usual warnings, while the range-based one needs a correction: starting from the variance of a portfolio of the two assets [...], we can estimate the covariance by inverting that formula and end up with our proxy [...]. All the other considerations are still valid.

[7] OLS focuses only on the parameters of the conditional mean, not on those of the variance, so its objective function is not appropriate.
[8] Basically we differentiate the function with respect to its own parameters, aiming to find the maximum. In the non-linear case the function can have local maxima, so it is crucial to specify the algorithm properly in order to find the true global maximum. This method is usually run assuming normally distributed errors; however, even if that hypothesis does not hold, the estimates are still consistent.

Modeling long-run relationships in finance:

As a prelude to this discussion we need to define the concept of stationarity and its importance in finance. A variable is stationary if all of its observations are drawn from the same distribution over time, so that the confidence intervals do not change. This is a strong assumption, so we usually refer to stationarity with a weaker requirement, that of having a long-run mean. This is achieved if the random variable has constant mean, variance and autocovariance/autocorrelation (white noise is always a stationary process). The last one is represented by a specific function named the autocorrelation function (acf). Box and Pierce developed a test for the joint hypothesis that the coefficients for n lags are all zero; the test statistic follows a chi-square distribution.

In finance it is important to deal with stationarity since we aim to model shocks that die away after some time. Regressing non-stationary series on one another can produce spurious results: the estimated parameters appear significantly different from zero and the R2 is high (suggesting correlation) even when the variables are unrelated and are merely trending together. Furthermore, the estimates lose the usual asymptotic properties, so standard inference is no longer valid.

There are basically two kinds of non-stationarity: the random walk with drift and the trend-stationary process. The first is an AR(1) model with a coefficient equal to one, so that all past shocks have infinite memory; the second is a deterministic function of time. The plot of both series shows a trending behavior, hence the mean is not constant, while the plot of a stationary process looks similar to white noise with a constant mean.

The two processes must be treated differently to eliminate the non-stationarity: the first needs a differencing (unit root) procedure, while the second needs a de-trending procedure. It is important to use the appropriate method, otherwise we introduce errors. A deterministic trend treated with differencing acquires an MA(1) structure which is not invertible and therefore has undesirable properties; conversely, de-trending a stochastic trend does not remove the non-stationarity.

Differencing procedure:

The differencing procedure can be generalized to series with more than one unit root. We say that a series is integrated of order "d", written I(d), to express the number of unit roots; in general a series of order "d" needs to be differenced d times to remove the non-stationarity.

Parameters 1 and 2 are the same; parameter 3 is different but has the same standard deviation; the residuals are the same.

To test for a unit root we might use the autocorrelation function, but this method is not appropriate: the acf may show decay even if the series has an infinite memory of shocks. We need other approaches:

  • The first is to test the coefficient of the AR(1) against the null hypothesis that it equals 1. The test is run on the differenced equation, checking whether the coefficient on the lagged variable is different from 0; we can also add an intercept and a deterministic trend to the differenced equation. The test statistic follows a non-standard distribution, and the null is rejected for values more negative than the critical one. The test requires that the error terms are not correlated with one another; to ensure this we can add n lags [9] to the equation (augmented Dickey-Fuller test).
  • The second is to test for higher-order unit roots, meaning to check whether the series is of order s, greater than the chosen value d. This matter is not important in finance, since no financial time series has ever been found to contain more than one unit root.

Cointegration:

We now consider the case of two integrated variables and their joint behavior, the so-called cointegration.
If two integrated variables are combined, the combination will in general have an order of integration equal to the larger of the two.

The short-run elasticity is given by [...], while the long-run one is given by [...]. The previous set of equations can be differenced: [...]

[9] To choose the appropriate number of lags it is suggested to use a rule of thumb based on the frequency of the data. It is crucial to choose the correct number, since too few lags may leave autocorrelation in the model, while too many may reduce the power of the test because they use up degrees of freedom.
It is desirable to obtain residuals that are not integrated. We can do that if there exists a combination of the two variables which is stationary, meaning that the two variables co-move or, more generally, are bound by some sort of long-run relationship. This kind of behavior is very frequent in finance.

Unlike the simple differencing procedure, cointegration aims to find a long-term relationship by adding lagged levels of the cointegrated variables, known as error correction terms: [...]

This new formulation allows us to define an equilibrium between the variables, where the coefficient alpha represents the speed of reversion to equilibrium. The model can be expressed using more than two variables.

There are several tests for the existence of cointegration among variables, based on a residual analysis similar to the Durbin-Watson: we use the residuals to see whether they are integrated or not, and the null hypothesis is that they are integrated (no cointegration). Of course this method depends on the model chosen to explain the relationship; it does not provide us with the correct specification. If cointegration is not found we should just use the differencing method to eliminate the integration problem, but there will not be any long-run solution.

To estimate the parameters we cannot use OLS straightforwardly [10]; instead there are three approaches: Engle-Granger, Engle-Yoo and Johansen.

The first method consists of running our model by OLS with all the I(1) variables and saving the residuals, which must be I(0); otherwise we cannot continue. We then run our equation again, using the lagged residuals in place of the explicit error correction term; the variables can now be used to make inference. This approach suffers from lack of power and possible simultaneity bias.

The confidence interval for an integrated variable will increase to infinity.

[10] The parameters are not meaningful if there is more than one cointegrating relationship.
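The Engle-Granger two-step procedure described above can be sketched on simulated data. This is an illustrative sketch under stated assumptions, not a reference implementation: the helper names `ols` and `engle_granger` are hypothetical, the data-generating process is invented for the example, and the Dickey-Fuller statistic on the residuals has no added lags. The statistic must be compared with Engle-Granger critical values from published tables, which are not hard-coded here.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y (Gauss-Jordan)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def engle_granger(y, x):
    # Step 1: cointegrating regression in levels, y_t = c + b * x_t + u_t.
    c, b = ols([[1.0, xi] for xi in x], y)
    u = [yi - c - b * xi for yi, xi in zip(y, x)]
    # Dickey-Fuller regression on the residuals: du_t = rho * u_{t-1} + e_t.
    du = [u[t] - u[t - 1] for t in range(1, len(u))]
    lag_u = u[:-1]
    suu = sum(l * l for l in lag_u)
    rho = sum(d * l for d, l in zip(du, lag_u)) / suu
    resid = [d - rho * l for d, l in zip(du, lag_u)]
    s2 = sum(e * e for e in resid) / (len(resid) - 1)
    df_stat = rho / (s2 / suu) ** 0.5
    # Step 2: error correction model, dy_t = a + g * dx_t + alpha * u_{t-1} + e_t.
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    a, g, alpha = ols([[1.0, dxi, ui] for dxi, ui in zip(dx, lag_u)], dy)
    return {"beta": b, "df_stat": df_stat, "alpha": alpha}


# Simulated example (an assumption for illustration): x is a random walk and
# y = 1 + 2 * x + u, where u is a stationary AR(1) with coefficient 0.5.
rng = random.Random(0)
x, u = [0.0], [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
    u.append(0.5 * u[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

result = engle_granger(y, x)
print(result)
```

In this setup the long-run coefficient should come out close to 2 and the error correction coefficient alpha should be negative, which is what makes deviations from equilibrium die out.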
Guidolin Parts:

Econometric analysis:

Types of data:

Time series are data that have been collected over a period of time on one or more variables. Possible issues are: the frequency needs to be chosen to avoid possible bias or insignificant observations, and also to ensure continuous or regularly spaced data; the variables can be qualitative or quantitative, and in both cases they are naturally ordered chronologically.

A time series of random variables is a stochastic process. The probability structure of a random variable is determined by the joint distribution of the stochastic process, where we can assume a constant plus an error term whose dynamics explain/describe the randomness.

We can use time series to study the change of some variable over time or to perform event studies; basically, to study all the cases in which the time dimension is the most important one.

Cross-sectional data are data on one or more variables collected at a single point in time, so there is no natural ordering of the observations. Those data can be used to study the relationships between variables.

Panel data have the dimensions of both time series and cross-sections. These are the data that provide the most information.

The data collected can be continuous or discrete (depending on the variable observed). Another important characteristic of data is whether they are cardinal, ordinal or nominal numbers:

  • Cardinal data are those for which the value itself has a meaning: twice as big in value means twice as much.
  • Ordinal data can be interpreted as providing a position/ordering: being twice as big does not mean twice as much in value.
  • Nominal data do not carry any information either in ordering or in value.

Steps involved in formulating an econometric model:

  1. First we need to give a general statement of the problem, keeping in mind that we do not need to capture every relevant real-world phenomenon, but the model should be a sufficiently good approximation (be useful).
  2. The second stage is to collect the data; remember to understand them before starting to use them.
  3. Third, we need to choose our estimation model and apply it.
  4. Fourth, we need to check whether the hypotheses of the model are met and to understand possible deviations.
  5. Fifth, we need to understand the results of the model and formulate our theory.

OLS model:

OLS, ordinary least squares, is a method used to estimate the parameters of a linear regression: a set of variables regressed on common factors, hence evaluating the relationship between a given variable and one or more other variables [11]. It assumes a linear relationship between the dependent variable and the weights (the parameters), so the independent variables are free to take any form.

Besides OLS there are other methods to estimate the regression parameters: the method of moments and maximum likelihood. The OLS estimates are obtained by minimizing the sum of squared errors [...]; we want our model to be equal to Y on average. This method is preferred because it has an analytic solution and, under certain hypotheses, is superior to any other method, as proved by the Gauss-Markov theorem. To be useful an estimator needs an important feature, unbiasedness; if this cannot be achieved we require consistency, an asymptotic property which needs fewer hypotheses on the error and on its correlation with the independent variables.

[11] It is different from the correlation measure, which does not imply any cause of the change; it simply states the presence of a linear relationship.

The procedure consists of setting the first derivatives to 0 (this is a sufficient condition since the function is convex). We end up with [...], which [...] and [...].

From those formulas we see that to increase the quality of the estimates we need to increase the range of the independent variables X or to improve the estimate of [...].

As the formula shows, the randomness of the dependent variable comes from the presence of the error. Under the weak hypotheses we can say E([...]) = Y and V([...]) = [...].

The two estimated parameters are invariant to any change in the unit of measure. As for the intercept, we need to keep in mind that it is in general not a good proxy for the true intercept, because it also collects all the garbage of the model; furthermore, it can be meaningless if there are no observations of the independent variable close to the origin.

Since we are using sample data, not the population, we need to perform some estimates. One of the most important is the error variance, estimated with the classic variance formula corrected for the two degrees of freedom absorbed.

The OLS hypotheses:

OLS requires certain hypotheses to work properly and to allow the user to build confidence intervals. They are divided into weak and strong: the first ensure satisfactory properties of the estimates, namely unbiasedness, while the second are needed to perform significance tests and to build confidence intervals and forecasts.

The weak hypotheses are three, and together they ensure that the OLS estimators are BLUE:

  1. The expected value of the error is 0 (this is always the case if the intercept is included in the model) and the errors are not correlated with X; if X is random we should require [...].
  2. The variance of the errors is constant and the correlation among errors is 0.
     If this hypothesis fails, so that [...], we can still estimate β with generalized least squares, where [...], and the estimator is still BLUE. Note the [...]
  3. The matrix X has full rank, to avoid multicollinearity and to ensure that the matrix (X'X) is invertible. The effect of multicollinearity is an increase in the variance of the betas.

The Gauss-Markov theorem states that the OLS estimator is BLUE, using the definition of variance efficiency: if $\hat\beta_1$ and $\hat\beta_2$ are both unbiased estimators, we say that $\hat\beta_1$ is not worse than $\hat\beta_2$ if and only if $V(\hat\beta_2) - V(\hat\beta_1)$ is at least positive semi-definite.
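For the two-parameter case (intercept and one slope), the OLS estimates and the error variance with its two-degrees-of-freedom correction can be sketched as below. This is a minimal sketch using the standard textbook closed-form expressions; the function name `simple_ols` and the data are hypothetical and are not taken from the notes.

```python
from math import sqrt


def simple_ols(x, y):
    """Closed-form OLS for y = alpha + beta * x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    # Error variance: two degrees of freedom are absorbed by alpha and beta.
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    # A wider spread of x (larger sxx) lowers the standard error of beta.
    se_beta = sqrt(s2 / sxx)
    return {"alpha": alpha, "beta": beta, "s2": s2,
            "se_beta": se_beta, "t_beta": beta / se_beta}


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_ols(x, y)
print(fit)
```

The expression for `se_beta` makes the earlier remark concrete: the precision of the slope improves when the range of X increases or when the error variance is smaller.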
The strong hypotheses are two: the errors are independent of one another and of X, and they are normally distributed. It follows that the betas have the same distribution, since they are linear combinations of the errors. Under those hypotheses we can build confidence intervals and test the statistical significance of the model parameters.

Tests used to assess the hypotheses and other model features:

Several tests are used in statistics to assess the fit and the significance of the coefficients, both overall and one by one. Since the error variance is unobservable we need to use the sample variance, so instead of the Gaussian distribution we use the Student's t distribution.

  • The t-ratio is a special type of test in which we check whether the variable is different from 0. The general idea is to divide the numerator (the hypothesis) by its standard deviation.
  • The paired sample test is a procedure to test the difference between estimators by introducing the difference "d" as a new parameter in the model. The standard deviation is then computed automatically by the model and takes into account the potential correlation among estimators [12].
  • The F-test is used to test several hypotheses jointly. The F ratio is defined as $F = \frac{(RSS_r - RSS_u)/q}{RSS_u/(T - k)}$, where $RSS_r$ and $RSS_u$ are the residual sums of squares of the constrained and unconstrained models, T is the number of observations, k is the number of parameters and "q" is the number of factors tested. Clearly, the greater the improvement of the unconstrained model in explaining the relationship relative to the constrained one, the higher the ratio, and so the more likely it is to be significant. The F-test on one variable in general gives the same result as a two-sided t-test.
  • The R2: [...] if the constant is included in the regression, or more generally [...]. This measure can never decrease when new independent variables are added.

[12] Positive correlation will reduce the variance.

How to deal with failure of the OLS hypotheses:

Some considerations based on exam tests:

  1. Cov(r1, r2), where both returns have been computed on the same factors, is equal to [...]
  2. If the constant is included in the model we have [...]
  3. [...]
  4. If we build a confidence interval for the forecast, the interval will [...]
  5. The mean squared error is [...], where T is the estimator and [...] is the true value.
  6. If we do not consider the complete model, but omit an independent variable which is correlated with the included variables, there will be a bias in our coefficients, since the included variables will be correlated with the error.
  11. 11. Giulio Laudani #18 Cod. 20192 7. If the intercept is excluded in the model (and it is effectively different form zero) than the estimates of the betas are biased. However if the intercept is really 0 the coefficient variance will be lower. 8. If the Cov(Xi;xj)=0 than each Beta could be estimated by the univariate formula Where V is the Var-Cov Matrix of X, if the statement it is true that matrix is a diagonalEndogenous variables model:Those class of model has been introduced to overcame the problem of the endogenouoity in the X matrix, hence the ma-trix itself is stochastic and there will be a bias in the estimates, furthermore even the consistency property will be lost. Be 13endogenous means that the variables used are identify by the others .To identify if variables are endogenous there exist some possible test, but first we are now providing a definition of ex-ogeneity, actually we need to provide two definition since there exist two forms of exogeneity: a predetermined variableis on that is independent both of contemporaneous and future errors in the equation; a strictly exogenous variable is onethat is independent also for all past errors.Simultaneous Equation model:We need to find a reduced form equation for each endogenous variable, and use this new equation in oursimultanousestimates, granting that all the used variables are exogenous. Those reduced equation do not allow to directly retrieve theoriginal coefficient for the single endogenous variables, actually it is not always possible.The previously problem is called Identification issue, meaning that the extraction of the original value depends on the in-formation available, in other world the number of available equation must be equal or less to the number of parameters.There are two condition to be met to grant a identified equation: Order and Rank condition.VAR model:This class of model are a mix of the univariate time series and the simultaneous model. 
This model has often been advocated as a solution to endogenous structures, since it is flexible and compact in notation.
The idea of this class of models is that all variables are endogenous, hence we do not need to impose any identification restrictions, perform any test or study the theory behind the variables' behavior. This is possible because at any time "t" all the lagged variables entering the model are known, and so predetermined. On the other hand there are some drawbacks: it is an a-theoretical model; there is no guideline on how many lags or parameters are needed; there are lots of parameters to be estimated; and the concept of stationarity becomes fuzzy, since any differencing procedure will reduce the quality of the information available to model the relationship between variables.

Studying the non-normality of distribution:
Understanding the data distribution is important in finance to manage risk properly. All the econometric models developed in this section aim to explain the behavior of financial data.

Method to detect non-normality:
These methodologies consist of tests, graphical procedures and possible data smoothers. This task is really important, since we are interested in defining risk measures which depend on the tail of the distribution.
[13] A classic example is price and quantity.
Smoother/converter into continuous distribution:
Since financial data are discretely distributed, we need some algorithm to convert them into a continuous distribution. We can use a kernel estimator, $\hat{f}(x) = \frac{1}{Th} \sum_{t=1}^{T} K\left(\frac{x - x_t}{h}\right)$, which depends on two parameters: the bandwidth "h" and the kernel function K(x).
The kernel function can assume three possible forms: Gaussian, Epanechnikov and Triangular; in practice this choice does not really matter. The "h" is chosen to minimize the integrated MSE function [14].

The Jarque-Bera test:
This famous test is based on the analysis of the third and fourth moments. It aims to measure the departure from normality using the sample skewness $\hat{S}$ and the sample kurtosis $\hat{K}$ according to this formula: $JB = T\left[\frac{\hat{S}^2}{6} + \frac{(\hat{K} - 3)^2}{24}\right]$ [15]. The test statistic follows a chi-square distribution with two degrees of freedom; high values show strong evidence against normality.
If you standardize the data with their unconditional variance, the test will not change.

Q-Q plots:
A less formal and yet powerful method to visualize non-normality is to plot the quantiles of the standardized data [16] against those of a normal distribution in a scatter plot; graphically, the data should lie on a 45° line. This method allows us to see where the non-normality occurs.

How to fix non-normality issues:
Academics have decided to describe the data by assuming that they have a non-normal unconditional distribution, while the conditional one is IID with dynamic (time-varying) densities.
We have developed a G-Arch model to try to fix or eliminate the non-normality of the data; however, empirical data show that we have still failed to achieve that result. An explanation of this behavior is that the G-Arch errors are still assumed to be normal, hence a possible easy solution is to substitute the normal with the Student's t.
The new parameter needed to use the Student's t is "d", the degrees of freedom, which is computed numerically to improve the fit of the distribution to the empirical data.
It must be greater than 2 (to ensure at least the existence of the variance) and can be a real number; the higher its value, the closer the distribution becomes to a normal. This distribution has polynomial tail decay, with fatter tails but still with no skewness. If we use this distribution to compute VaR, we need to correct the sample variance estimate with the factor $\sqrt{(d-2)/d}$.
Another possible solution is the Cornish-Fisher approximation, which can be defined as a Taylor expansion around the normal distribution that takes into account the skewness and kurtosis indexes.
A totally different approach is to study just the tail of the distribution, without spending energy on studying the whole set of data. This method, named Extreme Value Theory, provides a good solution to our problems. We will analyze the distribution of the rescaled data conditional on exceeding a given threshold "u", and we will assume that those data follow a generalized Pareto distribution (GP), which depends on the tail parameter $\xi$.
[14] The distance between the histogram and the continuous cumulative distribution.
[15] The moments are the sample ones. Under the assumption of normally distributed errors, $\sqrt{T}$ times the sample skewness and $\sqrt{T}$ times the sample excess kurtosis are asymptotically Normal with expected value 0 and variances 6 and 24 [that is why they are standardized].
[16] This is the formula to compute the frequency probability.
A value of the tail parameter $\xi$ greater than zero implies a thick tail, while a negative value implies thin tails.
To estimate the parameters we can use Hill's estimator (valid when $\xi > 0$). The problem with this methodology is that it does not provide a guideline to choose "u", and since the estimation depends heavily on it, it is criticized as a noisy method.
To complete the information provided by the methodologies used to compute VaR we can introduce the Expected Shortfall (ES), which gives us an idea of the possible distribution of the extreme outcomes beyond our VaR. It allows us to address differences between methods in computing extreme quantiles at higher confidence levels (it may be the case that they give similar values at 1%, but totally different ones at 0.1%). It is basically the expected loss conditional on the loss exceeding our VaR. The ratio between ES and VaR stays greater than one in the case of fatter tails, while it tends to 1 for Gaussian returns as the confidence level becomes more extreme.

Multivariate models:
We need to develop multidimensional models to obtain a more realistic and actively managed risk tool, hence we need models that compute the correlation among assets. In theory we could stay in the univariate world by computing our estimates using the portfolio as a single asset; however, by doing so we severely limit our work, since for any change in the portfolio composition we need to re-run all our models. This may be a suitable solution for passive strategies, where the relative weights of the portfolio do not change, but it is absolutely unfeasible for active purposes.
On the other hand, the computation of the correlation matrix is not a trivial task: it suffers from the so-called saturation problem. Basically, the number of parameters needed for n securities increases at a higher rate than "n", so we will need more and more observations for any added security [17].
A ratio that measures this problem is the saturation ratio, that is, the ratio between the total number of observations and the number of parameters needed. To solve this problem we need to develop some approaches.

Exposure Mapping models:
We can overcome the computation problem by using a factor model. We will run our model to compute the betas for each variable, from which we will get the correlation measure. The limit is that this method ignores the idiosyncratic components, hence it works only for well-diversified portfolios, where the idiosyncratic component is negligible. Furthermore, it is not obvious which factors should be chosen.

Constant Conditional Covariance model:
In this case we use the same algorithms used to compute the variance to compute the covariance measure, hence the result depends on the algorithm chosen. We can choose the moving average estimates, the exponential smoother, or a G-Arch based estimate.
In the last two cases we need to add a restriction on the parameters: they must be the same for all the securities and constant over time, meaning they must not depend on them. This restriction is quite crude and does not match real-world behavior, but it ensures that the Var-Cov matrix respects the positive semi-definite (PSD) property. The parameters are constant, although the conditional covariance itself is time varying.
[17] To give an example, with 15 assets we have 15*14/2 = 105 parameters, hence we need at least 160 periods of observations, meaning 14 years of monthly data or 32 weeks of daily data.
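The exponential smoother applied to covariances can be sketched as follows. This is only a minimal illustration: the function name `ewma_covariance`, the zero-mean assumption on returns, the starting value and the decay factor `lam=0.94` are assumptions made for the example, not values taken from these notes. The same decay factor is used for every pair of assets, which is the restriction that keeps the matrix PSD.

```python
def ewma_covariance(returns, lam=0.94):
    """Exponentially smoothed Var-Cov matrix (hypothetical helper).

    returns: list of T observations, each a list of n asset returns,
             assumed here to have zero mean.
    lam:     decay factor, identical for every pair of assets (assumed value).
    """
    n = len(returns[0])
    first = returns[0]
    # Start from the outer product of the first observation (rank one, PSD).
    cov = [[first[i] * first[j] for j in range(n)] for i in range(n)]
    for r in returns[1:]:
        # Same lam for every element: each update is a weighted sum of two
        # PSD matrices, so the result stays positive semi-definite.
        cov = [[lam * cov[i][j] + (1.0 - lam) * r[i] * r[j] for j in range(n)]
               for i in range(n)]
    return cov


# Made-up returns for two assets over three periods.
sample = [[0.01, -0.02], [0.00, 0.01], [-0.03, 0.02]]
print(ewma_covariance(sample))
```

Using a different decay factor for each pair of assets would make the estimate more flexible, but the PSD property would no longer be guaranteed.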
DCC model:
The previous problem of constant parameters across all securities is dealt with by this class of models through a step approach [18], which focuses only on capturing the time dynamics: first we estimate the variances with suitable methods, then we estimate the correlations by using the data standardized with the previously estimated volatilities (the correlation matrix is the same for both returns and standardized data).
The estimation of the correlation matrix needs auxiliary variables to ensure that the estimated values fall in the interval [-1,1].
The suggested approach to build those auxiliary variables is to use a G-Arch type dynamic, with parameters that are the same across all the securities. This method is easy to implement and manageable, since we need to estimate only a few parameters simultaneously (QMLE used in waves); a full MLE would be more efficient, but it is unsuitable for large portfolios.

VEC and BEKK model:
First of all we define VEC as the function that converts the upper triangular matrix into a vector. This function is used to deal with multivariate models where we need to impose some structure on the parameters and estimators, both to ensure that the PSD property is respected and to manage the number of parameters needed [19].
The method suggested is a five-step process: 1) variance targeting for the unconditional value, to avoid big fluctuations for any small change of the key parameters; 2) we then compute the diagonal values with a G-Arch model. We restrict the parameter matrices to be diagonal, but different for each asset.
The BEKK is computationally demanding (it is still an over-parameterized model), however it is becoming more popular. This multivariate class of models does not have specific diagnostic checks; people still use the univariate ones, however this choice is questionable because it is impossible to check the size of the test.
The elements that are checked are the adequacy of the specification and the ex-ante evidence of multivariate effects.

Principal component approach:
This approach is based on the following steps:
1. Estimate a univariate G-Arch for each asset
2. Standardize the returns and order them
3. Compute the principal components (PC) of this matrix
4. Estimate a G-Arch for each column of the PC matrix
5. Compute the correlation matrix with the loading function C=LDL' and standardize it to ensure a diagonal of 1
6. Scale the correlation matrix with the initial estimates of the variance

Principal component analysis is an old and alternative method to estimate factors and betas by using the spectral theorem, where the number of principal components is less than or equal to the number of original variables. The rationale of the method is to proxy the unobservable factors with portfolio returns, which are built to satisfy given constraints. We need to estimate the factors and the betas jointly.
This transformation is defined in such a way that the first principal component has as high a variance as possible (that is, it accounts for as much of the variability in the data as possible [20]), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Hence the largest variance in the error matrix is smaller than the smallest of the factors' variances.
[18] We decompose the Var-Cov matrix into the correlation matrix and the standard deviation matrix; by doing that we can estimate them separately.
[19] A G-Arch model on 100 securities has 51010050 parameters.
[20] By possible we mean given the constraint that the sum of the squared weights equals one; otherwise there would be no bound, since the variance could be changed arbitrarily by multiplying the weights by a constant. There exist other alternatives, such as using the absolute value, however those methodologies do not allow an analytic solution.
Principal components are guaranteed to be
independent only if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.

Assuming that the variance is known and that the Var matrix is time independent. This last assumption is added just to simplify the calculus; in fact there exist more complex methodologies to apply principal components. The returns' variance can be represented by the spectral theorem. Another assumption is that V(r) is a full-rank matrix, so its rank k is equal to m, the number of returns used.
1. $V(r) = X \Lambda X'$, where $X$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues, ordered from the highest to the smallest value starting from the upper left position [21].
2. The factors proposed are portfolio returns, computed using the eigenvectors and the market returns. Since each portfolio is made by $f_j = x_j' r$, where $x_j$ is the eigenvector, each of these portfolios is uncorrelated with the others, so we can use the univariate formula to compute our beta: $\beta_j = Cov(r, f_j)/Var(f_j) = V(r) x_j / \lambda_j = x_j$. The betas are the eigenvector for the specified factor.
3. The variance of these factors is equal to the diagonal matrix in the spectral decomposition, $V(f) = X' V(r) X = \Lambda$, and it is diagonal.
4. Our model completely explains the return behavior, so to bring it closer to the common regression we rearrange the formula. We divide the factors into two groups, $r = X_q f_q + X_{m-q} f_{m-q}$. The first one will be the variables matrix, the residual will be the error matrix.
- The residual matrix will have mean 0 and is uncorrelated with the factors.
- Thus the highest value in the Var-Cov matrix of the residuals will be smaller than those of the factors.
- The rank of the factors matrix is equal to q, where q is the number of factors considered (q=j).
- There is a drawback in this methodology: it does not generally respect the pricing theory, which states that there should not be extra remuneration without bearing any risk. In fact the residuals can be correlated with some returns, so they are not idiosyncratic, and furthermore this risk is not negligible: an asset can have an excess return even if it is not correlated with the factors included.

There is another way to build principal components: maximize the portfolio risk under the constraints of orthogonality and of a sum of squared weights equal to one, each solution representing one principal component.
1. We build a Lagrangian function to maximize the variance under the constraint. We end up with the weights being the eigenvectors and the variances being the diagonal elements of the spectral decomposition of the variance of the returns. The constraint is chosen to obtain an analytic solution, even if it does not have an economic meaning [22]; in fact, in general the linear combination of these weights and the returns is not a portfolio.
2. The book suggests looking at the marginal contribution of each component to the total variance, to notice how basically all the variance is explained by the first three components.
[21] Remember that the Var-Cov matrix is PD; otherwise (PSD) we cannot directly apply the theorem. The characteristic equation $\det(V(r) - \lambda I) = 0$ is of order equal to the rank of the Var-Cov matrix, so it can be solved only numerically.
[22] We can use the absolute sum of the weights, but only numerical solutions are available.
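The variance maximization under the constraint x'x = 1 can be illustrated numerically. The sketch below uses power iteration, a technique chosen here only for illustration and not the method of the book, to find the weights with the largest variance and their share of the total variance. The function name, the random starting weights and the example matrix are all assumptions.

```python
import math
import random


def first_principal_component(cov, n_iter=500, seed=0):
    """Largest-variance component of a Var-Cov matrix by power iteration.

    Returns (weights, variance, share): weights satisfy x'x = 1, variance is
    x'Vx and share is its fraction of the total variance (the trace).
    Convergence assumes the largest eigenvalue is strictly larger than the
    second one.
    """
    n = len(cov)
    rng = random.Random(seed)
    # Random starting weights (assumption) to avoid starting orthogonal
    # to the component we are looking for.
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        y = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # re-impose the constraint x'x = 1
    variance = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    total = sum(cov[i][i] for i in range(n))
    return x, variance, variance / total


V = [[2.0, 1.0], [1.0, 2.0]]  # made-up Var-Cov matrix of two returns
weights, variance, share = first_principal_component(V)
print(weights, variance, share)
```

The resulting weights are defined only up to sign, and their squares (not the weights themselves) sum to one, which is why the combination is in general not a portfolio.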
Assuming an unknown Var-Cov matrix: we can start from an a priori estimate of V(r) using historical data; however, the quality may be too low, which is why another methodology is suggested. We can estimate the components one by one, starting from the highest.
1. This method consists of maximizing the variance with the usual constraint x'x=1, leaving all the estimation error in the last component, since we can improve the estimate of the first one.

Switching/Regime models:
The idea of this new set of models is to describe random variables by dividing their possible probability distribution into several ones, each with specific parameters depending on the state of nature. The simplest possible model is the one based on regression estimation using dummy variables; however, this approach suffers from severe limits. It is valid only ex-post, hence it is useless to forecast the occurrence of random states, while it is useful to explain cyclical behavior.
More advanced models treat the occurrence of those states as random and try to model the change in the state of nature by defining a proper probabilistic model.

Threshold model:
This method defines a discrete number of scenarios, each with specific parameter values, whose occurrence is conditional on the value of some other variable relative to given thresholds. The changes of state in this model are abrupt.

Smooth transition:
This method is close to the previous one, but instead of using a trigger event it uses a probability distribution depending on the value assumed by the threshold variables, so the threshold variable does not determine the state deterministically. Hence, given the cumulative distribution expressed as F(S,x), we can compute the marginal probability of a state of nature as the difference between the values assumed by the CDF at the ending and at the beginning threshold values.

Markov switching model:
This is the frontier in the academic field.
This class of models assumes that the random variable defining the state of nature is a discrete, first-order, k-state, irreducible, ergodic Markov chain. Those attributes define a random variable which depends only on its immediate past, does not have absorbing states and has a long-run mean.
This approach can easily be combined with all the previous models to better capture non-normality.
There exist different possible Markov models: one modeling just the first moment (MSI(k)) and another where we also model the volatility (MSIH(k)). Besides these two there is a class of autoregressive models, where we also model those added parameters. The last one is the model based on VAR [MSIVARH(k,p)], which has proved useful to explain contagion dynamics: simultaneous (correlation between the variables, i.e. co-movement), linear (through the VAR elements, meaning the autoregressive components) and nonlinear (through the fact that the regime variable driving the process is common to all the variables, hence the switching process).
The parameters of this class of models are the usual state-dependent first and second moments, but also the probabilities collected in the transition matrix, representing the persistence and the change probabilities of the state of nature. They are estimated by ML.
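To make the role of the transition matrix more concrete, the sketch below computes the long-run (ergodic) state probabilities by repeatedly applying the matrix, and the expected duration of a state from its persistence probability. The function names and the transition probabilities are assumptions chosen for illustration only.

```python
def ergodic_probabilities(P, n_iter=1000):
    """Long-run state probabilities of an ergodic Markov chain.

    P[i][j] is the probability of moving from state i to state j
    (each row sums to one).
    """
    k = len(P)
    pi = [1.0 / k] * k  # starting distribution; for an ergodic chain any works
    for _ in range(n_iter):
        # One step of the chain: pi' = pi P
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi


def expected_duration(P, state):
    """Average number of periods spent in `state` once it is entered."""
    return 1.0 / (1.0 - P[state][state])


# Made-up two-state transition matrix (state 0 is more persistent).
P = [[0.9, 0.1],
     [0.2, 0.8]]
print(ergodic_probabilities(P))
print(expected_duration(P, 0))
```

The more persistent a state is (the closer its diagonal probability is to one), the longer its expected duration and the larger its long-run probability.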
When applying the model, some weird behavior has been noticed: the variance of the bull state, where returns are higher, is lower than the one of the bearish state. This behavior, which seems to go against the theory of finance, is well explained by the uncertainty about the occurrence of the state; if we correct for this issue the variance becomes "correct".
Basically we are computing expected values conditional on the probability of occurrence of the different states.
Since these random variables are unobservable, statisticians need to infer them. There are two possible methods: the filtered state probability, $\Pr(S_t \mid \mathcal{F}_t)$, is based on conditioning on past information, meaning all the information available at the time when the analysis is performed; the smoothed probability, $\Pr(S_t \mid \mathcal{F}_T)$, is an ex-post measure, since we are evaluating a point in the past.
By applying Bayes' theorem we compute the whole set of probabilities.
For a two-state chain with persistence probabilities $p_{11}$ and $p_{22}$, the ergodic probability of state 1 is $\pi_1 = (1 - p_{22})/(2 - p_{11} - p_{22})$; the duration of state 1 is $1/(1 - p_{11})$.
The book proposes an algorithm developed by Kim. Basically it proceeds backwards instead of forwards, based on the fact that at the final date the filtered and the smoothed probabilities have the same value. We only need to provide the initialization of the algorithm; the one usually used is the ergodic probability or a dummy 0.5 value.
To test the null hypothesis of k scenarios against k+1 scenarios we cannot use the likelihood ratio test, since the limiting chi-square distribution cannot be used; hence we need to use information criteria. The proposed one is the Hannan-Quinn criterion, which up to a scaling by the sample size $T$ is $HQ = -2 \ln L + 2 n \ln(\ln T)$, where $L$ is the likelihood and $n$ is the number of parameters.
Hence to minimize HQ we need to increase the likelihood or decrease the number of parameters.

Simulation methods:
Simulation methodologies are the econometrician's answer to the problem of dealing with completely new situations, where past observations cannot provide useful guidelines. Thanks to simulation we have the chance to run experiments under "controlled" conditions; we seek to model the functioning system as it evolves.
In finance their usage is focused on VaR, ES or other risk measures.

Historical simulation:
This kind of simulation is based on the idea that the past distribution is a good proxy for the future one. We find the percentile distribution of the data and use it to extract the future quantile. It is easy to implement and model free; however, there is no guideline on the length of the time window to choose, there is a trade-off between significance and reactiveness, and furthermore it assigns equal weights to all the observations.
A close model is the Weighted historical simulation where, instead of assigning equal weights, we use a declining exponential algorithm which is a function of a decay factor.
Some common criticisms of both methods are that we cannot use the usual scaling rule to convert daily data into weekly or annual data, and that if we use this simulation for short positions we can get unreliable results [23].
[23] In this case it is suggested to compute the position on the right tail as gains.

Monte Carlo Simulation:
This approach is based on a parametric distribution, defined ex-ante, used to model the quantile. This method is flexible, parameter efficient and allows us to model time series; on the other hand it is tightly parameterized, hence we need to be
confident in the distribution chosen to explain the random variable. Differently from the historical one, MC has the useful property of defining a path generation process [24].
A classic use of Monte Carlo simulation is to define the behavior of a variable whose asymptotic properties we know, but for which we have too few observations to be confident in applying them directly.
There exist several techniques to reduce the variance of these estimates: antithetic variates and control variates. This problem is quite sizable: with the common approach, to reduce the standard error of the estimate by a factor of 10 we should increase the sample size by 100 times.
The first approach is basically a smarter usage of the path generation, to ensure that after a given number of replications we have covered the maximum range of possible outcomes. The idea is to take, for each draw, its complement. Other methods with different rules belong to this family.
The second one is based on the idea of using a variable that is highly correlated with the one we are interested in. The properties of the chosen variable are known, hence it can be used as non-random.

Filtered Simulation (Bootstrapping):
This is a mixture of the past two approaches: basically we use the past percentile distribution to draw with replacement our future quantile, following the usual MC path generation process. The advantage of this method is that it allows inference without strong assumptions on the distribution of the underlying variable.
This approach performs badly when the data have outliers; there is no hypothesis on the distribution, which is extracted exactly from the data.
[24] Usually we are modelling the error term, not the whole variable.
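The bootstrapping idea can be sketched as follows: past observations are drawn with replacement to build simulated paths, and the VaR is read from the simulated distribution. This is only a minimal sketch: the function name `bootstrap_var`, the horizon, the number of paths, the confidence level and the sample data are assumptions. It resamples raw past returns; in a filtered version one would instead resample the standardized error terms of a volatility model and rescale them.

```python
import random


def bootstrap_var(past_returns, horizon=10, n_paths=10000, level=0.01, seed=0):
    """VaR of the cumulated return over `horizon` periods (hypothetical helper).

    Each path is built by drawing `horizon` past returns with replacement.
    The VaR is reported as a positive loss at the given tail probability.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        path = rng.choices(past_returns, k=horizon)  # draw with replacement
        totals.append(sum(path))                     # cumulated return of the path
    totals.sort()
    cutoff = totals[int(level * n_paths)]            # empirical left-tail quantile
    return -cutoff


# Made-up daily returns used only to run the example.
history = [0.004, -0.012, 0.007, -0.003, 0.010, -0.021, 0.002, 0.005]
print(bootstrap_var(history))
```

Because every simulated value comes from the observed sample, a single outlier in the history can dominate the extreme quantile.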