Econometrics: Basic

Giulio Laudani #13 Cod. 20191

Econometrics

Contents
  • Black-Litterman Model
  • OLS
  • VAR and volatility estimation
  • Stock for the long run
  • Style analysis (OLS application)
  • Principal component
  • Logarithmic random walk
  • Types of return and their properties
  • Markowitz optimization portfolio (algebra/calculus application)
  • Probability mathematics and laws
  • Matlab questions

Black-Litterman Model

The scope of the Black-Litterman model is to estimate the market expected return while avoiding the pitfalls of Markowitz optimization (the problem it overcomes is the high volatility of historical returns, which does not allow narrow confidence intervals at high probability levels; because of this high volatility there is a large sampling error, so the Markowitz method cannot be used to properly find the weights of the market portfolio). The basic idea is to use, as weights for the market allocation, the ones computed starting from those provided by some well diversified index, and to adjust them with our views expressed as departures from that index asset allocation. It is an application of Bayesian statistics: we want to find a new distribution given some new information provided by us.

The methodology is a multi-step process. At first we should perform the estimation of the B-L variables:
  o We choose a market index, from which we obtain the corresponding weights. Here we are making an assumption on the index: the chosen market proxy should be mean-variance efficient. This assumption is not really strong, since it is reasonable that a market proxy is at least not too far from mean-variance efficiency. However, we should remember that a subset of an efficient portfolio is not in general efficient (it is only if the sub-portfolio has been built by a random-sampling technique, so that it keeps the same sub-class exposures).
  o The available market information is distributed according to a Normal, whose mean is the estimated market expected return and whose Var-Cov matrix is the market Var-Cov matrix times a scalar smaller than one.
  o We already know the relationship between the Var-Cov matrix, the weights, the market return and the risk-aversion coefficient, as defined by the Markowitz optimization; hence it is possible to invert that formula and find the implicit market expectations.
    - Since the estimated expected market return depends heavily on the choice of the proxy index, to lessen the problem we should use a large portfolio. However, the bigger the portfolio the more numerically demanding the computation, so to keep the problem manageable we can rely on the CAPM: we use a large portfolio and estimate, for each of our securities, just the betas, so that we do not need to estimate the whole Var-Cov matrix. (There is a drawback: stocks with low correlation with the market tend to give unstable results, so it becomes necessary to implement a multifactor model.)
  o The Γ is the parameter that scales the Var-Cov matrix; its meaning is to account for the relative importance given to the market versus our view information. What matters is the ratio between it and the view matrix: the higher the ratio, the higher the confidence in the market.
  o We make some assumptions on the Var-Cov matrix. The matrix is usually estimated from monthly historical data (usually a three-year time frame) or from smoothed estimates.
    - A typical problem in the Var-Cov matrix is the overestimation of correlation, which lowers the positive effect of diversification: if two securities have similar expected returns and high correlation, there will be an over-concentration on the asset with the higher expected return. There exists a procedure to lessen this problem, similar to the adjusted beta: blend (take a weighted average of) the estimated matrix with a reference matrix having ones on the diagonal and, off the diagonal, the average of the off-diagonal elements of the estimated matrix.
  o The risk-aversion parameter (assuming the absence of the risk-free asset) is given by the Markowitz formula: variance over expected excess return. Note that the denominator is an a priori guess, since it is what we are looking for; we can use an iterated process.
  o The views must be given in numerical form, so that their effect on the allocation can be checked immediately. The asset manager's views consist of portfolio returns, summarized by a Normal whose mean is the expected return of the view portfolios (given the manager's views) and whose diagonal Var-Cov matrix expresses the confidence in those views (its values should be defined so as to ideally build a 95% confidence interval within which our views are contained). P is the matrix of weights that yields the expected returns V of the view portfolios given the expected returns of the securities in the market.

Given all the previous information, Black and Litterman propose to combine the two sets of information through an optimization, minimizing the distance between our parameters and both the market's and the manager's information. Note that if we use only the market portfolio information, the investor ends up with the market portfolio itself; the innovation of the model is the possibility to add views and thus obtain a different allocation. The solution can be expressed in two equivalent ways: as the tangency portfolio of the Markowitz optimization, to which we add a spread position representing the view correction; or as a weighted average of the market information and the views, where a constant g makes the weights sum to one.
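A minimal Matlab sketch of the two key steps just described, the reverse optimization that extracts the implied market returns and the combination with one view, using the textbook B-L formulas; all names and numbers (Sigma, w_mkt, delta, tau, P, Q, Omega) are illustrative assumptions, not values from these notes.

    % Minimal Black-Litterman sketch (textbook formulas; all values are illustrative).
    Sigma = [0.04 0.01; 0.01 0.09];   % Var-Cov matrix of two assets
    w_mkt = [0.6; 0.4];               % weights taken from the chosen market proxy
    delta = 2.5;                      % risk-aversion coefficient
    tau   = 0.05;                     % scalar (<1) shrinking the Var-Cov of the market prior

    Pi = delta * Sigma * w_mkt;       % implied market expected excess returns (inverted Markowitz)

    P     = [1 -1];                   % one view: asset 1 outperforms asset 2 ...
    Q     = 0.02;                     % ... by 2%
    Omega = 0.001;                    % confidence on the view (diagonal Var-Cov of the views)

    % Posterior expected returns: market and view information mixed by their precisions
    mu_bl = (inv(tau*Sigma) + P'*inv(Omega)*P) \ (inv(tau*Sigma)*Pi + P'*inv(Omega)*Q);

    % Allocation implied by the posterior (unconstrained mean-variance weights)
    w_bl = (delta*Sigma) \ mu_bl;

The posterior mean is exactly the weighted-average form mentioned above: with no views it returns the implied market expectations, hence the market portfolio.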
OLS

OLS (ordinary least squares) is a method used to estimate the parameters of a linear regression: a set of variables regressed on common factors. It assumes a linear relationship between the dependent variable and the weights, not necessarily in the independent variables. Besides OLS there exist other methods to estimate the regression parameters: the method of moments and maximum likelihood. The OLS estimate minimizes the sum of squared errors, so that on average the fitted model equals Y. This method is preferred because it has an analytic solution and, under certain hypotheses, is superior to any other method, as proved by the Gauss-Markov theorem. To be useful an estimator needs an important feature, unbiasedness; if that cannot be achieved we require consistency, an asymptotic property which needs weaker hypotheses on the error and on its correlation with the independent variables. Setting the first derivatives to zero (a sufficient condition, since the objective function is convex) yields the usual closed-form solution for the coefficients. From this formula we see that the estimation quality can be improved by increasing the range of the independent variables.

As the formula shows, the randomness of the dependent variable comes from the presence of the error. Hence the conditional and unconditional distributions are the same; furthermore, under the weak hypotheses the fitted values are unbiased for Y and their variance follows from the error variance.

OLS requires certain hypotheses to work properly and to allow the user to build confidence intervals.
  • The weak hypotheses are three, and together they ensure that the OLS estimators are BLUE:
    o The expected value of the error is 0 (always the case if the intercept is included in the model) and the errors are uncorrelated with the X; if X is random we should require the condition to hold conditionally on X.
    o The variance of the error is constant and the correlation among errors is 0. If this hypothesis fails we can still estimate the β with the generalized least squares method, where it remains BLUE: we transform the original equation into an equivalent one whose errors are again spherical.
    o The matrix X is full rank, to avoid multicollinearity and to ensure that (X'X) is invertible. The effect of multicollinearity is an increase in the variance of the betas.
  • The Gauss-Markov theorem states that the OLS estimator is BLUE, using a definition of variance efficiency: if two estimators are both unbiased, one is not worse than the other iff the difference of their variance matrices is at least positive semidefinite. We should also consider that if we want to estimate a set of linear functions of the coefficients, with a non-random transformation matrix H, the definition of BLUE estimator is invariant to this. We call this property "invariance to linear transforms" and it is the strongest argument in favour of this definition of "not worse" estimator. An implied hypothesis of the theorem is that the class of estimators considered is linear in the dependent variable.
  • The strong hypotheses are two: the errors are independent of each other and of X, and they are distributed as a Normal. It follows that the betas have the same distribution, since they are linear combinations of the errors. Under these hypotheses we can build confidence intervals and test the statistical significance of the model parameters. Several tests are used to assess the fit and the significance of the coefficients, overall and one by one:
    o The t-ratio: since the error variance is unobservable we use the sample variance, so instead of the Gaussian distribution we use the t-student distribution, with percentile z_a to define the confidence interval and to check whether 0 is included, or equivalently the p-value; the degrees of freedom depend on the sample size, and the test is applied to each estimator. The general idea is to divide the numerator (the hypothesized quantity) by its standard deviation. The paired-sample approach tests the difference between two estimators by introducing the difference "d" as a new parameter in the model; its standard deviation is then computed automatically by the model and accounts for the potential correlation among estimators (positive correlation reduces the variance).
    o The F-test, to test several hypotheses jointly. The F ratio compares the restricted and unrestricted sums of squared residuals, where k is the number of parameters and q the number of restrictions tested. The F-test on a single variable in general gives the same result as a two-sided t-test.
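A short Matlab sketch of the estimator and the t-ratios described above, on simulated data (the data-generating values are arbitrary assumptions):

    % Minimal OLS sketch: beta_hat = (X'X)^(-1) X'y and the usual t-ratios.
    n = 200; k = 2;
    X = [ones(n,1) randn(n,k)];           % include the intercept so the error has zero mean by construction
    beta_true = [0.5; 1; -2];
    y = X*beta_true + 0.3*randn(n,1);

    beta_hat = (X'*X) \ (X'*y);           % analytic OLS solution
    res      = y - X*beta_hat;            % residuals
    s2       = (res'*res) / (n - k - 1);  % sample variance of the error
    V_beta   = s2 * inv(X'*X);            % estimated Var-Cov matrix of the estimators
    t_ratio  = beta_hat ./ sqrt(diag(V_beta));   % compare with t-student percentiles (n-k-1 dof)
    R2       = 1 - (res'*res) / sum((y - mean(y)).^2);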
    o The R² is meaningful if the constant is included in the regression (or, more generally, if the residuals have zero mean). This measure can only increase when new independent variables are added. Note that in the univariate case the R² is the squared correlation between y and x, since the fitted y is a linear combination of x.

Some considerations based on exam questions:
  o The Cov(r1,r2) of two returns regressed on the same factors depends only on the common factors and their loadings.
  o Remember that the expected value of each beta in a multivariate regression is the real beta, and that the difference of any pair of estimated betas inherits its distribution from them, being a linear combination.
  o If we use the estimated OLS parameters to make inference in a region outside the observed X (forecasting), we have to assume that the betas in the new region are the same and are still distributed according to a Normal with the same parameters. The target function used to build the confidence interval changes accordingly, and a confidence interval for a forecast is wider because it must also account for the error variance.
  o If the constant is included in the model, the residuals have zero mean, and the fitted value of y at the average value of the X is the average of the fitted values itself, which is also equal to the average of the realized y; if we use a model without intercept these properties are lost.
  o The mean square error is the expected squared distance between the estimator and the real value.
  o If we do not consider the complete model, but omit one independent variable which is correlated with the included variables, there will be a bias in our coefficients, since the included regressors will be correlated with the error.
  o If the intercept is excluded from the model (and it is effectively different from zero) then the estimates of the betas are biased. However, if the intercept is really 0, excluding it lowers the variance of the coefficients.
  o If Cov(Xi,Xj)=0 then each beta can be estimated by the univariate formula, where V, the Var-Cov matrix of X, is diagonal if the statement is true.

VAR and volatility estimation

Before talking about the VaR and its estimation procedures, we should spend some words on volatility itself, on its meaning and on how to estimate it. In finance, volatility is used as a measure of risk, to get a sense of the unpredictability of an event. It is usually computed by looking at the historical behaviour of a variable, or by looking at the derivatives market: the implied volatility is the one that makes the pricing formula reproduce the market price given the other variables.

In finance, tail behaviour is essential to estimate the VaR (see the note below), which is used to assess the maximum possible future loss over a certain time interval. The VaR inputs are the exposure amount and the percentile indicating the given probability of experiencing a loss at least equal to the one indicated by the percentile itself. As we can see, the hypothesis on the distribution of the tails of the returns is the key to making this tool meaningful. The book proposes four possible data distributions:
  • Parametric: the first methodology proposed. It consists of a Gaussian distribution with parameters inferred from historical data. The parameters needed are the mean and the volatility, from which any quantile can be found; in detail, the volatility is estimated using RiskMetrics.
    o Our goals are to estimate the quantile and its lower bound, since the variance is itself estimated.
Note: a general limit of the VaR methodology is that it gives no information on the event causing the loss, only the probability of that event. It also ignores the distribution beyond the estimated quantile. Furthermore it is a pro-cyclical measure: since many of the proposed methodologies use historical parameters (or, more generally, data) from the past time interval, a positive (negative) trend brings a positive (negative) momentum that biases the estimate downward (upward).
    o The quantile is obtained from the estimated mean and volatility, where the estimated variance is used as a proxy for the true one; the lower bound then follows by accounting for the estimation error of the variance.
    o This method has several limits highlighted by empirical evidence: the underlying hypothesis of Gaussian returns is contradicted by the data.
  • Mixture of Gaussians: a mixture of two or more (Gaussian or other) distributions with different parameters, weighted by their probability of occurrence. The general idea is to use the parameters of the normal regime for the first distribution and those of the exceptional regime for the second. The blended distribution can be computed only numerically, by maximum likelihood, where the mixing probability is estimated before running the quasi-likelihood function (in log form). However, the tails still decline at an exponential rate, like the Gaussian distribution; this method is like a GARCH model with infinitely many components, so the unconditional distribution has a non-constant variance.
  • Non-parametric: use a distribution based on a frequency (frequentist) approach, namely the empirical cumulative distribution function; no parameters are needed.
    o The confidence intervals are built by finding the i-th ordered observation at which the desired empirical probability is reached under the frequentist approach.
    o To find the lower bound we need the volatility of the frequentist probability: we compute the probability of occurrence of that i-th observation using a binomial distribution for the cumulative count; for large n the distribution converges to a Gaussian, and the lower bound is given by the j-th ordered observation chosen from this approximating distribution.
    o The drawback is the little insight provided for extreme quantiles, since the observations become either granular (non-contiguous) or totally absent; hence this method is weak, compared with alternative parametric distributions, because of the high sampling error.
  • Semi-parametric: a blend of a parametric model to estimate the central values (close to the mean) and a non-parametric one for the tails, while the non-parametric part is also used to find where to plug in the tail model.
    o The parametric part for the central values is a Gaussian distribution, as in the parametric method.
    o The non-parametric part suggested for the tail data consists in building a function that approximates the behaviour of the tail: for large losses the tail probability behaves like a slowly varying function L(.) times a power of the observation, where a, the exponent, is the speed at which the tail goes to 0.
    o To estimate "a" (the only parameter) we represent the log frequency distribution as a linear function of the log observations; the constant term absorbs all the approximations made, since the log of a slowly varying function is basically a constant. The parameter a is estimated by OLS, giving a polynomially declining rate for the tails.
    o Then we graphically search for the plug-in point, which is the point from which the empirical cumulative distribution starts to behave as a linear function (in log scale).
      - Once this subset of data has been found, we use it to estimate the quantile, given a, the probability level and the first point from which the plug-in starts.
      - The procedure to find the lower bound from the quantile probability then follows as in the non-parametric case, and the lower bound is obtained from the corresponding ordered observation.
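A Matlab sketch of the parametric and non-parametric quantile estimates on a simulated return sample (norminv requires the Statistics Toolbox; all numbers are illustrative assumptions):

    % Parametric vs non-parametric VaR on a simulated return sample.
    r     = 0.0005 + 0.01*randn(1000,1);   % returns
    alpha = 0.01;                          % 1% probability level
    expo  = 1e6;                           % exposure

    % Parametric (Gaussian): quantile from the estimated mean and volatility
    mu_hat    = mean(r);
    sig_hat   = std(r);
    q_param   = mu_hat + sig_hat * norminv(alpha);
    VaR_param = -q_param * expo;

    % Non-parametric (empirical CDF): order the observations and take the i-th one
    r_sorted = sort(r);
    i        = ceil(alpha * numel(r));
    q_hist   = r_sorted(i);
    VaR_hist = -q_hist * expo;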
Stock for the long run

"Stocks for the long run" is a common mistake in the finance field. It states that an investor should choose his investment strategy by picking the stocks with the highest expected return, without considering the underlying risk. The statement is based on two ex-ante hypotheses and one ex-post hypothesis; they come from the intuition (in an LRW world) that after a sufficiently long time period any level of return can be achieved, regardless of the risk:
  • First hypothesis: given the Sharpe ratio formula, the idea is that with a sufficiently large n any result can be achieved; in other words, there is a time interval over which the probability of obtaining a given expected return is reached (usually with a confidence of 95%). It is a direct consequence of the LRW hypothesis that the expected return grows at rate "n" while the volatility grows at rate "square root of n" (this is the theme of explosive growth in autoregressive models with unit coefficient, of which the LRW is one).
  • Second hypothesis: taking two investment strategies with the same mean and variance, one in 10 uncorrelated securities for one year and the other in just one security for 10 years, the claim suggests the existence of time diversification.
  • Third hypothesis (a posteriori): looking at the historical performance of the US stock exchange, it seems to make sense to invest in it rather than in other strategies.

As can be seen, the claim is a consequence of how we build confidence intervals; however, it can be proven wrong:
  • First critique: it makes assumptions on the investor's utility function, i.e. on how he chooses his investment strategy. The statement assumes that investors choose only by comparing Sharpe ratios over the long run, and that they will not change their strategy. A further comment on the strategy: being confident in the Sharpe criterion only over a certain long time frame, while rejecting the same criterion for each sub-period, amounts to assuming a peculiar utility function for the investor. Moreover, the statement is not that the investment is superior for any given horizon, only that for a sufficiently long horizon the strategy seems to be the best among the alternatives. Furthermore, since we are interested in the total return (not in the expected per-period return), we notice that the dispersion of the possible total return keeps increasing with n over time, hence the uncertainty is not declining.
  • Second critique: it rests on the wrong idea that two investment strategies with different time frames are comparable; thus there is no such thing as time diversification.
  • Third critique: since the US stock market has shown the highest return over the last century, you should invest in stocks. This is an ex-post statement and cannot be projected onto the future: the positive US trend has been sustained by the economic growth of that economy, and we cannot infer from historical data a similar future success.
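To restate the first hypothesis compactly, assuming i.i.d. one-period log returns with mean μ and volatility σ (the LRW setting) and a per-period risk-free rate r_f:

    \mathbb{E}[r_{(n)}] = n\mu, \qquad \sigma(r_{(n)}) = \sigma\sqrt{n}, \qquad
    \frac{\mathbb{E}[r_{(n)}] - n r_f}{\sigma(r_{(n)})} = \sqrt{n}\,\frac{\mu - r_f}{\sigma}

The Sharpe ratio grows with the square root of n, so the probability of falling below any fixed threshold shrinks; but the variance of the total return, nσ², keeps growing, which is precisely the point raised by the first critique.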
Style analysis (OLS application)

Style analysis is a statistical way to compare an asset manager's performance with a specified ex-post portfolio built from market indexes: we want to know whether the manager has been able to outperform the market, hence whether he has deserved the management fees.
  • This capability to add value should not be replicable by investors using public information; it is an ex-post analysis.
  • The suggested methodology consists of regressing the fund return on some indexes, which the investor subjectively assumes to be a good proxy of the management strategy.
  • We consider the spread between the realized return and the estimated return (hence the error) and we analyze whether its mean is statistically significant and whether the cumulative sum of the errors shows any trend.
  • Sharpe suggests building the model with the following procedure:
    o Set constraints on the betas: they must sum to one and there is no intercept. This can be done with an ex-ante method or an ex-post one (normalizing the estimated values); keep in mind that the two methods do not give the same results. This is a simplification made by Sharpe to ensure a self-financing strategy and to avoid the presence of a constant return over time (which not even the risk-free asset can deliver, since we can only use short-term risk-free investments; the theory of finance justifies this statement).
    o The regression is run on sub-samples of constant length, rolled forward one period at a time (a sketch of this rolling regression appears at the end of this section).

The critiques of this methodology consist of three points:
  o Keeping the weights at constant relative proportions is a limit and a costly strategy; there exist alternatives (buy-and-hold or trend strategies), and even within the constant-weight scheme the weights can change.
  o If the fund manager knows how he will be judged, and he knows more than the investor about the composition of the market portfolio, he can easily outperform the benchmark; replicating the market portfolio ex ante is not an easy task.
  o The analysis does not consider the difference in variance produced by the two strategies, and this can give an advantage to the fund manager.

There are three possible conclusions of the analysis, depending on the value of the error:
  o The cumulative error is negative: this is strong evidence against the fund's performance, since a totally passive strategy would have been more efficient.
  o The cumulative error is zero, or not statistically different from 0: it is hard to assess whether the management performance is unsatisfactory.
  o The cumulative error is positive: it cannot be considered evidence of the quality of the management team, since this measure alone is affected by many strong simplifying assumptions and does not consider the volatility difference between the passive strategy implemented and the effective one.
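A Matlab sketch of the rolling style regression with the ex-post normalization mentioned above (simulated fund and index returns; window length and all numbers are illustrative choices):

    % Rolling style regression, ex-post normalized weights, cumulated tracking error.
    Xidx  = 0.005 + 0.03*randn(120,3);                  % simulated monthly index returns
    rfund = Xidx*[0.5; 0.3; 0.2] + 0.01*randn(120,1);   % simulated fund returns
    T = size(Xidx,1); win = 36;                         % 36-month moving window
    cum_err = zeros(T-win,1); c = 0;
    for t = win:T-1
        Xw = Xidx(t-win+1:t,:);  yw = rfund(t-win+1:t);
        b  = (Xw'*Xw) \ (Xw'*yw);                 % OLS without intercept on the window
        b  = b / sum(b);                          % ex-post constraint: weights sum to one
        e  = rfund(t+1) - Xidx(t+1,:)*b;          % next-period spread vs the style portfolio
        c  = c + e;
        cum_err(t-win+1) = c;                     % cumulated error: its sign and trend are what we judge
    end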
Principal component

One of the key tasks in the asset management industry is to estimate expected returns through a factor model, where the expected excess return is the price of risk times the beta (sensitivity). Those prices are usually proxied with portfolio returns; the problem is that we need to jointly estimate both the factors and the betas, so there is an infinite range of possible weights to be used as a solution. There exist two methods to test the meaningfulness of such a model: the first is to check whether the intercept equals zero, but it is not really powerful and it is not a good criterion (we may have a really well-fitting model that fails the test); the other is to test the linear relationship between returns and betas. This second method is a two-step process: first we estimate the beta of each portfolio, then we run a cross-sectional regression to check that the estimated betas and factors are consistent with market data (to increase the power we group the returns into boxes that maximize the distance between observations). We may add other terms, like the square of beta or error terms, to see whether those terms are meaningful.

Principal components are an old, alternative method to estimate factors and betas using the spectral theorem, where the number of principal components is less than or equal to the number of original variables. The rationale of the method is to proxy the unobservable factors with portfolio returns built so as to satisfy suitable constraints; basically we choose the eigen-portfolios as factors, jointly estimating the factors and the betas. The transformation is defined in such a way that the first principal component has as high a variance as possible, that is, it accounts for as much of the variability in the data as possible (given the constraint that the squared weights sum to one, otherwise there would be no bound, since the variance could be changed arbitrarily by multiplying by a constant; alternatives exist, such as constraining the absolute values, but they do not allow an analytic solution), and each succeeding component in turn has the highest possible variance under the constraint that it be orthogonal to (uncorrelated with) the preceding components; hence the largest element of the error matrix is smaller than the smallest of the factors'. Principal components are guaranteed to be independent only if the data set is jointly normally distributed, and PCA is sensitive to the relative scaling of the original variables.

Assume for now that the variance is known and that the Var-Cov matrix is time independent; this last assumption is added just to simplify the calculus, and more complex methodologies exist to apply principal components without it.
  • Returns' variance can be represented by the spectral theorem. A further assumption is that V(r) is a full-rank matrix, so that its rank k equals m, the number of returns used. (Remember that the Var-Cov matrix must be positive definite; if it is only positive semidefinite we cannot directly apply the theorem in this form. The eigenvalues solve the characteristic equation, whose order equals the rank of the Var-Cov matrix, so it can be solved only numerically.)
    o V(r) = XΛX', where X collects the eigenvectors and Λ is the diagonal matrix of eigenvalues, ordered from the highest to the smallest starting from the upper-left position.
    o The factors proposed are portfolio returns, computed using the eigenvectors and the market returns. Since each portfolio is built with an eigenvector x as weights, each of these portfolios is independent of the others, so we can use the univariate formula to compute each beta; the betas are the eigenvector entries for the specified factor.
    o The variance of these factors is equal to the diagonal matrix of the spectral decomposition, and it is diagonal.
    o Since this model completely explains the return behaviour, to turn it into a model closer to a common regression we rearrange the formula: we divide the factors into two groups, the first forming the variables matrix, the residual forming the error matrix.
      - The residual matrix has mean 0 and is uncorrelated with the factors; the largest eigenvalue of the residual part is smaller than those of the retained factors.
      - The factor matrix has rank equal to q, where q is the number of factors considered.
    o There is a drawback in this methodology: it does not in general respect the pricing theory, which states that there should be no extra remuneration for not bearing any risk. In fact the residuals can be correlated with some returns, so they are not idiosyncratic, and this risk is not negligible: an asset can have an excess return even if it is not correlated with the included factors.
  • There is another way to build principal components, by maximizing the portfolio risk under the constraints that each portfolio is orthogonal to the other components and that the sum of the squared weights equals one.
    o We build a Lagrangian function to maximize the variance under the constraint, and we end up with the weights being the eigenvectors and the variances being the diagonal elements of the spectral decomposition of the variance of the returns. The constraint is made to obtain an analytic solution, even if it does not have an economic meaning: in general a linear combination with such weights is not a portfolio (we could use the absolute sum of the weights instead, but only numerical solutions would be available).
    o The book suggests looking at the marginal contribution of each component to the total variance, to notice how basically all the variance is explained by the first three components.
  • Assuming an unknown Var-Cov matrix: we can start from an a priori estimate of V(r) using historical data; however its quality may be too low, which is why another methodology is suggested. We can estimate the components one by one, starting from the highest.
    o This method consists of maximizing the variance under the usual constraint x'x = 1, leaving all the estimation error in the last components, which improves the estimate of the first ones.
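A Matlab sketch of the decomposition, using the pcacov call quoted in the Matlab section at the end of these notes (Statistics Toolbox; the simulated loadings are arbitrary assumptions):

    % Principal components of a Var-Cov matrix of simulated correlated returns.
    R = randn(500,4) * [1 0 0 0; 0.8 0.6 0 0; 0.3 0.2 0.9 0; 0.1 0.1 0.1 0.7];
    V = cov(R);                          % estimated Var-Cov matrix
    [coeff, latent] = pcacov(V);         % coeff = eigenvectors (weights), latent = eigenvalues
    F = R * coeff;                       % factor "portfolios" built with the eigenvector weights
    explained = cumsum(latent) / sum(latent);   % share of total variance explained by the first components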
Logarithmic random walk

In finance we are interested in forecasting returns; however, the uncertainty around returns is not predictable (differently from games of chance), so we need to make assumptions on a possible probability distribution. One of the first models used is the LRW. It assumes that the price evolution over time is approximated by a stochastic difference equation in the log price: the current (log) price level depends on the past level plus an idiosyncratic component, so it is like saying that price movements are driven by a modelled chance, with the underlying distribution assumed to be Gaussian. The log form is used because it allows multi-period returns to preserve normality: the log of a product is the sum of the logs, and a linear function preserves the underlying distribution.

The idiosyncratic component has zero mean and constant variance, and the covariance among errors across time is 0. Sometimes the hypothesis is added that those errors are jointly normally distributed, hence independent of each other, consistently with the chosen time window: if the observations are aggregated between periods, the new idiosyncratic component will be uncorrelated only over the new time window, while it will be correlated with the intermediate ones, hence those middle observations must be dropped.
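A minimal simulation of the LRW just described (drift, volatility and length are arbitrary choices):

    % Simulating the logarithmic random walk: log p(t) = log p(t-1) + mu + shock(t).
    mu = 0.0004; sigma = 0.01; T = 250; p0 = log(100);
    shock = sigma*randn(T,1);                  % zero-mean, constant-variance, uncorrelated shocks
    logp  = p0 + cumsum(mu + shock);           % log price path
    P     = exp(logp);                         % price level: always positive under the LRW
    r_multi = logp(end) - p0;                  % multi-period log return = sum of one-period returns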
Note that when aggregating the variance over time with a correlation structure between correlated observations, it is no longer "n" times the one-period variance: the covariance terms must be added, hence the variance increases at a higher rate than the LRW variance if the correlation is positive.

Nowadays the logarithmic random walk is simply used as a descriptive device to compute accruals on returns, since no alternative has reached enough consensus in the finance field; however, the LRW hypotheses are contradicted by empirical data. Prices do not evolve by chance as suggested by the LRW, and there is strong empirical evidence against constant variance and in favour of the presence of correlation among securities; a random walk on price levels can even lead to negative prices. The accrual convention consists of annualizing by multiplying the one-period expected return by the number of periods and the one-period volatility by its square root, which is the correct procedure in the LRW case, while it remains merely a convention for actual securities.

Another proposed model is the geometric RW, which is basically the LRW applied to prices instead of returns; this model has a lognormal distribution (hence a positive skewness, which is related to the number of periods considered). Some useful properties are: the price cannot become negative, and the volatility is a function of the price level (lower for small prices, bigger for large ones).
Types of return and their properties

Even if in finance we are interested in the price evolution over time, all models and assumptions are based on returns. The easiest hypothesis made on their possible evolution is basically to suppose the existence of a stationary process in the prices; this statement is a big point of contention in finance. There are two typologies of return; neither is better than the other, it depends on what we want to do:
  • Linear returns are best used for portfolio returns over one time period, to compute the expected return and variance of a portfolio, because the portfolio linear return is a linear (weighted-average) function of the securities' returns. The log return of a portfolio does not have a linear function tying the securities together, hence any combination of stocks has a non-linear relationship, making any optimization problem extremely difficult, because the portfolio return becomes a non-linear function of the stock returns.
  • Logarithmic returns are best used for single-stock returns over time: in this case the multi-period return depends only on the initial and final elements of the time series.
The relationship between the two returns can be better understood by using the Taylor expansion of the log return, which coincides with the linear return if truncated at the first term. This expansion shows that the difference between the linear and the log return (for price ratios far from 1) is always greater than zero, since ln(1+x) ≤ x.

In finance the ratio of consecutive prices (possibly corrected to take accruals into account) is often modelled as a random variable with an expected value very near to 1. This implies that the two definitions give different values with sizable probability only when the variance (or, more generally, the dispersion) of the price-ratio distribution is non-negligible, so that observations far from the expected value have non-negligible probability. Since standard models in finance assume that the variance of returns increases as the time between returns increases, the two definitions are more likely to give different values when applied to long-term returns.

The mean is hard to estimate because of the relative size of the volatility, which is so large that the confidence interval basically ends up including 0; furthermore, increasing the sampling frequency provides no benefit, since nothing changes when moving, for instance, to monthly data.
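A small Matlab illustration of the two definitions of return above and of their aggregation properties (the price path is arbitrary):

    % Linear vs logarithmic returns on the same price series.
    P = [100; 101; 99; 120; 80];
    r_lin = P(2:end)./P(1:end-1) - 1;          % linear (simple) returns
    r_log = log(P(2:end)./P(1:end-1));         % logarithmic returns
    gap   = r_lin - r_log;                     % always >= 0, growing as the price ratio moves away from 1
    % Aggregation: log returns add over time, linear returns compound
    total_log = sum(r_log);                    % = log(P(end)/P(1))
    total_lin = prod(1 + r_lin) - 1;           % = P(end)/P(1) - 1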
We estimate the volatility using historical data; there exist several procedures, from the simplest one based on the LRW hypotheses to more complex ones that properly address some empirical features of volatility.
  • The estimator based on the LRW is simply the equally weighted sum of the squared differences between each observation and the mean. This measure has one big drawback, namely the hypothesis that the marginal contribution of the newest observation to the estimate equals that of the oldest one.
  • To overcome this assumption with a procedure more tailored to the market, the financial industry introduced the RiskMetrics procedure: the new formula is an exponentially smoothed estimate with a coefficient usually around 0.95 (bounded between 0 and 1), under the hypothesis of zero mean. (This hypothesis is a consequence of the data: the volatility of the mean is so high that over small intervals the mean is not significantly different from 0; it also ensures a more conservative volatility estimate, good for a long-term investor rather than for a trader or hedger.) Alternatively the estimate can be written recursively as a weighted combination of the previous estimate and the latest squared observation, where the remainder term vanishes for large n.
    o The drawback of this estimate is the loss of the unbiasedness property (if the data have constant volatility, unbiasedness holds only with equal weights wi = 1/n, which also minimize the variance of the estimator, as a Lagrangian argument shows), and the formula basically truncates the available information: with daily data, at roughly one year at most, even for a high coefficient.
  • The estimation variance of the variance, on the other hand, is small relative to the quantity being estimated and it can be improved by increasing the frequency; the fourth moment is computed assuming Gaussian returns, and the monthly-frequency formula gives a smaller estimation variance than before.
  • Both volatility estimators suffer from the so-called ghost problem: an extremely large new observation has a big impact on the level of the estimate. This behaviour is asymmetric, since extremely low observations are capped, and it is more severe for the classic formula, where the volatility level changes abruptly when the outlier leaves the sample, or is diluted at a rate 1/n if the whole sample is kept. In the case of the smoothed estimator the impact of the outlier instead decays geometrically with the smoothing coefficient.
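A Matlab sketch of the RiskMetrics-style smoothed estimate next to the equally weighted one (zero-mean hypothesis as in the text; the initialization of the recursion is an arbitrary choice):

    % Exponentially smoothed variance vs equally weighted variance (zero-mean form).
    r = 0.01*randn(1000,1);                   % simulated daily returns
    lambda = 0.95;
    s2 = zeros(size(r));
    s2(1) = r(1)^2;                           % initialization choice
    for t = 2:numel(r)
        s2(t) = lambda*s2(t-1) + (1-lambda)*r(t-1)^2;   % recursive smoothed estimate
    end
    sigma_ewma  = sqrt(s2(end));
    sigma_equal = sqrt(mean(r.^2));           % equally weighted, zero-mean counterpart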
Markowitz optimization portfolio (algebra/calculus application)

Markowitz portfolio optimization is a methodology to build mean-variance efficient portfolios from a set of stocks. In general it is not related to the CAPM, which is a general equilibrium model; however, if we consider the whole market, the Markowitz optimization delivers the CAPM market portfolio itself.
  • The model assumes that the criterion used in the market is mean-variance efficiency and that the investment time window is unique and preset at the beginning of the investment process (no changes after that).
    o The hypotheses needed to apply the method are that we know both the expected values of the single stocks and the Var-Cov matrix; if those assumptions fail there will be problems on the sampling-error side.
    o One possible formulation: the portfolio is built to minimize the variance under the constraint of achieving a specific return. One of the most important results is that the relative weights within the risky portfolio are the same and do not depend on the chosen return. This is a first instance of the separation theorem: the expected return we want to achieve depends solely on the allocation between the risk-free asset and the portfolio.
    o The same result can be achieved by maximizing the return for a given risk; the tangency portfolio in this case is equivalent to the result of the previous formulation. This is a sort of mean-variance utility function.
      - The variance of the overall return is always equal to the portfolio variance times the weight invested in it.
      - The ratio of the expected excess return to its standard deviation is the same for all efficient portfolios, hence all of them have the same marginal contribution to the composition of the stock-portfolio risk.
  • We want to show two results: the form of the optimal weights, and that the slope of the frontier is the Sharpe ratio; basically we want to show that all the portfolios have the same value of this ratio:
    o If we consider the optimal weights and plug them into the Markowitz lambda, and then plug this lambda back into the Markowitz weight formula, we recover the allocation.
    o We then have to consider the market allocation.
    o The portfolio return follows; computing the expected value and the variance of this equation and equating the corresponding terms, we end up with the two results stated above.
  • Investors take on risk in order to generate higher expected returns. This trade-off implies that an investor must balance the return contribution of each security against its portfolio risk contribution. Central to achieving this balance is some measure of the correlation of each investment's returns with those of the portfolio.
  • We do not believe there is one optimally estimated covariance matrix. Rather, we use approaches designed to balance trade-offs along several dimensions and choose parameters that make sense for the task at hand.
    o One important trade-off arises from the desire to track time-varying volatilities, which must be balanced against the imprecision that results from using only recent data. This balance is very different when the investment horizon is short, for example a few weeks, versus when it is longer, such as a quarter or a year.
    o Another trade-off arises from the desire to extract as much information from the data as possible, which argues for measuring returns over short intervals. This desire must be balanced against the reality that the structure of volatility and correlation is not stable and may be contaminated by mean-reverting noise over very short intervals, such as intraday or even daily returns.
  • All the portfolios must have the same Sharpe ratio. If we can invest in the risk-free asset we can add the risk-free rate as the intercept; otherwise a self-financing strategy should have an intercept of zero, assuming an equally weighted portfolio.
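A Matlab sketch of the first formulation, minimum variance for a target expected return with weights summing to one, solved through the Lagrangian first-order conditions (mu, Sigma and the target are illustrative assumptions):

    % Minimum-variance weights for a target expected return (no risk-free asset).
    mu    = [0.08; 0.12; 0.10];
    Sigma = [0.04 0.01 0.00; 0.01 0.09 0.02; 0.00 0.02 0.06];
    m_bar = 0.10;                                  % target expected return

    one = ones(size(mu));
    A = [mu'*(Sigma\mu)  mu'*(Sigma\one);          % scalar constants of the two constraints
         one'*(Sigma\mu) one'*(Sigma\one)];
    lam = A \ [m_bar; 1];                          % Lagrange multipliers
    w   = Sigma \ (lam(1)*mu + lam(2)*one);        % optimal weights: sum to one, hit the target return
    check = [w'*mu; sum(w)];                       % should return [m_bar; 1]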
Probability mathematics and laws

  • The conditional probability formula allows us to update a probability with new information; Bayes proposed an alternative, more usable formula.
  • Any random variable has an associated distribution:
    o In the continuous case there is a density and a cumulative probability function, which is strictly increasing.
      - The concept of percentile (q bounded between 0 and 1) is the probability that the variable falls below z, where z is the value that leaves q% of the data/values below it.
      - The expected value and the variance defined from the true CDF are population moments.
  • Each distribution is defined by three kinds of parameters:
    o Location parameters shift the distribution to the right or left.
    o Shape parameters are the residual definition.
    o Scale parameters change the σ and nothing else.
  • Possible (useful) distributions:
    o Binomial (Bernoulli) distribution. The parameters are "n", the number of experiments, and "p", the probability of success in each experiment (assumed constant across experiments); K is the target number of successes. For large n the binomial approximates a Gaussian distribution.
    o The lognormal is a right-skewed distribution.
    o A multivariate distribution describes the joint behaviour of two or more variables.
  • Matrix operations:
    o The rank is the number of linearly independent rows or columns.
    o A square matrix is invertible iff it is full rank.
    o To multiply A by B, A must have as many columns as B has rows; the resulting matrix has A's number of rows and B's number of columns.
    o The rows of a matrix can be summed by multiplying it by a vector of ones.
    o With two vectors we can build a matrix (outer product).
  • Inequalities (the formulas are restated in the block below):
    o Tchebicev inequality (the statement considers the modulus, so it accounts for both tail sides).
    o Vysochanskij-Petunin inequality.
    o Cantelli, one-sided.
  • Distribution measures:
    o The skewness measures the asymmetry of a distribution; it is the third centered moment. A positive value indicates right asymmetry, a negative one left asymmetry.
    o The kurtosis is a measure of the mass in the shoulders versus the tails: the higher the value, the higher the concentration in the tails. It is affected by asymmetry as well and it is always > 0.
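For reference, the standard textbook forms of the three inequalities listed above, with k the number of standard deviations from the mean:

    P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}                              \quad \text{(Tchebicev)}
    P(|X - \mu| \ge k\sigma) \le \frac{4}{9k^2}, \quad k > \sqrt{8/3}        \quad \text{(Vysochanskij-Petunin, unimodal X)}
    P(X - \mu \ge k\sigma) \le \frac{1}{1 + k^2}                             \quad \text{(Cantelli, one side)}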
Matlab questions

  • Data = xlsread('nomefile', 'worksheet', 'range'); if worksheet = -1, it opens an Excel window to interactively select the data. The worksheet can be given either as a string or as a number.
  • xlswrite('nome file', dati, 'worksheet', range) writes the data to the file.
  • inv(A) computes the inverse of the matrix.
  • [coeff, latent] = pcacov(A) performs the principal component analysis: "coeff" contains the eigenvectors, "latent" the eigenvalues (the diagonal of the lambda matrix).
  • flipud(A) flips the matrix upside down: the last row becomes the first.
  • cov(A) computes the Var-Cov matrix.
  • for i = 0:4:12 (…) end, where 0 is the starting value, 4 is the step and 12 is the final value.
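A small end-to-end illustration of the commands above on simulated data (the Excel call is commented out since it needs a real file; the file name is a placeholder):

    % Illustrating the listed commands on simulated data.
    A = randn(100,3);
    V = cov(A);                        % Var-Cov matrix
    [coeff, latent] = pcacov(V);       % principal components of V
    Vinv = inv(V);                     % matrix inverse
    B = flipud(A);                     % last row becomes the first
    for i = 0:4:12                     % start 0, step 4, final value 12
        disp(i);
    end
    % xlswrite('results.xlsx', latent, 'Sheet1', 'A1');   % writing to Excel (placeholder file name)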